Tag: How-To

OpsMgr 2012 Agent & Gateway Failover - The Basics [#opsmgr, #powershell]

I have previously posted a few scripts on managing and configuring fail-over management servers on gateways and agents in System Center Operations Manager 2007 R2. Now that System Center 2012 Operations Manager is RTM and users are starting to explore the differences between the versions I see more and more questions on how you do, in OpsMgr 2012, what you did in OpsMgr 2007. In a few posts henceforth I will go through Agent and Gateway server fail-over configuration and management. In this first post I’ll look at the very basics of fail-over configuration, the cmdlets to use and some one-liners. The cmdlet First of all, the cmdlets of OpsMgr powershell have all got new names looking like Verb-SCOM_noun_ and to list them all in the console you can execute the following command: get-command *SCOM* The cmdlet we are looking for to set and manage primary and fail-over management servers is Get-SCOMParentManagementServer As usual, you can pass the cmdlet as a parameter to get-help for information about its parameters and a few use-cases. SYNOPSIS Changes the primary and failover management servers for an agent or gateway management server. SYNTAX Set-SCOMParentManagementServer -Agent -PrimaryServer [-PassThru ] [-Confirm ] [-WhatIf ] [] Set-SCOMParentManagementServer -Agent -FailoverServer [-PassThru ] [-Confirm ] [-WhatIf ] [] Set-SCOMParentManagementServer -GatewayServer -FailoverServer [-PassThru ] [-Confirm ] [-WhatIf ] [] Set-SCOMParentManagementServer -GatewayServer -PrimaryServer [-PassThru] [-Confirm] [-WhatIf] [] But that’s so boring to read the manual is a bit sketchy on how it behaves and the limitations.

Load-balanced SCOM2012 SDK Services for Network Illiterates [#opsmgr, #nlb]

Prelude Now that System Center Operations Manager no longer has that pesky Root Management Server role; a server role that in larger environments quickly became the choking point and made creating a fully Highly Available SCOM-environment both complex and frustrating to support and with little gain at that. With that gone and the SDK Service, or Data Access Service, thriving on all the Management Servers HA suddenly became pretty simple. All you have to do in SCOM2012 to make sure your management groups keep on kicking is to have at-least two Management Servers and your databases clustered. This new distributed architecture does not only give easy HA, it also makes it possible to connect to the SDK-service—be it using the Operations Console or powershell to name two options—on any Management Server. This, in turn, provides for a completely new level of scalability. Choked on sessions? Deploy a new Management Server! Anyway… given all this scalability and HA, would it not be nice if you could load-balance all these SDK-sessions you will be running from System Center Virtual Machine Manager, System Center Service Manager, System Center Orchestrator, regular scheduled powershell scripts and what-not? Of course it would! And you can! The simple solution is to use the built-in Network Load Balancer (NLB for short) feature in Windows Server and that’s what we’re going to discuss in this post. Before we go, I’d like to point to a great article written by Justin Cook that is covering most bases but in a less for-dummies way. So, yeah… I suppose this is the for-dummies version then. 😉 Enjoy! Prerequisites We need to have the Network Load Balancing feature installed on all our targeted Management servers. The quick way to do this is using command-line (Windows Server 2008 R2 or later?). dism /online /enable-feature /featurename:NetworkLoadBalancingFullServer

Installing Linux Integration Services v2.1 on Red Hat ES 5

Ok, so I got the task to install the Linux Integration Service for Hyper-V R2 on a RedHat Enterprise Server 5. Something that turned out to be a bit more to handle than I would have thought. So here’s a little How-To. Preparations Read the documentation provided in the Linux Integration Services download. Much of the information in this article is in there, but some parts are not. Otherwise I would not have bothered writing about it. 😉 I’m not going to go through the OS installation process here, but make sure to select the “Software Development” packages since you will be needing it. In case you missed it, you can install them later by running these commands. # yum groupinstall "Development Tools"# yum install kernel-headers I’m not actually sure that you need to run the kernel-headers install manually or if it’s included in the “Development Tools” package. The first gotcha i ran into was the fact that the link to the Linux Integration Services–previously known as Linux Integration Components or LinuxIC–on RedHat’s information pages gave me a 404 and a redirect to a bing-search that returned the exact same 404. The page have simply been removed by Microsoft without any form of redirection to the new page. Anyway, a search on http://download.microsoft.com for “Linux Integration Components” do return the new page, and that’s where I learned about the new name. Thank you for making it easy for us Microsoft! Here’s a direct link to the search on the current name. And here’s a direct link to the actual download page. This download contains an ISO file that you can mount using the Hyper-V- or VMM-console, or you can do as I did and download the ISO to the virtual machine, mount it locally, copy the files and unmount it. Like this. # mkdir /mnt/ISO# mount -o loop /root/LinuxIC v21.iso /mnt/ISO# mkdir /opt/linux_ic_v21_rtm# cp /mnt/ISO/* -R /opt/linux_ic_v21_rtm/# umount /mnt/ISO

Linux Discovery – Not Enough Entropy

Error Description Here’s a little trouble-shooting guide for discovering Linux systems from OpsMgr R2 when getting the following error from the wizard: <stdout>Generating certificate with hostname="COMPUTERNAME"[/home/serviceb/TfsCoreWrkSpcRedhat/source/code/tools/scx_ssl_config/scxsslcert.cpp:198]Failed to allocate resource of type random data: Failed to get random data - not enough entropy</stdout><stderr>error: %post(scx-1.0.4-248.i386) scriptlet failed, exit status 1</stderr><returnCode>1</returnCode><DataItem type="Microsoft.SSH.SSHCommandData" time="2009-08-05T11:15:01.5800358-04:00" sourceHealthServiceId="0EB1D6DA-202C-7FC5-3D46-BDBB9208547D"><SSHCommandData><stdout>Generating certificate with hostname="COMPUTERNAME"[/home/serviceb/TfsCoreWrkSpcRedhat/source/code/tools/scx_ssl_config/scxsslcert.cpp:198]Failed to allocate resource of type random data: Failed to get random data - not enough entropy</stdout><stderr>error: %post(scx-1.0.4-248.i386) scriptlet failed, exit status 1</stderr><returnCode>1</returnCode></SSHCommandData></DataItem> But first, a little background on the actual “problem”. To generate the certificate, the entropy needs to be high enough to generate random data for the certificate creation. Without the certificate, the OpsMgr agent won’t be able to open up communications with the MS. So, what creates this entropy we need? Bluntly put, a selection of hardware components that are likely to produce non-predictable data. Like a keyboard, mouse and a monitor or videocard. Of course, there’s a lot more to it, but we really don’t need to know this. What we need to know is that there has to be a “bit bucket” of more than 256bytes of entropy for the certificate creation process to succeed. We also need to know that more enterprise-ish servers, like rack- or blade-servers tend to be void of things like directly attached keyboards, mouses and monitors that the linux kernel needs to be able to generate entropy. And herein lies the problem. If you have a new server that is not in full service (likely since we are trying to deploy the monitoring on it) which means that there’s not much random data flowing through the hardware and there’s no keyboard or mouse or monitor connected to it there is quite the risk that the system entropy is going to be very low. Of the linux systems that I have been deploying OpsMgr agents to, about half have failed because of “Not enough entropy”. So, here’s the steps I usually takes to ensure that discovery works. I use PuTTY to connect to the soon-to-be-monitored servers. This guide also assumes that you have SU rights on the system since all of these steps (except #1) needs it. Workaround 1. Check you current entropy cat /proc/sys/kernel/random/entropy_avail Is it less than, or close to, 256? It probably is. If you don’t feel like connecting a mouse and start wiggling it around—not really feasible in a data center—and see if the entropy increases, you can generate your own random data. 2. Generate you own random data. Be advised that this forced entropy will not be as random as the system-created on and thus not as secure. How much more insecure it is, I don’t know, and quite frankly I prefer to have my systems monitored yet slightly less secure than not monitored at all. Anyway, you can force your own random data by running:

Replace/Change a Gateway Server

Description of problem If you are looking into replacing an (or just switching to another primary) Operations Manager 2007 Gateway Server for any reason, there’s a little more to consider than just right-clicking the clients and selecting “Change Primary Management Server” in the Operations Console. You could end up with agents not being able to connect to the Management Group at all due to a small problem with the order in which Operations Manager do things. Here’s basically what happens: You tell Operations Manager to change Primary Management Server for AGENTX from GW1 to GW2. The SDK Service (i guess) tells GW1 that “You’re no longer the Primary Management Server for AGENTX” GW1 acknowledges this and stops talking to AGENTX. And I mean Completely stops talking to AGENTX. OpsMgr then tells GW2 to start accepting communication from AGENTX. OpsMgr tries to tell AGENTX that it should talk to GW2 since GW1 won’t listen. Spotted the problem? This modus operandi probably works when agents are on the same network and in the same domain where fail-over is sort of automatic. The problem we are facing now is that the server are telling the Gateway to stop accepting communications to and from the agent before the agent is notified that there is a new Gateway server to talk to. The agent will continue to talk to GW1 but will be completely ignored and you will probably start seeing events in the Operations Manager eventlog on GW1 with EventID 20000. How do I get around this little feature then? No matter if you found this article after running into the mentioned troubles or if you are googling ahead of time to be prepared, the fix is the same and consists of a few powershell scripts. These scripts are out there allready, but in different contexts, hence this post. First step: Install the new Gateway Documentation on this from Microsoft is good enough, but here’s the short version. Verify name resolution to and from Gateway server and Management Server Create certificate for the Gateway server Approve the Gateway server Install Gateway server Import certificates on Windows system Run MOMCertImport.exe on Gateway server to add the certificate into Gateway server configuration Wait