Tag: TroubleShooting

SNMP GET Errors in OpsMgr EventLog

I’ve been building a little SNMP Management Pack in the past few days to discover and monitor a bunch of PowerWare UPS’s, which turned out to take quite a lot more energy and time than expected. Mostly due to the facts that I am really bad with SNMP and how it works, I’ve never really looked into the inner working of building an SNMP management pack and also because we ran into a couple of errors preventing the discovery process to work alright. To make it clear right away, this is not going to be a “Building an SNMP Management Pack Tutorial” since there’s plentiful good ones out there already, and to be extra helpful I’m gonna include a few links right away: SNMP Setup and Simple Custom SNMP Discovery - Pretty much the basics SNMP Management Pack Example: NetApp Management Pack - Part 4 actually, but has the links to the other parts Creating SNMP Probe Based Monitors - No custom discovery, but a good and simple guide to SNMP Probes It’s the second, the NetApp one, I’ve used as a guide to building the UPS management pack since it goes through the process of building your own filtered discovery using SystemOID to identify your hardware-classes and then building the monitors on top of those. Let’s get to it When building the discovery of my hardware classes I ran into problems. The discovery simply did not work. At first I got some strange errors about “invalid queries”, something that turned out to be related to me reading two guides–seriously though, pick one guide that is closest to what you want to achieve and stick to it–and mixing up the XPathQuery variables. Silly me. I got those errors to go away and I was able to get a few objects to my base-class, but none of the hardware classes who was populated through the return value of an SNMP OID got discovered. The only error I got this time was the following: Log Name: Operations ManagerSource: Health Service ModulesDate: 2010-09-02 11:19:12Event ID: 11001Task Category: NoneLevel: ErrorKeywords: ClassicUser: N/AComputer: CENSOREDDescription:Error sending an SNMP GET message to IP Address XX.XX.XX.XX, Community String:=CENSORED, Status 0x6c.One or more workflows were affected by this.Workflow name: CENSORED.MP.CLASS.DISCOVERYInstance name: CENSORED_DEVICENAMEInstance ID: {5C7EFB30-D885-8843-0DD7-EA86B4FD2311}Management group: CENSORED I went through all the other logical steps of troubleshooting an error like that which include double-checking firewall settings, OIDs, IP-addresses, allowed hosts and so forth. It wasn’t until I loaded the PowerMIB into a MIB Browser installed on the proxy machine (in this case a Management Server) I realized that there was no problem sending an SNMP GET to the UPS from that server. I launched Wireshark and had it listen to SNMP traffic between the UPS and the Management Server. The thing that struck me right-away was the fact that I could see the a bunch of “SNMP Get-Request” but no “SNMP Get-Response” which means that Operations Manager did send an SNMP GET but there was no response. After a bit of intense staring i noticed what you see in the screenshot.

Linux Discovery – Not Enough Entropy

Error Description Here’s a little trouble-shooting guide for discovering Linux systems from OpsMgr R2 when getting the following error from the wizard: <stdout>Generating certificate with hostname="COMPUTERNAME"[/home/serviceb/TfsCoreWrkSpcRedhat/source/code/tools/scx_ssl_config/scxsslcert.cpp:198]Failed to allocate resource of type random data: Failed to get random data - not enough entropy</stdout><stderr>error: %post(scx-1.0.4-248.i386) scriptlet failed, exit status 1</stderr><returnCode>1</returnCode><DataItem type="Microsoft.SSH.SSHCommandData" time="2009-08-05T11:15:01.5800358-04:00" sourceHealthServiceId="0EB1D6DA-202C-7FC5-3D46-BDBB9208547D"><SSHCommandData><stdout>Generating certificate with hostname="COMPUTERNAME"[/home/serviceb/TfsCoreWrkSpcRedhat/source/code/tools/scx_ssl_config/scxsslcert.cpp:198]Failed to allocate resource of type random data: Failed to get random data - not enough entropy</stdout><stderr>error: %post(scx-1.0.4-248.i386) scriptlet failed, exit status 1</stderr><returnCode>1</returnCode></SSHCommandData></DataItem> But first, a little background on the actual “problem”. To generate the certificate, the entropy needs to be high enough to generate random data for the certificate creation. Without the certificate, the OpsMgr agent won’t be able to open up communications with the MS. So, what creates this entropy we need? Bluntly put, a selection of hardware components that are likely to produce non-predictable data. Like a keyboard, mouse and a monitor or videocard. Of course, there’s a lot more to it, but we really don’t need to know this. What we need to know is that there has to be a “bit bucket” of more than 256bytes of entropy for the certificate creation process to succeed. We also need to know that more enterprise-ish servers, like rack- or blade-servers tend to be void of things like directly attached keyboards, mouses and monitors that the linux kernel needs to be able to generate entropy. And herein lies the problem. If you have a new server that is not in full service (likely since we are trying to deploy the monitoring on it) which means that there’s not much random data flowing through the hardware and there’s no keyboard or mouse or monitor connected to it there is quite the risk that the system entropy is going to be very low. Of the linux systems that I have been deploying OpsMgr agents to, about half have failed because of “Not enough entropy”. So, here’s the steps I usually takes to ensure that discovery works. I use PuTTY to connect to the soon-to-be-monitored servers. This guide also assumes that you have SU rights on the system since all of these steps (except #1) needs it. Workaround 1. Check you current entropy cat /proc/sys/kernel/random/entropy_avail Is it less than, or close to, 256? It probably is. If you don’t feel like connecting a mouse and start wiggling it around—not really feasible in a data center—and see if the entropy increases, you can generate your own random data. 2. Generate you own random data. Be advised that this forced entropy will not be as random as the system-created on and thus not as secure. How much more insecure it is, I don’t know, and quite frankly I prefer to have my systems monitored yet slightly less secure than not monitored at all. Anyway, you can force your own random data by running: