Check_iDRAC for DELL iDRAC
- Nagios 3.x
- Nagios 4.x
- Nagios XI
Latest: 2.1
https://github.com/dangmocrang/check_idrac
http://www.cloudarch.club
- Virtual Disk
- Physical Disk
- Memory
- CPU
- Power Supply
- Power Unit
- Fan
- Battery
- Temperature Sensor
Check out the MANUAL on GitHub.
------------------------
1.0
- Deprecated
------------------------
The plugin extracts the needed data, but the amount of output can be confusing.
For example, checking all memory banks returns the correct status, but the first line always corresponds to the first DIMM.
Instead, the first line should give an overview of the status (e.g. checking all memory could return "WARNING - Dimm3 nonok"), and the remaining output lines should give the status of each component. That way, SMS and other notifications will name the component that is in a WARNING or CRITICAL state.
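This suggestion matches the usual Nagios output convention. The first line is the summary shown in notifications, later lines are long output, and the exit code sets the state. Below is a minimal sketch of that aggregation. It assumes the per-component states have already been collected as (name, state) pairs. `summarize`, `SEVERITY` and `EXIT_CODE` are hypothetical names and are not part of check_idrac.

```python
# Hypothetical severity ranking used only to pick the "worst" state.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}
# Standard Nagios plugin exit codes.
EXIT_CODE = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def summarize(components):
    """Return (multi-line output, exit code) for a list of (name, state) pairs.

    The first line names the worst state and the components in it;
    each following line reports one component.
    """
    if not components:
        return "UNKNOWN - no components found", EXIT_CODE["UNKNOWN"]
    worst = max((state for _, state in components), key=SEVERITY.__getitem__)
    if worst == "OK":
        first = f"OK - {len(components)} component(s) OK"
    else:
        # Only the components in the worst state go into the summary line.
        bad = [name for name, state in components if state == worst]
        first = f"{worst} - {', '.join(bad)}"
    lines = [first] + [f"{name}: {state}" for name, state in components]
    return "\n".join(lines), EXIT_CODE[worst]
```

A plugin would then `print(text)` and `sys.exit(code)`. Ranking UNKNOWN between WARNING and CRITICAL is a choice made for this sketch. Nagios itself only looks at the exit code.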
Hi, the script is not intended to be used this way; it was designed to check only one device at a time, i.e. one memory DIMM.
PU 1: UNKNOWN(!)/OTHER(!), RedundancyStatus: OTHER(!), SystemBoard Pwr Consumption: 168 W | PwrConsumption=168;;;;
Is it possible to fix this?
Thx
That is because your device returns its output that way :) The script simply maps the OID against the MIB and translates it into human-readable form. You should investigate why the device returns that output.
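Here is a sketch of an enum-to-label translation. It shows why a reading such as "UNKNOWN(!)/OTHER(!)" comes from the device rather than from the script. The value table is an assumption modelled on the ObjectStatusEnum in Dell's iDRAC MIB; verify it against the MIB file you pass with `-m`. `translate_status` is a hypothetical helper, and the "(!)" suffix only imitates the marker seen in the output above.

```python
# Assumed values, modelled on the iDRAC MIB's ObjectStatusEnum;
# check them against the MIB file you actually use.
OBJECT_STATUS = {
    1: "OTHER",
    2: "UNKNOWN",
    3: "OK",
    4: "NONCRITICAL",
    5: "CRITICAL",
    6: "NONRECOVERABLE",
}


def translate_status(raw):
    """Map a raw SNMP integer to a label; mark anything that is not OK with '(!)'."""
    label = OBJECT_STATUS.get(int(raw), f"UNMAPPED({raw})")
    return label if label == "OK" else f"{label}(!)"
```

If the device itself answers 1 or 2, the translated text will read OTHER or UNKNOWN no matter how the mapping is written.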
Traceback (most recent call last):
  File "/usr/lib/nagios/plugins/check_idrac", line 841, in <module>
    result, exit_code = PARSER().main()
  File "/usr/lib/nagios/plugins/check_idrac", line 664, in main
    hw_dict, exit_code = self.raise_alert(hw_dict, value_on_alert)
  File "/usr/lib/nagios/plugins/check_idrac", line 494, in raise_alert
    if int(tmp[key][stat_t]) >= conf['fan_thresholds'][1]:
ValueError: invalid literal for int() with base 10: '3600(!)'
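The traceback shows `int()` receiving a reading that already carries the "(!)" marker. The actual fix was merged separately and is not reproduced here. As a sketch under that assumption, a tolerant parser could pull out the leading number before the threshold comparison; `to_number` is a hypothetical helper.

```python
import re

# Optional sign, digits, optional decimal part, at the start of the string.
_LEADING_NUMBER = re.compile(r"\s*(-?\d+(?:\.\d+)?)")


def to_number(value):
    """Extract the leading number from a reading such as '3600(!)' or '168 W'."""
    match = _LEADING_NUMBER.match(str(value))
    if match is None:
        raise ValueError(f"no numeric reading in {value!r}")
    return float(match.group(1))
```

The failing comparison could then read `if to_number(tmp[key][stat_t]) >= conf['fan_thresholds'][1]:`.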
Hi,
Fixed with a new merge (contributed by another user).
One feature request: it would be useful if a Nagios alert could be raised for a whole group instead of for each individual item, mainly for the FAN, MEMORY, SENSOR and PDISK groups.
For example, "-w MEM#0" would return something like
OK - All memory ENABLED/OK
or
CRITICAL - DIMM Socket B7 failed ...
while "-w MEM" would still return the non-alert full info like it does now.
Thanks,
The script currently has limited support for global health: it returns the exit code of the alert but does not print a prefix (OK, CRIT, WARN) or combine the check results. This output can be used with a multi-line check view in the front end.
If removing a PSU does not remove its SNMP OID in the iDRAC, you must add 'lost' to the alert states so the script can detect it.
If the SNMP OID is removed as well, the script will report that the hardware does not exist.
However, at least in my experiments, the no_alerts option seemed to be switched on by default. This means that even though hardware issues (like an unplugged PS) were detected, they did not trigger a Nagios alert.
For example:
$ ./check_idrac -f ./check_idrac.conf -H 10.0.0.1 -w ps
PS 1: CRITICAL, Volt I/O: 264 V/(N/A) V, Current: (N/A) A, Watt I/O: 900.0 W/750.0 W
PS 2: OK, Volt I/O: 264 V/246.0 V, Current: 0.4 A, Watt I/O: 900.0 W/750.0 W
$ echo $?
0
So the correct exit code was not set.
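Nagios derives the service state only from the plugin's exit status. Output text that says CRITICAL, combined with exit code 0, is therefore still treated as OK. Below is a small Python harness for checking this. It assumes Python 3.7+ (for `capture_output`), and `run_check` is a hypothetical name.

```python
import subprocess
import sys

# Standard Nagios plugin return codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(cmd):
    """Run a plugin command; return (state Nagios would assign, first output line)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Unexpected exit codes are treated as UNKNOWN in this sketch.
    state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
    first_line = proc.stdout.splitlines()[0] if proc.stdout else ""
    return state, first_line
```

Consider the reviewer's case, `run_check(["./check_idrac", "-f", "./check_idrac.conf", "-H", "10.0.0.1", "-w", "ps"])`. There the harness would report state OK alongside a first line reading "PS 1: CRITICAL, ...".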
I could find no other way to fix it than to change the code:
179c179
opts.no_alert = True
Also, the PS and PU options should be better implemented or documented. Both --wat-warn and --wat-crit need to be defined; otherwise you get a parsing error:
$ ./check_idrac -f ./check_idrac.conf -H 10.0.0.1 -w ps --wat-warn=100,500
Error parsing threshold.
I believe the options should be parsed in separate code blocks, or at least the error should be more precise. (This specific code starts on line 256.)
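Parsing each threshold option on its own lets one be set without the other, and it lets the error message name the offending option. The sketch below assumes each option takes two comma-separated numbers, as in "--wat-warn=100,500". What the two numbers mean is defined in the plugin's manual, not here. `parse_threshold` is hypothetical.

```python
def parse_threshold(option, text):
    """Parse one 'A,B' threshold option, reporting errors against that option."""
    if text is None:
        return None  # option not given: leave this threshold unset
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"{option}: expected two comma-separated numbers, got {text!r}")
    try:
        return tuple(float(p) for p in parts)
    except ValueError:
        raise ValueError(f"{option}: non-numeric value in {text!r}") from None
```

Called once per option, e.g. `parse_threshold("--wat-warn", "100,500")` and `parse_threshold("--wat-crit", None)`, a missing option no longer breaks the one that was given.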
Thank you for the plugin; I hope these issues will be addressed to make it a bit more user-friendly.
Hi,
You should visit my GitHub; there is a manual there.
The script does not return an alert because you are using it the wrong way :x
Regards,
Here's an example (one PS is not connected, so it should be CRITICAL):
./check_idrac_2.py -H 10.xx.xx.xx -c public -m /usr/local/icinga/libexec/iDRAC-SMIv2.mib -w PS#2 --conf /usr/local/icinga/libexec/check_idrac.conf
OK - PS 2: CRITICAL, Volt I/O: 264 V/0.084 V, Current: 8.4 A, Watt I/O: 900.0 W/750.0 W
Also, when I change the OK/WARN/CRIT parameters in the config, it still always shows OK.
With version 2.0b2 it works fine.
I think that since the 2.0b3 bugfix, the script ignores the config?
It's really strange. I tested with CRIT changed and it worked fine.
Does 2.0b4 work for you?
On the plugin's page, it's unclear which one to download, "check_iDRAC.tar.gz" or "idrac_2.0b.tar.gz"? Is check_iDRAC.tar.gz version 1? (It turns out: Yes, check_iDRAC.tar.gz is version 1.x.)
At the time of writing this, the files are weird, in that they are double-gzipped(?).
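One way to check the double-gzip suspicion is to count gzip layers. Before each decompression, the sketch looks for the gzip magic bytes (0x1f 0x8b). `fully_gunzip` and its `max_layers` cap are illustrative choices.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def fully_gunzip(data, max_layers=5):
    """Strip gzip layers until the payload no longer starts with the gzip magic bytes."""
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers
```

Running it on the downloaded bytes reports how many layers were removed. The remaining bytes can then be opened with `tarfile.open(fileobj=io.BytesIO(data))`.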
I think it's a shame that generation 2 demands a configuration file. I suggest making the general case simple and not requiring a configuration file: checking the status of all hardware using SNMP version 2c with "public" as the community shouldn't require anything but "-H" and the host.
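The reviewer's proposal amounts to giving every SNMP setting a default so that only the host is mandatory. Below is a sketch with argparse. `-c` for the community and `--conf` mirror the example command quoted above (`-c public ... --conf ...`). The `-v` flag and the defaults are hypothetical and are not the plugin's documented interface.

```python
import argparse


def build_parser():
    """Hypothetical CLI where only -H is required and SNMP settings have defaults."""
    parser = argparse.ArgumentParser(description="Sketch of a simpler check_idrac CLI")
    parser.add_argument("-H", dest="host", required=True, help="iDRAC address")
    parser.add_argument("-c", dest="community", default="public", help="SNMP community")
    parser.add_argument("-v", dest="snmp_version", default="2c",
                        choices=["1", "2c", "3"], help="SNMP version (hypothetical flag)")
    parser.add_argument("--conf", default=None, help="optional configuration file")
    return parser
```

With this layout, `build_parser().parse_args(["-H", "10.0.0.1"])` yields community "public", version "2c" and no configuration file.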
I couldn't get version 2.0b to work. After having created the configuration file, I'm trying:
$ python idrac/idrac_2.0/idrac_2.0b.py -c check_idrac.conf -H foobar
Unknown flag passed to -C: u
I don't understand that, as I didn't put in a "-C" argument.
I'm giving a rating of Average, as generation 1 is fine, but generation 2 is bad at the time of writing.
Thanks for your review. I will look into the "-C" error and consider making the SNMP authentication settings available as command-line options.