check_esxi_hardware.py
- Nagios 1.x
- Nagios 2.x
- Nagios 3.x
- Nagios 4.x
- Nagios XI
https://www.claudiokuenzler.com/monitoring-plugins/check_esxi_hardware.php
Version History
------------------
20080820 Initial release by David Ligeret
20080821 Add verbose mode by David Ligeret
20090219 Add try/except to catch AuthError and CIMError by Joshua Daniel Franklin
20100202 Added HP Support (HealthState) by Branden Schneider
20100512 Combined different versions (Joshua and Branden) and added hardware type switch
20100628 Outputs server model, s/n and bios version and set Unknown as default exit code by Samir Ibradzic
20100702 GlobalStatus was incorrectly getting (re)set to OK with every CIM element check by Aaron Rogers
20100705 After last version all Dell servers return UNKNOWN instead of OK, added Aaron's logic for Dell checks as well
20101028 Changed text in Usage and Example so people don't forget to use https://
20110110 If Dell Blade Servers were used, Serial Number of Chassis instead of Blade was returned - by Ludovic Huttin
20110207 Bugfix/new feature for Intel server systems by Carsten Schoene
20110215 Plugin now catches Socket Error (Timeout Error) and added a timeout parameter by Ludovic Hutin
20110221 Removed recently added timeout parameter due to incompatibility on Windows systems
20110221 Changed plugin name from check_esxi_wbem.py to check_esxi_hardware.py
20110426 Added 'ibm' hardware type (compatible to Dell output). Tested by Keith Erekson on an IBM x3550.
20110503 Plugin rewritten, added automatic hardware detection, opt params, perfdata and much more by Phil Randal
20110504 Some minor code changes, removed typo, bugfix for voltage sensors on IBM server by Phil Randal
20110505 Added possibility to use first line of a file as password (file:) by Fredrik Åslund
20110507 A lot of bugfixes and enhancements from Phil Randal (see changelog in plugin for details)
20110520 Bugfix for IBM Blade Servers by Bertrand Jomin
20110614 Rewrote external file handling, file can now be used for password AND username
20111003 Added ignore option to ignore certain elements by Ian Chard
20120402 Making plugin GPL compatible (Copyright) and preparing for OpenBSD port
20120405 Fix lookup of warranty info for Dell by Phil Randal
20120501 Bugfix in manufacturer discovery when cim entry not found or empty by Craig Hart
20121027 Workaround for Dell PE x620 for Riser Config Err 0: Connected element (wrong return code)
20130424 Another workaround for Dell systems "System Board 1 LCD Cable Pres 0: Connected"
20130702 Improving wrong authentication timeout and exit UNKNOWN by Carl R. Friend
20130725 Fix lookup of warranty info for Dell by Phil Randal
20140319 Another workaround for Dell systems "System Board 1 VGA Cable Pres 0: Connected"
20150109 Output serial number of chassis if a blade server is checked
20150119 Fix NoneType element bug by Andreas Gottwald
20150626 Added support for patched pywbem 0.7.0 and new version 0.8.0, handle SSL error exception
20150710 Exit Unknown instead of Critical for timeouts and auth errors by Stanislav German-Evtushenko
20151111 Cleanup and define variables by Stefan Roos
20160411 Distinguish between/add support for minor versions of pywbem 0.7 and 0.8
20160531 Add parameter for variable CIM port (useful when behind NAT)
20161013 Added support for pywbem 0.9.x (and upcoming releases)
20170905 Added option to ignore LCD/Display related elements (--no-lcd)
20180329 Try to use internal pywbem function to determine version
20180411 Throw an unknown if we can't fetch the data for some reason by Peter Newman
20181001 python3 compatibility
20180510 Allow regular expressions from ignore list (-r)
20190701 Fix lookup of warranty info for Dell (again) by Phil Randal
20200605 Added option to ignore chassis intrusion elements (--no-intrusion) by Luca Berra
20200605 Add parameter (-S) for custom SSL/TLS protocol version
20200710 Improve missing mandatory parameter error text (issue #47), Delete temporary openssl config file after use (issue #48)
20210809 Fix TLSv1 usage (issue #51)
20220708 Added JSON-output (Zabbix needs it) by Marco Markgraf
Requirements
------------------
- Python must be installed (both Python2 and Python3 are supported)
- The Python extension pywbem must be installed
- If there is a firewall between your monitoring and ESXi server, open tcp port 5989 (or the port you define with -C)
- The CIM server and agent must be running on the ESXi server. Starting from ESXi 6.5 the CIM agent is disabled by default. See VMware KB 1025757 for more information.
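Before suspecting the plugin when a check times out, it can help to confirm that the CIM port is reachable from the monitoring host. The following is a minimal sketch and not part of the plugin: the function name cim_port_open is made up for illustration, and 5989 is simply the default port named in the requirements above.

```python
import socket


def cim_port_open(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection only resolves the name and completes the TCP
        # handshake; it does not speak TLS or CIM, so True merely means
        # that something is listening on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A False result points at a firewall, a wrong port (see -C) or a stopped CIM service rather than at credentials.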
Usage
------------------
./check_esxi_hardware.py -H esxi-server-ip -U username -P mypass [-C -S -V -i -r -v -p -I]
Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 608, in ?
    pywbemversion = pkg_resources.get_distribution("pywbem").version
  File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 229, in get_distribution
    if isinstance(dist,Requirement): dist = get_provider(dist)
  File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 115, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 585, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 483, in resolve
    raise DistributionNotFound(req) # XXX put more info here
pkg_resources.DistributionNotFound: pywbem
I have the following version of pywbem on the system.
pywbem.noarch 0:0.7.0-3.el5
I tried reinstalling that package but no change.
Anyone have any ideas please?
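The traceback above comes from pkg_resources looking up packaging metadata for "pywbem", which is a separate question from whether the module itself can be imported. A quick way to tell the two apart is sketched below. It assumes a Python version that ships importlib (the Python 2.4 in the traceback does not), and module_version is a hypothetical helper name, not something the plugin provides.

```python
import importlib


def module_version(name):
    """Return (importable, version) for a module; version may be None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False, None
    # Many packages expose __version__; treat its absence as "unknown"
    # instead of failing.
    return True, getattr(module, "__version__", None)
```

If module_version("pywbem") reports the module as importable while pkg_resources still raises DistributionNotFound, the package files are present but their metadata is not visible to that Python installation.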
I'm running this check on 3 ESXi servers.
On 2 servers everything is OK.
The third server is causing problems. The result of the check says "Memory Critical". I checked the memory of the server and all of it is OK.
I think the problem is here:
20160713 12:24:21 Check classe CIM_Memory
20160713 12:24:22 Element Name = Socket 1 Level-1 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 1 Level-2 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 1 Level-3 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 2 Level-1 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 2 Level-2 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 2 Level-3 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Memory
20160713 12:24:22 Element Op Status = 6
When I comment out the line "CIM_Memory", everything is OK.
That is not a solution, because I want to monitor the memory.
Can anyone please help me?
I'm checking 3 identical ESXi servers with the plugin.
With 2 servers everything is OK.
The third server is giving me a Memory Critical. I checked the server memory and all of it is okay; there is no failure.
I think the problem is here:
Check classe CIM_Memory
20160713 11:00:09 Element Name = Socket 1 Level-1 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 1 Level-2 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 1 Level-3 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 2 Level-1 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 2 Level-2 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 2 Level-3 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Memory
20160713 11:00:09 Element Op Status = 6
20160713 11:00:09 Global exit set to CRITICAL
If I comment out "CIM_Memory", the plugin shows "OK", but that's not a solution because I want to monitor the memory.
Can anyone help me, please?
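When the verbose output is long, the element that flips the result can be picked out mechanically. The sketch below pairs each "Element Name" line with the "Element Op Status" line that follows it and lists the pairs whose status is not considered harmless. Treating 0 and 2 as harmless is an assumption taken from the logs quoted on this page, where those values did not change the exit state. The function name is hypothetical.

```python
import re

NAME_RE = re.compile(r"Element Name = (.*)$")
STATUS_RE = re.compile(r"Element Op Status = (\d+)$")


def non_ok_elements(verbose_lines, ok_values=(0, 2)):
    """Return (name, status) pairs whose status is not in ok_values."""
    flagged = []
    current_name = None
    for line in verbose_lines:
        line = line.strip()
        name_match = NAME_RE.search(line)
        if name_match:
            # Remember the name until its status line arrives.
            current_name = name_match.group(1)
            continue
        status_match = STATUS_RE.search(line)
        if status_match and current_name is not None:
            status = int(status_match.group(1))
            if status not in ok_values:
                flagged.append((current_name, status))
            current_name = None
    return flagged
```

Fed the output quoted above, this reports only the element named "Memory" with status 6, which matches the line after which the global exit is set to CRITICAL.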
I am trying to monitor VMware ESXi 5.1.0 build-1065491 (Update 1) on a ProLiant DL360p Gen8 server.
My environment:
Centos 5.9 x386
Python 2.7
Nagios 4.0.8
check_esxi_hardware.py version 20150710.
I installed python-pywbem (0.7.0) extension from here http://pywbem.github.io/pywbem/installation.html
When I try to check I receive the error:
# ./check_esxi_hardware.py -H 192.168.33.252 -U root -P passw -V hp
Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 646, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
    **params)
  File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 168, in imethodcall
    verify_callback = self.verify_callback)
  File "/usr/local/lib/python2.7/site-packages/pywbem/cim_http.py", line 184, in wbem_request
    h.putheader('Content-length', len(data))
  File "/usr/local/lib/python2.7/httplib.py", line 924, in putheader
    str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, int found
Can anybody help me, please?
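The last frame of the traceback shows the header values being combined with str.join, which only accepts strings, while the value passed in is the integer len(data). The sketch below reproduces that failure in isolation and shows that converting the length to a string avoids it. format_header and the payload are stand-ins written for this illustration; this is not a patch for pywbem or httplib, and on Python 3 the wording of the TypeError differs slightly.

```python
def format_header(name, *values):
    """Mimic the join seen in the traceback: every value must be a string."""
    return "%s: %s" % (name, "\r\n\t".join(values))


data = b"<CIM/>"  # placeholder payload, for illustration only

# Passing the integer length directly raises TypeError, as in the traceback.
try:
    format_header("Content-length", len(data))
    failed = False
except TypeError:
    failed = True

# Converting the length to a string first avoids the error.
header = format_header("Content-length", str(len(data)))
```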
Please try it with the newest version (20160411 as of today) and the new and stable pywbem 0.8.x.
It works with ESXi 6.x.
On Claudio's website there's a very good FAQ section with all the latest information. To keep it short, downgrade pywbem like this:
apt install python-pywbem=0.7.0-4
This should temporarily fix the issue until they release an update.
For updates on this issue, look here:
https://bugs.launchpad.net/ubuntu/+source/pywbem/+bug/1434991
Happy Easter everyone =)
Thanks for that hint. This is now solved with the current version 20150626.
./check_esxi_hardware.py -H hostname -U root -P password -V hp
Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 619, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 421, in EnumerateInstances
    **params)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 183, in imethodcall
    no_verification = self.no_verification)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 268, in wbem_request
    h.endheaders()
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 115, in send
    self.connect()
  File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 167, in connect
    except ( Err.SSLError, SSL.SSLError, SSL.SSLTimeoutError
AttributeError: 'module' object has no attribute 'SSLTimeoutError'
Any ideas?
I have found similar problems reported elsewhere; some refer to failed module imports. I can't say I know my way around Python, but I am willing to try.
This is fixed in version 20150626.
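For readers curious about the failure mode: the AttributeError is raised while Python builds the tuple of exception classes for the except clause, because one of the names does not exist in the installed SSL module. One defensive pattern, sketched here with the standard library ssl module as a stand-in for the library in the traceback, is to collect only the exception classes that are actually present. existing_exceptions is a hypothetical helper, and this is not a description of how the plugin or pywbem fixed the problem.

```python
import ssl


def existing_exceptions(module, names):
    """Return a tuple of the exception classes that module actually defines."""
    found = []
    for name in names:
        candidate = getattr(module, name, None)
        # Keep only real exception classes; names that are missing are
        # skipped instead of raising AttributeError.
        if isinstance(candidate, type) and issubclass(candidate, BaseException):
            found.append(candidate)
    return tuple(found)


# Whether a name such as SSLTimeoutError exists depends on the library
# and its version, so it is looked up rather than referenced directly.
SSL_ERRORS = existing_exceptions(ssl, ["SSLError", "SSLTimeoutError"])
```

The resulting tuple can be used directly in an except clause, for example "except SSL_ERRORS:".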
WARNING : System Board 8 Memory: Correctable ECC logging limit reached -
Server: HP ProLiant DL380 G7
After replacing it with a new one, I have another problem:
WARNING : Memory - Server: HP ProLiant DL380 G7
I think that the problem is in the section "Element Op Status = 3"
20150203 11:20:32 Check classe CIM_Memory
20150203 11:20:32 Element Name = Proc 1 Level-1 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 1 Level-2 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 1 Level-3 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 2 Level-1 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 2 Level-2 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 2 Level-3 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Memory
20150203 11:20:32 Element Op Status = 3
20150203 11:20:32 GLobal exit set to WARNING
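The numbers printed as "Element Op Status" look like CIM OperationalStatus codes. The sketch below lists the first few values from the DMTF value map and, separately, the Nagios result each one coincided with in the logs quoted on this page. That second mapping is an inference from those logs only, not a statement of the plugin's full logic, and describe is a hypothetical helper.

```python
# Subset of the DMTF CIM OperationalStatus value map.
OPERATIONAL_STATUS = {
    0: "Unknown",
    1: "Other",
    2: "OK",
    3: "Degraded",
    4: "Stressed",
    5: "Predictive Failure",
    6: "Error",
    7: "Non-Recoverable Error",
}

# Assumed mapping, inferred only from the verbose logs quoted on this page:
# 3 coincided with WARNING, 6 with CRITICAL; 0 and 2 changed nothing.
OBSERVED_NAGIOS_STATE = {0: None, 2: None, 3: "WARNING", 6: "CRITICAL"}


def describe(status):
    """Return a readable description of a single Element Op Status value."""
    label = OPERATIONAL_STATUS.get(status, "not in this subset")
    state = OBSERVED_NAGIOS_STATE.get(status)
    if state is None:
        return "%d (%s)" % (status, label)
    return "%d (%s) -> %s" % (status, label, state)
```

Read this way, the "Memory" element above reports a degraded state, so the WARNING reflects what the host's CIM provider returns rather than something the plugin made up.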
Error:
CRITICAL: (0, 'Socket error: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol')
I've searched the web and also looked for solutions myself, but I couldn't find anything. The web interface is reachable and the SSL certificate is working just fine.
It works on all the other ESXi servers except for this one.
Does somebody know the solution for this error?
Gr,
Channing
The Netherlands
Another user had the same problem and it seemed to be a problem of the ESXi host. After a restart of this particular ESXi host, the check worked again.
Here is a better workaround until VMware fixes the 'CIM interaction' permission (which hasn't worked since at least 4.0 and up to the recent 5.5):
1) Create a local user 'nagios' on an ESXi host
2) Add a cron job to check and update /etc/security/access.conf
user=nagios; access=/etc/security/access.conf; crontab=/var/spool/cron/crontabs/root; grep $access $crontab > /dev/null || cat << EOF >> $crontab
*/5 * * * * grep '^+:$user:sfcb$' $access > /dev/null || sed -i '2i +:$user:sfcb' $access
EOF
3) Done!
Now you can use the nagios user to run check_esxi_hardware.py; no special roles or permissions are needed.
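The core of the cron job is an idempotent edit: add the line "+:nagios:sfcb" to access.conf only if it is missing. The same logic is sketched below in Python so that it can be tried against a copy of the file first. The path is passed in as a parameter, ensure_access_rule is a hypothetical name, and whether running Python on the ESXi host is practical is an assumption.

```python
def ensure_access_rule(path, user="nagios"):
    """Insert '+:<user>:sfcb' into the file unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    rule = "+:%s:sfcb" % user
    with open(path) as handle:
        lines = handle.read().splitlines()
    if rule in lines:
        return False
    # Place the rule as line 2, like sed '2i'. Unlike sed, a file with
    # fewer than two lines still gets the rule appended.
    lines.insert(min(1, len(lines)), rule)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return True
```

Running it twice in a row changes the file only the first time, which is the property the cron entry relies on.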
One of the best scripts I have found for monitoring ESXi hosts without vCenter!
I'm using it on Fujitsu servers, hardware option = intel.
The script detects disk errors without problems, but if a BBU needs to be replaced, the script returns OK?
Scriptoutput:
OK - Server: FUJITSU System BIOS: V4.6.5.3 R1.15.0 for D2939-A1x 2012-09-12
ESXi 4.1
Status = Critical
BBU on Controller 0 (Health State Not Good)
Maybe somebody can help find a solution.
cheers
Roland
I have the latest patches installed on ESXi 5.1. When I monitor the hardware of a Dell R710 with the plugin, it warns that my RAID controller is not working properly: "WARNING : Controller XXXXXXXXXXXXXXX(PERC H700 Integrated) WARNING : Controller XXXXXXXXXXXXXXX(PERC H700 Integrated)". However, there is no warning in the vSphere Hardware Status tab. Is the plugin showing wrong information, or is vSphere not showing the correct information?
I am confused. Please suggest.
This is because the vSphere client uses the CIM "HealthState" responses, while the plugin uses the "OperationalState" responses for Dell servers; hence the difference. There is a simple reason for this: in the past, Dell servers rarely submitted good HealthState information. Since around 2013, Dell seems to have switched to HealthState. I intend to change this in the future.
Works perfectly, out of the box for my Dell R905s & R620s
As an ugly patch, I inserted the following before line 198 in pywbem/cim_operations.py.
resp_xml = filter(lambda x: x in string.printable, resp_xml)
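Note that on Python 3, filter() returns an iterator rather than a string, so the one-liner above would no longer leave a string in resp_xml. A Python 3 equivalent of the same idea is sketched below; strip_unprintable is a hypothetical name, and like the original it also drops every non-ASCII character, which may or may not be acceptable for your data.

```python
import string

PRINTABLE = set(string.printable)


def strip_unprintable(text):
    """Drop every character that is not in string.printable."""
    return "".join(ch for ch in text if ch in PRINTABLE)
```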
I'm running it on Esxi 5.1 UP2 and BL460c G8.
With verbose mode enabled, it waits forever at VMware_Controller:
...
20131227 06:42:09 Element Name = Fan 1
20131227 06:42:09 Element Op Status = 2
20131227 06:42:09 Check classe OMC_PowerSupply
20131227 06:42:09 Element Name = Power Supply 1
20131227 06:42:09 Element Op Status = 2
20131227 06:42:09 Element Name = Power Supply 2
20131227 06:42:09 Element Op Status = 0
20131227 06:42:09 Check classe VMware_StorageExtent
20131227 06:42:10 Check classe VMware_Controller
Any idea why? Thanks
Daniel
Please refer to the FAQ, maybe you'll find your solution: http://www.claudiokuenzler.com/blog/308/check-esxi-hardware-faq-frequently-asked-questions
define command{
    command_name    check_esxi_hardware
    command_line    $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$
}
define service{
    use                     generic-service
    hostgroup_name          VMWare-servers
    service_description     System: ESXi Hardware status
    check_command           check_esxi_hardware!root!mypass!hp
}
I have verified that the check_esxi_hardware.py is executable, considering I am able to run it from the command line. I used chmod -R 755 root.root /usr/lib64/nagios/plugins/check_esxi_hardware.py. So why am I getting a return code 126 from Nagios when the command is executable?
Check your resource.cfg and make sure that $USER1$ is actually /usr/lib64/nagios/plugins.
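Exit code 126 is the shell's conventional way of saying that a command was found but could not be executed, so it is worth checking which path Nagios actually ends up running. The sketch below expands the macros of the command definition in a simplified way and then tests whether the resulting plugin path is an executable file. Both function names are hypothetical, real Nagios macro processing does more than plain string replacement, and os.access answers for the user running the script, so it should be run as the Nagios user.

```python
import os
import shlex


def expand_command(command_line, check_command, macros):
    """Expand $ARGn$ from a 'name!arg1!arg2' check_command plus other macros."""
    values = dict(macros)
    for index, arg in enumerate(check_command.split("!")[1:], start=1):
        values["$ARG%d$" % index] = arg
    expanded = command_line
    for macro, value in values.items():
        expanded = expanded.replace(macro, value)
    return expanded


def plugin_is_executable(expanded_command):
    """Check that the first word of the command is an existing executable file."""
    plugin_path = shlex.split(expanded_command)[0]
    return os.path.isfile(plugin_path) and os.access(plugin_path, os.X_OK)
```

With $USER1$ set to /usr/lib64/nagios/plugins and the service definition above, the expanded command starts with /usr/lib64/nagios/plugins/check_esxi_hardware.py; if $USER1$ points elsewhere, the second function returns False for that path.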
When I use the command
/usr/lib64/nagios/plugins/check_esxi_hardware.py -H xx.xx.xx.xxx -U root -P password -V dell
it works.
OK - Server: Dell Inc. PowerEdge XXX s/n: XXXXXX System BIOS: XX.xx.XX
However when I use the check_nrpe command
/usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c VMwareESXi
I got this error.
CRITICAL: (0, 'Socket error: [Errno 13] Permission denied')
Can you advise me on what went wrong?