check_esxi_hardware.py

Rating: 70 votes
Favoured: 13
Current Version: 20220708
Last Release Date: 2022-07-08
Compatible With:
  • Nagios 1.x
  • Nagios 2.x
  • Nagios 3.x
  • Nagios 4.x
  • Nagios XI
License: GPL
Hits: 345005
check_esxi_hardware (formerly known as check_esx_wbem) is an open source monitoring plugin for the hardware of ESXi (and previously ESX) servers. It queries the CIM (Common Information Model) server running on the ESXi server to retrieve the current status of all discovered hardware parts. The plugin can also be used as a standalone script to check the hardware. It is written in Python, uses the pywbem module, and supports both Python 2 and Python 3. See Requirements for more information.
The plugin, more information, and the full documentation (including an FAQ) are available here:

https://www.claudiokuenzler.com/monitoring-plugins/check_esxi_hardware.php

Version History
------------------
20080820 Initial release by David Ligeret
20080821 Add verbose mode by David Ligeret
20090219 Add try/except to catch AuthError and CIMError by Joshua Daniel Franklin
20100202 Added HP Support (HealthState) by Branden Schneider
20100512 Combined different versions (Joshua and Branden) and added hardware type switch
20100628 Outputs server model, s/n and bios version and set Unknown as default exit code by Samir Ibradzic
20100702 GlobalStatus was incorrectly getting (re)set to OK with every CIM element check by Aaron Rogers
20100705 After last version all Dell servers return UNKNOWN instead of OK, added Aaron's logic for Dell checks as well
20101028 Changed text in Usage and Example so people don't forget to use https://
20110110 If Dell Blade Servers were used, Serial Number of Chassis instead of Blade was returned - by Ludovic Huttin
20110207 Bugfix/new feature for Intel server systems by Carsten Schoene
20110215 Plugin now catches Socket Error (Timeout Error) and added a timeout parameter by Ludovic Hutin
20110221 Removed recently added timeout parameter due to incompatibility on Windows systems
20110221 Changed plugin name from check_esxi_wbem.py to check_esxi_hardware.py
20110426 Added 'ibm' hardware type (compatible to Dell output). Tested by Keith Erekson on an IBM x3550.
20110503 Plugin rewritten, added automatic hardware detection, opt params, perfdata and much more by Phil Randal
20110504 Some minor code changes, removed typo, bugfix for voltage sensors on IBM server by Phil Randal
20110505 Added possibility to use first line of a file as password (file:) by Fredrik Åslund
20110507 A lot of bugfixes and enhancements from Phil Randal (see changelog in plugin for details)
20110520 Bugfix for IBM Blade Servers by Bertrand Jomin
20110614 Rewrote external file handling, file can now be used for password AND username
20111003 Added ignore option to ignore certain elements by Ian Chard
20120402 Making plugin GPL compatible (Copyright) and preparing for OpenBSD port
20120405 Fix lookup of warranty info for Dell by Phil Randal
20120501 Bugfix in manufacturer discovery when cim entry not found or empty by Craig Hart
20121027 Workaround for Dell PE x620 for Riser Config Err 0: Connected element (wrong return code)
20130424 Another workaround for Dell systems "System Board 1 LCD Cable Pres 0: Connected"
20130702 Improving wrong authentication timeout and exit UNKNOWN by Carl R. Friend
20130725 Fix lookup of warranty info for Dell by Phil Randal
20140319 Another workaround for Dell systems "System Board 1 VGA Cable Pres 0: Connected"
20150109 Output serial number of chassis if a blade server is checked
20150119 Fix NoneType element bug by Andreas Gottwald
20150626 Added support for patched pywbem 0.7.0 and new version 0.8.0, handle SSL error exception
20150710 Exit Unknown instead of Critical for timeouts and auth errors by Stanislav German-Evtushenko
20151111 Cleanup and define variables by Stefan Roos
20160411 Distinguish between/add support for minor versions of pywbem 0.7 and 0.8
20160531 Add parameter for variable CIM port (useful when behind NAT)
20161013 Added support for pywbem 0.9.x (and upcoming releases)
20170905 Added option to ignore LCD/Display related elements (--no-lcd)
20180329 Try to use internal pywbem function to determine version
20180411 Throw an unknown if we can't fetch the data for some reason by Peter Newman
20180510 Allow regular expressions from ignore list (-r)
20181001 python3 compatibility
20190701 Fix lookup of warranty info for Dell (again) by Phil Randal
20200605 Added option to ignore chassis intrusion elements (--no-intrusion) by Luca Berra
20200605 Add parameter (-S) for custom SSL/TLS protocol version
20200710 Improve missing mandatory parameter error text (issue #47), Delete temporary openssl config file after use (issue #48)
20210809 Fix TLSv1 usage (issue #51)
20220708 Added JSON-output (Zabbix needs it) by Marco Markgraf
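
The 20110505 and 20110614 entries describe reading the username or password from the first line of a file via a "file:" prefix. The following is a minimal sketch of that idea, not the plugin's actual code: the function name resolve_credential is hypothetical, and stripping only the trailing line break is an assumption.

```python
def resolve_credential(value):
    """Return the credential itself, or the first line of the referenced file.

    Hypothetical helper: a value such as 'file:/path/to/secret' is replaced
    by the first line of that file; any other value is returned unchanged.
    """
    prefix = "file:"
    if not value.startswith(prefix):
        return value
    path = value[len(prefix):]
    with open(path, "r") as handle:
        # Only the first line is used; the trailing newline is removed.
        return handle.readline().rstrip("\r\n")
```

Keeping the password in a file readable only by the monitoring user avoids exposing it in the process list or in the Nagios command definition.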

Requirements
------------------
- Python must be installed (both Python2 and Python3 are supported)
- The Python extension pywbem must be installed
- If there is a firewall between your monitoring and ESXi server, open tcp port 5989 (or the port you define with -C)
- The CIM server and agent must be running on the ESXi server. Starting from ESXi 6.5 the CIM agent is disabled by default. See VMware KB 1025757 for more information.
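
To rule out a firewall problem before running the plugin, a quick TCP reachability test from the monitoring server can help. The sketch below only checks whether a connection to the CIM port can be opened; it does not authenticate or speak WBEM. The function name and the host name esxi.example.com are placeholders, not part of the plugin.

```python
import socket


def cim_port_reachable(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # esxi.example.com is a placeholder; use your ESXi host and CIM port.
    print(cim_port_reachable("esxi.example.com", 5989))
```

A False result points to a firewall, a wrong port, or a CIM server that is not running; a True result only proves that the port is open.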

Usage
------------------
./check_esxi_hardware.py -H esxi-server-ip -U username -P mypass [-C -S -V -i -r -v -p -I]
Reviews (55)
by AZTechGuy, November 30, 2016
I have been using this product for about 2 years now. I recently updated one of my hosts to ESXi 6.0 U2 and am now having issues with the hardware check. I wanted to see if there is an update coming or if anyone else has experienced this. Also, is there any planned support for the new 6.5 release?
by retro69, November 27, 2016
I tried to run the command but I get an error.

Traceback (most recent call last):
File "./check_esxi_hardware.py", line 608, in ?
pywbemversion = pkg_resources.get_distribution("pywbem").version
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 229, in get_distribution
if isinstance(dist,Requirement): dist = get_provider(dist)
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 115, in get_provider
return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 585, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 483, in resolve
raise DistributionNotFound(req) # XXX put more info here
pkg_resources.DistributionNotFound: pywbem

I have the following version of pywbem on the system.
pywbem.noarch 0:0.7.0-3.el5

I tried reinstalling that package but no change.


Anyone have any ideas please?
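
The traceback above is raised by the pkg_resources lookup of the pywbem distribution, which can fail even though the module itself is importable. One alternative, sketched here as an assumption rather than the plugin's own logic, is to import the module and read its __version__ attribute when it exists (older releases may not define it). The helper name module_version is hypothetical.

```python
import importlib


def module_version(name):
    """Return a module's __version__, 'unknown' if unset, None if not importable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The module is not installed for this Python interpreter.
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    print(module_version("pywbem"))
```

A result of None means the interpreter running the plugin cannot import pywbem at all, which usually indicates that the package was installed for a different Python version.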
by chuckesn, July 12, 2016
Hello,
I'm running this check on 3 ESXi servers.
On 2 servers everything is OK.
The third server is causing problems. The check result says "Memory Critical". I checked the server's memory and all of it is OK.

I think the problem is here:
20160713 12:24:21 Check classe CIM_Memory
20160713 12:24:22 Element Name = Socket 1 Level-1 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 1 Level-2 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 1 Level-3 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 2 Level-1 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 2 Level-2 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Socket 2 Level-3 Cache
20160713 12:24:22 Element Op Status = 0
20160713 12:24:22 Element Name = Memory
20160713 12:24:22 Element Op Status = 6

When I comment out the line "CIM_Memory", everything is OK.
This is not a solution because I want to monitor the memory.

Can anyone please help me?
by chuckes, July 12, 2016
Hi,
I'm checking 3 identical ESXi servers with the plugin.
With 2 servers everything is OK.
The third server gives me a Memory Critical. I checked the server memory and all of it is okay; there is no failure.

I think the problem is here:

Check classe CIM_Memory
20160713 11:00:09 Element Name = Socket 1 Level-1 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 1 Level-2 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 1 Level-3 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 2 Level-1 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 2 Level-2 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Socket 2 Level-3 Cache
20160713 11:00:09 Element Op Status = 0
20160713 11:00:09 Element Name = Memory
20160713 11:00:09 Element Op Status = 6
20160713 11:00:09 Global exit set to CRITICAL

If I comment out "CIM_Memory" then the plugin shows "OK", but that's not a solution because I want to monitor the memory.

Can anyone help me please?
Hello!
I am trying to monitor VMware ESXi 5.1.0 build-1065491 (Update 1) on a ProLiant DL360p Gen8 server.

My environment:
CentOS 5.9 x386
Python 2.7
Nagios 4.0.8
check_esxi_hardware.py version 20150710.

I installed the python-pywbem (0.7.0) extension from here: http://pywbem.github.io/pywbem/installation.html

When I try to run the check, I receive this error:

# ./check_esxi_hardware.py -H 192.168.33.252 -U root -P passw -V hp
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 646, in <module>
instance_list = wbemclient.EnumerateInstances(classe)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
**params)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 168, in imethodcall
verify_callback = self.verify_callback)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_http.py", line 184, in wbem_request
h.putheader('Content-length', len(data))
File "/usr/local/lib/python2.7/httplib.py", line 924, in putheader
str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, int found


Can anybody help me, please?
Owner's reply

Please try it with the newest version (20160411 as of today) and the new and stable pywbem 0.8.x.

by 4iter4, July 25, 2015
Really nice script, but no ESXi 6.0 support. Is it planned?
Owner's reply

It works with ESXi 6.x.

by Seraph, April 4, 2015
Not much of a review, but a reply to @itheodoridis

On Claudio's website there's a very good FAQ section with all the latest info. To keep it short: downgrade pywbem like this

apt install python-pywbem=0.7.0-4

This should temporarily fix the issue until they release an update.

For updates on this issue, look here:

https://bugs.launchpad.net/ubuntu/+source/pywbem/+bug/1434991

Happy Easter everyone =)
Owner's reply

Thanks for that hint. This is now solved with the current version 20150626.

Hello. The plugin worked like a charm with Ubuntu 12.04 LTS Server and Nagios 3.5. I decided to upgrade to Ubuntu 14.04.02 LTS (and then possibly to Nagios 4), but right after the upgrade the plugin started producing errors. When I try this on the command line:
./check_esxi_hardware.py -H hostname -U root -P password -V hp
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 619, in <module>
instance_list = wbemclient.EnumerateInstances(classe)
File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 421, in EnumerateInstances
**params)
File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 183, in imethodcall
no_verification = self.no_verification)
File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 268, in wbem_request
h.endheaders()
File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 115, in send
self.connect()
File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 167, in connect
except ( Err.SSLError, SSL.SSLError, SSL.SSLTimeoutError
AttributeError: 'module' object has no attribute 'SSLTimeoutError'

Any ideas?
I have found similar problems reported elsewhere; some refer to failed module imports. I can't say I know my way around Python, but I am willing to try.
Owner's reply

This is fixed in version 20150626.

Works great, but after a problem with one of the memory modules
WARNING : System Board 8 Memory: Correctable ECC logging limit reached -
Server: HP ProLiant DL380 G7
and after replacing it with a new one, I have another problem:
WARNING : Memory - Server: HP ProLiant DL380 G7

I think the problem is in the line "Element Op Status = 3":

20150203 11:20:32 Check classe CIM_Memory
20150203 11:20:32 Element Name = Proc 1 Level-1 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 1 Level-2 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 1 Level-3 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 2 Level-1 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 2 Level-2 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Proc 2 Level-3 Cache
20150203 11:20:32 Element Op Status = 0
20150203 11:20:32 Element Name = Memory
20150203 11:20:32 Element Op Status = 3
20150203 11:20:32 GLobal exit set to WARNING
by Clawalatta, December 10, 2014
The script is great!! I implemented it a couple of weeks ago. It worked perfectly fine, but since about a week ago one of the checks has been giving an error.

Error:
CRITICAL: (0, 'Socket error: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol')

I've searched the web and also looked for solutions myself, but I couldn't find anything. The web interface is reachable and the SSL certificate is working just fine.

It works on all the other ESXi servers except for this one.

Does somebody know the solution for this error?


Gr,

Channing
The Netherlands
Owner's reply

Another user had the same problem and it seemed to be a problem of the ESXi host. After a restart of this particular ESXi host, the check worked again.

Adding the nagios user to the root group is a big security hole. I suggest never doing this.

Here is a better workaround until VMware fixes the 'CIM interaction' permission (which has not worked at least since 4.0 and up to the recent 5.5):

1) Create a local user 'nagios' on an ESXi host
2) Add a cron job to check and update /etc/security/access.conf
user=nagios; access=/etc/security/access.conf; crontab=/var/spool/cron/crontabs/root; grep $access $crontab > /dev/null || cat <<EOF >> $crontab
*/5 * * * * grep '^+:$user:sfcb$' $access > /dev/null || sed -i '2i +:$user:sfcb' $access
EOF
3) Done!

Now you can use the nagios user to run check_esxi_hardware.py; no special roles or permissions are needed.
by Sturm, August 18, 2014
Hi!

One of the best scripts I have found for monitoring ESXi hosts without vCenter!
I'm using it on Fujitsu servers, hardware option = intel.

The script detects disk errors without problems, but if a BBU needs to be replaced, the script still returns OK?

Script output:
OK - Server: FUJITSU System BIOS: V4.6.5.3 R1.15.0 for D2939-A1x 2012-09-12

ESXi 4.1
Status = Critical
BBU on Controller 0 (Health State Not Good)


Maybe somebody can help to find a solution.

Cheers
Roland
Hi,

I have the latest patches installed on ESXi 5.1, and when I monitor the hardware of an ESXi Dell R710 with the plugin, it warns me that my RAID controller is not working properly, i.e. "WARNING : Controller XXXXXXXXXXXXXXX(PERC H700 Integrated) WARNING : Controller XXXXXXXXXXXXXXX(PERC H700 Integrated)". However, there is no warning in the vSphere Hardware Status tab. Is the plugin showing wrong info, or is vSphere not showing correct information?

I am confused. Please advise.
Owner's reply

This is because the vSphere client uses the CIM "HealthState" while the plugin uses the "OperationalStatus" responses for Dell servers, hence the difference. There is a simple reason for this: in the past, Dell servers rarely submitted good HealthState information. Since around 2013, Dell seems to have switched to HealthState. I intend to change this in the future.
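
The verbose logs in the reviews on this page show how the per-element "Op Status" drives the overall result: status 6 led to CRITICAL, status 3 led to WARNING, and elements reporting 0 or 2 did not raise the state. The sketch below illustrates that aggregation under stated assumptions. It maps only the values seen in those logs, treats every other value as UNKNOWN (an assumption, not the plugin's documented behaviour), and uses hypothetical names throughout.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Only values that appear in the verbose logs on this page are mapped.
# Treating unmapped values as UNKNOWN is an assumption of this sketch.
OP_STATUS_TO_STATE = {0: OK, 2: OK, 3: WARNING, 6: CRITICAL}

# Severity order used when combining element states (assumed convention).
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def global_state(op_statuses):
    """Combine per-element operational status values into one exit code."""
    state = OK
    for status in op_statuses:
        element_state = OP_STATUS_TO_STATE.get(status, UNKNOWN)
        # Only ever raise the global state; a later healthy element
        # must not reset an earlier WARNING or CRITICAL.
        if SEVERITY[element_state] > SEVERITY[state]:
            state = element_state
    return state
```

Raising the state without ever lowering it avoids the kind of bug recorded in the 20100702 changelog entry, where the global status was reset to OK with every element check.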

by TrevThorpe, June 27, 2014
Great avenue to add a basic health check.

Works perfectly, out of the box for my Dell R905s & R620s
by kairu0, June 5, 2014
Ran into an instance where I too got the "invalid tokens" errors when running the script. It turns out that my chassis had an unprintable character in it (Thanks HP!).

As an ugly patch, I inserted the following before line 198 in pywbem/cim_operations.py.

resp_xml = filter(lambda x: x in string.printable, resp_xml)
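
The one-liner above relies on Python 2, where filter() applied to a string returns a string; on Python 3 it returns an iterator, so the result would have to be joined back together. A minimal standalone sketch of the same filtering idea follows. The function name strip_unprintable is hypothetical, and the snippet is not a patch for pywbem.

```python
import string

# Characters regarded as printable: ASCII letters, digits,
# punctuation and whitespace.
PRINTABLE = set(string.printable)


def strip_unprintable(text):
    """Return text with every character outside string.printable removed."""
    return "".join(ch for ch in text if ch in PRINTABLE)
```

Note that string.printable covers ASCII only, so legitimate non-ASCII characters (for example "Å") are removed as well.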
by smarechal, February 25, 2014
Thanks a lot for this great work!
I'm running it on ESXi 5.1 UP2 and a BL460c G8.
It works fine on an IBM with ESXi 4.1, but it doesn't work on an HP ProLiant DL380 G5 with ESXi 5.5.
With verbose mode enabled, it waits forever at VMware_Controller:

...
20131227 06:42:09 Element Name = Fan 1
20131227 06:42:09 Element Op Status = 2
20131227 06:42:09 Check classe OMC_PowerSupply
20131227 06:42:09 Element Name = Power Supply 1
20131227 06:42:09 Element Op Status = 2
20131227 06:42:09 Element Name = Power Supply 2
20131227 06:42:09 Element Op Status = 0
20131227 06:42:09 Check classe VMware_StorageExtent
20131227 06:42:10 Check classe VMware_Controller

Any idea why? Thanks

Daniel
Owner's reply

Please refer to the FAQ, maybe you'll find your solution: http://www.claudiokuenzler.com/blog/308/check-esxi-hardware-faq-frequently-asked-questions

I am unable to get it to return anything in Nagios XI; the output is empty.
This works great when running it from a CentOS terminal as root, but when I created the command in /etc/nagios/objects/commands.cfg and called it in services.cfg, Nagios gives me the following status: (Return code of 126 is out of bounds - plugin may not be executable).

define command{
    command_name    check_esxi_hardware
    command_line    $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$
}

define service{
    use                     generic-service
    hostgroup_name          VMWare-servers
    service_description     System: ESXi Hardware status
    check_command           check_esxi_hardware!root!mypass!hp
}

I have verified that check_esxi_hardware.py is executable, since I am able to run it from the command line. I used chmod -R 755 root.root /usr/lib64/nagios/plugins/check_esxi_hardware.py. So why am I getting return code 126 from Nagios when the command is executable?
Owner's reply

Check your resource.cfg and make sure that $USER1$ is actually /usr/lib64/nagios/plugins.
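
Return code 126 generally means that the command was found but could not be executed, so it can also help to test the resolved plugin path as the account that runs Nagios rather than as root. The sketch below is a hypothetical helper, not part of Nagios or the plugin; the path in the usage line is the one mentioned in this thread.

```python
import os


def plugin_is_executable(path):
    """Return True if path is an existing regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Run this as the user Nagios runs under, not as root.
    print(plugin_is_executable("/usr/lib64/nagios/plugins/check_esxi_hardware.py"))
```

If this prints False for the monitoring user, check the file mode, the permissions of the parent directories, and whether $USER1$ points to the directory that actually contains the plugin.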

Hi,

When I use the command
/usr/lib64/nagios/plugins/check_esxi_hardware.py -H xx.xx.xx.xxx -U root -P password -V dell
it works.
OK - Server: Dell Inc. PowerEdge XXX s/n: XXXXXX System BIOS: XX.xx.XX

However, when I use the check_nrpe command
/usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c VMwareESXi
I get this error.

CRITICAL: (0, 'Socket error: [Errno 13] Permission denied')

Can you advise me what went wrong?