check_IBM_DS_health
- Nagios 3.x
File | Description |
---|---|
check_IBM_DS_health_1.4.sh | check_IBM_DS_health v1.4 |
check_IBM_DS_health_1.5.sh | check_IBM_DS_health v1.5 |
This plugin requires IBM DS Storage Manager to be installed. It calls the SMcli command, which is usually located at "/opt/IBM_DS/client/SMcli"; the location can be changed with the "COMMAND" variable.
Make sure the Nagios user has sufficient rights on "/opt/IBM_DS/client/SMcli" and "/var/opt/SM". Otherwise the check can fail or produce messages like "attempt to update the configuration file was unsuccessful".
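The rights mentioned above can be checked before the plugin is wired into Nagios. The following is only a minimal sketch: the function name check_smcli_access is made up for illustration, and it assumes that execute permission on SMcli and write permission on the state directory are what matters.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): report whether the
# current user can execute SMcli and write to its state directory.
# Both paths default to the locations named in the description.
check_smcli_access() {
    smcli=${1:-/opt/IBM_DS/client/SMcli}
    statedir=${2:-/var/opt/SM}
    rc=0
    if [ ! -x "$smcli" ]; then
        echo "cannot execute $smcli"
        rc=1
    fi
    if [ ! -w "$statedir" ]; then
        echo "cannot write to $statedir"
        rc=1
    fi
    return $rc
}
```

Run it as the user Nagios runs under, for example from a shell started with "sudo -u nagios sh", so that the result reflects that user's rights and not your own.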
At least one Controller IP must be specified.
Usage: check_IBM_health.sh -a X.X.X.X -b X.X.X.X
-a IP of Controller A
-b IP of Controller B
define command {
command_name Check_IBM_DS_Health
command_line $USER1$/check_IBM_DS_health.sh -a $HOSTADDRESS$ -b $ARG1$
}
Tested with DS4300, DS4700, DS4800, DS5020, DS5100 and Storage Manager 10.70, 10.77 and 10.83.
##################
Version 1.1 adds more intelligent filtering of unnecessary SMcli output and differentiation between Critical status for Hardware failures and Warning status for Preferred Path errors.
Version 1.2 patches the SMcli output parsing. Thanks to user "cseres" for the input!
Version 1.3 removes Clock Sync Warnings from the output.
Version 1.4 changes result parsing to fix "Unreadable sector" messages from DS3300/3400 not getting reported correctly. Thanks to user "Deep911" for the input!
Version 1.5 changes result parsing to fix "Battery Canister Expiration" messages not getting reported correctly. Another wildcard entry was also added to the nested "case" statement so that any other message yields at least an UNKNOWN response. Thanks to user "dedri" for the input!
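The classification described in these version notes can be pictured as a nested "case" statement. The following is a minimal sketch, not the plugin's actual code: the function name classify_health is made up for illustration, the patterns are the ones quoted in the comments on this page, and the return values are the standard Nagios plugin codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical sketch: map filtered SMcli health output to a Nagios state.
# Prints the state name and returns the matching Nagios exit code.
classify_health() {
    case "$1" in
        *optimal*)
            echo "OK"; return 0 ;;
        *failure*)
            # A failure was reported; decide how severe it is.
            case "$1" in
                *failed*|*Failed*|*Unreadable*)
                    echo "CRITICAL"; return 2 ;;
                *preferred*|*Preferred*|*Expiration*)
                    echo "WARNING"; return 1 ;;
                *)
                    # Wildcard: any failure message not matched above.
                    echo "UNKNOWN"; return 3 ;;
            esac ;;
        *)
            echo "UNKNOWN"; return 3 ;;
    esac
}
```

For example, classify_health "Storage Subsystem health status = optimal." prints OK, while a failure message that matches none of the inner patterns falls through to UNKNOWN.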
A small correction: when the plugin retrieves the status, it shows a warning about the DS storage name.
Just add the option -quick to the command in check_IBM_DS_HEALTH.sh:
#RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP -c "show storageSubsystem healthStatus;")
RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP -c 'show storageSubsystem healthStatus;' -quick)
Thanks to Dick Visser: https://wiki.terena.org/display/~federated-user-3/Installing+SMcli+on+Ubuntu+12.04
Thanks for providing this script. It also works on my DS3400 and DS3512 boxes.
I recently got a DS3512 which (somehow) requires a monitor/administrator password. I didn't want to provide the password in the script, but rather as parameter on the command line. Thus, I just forward any additional parameters directly to SMcli.
Here's my modification to the script:
# diff -u /scripts/check_IBM_DS_health_1.5.sh-orig /scripts/check_IBM_DS_health_1.5.sh
--- /scripts/check_IBM_DS_health_1.5.sh-orig 2014-11-04 16:54:45.000000000 +0100
+++ /scripts/check_IBM_DS_health_1.5.sh 2014-11-04 17:44:38.000000000 +0100
@@ -27,7 +27,7 @@
#########################################################
#SMcli location
-COMMAND=/opt/IBM_DS/client/SMcli
+COMMAND="sudo /opt/IBM_DS/client/SMcli"
# Define Nagios return codes
#
@@ -45,12 +45,14 @@
echo "IBM DS4x00/5x00 Health Check"
echo "the script requires IP of at least one DS4x00/5x00 Controller, second is optional"
echo ""
- echo "Usage check_IBM_health.sh -a X.X.X.X -b X.X.X.X"
+ echo "Usage check_IBM_health.sh -a X.X.X.X -b X.X.X.X [...]"
echo ""
echo " -h Show this page"
echo " -a IP of Controller A"
echo " -b IP of Controller B"
echo ""
+ echo " additional parameters are forwarded to SMcli"
+ echo ""
exit 0
}
@@ -78,10 +80,10 @@
shift
CTRLB_IP=$1
;;
+# pass unknown commands to SMcli
*)
- echo "Unknown argument: $1"
- print_help
- exit $STATE_UNKNOWN
+ PAR="$@"
+ break
;;
esac
shift
@@ -92,7 +94,7 @@
#
##execute SMcli
-RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP -c "show storageSubsystem healthStatus;")
+RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP $PAR -c "show storageSubsystem healthStatus;")
##filter unnecessary SMcli output
RESULT=$(echo $RESULT |sed 's/Performing syntax check...//g' | sed 's/Syntax check complete.//g' | sed 's/Executing script...//g' | sed 's/Script execution complete.//g'| sed 's/SMcli completed successfully.//g' | sed 's/The controller clocks in the storage subsystem are out of synchronization with the storage management station.//g' | sed 's/ Controller in Slot [AB]://g' | sed 's/Storage Management Station://g' | sed 's/\\s\\s[0-9]\{2\}\s[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\s\(CEST\|CET\)\s[0-9]\{4\}//g')
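The argument handling in this patch can be tried on its own. The following is a minimal sketch under stated assumptions: parse_args is a made-up function name, and the variable names CTRLA_IP, CTRLB_IP and PAR are taken from the diff. Everything after the known options is joined into PAR, so forwarded arguments that contain spaces will not survive the later unquoted expansion.

```shell
#!/bin/sh
# Hypothetical sketch of the patched option loop: -a and -b are consumed,
# the first unknown argument and everything after it is kept for SMcli.
parse_args() {
    CTRLA_IP=""
    CTRLB_IP=""
    PAR=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -a)
                [ $# -ge 2 ] || return 3   # option value is missing
                CTRLA_IP=$2
                shift
                ;;
            -b)
                [ $# -ge 2 ] || return 3
                CTRLB_IP=$2
                shift
                ;;
            *)
                # pass unknown arguments to SMcli
                PAR="$*"
                break
                ;;
        esac
        shift
    done
}
```

After parse_args -a X.X.X.X -b Y.Y.Y.Y followed by further arguments, PAR holds those further arguments as one space-separated string. Which SMcli option carries the password is not stated in the comment, so check the SMcli documentation for your Storage Manager version.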
[me@server libexec]# ./check_IBM_DS_health_1.5.sh -a 999.999.999.999 -b 999.999.999.999
Storage Subsystem health status = optimal.
OK
However when I try to run it in Nagios 4.0.8 I get:
Unknown response from SMcli: " "
UNKNOWN
At a bit of a loss as to how to get it to work, any suggestions?
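One way to narrow this kind of problem down, assuming the difference lies in the environment or permissions of the user Nagios runs under, is to run the same command as that user and look at the exit code together with everything written to stdout and stderr. The helper below is only a sketch; the name debug_run is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command, capture stdout and stderr together,
# and print them with the exit code so empty output becomes visible.
debug_run() {
    out=$("$@" 2>&1) && rc=0 || rc=$?
    printf 'exit=%s\noutput=[%s]\n' "$rc" "$out"
}
```

If the function is saved in a file (debug_run.sh is a hypothetical name), it could be used like this: sudo -u nagios sh -c '. ./debug_run.sh; debug_run ./check_IBM_DS_health_1.5.sh -a X.X.X.X'. An error message from SMcli that the plugin normally filters away would then show up between the brackets.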
"The following failures have been found: Nominal Temperature Exceeded Storage Subsystem: (XXX) Component reporting problem: Thermal sensor Status: Nominal temperature exceeded Location: Drive enclosure 0 Component requiring service: Temperature sensor Enclosure: Controller/Drive enclosure"
then the check returns UNKNOWN. I added "*failures*" in line 109 so that this error is also reported and the check becomes critical. Maybe you can consider this for the next version of the check.
=====
Unkown response from SMcli: " The following failures have been found: Insufficient Cache Backup Device Capacity Storage Subsystem: [[Array Name]] Component reporting problem: Not Available Status: Not Available Location: Controller/Drive enclosure, Controller in slot A Component requiring service: Controller in slot A Service action (removal) allowed: No Service action LED on component: Yes "
=====
I think that modifying line 114 to include "Insufficient" under the warning search would be a reasonable change. Would you concur?
== Changed Code ===
case "$RESULT" in
*optimal*)
echo $RESULT
echo "OK"
exit $STATE_OK
;;
*failure*)
case "$RESULT" in
*failed*|*Failed*|*Unreadable*)
echo $RESULT
echo "CRITICAL"
exit $STATE_CRITICAL
;;
*preferred*|*Preferred*|*Expiration*|*Insufficient*)
echo $RESULT
echo "WARNING"
exit $STATE_WARNING
;;
*)
echo "Unkown response from SMcli: \" $RESULT \""
echo "UNKNOWN"
exit $STATE_UNKNOWN
;;
esac
;;
====
Further to the topic started by Deep911: the error below from my storage subsystem is also not matched, and the plugin gave me the output (null).
------------------
The following failures have been found:
Battery Canister Nearing Expiration
Storage Subsystem: MyCOmpany
Component reporting problem: Battery
Status: Near expiration
Location: Controller enclosure 85, Controller in Slot A
Smart battery: Yes
Component requiring service: Controller A
Service action (removal) allowed: No
Service action LED on component: No
Script execution complete.
SMcli completed successfully.
-----
The strange thing is that the output is empty (null), and not what is written in the code:
echo "Unkown response from SMcli: \" $RESULT \""
echo "UNKNOWN"
exit $STATE_UNKNOWN
Thanks for pointing that out. I missed a wildcard entry in the nested "case" statement. In version 1.5 I added both: correct reporting for the battery expiration and an "unknown" result for all messages I didn't think of.
Tested successfully on DS4300 monitoring from Debian Etch with Storage Manager 10.83.
Install SMcli on Debian:
1) extract SM10.83_Linux_32bit_x86_single-10.83.x5.23.tgz to the filesystem
2) change to the Linux_32bit_x86_10p83_singleLinux folder
3) extract the files with "rpm2cpio SMclient-LINUX-10.83.G5.22-1.noarch.rpm | cpio -vid"
4) copy opt/IBM_DS/client to wherever you want on the filesystem
5) edit the BASEDIR and JAVA_EXEC variables inside the SMcli script (use JRE6 from Sun)
If you want to run this plugin as the nagios user, remember to give execute permission on SMcli and the script itself (chmod 755), and allow it to run as root by editing /etc/sudoers,
for example:
Cmnd_Alias SMCLI = /opt/IBM_DS/client/SMcli
Cmnd_Alias IBMDS = /usr/lib/nagios/plugins/check_IBM_DS_health_1.3.sh
nagios ALL=NOPASSWD: SMCLI
nagios ALL=NOPASSWD: IBMDS
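Entries like these can be generated and syntax-checked before they go into the real sudoers file. The following is a minimal sketch: sudoers_fragment is a made-up helper name, and it writes one NOPASSWD line per command without using Cmnd_Alias.

```shell
#!/bin/sh
# Hypothetical helper: print one NOPASSWD sudoers line per command
# for the given user. Nothing is written to /etc/sudoers.
sudoers_fragment() {
    user=$1
    shift
    for cmd in "$@"; do
        printf '%s ALL=NOPASSWD: %s\n' "$user" "$cmd"
    done
}
```

The output can be written to a temporary file and checked with "visudo -cf" on that file before the lines are added with visudo. A syntax error in /etc/sudoers can lock you out of sudo, so it is worth checking first.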
Thanks moep!
Is it possible to make the output clearer?
When I remove a power cord, it shows me the following output:
The following failures have been found: Power-Fan CRU/FRU - No Power Input Storage Subsystem: DS3512_1 Component reporting problem: Power supply CRU/FRU (Right) Status: No power input Location: Controller/Drive expansion enclosure Component requiring service: Power supply CRU/FRU (Right) Service action (removal) allowed: No Service action LED on component: Yes Subcomponent affected: Power supply (0)
It would be clearer with: "Failed power supply @ right side"
I am just using the output that comes from the DS Storage Manager client. Parsing all the possible messages to "translate" them would be an incredible amount of work.
Besides, the client supports systems with more than one enclosure, so the level of detail of the message is useful in many cases.
Conclusion: No, it is not possible.
This error:
The following failures have been found:
Unreadable sector(s) detected
Storage Subsystem: DS3300
Unreadable sectors detected: 1
gave me the output (null).
You must change
*failure*)
case "$RESULT" in
*failed*|*Failed*)
echo $RESULT
echo "CRITICAL"
exit $STATE_CRITICAL
;;
to
*failure*)
case "$RESULT" in
*failed*|*Failed*|*failures*)
echo $RESULT
echo "CRITICAL"
exit $STATE_CRITICAL
;;
to get the right output, like this:
The following failures have been found: Unreadable sector(s) detected Storage Subsystem: DS3300 Unreadable sectors detected: 1
CRITICAL
Your patch kills the functionality to report "preferred path" errors as warnings. Anyway I can filter for *unreadable* to fix this. Thanks for the input!
- output contains string "Failed" with capital F
- output doesn't contain strings "optimal" or "failure"
Here's a patch for 1.1 that should fix it:
100c100
*failed*|*Failed*)
109a110,114
> *)
> echo $RESULT
> echo "UNKNOWN"
> exit $STATE_UNKNOWN
> ;;
112c117,122
*)
> echo "Unkown response from SMCLI: \" $RESULT \""
> echo "UNKNOWN"
> exit $STATE_UNKNOWN
> ;;
> esac
SMcli can be downloaded here:
http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm/Storage_Disk&product=ibm/Storage_Disk/DS3400&release=All&platform=All&function=all
The script does not care about clock synchronisation; it reports only the status.
Here is a sample output:
./check_IBM_DS_health.sh -a 10.0.0.1
The controller clocks in the storage subsystem are out of synchronization with the storage management station. Controller in Slot A: Mon Nov 21 22:29:53 CET 2011 Controller in Slot B: Mon Nov 21 22:27:57 CET 2011 Storage Management Station: Mon Nov 21 13:41:00 CET 2011 Storage Subsystem health status = optimal.
OK
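Version 1.3 of the plugin removes these clock messages from the output. For older versions, or to see how such filtering can work, here is a minimal sketch with sed. It is not the plugin's own filter: the function name strip_clock_sync is made up, and the timestamp pattern assumes the format shown in the sample above (weekday, month, day, time, time zone, year).

```shell
#!/bin/sh
# Hypothetical filter: remove the clock synchronization notice and the
# three timestamps from a one-line SMcli health status message.
strip_clock_sync() {
    # e.g. "Mon Nov 21 22:29:53 CET 2011"
    ts='[A-Z][a-z][a-z] [A-Z][a-z][a-z] [0-9 ][0-9] [0-9:]\{8\} [A-Z]\{2,5\} [0-9]\{4\}'
    printf '%s\n' "$1" | sed \
        -e 's/The controller clocks in the storage subsystem are out of synchronization with the storage management station\.//' \
        -e "s/Controller in Slot [AB]: $ts//g" \
        -e "s/Storage Management Station: $ts//" \
        -e 's/^ *//' \
        -e 's/  */ /g'
}
```

Applied to the sample output above, this leaves only "Storage Subsystem health status = optimal.".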