check_iostat - I/O statistics
This simple plugin uses iostat to obtain its metrics, parses the output, and uses bc to compare the results with the specified WARNING and CRITICAL levels (since the shell can't compare floating-point numbers).
Feedback and suggestions are appreciated =)
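The bc comparison mentioned in the description can be sketched roughly as follows. The helper name `float_ge` is hypothetical and not part of the plugin, which inlines the bc call instead. The awk fallback is an assumption added only so the sketch still runs on a machine without bc.

```shell
#!/bin/bash
# float_ge A B: succeed (exit 0) when A >= B, treating both as decimals.
# Hypothetical helper for illustration; the plugin inlines the bc call instead.
float_ge() {
    if command -v bc >/dev/null 2>&1; then
        # bc prints 1 for a true relational expression and 0 for a false one
        [ "$(echo "$1 >= $2" | bc)" = "1" ]
    else
        # Assumption: fall back to awk when bc is not installed
        awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
    fi
}

if float_ge 12.5 10; then
    echo "WARNING threshold reached"
fi
```

Wrapping the test in a function keeps each threshold check readable, at the cost of one extra process per comparison.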
Reviews (18)
by ao, September 25, 2024
After upgrading to Debian 11, the check stopped working. I found the problem with the help of randomtask's comment. The variable $samples is defined as '2i'. On Debian 11, iostat ignores $samples in the command (see below) and generates reports indefinitely. That is why the check no longer works: it never reports anything back. The fix is simple: remove the i.
TMPX=`$iostat $disk -x -k -d 5 $samples | grep $disk | tail -1`
TMPD=`$iostat $disk -k -d 5 $samples | grep $disk | tail -1`
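iostat takes an optional interval followed by an optional count, and with an interval but no usable count it keeps reporting until interrupted, which fits the hang described above. A minimal sketch of a guard that rejects a non-numeric count such as `2i` before it reaches iostat follows. The function name `valid_count` is hypothetical and not part of the plugin.

```shell
#!/bin/bash
# valid_count VALUE: succeed only when VALUE is a positive integer.
# Hypothetical helper; the plugin itself passes $samples through unchecked.
valid_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as the "i" in "2i"
        *) [ "$1" -gt 0 ] ;;       # digits only: require a value above zero
    esac
}

samples=2i
if ! valid_count "$samples"; then
    echo "UNKNOWN - sample count '$samples' is not a positive integer"
fi
```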
----------
#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc. to aid readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
#
# Version 0.0.9 - Oct/2024
# Changes:
# - Fixed iostat generating reports indefinitely on Debian 11 by changing $samples from 2i to 2.
#
# by Alex
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than Nagios use!)
-h = This help
"
exit -1
}
# Ensuring we have the needed tools:
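# The paths are quoted so that an empty value fails the -f test, and exit runs
# in the main shell (inside a subshell it would not stop the script).
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi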
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"p") printperfdata=1;;
"q") queue=1;;
"W") waittime=1;;
"g") samples=1;;
"h") echo "help:" && help;;
\?) echo "Invalid option: -$OPTARG" >&2
exit -1
;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
#Checks that only one query type is run
[[ `expr $io+$queue+$waittime` -ne "1" ]] && \
echo "ERROR: select one and only one run mode" && help
#set warning and critical to an insane value if empty, else set the individual values
if [ -z "$warning" ]
then
warning=99999
else
#TPS with IO, Request size with queue
warn_1=`echo $warning | cut -d, -f1`
#Read/s with IO,Queue Length with queue
warn_2=`echo $warning | cut -d, -f2`
#Write/s with IO
warn_3=`echo $warning | cut -d, -f3`
#KB/s read with IO
warn_4=`echo $warning | cut -d, -f4`
#KB/s written with IO
warn_5=`echo $warning | cut -d, -f5`
#Crude hack due to integer expression later in the script
warning=1
fi
if [ -z "$critical" ]
then
critical=99999
else
#TPS with IO, Request size with queue
crit_1=`echo $critical | cut -d, -f1`
#Read/s with IO,Queue Length with queue
crit_2=`echo $critical | cut -d, -f2`
#Write/s with IO
crit_3=`echo $critical | cut -d, -f3`
#KB/s read with IO
crit_4=`echo $critical | cut -d, -f4`
#KB/s written with IO
crit_5=`echo $critical | cut -d, -f5`
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
#Checks for sane warning/critical levels
if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then
if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first report of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device report and -x adds extended statistics; 10 is the interval in seconds and $samples the number of reports
TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`
#------------IO Test-------------
if [ "$io" == "1" ]; then
TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
#Requests per second:
tps=`echo "$TMPD" | awk '{print $2}'`
read_sec=`echo "$TMPX" | awk '{print $4}'`
written_sec=`echo "$TMPX" | awk '{print $5}'`
#Kb per second:
kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
[ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [ "$queue" == "1" ]; then
qsize=`echo "$TMPX" | awk '{print $8}'`
qlength=`echo "$TMPX" | awk '{print $9}'`
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
avgwait=`echo "$TMPX" | awk '{print $10}'`
avgrwait=`echo "$TMPX" | awk '{print $11}'`
avgwwait=`echo "$TMPX" | awk '{print $12}'`
avgsvctime=`echo "$TMPX" | awk '{print $13}'`
avgcpuutil=`echo "$TMPX" | awk '{print $14}'`
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi
echo ""
exit $status
#----------/check_iostat.sh-----------
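The comment in the wait-time section of the script suggests reading the header line to find the right column instead of relying on fixed positions. Below is a minimal sketch of that idea, assuming output shaped like `iostat -x`. The function name `iostat_field` and the sample text are hypothetical, and the actual column names depend on the installed sysstat version.

```shell
#!/bin/bash
# iostat_field DEVICE COLUMN: read iostat -x style output on stdin and print
# the value under COLUMN from the last row for DEVICE.
# Hypothetical helper; not part of the plugin.
iostat_field() {
    awk -v dev="$1" -v col="$2" '
        # Header row: remember which field number carries the wanted label
        $1 ~ /^Device/ { idx = 0; for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        # Data row for the device: keep the latest value seen
        $1 == dev && idx { val = $idx }
        END { if (val == "") exit 1; print val }
    '
}

# Made-up sample with two reports; real column sets vary between versions
sample='Device            r/s     w/s     rkB/s     wkB/s  %util
sda              1.00    2.00     10.00     20.00   0.50

Device            r/s     w/s     rkB/s     wkB/s  %util
sda              3,50    4.00     30.00     40.00   1.25'

value=$(printf '%s\n' "$sample" | iostat_field sda r/s)
# Same comma-to-point replacement the plugin applies before calling bc
echo "${value/,/.}"
```

Looking columns up by label means a changed column order only breaks the check when a label itself is renamed or dropped, in which case the helper returns a non-zero status rather than a wrong number.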
by randomtask, December 2, 2021
Great plugin, but I did find one issue. When I upgraded one of my servers to Debian 11, this plugin stopped working. It appears to be an issue with an updated version of iostat. To fix it, edit the following lines, around line 234:
TMPX=$($iostat $disk -x -k -d 10 2 $samples | grep $disk | tail -1)
TMPD=$($iostat $disk -k -d 10 2 $samples | grep $disk | tail -1)
All I did was add a 2 after the 10 so that only 2 lines are returned to get the stats out. So far it seems to be working just fine. I hope this helps.
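Independent of which arguments a given iostat version accepts, the call can also be wrapped in coreutils `timeout`, so that a run that never finishes cannot stall the check. This is a sketch of a different safeguard, not the fix described above. The function name `run_bounded` and the 25-second limit are assumptions.

```shell
#!/bin/bash
# run_bounded SECONDS CMD...: run CMD, stopping it if it exceeds SECONDS.
# timeout(1) from GNU coreutils exits with status 124 when the limit is hit.
run_bounded() {
    local limit=$1
    shift
    timeout "$limit" "$@"
}

# Usage sketch (assumes iostat is installed and sda exists):
#   TMPX=$(run_bounded 25 iostat sda -x -k -d 10 2 | grep sda | tail -1)

# Self-contained demonstration with a command that would run too long
run_bounded 1 sleep 5
echo "exit status: $?"
```

The limit should stay above interval times count (20 seconds in the lines above) and below the timeout Nagios applies to the plugin itself.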
by josephw, June 30, 2020
#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc. to aid readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
#
# Version 0.0.9 - Jun/2020
# Changes:
# - Updated to use bash 4.4 mechanisms
#
# by Joseph Waggy / joseph.waggy@gmail.com
iostat=$(which iostat 2>/dev/null)
bc=$(which bc 2>/dev/null)
help()
{
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than Nagios use!)
-h = This help
"
}
# Ensuring we have the needed tools:
if [[ ! -f $iostat ]] || [[ ! -f $bc ]]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2i
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d")
disk=$OPTARG
;;
"w")
warning=$OPTARG
;;
"c")
critical=$OPTARG
;;
"i")
io=1
;;
"p")
printperfdata=1
;;
"q")
queue=1
;;
"W")
waittime=1
;;
"g")
samples=1
;;
"h")
echo "help:"
help
exit 0
;;
\?)
echo "Invalid option: -$OPTARG" >&2
help
exit -1
;;
esac
done
# Autofill if parameters are empty
if [[ -z "$disk" ]]; then
disk=sda
fi
#Checks that only one query type is run
if [[ $((io+queue+waittime)) -ne "1" ]]; then
echo "ERROR: select one and only one run mode"
help
exit -1
fi
#set warning and critical to an insane value if empty, else set the individual values
if [[ -z "$warning" ]]; then
warning=99999
else
#TPS with IO, Request size with queue
warn_1=$(echo $warning | cut -d, -f1)
#Read/s with IO,Queue Length with queue
warn_2=$(echo $warning | cut -d, -f2)
#Write/s with IO
warn_3=$(echo $warning | cut -d, -f3)
#KB/s read with IO
warn_4=$(echo $warning | cut -d, -f4)
#KB/s written with IO
warn_5=$(echo $warning | cut -d, -f5)
#Crude hack due to integer expression later in the script
warning=1
fi
if [[ -z "$critical" ]]; then
critical=99999
else
#TPS with IO, Request size with queue
crit_1=$(echo $critical | cut -d, -f1)
#Read/s with IO,Queue Length with queue
crit_2=$(echo $critical | cut -d, -f2)
#Write/s with IO
crit_3=$(echo $critical | cut -d, -f3)
#KB/s read with IO
crit_4=$(echo $critical | cut -d, -f4)
#KB/s written with IO
crit_5=$(echo $critical | cut -d, -f5)
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
if [[ ! -b "/dev/$disk" ]]; then
echo "ERROR: Device incorrectly specified"
help
exit -1
fi
#Checks for sane warning/critical levels
if [[ $warning -ne "99999" || $critical -ne "99999" ]]; then
if [[ "$warn_1" -gt "$crit_1" || "$warn_2" -gt "$crit_2" ]]; then
echo "ERROR: critical levels must be higher than warning levels"
help
exit -1
elif [[ $io -eq "1" || $waittime -eq "1" ]]; then
if [[ "$warn_3" -gt "$crit_3" || "$warn_4" -gt "$crit_4" || "$warn_5" -gt "$crit_5" ]]; then
echo "ERROR: critical levels must be higher than warning levels"
help
exit -1
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first report of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device report and -x adds extended statistics; 10 is the interval in seconds and $samples the number of reports
TMPX=$($iostat $disk -x -k -d 10 $samples | grep $disk | tail -1)
#------------IO Test-------------
if [[ "$io" == "1" ]]; then
TMPD=$($iostat $disk -k -d 10 $samples | grep $disk | tail -1)
#Requests per second:
tps=$(echo "$TMPD" | awk '{print $2}')
read_sec=$(echo "$TMPX" | awk '{print $4}')
written_sec=$(echo "$TMPX" | awk '{print $5}')
#Kb per second:
kbytes_read_sec=$(echo "$TMPX" | awk '{print $6}')
kbytes_written_sec=$(echo "$TMPX" | awk '{print $7}')
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [[ "$warning" -ne "99999" ]]; then
if [[ "$(echo "$tps >= $warn_1" | bc)" == "1" || "$(echo "$read_sec >= $warn_2" | bc)" == "1" || "$(echo "$written_sec >= $warn_3" | bc)" == "1" || "$(echo "$kbytes_read_sec >= $warn_4" | bc -q)" == "1" || "$(echo "$kbytes_written_sec >= $warn_5" | bc)" == "1" ]]; then
STATE="WARNING"
status=1
fi
fi
if [[ "$critical" -ne "99999" ]]; then
if [[ "$(echo "$tps >= $crit_1" | bc)" == "1" || "$(echo "$read_sec >= $crit_2" | bc -q)" == "1" || "$(echo "$written_sec >= $crit_3" | bc)" == "1" || "$(echo "$kbytes_read_sec >= $crit_4" | bc -q)" == "1" || "$(echo "$kbytes_written_sec >= $crit_5" | bc)" == "1" ]]; then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [[ "$queue" == "1" ]]; then
qsize=$(echo "$TMPX" | awk '{print $8}')
qlength=$(echo "$TMPX" | awk '{print $9}')
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [[ "$warning" -ne "99999" ]]; then
if [[ "$(echo "$qsize >= $warn_1" | bc)" == "1" || "$(echo "$qlength >= $warn_2" | bc)" == "1" ]]; then
STATE="WARNING"
status=1
fi
fi
if [[ "$critical" -ne "99999" ]]; then
if [[ "$(echo "$qsize >= $crit_1" | bc)" == "1" || "$(echo "$qlength >= $crit_2" | bc)" == "1" ]]; then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [[ "$waittime" == "1" ]]; then
avgwait=$(echo "$TMPX" | awk '{print $10}')
avgrwait=$(echo "$TMPX" | awk '{print $11}')
avgwwait=$(echo "$TMPX" | awk '{print $12}')
avgsvctime=$(echo "$TMPX" | awk '{print $13}')
avgcpuutil=$(echo "$TMPX" | awk '{print $14}')
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [[ "$warning" -ne "99999" ]]; then
if [[ "$(echo "$avgwait >= $warn_1" | bc)" == "1" || "$(echo "$avgrwait >= $warn_2" | bc -q)" == "1" || "$(echo "$avgwwait >= $warn_3" | bc)" == "1" || "$(echo "$avgsvctime >= $warn_4" | bc -q)" == "1" || "$(echo "$avgcpuutil >= $warn_5" | bc)" == "1" ]]; then
STATE="WARNING"
status=1
fi
fi
if [[ "$critical" -ne "99999" ]]; then
if [[ "$(echo "$avgwait >= $crit_1" | bc)" == "1" || "$(echo "$avgrwait >= $crit_2" | bc -q)" == "1" || "$(echo "$avgwwait >= $crit_3" | bc)" == "1" || "$(echo "$avgsvctime >= $crit_4" | bc -q)" == "1" || "$(echo "$avgcpuutil >= $crit_5" | bc)" == "1" ]]; then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [[ "x$printperfdata" == "x1" ]]; then
echo -n "$PERFDATA"
fi
echo ""
exit $status
#----------/check_iostat.sh-----------
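The Wait Time section above notes that a future parser could read the label line to find the right column, since the fixed positions ($10 to $14) shift between sysstat versions. A minimal sketch of that idea follows. `iostat_field` is a hypothetical helper, not part of check_iostat.sh, and the sample text is made up in the older `iostat -x` layout rather than captured from a real run.

```shell
#!/bin/bash
# Sketch only: look up an iostat -x value by column label instead of position.
# iostat_field LABEL DEVICE   (reads iostat output on stdin)
# Hypothetical helper; prints the value from the last report for DEVICE,
# or returns 1 if the label or device was not found.
iostat_field() {
    awk -v want="$1" -v dev="$2" '
        # Header line: first field starts with "Device" (also matches "Device:")
        $1 ~ /^Device/ { col = 0; for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
        # Data line for the requested device: remember it, the last report wins
        $1 == dev && col > 0 { val = $col }
        END { if (val == "") exit 1; print val }
    '
}

# Made-up sample in the older sysstat column layout
sample='Device:  rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.20 0.50 3.10 8.00 42.40 28.00 0.02 4.50 2.00 4.90 1.10 0.40'

printf '%s\n' "$sample" | iostat_field await sda    # prints 4.50
printf '%s\n' "$sample" | iostat_field %util sda    # prints 0.40
```

In the plugin this would replace the `awk '{print $10}'` style calls, with the raw iostat output piped in instead of the pre-filtered `$TMPX` line, because the header line has to be visible to the helper.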
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for Nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc. to aid readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
#
# Version 0.0.9 - Jun/2020
# Changes:
# - Updated to use bash 4.4 mechanisms
#
# by Joseph Waggy / joseph.waggy@gmail.com
iostat=$(which iostat 2>/dev/null)
bc=$(which bc 2>/dev/null)
help()
{
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, KBytes Read/Sec, KBytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,KBytes Read/Sec,KBytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than Nagios use!)
-h = This help
"
}
# Ensuring we have the needed tools:
if [[ ! -f $iostat ]] || [[ ! -f $bc ]]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d")
disk=$OPTARG
;;
"w")
warning=$OPTARG
;;
"c")
critical=$OPTARG
;;
"i")
io=1
;;
"p")
printperfdata=1
;;
"q")
queue=1
;;
"W")
waittime=1
;;
"g")
samples=1
;;
"h")
echo "help:"
help
exit 0
;;
\?)
echo "Invalid option: -$OPTARG" >&2
help
exit -1
;;
esac
done
# Autofill if parameters are empty
if [[ -z "$disk" ]]; then
disk=sda
fi
#Checks that only one query type is run
if [[ $((io+queue+waittime)) -ne "1" ]]; then
echo "ERROR: select one and only one run mode"
help
exit -1
fi
#set warning and critical to a sentinel value if empty, else set the individual values
if [[ -z "$warning" ]]; then
warning=99999
else
#TPS with IO, Request size with queue
warn_1=$(echo $warning | cut -d, -f1)
#Read/s with IO,Queue Length with queue
warn_2=$(echo $warning | cut -d, -f2)
#Write/s with IO
warn_3=$(echo $warning | cut -d, -f3)
#KB/s read with IO
warn_4=$(echo $warning | cut -d, -f4)
#KB/s written with IO
warn_5=$(echo $warning | cut -d, -f5)
#Crude hack due to integer expression later in the script
warning=1
fi
if [[ -z "$critical" ]]; then
critical=99999
else
#TPS with IO, Request size with queue
crit_1=$(echo $critical | cut -d, -f1)
#Read/s with IO,Queue Length with queue
crit_2=$(echo $critical | cut -d, -f2)
#Write/s with IO
crit_3=$(echo $critical | cut -d, -f3)
#KB/s read with IO
crit_4=$(echo $critical | cut -d, -f4)
#KB/s written with IO
crit_5=$(echo $critical | cut -d, -f5)
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
if [[ ! -b "/dev/$disk" ]]; then
echo "ERROR: Device incorrectly specified"
help
exit -1
fi
#Checks for sane warning/critical levels (only when both were given; an empty level would otherwise compare as 0)
if [[ $warning -ne "99999" && $critical -ne "99999" ]]; then
if [[ "$warn_1" -gt "$crit_1" || "$warn_2" -gt "$crit_2" ]]; then
echo "ERROR: critical levels must be higher than warning levels"
help
exit -1
elif [[ $io -eq "1" || $waittime -eq "1" ]]; then
if [[ "$warn_3" -gt "$crit_3" || "$warn_4" -gt "$crit_4" || "$warn_5" -gt "$crit_5" ]]; then
echo "ERROR: critical levels must be higher than warning levels"
help
exit -1
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# the first iostat report shows statistics since last reboot, the second one shows current values for the device
# -d selects the device report, -x adds extended statistics, 10 is the interval in seconds between reports
TMPX=$($iostat $disk -x -k -d 10 $samples | grep $disk | tail -1)
#------------IO Test-------------
if [[ "$io" == "1" ]]; then
TMPD=$($iostat $disk -k -d 10 $samples | grep $disk | tail -1)
#Requests per second:
tps=$(echo "$TMPD" | awk '{print $2}')
read_sec=$(echo "$TMPX" | awk '{print $4}')
written_sec=$(echo "$TMPX" | awk '{print $5}')
#Kb per second:
kbytes_read_sec=$(echo "$TMPX" | awk '{print $6}')
kbytes_written_sec=$(echo "$TMPX" | awk '{print $7}')
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [[ "$warning" -ne "99999" ]]; then
if [[ "$(echo "$tps >= $warn_1" | bc)" == "1" || "$(echo "$read_sec >= $warn_2" | bc)" == "1" || "$(echo "$written_sec >= $warn_3" | bc)" == "1" || "$(echo "$kbytes_read_sec >= $warn_4" | bc -q)" == "1" || "$(echo "$kbytes_written_sec >= $warn_5" | bc)" == "1" ]]; then
STATE="WARNING"
status=1
fi
fi
if [[ "$critical" -ne "99999" ]]; then
if [[ "$(echo "$tps >= $crit_1" | bc)" == "1" || "$(echo "$read_sec >= $crit_2" | bc -q)" == "1" || "$(echo "$written_sec >= $crit_3" | bc)" == "1" || "$(echo "$kbytes_read_sec >= $crit_4" | bc -q)" == "1" || "$(echo "$kbytes_written_sec >= $crit_5" | bc)" == "1" ]]; then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [[ "$queue" == "1" ]]; then
qsize=$(echo "$TMPX" | awk '{print $8}')
qlength=$(echo "$TMPX" | awk '{print $9}')
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [[ "$warning" -ne "99999" ]]; then
if [[ "$(echo "$qsize >= $warn_1" | bc)" == "1" || "$(echo "$qlength >= $warn_2" | bc)" == "1" ]]; then
STATE="WARNING"
status=1
fi
fi
if [[ "$critical" -ne "99999" ]]; then
if [[ "$(echo "$qsize >= $crit_1" | bc)" == "1" || "$(echo "$qlength >= $crit_2" | bc)" == "1" ]]; then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [[ "$waittime" == "1" ]]; then
avgwait=$(echo "$TMPX" | awk '{print $10}')
avgrwait=$(echo "$TMPX" | awk '{print $11}')
avgwwait=$(echo "$TMPX" | awk '{print $12}')
avgsvctime=$(echo "$TMPX" | awk '{print $13}')
avgcpuutil=$(echo "$TMPX" | awk '{print $14}')
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [[ "$warning" -ne "99999" ]]; then
if [[ "$(echo "$avgwait >= $warn_1" | bc)" == "1" || "$(echo "$avgrwait >= $warn_2" | bc -q)" == "1" || "$(echo "$avgwwait >= $warn_3" | bc)" == "1" || "$(echo "$avgsvctime >= $warn_4" | bc -q)" == "1" || "$(echo "$avgcpuutil >= $warn_5" | bc)" == "1" ]]; then
STATE="WARNING"
status=1
fi
fi
if [[ "$critical" -ne "99999" ]]; then
if [[ "$(echo "$avgwait >= $crit_1" | bc)" == "1" || "$(echo "$avgrwait >= $crit_2" | bc -q)" == "1" || "$(echo "$avgwwait >= $crit_3" | bc)" == "1" || "$(echo "$avgsvctime >= $crit_4" | bc -q)" == "1" || "$(echo "$avgcpuutil >= $crit_5" | bc)" == "1" ]]; then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [[ "x$printperfdata" == "x1" ]]; then
echo -n "$PERFDATA"
fi
echo ""
exit $status
#----------/check_iostat.sh-----------
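The parameter check in the script above compares warning and critical levels with the integer-only `-gt` test, so fractional levels such as `1.5` are not handled there. A rough sketch of a float-aware check follows. Both `float_gt` and `check_pair` are hypothetical helpers, and awk is used for the comparison here purely for illustration; the plugin itself relies on bc for its float comparisons.

```shell
#!/bin/bash
# Sketch only: float-aware sanity check for one warning/critical pair.

# float_gt A B: hypothetical helper, succeeds when A > B (numeric comparison)
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) > (b + 0)) }'
}

# check_pair WARN CRIT: hypothetical helper.
# Skips the comparison when either level is missing, so passing only -w
# or only -c is not reported as an error.
check_pair() {
    local warn=$1 crit=$2
    [[ -z "$warn" || -z "$crit" ]] && return 0
    if float_gt "$warn" "$crit"; then
        echo "ERROR: critical levels must be higher than warning levels"
        return 1
    fi
    return 0
}

check_pair 1.5 2.5 && echo "levels ok"
check_pair 10 "" && echo "levels ok (no critical level given)"
check_pair 3.5 2 || echo "rejected"
```

Equal levels are accepted here, matching the original `-gt` test, which only rejects a warning level strictly above the critical one.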
by dvzunderd, January 20, 2019
I've added the warning/critical thresholds to the performance data.
#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for Nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc. to aid readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, KBytes Read/Sec, KBytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,KBytes Read/Sec,KBytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than Nagios use!)
-h = This help
"
exit -1
}
# Ensuring we have the needed tools:
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"p") printperfdata=1;;
"q") queue=1;;
"W") waittime=1;;
"g") samples=1;;
"h") echo "help:" && help;;
\?) echo "Invalid option: -$OPTARG" >&2
exit -1
;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
#Checks that only one query type is run
[[ `expr $io+$queue+$waittime` -ne "1" ]] && \
echo "ERROR: select one and only one run mode" && help
#set warning and critical to a sentinel value if empty, else set the individual values
if [ -z "$warning" ]
then
warning=99999
else
#TPS with IO, Request size with queue
warn_1=`echo $warning | cut -d, -f1`
#Read/s with IO,Queue Length with queue
warn_2=`echo $warning | cut -d, -f2`
#Write/s with IO
warn_3=`echo $warning | cut -d, -f3`
#KB/s read with IO
warn_4=`echo $warning | cut -d, -f4`
#KB/s written with IO
warn_5=`echo $warning | cut -d, -f5`
#Crude hack due to integer expression later in the script
warning=1
fi
if [ -z "$critical" ]
then
critical=99999
else
#TPS with IO, Request size with queue
crit_1=`echo $critical | cut -d, -f1`
#Read/s with IO,Queue Length with queue
crit_2=`echo $critical | cut -d, -f2`
#Write/s with IO
crit_3=`echo $critical | cut -d, -f3`
#KB/s read with IO
crit_4=`echo $critical | cut -d, -f4`
#KB/s written with IO
crit_5=`echo $critical | cut -d, -f5`
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
#Checks for sane warning/critical levels (only when both were given; an empty level would otherwise compare as 0)
if ( [[ $warning -ne "99999" ]] && [[ $critical -ne "99999" ]] ); then
if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# the first iostat report shows statistics since last reboot, the second one shows current values for the device
# -d selects the device report, -x adds extended statistics, 10 is the interval in seconds between reports
TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`
#------------IO Test-------------
if [ "$io" == "1" ]; then
TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
#Requests per second:
tps=`echo "$TMPD" | awk '{print $2}'`
read_sec=`echo "$TMPX" | awk '{print $4}'`
written_sec=`echo "$TMPX" | awk '{print $5}'`
#Kb per second:
kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
[ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [ "$queue" == "1" ]; then
qsize=`echo "$TMPX" | awk '{print $8}'`
qlength=`echo "$TMPX" | awk '{print $9}'`
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
avgwait=`echo "$TMPX" | awk '{print $10}'`
avgrwait=`echo "$TMPX" | awk '{print $11}'`
avgwwait=`echo "$TMPX" | awk '{print $12}'`
avgsvctime=`echo "$TMPX" | awk '{print $13}'`
avgcpuutil=`echo "$TMPX" | awk '{print $14}'`
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi
echo ""
exit $status
#----------/check_iostat.sh-----------
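The change in this version is that each performance data item carries its thresholds in the `label=value;warn;crit` form. The same assembly can be sketched with a small helper, which keeps the separators consistent across the three modes. `perf_item` is a hypothetical name and the values below are made up for illustration.

```shell
#!/bin/bash
# Sketch only: build Nagios performance data items as label=value;warn;crit.
# perf_item LABEL VALUE WARN CRIT   (hypothetical helper, not in the plugin)
# Empty WARN or CRIT simply leaves that slot blank.
perf_item() {
    printf '%s=%s;%s;%s' "$1" "$2" "$3" "$4"
}

# Made-up queue-mode values; the critical level for queue length is unset
qsize=28.00
qlength=0.02
warn_1=50; crit_1=100
warn_2=2;  crit_2=""

PERFDATA=" | $(perf_item qsize "$qsize" "$warn_1" "$crit_1") $(perf_item queue_length "$qlength" "$warn_2" "$crit_2")"
echo "OK - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength$PERFDATA"
```

Building every item through one function also avoids stray characters in labels, which are easy to introduce when the whole string is typed by hand.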
#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for Nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc. to aid readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, KBytes Read/Sec, KBytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,KBytes Read/Sec,KBytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than Nagios use!)
-h = This help
"
exit -1
}
# Ensuring we have the needed tools:
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"p") printperfdata=1;;
"q") queue=1;;
"W") waittime=1;;
"g") samples=1;;
"h") echo "help:" && help;;
\?) echo "Invalid option: -$OPTARG" >&2
exit -1
;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
#Checks that only one query type is run
[[ `expr $io+$queue+$waittime` -ne "1" ]] && \
echo "ERROR: select one and only one run mode" && help
#set warning and critical to a sentinel value if empty, else set the individual values
if [ -z "$warning" ]
then
warning=99999
else
#TPS with IO, Request size with queue
warn_1=`echo $warning | cut -d, -f1`
#Read/s with IO,Queue Length with queue
warn_2=`echo $warning | cut -d, -f2`
#Write/s with IO
warn_3=`echo $warning | cut -d, -f3`
#KB/s read with IO
warn_4=`echo $warning | cut -d, -f4`
#KB/s written with IO
warn_5=`echo $warning | cut -d, -f5`
#Crude hack due to integer expression later in the script
warning=1
fi
if [ -z "$critical" ]
then
critical=99999
else
#TPS with IO, Request size with queue
crit_1=`echo $critical | cut -d, -f1`
#Read/s with IO,Queue Length with queue
crit_2=`echo $critical | cut -d, -f2`
#Write/s with IO
crit_3=`echo $critical | cut -d, -f3`
#KB/s read with IO
crit_4=`echo $critical | cut -d, -f4`
#KB/s written with IO
crit_5=`echo $critical | cut -d, -f5`
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
#Checks for sane warning/critical levels (only when both were given; an empty level would otherwise compare as 0)
if ( [[ $warning -ne "99999" ]] && [[ $critical -ne "99999" ]] ); then
if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# the first iostat report shows statistics since last reboot, the second one shows current values for the device
# -d selects the device report, -x adds extended statistics, 10 is the interval in seconds between reports
TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`
#------------IO Test-------------
if [ "$io" == "1" ]; then
TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
#Requests per second:
tps=`echo "$TMPD" | awk '{print $2}'`
read_sec=`echo "$TMPX" | awk '{print $4}'`
written_sec=`echo "$TMPX" | awk '{print $5}'`
#Kb per second:
kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
[ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [ "$queue" == "1" ]; then
qsize=`echo "$TMPX" | awk '{print $8}'`
qlength=`echo "$TMPX" | awk '{print $9}'`
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
avgwait=`echo "$TMPX" | awk '{print $10}'`
avgrwait=`echo "$TMPX" | awk '{print $11}'`
avgwwait=`echo "$TMPX" | awk '{print $12}'`
avgsvctime=`echo "$TMPX" | awk '{print $13}'`
avgcpuutil=`echo "$TMPX" | awk '{print $14}'`
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi
echo ""
exit $status
#----------/check_iostat.sh-----------
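The script pipes every threshold comparison through bc because the shell itself cannot compare floating point numbers. As a rough sketch of an alternative, the comparison could be wrapped in one small function built on awk, which the script already depends on. The function name float_ge and the sample values are made up for illustration and are not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): succeeds when $1 >= $2,
# treating both arguments as floating point numbers.
float_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) >= (b + 0)) }'
}

# Invented values, standing in for a parsed iostat field and a -w threshold.
tps="12.50"
warn_1="10"

if float_ge "$tps" "$warn_1"; then
    echo "WARNING"
fi
```

With a helper like this, each `[ "\`echo "$x >= $y" | bc\`" == "1" ]` test would shrink to `float_ge "$x" "$y"`, at the cost of replacing bc with awk for the arithmetic.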
Fixed performance data for Wait check. Wasn't displaying any data.
#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc to add readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than nagios use!)
-h = This help
"
exit -1
}
# Ensuring we have the needed tools:
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"p") printperfdata=1;;
"q") queue=1;;
"W") waittime=1;;
"g") samples=1;;
"h") echo "help:" && help;;
\?) echo "Invalid option: -$OPTARG" >&2
exit -1
;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
#Checks that only one query type is run
[[ $((io + queue + waittime)) -ne 1 ]] && \
echo "ERROR: select one and only one run mode" && help
#set warning and critical to an insane value if empty, else set the individual values
if [ -z "$warning" ]
then warning=99999
else
#TPS with IO, Request size with queue
warn_1=`echo $warning | cut -d, -f1`
#Read/s with IO,Queue Length with queue
warn_2=`echo $warning | cut -d, -f2`
#Write/s with IO
warn_3=`echo $warning | cut -d, -f3`
#KB/s read with IO
warn_4=`echo $warning | cut -d, -f4`
#KB/s written with IO
warn_5=`echo $warning | cut -d, -f5`
#Crude hack due to integer expression later in the script
warning=1
fi
if [ -z "$critical" ]
then critical=99999
else
#TPS with IO, Request size with queue
crit_1=`echo $critical | cut -d, -f1`
#Read/s with IO,Queue Length with queue
crit_2=`echo $critical | cut -d, -f2`
#Write/s with IO
crit_3=`echo $critical | cut -d, -f3`
#KB/s read with IO
crit_4=`echo $critical | cut -d, -f4`
#KB/s written with IO
crit_5=`echo $critical | cut -d, -f5`
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
#Checks for sane warning/critical levels
if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then
if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device utilization report, -x adds extended statistics, 10 is the interval in seconds between the two reports
TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`
#------------IO Test-------------
if [ "$io" == "1" ]; then
TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
#Requests per second:
tps=`echo "$TMPD" | awk '{print $2}'`
read_sec=`echo "$TMPX" | awk '{print $4}'`
written_sec=`echo "$TMPX" | awk '{print $5}'`
#Kb per second:
kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
[ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps; read_io_sec=$read_sec; write_io_sec=$written_sec; kbytes_read_sec=$kbytes_read_sec; kbytes_written_sec=$kbytes_written_sec;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [ "$queue" == "1" ]; then
qsize=`echo "$TMPX" | awk '{print $8}'`
qlength=`echo "$TMPX" | awk '{print $9}'`
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize; queue_length=$qlength;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
avgwait=`echo "$TMPX" | awk '{print $10}'`
avgrwait=`echo "$TMPX" | awk '{print $11}'`
avgwwait=`echo "$TMPX" | awk '{print $12}'`
avgsvctime=`echo "$TMPX" | awk '{print $13}'`
avgcpuutil=`echo "$TMPX" | awk '{print $14}'`
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait; avg_r_waittime_ms=$avgrwait; avg_w_waittime_ms=$avgwwait; avg_service_waittime_ms=$avgsvctime; avg_cpu_utilization=$avgcpuutil;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi
echo ""
exit $status
#----------/check_iostat.sh-----------
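The argument section of the script above calls cut once for each of the five threshold fields. A possible alternative, shown here only as a sketch, splits the comma-separated list in a single read. The threshold string is an invented example; in the plugin it would come from -w.

```shell
#!/bin/sh
# Illustrative value only; the plugin receives this through -w.
warning="100,50,50,2000,2000"

# Split the comma-separated list into the five fields in one step.
# IFS is changed only for the read command itself.
IFS=, read -r warn_1 warn_2 warn_3 warn_4 warn_5 <<EOF
$warning
EOF

echo "tps=$warn_1 read=$warn_2 write=$warn_3 kb_read=$warn_4 kb_written=$warn_5"
```

One difference to keep in mind: for a single value without commas, cut returns the whole string for every field, while read fills only the first variable and leaves the others empty.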
by benjfield, July 14, 2014
I have changed the script to work with the above system and cleaned it up a fair amount. Someone might want to have a look at parsing the inputs using the column names rather than column numbers in the future:
#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc to add readability
#
# by Ben Field / ben.field@concreteplatform.com
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
Run only one of i, q, W:
-i = IO Check Mode
--Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
--warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Average size of requests, Queue length of requests
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization
-w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
-p = Provide performance data for later graphing
-g = Since last reboot for system (more for debugging than nagios use!)
-h = This help
"
exit -1
}
# Ensuring we have the needed tools:
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""
#------------Argument Set-------------
while getopts "d:w:c:ipqWhg" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"p") printperfdata=1;;
"q") queue=1;;
"W") waittime=1;;
"g") samples=1;;
"h") echo "help:" && help;;
\?) echo "Invalid option: -$OPTARG" >&2
exit -1
;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
#Checks that only one query type is run
[[ $((io + queue + waittime)) -ne 1 ]] && \
echo "ERROR: select one and only one run mode" && help
#set warning and critical to an insane value if empty, else set the individual values
if [ -z "$warning" ]
then warning=99999
else
#TPS with IO, Request size with queue
warn_1=`echo $warning | cut -d, -f1`
#Read/s with IO,Queue Length with queue
warn_2=`echo $warning | cut -d, -f2`
#Write/s with IO
warn_3=`echo $warning | cut -d, -f3`
#KB/s read with IO
warn_4=`echo $warning | cut -d, -f4`
#KB/s written with IO
warn_5=`echo $warning | cut -d, -f5`
#Crude hack due to integer expression later in the script
warning=1
fi
if [ -z "$critical" ]
then critical=99999
else
#TPS with IO, Request size with queue
crit_1=`echo $critical | cut -d, -f1`
#Read/s with IO,Queue Length with queue
crit_2=`echo $critical | cut -d, -f2`
#Write/s with IO
crit_3=`echo $critical | cut -d, -f3`
#KB/s read with IO
crit_4=`echo $critical | cut -d, -f4`
#KB/s written with IO
crit_5=`echo $critical | cut -d, -f5`
#Crude hack due to integer expression later in the script
critical=1
fi
#------------Argument Set End-------------
#------------Parameter Check-------------
#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
#Checks for sane warning/critical levels
if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then
if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
echo "ERROR: critical levels must be higher than warning levels" && help
fi
fi
fi
#------------Parameter Check End-------------
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device utilization report, -x adds extended statistics, 10 is the interval in seconds between the two reports
TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`
#------------IO Test-------------
if [ "$io" == "1" ]; then
TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
#Requests per second:
tps=`echo "$TMPD" | awk '{print $2}'`
read_sec=`echo "$TMPX" | awk '{print $4}'`
written_sec=`echo "$TMPX" | awk '{print $5}'`
#Kb per second:
kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`
# "Converting" values to float (string replace , with .)
tps=${tps/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
[ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$tps; read_io_sec=$read_sec; write_io_sec=$written_sec; kbytes_read_sec=$kbytes_read_sec; kbytes_written_sec=$kbytes_written_sec;"
fi
#------------IO Test End-------------
#------------Queue Test-------------
if [ "$queue" == "1" ]; then
qsize=`echo "$TMPX" | awk '{print $8}'`
qlength=`echo "$TMPX" | awk '{print $9}'`
# "Converting" values to float (string replace , with .)
qsize=${qsize/,/.}
qlength=${qlength/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
PERFDATA=" | qsize=$qsize; queue_length=$qlength;"
fi
#------------Queue Test End-------------
#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
avgwait=`echo "$TMPX" | awk '{print $10}'`
avgrwait=`echo "$TMPX" | awk '{print $11}'`
avgwwait=`echo "$TMPX" | awk '{print $12}'`
avgsvctime=`echo "$TMPX" | awk '{print $13}'`
avgcpuutil=`echo "$TMPX" | awk '{print $14}'`
# "Converting" values to float (string replace , with .)
avgwait=${avgwait/,/.}
avgrwait=${avgrwait/,/.}
avgwwait=${avgwwait/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# Comparing the result and setting the correct level:
if [ "$warning" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
STATE="WARNING"
status=1
fi
fi
if [ "$critical" -ne "99999" ]; then
if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] || \
[ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
STATE="CRITICAL"
status=2
fi
fi
# Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgwait; avg_r_waittime_ms=$avgrwait; avg_w_waittime_ms=$avgwwait; avg_service_waittime_ms=$avgsvctime; avg_cpu_utilization=$avgcpuutil;"
fi
#------------Wait Time End-------------
# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi
echo ""
exit $status
#----------/check_iostat.sh-----------
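The comment in the Wait Time section suggests that a future parser could read the label line to pick the right column instead of relying on fixed positions. Here is a minimal sketch of that idea. It is not part of the plugin: `find_col` and `sample_output` are hypothetical names, and the sample header is only an assumption modelled on the column positions the script uses (await in column 10, svctm in column 13). Real iostat labels differ between sysstat versions, so check your own output first.

```shell
#!/bin/bash
# Sketch only: find a column by its header label instead of a fixed position.
# find_col and sample_output are hypothetical names, not part of the plugin.

# Print the value under the column named $1 for device $2, reading
# "iostat -x" style text on stdin. Prints nothing if label or device is missing.
find_col() {
awk -v label="$1" -v dev="$2" '
# Header line: remember which field carries the wanted label.
$1 ~ /^Device/ { col = 0; for (i = 1; i <= NF; i++) if ($i == label) col = i; next }
# Data line for our device: keep the most recent value (last report wins).
$1 == dev && col > 0 { val = $col }
END { if (val != "") print val }
'
}

# Canned sample text (made-up numbers) so the sketch runs without iostat.
sample_output='Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 1,20 0,50 2,30 10,00 40,00 35,71 0,01 3,50 1,00 4,04 0,80 0,22'

await=$(printf '%s\n' "$sample_output" | find_col await sda)
await=${await/,/.} # same comma-to-point replacement the plugin uses
echo "await=$await"
```

In the plugin itself the input would come from the iostat call rather than from a canned string, and an empty result would be a good reason to exit with UNKNOWN instead of comparing an empty value.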
by fkrueger, June 11, 2014
Hi,
I had to do a few fixes and some (minor) clearing up compared to the 0.0.4 version posted here.
The plugin works again now. As for SELinux, I will find out once I have created an RPM for our environment and done a test rollout :-)
Regards,
Frederic
----------check_iostat.sh-----------
#!/bin/bash
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# --------------------------------------
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
-i = IO Check Mode
--Checks Total Disk IO, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
--warning/critical = Total IO,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Total Queue Length,Read Queue Length,Write Queue Length
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time/ms,Avg Service Wait Time/ms,Avg CPU Utilization
-p = Provide performance data for later graphing
-h = This help
"
exit -1
}
# Ensuring we have the needed tools:
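if [ ! -x "$iostat" ] || [ ! -x "$bc" ]; then
# quoted tests also catch the case where which found nothing; no subshell, so exit really ends the script
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi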
io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
MSG=""
PERFDATA=""
# Getting parameters:
while getopts "d:w:c:io:pqu:Wt:h" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"p") printperfdata=1;;
"q") queue=1;;
"W") waittime=1;;
"h") help;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
if [ -z "$warning" ]
then warning=99999
fi
if [ -z "$critical" ]
then critical=99999
fi
# Adjusting the warn and crit levels:
crit_total=`echo $critical | cut -d, -f1`
crit_read=`echo $critical | cut -d, -f2`
crit_written=`echo $critical | cut -d, -f3`
crit_kbytes_read=`echo $critical | cut -d, -f4`
crit_kbytes_written=`echo $critical | cut -d, -f5`
warn_total=`echo $warning | cut -d, -f1`
warn_read=`echo $warning | cut -d, -f2`
warn_written=`echo $warning | cut -d, -f3`
warn_kbytes_read=`echo $warning | cut -d, -f4`
warn_kbytes_written=`echo $warning | cut -d, -f5`
## # Checking parameters:
# [ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
# ( [ "$warn_total" == "" ] || [ "$warn_read" == "" ] || [ "$warn_written" == "" ] || \
# [ "$crit_total" == "" ] || [ "$crit_read" == "" ] || [ "$crit_written" == "" ] ) &&
# echo "ERROR: You must specify all warning and critical levels" && help
# ( [[ "$warn_total" -ge "$crit_total" ]] || \
# [[ "$warn_read" -ge "$crit_read" ]] || \
# [[ "$warn_written" -ge "$crit_written" ]] ) && \
# echo "ERROR: critical levels must be higher than warning levels" && help
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# Doing the actual checks:
# -d has the total per second, -x the rest
TMPD=`$iostat $disk -k -d 2 1 | grep $disk`
TMPX=`$iostat $disk -x -d 2 1 | grep $disk`
## IO Check ##
if [ "$io" == "1" ]
then
total=`echo "$TMPD" | awk '{print $2}'`
read_sec=`echo "$TMPX" | awk '{print $4}'`
written_sec=`echo "$TMPX" | awk '{print $5}'`
kbytes_read_sec=`echo "$TMPD" | awk '{print $6}'`
kbytes_written_sec=`echo "$TMPD" | awk '{print $7}'`
# IO # "Converting" values to float (string replace , with .)
total=${total/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# IO # Comparing the result and setting the correct level:
if [ "$warn_total" -ne "99999" ]
then
if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_kbytes_read" | bc -q`" == "1" ] ||
[ "`echo "$kbytes_written_sec >= $warn_kbytes_written" | bc`" == "1" ] )
then
STATE="WARNING"
status=1
fi
fi
if [ "$crit_total" -ne "99999" ]
then
if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_kbytes_read" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_kbytes_written" | bc`" == "1" ] )
then
STATE="CRITICAL"
status=2
fi
fi
if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
then
STATE="OK"
status=0
fi
# IO # Printing the results:
MSG="$STATE - I/O stats: Total IO/Sec=$total Read IO/Sec=$read_sec Write IO/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
PERFDATA=" | total_io_sec=$total; read_io_sec=$read_sec; write_io_sec=$written_sec; kbytes_read_sec=$kbytes_read_sec; kbytes_written_sec=$kbytes_written_sec;"
fi
## QUEUE Check ##
if [ "$queue" == "1" ]
then
total=`echo "$TMPX" | awk '{print $8}'`
readq_sec=`echo "$TMPX" | awk '{print $6}'`
writtenq_sec=`echo "$TMPX" | awk '{print $7}'`
# QUEUE # "Converting" values to float (string replace , with .)
total=${total/,/.}
readq_sec=${readq_sec/,/.}
writtenq_sec=${writtenq_sec/,/.}
# QUEUE # Comparing the result and setting the correct level:
if [ "$warn_total" -ne "99999" ]
then
if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$readq_sec >= $warn_read" | bc`" == "1" ] || \
[ "`echo "$writtenq_sec >= $warn_written" | bc`" == "1" ] )
then
STATE="WARNING"
status=1
fi
fi
if [ "$crit_total" -ne "99999" ]
then
if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$readq_sec >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$writtenq_sec >= $crit_written" | bc`" == "1" ] )
then
STATE="CRITICAL"
status=2
fi
fi
if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
then
STATE="OK"
status=0
fi
# QUEUE # Printing the results:
MSG="$STATE - Disk Queue Stats: Average Queue Length=$total Read Queue/Sec=$readq_sec Write Queue/Sec=$writtenq_sec"
PERFDATA=" | total=$total; read_queue_sec=$readq_sec; write_queue_sec=$writtenq_sec;"
fi
## WAIT TIME Check ##
if [ "$waittime" == "1" ]
then
TMP=`$iostat $disk -x -k -d 2 1 | grep $disk`
avgiotime=`echo "$TMP" | awk '{print $10}'`
avgsvctime=`echo "$TMP" | awk '{print $11}'`
avgcpuutil=`echo "$TMP" | awk '{print $12}'`
# WAIT TIME # "Converting" values to float (string replace , with .)
avgiotime=${avgiotime/,/.}
avgsvctime=${avgsvctime/,/.}
avgcpuutil=${avgcpuutil/,/.}
# WAIT TIME # Comparing the result and setting the correct level:
if [ "$warn_total" -ne "99999" ]
then
if ( [ "`echo "$avgiotime >= $warn_total" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_read" | bc`" == "1" ] || \
[ "`echo "$avgcpuutil >= $warn_written" | bc`" == "1" ] )
then
STATE="WARNING"
status=1
fi
fi
if [ "$crit_total" -ne "99999" ]
then
if ( [ "`echo "$avgiotime >= $crit_total" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$avgcpuutil >= $crit_written" | bc`" == "1" ] )
then
STATE="CRITICAL"
status=2
fi
fi
if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
then
STATE="OK"
status=0
fi
# WAIT TIME # Printing the results:
MSG="$STATE - Wait Time Stats: Avg I/O Wait Time/ms=$avgiotime Avg Service Wait Time/ms=$avgsvctime Avg CPU Utilization=$avgcpuutil"
PERFDATA=" | avg_io_waittime_ms=$avgiotime; avg_service_waittime_ms=$avgsvctime; avg_cpu_utilization=$avgcpuutil;"
fi
# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi
echo ""
exit $status
----------/check_iostat.sh-----------
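The script pipes every comparison through bc because the shell cannot compare floating point numbers, and it repeats the same backtick expression for each threshold. A small helper can wrap that pattern once. The sketch below is only an illustration under stated assumptions: `float_ge` is a hypothetical name that does not exist in the plugin, and treating an empty operand as "no threshold" is my own choice, made so that an unset or misspelled threshold variable cannot turn into a bc syntax error.

```shell
#!/bin/bash
# Sketch only: float_ge is a hypothetical helper, not part of the plugin.

# Return success (0) when $1 >= $2, comparing as decimals.
# An empty operand counts as "no threshold" and returns failure (1).
float_ge() {
local value=${1/,/.} limit=${2/,/.} # accept a decimal comma, as the plugin does
[ -n "$value" ] && [ -n "$limit" ] || return 1
if command -v bc >/dev/null 2>&1; then
[ "$(echo "$value >= $limit" | bc)" = "1" ]
else
# Fallback when bc is not installed: awk also compares decimals.
awk -v a="$value" -v b="$limit" 'BEGIN { exit !(a + 0 >= b + 0) }'
fi
}

float_ge "12,5" "10" && echo "12,5 >= 10"
float_ge "0.75" "1.5" || echo "0.75 < 1.5"
float_ge "3" "" || echo "empty threshold skipped"
```

With a helper like this, a check could read `if float_ge "$total" "$warn_total" || float_ge "$read_sec" "$warn_read"; then ...`, which keeps the condition on one readable line.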
by EndlessTundra, April 25, 2014
Hey Everyone, this script was very nice but it also had some weird irritations so I reworked it and added:
- Allow empty Warning/Critical values
- Added Modes so that you can check Disk IOs, Disk Queue, or Disk Wait Times
- To see the usage information use check_diskio.sh -h
Sorry I don't have this anywhere on the web so I'm just going to paste it here:
#!/bin/bash
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# --------------------------------------
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
function help {
echo -e "
Usage:
-d =
--Device to be checked. Example: \"-d sda\"
-i = IO Check Mode
--Checks Total Disk IO, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
--warning/critical = Total IO,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
-q = Queue Mode
--Checks Disk Queue Lengths
--warning/critical = Total Queue Length,Read Queue Length,Write Queue Length
-W = Wait Time Mode
--Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time/ms,Read Wait Time/ms,Write Wait Time/ms
"
exit -1
}
# Ensuring we have the needed tools:
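if [ ! -x "$iostat" ] || [ ! -x "$bc" ]; then
# quoted tests also catch the case where which found nothing; no subshell, so exit really ends the script
echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
exit -1
fi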
io=0
queue=0
waittime=0
msg="OK"
# Getting parameters:
while getopts "d:w:c:io:qu:Wt:h" OPT; do
case $OPT in
"d") disk=$OPTARG;;
"w") warning=$OPTARG;;
"c") critical=$OPTARG;;
"i") io=1;;
"q") queue=1;;
"W") waittime=1;;
"h") help;;
esac
done
# Autofill if parameters are empty
if [ -z "$disk" ]
then disk=sda
fi
if [ -z "$warning" ]
then warning=99999
fi
if [ -z "$critical" ]
then critical=99999
fi
# Adjusting the warn and crit levels:
crit_total=`echo $critical | cut -d, -f1`
crit_read=`echo $critical | cut -d, -f2`
crit_written=`echo $critical | cut -d, -f3`
crit_kbytes_read=`echo $critical | cut -d, -f4`
crit_kbytes_written=`echo $critical | cut -d, -f5`
warn_total=`echo $warning | cut -d, -f1`
warn_read=`echo $warning | cut -d, -f2`
warn_written=`echo $warning | cut -d, -f3`
warn_kbytes_read=`echo $warning | cut -d, -f4`
warn_kbytes_written=`echo $warning | cut -d, -f5`
# # Checking parameters:
# [ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help
# ( [ "$warn_total" == "" ] || [ "$warn_read" == "" ] || [ "$warn_written" == "" ] || \
# [ "$crit_total" == "" ] || [ "$crit_read" == "" ] || [ "$crit_written" == "" ] ) &&
# echo "ERROR: You must specify all warning and critical levels" && help
# ( [[ "$warn_total" -ge "$crit_total" ]] || \
# [[ "$warn_read" -ge "$crit_read" ]] || \
# [[ "$warn_written" -ge "$crit_written" ]] ) && \
# echo "ERROR: critical levels must be higher than warning levels" && help
# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# Doing the actual checks:
## IO Check ##
if [ "$io" == "1" ]
then
total=`$iostat $disk -y -k -d 2 1 | grep $disk | awk '{print $2}'`
read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $4}'`
written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $5}'`
kbytes_read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $6}'`
kbytes_written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $7}'`
# IO # "Converting" values to float (string replace , with .)
total=${total/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
kbytes_read_sec=${kbytes_read_sec/,/.}
kbytes_written_sec=${kbytes_written_sec/,/.}
# IO # Comparing the result and setting the correct level:
if [ "$warn_total" != "99999" ]
then
if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_kbytes_read" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $warn_kbytes_written" | bc`" == "1" ] )
then
msg="WARNING"
status=1
fi
fi
if [ "$crit_total" != "99999" ]
then
if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_kbytes_read" | bc -q`" == "1" ] || \
[ "`echo "$kbytes_written_sec >= $crit_kbytes_written" | bc`" == "1" ] )
then
msg="CRITICAL"
status=2
fi
fi
if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
then
msg="OK"
status=0
fi
# IO # Printing the results:
echo "$msg - I/O stats: Total IO/Sec=$total Read IO/Sec=$read_sec Write IO/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec | 'Total IO/Sec'=$total; 'Read IO/Sec'=$read_sec; 'Write IO/Sec'=$written_sec; 'KBytes Read/Sec'=$kbytes_read_sec; 'KBytes_Written/Sec'=$kbytes_written_sec;"
fi
## QUEUE Check ##
if [ "$queue" == "1" ]
then
total=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $8}'`
read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $2}'`
written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $3}'`
# QUEUE # "Converting" values to float (string replace , with .)
total=${total/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
# QUEUE # Comparing the result and setting the correct level:
if [ "$warn_total" != "99999" ]
then
if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] )
then
msg="WARNING"
status=1
fi
fi
if [ "$crit_total" != "99999" ]
then
if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] )
then
msg="CRITICAL"
status=2
fi
fi
if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
then
msg="OK"
status=0
fi
# QUEUE # Printing the results:
echo "$msg - Disk Queue Stats: Average Queue Length=$total Read Queue/Sec=$read_sec Write Queue/Sec=$written_sec | 'total'=$total; 'Read Queue/Sec'=$read_sec; 'Write Queue/Sec'=$written_sec;"
fi
## WAIT TIME Check ##
if [ "$waittime" == "1" ]
then
total=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $10}'`
read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $11}'`
written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $12}'`
# WAIT TIME # "Converting" values to float (string replace , with .)
total=${total/,/.}
read_sec=${read_sec/,/.}
written_sec=${written_sec/,/.}
# WAIT TIME # Comparing the result and setting the correct level:
if [ "$warn_total" != "99999" ]
then
if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || \
[ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] )
then
msg="WARNING"
status=1
fi
fi
if [ "$crit_total" != "99999" ]
then
if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] )
then
msg="CRITICAL"
status=2
fi
fi
if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
then
msg="OK"
status=0
fi
# WAIT TIME # Printing the results:
echo "$msg - Wait Time Stats: Avg I/O Wait Time/ms=$total Avg Read Wait Time/ms=$read_sec Avg Write Wait Time/ms=$written_sec | 'Avg I/O Wait Time/ms'=$total; 'Avg Read Wait Time/ms'=$read_sec; 'Avg Write Wait Time/ms'=$written_sec;"
fi
exit $status
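One thing to watch with the cut-based threshold parsing in the script above: cut prints a line unchanged when it contains no delimiter, so "-w 100" sets every warning level to 100 instead of leaving the others empty. A minimal sketch of an alternative, assuming bash; split_levels and the lvl_* variables are hypothetical names, not part of the plugin:

```shell
#!/bin/bash
# Sketch only: split_levels and the lvl_* variables are hypothetical names,
# not part of the original plugin.
split_levels() {
    # Split "total,read,written,kb_read,kb_written" on commas.
    # Fields that were not supplied are left empty.
    IFS=, read -r lvl_total lvl_read lvl_written lvl_kb_read lvl_kb_written <<< "$1"
}

split_levels "100,50"
echo "total=$lvl_total read=$lvl_read written=${lvl_written:-unset}"
```

An empty level would still have to be skipped before it is handed to bc, otherwise bc reports a syntax error for that comparison.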
by chlewis, April 15, 2014
Hi,
I have posted an updated version of this script here:
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_iostat--2D-I-2FO-statistics--2D-updated-2014/details
The script fixes the bugs mentioned in other posts, adds await (how long the system spends waiting to write to disk) to the output, and adds a pnp4nagios graphing template.
I have created a patched version that sits between the original version and philippn's. This patch:
* Runs iostat just once.
* Avoids the conversion between '.' and ',' by running iostat with LANG=C
* Gets actual values not the ones from last reboot.
* Runs from bash
This is the patch:
Index: check_iostat
===================================================================
--- check_iostat (revisión: 11002)
+++ check_iostat (copia de trabajo)
@@ -1,9 +1,20 @@
-#!/bin/sh
+#!/bin/bash
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
+#
+# by Thiago Varela - thiago@iplenix.com
#
-# by Thiago Varela - thiago@iplenix.com
+# --------------------------------------
+#
+# Version 0.0.3 - Dec/2011
+# Changes:
+# - changed values from bytes to mbytes
+# - fixed bug to get traffic data without comma but point
+# - current values are displayed now, not average values (first run of iostat)
+#
+# by Philipp Niedziela - pn@pn-it.com
+#
iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`
@@ -50,14 +61,19 @@
echo "ERROR: critical levels must be highter than warning levels" && help
+# iostat parameters:
+# -m: megabytes
+# -k: kilobytes
+# first run of iostat shows statistics since last reboot, second one shows current vaules of hdd
# Doing the actual check:
-tps=`$iostat $disk | grep $disk | awk '{print $2}'`
-kbread=`$iostat $disk | grep $disk | awk '{print $3}'`
-kbwritten=`$iostat $disk | grep $disk | awk '{print $4}'`
+# We get just 2nd line, which is the actual value
+output=$(LANG=C $iostat $disk -d 1 2 | grep $disk | sed -n '2p')
+tps=$(echo "$output" | awk '{print $2}')
+kbread=$(echo "$output" | awk '{print $3}')
+kbwritten=$(echo "$output" | awk '{print $4}')
-
# Comparing the result and setting the correct level:
-if ( [ "`echo "$tps >= $crit_tps" | bc`" == "1" ] || [ "`echo "$kbread >= $crit_read" | bc`" == "1" ] || \
+if ( [ "`echo "$tps >= $crit_tps" | bc`" == "1" ] || [ "`echo "$kbread >= $crit_read" | bc -q`" == "1" ] || \
[ "`echo "$kbwritten >= $crit_written" | bc`" == "1" ] ); then
msg="CRITICAL"
status=2
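The "runs iostat just once" idea from the patch can be taken one step further by splitting the captured line with a single read instead of one awk call per column. This is only a sketch: parse_iostat_line is a hypothetical helper, and the sample line assumes the column order Device, tps, kB_read/s, kB_wrtn/s that the patch relies on, which may differ between sysstat versions.

```shell
#!/bin/bash
# Sketch only: parse_iostat_line is a hypothetical helper, and the sample
# line assumes the column order Device, tps, kB_read/s, kB_wrtn/s used by
# the patch above; check the header of your own iostat output.
parse_iostat_line() {
    # One read splits the whole line; "_" swallows any remaining columns.
    read -r dev tps kbread kbwritten _ <<< "$1"
}

# In the plugin the line would come from a single iostat call, e.g.:
#   output=$(LANG=C iostat sda -d 1 2 | grep sda | sed -n '2p')
sample="sda 12.50 100.25 200.75 123456 654321"
parse_iostat_line "$sample"
echo "tps=$tps kbread=$kbread kbwritten=$kbwritten"
```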
Has anybody gotten this working with NRPE and SELinux? What type of context should it have?
by GldRush98, July 24, 2012
philippn's changes made this script useful. Without those changes, the averages this check provides by default are fairly worthless.
by konstantin, May 14, 2012
Hi,
I want to add two hints. First, the replacement of commas with points is not needed: just export LANG=C in the script, and the output of iostat will use a decimal point.
Second, I would suggest using #!/bin/bash as the interpreter, because /bin/sh is linked to /bin/dash in newer distributions, and this script will not work without bash.
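A small sketch of the locale hint, assuming bash. Note that LANG alone has no effect when LC_ALL or LC_NUMERIC is already set in the environment, so this version sets LC_ALL for the one command only; run_c_locale is a hypothetical wrapper name, not something the plugin defines.

```shell
#!/bin/bash
# Sketch only: run_c_locale is a hypothetical wrapper name.
run_c_locale() {
    # LC_ALL takes precedence over LANG and the other LC_* variables,
    # so the wrapped command sees the C locale only for this call.
    LC_ALL=C "$@"
}

# Show the variable as the child process sees it.
run_c_locale sh -c 'echo "LC_ALL=$LC_ALL"'
# In the plugin this would wrap the iostat call, e.g.:
#   run_c_locale iostat -d -k sda 2 1
```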
by darfnader, April 16, 2012
On Ubuntu this has to be run as a bash script. You also need 'bc' installed on the system.
by mguthrie, February 2, 2012
Gave me exactly what I needed, thanks!
I've changed it a bit to get it working on my server (performance data in MB; showing current read/write, not average values since last restart):
http://www.pn-it.com/wp-content/uploads/2011/12/check_iostat / http://www.pn-it.com/linux-ubuntu/nagios-festplatten-mit-check_iostat-uberwachen/
by kforbus, June 9, 2010
4 of 4 people found this review helpful
Very nice plugin. Only change I made was adding "-k" to the lines:
kbread=`$iostat $disk -k | grep $disk | awk '{print $3}'`
kbwritten=`$iostat $disk -k | grep $disk | awk '{print $4}'`
This is because the plugin appears to return blocks read and written per second instead of kilobytes read and written per second. The "-k" option for iostat fixes this.
Change these lines:
# Doing the actual check:
tps=`$iostat $disk | grep $disk | awk '{print $2}' | sed -e 's/,/./g'`
kbread=`$iostat $disk | grep $disk | awk '{print $3}' | sed -e 's/,/./g'`
kbwritten=`$iostat $disk | grep $disk | awk '{print $4}' | sed -e 's/,/./g'`
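The same comma-to-point replacement can be done without starting sed for every value by using bash parameter expansion. A minimal sketch, assuming bash; normalize_decimal is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: normalize_decimal is a hypothetical helper name.
normalize_decimal() {
    # Replace every comma with a dot using bash parameter expansion,
    # which avoids spawning sed for each value.
    printf '%s\n' "${1//,/.}"
}

normalize_decimal "12,50"
```

This only works under bash, not under a plain POSIX /bin/sh such as dash.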