Search Exchange
Search All Sites
Nagios Live Webinars
Let our experts show you how Nagios can help your organization.Login
Directory Tree
Directory
gd
bygd, May 24, 2016
This is a useful check. FWIW... At a customer site where they use this check for AIX 6.1, and noticed an issue when the vmstat samples look similar to the following:
10 0 2750794 5040508 0 0 0 0 0 0 75 7806 2920 6 1 93 0 0.34 11.4
8 0 2750684 5040617 0 0 0 0 0 0 345 5356 2780 0 1 99 0 0.08 2.6
8 0 2750686 5040615 0 0 0 0 0 0 53 4138 2640 0 0 99 0 0.04 1.3
11 0 2750686 5040615 0 0 0 0 0 0 8 4358 2965 0 0 99 0 0.04 1.3
Note that the first field in the second and third sample above have a leading whitespace character in "r" column.
That appears to throw off the field identification and calculation.
You can see the wild inconsistencies, if you loop the check from command line:
OK - CPU usage at 75.25%
CRITICAL - CPU usage at 100%
CRITICAL - CPU usage at 100%
OK - CPU usage at 50.5%
CRITICAL - CPU usage at 100%
CRITICAL - CPU usage at 100%
OK - CPU usage at 50.5%
OK - CPU usage at 1%
OK - CPU usage at 50.75%
CRITICAL - CPU usage at 100%
And then compare that to a simple "vmstat 1":
4 0 2691885 5098619 0 0 0 0 0 0 30 3364 2164 0 0 99 0 0.03 1.0
6 0 2691885 5098619 0 0 0 0 0 0 29 3595 2239 1 0 99 0 0.05 1.6
0 0 2693761 5096742 0 0 0 0 0 0 126 9301 2525 17 1 82 0 0.84 28.1
85 0 2693762 5096741 0 0 0 0 0 0 51 3564 2223 0 0 99 0 0.04 1.5
35 0 2693762 5096741 0 0 0 0 0 0 19 3509 2129 0 0 99 0 0.04 1.2
32 0 2693762 5096741 0 0 0 0 0 0 8 3251 2152 0 0 99 0 0.03 0.8
31 0 2693697 5096806 0 0 0 0 0 0 20 3765 2153 1 0 98 0 0.09 3.0
0 0 2693918 5096584 0 0 0 0 0 0 151 6348 2315 5 1 94 0 0.29 9.8
Note that the CPU is typically > 90+% idle...
To test a fix to this, I used "xargs -l" to sanitize the vmstat output, accounted to the actual number of fields, and then adjusted the "strip" portion for whitespace:
# diff orig_check_aix_cpu.pl new_check_aix_cpu.pl
30c30
open(PS, "/usr/bin/vmstat 1 4 | egrep -v '[a-z,A-Z]|-' |egrep '[0-9]' | xargs -l |") || return 1;
32c32
(undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,$idle,undef,undef,undef) = split(/[\s+\n]+/);
When I run the modified script in a loop, the results looks a bit more sane:
OK - CPU usage at 1%
OK - CPU usage at 1%
OK - CPU usage at 1.75%
OK - CPU usage at 1%
OK - CPU usage at 4%
OK - CPU usage at 1%
OK - CPU usage at 2%
OK - CPU usage at 1%
YMMV...
10 0 2750794 5040508 0 0 0 0 0 0 75 7806 2920 6 1 93 0 0.34 11.4
8 0 2750684 5040617 0 0 0 0 0 0 345 5356 2780 0 1 99 0 0.08 2.6
8 0 2750686 5040615 0 0 0 0 0 0 53 4138 2640 0 0 99 0 0.04 1.3
11 0 2750686 5040615 0 0 0 0 0 0 8 4358 2965 0 0 99 0 0.04 1.3
Note that the first field in the second and third sample above have a leading whitespace character in "r" column.
That appears to throw off the field identification and calculation.
You can see the wild inconsistencies, if you loop the check from command line:
OK - CPU usage at 75.25%
CRITICAL - CPU usage at 100%
CRITICAL - CPU usage at 100%
OK - CPU usage at 50.5%
CRITICAL - CPU usage at 100%
CRITICAL - CPU usage at 100%
OK - CPU usage at 50.5%
OK - CPU usage at 1%
OK - CPU usage at 50.75%
CRITICAL - CPU usage at 100%
And then compare that to a simple "vmstat 1":
4 0 2691885 5098619 0 0 0 0 0 0 30 3364 2164 0 0 99 0 0.03 1.0
6 0 2691885 5098619 0 0 0 0 0 0 29 3595 2239 1 0 99 0 0.05 1.6
0 0 2693761 5096742 0 0 0 0 0 0 126 9301 2525 17 1 82 0 0.84 28.1
85 0 2693762 5096741 0 0 0 0 0 0 51 3564 2223 0 0 99 0 0.04 1.5
35 0 2693762 5096741 0 0 0 0 0 0 19 3509 2129 0 0 99 0 0.04 1.2
32 0 2693762 5096741 0 0 0 0 0 0 8 3251 2152 0 0 99 0 0.03 0.8
31 0 2693697 5096806 0 0 0 0 0 0 20 3765 2153 1 0 98 0 0.09 3.0
0 0 2693918 5096584 0 0 0 0 0 0 151 6348 2315 5 1 94 0 0.29 9.8
Note that the CPU is typically > 90+% idle...
To test a fix to this, I used "xargs -l" to sanitize the vmstat output, accounted to the actual number of fields, and then adjusted the "strip" portion for whitespace:
# diff orig_check_aix_cpu.pl new_check_aix_cpu.pl
30c30
open(PS, "/usr/bin/vmstat 1 4 | egrep -v '[a-z,A-Z]|-' |egrep '[0-9]' | xargs -l |") || return 1;
32c32
(undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,$idle,undef,undef,undef) = split(/[\s+\n]+/);
When I run the modified script in a loop, the results looks a bit more sane:
OK - CPU usage at 1%
OK - CPU usage at 1%
OK - CPU usage at 1.75%
OK - CPU usage at 1%
OK - CPU usage at 4%
OK - CPU usage at 1%
OK - CPU usage at 2%
OK - CPU usage at 1%
YMMV...