Search Exchange
Search All Sites
Nagios Live Webinars
Let our experts show you how Nagios can help your organization.Login
Directory Tree
check_load2
Meet The New Nagios Core Services Platform
Built on over 25 years of monitoring experience, the Nagios Core Services Platform provides insightful monitoring dashboards, time-saving monitoring wizards, and unmatched ease of use. Use it for free indefinitely.
Monitoring Made Magically Better
- Nagios Core on Overdrive
- Powerful Monitoring Dashboards
- Time-Saving Configuration Wizards
- Open Source Powered Monitoring On Steroids
- And So Much More!
We are currently in the process of distributing a standard set of Nagios monitoring scripts to over 300 client systems. One of the metrics we would like to monitor is the three load averages (or as Dr. Gunther calls them: the LaLaLa triplets).
Since these 300 servers aren't all alike, we are bound to run into systems with one, two, four, eight or more processors. That way there is no nice way of making one standard configuration, since you'll have to define separate LA levels for WARN and CRIT. Why? Cause a quad system can take much more load than a single core system.
One way to get around this would be by defining separate host groups, based on the amount of processors in a system. You could then define a unique check_load command for each CPU host group.
I've gone the other way around though...
My work-around for this is by replacing check_load with check_load2. This script takes no command line parameters and works on the basis of standard multipliers. We are of the opinion that the number of processors multiplied by a certain factor (150%? 200%? and so on) is a good enough way to define these WARN and CRIT levels. These multipliers can easily be modified (at the top of the script) to fit what -you- think is a worrying level of activity.
This script was tested on Redhat ES3, Solaris 8 and Mac OS X 10.4. It should run on other versions of these OSes as well.
.
EDIT:
Oh! Just like my other recent Nagios scripts, check_load2 comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
Since these 300 servers aren't all alike, we are bound to run into systems with one, two, four, eight or more processors. That way there is no nice way of making one standard configuration, since you'll have to define separate LA levels for WARN and CRIT. Why? Cause a quad system can take much more load than a single core system.
One way to get around this would be by defining separate host groups, based on the amount of processors in a system. You could then define a unique check_load command for each CPU host group.
I've gone the other way around though...
My work-around for this is by replacing check_load with check_load2. This script takes no command line parameters and works on the basis of standard multipliers. We are of the opinion that the number of processors multiplied by a certain factor (150%? 200%? and so on) is a good enough way to define these WARN and CRIT levels. These multipliers can easily be modified (at the top of the script) to fit what -you- think is a worrying level of activity.
This script was tested on Redhat ES3, Solaris 8 and Mac OS X 10.4. It should run on other versions of these OSes as well.
.
EDIT:
Oh! Just like my other recent Nagios scripts, check_load2 comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
Reviews (1)
byJoobs888, March 16, 2015
Love this script, if anyone had the same issue I was facing. The script doesn't exit if there was a warning or critical output.
Last two lines change from this:
[ $CRIT -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_CRITICAL)
[ $WARN -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_WARNING);
To this:
[ $CRIT -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min") && exit $?;
[ $WARN -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min") && exit $?;
Last two lines change from this:
[ $CRIT -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_CRITICAL)
[ $WARN -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_WARNING);
To this:
[ $CRIT -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min") && exit $?;
[ $WARN -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min") && exit $?;