check_oomkiller

Current Version: 2.0
Last Release Date: 2010-09-07
Compatible With: Nagios 3.x
Owner: jchivian
Hits: 100694

Files:

File	Description
check_oomkiller.pl.txt	check_oomkiller client plugin
check_oomkiller.c.txt	check_oomkiller suid wrapper

LINUX ONLY - Check for OOM-Killer (out of memory killer) Activity

The Linux kernel will assassinate "big memory" processes during extreme memory shortages as a form of self-defense. This plugin checks for such activity and returns a critical status if any has occurred since the previous check.

This plugin was written on and for RHEL4 using Nagios and NRPE and may need to be tweaked for other distros.

IMPORTANT - This check requires read-only access to the system messages file /var/log/messages, which by default is not available to unprivileged accounts. For this reason it is necessary either to make /var/log/messages readable to the nagios user account (DON'T DO THIS), or to write a small compiled wrapper program around the script and install the wrapper as a root-owned SUID executable (DO THIS!).
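
A minimal sketch of what such a wrapper might look like. This is not the author's check_oomkiller.c: the helper name run_real_plugin, the REAL_PATH value, the fixed PATH, and the choice to report a warning status when the exec fails are all assumptions made for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Assumed location of the Perl plugin; change it for your environment. */
#define REAL_PATH "/usr/local/nagios/libexec/check_oomkiller.pl"

/* Nagios plugin exit code for WARNING. */
#define STATE_WARNING 1

/*
 * Replace the current process with the plugin found at `path`.
 * On success this function never returns, because the plugin's own
 * exit code becomes the exit code of the process. It returns only
 * when the exec itself fails.
 */
int run_real_plugin(const char *path)
{
    /* Fixed argument list: nothing supplied by the caller is passed on. */
    char *const argv[] = { (char *)path, NULL };
    /* Minimal, hard-coded environment (an assumption, not a requirement). */
    char *const envp[] = { "PATH=/bin:/usr/bin", NULL };

    execve(path, argv, envp);

    /* Only reached if execve() failed. */
    printf("OOM WARNING - cannot execute %s\n", path);
    return STATE_WARNING;
}
```

A wrapper's main() would then only need to return run_real_plugin(REAL_PATH). Hard-coding the path, arguments, and environment keeps the SUID binary from being steered by whoever invokes it.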

Installation Instructions

1) Put the Perl script and the C program in the nagios/libexec directory on the client system that will be checked.
2) If needed, edit the C program and change the REAL_PATH definition to match your environment.
3) Compile the C program and install it as an SUID application (chown root, then chmod 4555).
4) Add the following plugin definition to the nrpe.cfg configuration file on the client system. As with the C program, edit the path if needed for your environment.

   command[check_oomkiller]=/usr/local/nagios/libexec/check_oomkiller

5) Use the following service check definition on the Nagios server to perform the check on monitored systems.

define service{
   use                   generic-service
   host_name             possible-oom-killer-victim
   service_description   OOM Killer
   check_command         check_nrpe60!check_oomkiller
   max_check_attempts    1
}

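Because each run of the OOM-Killer check resets the current status, the service check definition on the Nagios server MUST contain "max_check_attempts 1". With a higher value, Nagios retries the check before notifying, the retry finds no new activity and returns OK, and the problem never reaches a hard state. If you don't do this you will NEVER be notified.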
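
The effect of max_check_attempts can be illustrated with a deliberately simplified model of how Nagios decides to notify: a problem is reported only after that many consecutive non-OK results. The function name would_notify and the model itself are assumptions for illustration, not Nagios source code.

```c
#include <stddef.h>

/*
 * Simplified model: return 1 if a run of consecutive non-OK results
 * reaches max_check_attempts (a "hard" problem state), otherwise 0.
 * results[i] holds plugin exit codes: 0 = OK, anything else = problem.
 */
int would_notify(const int *results, size_t count, int max_check_attempts)
{
    int consecutive_bad = 0;

    for (size_t i = 0; i < count; i++) {
        if (results[i] == 0) {
            /* An OK result resets the retry counter. */
            consecutive_bad = 0;
            continue;
        }
        if (++consecutive_bad >= max_check_attempts)
            return 1;
    }
    return 0;
}
```

With a result sequence such as OK, CRITICAL, OK, OK (one OOM kill reported once, then cleared by the next run), this model notifies when max_check_attempts is 1 and stays silent when it is 3.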

Also notice that I am using a custom check_command called check_nrpe60. The only difference between check_nrpe60 and the standard check_nrpe is the addition of a 60-second timeout (see below). This is necessary because on systems with large /var/log/messages files (or busy systems with few CPU cycles to spare) the standard NRPE check on the server can time out before the plugin has actually completed on the client.

define command{
   command_name    check_nrpe60
   command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$
}

The plugin returns a warning status if it can't perform its task, and a critical status if any OOM killer activity has taken place. If the status is critical, it also returns extended status information detailing the PIDs and users affected.
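
The status mapping described above can be sketched as follows. This is an illustration, not the Perl plugin's code: the function names are hypothetical, and the matched text "out of memory: kill" is an assumption about how the kernel words its message, which varies between kernel versions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Case-insensitive substring search; returns 1 if needle occurs in haystack. */
static int contains_nocase(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);

    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Assumed pattern: treat any line mentioning an out-of-memory kill as a hit. */
int is_oom_kill_line(const char *line)
{
    return contains_nocase(line, "out of memory: kill");
}

/*
 * Map the outcome of a scan to a plugin status:
 * log unreadable -> WARNING, at least one kill seen -> CRITICAL, else OK.
 */
int oom_status(int log_readable, int kills_seen)
{
    if (!log_readable)
        return STATE_WARNING;
    return kills_seen > 0 ? STATE_CRITICAL : STATE_OK;
}
```

A caller would read the new lines of /var/log/messages, count those for which is_oom_kill_line() returns 1, and exit with the value of oom_status().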

And finally, it is worth noting that on a properly tuned system this activity will probably not occur. We discovered it "by accident" when a physical server was converted to a virtual machine and not given the same amount of memory it had previously. When we identified and applied the correct memory tuning parameter (vm.lower_zone_protection, set in /etc/sysctl.conf in this case), the OOM-Killer activity ceased.

Reviews (1)

Appears to still work on CentOS/RHEL7 but having trouble with NRPE

by bmoreitdan, March 22, 2017

I've got the compiled C wrapper working on the command line as ./check_oomkiller, but when attempting to run it through NRPE as ./check_nrpe -H 127.0.0.1 -c check_oomkiller I continue to get "NRPE: Unable to read output". Using NRPE 3.0.1. I have also updated my nrpe.cfg file with the matching command and restarted nrpe.
