Poor man’s processes monitor

From andreinc.net

Recently a member of the Romanian Ubuntu Community asked for a script to monitor the running processes on his server . He didn’t requested for anything fancy, just a small utility that will be able to detect a hanging application (a process that is “eating” more than 80% of the CPU for a long period of time) and then log the results.

I am no sysadmin, but I am sure there are a lot of dedicated open-source solutions for monitoring a server .Still the functionality he asked for can be easily achieved by combining bash and awk . One of the things I like Linux for is the power of the shell and the full-control over the your system . You can write a script for every repeating task that arise as bash is easy to learn but of course, hard to master .

More as an exercise for myself I’ve proposed the following solution :


#DATE: Nov 5, 2010
#AUTHOR: nomemory

#Maximum memory for a process (%)
declare -i MEM_LIMIT=1

#Maximum CPU for a process (%)
declare -i CPU_LIMIT=1

#Loop sleep interval
declare -i SEC_INT=30

while true; do
ps aux | awk -v MEM_LIMIT=${MEM_LIMIT} \
-v CDATE="`date`" '{
if ($3 > CPU_LIMIT) {
printf "%s [ %10s %d %40s ] CPU LIMIT EXCEED: %2.2f (MAX: %2.2f) \n", \
CDATE, $1, $2, $11, $3, CPU_LIMIT
if ($4 > MEM_LIMIT) {
printf "%s [ %10s %d %40s ] MEM LIMIT EXCEED: %2.2f (MAX: %2.2f) \n", \
CDATE, $1, $2, $11, $4, MEM_LIMIT
sleep ${SEC_INT}

If you run this script the output will probably look similar to this one :

Mon Nov 8 00:01:08 EET 2010 [ andrei 1718 /opt/google/chrome/google-chrome ] MEM LIMIT EXCEED: 2.20 (MAX: 1.00)
Mon Nov 8 00:01:08 EET 2010 [ andrei 1726 pidgin ] MEM LIMIT EXCEED: 1.40 (MAX: 1.00)
Mon Nov 8 00:01:08 EET 2010 [ andrei 1853 /opt/google/chrome/chrome ] CPU LIMIT EXCEED: 5.70 (MAX: 1.00)
Mon Nov 8 00:01:08 EET 2010 [ andrei 1853 /opt/google/chrome/chrome ] MEM LIMIT EXCEED: 2.70 (MAX: 1.00)
Mon Nov 8 00:01:08 EET 2010 [ andrei 2054 gnome-terminal ] CPU LIMIT EXCEED: 1.50 (MAX: 1.00)
Mon Nov 8 00:01:08 EET 2010 [ andrei 2058 bash ] CPU LIMIT EXCEED: 1.70 (MAX: 1.00)

The output can then be redirected to a file (>>) and interpreted as:

In this form the script support the following variables:

A float number (integers are accepted) representing the maximum memory percent a process can use before triggering the alarm .
A float number (integers are accepted) representing the maximum
The pause in the main loop . Every SEC_INT the process will be scanned
Those variables are passed as awk variables whilst using the ‘ -v ‘ flag .

The shortcomings of the script are obvious: sometimes a process can have a short spike of CPU consumption, so false positives may appear . Probably the best thing to do will be to write another script to analyze the log, and see how many times a certain command is repeated . For example the log should be ‘grep’ed to find a certain command, then use the ‘wc’ utility the count how many times the process triggered the alarm . All in all the problem worthed a try !

Written by: Andrei Ciobanu on November 7, 2010.
Last revised by: Andrei Ciobanu on November 14, 2010.

From andreinc.net


I am a g33k, Linux blogger, developer, student and Tech Writer for Liquidweb.com/kb. My passion for all things tech drives my hunt for all the coolz. I often need a vacation after I get back from vacation....