Oh Noes, my load is high! Part 1

Server Load:

Server load is a measure of the amount of work that a computer system performs. The load average represents the average system load over a period of time.

A server load average measures the number of active processes, that is, processes running or waiting to run (on Linux this also includes processes waiting on disk I/O in uninterruptible sleep), averaged over time. The load average shown in top is a simplified figure that reflects several underlying factors. Depending on the processors and memory available, the “nominal” or normal load will vary. High load averages are often accompanied by higher than average swap usage: Linux uses the memory it has available and falls back on swap when RAM runs short, and processes that have to wait for their pages to be read back from disk add to the load.

Linux splits its usable RAM into chunks called pages. To free up memory, Linux can write some of these pages to a predefined space on the hard disk called swap space. The total of RAM and swap space is the amount of virtual memory the system has.
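
To check those two totals on your own server, here is a minimal Python sketch that reads `MemTotal` and `SwapTotal` from `/proc/meminfo` (a Linux-only file whose values are reported in kB). The helper name `ram_plus_swap_kb` is hypothetical, and the sketch simply adds the two numbers, following the simplification in the paragraph above.

```python
import os
import re


def ram_plus_swap_kb(meminfo_text: str) -> tuple[int, int, int]:
    """Return (MemTotal, SwapTotal, their sum) in kB from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        # Lines look like "MemTotal:        524288 kB"; lines such as
        # "Active(anon):" do not match and are skipped.
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    ram = values["MemTotal"]
    swap = values.get("SwapTotal", 0)  # treat a missing entry as no swap
    return ram, swap, ram + swap


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        ram, swap, total = ram_plus_swap_kb(f.read())
    print(f"RAM: {ram} kB, swap: {swap} kB, RAM + swap: {total} kB")
```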

When viewing the output of, let's say, top, the load averages are given for time frames of 1, 5 and 15 minutes. There are several ways to monitor your server's load. The first thing you will need to do is log in to your server via SSH.

  • 1) uptime: The uptime command produces the following output:

    uptime
    14:08:20 up 26 days, 3:46, 1 user, load average: 0.08, 0.07, 0.02

    According to the man page, uptime gives a one-line display of the following information: the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
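
If you want to log or alert on these numbers, a minimal Python sketch can pull the three load averages out of an uptime line like the one above. The function name `parse_uptime` is hypothetical; the regular expression assumes the English "load average:" label and a dot as the decimal separator.

```python
import re

# Matches the trailing "load average: 0.08, 0.07, 0.02" part of the line.
# The optional "s" and commas also accept the "load averages: 1.23 1.45 1.67"
# form printed by some BSD-derived systems.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime(line: str) -> tuple[float, float, float]:
    """Extract the 1-, 5- and 15-minute load averages from an uptime line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average found in: {line!r}")
    one, five, fifteen = (float(x) for x in match.groups())
    return one, five, fifteen
```

Because the first line of `w` output has the same format, the same function should work on it too.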

    2) procinfo: On Linux systems, the procinfo command produces the following output:


    procinfo -a
    Linux 2.6.9-023stab046.2-enterprise (root@rhel4-32) (gcc 3.4.5 20051201 ) #1 SMP Mon Dec 10 15:22:33 MSK 2007 4CPU [host]

    Memory: Total Used Free Shared Buffers
    Mem: 524288 326660 197628 0 0
    Swap: 0 0 0

    Bootup: Tue Jul 8 20:39:12 2008 Load average: 0.03 0.06 0.02 1/67 5086

    user : 8:55:25.21 0.3% page in : 0
    nice : 8:42:41.50 0.3% page out: 0
    system: 9:28:11.27 0.3% swap in : 0
    idle : 102d 23:25:21.71 98.4% swap out: 0
    steal : 0:00:00.00 0.0%
    uptime: 26d 3:50:49.00 context :4294967295 interrupts: 0

    Kernel Command Line:
    quiet

    Modules:

    File Systems:
    ext3 ext2 [proc] [tmpfs] [devpts]

    Procinfo gives a wealth of information, including:

      Memory: The amount of memory, broken down into Total, Used, Free, Shared and Buffers.

      Bootup Time: The time the system was booted.

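      Load average: The average number of jobs running over the last 1, 5 and 15 minutes, followed by the number of currently runnable processes and the total number of processes (if your kernel is recent enough), followed by the PID of the most recently started process (same kernel requirement). In the output above, 1/67 means 1 runnable process out of 67, and 5086 is the last PID.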

      user: The amount of time spent running jobs in user space.

      nice: The amount of time spent running niced jobs in user space.

      system: The amount of time spent running in kernel space.

      idle: The amount of time spent doing nothing.

      steal: The amount of time the virtual CPU spent waiting for a physical CPU.

      uptime: The time that the system has been up.

      page in: The number of disk blocks paged into memory (core) from disk.

      page out: The reverse of the above.

      swap in: The number of memory pages (chunks) read back in from swap space.

      swap out: The number of memory pages (chunks) written out to swap space.
      (Swap in and swap out refer only to moving pages between RAM and a dedicated swap partition or swap file.)

      context: The total number of context switches since bootup.

      disk 1-4: The number of times your hard disks have been accessed.

      Interrupts: Two rows of numbers giving the interrupt count for each IRQ channel, if your kernel is version 1.0.5 or later.

      Modules: The modules (device drivers) installed on your machine, with their sizes in kilobytes.

      Character and Block Devices: All available devices with their major numbers.

      File Systems: All available file systems.
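
The "Load average: 0.03 0.06 0.02 1/67 5086" line shown by procinfo has the same five fields that Linux exposes in `/proc/loadavg`. Below is a minimal Python sketch for reading them directly. The names `LoadAvg` and `parse_loadavg` are hypothetical, and the file exists only on Linux.

```python
import os
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float       # 1-minute load average
    five: float      # 5-minute load average
    fifteen: float   # 15-minute load average
    runnable: int    # processes currently runnable
    total: int       # total number of processes
    last_pid: int    # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    """Parse a line in /proc/loadavg format, e.g. '0.03 0.06 0.02 1/67 5086'."""
    one, five, fifteen, procs, last_pid = text.split()
    runnable, total = procs.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```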

    3) w: The w command produces the following output:

    w
    14:38:03 up 26 days, 4:16, 1 user, load average: 0.01, 0.04, 0.00
    USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
    root ttyp1 x.x.x.x 14:08 550days 0.03s 0.00s w

    Notice that the first line of the output is identical to the output of the uptime command.

    4) top: The top program provides a dynamic, real-time view of a running system: summary information plus a list of the tasks currently being managed by the Linux kernel. By default, top sorts processes by their current CPU usage (%CPU).


    top

    output

    top - 14:41:33 up 26 days, 4:20, 1 user, load average: 0.04, 0.04, 0.00
    Tasks: 58 total, 1 running, 57 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.1% us, 0.1% sy, 0.0% ni, 99.8% id, 0.0% wa, 0.0% hi, 0.0% si
    Mem: 524288k total, 323680k used, 200608k free, 0k buffers
    Swap: 0k total, 0k used, 0k free, 0k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

    12014 root 16 0 1908 984 780 R 0.3 0.2 0:00.01 top
    1 root 16 0 1640 604 524 S 0.0 0.1 0:02.15 init
    27817 root 16 0 1544 528 444 S 0.0 0.1 0:11.73 syslogd
    27821 root 16 0 1484 376 316 S 0.0 0.1 0:02.10 klogd
    27834 named 15 0 68224 3204 1944 S 0.0 0.6 0:17.46 named

    5) sar: The sar command writes selected cumulative activity counters from the operating system, either live or from a saved data file, to standard output for a specific time frame. You choose which system activities to report using flags; the -q flag reports the run queue length and the load averages. (Run the command ‘man sar’ for more information about these flags.)


    sar -q

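    output (the columns are time, runq-sz, plist-sz, ldavg-1, ldavg-5 and ldavg-15, i.e. the run queue length, the number of tasks in the task list, and the 1, 5 and 15 minute load averages):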

    14:40:01 3 80 0.00 0.02 0.00
    14:50:02 3 77 0.03 0.05 0.01
    15:00:01 4 84 0.00 0.02 0.00
    15:10:02 3 87 0.06 0.07 0.02
    Average: 4 73 0.11 0.09 0.08
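
When you are reviewing a day of `sar -q` history, a minimal Python sketch like the one below can find the busiest sample. It assumes the five-column layout shown above (runq-sz, plist-sz, ldavg-1, ldavg-5, ldavg-15); newer sysstat releases add a `blocked` column, which would shift the indexes. The function name `peak_load` is hypothetical.

```python
def peak_load(sar_q_output: str) -> tuple[str, float]:
    """Return (time, ldavg-1) for the sample with the highest 1-minute load.

    Assumes rows of the form: time runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
    """
    peak_time, peak = "", -1.0
    for line in sar_q_output.splitlines():
        fields = line.split()
        # Skip blank lines, short banner lines and the summary "Average:" row.
        if len(fields) < 6 or fields[0] == "Average:":
            continue
        try:
            ldavg_1 = float(fields[-3])  # third column from the end
        except ValueError:
            continue  # a header row such as "... runq-sz plist-sz ldavg-1 ..."
        if ldavg_1 > peak:
            peak_time, peak = fields[0], ldavg_1
    return peak_time, peak
```

Run against the sample rows above, it picks out 15:10:02 with a 1-minute load of 0.06.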

  • Load Average:

    The kernel calculates the load average as an exponentially damped (weighted) moving average of the load number, the count of processes running or waiting to run. The three load average values refer to the past one, five and fifteen minutes of system operation.
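
To make "exponentially damped moving average" concrete, here is a floating-point sketch in Python. It assumes a new sample of the load number every 5 seconds, which is roughly how often Linux updates these averages; the kernel itself uses fixed-point integer arithmetic, so treat this as an illustration of the idea rather than the kernel's code. The function names are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed, matching Linux's ~5 s)


def decay_factor(minutes: float) -> float:
    """Weight the old average keeps at each sample for a given window."""
    return math.exp(-SAMPLE_INTERVAL / (60.0 * minutes))


def update_load(load: float, active: int, minutes: float) -> float:
    """One damped-average step; `active` is processes running or waiting to run."""
    e = decay_factor(minutes)
    return load * e + active * (1.0 - e)


def simulate(samples: list[int]) -> tuple[float, float, float]:
    """Feed instantaneous counts into the 1-, 5- and 15-minute averages."""
    loads = [0.0, 0.0, 0.0]
    for active in samples:
        loads = [update_load(avg, active, m) for avg, m in zip(loads, (1, 5, 15))]
    return loads[0], loads[1], loads[2]


if __name__ == "__main__":
    # One process kept busy for one minute (12 samples) on a previously idle system.
    print(simulate([1] * 12))
```

After one minute of a single busy process, the 1-minute figure has only reached about 0.63 (that is, 1 - e^-1), and the 5- and 15-minute figures about 0.18 and 0.06. This is why a short spike shows up quickly in the first number but barely moves the third.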

    To explain further:

    If you have a single CPU, the load average can be read as a percentage of system utilization for that time period (0.50 is 50%, 1.00 is 100%).
    If you have multiple CPUs, you must divide the number by the number of processors to get a comparable percentage.
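
Here is a minimal Python sketch of that division. It uses `os.getloadavg()` (available on Unix only) and `os.cpu_count()`, which counts logical CPUs, so hyper-threads are included; whether that is the right divisor for your hardware is an assumption you should check. The function name `per_cpu_load` is hypothetical.

```python
import os


def per_cpu_load(load_avgs: tuple[float, float, float], cpus: int) -> tuple[float, ...]:
    """Divide each load average by the CPU count to get a per-CPU figure."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return tuple(load / cpus for load in load_avgs)


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.cpu_count() may return None, so fall back to 1.
    cpus = os.cpu_count() or 1
    print(per_cpu_load(os.getloadavg(), cpus))
```

For example, the procinfo header earlier reports 4CPU, so a load average of 2.00 on that host works out to 0.50 per CPU.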

    For example, with a single CPU, you can interpret a load average of “1.75 0.40 9.28” as:

    during the previous minute: the CPU was overloaded by 75% (1 CPU with 1.75 runnable processes, so that 0.75 processes had to wait for a turn)

    during the last 5 minutes: the CPU was idle 60% of the time (1 CPU with 0.40 runnable processes, so no processes had to wait for a turn)

    during the last 15 minutes: the CPU was overloaded by 828% (1 CPU with 9.28 runnable processes, so that 8.28 processes had to wait for a turn)

    This means that this CPU could have handled all of the work scheduled for the last minute if it were 1.75 times as fast, or if there were two (1.75 rounded up) times as many CPUs, but that over the last five minutes it was 2.5 times as fast as necessary (1 / 0.40 = 2.5) to keep runnable processes from waiting their turn.
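
The reading rules in this example can be written as a small Python sketch. It reproduces the wording of the single-CPU example and divides by the CPU count for larger machines. Note that this is a simplification: on Linux the load number also counts tasks blocked on disk I/O, so a load below the CPU count does not strictly prove the CPU sat idle. The function name `interpret` is hypothetical.

```python
def interpret(load: float, cpus: int = 1) -> str:
    """Describe a load average the way the single-CPU example above does."""
    if load > cpus:
        # More runnable processes than CPUs: the excess had to wait for a turn.
        waiting = load - cpus
        return f"overloaded by {waiting / cpus:.0%} ({waiting:.2f} processes waiting)"
    # At or below capacity: nothing waited, and some CPU time went unused.
    idle = (cpus - load) / cpus
    return f"idle {idle:.0%} of the time (no processes waiting)"
```

Applied to "1.75 0.40 9.28" on one CPU, it gives "overloaded by 75%", "idle 60% of the time" and "overloaded by 828%", matching the three readings above.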

    What is the right load for my server?

    In a single-CPU environment, a load average of around 1.0 or below is fine; try to keep regular load averages under 1.0. If your server slows down, check the load: a large traffic spike may cause it to rise.

    When your regular load average starts to rise to around 2.0, your server is very busy and you should consider upgrading its RAM if the hardware allows it. A regular average is the load while the server is doing what it was intended for, such as serving web pages, not while it is processing logs or running backups.
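
These rules of thumb can be turned into a quick check in Python. This is a hypothetical sketch: the 1.0 and 2.0 thresholds come from the guidance above, applying them per CPU on multi-CPU servers is an assumption, the "elevated" band between them is an added label, and using the 15-minute average as a stand-in for the "regular" load only makes sense while the server is doing its normal work.

```python
import os

# Thresholds from the guidance above, applied per CPU (an assumption for multi-CPU servers).
COMFORTABLE = 1.0
VERY_BUSY = 2.0


def assess(regular_load: float, cpus: int = 1) -> str:
    """Classify a regular (normal-workload) load average against the rules of thumb."""
    per_cpu = regular_load / cpus
    if per_cpu < COMFORTABLE:
        return "fine"
    if per_cpu < VERY_BUSY:
        return "elevated: keep an eye on it"
    return "very busy: consider more RAM if the hardware allows"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # The 15-minute average is used as a rough proxy for the "regular" load.
    print(assess(os.getloadavg()[2], os.cpu_count() or 1))
```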

    My next article will deal with what to do when you see the load constantly above normal.

    g33kadmin

    I am a g33k, Linux blogger, developer, student and Tech Writer for Liquidweb.com/kb. My passion for all things tech drives my hunt for all the coolz. I often need a vacation after I get back from vacation....
