[Nagios] Changing check_load Settings for Load Average Monitoring

Load average spikes have become more frequent, so I changed the service settings that Nagios uses for the check_load command.

Default values for check_load

define service{
        use                             generic-service
        host_name                       hoge
        service_description             LOAD
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           240
        notification_period             24x7
        notification_options            c,r
        check_command                   check_load!1,1,1!2,2,2
        contact_groups                  linux-admins
}
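
For context, the two arguments after check_load in check_command are handed to the plugin through a command definition. The definition on this host is not shown here, so the following is only a sketch that assumes it follows the common sample-config pattern, with $USER1$ pointing at the plugin directory:

define command{
        # Assumed definition; the actual one on this host may differ
        command_name                    check_load
        command_line                    $USER1$/check_load -w $ARG1$ -c $ARG2$
}

Under that assumption, 1,1,1 becomes the WARNING thresholds and 2,2,2 the CRITICAL thresholds for the 1-, 5-, and 15-minute load averages. Since notification_options is c,r, only CRITICAL and recovery states send notifications.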

I lowered max_check_attempts from 3 to 2 and shortened normal_check_interval from 5 minutes to 3.

check_load values after configuration change

define service{
        use                             generic-service
        host_name                       hoge
        service_description             LOAD
        max_check_attempts              2
        normal_check_interval           3
        retry_check_interval            1
        check_command                   check_load!1,1,1!2,2,2
}

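Checking more often and with fewer attempts should let us respond before the server starts crying for help. Assuming the default interval_length of 60 seconds, a sustained load spike now reaches a hard state within roughly 4 minutes (up to 3 minutes until the next regular check, plus one retry 1 minute later), down from roughly 7 minutes before (up to 5 minutes, plus two retries 1 minute apart).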

That’s all from the Gemba, where load average is a concern.
