[Nagios] Changing check_load Settings for Load Average Monitoring
Load average spikes have become more frequent, so I changed the service settings for Nagios’s check_load check.
Default values for check_load
define service{
        use                             generic-service
        host_name                       hoge
        service_description             LOAD
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           240
        notification_period             24x7
        notification_options            c,r
        check_command                   check_load!1,1,1!2,2,2
        contact_groups                  linux-admins
}
I reduced max_check_attempts from 3 to 2 and shortened normal_check_interval from 5 minutes to 3 minutes. The warning and critical thresholds in check_command are unchanged.
check_load settings after the change
define service{
        use                             generic-service
        host_name                       hoge
        service_description             LOAD
        max_check_attempts              2       ; was 3
        normal_check_interval           3       ; was 5 (minutes)
        retry_check_interval            1
        check_command                   check_load!1,1,1!2,2,2
}
With checks running more often and one fewer retry needed before a notification, alerts should fire sooner, so we should be able to respond before the server starts crying for help.
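To put a rough number on the improvement, here is a minimal sketch of the worst-case delay between a load spike starting and Nagios reaching a HARD state, which is when it notifies. It assumes the default interval_length of 60 seconds (so the intervals are in minutes), that the load stays above the threshold the whole time, and it ignores check execution time and scheduling latency. The function name is hypothetical and used only for illustration.

```python
def worst_case_minutes_to_hard_state(normal_check_interval,
                                     retry_check_interval,
                                     max_check_attempts):
    """Rough upper bound (in minutes) on spike-to-notification delay.

    Assumes interval_length = 60 and ignores check execution time.
    """
    # The spike may begin right after a passing check, so up to one
    # full normal interval can pass before the first failing check.
    wait_for_first_failure = normal_check_interval
    # The first failure counts as attempt 1; the remaining attempts
    # are retries spaced retry_check_interval apart.
    wait_for_retries = (max_check_attempts - 1) * retry_check_interval
    return wait_for_first_failure + wait_for_retries


before = worst_case_minutes_to_hard_state(5, 1, 3)
after = worst_case_minutes_to_hard_state(3, 1, 2)
print(f"before: {before} min, after: {after} min")
```

Under these assumptions the worst case drops from about 7 minutes to about 4 minutes. The trade-off is that a single retry gives a short-lived spike less chance to clear before a notification goes out.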
That’s all from the Gemba, where load average is a concern.