Nagios core Event Handler not working

Posted by sivashanmugam on Server Fault
Published on 2014-05-30T12:37:14Z, indexed on 2014/06/02 15:31 UTC

Filed under: system-monitoring

The Nagios event handler is not triggered when the service is slow to respond or is down.

My configuration is below.

nagios.cfg

enable_event_handlers=1

localhost.cfg

define service {
    use                       generic-service
    host_name                 Server
    service_description       test-server
    servicegroups             test-service
    check_command             check-service
    is_volatile               0
    check_period              24x7
    max_check_attempts        4
    normal_check_interval     2
    retry_check_interval      2
    contact_groups            testcontacts
    notification_period       24x7
    notification_options      w,u,c,r
    notifications_enabled     1
    event_handler_enabled     1
    event_handler             recheck-service
}

command.cfg

define command {
    command_name    recheck-service
    command_line    /usr/local/nagios/libexec/alert.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
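
A first step when debugging this kind of problem is to confirm whether Nagios runs the handler at all. The sketch below is a hypothetical logging helper that could be called at the top of alert.sh; the function name `log_invocation` and the log path in `HANDLER_LOG` are assumptions for illustration, not part of the original setup, and the path must be writable by the user Nagios runs as.

```shell
#!/bin/sh
# Hypothetical debugging aid: record every event handler invocation.
# HANDLER_LOG is an assumed location; pick one the Nagios user can write to.
HANDLER_LOG="${HANDLER_LOG:-/tmp/nagios_event_handler.log}"

log_invocation() {
    # $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $SERVICEATTEMPT$
    printf '%s state=%s type=%s attempt=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3" >> "$HANDLER_LOG"
}

# Log the arguments this script was started with.
log_invocation "$@"
```

If the log stays empty while the service goes through soft CRITICAL states, the handler is not being executed, which points at the Nagios configuration or file permissions rather than at the script logic.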

alert.sh file

#!/bin/sh
set -x

case "$1" in

OK)

# The service just came back up, so don't do anything...

;;

WARNING)

# We don't really care about warning states, since the service is probably still running...

;;
UNKNOWN)
# We don't know what might be causing an unknown error, so don't do anything...
;;
CRITICAL)
# Aha! The HTTP service appears to have a problem - perhaps we should restart the server...

# Is this a "soft" or a "hard" state?
case "$2" in

# We're in a "soft" state, meaning that Nagios is in the middle of retrying the
# check before it turns into a "hard" state and contacts get notified...
SOFT)

# What check attempt are we on? We don't want to restart the web server on the first
# check, because it may just be a fluke!
case "$3" in

# Wait until the check has been tried 3 times before restarting the web server.
# If the check fails on the 4th time (after we restart the web server), the state
# type will turn to "hard" and contacts will be notified of the problem.
# Hopefully this will restart the web server successfully, so the 4th check will
# result in a "soft" recovery. If that happens no one gets notified because we
# fixed the problem!
3)
echo -n "Going To Ping the Virtual Machine (3rd soft critical state)..."
# Measure the HTTP response time and e-mail it to the contact
myresult=`/usr/local/nagios/libexec/check_http xyz.com -t 100 | grep 'time'| awk '{print $10}'`
echo "Your Service Is taking the following time Delay" "$myresult Seconds" |mail -s "WARNING : Service Taken More Time To Response" [email protected]
;;
esac
;;

# The HTTP service somehow managed to turn into a hard error without getting fixed.
# It should have been restarted by the code above, but for some reason it didn't.
# Let's give it one last try, shall we?
# Note: Contacts have already been notified of a problem with the service at this
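
The nested case logic above can be checked from a shell without waiting for Nagios to trigger it. The following is a minimal sketch, assuming the same three arguments as alert.sh; the function name `decide_action` and the action labels `alert` and `none` are hypothetical. The HARD branch is cut off in the post, so this sketch leaves it as a no-op rather than guessing what it did.

```shell
#!/bin/sh
# Hypothetical helper mirroring the nested case structure of alert.sh.
# Prints "alert" only for the 3rd soft CRITICAL attempt, otherwise "none".
decide_action() {
    case "$1" in
        CRITICAL)
            case "$2" in
                SOFT)
                    case "$3" in
                        3) echo "alert" ;;  # 3rd soft critical check
                        *) echo "none" ;;   # other attempts: do nothing
                    esac
                    ;;
                *)
                    # HARD branch is not shown in the post; no-op here.
                    echo "none"
                    ;;
            esac
            ;;
        *)
            # OK, WARNING, UNKNOWN: nothing to do
            echo "none"
            ;;
    esac
}
```

Running `decide_action CRITICAL SOFT 3` by hand shows which branch the script would take for a given combination of state, state type, and attempt number.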
