Nagios DNX plugins

Posted by danneh3826 on Server Fault
Published on 2012-06-25T08:10:18Z

I'm toying with the idea of setting up multiple Nagios instances to monitor our infrastructure. I've looked at the various methods of distributing Nagios checks, and I think DNX comes closest to what I need. DNX handles the failure of worker nodes, which is fine. But what happens if the main DNX server fails? Is there a way to replicate the server too? I'm primarily using AWS EC2, so I can utilise Elastic Load Balancing for the web UI, but I need to cope with the Availability Zone (AZ) that hosts the monitoring server failing, with a second server picking up the checking load (active/passive or active/active, as long as monitoring doesn't fail completely).
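
One way to picture the active/passive option is a standby monitoring server in a second AZ that watches a heartbeat from the primary and takes over the checking load once that heartbeat goes quiet. The sketch below is generic Python, not anything DNX provides; the function name `standby_role`, the 90-second timeout and the way the heartbeat timestamp is obtained are all hypothetical.

```python
import time
from typing import Optional

# Hypothetical sketch of active/passive takeover logic for a second
# monitoring server in another AZ. How the primary publishes its heartbeat
# (shared store, HTTP endpoint, etc.) is an assumption and is not shown.

HEARTBEAT_TIMEOUT_S = 90.0  # assumed value; should exceed the check interval


def standby_role(last_primary_heartbeat: float,
                 now: Optional[float] = None,
                 timeout_s: float = HEARTBEAT_TIMEOUT_S) -> str:
    """Return "active" if the primary looks dead, otherwise "passive".

    The standby stays passive while the primary's heartbeat is fresh and
    takes over once the primary has been silent for more than timeout_s.
    """
    if now is None:
        now = time.time()
    silent_for = now - last_primary_heartbeat
    return "active" if silent_for > timeout_s else "passive"


if __name__ == "__main__":
    t0 = 1_000_000.0
    print(standby_role(t0, now=t0 + 30))   # primary heard from 30 s ago
    print(standby_role(t0, now=t0 + 120))  # primary silent for 2 minutes
```

A real setup would also need a way to stop a recovered primary from running checks alongside the standby (split-brain); this sketch leaves that out.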

The other thing I'm trying to solve is a routing issue. I'd like multiple nodes to report a fault before Nagios confirms it as critical. I don't mean the NRPE checks, which are fairly self-explanatory, but checks such as check_ping. I often have routing issues from AWS to certain datacenters, so Nagios frequently reports bad ping, no ping or a timeout as a critical issue, even though the machine in question is working fine. Would it be possible to have a setup where one worker reports a service check as critical, and a second worker node (in another datacenter/AZ) must also report the service as critical before the central Nagios server issues a critical alert?
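
The "two workers must agree" idea amounts to a quorum over per-location results. As a hedged illustration, the Python sketch below combines the exit codes of the same check (say, check_ping) run from several vantage points and returns CRITICAL only when at least `quorum` of them are critical, downgrading a lone failure to WARNING. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name, the command-line form and how the per-location results get collected are hypothetical, not part of DNX. The stock `check_cluster` plugin from the Nagios plugins package applies thresholds to a set of states in a broadly similar way and may be worth comparing.

```python
import sys
from typing import Iterable

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def quorum_state(states: Iterable[int], quorum: int = 2) -> int:
    """Combine results of one check run from several vantage points.

    CRITICAL only if at least `quorum` locations report CRITICAL. If some,
    but fewer than `quorum`, do, downgrade to WARNING so a single location's
    routing problem stays visible without raising a critical alert.
    """
    states = list(states)
    if not states:
        return UNKNOWN
    critical = sum(1 for s in states if s == CRITICAL)
    if critical >= quorum:
        return CRITICAL
    if critical > 0 or any(s == WARNING for s in states):
        return WARNING
    if all(s == OK for s in states):
        return OK
    return UNKNOWN


# Hypothetical usage: quorum_check.py QUORUM STATE [STATE ...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    result = quorum_state((int(s) for s in sys.argv[2:]), int(sys.argv[1]))
    print(f"{NAMES[result]} - per-location states: {sys.argv[2:]}")
    sys.exit(result)
```

With `quorum=2`, one worker reporting CRITICAL and another reporting OK yields WARNING; only when both report CRITICAL does the combined result become CRITICAL.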

I realise I might be asking a bit much (how far do you go setting up failover systems before it starts to get ridiculous?), but surely someone must have considered this scenario when developing DNX?

Filed under: nagios, failover, distributed