Network monitoring tools with API features

Posted by Kev on Server Fault
Published on 2011-03-13T17:58:02Z

We use ks-soft's Advanced Hostmonitor package to monitor around 2000 items on our network. We think it's great: the chap who supports it is fantastic, and the product is fast, stable and mature. But I feel that as we grow as a company it's beginning to show some friction points when it comes to integrating with our back-office admin systems.

One of the things we'd like to do is be able to add new tests to whatever monitoring tool we use via an API. For example, when orders for servers come from our retail interface, the server gets built automatically, and as part of the automated build process we'd like to automatically add new tests to the network monitoring systems.
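To make the requirement concrete, here is a rough sketch of the hook we'd like to call at the end of the automated build. Everything in it is hypothetical: `InMemoryMonitor`, `add_test`, `remove_test`, `on_server_built` and the profile names are stand-ins for whatever API a monitoring tool might offer, not the interface of Hostmonitor or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Test:
    host: str
    kind: str            # e.g. "ping" or "http" (illustrative test types)
    action_profile: str  # name of the escalation profile to attach


@dataclass
class InMemoryMonitor:
    """Stand-in for a monitoring tool's API; not a real product interface."""
    tests: dict = field(default_factory=dict)

    def add_test(self, test: Test) -> str:
        # Use host:kind as a simple, predictable test identifier.
        test_id = f"{test.host}:{test.kind}"
        self.tests[test_id] = test
        return test_id

    def remove_test(self, test_id: str) -> None:
        # Removing an unknown test is treated as a no-op.
        self.tests.pop(test_id, None)


def on_server_built(monitor, host, role):
    """Hypothetical hook run by the build process once a server is ready."""
    kinds = ["ping", "http"] if role == "iis" else ["ping"]
    profile = "iis-escalation" if role == "iis" else "default"
    return [monitor.add_test(Test(host, kind, profile)) for kind in kinds]
```

With an API along these lines, the build process would only need one call such as `on_server_built(monitor, "web01", "iis")` to register every test for a new server.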

Hostmonitor has some support for this via a feature called HM Script, but we're starting to hit some speed bumps:

  1. we can't add new operators/users
  2. we can't define new "Action Profiles" - these are the actions to be taken when a test goes good or bad.

What we love about Hostmonitor, though, are the Action Profiles. For example, if a Windows IIS box goes bad, our action profile for a bad test does something like:

  • Check host again (one time)
  • Wait another 30 seconds then test again
  • Try restarting the app pool on the remote machine (up to two times)
  • Send an email to ops about the restart failure
  • Try restarting IIS on remote machine (up to four times)
  • Page duty admin (up to 5 times - stops after duty admin ACKS alert)
  • Page backup duty admin (5 times - stops after duty admin ACKS alert)
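Expressed as data, the profile above is an ordered list of steps, each with a retry limit, that stops as soon as the test recovers or someone acknowledges the alert. The sketch below is only my own model of that behaviour, not how Hostmonitor implements it; `Step`, `run_profile`, `noop` and `IIS_PROFILE` are made-up names, and the real actions (restarts, email, paging) are replaced by placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    max_attempts: int
    action: Callable[[], None]
    wait_seconds: float = 0.0  # delay before each attempt


def run_profile(steps, is_resolved, sleep=lambda seconds: None):
    """Run steps in order, stopping as soon as is_resolved() returns True.

    is_resolved covers both "test is good again" and "alert was ACKed".
    sleep defaults to a no-op; pass time.sleep to get real delays.
    Returns the (step name, attempt number) pairs that were executed.
    """
    log = []
    for step in steps:
        for attempt in range(1, step.max_attempts + 1):
            if is_resolved():
                return log
            sleep(step.wait_seconds)
            step.action()
            log.append((step.name, attempt))
    return log


def noop():
    """Placeholder for a real action such as a restart or a page."""


IIS_PROFILE = [
    Step("recheck host", 1, noop),
    Step("retest after wait", 1, noop, wait_seconds=30),
    Step("restart app pool", 2, noop),
    Step("email ops", 1, noop),
    Step("restart IIS", 4, noop),
    Step("page duty admin", 5, noop),
    Step("page backup duty admin", 5, noop),
]
```

This simplifies one detail: the email to ops is sent whenever the chain reaches that step, rather than only when the app pool restart has actually failed. Being able to create and edit a structure like `IIS_PROFILE` through an API is the capability we're after.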

I'm starting to look around at other network monitoring tools and I'm looking for:

  1. a comprehensive API to be able to add/remove/control tests/test "action profiles"/operators (not just plugins, we need control and admin interfaces)
  2. the ability to have quite detailed action/escalation profiles (and define these via an API)

I've looked at Nagios and Icinga, but I can't seem to glean from their documentation whether we could have these features or, if we could, how much work would be involved to implement or customise them.

Can anyone provide any advice, guidance or experiences?

