In Solaris, how to monitor and auto-respond to critical events

Posted by mamcx on Server Fault
Published on 2011-01-10T02:15:23Z Indexed on 2011/01/10 2:55 UTC

I have a website that fails at random. It runs on OpenSolaris at Joyent.

I have a monitoring service that alerts me when the site is down, but I want an "insider" tool running on the server itself that tells me why it happened.

Is it because CPU usage is too high? Not enough memory? Which process failed? Is it possible to get a backtrace of the failure?

Everything runs under the Solaris Service Management Facility (SMF). The web server is Cherokee, the database is MySQL, and the application is written in Python/Django.

I want the simplest possible setup to monitor this and auto-respond, i.e. restart the web server or the Django process in case of failure.

I prefer a low-overhead tool. I don't need the fancy monitoring that some tools offer: no need for graphs or SMS alerts. I only need to know what failed, restart it if possible (maybe up to n times), and keep a log somewhere that I can check later.
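
To make the requirement concrete, here is a minimal sketch of the kind of watchdog described above, assuming Python 3.7+ and the standard SMF commands `svcs` and `svcadm`. SMF already restarts a failed service on its own and puts it into the `maintenance` state when it keeps failing, so the sketch only logs non-online states and clears `maintenance` up to n times. The service FMRIs, the log path, and the function names are hypothetical placeholders, not values from my setup.

```python
import logging
import subprocess
import time

log = logging.getLogger("smf-watchdog")

# Hypothetical FMRIs: replace with the names that `svcs -a` shows on your system.
SERVICES = ["svc:/network/cherokee:default", "svc:/application/django:default"]
MAX_ATTEMPTS = 3          # the "n" in "restart up to n times"
CHECK_INTERVAL = 30       # seconds between checks (assumed value)


def service_state(fmri, run=subprocess.run):
    """Return the SMF state of fmri, e.g. 'online' or 'maintenance'."""
    result = run(["svcs", "-H", "-o", "state", fmri],
                 capture_output=True, text=True)
    return result.stdout.strip()


def check_once(fmri, attempts, max_attempts, run=subprocess.run):
    """Check one service and return the updated recovery-attempt count."""
    state = service_state(fmri, run)
    if state == "online":
        return 0  # healthy again, so reset the counter
    log.warning("%s is in state %r", fmri, state)
    if state == "maintenance" and attempts < max_attempts:
        log.warning("clearing %s (attempt %d of %d)",
                    fmri, attempts + 1, max_attempts)
        # `svcadm clear` asks SMF to retry a service stuck in maintenance.
        run(["svcadm", "clear", fmri])
        return attempts + 1
    return attempts  # limit reached or state not handled: only log it


def main():
    # Hypothetical log location; pick any path the watchdog user can write to.
    logging.basicConfig(filename="/var/tmp/smf-watchdog.log",
                        format="%(asctime)s %(levelname)s %(message)s",
                        level=logging.INFO)
    attempts = {fmri: 0 for fmri in SERVICES}
    while True:
        for fmri in SERVICES:
            attempts[fmri] = check_once(fmri, attempts[fmri], MAX_ATTEMPTS)
        time.sleep(CHECK_INTERVAL)
```

Calling `main()` starts the loop. This only covers the "what failed, retry, and log" part; it does not answer the CPU, memory, or backtrace questions.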

© Server Fault or respective owner