Search Results

Search found 334 results on 14 pages for 'nagios'.

Page 10/14 | < Previous Page | 6 7 8 9 10 11 12 13 14 | Next Page >

My Package Version Number Appears Greater Yet apt-get Doesn't Select It

- by nutznboltz

Backstory: It was determined that when using LXC container VMs, the Nagios nrpe shutdown script, when run on the host of the containers, would kill the nrpe processes inside the containers. This was remediated by changing the script to use pidfiles instead of searching the process table for the nrpe process. Regrettably, start-stop-daemon is a C program that resulted from translating a Perl script, and it shows. There are far too many global variables in start-stop-daemon.c, and although there are some nice blocks of comments, there are far too few comments that explain the intent behind variable names such as "schedule" (the string "schedule" appears in many contexts). The manual page for start-stop-daemon strongly suggests that, unless you use the "--retry" option, start-stop-daemon may return before the process it sent a signal to actually calls exit() and terminates; however, it doesn't state this in plain English. The obtuseness of start-stop-daemon is most likely the reason that the "fixed" version of the script includes a dubious comment indicating that sometimes the pid file has not been removed. I can easily see why someone would not realize that the --retry option was missing. This bug also causes failures when the script is given the "restart" option: the nrpe daemon shuts down but does not start up again. Did I mention that since applying the update our nrpe servers have been crashing over and over? Repairing this is why I am doing this work. I have been working on remediating the fix. You can see my current work in this PPA.
Actual Question: The upstream version number of nagios-nrpe-server in lucid-updates is 2.12-4ubuntu1.10.04.1. My PPA uses the version number 2.12-4ubuntu1.10.04.1.1~ppa1~lucid1. I checked the rules here and used this test program, and I am led to believe that the version number in my PPA is greater than the one in lucid-updates. Yet when I ran:

    sudo add-apt-repository ppa:nutznboltz/nrpe-unbreak-lp-600941
    sudo apt-get update
    sudo aptitude dist-upgrade

the replacement package was not installed. I was able to install it using:

    sudo aptitude install nagios-nrpe-server=2.12-4ubuntu1.10.04.1.1~ppa1~lucid1

Can anyone explain this behavior? Why didn't my version number appear greater to "aptitude dist-upgrade"? Thanks.

    $ cat /etc/apt/preferences
    Package: *
    Pin: release a=lucid-backports
    Pin-Priority: 400

    Package: *
    Pin: release a=lucid-security
    Pin-Priority: 990

    Package: *
    Pin: release a=lucid-updates
    Pin-Priority: 900

    Package: *
    Pin: release a=lucid-proposed
    Pin-Priority: 400
    $ ls /etc/apt/preferences.d/
    $

This should not make any difference, as a PPA cannot be in any of those pockets. I went ahead and bumped the version number in the PPA to 2.12-4ubuntu1.10.04.2~ppa1~lucid1. I'll see if that makes a difference. I do notice that lintian complains:

    W: nagios-nrpe-server: debian-revision-not-well-formed 2.12-4ubuntu1.10.04.2~ppa1~lucid1
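The comparison the asker ran with a test program can be reproduced with a small port of the Debian version-ordering rules: epoch first, then upstream version, then Debian revision, where non-digit runs are compared character by character (with "~" sorting before everything, even the end of the string) and digit runs are compared numerically. This is a sketch written for illustration, not dpkg's own code; on a Debian or Ubuntu system, `dpkg --compare-versions A gt B` is the authoritative check. It suggests that both PPA version strings do sort above the lucid-updates version, so the version number alone does not appear to explain why dist-upgrade skipped the package.

```python
# Sketch of Debian version comparison (epoch, upstream version, revision),
# following the ordering rules in the deb-version manual page; not dpkg's code.
import string
from typing import Tuple


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run ('' = end of string)."""
    if c == "" or c in string.digits:
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1               # tilde sorts before everything, even the end
    return ord(c) + 256         # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream-version or revision strings; the sign gives the order."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs are compared character by character.
        while (i < len(a) and a[i] not in string.digits) or \
              (j < len(b) and b[j] not in string.digits):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit runs are compared numerically (leading zeros ignored).
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i] in string.digits and \
                j < len(b) and b[j] in string.digits:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in string.digits:
            return 1            # a's number has more significant digits
        if j < len(b) and b[j] in string.digits:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative, zero or positive as a is older than, equal to or newer than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


if __name__ == "__main__":
    updates = "2.12-4ubuntu1.10.04.1"
    ppa = "2.12-4ubuntu1.10.04.1.1~ppa1~lucid1"
    print(compare_versions(ppa, updates) > 0)  # → True
```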

Better logging for cronjob output using /usr/bin/logger

- by Stefan Lasiewski

I am looking for a better way to log cronjobs. Most cronjobs tend to spam email or the console, get ignored, or create yet another logfile. In this case, I have a Nagios NSCA script which sends data to a central Nagios server. This send_nsca script also prints a single status line to STDOUT, indicating success or failure.

    0 * * * * root /usr/local/nagios/sbin/nsca_check_disk

This emails the following message to root@localhost, which is then forwarded to my team of sysadmins. Spam.

    nsca_check_disk: 1 data packet(s) sent to host successfully.

I'm looking for a log method which:

- Doesn't spam the messages to email or the console.
- Doesn't create yet another crufty logfile which requires cleanup months or years later.
- Captures the log information somewhere, so it can be viewed later if desired.
- Works on most Unixes.
- Fits into an existing log infrastructure.
- Uses common syslog conventions like 'facility'.

Some of these are third-party scripts, and don't always do logging internally.

UPDATE 2010-04-30: In the process of writing this question, I think I have answered it myself, so I'll answer "Jeopardy-style". Is there any problem with this method? The following sends any cron output to /usr/bin/logger, which sends it to syslog with a 'tag' of 'nsca_check_disk'. Syslog handles it from there. My systems (CentOS and FreeBSD) already handle log rotation.

    */5 * * * * root /usr/local/nagios/sbin/nsca_check_disk 2>&1 | /usr/bin/logger -t nsca_check_disk

/var/log/messages now has one additional message which says this:

    Apr 29, 17:40:00 192.168.6.19 nsca_check_disk: 1 data packet(s) sent to host successfully.

I like /usr/bin/logger because it works well with an existing syslog configuration and infrastructure, and is included with most Unix distros. Most *nix distributions already do log rotation, and do it well.
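The logger pipeline above can also be expressed as a small wrapper, which may be handy for third-party scripts whose output should end up in syslog. This is a hypothetical Python sketch (the script name cronlog.py and its paths are made up): it runs a command with stderr merged into stdout, like `2>&1`, and sends each non-empty output line to syslog with the given tag, using the user facility at notice priority, which is assumed here to mirror logger's default of user.notice. A different sink can be passed in, for example for testing.

```python
# Hypothetical cron wrapper (cronlog.py): run a command and forward each line
# of its combined stdout/stderr to syslog, like `cmd 2>&1 | logger -t TAG`.
import functools
import subprocess
import sys
from typing import Callable, List, Optional


def run_and_log(cmd: List[str], tag: str,
                sink: Optional[Callable[[str], None]] = None) -> int:
    """Run cmd, pass every non-empty output line to sink, return its exit status."""
    if sink is None:
        import syslog  # Unix-only module, imported only when actually needed
        # The tag becomes the syslog ident; user facility, notice priority.
        syslog.openlog(ident=tag, facility=syslog.LOG_USER)
        sink = functools.partial(syslog.syslog, syslog.LOG_NOTICE)
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,  # same as 2>&1
                            text=True)
    for line in result.stdout.splitlines():
        if line.strip():
            sink(line)
    return result.returncode


if __name__ == "__main__":
    # crontab example (hypothetical paths):
    # */5 * * * * root /usr/bin/python3 /usr/local/bin/cronlog.py nsca_check_disk /usr/local/nagios/sbin/nsca_check_disk
    sys.exit(run_and_log(sys.argv[2:], tag=sys.argv[1]))
```

Returning the command's exit status keeps cron's view of success or failure intact, which the plain shell pipeline loses because the pipeline's status is logger's.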

Exchange not preserving the "To:" field

- by Matt Simmons

I've got a hosted Exchange solution through Apptix, which isn't the problem, I think, but it may be relevant. I have my main account, [email protected], and to that I have an alias, [email protected]. Whenever I send an email to [email protected] and examine the headers, I see that the "To:" field is correct: "To: [email protected]". All is well. I recently set up another user, [email protected], to function as a multipurpose mailbox. I aliased "[email protected]" to the services account in the same way that I did "[email protected]"; however, nothing I send to "[email protected]" actually arrives with "To: [email protected]". All of the headers say "To: [email protected]". This makes it extremely difficult to filter based on headers alone. Does anyone have any feedback on which settings I would need to look at to fix that?

sSMTP Unable to send message using external mail server SMTP

- by OrangeGrover

I'm trying to finish up my Nagios install by having it email me. It was emailing me using /bin/mail, so the messages always went to my spam folders. I installed sSMTP so that it would relay through my work's email server and send messages as an authenticated user. Here is my /etc/ssmtp/ssmtp.conf file:

    mailhub=10.200.120.148:25
    UseTLS=NO
    AuthUser= [email protected]
    AuthPass=PASSWORD

So far I've been using the following command, and the message still arrives in my inbox from root@localhost, which causes it to go to my spam folder (with the exception of one email provider I have).

    cat message | ssmtp [email protected]

I've looked at a few examples online, and they all seem to have pretty much the same settings as mine. Does anybody see any mistakes that I'm making? Just to clarify, [email protected] is a user on the mail server that my work uses.
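If the underlying goal is for the alert to arrive with the authenticated work mailbox in its From: header rather than root@localhost, one way to test that independently of sSMTP is to build the message with an explicit From: header and hand it to the same mail server. The sketch below uses Python's standard smtplib and email modules; the addresses are hypothetical placeholders, the mailhub address comes from the ssmtp.conf above, and it assumes the server accepts AUTH without TLS, as UseTLS=NO implies.

```python
# Sketch: send an alert through the work mail server with an explicit From:
# header. Addresses below are hypothetical placeholders.
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message whose From: is the authenticated mailbox."""
    msg = EmailMessage()
    msg["From"] = sender          # the authenticated mailbox, not root@localhost
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg: EmailMessage, host: str, port: int,
               user: str, password: str) -> None:
    """Log in over plain SMTP (mirroring UseTLS=NO) and send the message."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)


if __name__ == "__main__":
    alert = build_alert("nagios@example.com", "me@example.com",
                        "Nagios test", "Test message from the Nagios host.")
    send_alert(alert, "10.200.120.148", 25, "nagios@example.com", "PASSWORD")
```

If a message sent this way lands in the inbox with the expected sender, the remaining difference lies in how the sSMTP path sets the From: header.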

Automating and deploying new linux servers

- by luckytaxi

I'm in the process of developing a method to automate adding new virtual machines to my environment. 90% of our machines are virtual, but the process is similar for both physical and VMware-based images. Currently I use Cobbler to install the base OS. The kickstart script has post hooks that modify the yum repo and install Puppet and Func. Once the servers are running, I manually add them to Nagios and sign their certificates on the puppetmaster. I've since migrated most of the resources to use MySQL as the backend. I wanted to see what others are doing. My goal for 2011 is to have Puppet inventory the hardware into MySQL, and then write a Python script that has Nagios pick up that info and automatically add each host for monitoring. It's kind of tedious to have to add each new server to Nagios, Puppet's dashboard, Munin, etc.
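The last step described, a script that feeds inventory data into Nagios, could start as simply as rendering host object definitions from inventory rows. In this hypothetical sketch the row fields (name, address, optional alias) and the generic-host template name are assumptions; generic-host is defined in the stock sample configuration, but a site may use a different template. The output would be written into a directory Nagios reads via cfg_dir and checked with `nagios -v` before reloading.

```python
# Hypothetical sketch: render Nagios host definitions from inventory rows,
# e.g. rows that Puppet has stored in MySQL (the query itself is not shown).
from typing import Dict, Iterable

HOST_TEMPLATE = """define host {{
    use        {template}
    host_name  {name}
    alias      {alias}
    address    {address}
}}
"""


def nagios_host_config(hosts: Iterable[Dict[str, str]],
                       template: str = "generic-host") -> str:
    """Return one 'define host' block per inventory row, sorted by name."""
    blocks = []
    for host in sorted(hosts, key=lambda h: h["name"]):
        blocks.append(HOST_TEMPLATE.format(
            template=template,
            name=host["name"],
            alias=host.get("alias", host["name"]),  # fall back to the name
            address=host["address"],
        ))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical inventory rows; in practice these would come from MySQL.
    inventory = [
        {"name": "web01", "address": "192.0.2.10"},
        {"name": "db01", "address": "192.0.2.20", "alias": "Primary DB"},
    ]
    print(nagios_host_config(inventory))
```

Sorting by host name keeps the generated file stable between runs, so a diff against the previous version shows only real inventory changes.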

Is Zabbix the right tool for me?

- by hortitude

I just want to monitor a small handful of servers (fewer than 10). From reading various places, it sounds like the leading contenders (for open source, at least) are Nagios, Munin, and Zabbix. From what I have read, a lot of people tend to use Munin and Nagios together: Munin for history and graphs, and Nagios for alerting. On the other hand, it sounds like Zabbix is a more complete solution and easier to configure than either of the other two, so I was thinking of going that route. My questions right now are:

- What are the general disadvantages of Zabbix?
- Does Zabbix have a small footprint on the boxes it is monitoring?
- Do I really need to set up an entire other server for it? I currently have a server that is under very light load; can I dual-purpose it?

OpenNMS monitoring SAP

- by HannesFostie

I was wondering if anyone had any experience plugging SAP into their OpenNMS installation. Mostly I'm looking for experiences, perhaps comparisons with Nagios, or some more concrete information on what is being monitored and how you did it. Go into as much detail as you like. I am currently evaluating both Nagios and OpenNMS, and the possibilities with SAP might be the deciding factor. Sadly, I didn't find a whole lot on Google on the subject.

What is start_daemon?

- by David Parks

I'm trying to understand start_daemon in the following /etc/init.d/nagios-nrpe-server startup script:

    start)
        if [ "$INETD" = 1 ]; then
            exit 1
        fi
        log_daemon_msg "Starting $DESC" "$NAME"
        start_daemon -p $PIDDIR/nrpe.pid $NICENESS $DAEMON -c $CONFIG -d $DAEMON_OPTS
        log_end_msg $?
        ;;

In particular, when I start this service it isn't writing a PID file as expected, so the service nagios-nrpe-server stop command does not work (I need to kill the processes manually). I'm trying to figure out how to troubleshoot the problem, but I can't run start_daemon ... from the command line. I want to reproduce what the script is doing manually so I can work out what the problem is.
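To separate "the pidfile is never written" from "the pidfile is written but points at the wrong process", it may help to inspect the file directly while reproducing the start by hand. A minimal sketch, assuming only that the pidfile holds a numeric PID on its first line: it reports whether the file named by the script's -p argument ($PIDDIR/nrpe.pid; the value of PIDDIR is not shown in the script) is missing, unreadable as a PID, stale, or names a live process. Signal 0 is used because it performs the existence and permission check without delivering a signal.

```python
# Sketch: report whether a pidfile exists and names a live process.
import os
import sys


def pidfile_status(path: str) -> str:
    """Return 'missing', 'garbled', 'stale' or 'running' for the pidfile at path."""
    try:
        with open(path) as f:
            pid = int(f.read().split()[0])
    except FileNotFoundError:
        return "missing"
    except (ValueError, IndexError):
        return "garbled"            # empty file or non-numeric content
    if pid <= 0:
        return "garbled"            # 0 or negative would address process groups
    try:
        os.kill(pid, 0)             # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return "stale"              # file left behind by a dead process
    except PermissionError:
        return "running"            # exists, but owned by another user
    return "running"


if __name__ == "__main__":
    # Pass the real $PIDDIR/nrpe.pid path as the only argument.
    print(pidfile_status(sys.argv[1]))
```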

nagios3 Error: Could not read object configuration data!

- by user1493730

I have a brand new install of nagios3 on Ubuntu 12.04. After I log in to the web interface and click any link, I get the error:

    Error: Could not read object configuration data!
    Here are some things you should check in order to resolve this error:
    Verify configuration options using the -v command-line option to check for errors.
    Check the Nagios log file for messages relating to startup or status data errors.

I ran it with the -v option and it reported no errors:

    Total Warnings: 0
    Total Errors: 0
    Things look okay - No serious problems were detected during the pre-flight check

The Nagios log, the Apache error log, and the debug log all contain nothing regarding this. Does anyone know how to turn on logging that will give me some kind of useful error? Or, if anyone knows how to fix this specific problem without additional logging, that's okay too. Thanks!

Linux Port 80 to redirect to a Windows box

- by Richard Staehler

I have 2 servers here at work. One is a Windows Server 2008 R2 box (for safety's sake, let's use 192.168.1.100) and the other is a Fedora 14 box (192.168.1.101). Currently, when you hit our subdomain, x.test.com, our routers send the request to our Fedora box, and since Apache is installed and listening on port 80, it displays the Fedora Apache Test Page. Obviously I don't really use port 80 on this machine; however, I do use NAGIOS on it, and it's always nice to be able to access that from anywhere in the world. So when I want to access it, I just type x.test.com/nagios. Now here comes the dilemma... On the Windows R2 box, we recently installed a program that requires us to set up a web server using IIS7. Because of this application, I'm going to be creating a new subdomain called y.test.com, but since we only have 1 WAN/router, it will still get pointed to our Fedora box. That being said, it wants to use port 80 as well (or whatever port I damn well wish to assign it). So my question is: since our router is pointing to the Fedora 14 box (.101), and I want to make sure I can access NAGIOS from anywhere in the world, how do I tell Apache (httpd) to redirect port 80 to the other server (.100)? If that's not possible, what are my other options? I have rinetd installed on Fedora and have even tried the option 192.168.1.101 80 192.168.1.100 80, and it didn't seem to work "because port 80 was already bound". Thoughts? And thanks!

Monitoring / metric collection for system collectives that change a lot in time (a.k.a. cloud)

- by Florin Andrei

When your server fleet doesn't change much over time, as when you're using bare-metal hosting, classic monitoring and metric collection solutions (Nagios, Munin) work well. But if the number of systems varies a lot over time, and may in fact vary rapidly, classic software is more difficult to set up and use. For example, trying to make Nagios (monitoring) keep up with a rapidly evolving cloud infrastructure can be cumbersome. The same goes for Munin (metric collection). It's not just the configuration: the way the information is conveyed to the user, or displayed, is inadequate for the cloud. What are some possible alternatives that work well with the cloud? The goals are to collect and display metrics (analogous to Munin), generate alerts when certain metrics go out of bounds or when certain services are unavailable (analogous to Nagios), and do everything in a cloud-friendly manner. Some cloud providers offer monitoring / metric collection as services, but not always, and if you use more than one provider you don't want to become too dependent on just one vendor. So provider-independent solutions are required. EDIT: I am asking this question in a general fashion, not limited to any given cloud infrastructure (like OpenStack), but for the general case of using arbitrary cloud providers.

How to manage iowait over cifs?

- by Silvia

For backup purposes we have a CIFS file server running that contains encrypted containers for backing up the more sensitive data. The container is mounted with cryptsetup and a loop device as a local filesystem, and rsync is used for the backups. Because the CIFS server is not the fastest machine ever built, running the rsync process results in iowait on the servers running the backup, which in turn drives Nagios into an email frenzy. The question is: how do I reduce the iowait on the server? Configuring Nagios not to report it seems more like a workaround than a solution. Spreading the backups over different time intervals has already been done, with little effect, and spending money is also not an option because, apparently, we are talking about a "non-critical system".

System Monitoring Redundancy

- by Josh Brower

I consult in a small business environment where I have two Hyper-V hosts (with <10 VMs) plus a couple of other servers. I recently had an issue where one of the Hyper-V hosts had a CPU problem and went down, bringing most of my non-critical VMs with it, plus a free piece of software that I use for network & system monitoring and availability. Because of this, and the fact that the iDRAC locked up too, I did not get any alerts about the crash. So I am wondering how I can (cheaply) get a redundant availability monitoring system in place. Is it as simple as running Nagios or Zenoss (or whatever) on two different Hyper-V hosts? It just seems like running more than one copy of Nagios/Zenoss/etc. could be expensive and have high overhead. Thoughts? Thanks! -Josh

Monitor a log file on Linux and send each line to another program

- by mlambie

I run an apt-cacher-ng server on Ubuntu Linux which writes logs in the following format:

    1299745593|O|149406|XXX.XXX.XXX.XXX|uburep/pool/main/t/tiff/libtiff4_3.9.2-2ubuntu0.4_amd64.deb
    1299745593|O|10154976|XXX.XXX.XXX.XXX|uburep/pool/main/l/linux-firmware/linux-firmware_1.34.4_all.deb
    1299748529|O|39368|XXX.XXX.XXX.XXX|uburep/pool/main/n/nagios-nrpe/nagios-nrpe-server_2.12-4ubuntu1_amd64.deb
    1300155440|O|680100|XXX.XXX.XXX.XXX|uburep/pool/main/t/tzdata/tzdata_2011c-0ubuntu0.10.04_all.deb

It shows the timestamp, direction (in or out), byte count, IP, and filename. Every time a line is written to it, I'd like to also send that line to another program. I will have this program insert the line into a database so that I can crunch some statistics about how much bandwidth we're saving by operating a caching server. I do not want to cat the log file every X minutes (via cron) looking for new entries, as that would be somewhat computationally uneconomical. Instead, I'd prefer to have a daemon monitor the log and, when a change is detected, send each new line to my database-insertion script. Will swatch achieve this, or are there better options?
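The behaviour described, a long-running process that waits for new lines and hands each one on, is essentially what `tail -f` does; on GNU systems, `tail -F logfile | your-script` does the same and also keeps following across log rotation. As a rough alternative to evaluating swatch, here is a Python sketch that follows the file by polling (a swapped-in, polling-based technique rather than inotify) and parses each line into the five fields listed in the question. It assumes the field layout shown in the sample lines; the Transfer type is made up for illustration, and log rotation is not handled.

```python
# Sketch of a `tail -f`-style follower for the apt-cacher-ng log format shown
# above (timestamp|direction|bytes|ip|path). Log rotation is not handled.
import io
import sys
import time
from typing import Iterator, NamedTuple, TextIO


class Transfer(NamedTuple):
    timestamp: int
    direction: str
    size: int
    client: str
    path: str


def parse_line(line: str) -> Transfer:
    """Split one log line into its five pipe-separated fields."""
    ts, direction, size, client, path = line.rstrip("\n").split("|", 4)
    return Transfer(int(ts), direction, int(size), client, path)


def follow(f: TextIO, poll: float = 1.0, from_start: bool = False) -> Iterator[str]:
    """Yield each complete line appended to f; runs until the caller stops."""
    if not from_start:
        f.seek(0, io.SEEK_END)      # only new entries, like tail -f
    pending = ""
    while True:
        chunk = f.readline()
        if not chunk:
            time.sleep(poll)        # nothing new yet; wait and retry
            continue
        pending += chunk
        if pending.endswith("\n"):  # emit only fully written lines
            yield pending
            pending = ""


if __name__ == "__main__":
    with open(sys.argv[1]) as log:
        for raw in follow(log):
            print(parse_line(raw))  # replace with the database insert
```

Buffering partial lines matters here: the writer may flush half a line, and inserting that fragment into the database would corrupt the statistics.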

Free / Cached / Available memory on Linux

- by pkoraca

I have read that Linux uses free memory for caching, to make the system faster. However, both Nagios and the Paessler PRTG monitoring system show me that my memory usage is critical. I could change the Nagios mem_usage script to sum free and cached memory, but would that be correct information? I doubt that they misunderstand Linux memory usage. Let's say I have 8 GB RAM: 5 GB are used, 2 GB are cached, and I have 1 GB of free memory. Should the real available memory be free + cached (3 GB)? If a new application needed an additional 3 GB of RAM, could it take everything from cache and free without using swap, or is there a minimum that should stay in cache? Real example:

    $ cat /proc/meminfo
    MemTotal: 5984256 kB
    MemFree: 137052 kB
    Buffers: 140484 kB
    Cached: 3439616 kB
    SwapCached: 244 kB
    Active: 3148824 kB
    Inactive: 2341768 kB
    ...

My monitoring tools show that I have 137 MB of free RAM; however, I have ~3.5 GB in cache. Thanks!
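Using the figures from the question, free + buffers + cached is 137052 + 140484 + 3439616 = 3717152 kB, roughly 3.5 GiB. A sketch of that calculation follows. Treat the sum as an upper bound rather than a guarantee, since some cached pages (shared memory and tmpfs, for instance) cannot simply be dropped; kernels from 3.14 on also export a MemAvailable field, the kernel's own estimate, which the sketch prefers when it is present. The SAMPLE text is a subset copied from the question.

```python
# Sketch: estimate available memory from /proc/meminfo, summing MemFree,
# Buffers and Cached as the question proposes (an upper bound: not every
# cached page can be reclaimed, e.g. tmpfs/shared memory).
from typing import Dict

SAMPLE = """MemTotal: 5984256 kB
MemFree: 137052 kB
Buffers: 140484 kB
Cached: 3439616 kB
SwapCached: 244 kB
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_kb(info: Dict[str, int]) -> int:
    """Prefer the kernel's MemAvailable estimate; otherwise free+buffers+cached."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    print(available_kb(parse_meminfo(SAMPLE)) // 1024)  # → 3630 (MiB)
```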

drupal bootstrap script: how to get list of all nodes of type x?

- by groovehunter

Hi. I am creating a custom import and export, at the moment as an external script (via bootstrap); I plan to turn it into a more generic module later on. I am building a frontend for Nagios, and for our host management and Nagios configuration, by the way. Maybe it will become useful for other environments (network management). Now I need to know: how do I get a list of all nodes of type x? I want to avoid direct SQL. One suggestion I got was to make an RSS feed and parse it, but I access the Drupal DB a dozen times to extract various nodes, so it feels strange to do a web request for one thing. So what I am looking for, as a newbie Drupal dev, is just a pointer to the basic search module API for this task. TIA, florian

Log transport and aggregation at scale

- by markdrayton

How're you analysing log files from UNIX/Linux machines? We run several hundred servers which all generate their own log files, either directly or through syslog. I'm looking for a decent solution to aggregate these and pick out important events. This problem breaks down into 3 components:

1) Message transport

The classic way is to use syslog to log messages to a remote host. This works fine for applications that log to syslog but is less useful for apps that write to a local file. Solutions for this might include having the application log to a FIFO connected to a program that sends the messages using syslog, or writing something that will grep the local files and send the output to the central syslog host. However, if we go to the trouble of writing tools to get messages into syslog, would we be better off replacing the whole lot with something like Facebook's Scribe, which offers more flexibility and reliability than syslog?

2) Message aggregation

Log entries seem to fall into one of two types: per-host and per-service. Per-host messages are those which occur on one machine; think disk failures or suspicious logins. Per-service messages occur on most or all of the hosts running a service. For instance, we want to know when Apache finds an SSI error, but we don't want the same error from 100 machines. In all cases we only want to see one of each type of message: we don't want 10 messages saying the same disk has failed, and we don't want a message each time a broken SSI is hit. One approach to solving this is to aggregate multiple messages of the same type into one on each host, send the messages to a central server, and then aggregate messages of the same kind into one overall event. SER can do this, but it's awkward to use. Even after a couple of days of fiddling I had only rudimentary aggregations working and had to constantly look up the logic SER uses to correlate events. It's powerful but tricky stuff: I need something which my colleagues can pick up and use in the shortest possible time. SER rules don't meet that requirement.

3) Generating alerts

How do we tell our admins when something interesting happens? Mail the group inbox? Inject into Nagios?

So, how're you solving this problem? I don't expect an answer on a plate; I can work out the details myself, but some high-level discussion on what is surely a common problem would be great. At the moment we're using a mishmash of cron jobs, syslog and who knows what else to find events. This isn't extensible, maintainable or flexible, and as such we miss a lot of stuff we shouldn't.

Updated: we're already using Nagios for monitoring, which is great for detecting down hosts, testing services, etc., but less useful for scraping log files. I know there are log plugins for Nagios, but I'm interested in something more scalable and hierarchical than per-host alerts.
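The two-stage aggregation described under 2) can be illustrated with a small sketch: normalise each message so that variants of the same event compare equal, then report each distinct message once, with how many times it was seen and on which hosts. This is a toy illustration, not SER or Scribe; the digit-masking normalisation rule is an assumption, and real log lines would need better rules. A pattern seen on many hosts behaves like a per-service event, while one seen on a single host is per-host.

```python
# Toy sketch of collapsing duplicate log messages across hosts.
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

_DIGITS = re.compile(r"\d+")


def normalise(message: str) -> str:
    """Assumed rule: mask digit runs so near-identical messages match."""
    return _DIGITS.sub("N", message.strip())


def aggregate(events: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, Set[str]]]:
    """Collapse (host, message) pairs into (pattern, count, hosts), busiest first."""
    counts: Dict[str, int] = defaultdict(int)
    hosts: Dict[str, Set[str]] = defaultdict(set)
    for host, message in events:
        key = normalise(message)
        counts[key] += 1
        hosts[key].add(host)
    return sorted(((key, counts[key], hosts[key]) for key in counts),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("web1", "SSI error in /index.shtml line 12"),
        ("web2", "SSI error in /index.shtml line 12"),
        ("web3", "SSI error in /index.shtml line 40"),
        ("db1", "disk sdb failed"),
        ("db1", "disk sdb failed"),
    ]
    for pattern, count, seen_on in aggregate(sample):
        print(f"{count}x on {len(seen_on)} host(s): {pattern}")
```

The same function could run once per host and again on the central server over the per-host summaries, which matches the two-level scheme in the question.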

Production monitoring for EC2 instances

- by Janine

I'm setting up my first production instance on EC2 and want to make sure I have all necessary monitoring in place. There are three different types of things I want to monitor:

1. Is the instance running? EC2 instances can be terminated without warning if the underlying hardware fails, and as far as I know they aren't automatically restarted. If not, start it back up.
2. Is UNIX running properly? This is the usual stuff about CPU load, disk space, etc.
3. Is the website responding? If not, restart it.

I initially set up Nagios on a physical server outside the cloud, but it is really only helpful for item 2. It can tell me if the instance is gone or if the website is not responding, but as far as I can tell it can't execute any commands to fix the situation. My Googling on this subject has yielded a plethora of options: Cacti, Monit, God, Ganglia, and probably more I'm forgetting now. I don't have time to research them all. I am aware of Amazon's CloudWatch, but it doesn't seem to do anything that my Nagios installation doesn't already do. If you already have something like this in place, can you please share what has worked well for you?

Monitoring AWS Systems Behind ElasticBeanStalk

- by A. Avadis

So I'm getting a company set up in the Amazon cloud: creating IaaS protocols/solutions/standardized implementations, etc., while also being the sysadmin for individual systems, app environments, and day-to-day uptime. One of the biggest issues I'm having is tracking various system/application logs, as well as logging/monitoring/archiving system metrics like memory usage, CPU usage, etc., in a centralized fashion (e.g. Nagios + Urchin). The BIGGEST impediment to my endeavors is the following: the company application is deployed as a Java *.WAR file, uploaded to an Elastic Beanstalk application environment that load-balances and auto-scales between 3 (min) and 10 (max) servers, and the EC2 instances that run the application are fired up and disposed of ad hoc. That is to say, I can't monitor the individual EC2 instances for very long because so many are being terminated and then auto-provisioned/auto-scaled on the fly, so I'd constantly have to "monitor what I'm monitoring" and continuously remove/add EC2 machine addresses to my monitoring lists. Is there some way to use monitoring tools like Zabbix or Nagios to monitor the Elastic Beanstalk environment, and have them automatically add new EC2 instances and remove terminated/failed instances from the monitoring list? Furthermore, is there anything I can do with Graylog to achieve similar results with the aggregation/centralization of my application logs from multiple EC2 instances into ONE consolidated set of logs/events? If not Graylog, is there ANYTHING like Graylog that can automatically detect which EC2 members are being added to or removed from the environment, and collect the logs from them automatically? Any and all advice or direction is appreciated. Thanks much, and cheers!!

NFS-shared file-system is locking up

- by fredden

Our NFS-shared file-system is locking up. Please feel free to ask any questions you feel are relevant. :) At the time, there are a lot of processes in "disk sleep" state, and the load averages on our machines skyrocket. The machines are responsive on SSH, but the majority of our websites (apache+mod_php) just hang, as does our email system (exim+dovecot). Any websites which don't require write access to the file-system continue to operate. The load averages continue to rise until some kind of time-out is reached, which takes at least 10-15 minutes. I've seen load averages over 800, yet the machines are still responsive for actions which don't require writing to the shared file-system. I've been investigating a variety of options, which have all turned out to be red herrings: nagios, proftpd, bind, cron tasks. I'm seeing these messages in the file server's system log:

    Jul 30 09:37:17 fs0 kernel: [1810036.560046] statd: server localhost not responding, timed out
    Jul 30 09:37:17 fs0 kernel: [1810036.560053] nsm_mon_unmon: rpc failed, status=-5
    Jul 30 09:37:17 fs0 kernel: [1810036.560064] lockd: cannot monitor node2
    Jul 30 09:38:22 fs0 kernel: [1810101.384027] statd: server localhost not responding, timed out
    Jul 30 09:38:22 fs0 kernel: [1810101.384033] nsm_mon_unmon: rpc failed, status=-5
    Jul 30 09:38:22 fs0 kernel: [1810101.384044] lockd: cannot monitor node0

Software involved:

- VMWare, Debian lenny (64bit), ancient Red Hat (32 bit) (version 7 I believe), Debian etch (32bit)
- NFS, apache2+mod_php, exim, dovecot, bind, amanda, proftpd, nagios, cacti, drbd, heartbeat, keepalived, LVS, cron, ssmtp, NIS, svn, puppet, memcache, mysql, postgres
- Joomla!, Magento, Typo3, Midgard, Symfony, custom php apps

pdns-recursor allocates resources to non-existing queries

- by azzid

I've got a lab server running pdns-recursor. I set it up to experiment with rate limiting, so it has been resolving requests openly from the whole internet for weeks. My idea was that sooner or later it would get abused, giving me a real use case to experiment with. To keep track of the usage I set up Nagios to monitor the number of concurrent-queries on the server. Today I got notice from Nagios that my specified limit had been reached. I logged in to start trimming away the malicious queries I was expecting; however, when I started looking, I couldn't see the expected traffic. What I found is that even though the server registers over 20 concurrent-queries, I see no requests in the logs. The following command describes the situation well:

    $ sudo rec_control get concurrent-queries; sudo rec_control top-remotes
    22
    Over last 0 queries:

How can there be 22 concurrent-queries when the server has 0 queries registered?

EDIT: Figured it out! To get top-remotes working I needed to set:

    #################################
    # remotes-ringbuffer-entries   maximum number of packets to store statistics for
    #
    remotes-ringbuffer-entries=100000

It defaults to 0, which means no information is stored to base the top-remotes statistics on.

Is there a monitoring software suite that will alert me if it has received no activity in a time period?

- by matt b

This might be a very basic question, but I am not very familiar with the exact features of Nagios versus Munin versus other monitoring tools. Let's say we have a process that needs to run daily for some very important infrastructure reasons. We've had cases where the process did not run or was otherwise down for a number of days before anyone noticed. I'd like to set up a system that will enable me to easily know when the daily run did not take place for some reason. I can set up this process to send an email on every successful run (or every failed run), but I do not trust that the people receiving this email would notice an absence of an "I'm OK" message. What I am envisioning is some type of "tripwire" service which this V.I.P. (very-important-process) can send a status message to each time it runs, whether successfully or not; and if the "tripwire" service has not received any word from the VIP within a configurable amount of time, it can then send an alert to someone. (The difference between what I envision and the first approach I outlined is a service that sends a message only in abnormal conditions, rather than a service that sends messages each day that the status is normal/OK). Can Nagios be set up to send an alert like this, if it has not heard from a certain service/device/process in N days? Is there another tool out there which does have this feature?
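The "tripwire" described here is often called a dead man's switch. In Nagios it is usually built from a passive service check with freshness checking enabled (the check_freshness and freshness_threshold directives), so that when no result arrives in time Nagios runs an active check that can report CRITICAL. A simpler pull-style variant is sketched below as a plugin: it assumes the important process touches a heartbeat file each time it runs (the file path and the max-age argument are hypothetical) and exits with the standard plugin codes, going CRITICAL when the file is missing or older than the allowed age.

```python
# Sketch of a Nagios-style freshness plugin: CRITICAL when a heartbeat file,
# touched by the important job on every run, is missing or too old.
import os
import sys
import time
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard plugin exit codes


def check_heartbeat(path: str, max_age_s: float,
                    now: Optional[float] = None) -> Tuple[int, str]:
    """Return (exit code, status line) for the heartbeat file at path."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return CRITICAL, f"CRITICAL - heartbeat file {path} not found"
    hours = age / 3600
    if age > max_age_s:
        return CRITICAL, f"CRITICAL - last run {hours:.1f} h ago"
    return OK, f"OK - last run {hours:.1f} h ago"


if __name__ == "__main__":
    # Hypothetical usage: check_heartbeat.py /var/tmp/vip.heartbeat 93600
    # (26 hours, giving a daily job some slack); the job runs `touch` on the file.
    code, text = check_heartbeat(sys.argv[1], float(sys.argv[2]))
    print(text)
    sys.exit(code)
```

Because the check only looks at the file's age, it catches both a job that fails and a job that never starts, which is the absence-of-news case the question is about.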

What should I use to ping multiple IPs and get notified of time outs?

- by HumanVirus

I've been using MultiPing to ping hundreds of IPs (from access points and such) and check their performance (packet loss, latency) and uptime. The program is very easy to use, but I was wondering if someone could recommend something that would work better and that would also work on Linux. The features I'm looking for are:

- Notification types: at least desktop notifications and SMS, but it would be great if it also had e-mail, IM, or other types of notifications. (MultiPing has some of these, but they don't work too well.)
- Being notified about the root problem only: since some devices depend on others, I'd like to be notified only about the root problem. E.g. let's say I have A [x.x.x.222] -> B [x.x.x.33] -> C [x.x.x.44] -> D [x.x.x.55], and B goes down, so C and D will also be down. Is it possible to get a notification only about B being down?
- Light on resources.
- Ideally multiplatform, or at least available for both Linux and Windows.

I've heard about Nagios and Shinken being used for monitoring. Would you recommend that I use something of the sort, or would that be too much for my needs? If using Nagios, Shinken, or similar software is recommended, can anyone tell me what sites I should go to or what books I should get that would be good for someone who is totally new at this? I'd appreciate any suggestions.
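Suppressing alerts for devices behind a failed device is what Nagios's parents host directive is for: hosts whose path to the monitoring server runs through a down parent are marked UNREACHABLE rather than DOWN. The core idea can be sketched in a few lines, assuming each device has at most one upstream parent; the topology below is the A -> B -> C -> D chain from the question, and the function reports only the down hosts whose parent is up.

```python
# Sketch: report only root-cause outages in a parent/child topology.
from typing import Dict, List, Optional, Set


def root_causes(parents: Dict[str, Optional[str]], down: Set[str]) -> List[str]:
    """Return down hosts whose parent is up (or which have no parent)."""
    roots = []
    for host in sorted(down):
        parent = parents.get(host)
        if parent is None or parent not in down:
            roots.append(host)   # nothing upstream explains this outage
    return roots


if __name__ == "__main__":
    chain = {"A": None, "B": "A", "C": "B", "D": "C"}
    print(root_causes(chain, {"B", "C", "D"}))  # → ['B']
```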

