markdrayton - Developer IT

Search Results

Search found 2 results on 1 pages for 'markdrayton'.

Page 1/1 | 1

Log transport and aggregation at scale

- by markdrayton

How're you analysing log files from UNIX/Linux machines? We run several hundred servers which all generate their own log files, either directly or through syslog. I'm looking for a decent solution to aggregate these and pick out important events. This problem breaks down into 3 components: 1) Message transport The classic way is to use syslog to log messages to a remote host. This works fine for applications that log into syslog but less useful for apps that write to a local file. Solutions for this might include having the application log into a FIFO connected to a program to send the message using syslog, or by writing something that will grep the local files and send the output to the central syslog host. However, if we go to the trouble of writing tools to get messages into syslog would we be better replacing the whole lot with something like Facebook's Scribe which offers more flexibility and reliability than syslog? 2) Message aggregation Log entries seem to fall into one of two types: per-host and per-service. Per-host messages are those which occur on one machine; think disk failures or suspicious logins. Per-service messages occur on most or all of the hosts running a service. For instance, we want to know when Apache finds an SSI error but we don't want the same error from 100 machines. In all cases we only want to see one of each type of message: we don't want 10 messages saying the same disk has failed, and we don't want a message each time a broken SSI is hit. One approach to solving this is to aggregate multiple messages of the same type into one on each host, send the messages to a central server and then aggregate messages of the same kind into one overall event. SER can do this but it's awkward to use. Even after a couple of days of fiddling I had only rudimentary aggregations working and had to constantly look up the logic SER uses to correlate events. It's powerful but tricky stuff: I need something which my colleagues can pick up and use in the shortest possible time. SER rules don't meet that requirement. 3) Generating alerts How do we tell our admins when something interesting happens? Mail the group inbox? Inject into Nagios? So, how're you solving this problem? I don't expect an answer on a plate; I can work out the details myself but some high-level discussion on what is surely a common problem would be great. At the moment we're using a mishmash of cron jobs, syslog and who knows what else to find events. This isn't extensible, maintainable or flexible and as such we miss a lot of stuff we shouldn't. Updated: we're already using Nagios for monitoring which is great for detected down hosts/testing services/etc but less useful for scraping log files. I know there are log plugins for Nagios but I'm interested in something more scalable and hierarchical than per-host alerts.

Read the article

Concatenating gziped Apache logs

- by markdrayton

We rotate and compress our Apache logs each day but it's become apparent that this isn't frequently enough. An uncompressed log is about 6G, which is getting close to filling our log partition (yep, we'll make it bigger in the future!) as well as taking a lot of time and CPU to compress each day. We have to produce a gziped log for each day for our stats processing. Obviously we could move our logs to a partition with more space but I also want to spread the compression overhead throughout the day. Using Apache's rotatelogs we can rotate and compress the log more often -- hourly, say -- but how can I concatenate all the hourly compressed logs into a running compressed log for the day, without decompressing the previous logs? I don't want to uncompress 24 hours' worth of data and recompress it because that has all the disadvantages of our current solution. Gzip doesn't seem to offer any append or concatenate option but perhaps I've missed something obvious. This question suggests straight shell concatenation "works" in that the archive can be decompressed but that gzip -l doesn't work seems a bit dodgy. Alternatively, perhaps this is still a bad way to do things. Other suggestions are welcome -- our only constraints are our relatively small log partitions and the need to provide a daily compressed log.