Developing an analytics system that processes large amounts of data - where to start

Posted by Ryan on Programmers
Published on 2012-07-10T19:00:31Z

Imagine you're writing some sort of web analytics system: you record raw page hits along with some extra data, such as tagging cookies, and then produce stats such as:

  • Which pages got the most traffic over a time period
  • Which referrers sent the most traffic
  • Goals completed (a goal being a view of a particular page)
  • And more advanced things, like which referrers sent the most visitors who later hit a goal.

The naive approach would be to throw it all into a relational database and run queries over it, but that won't scale.
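To make the relational approach concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `hits` table, its columns (`visitor_id`, `ts`, `page`, `referrer`), the function names and the sample data are all assumptions made up for illustration; nothing here comes from a real system.

```python
import sqlite3

def build_db(hits):
    """Load (visitor_id, ts, page, referrer) tuples into an in-memory table.

    The schema is hypothetical: ts is a Unix timestamp, referrer may be NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (visitor_id TEXT, ts INTEGER, page TEXT, referrer TEXT)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", hits)
    return conn

def top_pages(conn, start, end):
    """Pages ranked by hit count for start <= ts < end."""
    return conn.execute(
        "SELECT page, COUNT(*) AS n FROM hits "
        "WHERE ts >= ? AND ts < ? "
        "GROUP BY page ORDER BY n DESC, page",
        (start, end),
    ).fetchall()

def referrers_by_goal_visitors(conn, goal_page):
    """Referrers ranked by distinct visitors who later viewed goal_page.

    The self-join pairs each referred hit with a later goal hit by the
    same visitor.
    """
    return conn.execute(
        "SELECT h.referrer, COUNT(DISTINCT h.visitor_id) AS n "
        "FROM hits h JOIN hits g "
        "  ON g.visitor_id = h.visitor_id AND g.page = ? AND g.ts > h.ts "
        "WHERE h.referrer IS NOT NULL "
        "GROUP BY h.referrer ORDER BY n DESC, h.referrer",
        (goal_page,),
    ).fetchall()

# Made-up sample data: (visitor_id, ts, page, referrer)
SAMPLE_HITS = [
    ("v1", 100, "/home", "google.com"),
    ("v1", 110, "/signup-done", None),
    ("v2", 120, "/home", "twitter.com"),
    ("v3", 130, "/pricing", "google.com"),
    ("v3", 140, "/signup-done", None),
    ("v2", 150, "/pricing", None),
]

if __name__ == "__main__":
    conn = build_db(SAMPLE_HITS)
    print(top_pages(conn, 0, 125))
    print(referrers_by_goal_visitors(conn, "/signup-done"))
```

Every report here scans or joins the raw `hits` table at query time, which is presumably where this approach starts to hurt as the table grows.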

You could pre-calculate everything (have a queue of incoming 'hits' and use it to update report tables), but what if you later change a goal? How could you efficiently recalculate just the data that would be affected?
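As a rough sketch of the pre-calculation idea, the code below updates per-day counters as each hit arrives and, when a goal changes, rebuilds only that goal's counters from the raw hits. The `Reports` class, its method names, the daily bucket size and the hit tuple layout are assumptions for illustration. Raw hits are kept in memory here for simplicity; a real system would presumably keep them in some durable append-only store.

```python
from collections import Counter, defaultdict

SECONDS_PER_DAY = 86400  # assumed bucket size for the report tables

class Reports:
    """Hypothetical pre-aggregated report tables fed one hit at a time."""

    def __init__(self, goals):
        self.goals = dict(goals)             # goal name -> page path
        self.raw_by_day = defaultdict(list)  # day -> raw hits, kept for replays
        self.page_hits = Counter()           # (day, page) -> hit count
        self.goal_hits = Counter()           # (goal name, day) -> completions

    def record(self, hit):
        """Consume one (visitor_id, ts, page, referrer) hit from the queue."""
        _visitor_id, ts, page, _referrer = hit
        day = ts // SECONDS_PER_DAY
        self.raw_by_day[day].append(hit)
        self.page_hits[(day, page)] += 1
        for goal, goal_page in self.goals.items():
            if page == goal_page:
                self.goal_hits[(goal, day)] += 1

    def change_goal(self, goal, new_page):
        """Redefine one goal and rebuild only that goal's counters.

        Page counters and other goals are left untouched; only rows keyed
        by this goal are dropped and recomputed from the raw hits.
        """
        self.goals[goal] = new_page
        for key in [k for k in self.goal_hits if k[0] == goal]:
            del self.goal_hits[key]
        for day, hits in self.raw_by_day.items():
            n = sum(1 for h in hits if h[2] == new_page)
            if n:
                self.goal_hits[(goal, day)] = n
```

A possible usage, with made-up hits:

```python
reports = Reports({"signup": "/signup-done"})
reports.record(("v1", 10, "/home", "google.com"))
reports.record(("v1", 20, "/signup-done", None))
reports.change_goal("signup", "/home")  # only the "signup" rows are rebuilt
```

Because the goal counters are keyed by goal name, a change touches only that goal's rows, but it still needs the raw hits to replay from, so this sketch does not avoid storing them.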

Obviously this has been done before ;) so I'd appreciate any tips on where to start: methods and examples, architecture, technologies, etc.

© Programmers or respective owner
