Architecture for database analytics

Posted by David Cournapeau on Stack Overflow
Published on 2010-04-21T06:52:34Z Indexed on 2010/04/21 8:53 UTC

Hi,

We have an architecture in which we provide each customer (an internet merchant) with Business Intelligence-like services for their website. I now need to analyze that data internally (for algorithmic improvement, performance tracking, etc.), and the volume is potentially quite heavy: we have up to millions of rows per customer per day, and I may want to know, for example, how many queries we had in the last month, compared week by week. That is on the order of billions of entries, if not more.
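To make that kind of question concrete, here is a minimal sketch of a "queries per week over the last month" count. It uses Python's built-in `sqlite3` module purely for illustration; the `query_log` table, its columns, and the sample rows are hypothetical and not our actual schema, and SQLite is obviously not a candidate for billions of rows.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical query_log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE query_log (customer_id INTEGER, ts TEXT)")
    rows = [
        (1, "2010-04-05 10:00:00"),
        (2, "2010-04-06 11:30:00"),
        (1, "2010-04-12 09:15:00"),
        (1, "2010-04-20 18:45:00"),
    ]
    conn.executemany("INSERT INTO query_log VALUES (?, ?)", rows)
    return conn


def weekly_query_counts(conn, start, end):
    """Return (year-week, count) pairs for start <= ts < end, oldest week first."""
    sql = """
        SELECT strftime('%Y-%W', ts) AS week, COUNT(*) AS n
        FROM query_log
        WHERE ts >= ? AND ts < ?
        GROUP BY week
        ORDER BY week
    """
    return conn.execute(sql, (start, end)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    for week, n in weekly_query_counts(conn, "2010-04-01", "2010-05-01"):
        print(week, n)
```

The query itself is trivial; the difficulty is running this sort of aggregation over every customer's data at the volumes described above.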

The way it is currently done is quite standard: daily scripts scan the databases and generate big CSV files. I don't like this solution for several reasons:

  • as is typical with this kind of script, they fall into the write-once, never-touched-again category
  • tracking things in "real time" is necessary, and daily scripts cannot provide it (we currently have a separate toolset to query the last few hours)
  • it is slow and not "agile"

Although I have some experience dealing with huge datasets for scientific use, I am a complete beginner as far as traditional RDBMSs go. It seems that a column-oriented database could be a solution for the analytics (which don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of problem.
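As a toy illustration of why a column-oriented layout looks attractive here, the sketch below stores the same records row by row and column by column in plain Python. All field names and values are made up, and this is only a model of the idea, not how any real column store is implemented: the point is that an aggregate over one field only has to touch that field's values.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_latency_from_rows(rows):
    """Row layout: every full record is visited to read a single field."""
    return sum(row["latency_ms"] for row in rows)


def total_latency_from_columns(columns):
    """Column layout: only the one relevant column is read."""
    return sum(columns["latency_ms"])


# Hypothetical records; the analytics only care about latency_ms.
ROWS = [
    {"customer_id": 1, "url": "/a", "latency_ms": 120, "session": "s1"},
    {"customer_id": 1, "url": "/b", "latency_ms": 80, "session": "s2"},
    {"customer_id": 2, "url": "/a", "latency_ms": 200, "session": "s3"},
]

if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(total_latency_from_rows(ROWS), total_latency_from_columns(columns))
```

Both functions give the same answer; what differs is how much data has to be read to get it, which matters when most of each row is irrelevant to the analysis.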

© Stack Overflow or respective owner
