Architecture for analysing search result impressions/clicks to improve future searches

Posted by Hais on Stack Overflow
Published on 2011-06-29T07:26:38Z, indexed on 2011-06-29 08:22 UTC

We have a large database of items (10m+) stored in MySQL and intend to implement search over the metadata of these items, taking advantage of something like Sphinx. The dataset changes slightly each day, so Sphinx will re-index daily.
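For the daily re-indexing side, a minimal sketch of what a Sphinx setup over the MySQL items table might look like (the table, column and path names are placeholders, not from the post):

```
# Hypothetical sphinx.conf fragment for indexing item metadata
source items
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = catalog
    sql_query = SELECT id, title, description FROM items
}

index items
{
    source = items
    path   = /var/lib/sphinx/items
}
```

The daily rebuild would then be a cron entry along the lines of `0 3 * * * indexer --rotate items`, so the new index swaps in without stopping searches.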

However, we want the system to self-learn and improve search results by analysing impression and click data, so that we provide better results for our customers on that search term, and possibly on similar search terms too.

I've been reading up on Hadoop and it seems to have the potential to crunch all this data, although I'm still unsure how to approach it. Amazon has tutorials for compiling impression vs. click data using MapReduce, but I can't see how to get this data into a usable format.
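To make the impression-vs-click aggregation concrete, here is a sketch of the MapReduce logic as plain Python functions (the same shape would run as a Hadoop Streaming job). The tab-separated log format of `(term, item_id, event)` is an assumption, not something from the post:

```python
# Hadoop-Streaming-style aggregation of search logs into per-(term, item)
# click-through rates, written as plain Python so the logic is testable.
# Assumed log line format: "term\titem_id\tevent" (an illustration only).
from collections import defaultdict

def map_events(log_lines):
    """Emit ((term, item_id), (impressions, clicks)) per log line."""
    for line in log_lines:
        term, item_id, event = line.strip().split("\t")
        if event == "impression":
            yield (term, item_id), (1, 0)
        elif event == "click":
            yield (term, item_id), (0, 1)

def reduce_ctr(pairs):
    """Sum the counts per key and derive a click-through rate."""
    totals = defaultdict(lambda: [0, 0])
    for key, (imp, clk) in pairs:
        totals[key][0] += imp
        totals[key][1] += clk
    return {key: (clk / imp if imp else 0.0)
            for key, (imp, clk) in totals.items()}

logs = [
    "shoes\t42\timpression",
    "shoes\t42\timpression",
    "shoes\t42\tclick",
    "shoes\t7\timpression",
]
ctr = reduce_ctr(map_events(logs))
print(ctr[("shoes", "42")])  # 0.5
```

The reducer's output (a CTR per search term and item) is exactly the "compiled analytics" the next step would query; in a real job it would be written back to MySQL or a key-value store on each hourly run.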

My idea is that when a search term comes in, I query Sphinx to get all the matching items from the dataset, then query the analytics (compiled hourly or similar) to find the most popular items for that search term, and finally cache the final results using something like Memcached, Membase or similar.
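The per-query flow above can be sketched in a few lines. None of these names are real APIs; `sphinx_search`, `load_popularity` and the cache object are hypothetical stand-ins for the Sphinx client, the compiled analytics lookup, and Memcached/Membase:

```python
# Hypothetical sketch of the per-query flow: fetch candidate ids from
# Sphinx, re-rank them by the hourly popularity stats, cache the result.

class DictCache:
    """In-memory stand-in for Memcached/Membase."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value, ttl):
        self._store[key] = value  # ttl ignored in this sketch

def rank_results(term, sphinx_search, load_popularity, cache, ttl=3600):
    cached = cache.get(term)
    if cached is not None:
        return cached                       # serve the cached ordering
    candidates = sphinx_search(term)        # item ids matching the metadata
    popularity = load_popularity(term)      # {item_id: CTR} from analytics
    ranked = sorted(candidates,
                    key=lambda item: popularity.get(item, 0.0),
                    reverse=True)
    cache.set(term, ranked, ttl)
    return ranked

# Toy usage with hard-coded stand-ins for Sphinx and the analytics store.
cache = DictCache()
results = rank_results(
    "shoes",
    sphinx_search=lambda t: [7, 42, 99],
    load_popularity=lambda t: {42: 0.5, 99: 0.1},
    cache=cache,
)
print(results)  # [42, 99, 7]
```

One design note: caching the *final* ordering (rather than the raw Sphinx results) means the cache TTL should be no longer than the analytics compilation interval, or the popularity boosts will lag behind the freshest click data.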

Am I along the right lines here?

