Architecture for analysing search result impressions/clicks to improve future searches

Posted by Hais on Stack Overflow
Published on 2011-06-29T07:26:38Z, indexed on 2011-06-29 08:22 UTC

We have a large database of items (10m+) stored in MySQL and intend to implement search over the metadata of these items, taking advantage of something like Sphinx. The dataset changes slightly each day, so Sphinx will re-index daily.
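For the daily re-indexing side, a minimal sketch of what a Sphinx setup over the MySQL items table might look like (the table, column and path names are placeholders, not from the post):

```
# Hypothetical sphinx.conf fragment for indexing item metadata
source items
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = catalog
    sql_query = SELECT id, title, description FROM items
}

index items
{
    source = items
    path   = /var/lib/sphinx/items
}
```

The daily rebuild would then be a cron entry along the lines of `0 3 * * * indexer --rotate items`, so the new index swaps in without stopping searches.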

However, we want the system to self-learn and improve search results by analysing impression and click data, so that we provide better results for our customers on that search term, and possibly on similar search terms too.

I've been reading up on Hadoop and it seems to have the potential to crunch all this data, although I'm still unsure how to approach it. Amazon has tutorials for compiling impression vs. click data using MapReduce, but I can't see how to get this data into a usable format.
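To make the impression-vs-click aggregation concrete, here is a sketch of the MapReduce logic as plain Python functions (the same shape would run as a Hadoop Streaming job). The tab-separated log format of `(term, item_id, event)` is an assumption, not something from the post:

```python
# Hadoop-Streaming-style aggregation of search logs into per-(term, item)
# click-through rates, written as plain Python so the logic is testable.
# Assumed log line format: "term\titem_id\tevent" (an illustration only).
from collections import defaultdict

def map_events(log_lines):
    """Emit ((term, item_id), (impressions, clicks)) per log line."""
    for line in log_lines:
        term, item_id, event = line.strip().split("\t")
        if event == "impression":
            yield (term, item_id), (1, 0)
        elif event == "click":
            yield (term, item_id), (0, 1)

def reduce_ctr(pairs):
    """Sum the counts per key and derive a click-through rate."""
    totals = defaultdict(lambda: [0, 0])
    for key, (imp, clk) in pairs:
        totals[key][0] += imp
        totals[key][1] += clk
    return {key: (clk / imp if imp else 0.0)
            for key, (imp, clk) in totals.items()}

logs = [
    "shoes\t42\timpression",
    "shoes\t42\timpression",
    "shoes\t42\tclick",
    "shoes\t7\timpression",
]
ctr = reduce_ctr(map_events(logs))
print(ctr[("shoes", "42")])  # 0.5
```

The reducer's output (a CTR per search term and item) is exactly the "compiled analytics" the next step would query; in a real job it would be written back to MySQL or a key-value store on each hourly run.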

My idea is that when a search term comes in, I query Sphinx to get all the matching items from the dataset, then query the analytics (compiled hourly or similar) to find the most popular items for that search term, and finally cache the final results using something like Memcached, Membase or similar.
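The per-query flow above can be sketched in a few lines. None of these names are real APIs; `sphinx_search`, `load_popularity` and the cache object are hypothetical stand-ins for the Sphinx client, the compiled analytics lookup, and Memcached/Membase:

```python
# Hypothetical sketch of the per-query flow: fetch candidate ids from
# Sphinx, re-rank them by the hourly popularity stats, cache the result.

class DictCache:
    """In-memory stand-in for Memcached/Membase."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value, ttl):
        self._store[key] = value  # ttl ignored in this sketch

def rank_results(term, sphinx_search, load_popularity, cache, ttl=3600):
    cached = cache.get(term)
    if cached is not None:
        return cached                       # serve the cached ordering
    candidates = sphinx_search(term)        # item ids matching the metadata
    popularity = load_popularity(term)      # {item_id: CTR} from analytics
    ranked = sorted(candidates,
                    key=lambda item: popularity.get(item, 0.0),
                    reverse=True)
    cache.set(term, ranked, ttl)
    return ranked

# Toy usage with hard-coded stand-ins for Sphinx and the analytics store.
cache = DictCache()
results = rank_results(
    "shoes",
    sphinx_search=lambda t: [7, 42, 99],
    load_popularity=lambda t: {42: 0.5, 99: 0.1},
    cache=cache,
)
print(results)  # [42, 99, 7]
```

One design note: caching the *final* ordering (rather than the raw Sphinx results) means the cache TTL should be no longer than the analytics compilation interval, or the popularity boosts will lag behind the freshest click data.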

Am I along the right lines here?

