Faceted search with Solr on Windows

Posted by Dr.NETjes on ASP.net Weblogs See other posts from ASP.net Weblogs or by Dr.NETjes
Published on Tue, 28 Dec 2010 23:12:00 GMT Indexed on 2010/12/29 1:54 UTC

With over 10 million hits a day, funda.nl is probably the largest ASP.NET website that uses Solr on a Windows platform.

While all our data (i.e. real estate properties) is stored in SQL Server, we're using Solr 1.4.1 to return the faceted search results as fast as we can.
And yes, Solr is very fast. In heavy stress tests, a single 64-bit Solr instance handled over 1,000 requests per second, and that includes converting search URLs to Solr HTTP queries and deserializing Solr's result XML back to .NET objects!

Let me tell you about faceted search and how to integrate Solr into a .NET/Windows environment. I'll bet it's easier than you think :-)

What is faceted search?

Faceted search is the clustering of search results into categories, allowing users to drill into search results. By showing the number of hits for each facet category, users can easily see how many results match that category.
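To make the idea concrete, here is a minimal sketch in Python (chosen for brevity, not because funda.nl uses it). The result list and its field names are made up for illustration; the sketch only shows what "counting hits per facet category" and "drilling down" mean, not how Solr computes them.

```python
from collections import Counter

# Hypothetical search results; the field names are illustrative only.
results = [
    {"id": 1, "type": "apartment", "bathrooms": 1},
    {"id": 2, "type": "house", "bathrooms": 2},
    {"id": 3, "type": "apartment", "bathrooms": 2},
    {"id": 4, "type": "apartment", "bathrooms": 1},
]

def facet_counts(docs, field):
    """Count how many matching documents fall into each value of `field`."""
    return Counter(doc[field] for doc in docs)

def drill_down(docs, field, value):
    """Narrow the result set to one facet value, as a user click would."""
    return [doc for doc in docs if doc[field] == value]

# Facet counts shown next to the full result list.
type_facets = facet_counts(results, "type")

# After the user picks "apartment", the remaining facets are recounted.
apartments = drill_down(results, "type", "apartment")
bathroom_facets = facet_counts(apartments, "bathrooms")

print(dict(type_facets))
print(dict(bathroom_facets))
```

Each drill-down step shrinks the result set, and the counts for the other facets are recomputed against what is left.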

If you're still a bit confused, this example from CNET explains it all:



The SQL solution for faceted search

Our ("pre-Solr") solution for faceted search was done by adding a lot of redundant columns to our SQL tables and doing a COUNT(...) for each of those columns:

 

So if a user was searching for real estate properties in the city 'Amsterdam', our facet-query would be something like:

SELECT COUNT(hasGarden), COUNT(has2Bathrooms), COUNT(has3Bathrooms), COUNT(etc...)
FROM Houses
WHERE city = 'Amsterdam'

While this solution worked fine for a couple of years, it wasn't easy for developers to add new facets. Also, running COUNTs over all matched rows only performs well if the table has a limited number of rows (i.e. fewer than a million).
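The query above can be tried out with a small sketch using Python's built-in sqlite3 module instead of SQL Server. The table layout and sample rows are assumptions for illustration. Note that COUNT(column) only counts non-NULL values, so this sketch assumes a flag column holds NULL when the property does not apply; the original schema may have worked differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Houses (id INTEGER PRIMARY KEY, city TEXT, "
    "hasGarden INTEGER, has2Bathrooms INTEGER, has3Bathrooms INTEGER)"
)

# Assumption: a flag column holds 1 when the property applies and NULL
# otherwise, because COUNT(column) skips NULL values.
conn.executemany(
    "INSERT INTO Houses (city, hasGarden, has2Bathrooms, has3Bathrooms) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Amsterdam", 1, 1, None),
        ("Amsterdam", None, None, 1),
        ("Amsterdam", 1, 1, None),
        ("Utrecht", 1, None, 1),
    ],
)

# One COUNT per redundant column yields all facet counts in a single pass.
row = conn.execute(
    "SELECT COUNT(hasGarden), COUNT(has2Bathrooms), COUNT(has3Bathrooms) "
    "FROM Houses WHERE city = ?",
    ("Amsterdam",),
).fetchone()

facets = dict(zip(("hasGarden", "has2Bathrooms", "has3Bathrooms"), row))
print(facets)
```

The sketch also shows the maintenance cost: every new facet needs another column in the table and another COUNT in the query.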

Enter Solr

"Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface." (quoted from Wikipedia's page on Solr)

Solr isn't a database; it's more like a big index. Every time you upload data to Solr, it analyzes the data and creates an inverted index from it (like the index pages of a book). This way Solr can look up data very quickly. Explaining the inner workings of Solr is beyond the scope of this post, but if you want to learn more, please visit the Solr Wiki pages.
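As a rough illustration of the book-index analogy, the following Python sketch builds a toy inverted index that maps each term to the set of documents containing it. This is a deliberate simplification and not how Lucene is implemented; the document texts are made up, and real analysis involves tokenizers, filters and per-field configuration.

```python
from collections import defaultdict

# Hypothetical documents: id -> text to be indexed.
docs = {
    3203: "Amsterdam Keizersgracht",
    3205: "Amsterdam Vondelstraat",
    4293: "Amsterdam Wibautstraat",
    5001: "Utrecht Oudegracht",
}

def build_inverted_index(documents):
    """Map each lowercased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_inverted_index(docs)

# Looking up a term is a single dictionary access, not a scan of all documents.
amsterdam_ids = sorted(index["amsterdam"])
print(amsterdam_ids)
```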

Getting faceted search results from Solr is very easy. First, let me show you how to send an HTTP query to Solr:

   http://localhost:8983/solr/select?q=city:Amsterdam

This will return an XML document containing the search results (in this example only three houses in the city of Amsterdam):

   <response>
      <result name="response" numFound="3" start="0">
         <doc>
            <long name="id">3203</long>
            <str name="city">Amsterdam</str>
            <str name="street">Keizersgracht</str>
            <int name="numberOfBathrooms">2</int>
         </doc>
         <doc>
            <long name="id">3205</long>
            <str name="city">Amsterdam</str>
            <str name="street">Vondelstraat</str>
            <int name="numberOfBathrooms">3</int>
         </doc>
         <doc>
            <long name="id">4293</long>
            <str name="city">Amsterdam</str>
            <str name="street">Wibautstraat</str>
            <int name="numberOfBathrooms">2</int>
         </doc>
      </result>
   </response>

By adding facet parameters for the field "numberOfBathrooms" to the query, Solr will also return the facet counts for this field. We will see that there are two houses in Amsterdam with two bathrooms and one house with three bathrooms.

   http://localhost:8983/solr/select?q=city:Amsterdam&facet=true&facet.field=numberOfBathrooms

The complete XML response from Solr now looks like:

   <response>
      <result name="response" numFound="3" start="0">
         <doc>
            <long name="id">3203</long>
            <str name="city">Amsterdam</str>
            <str name="street">Keizersgracht</str>
            <int name="numberOfBathrooms">2</int>
         </doc>
         <doc>
            <long name="id">3205</long>
            <str name="city">Amsterdam</str>
            <str name="street">Vondelstraat</str>
            <int name="numberOfBathrooms">3</int>
         </doc>
         <doc>
            <long name="id">4293</long>
            <str name="city">Amsterdam</str>
            <str name="street">Wibautstraat</str>
            <int name="numberOfBathrooms">2</int>
         </doc>
      </result>
      <lst name="facet_fields">
         <lst name="numberOfBathrooms">
            <int name="2">2</int>
            <int name="3">1</int>
         </lst>
      </lst>
   </response>
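The request and response above can be handled with a few lines of client code. The sketch below uses Python's standard library rather than .NET, purely to keep it short; the helper names build_facet_url and parse_facet_counts are hypothetical, and the sample XML is an abbreviated copy of the response shown above, so nothing here talks to a live Solr instance.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def build_facet_url(query, facet_field):
    """Build a select URL that also asks for facet counts on one field."""
    params = [("q", query), ("facet", "true"), ("facet.field", facet_field)]
    # urlencode percent-encodes the colon in "city:Amsterdam" as %3A,
    # which the server decodes back to the same query.
    return SOLR_SELECT_URL + "?" + urlencode(params)

def parse_facet_counts(xml_text, facet_field):
    """Return {facet value: hit count} for one facet field."""
    root = ET.fromstring(xml_text)
    counts = {}
    # Searching all <lst> descendants keeps this working even when the
    # facet list is nested deeper than in the abbreviated sample.
    for lst in root.iter("lst"):
        if lst.get("name") == facet_field:
            for item in lst.findall("int"):
                counts[item.get("name")] = int(item.text)
    return counts

# Abbreviated copy of the facet response shown above.
SAMPLE_RESPONSE = """
<response>
   <result name="response" numFound="3" start="0">
      <doc>
         <long name="id">3203</long>
         <str name="city">Amsterdam</str>
         <int name="numberOfBathrooms">2</int>
      </doc>
   </result>
   <lst name="facet_fields">
      <lst name="numberOfBathrooms">
         <int name="2">2</int>
         <int name="3">1</int>
      </lst>
   </lst>
</response>
"""

url = build_facet_url("city:Amsterdam", "numberOfBathrooms")
facets = parse_facet_counts(SAMPLE_RESPONSE, "numberOfBathrooms")
print(url)
print(facets)
```

The facet values come back as strings ("2", "3") because they are XML attribute values; convert them if your application needs numbers.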

Trying Solr for yourself

To run Solr on your local machine and experiment with it, you should read the Solr tutorial. The tutorial takes only about an hour, in which you install Solr, upload sample data and get some query results. And yes, it works on Windows without a problem.

Note that the Solr tutorial uses Jetty as the Java servlet container (that's why you must start it using "java -jar start.jar"). In our environment we prefer to host Solr in Apache Tomcat, which installs as a Windows service and works more like .NET developers expect. See the SolrTomcat page.

Some best practices for running Solr on Windows:

  • Use the 64-bit version of Tomcat. In our tests, this doubled the req/sec we were able to handle!
  • Use a .NET XmlReader to convert Solr's XML output stream to .NET objects. Don't use XPath; it won't scale well.
  • Use filter queries ("fq" parameter) instead of the normal "q" parameter where possible. Filter queries are cached by Solr and will speed up Solr's response time (see FilterQueryGuidance).

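The last two practices can be sketched in Python as a rough analogue. The streaming parser iterparse from the standard library stands in for a forward-only .NET XmlReader here, and the helper names build_filtered_url and stream_docs are hypothetical. The sketch assumes every field in a doc is a simple single-valued element; multi-valued fields would need extra handling.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def build_filtered_url(base_url, filters, query="*:*"):
    """Put restrictions in repeated fq parameters; q stays a match-all query."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return base_url + "?" + urlencode(params)

def stream_docs(xml_stream):
    """Yield one dict per <doc>, reading the stream forward-only."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "doc":
            # Assumption: each child is a single-valued field element.
            yield {field.get("name"): field.text for field in elem}
            elem.clear()  # drop the parsed children to keep memory flat

# Made-up response body, shaped like the examples earlier in this post.
sample = b"""<response><result name="response" numFound="2" start="0">
<doc><long name="id">3203</long><str name="city">Amsterdam</str></doc>
<doc><long name="id">3205</long><str name="city">Amsterdam</str></doc>
</result></response>"""

url = build_filtered_url(
    "http://localhost:8983/solr/select",
    ["city:Amsterdam", "numberOfBathrooms:2"],
)
docs = list(stream_docs(io.BytesIO(sample)))
print(url)
print(docs)
```

Splitting restrictions into separate fq parameters lets each one be cached and reused independently across different searches.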
In my next post I’ll talk about how to keep Solr's indexed data in sync with the data in your SQL tables. Timestamps / rowversions will help you out here!

© ASP.net Weblogs or respective owner