How to setup Lucene search for a B2B web app?

Posted by Bill Paetzke on Stack Overflow See other posts from Stack Overflow or by Bill Paetzke
Published on 2010-04-25T03:34:23Z Indexed on 2010/04/25 3:43 UTC
Read the original article Hit count: 335

Given:

  • 5000 databases (spread out over 5 servers)
  • 1 database per client (so you can infer there are 1000 clients)
  • 2 to 2000 users per client (let's say avg is 100 users per client)
  • Clients (databases) come and go every day (let's assume most remain for at least one year)
  • Let's stay agnostic of language or sql brand, since Lucene (and Solr) have a breadth of support

The Question:

How would you setup Lucene search so that each client can only search within its database?

How would you setup the index(es)?
Would you need to add a filter to all search queries?
If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet)

Possible Solutions:

Make an index for each client (database)

  • Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data.
  • Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope.

Have a single, gigantic index with a database_name field. Always include database_name as a filter.

  • Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info.
  • Con: Search is slower (than index-per-client method). Flawed security if query filter removed.

For Example:

Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database.

His situation is quite similar to mine. Although, he didn't elaborate on the setup (particularly indices); hence, the need for this question.

One last thing:
I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.

© Stack Overflow or respective owner

Related posts about lucene

Related posts about solr