I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in a C++ desktop application. Hence, I’m looking for a fast full-text search solution. Ideally, it should:
Skip common words, such as the, of, and, etc.
Support stemming, i.e. search for run also finds documents containing runner, running and ran.
Be able to update its index in the background as new documents are added to the database.
Be able to provide search word suggestions (like Google Suggest)
To illustrate, assume the database has just two documents:
Document 1: This is a test of text search.
Document 2: Testing is fun.
The following words should be in the index: fun, search, test, testing, text. If the user types t in the search box, I want the application to be able to suggest test, testing and text (Ideally, the application should be able to query the search engine for the 10 most common search words starting with t). A search for testing should return both documents.
Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).