Lucene: Fastest way to return the document occurrence of a phrase?

Posted by dont say the kid's name on Stack Overflow
Published on 2010-05-09T05:00:48Z Indexed on 2010/05/09 7:58 UTC

Hi Guys,

I am trying to use Lucene (actually PyLucene!) to find out how many documents contain my exact phrase. My code currently looks like this, but it runs rather slowly. Does anyone know a faster way to return document counts?

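phraseList = ["some phrase 1", "some phrase 2"]  # etc., a list of phrases...

# Open the index read-only (second argument) and build the parser once.
countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True)
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)
parser = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer)

for phrase in phraseList:
    # Wrapping the phrase in double quotes makes the parser build a phrase query.
    query = parser.parse('"' + phrase + '"')
    # Fetch up to 200 top-scoring hits and count them.
    scoreDocs = countsearcher.search(query, 200).scoreDocs
    print("count is: " + str(len(scoreDocs)))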
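
One thing worth noting about the loop: search(query, 200) returns at most 200 hits, so len(scoreDocs) can never exceed 200, however many documents actually match. The TopDocs object returned by search also carries a totalHits field with the full number of matching documents, which may be closer to what is wanted here. The sketch below is not Lucene code. It is a toy positional index in plain Python (the names build_positional_index and phrase_doc_count are made up for illustration) that shows what a phrase document count means and how it can differ from the length of a truncated top-N list.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a list of document strings."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_doc_count(index, phrase):
    """Count documents in which the terms of `phrase` appear consecutively."""
    terms = phrase.lower().split()
    if not terms:
        return 0
    # Only documents that contain every term can possibly match.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    count = 0
    for doc_id in candidates:
        # A match starts at p if term i sits at position p + i for every i.
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            count += 1
    return count


# Hypothetical data: 250 documents contain the phrase, 50 only contain its words.
docs = ["intro some phrase 1 outro"] * 250 + ["phrase some 1"] * 50
index = build_positional_index(docs)
total = phrase_doc_count(index, "some phrase 1")
print("documents containing the phrase:", total)
print("length of a top-200 list:", min(total, 200))
```

In this toy setup the full count and the length of a 200-item result list disagree, which is the same effect len(scoreDocs) has whenever more than 200 documents match.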

© Stack Overflow or respective owner