String matching algorithms used by Lucene

Posted by iamrohitbanga on Stack Overflow
Published on 2010-02-05T16:31:28Z Indexed on 2010/04/29 13:07 UTC

I want to know which string matching algorithms Apache Lucene uses. I have been going through the index file format used by Lucene, given here. It seems that Lucene stores every word occurring in the text as is, together with its frequency of occurrence in each document. But as far as I know, efficient string matching would require preprocessing the words that occur in the documents.
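To make my reading of the file format concrete, here is a minimal sketch of the structure I think it describes: each term maps to the documents it occurs in, with a per-document frequency. This is my own illustration, not Lucene's code. The class name `TinyInvertedIndex` and its methods are hypothetical, and the tokenizer (lowercase, split on non-alphanumeric characters) is only a crude stand-in for whatever analysis Lucene actually performs.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of "term -> per-document frequency"; not Lucene's implementation.
final class TinyInvertedIndex {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Assumed tokenizer: lowercase the text and split on runs of non-alphanumeric characters.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Exact-term lookup: returns document id -> frequency, or an empty map for unknown terms.
    Map<Integer, Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(1, "Rohit Banga asked about rohit");
        index.addDocument(2, "banga uses Lucene");
        System.out.println(index.lookup("rohit")); // prints {1=2}
        System.out.println(index.lookup("banga")); // prints {1=1, 2=1}
    }
}
```

A structure like this answers exact-term lookups quickly, but on its own it says nothing about fuzzy or substring matches, which is the part I am asking about.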

Example: search for "iamrohitbanga is a user of stackoverflow" (using fuzzy matching) in some documents.

It is possible that one of the documents contains the string "rohit banga".

To find that the substrings "rohit" and "banga" are present in the search string, Lucene would need some efficient substring matching.
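For reference, the naive version of this check looks like the sketch below: test every indexed term against every token of the search string with `String.contains`. This is only my baseline for comparison, not what Lucene does. The class `NaiveSubstringMatch` and the method `termsInsideQuery` are hypothetical names, and the cost grows with the number of indexed terms, which is why I assume something smarter must be used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical brute-force baseline; not Lucene's algorithm.
final class NaiveSubstringMatch {
    // Returns the indexed terms that occur as a substring of at least one query token.
    static List<String> termsInsideQuery(List<String> indexedTerms, String query) {
        String[] queryTokens = query.toLowerCase(Locale.ROOT).split("\\s+");
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            for (String token : queryTokens) {
                if (token.contains(term)) {
                    hits.add(term);
                    break; // report each term at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("rohit", "banga", "lucene");
        String query = "iamrohitbanga is a user of stackoverflow";
        System.out.println(termsInsideQuery(terms, query)); // prints [rohit, banga]
    }
}
```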

I want to know which algorithm Lucene uses for this. Also, if it does some preprocessing, which function call in the Java API triggers it?
