How to use NGramTokenizerFactory or NGramFilterFactory?

Posted by user572485 on Stack Overflow
Published on 2011-01-12T09:51:11Z Indexed on 2011/01/12 9:53 UTC

Hi,

Recently I have been studying how to store and index data using Solr. I want to do a facet.prefix search. With the whitespace tokenizer, "Where are you" is split into three words and indexed. If I search with facet.prefix="where are", no results are returned.
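To make the problem concrete, here is a minimal Python sketch of the behaviour described above. It is not Solr code: `whitespace_terms` and `facet_prefix` are hypothetical helpers that only imitate splitting on whitespace and matching a prefix against each indexed term. The lowercasing is an assumption added so that "Where" and "where" compare equal.

```python
def whitespace_terms(text):
    """Imitate a whitespace tokenizer: one indexed term per word.

    Lowercasing is an assumption for this sketch; splitting on
    whitespace by itself would keep the original case.
    """
    return text.lower().split()


def facet_prefix(terms, prefix):
    """Return the terms that start with the given prefix.

    The prefix is compared against each term separately, never
    against the original sentence.
    """
    return [t for t in terms if t.startswith(prefix)]


terms = whitespace_terms("Where are you")
print(terms)                             # the three separate terms
print(facet_prefix(terms, "where"))      # matches the single term "where"
print(facet_prefix(terms, "where are"))  # empty: no single term contains a space
```

Since no individual term contains a space, a prefix that spans two words cannot match any of them, which is consistent with the empty result.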

I searched Google and found that NGramFilterFactory might help. But when I applied this filter factory, the result was "w, h, e, ..., wh, ..": the sentence is split by character, not by word token.

I set the parameters minGramSize and maxGramSize to 1 and 3. Is NGramFilterFactory working correctly? Should I add other parameters? Are there other filter factories that could help me?
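For reference, the output I am seeing looks like character n-grams taken inside each token. The following Python sketch reproduces that pattern; it is not the Solr implementation, `char_ngrams` and `ngram_filter` are hypothetical names, and the ordering (all grams of size 1, then size 2, then size 3) is an assumption chosen to match the "w, h, e, ..., wh, .." output quoted above.

```python
def char_ngrams(token, min_gram, max_gram):
    """Return every substring of `token` whose length is between
    min_gram and max_gram, shortest grams first."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def ngram_filter(tokens, min_gram=1, max_gram=3):
    """Apply char_ngrams to each token independently, the way a
    token filter runs after the tokenizer has already split words."""
    return [g for tok in tokens for g in char_ngrams(tok, min_gram, max_gram)]


tokens = "where are you".split()
print(char_ngrams("where", 1, 3))
print(ngram_filter(tokens))
```

Under these assumptions the grams never cross a word boundary, so nothing like "where are" is produced.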

Thanks!

© Stack Overflow or respective owner
