Lucene stop words not removed during searching; need a substitute for AnalyzingQueryParser

Posted by iamrohitbanga on Stack Overflow
Published on 2010-05-17T15:38:51Z Indexed on 2010/05/17 18:50 UTC

Filed under: lucene | stop-words | porter-stemmer | analyzer

I have created a Lucene index with the following analyzer.

public class DocSpecAnalyzer extends Analyzer {

  // Stop words shared by all instances; the second argument (true) makes matching case-insensitive.
  private static final CharArraySet stopSet;
  static {
    stopSet = new CharArraySet(FDConstants.stopwords, true);
    // uncommenting this displays all the stop words
    // for (String s : FDConstants.stopwords) {
    //   System.out.println(s);
    // }
  }

  /**
   * Specifies whether the StopFilter should keep position increments for
   * removed stop words. The default is derived from matchVersion.
   */
  private final boolean enableStopPositionIncrements;

  private final Version matchVersion;

  public DocSpecAnalyzer(Version matchVersion) {
    this.matchVersion = matchVersion;
    enableStopPositionIncrements =
        StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion);
  }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    StandardTokenizer tokenStream = new StandardTokenizer(matchVersion, reader);
    tokenStream.setMaxTokenLength(DEFAULT_MAX_TOKEN_LENGTH);
    TokenStream result = new StandardFilter(tokenStream);

    // Analysis chain: lowercase, then drop stop words, then Porter-stem what remains.
    result = new LowerCaseFilter(result);
    result = new StopFilter(enableStopPositionIncrements, result, stopSet);
    result = new PorterStemFilter(result);

    return result;
  }

  /** Default maximum allowed token length */
  public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

}

Now, when I search with a query that contains stop words, I also get hits for the stop words.

I believe this is because AnalyzingQueryParser (http://lucene.apache.org/java/2_9_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html) does not handle stop words. Is there a substitute?

Update: I forgot to mention that I need to do a fuzzy search. That is why I am using an AnalyzingQueryParser.

Update: the portion of code that invokes AnalyzingQueryParser:

        AnalyzingQueryParser parser = new AnalyzingQueryParser(Version.LUCENE_CURRENT,"description", analyzer);

        // fuzzy matching preparation
        String fuzzyStr = TextQuery.prepareFuzzy(tq.text, fuzzyDist);

        Query query = parser.parse(fuzzyStr);
        TopScoreDocCollector collector = TopScoreDocCollector.create(numHits, true);
        searcher.search(query, collector);
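
The implementation of TextQuery.prepareFuzzy is not shown above. As an illustration only, here is a minimal plain-Java sketch (no Lucene dependency) of one possible workaround: drop stop words from the raw query text before the fuzzy operator is appended, so that stop words never reach the parser. The class name FuzzyQueryPrep, the small stop list, and the method body are hypothetical stand-ins, not the original code; the real list would presumably come from FDConstants.stopwords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FuzzyQueryPrep {

    // Hypothetical stop list for illustration; not the list used in the question.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    /**
     * Splits the text on whitespace, lowercases each term, drops stop words,
     * and appends "~" plus the given similarity to every remaining term.
     * Assumption: the input contains no query-syntax characters that need escaping.
     */
    static String prepareFuzzy(String text, float fuzzyDist) {
        List<String> terms = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || STOP_WORDS.contains(term)) {
                continue; // skip stop words before they become fuzzy terms
            }
            terms.add(term + "~" + fuzzyDist);
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        // prints "history~0.7 index~0.7"
        System.out.println(prepareFuzzy("The history of the index", 0.7f));
    }
}
```

The resulting string could then be passed to parser.parse(...) as in the snippet above. This sketch only filters by exact, lowercased match, so it assumes the query-side stop list is the same one used at index time.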
