Lucene: Wildcards are missing from index
Posted by Eleasar on Stack Overflow
Published on 2010-03-05T10:12:15Z
Indexed on 2010/03/30 9:03 UTC
        
        
Hi, I am building a search index that contains special names, containing characters such as !, ?, &, + and so on. I have to treat the following searches differently:
me & you
me + you
But whatever I do (I tried QueryParser escaping before indexing, escaping manually, and different indexers...), when I check the search index with Luke these characters do not show up (question marks, @-symbols and the like do show up).
The reasoning is that I am doing partial searches for live suggestions (and the fields are not that large), so I split each name up into "m" and "me" and "+" and "y" and "yo" and "you" and then index those pieces. That way it is much faster than a wildcard query search, and the index size is not a big problem.
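The splitting described above could look roughly like the following. This is only a minimal sketch: the names PrefixExpander and Expand are hypothetical, and it is not the DeNormalizeName method used further down, which is not shown in the post. It assumes words are separated by single spaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class PrefixExpander
{
    // Returns every leading prefix of every space-separated word.
    // Single-character words such as "+" or "&" come out unchanged.
    public static List<string> Expand(string name)
    {
        var prefixes = new List<string>();
        var words = name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            for (var length = 1; length <= word.Length; length++)
            {
                prefixes.Add(word.Substring(0, length));
            }
        }
        return prefixes;
    }
}
```

With this sketch, PrefixExpander.Expand("me + you") yields "m", "me", "+", "y", "yo", "you", so the "+" is still present before the text is handed to the analyzer.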
So what I need is for these special wildcard characters to be inserted into the index as well.
This is my code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Util;
namespace AnalyzerSpike
{
    public class CustomAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            // Split on spaces, then lower-case, then fold accented characters to ASCII.
            return new ASCIIFoldingFilter(new LowerCaseFilter(new CustomCharTokenizer(reader)));
        }
    }
    public class CustomCharTokenizer : CharTokenizer
    {
        public CustomCharTokenizer(TextReader input) : base(input)
        {
        }
        public CustomCharTokenizer(AttributeSource source, TextReader input) : base(source, input)
        {
        }
        public CustomCharTokenizer(AttributeFactory factory, TextReader input) : base(factory, input)
        {
        }
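        protected override bool IsTokenChar(char c)
        {
            // Every character except a plain space counts as part of a token,
            // so '&', '+', '!' and '?' are meant to stay inside tokens.
            return c != ' ';
        }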
    }
}
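For comparison, the tokens one would expect from a rule like IsTokenChar above can be approximated in plain C# without Lucene. This is a stand-in under stated assumptions, not the analyzer itself: SpaceTokenizerSketch and Tokenize are hypothetical names, and the sketch only splits on spaces and lower-cases, without the ASCII folding step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the analyzer chain, without Lucene and without ASCII folding.
public static class SpaceTokenizerSketch
{
    // Splits on the space character only and lower-cases each piece,
    // so symbols such as "&" and "+" become tokens of their own.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        foreach (var part in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            tokens.Add(part.ToLowerInvariant());
        }
        return tokens;
    }
}
```

Tokenize("Me & You") gives "me", "&", "you". If the terms Luke shows for the "name" field differ from this, the characters are being lost somewhere other than in the tokenizing rule.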
The code to create the index:
private void InitIndex(string path, Analyzer analyzer)
{
    var writer = new IndexWriter(path, analyzer, true);
    //some multiline textbox that contains one item per line:
    var all = new List<string>(txtAllAvailable.Text.Replace("\r","").Split('\n'));
    foreach (var item in all)
    {
        writer.AddDocument(GetDocument(item));
    }
    writer.Optimize();
    writer.Close();
}
private static Document GetDocument(string name)
{
    var doc = new Document();
    doc.Add(new Field(
        "name",
        DeNormalizeName(name),
        Field.Store.YES,
        Field.Index.ANALYZED));
    doc.Add(new Field(
        "raw_name",
        name,
        Field.Store.YES,
        Field.Index.NOT_ANALYZED));
    return doc;
}
(The code uses Lucene.Net version 2.9.x (EDIT: sorry, I originally wrote 1.9.x), but it is compatible with Lucene for Java.)
Thx
© Stack Overflow or respective owner