Lucene Search Returning Extra, Undesired Records

Posted by Brandon on Stack Overflow See other posts from Stack Overflow or by Brandon
Published on 2010-04-29T13:31:14Z Indexed on 2010/04/29 13:37 UTC
Read the original article Hit count: 193

Filed under:

c#

|

lucene

|

lucene.net

|

indexing

I have a Lucene index that contains a field called 'Name'.

I escape all special characters before inserting a value into my index using QueryParser.Escape(value).

In my example I have 2 documents with the following names respectively:

Test
Test (Test)

They get inserted into my index as such (I can confirm this using Luke):

[test]
[test] [\(test\)]

I insert these values as TOKENIZED and using the StandardAnalyzer.

When I perform a search, I use the QueryParser.Escape(searchString) against my search string input to escape special characters and then use the QueryParser with my 'Name' field and the StandardAnalyzer to perform my search.

When I perform a search for 'Test', I get back both documents in my index (as expected). However, when I perform a search for 'Test (Test)', I am getting back both documents still.

I realize that in both examples it matches on the 'test' term in the index, but I am confused in my 2nd example why it would not just pull back the document with the value of 'Test (Test)' because my search should create two terms:

[test] and [\(test\)]

I would imagine it would perform some sort of boolean operator where BOTH terms must match in that situation so I would get back just one record.

Is there something I am missing or a trick to make the search behave as desired?

© Stack Overflow or respective owner

Related posts about c#

.NET WebRequest.PreAuthenticate not quite what it sounds like

as seen on West-Wind - Search for 'West-Wind'
I’ve run into the problem a few times now: How to pre-authenticate .NET WebRequest calls doing an HTTP call to the server – essentially send authentication credentials on the very first request instead of waiting for a server challenge first? At first glance this sound like it should be easy:… >>> More
HttpWebRequest and Ignoring SSL Certificate Errors

as seen on West-Wind - Search for 'West-Wind'
Man I can't believe this. I'm still mucking around with OFX servers and it drives me absolutely crazy how some these servers are just so unbelievably misconfigured. I've recently hit three different 3 major brokerages which fail HTTP validation with bad or corrupt certificates at least according to… >>> More
The dynamic Type in C# Simplifies COM Member Access from Visual FoxPro

as seen on West-Wind - Search for 'West-Wind'
I’ve written quite a bit about Visual FoxPro interoperating with .NET in the past both for ASP.NET interacting with Visual FoxPro COM objects as well as Visual FoxPro calling into .NET code via COM Interop. COM Interop with Visual FoxPro has a number of problems but one of them at least got a lot… >>> More
Dynamic Type to do away with Reflection

as seen on West-Wind - Search for 'West-Wind'
The dynamic type in C# 4.0 is a welcome addition to the language. One thing I’ve been doing a lot with it is to remove explicit Reflection code that’s often necessary when you ‘dynamically’ need to walk and object hierarchy. In the past I’ve had a number of ReflectionUtils that used string based expressions… >>> More
Finding a Relative Path in .NET

as seen on West-Wind - Search for 'West-Wind'
Here’s a nice and simple path utility that I’ve needed in a number of applications: I need to find a relative path based on a base path. So if I’m working in a folder called c:\temp\templates\ and I want to find a relative path for c:\temp\templates\subdir\test.txt I want to receive back subdir\test… >>> More

Related posts about lucene

performance comparision between Zend Lucene and Java Lucene

as seen on Stack Overflow - Search for 'Stack Overflow'
Zend Lucene and Java Lucene are built in PHP and java repectively, and PHP language has a higher level than java. Just wondering How big the performance difference among these two, regarding to index building and data searching? Is it much more effective to let java create and rebuild index, and… >>> More
Why wasn't fast-vector-highlighter (lucene-contrib) made an official part of Lucene 3.0 core

as seen on Stack Overflow - Search for 'Stack Overflow'
I've read some Jira entries and they mentioned moving fast-vector-highlighter to core about a year ago but it never made it. Looking at the svn for contrib it seems incomplete. There are no tests for FastVectorHighlighter Documentation is lacking No samples anywhere on apache.org Anyone have… >>> More
pylucene: install error

as seen on Stack Overflow - Search for 'Stack Overflow'
I am trying to install Pylucene (pylucene-3.3-3-src.tar.gz) on my ubuntu linux 11.10. I have python 2.7.2. I was able to compile JCC (I think) because I didnt see any error when I installed it. When I tried to install Pylucene I get the following error. Can someone help? Thanks. ICU not installed /usr/bin/python… >>> More
Solr WordDelimiterFilter + Lucene Highlighter

as seen on Stack Overflow - Search for 'Stack Overflow'
I am trying to get the Highlighter class from Lucene to work properly with tokens coming from Solr's WordDelimiterFilter. It works 90% of the time, but if the matching text contains a ',' such as "1,500" the output is incorrect: Expected: 'test 1,500 this' Observed: 'test 11,500 this' I… >>> More
java AbstractMethodError

as seen on Stack Overflow - Search for 'Stack Overflow'
How to handle this error in lucene: java.lang.AbstractMethodError: org.apache.lucene.store.Directory.listAll()[Ljava/lang/String; at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:568) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) … >>> More