Fuzzy Search on Material Descriptions including numerical sizes & general descriptions of material t

Posted by Kyle on Stack Overflow
Published on 2010-01-08T16:54:55Z Indexed on 2010/05/10 8:04 UTC

We're looking to provide a fuzzy search on an electrical materials database (conduit, cable, etc.). The problem is that material types are not described consistently: some materials are rated by things other than size, so we could not split sizes out of the text description into separate fields.

I've attempted a combination of a full-text search and a SQL CLR implementation of the Levenshtein distance algorithm (to assist with ranking), but my results are a little funky: they are not sorted correctly because the ranking is off.
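To illustrate the ranking problem, here is a minimal Python sketch of ranking by plain Levenshtein distance alone. It is not the SQL CLR implementation described above and leaves out the full-text search component; the function name `levenshtein` and the sample rows are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Sample rows chosen for illustration only.
query = '3/4" ABCD Conduit'
candidates = [
    '1/2" Conduit',
    '3/4" DFC Conduit Tees',
    '3/4" ABCD Conduit',
    '3/4" Conduit',
]

# Smaller distance ranks first; every character costs the same to edit.
ranked = sorted(candidates, key=lambda c: levenshtein(query, c))
for c in ranked:
    print(levenshtein(query, c), c)
```

On these rows the exact match still comes first, but the wrong-size `1/2" Conduit` scores closer to the query than the right-size `3/4" DFC Conduit Tees`, because an edit to a digit costs exactly as much as an edit to a letter.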

For example, if the search term is "3/4" ABCD Conduit", I might get back several irrelevant results, in the following order:

1/2" Conduit
1/4" X 3/4" Cable
1/4" Cable Ties
3/4" DFC Conduit Tees
3/4" ABCD Conduit
3/4" Conduit

I believe I've narrowed the problem down to the fact that neither of these search algorithms factors in the relevance of punctuation and numeric characters. That is, in such a search, I'd expect the size to take precedence over any fuzzy match on the rest of the description, but my results don't reflect that.
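The behaviour I'd expect can be sketched roughly as below, in Python rather than SQL CLR. This is only one possible approach under stated assumptions: the pattern `SIZE_RE` is a hypothetical regex that recognises inch sizes such as `3/4"` or `2"` and nothing else, the helper names `split_sizes` and `rank` are made up for illustration, and `difflib.SequenceMatcher` from the standard library stands in for the fuzzy score instead of Levenshtein.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pattern: whole or fractional inch sizes such as 2" or 3/4".
SIZE_RE = re.compile(r'\d+(?:/\d+)?"')


def split_sizes(text):
    """Return (set of size tokens, lowercased description without the sizes)."""
    sizes = set(SIZE_RE.findall(text))
    rest = " ".join(SIZE_RE.sub(" ", text).lower().split())
    return sizes, rest


def rank(query, candidates):
    """Sort candidates so that size agreement outranks fuzzy text similarity."""
    q_sizes, q_rest = split_sizes(query)

    def key(candidate):
        c_sizes, c_rest = split_sizes(candidate)
        size_hits = len(q_sizes & c_sizes)     # query sizes found in the candidate
        extra_sizes = len(c_sizes - q_sizes)   # sizes the query did not ask for
        similarity = SequenceMatcher(None, q_rest, c_rest).ratio()
        # Tuples compare left to right, so sizes decide before similarity does.
        return (-size_hits, extra_sizes, -similarity)

    return sorted(candidates, key=key)


rows = [
    '1/2" Conduit',
    '1/4" X 3/4" Cable',
    '1/4" Cable Ties',
    '3/4" DFC Conduit Tees',
    '3/4" ABCD Conduit',
    '3/4" Conduit',
]
ranked = rank('3/4" ABCD Conduit', rows)
for row in ranked:
    print(row)
```

The sort key is a tuple, so the fuzzy score only breaks ties among rows whose sizes agree equally well with the query. Materials rated by something other than size would need their own token pattern, which this sketch does not attempt.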

My question is: Can anyone recommend better search algorithms or different approaches that may be better suited for searching a combination of alphanumerics & punctuation characters?

© Stack Overflow or respective owner
