Disambiguating Named Entities in Java

Posted by Alterscape on Stack Overflow
Published on 2010-06-09T15:18:07Z

I have a list of strings (company names, in this case) and a Java program that extracts things that look like company names from mostly unstructured text. I need to match each extracted name to a string in the list. Caveat: the unstructured text contains typos and shortened forms, such as "Blah, Inc." referred to as just "Blah". I've tried Levenshtein edit distance on the raw strings, but that fails for predictable reasons. Are there known best-practice approaches to this problem, or am I back to manual data entry?
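For illustration, here is one possible starting point, not a definitive answer: normalize both sides before comparing (lowercase, drop punctuation, strip legal suffixes such as "Inc."), then score with a length-normalized Levenshtein similarity and accept only matches above a threshold. The class name `CompanyNameMatcher`, the suffix list, and the 0.8 threshold are all assumptions made for this sketch and would need tuning against real data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for this sketch; not part of any library.
class CompanyNameMatcher {
    // Assumed list of legal suffixes to ignore; extend it for your data.
    private static final String SUFFIXES =
            "\\b(inc|incorporated|corp|corporation|co|company|llc|ltd|limited)\\b";

    // Lowercase, replace punctuation with spaces, strip legal suffixes.
    static String normalize(String name) {
        String cleaned = name.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N} ]", " ")
                .replaceAll("\\s+", " ")
                .trim();
        String stripped = cleaned.replaceAll(SUFFIXES, " ")
                .replaceAll("\\s+", " ")
                .trim();
        // If the name consisted only of suffix words, keep the cleaned form.
        return stripped.isEmpty() ? cleaned : stripped;
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 means identical after normalization.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Returns the best-scoring known name, or empty if none reaches the threshold.
    static Optional<String> bestMatch(String extracted, List<String> known,
                                      double threshold) {
        String needle = normalize(extracted);
        String best = null;
        double bestScore = -1.0;
        for (String candidate : known) {
            double score = similarity(needle, normalize(candidate));
            if (score > bestScore) { best = candidate; bestScore = score; }
        }
        return bestScore >= threshold ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("Blah, Inc.", "Acme Corporation", "Globex LLC");
        for (String s : Arrays.asList("Blah", "Globx LLC", "Initech")) {
            System.out.println(s + " -> " + bestMatch(s, known, 0.8));
        }
    }
}
```

Names that fall below the threshold come back as an empty `Optional`, so they can be routed to manual review rather than force-matched. Token-based measures (for example, Jaccard overlap on words) or Jaro-Winkler are other options that may suit short names better, depending on the data.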

© Stack Overflow or respective owner
