How can I parse free text (Twitter tweets) against a large database of values?

Posted by user136416 on Stack Overflow
Published on 2010-05-16T12:30:38Z

Hi there

Suppose I have a database containing 500,000 records, each representing, say, an animal. What would be the best approach for parsing 140-character tweets to identify matching records by animal name? For instance, in this string...

"I went down to the woods today and couldn't believe my eyes: I saw a bear having a picnic with a squirrel."

... I would like to flag up the words "bear" and "squirrel", as they appear in my database.

This strikes me as a problem that has probably been solved many times, but from where I'm sitting it looks prohibitively expensive: iterating over every database record and checking for a match in the string is surely a crazy way to do it.

Can anyone with a comp sci degree put me out of my misery? I'm working in C# if that makes any difference. Cheers!
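
For illustration, one common way to avoid scanning all 500,000 records per tweet is to turn the loop around: load the animal names into a hash set once, then split each tweet into words and look each word up. This is only a minimal sketch under stated assumptions, not a definitive solution. The names `TweetMatcher`, `BuildIndex` and `FindMatches` are made up for this example, it assumes the names fit in memory, and it only handles single-word names (something like "polar bear" would need word pairs checked as well, or a multi-pattern matcher).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TweetMatcher
{
    // Hypothetical helper: build the lookup once, for example from the
    // name column of the animal table (the schema is an assumption here).
    public static HashSet<string> BuildIndex(IEnumerable<string> animalNames)
    {
        // Case-insensitive set, so "Bear" and "bear" hit the same entry.
        return new HashSet<string>(animalNames, StringComparer.OrdinalIgnoreCase);
    }

    // Returns the distinct words of the tweet that are in the index,
    // in order of first appearance.
    public static List<string> FindMatches(string tweet, HashSet<string> index)
    {
        var matches = new List<string>();
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Take runs of letters as words; punctuation and spaces fall away.
        foreach (Match word in Regex.Matches(tweet, @"\p{L}+"))
        {
            // One hash lookup per word (O(1) on average), independent of
            // how many names the index holds.
            if (index.Contains(word.Value) && seen.Add(word.Value))
            {
                matches.Add(word.Value);
            }
        }
        return matches;
    }
}
```

With an index built from "bear", "squirrel" and "otter", calling `TweetMatcher.FindMatches` on the example tweet above gives back "bear" and "squirrel". The work per tweet then depends on the number of words in the tweet (a few dozen at most for 140 characters) rather than on the number of records.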

© Stack Overflow or respective owner
