Are there libraries or techniques for collecting and weighting keywords from a block of text?

Posted by Soviut on Stack Overflow
Published on 2010-05-27T20:07:13Z

I have a field in my database that can contain large blocks of text. I need to make this searchable but can't use full-text search. Instead, on update, I want my business layer to process the block of text and extract keywords from it, which I can save as searchable metadata. Ideally, these keywords would then be weighted by the number of times they appear in the block of text. Naturally, words like "the", "and", "of", etc. should be discarded, as they just add noise to the search.

Are there tools or libraries in Python that can do this filtering or should I roll my own?
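
To make the idea concrete, here is a minimal sketch of what rolling my own might look like, using only the standard library (`re` and `collections.Counter`). The function name `extract_keywords`, the `min_length` parameter, and the tiny `STOP_WORDS` set are all hypothetical and only for illustration; a real stop-word list would be far longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = frozenset(
    {"the", "and", "of", "a", "an", "in", "to", "is", "it", "that"}
)


def extract_keywords(text, stop_words=STOP_WORDS, min_length=2):
    """Return a Counter mapping each keyword to how often it appears."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Trim stray apostrophes left over from quoting, e.g. 'word'.
    words = (w.strip("'") for w in words)
    # Drop stop words and very short tokens, then count what remains.
    return Counter(
        w for w in words if len(w) >= min_length and w not in stop_words
    )


weights = extract_keywords("The cat and the hat. The cat sat on the mat.")
# Keywords ordered from most to least frequent.
print(weights.most_common(3))
```

The counts could then be stored alongside the record as the searchable metadata, with the count serving as the weight. This only handles ASCII letters and does no stemming, so "search" and "searching" would be counted separately.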

