Are there libraries or techniques for collecting and weighing keywords from a block of text?

Posted by Soviut on Stack Overflow
Published on 2010-05-27T20:07:13Z Indexed on 2010/05/27 20:11 UTC

Filed under: python | search | parsing | full-text-search | keywords

I have a field in my database that can contain large blocks of text. I need to make it searchable, but I can't use full-text search. Instead, on update, I want my business layer to process the text and extract keywords that I can save as searchable metadata. Ideally, these keywords would then be weighted by the number of times they appear in the text. Naturally, words like "the", "and", "of", etc. should be discarded, since they just add noise to the search.

Are there tools or libraries in Python that can do this filtering, or should I roll my own?
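
For reference, here is a minimal sketch of what rolling my own might look like, using only the standard library. The function name `extract_keywords`, its `min_length` and `top_n` parameters, and the short `STOP_WORDS` set are all my own placeholders, not part of any library; a real stop-word list would need to be much longer.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "with",
})

def extract_keywords(text, min_length=3, top_n=None):
    """Return (keyword, count) pairs, most frequent first.

    The count serves as a simple weight: the more often a word appears
    in the text, the higher it ranks.
    """
    # Lowercase, then pull out runs of letters/digits (keeping
    # contractions such as "don't" as one token).
    words = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    counts = Counter(
        w for w in words
        if len(w) >= min_length and w not in STOP_WORDS
    )
    return counts.most_common(top_n)

if __name__ == "__main__":
    sample = "The cat and the dog. The cat sat on the mat; the cat slept."
    for word, weight in extract_keywords(sample, top_n=5):
        print(word, weight)
```

The resulting pairs could be saved alongside the row as the searchable metadata, with the count stored as the weight. This does no stemming, so "search" and "searching" would be counted as separate keywords.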

© Stack Overflow or respective owner
