Language parsing to find important words

Posted by Matt Huggins on Programmers
Published on 2012-12-17T22:23:44Z

Filed under:

importance

I'm looking for some input and theory on how to approach a lexical topic.

Let's say I have a collection of strings, each of which may be a single sentence or several sentences. I'd like to parse these strings and pull out the most important words, perhaps with a score that denotes how likely each word is to be important.

Let's look at a few examples of what I mean.

Example #1:

"I really want a Keurig, but I can't afford one!"

This is a very basic example, just one sentence. As a human, I can easily see that "Keurig" is the most important word here. Also, "afford" is relatively important, though it's clearly not the primary point of the sentence. The word "I" appears twice, but it is not important at all since it doesn't really give us any information. I might expect to see a hash of words to scores, something like this:

"Keurig" => 0.9
"afford" => 0.4
"want"   => 0.2
"really" => 0.1
etc...
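
One possible starting point, sketched below in Python, is to drop stop words such as "I" and then weight the remaining words by how rare they are across the whole collection of strings (a TF-IDF style score). This is only a rough statistical baseline under stated assumptions, not a known answer to the question: the names STOP_WORDS, tokenize and score_words are made up for illustration, the stop-word list is deliberately tiny, and the scores it produces will not match the hand-picked numbers above.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {
    "i", "a", "an", "the", "but", "one", "of", "my", "to", "it",
    "can", "can't", "had", "if", "only", "just", "come",
}

def tokenize(text):
    """Lower-case the text and split it into words, keeping apostrophes and hyphens."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def score_words(documents):
    """Return one {word: score} dict per document, scaled so the top word scores 1.0."""
    # Tokenize every string and discard stop words.
    tokenized = [[w for w in tokenize(d) if w not in STOP_WORDS] for d in documents]

    # Count in how many strings each word appears (document frequency).
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    n_docs = len(documents)
    results = []
    for words in tokenized:
        counts = Counter(words)
        # Term frequency times a smoothed inverse document frequency.
        raw = {
            w: c * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0)
            for w, c in counts.items()
        }
        top = max(raw.values(), default=1.0)
        results.append({w: s / top for w, s in raw.items()})
    return results

documents = [
    "I really want a Keurig, but I can't afford one!",
    "I really want a new phone, but I can't afford it.",
]
scores = score_words(documents)
```

With only these two made-up strings, "keurig" ranks highest in the first one because it is the only remaining word that does not also appear in the second. The approach depends on having enough other strings to compare against; a single sentence on its own gives every surviving word the same score.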

Example #2:

"Just had one of the best swimming practices of my life. Hopefully I can maintain my times come the competition. If only I had remembered to take off my non-waterproof watch."

This example has multiple sentences, so there will be more important words throughout. Without repeating the scoring exercise from example #1, I would probably expect two or three really important words to come out of this: "swimming" (or "swimming practice"), "competition", and "watch" (or "waterproof watch" or "non-waterproof watch", depending on how the hyphen is handled).
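
To illustrate how multi-word candidates and the hyphen question might be handled, here is a minimal Python sketch that splits each sentence at stop words and keeps the runs of words in between as candidate phrases. The function name candidate_phrases, the keep_hyphens flag and the small stop-word list are all assumptions made for this example. It only produces candidates; ranking them would still need a scoring step.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {
    "i", "a", "an", "the", "of", "off", "my", "to", "can", "had",
    "if", "only", "just", "one", "come", "best", "hopefully",
}

def candidate_phrases(text, keep_hyphens=True):
    """Return runs of consecutive non-stop words, never crossing sentence punctuation."""
    if keep_hyphens:
        # "non-waterproof" stays a single token.
        pattern = r"[a-z0-9]+(?:['-][a-z0-9]+)*"
    else:
        # "non-waterproof" becomes the two tokens "non" and "waterproof".
        pattern = r"[a-z0-9]+(?:'[a-z0-9]+)*"

    phrases = []
    for chunk in re.split(r"[.!?,;:]+", text.lower()):
        current = []
        for word in re.findall(pattern, chunk):
            if word in STOP_WORDS:
                # A stop word ends the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases

text = "If only I had remembered to take off my non-waterproof watch."
with_hyphen = candidate_phrases(text)
without_hyphen = candidate_phrases(text, keep_hyphens=False)
```

Here with_hyphen contains "non-waterproof watch" as one candidate, while without_hyphen contains "non waterproof watch" instead, which shows how the tokenization choice changes the phrases that come out.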

Given a couple of examples like these, how would you go about doing something similar? Are there any existing (open source) libraries or algorithms that already do this?
