Sentence Tree vs. Word List

Posted by Rohit Jose on Programmers
Published on 2013-12-21T10:04:54Z Indexed on 2014/08/21 16:29 UTC

I was recently tasked with building a Named Entity Recognizer as part of a project. The objective was to parse a given sentence and come up with all the possible combinations of the entities it contains.

One approach that was suggested was to keep a lookup table of all the known connector words, such as articles and conjunctions, and to remove them from the word list obtained by splitting the sentence on spaces. What remains would be the named entities in the sentence.

A lookup is then done for these identified entities in a second lookup table that associates them with an entity type. For example, if the sentence was "Remember the Titans was a movie directed by Boaz Yakin", the possible outputs would be:


{Remember the Titans,Movie} was {a movie,Movie} directed by {Boaz Yakin,director}
{Remember the Titans,Movie} was a movie directed by Boaz Yakin
{Remember the Titans,Movie} was {a movie,Movie} directed by Boaz Yakin
{Remember the Titans,Movie} was a movie directed by {Boaz Yakin,director}
Remember the Titans was {a movie,Movie} directed by Boaz Yakin
Remember the Titans was {a movie,Movie} directed by {Boaz Yakin,director}
Remember the Titans was a movie directed by {Boaz Yakin,director}
Remember {the Titans,Movie,Sports Team} was {a movie,Movie} directed by {Boaz Yakin,director}
Remember {the Titans,Movie,Sports Team} was a movie directed by Boaz Yakin
Remember {the Titans,Movie,Sports Team} was {a movie,Movie} directed by Boaz Yakin
Remember {the Titans,Movie,Sports Team} was a movie directed by {Boaz Yakin,director}

The entity lookup table here would contain the following data:

Remember the Titans=>Movie
a movie=>Movie
Boaz Yakin=>director
the Titans=>Movie
the Titans=>Sports Team
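
To make the word-list idea concrete, here is a minimal sketch in Python of how the combinations could be enumerated. It is not the implementation from my project: the names `ENTITY_TABLE`, `find_mentions` and `taggings` are made up for illustration, and the sketch matches word spans against the entity table directly instead of first removing connector words, since entries such as "the Titans" and "a movie" themselves contain articles.

```python
# Hypothetical entity table mirroring the lookup data above (keys lowercased).
ENTITY_TABLE = {
    "remember the titans": ["Movie"],
    "a movie": ["Movie"],
    "boaz yakin": ["director"],
    "the titans": ["Movie", "Sports Team"],
}


def find_mentions(words, table):
    """Return a dict: start index -> list of (end index, entity types)."""
    by_start = {}
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            key = " ".join(words[start:end]).lower()
            if key in table:
                by_start.setdefault(start, []).append((end, table[key]))
    return by_start


def taggings(sentence, table=ENTITY_TABLE):
    """Yield every way of tagging non-overlapping entities in the sentence."""
    words = sentence.split()
    by_start = find_mentions(words, table)

    def walk(pos):
        if pos >= len(words):
            yield []
            return
        # Option 1: leave the word at `pos` untagged.
        for rest in walk(pos + 1):
            yield [words[pos]] + rest
        # Option 2: tag an entity that starts at `pos`.
        for end, types in by_start.get(pos, []):
            label = "{%s,%s}" % (" ".join(words[pos:end]), ",".join(types))
            for rest in walk(end):
                yield [label] + rest

    for parts in walk(0):
        yield " ".join(parts)


SENTENCE = "Remember the Titans was a movie directed by Boaz Yakin"
results = list(taggings(SENTENCE))
for line in results:
    print(line)
```

For the example sentence this yields twelve strings: every combination of tagged and untagged entities, including the fully untagged sentence. The recursion is exponential in the number of entity candidates, which is acceptable for single sentences but would probably need memoization for longer input.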

An alternative that was put forward was to build a crude sentence tree in which the connector words from the lookup table are the parent nodes, and to do a lookup in the entity table for the leaf nodes, which might contain the entities. The tree built for the sentence above would be:

[Image: sentence tree for the example sentence (not available)]
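
Since the image is not reproduced here, the following Python sketch shows one way such a crude tree might be built. The tree shape is an assumption, not a copy of the picture: the sentence is split recursively on the first connector word, the connector becomes the parent, and the segments on either side become its children. The connector set `CONNECTORS` is hypothetical and chosen for this sentence only; articles are deliberately left out so that "a movie" survives as a leaf. `Node`, `build_tree` and `leaves` are illustrative names as well.

```python
from dataclasses import dataclass, field

# Hypothetical entity table mirroring the lookup data above (keys lowercased).
ENTITY_TABLE = {
    "remember the titans": ["Movie"],
    "a movie": ["Movie"],
    "boaz yakin": ["director"],
    "the titans": ["Movie", "Sports Team"],
}

# Assumed connector words for this example sentence only.
CONNECTORS = {"was", "directed", "by"}


@dataclass
class Node:
    text: str
    types: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_tree(words, connectors=CONNECTORS, table=ENTITY_TABLE):
    """Split on the first connector; segments without one become leaves."""
    for i, word in enumerate(words):
        if word.lower() in connectors:
            node = Node(word)
            for side in (words[:i], words[i + 1:]):
                if side:
                    node.children.append(build_tree(side, connectors, table))
            return node
    # No connector left: look the whole phrase up in the entity table.
    phrase = " ".join(words)
    return Node(phrase, types=list(table.get(phrase.lower(), [])))


def leaves(node):
    """Collect (text, entity types) for every node without children."""
    if not node.children:
        return [(node.text, node.types)]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


tree = build_tree("Remember the Titans was a movie directed by Boaz Yakin".split())
print(leaves(tree))
```

In this sketch each leaf is looked up as a whole phrase, so the nested reading "the Titans" is never found unless the leaves are additionally scanned for sub-spans. That is one practical difference from the word-list approach worth weighing.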

My question is about the relative benefits of the two approaches. Should I go with the tree approach to represent the parsed sentence, since it provides a more semantic structure? Or is there a better approach to solving this problem?
