How to write efficient code for extracting noun phrases?
Posted by Arun Abraham on Stack Overflow
        Published on 2012-12-13T01:31:43Z
nlp | language-processing
I am trying to extract phrases from POS-tagged text using rules such as the ones below (-> means "followed by"):
1) NNP -> NNP
2) NNP -> CC -> NNP
3) VP -> NP
etc.
I have written the code below. Can someone tell me how I can do this in a better way?
    // Uses the Stanford POS tagger classes (HasWord, TaggedWord, Sentence);
    // `documentPreprocessor` and `tagger` are assumed to be set up elsewhere.
    List<String> nounPhrases = new ArrayList<String>();
    for (List<HasWord> sentence : documentPreprocessor) {
        System.out.println(Sentence.listToString(sentence, false));
        List<TaggedWord> tSentence = tagger.tagSentence(sentence);
        // Remember the previous token so each adjacent pair can be checked.
        String lastTag = null, lastWord = null;
        for (TaggedWord taggedWord : tSentence) {
            // Rule 1: an NNP immediately followed by another NNP.
            if (lastTag != null && taggedWord.tag().equalsIgnoreCase("NNP") && lastTag.equalsIgnoreCase("NNP")) {
                // Keep sentence order: previous word first, then the current word.
                nounPhrases.add(lastWord + " " + taggedWord.word());
            }
            lastTag = taggedWord.tag();
            lastWord = taggedWord.word();
        }
    }
The code above only handles the "NNP followed by NNP" rule. How can I generalise it so that I can add other rules too? I know there are libraries available for this, but I want to do it manually.
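One possible way to generalise this is to treat each rule as a list of tags and slide every rule over the tag sequence of a sentence, so that adding a rule means adding one more list. The following is only a minimal sketch under some assumptions: the class name `PhraseRules` and the method `extract` are hypothetical, the words and tags are passed as plain string lists (for example copied out of each `TaggedWord`), and it only covers flat tag sequences such as rules 1 and 2. A phrase-level rule like VP -> NP would probably need a second pass over already-extracted chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseRules {

    /**
     * Returns every span of words whose tags match one of the rules.
     * Each rule is a sequence of POS tags, e.g. ["NNP", "CC", "NNP"].
     */
    public static List<String> extract(List<String> words,
                                       List<String> tags,
                                       List<List<String>> rules) {
        List<String> phrases = new ArrayList<>();
        for (int start = 0; start < tags.size(); start++) {
            for (List<String> rule : rules) {
                int end = start + rule.size();
                // Skip rules that would run past the end of the sentence.
                if (end <= tags.size() && matches(tags.subList(start, end), rule)) {
                    phrases.add(String.join(" ", words.subList(start, end)));
                }
            }
        }
        return phrases;
    }

    /** Compares a window of tags with a rule, ignoring case. */
    private static boolean matches(List<String> window, List<String> rule) {
        for (int i = 0; i < rule.size(); i++) {
            if (!window.get(i).equalsIgnoreCase(rule.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hand-tagged example sentence, for illustration only.
        List<String> words = Arrays.asList("John", "Smith", "and", "Mary", "left");
        List<String> tags = Arrays.asList("NNP", "NNP", "CC", "NNP", "VBD");
        List<List<String>> rules = Arrays.asList(
                Arrays.asList("NNP", "NNP"),
                Arrays.asList("NNP", "CC", "NNP"));
        System.out.println(extract(words, tags, rules));
        // prints [John Smith, Smith and Mary]
    }
}
```

Overlapping matches are all reported here (both "John Smith" and "Smith and Mary"). If only the longest match at each position is wanted, the rules could be sorted by length and the scan could jump past each match, but that is a design choice this sketch leaves open.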
© Stack Overflow or respective owner