Efficient Context-Free Grammar parser, preferably Python-friendly

Posted by Max Shawabkeh on Stack Overflow
Published on 2010-12-28T01:06:15Z

For one of my projects I need to parse a small subset of English, described as a context-free grammar with (1-level) feature structures (example), and I need to do it efficiently.
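To clarify what I mean by 1-level feature structures: flat attribute-value maps attached to the nonterminals, unified as constituents are combined. A minimal sketch of just the unification step, with structures represented as plain dicts (this representation is my own illustration, not any particular library's):

```python
def unify(f1, f2):
    """Unify two flat (1-level) feature structures given as dicts.

    Returns the merged structure, or None if a feature clashes
    (e.g. NUM=sg against NUM=pl).
    """
    merged = dict(f1)
    for key, value in f2.items():
        if key in merged and merged[key] != value:
            return None  # feature clash: unification fails
        merged[key] = value
    return merged
```

So unifying `{"NUM": "sg"}` with `{"PER": "3"}` succeeds, while unifying `{"NUM": "sg"}` with `{"NUM": "pl"}` fails, which is what rules out agreement-violating parses.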

Right now I'm using NLTK's parser, which produces the right output but is very slow. For my grammar of ~450 fairly ambiguous non-lexical rules and half a million lexical entries, parsing simple sentences can take anywhere from 2 to 30 seconds, depending, it seems, on the number of resulting trees. Lexical entries have little to no effect on performance.

Another problem is that loading the (25MB) grammar+lexicon at the beginning can take up to a minute.

From what I can find in the literature, the running time of the algorithms used to parse such a grammar (Earley or CKY) should be linear in the size of the grammar and cubic in the length of the input token list. My experience with NLTK indicates that ambiguity is what hurts performance most, not the absolute size of the grammar.
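For reference, the cubic bound comes from the three nested span loops in CKY. A toy recognizer for a grammar in Chomsky normal form (my own sketch, ignoring feature structures entirely) makes the n³ structure explicit:

```python
from itertools import product

def cky_recognize(tokens, lexical, binary, start="S"):
    """CKY recognition for a CNF grammar.

    `lexical` maps a word to the set of nonterminals that yield it;
    `binary` maps a pair (B, C) to the set of A with a rule A -> B C.
    Runs in O(n^3 * |G|) for n tokens.
    """
    n = len(tokens)
    # chart[i][j] holds the nonterminals deriving tokens[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):
        chart[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):          # loop 1: span length
        for i in range(n - span + 1):     # loop 2: span start
            j = i + span
            for k in range(i + 1, j):     # loop 3: split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]
```

Note that this only recognizes; enumerating all trees of an ambiguous sentence can be exponential in n, which would be consistent with what I'm seeing in NLTK.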

So now I'm looking for a CFG parser to replace NLTK. I've been considering PLY, but I can't tell whether it supports feature structures in CFGs, which are required in my case, and the examples I've seen seem to do a lot of procedural parsing rather than just specifying a grammar. Can anybody show me an example of PLY that both supports feature structures and uses a declarative grammar?

I'm also fine with any other parser that can do what I need efficiently. A Python interface is preferable but not absolutely necessary.
