Converting a treebank of vertical trees to S-expressions

Posted by Andreas on Stack Overflow
Published on 2010-05-11T22:58:28Z

I need to preprocess a treebank corpus of sentences with parse trees. The input format is a vertical representation of trees, with one node per line and the number of leading = characters giving the node's depth, like so:

S
=NP
==(DT +def) the
== (N +ani) man
=VP
==V walks

...and I need to convert it to an S-expression like this:

(S (NP (DT the) (N man)) (VP (V walks)))

I have code that almost does it, but not quite. There's always a missing paren somewhere. Should I use a proper parser, maybe a CFG? The current code is at http://github.com/andreasvc/eodop/blob/master/arbobanko.py

The code also contains real examples from the treebank.
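
For a format like this, a full CFG parser is probably not needed: keeping a stack of the currently open nodes and building the tree first, then printing it, means parentheses cannot get out of balance. The following is a minimal sketch, not the code from arbobanko.py. It assumes that depth is the number of leading = characters, that each input holds exactly one tree, that features such as +def are dropped, and that terminals look like either "(TAG feats) word" or "TAG word". The names parse_label, vertical_to_tree and to_sexpr are made up for illustration.

```python
import re

# Assumed line shape: leading '=' marks give the depth, optional whitespace,
# then the node label (and a word, for terminal nodes).
LINE = re.compile(r"^(=*)\s*(.*)$")


def parse_label(rest):
    """Split the text after the depth marks into (tag, word or None)."""
    rest = rest.strip()
    if rest.startswith("("):
        # "(DT +def) the" -> tag "DT", word "the"; features are dropped
        inside, _, word = rest[1:].partition(")")
        tag = inside.split()[0]
    else:
        # "V walks" -> tag "V", word "walks"; "NP" -> tag "NP", no word
        tag, _, word = rest.partition(" ")
    word = word.strip()
    return tag, (word or None)


def vertical_to_tree(lines):
    """Build a nested list such as ['S', ['NP', ['DT', 'the']]]."""
    root = None
    stack = []  # (depth, node) pairs on the path from the root to the last node
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        marks, rest = LINE.match(raw.strip()).groups()
        depth = len(marks)
        tag, word = parse_label(rest)
        node = [tag] if word is None else [tag, word]
        # Drop nodes at the same depth or deeper; what remains on top is the parent
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1].append(node)
        else:
            root = node
        stack.append((depth, node))
    return root


def to_sexpr(node):
    """Serialize a nested list; every list gets exactly one '(' and one ')'."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"


example = """S
=NP
==(DT +def) the
== (N +ani) man
=VP
==V walks"""

print(to_sexpr(vertical_to_tree(example.splitlines())))
# prints: (S (NP (DT the) (N man)) (VP (V walks)))
```

Because the closing parenthesis is emitted by the serializer rather than while reading lines, the last open nodes at the end of the input are closed automatically, which is the case that line-by-line string building tends to miss. Real treebank lines may need extra handling (for example words that themselves contain parentheses), which this sketch does not attempt.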
