Search Results

Search found 1 results on 1 pages for 'wilson32'.

Page 1/1 | 1 

  • lexers / parsers for (un) structured text documents

    - by wilson32
    There are lots of parsers and lexers for scripts (i.e. structured computer languages). But I'm looking for one which can break a (almost) non-structured text document into larger sections e.g. chapters, paragraphs, etc. It's relatively easy for a person to identify them: where the Table of Contents, acknowledgements, or where the main body starts and it is possible to build rule based systems to identify some of these (such as paragraphs). I don't expect it to be perfect, but does any one know of such a broad 'block based' lexer / parser? Or could you point me in the direction of literature which may help?

    Read the article

1