Algorithm for sentence analysis and tokenization

Posted by Andrea Nagar on Stack Overflow
Published on 2010-05-28T00:27:01Z

I need to analyze a document and compile statistics on how many times each sequence of words is used (so the analysis is not on single words but on groups of recurring words). I have read that compression algorithms do something similar to what I want: they build dictionaries of blocks of text, each with a piece of information recording its frequency. Something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx is what I have in mind. Do you have anything written in C#?
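To make the request concrete, here is a minimal sketch of the kind of counting I mean. It is plain word n-gram counting with a dictionary, not the algorithm from the CodeProject article and not a real compression dictionary. The names `PhraseCounter` and `CountPhrases`, the separator list, and the phrase-length limits are all assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseCounter
{
    // Hypothetical helper: counts every run of minWords..maxWords consecutive words.
    // Assumption: punctuation is simply discarded, so phrases may span sentence boundaries.
    public static Dictionary<string, int> CountPhrases(string text, int minWords, int maxWords)
    {
        // Naive tokenization: lowercase, then split on whitespace and common punctuation.
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };
        string[] words = text.ToLowerInvariant()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries);

        var counts = new Dictionary<string, int>();
        for (int n = minWords; n <= maxWords; n++)
        {
            // Slide a window of n words across the document.
            for (int i = 0; i + n <= words.Length; i++)
            {
                string phrase = string.Join(" ", words, i, n);
                counts.TryGetValue(phrase, out int current); // current is 0 if the phrase is new
                counts[phrase] = current + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountPhrases("The cat sat on the mat. The cat ran.", 2, 3);

        // Report only phrases that recur, most frequent first.
        foreach (var pair in counts.Where(p => p.Value > 1).OrderByDescending(p => p.Value))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}"); // prints "the cat: 2"
        }
    }
}
```

This keeps every phrase in memory, so for a large document the dictionary grows roughly with (number of words) x (number of phrase lengths). That is why I am looking for something smarter, along the lines of what compression algorithms do.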

© Stack Overflow or respective owner