Algorithm for sentence analysis and tokenization
- by Andrea Nagar
I need to analyze a document and compile statistics on how many times each sequence of words is used (so the analysis is not of single words but of groups of recurring words). I read that compression algorithms do something similar to what I want: they build dictionaries of blocks of text, together with information about how often each block occurs.
It should be something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx
Do you have anything written in C#?
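As a starting point, the task described above (counting how often each run of consecutive words appears) can be sketched as plain n-gram counting. This is a minimal sketch under stated assumptions, not the approach used in the linked article and not a compression algorithm such as LZW. The class and method names (`PhraseCounter`, `Tokenize`, `CountPhrases`) and the parameters `minLength`, `maxLength` and `minCount` are hypothetical. The tokenizing rule (runs of letters, digits and apostrophes, lowercased) is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseCounter
{
    // Hypothetical tokenizer: lowercases the text and treats runs of letters,
    // digits and apostrophes as words. Adjust the pattern to your needs.
    public static List<string> Tokenize(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}']+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    // Counts every contiguous sequence of minLength..maxLength words and keeps
    // only the sequences that occur at least minCount times.
    public static Dictionary<string, int> CountPhrases(
        string text, int minLength, int maxLength, int minCount)
    {
        List<string> words = Tokenize(text);
        var counts = new Dictionary<string, int>();

        for (int n = minLength; n <= maxLength; n++)
        {
            // Slide a window of n words across the token list.
            for (int start = 0; start + n <= words.Count; start++)
            {
                string phrase = string.Join(" ", words.GetRange(start, n));
                counts.TryGetValue(phrase, out int c); // c is 0 if not yet seen
                counts[phrase] = c + 1;
            }
        }

        // Drop sequences that do not recur often enough.
        return counts.Where(kv => kv.Value >= minCount)
                     .ToDictionary(kv => kv.Key, kv => kv.Value);
    }

    public static void Main()
    {
        string text = "the quick brown fox jumps over the lazy dog. " +
                      "the quick brown fox sleeps.";

        // Print recurring 2- and 3-word sequences, most frequent first.
        foreach (var kv in CountPhrases(text, 2, 3, 2)
                             .OrderByDescending(kv => kv.Value)
                             .ThenBy(kv => kv.Key))
        {
            Console.WriteLine($"{kv.Value}  {kv.Key}");
        }
    }
}
```

The design choice here is simplicity: every window of every requested length is enumerated. As a result, memory grows with the document length multiplied by the number of window sizes. For very large documents, a suffix-array or suffix-tree structure is a common alternative for finding repeated word sequences.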