Algorithm for sentence analysis and tokenization
Posted by Andrea Nagar on Stack Overflow
        Published on 2010-05-28T00:27:01Z
Tags: c#, natural-language
I need to analyze a document and compile statistics on how many times each sequence of words is used, so the analysis is not of single words but of recurring groups of words. I have read that compression algorithms do something similar to what I want: they build dictionaries of blocks of text, each with a piece of information recording its frequency. See http://www.codeproject.com/KB/recipes/Patterns.aspx for something similar to what I have in mind. Do you have anything written in C#?
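One possible starting point is plain n-gram counting: slide a window of n words over the text and tally each phrase in a dictionary. The sketch below is not the method from the linked article and not a compression dictionary; `PhraseCounter`, `Count`, and `Recurring` are hypothetical names, and the tokenizer simply assumes that words are separated by whitespace or basic punctuation and that case can be ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not taken from the linked article.
public static class PhraseCounter
{
    // Counts every run of n consecutive words (an n-gram) in the text.
    public static Dictionary<string, int> Count(string text, int n)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // Assumption: whitespace and basic punctuation separate words,
        // and "The" and "the" count as the same word.
        char[] separators =
            { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };
        string[] words = text.ToLowerInvariant()
                             .Split(separators, StringSplitOptions.RemoveEmptyEntries);

        var counts = new Dictionary<string, int>();
        for (int i = 0; i + n <= words.Length; i++)
        {
            // Join words[i .. i+n-1] into one phrase key.
            string phrase = string.Join(" ", words, i, n);
            counts.TryGetValue(phrase, out int seen); // seen is 0 if the key is new
            counts[phrase] = seen + 1;
        }
        return counts;
    }

    // Returns the phrases seen at least minCount times, most frequent first.
    public static List<KeyValuePair<string, int>> Recurring(
        string text, int n, int minCount = 2)
    {
        return Count(text, n)
            .Where(p => p.Value >= minCount)
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Calling `PhraseCounter.Recurring(text, 3)` would list the three-word phrases that appear more than once; running it for several values of n gives statistics for phrases of different lengths. Note that this simple tokenizer lets phrases span sentence boundaries, so splitting the document into sentences first may be preferable if that matters.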
© Stack Overflow or respective owner