Iterating through a string a word at a time in Python
Posted by AlgoMan on Stack Overflow
        
        Published on 2010-05-04T20:10:45Z
        
I have a string buffer holding the contents of a huge text file. I have to search for given words/phrases in the string buffer. What is an efficient way to do it?
I tried using re module matches, but since I have a huge text corpus to search through, this takes a large amount of time.
I am given a dictionary of words and phrases.
I iterate through each file, read it into a string, search for all the words and phrases in the dictionary, and increment the count in the dictionary if the keys are found.
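For concreteness, here is a minimal sketch of the approach described above: one regex search over the whole buffer per dictionary entry. The function name `count_phrases` and the sample data are made up for illustration, and the `\b` word-boundary anchors are my assumption about what counts as a match.

```python
import re
from collections import Counter

def count_phrases(text, phrases):
    """Count occurrences of each phrase with one regex pass per phrase."""
    counts = Counter()
    for phrase in phrases:
        # \b anchors keep "cat" from matching inside "concatenate".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] += len(re.findall(pattern, text))
    return counts

# Hypothetical sample data; in practice the text would come from file.read().
sample = "the quick brown fox saw the quick red fox"
print(count_phrases(sample, ["the quick", "fox", "quick brown fox"]))
```

Each phrase triggers a full scan of the buffer, so the cost grows with (number of phrases) x (size of the text), which is probably why this gets slow on a large corpus.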
One small optimization we thought of was to sort the dictionary of phrases/words from the largest number of words to the smallest, and then, at each word start position in the string buffer, compare the words that follow against that list. If one phrase is found, we don't search for the other phrases (as it matched the longest phrase, which is what we want).
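A rough sketch of that longest-phrase-first idea might look like the following. Instead of sorting, it keys each phrase by its tuple of words and tries the longest candidate length first at every word position. The name `count_longest_matches`, the `\w+` tokenization, and the choice to skip past a matched phrase are all assumptions on my part, not something we have settled on.

```python
import re
from collections import Counter

def count_longest_matches(text, phrases):
    """At each word position, count only the longest phrase starting there."""
    # Key each phrase by its tuple of words so a candidate is one dict lookup.
    phrase_words = {tuple(p.split()): p for p in phrases if p.split()}
    if not phrase_words:
        return Counter()
    max_len = max(len(key) for key in phrase_words)

    # Assumption: words are runs of \w characters; punctuation is dropped.
    words = re.findall(r"\w+", text)
    counts = Counter()
    i = 0
    while i < len(words):
        matched = 0
        # Try the longest candidate first, shrinking until something matches.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in phrase_words:
                counts[phrase_words[key]] += 1
                matched = n
                break
        # Assumption: jump past a matched phrase so its inner words
        # are not counted again; otherwise advance by one word.
        i += matched or 1
    return counts

sample = "the quick brown fox saw the quick red fox"
phrases = ["the quick", "quick brown fox", "fox", "the quick brown fox"]
print(count_longest_matches(sample, phrases))
```

This way the text is scanned once, and the work per word position is bounded by the length of the longest phrase rather than by the number of phrases in the dictionary.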
Can someone suggest how to go through the string buffer word by word (i.e. iterate over the string buffer one word at a time)?
Also, is there any other optimization that can be done on this?
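To illustrate what I mean by word-by-word iteration, something along these lines is what I am after: a generator that yields each word with its start offset, so the huge buffer is not split into one big list up front. The name `iter_words` and the `\w+` definition of a word are just assumptions for the sketch.

```python
import re

# Assumption: a "word" is a run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")

def iter_words(text):
    """Yield (start_offset, word) pairs lazily, one match at a time."""
    for match in WORD_RE.finditer(text):
        yield match.start(), match.group()

for offset, word in iter_words("Hello, big world"):
    print(offset, word)
```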
© Stack Overflow or respective owner