fastest way to perform string search in general and in python

Posted by Rkz on Stack Overflow See other posts from Stack Overflow or by Rkz
Published on 2012-10-28T04:36:32Z Indexed on 2012/10/28 5:00 UTC
Read the original article Hit count: 105

Filed under:
|
|
|

My task is to search for a string or a pattern in a list of documents that are very short (say 200 characters long). However, say there are 1 million documents of such time. What is the most efficient way to perform this search?. I was thinking of tokenizing each document and putting the words in hashtable with words as key and document number as value, there by creating a bag of words. Then perform the word search and retrieve the list of documents that contained this word. From what I can see is this operation will take O(n) operations. Is there any other way? may be without using hash-tables?.

Also, is there a python library or third party package that can perform efficient searches?

© Stack Overflow or respective owner

Related posts about python

Related posts about algorithm