Detecting similar words among n text documents

Posted by javanes on Stack Overflow
Published on 2010-03-18T09:23:53Z

Hi,

I have n documents and want to find the words they have in common. For example, I want to be able to say that n-3 of the documents contain the word "web".
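A minimal sketch of the "basic data structures" approach, assuming a document is plain text and a word is any run of ASCII letters (a simplification). It counts, for each word, how many documents contain it at least once (usually called the word's document frequency). `tokenize`, `document_frequency` and `words_in_at_least` are hypothetical helper names chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(documents):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for doc in documents:
        # Use a set so a word repeated inside one document is counted once.
        df.update(set(tokenize(doc)))
    return df


def words_in_at_least(df, k):
    """Return the words that appear in at least k documents, sorted."""
    return sorted(word for word, count in df.items() if count >= k)


docs = [
    "The web is large.",
    "Web pages link to other web pages.",
    "Search engines crawl the web.",
    "Databases store records.",
]
df = document_frequency(docs)
print(df["web"])                   # 3
print(words_in_at_least(df, 2))    # ['the', 'web']
```

One pass over all the text is enough, so this is linear in the total number of tokens; a common-word list such as "the" would normally be filtered out as stop words.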

I could certainly do this with basic data structures, but there may be an efficient algorithm, or a way to treat the same word with different suffixes (e.g. "crawl", "crawls", "crawling") as one word. Is there an algorithm for this purpose?
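Reducing words with different suffixes to a common form is known as stemming; the Porter stemmer is a widely used algorithm for English, and NLTK ships an implementation as `nltk.stem.PorterStemmer`. The sketch below is only a toy stand-in, not the Porter algorithm: it strips one suffix from a tiny hypothetical list (`SUFFIXES`) before counting documents, to show where stemming fits into the counting step. `crude_stem` and `stemmed_document_frequency` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical toy suffix list; a real stemmer (e.g. Porter) has many more rules.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_document_frequency(documents):
    """Count how many documents contain each stem at least once."""
    df = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        # A set of stems, so each stem counts once per document.
        df.update({crude_stem(w) for w in words})
    return df


docs = [
    "The spider crawls the web.",
    "Crawling is slow.",
    "We crawled every page.",
]
df = stemmed_document_frequency(docs)
print(df["crawl"])  # 3
```

Without stemming, "crawls", "crawling" and "crawled" would each be counted in only one document; after stemming they all map to "crawl". Naive rules like these make mistakes (e.g. "this" becomes "thi"), which is why a tested stemmer or a lemmatizer is preferable in practice.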

I am unfamiliar with the data-mining world. In general, is there a term for finding similarities between different documents? If there is, it will make my research much easier.
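Useful search terms here include "document similarity", "text similarity", "vector space model" and, more broadly, "information retrieval" and "text mining". A standard starting point represents each document as a bag of words (word counts) and compares two documents by the cosine of the angle between their count vectors. The sketch below assumes raw counts with no weighting; in practice the counts are often weighted with TF-IDF. `term_vector` and `cosine_similarity` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words representation: word -> number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0)."""
    # Missing keys in a Counter read as 0, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


v1 = term_vector("web search web")
v2 = term_vector("web search")
v3 = term_vector("database records")
print(cosine_similarity(v1, v2))  # about 0.9487, i.e. 3 / sqrt(10)
print(cosine_similarity(v1, v3))  # 0.0 (no shared words)
```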

Thanks.

© Stack Overflow or respective owner
