How to detect if 2 news articles have the same topic? (Python language-comparison)

Posted by resopollution on Stack Overflow
Published on 2010-04-05T18:52:38Z Indexed on 2010/04/05 18:53 UTC
I'm looking for ideas on a recommended approach.

I'm trying to scrape headlines and body text from articles on a few specific sites, similar to what Google does with Google News.

The problem is that different sites may carry articles on exactly the same subject, worded slightly differently.

Can anyone point me to what I need to know in order to write a comparison algorithm that auto-detects similar articles?

Thanks very much in advance.

I use Python.
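
To make the question concrete, here is a rough sketch of the kind of comparison I mean. It is only a baseline under my own assumptions, not a recommended solution: it treats each article as a bag of words and scores two articles by the cosine similarity of their word counts. The names `tokenize`, `cosine_similarity` and `same_topic`, the tiny stop-word list, and the `0.5` threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one would be much longer).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and",
              "for", "is", "are", "as", "by", "with", "at"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric words, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def same_topic(text_a, text_b, threshold=0.5):
    """Guess whether two texts share a topic; the threshold is an untuned placeholder."""
    return cosine_similarity(text_a, text_b) >= threshold


if __name__ == "__main__":
    a = "Apple unveils new iPad tablet at San Francisco event"
    b = "Apple shows off new iPad tablet in San Francisco"
    c = "Heavy rain floods streets across northern England"
    print(cosine_similarity(a, b), same_topic(a, b))
    print(cosine_similarity(a, c), same_topic(a, c))
```

Raw word counts ignore synonyms and give common words too much weight, so I assume a real solution would need something better (term weighting, stemming, or clustering), which is the part I am asking about.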

© Stack Overflow or respective owner
