How to detect if 2 news articles have the same topic? (Python language-comparison)
        Posted by resopollution on Stack Overflow
        Published on 2010-04-05T18:52:38Z
        
I'm looking for ideas on a recommended approach.
I'm trying to scrape headlines and body text from articles on a few specific sites, similar to what Google does with Google News.
The problem is that different sites may carry articles on exactly the same subject, worded slightly differently.
Can anyone point me to what I need to know in order to write a comparison algorithm that auto-detects similar articles?
Thanks very much in advance.
I use Python.
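To make the question concrete, here is a rough sketch of the kind of comparison I have in mind: TF-IDF weighting plus cosine similarity, which is one common starting point for this problem and not necessarily the best one. It uses only the standard library. The function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the simple regex tokenizer, and the sample headlines are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using smoothed TF-IDF."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF stays positive even for terms found in every document.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample headlines, invented for this example.
articles = [
    "Apple unveils new iPhone at event",
    "New iPhone unveiled by Apple at launch event",
    "Storm floods coastal towns overnight",
]
vectors = tfidf_vectors(articles)
print(cosine_similarity(vectors[0], vectors[1]))  # same story, different wording
print(cosine_similarity(vectors[0], vectors[2]))  # unrelated story
```

A pair would count as "the same topic" when the score exceeds some threshold, which I assume would have to be tuned on real articles. Stemming and stop-word removal would probably help too, since "unveils" and "unveiled" count as different terms here.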
© Stack Overflow or respective owner