Verify uniqueness of new content
Posted by rogerkk on Programmers
Published on 2012-11-15T09:37:50Z

I'm working on a review site that has a minor problem with near-duplicate reviews across items: the same review gets posted again with just a few words changed. I'd like to catch these duplicates before a moderator approves them, and I'm hoping someone can chime in on the best strategy to get there.
The site runs Ruby on Rails on a Postgres database and uses Thinking Sphinx for search (all on Heroku). So far the best option I see is to pull all the reviews out of the db and compare the strings with a module like amatch. That's not very efficient, so I guess I'd have to limit the number or age of the reviews scanned for dupes.
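To make the idea concrete, here is a rough sketch of the "compare the new review against a limited set of existing ones" approach. It is plain Ruby and deliberately does not use amatch: as a stand-in it scores similarity with Jaccard overlap of word trigrams. The helper names (`shingles`, `jaccard`, `near_duplicates`), the trigram size and the 0.5 threshold are all made up for illustration, and the candidates hash is assumed to hold only the recent reviews already fetched from the db.

```ruby
require "set"

# Hypothetical helper: turn a text into a set of overlapping word n-grams.
def shingles(text, size = 3)
  words = text.downcase.scan(/[a-z0-9']+/)
  return Set.new if words.empty?
  # Texts shorter than one n-gram are treated as a single shingle.
  return Set.new([words.join(" ")]) if words.length <= size
  words.each_cons(size).map { |gram| gram.join(" ") }.to_set
end

# Jaccard similarity: shared shingles divided by all distinct shingles (0.0..1.0).
def jaccard(a, b)
  union = (a | b).size
  return 0.0 if union.zero?
  (a & b).size.to_f / union
end

# candidates: hash of review id => review text (assumed to be a limited,
# recent subset). Returns the ids whose similarity reaches the threshold.
def near_duplicates(new_text, candidates, threshold = 0.5)
  target = shingles(new_text)
  candidates.select { |_id, text| jaccard(target, shingles(text)) >= threshold }.keys
end

existing = {
  1 => "The food was great and the service was really friendly and fast",
  2 => "Terrible battery life, would not buy this phone again"
}
incoming = "The food was great and the service was very friendly and fast"
flagged = near_duplicates(incoming, existing)
puts flagged.inspect
```

The threshold would need tuning against real reviews, and this still compares the new review against every candidate, so it has the same scaling problem described above.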
Anyone got a better idea?
© Programmers or respective owner