Removing duplicate images (deduplication) - calculating "overlap" of images

Posted by jotango on Server Fault
Published on 2010-05-06T09:33:26Z


Hello,

I have a ton of product images on our file system. Our code removes 100% identical images (or does not allow them to be uploaded). However, our sellers often upload item pictures that are very similar but not exactly identical. They might have more whitespace, worse quality (heavier compression), a different size, etc.

Is there any way I can calculate the degree of overlap between two images, so that I can flag near-duplicates for deletion? Kind of like a Levenshtein distance between two images...

Any pointers would be very cool. Thanks!
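
To make the "Levenshtein distance between two images" idea concrete, here is a minimal sketch of one common approach, a perceptual "average hash" compared by Hamming distance. It is an illustration under assumptions, not a known answer to the question: the function names (`shrink`, `average_hash`, `hamming`, `similarity`) are hypothetical, and each image is assumed to have already been decoded into a 2D list of grayscale values (0-255) by some imaging library, which is not shown here.

```python
def shrink(gray, size=8):
    """Downscale a 2D grid of grayscale values to size x size by block averaging."""
    h, w = len(gray), len(gray[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)  # at least one source row per block
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)  # at least one source column
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(gray, size=8):
    """Return a size*size-bit integer: bit is 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(gray, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def similarity(gray_a, gray_b, size=8):
    """Fraction of matching hash bits, from 0.0 (all differ) to 1.0 (identical hashes)."""
    distance = hamming(average_hash(gray_a, size), average_hash(gray_b, size))
    return 1 - distance / (size * size)
```

Because the hash is computed on a heavily downscaled copy, resized or recompressed versions of the same picture tend to land within a few bits of each other, so pairs above some similarity threshold (to be tuned on real data) could be flagged for review. Extra whitespace around a product shifts the content within the grid, so a border-trimming step before hashing would probably be needed for that case.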

© Server Fault or respective owner
