Determining whether a file is a duplicate

Posted by Todd R on Stack Overflow
Published on 2010-05-11T17:15:30Z

Is there a reliable way to determine whether two files are the same? For example, two files with the same size and type may or may not be identical byte for byte. I assume that comparing one or two checksums of the files will help, but I wonder:

  1. How reliable are checksums at determining whether two files are different? What are the chances of two different files having the same checksum?
  2. Would reliability increase by applying additional checksum comparisons?
  3. Which checksum algorithm(s) would be the most efficient and/or reliable?

Any ideas, suggestions or thoughts are appreciated!

P.S. The code for this is being written in Java running on a *nix system, but generic or platform-agnostic input is most helpful.
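For illustration, here is a minimal Java sketch of one common way to approach the comparison described above: check the sizes first, then compare a checksum, and fall back to a byte-by-byte comparison when certainty is required. The class and method names (`DuplicateCheck`, `sha256`, `probablySame`, `identical`) are hypothetical, SHA-256 is an assumed choice rather than something the question specifies, and the code assumes Java 7 or later for `java.nio.file`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative, not from the question.
public class DuplicateCheck {

    // Streams the file through SHA-256 so large files need not fit in memory.
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    // Cheap size check first; equal digests mean "same with very high probability".
    static boolean probablySame(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        return Arrays.equals(sha256(a), sha256(b));
    }

    // Byte-by-byte comparison: the only check that rules out collisions entirely.
    static boolean identical(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // Both streams should be exhausted at the same point.
            return inB.read() == -1;
        }
    }
}
```

A differing checksum proves the files differ, while a matching checksum only makes equality very likely: for an ideal n-bit hash, two specific different files collide with probability of about 2^-n. Stacking several checksums mostly adds cost. When only two files are compared once, the direct byte-by-byte check reads the same amount of data as hashing both and leaves no doubt. Checksums pay off mainly when many files are compared or when digests can be stored and reused.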

© Stack Overflow or respective owner
