Tracking unique versions of files with hashes

Posted by rwmnau on Stack Overflow
Published on 2010-03-13T05:18:05Z

Filed under: hashing | collision-detection

I'm going to be tracking different versions of potentially millions of different files, and my intent is to hash them to determine whether I've already seen a particular version of a file. Currently, I'm only using MD5 (the product is still in development, so it hasn't dealt with millions of files yet), which I suspect is not long enough to avoid collisions.
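To put rough numbers on that concern, here is a minimal sketch of the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/2 / 2^b), for n files and a b-bit hash. It assumes digests behave like uniformly random values, so it only describes accidental collisions; it says nothing about deliberately crafted ones, which are known to be practical for MD5. The function name and the figure of 10 million files are illustrative choices, not part of the actual product.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files distinct inputs
    share a digest, assuming digests are uniformly random (birthday bound)."""
    pairs = num_files * (num_files - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    # Hypothetical workload: 10 million distinct file versions.
    for bits in (128, 160, 256, 288):
        print(f"{bits:3d} bits: {collision_probability(10_000_000, bits):.3e}")
```

Under that uniformity assumption, even 128 bits leaves the accidental-collision probability far below anything measurable at this scale, so the choice between options is mostly about resistance to crafted collisions and about cost.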

Here's my question: am I more likely to avoid collisions if I hash the file using two different methods and store both hashes (say, SHA1 and MD5), or if I pick a single, longer hash (like SHA256) and rely on that alone? I know option 1 has 288 hash bits (160 + 128) and option 2 has only 256, but for the sake of the question, assume both choices have the same total hash length.
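For concreteness, the two options could be computed in a single pass over each file along these lines. This is only a sketch using Python's standard hashlib; the function names, the chunk size, and the idea of concatenating the two hex digests into one lookup key are assumptions made for illustration.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Read the file once and feed every chunk to MD5, SHA-1 and SHA-256."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def option_one_key(digests):
    """Option 1: MD5 and SHA-1 stored together (128 + 160 = 288 bits)."""
    return digests["md5"] + digests["sha1"]


def option_two_key(digests):
    """Option 2: a single SHA-256 digest (256 bits)."""
    return digests["sha256"]
```

Either key can then be used as the "have I seen this version?" lookup value. Reading the file once keeps I/O cost the same for both options, so any difference comes from the hash computation itself.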

Since I'm dealing with potentially millions of files (and multiple versions of those files over time), I'd like to do what I can to avoid collisions. However, CPU time isn't (completely) free, so I'm interested in how the community feels about the tradeoff - is adding more bits to my hash proportionally more expensive to compute, and are there any advantages to multiple different hashes as opposed to a single, longer hash, given an equal number of bits in both solutions?
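One way to check the CPU side of the tradeoff on the actual hardware is a small timing sketch like the one below. It hashes an in-memory buffer with each algorithm and reports the best of a few runs; the buffer size, repeat count, and function name are arbitrary assumptions, and the numbers will differ between machines and library builds.

```python
import hashlib
import time


def time_hashes(data, names=("md5", "sha1", "sha256"), repeats=5):
    """Return the best wall-clock time in seconds to hash data with each algorithm."""
    timings = {}
    for name in names:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes as a stand-in for file content
    results = time_hashes(payload)
    for name, seconds in results.items():
        print(f"{name:7s} {seconds * 1000:8.2f} ms")
    # Option 1 pays for both MD5 and SHA-1; option 2 pays for SHA-256 only.
    print(f"md5+sha1 {(results['md5'] + results['sha1']) * 1000:8.2f} ms")
```

Comparing the md5+sha1 total against the sha256 line gives a direct answer for a given machine, rather than assuming cost scales with digest length.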

© Stack Overflow or respective owner
