Creating a unique key based on file content in Python

Posted by Cawas on Stack Overflow
Published on 2010-05-04T22:47:26Z Indexed on 2010/05/04 22:58 UTC

I have many, many files to upload to the server, and I just want a way to avoid duplicates.

Generating a small, unique key from a large string seemed like exactly what a checksum is meant for, and hashing seemed like the evolution of that idea.

So I was going to use an MD5 hash for this. But then I read somewhere that "MD5 hashes are not meant to be unique keys", which struck me as really weird.

What's the right way of doing this?

Edit: by the way, I combined two sources to arrive at the following, which is how I'm currently doing it, and it's working just fine with Python 2.5:

import hashlib

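def md5_from_file(fileName, block_size=2**14):
    # Hash the file in fixed-size chunks so large files never sit fully in memory.
    md5 = hashlib.md5()
    # Binary mode: text mode can translate or truncate the bytes on some platforms.
    f = open(fileName, 'rb')
    try:
        while True:
            data = f.read(block_size)
            if not data:
                break
            md5.update(data)
    finally:
        # Close the file even if reading fails partway through.
        f.close()
    return md5.hexdigest()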
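
For what it's worth, here is a rough sketch of how a digest like this might be used as the duplicate key. It is only an illustration under some assumptions: the names file_digest and find_duplicates are made up for this example, the files are assumed to be on local disk, and equal digests are treated as "probably the same file" and then confirmed byte by byte with the standard library's filecmp.cmp, since a digest much shorter than the file cannot be guaranteed unique.

```python
import filecmp
import hashlib


def file_digest(path, block_size=2**14):
    """Return the hex MD5 digest of a file, read in binary chunks.

    Hypothetical helper for this sketch; mirrors the chunked approach above.
    """
    digest = hashlib.md5()
    f = open(path, 'rb')
    try:
        while True:
            data = f.read(block_size)
            if not data:
                break
            digest.update(data)
    finally:
        f.close()
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (duplicate, original) pairs among the given paths.

    Hypothetical helper: files are grouped by digest, and a file only
    counts as a duplicate after a full byte-by-byte comparison.
    """
    seen = {}        # digest -> paths already seen with distinct content
    duplicates = []
    for path in paths:
        candidates = seen.setdefault(file_digest(path), [])
        for original in candidates:
            # Equal digests only suggest equal content; confirm it.
            if filecmp.cmp(original, path, shallow=False):
                duplicates.append((path, original))
                break
        else:
            # No earlier file has the same bytes, so remember this one.
            candidates.append(path)
    return duplicates
```

Calling find_duplicates(['a.bin', 'b.bin', 'c.bin']) on hypothetical files would return a list of pairs, each naming a file and the earlier file it duplicates. The digest does the cheap filtering, and the full comparison only runs when two digests match, so the extra cost is small unless there really are many duplicates.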

© Stack Overflow or respective owner