Making libmagic/file detect .docx files

Posted by Jonatan Littke on Server Fault
Published on 2011-12-06T11:11:39Z Indexed on 2012/04/09 11:32 UTC

As seen elsewhere, docx, xlsx and pptx files are ZIP archives. When they are uploaded to my web application, file (via libmagic and python-magic) detects them as plain ZIP.
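To illustrate why the generic answer comes back: a non-empty ZIP archive starts with the local file header signature `PK\x03\x04`, and the Office formats share it. A minimal sketch of that check, where `looks_like_zip` is a hypothetical helper name and not part of python-magic:

```python
# Local file header signature found at the start of a non-empty ZIP archive.
ZIP_SIGNATURE = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Return True if the blob starts with the ZIP local file header signature."""
    return data.startswith(ZIP_SIGNATURE)
```

This prefix is identical for a docx and for any other ZIP, so a check on the leading bytes alone cannot tell them apart.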

I store the contents of the file as a blob in the database, but naturally I don't want to trust the user about what kind of file it is. So I would like to rely on file for that and automatically generate a filename at download time.
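One way this could work without touching the magic database is to open the blob as a ZIP and look at its member names. This is only a sketch under assumptions: it relies on Office Open XML packages containing `[Content_Types].xml` plus a `word/`, `xl/` or `ppt/` directory, and the names `sniff_extension`, `download_name` and the `upload-<id>` filename scheme are made up for illustration. A deliberately crafted ZIP could still pass, so it identifies the format rather than proving the file is harmless.

```python
import io
import zipfile

# Top-level directory that each Office Open XML package type keeps its main part in.
OOXML_MARKERS = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def sniff_extension(data: bytes) -> str:
    """Guess 'docx', 'xlsx', 'pptx', 'zip' or 'bin' from the blob's contents."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return "bin"  # not a readable ZIP at all
    if "[Content_Types].xml" in names:
        for prefix, ext in OOXML_MARKERS.items():
            if any(name.startswith(prefix) for name in names):
                return ext
    return "zip"  # a ZIP, but not recognisably an Office document


def download_name(blob_id: int, data: bytes) -> str:
    """Build a download filename from the stored blob, ignoring the uploaded name."""
    return f"upload-{blob_id}.{sniff_extension(data)}"
```

With something like this, the filename offered at download depends only on the stored bytes, not on whatever name the user supplied.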

I know one can modify /etc/magic, but the format (magic(5)) is way too complicated for me. I found a bug report on the issue at Debian bugs, but since it dates from 2008 it doesn't look like it will be fixed any time soon.

I guess my only other alternative is to indeed trust the user (but still store the contents as a blob) and only check the file extension based on the file name. This way I can disallow some extensions and allow others. And when the user re-downloads his file, he can have it in whatever way he uploaded it. But this solution is insecure if the file is shared with others, since you can simply rename the file to allow uploading it.
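For reference, the extension-only fallback would look roughly like this. The allowlist contents and the name `extension_allowed` are placeholders, not anything my application uses yet:

```python
import os

# Hypothetical allowlist; the real set would depend on the application.
ALLOWED_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf"}


def extension_allowed(filename: str) -> bool:
    """Check only the user-supplied name; this says nothing about the contents."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS
```

Note that `extension_allowed("tool.exe.docx")` is true, which is exactly the renaming weakness described above.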

Any ideas?

Lastly, I found a list of magic numbers for docx etc, but I'm unable to convert these into the magic(5) format.

© Server Fault or respective owner
