Uncompress OpenOffice files for better storage in version control

Posted by Craig McQueen on Stack Overflow See other posts from Stack Overflow or by Craig McQueen
Published on 2009-06-10T12:01:32Z Indexed on 2010/03/16 7:46 UTC
Read the original article Hit count: 341

I've heard discussion about how OpenOffice (ODF) files are compressed zip files of XML and other data. So making a tiny change to the file can potentially totally change the data, so delta compression doesn't work well in version control systems.

I've done basic testing on an OpenOffice file, unzipping it and then rezipping it with zero compression. I used the Linux zip utility for my testing. OpenOffice will still happily open it.

So I'm wondering if it's worth developing a small utility to run on ODF files each time just before I commit to version control. Any thoughts on this idea? Possible better alternatives?

Secondly, what would be a good and robust way to implement this little utility? Bash shell that calls zip (probably Linux only)? Python? Any gotchas you can think of? Obviously I don't want to accidentally mangle a file, and there are several ways that could happen.

Possible gotchas I can think of:

  • Insufficient disk space
  • Some other permissions issue that prevents writing the file or temporary files
  • ODF document is encrypted (probably should just leave these alone; the encryption probably also causes large file changes and thus prevents efficient delta compression)

© Stack Overflow or respective owner

Related posts about openoffice.org

Related posts about version-control