Remove Duplicate Messages from Maildir
- by Joseph Holsten
I've got a bunch of duplicate messages in my IMAP server's Maildir. What's the best way to remove them?
Some relevant points:
A shared Message-ID is usually a good enough definition of a duplicate. A tiny script that removes all but one message from each set of duplicates would work.
Sometimes it's necessary to find duplicates based on shared message bodies. What's a reasonable definition of shared here? Bitwise equivalence? What about incidental differences in line wrapping, escaping, or character encoding?
Sometimes there's some meaningful difference between 'duplicate' messages. What's the best way to review the differences in sets of 'duplicate' messages? Diffs?
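To make the three points above concrete, here is a rough sketch of a starting point using Python's standard `mailbox`, `hashlib`, and `difflib` modules. The function names (`body_text`, `duplicate_key`, `find_duplicates`, `show_diff`, `remove_duplicates`) are made up for illustration. It also assumes one particular definition of "same body": the decoded first text part with all whitespace runs collapsed. That covers line-wrapping and transfer-encoding differences, but it is only one possible answer to the question of what "shared" should mean.

```python
import difflib
import hashlib
import mailbox
from collections import defaultdict


def body_text(msg):
    """Decode the first text part and collapse whitespace (an assumed definition of 'same body')."""
    part = msg
    if msg.is_multipart():
        part = next((p for p in msg.walk() if p.get_content_maintype() == "text"), None)
        if part is None:
            return ""
    payload = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
    charset = part.get_content_charset() or "utf-8"
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        text = payload.decode("utf-8", errors="replace")
    return " ".join(text.split())


def duplicate_key(msg, by_body=False):
    """Group by Message-ID when present, otherwise (or on request) by a hash of the body."""
    msg_id = str(msg.get("Message-ID") or "").strip()
    if msg_id and not by_body:
        return ("id", msg_id)
    digest = hashlib.sha256(body_text(msg).encode("utf-8")).hexdigest()
    return ("body", digest)


def find_duplicates(maildir_path, by_body=False):
    """Return the opened Maildir and a dict of key -> list of Maildir keys (groups of 2+ only)."""
    box = mailbox.Maildir(maildir_path, create=False)
    groups = defaultdict(list)
    for key, msg in box.iteritems():
        groups[duplicate_key(msg, by_body)].append(key)
    return box, {k: v for k, v in groups.items() if len(v) > 1}


def show_diff(box, key_a, key_b):
    """Unified diff of two raw messages, for reviewing 'duplicates' that may really differ."""
    a = box.get_bytes(key_a).decode("utf-8", errors="replace").splitlines()
    b = box.get_bytes(key_b).decode("utf-8", errors="replace").splitlines()
    return "\n".join(difflib.unified_diff(a, b, key_a, key_b, lineterm=""))


def remove_duplicates(box, duplicates, dry_run=True):
    """Keep the first key (sorted) of each group; delete the rest unless dry_run is set."""
    removed = []
    for keys in duplicates.values():
        for key in sorted(keys)[1:]:
            if not dry_run:
                box.discard(key)
            removed.append(key)
    return removed
```

A cautious way to use this would be to call `find_duplicates(path)`, inspect a few groups with `show_diff`, and only then call `remove_duplicates(box, dups, dry_run=False)`. Which copy to keep is arbitrary here (the first key in sorted order); a real script might prefer the copy with the most flags or the oldest delivery time.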