Search Results

Search found 1 result on 1 page for 'newtoflume'.


  • Looking for Unix tool/script that, given an input path, will compress every batch of uncompressed 100MB text files into a single gzip file

    - by newToFlume
    I have a dump of thousands of small text files (1-5MB each), each containing lines of text. I need to "batch" them up so that each batch is of a fixed size, say 100MB, and then compress that batch. A batch could be either:
      - a single file that is just a 'cat' of the contents of the individual text files, or
      - just the individual text files themselves.
    Caveats:
      - Unix split -b will not work here, as I need to keep lines of text intact. Using the lines option is complicated because there is a large variance in the number of bytes per line.
      - The files need not be strictly a fixed size, as long as each is within 5% of the requested size.
      - The lines are critical and must not be lost. I need to confirm that the input made its way to the output without loss. What rolling checksum could I use (something like CRC32, but better/"stronger" in the face of collisions)?
    A script should do nicely, but this seems like a task someone has done before, and it would be nice to see some code (preferably Python or Ruby) that does at least something similar.
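    One possible shape for such a script, as a minimal sketch rather than a tested tool: it takes the 'cat' option, writes whole lines into a gzip file, and starts a new batch once the uncompressed size reaches the target, so a batch overshoots by at most one line. It assumes SHA-256 from hashlib is an acceptable stand-in for a "stronger CRC32"; it is a cryptographic hash updated incrementally, not a true rolling checksum. The names batch_and_compress, verify_batch, and the batch-NNNNN.gz naming are made up for illustration. The final batch may be smaller than the target, and a newline is appended to any file whose last line lacks one so that lines from adjacent files do not merge.

```python
import gzip
import hashlib
from pathlib import Path


def batch_and_compress(input_dir, output_dir, target_bytes=100 * 1024 * 1024):
    """Concatenate text files into gzip batches of roughly target_bytes each.

    Returns a list of (batch_path, uncompressed_bytes, sha256_hex) tuples.
    Hypothetical helper: names and file layout are illustrative only.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    gz = None        # currently open gzip batch, if any
    size = 0         # uncompressed bytes written to the current batch
    digest = None    # SHA-256 over the uncompressed bytes of the batch
    path = None

    def close_batch():
        nonlocal gz
        if gz is not None:
            gz.close()
            results.append((path, size, digest.hexdigest()))
            gz = None

    # sorted() gives a stable, repeatable input order
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file():
            continue
        with open(src, "rb") as f:
            for line in f:
                # Assumption: a missing trailing newline is added so the
                # last line of one file cannot merge with the next file.
                if not line.endswith(b"\n"):
                    line += b"\n"
                if gz is None:
                    path = out_dir / f"batch-{len(results):05d}.gz"
                    gz = gzip.open(path, "wb")
                    size = 0
                    digest = hashlib.sha256()
                gz.write(line)
                digest.update(line)
                size += len(line)
                # Cut only after a whole line, so no line is ever split
                if size >= target_bytes:
                    close_batch()
    close_batch()
    return results


def verify_batch(path, expected_sha256):
    """Decompress a batch and check it against the recorded SHA-256."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as gz:
        for chunk in iter(lambda: gz.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

    Calling batch_and_compress("dump", "batches") and then verify_batch(path, sha) on each returned tuple confirms that every batch decompresses to exactly the bytes that were written. The output directory should be separate from the input directory.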

