Converting large files in Python

Posted by Cenoc on Programmers
Published on 2014-06-03T09:21:34Z

I have a few files that are ~64 GB in size that I would like to convert to HDF5 format. I was wondering what the best approach for doing so would be? Reading line-by-line seems to take more than 4 hours, so I was thinking of splitting the work across processes with multiprocessing, but was hoping for some direction on the most efficient way without resorting to Hadoop. Any help would be very much appreciated (and thank you in advance).
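For context, the write side I have in mind looks roughly like this with h5py (the dataset name, dtype, and batch size are placeholders, and float(line) stands in for the real parsing):

    import h5py
    import numpy as np

    CHUNK_ROWS = 100000  # placeholder batch size

    def append(dset, batch):
        # Grow the resizable dataset and write one batch in a single call.
        n = len(batch)
        dset.resize(dset.shape[0] + n, axis=0)
        dset[-n:] = np.asarray(batch, dtype="f8")

    def convert(src_path, dst_path):
        # Stream the source file and append parsed rows in batches, so the
        # full ~64 GB never has to fit in memory.
        with open(src_path) as src, h5py.File(dst_path, "w") as dst:
            dset = dst.create_dataset(
                "data", shape=(0,), maxshape=(None,),
                dtype="f8", chunks=(CHUNK_ROWS,),
            )
            batch = []
            for line in src:
                batch.append(float(line))  # stand-in for the real parsing
                if len(batch) >= CHUNK_ROWS:
                    append(dset, batch)
                    batch = []
            if batch:
                append(dset, batch)

Appending in large batches keeps the number of resize/write calls small, which matters at this scale.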

EDIT: Right now I'm just iterating with a for line in fd: loop. After that I only check that I'm picking out the right sort of data, which is a very cheap test; I'm not writing anything out yet, and it still takes around 4 hours to complete. I can't simply read fixed-size blocks of data, because the blocks in this odd file format are not uniform: the format switches between three different block sizes, and you can only tell which one you have by reading the first few characters of the block.
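For concreteness, a stripped-down sketch of the block-wise reading I'd like to move to (the tag characters and the three sizes below are made up; the real ones come from the file format):

    # Hypothetical tags and block sizes; the real format defines its own.
    BLOCK_SIZES = {b"A": 512, b"B": 1024, b"C": 4096}

    def iter_blocks(path):
        # Read in binary mode, dispatching on the leading character of
        # each block to find out how long that block is.
        with open(path, "rb") as fd:
            while True:
                tag = fd.read(1)
                if not tag:
                    break  # end of file
                size = BLOCK_SIZES[tag]
                yield tag, fd.read(size - 1)  # rest of the block

Reading binary chunks like this avoids the per-line overhead, and once the block boundaries are known the byte offsets could be handed to a multiprocessing.Pool so the actual parsing runs in parallel.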
