Converting large files in Python
Posted by Cenoc on Programmers
Published on 2014-06-03T09:21:34Z
I have a few files, each ~64 GB in size, that I would like to convert to HDF5 format, and I was wondering what the best approach would be. Reading line by line seems to take more than 4 hours, so I was thinking of using multiprocessing, but was hoping for some direction on the most efficient way to do this without resorting to Hadoop. Any help would be very much appreciated (and thank you in advance).
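For concreteness, the kind of conversion I have in mind looks roughly like the sketch below, using h5py: buffer many parsed lines and append them to a resizable HDF5 dataset in large batches rather than one row at a time. The file names, column count, and batch size are placeholders, and it assumes a simple line-oriented format (which, per my edit below, mine is not exactly).

import numpy as np
import h5py

BATCH = 100_000   # lines buffered per HDF5 write; tune for memory vs. speed
NCOLS = 8         # hypothetical number of values per line

with open("input.dat") as fd, h5py.File("output.h5", "w") as h5:
    dset = h5.create_dataset(
        "data",
        shape=(0, NCOLS),
        maxshape=(None, NCOLS),   # resizable along the first axis
        dtype="f8",
        chunks=True,              # let h5py pick a chunk layout
    )
    buf = []
    for line in fd:
        buf.append([float(x) for x in line.split()])
        if len(buf) >= BATCH:
            n = dset.shape[0]
            dset.resize(n + len(buf), axis=0)
            dset[n:] = np.asarray(buf)
            buf.clear()
    if buf:  # flush the final partial batch
        n = dset.shape[0]
        dset.resize(n + len(buf), axis=0)
        dset[n:] = np.asarray(buf)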
EDIT:
Right now I'm just using a plain "for line in fd:" loop. After that I only check that I'm picking out the right sort of data, which is a very short operation; I'm not writing anything out yet, and it still takes around 4 hours to complete. I can't read fixed-size blocks, because the blocks in this weird file format are not a standard size: the format switches between three different block sizes, and you can only tell which one you have by reading the first few characters of the block.
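To make the block structure concrete, the reading logic I need is roughly the following. The header strings and block sizes here are made-up placeholders (the real format switches between three sizes keyed on the first few characters), so treat this as a sketch of the dispatch pattern, not the actual format:

# Hypothetical mapping from a 2-character block header to total block size.
BLOCK_SIZES = {"A:": 512, "B:": 1024, "C:": 4096}
HEADER_LEN = 2

def read_blocks(path):
    """Yield (block_type, payload) pairs from a file of variable-size blocks."""
    with open(path, "rb") as fd:
        while True:
            header = fd.read(HEADER_LEN)
            if len(header) < HEADER_LEN:
                break  # end of file
            btype = header.decode("ascii")
            size = BLOCK_SIZES[btype]             # KeyError here means a corrupt block
            payload = fd.read(size - HEADER_LEN)  # read the rest of this block
            yield btype, payload

for btype, payload in read_blocks("input.dat"):
    pass  # parse payload according to btype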