Random access gzip stream
Posted by jkff on Stack Overflow
Published on 2010-03-26T21:39:33Z

I'd like to be able to do random access into a gzipped file. I can afford to do some preprocessing on it (say, build some kind of index), provided that the result of the preprocessing is much smaller than the file itself.
Any advice?
My thoughts were:
- Hack on an existing gzip implementation and serialize its decompressor state every, say, 1 MB of compressed data. Then, to do random access, deserialize the nearest preceding state and resume decompressing from that megabyte boundary. This seems hard, especially since I'm working with Java and I couldn't find a pure-Java gzip implementation :(
- Re-compress the file in chunks of 1 MB and do the same as above. This has the disadvantage of doubling the required disk space.
- Write a simple parser for the gzip format that doesn't decompress anything and only detects and indexes block boundaries (if there even are any blocks; I haven't read the gzip format description yet).
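For the first idea, the decompressor state that matters for DEFLATE (the compression format inside gzip) is fairly small: a position in the compressed stream plus the last 32 KB of uncompressed output, because back-references never reach further back than that. That is about 32 KB per checkpoint, roughly 3% of the file at the 1 MB spacing suggested above. The following is a minimal sketch under strong assumptions, not a general solution. java.util.zip.Inflater can be primed with such a window by calling setDictionary on a raw (nowrap) inflater; this relies on zlib accepting a dictionary for raw streams, which is also what zlib's zran.c example does. However, Inflater cannot start at an arbitrary bit offset, so the sketch creates its own byte-aligned restart points with Deflater.SYNC_FLUSH. For an existing gzip file, block boundaries are generally not byte-aligned; zran.c handles that with inflatePrime, which java.util.zip does not expose. The class DeflateCheckpoints and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: raw DEFLATE with byte-aligned restart points and saved 32 KB windows. */
class DeflateCheckpoints {
    static final int WINDOW = 32 * 1024; // DEFLATE back-references reach at most 32 KB

    /** The state needed to resume decoding: where to start reading and the history before it. */
    static final class Checkpoint {
        final int compressedOffset;   // byte offset into the compressed stream
        final int uncompressedOffset; // offset of the first byte decoded from there
        final byte[] window;          // up to 32 KB of output preceding uncompressedOffset

        Checkpoint(int compressedOffset, int uncompressedOffset, byte[] window) {
            this.compressedOffset = compressedOffset;
            this.uncompressedOffset = uncompressedOffset;
            this.window = window;
        }
    }

    /** Compresses data as raw DEFLATE, recording a checkpoint before every chunk of input. */
    static byte[] compress(byte[] data, int chunk, List<Checkpoint> index) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap: no zlib/gzip header
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int pos = 0; pos < data.length; pos += chunk) {
            // Everything so far was SYNC_FLUSHed, so the output ends on a byte boundary here.
            byte[] window = Arrays.copyOfRange(data, Math.max(0, pos - WINDOW), pos);
            index.add(new Checkpoint(out.size(), pos, window));
            def.setInput(data, pos, Math.min(chunk, data.length - pos));
            int n;
            do { // a completely filled buffer means the flush may not be finished yet
                n = def.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                out.write(buf, 0, n);
            } while (n == buf.length);
        }
        def.finish(); // emit the final block
        while (!def.finished()) {
            int n = def.deflate(buf);
            out.write(buf, 0, n);
        }
        def.end();
        return out.toByteArray();
    }

    /** Decodes up to len bytes starting at cp.uncompressedOffset without decoding earlier data. */
    static byte[] readFrom(byte[] compressed, Checkpoint cp, int len) {
        Inflater inf = new Inflater(true); // raw DEFLATE, matching compress()
        try {
            // Prime the history so back-references into the previous 32 KB resolve.
            if (cp.window.length > 0) {
                inf.setDictionary(cp.window);
            }
            inf.setInput(compressed, cp.compressedOffset, compressed.length - cp.compressedOffset);
            byte[] out = new byte[len];
            int got = 0;
            while (got < len && !inf.finished()) {
                int n = inf.inflate(out, got, len - got);
                if (n == 0) {
                    break; // input exhausted
                }
                got += n;
            }
            return Arrays.copyOf(out, got);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt stream or wrong window", e);
        } finally {
            inf.end();
        }
    }
}
```

Resuming at a checkpoint decodes only the data from that point on. Without the saved window, the same call would usually fail with a DataFormatException, because the first back-reference after the restart point refers to data before it.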
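The second idea may cost less disk space than it seems. A gzip file may consist of several concatenated members, and ordinary gzip readers (including java.util.zip.GZIPInputStream) decompress them as one stream, so a re-chunked file can replace the original instead of sitting next to it. The price is some loss of compression at each chunk boundary plus a small offset table. Below is a minimal in-memory sketch under those assumptions. ChunkedGzip, read and memberStart are hypothetical names, chunks are cut by uncompressed size so the index lookup is a division, and a real version would seek in a RandomAccessFile rather than hold everything in a byte array. It uses InputStream.readAllBytes, so it assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: each chunk becomes an independent gzip member; memberStart is the index. */
class ChunkedGzip {
    final int chunkSize;
    final byte[] compressed; // concatenated members, itself a valid .gz file
    final int[] memberStart; // memberStart[i] = compressed offset of chunk i; last entry = total length

    ChunkedGzip(byte[] data, int chunkSize) {
        this.chunkSize = chunkSize;
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        memberStart = new int[chunks + 1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < chunks; i++) {
                memberStart[i] = out.size();
                int from = i * chunkSize;
                // Compress each chunk from scratch so it can be decoded without its neighbours.
                ByteArrayOutputStream member = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(member)) {
                    gz.write(data, from, Math.min(chunkSize, data.length - from));
                }
                member.writeTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memberStart[chunks] = out.size();
        compressed = out.toByteArray();
    }

    /** Returns up to len bytes starting at an uncompressed offset, decompressing only overlapping members. */
    byte[] read(int offset, int len) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        int chunk = offset / chunkSize; // which member holds the first requested byte
        int skip = offset % chunkSize;  // where inside that member to start
        try {
            while (result.size() < len && chunk < memberStart.length - 1) {
                int start = memberStart[chunk];
                int end = memberStart[chunk + 1];
                byte[] plain;
                try (InputStream in = new GZIPInputStream(
                        new ByteArrayInputStream(compressed, start, end - start))) {
                    plain = in.readAllBytes();
                }
                int take = Math.min(plain.length - skip, len - result.size());
                if (take <= 0) {
                    break; // offset lies past the end of the data
                }
                result.write(plain, skip, take);
                skip = 0;
                chunk++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }
}
```

A read that crosses a chunk boundary simply decompresses the next member too; at most one chunk's worth of data is decoded and thrown away per request.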
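On the third idea: per RFC 1952, a gzip member is a short header followed by one DEFLATE stream (RFC 1951) and an 8-byte trailer. The DEFLATE stream is divided into blocks, but each block starts with only a 3-bit header, blocks are not aligned to byte boundaries, and the end of a Huffman-coded block can only be found by decoding its symbols up to the end-of-block code. A block can also refer back up to 32 KB into earlier output, so knowing where a block starts is not by itself enough to resume decompression there. The sketch below, using the hypothetical class GzipHeader, shows how far a parser gets without decompressing: it skips the optional header fields and reads the first block's BFINAL and BTYPE bits.

```java
/** Hypothetical sketch: parses a gzip member header (RFC 1952) without decompressing anything. */
class GzipHeader {
    // FLG bits from RFC 1952, section 2.3.1
    static final int FHCRC = 2, FEXTRA = 4, FNAME = 8, FCOMMENT = 16;

    /** Returns the offset of the first DEFLATE byte in the member that starts at offset p. */
    static int deflateStart(byte[] b, int p) {
        if ((b[p] & 0xff) != 0x1f || (b[p + 1] & 0xff) != 0x8b) {
            throw new IllegalArgumentException("not a gzip member");
        }
        if (b[p + 2] != 8) {
            throw new IllegalArgumentException("compression method is not deflate");
        }
        int flags = b[p + 3] & 0xff;
        p += 10; // ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
        if ((flags & FEXTRA) != 0) {
            int xlen = (b[p] & 0xff) | ((b[p + 1] & 0xff) << 8); // little-endian length
            p += 2 + xlen;
        }
        if ((flags & FNAME) != 0) {
            while (b[p++] != 0) { } // skip zero-terminated original file name
        }
        if ((flags & FCOMMENT) != 0) {
            while (b[p++] != 0) { } // skip zero-terminated comment
        }
        if ((flags & FHCRC) != 0) {
            p += 2; // CRC16 of the header
        }
        return p;
    }

    /**
     * Reads the 3-bit header of a DEFLATE block that starts on a byte boundary.
     * Bits are packed least-significant first: bit 0 is BFINAL, bits 1-2 are BTYPE
     * (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = reserved).
     * In general only the first block of a member is known to start on a byte boundary.
     */
    static int[] blockHeader(byte[] b, int p) {
        int bits = b[p] & 0xff;
        return new int[] {bits & 1, (bits >> 1) & 3};
    }
}
```

Finding the second block's header would already require decoding every Huffman code of the first block, which is most of the work of decompressing it.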
© Stack Overflow or respective owner