Hadoop Rolling Small files
Posted by Arenstar on Server Fault
Published on 2010-11-16
I am running Hadoop on a project and need a suggestion.
Generally, by default Hadoop has a block size of around 64 MB.
There is also a general recommendation to avoid having many small files.
I currently have very, very small files being written into HDFS due to the application design of Flume.
The problem is that Hadoop <= 0.20 cannot append to files, so I end up with far too many files for my MapReduce jobs to run efficiently.
There must be a correct way to simply roll/merge roughly 100 files into one, so that Hadoop effectively reads one large file instead of 100.
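To make the idea concrete, this is roughly the kind of merge step I have in mind, as a minimal sketch using the Hadoop FileSystem API (the class name RollSmallFiles and the example paths are just placeholders, and error handling is left out). I am aware of hadoop fs -getmerge, but that writes the merged result to the local filesystem rather than back into HDFS.

    // Rough sketch: concatenate every small file under an input directory
    // into a single output file in HDFS. Paths are placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RollSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path inputDir = new Path(args[0]);   // e.g. /flume/events/2010-11-16
            Path mergedFile = new Path(args[1]); // e.g. /flume/merged/2010-11-16.log

            FSDataOutputStream out = fs.create(mergedFile);
            try {
                for (FileStatus status : fs.listStatus(inputDir)) {
                    if (status.isDir()) {
                        continue; // skip subdirectories
                    }
                    // Copy each small file's bytes onto the end of the merged
                    // output; close the input stream, keep the output open.
                    IOUtils.copyBytes(fs.open(status.getPath()), out, conf, false);
                }
            } finally {
                out.close();
            }
        }
    }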
Any suggestions?