Control the MultipleOutputFormat files sub-path

Posted by iCode on Stack Overflow
Published on 2012-06-28T03:44:21Z Indexed on 2012/06/29 3:16 UTC

I need to control the sub-path of the different files managed by MultipleOutputFormat, based on the reducer key.

Basically, I want to set each file's sub-path based on the key passed to the reducer.

I can change the file name by overriding the generateFileNameForKeyValue method of MultipleOutputFormat, but how can I also change the sub-path of these files?

I mean, by just overriding generateFileNameForKeyValue, I get

mySetJobConfigOutputPath/fileNameBasedKey1.dat
                        /fileNameBasedKey2.dat
                        /fileNameBasedKey3.dat
                        ...

but I want the files to be organized like this:

mySetJobConfigOutputPath/path0ConfiguredInsideReducerBasedOnKey/fileNameBasedKey1.dat
                        /path1ConfiguredInsideReducerBasedOnKey/fileNameBasedKey2.dat
                                                               /fileNameBasedKey3.dat
                        /path2ConfiguredInsideReducerBasedOnKey/fileNameBasedKey8.dat

As shown, both the sub-path and the file name are determined by the key inside the reducer.

I know how to configure the file name, but can I also configure the sub-path of each file under the mySetJobConfigOutputPath folder?
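
To make the layout I am after concrete, here is a minimal sketch of the key-to-path logic in plain Java. It rests on an assumption I have not confirmed for every Hadoop version: that the string returned by generateFileNameForKeyValue is resolved relative to the job output directory, so a name containing "/" would end up in a sub-directory. The class KeyBasedPaths, the method relativePathFor, and the "group:item" key format are all hypothetical and only there for illustration.

```java
/** Hypothetical helper: turns a reducer key into "subPath/fileName.dat". */
public final class KeyBasedPaths {
    private KeyBasedPaths() {}

    /**
     * Assumed key format (not from any Hadoop API): "group:item".
     * The group becomes the sub-directory and the item becomes the file name.
     * Keys without ':' fall back to a "default" sub-directory.
     */
    public static String relativePathFor(String key) {
        int sep = key.indexOf(':');
        String group = sep < 0 ? "default" : key.substring(0, sep);
        String item = sep < 0 ? key : key.substring(sep + 1);
        return sanitize(group) + "/" + sanitize(item) + ".dat";
    }

    /** Replaces characters that are awkward in path components with '_'. */
    public static String sanitize(String s) {
        String cleaned = s.replaceAll("[^A-Za-z0-9._-]", "_");
        // Avoid an empty component, which would turn the result into an absolute path.
        return cleaned.isEmpty() ? "_" : cleaned;
    }

    public static void main(String[] args) {
        // Prints: path1/fileNameBasedKey2.dat
        System.out.println(relativePathFor("path1:fileNameBasedKey2"));
    }
}
```

If the assumption holds, the override of generateFileNameForKeyValue in my MultipleOutputFormat subclass would simply return KeyBasedPaths.relativePathFor(key.toString()) instead of a bare file name. Whether the sub-directories are actually created this way is exactly what I would like confirmed.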

© Stack Overflow or respective owner
