How do I control the output file names and content of a Hadoop Streaming job?

Posted by Eran Kampf on Stack Overflow
Published on 2009-05-20T13:18:43Z Indexed on 2010/03/17 10:11 UTC

Is there a way to control the output filenames of a Hadoop Streaming job? Specifically, I would like my job's output files to be organized by the key the reducer outputs: each file would contain only the values for one key, and its name would be that key.

Update: Just found the answer. Using a Java class that derives from MultipleOutputFormat as the job's output format allows control of the output file names. http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html

I haven't seen any samples for this out there... Can anyone point me to a Hadoop Streaming sample that makes use of a custom output format Java class?
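
For reference, here is a minimal sketch of what such a class might look like. It assumes the old org.apache.hadoop.mapred API and extends MultipleTextOutputFormat, the text-writing subclass of MultipleOutputFormat. The class name KeyNamedOutputFormat is hypothetical, and the sketch assumes keys are safe to use as file names (no slashes or other special characters). It needs the Hadoop jars on the classpath to compile and has not been tested against a particular Hadoop version.

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical class name; streaming passes reducer output as Text key/value pairs.
public class KeyNamedOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    // Called for every record: the returned string is the output file path,
    // relative to the job's output directory. `name` is the default leaf
    // name (for example part-00000), which is ignored here so that each
    // file is named after its key.
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Assumption: the key contains only characters valid in a file name.
        return key.toString();
    }

    // Returning null as the key makes the underlying text writer emit only
    // the value, so each file holds values without the repeated key.
    @Override
    protected Text generateActualKey(Text key, Text value) {
        return null;
    }
}
```

Assuming the class is packaged into a jar, it would probably be made available to the job with the generic -libjars option and selected with the streaming -outputformat option, which takes a Java class name. Because all records for one key go to the same reducer, each key's file should be written by a single task.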

© Stack Overflow or respective owner
