How does Hadoop perform input splits?

Posted by Deepak Konidena on Stack Overflow
Published on 2010-05-14T02:27:21Z

Hi,

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. For the sake of simplicity, let's assume each line is presented as a <k,v> pair, where k is the offset of the line from the beginning of the file and v is the content of the line.
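To make the <k,v> form concrete, here is a minimal sketch in plain Python (not Hadoop code) of the records I have in mind. The function name `line_records` and the sample data are made up for illustration, and I am assuming "offset" means the byte offset at which each line starts.

```python
import io

def line_records(stream):
    """Yield (k, v) pairs from a binary stream.

    k is the byte offset where the line starts (an assumption about what
    "offset" means here); v is the line content without its line ending.
    """
    offset = 0
    for raw in stream:
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # advance by the full line length, newline included

# Hypothetical three-line input standing in for the 1-billion-line file.
sample = io.BytesIO(b"first line\nsecond\nthird line\n")
records = list(line_records(sample))
```

With this sample, `records` holds one (offset, content) pair per line, which is the shape of input I am picturing for each map() call.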

Now, when we say that we want to run N map tasks, does the framework split the input file into N splits and run each map task on one of those splits? Or do we have to write a partitioning function that produces the N splits ourselves and then run each map task on the split it generated?
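To be clear about what I mean by "N splits", here is a rough sketch of the naive manual partitioning I would otherwise have to write. It is plain Python, the name `naive_splits` is hypothetical, and I am not claiming this is how Hadoop computes its splits; it only cuts a file of a given size into N contiguous byte ranges and ignores line boundaries.

```python
def naive_splits(file_size, n):
    """Cut [0, file_size) into n contiguous (start, length) byte ranges.

    Hypothetical illustration only: line boundaries are ignored, and the
    first (file_size % n) ranges get one extra byte so sizes differ by at most 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(file_size, n)
    splits = []
    start = 0
    for i in range(n):
        length = base + (1 if i < extra else 0)
        splits.append((start, length))
        start += length
    return splits

# A 10-byte file cut into 3 ranges.
example = naive_splits(10, 3)
```

Each map task would then read only its own (start, length) range, which is the part I am unsure the framework already handles for me.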

All I want to know is whether the splits are done internally or whether we have to split the data manually.

More specifically, each time the map() function is called, what are its Key key and Value val parameters?

Thanks, Deepak

© Stack Overflow or respective owner
