Network bandwidth bottleneck when sorting MapReduce intermediate keys?

Posted by Zubair on Stack Overflow. Published on 2010-03-11T08:42:42Z.

I have been learning the MapReduce algorithm and how it can potentially scale to millions of machines, but I don't understand how sorting the intermediate keys after the map phase can scale. With a million machines, there could be 1,000,000 × 1,000,000 pairs of machines exchanging small key/value pairs of intermediate results with each other. Isn't this a bottleneck?
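To make the shuffle step concrete, here is a minimal sketch. It assumes a hash-partitioned shuffle in the style of the original MapReduce design, with M mappers and R reducers. The function names `partition`, `map_side_shuffle` and `reduce_side_merge` are hypothetical, and `zlib.crc32` is an illustrative hash, not the one any particular framework uses. Each mapper sorts its output into one run per reducer and ships each run as a single bulk transfer. The traffic is therefore bounded by M × R transfers, not by one message per key/value pair. Each reducer then merges the sorted runs it receives.

```python
import heapq
import itertools
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Hypothetical partitioner: deterministic hash of the key modulo R.

    crc32 is an illustrative choice; Python's built-in hash() is avoided
    because it is randomized per process for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side_shuffle(pairs, num_reducers):
    """Group one mapper's output into per-reducer buckets, each sorted by key.

    Each bucket is sent as one bulk transfer, so a mapper contacts at most
    num_reducers peers no matter how many pairs it emitted.
    """
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return {r: sorted(kvs, key=lambda kv: kv[0]) for r, kvs in buckets.items()}


def reduce_side_merge(incoming, reduce_fn):
    """Merge the sorted runs received from all mappers, then reduce per key."""
    merged = heapq.merge(*incoming, key=lambda kv: kv[0])
    return {
        key: reduce_fn([value for _, value in group])
        for key, group in itertools.groupby(merged, key=lambda kv: kv[0])
    }


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "the fox jumps"]
    num_mappers, num_reducers = len(splits), 2
    # Word count: each mapper emits (word, 1) for its input split.
    mapper_outputs = [
        map_side_shuffle([(w, 1) for w in text.split()], num_reducers)
        for text in splits
    ]
    transfers = sum(len(buckets) for buckets in mapper_outputs)
    print("bulk transfers:", transfers, "<= M x R =", num_mappers * num_reducers)
    results = {}
    for r in range(num_reducers):
        # Reducer r pulls its bucket (possibly empty) from every mapper.
        incoming = [buckets.get(r, []) for buckets in mapper_outputs]
        results.update(reduce_side_merge(incoming, sum))
    print(sorted(results.items()))
    # prints [('brown', 1), ('dog', 1), ('fox', 2), ('jumps', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

In this sketch the number of connections grows with M × R. The number of intermediate pairs only affects how large each batched transfer is. Whether that total volume is itself a bottleneck on a given cluster is a separate question that the sketch does not address.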


Related posts about mapreduce