Network bandwidth bottleneck for sorting of mapreduce intermediate keys?
Posted by Zubair on Stack Overflow
        Published on 2010-03-11T08:42:42Z
        Indexed on 2010/03/13 5:55 UTC
        
        
mapreduce
I have been learning about the MapReduce algorithm and how it can potentially scale to millions of machines, but I don't understand how the sorting of the intermediate keys after the map phase can scale. With 1,000,000 map machines and 1,000,000 reduce machines, there are potentially 1,000,000 x 1,000,000 pairs of machines exchanging small key/value pairs of the intermediate results with each other. Isn't this a bottleneck?
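To make the concern concrete, here is a minimal sketch of the arithmetic, assuming every map machine may have to send some intermediate pairs to every reduce machine. The names `partition`, `shuffle_flows`, and `max_flows`, and the use of a CRC32 hash to choose the reducer, are my own assumptions for illustration, not taken from any particular MapReduce implementation.

```python
from collections import defaultdict
from zlib import crc32


def partition(key: str, num_reducers: int) -> int:
    """Choose the reducer for a key with a deterministic hash (an assumption)."""
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle_flows(mapper_outputs, num_reducers):
    """Group each mapper's pairs by destination reducer.

    Returns {(mapper_id, reducer_id): [(key, value), ...]}, one entry
    per mapper-to-reducer transfer that actually carries data.
    """
    flows = defaultdict(list)
    for mapper_id, pairs in enumerate(mapper_outputs):
        for key, value in pairs:
            reducer_id = partition(key, num_reducers)
            flows[(mapper_id, reducer_id)].append((key, value))
    return dict(flows)


def max_flows(num_mappers: int, num_reducers: int) -> int:
    """Upper bound on distinct mapper-to-reducer transfers."""
    return num_mappers * num_reducers


# Hypothetical toy input: three mappers, two reducers.
mapper_outputs = [
    [("apple", 1), ("pear", 1)],
    [("apple", 1), ("plum", 1)],
    [("pear", 1)],
]
flows = shuffle_flows(mapper_outputs, num_reducers=2)
print(len(flows), "of", max_flows(3, 2), "possible flows carry data")

# The case from the question: 1,000,000 mappers and 1,000,000 reducers.
print(max_flows(1_000_000, 1_000_000))
```

In this sketch all pairs with the same key end up at the same reducer, but the number of possible mapper-to-reducer transfers still grows as the product of the two machine counts, which is the scaling I am worried about.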
© Stack Overflow or respective owner