Why isn't Hadoop implemented using MPI?
Posted by artif on Stack Overflow
        
        
        
        
        Published on 2011-01-04T04:34:54Z
Indexed on 2011/01/04 5:53 UTC
        
        
        
Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.
What are the technical reasons for this?
I could hazard a few guesses, but I do not know enough about how MPI is implemented "under the hood" to know whether or not I'm right.
Come to think of it, I'm not entirely familiar with Hadoop's internals either. I understand the framework at a conceptual level (map/combine/shuffle/reduce and how the phases fit together), but I don't know the nitty-gritty implementation details. I've always assumed Hadoop transmits serialized data structures (perhaps GPBs, i.e. Google Protocol Buffers) over a TCP connection, e.g. during the shuffle phase. Let me know if that's not true.
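To make that assumption concrete, here is a minimal Python sketch of what I have in mind: key/value records serialized with a simple length-prefixed layout, sent over a plain TCP connection, and grouped by key on the receiving side. This is only an illustration of my guess, not Hadoop's actual wire protocol or code. The record layout and the names `encode_record`, `read_exact`, `receive_records`, and `shuffle_over_tcp` are all hypothetical, made up for this example.

```python
import socket
import struct
import threading
from collections import defaultdict


def encode_record(key: str, value: int) -> bytes:
    """Serialize one (key, value) pair (hypothetical layout, not Hadoop's):
    4-byte key length, UTF-8 key bytes, 8-byte signed integer value."""
    key_bytes = key.encode("utf-8")
    return struct.pack("!I", len(key_bytes)) + key_bytes + struct.pack("!q", value)


def read_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; return fewer only if the peer closed the stream."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def receive_records(conn: socket.socket):
    """Decode records from the stream until the sender closes the connection."""
    while True:
        header = read_exact(conn, 4)
        if len(header) < 4:
            return
        (key_len,) = struct.unpack("!I", header)
        key = read_exact(conn, key_len).decode("utf-8")
        (value,) = struct.unpack("!q", read_exact(conn, 8))
        yield key, value


def shuffle_over_tcp(map_output):
    """Send map_output over a loopback TCP connection and group values by key."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def mapper_side():
        # Plays the role of a map task pushing its output to a reducer.
        with socket.create_connection(("127.0.0.1", port)) as out:
            for key, value in map_output:
                out.sendall(encode_record(key, value))

    sender = threading.Thread(target=mapper_side)
    sender.start()

    grouped = defaultdict(list)
    conn, _ = server.accept()
    with conn:
        for key, value in receive_records(conn):
            grouped[key].append(value)
    sender.join()
    server.close()
    return dict(grouped)


if __name__ == "__main__":
    pairs = [("hadoop", 1), ("mpi", 1), ("hadoop", 1)]
    grouped = shuffle_over_tcp(pairs)
    # A reduce step would then fold each key's list of values, e.g. by summing.
    print({key: sum(values) for key, values in grouped.items()})
```

In this picture the sender and receiver only need sockets and an agreed-upon byte layout. They do not need a shared communicator or a process group fixed at launch, which I think is part of what I'm asking about when comparing this to MPI.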
© Stack Overflow or respective owner