Why do HDFS clusters have only a single NameNode?

Posted by grautur on Programmers
Published on 2012-04-04T03:07:34Z Indexed on 2012/04/04 5:39 UTC


I'm trying to better understand how Hadoop works, and I'm reading the following:

The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy. Hadoop 0.21+ has a BackupNameNode that is part of a plan to have an HA name service, but it needs active contributions from the people who want it (i.e. you) to make it Highly Available.

from http://wiki.apache.org/hadoop/NameNode
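
To make the checkpointing step in that quote concrete, here is a toy sketch in Python of merging an edits log into an fsimage. It is not Hadoop's actual code or on-disk format: the dict-based namespace, the ("create"/"delete", path) edit tuples, and the function names are all assumptions made for illustration.

```python
# Toy model of the checkpoint described in the quote; NOT Hadoop's real format.
# `fsimage` is a snapshot of the namespace; `edits` logs changes made since then.

def apply_edit(namespace, edit):
    """Apply one logged operation (hypothetical 'create'/'delete' ops) in place."""
    op, path = edit
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Replay edits onto a copy of fsimage; return the new image and an empty log."""
    merged = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []


fsimage = {"/a": {"blocks": []}}
edits = [("create", "/b"), ("delete", "/a")]
new_fsimage, new_edits = checkpoint(fsimage, edits)
print(sorted(new_fsimage))  # ['/b']
```

In this sketch the checkpoint only shortens the log that has to be replayed on a restart. Nothing in it answers client requests while the NameNode is down, which seems to be what the wiki means by "does not provide any real redundancy".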

So why is the NameNode a single point of failure? What is bad or difficult about having a complete duplicate of the NameNode running as well?

