HDFS datanode startup fails when disks are full

Posted by mbac on Server Fault
Published on 2013-10-29T14:53:57Z

Our HDFS cluster is only 90% full, but some datanodes have individual disks that are 100% full. That means that when we mass-reboot the entire cluster, some datanodes completely fail to start with a message like this:

2013-10-26 03:58:27,295 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Mkdirs failed to create /mnt/local/sda1/hadoop/dfsdata/blocksBeingWritten
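
As far as I can tell, that message just means java.io.File.mkdirs() returned false: with no free blocks left on the volume, the datanode cannot create its blocksBeingWritten directory, so startup aborts. A rough sketch of the check (a paraphrase, not the actual Hadoop source):

    import java.io.File;
    import java.io.IOException;

    // Rough paraphrase of the per-data-directory check the datanode performs
    // at startup; not the actual Hadoop source.
    class MkdirsCheck {
        public static void main(String[] args) throws IOException {
            File dir = new File("/mnt/local/sda1/hadoop/dfsdata/blocksBeingWritten");
            // File.mkdirs() simply returns false when the directory cannot be
            // created, e.g. because the filesystem has no free blocks left.
            if (!dir.exists() && !dir.mkdirs()) {
                throw new IOException("Mkdirs failed to create " + dir.toString());
            }
        }
    }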

Only three datanodes have to fail this way before we start experiencing real data loss.

Currently we work around it by decreasing the amount of space reserved for the root user, but we'll eventually run out of reserved space to reclaim. We also run the rebalancer pretty much constantly, but some disks stay stuck at 100% anyway.
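
Concretely, the workaround looks something like this, assuming ext3/ext4 data volumes and Hadoop 1.x command syntax (the device name is just an example):

    # Drop the ext reserved-block percentage (space reserved for root)
    # on a full data volume from the default 5% down to 1%.
    tune2fs -m 1 /dev/sda1

    # Run the balancer with a 5% utilization threshold.
    hadoop balancer -threshold 5

(On Hadoop 2.x the balancer is invoked as "hdfs balancer" instead.)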

Changing the dfs.datanode.failed.volumes.tolerated setting is not the solution, because the volumes have not actually failed.
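
For reference, that setting lives in hdfs-site.xml and only counts volumes the datanode actually marks as failed; with the default of 0, any real volume failure shuts the datanode down. A full-but-healthy disk isn't covered by it, which is why it doesn't help here. Something like:

    <!-- hdfs-site.xml: number of failed volumes a datanode will tolerate
         before shutting itself down (default 0) -->
    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>0</value>
    </property>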

Any ideas?
