mdadm raid5 recover double disk failure - with a twist (drive order)

Posted by Peter Bos on Server Fault
Published on 2013-09-14T10:28:01Z
Let me acknowledge first off that I have made mistakes, and that I have a backup for most but not all of the data on this RAID. I still have hope of recovering the rest of the data. I don't have the kind of money to take the drives to a professional data-recovery company.

Mistake #0, not having a 100% backup. I know.

I have a mdadm RAID5 system of 4x3TB. Drives /dev/sd[b-e], all with one partition /dev/sd[b-e]1. I'm aware that RAID5 on very large drives is risky, yet I did it anyway.

Recent events

The RAID became degraded after a two-drive failure. One drive [/dev/sdc] is really gone, the other [/dev/sde] came back up after a power cycle, but was not automatically re-added to the RAID. So I was left with a 4-device RAID with only 2 active drives [/dev/sdb and /dev/sdd].

Mistake #1, not using dd copies of the drives for restoring the RAID. I did not have the drives or the time. Mistake #2, not making a backup of the superblock and mdadm -E of the remaining drives.
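For anyone reading later: capturing that metadata is cheap. A minimal sketch of what it could have looked like, assuming v1.2 superblocks (which are stored 4 KiB from the start of each member partition). Since reading the real partitions needs root and the actual devices, the dd offset arithmetic is demonstrated below against an ordinary file standing in for a member:

```shell
# Hypothetical sketch of a metadata backup. On the real system it would
# be, for each member:
#   mdadm -E /dev/sdb1 > examine-sdb1.txt
#   dd if=/dev/sdb1 of=super-sdb1.bin bs=4096 skip=1 count=1
# Below, an ordinary file stands in for the partition so the v1.2
# superblock offset (4 KiB from the start) can be shown without root.
truncate -s 1M fake-sdb1
printf 'FAKE-SUPERBLOCK' | dd of=fake-sdb1 bs=4096 seek=1 conv=notrunc 2>/dev/null
dd if=fake-sdb1 of=super-sdb1.bin bs=4096 skip=1 count=1 2>/dev/null
head -c 15 super-sdb1.bin   # prints FAKE-SUPERBLOCK
```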

Recovery attempt

I reassembled the RAID in degraded mode with:

mdadm --assemble --force /dev/md0 /dev/sd[bde]1

I could then access my data. I replaced /dev/sdc with a spare, empty, identical drive.

I removed the old /dev/sdc1 from the RAID

mdadm --fail /dev/md0 /dev/sdc1

Mistake #3, not doing this before replacing the drive.

I then partitioned the new /dev/sdc and added it to the RAID.

mdadm --add /dev/md0 /dev/sdc1

It then began to rebuild the RAID, ETA 300 minutes. I followed the progress via /proc/mdstat to 2% and then went to do other stuff.
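(A small sketch of how that progress can be scraped from /proc/mdstat; shown here against a saved snapshot string with made-up numbers, since the live file only exists on a system with the md driver loaded:)

```shell
# Hypothetical: extract the rebuild percentage from an mdstat recovery
# line. 'snap' is an invented snapshot; on a live system you would read
# /proc/mdstat itself, e.g. with: watch -n 60 cat /proc/mdstat
snap='      [>....................]  recovery =  2.0% (60000000/2930134016) finish=300.0min speed=100000K/sec'
pct=$(echo "$snap" | grep -o '[0-9.]*%')
echo "rebuild at $pct"   # prints: rebuild at 2.0%
```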

Checking the result

Several hours (but less than 300 minutes) later, I checked the progress. It had stopped due to a read error on /dev/sde1.

Here is where the trouble really starts

I then removed /dev/sde1 from the RAID and re-added it. I can't remember why I did this; it was late.

mdadm --manage /dev/md0 --remove /dev/sde1
mdadm --manage /dev/md0 --add /dev/sde1

However, /dev/sde1 was now marked as spare. So I decided to recreate the whole array with --assume-clean, using what I thought was the right order, and with /dev/sdc1 missing.

mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1

That worked, but the filesystem was not recognized when I tried to mount it (it should have been ext4).

Device order

I then checked a recent backup I had of /proc/mdstat, and found the drive order:

md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
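That saved line can even be turned back into an ordered member list mechanically, by sorting on the bracketed device numbers (a small sketch, using the line quoted above as a literal string):

```shell
# Sort the members of the saved mdstat line by their [n] device numbers.
line='md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]'
ordered=$(echo "$line" | tr ' ' '\n' | grep '\[' | sort -t'[' -k2 -n)
echo "$ordered"
# prints:
# sdb1[0]
# sdc1[1]
# sdd1[2]
# sde1[4]
```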

I then remembered this RAID had suffered a drive loss about a year ago, and I recovered from it by replacing the faulty drive with a spare one. That may have scrambled the device order a bit... so there were device numbers [0], [1], [2], and [4], but no [3].

I tried to find the drive order with the Permute_array script (https://raid.wiki.kernel.org/index.php/Permute_array.pl), but it did not find the right order.
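An alternative to the wiki script is to enumerate the candidates directly. A sketch (device names copied from above) that only prints every possible mdadm --create line for the three surviving members plus one missing slot: 3! orders times 4 missing positions = 24 candidates, each of which would then be tried with --assume-clean and checked read-only (e.g. fsck.ext4 -n) before trusting it:

```shell
# Enumerate every candidate member order: the 3 known devices in each
# permutation, with 'missing' tried in each of the 4 slots (6 x 4 = 24).
# Nothing is executed; the commands are only written to a file to be
# tried one at a time, with a read-only filesystem check after each.
: > candidates.txt
for a in /dev/sdb1 /dev/sdd1 /dev/sde1; do
  for b in /dev/sdb1 /dev/sdd1 /dev/sde1; do
    for c in /dev/sdb1 /dev/sdd1 /dev/sde1; do
      [ "$a" = "$b" ] && continue
      [ "$a" = "$c" ] && continue
      [ "$b" = "$c" ] && continue
      for slots in "missing $a $b $c" "$a missing $b $c" \
                   "$a $b missing $c" "$a $b $c missing"; do
        echo "mdadm --create /dev/md0 --assume-clean -l5 -n4 $slots" >> candidates.txt
      done
    done
  done
done
wc -l < candidates.txt   # 24 candidates
```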

Questions

I now have two main questions:

  1. I screwed up all the superblocks on the drives, but only issued

    mdadm --create --assume-clean

    commands, so I should not have overwritten the data itself on /dev/sd[bde]1. Am I right that, in theory, the RAID can be restored (assuming for the moment that /dev/sde1 is OK) if I just find the right device order?

  2. Is it important that /dev/sde1 be given the device number [4] in the RAID? When I create it with

    mdadm --create /dev/md0 --assume-clean -l5 -n4 \
      /dev/sdb1 missing /dev/sdd1 /dev/sde1
    

    it is assigned the number [3]. I wonder whether that is relevant to the calculation of the parity blocks. If it turns out to be important, how can I recreate the array with /dev/sdb1[0] missing[1] /dev/sdd1[2] /dev/sde1[4]? If I could get that to work, I could start it in degraded mode, add the new drive /dev/sdc1, and let it resync again.
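
As a way of reasoning about question 2: with mdadm's default left-symmetric layout (algorithm 2, as shown in the mdstat backup above), parity placement rotates over the raid roles 0..n-1; as I understand it, the bracketed number in /proc/mdstat is the device number, which need not equal the role after a rebuild. A tiny, purely illustrative sketch of the rotation:

```shell
# Illustrative only: for the left-symmetric RAID5 layout, the parity for
# each stripe lands on role (n-1) - (stripe mod n); the roles, not the
# [4]-style device numbers, are what the layout rotates over.
n=4
for stripe in 0 1 2 3 4 5 6 7; do
  parity=$(( (n - 1) - stripe % n ))
  echo "stripe $stripe: parity on role $parity"
done
```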

It's OK if you want to point out that this may not have been the best course of action; I realize that now. It would be great if anyone has any suggestions.
