How to get rid of a stubborn 'removed' device in mdadm
- by T.J. Crowder
One of my server's drives failed and so I removed the failed drive from all three relevant arrays, had the drive swapped out, and then added the new drive to the arrays. Two of the arrays worked perfectly. The third added the drive back as a spare, and there's an odd "removed" entry in the mdadm details.
I tried both
mdadm /dev/md2 --remove failed
and
mdadm /dev/md2 --remove detached
as suggested elsewhere. Neither command complained, but neither had any effect.
Does anyone know how I can get rid of that entry and get the drive added back properly? (Ideally without resyncing a third time; I've already had to do it twice and it takes hours. But if that's what it takes, that's what it takes.) The new drive is /dev/sda, and the relevant partition is /dev/sda3.
Here's the detail on the array:
# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Oct 26 12:27:49 2011
     Raid Level : raid1
     Array Size : 729952192 (696.14 GiB 747.47 GB)
  Used Dev Size : 729952192 (696.14 GiB 747.47 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Tue Nov 12 17:48:53 2013
          State : clean, degraded 
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
           UUID : 2fdbf68c:d572d905:776c2c25:004bd7b2 (local to host blah)
         Events : 0.34665
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       19        1      active sync   /dev/sdb3
       2       8        3        -      spare   /dev/sda3
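For comparing runs of the detail output, the device table can be boiled down to a one-line summary. This is only a sketch: summarize_md_slots is a hypothetical helper name, not part of mdadm, and it assumes the table layout shown above (five leading columns, with the state starting in the fifth).

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): summarise the device table of
# `mdadm --detail` output read from stdin.
# Assumes the layout shown above: Number Major Minor RaidDevice State ...
summarize_md_slots() {
    awk '
        # Device-table rows are the only lines whose first field is a number.
        $1 ~ /^[0-9]+$/ && NF >= 5 {
            if ($5 == "removed")     removed++
            else if ($5 == "spare")  spare++
            else if ($5 == "active") active++
        }
        END { printf "active=%d spare=%d removed=%d\n", active, spare, removed }
    '
}

# Usage (needs root and a real array):
#   mdadm --detail /dev/md2 | summarize_md_slots
```

Fed the table above, this prints active=1 spare=1 removed=1, which is the state I'm trying to get out of: a healthy two-device mirror would show active=2 and nothing else.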
If it's relevant, it's a 64-bit server. It normally runs Ubuntu, but right now I'm in the data centre's "rescue" OS, which is Debian 7 (wheezy). The "removed" entry was there the last time I was in Ubuntu (it currently won't boot from the disk), so I don't think it's some Ubuntu/Debian conflict (and they are, of course, closely related).
Update:
Having done extensive tests with test devices on a local machine, I'm just plain getting anomalous behavior from mdadm with this array. For instance, with /dev/sda3 removed from the array again, I did this:
mdadm /dev/md2 --grow --force --raid-devices=1
And that got rid of the "removed" device, leaving me just with /dev/sdb3. Then I nuked /dev/sda3 (wrote a file system to it, so it didn't have the raid fs anymore), then:
mdadm /dev/md2 --grow --raid-devices=2
...which gave me an array with /dev/sdb3 in slot 0 and "removed" in slot 1 as you'd expect. Then
mdadm /dev/md2 --add /dev/sda3
...added it, as a spare again. (Another 3.5 hours down the drain.)
So with the rebuilt spare in the array, given that mdadm's man page says
  RAID-DEVICES CHANGES
  
  ...
  
  When the number of devices is increased, any hot spares that are present will be activated immediately.
...I grew the array to three devices, to try to activate the "spare":
mdadm /dev/md2 --grow --raid-devices=3
What did I get? Two "removed" devices, and the spare. And yet when I do this with a test array, I don't get this behavior.
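For anyone who wants to try the same sequence on a throwaway array, here is roughly the kind of rehearsal I mean, sketched with loop devices. The names are assumptions for illustration: /dev/md9 is presumed unused, the image files live in /tmp, and rehearse_readd and run are hypothetical helpers. By default it only prints the commands; set RUN=1 as root to actually execute them.

```shell
#!/bin/sh
# Sketch: rehearse fail/remove/re-add on a scratch RAID1 built from loop devices.
# Assumptions: /dev/md9 is free, /tmp has room for two 64M image files.
# Prints the commands unless RUN=1 is set (which requires root).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

rehearse_readd() {
    md=${1:-/dev/md9}
    a=/tmp/md-test-a.img
    b=/tmp/md-test-b.img
    run truncate -s 64M "$a" "$b"
    if [ "${RUN:-0}" = "1" ]; then
        la=$(losetup -f --show "$a")
        lb=$(losetup -f --show "$b")
    else
        # Placeholder device names for the dry run only.
        la=/dev/loopA
        lb=/dev/loopB
        echo "losetup -f --show $a"
        echo "losetup -f --show $b"
    fi
    # Same metadata version as the problem array (0.90).
    run mdadm --create "$md" --run --metadata=0.90 --level=1 --raid-devices=2 "$la" "$lb"
    run mdadm --wait "$md"
    # Simulate the failed drive, wipe its RAID metadata, and add it back.
    run mdadm "$md" --fail "$la" --remove "$la"
    run mdadm --zero-superblock "$la"
    run mdadm "$md" --add "$la"
    run mdadm --detail "$md"
}

# Usage: rehearse_readd            (dry run, prints commands)
#        RUN=1 rehearse_readd      (as root, really builds /dev/md9)
```

On my test arrays, a sequence like this ends with both devices back as active sync, which is why the behaviour of /dev/md2 looks anomalous to me.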
So I nuked /dev/sda3 again, used it to create a brand-new array, and am copying the data from the old array to the new one:
rsync -r -t -v --exclude 'lost+found' --progress /mnt/oldarray/* /mnt/newarray
This will, of course, take hours. Hopefully when I'm done, I can stop the old array entirely, nuke /dev/sdb3, and add it to the new array. Hopefully, it won't get added as a spare!
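Before nuking /dev/sdb3 it seems prudent to confirm that the copy is complete. A minimal sketch, assuming GNU diff is available in the rescue OS; trees_match is a hypothetical helper name, and the mount points are the ones from the rsync command above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two directory trees have the same
# files with the same contents, ignoring lost+found.
# Note: this compares contents only, not ownership or permissions.
trees_match() {
    # -r recurses, -q reports only whether files differ, -x skips a basename.
    diff -rq -x 'lost+found' "$1" "$2" >/dev/null
}

# Usage:
#   if trees_match /mnt/oldarray /mnt/newarray; then
#       echo "copy verified"
#   else
#       echo "trees differ; do not wipe the old array yet"
#   fi
```

One thing a check like this would catch: the shell glob /mnt/oldarray/* does not match top-level dotfiles by default, so any hidden files or directories at the root of the old array would show up here as missing from the new one.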