Search Results

Search found 242 results on 10 pages for 'mdadm'.

Page 2/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >

Accessing a broken mdadm raid

- by CarstenCarsten

Hi! I used a western digital mybookworld (SOHO NAS storage using Linux) as backup for my Linux box. Suddenly, the mybookworld does not boot up any more. So I opened the box, removed the hard disk and put the hard disk into an external USB HDD case, and connected it to my Linux box. [ 530.640301] usb 2-1: new high speed USB device using ehci_hcd and address 3 [ 530.797630] scsi7 : usb-storage 2-1:1.0 [ 531.794844] scsi 7:0:0:0: Direct-Access WDC WD75 00AAKS-00RBA0 PQ: 0 ANSI: 2 [ 531.796490] sd 7:0:0:0: Attached scsi generic sg3 type 0 [ 531.797966] sd 7:0:0:0: [sdc] 1465149168 512-byte logical blocks: (750 GB/698 GiB) [ 531.800317] sd 7:0:0:0: [sdc] Write Protect is off [ 531.800327] sd 7:0:0:0: [sdc] Mode Sense: 38 00 00 00 [ 531.800333] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 531.803821] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 531.803836] sdc: sdc1 sdc2 sdc3 sdc4 [ 531.815831] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 531.815842] sd 7:0:0:0: [sdc] Attached SCSI disk The dmesg output looks normal, but I was wondering why the hardisk was not mounted at all. And why there are 4 different partitions on it. fdisk showed the following: root@ubuntu:/home/ubuntu# fdisk /dev/sdc WARNING: DOS-compatible mode is deprecated. It's strongly recommended to switch off the mode (command 'c') and change display units to sectors (command 'u'). Command (m for help): p Disk /dev/sdc: 750.2 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00007c00 Device Boot Start End Blocks Id System /dev/sdc1 4 369 2939895 fd Linux raid autodetect /dev/sdc2 370 382 104422+ fd Linux raid autodetect /dev/sdc3 383 505 987997+ fd Linux raid autodetect /dev/sdc4 506 91201 728515620 fd Linux raid autodetect Oh no! Everything seems to be created as a mdadm software raid. Calling mdadm --examine with the different partitions seems to affirm that. I think the only partition I am interested in, is /dev/sdc4 (because it is the largest). But nevertheless I called mdadm --examine with every partition. root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdc1 /dev/sdc1: Magic : a92b4efc Version : 00.90.00 UUID : 5626a2d8:070ad992:ef1c8d24:cd8e13e4 Creation Time : Wed Feb 20 00:57:49 2002 Raid Level : raid1 Used Dev Size : 2939776 (2.80 GiB 3.01 GB) Array Size : 2939776 (2.80 GiB 3.01 GB) Raid Devices : 2 Total Devices : 1 Preferred Minor : 1 Update Time : Sun Nov 21 11:05:27 2010 State : clean Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 Checksum : 4c90bc55 - correct Events : 16682 Number Major Minor RaidDevice State this 0 8 1 0 active sync /dev/sda1 0 0 8 1 0 active sync /dev/sda1 1 1 0 0 1 faulty removed root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdc2 /dev/sdc2: Magic : a92b4efc Version : 00.90.00 UUID : 9734b3ee:2d5af206:05fe3413:585f7f26 Creation Time : Wed Feb 20 00:57:54 2002 Raid Level : raid1 Used Dev Size : 104320 (101.89 MiB 106.82 MB) Array Size : 104320 (101.89 MiB 106.82 MB) Raid Devices : 2 Total Devices : 1 Preferred Minor : 2 Update Time : Wed Oct 27 20:19:08 2010 State : clean Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 Checksum : 55560b40 - correct Events : 9884 Number Major Minor RaidDevice State this 0 8 2 0 active sync /dev/sda2 0 0 8 2 0 active sync /dev/sda2 1 1 0 0 1 faulty removed root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdc3 /dev/sdc3: Magic : a92b4efc Version : 00.90.00 UUID : 08f30b4f:91cca15d:2332bfef:48e67824 Creation Time : Wed Feb 20 00:57:54 2002 Raid Level : raid1 Used Dev Size : 987904 (964.91 MiB 1011.61 MB) Array Size : 987904 (964.91 MiB 1011.61 MB) Raid Devices : 2 Total Devices : 1 Preferred Minor : 3 Update Time : Sun Nov 21 11:05:27 2010 State : clean Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 Checksum : 39717874 - correct Events : 73678 Number Major Minor RaidDevice State this 0 8 3 0 active sync 0 0 8 3 0 active sync 1 1 0 0 1 faulty removed root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdc4 /dev/sdc4: Magic : a92b4efc Version : 00.90.00 UUID : febb75ca:e9d1ce18:f14cc006:f759419a Creation Time : Wed Feb 20 00:57:55 2002 Raid Level : raid1 Used Dev Size : 728515520 (694.77 GiB 746.00 GB) Array Size : 728515520 (694.77 GiB 746.00 GB) Raid Devices : 2 Total Devices : 1 Preferred Minor : 4 Update Time : Sun Nov 21 11:05:27 2010 State : clean Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 Checksum : 2f36a392 - correct Events : 519320 Number Major Minor RaidDevice State this 0 8 4 0 active sync 0 0 8 4 0 active sync 1 1 0 0 1 faulty removed If I read the output correctly everything was removed, because it was faulty. Is there ANY way to see the contents of the largest partition? Or seeing somehow which files are broken? I see that everything is raid1 which is only mirroring, so this should be a normal partition. I am anxious to do anything with mdadm, in fear that I destroy the data on the hard disk. I would be very thankful for any help.

Read the article
[Ubuntu 10.04] mdadm - Can't get RAID5 Array To Start

- by Matthew Hodgkins

Hello, after a power failure my RAID array refuses to start. When I boot I have to sudo mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 to get mdadm to notice the array. Here are the details (after I force assemble). sudo mdadm --misc --detail /dev/md0: /dev/md0: Version : 00.90 Creation Time : Sun Apr 25 01:39:25 2010 Raid Level : raid5 Used Dev Size : 1465135872 (1397.26 GiB 1500.30 GB) Raid Devices : 6 Total Devices : 6 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Thu Jun 17 23:02:38 2010 State : active, Not Started Active Devices : 6 Working Devices : 6 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 128K UUID : 44a8f730:b9bea6ea:3a28392c:12b22235 (local to host hodge-fs) Events : 0.1249691 Number Major Minor RaidDevice State 0 8 65 0 active sync /dev/sde1 1 8 81 1 active sync /dev/sdf1 2 8 97 2 active sync /dev/sdg1 3 8 49 3 active sync /dev/sdd1 4 8 33 4 active sync /dev/sdc1 5 8 17 5 active sync /dev/sdb1 mdadm.conf: # by default, scan all partitions (/proc/partitions) for MD superblocks. # alternatively, specify devices to scan, using wildcards if desired. DEVICE partitions /dev/sdb1 /dev/sdb1 # auto-create devices with Debian standard permissions CREATE owner=root group=disk mode=0660 auto=yes # automatically tag new arrays as belonging to the local system HOMEHOST <system> # definitions of existing MD arrays ARRAY /dev/md0 level=raid5 num-devices=6 UUID=44a8f730:b9bea6ea:3a28392c:12b22235 Any help would be appreciated.

Read the article
mdadm raid1 fails to resync

- by JuanD

Hello, I'm trying to solve this problem I'm having with an mdadm raid1. I have an ubuntu 9.04 server running on a software 2-drive raid1 with mdadm. Yesterday, one of the drives failed, and so I replaced it with a brand new drive of the same size. I removed the faulty drive, copied the partition from the remaining good drive to the new drive and then added it to the raid. It re-synced and the system worked fine, until the drive that hadn't failed, was also labeled failed. Now I had the raid running solely on the new drive. So I purchased another drive and repeated the procedure above. So now I had 2 brand new drives and the raid was syncing. However, after a few minutes I checked /proc/mdstat and the raid was no longer syncing. mdadm --detail /dev/md1 shows: (sdb is the first new drive, and sdc is the second new drive) root@dola:/home/jjaramillo# mdadm --detail /dev/md1 /dev/md1: Version : 00.90 Creation Time : Sat Dec 20 00:42:05 2008 Raid Level : raid1 Array Size : 974711680 (929.56 GiB 998.10 GB) Used Dev Size : 974711680 (929.56 GiB 998.10 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 1 Persistence : Superblock is persistent Update Time : Wed Jun 2 10:09:35 2010 State : clean, degraded Active Devices : 1 Working Devices : 2 Failed Devices : 0 Spare Devices : 1 UUID : bba497c6:5029ba0b:bfa4f887:c0dc8f3d Events : 0.5395594 Number Major Minor RaidDevice State 2 8 35 0 spare rebuilding /dev/sdc3 1 8 19 1 active sync /dev/sdb3 I've tried removing and re-adding the drive a few times, but the same thing happens. The raid fails to resync. I've looked at /var/log/messages, and found the following: Jun 2 07:57:36 dola kernel: [35708.917337] sd 5:0:0:0: [sdb] Unhandled sense code Jun 2 07:57:36 dola kernel: [35708.917339] sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jun 2 07:57:36 dola kernel: [35708.917342] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor] Jun 2 07:57:36 dola kernel: [35708.917346] Descriptor sense data with sense descriptors (in hex): Jun 2 07:57:36 dola kernel: [35708.917348] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 Jun 2 07:57:36 dola kernel: [35708.917357] 00 43 9e 47 Jun 2 07:57:36 dola kernel: [35708.917360] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed So it looks like there's some kind of error on sdb (the first new drive). My question is, what would be the best approach to get the raid up and running again? I've thought about dd'ing the /dev/md1 to a blank hard drive, then re-doing the raid from scratch and loading the data back, but there could be an easier solution.. Any help would be appreciated.

Read the article
Failed MDADM Array With Ext.4 Partition - "e2fsck: unable to set superblock flags on /dev/md0"

- by Matthew Hodgkins

Had a power failure and now my mdadm array is having problems. sudo mdadm -D /dev/md0 [hodge@hodge-fs ~]$ sudo mdadm -D /dev/md0 /dev/md0: Version : 0.90 Creation Time : Sun Apr 25 01:39:25 2010 Raid Level : raid5 Array Size : 8790815232 (8383.57 GiB 9001.79 GB) Used Dev Size : 1465135872 (1397.26 GiB 1500.30 GB) Raid Devices : 7 Total Devices : 7 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Sat Aug 7 19:10:28 2010 State : clean, degraded, recovering Active Devices : 6 Working Devices : 7 Failed Devices : 0 Spare Devices : 1 Layout : left-symmetric Chunk Size : 128K Rebuild Status : 10% complete UUID : 44a8f730:b9bea6ea:3a28392c:12b22235 (local to host hodge-fs) Events : 0.1307608 Number Major Minor RaidDevice State 0 8 81 0 active sync /dev/sdf1 1 8 97 1 active sync /dev/sdg1 2 8 113 2 active sync /dev/sdh1 3 8 65 3 active sync /dev/sde1 4 8 49 4 active sync /dev/sdd1 7 8 33 5 spare rebuilding /dev/sdc1 6 8 16 6 active sync /dev/sdb sudo mount -a [hodge@hodge-fs ~]$ sudo mount -a mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so sudo fsck.ext4 /dev/md0 [hodge@hodge-fs ~]$ sudo fsck.ext4 /dev/md0 e2fsck 1.41.12 (17-May-2010) fsck.ext4: Group descriptors look bad... trying backup blocks... /dev/md0: recovering journal fsck.ext4: unable to set superblock flags on /dev/md0 sudo dumpe2fs /dev/md0 | grep -i superblock [hodge@hodge-fs ~]$ sudo dumpe2fs /dev/md0 | grep -i superblock dumpe2fs 1.41.12 (17-May-2010) Primary superblock at 0, Group descriptors at 1-524 Backup superblock at 32768, Group descriptors at 32769-33292 Backup superblock at 98304, Group descriptors at 98305-98828 Backup superblock at 163840, Group descriptors at 163841-164364 Backup superblock at 229376, Group descriptors at 229377-229900 Backup superblock at 294912, Group descriptors at 294913-295436 Backup superblock at 819200, Group descriptors at 819201-819724 Backup superblock at 884736, Group descriptors at 884737-885260 Backup superblock at 1605632, Group descriptors at 1605633-1606156 Backup superblock at 2654208, Group descriptors at 2654209-2654732 Backup superblock at 4096000, Group descriptors at 4096001-4096524 Backup superblock at 7962624, Group descriptors at 7962625-7963148 Backup superblock at 11239424, Group descriptors at 11239425-11239948 Backup superblock at 20480000, Group descriptors at 20480001-20480524 Backup superblock at 23887872, Group descriptors at 23887873-23888396 Backup superblock at 71663616, Group descriptors at 71663617-71664140 Backup superblock at 78675968, Group descriptors at 78675969-78676492 Backup superblock at 102400000, Group descriptors at 102400001-102400524 Backup superblock at 214990848, Group descriptors at 214990849-214991372 Backup superblock at 512000000, Group descriptors at 512000001-512000524 Backup superblock at 550731776, Group descriptors at 550731777-550732300 Backup superblock at 644972544, Group descriptors at 644972545-644973068 Backup superblock at 1934917632, Group descriptors at 1934917633-1934918156 sudo e2fsck -b 32768 /dev/md0 [hodge@hodge-fs ~]$ sudo e2fsck -b 32768 /dev/md0 e2fsck 1.41.12 (17-May-2010) /dev/md0: recovering journal e2fsck: unable to set superblock flags on /dev/md0 sudo dmesg | tail [hodge@hodge-fs ~]$ sudo dmesg | tail EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (59837!=29115) EXT4-fs (md0): group descriptors corrupted! EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (59837!=29115) EXT4-fs (md0): group descriptors corrupted! Please Help!!!

Read the article
mdadm auto grow raid

- by johannes

I have a raid0/1 on lvm logical volumes. I resized the logical volumes. Now I want to resize the raid to use the complete logical volumes. This can be done with mdadm /dev/md? --grow -z newsize But somehow I can't figure out how to calculate the newsize argument. Is there a way to tell mdadm to grow to the biggest possible size? If not, how do I calculate the biggest possible size of the raid to use for the newsize argument?

Read the article
breaking mdadm raid and moving to NTFS

- by daveyt

I'm running Ubuntu 8 something and my data is on a mirrored pair of 1TB disks formatted as ext3, and the RAID is via mdadm. I want to move to Windows 7 (yeah yeah I know but Linux aint doing it for me at the moment) and migrate the disks to NTFS. My plan is: Break the MDADM RAID (by failing one disk logically) Format the 'failed' disk as NTFS Copy data from the RAID array to the NTFS disk (dont care about perms) Install Windows, (new separate non RAid disk) and my data disk is available. I've researched this and it seems the easiest way. I dont have another disk to back up to so I think this is my only way. Can anyone see a better/easier way?

Read the article
Why do I get a DegradedArray event with mdadm

- by azera

Hello Just so we're clear on what's happening: I bought 4 new sata 2 drives, with the intent of using them in a raid5 all drive are fully recognised by both my bios and my linux box (gentoo) I created a raid5 array, fiddled a bit with it to understand how it works, how to monitor ect At some point, this triggered a degradedarray event, even though the array is brand new. I tried to stopping the array and recreating a new array with the same drive but the new array starts degraded too. here is what I used to create it mdadm --create -l5 -n4 /dev/md/md0-r5 /dev/sdb /dev/sdd /dev/sde /dev/sdf here are the output from my /proc/mdstat and mdadm --detail --scan **mdstat** Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md127 : active raid5 sdf[4] sde[2] sdd[1] sdb[0] 4395415488 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] [>....................] recovery = 2.8% (41689732/1465138496) finish=890.3min speed=26645K/sec unused devices: <none> **detail** ARRAY /dev/md/md0-r5 metadata=0.90 spares=1 UUID=453e2833:81f22a74:64188b84:66721085 As such I have a couple questions: does a raid5 array always start in degraded mode at first ? why does sdf have the number 4 between bracket instead of 3, why does it see a spare disk and why is the 4th drive marked with _ instead of U ? (bad configuration ?) How can I recreate the array from scratch, do i have to format each drive on its own before recreating it ? Thanks for any help, I'm not sure about what I should do at the moment

Read the article
CentOS - mdadm raid1 drive won't mount to default location

- by danny

I'm running CentOS 5.5, the system, boot, swap, etc. is all on /dev/sda and I have two identical single-partition drives /dev/sdb1 /dev/sdc1 that are configured in RAID1 (using mdadm). It was working fine (configured to mount to /mnt/data in the fstab file) and I recently let yum install a couple of automatic updates without paying attention to what they were, and now it doesn't work. Raid is working fine (dmesg shows it gets loaded correctly). mdstat shows: # cat /proc/mdstat Personalities : [raid1] md0 : active raid1 sdc1[1] sdb1[0] XXXX blocks [2/2] [UU] unused devices: <none> Additionally, I can mount it anywhere other than its default directory (i.e. the following works, and I can read data off the drives). # mount /dev/md0 /mnt/data2 EXT3-fs warning: mounting fs with errors, running e2fsck is recommended But when I run the following I get: # mount -a mount: /dev/sdb1 already mounted or /mnt/data busy It says nothing is mounted when I try to umount /dev/sdb1 or umount /mnt/data, so I assume it's the second of those errors. However, lsof | grep mnt shows nothing. The weird thing is that I can save files in /mnt/data. So something is obviously mounted there, but when I try to umount it I get the error that nothing is mounted. /etc/mtab doesn't mention any of the partitions or files I am trying to work with, and fstab just has that one line I mentioned above that is supposed to mount my raid partition. Again, it was all working fine until I On Google I've found a few things about dmraid interfering with mdadm after an update, but I yum remove'd dmraid and rebooted and it didn't help. I'm really confused and need to get this working to get on with my work!

Read the article
Auto-rebuild RAID6 with MDADM

- by user65632

Hello everybody, i'm new in this forum, and because I see that a lot of people get helped here, i'll ask my question here! I have a openSUSE 11.3 Linux computer with 5 disks of 1TB (WD enterprise disks) in it. with mdadm I configured an RAID6 device. Now, after a lot of thorough testing, i've noticed that when the computer goes down unsuspectedly it could happen (1 time out of 10) that while booting, the md0 device isn't recognised, and then the machine goes in "recovery mode", which means that i have to 'CTRL + C' it so it can boot to openSUSE. Once in openSUSE i have to readd the drive manually with 'mdadm /dev/md0 --add /dev/sdX'. After this everything works back fine (after resynching). So my question is: Is there a way to auto-rebuild the RAID6 device when there are problems? and how can I stop this "recovery mode" from happening. Because the computer will be in a place I can't go to, to connect a keyboard, 'CTRL + C' it just to get in openSUSE. If you could help me, I would be a very very very! happy man! :-) thanks in advance! Mikhail

Read the article
How to move Mdadm RAID drive (EBS based) to different AWS Instance

- by Stanley

We have a media-rich web application that is hosted on AWS. We have several Web Servers and we have an NFS server. On the NFS server (Linux server) we have several EBS volumes that are mounted and we've used mdadm to implement the different mounted volumes as a single RAID volume. The Web Servers simply access the NFS storage through a mount point. Amazon has now let us know that they will be performing power maintenance on this server in a couple of days time. Since all our media is on here it would render our site unusable for the hours while Amazon is working on it. We want to try and prevent this downtime. I was thinking that we can prevent server downtime by perhaps setting up a new server temporarily and attaching the EBS drives (raid volume) to that server and have our web servers point there during maintenance. This is a very high risk operation since this involves several terabytes of our production data. What would be the safe way to move over our logical raid drive (md0) to a new amazon instance? I was hoping that I could start with building the new server, mounting the ebs volumes and assembling the RAID partition using mdadm --assemble --scan before unmounting from the existing instance so that I can first test that everything works and thus having it mounted on two instances at the same time, but I don't believe that is possible with the way that filesystems work. How do I move a Linux software RAID to a new machine? suggests a way to move drives, but isn't really a cloud-based question. Perhaps there are simpler ways to prevent system downtime with our solution being hosted on the cloud? I have considered taking an EBS snapshot, but that tries to replicate all the many terabytes of mounted storage, so this is not a practical solution. Any ideas?

Read the article
How to display/define Mirror/Stripping pairs with mdadm

- by Chris

I want to make a standard linux software Raid10 over 4 HDD. The server has 4HDDs, 2 pairs from different vendors in order to avoid batch problems. I want to have the mirror over two different Vendors, and then the Stripe over the mirror pairs. I could do that by manually creating Raid1/0, but mdadm supports Raid level 10. I just cant figure out how the Raid10 is then handled and how the data is distributed. mdadm --detail /dev/md10 /dev/md10: Version : 1.2 Creation Time : Wed May 28 11:06:23 2014 Raid Level : raid10 Array Size : 1953260544 (1862.77 GiB 2000.14 GB) Used Dev Size : 976630272 (931.39 GiB 1000.07 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Wed May 28 11:06:23 2014 State : clean, resyncing (PENDING) Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Layout : near=2 Chunk Size : 512K Name : pdwhost:10 (local to host pdwhost) UUID : a3de0ad5:9e694ee1:addc6786:c4449e40 Events : 0 Number Major Minor RaidDevice State 0 8 1 0 active sync /dev/sda1 1 8 81 1 active sync /dev/sdf1 2 8 97 2 active sync /dev/sdg1 3 8 113 3 active sync /dev/sdh1 does not really give any information about that. How it should be: Raid 1 / Mirror over /dev/sda1 /dev/sdf1 and /dev/sdg1 /dev/sdh1 Raid 0 over the two Raid 1 pairs Is it possible to do that with the built in "level=10", how can I see what pairs are mirrored? Thanks a lot for you help

Read the article
Did I lose my RAID again?

- by BarsMonster

Hi! A little history: 2 years ago I was really excited to find out that mdadm is so powerful that it even can reshape arrays, so you can start with a smaller array and then grow it as you need. I've bought 3x1Tb drives and made a RAID-5. It was fine for a year. Then I bought 2x more, and tried to reshape to RAID-6 out of 5 drives, and due to some mess with superblock versions, lost all content. Had to rebuild it from scratch, but 2Tb of data were gone. Yesterday I bought 2 more drives, and this time I had everything: properly built array, UPS. I've disabled write intent map, added 2 new drives as spares and run a command to grow array to 7-disks. It started working, but speed was ridiculously slow, ~100kb/sec. After processing first 37Mb at such an amazing speed, one of old HDDs fails. I properly shutdown the PC and disconnected the failed drive. After bootup it appeared that it recreated the intent map as it was still in mdadm config, so I removed it from config and rebooted again. Now all I see is that all mdadm processes deadlock, and don't do anything. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1937 root 20 0 12992 608 444 D 0 0.1 0:00.00 mdadm 2283 root 20 0 12992 852 704 D 0 0.1 0:00.01 mdadm 2287 root 20 0 0 0 0 D 0 0.0 0:00.01 md0_reshape 2288 root 18 -2 12992 820 676 D 0 0.1 0:00.01 mdadm And all I see in mdstat is: $ cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid6 sdb1[1] sdg1[4] sdf1[7] sde1[6] sdd1[0] sdc1[5] 2929683456 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [7/6] [UU_UUUU] [>....................] reshape = 0.0% (37888/976561152) finish=567604147.2min speed=0K/sec I've already tried mdadm 2.6.7, 3.1.4 and 3.2 - nothing helps. Did I lose my data again? Any suggestions on how can I make this work? OS is Ubuntu Server 10.04.2. PS. Needless to say, the data is inaccessible - I cannot mount /dev/md0 to save the most valuable data. You can see my disappointment - the very specific thing I was excited about failed twice taking 5Tb of my data with it. Update: It appears there is some nice info in kern.log: 21:38:48 ...: [ 166.522055] raid5: reshape will continue 21:38:48 ...: [ 166.522085] raid5: device sdb1 operational as raid disk 1 21:38:48 ...: [ 166.522091] raid5: device sdg1 operational as raid disk 4 21:38:48 ...: [ 166.522097] raid5: device sdf1 operational as raid disk 5 21:38:48 ...: [ 166.522102] raid5: device sde1 operational as raid disk 6 21:38:48 ...: [ 166.522107] raid5: device sdd1 operational as raid disk 0 21:38:48 ...: [ 166.522111] raid5: device sdc1 operational as raid disk 3 21:38:48 ...: [ 166.523942] raid5: allocated 7438kB for md0 21:38:48 ...: [ 166.524041] 1: w=1 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524050] 4: w=2 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524056] 5: w=3 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524062] 6: w=4 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524068] 0: w=5 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524073] 3: w=6 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524079] raid5: raid level 6 set md0 active with 6 out of 7 devices, algorithm 2 21:38:48 ...: [ 166.524519] RAID5 conf printout: 21:38:48 ...: [ 166.524523] --- rd:7 wd:6 21:38:48 ...: [ 166.524528] disk 0, o:1, dev:sdd1 21:38:48 ...: [ 166.524532] disk 1, o:1, dev:sdb1 21:38:48 ...: [ 166.524537] disk 3, o:1, dev:sdc1 21:38:48 ...: [ 166.524541] disk 4, o:1, dev:sdg1 21:38:48 ...: [ 166.524545] disk 5, o:1, dev:sdf1 21:38:48 ...: [ 166.524550] disk 6, o:1, dev:sde1 21:38:48 ...: [ 166.524553] ...ok start reshape thread 21:38:48 ...: [ 166.524727] md0: detected capacity change from 0 to 2999995858944 21:38:48 ...: [ 166.524735] md: reshape of RAID array md0 21:38:48 ...: [ 166.524740] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. 21:38:48 ...: [ 166.524745] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape. 21:38:48 ...: [ 166.524756] md: using 128k window, over a total of 976561152 blocks. 21:39:05 ...: [ 166.525013] md0: 21:42:04 ...: [ 362.520063] INFO: task mdadm:1937 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520073] mdadm D 00000000ffffffff 0 1937 1 0x00000000 21:42:04 ...: [ 362.520083] ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520092] ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0 21:42:04 ...: [ 362.520100] 0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198 21:42:04 ...: [ 362.520107] Call Trace: 21:42:04 ...: [ 362.520133] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:42:04 ...: [ 362.520148] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:42:04 ...: [ 362.520159] [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456] 21:42:04 ...: [ 362.520169] [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456] 21:42:04 ...: [ 362.520179] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:42:04 ...: [ 362.520188] [<ffffffff81414df0>] md_make_request+0xc0/0x130 21:42:04 ...: [ 362.520194] [<ffffffff81414df0>] ? md_make_request+0xc0/0x130 21:42:04 ...: [ 362.520205] [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0 21:42:04 ...: [ 362.520214] [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20 21:42:04 ...: [ 362.520222] [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60 21:42:04 ...: [ 362.520230] [<ffffffff8129fc80>] submit_bio+0x80/0x110 21:42:04 ...: [ 362.520236] [<ffffffff8116c849>] submit_bh+0xf9/0x140 21:42:04 ...: [ 362.520244] [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0 21:42:04 ...: [ 362.520251] [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70 21:42:04 ...: [ 362.520258] [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40 21:42:04 ...: [ 362.520265] [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160 21:42:04 ...: [ 362.520272] [<ffffffff81173d78>] blkdev_readpage+0x18/0x20 21:42:04 ...: [ 362.520279] [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0 21:42:04 ...: [ 362.520285] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:42:04 ...: [ 362.520290] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:42:04 ...: [ 362.520297] [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120 21:42:04 ...: [ 362.520304] [<ffffffff810f5909>] read_cache_page_async+0x19/0x20 21:42:04 ...: [ 362.520310] [<ffffffff810f591e>] read_cache_page+0xe/0x20 21:42:04 ...: [ 362.520317] [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0 21:42:04 ...: [ 362.520324] [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460 21:42:04 ...: [ 362.520331] [<ffffffff811a7938>] check_partition+0x138/0x190 21:42:04 ...: [ 362.520338] [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0 21:42:04 ...: [ 362.520344] [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0 21:42:04 ...: [ 362.520350] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:42:04 ...: [ 362.520356] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:42:04 ...: [ 362.520362] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:42:04 ...: [ 362.520369] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:42:04 ...: [ 362.520377] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:42:04 ...: [ 362.520385] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:42:04 ...: [ 362.520391] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:42:04 ...: [ 362.520398] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:42:04 ...: [ 362.520406] [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310 21:42:04 ...: [ 362.520414] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:42:04 ...: [ 362.520421] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:42:04 ...: [ 362.520428] [<ffffffff811418b0>] sys_open+0x20/0x30 21:42:04 ...: [ 362.520437] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:42:04 ...: [ 362.520446] INFO: task mdadm:2283 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520454] mdadm D 0000000000000000 0 2283 2212 0x00000000 21:42:04 ...: [ 362.520462] ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520470] ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0 21:42:04 ...: [ 362.520478] 0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78 21:42:04 ...: [ 362.520485] Call Trace: 21:42:04 ...: [ 362.520495] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:42:04 ...: [ 362.520502] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:42:04 ...: [ 362.520508] [<ffffffff8117404d>] __blkdev_put+0x3d/0x190 21:42:04 ...: [ 362.520514] [<ffffffff811741b0>] blkdev_put+0x10/0x20 21:42:04 ...: [ 362.520520] [<ffffffff811741f3>] blkdev_close+0x33/0x60 21:42:04 ...: [ 362.520527] [<ffffffff81145375>] __fput+0xf5/0x210 21:42:04 ...: [ 362.520534] [<ffffffff811454b5>] fput+0x25/0x30 21:42:04 ...: [ 362.520540] [<ffffffff811415ad>] filp_close+0x5d/0x90 21:42:04 ...: [ 362.520546] [<ffffffff81141697>] sys_close+0xb7/0x120 21:42:04 ...: [ 362.520553] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:42:04 ...: [ 362.520559] INFO: task md0_reshape:2287 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520567] md0_reshape D ffff88003aee96f0 0 2287 2 0x00000000 21:42:04 ...: [ 362.520575] ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520582] ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0 21:42:04 ...: [ 362.520590] 0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8 21:42:04 ...: [ 362.520597] Call Trace: 21:42:04 ...: [ 362.520608] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:42:04 ...: [ 362.520616] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:42:04 ...: [ 362.520626] [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456] 21:42:04 ...: [ 362.520634] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:42:04 ...: [ 362.520644] [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456] 21:42:04 ...: [ 362.520651] [<ffffffff81052713>] ? __wake_up+0x53/0x70 21:42:04 ...: [ 362.520658] [<ffffffff814156b1>] md_do_sync+0x621/0xbb0 21:42:04 ...: [ 362.520668] [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10 21:42:04 ...: [ 362.520675] [<ffffffff8141640c>] md_thread+0x5c/0x130 21:42:04 ...: [ 362.520681] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:42:04 ...: [ 362.520688] [<ffffffff814163b0>] ? md_thread+0x0/0x130 21:42:04 ...: [ 362.520694] [<ffffffff81084416>] kthread+0x96/0xa0 21:42:04 ...: [ 362.520701] [<ffffffff810131ea>] child_rip+0xa/0x20 21:42:04 ...: [ 362.520707] [<ffffffff81084380>] ? kthread+0x0/0xa0 21:42:04 ...: [ 362.520713] [<ffffffff810131e0>] ? child_rip+0x0/0x20 21:42:04 ...: [ 362.520718] INFO: task mdadm:2288 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520725] mdadm D 0000000000000000 0 2288 1 0x00000000 21:42:04 ...: [ 362.520733] ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520741] ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000 21:42:04 ...: [ 362.520748] 0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8 21:42:04 ...: [ 362.520755] Call Trace: 21:42:04 ...: [ 362.520763] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:42:04 ...: [ 362.520771] [<ffffffff812a6d50>] ? exact_match+0x0/0x10 21:42:04 ...: [ 362.520777] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:42:04 ...: [ 362.520783] [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0 21:42:04 ...: [ 362.520790] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:42:04 ...: [ 362.520795] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:42:04 ...: [ 362.520801] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:42:04 ...: [ 362.520808] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:42:04 ...: [ 362.520815] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:42:04 ...: [ 362.520821] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:42:04 ...: [ 362.520828] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:42:04 ...: [ 362.520834] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:42:04 ...: [ 362.520841] [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40 21:42:04 ...: [ 362.520848] [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330 21:42:04 ...: [ 362.520855] [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0 21:42:04 ...: [ 362.520862] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:42:04 ...: [ 362.520868] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:42:04 ...: [ 362.520874] [<ffffffff811418b0>] sys_open+0x20/0x30 21:42:04 ...: [ 362.520882] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:44:04 ...: [ 482.520065] INFO: task mdadm:1937 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520077] mdadm D 00000000ffffffff 0 1937 1 0x00000000 21:44:04 ...: [ 482.520087] ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520096] ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0 21:44:04 ...: [ 482.520104] 0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198 21:44:04 ...: [ 482.520112] Call Trace: 21:44:04 ...: [ 482.520139] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:44:04 ...: [ 482.520154] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:44:04 ...: [ 482.520165] [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456] 21:44:04 ...: [ 482.520175] [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456] 21:44:04 ...: [ 482.520185] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:44:04 ...: [ 482.520194] [<ffffffff81414df0>] md_make_request+0xc0/0x130 21:44:04 ...: [ 482.520201] [<ffffffff81414df0>] ? md_make_request+0xc0/0x130 21:44:04 ...: [ 482.520212] [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0 21:44:04 ...: [ 482.520221] [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20 21:44:04 ...: [ 482.520229] [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60 21:44:04 ...: [ 482.520237] [<ffffffff8129fc80>] submit_bio+0x80/0x110 21:44:04 ...: [ 482.520244] [<ffffffff8116c849>] submit_bh+0xf9/0x140 21:44:04 ...: [ 482.520252] [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0 21:44:04 ...: [ 482.520258] [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70 21:44:04 ...: [ 482.520266] [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40 21:44:04 ...: [ 482.520273] [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160 21:44:04 ...: [ 482.520280] [<ffffffff81173d78>] blkdev_readpage+0x18/0x20 21:44:04 ...: [ 482.520286] [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0 21:44:04 ...: [ 482.520293] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:44:04 ...: [ 482.520299] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:44:04 ...: [ 482.520306] [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120 21:44:04 ...: [ 482.520313] [<ffffffff810f5909>] read_cache_page_async+0x19/0x20 21:44:04 ...: [ 482.520319] [<ffffffff810f591e>] read_cache_page+0xe/0x20 21:44:04 ...: [ 482.520327] [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0 21:44:04 ...: [ 482.520334] [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460 21:44:04 ...: [ 482.520341] [<ffffffff811a7938>] check_partition+0x138/0x190 21:44:04 ...: [ 482.520348] [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0 21:44:04 ...: [ 482.520355] [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0 21:44:04 ...: [ 482.520361] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:44:04 ...: [ 482.520367] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:44:04 ...: [ 482.520373] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:44:04 ...: [ 482.520380] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:44:04 ...: [ 482.520388] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:44:04 ...: [ 482.520396] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:44:04 ...: [ 482.520403] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:44:04 ...: [ 482.520410] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:44:04 ...: [ 482.520417] [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310 21:44:04 ...: [ 482.520426] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:44:04 ...: [ 482.520432] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:44:04 ...: [ 482.520438] [<ffffffff811418b0>] sys_open+0x20/0x30 21:44:04 ...: [ 482.520447] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:44:04 ...: [ 482.520458] INFO: task mdadm:2283 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520467] mdadm D 0000000000000000 0 2283 2212 0x00000000 21:44:04 ...: [ 482.520475] ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520483] ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0 21:44:04 ...: [ 482.520490] 0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78 21:44:04 ...: [ 482.520498] Call Trace: 21:44:04 ...: [ 482.520508] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:44:04 ...: [ 482.520515] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:44:04 ...: [ 482.520521] [<ffffffff8117404d>] __blkdev_put+0x3d/0x190 21:44:04 ...: [ 482.520527] [<ffffffff811741b0>] blkdev_put+0x10/0x20 21:44:04 ...: [ 482.520533] [<ffffffff811741f3>] blkdev_close+0x33/0x60 21:44:04 ...: [ 482.520541] [<ffffffff81145375>] __fput+0xf5/0x210 21:44:04 ...: [ 482.520547] [<ffffffff811454b5>] fput+0x25/0x30 21:44:04 ...: [ 482.520554] [<ffffffff811415ad>] filp_close+0x5d/0x90 21:44:04 ...: [ 482.520560] [<ffffffff81141697>] sys_close+0xb7/0x120 21:44:04 ...: [ 482.520568] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:44:04 ...: [ 482.520574] INFO: task md0_reshape:2287 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520582] md0_reshape D ffff88003aee96f0 0 2287 2 0x00000000 21:44:04 ...: [ 482.520590] ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520597] ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0 21:44:04 ...: [ 482.520605] 0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8 21:44:04 ...: [ 482.520612] Call Trace: 21:44:04 ...: [ 482.520623] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:44:04 ...: [ 482.520633] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:44:04 ...: [ 482.520643] [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456] 21:44:04 ...: [ 482.520651] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:44:04 ...: [ 482.520661] [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456] 21:44:04 ...: [ 482.520668] [<ffffffff81052713>] ? __wake_up+0x53/0x70 21:44:04 ...: [ 482.520675] [<ffffffff814156b1>] md_do_sync+0x621/0xbb0 21:44:04 ...: [ 482.520685] [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10 21:44:04 ...: [ 482.520692] [<ffffffff8141640c>] md_thread+0x5c/0x130 21:44:04 ...: [ 482.520699] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:44:04 ...: [ 482.520705] [<ffffffff814163b0>] ? md_thread+0x0/0x130 21:44:04 ...: [ 482.520711] [<ffffffff81084416>] kthread+0x96/0xa0 21:44:04 ...: [ 482.520718] [<ffffffff810131ea>] child_rip+0xa/0x20 21:44:04 ...: [ 482.520725] [<ffffffff81084380>] ? kthread+0x0/0xa0 21:44:04 ...: [ 482.520730] [<ffffffff810131e0>] ? child_rip+0x0/0x20 21:44:04 ...: [ 482.520735] INFO: task mdadm:2288 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520743] mdadm D 0000000000000000 0 2288 1 0x00000000 21:44:04 ...: [ 482.520751] ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520759] ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000 21:44:04 ...: [ 482.520767] 0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8 21:44:04 ...: [ 482.520774] Call Trace: 21:44:04 ...: [ 482.520782] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:44:04 ...: [ 482.520790] [<ffffffff812a6d50>] ? exact_match+0x0/0x10 21:44:04 ...: [ 482.520797] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:44:04 ...: [ 482.520804] [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0 21:44:04 ...: [ 482.520810] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:44:04 ...: [ 482.520816] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:44:04 ...: [ 482.520822] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:44:04 ...: [ 482.520829] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:44:04 ...: [ 482.520837] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:44:04 ...: [ 482.520843] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:44:04 ...: [ 482.520850] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:44:04 ...: [ 482.520857] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:44:04 ...: [ 482.520864] [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40 21:44:04 ...: [ 482.520871] [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330 21:44:04 ...: [ 482.520878] [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0 21:44:04 ...: [ 482.520885] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:44:04 ...: [ 482.520891] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:44:04 ...: [ 482.520897] [<ffffffff811418b0>] sys_open+0x20/0x30 21:44:04 ...: [ 482.520905] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:46:04 ...: [ 602.520053] INFO: task mdadm:1937 blocked for more than 120 seconds. 21:46:04 ...: [ 602.520059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:46:04 ...: [ 602.520065] mdadm D 00000000ffffffff 0 1937 1 0x00000000 21:46:04 ...: [ 602.520075] ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0 21:46:04 ...: [ 602.520084] ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0 21:46:04 ...: [ 602.520091] 0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198 21:46:04 ...: [ 602.520099] Call Trace: 21:46:04 ...: [ 602.520127] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:46:04 ...: [ 602.520142] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:46:04 ...: [ 602.520153] [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456] 21:46:04 ...: [ 602.520162] [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456] 21:46:04 ...: [ 602.520171] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:46:04 ...: [ 602.520180] [<ffffffff81414df0>] md_make_request+0xc0/0x130 21:46:04 ...: [ 602.520187] [<ffffffff81414df0>] ? md_make_request+0xc0/0x130 21:46:04 ...: [ 602.520197] [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0 21:46:04 ...: [ 602.520206] [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20 21:46:04 ...: [ 602.520215] [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60 21:46:04 ...: [ 602.520222] [<ffffffff8129fc80>] submit_bio+0x80/0x110 21:46:04 ...: [ 602.520229] [<ffffffff8116c849>] submit_bh+0xf9/0x140 21:46:04 ...: [ 602.520237] [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0 21:46:04 ...: [ 602.520244] [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70 21:46:04 ...: [ 602.520252] [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40 21:46:04 ...: [ 602.520259] [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160 21:46:04 ...: [ 602.520266] [<ffffffff81173d78>] blkdev_readpage+0x18/0x20 21:46:04 ...: [ 602.520273] [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0 21:46:04 ...: [ 602.520279] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:46:04 ...: [ 602.520285] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:46:04 ...: [ 602.520292] [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120 21:46:04 ...: [ 602.520300] [<ffffffff810f5909>] read_cache_page_async+0x19/0x20 21:46:04 ...: [ 602.520306] [<ffffffff810f591e>] read_cache_page+0xe/0x20 21:46:04 ...: [ 602.520314] [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0 21:46:04 ...: [ 602.520321] [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460 21:46:04 ...: [ 602.520328] [<ffffffff811a7938>] check_partition+0x138/0x190 21:46:04 ...: [ 602.520335] [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0 21:46:04 ...: [ 602.520342] [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0 21:46:04 ...: [ 602.520348] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:46:04 ...: [ 602.520354] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:46:04 ...: [ 602.520359] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:46:04 ...: [ 602.520367] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:46:04 ...: [ 602.520375] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:46:04 ...: [ 602.520383] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:46:04 ...: [ 602.520390] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:46:04 ...: [ 602.520397] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:46:04 ...: [ 602.520404] [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310 21:46:04 ...: [ 602.520413] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:46:04 ...: [ 602.520419] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:46:04 ...: [ 602.520425] [<ffffffff811418b0>] sys_open+0x20/0x30 21:46:04 ...: [ 602.520434] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:46:04 ...: [ 602.520443] INFO: task mdadm:2283 blocked for more than 120 seconds. 21:46:04 ...: [ 602.520447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:46:04 ...: [ 602.520451] mdadm D 0000000000000000 0 2283 2212 0x00000000 21:46:04 ...: [ 602.520460] ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0 21:46:04 ...: [ 602.520468] ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0 21:46:04 ...: [ 602.520475] 0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78 21:46:04 ...: [ 602.520483] Call Trace: 21:46:04 ...: [ 602.520492] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:46:04 ...: [ 602.520500] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:46:04 ...: [ 602.520506] [<ffffffff8117404d>] __blkdev_put+0x3d/0x190 21:46:04 ...: [ 602.520512] [<ffffffff811741b0>] blkdev_put+0x10/0x20 21:46:04 ...: [ 602.520518] [<ffffffff811741f3>] blkdev_close+0x33/0x60 21:46:04 ...: [ 602.520526] [<ffffffff81145375>] __fput+0xf5/0x210 21:46:04 ...: [ 602.520533] [<ffffffff811454b5>] fput+0x25/0x30 21:46:04 ...: [ 602.520539] [<ffffffff811415ad>] filp_close+0x5d/0x90 21:46:04 ...: [ 602.520545] [<ffffffff81141697>] sys_close+0xb7/0x120 21:46:04 ...: [ 602.520552] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b

Read the article
mdadm software RAID doesn't load at boot

- by martinald

Hi there, I'm using mdadm to create a raid1 mirror across two disks. I can create my /dev/md5 array perfectly using the tools, but it does not automatically reload my /dev/md5 when I restart, I need to manually recreate the array. Am I missing something obvious here?

Read the article
flashcache with mdadm and LVM

- by Backtogeek

I am having trouble setting up flashcache on a system with LVM and mdadm, I suspect I am either just missing an obvious step or getting some mapping wrong and hoped someone could point me in the right direction? system info: CentOS 6.4 64 bit mdadm config md0 : active raid1 sdd3[2] sde3[3] sdf3[4] sdg3[5] sdh3[1] sda3[0] 204736 blocks super 1.0 [6/6] [UUUUUU] md2 : active raid6 sdd5[2] sde5[3] sdf5[4] sdg5[5] sdh5[1] sda5[0] 3794905088 blocks super 1.1 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU] md3 : active raid0 sdc1[1] sdb1[0] 250065920 blocks super 1.1 512k chunks md1 : active raid10 sdh1[1] sda1[0] sdd1[2] sdf1[4] sdg1[5] sde1[3] 76749312 blocks super 1.1 512K chunks 2 near-copies [6/6] [UUUUUU] pcsvan PV /dev/mapper/ssdcache VG Xenvol lvm2 [3.53 TiB / 3.53 TiB free] Total: 1 [3.53 TiB] / in use: 1 [3.53 TiB] / in no VG: 0 [0 ] flashcache create command used: flashcache_create -p back ssdcache /dev/md3 /dev/md2 pvdisplay --- Physical volume --- PV Name /dev/mapper/ssdcache VG Name Xenvol PV Size 3.53 TiB / not usable 106.00 MiB Allocatable yes PE Size 128.00 MiB Total PE 28952 Free PE 28912 Allocated PE 40 PV UUID w0ENVR-EjvO-gAZ8-TQA1-5wYu-ISOk-pJv7LV vgdisplay --- Volume group --- VG Name Xenvol System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 1 Max PV 0 Cur PV 1 Act PV 1 VG Size 3.53 TiB PE Size 128.00 MiB Total PE 28952 Alloc PE / Size 40 / 5.00 GiB Free PE / Size 28912 / 3.53 TiB VG UUID 7vfKWh-ENPb-P8dV-jVlb-kP0o-1dDd-N8zzYj So that is where I am at, I thought that was the job done however when creating a logical volume called test and mounting it is /mnt/test the sequential write is pathetic, 60 ish MB/s /dev/md3 has 2 x SSD's in Raid0 which alone is performing at around 800 MB/s sequential write and I am trying to cache /dev/md2 which is 6 x 1TB drives in raid6 I have read a number of pages through the day and some of them here, it is obvious from the results that the cache is not functioning but I am unsure why. I have added the filter line in the lvm.conf filter = [ "r|/dev/sdb|", "r|/dev/sdc|", "r|/dev/md3|" ] It is probably something silly but the cache is clearly performing no writes so I suspect I am not mapping it or have not mounted the cache correctly. dmsetup status ssdcache: 0 7589810176 flashcache stats: reads(142), writes(0) read hits(133), read hit percent(93) write hits(0) write hit percent(0) dirty write hits(0) dirty write hit percent(0) replacement(0), write replacement(0) write invalidates(0), read invalidates(0) pending enqueues(0), pending inval(0) metadata dirties(0), metadata cleans(0) metadata batch(0) metadata ssd writes(0) cleanings(0) fallow cleanings(0) no room(0) front merge(0) back merge(0) force_clean_block(0) disk reads(9), disk writes(0) ssd reads(133) ssd writes(9) uncached reads(0), uncached writes(0), uncached IO requeue(0) disk read errors(0), disk write errors(0) ssd read errors(0) ssd write errors(0) uncached sequential reads(0), uncached sequential writes(0) pid_adds(0), pid_dels(0), pid_drops(0) pid_expiry(0) lru hot blocks(31136000), lru warm blocks(31136000) lru promotions(0), lru demotions(0) Xenvol-test: 0 10485760 linear I have included as much info as I can think of, look forward to any replies.

Read the article
How do I tell mdadm to start using a missing disk in my RAID5 array again?

- by Jon Cage

I have a 3-disk RAID array running in my Ubuntu server. This has been running flawlessly for over a year but I was recently forced to strip, move and rebuild the machine. When I had it all back together and ran up Ubuntu, I had some problems with disks not being detected. A couple of reboots later and I'd solved that issue. The problem now is that the 3-disk array is showing up as degraded every time I boot up. For some reason it seems that Ubuntu has made a new array and added the missing disk to it. I've tried stopping the new 1-disk array and adding the missing disk, but I'm struggling. On startup I get this: root@uberserver:~# cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md_d1 : inactive sdf1[2](S) 1953511936 blocks md0 : active raid5 sdg1[2] sdc1[3] sdb1[1] sdh1[0] 2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] I have two RAID arrays and the one that normally pops up as md1 isn't appearing. I read somewhere that calling mdadm --assemble --scan would re-assemble the missing array so I've tried first stopping the existing array that ubuntu started: root@uberserver:~# mdadm --stop /dev/md_d1 mdadm: stopped /dev/md_d1 ...and then tried to tell ubuntu to pick the disks up again: root@uberserver:~# mdadm --assemble --scan mdadm: /dev/md/1 has been started with 2 drives (out of 3). So that's started md1 again but it's not picking up the disk from md_d1: root@uberserver:~# cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md1 : active raid5 sde1[1] sdf1[2] 3907023872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU] md_d1 : inactive sdd1[0](S) 1953511936 blocks md0 : active raid5 sdg1[2] sdc1[3] sdb1[1] sdh1[0] 2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] What's going wrong here? Why is Ubuntu trying to pick up sdd into a different array? How do I get that missing disk back home again? [Edit] - After adding the md1 to mdadm.conf it now tries to mount the array on startup but it's still missing the disk. If I tell it to try and assemble automatically I get the impression it know it needs sdd but can't use it: root@uberserver:~# mdadm --assemble --scan /dev/md1: File exists mdadm: /dev/md/1 already active, cannot restart it! mdadm: /dev/md/1 needed for /dev/sdd1... What am I missing?

Read the article
mdadm: Win7-install created a boot partition on one of my RAID6 drives. How to rebuild?

- by EXIT_FAILURE

My problem happened when I attempted to install Windows 7 on it's own SSD. The Linux OS I used which has knowledge of the software RAID system is on a SSD that I disconnected prior to the install. This was so that windows (or I) wouldn't inadvertently mess it up. However, and in retrospect, foolishly, I left the RAID disks connected, thinking that windows wouldn't be so ridiculous as to mess with a HDD that it sees as just unallocated space. Boy was I wrong! After copying over the installation files to the SSD (as expected and desired), it also created an ntfs partition on one of the RAID disks. Both unexpected and totally undesired! . I changed out the SSDs again, and booted up in linux. mdadm didn't seem to have any problem assembling the array as before, but if I tried to mount the array, I got the error message: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! EXT4-fs (md0): group descriptors corrupted! I then used qparted to delete the newly created ntfs partition on /dev/sdd so that it matched the other three /dev/sd{b,c,e}, and requested a resync of my array with echo repair > /sys/block/md0/md/sync_action This took around 4 hours, and upon completion, dmesg reports: md: md0: requested-resync done. A bit brief after a 4-hour task, though I'm unsure as to where other log files exist (I also seem to have messed up my sendmail configuration). In any case: No change reported according to mdadm, everything checks out. mdadm -D /dev/md0 still reports: Version : 1.2 Creation Time : Wed May 23 22:18:45 2012 Raid Level : raid6 Array Size : 3907026848 (3726.03 GiB 4000.80 GB) Used Dev Size : 1953513424 (1863.02 GiB 2000.40 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Mon May 26 12:41:58 2014 State : clean Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 4K Name : okamilinkun:0 UUID : 0c97ebf3:098864d8:126f44e3:e4337102 Events : 423 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 8 32 1 active sync /dev/sdc 2 8 48 2 active sync /dev/sdd 3 8 64 3 active sync /dev/sde Trying to mount it still reports: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so and dmesg: EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! EXT4-fs (md0): group descriptors corrupted! I'm a bit unsure where to proceed from here, and trying stuff "to see if it works" is a bit too risky for me. This is what I suggest I should attempt to do: Tell mdadm that /dev/sdd (the one that windows wrote into) isn't reliable anymore, pretend it is newly re-introduced to the array, and reconstruct its content based on the other three drives. I also could be totally wrong in my assumptions, that the creation of the ntfs partition on /dev/sdd and subsequent deletion has changed something that cannot be fixed this way. My question: Help, what should I do? If I should do what I suggested , how do I do that? From reading documentation, etc, I would think maybe: mdadm --manage /dev/md0 --set-faulty /dev/sdd mdadm --manage /dev/md0 --remove /dev/sdd mdadm --manage /dev/md0 --re-add /dev/sdd However, the documentation examples suggest /dev/sdd1, which seems strange to me, as there is no partition there as far as linux is concerned, just unallocated space. Maybe these commands won't work without. Maybe it makes sense to mirror the partition table of one of the other raid devices that weren't touched, before --re-add. Something like: sfdisk -d /dev/sdb | sfdisk /dev/sdd Bonus question: Why would the Windows 7 installation do something so st...potentially dangerous? Update I went ahead and marked /dev/sdd as faulty, and removed it (not physically) from the array: # mdadm --manage /dev/md0 --set-faulty /dev/sdd # mdadm --manage /dev/md0 --remove /dev/sdd However, attempting to --re-add was disallowed: # mdadm --manage /dev/md0 --re-add /dev/sdd mdadm: --re-add for /dev/sdd to /dev/md0 is not possible --add, was fine. # mdadm --manage /dev/md0 --add /dev/sdd mdadm -D /dev/md0 now reports the state as clean, degraded, recovering, and /dev/sdd as spare rebuilding. /proc/mdstat shows the recovery progress: md0 : active raid6 sdd[4] sdc[1] sde[3] sdb[0] 3907026848 blocks super 1.2 level 6, 4k chunk, algorithm 2 [4/3] [UU_U] [>....................] recovery = 2.1% (42887780/1953513424) finish=348.7min speed=91297K/sec nmon also shows expected output: ¦sdb 0% 87.3 0.0| > |¦ ¦sdc 71% 109.1 0.0|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR > |¦ ¦sdd 40% 0.0 87.3|WWWWWWWWWWWWWWWWWWWW > |¦ ¦sde 0% 87.3 0.0|> || It looks good so far. Crossing my fingers for another five+ hours :) Update 2 The recovery of /dev/sdd finished, with dmesg output: [44972.599552] md: md0: recovery done. [44972.682811] RAID conf printout: [44972.682815] --- level:6 rd:4 wd:4 [44972.682817] disk 0, o:1, dev:sdb [44972.682819] disk 1, o:1, dev:sdc [44972.682820] disk 2, o:1, dev:sdd [44972.682821] disk 3, o:1, dev:sde Attempting mount /dev/md0 reports: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so And on dmesg: [44984.159908] EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! [44984.159912] EXT4-fs (md0): group descriptors corrupted! I'm not sure what do do now. Suggestions? Output of dumpe2fs /dev/md0: dumpe2fs 1.42.8 (20-Jun-2013) Filesystem volume name: Atlas Last mounted on: /mnt/atlas Filesystem UUID: e7bfb6a4-c907-4aa0-9b55-9528817bfd70 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 244195328 Block count: 976756712 Reserved block count: 48837835 Free blocks: 92000180 Free inodes: 243414877 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 791 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512 RAID stripe width: 2 Flex block group size: 16 Filesystem created: Thu May 24 07:22:41 2012 Last mount time: Sun May 25 23:44:38 2014 Last write time: Sun May 25 23:46:42 2014 Mount count: 341 Maximum mount count: -1 Last checked: Thu May 24 07:22:41 2012 Check interval: 0 (<none>) Lifetime writes: 4357 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: e177a374-0b90-4eaa-b78f-d734aae13051 Journal backup: inode blocks dumpe2fs: Corrupt extent header while reading journal super block

Read the article
Problem with RAID5 (mdadm) - disk detached

- by poscaman

Having these lines in /var/log/syslog Apr 18 16:53:05 Server kernel: [4487878.816036] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1 Apr 18 16:53:05 Server kernel: [4487878.816058] ata4: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0 Apr 18 16:53:05 Server kernel: [4487878.816059] dhfis 0x1 dmafis 0x1 sdbfis 0x0 Apr 18 16:53:05 Server kernel: [4487878.816093] ata4: ATA_REG 0x40 ERR_REG 0x0 Apr 18 16:53:05 Server kernel: [4487878.816108] ata4: tag : dhfis dmafis sdbfis sacitve Apr 18 16:53:05 Server kernel: [4487878.816125] ata4: tag 0x0: 1 1 0 1 Apr 18 16:53:05 Server kernel: [4487878.816150] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen Apr 18 16:53:05 Server kernel: [4487878.816178] ata4.00: failed command: WRITE FPDMA QUEUED Apr 18 16:53:05 Server kernel: [4487878.816199] ata4.00: cmd 61/08:00:00:88:e0/00:00:e8:00:00/40 tag 0 ncq 4096 out Apr 18 16:53:05 Server kernel: [4487878.816200] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 18 16:53:05 Server kernel: [4487878.816253] ata4.00: status: { DRDY } Apr 18 16:53:05 Server kernel: [4487878.816272] ata4: hard resetting link Apr 18 16:53:05 Server kernel: [4487878.816274] ata4: nv: skipping hardreset on occupied port Apr 18 16:53:06 Server kernel: [4487879.676029] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 18 16:53:07 Server kernel: [4487880.416749] ata4.00: n_sectors mismatch 3907029168 != 268435455 Apr 18 16:53:07 Server kernel: [4487880.416752] ata4.00: revalidation failed (errno=-19) Apr 18 16:53:07 Server kernel: [4487880.416773] ata4.00: limiting speed to UDMA/133:PIO2 Apr 18 16:53:11 Server kernel: [4487884.676024] ata4: hard resetting link Apr 18 16:53:11 Server kernel: [4487884.676027] ata4: nv: skipping hardreset on occupied port Apr 18 16:53:12 Server kernel: [4487885.144032] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 18 16:53:12 Server kernel: [4487885.240185] ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) Apr 18 16:53:12 Server kernel: [4487885.240190] ata4.00: revalidation failed (errno=-5) Apr 18 16:53:12 Server kernel: [4487885.240210] ata4.00: disabled Apr 18 16:53:17 Server kernel: [4487890.144023] ata4: hard resetting link Apr 18 16:53:17 Server kernel: [4487891.024033] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 18 16:53:17 Server kernel: [4487891.033357] ata4.00: ATA-8: WDC WD20EARS-00S8B1, 80.00A80, max UDMA/133 Apr 18 16:53:17 Server kernel: [4487891.033360] ata4.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32) Apr 18 16:53:17 Server kernel: [4487891.048347] ata4.00: configured for UDMA/133 Apr 18 16:53:17 Server kernel: [4487891.048361] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Apr 18 16:53:17 Server kernel: [4487891.048365] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor] Apr 18 16:53:17 Server kernel: [4487891.048369] Descriptor sense data with sense descriptors (in hex): Apr 18 16:53:17 Server kernel: [4487891.048371] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 Apr 18 16:53:17 Server kernel: [4487891.048378] 00 00 00 00 Apr 18 16:53:17 Server kernel: [4487891.048382] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information Apr 18 16:53:17 Server kernel: [4487891.048385] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 e8 e0 88 00 00 00 08 00 Apr 18 16:53:17 Server kernel: [4487891.048393] end_request: I/O error, dev sdc, sector 3907028992 Apr 18 16:53:17 Server kernel: [4487891.048420] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048440] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048458] end_request: I/O error, dev sdc, sector 3907028992 Apr 18 16:53:17 Server kernel: [4487891.048477] md: super_written gets error=-5, uptodate=0 Apr 18 16:53:17 Server kernel: [4487891.048482] raid5: Disk failure on sdc, disabling device. Apr 18 16:53:17 Server kernel: [4487891.048483] raid5: Operation continuing on 3 devices. Apr 18 16:53:17 Server kernel: [4487891.048525] ata4: EH complete Apr 18 16:53:17 Server kernel: [4487891.048554] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048576] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048596] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048615] sd 3:0:0:0: [sdc] READ CAPACITY(16) failed Apr 18 16:53:17 Server kernel: [4487891.048617] sd 3:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Apr 18 16:53:17 Server kernel: [4487891.048620] sd 3:0:0:0: [sdc] Sense not available. Apr 18 16:53:17 Server kernel: [4487891.048624] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048643] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048663] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048681] sd 3:0:0:0: [sdc] READ CAPACITY failed Apr 18 16:53:17 Server kernel: [4487891.048683] sd 3:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Apr 18 16:53:17 Server kernel: [4487891.048685] sd 3:0:0:0: [sdc] Sense not available. Apr 18 16:53:17 Server kernel: [4487891.048689] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048709] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048800] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.048860] sd 3:0:0:0: rejecting I/O to offline device Apr 18 16:53:17 Server kernel: [4487891.049028] sd 3:0:0:0: [sdc] Asking for cache data failed Apr 18 16:53:17 Server kernel: [4487891.049048] sd 3:0:0:0: [sdc] Assuming drive cache: write through Apr 18 16:53:17 Server kernel: [4487891.049071] sdc: detected capacity change from 2000398934016 to 0 Apr 18 16:53:17 Server kernel: [4487891.049080] ata4.00: detaching (SCSI 3:0:0:0) Apr 18 16:53:18 Server kernel: [4487891.061149] sd 3:0:0:0: [sdc] Stopping disk Apr 18 16:53:18 Server kernel: [4487891.485492] RAID5 conf printout: Apr 18 16:53:18 Server kernel: [4487891.485496] --- rd:4 wd:3 Apr 18 16:53:18 Server kernel: [4487891.485500] disk 0, o:1, dev:sdb Apr 18 16:53:18 Server kernel: [4487891.485502] disk 1, o:0, dev:sdc Apr 18 16:53:18 Server kernel: [4487891.485504] disk 2, o:1, dev:sdd Apr 18 16:53:18 Server kernel: [4487891.485506] disk 3, o:1, dev:sde Apr 18 16:53:18 Server kernel: [4487891.497014] RAID5 conf printout: Apr 18 16:53:18 Server kernel: [4487891.497016] --- rd:4 wd:3 Apr 18 16:53:18 Server kernel: [4487891.497018] disk 0, o:1, dev:sdb Apr 18 16:53:18 Server kernel: [4487891.497019] disk 2, o:1, dev:sdd Apr 18 16:53:18 Server kernel: [4487891.497021] disk 3, o:1, dev:sde Apr 18 16:53:18 Server kernel: [4487891.838719] scsi 3:0:0:0: Direct-Access ATA WDC WD20EARS-00S 80.0 PQ: 0 ANSI: 5 Apr 18 16:53:18 Server kernel: [4487891.838886] sd 3:0:0:0: Attached scsi generic sg3 type 0 Apr 18 16:53:18 Server kernel: [4487891.838911] sd 3:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) Apr 18 16:53:18 Server kernel: [4487891.838964] sd 3:0:0:0: [sdf] Write Protect is off Apr 18 16:53:18 Server kernel: [4487891.838967] sd 3:0:0:0: [sdf] Mode Sense: 00 3a 00 00 Apr 18 16:53:18 Server kernel: [4487891.838988] sd 3:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Apr 18 16:53:20 Server kernel: [4487891.839147] sdf: unknown partition table Apr 18 16:53:20 Server kernel: [4487893.130026] sd 3:0:0:0: [sdf] Attached SCSI disk Right now, i'm unable to do anything on /dev/sdc. Is there any way to try to re-attach it? I don't want to power-down the server unless absolutely necessary System: Debian Stable 2.6.32-5-amd64 mdadm version 3.1.4-1+8efb9d1 cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sdb[0] sdc[4](F) sde[3] sdd[2] 5860543488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU] unused devices: <none> mdadm --examine --scan ARRAY /dev/md0 UUID=1a7744b5:912ec7af:f82a9565:e3b453b4

Read the article
MDADM RAID5 array showing as CLEAN FAULTY

- by Dean Reilly

A drive dropped out of my raid 5 array yesterday. It looks like the reason was due to a bad controller so I switched it out and attempted to re-add the drive but mdadm claimed it couldn't do it. So I zerod the superblock and just added the drive normally and left it to resync. When I came to check on the array this morning I was unable to mount it at all and it's now showing as CLEAN FAULTY with two drives missing. The two missing drives are listed as spare and faulty spare. Is there anything I can do in this situation or is the array gone?

Read the article
Recovering a mdadm+lvm+ext4 partition with read error

- by bitwelder

One of disks in my NAS has failed. The NAS is running Linux, and it uses mdadm + LVM technology for its filesystems. I do have backup for most of the contents, but not for the very last changes, and if possible, I'd like to recover that from this failing disk. The disk (a 'green drive' WD10EARS 1TB in size) throws this kind of errors: Oct 3 12:00:41 kernel: [ 3625.620000] ata5.00: read unc at 9453282 Oct 3 12:00:41 kernel: [ 3625.620000] lba 9453282 start 9453280 end 1953511007 Oct 3 12:00:41 kernel: [ 3625.620000] sde5 auto_remap 0 Oct 3 12:00:41 kernel: [ 3625.630000] ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 Oct 3 12:00:41 kernel: [ 3625.630000] ata5.00: edma_err_cause=00000084 pp_flags=00000003, dev error, EDMA self-disable Oct 3 12:00:41 kernel: [ 3625.640000] ata5.00: failed command: READ FPDMA QUEUED Oct 3 12:00:41 kernel: [ 3625.650000] ata5.00: cmd 60/40:00:e0:3e:90/00:00:00:00:00/40 tag 0 ncq 32768 in Oct 3 12:00:41 kernel: [ 3625.650000] res 41/40:00:e2:3e:90/12:00:00:00:00/40 Emask 0x409 (media error) <F> Oct 3 12:00:41 kernel: [ 3625.660000] ata5.00: status: { DRDY ERR } However, while testing with 'dd', I noticed that if I skip the first 4kB, the read seems to be ok, i.e. a command like. dd if=/dev/sde5 of=dev/null bs=4k count=1000 skip=1 doesn't return any read error. Supposing that there is no other read failure in the rest of the disk, would I be able to recover this 900 GB partition (as I mentioned before, it's a 'linux raid autodetect' partition, that contains a a LVM2 volume that contains a ext4 filesystem) if I copy-clone the partition somewhere else, but the first 4kB?

Read the article
mdadm: Replacing array with entirely new drives

- by hellfur

I have a server with three 500GB drives, with most of my data in a RAID5 configuration spanning the three of them. I just purchased and installed four 1TB drives, and the intention is to move off of the old drives and onto the new ones. I have enough SATA ports and power connectors to power all seven of my drives at once, so I've kept the old RAID running while I figure out what to do with the new drives. My question is: Should I create a whole new array on the 1TB drives, then move everything over and reconfigure linux to boot from the new md arrays? Or should I just expand the array, swapping out each of the three 500GBs with the 1TB, then adding the final drive? I've read up on the mdadm extending drive setup, and it makes sense, but I imagine I would use one of the drives as a full backup while I move things over, then add that drive back into the array once things are up and running on three of the 1TB drives, so there's some complication in going that route as well... I'm just not sure which is safer/recommended.

Read the article
What tells initramfs or the Ubuntu Server boot process how to assemble RAID arrays?

- by Brad

The simple question: how does initramfs know how to assemble mdadm RAID arrays at startup? My problem: I boot my server and get: Gave up waiting for root device. ALERT! /dev/disk/by-uuid/[UUID] does not exist. Dropping to a shell! This happens because /dev/md0 (which is /boot, RAID 1) and /dev/md1 (which is /, RAID 5) are not being assembled correctly. What I get is /dev/md0 isn't assembled at all. /dev/md1 is assembled, but instead of using /dev/sda2, /dev/sdb2, /dev/sdc2, and /dev/sdd2, it uses /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd. To fix this and boot my server I do: $(initramfs) mdadm --stop /dev/md1 $(initramfs) mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 $(initramfs) mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 $(initramfs) exit And it boots properly and everything works. Now I just need the RAID arrays to assemble properly at boot so I don't have to manually assemble them. I've checked /etc/mdadm/mdadm.conf and the UUIDs of the two arrays listed in that file match the UUIDs from $ mdadm --detail /dev/md[0,1]. Other details: Ubuntu 10.10, GRUB2, mdadm 2.6.7.1 UPDATE: I have a feeling it has to do with superblocks. $ mdadm --examine /dev/sda outputs the same thing as $ mdadm --examine /dev/sda2. $ mdadm --examine /dev/sda1 seems to be fine because it outputs information about /dev/md0. I don't know if this is the problem or not, but it seems to fit with /dev/md1 getting assembled with /dev/sd[abcd] instead of /dev/sd[abcd]2. I tried zeroing the superblock on /dev/sd[abcd]. This removed the superblock from /dev/sd[abcd]2 as well and prevented me from being able to assemble /dev/md1 at all. I had to $ mdadm --create to get it back. This also put the super blocks back to the way they were.

Read the article
Ubuntu Server mdadm drbd ocfs2 kvm hangs under heavy file reading

- by Stefano Annese

I have deployed four ubuntu 10.04 server. They are coupled two by two in a cluster scenario. on both sides we have software raid1 disks, drbd8 and OCFS2 and on top of it some kvm machines run with qcow2 disks. I followed this: Link corosync is just used for DRBD and OCFS, the kvm machines are run "manually" When it works is fine: good performances, good I/O, but at a given time one of the two cluster started hanging. Then we tried with just one server turned on and it hangs the same. It seems to happen when an heavy READ in one of the virtual machines occurs, that is during rsyn backup. When the fact occurs the virtual machines are not reachable any more and the real server responds with good delay to the ping but no screen and no ssh is available. All we can do is force shutdown (hold the button) and restart and when it turns on again the raid on which relay drbd is resyncing. All the time it hangs we see such fact. After a couple of week of pain on one side this morning also the other cluster hung, but it has different moteherboard, ram, kvm instances. What is similar is reading for rsync scenario and Western Digital RAID Edistion disks on both side. Can anybody give me some input to solve such issue?

Read the article
mdadm - Recovering a 'split' RAID1 array

- by Hamza

I have two drives that used to be part of a single RAID1 volume but it appears that one of them went offline for some time, something I've noticed just now when I rebooted my system. I now seem to have two RAID volumes, as reported by: # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md126 : active raid1 sdc[1] 2096116 blocks super 1.2 [2/1] [_U] md127 : active (auto-read-only) raid1 sdb[0] 2096116 blocks super 1.2 [2/1] [U_] unused devices: <none> Not exactly sure where to go from here. How can I merge and re-sync these volumes without data loss? Thanks.

Read the article
Gentoo/mdadm: Remove 1 disk from RAID 0 array

- by Tim

The server has a 7-disk RAID 0 array, and sdf is starting to die. Is there a way to remove sdf while keeping the array intact? # df -h Filesystem Size Used Avail Use% Mounted on /dev/md1 14T 6.6T 7.0T 49% /var [...] # cat /proc/mdstat Personalities : [raid0] md1 : active raid0 sda4[0] sdf1[5] sdd1[3] sdb1[1] sde1[4] sdg1[6] sdc1[2] 14482788352 blocks 512k chunks Looking to keep downtime to a minimum.

Read the article
Rebuild mdadm RAID5 array with fewer disks

- by drjeep

I have a 4 disk RAID5 array, one of which is starting to fail according to smartd. However, since I'm using less than half the space on /dev/md0, I'd like to rebuild the array without the failing disk. The closest scenario I've been able to find online has been this post, however it contains bits that don't apply to me (LVM volumes) and also doesn't explain how I go about resizing the partition after I'm done. Please note I have backups of important data, but I'd like to avoid rebuilding the array from scratch if possible.

Read the article

Search Results

Search found 242 results on 10 pages for 'mdadm'.

Page 2/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >

- by CarstenCarsten

- by Matthew Hodgkins

- by JuanD

- by Matthew Hodgkins

- by johannes

- by daveyt

- by azera

- by danny

- by user65632

- by Stanley

- by Chris

- by BarsMonster

- by martinald

- by Backtogeek

- by Jon Cage

- by EXIT_FAILURE

- by poscaman

- by Dean Reilly

- by bitwelder

- by hellfur

- by Brad

- by Stefano Annese

- by Hamza

- by Tim

- by drjeep

< Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >