Problem with RAID5 (mdadm) - disk detached

Posted by poscaman on Server Fault
Published on 2011-04-18T14:17:37Z Indexed on 2012/06/28 15:18 UTC

I am seeing these lines in /var/log/syslog:

    Apr 18 16:53:05 Server kernel: [4487878.816036] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
    Apr 18 16:53:05 Server kernel: [4487878.816058] ata4: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
    Apr 18 16:53:05 Server kernel: [4487878.816059]   dhfis 0x1 dmafis 0x1 sdbfis 0x0
    Apr 18 16:53:05 Server kernel: [4487878.816093] ata4: ATA_REG 0x40 ERR_REG 0x0
    Apr 18 16:53:05 Server kernel: [4487878.816108] ata4: tag : dhfis dmafis sdbfis sacitve
    Apr 18 16:53:05 Server kernel: [4487878.816125] ata4: tag 0x0: 1 1 0 1
    Apr 18 16:53:05 Server kernel: [4487878.816150] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
    Apr 18 16:53:05 Server kernel: [4487878.816178] ata4.00: failed command: WRITE FPDMA QUEUED
    Apr 18 16:53:05 Server kernel: [4487878.816199] ata4.00: cmd 61/08:00:00:88:e0/00:00:e8:00:00/40 tag 0 ncq 4096 out
    Apr 18 16:53:05 Server kernel: [4487878.816200]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
    Apr 18 16:53:05 Server kernel: [4487878.816253] ata4.00: status: { DRDY }
    Apr 18 16:53:05 Server kernel: [4487878.816272] ata4: hard resetting link
    Apr 18 16:53:05 Server kernel: [4487878.816274] ata4: nv: skipping hardreset on occupied port
    Apr 18 16:53:06 Server kernel: [4487879.676029] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    Apr 18 16:53:07 Server kernel: [4487880.416749] ata4.00: n_sectors mismatch 3907029168 != 268435455
    Apr 18 16:53:07 Server kernel: [4487880.416752] ata4.00: revalidation failed (errno=-19)
    Apr 18 16:53:07 Server kernel: [4487880.416773] ata4.00: limiting speed to UDMA/133:PIO2
    Apr 18 16:53:11 Server kernel: [4487884.676024] ata4: hard resetting link
    Apr 18 16:53:11 Server kernel: [4487884.676027] ata4: nv: skipping hardreset on occupied port
    Apr 18 16:53:12 Server kernel: [4487885.144032] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    Apr 18 16:53:12 Server kernel: [4487885.240185] ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
    Apr 18 16:53:12 Server kernel: [4487885.240190] ata4.00: revalidation failed (errno=-5)
    Apr 18 16:53:12 Server kernel: [4487885.240210] ata4.00: disabled
    Apr 18 16:53:17 Server kernel: [4487890.144023] ata4: hard resetting link
    Apr 18 16:53:17 Server kernel: [4487891.024033] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    Apr 18 16:53:17 Server kernel: [4487891.033357] ata4.00: ATA-8: WDC WD20EARS-00S8B1, 80.00A80, max UDMA/133
    Apr 18 16:53:17 Server kernel: [4487891.033360] ata4.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32)
    Apr 18 16:53:17 Server kernel: [4487891.048347] ata4.00: configured for UDMA/133
    Apr 18 16:53:17 Server kernel: [4487891.048361] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Apr 18 16:53:17 Server kernel: [4487891.048365] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
    Apr 18 16:53:17 Server kernel: [4487891.048369] Descriptor sense data with sense descriptors (in hex):
    Apr 18 16:53:17 Server kernel: [4487891.048371]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
    Apr 18 16:53:17 Server kernel: [4487891.048378]         00 00 00 00
    Apr 18 16:53:17 Server kernel: [4487891.048382] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information
    Apr 18 16:53:17 Server kernel: [4487891.048385] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 e8 e0 88 00 00 00 08 00
    Apr 18 16:53:17 Server kernel: [4487891.048393] end_request: I/O error, dev sdc, sector 3907028992
    Apr 18 16:53:17 Server kernel: [4487891.048420] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048440] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048458] end_request: I/O error, dev sdc, sector 3907028992
    Apr 18 16:53:17 Server kernel: [4487891.048477] md: super_written gets error=-5, uptodate=0
    Apr 18 16:53:17 Server kernel: [4487891.048482] raid5: Disk failure on sdc, disabling device.
    Apr 18 16:53:17 Server kernel: [4487891.048483] raid5: Operation continuing on 3 devices.
    Apr 18 16:53:17 Server kernel: [4487891.048525] ata4: EH complete
    Apr 18 16:53:17 Server kernel: [4487891.048554] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048576] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048596] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048615] sd 3:0:0:0: [sdc] READ CAPACITY(16) failed
    Apr 18 16:53:17 Server kernel: [4487891.048617] sd 3:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    Apr 18 16:53:17 Server kernel: [4487891.048620] sd 3:0:0:0: [sdc] Sense not available.
    Apr 18 16:53:17 Server kernel: [4487891.048624] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048643] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048663] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048681] sd 3:0:0:0: [sdc] READ CAPACITY failed
    Apr 18 16:53:17 Server kernel: [4487891.048683] sd 3:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    Apr 18 16:53:17 Server kernel: [4487891.048685] sd 3:0:0:0: [sdc] Sense not available.
    Apr 18 16:53:17 Server kernel: [4487891.048689] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048709] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048800] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.048860] sd 3:0:0:0: rejecting I/O to offline device
    Apr 18 16:53:17 Server kernel: [4487891.049028] sd 3:0:0:0: [sdc] Asking for cache data failed
    Apr 18 16:53:17 Server kernel: [4487891.049048] sd 3:0:0:0: [sdc] Assuming drive cache: write through
    Apr 18 16:53:17 Server kernel: [4487891.049071] sdc: detected capacity change from 2000398934016 to 0
    Apr 18 16:53:17 Server kernel: [4487891.049080] ata4.00: detaching (SCSI 3:0:0:0)
    Apr 18 16:53:18 Server kernel: [4487891.061149] sd 3:0:0:0: [sdc] Stopping disk
    Apr 18 16:53:18 Server kernel: [4487891.485492] RAID5 conf printout:
    Apr 18 16:53:18 Server kernel: [4487891.485496]  --- rd:4 wd:3
    Apr 18 16:53:18 Server kernel: [4487891.485500]  disk 0, o:1, dev:sdb
    Apr 18 16:53:18 Server kernel: [4487891.485502]  disk 1, o:0, dev:sdc
    Apr 18 16:53:18 Server kernel: [4487891.485504]  disk 2, o:1, dev:sdd
    Apr 18 16:53:18 Server kernel: [4487891.485506]  disk 3, o:1, dev:sde
    Apr 18 16:53:18 Server kernel: [4487891.497014] RAID5 conf printout:
    Apr 18 16:53:18 Server kernel: [4487891.497016]  --- rd:4 wd:3
    Apr 18 16:53:18 Server kernel: [4487891.497018]  disk 0, o:1, dev:sdb
    Apr 18 16:53:18 Server kernel: [4487891.497019]  disk 2, o:1, dev:sdd
    Apr 18 16:53:18 Server kernel: [4487891.497021]  disk 3, o:1, dev:sde
    Apr 18 16:53:18 Server kernel: [4487891.838719] scsi 3:0:0:0: Direct-Access     ATA      WDC WD20EARS-00S 80.0 PQ: 0 ANSI: 5
    Apr 18 16:53:18 Server kernel: [4487891.838886] sd 3:0:0:0: Attached scsi generic sg3 type 0
    Apr 18 16:53:18 Server kernel: [4487891.838911] sd 3:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
    Apr 18 16:53:18 Server kernel: [4487891.838964] sd 3:0:0:0: [sdf] Write Protect is off
    Apr 18 16:53:18 Server kernel: [4487891.838967] sd 3:0:0:0: [sdf] Mode Sense: 00 3a 00 00
    Apr 18 16:53:18 Server kernel: [4487891.838988] sd 3:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    Apr 18 16:53:20 Server kernel: [4487891.839147]  sdf: unknown partition table
    Apr 18 16:53:20 Server kernel: [4487893.130026] sd 3:0:0:0: [sdf] Attached SCSI disk
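Two numbers in this log can be cross-checked with plain shell arithmetic. This is only a sketch: the helper names `capacity_bytes` and `lba28_max` are made up for illustration, and reading the mismatch value as the 28-bit LBA ceiling is an interpretation, not something the kernel states.

```shell
#!/bin/sh
# Sector counts copied from the kernel log above.

# Print the capacity in bytes for a sector count and a sector size.
capacity_bytes() {
    echo $(( $1 * $2 ))
}

# Print the largest sector count expressible with 28-bit LBA (2^28 - 1).
lba28_max() {
    echo $(( (1 << 28) - 1 ))
}

capacity_bytes 3907029168 512   # compare with "capacity change from 2000398934016 to 0"
lba28_max                       # compare with "n_sectors mismatch 3907029168 != 268435455"
```

The first value equals the 2000398934016 bytes the kernel reported before the capacity dropped to 0. The second equals the 268435455 in the mismatch line, so during revalidation the drive may have been answering with only its 28-bit capacity field rather than having actually shrunk.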

Right now, I'm unable to do anything with /dev/sdc. Is there any way to re-attach it? I don't want to power down the server unless absolutely necessary.
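For reference, the end of the log shows the same drive model being detected again on the same SCSI address (3:0:0:0), this time as sdf. Under the unverified assumption that sdf really is the old sdc, a hot re-add might look like the dry run below. `readd_plan` is a hypothetical helper that only prints the commands and runs none of them, and it assumes the installed mdadm accepts the `detached` keyword for `--remove`.

```shell
#!/bin/sh
# Print (do not execute) one possible hot re-add sequence.
# Assumption: the drive that vanished as sdc is the one re-detected as sdf.
readd_plan() {
    array=$1
    newdev=$2
    # Check that the md superblock on the new node carries the array UUID.
    echo "mdadm --examine $newdev"
    # Drop the failed member whose device node no longer exists.
    echo "mdadm $array --remove detached"
    # Add the drive back; this would start a rebuild.
    echo "mdadm $array --add $newdev"
    # Watch the rebuild progress.
    echo "cat /proc/mdstat"
}

readd_plan /dev/md0 /dev/sdf
```

Printing the plan first leaves room to compare the UUID from `mdadm --examine` with the one reported by `mdadm --examine --scan` before anything touches the array.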

System:

  • Debian Stable, kernel 2.6.32-5-amd64
  • mdadm version 3.1.4-1+8efb9d1

cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid5 sdb[0] sdc[4](F) sde[3] sdd[2]
          5860543488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]

    unused devices: <none>
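In the status line, `[4/3]` means the array expects 4 devices and has 3 active, and `[U_UU]` maps each slot to up (`U`) or down (`_`). A small sketch for pulling the down slots out of that map follows. `missing_slots` is a hypothetical helper that handles only this status format.

```shell
#!/bin/sh
# Report which RAID slots are down, given /proc/mdstat-style text on stdin.
# Hypothetical helper; it only looks for a "[U_UU]"-style slot map.
missing_slots() {
    awk '
        match($0, /\[[U_]+\]/) {
            map = substr($0, RSTART + 1, RLENGTH - 2)   # e.g. "U_UU"
            for (i = 1; i <= length(map); i++)
                if (substr(map, i, 1) == "_")
                    print i - 1                         # slots are numbered from 0
        }
    '
}

# Sample input copied from the output above.
missing_slots <<'EOF'
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sdc[4](F) sde[3] sdd[2]
      5860543488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
EOF
```

For the sample this prints 1, which agrees with the `disk 1, o:0, dev:sdc` line in the RAID5 conf printout from syslog.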

mdadm --examine --scan

    ARRAY /dev/md0 UUID=1a7744b5:912ec7af:f82a9565:e3b453b4
