Formula to calculate probability of unrecoverable read error during RAID rebuild

Posted by OlafM on Super User
Published on 2012-12-09T11:34:44Z

I need to compare the reliability of different RAID systems built with either consumer or enterprise drives. Ignoring mechanical problems, the probability that a rebuild hits an unrecoverable read error is simple to compute:

error_probability = 1 - (1-per_bit_error_rate)^bit_read

and with 3 TB drives I get

  • 38% probability of a URE (unrecoverable read error) for a 2+1 disk RAID5 (4.7% for enterprise drives)

  • 21% for a RAID1 (2.4% for enterprise drives)

  • 51% probability of an error during recovery for the 3+1 RAID5 often used in SOHO products such as Synology units. Most people don't know about this.
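
The percentages above can be reproduced with a short Python sketch. Two inputs are assumptions on my part, since they are not stated explicitly here: a rate of 1 error per 10^14 bits read for consumer drives and 1 per 10^15 for enterprise drives, and decimal terabytes (10^12 bytes). The helper names are purely illustrative.

```python
import math

BITS_PER_TB = 8e12  # assumption: decimal terabytes, 10^12 bytes each

# Assumed per-bit error rates (not given in the text, but consistent
# with the quoted percentages).
CONSUMER = 1e-14
ENTERPRISE = 1e-15


def ure_probability(per_bit_error_rate, bits_read):
    """1 - (1 - per_bit_error_rate)^bits_read, via log1p/expm1 so that
    the tiny per-bit rate is not lost to floating-point rounding."""
    return -math.expm1(bits_read * math.log1p(-per_bit_error_rate))


def rebuild_ure_probability(disks_read, disk_tb, per_bit_error_rate):
    """Probability of at least one URE while reading `disks_read`
    surviving disks of `disk_tb` terabytes each in full."""
    bits_read = disks_read * disk_tb * BITS_PER_TB
    return ure_probability(per_bit_error_rate, bits_read)


# (label, number of surviving disks that must be read completely)
layouts = [("RAID5 2+1", 2), ("RAID1", 1), ("RAID5 3+1", 3)]

for label, disks_read in layouts:
    consumer = rebuild_ure_probability(disks_read, 3, CONSUMER)
    enterprise = rebuild_ure_probability(disks_read, 3, ENTERPRISE)
    print(f"{label}: consumer {consumer:.1%}, enterprise {enterprise:.1%}")
```

Under those assumptions the printed values should land close to the figures in the list above.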

Calculating the error for single-disk tolerance is easy; my question concerns systems that tolerate multiple disk failures (RAID6/Z2, RAIDZ3 and RAID1 with multiple disks).

If only the first disk is used for the rebuild and the second one is read again from the beginning in case of a URE, then the error probability is the one calculated above, squared (14.5% for consumer RAID5 2+1, 4.5% for consumer RAID1 1+2). However, I suppose (at least in ZFS, which has full checksums!) that the second parity/available disk is read only where needed, meaning that only a few sectors have to be read from it. How many UREs can possibly occur on the first disk? Not many, otherwise the error probability for single-disk-tolerant systems would skyrocket even more than I calculated.
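
To put rough numbers on the two scenarios, here is a minimal sketch. It rests on several assumptions of mine: UREs are independent, each URE costs exactly one sector, a bad sector can be repaired by reading a single 4096-byte sector from the second redundancy source, and the consumer error rate is 1 per 10^14 bits. The on-demand figure is a first-order estimate (expected number of UREs times the chance that one repair read also fails), not a statement about how any particular RAID or ZFS implementation behaves.

```python
import math


def ure_probability(per_bit_error_rate, bits_read):
    """1 - (1 - per_bit_error_rate)^bits_read, computed stably."""
    return -math.expm1(bits_read * math.log1p(-per_bit_error_rate))


def full_reread_failure(p_single):
    """Scenario 1: after a URE the second source is read in full, so the
    rebuild fails only if both full reads hit a URE (p squared)."""
    return p_single ** 2


def on_demand_failure(per_bit_error_rate, bits_read, sector_bits=4096 * 8):
    """Scenario 2 (assumed model): the second source is read only for the
    sectors that failed on the first pass. Approximated as the expected
    number of UREs times the probability that one sector-sized repair
    read fails as well."""
    expected_ures = bits_read * per_bit_error_rate
    p_sector = ure_probability(per_bit_error_rate, sector_bits)
    return expected_ures * p_sector


consumer = 1e-14          # assumed consumer per-bit error rate
bits_raid5_2p1 = 2 * 3 * 8e12   # two 3 TB disks read in full

p_single = ure_probability(consumer, bits_raid5_2p1)
print(f"single pass:      {p_single:.3f}")
print(f"full re-read:     {full_reread_failure(p_single):.3f}")
print(f"on-demand repair: {on_demand_failure(consumer, bits_raid5_2p1):.2e}")
```

With these inputs the expected number of UREs in a single pass stays below one, which is why the on-demand estimate comes out many orders of magnitude smaller than the squared figure.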

If that is right, a second parity disk would in practice lower the risk to an extremely low value.

Am I correct?
