Bad motherboard / controller / HDs?

Posted by quidpro on Server Fault See other posts from Server Fault or by quidpro
Published on 2011-03-05T05:07:40Z Indexed on 2011/03/05 7:26 UTC
Read the original article Hit count: 645

Filed under:
|
|
|

On a leased server, I am running into some timing issues with an application that requires precise timing. Server is a Dual Xeon E5410 running on a Supermicro X7DVL-3 motherboard under CentOs 5.5 x64.

The application I am running is timer sensitive and keeps sensing drift whether under load or at idle, but especially under load. I did some investigating with atop and dd and found some mind-blowing numbers. Mind you, I am no Linux guru but something sure seems out of whack.

I ran:

dd bs=4096 if=/dev/zero of=/bigtestfile

to generate disk activity. Regardless whether I wrote it to sda or sdb my DSK value in atop would go over 100%, at one time peaking at 1700%. Again it does not matter if I am writing to sda or sdb.

DSK |         sdb | busy    675% | read       0 | write    110 | avio   78 ms |

Here are the smartctl outputs:

# smartctl -A /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   165   165   021    Pre-fail  Always       -       2750
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   065   065   000    Old_age   Always       -       25831
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   116   093   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0


# smartctl -A /dev/sdb
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   180   180   021    Pre-fail  Always       -       3958
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       24087
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   122   096   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

Any idea what's wrong here? Bad motherboard? It would seem rare that both drives are going bad (smartctl says they PASS_, so it leaves the mobo as the culprit in my eyes.

© Server Fault or respective owner

Related posts about centos

Related posts about error