I've been monitoring our SQL Server for a while, and I've noticed via Task Manager and Perfmon that disk I/O periodically hits 100%.
I can normally correlate these spikes with SUSPENDED processes in SQL Server Management Studio when I execute "exec sp_who2".
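For anyone wanting to reproduce this, here is a sketch of the kind of query I could run instead of sp_who2 (assuming SQL Server 2005 or later and VIEW SERVER STATE permission); the wait_type column shows *why* each session is suspended, which sp_who2 doesn't:

```sql
-- Show suspended requests together with their current wait type.
-- PAGEIOLATCH_* waits point at data-file reads; WRITELOG points at the log.
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.wait_time            AS wait_time_ms,
       r.blocking_session_id,
       t.text                 AS current_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.status = 'suspended';
```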
The RAID controller is controlled by LSI MegaRAID Storage Manager. We have the following setup:
- System Drive (Windows) on RAID 1 with two 280GB drives
- SQL on RAID 10 (two mirrored pairs of 280GB drives across two spans)
This is a database that is hammered during the day, but is pretty inactive at night. The DB size is currently about 13GB, and is used by approximately 200 (and growing) users a day.
I have a couple of ideas I'm toying around with:
1. Checking for missing indexes and reindexing some tables
2. Adding an additional RAID 1 (with two new, smaller HDDs) and moving SQL's transaction log file (LDF) onto the new array.
For #2, my question is this: would we really be increasing disk performance (I/O) by moving the log off the RAID 10 onto a RAID 1? RAID 10 obviously has better performance than RAID 1, and SQL Server must write to the transaction log before writing to the database.
But on the flip side, transaction log writes are mostly sequential, and we'd be reducing the amount of I/O hitting the RAID 10, which is where all of the "meat" is, thereby freeing that array up for read requests.
Is there any way to find out what our current limiting factor is (the drives vs. the RAID controller)? If the limiting factor is the drives, then maybe adding the additional RAID 1 makes sense; but if the limiting factor is the controller itself, then I think we're approaching this the wrong way.
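One way I could at least localize the bottleneck per file (a sketch, assuming SQL Server 2005+) is the I/O stall counters in sys.dm_io_virtual_file_stats, which give average read/write latency for each data and log file:

```sql
-- Average I/O latency per database file. High avg_write_ms on the LDF
-- vs. high avg_read_ms on the MDF helps show where the stalls are.
SELECT DB_NAME(vfs.database_id)                              AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;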
Finally, are we just wasting our time? Should we instead focus our efforts on #1 (reindexing tables, reducing network latency where possible, etc.)?
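For #1, this is roughly the fragmentation check I'd run first (a sketch, assuming SQL Server 2005+; run in the context of the database in question, and the 30%/1000-page thresholds are just the commonly cited rule of thumb):

```sql
-- Find heavily fragmented indexes in the current database;
-- conventionally, >30% fragmentation suggests a rebuild, 5-30% a reorganize.
SELECT OBJECT_NAME(ps.object_id)        AS table_name,
       i.name                           AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 30
  AND ps.page_count > 1000
ORDER BY ps.avg_fragmentation_in_percent DESC;
```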
© Server Fault or respective owner