I'm having serious performance issues with a Xen-based server. The problem shows up on the guest, which is a paravirtualized CentOS 5.5.
The following numbers are taken from top while copying a large file over the network.
If I copy the file a second time, the speed drops as the load average rises: the second copy runs at about half the speed of the first.
The guest needs some time to cool off after this. The load average slowly decreases until the system is usable again. While the load is high, ls / takes about 30 seconds.
top - 13:26:44 up 13 days, 21:44,  2 users,  load average: 7.03, 5.08, 3.15
Tasks: 134 total,   2 running, 132 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 25.3%id, 74.5%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:   1048752k total,  1041460k used,     7292k free,     3116k buffers
Swap:  2129912k total,       40k used,  2129872k free,   904740k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1506 root      10  -5     0    0    0 S  0.3  0.0   0:03.94 cifsd
    1 root      15   0  2172  644  556 S  0.0  0.1   0:00.08 init
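
To make the summary line above easier to read, here is a minimal Python sketch that splits the Cpu(s) line into its fields. The function name parse_top_cpu_line is just something I made up for illustration, not part of any tool. It shows that nearly all non-idle time is I/O wait (wa), with essentially no real CPU work and no steal time (st).

```python
import re

def parse_top_cpu_line(line):
    """Return a dict mapping top's CPU state codes (us, sy, wa, ...) to percentages."""
    # Each field looks like "74.5%wa": a number, a percent sign, a short code.
    fields = re.findall(r"([\d.]+)%(\w+)", line)
    return {name: float(value) for value, name in fields}

# The Cpu(s) line copied from the top output above.
line = "Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 25.3%id, 74.5%wa,  0.0%hi,  0.0%si,  0.1%st"
cpu = parse_top_cpu_line(line)

# Time spent actually executing code (user + system + nice).
busy = cpu["us"] + cpu["sy"] + cpu["ni"]
print(f"iowait: {cpu['wa']}%, steal: {cpu['st']}%, actual CPU work: {busy:.1f}%")
```

So the load average of 7 comes from processes blocked on I/O, not from processes competing for CPU.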
Meanwhile, the host stays at a load average of about 0.5, steady over time, with about 50% I/O wait.
The server hardware is dual Xeon, 3 GB RAM, and 170 GB of 10k rpm Ultra320 SCSI storage, so it shouldn't have any problems copying files over the network. The guest's disk is configured like this:
disk = [ "tap:aio:/vm/dev01.img,xvda,w" ]
I also get messages like these in the log:
INFO: task syslogd:1350 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd       D 00062E4F  2208  1350      1          1353  1312 (NOTLB)
       c0ef0ed0 00000286 6e71a411 00062e4f c0ef0f18 00000009 c0f20000 6e738bfd
       00062e4f 0001e7ec c0f2010c c181a724 c1abd200 00000000 ffffffff c0ef0ecc
       c041a180 00000000 c0ef0ed8 c03d6a50 00000000 00000000 c03d6a00 00000000
Call Trace:
 [<c041a180>] __wake_up+0x2a/0x3d
 [<ee06a1ea>] log_wait_commit+0x80/0xc7 [jbd]
 [<c043128b>] autoremove_wake_function+0x0/0x2d
 [<ee065661>] journal_stop+0x195/0x1ba [jbd]
 [<c0490a32>] __writeback_single_inode+0x1a3/0x2af
 [<c04568ea>] do_writepages+0x2b/0x32
 [<c045239b>] __filemap_fdatawrite_range+0x66/0x72
 [<c04910ce>] sync_inode+0x19/0x24
 [<ee09b007>] ext3_sync_file+0xaf/0xc4 [ext3]
 [<c047426f>] do_fsync+0x41/0x83
 [<c04742ce>] __do_fsync+0x1d/0x2b
 [<c0405413>] syscall_call+0x7/0xb
 =======================
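
The trace above shows syslogd blocked inside fsync, waiting for an ext3 journal commit. A minimal sketch for measuring how long a small write plus fsync takes on the guest could look like this. It assumes Python 3 is available (the stock CentOS 5.5 Python is older), and the function name time_fsync and the 4 KB payload are my own choices for illustration.

```python
import os
import tempfile
import time

def time_fsync(directory=".", payload=b"x" * 4096):
    """Write a small payload to a temp file in `directory` and time the fsync call."""
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.write(payload)
        f.flush()  # push Python's buffer to the kernel
        start = time.monotonic()
        os.fsync(f.fileno())  # block until the kernel reports the data is on disk
        return time.monotonic() - start

if __name__ == "__main__":
    # Run a few rounds; compare an idle guest against one in the middle of a copy.
    for i in range(5):
        print(f"fsync {i}: {time_fsync():.3f} s")
```

Running it on the filesystem backed by xvda, once while idle and once during a copy, should show whether small synchronous writes are being held up in the same way as syslogd.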
I have tried disabling irqbalance, as suggested here, but it does not seem to make any difference.