Why do I see a large performance hit with DRBD?

Posted by BHS on Server Fault, 2011-11-04

I see a much larger performance hit with DRBD than its user manual says I should expect. I'm using DRBD 8.3.7 (Fedora 13 RPMs).

I've set up a DRBD test and measured the throughput of the disk and the network without DRBD:

dd if=/dev/zero of=/data.tmp bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 4.62985 s, 116 MB/s

/ is a logical volume on the disk I'm testing with, mounted without DRBD

iperf:

[  4]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec
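
(The exact iperf invocation wasn't recorded, but it was essentially a plain TCP test between the two nodes:)

# on cluster1 (192.168.33.10)
iperf -s

# on cluster2
iperf -c 192.168.33.10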

According to the "Throughput overhead expectations" section of the DRBD user guide, the bottleneck should be whichever is slower, the network or the disk, and DRBD should add an overhead of about 3%. In my case the network and the disk seem pretty evenly matched, so I should be able to get around 100 MB/s.
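
Spelling that expectation out (941 Mbit/s is roughly 117 MB/s, so the disk at 116 MB/s is the marginally slower side):

echo "941 / 8" | bc -l      # network: ~117.6 MB/s
echo "116 * 0.97" | bc -l   # disk minus ~3% DRBD overhead: ~112.5 MB/s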

So, with the raw DRBD device, I get

dd if=/dev/zero of=/dev/drbd2 bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 6.61362 s, 81.2 MB/s

which is slower than I would expect. Then, once I format the device with ext4, I get

dd if=/dev/zero of=/mnt/data.tmp bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 9.60918 s, 55.9 MB/s
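
(For completeness, the format-and-mount step was nothing special; roughly:)

mkfs.ext4 /dev/drbd2
mount /dev/drbd2 /mnt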

This doesn't seem right. There must be some other factor playing into this that I'm not aware of.

global_common.conf

global {
    usage-count yes;
}

common {
    protocol C;              # fully synchronous replication

    syncer {
        al-extents 1801;
        rate 33M;            # caps background resync traffic only
    }
}

data_mirror.res

resource data_mirror {
    device /dev/drbd1;
    disk   /dev/sdb1;

    meta-disk internal;

    on cluster1 {
       address 192.168.33.10:7789;
    }

    on cluster2 {
       address 192.168.33.12:7789;
    }
}
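
For what it's worth, a quick way to confirm the resource is connected and not in the middle of a resync while the tests run (data_mirror is the resource defined above):

cat /proc/drbd                 # overall state and any sync progress
drbdadm cstate data_mirror     # expect Connected
drbdadm dstate data_mirror     # expect UpToDate/UpToDate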

For hardware, I have two identical machines:

  • 6 GB RAM
  • Quad-core AMD Phenom, 3.2 GHz
  • Motherboard SATA controller
  • 1 TB WD drive, 7200 RPM, 64 MB cache

The network is 1 Gb, connected through a switch. I know a direct connection is recommended, but could it really make this much of a difference?

Edited

I tried monitoring the network bandwidth to see what's happening: I used ibmonitor and measured the average bandwidth while running the dd test 10 times. I got:

  • avg ~450 Mbit/s writing to ext4
  • avg ~800 Mbit/s writing to the raw device

It looks like with ext4, DRBD uses about half the bandwidth it uses with the raw device, so the bottleneck is not the network.
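
Another rough way to watch the disk and the network at the same time during a dd run, assuming the sysstat tools are installed, would be something like:

iostat -xm 1    # watch the sdb line for MB/s and %util on the backing disk
sar -n DEV 1    # per-interface throughput on the replication link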
