e2fsck extremely slow, although enough memory exists

Posted by kaefert on Server Fault See other posts from Server Fault or by kaefert
Published on 2012-11-06T19:53:12Z Indexed on 2012/11/07 11:06 UTC
Read the original article Hit count: 257

I've got this external USB-Disk:

kaefert@blechmobil:~$ lsusb -s 2:3
Bus 002 Device 003: ID 0bc2:3320 Seagate RSS LLC 

As can be seen in this dmesg output, there is some problem that prevents that disk from beeing mounted:

kaefert@blechmobil:~$ dmesg
...
[  113.084079] usb 2-1: new high-speed USB device number 3 using ehci_hcd
[  113.217783] usb 2-1: New USB device found, idVendor=0bc2, idProduct=3320
[  113.217787] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[  113.217790] usb 2-1: Product: Expansion Desk
[  113.217792] usb 2-1: Manufacturer: Seagate
[  113.217794] usb 2-1: SerialNumber: NA4J4N6K
[  113.435404] usbcore: registered new interface driver uas
[  113.455315] Initializing USB Mass Storage driver...
[  113.468051] scsi5 : usb-storage 2-1:1.0
[  113.468180] usbcore: registered new interface driver usb-storage
[  113.468182] USB Mass Storage support registered.
[  114.473105] scsi 5:0:0:0: Direct-Access     Seagate  Expansion Desk   070B PQ: 0 ANSI: 6
[  114.474342] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB)
[  114.475089] sd 5:0:0:0: [sdb] Write Protect is off
[  114.475092] sd 5:0:0:0: [sdb] Mode Sense: 43 00 00 00
[  114.475959] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  114.477093] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB)
[  114.501649]  sdb: sdb1
[  114.502717] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB)
[  114.504354] sd 5:0:0:0: [sdb] Attached SCSI disk
[  116.804408] EXT4-fs (sdb1): ext4_check_descriptors: Checksum for group 3976 failed (47397!=61519)
[  116.804413] EXT4-fs (sdb1): group descriptors corrupted!
...

So I went and fired up my favorite partition manager - gparted, and told it to verify and repair the partition sdb1. This made gparted call e2fsck (version 1.42.4 (12-Jun-2012))

e2fsck -f -y -v /dev/sdb1

Although gparted called e2fsck with the "-v" option, sadly it doesn't show me the output of my e2fsck process (bugreport https://bugzilla.gnome.org/show_bug.cgi?id=467925 )

I started this whole thing on Sunday (2012-11-04_2200) evening, so about 48 hours ago, this is what htop says about it now (2012-11-06-1900):

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 3704 root       39  19 1560M 1166M   768 R 98.0 19.5 42h56:43 e2fsck -f -y -v /dev/sdb1

Now I found a few posts on the internet that discuss e2fsck running slow, for example:

http://gparted-forum.surf4.info/viewtopic.php?id=13613

where they write that its a good idea to see if the disk is just that slow because maybe its damaged, and I think these outputs tell me that this is not the case in my case:

kaefert@blechmobil:~$ sudo hdparm -tT /dev/sdb
/dev/sdb:
 Timing cached reads:   3562 MB in  2.00 seconds = 1783.29 MB/sec
 Timing buffered disk reads:  82 MB in  3.01 seconds =  27.26 MB/sec


kaefert@blechmobil:~$ sudo hdparm /dev/sdb
/dev/sdb:
 multcount     =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 364801/255/63, sectors = 5860533160, start = 0

However, although I can read quickly from that disk, this disk speed doesn't seem to be used by e2fsck, considering tools like gkrellm or iotop or this:

kaefert@blechmobil:~$ iostat -x
Linux 3.2.0-2-amd64 (blechmobil)    2012-11-06  _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14,24   47,81   14,63    0,95    0,00   22,37

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,59     8,29    2,42    5,14    43,17   160,17    53,75     0,30   39,80    8,72   54,42   3,95   2,99
sdb             137,54     5,48    9,23    0,20   587,07    22,73   129,35     0,07    7,70    7,51   16,18   2,17   2,04

Now I researched a little bit on how to find out what e2fsck is doing with all that processor time, and I found the tool strace, which gives me this:

kaefert@blechmobil:~$ sudo strace -p3704
lseek(4, 41026998272, SEEK_SET)         = 41026998272
write(4, "\212\354K[_\361\3nl\212\245\352\255jR\303\354\312Yv\334p\253r\217\265\3567\325\257\3766"..., 4096) = 4096
lseek(4, 48404766720, SEEK_SET)         = 48404766720
read(4, "\7t\260\366\346\337\304\210\33\267j\35\377'\31f\372\252\ffU\317.y\211\360\36\240c\30`\34"..., 4096) = 4096
lseek(4, 41027002368, SEEK_SET)         = 41027002368
write(4, "\232]7Ws\321\352\t\1@[+5\263\334\276{\343zZx\352\21\316`1\271[\202\350R`"..., 4096) = 4096
lseek(4, 48404770816, SEEK_SET)         = 48404770816
read(4, "\17\362r\230\327\25\346//\210H\v\311\3237\323K\304\306\361a\223\311\324\272?\213\tq \370\24"..., 4096) = 4096
lseek(4, 41027006464, SEEK_SET)         = 41027006464
write(4, "\367yy>x\216?=\324Z\305\351\376&\25\244\210\271\22\306}\276\237\370(\214\205G\262\360\257#"..., 4096) = 4096
lseek(4, 48404774912, SEEK_SET)         = 48404774912
read(4, "\365\25\0\21|T\0\21}3t_\272\373\222k\r\177\303\1\201\261\221$\261B\232\3142\21U\316"..., 4096) = 4096
^CProcess 3704 detached

around 16 of these lines every second, so 4 read and 4 write operations every second, which I don't consider to be a lot..

And finally, my question: Will this process ever finish? If those numbers from fseek (48404774912) represent bytes, that would be something like 45 gigabytes, with this beeing a 3 terrabyte disk, which would give me 134 days to go, if the speed stays constant, and e2fsck scans the disk like this completly and only once.

Do you have some advice for me? I have most of the data on that disk elsewhere, but I've put a lot of hours into sorting and merging it to this disk, so I would prefer to getting this disk up and running again, without formatting it anew. I don't think that the hardware is damaged since the disk is only a few months and since I can't see any I/O errors in the dmesg output.

UPDATE: I just looked at the strace output again (2012-11-06_2300), now it looks like this:

lseek(4, 1419860611072, SEEK_SET)       = 1419860611072
read(4, "3#\f\2447\335\0\22A\355\374\276j\204'\207|\217V|\23\245[\7VP\251\242\276\207\317:"..., 4096) = 4096
lseek(4, 43018145792, SEEK_SET)         = 43018145792
write(4, "]\206\231\342Y\204-2I\362\242\344\6R\205\361\324\177\265\317C\334V\324\260\334\275t=\10F."..., 4096) = 4096
lseek(4, 1419860615168, SEEK_SET)       = 1419860615168
read(4, "\262\305\314Y\367\37x\326\245\226\226\320N\333$s\34\204\311\222\7\315\236\336\300TK\337\264\236\211n"..., 4096) = 4096
lseek(4, 43018149888, SEEK_SET)         = 43018149888
write(4, "\271\224m\311\224\25!I\376\16;\377\0\223H\25Yd\201Y\342\r\203\271\24eG<\202{\373V"..., 4096) = 4096
lseek(4, 1419860619264, SEEK_SET)       = 1419860619264
read(4, ";d\360\177\n\346\253\210\222|\250\352T\335M\33\260\320\261\7g\222P\344H?t\240\20\2548\310"..., 4096) = 4096
lseek(4, 43018153984, SEEK_SET)         = 43018153984
write(4, "\360\252j\317\310\251G\227\335{\214`\341\267\31Y\202\360\v\374\307oq\3063\217Z\223\313\36D\211"..., 4096) = 4096

So the numbers in the lseek lines before the reads, like 1419860619264 are already a lot bigger, standing for 1.29 terabytes if those numbers are bytes, so it doesn't seem to be a linear progress on a big scale, maybe there are only some areas that need work, that have big gaps in between them.

UPDATE2: Okey, big disappointment, the numbers are back to very small again (2012-11-07_0720)

lseek(4, 52174548992, SEEK_SET)         = 52174548992
read(4, "\374\312\22\\\325\215\213\23\0357U\222\246\370v^f(\312|f\212\362\343\375\373\342\4\204mU6"..., 4096) = 4096
lseek(4, 46603526144, SEEK_SET)         = 46603526144
write(4, "\370\261\223\227\23?\4\4\217\264\320_Am\246CQ\313^\203U\253\274\204\277\2564n\227\177\267\343"..., 4096) = 4096

so either e2fsck goes over the data multiple times, or it just hops back and forth multiple times. Or my assumption that those numbers are bytes is wrong.

UPDATE3: Since it's mentioned here

http://forums.fedoraforum.org/showthread.php?t=282125&page=2

that you can testisk while e2fsck is running, i tried that, though not with a lot of success. When asking testdisk to display the data of my partition, this is what I get:

TestDisk 6.13, Data Recovery Utility, November 2011
Christophe GRENIER <[email protected]>
http://www.cgsecurity.org
 1 P Linux                    0   4  5 45600  40  8  732566272
Can't open filesystem. Filesystem seems damaged.

And this is what strace currently gives me (2012-11-07_1030)

lseek(4, 212460343296, SEEK_SET)        = 212460343296
read(4, "\315Mb\265v\377Gn \24\f\205EHh\2349~\330\273\203\3375\206\10\r3=W\210\372\352"..., 4096) = 4096
lseek(4, 47347830784, SEEK_SET)         = 47347830784
write(4, "]\204\223\300I\357\4\26\33+\243\312G\230\250\371*m2U\t_\215\265J \252\342Pm\360D"..., 4096) = 4096

(times are in CET)

© Server Fault or respective owner

Related posts about linux

Related posts about filesystems