Search Results

Search found 4 results on 1 pages for 'leapsecond'.

Page 1/1 | 1 

  • Anyone else experiencing high rates of Linux server crashes during a leap second day?

    - by Bron Gondwana
    POSTMORTEM Anticlimax: only thing that died was my VPN (openvpn) link to the cluster, so there was an exciting few seconds while it re-established. Everything else was fine. Starting back ntp everywhere. If you look at Marco's blog at http://my.opera.com/marcomarongiu/blog/2012/06/01/an-humble-attempt-to-work-around-the-leap-second - he has a solution for phasing the time change over 24 hours using ntpd -x to avoid the 1 second skip. Give that a go if it matters to you. For the systems I run, the jump isn't a problem. Just today, Sat June 30th - starting soon after the start of the day GMT. We've had a handful of blades in different datacentres as managed by different teams all go dark - not responding to pings, screen blank. They're all running Debian Squeeze - with everything from stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I've also just lost a Dell R510 and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I'm wondering. The one crash which I did get a screen dump from said: [3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358 [3161000.864001] lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0 Unfortunately the blades all supposedly had kdump configured, but they died so hard that kdump didn't trigger - and they had console blanking turned on. I've disabled console blanking now, so fingers crossed I'll have more information after the next crash. Just want to know if it's a common thread or "just us". It's really odd that they're different units in different datacentres bought at different times and run by different admins (I run the FastMail.FM ones)... and now even different vendor hardware. Most of the machines which crashed had been up for weeks/months and were running 3.1 or 3.2 series kernels. The most recent crash was a machine which had only been up about 6 hours running 3.2.21. THE WORKAROUND Ok people, here's how I worked around it. disabled ntp: /etc/init.d/ntp stop created http://linux.brong.fastmail.fm/2012-06-30/fixtime.pl (code stolen from Marco, see blog posts in comments) ran fixtime.pl without an argument to see that there was a leap second set ran fixtime.pl with an argument to remove the leap second NOTE: depends on adjtimex. I've put a copy of the squeeze adjtimex binary at http://linux.brong.fastmail.fm/2012-06-30/adjtimex - it will run without dependencies on a squeeze 64 bit system. If you put it in the same directory as fixtime.pl, it will be used if the system one isn't present. Obviously if you don't have squeeze 64 bit... find your own. I'm going to start ntp again tomorrow. As an anonymous user suggested - an alternative to running adjtimex is to just set the time yourself, which will presumably also clear the leapsecond counter.

    Read the article

  • Is this leap second bug? [closed]

    - by Sangdol
    Possible Duplicate: Anyone else experiencing high rates of Linux server crashes during a leap second day? On last Saturday(31 June 2012), Java applications suddenly started to use 100% CPU like described in this on some servers. The problem was recovered after restarted tomcat. It's similar to Java leap second bug but it happened during day not midnight. The time was around 17:35(GMT+9) on 31 June. Can cause of this problem be leap second?

    Read the article

  • Anyone else experiencing high rates of linux server crashes today?

    - by Bron Gondwana
    Just today, Sat June 30th - starting soon after the start of the day GMT. We've had a handful of blades in different datacentres as managed by different teams all go dark - not responding to pings, screen blank. They're all running Debian Squeeze - with everything from stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I've also just lost a Dell R510 and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I'm wondering. The one crash which I did get a screen dump from said: [3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358 [3161000.864001] lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0 Unfortunately the blades all supposedly had kdump configured, but they died so hard that kdump didn't trigger - and they had console blanking turned on. I've disabled console blanking now, so fingers crossed I'll have more information after the next crash. Just want to know if it's a common thread or "just us". It's really odd that they're different units in different datacentres bought at different times and run by different admins (I run the FastMail.FM ones)... and now even different vendor hardware. Most of the machines which crashed had been up for weeks/months and were running 3.1 or 3.2 series kernels. The most recent crash was a machine which had only been up about 6 hours running 3.2.21.

    Read the article

  • How to resolve high CPU + excessive stat("/etc/localtime") and clock_gettime(CLOCK_REALTIME) calls

    - by Yemster
    I've been experiencing really high CPU on a ruby on rails app (see stack below) and have been trying to diagnose the possible causes to no avail. Stack: ruby 1.9.3 rails 3.2.6 Apache/2.2.21 (Debian) Phusion Passenger 3.0.11 Whenever I run strace against the spiking Rack process PID (see Top excerpt below), I am seeing a tonne of stat("/etc/localtime") and clock_gettime(CLOCK_REALTIME) calls and have no idea how to stop these. Excerpt from Top showin running PID: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11674 www-user 20 0 313m 182m 5076 R 99 2.3 63:04.60 Rack: /var/www/my_rails_app/current 11634 www-user 20 0 411m 216m 5144 S 10 2.7 197:55.63 Rack: /var/www/my_rails_app/current Strace snippet below: [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 141474018}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 141577456}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 143073982}) = 0 [pid 11674] poll([{fd=15, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 11674] write(15, "b\0\0\0\3SELECT `images`.* FROM `ima"..., 102) = 102 [pid 11674] read(15, "\1\0\0\1\0229\0\0\2\3def\23myappy_productio"..., 16384) = 2063 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 144138035}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 ... [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 154076443}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 154189429}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 157185700}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 157298770}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 165076003}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 165212572}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 167542679}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354058955, 167683436}) = 0 .... [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 62052248}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 62182486}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 62919948}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 63057266}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 63751707}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 73730686}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 75874687}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 76077133}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 78205019}) = 0 ... [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 89370879}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 89583247}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 91637614}) = 0 [pid 11674] clock_gettime(CLOCK_REALTIME, {1354060036, 91782149}) = 0 Have Google'd around and came across a number of suggestions which I've tried with no success. Things tried so far: Have tried setting time zone as recommended here Made no difference and issue still persists. Content of my /etc/localtime: TZif2UTCTZif2UTC UTC0 Have tried the recommended fix for the leapsecond bug: date -s 'date' No joy so far. I'm fresh out of ideas so any help/advice on how to diagnose or resolve would be greatly appreciated.

    Read the article

1