Addition of leap second causes spuriously high CPU usage and futex lockups

Bug #1020285 reported by James Troup
This bug affects 16 people
Affects                Status        Importance  Assigned to  Milestone
base-files (Ubuntu)    Fix Released  Undecided   Unassigned
  Lucid                Won't Fix     Undecided   Unassigned
  Natty                Won't Fix     Undecided   Unassigned
  Oneiric              Won't Fix     Undecided   Unassigned
  Precise              Won't Fix     Undecided   Unassigned
  Quantal              Won't Fix     Undecided   Unassigned
linux (Ubuntu)         Fix Released  Medium      Brad Figg
  Lucid                Fix Released  Medium      Brad Figg
  Natty                Invalid       Medium      Brad Figg
  Oneiric              Fix Released  Medium      Brad Figg
  Precise              Fix Released  Medium      Brad Figg
  Quantal              Fix Released  Medium      Brad Figg

Bug Description

[Impact]
Software that relies on fine-grained pthread timeouts will spin indefinitely and drive up system load after a leap second: the kernel's idea of time becomes desynced, so every sub-1s timeout expires immediately. MySQL and Java in particular are reported to be affected. The issue is transient, in that it goes away the first time the system is rebooted after the leap second, and it is expected to be fixed before the next leap second occurs. Nevertheless, admins have been caught off guard by this misbehavior, and in some cases may not have noticed the problem or may not know what to do about it, so we should help them along by resetting the kernel clock with a minimal-risk base-files update.

[Test Case]
1. Find a system that has been online, with mysqld or a Java-based process running, since before 2012-06-30.
2. Verify that one or more processes on the system are spinning in futex and driving up the system load.
3. Upgrade to the base-files package from -proposed.
4. Verify that the system load comes back down immediately.
5. A stress-test for leap-second handling has been provided at https://lkml.org/lkml/2012/7/3/37
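
Steps 2 and 4 can be approximated from a script. The following is only a rough sketch and not part of the SRU test plan: the helper names and the threshold of 1.0 runnable tasks per CPU are assumptions chosen for illustration, and a high load average alone does not prove that a process is spinning in futex (top or strace is still needed for that).

```python
import os


def load_ratio(load1, cpu_count):
    """1-minute load average divided by the number of CPUs."""
    return load1 / max(cpu_count, 1)


def load_is_elevated(load1, cpu_count, threshold=1.0):
    """True if the load per CPU exceeds the (arbitrary) threshold."""
    return load_ratio(load1, cpu_count) > threshold


def check_now(threshold=1.0):
    """Sample the running system; run before and after the upgrade."""
    load1, _, _ = os.getloadavg()
    return load_is_elevated(load1, os.cpu_count() or 1, threshold)
```

Note that the 1-minute load average decays gradually, so a second sample taken right after the upgrade may still read high for a short while.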

[Regression potential]
No analysis has been done on the effect of resetting the date on applications that require a high-accuracy clock. While this fixes the problem with the pthreads interfaces, it may cause other problems for other software. Since the proposed fix is to reset the kernel's date to the current date, which is not atomic, there will be a slight skew of the clock backwards in time. ntp *should* fix this shortly thereafter for machines that have it enabled.
Also, because there's a single version check for each copy of the SRU, users whose applications are negatively affected by the running of this date command will also be negatively affected on each subsequent upgrade of the system, up to and including the quantal devel release.

As widely reported, the addition of the leap second on 2012-06-30 has
caused high CPU usage and futex lockups in a lot of applications
including JVMs and MySQL, as well as desktop apps like Firefox and
Thunderbird.

https://lkml.org/lkml/2012/6/30/122
http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second
https://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/

We've seen this ourselves on the Canonical infrastructure on both
current Lucid and Precise kernels, i.e.

ii linux-image-2.6.32-41-server 2.6.32-41.90 Linux kernel image for version 2.6.32 on x86_64
ii linux-image-3.2.0-24-generic 3.2.0-24.39 Linux kernel image for version 3.2.0 on 64 bit x86 SMP

We can also confirm the 'date -s $(date)' workaround fixes the problem
without requiring a reboot.

Revision history for this message
Adam Conrad (adconrad) wrote :

For the record, those looking for a runtime workaround might prefer:

date -u -s "$(date -u -R)"

The extra switches are to avoid locales and ambiguous timezones getting in your way, and the quoting is, well, for proper quoting. :P
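
To illustrate why the UTC, RFC 2822 form is unambiguous, here is a minimal Python sketch that builds the same kind of timestamp that `date -u -R` prints. The function names are made up for this example, and the sketch only constructs the argument list; it does not set the clock (doing so requires root).

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc2822_utc(moment=None):
    """Format a moment as an RFC 2822 date in UTC, independent of locale."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # format_datetime uses fixed English day and month names.
    return format_datetime(moment.astimezone(timezone.utc))


def workaround_command(moment=None):
    """Argument list equivalent to: date -u -s "$(date -u -R)"."""
    return ["date", "-u", "-s", rfc2822_utc(moment)]
```

Because the string always carries an explicit +0000 offset and English names, it parses the same way regardless of the system's locale or timezone settings.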

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key lucid precise
Revision history for this message
James Troup (elmo) wrote :

This is perhaps redundant, but for the avoidance of doubt, a reboot
does appear to fix the problem too.

Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Medium → Undecided
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Michael S. Fischer (otterley) wrote :
Andy Whitcroft (apw)
Changed in linux (Ubuntu Precise):
importance: Undecided → Medium
Changed in linux (Ubuntu Oneiric):
importance: Undecided → Medium
Changed in linux (Ubuntu Natty):
importance: Undecided → Medium
Changed in linux (Ubuntu Lucid):
importance: Undecided → Medium
Changed in linux (Ubuntu Quantal):
assignee: Canonical Kernel Team (canonical-kernel-team) → Andy Whitcroft (apw)
Andy Whitcroft (apw)
Changed in linux (Ubuntu Quantal):
assignee: Andy Whitcroft (apw) → Brad Figg (brad-figg)
status: Triaged → In Progress
Changed in linux (Ubuntu Precise):
status: New → Confirmed
status: Confirmed → Triaged
Changed in linux (Ubuntu Oneiric):
status: New → Triaged
Changed in linux (Ubuntu Natty):
status: New → Triaged
Changed in linux (Ubuntu Lucid):
status: New → Triaged
tags: added: kernel-key
Brad Figg (brad-figg)
Changed in linux (Ubuntu Lucid):
assignee: nobody → Brad Figg (brad-figg)
Changed in linux (Ubuntu Natty):
assignee: nobody → Brad Figg (brad-figg)
Changed in linux (Ubuntu Oneiric):
assignee: nobody → Brad Figg (brad-figg)
Changed in linux (Ubuntu Precise):
assignee: nobody → Brad Figg (brad-figg)
Revision history for this message
Steve Langasek (vorlon) wrote :

13:41 < infinity> It would, perhaps, be vaguely close to harmless to have a one-time addition to the kernel postinst
                  that just does date -u -s "$(date -u -R)" (extra switches there to make sure locales and ambiguous
                  timezones don't mess with you).
13:47 < slangasek> infinity: any package other than the kernel could use a version check in the postinst
13:47 < slangasek> e.g. base-files

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package base-files - 6.5ubuntu8

---------------
base-files (6.5ubuntu8) quantal; urgency=low

  * Call date -s $(date -R) on upgrade, to resync any clocks that might
    be desynced (and causing pthread spinning in the kernel) due to the leap
    second. LP: #1020285.
 -- Steve Langasek <email address hidden> Tue, 03 Jul 2012 10:43:12 -0700

Changed in base-files (Ubuntu Quantal):
status: New → Fix Released
Revision history for this message
Adam Conrad (adconrad) wrote : Please test proposed package

Hello James, or anyone else affected,

Accepted base-files into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/None/6.5ubuntu6.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in base-files (Ubuntu Precise):
status: New → Fix Committed
Changed in linux (Ubuntu Precise):
status: Triaged → Fix Committed
status: Fix Committed → New
status: New → Triaged
Changed in base-files (Ubuntu Oneiric):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Adam Conrad (adconrad) wrote :

Hello James, or anyone else affected,

Accepted base-files into oneiric-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/base-files/6.4ubuntu5.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Steve Langasek (vorlon)
description: updated
Revision history for this message
Adam Conrad (adconrad) wrote :

Hello James, or anyone else affected,

Accepted base-files into natty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/base-files/5.0.0ubuntu28.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

description: updated
Changed in base-files (Ubuntu Natty):
status: New → Fix Committed
Steve Langasek (vorlon)
description: updated
Revision history for this message
Adam Conrad (adconrad) wrote :

Hello James, or anyone else affected,

Accepted base-files into lucid-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/base-files/5.0.0ubuntu20.10.04.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in base-files (Ubuntu Lucid):
status: New → Fix Committed
Revision history for this message
Nathan Stratton Treadway (nathanst) wrote :

> For the record, those looking for a runtime workaround might prefer:
>
> date -u -s "$(date -u -R)"
>
> The extra switches are to avoid locales and ambiguous timezones getting in your way, and the quoting is, well, for proper quoting. :P

For what it's worth, an even simpler command to do this is:
  date -s now

(as mentioned at the top of
http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second
)

Nathan

Revision history for this message
Nathan Stratton Treadway (nathanst) wrote :

Debbugs #679882 pulls together a list of various leap-second-related kernel patches:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679882

Revision history for this message
Steve Langasek (vorlon) wrote :

SRUs withdrawn; the window when this would have been a useful SRU has since passed.

Changed in base-files (Ubuntu Lucid):
status: Fix Committed → Won't Fix
Changed in base-files (Ubuntu Natty):
status: Fix Committed → Won't Fix
Changed in base-files (Ubuntu Oneiric):
status: Fix Committed → Won't Fix
Changed in base-files (Ubuntu Precise):
status: Fix Committed → Won't Fix
Changed in base-files (Ubuntu Quantal):
status: Fix Released → Won't Fix
Revision history for this message
Nathan Stratton Treadway (nathanst) wrote :

Note that even though it's been a while since the leap second, a kernel affected by this bug could persist with its desynced internal idea of time, and the system would show no noticeable symptoms until someone eventually runs an affected application. (See https://lkml.org/lkml/2012/7/1/203 and https://lkml.org/lkml/2012/7/10/446 for more details.)

The attached short Python script can be used to check a particular system to see if the kernel is left in that desynced state (and avoids causing high CPU usage during the test).
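
The attached script is not reproduced in this report. As a rough idea of how such a check could work, the sketch below (an independent illustration, not the attached script) sleeps until an absolute CLOCK_REALTIME deadline a fraction of a second away and times the sleep against the monotonic clock; on a desynced kernel the sleep would be expected to return almost immediately. It assumes Linux with a libc that exports clock_nanosleep, and that time_t is a C long; the function names, sample count and tolerance are arbitrary choices.

```python
import ctypes
import time

CLOCK_REALTIME = 0  # value from Linux <time.h> (assumption: Linux only)
TIMER_ABSTIME = 1   # flag from Linux <time.h>: deadline is absolute


class Timespec(ctypes.Structure):
    # Assumes tv_sec (time_t) and tv_nsec are both C long.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


def returned_early(requested_s, elapsed_s, tolerance_s=0.01):
    """True if a wait came back noticeably sooner than requested."""
    return elapsed_s < requested_s - tolerance_s


def realtime_abs_sleep(seconds):
    """Sleep to an absolute CLOCK_REALTIME deadline; return monotonic elapsed time."""
    libc = ctypes.CDLL(None, use_errno=True)
    deadline = time.time() + seconds
    sec = int(deadline)
    ts = Timespec(sec, int((deadline - sec) * 1_000_000_000))
    start = time.monotonic()
    # clock_nanosleep returns an error number directly, not -1.
    rc = libc.clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, ctypes.byref(ts), None)
    if rc != 0:
        raise OSError(rc, "clock_nanosleep failed")
    return time.monotonic() - start


def kernel_looks_desynced(samples=5, seconds=0.2):
    """Report a problem only if every sample returned early."""
    early = sum(returned_early(seconds, realtime_abs_sleep(seconds))
                for _ in range(samples))
    return early == samples
```

Each sample blocks in the kernel rather than polling, so the check itself should not add to the CPU load, and requiring every sample to be early reduces false positives from an unrelated clock step during the test.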

Revision history for this message
Nathan Stratton Treadway (nathanst) wrote :

(Revised the script to set its exit status based on the results of the testing.)

Revision history for this message
Nathan Stratton Treadway (nathanst) wrote :

Red Hat bug no. 836803, "RHEL6: Potential fix for leapsecond caused futex related load spikes":
 https://bugzilla.redhat.com/show_bug.cgi?id=836803

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Lucid):
status: Triaged → Fix Committed
Revision history for this message
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Lucid in -proposed solves the problem (2.6.32-42.95). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lucid' to 'verification-done-lucid'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-lucid
Revision history for this message
Luis Henriques (henrix) wrote :

I've executed the test case on a Lucid system. Here's the output for the 2.6.32-41.94 kernel:
# uname -a
Linux lucid 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 16:51:39 UTC 2012 i686 GNU/Linux
# ./leap_seconds
now: 1343779191:18314044 diff: 0:275081 rem: 0:0
now: 1343779191:518753118 diff: 0:439074 rem: 0:0
now: 1343779192:18885544 diff: 0:132426 rem: 0:0
now: 1343779192:519186044 diff: 0:300500 rem: 0:0
now: 1343779193:19389186 diff: 0:203142 rem: 0:0
now: 1343779193:520615448 diff: 0:1226262 rem: 0:0
now: 1343779194:20769790 diff: 0:154342 rem: 0:0
now: 1343779194:520932743 diff: 0:162953 rem: 0:0
now: 1343779195:21301107 diff: 0:368364 rem: 0:0
now: 1343779195:521610712 diff: 0:309605 rem: 0:0
now: 1343779196:21818618 diff: 0:207906 rem: 0:0
now: 1343779196:522070412 diff: 0:251794 rem: 0:0
now: 1343779197:22276862 diff: 0:206450 rem: 0:0
now: 1343779197:522477856 diff: 0:200994 rem: 0:0
now: 1343779198:22686111 diff: 0:208255 rem: 0:0
now: 1343779198:522924932 diff: 0:238821 rem: 0:0
now: 1343779199:24238584 diff: 0:1313652 rem: 0:0
now: 1343779199:524540590 diff: 0:302006 rem: 0:0
now: 1343779199:24829471 diff: 0:-999711119 rem: 0:0
now: 1343779199:24938078 diff: 0:-499891393 rem: 0:0
now: 1343779199:24982361 diff: 0:-499955717 rem: 0:0
now: 1343779199:25022793 diff: 0:-499959568 rem: 0:0
now: 1343779199:25063446 diff: 0:-499959347 rem: 0:0
now: 1343779199:25104451 diff: 0:-499958995 rem: 0:0
now: 1343779199:25147178 diff: 0:-499957273 rem: 0:0
now: 1343779199:25193846 diff: 0:-499953332 rem: 0:0
now: 1343779199:25243894 diff: 0:-499949952 rem: 0:0
now: 1343779199:25286937 diff: 0:-499956957 rem: 0:0
now: 1343779199:25332647 diff: 0:-499954290 rem: 0:0
now: 1343779199:25374189 diff: 0:-499958458 rem: 0:0
now: 1343779199:25413411 diff: 0:-499960778 rem: 0:0
...
# dmesg
[ 161.756872] Clock: inserting leap second 23:59:60 UTC
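
The failure in the run above is visible as lines where the diff field turns negative and the now field stops advancing in half-second steps. A small, hypothetical helper for spotting such lines in saved leap_seconds output might look like the sketch below; the field meanings are inferred from the output format only, and the function name is an assumption.

```python
import re

# Matches lines such as:
#   now: 1343779199:24829471 diff: 0:-999711119 rem: 0:0
LINE_RE = re.compile(
    r"now: (\d+):(\d+) diff: (-?\d+):(-?\d+) rem: (-?\d+):(-?\d+)"
)


def early_wakeups(output):
    """Return (sec, nsec) 'now' values for lines whose diff is negative."""
    flagged = []
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip prompts, dmesg lines, truncated lines
        now_s, now_ns, diff_s, diff_ns = (int(g) for g in match.groups()[:4])
        if diff_s < 0 or diff_ns < 0:
            flagged.append((now_s, now_ns))
    return flagged


SAMPLE = """\
now: 1343779199:24238584 diff: 0:1313652 rem: 0:0
now: 1343779199:524540590 diff: 0:302006 rem: 0:0
now: 1343779199:24829471 diff: 0:-999711119 rem: 0:0
"""
```

Applied to SAMPLE (three lines copied from the output above), only the last line is flagged.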

With the new 2.6.32-42.95 kernel, here's the output:
# uname -a
Linux lucid 2.6.32-42-generic #95-Ubuntu SMP Wed Jul 25 15:57:54 UTC 2012 i686 GNU/Linux
root@lucid:/home/ubuntu/Downloads# ./leap_seconds
now: 1343692791:233925009 diff: 0:178887 rem: 0:0
now: 1343692791:734525073 diff: 0:600064 rem: 0:0
now: 1343692792:234689367 diff: 0:164294 rem: 0:0
now: 1343692792:734955710 diff: 0:266343 rem: 0:0
now: 1343692793:235121427 diff: 0:165717 rem: 0:0
now: 1343692793:735387972 diff: 0:266545 rem: 0:0
now: 1343692794:235636709 diff: 0:248737 rem: 0:0
now: 1343692794:735885332 diff: 0:248623 rem: 0:0
now: 1343692795:236064344 diff: 0:179012 rem: 0:0
now: 1343692795:736370441 diff: 0:306097 rem: 0:0
now: 1343692796:236552965 diff: 0:182524 rem: 0:0
now: 1343692796:736751874 diff: 0:198909 rem: 0:0
now: 1343692797:236931281 diff: 0:179407 rem: 0:0
now: 1343692797:737179866 diff: 0:248585 rem: 0:0
now: 1343692798:237285693 diff: 0:105827 rem: 0:0
now: 1343692798:737387092 diff: 0:101399 rem: 0:0
now: 1343692799:237472566 diff: 0:85474 rem: 0:0
now: 1343692799:737590980 diff: 0:118414 rem: 0:0
now: 1343692800:237755920 diff: 0:164940 rem: 0:0
now: 1343692800:737882098 diff: 0:126178 rem: 0:0
now: 1343692801:238105101 diff: 0:223003 rem: 0:0
now: 1343692801:738282994 dif...


tags: added: verification-done-lucid
removed: verification-needed-lucid
Revision history for this message
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-42.95

---------------
linux (2.6.32-42.95) lucid-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1027831

  [ Upstream Kernel Changes ]

  * hugetlb: fix resv_map leak in error path
    - LP: #1004621
    - CVE-2012-2390
  * mm: fix vma_resv_map() NULL pointer
    - LP: #1004621
    - CVE-2012-2390
  * net: sock: validate data_len before allocating skb in
    sock_alloc_send_pskb()
    - LP: #1006622
    - CVE-2012-2136
  * 2.6.32.x: ntp: Fix leap-second hrtimer livelock
    - LP: #1020285
  * 2.6.32.x: ntp: Correct TAI offset during leap second
    - LP: #1020285
  * 2.6.32.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during
    leapsecond
    - LP: #1020285
  * 2.6.32.x: time: Move common updates to a function
    - LP: #1020285
  * 2.6.32.x: hrtimer: Provide clock_was_set_delayed()
    - LP: #1020285
  * 2.6.32.x: timekeeping: Fix leapsecond triggered load spike issue
    - LP: #1020285
  * 2.6.32.x: timekeeping: Maintain ktime_t based offsets for hrtimers
    - LP: #1020285
  * 2.6.32.x: hrtimers: Move lock held region in hrtimer_interrupt()
    - LP: #1020285
  * 2.6.32.x: timekeeping: Provide hrtimer update function
    - LP: #1020285
  * 2.6.32.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
    - LP: #1020285
  * 2.6.32.x: timekeeping: Add missing update call in timekeeping_resume()
    - LP: #1020285
 -- Luis Henriques <email address hidden> Tue, 24 Jul 2012 16:34:35 +0100

Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released
tags: removed: kernel-key
Revision history for this message
Herton R. Krzesinski (herton) wrote :

Already fixed on Oneiric (3.0.0-24.40) and Precise (3.2.0-29.46). Those uploads didn't have a BugLink pointing here because the fixes were applied through stable updates, so there was no automatic status change to Fix Released.

Changed in linux (Ubuntu Oneiric):
status: Triaged → Fix Released
Changed in linux (Ubuntu Precise):
status: Triaged → Fix Released
Revision history for this message
Herton R. Krzesinski (herton) wrote :

By the way, I confirmed that Oneiric and Precise are OK by running the leapseconds test (the one integrated into our autotest).

Revision history for this message
Herton R. Krzesinski (herton) wrote :

This is already fixed for Quantal as well, and the leap_seconds test passes on the latest Quantal kernel.

Changed in linux (Ubuntu Quantal):
status: In Progress → Fix Released
Revision history for this message
jan (jan-ubuntu-h-i-s) wrote :

This solution may have introduced a regression on Lucid:
https://bugs.launchpad.net/linux-kernel-bugs/+bug/1052323

Revision history for this message
Abel Lopez (al592b) wrote :

Any update on a Natty build for this? The next leap second is coming up at the end of December.

Revision history for this message
Paul Collins (pjdc) wrote :

No leap second is scheduled for December 2012.

http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat

Revision history for this message
Abel Lopez (al592b) wrote :

Sorry, I didn't mean to split hairs. For all intents and purposes, with the holiday schedule, I'm considering Jan 1 2013 as good as Dec 31st 2012.
The point being that Jan 1 has a leap second:
ftp://tycho.usno.navy.mil/pub/ntp/leap-seconds.3535142400

Revision history for this message
Paul Collins (pjdc) wrote :

Leap seconds are inserted at the end of a given six-month period. You can see that the leap second that was inserted at the end of 30 June 2012 (IERS bulletin C43) results in an update for "1 Jul 2012" in your file. You can then see that the line for "1 Jan 2013", which is not yet enabled (note the "#" at the beginning of the line) would correspond to a leap second inserted at the end of 31 December 2012, which, as IERS bulletin C44 states, will not occur.
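
The rule described above (only uncommented lines count) can be sketched in a few lines of Python. This assumes the usual leap-seconds file layout of an NTP timestamp (seconds since 1900-01-01 UTC) followed by the TAI-UTC offset; SAMPLE is a hand-written approximation of that layout for illustration, not a copy of the file linked earlier, and the function name is made up.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

# Illustrative text only: one enabled entry and one commented-out entry.
SAMPLE = """\
3550089600\t35\t# 1 Jul 2012
#3565987200\t36\t# 1 Jan 2013
"""


def active_leap_entries(text):
    """Return (effective UTC datetime, TAI offset) for uncommented lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank or disabled line
        fields = line.split("#", 1)[0].split()
        ntp_seconds, tai_offset = int(fields[0]), int(fields[1])
        entries.append((NTP_EPOCH + timedelta(seconds=ntp_seconds), tai_offset))
    return entries
```

With SAMPLE, only the 1 Jul 2012 entry is returned; the effective date is the instant just after the leap second inserted at the end of 30 June 2012.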

You can obtain bulletin C44 at my link above, and more of them are available at http://hpiers.obspm.fr/iers/bul/bulc/?C=M;O=D

Also, Ubuntu 11.04 (Natty) has reached end of life, so I would imagine it's unlikely that this issue will be addressed for that release. https://lists.ubuntu.com/archives/ubuntu-security-announce/2012-October/001882.html

Revision history for this message
Julian Wiedmann (jwiedmann) wrote :

This release has reached end-of-life [0].

[0] https://wiki.ubuntu.com/Releases

Changed in linux (Ubuntu Natty):
status: Triaged → Invalid