ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)

Bug #1567540 reported by Pavlushka
This bug affects 17 people
Affects       Status        Importance  Assigned to         Milestone
NTP           Fix Released  High
ntp (Ubuntu)  Fix Released  High        Unassigned
Xenial        Fix Released  High        Christian Ehrhardt

Bug Description

[Impact]

 * In NTP 4.2.8p4 there are several races that can cause a crash at
   startup, or slightly later while still in the startup phase, when
   ntpd resolves a peer via DNS.

 * The crash obviously affects users. Due to its racy nature it does
   not seem to appear on most systems, but it severely hamstrings some
   users.

 * The details are a bit blurred, but overall there were four fixes
   upstream that address just this "kind of issue" that seemed to
   surface post 4.2.8p4.

[Test Case]

 * Start NTP (service)

 * Expectation: work

 * Failure: Crash

 * Constraints: this is a race. On all systems I have it seems to
   trigger with a chance of <0.1% (or lower, as all I can say is that
   it didn't trigger in 1000 tests), which matches other reports. OTOH
   on some systems it seems to trigger in >50% of starts, which also
   matches the high number of crash reports (close to 20k now) referred
   to in comment 43

[Regression Potential]

 * The change is rather invasive as it changes the locking scheme of
   parts of the code, so there surely is some regression potential.

 * Fortunately all of this change is upstream and tested there
   quite heavily. Most of it for a few months already.

 * I tested as well as I could and found no obvious weakness in either
   the code or the tests. Looking at all the crash reports, it is about
   time to release this.

[Other Info]

 * While all study of bugs, upstream changes and tests suggest we
   haven't broken anything, still I have to admit that "on my own" I
   can't confirm that it fixed the bug. So we are really dependent on
   the reporters here that seem to have the kind of hardware where it
   "crashes reliably".

--------

ntp crashes every time the network goes up or down while the system is running and also crashes after booting up without network.
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-03-12 (26 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu4
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-17-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
ProcVersionSignature: Ubuntu 4.4.0-17.33-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-17-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-03-12 (31 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-13 (0 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=13f57794-2e19-4a56-836a-94185bba5ec5 ro quiet splash
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-14 (3 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-20-generic root=UUID=b9c0528f-e81f-4b08-9b31-032f14f72ccd ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 4.4.0-20.36-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-20-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-14 (63 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-25-generic root=UUID=3aea4570-4011-4247-9636-68317385324d ro
ProcVersionSignature: Ubuntu 4.4.0-25.44-generic 4.4.13
Tags: xenial third-party-packages
Uname: Linux 4.4.0-25-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dialout dip lpadmin mail netdev plugdev sambashare sudo
_MarkForUpload: True

Revision history for this message
Pavlushka (pavelsayekat) wrote : .etc.apparmor.d.usr.sbin.ntpd.txt

apport information

tags: added: apport-collected xenial
description: updated
Revision history for this message
Pavlushka (pavelsayekat) wrote : Dependencies.txt

apport information

Revision history for this message
Pavlushka (pavelsayekat) wrote : JournalErrors.txt

apport information

Revision history for this message
Pavlushka (pavelsayekat) wrote : KernLog.txt

apport information

Revision history for this message
Pavlushka (pavelsayekat) wrote : ProcEnviron.txt

apport information

Revision history for this message
Nish Aravamudan (nacc) wrote : Re: ntp crashes everytime the network goes up or down.

Thank you for filing this bug report! None of the logs provide any of the crashing output, would it be possible to provide a bit more detail as to the crash?

Changed in ntp (Ubuntu):
status: New → Incomplete
Revision history for this message
Pavlushka (pavelsayekat) wrote :
Pavlushka (pavelsayekat)
Changed in ntp (Ubuntu):
status: Incomplete → New
Revision history for this message
In , Stenn (stenn) wrote :

Comment on attachment 1325
Do mlockall before threads

Pearly, thoughts?

Revision history for this message
In , Stenn (stenn) wrote :

Comment on attachment 1399
protect dnsworker_contexts with a mutex

Pearly, thoughts?

Revision history for this message
In , Stenn (stenn) wrote :

Mike,

Thanks for the patch - I hope we can get it reviewed soon.

Revision history for this message
In , H-murray (h-murray) wrote :

Mike: Thanks for tracking this down. I think this explains all the problems.

I don't think the patch is good enough.

You also need a lock on read references to dnsworker_contexts. The only
other reference is a few lines below and in a subroutine called from there.
I suggest moving the lock to the top of get_worker_context and the
unlock to the bottom (and adding an assumes-lock comment to the top of
alloc_dnsworker_context).
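
[Editorial sketch] The suggestion above, with the lock held from the top of get_worker_context to the bottom, might look roughly like the following. This is a minimal illustration and not the upstream patch: the worker_ctx struct, the function signatures, and the dnsworker_lock name are assumptions made for the example. Only the names dnsworker_contexts, dnsworker_contexts_alloc, get_worker_context, and alloc_dnsworker_context come from the discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for ntpd's per-worker state. */
typedef struct worker_ctx {
    size_t idx;
} worker_ctx;

/* Names taken from the discussion; types and layout are assumptions. */
static worker_ctx    **dnsworker_contexts;        /* table of block pointers */
static size_t          dnsworker_contexts_alloc;  /* slots in the table */
static pthread_mutex_t dnsworker_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Grow the pointer table so that slot idx exists.
 * Assumes dnsworker_lock is held by the caller.
 * Returns 0 on success, -1 if realloc() fails.
 */
static int alloc_dnsworker_context(size_t idx)
{
    size_t       n = idx + 1;
    worker_ctx **t = realloc(dnsworker_contexts, n * sizeof(*t));

    if (t == NULL)
        return -1;
    for (size_t i = dnsworker_contexts_alloc; i < n; i++)
        t[i] = NULL;
    dnsworker_contexts = t;
    dnsworker_contexts_alloc = n;
    return 0;
}

/*
 * Return the context for slot idx, creating it on first use.
 * The lock is taken before the first read of the globals and released
 * after the last one, so readers never overlap with the realloc().
 * Returns NULL if memory runs out.
 */
worker_ctx *get_worker_context(size_t idx)
{
    worker_ctx *ctx = NULL;

    pthread_mutex_lock(&dnsworker_lock);

    if (idx < dnsworker_contexts_alloc || alloc_dnsworker_context(idx) == 0) {
        ctx = dnsworker_contexts[idx];
        if (ctx == NULL) {
            ctx = calloc(1, sizeof(*ctx));
            if (ctx != NULL) {
                ctx->idx = idx;
                dnsworker_contexts[idx] = ctx;
            }
        }
        /* Invariant: a stored context always belongs to its slot. */
        assert(ctx == NULL || ctx->idx == idx);
    }

    pthread_mutex_unlock(&dnsworker_lock);
    return ctx;
}
```

Calling get_worker_context(0) twice returns the same pointer, and a later call with a larger index grows the table without disturbing earlier contexts.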

Revision history for this message
In , Smallm (smallm) wrote :

Ack, that was careless of me. Started fixing it the way you suggested, but I'm wondering if something more major is needed. Even if I spread out the locks within get_worker_context() to top and bottom, it would still be giving out a pointer into the array that realloc can relocate. Return a copy of the struct? Does anyone else have ideas for this module (I thought I saw a comment in another CR to that effect)? I'm really very bad at multi-threaded coding.

Also, I guess a real patch needs to consider Windows.

Revision history for this message
In , H-murray (h-murray) wrote :

There is probably another copy of this problem in the other direction.

There are two places where info gets queued up and passed from thread
to thread. One is when the main thread tells the worker thread(s) what
to do. The other is when a worker thread is telling the main thread
an answer.

Looks like the other one is reserve_dnschild_ctx

I suggest folding alloc_dnsworker_context into get_worker_context
It's only called from one place and it will be easier to make
sure the locks are right without that extra layer. It's only a few
lines of code. The abstraction layer isn't helping anything.

It might be cleaner to move the definition of dnsworker_contexts
and dnsworker_contexts_alloc into get_worker_context. The idea
is to make sure the lock covers all uses.
  static xxx
I think all c compilers support that.

> Even if I spread out the locks within get_worker_context()
> to top and bottom, it would still be giving out a pointer
> into the array that realloc can relocate.

The thing that is getting realloc-ed is the array holding pointers
to blocks. The individual block never gets realloc-ed. The lock
only needs to protect the array. It's only referenced within
that routine. (aside from the alloc which I suggested moving)
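
[Editorial sketch] The lifetime argument above can be illustrated in isolation: when only the array of pointers is realloc-ed, a pointer to an individual block that was handed out earlier stays valid. The ptr_table and block types and the function names below are hypothetical and are not ntpd code. Locking is deliberately left out here, since the point is only which object moves.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical payload; stands in for a per-worker block. */
typedef struct block {
    int id;
} block;

/* Growable array of pointers to separately allocated blocks. */
typedef struct ptr_table {
    block **slots;
    size_t  used;
    size_t  alloc;
} ptr_table;

/* Append a new block. Only the slot array is ever realloc-ed. */
block *table_add(ptr_table *t, int id)
{
    if (t->used == t->alloc) {
        size_t  n = t->alloc ? t->alloc * 2 : 1;
        block **s = realloc(t->slots, n * sizeof(*s));

        if (s == NULL)
            return NULL;
        t->slots = s;   /* the array may have moved ... */
        t->alloc = n;
    }
    assert(t->used < t->alloc);

    block *b = malloc(sizeof(*b));   /* ... the blocks never do */
    if (b == NULL)
        return NULL;
    b->id = id;
    t->slots[t->used++] = b;
    return b;
}

/* Free all blocks and the slot array. */
void table_free(ptr_table *t)
{
    for (size_t i = 0; i < t->used; i++)
        free(t->slots[i]);
    free(t->slots);
    t->slots = NULL;
    t->used = t->alloc = 0;
}

/*
 * Returns 1 if a block pointer obtained before many table growths is
 * still the one stored in slot 0 afterwards, 0 otherwise.
 */
int block_survives_growth(void)
{
    ptr_table t = {0};
    block    *first = table_add(&t, 42);
    int       ok = (first != NULL);

    for (int i = 0; ok && i < 1000; i++)
        ok = (table_add(&t, i) != NULL);
    if (ok)
        ok = (t.slots[0] == first && first->id == 42);
    table_free(&t);
    return ok;
}
```

A pointer into the slot array itself (for example &t.slots[0]) would not be safe to keep across table_add, which is why the array accesses are the part that needs the lock.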

Revision history for this message
Pavlushka (pavelsayekat) wrote : Re: ntp crashes everytime the network goes up or down.
description: updated
Revision history for this message
In , Stenn (stenn) wrote :

*** This bug has been marked as a duplicate of bug 2954 ***

Revision history for this message
In , Perlinger (perlinger) wrote :

(In reply to comment #24)
>
> *** This bug has been marked as a duplicate of bug 2954 ***

That was a bit early -- my fault. It is *not* exactly a dup of 2954, but related -- that is, it is also a race condition in the async/threaded resolver code.

I think the lock in the latest patch does not protect all data races here, but I'm still digging.

Revision history for this message
In , Perlinger (perlinger) wrote :

Harlan, the repo is in

  psp.ntp.org:~perlinger/ntp-stable-2831

compiled and run with

  linux/x64 --with-threads (threading resolver)
  linux/x64 --without-threads (forking resolver)
  Windows7/x64/VS2008 (threading resolver)

Hal, Mike, good catch. Only the proposed lock falls a bit short. You have to interlock all access to the global table, not just the realloc() call.

And using pthread_mutex_t is not so easy with Windows, but we all knew that ;) I used a semaphore (again) since there is already a suitable wrapper.

Revision history for this message
In , Stenn (stenn) wrote :

Hal,

Thanks for the report. John, Pearly, et al, thanks for your work on this.

Pearly's fix is STAGED for 4.2.8p7.

Revision history for this message
In , Stenn (stenn) wrote :

Hal,

Thanks - please mark this bug as VERIFIED or IN_PROGRESS, as appropriate.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: ntp crashes everytime the network goes up or down.

From the dumps:

 Apr 07 15:51:02 nowhere-6 ntpd[6197]: work_thread.c:271: INSIST(((void *)0) != req) failed
 Apr 07 15:51:02 nowhere-6 ntpd[6197]: exiting (due to assertion failure)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The system also appears to have various other problems: the kernel often reports vblank issues and dumps traces, and the network in general, or at least the path to the configured NTP and DNS servers, seems to be down.

Could you share a bit about the related configuration (networking/ntp/dns)? This might help to recreate the case.

Changed in ntp (Ubuntu):
status: New → Incomplete
Revision history for this message
Pavlushka (pavelsayekat) wrote : .etc.apparmor.d.usr.sbin.ntpd.txt

apport information

tags: added: third-party-packages
description: updated
summary: - ntp crashes everytime the network goes up or down.
+ ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes
+ up or down.)
Changed in ntp (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
Changed in ntp (Ubuntu):
status: Confirmed → Triaged
Changed in ntp:
importance: Unknown → High
status: Unknown → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

TL;DR

The NTP code around this, simplified:
1. Complex thing eventually setting ret to the next element
2. assert ret != NULL <-- this is what our bug reports hit

The "complex thing" was completely rewritten in the referenced patch set to close the race window that it had.

So even though these NTP bugs describe slightly different ways for the issue to surface, it is very likely the same "race".
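
[Editorial sketch] The two-step pattern above can be made concrete with a small queue in which the slot contents and the indices are updated inside one critical section, so the non-NULL assertion in the consumer cannot fire. This is a generic illustration under assumptions, not the code from work_thread.c: the req_queue type, QLEN, and the function names are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QLEN 8   /* hypothetical fixed capacity, a power of two */

typedef struct req_queue {
    pthread_mutex_t lock;
    void           *slots[QLEN];
    size_t          head;   /* next slot to read */
    size_t          tail;   /* next slot to write */
} req_queue;

void queue_init(req_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    for (size_t i = 0; i < QLEN; i++)
        q->slots[i] = NULL;
    q->head = q->tail = 0;
}

/*
 * Producer: store the request and advance tail in one critical
 * section. Returns 0 on success, -1 if req is NULL or the queue is full.
 */
int queue_push(req_queue *q, void *req)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (req != NULL && q->tail - q->head < QLEN) {
        q->slots[q->tail % QLEN] = req;
        q->tail++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Consumer: returns the oldest request, or NULL if the queue is empty. */
void *queue_pop(req_queue *q)
{
    void *ret = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->head != q->tail) {
        /* Step 1: set ret to the next element. */
        ret = q->slots[q->head % QLEN];
        q->slots[q->head % QLEN] = NULL;
        q->head++;
        /* Step 2: holds because slot and index changed under one lock. */
        assert(ret != NULL);
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}
```

If the producer instead advanced tail before (or without synchronizing with) the store into the slot, a consumer could observe a non-empty queue whose next slot is still NULL, which is the shape of failure the assertion reports.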

Revision history for this message
Pavlushka (pavelsayekat) wrote :

I used to use a PPPoE connection through an Ethernet/ADSL router, and there the issue was more frequent. And no, I am not heavily firewalled: the D-Link 2750U is supposed to have its own firewall, and on my system I use ufw.

Revision history for this message
Pavlushka (pavelsayekat) wrote :

Unluckily, when I ran mtrace on ntp it did not crash, but it does crash on boot or when the trace is not running. Sounds funny.

Revision history for this message
Pavlushka (pavelsayekat) wrote :

And now I am using a 3G modem for internet, but ntp still crashes sometimes.

Revision history for this message
Sudhir Reddy (t-sudhirkumar) wrote : Re: [Bug 1567540] Re: ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)

Yes, this issue is seen once in a while. It has been occurring more often
since I upgraded.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ntp (Ubuntu Xenial):
status: New → Confirmed
Changed in ntp (Ubuntu Xenial):
importance: Undecided → High
Changed in ntp (Ubuntu):
assignee: nobody → ChristianEhrhardt (paelzer)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: the merge that will fix this in the development release has been created and has entered review.
Once that is completed we have to think about an SRU into Xenial (which needs a rather complex backport from upstream).

Revision history for this message
Bob Jones (r-a-n-d-o-m-d-e-v-4+ubuntu) wrote :

Have you finished "thinking about" fixing the broken NTP in Xenial yet? It's becoming a bit of a joke to be without such a key package (let alone the hacks that have had to be put in place to keep server time accurate!).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
the fix has to go to the development release first (Yakkety).
The merge has been waiting for a review for quite a while already.

Thank you for the push on this, I'll take it as an opportunity to raise
the priority of the merge.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ntp - 1:4.2.8p8+dfsg-1ubuntu1

---------------
ntp (1:4.2.8p8+dfsg-1ubuntu1) yakkety; urgency=medium

  [ Christian Ehrhardt ]
  * Merge from Debian testing. Remaining changes:
    + debian/rules: enable debugging. Asked debian to add this in bug #643954.
    + debian/rules, debian/ntp.dirs, debian/source_ntp.py: Add apport hook.
    + debian/control: Add Suggests on apparmor.
    + debian/source_ntp.py: Add filter on AppArmor profile names to prevent
      false positives from denials originating in other packages
    + debian/ntpdate.if-up: Fix interaction with openntpd. Stop ntp before
      running ntpdate when an interface comes up, then start again afterwards.
    + debian/ntp.init, debian/rules: Only stop when entering single user mode,
      don't use /var/lib/ntp/ntp.conf.dhcp if /etc/ntp.conf is newer - it can
      get stale. Patch by Simon Déziel.
    + debian/ntp.conf, debian/ntpdate.default: Change default server to
      ntp.ubuntu.com.
    + debian/control: Add bison to Build-Depends (for ntpd/ntp_parser.y).
    + Extend PPS support
      - debian/README.Debian: Add a PPS section to the README.Debian
      - debian/ntp.conf: Add some configuration examples from the offical
        documentation.
    + SECURITY UPDATE: NTP statsdir cleanup cronjob insecure (LP: #1528050)
      - debian/ntp.cron.daily: fix security issues, patch thanks to halfdog!
      - CVE-2016-0727
    + Merge also contains an upstream fix that solves (LP: #1567540)
  * Added changes
    + match Ubuntu packages now that Debian has ntp apparmor accepted in
      d/control for Apparmor conflicts/replaces
    + d/apparmor-profile add samba winbindd pipe (LP: #1582767)
  * Drop Changes:
    + Add enforcing AppArmor profile (accepted in Debian):
      - debian/control: Add Conflicts/Replaces on apparmor-profiles.
      - debian/control: Add Suggests on apparmor.
      - debian/control: Build-Depends on dh-apparmor.
      - add debian/apparmor-profile*.
      - debian/ntp.dirs: Add apparmor directories.
      - debian/rules: Install apparmor-profile and apparmor-profile.tunable.
      - debian/source_ntp.py: Add filter on AppArmor profile names to prevent
        false positives from denials originating in other packages.
      - debian/README.Debian: Add note on AppArmor.
    + Add PPS support (accepted in Debian)
      - debian/control: Add Build-Depends on pps-tools
    + debian/apparmor-profile: allow 'rw' access to /dev/pps[0-9]* devices.
    + d/p/fix_local_sync.patch: fix local clock sync (fixed upstream)
    + debian/patches/ntpdate-fix-lp1526264.patch (fixed upstream):
      - Add Alfonso Sanchez-Beato's patch for fixing the cannot correct dates in
        the future bug
    + debian/apparmor-profile: adjust to handle AF_UNSPEC with dgram and stream
    + dropping previous ubuntu security patches/fixes that have been upstreamed
      in 4.2.8p6: CVE-2015-7973, CVE-2015-7975, CVE-2015-7976, CVE-2015-7977,
      CVE-2015-7978, CVE-2015-7979, CVE-2015-8138, CVE-2015-8158
    + dropping previous ubuntu security patches/fixes that have been upstreamed
      in 4.2.8p7: CVE-2016-1548, CVE-2016-1550, CVE-2016-2516, CVE-2016-2518...

Changed in ntp (Ubuntu):
status: Triaged → Fix Released
Changed in ntp (Ubuntu):
assignee: ChristianEhrhardt (paelzer) → nobody
Changed in ntp (Ubuntu Xenial):
assignee: nobody → ChristianEhrhardt (paelzer)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Looking through the crashes this seems usually one of the two:
http://bugs.ntp.org/show_bug.cgi?id=2831 (as mentioned before)
http://bugs.ntp.org/show_bug.cgi?id=2954 (related)

There was a bit of back and forth on which of the two is a dup of the other.
In the end 2954 has the fix, which I should be able to backport rather easily, but 2831 was later reopened and became an extension of that fix.

This is a rather invasive patch, but fortunately at least 2954 came right after p4 got released, so it still matches our code. I am not so sure about 2831.

A few, but really only a few crashes I looked at could have been http://bugs.ntp.org/show_bug.cgi?id=2969
Backporting that is rather easy so I'm going to include it (low risk of changing anything else).

Getting recent patches is hard at the moment due to https://github.com/ntp-project/ntp/issues/15.
Looking for alternatives right now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Found an alternative, thanks to Unit193 in #ubuntu-devel!

Backported four fixes overall to address the crashes I've seen in the reports.
Currently building and testing.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Backports and testing are done, coordinated with mdeslaur so as not to conflict with the security updates staged in https://launchpad.net/~ubuntu-security-proposed/+archive/ubuntu/ppa/+packages.

The plan is to push the SRU asap (which is also favorable given all the crash reports), and the Security Team will continue on top of that (there were some extra CVEs that need to be included anyway).

Adding patch and SRU template now...

tags: removed: third-party-packages
description: updated
Changed in ntp (Ubuntu Xenial):
status: Confirmed → Fix Committed
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Refined for SRU upload:
- clarified SRU template in this bug
- reduced the number of patches by one to follow the "minimal change possible" guidance for SRUs as
  closely as possible
- fixed the reference in d/p/* files to the related upstream commits

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Built and tested again, ready for sponsoring review.

Revision history for this message
Robie Basak (racb) wrote :

Looks good, thanks! I dropped debian/patches/ntp-4.2.8p4-segfaults-3-4.patch, added an apostrophe to the changelog and uploaded.

Changed in ntp (Ubuntu Xenial):
status: Fix Committed → In Progress
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello Pavlushka, or anyone else affected,

Accepted ntp into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ntp/1:4.2.8p4+dfsg-3ubuntu5.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ntp (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Martin for verifying and passing this SRU on to proposed!

To Pavlushka, Bob Jones, Sudhir Reddy - due to the raciness of this bug it is hard for anyone else to "really" verify. While I'm quite confident given the upstream discussions and from studying the fix and the crashes, I'd really like your help testing the package in proposed to confirm that it fixes your issue as intended.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you very much in advance!

P.S. I beg your pardon, but to let this message and status change reach you I took the initiative and also subscribed some of you to the bug.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
since there was no update I'd once more ask one of the people out there who have affected systems to verify the new ntp package available in proposed.

Revision history for this message
Alexis Huxley (alexishuxley) wrote :

I am using ntp 1:4.2.8p4+dfsg-3ubuntu5 (i.e. the currently available package) and it crashes consistently and silently at, or shortly after, system boot up. If I restart it using systemctl then it crashes. If I start it manually from the command line just running 'ntpd' then it seems to keep running for longer, but even then it eventually silently crashes.

This is on a KVM-based VM running as a mail server with stock Postfix and Dovecot. I mention these packages because all other VMs (and PMs) have no issues, despite identical underlying installations.

I've installed 1:4.2.8p4+dfsg-3ubuntu5.1 and this seems to have fixed the problem.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thank you so much Alexis!

Due to the raciness even the slightest setup change can open/close the race-window - so it is "expected" that "... because all other VMs (and PMs) have no issues, despite identical underlying installations." can happen.
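
For anyone who wants to gauge how often the race triggers on a given system, a restart loop along the following lines might help. This is only a minimal sketch and was not part of the SRU testing: the function name count_failed_restarts, the default of 100 runs and the 5 second settle time are all made up for illustration, and the systemctl commands in the comment assume a systemd-based Xenial install where the unit is called "ntp".

```python
import subprocess
import time


def count_failed_restarts(restart_cmd, check_cmd, runs=100, settle_seconds=5.0):
    """Restart a service `runs` times; count runs where it is not alive afterwards."""
    failures = 0
    for _ in range(runs):
        # A non-zero exit from the restart command already counts as a failure.
        restarted = subprocess.run(restart_cmd).returncode == 0
        # Wait so the daemon gets past startup and its first DNS lookups,
        # which is where the crashes in this bug were reported.
        time.sleep(settle_seconds)
        # check_cmd is expected to exit 0 only while the service is running.
        alive = subprocess.run(check_cmd).returncode == 0
        if not (restarted and alive):
            failures += 1
    return failures


# Assumed invocation on a systemd-based system (run as root):
#   count_failed_restarts(["systemctl", "restart", "ntp"],
#                         ["systemctl", "is-active", "--quiet", "ntp"])
```

A count of zero does not prove the race is gone, since on many systems it did not trigger even in 1000 attempts with the unfixed package; the loop is mainly useful on machines that crashed often before the update.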

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ntp - 1:4.2.8p4+dfsg-3ubuntu5.1

---------------
ntp (1:4.2.8p4+dfsg-3ubuntu5.1) xenial; urgency=medium

  * d/p/ntp-4.2.8p4-segfaults-[1-3]-3.patch fix startup crashes by
    including Juergen Perlinger's work on upstream bugs 2954 and 2831 to
    fix those (LP: #1567540).

 -- Christian Ehrhardt <email address hidden> Mon, 01 Aug 2016 10:50:52 +0200

Changed in ntp (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for ntp has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
