chrony.service doesn't start on LXD container

Bug #1589780 reported by Dongwon Cho
This bug affects 2 people
Affects: chrony (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Christian Ehrhardt

Bug Description

As you can see below, there seems to be a problem with adjtimex in an LXD container.
sudo systemctl status chrony.service
● chrony.service - LSB: Controls chronyd NTP time daemon
   Loaded: loaded (/etc/init.d/chrony; bad; vendor preset: enabled)
   Active: active (exited) since Tue 2016-06-07 11:26:24 KST; 4min 33s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1849 ExecStop=/etc/init.d/chrony stop (code=exited, status=0/SUCCESS)
  Process: 1855 ExecStart=/etc/init.d/chrony start (code=exited, status=0/SUCCESS)

Jun 07 11:26:24 ntp systemd[1]: Starting LSB: Controls chronyd NTP time daemon...
Jun 07 11:26:24 ntp chronyd[1861]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -DEBUG +ASYNCDNS +IPV6 +SECHASH)
Jun 07 11:26:24 ntp chrony[1855]: adjtimex() failed
Jun 07 11:26:24 ntp systemd[1]: Started LSB: Controls chronyd NTP time daemon.
Jun 07 11:30:40 ntp systemd[1]: Started LSB: Controls chronyd NTP time daemon.

adjtimex -p
         mode: 0
       offset: -1511319
    frequency: -205143
     maxerror: 140500
     esterror: 0
       status: 24577
time_constant: 7
    precision: 1
    tolerance: 32768000
         tick: 10000
     raw time: 1465268304s 620580175us = 1465268304.620580175

cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"

dpkg -l | egrep 'chrony|adjtimex'
ii adjtimex 1.29-7 amd64 kernel time variables configuration utility
ii chrony 2.1.1-1 amd64 Versatile implementation of the Network Time Protocol

Dongwon Cho (dongwoncho) wrote :

ntp seems to have a similar issue, but it does not stop the service.
Since an LXC container has only limited access to kernel variables, this might be difficult to make work.

sudo systemctl status ntp
● ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)
   Active: active (running) since Tue 2016-06-07 12:20:52 KST; 4s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 6567 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)
  Process: 6577 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)
    Tasks: 2
   Memory: 648.0K
      CPU: 46ms
   CGroup: /system.slice/ntp.service
           └─6587 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 113:117

Jun 07 12:20:54 ntp ntpd[6587]: Soliciting pool server 211.233.84.186
Jun 07 12:20:54 ntp ntpd[6587]: Soliciting pool server 211.233.40.78
Jun 07 12:20:55 ntp ntpd[6587]: Soliciting pool server 80.241.0.72
Jun 07 12:20:55 ntp ntpd[6587]: Soliciting pool server 118.67.201.10
Jun 07 12:20:55 ntp ntpd[6587]: Soliciting pool server 218.234.23.44
Jun 07 12:20:55 ntp ntpd[6587]: adj_systime: Operation not permitted
Jun 07 12:20:56 ntp ntpd[6587]: Soliciting pool server 82.200.209.236
Jun 07 12:20:56 ntp ntpd[6587]: Soliciting pool server 62.201.225.9
Jun 07 12:20:56 ntp ntpd[6587]: Soliciting pool server 160.16.101.116
Jun 07 12:20:56 ntp ntpd[6587]: Soliciting pool server 106.247.248.106

Dongwon Cho (dongwoncho) wrote :

It does not seem possible for a container to have its own system time managed.
http://lwn.net/Articles/179825/
The patch linked above proposed a time namespace, but it seems to have been rejected.

Changed in chrony (Ubuntu):
status: New → Invalid
Andres Rodriguez (andreserl) wrote :

I'm re-opening this. As per the previous comments, chrony indeed doesn't work inside a container because it fails to set the time of the local system, it crashes, and fails to start.

However, chrony should be able to run just fine inside a container even if it cannot set the time of the local system.

This requires the '-x' option to be sent to the daemon in a container:

-x
           This option disables the control of the system clock. chronyd will
           not make any adjustments of the clock, but it will still track its
           offset and frequency relative to the estimated true time, and be
           able to operate as an NTP server. This allows chronyd to run
           without the capability to adjust or set the system clock (e.g. in
           some containers).
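
A minimal sketch of enabling this by hand, assuming the Debian/Ubuntu packaging passes DAEMON_OPTS from /etc/default/chrony to chronyd (the variable shows up in the ExecStart line quoted further down in this report). As the following comments show, on the chrony version reported here this alone was not enough.

```shell
# /etc/default/chrony
# Run chronyd without control of the system clock (e.g. in a container).
DAEMON_OPTS="-x"
```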

Changed in chrony (Ubuntu):
status: Invalid → New
David Britton (dpb)
Changed in chrony (Ubuntu):
assignee: nobody → ChristianEhrhardt (paelzer)
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

I started a discussion upstream, but in later tests I realized that even with -x it fails with:
cap_set_proc() failed

Need to look into that a bit deeper ...

Christian Ehrhardt  (paelzer) wrote :

It seems that on priv-drop (independent of -x) it wants to keep privileges for certain things.
Essentially it keeps cap_net_bind_service only if it binds the NTP port, but it always wants cap_sys_time.

SYS_Linux_DropRoot(uid_t uid, gid_t gid)
{
  const char *cap_text;
  cap_t cap;

  if (prctl(PR_SET_KEEPCAPS, 1)) {
    LOG_FATAL("prctl() failed");
  }

  UTI_DropRoot(uid, gid);

  /* Keep CAP_NET_BIND_SERVICE only if NTP port can be opened */
  cap_text = CNF_GetNTPPort() ?
             "cap_net_bind_service,cap_sys_time=ep" : "cap_sys_time=ep";

  if ((cap = cap_from_text(cap_text)) == NULL) {
    LOG_FATAL("cap_from_text() failed");
  }

  if (cap_set_proc(cap)) {
    LOG_FATAL("cap_set_proc() failed");
  }

That is the failing part. We'd need to check whether we can make the -x setting available there, so that it does not request cap_sys_time in that case.

Christian Ehrhardt  (paelzer) wrote :

So far it only has uid/gid, call chain is

main
/* Drop root privileges if the specified user has a non-zero UID */
if (!geteuid() && (pw->pw_uid || pw->pw_gid))
  SYS_DropRoot(pw->pw_uid, pw->pw_gid);
      ->
            SYS_DropRoot (maps to implementations)
          ->
                SYS_Linux_DropRoot (linux implementation)
                There also is Solaris, netbsd, MacOSX

There is an arg clock_control that can be passed, and if it is zero the function should not demand cap_sys_time.
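
The intended selection can be restated as a small shell sketch. This is not the upstream C patch: the function name cap_text and its two arguments are hypothetical, and only the capability strings are taken from the SYS_Linux_DropRoot excerpt quoted earlier.

```shell
# Build the capability text to keep after dropping root.
# $1: NTP port (0 means no server port is opened)
# $2: clock_control (0 when -x was given, non-zero otherwise)
cap_text() {
    caps=""
    if [ "$1" -ne 0 ]; then
        caps="cap_net_bind_service"
    fi
    if [ "$2" -ne 0 ]; then
        # Only ask for cap_sys_time when the clock is actually controlled.
        caps="${caps:+$caps,}cap_sys_time"
    fi
    if [ -n "$caps" ]; then
        caps="$caps=ep"
    fi
    printf '%s\n' "$caps"
}
```

With clock_control set, the result matches the two strings in the existing code; with clock_control at zero, cap_sys_time is simply left out, so cap_set_proc() no longer asks for a capability the container cannot grant.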

Christian Ehrhardt  (paelzer) wrote :

Thanks to Vincent Blut I was pointed to [1]

That pretty much looks like the patch I was going to write, thanks a lot Vincent!

There are some extra needs, to actually start in a container, but maybe those are upstream as well - I'll check that.

Otherwise my plan would be to somehow match on !cap_sys_time to add -x as a parameter.
Maybe a second systemd file chronyd-container.service or such would do (a bit annoying to have a different name, but an alias won't work as there is the real "chrony" service). Maybe I can do that in one service file and make the arguments depend on the capability.
Since !cap / cap are mutually exclusive, only one of the two would run at any time.

But as I said, maybe such a change was made upstream already and could also be backported.

[1]: https://git.tuxfamily.org/chrony/chrony.git/commit/?id=e8096330be1eb4db25b14152b14550c6c0bbaa63

Changed in chrony (Ubuntu):
status: New → Triaged
Christian Ehrhardt  (paelzer) wrote :

Two systemd files are awkward because they cannot share a single name.
But a wrapper will work.
We can have one service and drop the Condition from the current .service.
The wrapper can then add -x if running without cap_sys_time.
It can also make sure there is no -x already in DAEMON_OPTS and such.
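
A rough sketch of the "-x already in DAEMON_OPTS" part, assuming the options arrive as one whitespace-separated string. The function names has_x_opt and with_x_opt are hypothetical, and as a simplification only a standalone "-x" word is recognised, not combined short flags.

```shell
# Return 0 if the option string already contains a standalone -x flag.
has_x_opt() {
    for word in $1; do
        if [ "$word" = "-x" ]; then
            return 0
        fi
    done
    return 1
}

# Print the option string, with -x appended unless it is already present.
with_x_opt() {
    if has_x_opt "$1"; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${1:+$1 }-x"
    fi
}
```

A wrapper could then start the daemon with the output of with_x_opt "$DAEMON_OPTS" whenever it decides that the clock cannot be controlled.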

Eventually the behavior will be:
1. it will run everywhere
2. it will set the time if it is able to

That should be a valid solution for now, and we might ask upstream whether they want to make this the default without -x or something like that.

Christian Ehrhardt  (paelzer) wrote :

I had the attached wrapper and it worked, but it feels wrong to solve it that way.
Instead, my plan is to implement it in C.
I think it will check cap_sys_time after getopt and imply -x if it is not available.

I plan to suggest this upstream; if accepted, I'll apply it.

Otherwise, for now we might wrap it in an if on an environment variable that the .service file will set.
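
For comparison, a script-level check for the capability could look like the sketch below. It assumes a Linux /proc filesystem; CAP_SYS_TIME being capability number 25 comes from linux/capability.h, while the function names are hypothetical. Having the bit set is no guarantee that the kernel will actually allow clock changes, so this can only serve as a hint.

```shell
# CAP_SYS_TIME is capability number 25 in linux/capability.h.
CAP_SYS_TIME_BIT=25

# Return 0 if the given hexadecimal capability mask has CAP_SYS_TIME set.
mask_has_sys_time() {
    [ $(( (0x$1 >> $CAP_SYS_TIME_BIT) & 1 )) -eq 1 ]
}

# Print the effective capability mask (CapEff) of the calling shell.
current_cap_mask() {
    while read -r key value; do
        if [ "$key" = "CapEff:" ]; then
            printf '%s\n' "$value"
        fi
    done < "/proc/$$/status"
}
```

A caller would combine the two as mask_has_sys_time "$(current_cap_mask)".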

Christian Ehrhardt  (paelzer) wrote :

Note: I now also implemented it trying to follow the usual chrony generic/OSType split as I hope this will make it more acceptable.
WIP Test PKG is available in PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3188

Tests:
1. starts in container if -x is set in /etc/default/chrony - yes
2. implies -x if not able to control clock - yes
3. in an environment where it can control the time it does so (without fallback)- yes
4. -x in an environment where it can control the time works - TBD

#1
● chrony.service - chrony, an NTP client/server
   Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-03-07 11:15:23 UTC; 2s ago
     Docs: man:chronyd(8)
           man:chronyc(1)
           man:chrony.conf(5)
  Process: 30709 ExecStartPost=/usr/lib/chrony/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 30705 ExecStart=/usr/sbin/chronyd $DAEMON_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 30707 (chronyd)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/chrony.service
           └─30707 /usr/sbin/chronyd -x

Mar 07 11:15:23 b systemd[1]: Starting chrony, an NTP client/server...
Mar 07 11:15:23 b chronyd[30707]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYN
Mar 07 11:15:23 b chronyd[30707]: Disabled control of system clock
Mar 07 11:15:23 b chronyd[30707]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/chrony.drift
Mar 07 11:15:23 b systemd[1]: Started chrony, an NTP client/server.

#2
Mar 07 11:16:45 b systemd[1]: Starting chrony, an NTP client/server...
Mar 07 11:16:45 b chronyd[30727]: 2018-03-07T11:16:45Z Time not adjustable, implying -x (do not set system clock)
Mar 07 11:16:45 b chronyd[30729]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYN
Mar 07 11:16:45 b chronyd[30729]: Disabled control of system clock
Mar 07 11:16:45 b chronyd[30729]: Frequency -5.126 +/- 50.533 ppm read from /var/lib/chrony/chrony.drift
Mar 07 11:16:45 b systemd[1]: Started chrony, an NTP client/server.

#3
● chrony.service - chrony, an NTP client/server
   Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-03-07 11:25:40 UTC; 12s ago
     Docs: man:chronyd(8)
           man:chronyc(1)
           man:chrony.conf(5)
 Main PID: 26894 (chronyd)
    Tasks: 1 (limit: 551)
   CGroup: /system.slice/chrony.service
           └─26894 /usr/sbin/chronyd

Mar 07 11:25:40 b-test systemd[1]: Starting chrony, an NTP client/server...
Mar 07 11:25:40 b-test chronyd[26894]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND
Mar 07 11:25:40 b-test chronyd[26894]: Initial frequency -3.327 ppm
Mar 07 11:25:40 b-test systemd[1]: Started chrony, an NTP client/server.
Mar 07 11:25:53 b-test chronyd[26894]: Selected source 84.2.44.19

#4
● chrony.service - chrony, an NTP client/server
   Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-03-07 11:27:00 UTC; 1s ago
     D...


Christian Ehrhardt  (paelzer) wrote :

I submitted the patch upstream, but I still don't see it in the archive.
I'm subscribed and got no rejection or such, so I wonder what I am missing ...

Christian Ehrhardt  (paelzer) wrote :

I resubmitted to -dev instead of -user, which is more correct for the patch anyway.
Let's see if it shows up there ...

Christian Ehrhardt  (paelzer) wrote :

And there it is https://<email address hidden>/msg01807.html
Of course now it shows up on both :-/

Christian Ehrhardt  (paelzer) wrote :

Discussion went back and forth upstream as well as with people on the bug.
Trying to summarize:

Upstream considers it unreasonable for chrony to "silently" not work as expected, and that is correct.
There was a discussion about a "fall back if unable to set it" option as an alternative.
But even that option should not be set everywhere, to avoid a silent fallback in the many places where syncing is supposed to work.
This is mostly about user expectation, especially for the client portion: "I install chrony to sync my time" vs silently not doing so.
On the server side, the degradation of time serving is also considered to require an explicit opt-in.

In the downstream discussion, as a distribution, we are in a different spot.
Moving the decision point from "CAP_SYS_TIME" to "in-container" has several benefits.

This is important: chrony will go on to drop all but a few CAPs as it did so far.
The code we are now considering, to make the behavior better and more consistent, does not decide on CAP_SYS_TIME but instead on "if in container".

1. If you are in a container you very likely can't set the time, so installing chrony there would
   silently not start the chrony service for lack of CAP_SYS_TIME.
   Not only did you install chrony and get no error, it also does nothing.
   If there are services depending on the chrony service, they are down as well.
2. If you are on a broken host that grants no CAP_SYS_TIME, the same as above will silently happen.
   And on a host you'd really want to know it is failing.
3. If you are in a container with special privileges to set the time (rare), it is very dangerous
   to do so, as multiple containers on that system making competing time adjustments is the worst case.
4. If run in a container, a log message should make it clear that one should not expect the
   local time to be synced, as that is impossible (in 99.9+% of the cases).

For #1 you want to default to -x if you are in a container
For #2 you want to drop the condition on CAP_SYS_TIME
For #3 you want to default to -x if you are in a container
For #4 you want to log a message if the case is detected

These are defaults; admins can override them to get the other behavior.

This could either be done:
- in a Chrony patch to provide this behavior and a cmdline option to override.
- in a wrapper script to the ExecStart doing the checks/messaging and adding the -x chrony already provides

The former would be nicer, but would require re-implementing a lot of systemd-detect-virt in chrony, which feels a bit wrong, so I'll go with a wrapper for now (a bit like comment #9, but evolved).
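
The container check such a wrapper would rely on can be sketched as follows. systemd-detect-virt and its --container and --quiet options exist as such; the function name in_container and its optional detector argument are hypothetical and are there only so the function can be exercised on systems without systemd.

```shell
# Return 0 if the detector reports that we run inside a container.
# $1: detector command (optional, defaults to systemd-detect-virt)
in_container() {
    detector=${1:-systemd-detect-virt}
    # Without a detector we cannot tell, so assume "not a container".
    command -v "$detector" >/dev/null 2>&1 || return 1
    "$detector" --quiet --container
}
```

Treating a missing detector as "not a container" keeps the normal, clock-setting behavior as the default whenever the environment cannot be identified.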

Christian Ehrhardt  (paelzer) wrote :

I summarized the above in a reply to upstream. Since there are different places to implement this, I'd pick the one they prefer, and I'd also like to hear their opinion in general.

Christian Ehrhardt  (paelzer) wrote :

FYI - the upstream discussion goes on; maybe we find a way that works better but is less hacky.
I'll update here once we have settled on something there.

OTOH - everybody interested in this is invited to chime in there.
For the mail archive of it see: https://<email address hidden>/msg01807.html

Christian Ehrhardt  (paelzer) wrote :

FYI - New patches sent upstream, discussion ongoing.
V1 - https://<email address hidden>/msg01807.html
V2 - https://<email address hidden>/msg01829.html
V3 - https://<email address hidden>/msg01832.html

Christian Ehrhardt  (paelzer) wrote :

With discussions ongoing a new option came up.

Upstream doesn't really see the case for -X (the proposed try-and-fall-back option) yet, but without it upstream we can't reasonably maintain it for what we need. (We can do it in shell wrappers, but that is a huge maintenance debt and falls short, as a wrapper can never fully test whether it can adjust the time the way chrony would.)

Here is a log of the discussion, which came up with a new two-service model based on OnFailure.
That needs some experiments if it could work for us.

[16:03] <cpaelzer> mlichvar: ping, on the discussion around -X - would you have time to discuss that?
[16:03] <mlichvar> cpaelzer: sure
[16:04] <cpaelzer> mlichvar: I just replied to your last mail - but I'll happily try to outline the use-case in more ways until we found something we can agree upon
[16:04] <cpaelzer> currently I really think it is a needed use case in addition to "-x or not -x"
[16:05] * mlichvar is reading the email
[16:05] <cpaelzer> to some extend it is due to the lack of a split between client/server - the service covers both
[16:05] <cpaelzer> I essentially want -X to be "be a server for sure, and a client if you can"
[16:05] <cpaelzer> maybe these words are better?
[16:06] <cpaelzer> waiting until you have read the mail ...
[16:06] <mlichvar> the trouble that I have with this, is that the client part is much more important the the server part
[16:07] <mlichvar> I think enabling -X by default would be a mistake
[16:07] <mlichvar> people and applications will see a service running, from chronyc everything looks as expected, but the clock is not synchronized
[16:08] <cpaelzer> I see that, which is part why I wrote "discussions are ongoing" and I'm leaning to the "-X is not the default"
[16:08] <cpaelzer> but
[16:08] <cpaelzer> I have applications dragging chrony in that need just this behavior
[16:08] <mlichvar> then there is a question when it would make sense to use -X, but not -x
[16:08] <mlichvar> any examples?
[16:09] <mlichvar> maybe there should be two different services? one for client and optionally a server, and another for server only?
[16:09] <cpaelzer> for said application above, they want to have the server portion work for sure (which -X gives them) but they also want the client part if they can get it (that only -X provides, -x won#t do that)
[16:10] <r3> kenyon: I've changed it so it now allows IPv6 addresses. Tested it too - let me know if it works for you?
[16:10] <cpaelzer> mlichvar: yeah I mentioned the two service approach a while ago I think
[16:11] <cpaelzer> but I think you told me that they are usually tightly interlocked
[16:11] <mlichvar> -x/-X breaks the chrony-wait.service, which could be a problem
[16:11] <mlichvar> and everything that just checks if the service is running and/or chronyc reports "synchronized"
[16:11] <cpaelzer> mlichvar: for the last mention of server/client split https://<email address hidden>/msg01816.html
[16:12] <cpaelzer> mlichvar: but that lack of chrony-wait is the drawback of -x already, -X would inherit that in the case it can't sync the clock
[16:13] <mlichvar> yes, that would be a reason to have a separate service
[16:13] <mlichvar> they would never run both at the...


Christian Ehrhardt  (paelzer) wrote :

In case upstream won't accept the changes, here is the revised design of the wrapper mentioned in comment #9:
- The number of "-x" options passed to chronyd doesn't matter, so we can just add one.
  - But OTOH the X-SET check is good and can be used to silence the warnings we would otherwise
    emit (no need to fall back if set this way).
- We want the capsh check to be only a warning a la "you likely can't set the time as you lack
  CAP_SYS_TIME", but not the only decision maker for setting -x.
  If CAP_SYS_TIME is missing, set -x and warn (that way around the logic works).
  But if it is available, that is no guarantee that all is fine.
- The extra criterion to default to -x is "systemd-detect-virt --container",
  along with a message that explains that (and why) we do so.
- /etc/default/chrony needs an option to override this for people who really WANT to start without -x in containers.
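
The decision logic in this list can be sketched as one small function. The name clock_policy, the yes/no arguments and the three result words are hypothetical, and applying the override to the missing-capability case as well as to containers is an assumption, not something the design above spells out.

```shell
# Decide how the wrapper should treat the clock. All arguments are "yes" or "no":
#   $1  -x was already given by the admin
#   $2  running in a container
#   $3  CAP_SYS_TIME is available
#   $4  the admin asked not to fall back (override option)
# Prints one of:
#   as-is      start chronyd unchanged, no warning
#   warn-only  start chronyd unchanged, but log why syncing may fail
#   add-x      append -x and log the fallback
clock_policy() {
    x_set=$1 container=$2 has_cap=$3 override=$4
    if [ "$x_set" = yes ]; then
        echo as-is
    elif [ "$container" = no ] && [ "$has_cap" = yes ]; then
        echo as-is
    elif [ "$override" = yes ]; then
        echo warn-only
    else
        echo add-x
    fi
}
```

For example, clock_policy no yes yes no covers the common container case and selects the -x fallback even though the capability appears to be present.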

For Debian packaging:
- As yet untested code for the wrapper approach is available at [1].
- The code that uses the suggested change to chrony itself is available at [2].

[1]: https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+ref/bionic-lp1589780-run-in-container-wrapper
[2]: https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+ref/bionic-lp1589780-run-in-container

Christian Ehrhardt  (paelzer) wrote :

Note: while the wrapper approach would work (after some testing and polishing), it has one drawback compared to the upstream -X.

If future containers grow time-namespacing as a feature, the wrapper will still default to not adjusting the time in containers.
But as long as we don't know how time namespaces will be implemented, we can't code special cases for that (while chrony itself can try and fall back, which is the -X option).

Christian Ehrhardt  (paelzer) wrote :

Judging from the ML discussions, upstream might take the initialization improvements, but not the try-and-fall-back we need.
So we will go with the wrapper for now.

There might be a best-of-both-worlds approach to make us safe against time namespaces.
We could use the wrapper to set -X (upper case) and carry a patch.
But the script is already a maintenance burden, and with it in place -X won't give us much on top.

So we are going with the fix via a wrapper that sets -x (lower case) in the cases where we think it is needed,
and accepting the implication this has (chrony installed and the service running does not always mean we sync time).
But we provide:
  - log entries that warn about it
  - config options to override

I'll let you know when some tests are done and this is ready for wider Ubuntu review.

Christian Ehrhardt  (paelzer) wrote :

Pre-Checks:
- Message on install only visible in container
- Service running in default conf after install in container and on bare metal
- Warnings that this is a non-syncing fallback emphasized in log (log level recognized)
- -x added in container as default
- Warnings/fallback skipped if manual -x is present
- If SYNC_IN_CONTAINER="yes" is set, it gets warnings (to understand why) + failing service
- Started through a shell, the correct pid is still tracked by systemd
- Multiple daemon opts in default conf passed and extended correctly
- Nothing specified in defaults file (use sane defaults fallback)
- Default conf file deleted (proper exit message)
- Daemon binary deleted (proper exit message)
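
One way to get the "log level recognized" behavior from a shell wrapper is the sd-daemon style priority prefix, which systemd parses on a service's stdout/stderr as long as the unit keeps the default SyslogLevelPrefix= setting. The sketch below assumes the wrapper's stderr ends up in the journal; the function name log_prio is hypothetical.

```shell
# Emit a message on stderr with an sd-daemon style priority prefix.
# $1: numeric syslog priority (for example 4 = warning, 6 = info)
# $2: message text
log_prio() {
    printf '<%s>%s\n' "$1" "$2" >&2
}
```

A fallback warning would then be written with priority 4, so that it stands out from informational lines in the journal.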

Prepping a MP with that now ...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package chrony - 3.2-4ubuntu2

---------------
chrony (3.2-4ubuntu2) bionic; urgency=medium

  * Set -x as default if unable to set time (e.g. in containers) (LP: #1589780)
    Chrony is a single service which acts as both NTP client (i.e. syncing the
    local clock) and NTP server (i.e. providing NTP services to the network),
    and that is both desired and expected in the vast majority of cases.
    But in containers syncing the local clock is usually impossible, but this
    shall not break the providing of NTP services to the network.
    To some extent this makes chrony's default config more similar to 'ntpd',
    which complained in syslog but still provided NTP server service in those
    cases.
    - d/p/lp1589780-sys_linux-don-t-keep-CAP_SYS_TIME-with-x-option.patch:
      When dropping the root privileges, don't try to keep the CAP_SYS_TIME
      capability if the -x option was enabled. This allows chronyd to be
      started without the capability (e.g. in containers) and also drop the
      root privileges.
    - debian/chrony.service: allow the service to run without CAP_SYS_TIME
    - debian/control: add new dependency libcap2-bin for capsh (usually
      installed anyway, but make them explicit to be sure).
    - debian/chrony.default: new option SYNC_IN_CONTAINER to not fall back
      (Default off).
    - debian/chronyd-starter.sh: wrapper to handle special cases in containers
      and if CAP_SYS_TIME is missing. Effectively allows to run NTP server in
      containers on a default installation and avoid failing to sync time (or
      if allowed to sync, avoid multiple containers to fight over it by
      accident).
    - debian/install: make chronyd-starter.sh available on install.
    - debian/docs, debian/README.container: provide documentation about the
      handling of this case.
  * debian/chrony.conf: update default chrony.conf to not violate the policy
    of pool.ntp.org (to use no more than four of their servers) and to provide
    more ipv6 capable sources by default (LP: #1754358)

 -- Christian Ehrhardt <email address hidden> Fri, 16 Mar 2018 12:25:44 +0100

Changed in chrony (Ubuntu):
status: Triaged → Fix Released