Merge ~paulgear/ntp-charm/+git/ntp-charm:autopeers into ntp-charm:master
Status: Merged
Merged at revision: 9825def33421f3177ee772e80f8f68dd1367d7f8
Proposed branch: ~paulgear/ntp-charm/+git/ntp-charm:autopeers
Merge into: ntp-charm:master
Diff against target: 1041 lines (+794/-62), 12 files modified
- .gitignore (+1/-0)
- Makefile (+18/-3)
- README.md (+4/-5)
- config.yaml (+8/-3)
- files/nagios/check_ntpmon.py (+1/-1)
- hooks/ntp_hooks.py (+131/-49)
- hooks/ntp_scoring.py (+159/-0)
- hooks/ntp_source_score.py (+198/-0)
- tests/10-deploy-test.py (+1/-1)
- unit_tests/test_ntp_hooks.py (+0/-0)
- unit_tests/test_ntp_scoring.py (+106/-0)
- unit_tests/test_ntp_source_score.py (+167/-0)
Related bugs:
Reviewer: Stuart Bishop (community), status: Approve
Review via email: mp+332326@code.launchpad.net
Commit message
Description of the change
This MP introduces one major piece of functionality and some other minor improvements.
The previously-
This is primarily intended for use in medium-to-large OpenStack environments where Ceph needs a stable, nearby set of NTP sources. At present, in a number of our production OpenStack environments, this is achieved by creating two separate ntp services; the new auto_peers functionality achieves the same result without requiring manual setup. I expect this to reduce manual tuning in some of our less-well-connected OpenStacks and to result in fewer alerts.
In some environments, NTP is erroneously deployed to containers, where it conflicts with NTP on the host; this version of the charm automatically detects that it is running in a container and disables NTP.
Both of the above are reflected in juju status, making it easy for the operator to see the results of the charm's automated decisions and to correct them if necessary.
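The core of the auto_peers decision described above can be sketched as a small pure function. This is a simplified illustration based on the `get_peer_sources()` code in the proposed diff (the default of 6 upstream units comes from the new `auto_peers_upstream` option), not the charm's exact implementation:

```python
def select_upstreams(our_score, peers, top_n=6):
    """Decide this unit's role under auto_peers.

    our_score: this unit's suitability score.
    peers: list of (address, score) tuples for the peer units.
    Returns None if this unit should act as an upstream (it is within
    the top N scores), otherwise the addresses of the top N peers to
    receive time from.
    """
    if len(peers) < top_n:
        # not enough peers to bother with auto-peering
        return None
    # peers whose score beats ours
    better = [p for p in peers if p[1] > our_score]
    if len(better) < top_n:
        # fewer than N peers beat us, so we are in the top N: act as upstream
        return None
    # otherwise sync from the N best-scoring peers
    best = sorted(better, key=lambda p: p[1], reverse=True)[:top_n]
    return [addr for addr, _ in best]
```

A unit that gets None back keeps its configured sources, pools, and peers (forming the service stratum); the remaining units drop their configured sources and sync only from the returned addresses.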
Paul Gear (paulgear) wrote:
Tom Haddon (mthaddon) wrote:
Can you add a description of this change and why it's needed? Also some comments inline.
Paul Gear (paulgear) wrote:
Replies to comments inline.
Tom Haddon (mthaddon) wrote:
I get a "make lint" error from this:
hooks/ntp_
You may want to consider ignoring that in the default lint target and adding a "complex-lint" target that doesn't ignore it so you can work it down to zero over time.
How does some use ntp_source_
Tom Haddon (mthaddon) wrote:
Also, just noticed there's a unit_tests directory. Would be good to have a Makefile target for running tests.
Paul Gear (paulgear) wrote:
> I get a "make lint" error from this:
>
> hooks/ntp_
>
> You may want to consider ignoring that in the default lint target and adding a
> "complex-lint" target that doesn't ignore it so you can work it down to zero
> over time.
Which version of flake8 are you running? On my system (flake8 3.2.1-1), I have to drop max-complexity from 10 to 8 to get a failure in that method.
> How does some use ntp_source_
> file that it can be for diagnostic purposes. That sounds like something that
> would be nice to expose via juju actions. I don't think that should
> necessarily block the landing of this, but if you're planning to advertise
> that functionality at all, I think it should be done via juju actions first.
That would probably be important to have for receiving troubleshooting reports from others, but I wasn't planning to advertise it in the first instance. I'll try to get some time to add one in the next couple of weeks.
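The juju action being discussed here could be declared along these lines; the action name and parameters below are hypothetical illustrations, not part of this MP:

```yaml
# actions.yaml (hypothetical sketch; not part of this MP)
source-score:
  description: >
    Run ntp_source_score against the configured sources and report
    per-source delay statistics and the overall suitability score.
  params:
    verbose:
      type: boolean
      default: false
      description: Include per-source delay details in the output.
```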
Tom Haddon (mthaddon) wrote:
> > I get a "make lint" error from this:
> >
> > hooks/ntp_
> >
> > You may want to consider ignoring that in the default lint target and adding
> a
> > "complex-lint" target that doesn't ignore it so you can work it down to zero
> > over time.
>
> Which version of flake8 are you running? On my system (flake8 3.2.1-1), I
> have to drop max-complexity from 10 to 8 to get a failure in that method.
2.5.4.
$ dpkg -l | grep flake8
ii flake8 2.5.4-2 all code checker using pep8 and pyflakes
ii python-flake8 2.5.4-2 all code checker using pep8 and pyflakes - Python 2.x
ii python3-flake8 2.5.4-2 all code checker using pep8 and pyflakes - Python 3.x
> > How does some use ntp_source_
> > file that it can be for diagnostic purposes. That sounds like something that
> > would be nice to expose via juju actions. I don't think that should
> > necessarily block the landing of this, but if you're planning to advertise
> > that functionality at all, I think it should be done via juju actions first.
>
> That would probably be important to have for receiving troubleshooting reports
> from others, but I wasn't planning to advertise it in the first instance.
> I'll try to get some time to add one in the next couple of weeks.
Stuart Bishop (stub) wrote:
Comments added inline. I'll leave the actual approval to Tom, along with the test suite/flake8 issues.
Paul Gear (paulgear) wrote:
Some replies to Stuart's comments inline.
Paul Gear (paulgear) wrote:
Setting this back to WIP while I improve test suite coverage.
Paul Gear (paulgear) wrote:
I've added quite a few things to the test suite; it's not 100% coverage, but it should be enough to provide sufficient confidence in the code to merge.
Stuart Bishop (stub) wrote:
Looks good. A few minor comments. The psutil magic import at the start of ntp_scoring.py needs to be fixed, or it may fail the first time it is run.
Paul Gear (paulgear) wrote:
Pushed new version with updates.
Stuart Bishop (stub) wrote:
Looks good.
You probably want 'fatal=True' in install_packages(), which I missed last time. I doubt the rest of this charm will work if the packages are missing, so it's better to fail early.
Paul Gear (paulgear) wrote:
On 07/11/17 19:10, Stuart Bishop wrote:
> Review: Approve
>
> Looks good.
>
> You probably want 'fatal=True' in install_packages(), which I missed last time. I doubt the rest of this charm will work if the packages are missing, so its better to fail early.
The rest of the charm should work fine if those packages are missing,
and the scoring & auto-peering should fail gracefully if they are.
I've pushed a fix to handle the case where python3-psutil is missing.
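The graceful degradation discussed here can be seen in the scoring divisor: when psutil is unavailable the charm assumes a worst case of 2, otherwise it multiplies one weight per detected service type. A self-contained sketch of that calculation, operating on a plain list of process names instead of the psutil process table (the weights mirror those in the proposed ntp_scoring.py; this is an illustration, not the charm's code):

```python
def package_divisor(process_names):
    """Return the score divisor for a node running the given processes.
    Each service type counts once, no matter how many of its processes
    are running; the divisor lowers the node's suitability score."""
    weights = {}
    for name in process_names:
        if name.startswith('nova-compute'):
            weights['nova-compute'] = 1.25
        if name.startswith('ceph-osd'):
            weights['ceph-osd'] = 1.25
        elif name.startswith('ceph-'):
            weights['ceph'] = 1.1
        elif name.startswith('swift-'):
            weights['swift'] = 1.1
    divisor = 1.0
    for w in weights.values():
        divisor *= w
    return divisor
```

With all four service types present this gives 1.25 * 1.25 * 1.1 * 1.1 = 1.890625, matching the comment in the diff and explaining why the psutil-missing fallback of 2 is a slight over-estimate of the worst case.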
Preview Diff
1 | diff --git a/.gitignore b/.gitignore |
2 | index ba077a4..c644b99 100644 |
3 | --- a/.gitignore |
4 | +++ b/.gitignore |
5 | @@ -1 +1,2 @@ |
6 | bin |
7 | +__pycache__ |
8 | diff --git a/Makefile b/Makefile |
9 | index 706d2ae..85b5df4 100644 |
10 | --- a/Makefile |
11 | +++ b/Makefile |
12 | @@ -1,8 +1,13 @@ |
13 | #!/usr/bin/make |
14 | -PYTHON := /usr/bin/env python |
15 | +PYTHON := /usr/bin/env PYTHONPATH=$(PWD)/hooks python3 |
16 | +CHARM_NAME := ntp |
17 | +CSDEST := cs:~$(LOGNAME)/$(CHARM_NAME) |
18 | |
19 | -lint: |
20 | - @python2 -m flake8 --exclude hooks/charmhelpers hooks |
21 | +test: |
22 | + $(PYTHON) -m unittest unit_tests/test_ntp_*.py |
23 | + |
24 | +lint: test |
25 | + @python3 -m flake8 --max-line-length=120 --exclude hooks/charmhelpers hooks |
26 | @charm proof |
27 | |
28 | bin/charm_helpers_sync.py: |
29 | @@ -12,3 +17,13 @@ bin/charm_helpers_sync.py: |
30 | |
31 | sync: bin/charm_helpers_sync.py |
32 | @$(PYTHON) bin/charm_helpers_sync.py -c charm-helpers-sync.yaml |
33 | + |
34 | +git: |
35 | + git push $(LOGNAME) |
36 | + |
37 | +cspush: lint |
38 | + version=`charm push . $(CSDEST) | awk '/^url:/ {print $$2}'` && \ |
39 | + charm release $$version |
40 | + |
41 | +upgrade: cspush |
42 | + juju upgrade-charm $(CHARM_NAME) |
43 | diff --git a/README.md b/README.md |
44 | index e635e7a..a2605cb 100644 |
45 | --- a/README.md |
46 | +++ b/README.md |
47 | @@ -38,14 +38,13 @@ To disable the default list of pool servers, set that to the empty string: |
48 | |
49 | Sources, peers, and pools should be space separated. |
50 | |
51 | -When you need a set of services to keep close time to each other, it may |
52 | -be useful to have them automatically peer with each other. This means |
53 | -any set of services which use the same ntp subordinate will peer together. |
54 | +If you have a large number of nodes which need to keep close sync with one |
55 | +another but need to keep upstream traffic to a minimum, try auto_peers: |
56 | |
57 | juju set ntp auto_peers=true |
58 | |
59 | -This will add all the hosts as peers to each other. Using auto_peers is not |
60 | -recommended when more than 10 units are expected to be deployed. |
61 | +This will select the most suitable units for connecting with upstream, and |
62 | +configure the remaining units to receive time from those units. |
63 | |
64 | Mastered |
65 | ======== |
66 | diff --git a/config.yaml b/config.yaml |
67 | index 1405f4b..1bbc1ed 100644 |
68 | --- a/config.yaml |
69 | +++ b/config.yaml |
70 | @@ -59,9 +59,14 @@ options: |
71 | default: false |
72 | type: boolean |
73 | description: > |
74 | - Automatically peer with other units in the same service. |
75 | - DEPRECATED. Please consider using the ntpmaster charm to provide |
76 | - sufficient peers for your environment in favour of auto_peers. |
77 | + Automatically select the most appropriate units in the service to |
78 | + be a service stratum connecting with upstream NTP servers, and use |
79 | + those units as time sources for the remaining units. |
80 | + auto_peers_upstream: |
81 | + default: 6 |
82 | + type: int |
83 | + description: > |
84 | + How many units should attempt to connect with upstream NTP servers? |
85 | use_iburst: |
86 | default: true |
87 | type: boolean |
88 | diff --git a/files/nagios/check_ntpmon.py b/files/nagios/check_ntpmon.py |
89 | index 8d24df4..a3c64be 100755 |
90 | --- a/files/nagios/check_ntpmon.py |
91 | +++ b/files/nagios/check_ntpmon.py |
92 | @@ -452,7 +452,7 @@ class NTPPeers(object): |
93 | output = subprocess.check_output(["ntpq", "-pn"], stderr=null) |
94 | if len(output) > 0: |
95 | lines = output.split("\n") |
96 | - except: |
97 | + except Exception: |
98 | traceback.print_exc(file=sys.stdout) |
99 | return lines |
100 | |
101 | diff --git a/hooks/ntp_hooks.py b/hooks/ntp_hooks.py |
102 | index d0356c3..efde7ad 100755 |
103 | --- a/hooks/ntp_hooks.py |
104 | +++ b/hooks/ntp_hooks.py |
105 | @@ -1,15 +1,14 @@ |
106 | #!/usr/bin/python3 |
107 | |
108 | -import sys |
109 | -import charmhelpers.core.hookenv as hookenv |
110 | -import charmhelpers.core.host as host |
111 | -import charmhelpers.fetch as fetch |
112 | -from charmhelpers.core.hookenv import UnregisteredHookError |
113 | +from charmhelpers.contrib.charmsupport import nrpe |
114 | from charmhelpers.contrib.templating.jinja import render |
115 | -import shutil |
116 | +from charmhelpers.core import hookenv, host, unitdata |
117 | +import charmhelpers.fetch as fetch |
118 | import os |
119 | +import shutil |
120 | +import sys |
121 | |
122 | -from charmhelpers.contrib.charmsupport import nrpe |
123 | +import ntp_scoring |
124 | |
125 | NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins' |
126 | |
127 | @@ -19,22 +18,56 @@ NTP_CONF_ORIG = '{}.orig'.format(NTP_CONF) |
128 | hooks = hookenv.Hooks() |
129 | |
130 | |
131 | -def get_peer_nodes(): |
132 | - hosts = [] |
133 | - hosts.append(hookenv.unit_get('private-address')) |
134 | +def get_peer_sources(topN=6): |
135 | + """ |
136 | + Get our score and put it on the peer relation. |
137 | + Read the peer private addresses and their scores. |
138 | + Determine whether we're in the top N scores; |
139 | + if so, we're upstream - return None |
140 | + Otherwise, return the list of the top N peers. |
141 | + """ |
142 | + if topN is None: |
143 | + topN = 6 |
144 | + ourscore = ntp_scoring.get_score() |
145 | + if ourscore is None: |
146 | + hookenv.log('[AUTO_PEER] Our score cannot be determined - check logs for reason') |
147 | + return None |
148 | + |
149 | + peers = [] |
150 | for relid in hookenv.relation_ids('ntp-peers'): |
151 | + hookenv.relation_set(relation_id=relid, relation_settings={'score': ourscore['score']}) |
152 | for unit in hookenv.related_units(relid): |
153 | - hosts.append(hookenv.relation_get('private-address', |
154 | - unit, relid)) |
155 | - hosts.sort() |
156 | - return hosts |
157 | + addr = hookenv.relation_get('private-address', unit, relid) |
158 | + peerscore = hookenv.relation_get('score', unit, relid) |
159 | + if peerscore is not None: |
160 | + peers.append((addr, float(peerscore))) |
161 | + |
162 | + if len(peers) < topN: |
163 | + # we don't have enough peers to do auto-peering |
164 | + hookenv.log('[AUTO_PEER] There are only {} peers; not doing auto-peering'.format(len(peers))) |
165 | + return None |
166 | + |
167 | + # list of hosts with scores better than ours |
168 | + hosts = list(filter(lambda x: x[1] > ourscore['score'], peers)) |
169 | + hookenv.log('[AUTO_PEER] {} peers better than us, topN == {}'.format(len(hosts), topN)) |
170 | + |
171 | + # if the list is less than topN long, we're in the topN hosts |
172 | + if len(hosts) < topN: |
173 | + return None |
174 | + else: |
175 | + # sort list of hosts by score, keep only the topN |
176 | + topNhosts = sorted(hosts, key=lambda x: x[1], reverse=True)[0:topN] |
177 | + # return only the host addresses |
178 | + return map(lambda x: x[0], topNhosts) |
179 | |
180 | |
181 | @hooks.hook('install') |
182 | def install(): |
183 | fetch.apt_update(fatal=True) |
184 | - fetch.apt_install(["ntp"], fatal=True) |
185 | - shutil.copy(NTP_CONF, NTP_CONF_ORIG) |
186 | + ntp_scoring.install_packages() |
187 | + if ntp_scoring.get_virt_type() != 'container': |
188 | + fetch.apt_install(["ntp"], fatal=True) |
189 | + shutil.copy(NTP_CONF, NTP_CONF_ORIG) |
190 | |
191 | |
192 | def get_sources(sources, iburst=True, source_list=None): |
193 | @@ -61,25 +94,52 @@ def get_sources(sources, iburst=True, source_list=None): |
194 | 'ntp-peers-relation-changed') |
195 | @host.restart_on_change({NTP_CONF: ['ntp']}) |
196 | def write_config(): |
197 | + ntp_scoring.install_packages() |
198 | + if ntp_scoring.get_virt_type() == 'container': |
199 | + host.service_stop('ntp') |
200 | + host.service_pause('ntp') |
201 | + hookenv.close_port(123, protocol="UDP") |
202 | + return |
203 | + |
204 | + host.service_resume('ntp') |
205 | hookenv.open_port(123, protocol="UDP") |
206 | + |
207 | use_iburst = hookenv.config('use_iburst') |
208 | - source = hookenv.config('source') |
209 | orphan_stratum = hookenv.config('orphan_stratum') |
210 | - remote_sources = get_sources(source, iburst=use_iburst) |
211 | + source = hookenv.config('source') |
212 | pools = hookenv.config('pools') |
213 | - remote_pools = get_sources(pools, iburst=use_iburst) |
214 | - for relid in hookenv.relation_ids('master'): |
215 | - for unit in hookenv.related_units(relid=relid): |
216 | - u_addr = hookenv.relation_get(attribute='private-address', |
217 | - unit=unit, rid=relid) |
218 | - remote_sources.append({'name': u_addr, 'iburst': 'iburst'}) |
219 | - |
220 | peers = hookenv.config('peers') |
221 | - remote_peers = get_sources(peers, iburst=use_iburst) |
222 | auto_peers = hookenv.config('auto_peers') |
223 | - if hookenv.relation_ids('ntp-peers') and auto_peers: |
224 | - remote_peers = get_sources(get_peer_nodes(), iburst=use_iburst, |
225 | - source_list=remote_peers) |
226 | + |
227 | + remote_sources = get_sources(source, iburst=use_iburst) |
228 | + remote_pools = get_sources(pools, iburst=use_iburst) |
229 | + remote_peers = get_sources(peers, iburst=use_iburst) |
230 | + |
231 | + kv = unitdata.kv() |
232 | + hookenv.atexit(kv.flush) |
233 | + |
234 | + if hookenv.relation_ids('master'): |
235 | + # use master relation only |
236 | + for relid in hookenv.relation_ids('master'): |
237 | + for unit in hookenv.related_units(relid=relid): |
238 | + u_addr = hookenv.relation_get(attribute='private-address', |
239 | + unit=unit, rid=relid) |
240 | + remote_sources.append({'name': u_addr, 'iburst': 'iburst'}) |
241 | + elif auto_peers and hookenv.relation_ids('ntp-peers'): |
242 | + # use auto_peers |
243 | + auto_peer_list = get_peer_sources(hookenv.config('auto_peers_upstream')) |
244 | + if auto_peer_list is None: |
245 | + # we are upstream - use configured sources, pools, peers |
246 | + kv.set('auto_peer', 'upstream') |
247 | + else: |
248 | + # override all sources with auto_peer_list |
249 | + kv.set('auto_peer', 'client') |
250 | + remote_sources = get_sources(auto_peer_list, iburst=use_iburst) |
251 | + remote_pools = [] |
252 | + remote_peers = [] |
253 | + else: |
254 | + # use configured sources, pools, peers |
255 | + kv.unset('auto_peer') |
256 | |
257 | if len(remote_sources) == 0 and len(remote_peers) == 0 and len(remote_pools) == 0: |
258 | # we have no peers/pools/servers; restore default ntp.conf provided by OS |
259 | @@ -134,7 +194,7 @@ def update_nrpe_config(): |
260 | # Hyper-V host clock sync handling - workaround until https://bugs.launchpad.net/bugs/1676635 is SRUed for xenial |
261 | # See also: |
262 | # - https://patchwork.kernel.org/patch/9525945/ |
263 | -# - https://social.msdn.microsoft.com/Forums/en-US/8c0a1026-0b02-405a-848e-628e68229eaf/i-have-a-lot-of-time-has-been-changed-in-the-journal-of-my-linux-boxes?forum=WAVirtualMachinesforWindows |
264 | +# - https://social.msdn.microsoft.com/Forums/en-US/8c0a1026-0b02-405a-848e-628e68229eaf/i-have-a-lot-of-time-has-been-changed-in-the-journal-of-my-linux-boxes?forum=WAVirtualMachinesforWindows # NOQA: E501 |
265 | _device_class = '9527e630-d0ae-497b-adce-e80ab0175caf' |
266 | _vmbus_dir = '/sys/bus/vmbus/' |
267 | |
268 | @@ -146,11 +206,11 @@ def find_hyperv_host_sync_device(): |
269 | try: |
270 | f = open(os.path.join(_vmbus_dir, 'devices', d, 'class_id'), 'r') |
271 | if _device_class in f.readline(): |
272 | - hookenv.log('Hyper-V host time sync device is {}'.format(f.name), level=hookenv.DEBUG) |
273 | + hookenv.log('Hyper-V host time sync device is {}'.format(f.name)) |
274 | return d |
275 | - except: |
276 | + except Exception: |
277 | pass |
278 | - except: |
279 | + except Exception: |
280 | pass |
281 | return None |
282 | |
283 | @@ -164,13 +224,13 @@ def check_hyperv_host_sync(device_id): |
284 | firstline = f.readline().strip() |
285 | result = firstline == '3' |
286 | enabled = 'enabled' if result else 'disabled' |
287 | - hookenv.log('Hyper-V host time sync is ' + enabled, level=hookenv.DEBUG) |
288 | + hookenv.log('Hyper-V host time sync is ' + enabled) |
289 | if result: |
290 | return device_id |
291 | else: |
292 | return None |
293 | - except: |
294 | - hookenv.log('Hyper-V host time sync status file {} not found'.format(statefile), level=hookenv.DEBUG) |
295 | + except Exception: |
296 | + hookenv.log('Hyper-V host time sync status file {} not found'.format(statefile)) |
297 | return None |
298 | else: |
299 | return None |
300 | @@ -179,12 +239,12 @@ def check_hyperv_host_sync(device_id): |
301 | def disable_hyperv_host_sync(device_id): |
302 | """Unbind the Hyper-V host clock sync driver""" |
303 | try: |
304 | - hookenv.log('Disabling Hyper-V host time sync', level=hookenv.DEBUG) |
305 | + hookenv.log('Disabling Hyper-V host time sync') |
306 | path = os.path.join(_vmbus_dir, 'drivers', 'hv_util', 'unbind') |
307 | f = open(path, 'w') |
308 | print(device_id, file=f) |
309 | return True |
310 | - except: |
311 | + except Exception: |
312 | return False |
313 | |
314 | |
315 | @@ -203,21 +263,43 @@ def hyperv_sync_status(): |
316 | |
317 | @hooks.hook('update-status') |
318 | def assess_status(): |
319 | - hookenv.application_version_set( |
320 | - fetch.get_upstream_version('ntp') |
321 | - ) |
322 | - if host.service_running('ntp'): |
323 | - status = 'Unit is ready' |
324 | - status_extra = hyperv_sync_status() |
325 | - if status_extra: |
326 | - status = status + '; ' + status_extra |
327 | - hookenv.status_set('active', status) |
328 | + version = fetch.get_upstream_version('ntp') |
329 | + if version is not None: |
330 | + hookenv.application_version_set(version) |
331 | + |
332 | + # create base status |
333 | + if ntp_scoring.get_virt_type() == 'container': |
334 | + state = 'blocked' |
335 | + status = 'NTP not supported in containers: please configure on host' |
336 | + elif host.service_running('ntp'): |
337 | + state = 'active' |
338 | + status = 'Ready' |
339 | else: |
340 | - hookenv.status_set('blocked', 'ntp not running') |
341 | + state = 'blocked' |
342 | + status = 'Not running' |
343 | + |
344 | + # append Hyper-V status, if any |
345 | + hyperv_status = hyperv_sync_status() |
346 | + if hyperv_status is not None: |
347 | + status += ', ' + hyperv_status |
348 | + |
349 | + # append scoring status, if any |
350 | + # (don't force update of the score from update-status more than once a month) |
351 | + max_age = 31 * 86400 |
352 | + scorestr = ntp_scoring.get_score_string(max_seconds=max_age) |
353 | + if scorestr is not None: |
354 | + status += ', ' + scorestr |
355 | + |
356 | + # append auto_peer status |
357 | + autopeer = unitdata.kv().get('auto_peer') |
358 | + if autopeer is not None: |
359 | + status += ' [{}]'.format(autopeer) |
360 | + |
361 | + hookenv.status_set(state, status) |
362 | |
363 | |
364 | if __name__ == '__main__': |
365 | try: |
366 | hooks.execute(sys.argv) |
367 | - except UnregisteredHookError as e: |
368 | + except hookenv.UnregisteredHookError as e: |
369 | hookenv.log('Unknown hook {} - skipping.'.format(e)) |
370 | diff --git a/hooks/ntp_scoring.py b/hooks/ntp_scoring.py |
371 | new file mode 100755 |
372 | index 0000000..af55103 |
373 | --- /dev/null |
374 | +++ b/hooks/ntp_scoring.py |
375 | @@ -0,0 +1,159 @@ |
376 | + |
377 | +# Copyright (c) 2017 Canonical Ltd |
378 | +# Author: Paul Gear |
379 | + |
380 | +# This module retrieves the score calculated in ntp_source_score, and |
381 | +# creates an overall node weighting based on the machine type (bare metal, |
382 | +# container, or VM) and software running locally. It reduces the score |
383 | +# for nodes with OpenStack ceph, nova, or swift services running, in order |
384 | +# to decrease the likelihood that they will be selected as upstreams. |
385 | + |
386 | +from charmhelpers.core import hookenv, unitdata |
387 | +import charmhelpers.fetch as fetch |
388 | +import json |
389 | +import time |
390 | + |
391 | +import ntp_source_score |
392 | + |
393 | + |
394 | +def install_packages(): |
395 | + fetch.apt_install(["facter", "ntpdate", "python3-psutil", "virt-what"], fatal=False) |
396 | + |
397 | + |
398 | +def get_virt_type(): |
399 | + """Work out what type of environment we're running in""" |
400 | + for line in ntp_source_score.run_cmd('facter virtual'): |
401 | + fields = str(line).split() |
402 | + if len(fields) > 0: |
403 | + if fields[0] in ['physical', 'xen0']: |
404 | + return 'physical' |
405 | + if fields[0] in ['docker', 'lxc', 'openvz']: |
406 | + return 'container' |
407 | + # Anything not one of the above-mentioned types is assumed to be a VM |
408 | + return 'vm' |
409 | + |
410 | + |
411 | +def get_virt_multiplier(): |
412 | + virt_type = get_virt_type() |
413 | + if virt_type == 'container': |
414 | + # containers should be synchronized from their host |
415 | + return -1 |
416 | + elif virt_type == 'physical': |
417 | + hookenv.log('[SCORE] running on physical host - score bump 25%') |
418 | + return 1.25 |
419 | + else: |
420 | + hookenv.log('[SCORE] probably running in a VM - score bump 0%') |
421 | + return 1 |
422 | + |
423 | + |
424 | +def get_package_divisor(): |
425 | + """Check for running ceph, swift, & nova-compute services, |
426 | + and increase divisor for each.""" |
427 | + try: |
428 | + import psutil |
429 | + except Exception: |
430 | + # If we can't read the process table, assume a worst-case. |
431 | + # (Normally, if every process is running, this will return |
432 | + # 1.1 * 1.1 * 1.25 * 1.25 = 1.890625.) |
433 | + return 2 |
434 | + |
435 | + # set the weight for each process (regardless of how many there are running) |
436 | + running = {} |
437 | + for p in psutil.process_iter(): |
438 | + if p.name().startswith('nova-compute'): |
439 | + running['nova-compute'] = 1.25 |
440 | + if p.name().startswith('ceph-osd'): |
441 | + running['ceph-osd'] = 1.25 |
442 | + elif p.name().startswith('ceph-'): |
443 | + running['ceph'] = 1.1 |
444 | + elif p.name().startswith('swift-'): |
445 | + running['swift'] = 1.1 |
446 | + |
447 | + # increase the divisor for each discovered process type |
448 | + divisor = 1 |
449 | + for r in running: |
450 | + hookenv.log('[SCORE] %s running - score divisor %.3f' % (r, running[r])) |
451 | + divisor *= running[r] |
452 | + return divisor |
453 | + |
454 | + |
455 | +def check_score(seconds=None): |
456 | + if seconds is None: |
457 | + seconds = time.time() |
458 | + score = { |
459 | + 'divisor': 1, |
460 | + 'multiplier': 0, |
461 | + 'score': -999, |
462 | + 'time': seconds, |
463 | + } |
464 | + |
465 | + # skip scoring if we have an explicitly configured master |
466 | + relation_sources = hookenv.relation_ids('master') |
467 | + score['master-relations'] = len(relation_sources) |
468 | + if relation_sources is not None and len(relation_sources) > 0: |
469 | + hookenv.log('[SCORE] master relation configured - skipped scoring') |
470 | + return score |
471 | + |
472 | + # skip scoring if we're in a container |
473 | + multiplier = get_virt_multiplier() |
474 | + score['multiplier'] = multiplier |
475 | + if multiplier <= 0: |
476 | + hookenv.log('[SCORE] running in a container - skipped scoring') |
477 | + return score |
478 | + |
479 | + # skip scoring if auto_peers is off |
480 | + auto_peers = hookenv.config('auto_peers') |
481 | + score['auto-peers'] = auto_peers |
482 | + if not auto_peers: |
483 | + hookenv.log('[SCORE] auto_peers is disabled - skipped scoring') |
484 | + return score |
485 | + |
486 | + # skip scoring if we have no sources |
487 | + sources = hookenv.config('source').split() |
488 | + peers = hookenv.config('peers').split() |
489 | + pools = hookenv.config('pools').split() |
490 | + host_list = sources + peers + pools |
491 | + if len(host_list) == 0: |
492 | + hookenv.log('[SCORE] No sources configured') |
493 | + return score |
494 | + |
495 | + # Now that we've passed all those checks, check upstreams, calculate a score, and return the result |
496 | + divisor = get_package_divisor() |
497 | + score['divisor'] = divisor |
498 | + score['host-list'] = host_list |
499 | + score['raw'] = ntp_source_score.get_source_score(host_list, verbose=True) |
500 | + score['score'] = score['raw'] * multiplier / divisor |
501 | + hookenv.log('[SCORE] Suitability score: %.3f' % (score['score'],)) |
502 | + return score |
503 | + |
504 | + |
505 | +def get_score(max_seconds=86400): |
506 | + # Remove this if/when we convert the charm to reactive |
507 | + kv = unitdata.kv() |
508 | + hookenv.atexit(kv.flush) |
509 | + |
510 | + score = kv.get('ntp_score') |
511 | + if score is not None: |
512 | + saved_time = score.get('time', 0) |
513 | + else: |
514 | + saved_time = 0 |
515 | + |
516 | + now = time.time() |
517 | + if score is None or now - saved_time > max_seconds: |
518 | + score = check_score(now) |
519 | + kv.set('ntp_score', score) |
520 | + hookenv.log('[SCORE] saved %s' % (json.dumps(score),)) |
521 | + |
522 | + return score |
523 | + |
524 | + |
525 | +def get_score_string(score=None, max_seconds=86400): |
526 | + if score is None: |
527 | + score = get_score(max_seconds) |
528 | + if not hookenv.config('auto_peers') or 'raw' not in score: |
529 | + return None |
530 | + return 'score %.3f (%.1f) at %s' % ( |
531 | + score['score'], |
532 | + score['multiplier'] / score['divisor'], |
533 | + time.ctime(score['time']) |
534 | + ) |
535 | diff --git a/hooks/ntp_source_score.py b/hooks/ntp_source_score.py |
536 | new file mode 100755 |
537 | index 0000000..71b1cef |
538 | --- /dev/null |
539 | +++ b/hooks/ntp_source_score.py |
540 | @@ -0,0 +1,198 @@ |
541 | +#!/usr/bin/python3 |
542 | + |
543 | +# Copyright (c) 2017 Canonical Ltd |
544 | +# Author: Paul Gear |
545 | + |
546 | +# This module runs ntpdate in test mode against the provided list of sources |
547 | +# in order to determine this node's suitability as an NTP server, based on the |
548 | +# number of reachable sources, and the network delay in reaching them. Up to |
549 | +# MAX_THREADS (default 32) threads will be spawned to run ntpdate, in order |
550 | +# to minimise the time taken to calculate a score. |
551 | + |
552 | +# A main method is included to allow this module to be called separately from |
553 | +# juju hooks for diagnostic purposes. It has no dependencies on juju, |
554 | +# charmhelpers, or the other modules in this charm. |
555 | + |
556 | +import argparse |
557 | +import math |
558 | +import queue |
559 | +import random |
560 | +import statistics |
561 | +import subprocess |
562 | +import threading |
563 | +import time |
564 | + |
565 | +rand = random.SystemRandom() |
566 | +MAX_THREADS = 32 |
567 | + |
568 | + |
569 | +def rms(l): |
570 | + """Return the root mean square of the list""" |
571 | + if len(l) > 0: |
572 | + squares = [x ** 2 for x in l] |
573 | + return math.sqrt(statistics.mean(squares)) |
574 | + else: |
575 | + return float('nan') |
576 | + |
577 | + |
578 | +def run_cmd(cmd): |
579 | + """Run the output, return a list of lines returned; ignore errors""" |
580 | + lines = [] |
581 | + try: |
582 | + output = subprocess.check_output(cmd.split(), stderr=subprocess.DEVNULL).decode('UTF-8') |
583 | + lines = output.split('\n') |
584 | + except Exception: |
585 | + pass |
586 | + return lines |
587 | + |
588 | + |
589 | +def get_source_delays(source): |
590 | + """Run ntpdate on the source, which may resolve to multiple addresses; |
591 | + return the list of delay values. This can take several seconds, depending |
592 | + on bandwidth and distance of the sources.""" |
593 | + delays = [] |
594 | + cmd = 'ntpdate -d -t 0.2 ' + source |
595 | + for line in run_cmd(cmd): |
596 | + fields = line.split() |
597 | + if len(fields) >= 2 and fields[0] == 'delay': |
598 | + delay = float(fields[1].split(',')[0]) |
599 | + if delay > 0: |
600 | + delays.append(delay) |
601 | + return delays |
602 | + |
603 | + |
604 | +def worker(num, src, dst, debug=False): |
605 | + """Thread worker for parallelising ntpdate runs. Gets host name |
606 | + from src queue and places host and delay list in dst queue.""" |
607 | + if debug: |
608 | + print('[%d] Starting' % (num,)) |
609 | + while True: |
610 | + host = src.get() |
611 | + if host is None: |
612 | + break |
613 | + |
614 | + # lower-numbered threads sleep for a shorter time, on average |
615 | + s = rand.random() * num / MAX_THREADS |
616 | + if debug: |
617 | + print('[%d] Sleeping %.3f' % (num, s)) |
618 | + time.sleep(s) |
619 | + |
620 | + if debug: |
621 | + print('[%d] Getting results for [%s]' % (num, host)) |
622 | + delays = get_source_delays(host) |
623 | + src.task_done() |
624 | + if len(delays): |
625 | + result = (host, delays) |
626 | + dst.put(result) |
627 | + |
628 | + |
629 | +def get_delay_score(delay): |
630 | + """Take a delay in seconds and return a score. Under most sane NTP setups |
631 | + will return a value between 0 and 10, where 10 is better and 0 is worse.""" |
632 | + return -math.log(delay) |
633 | + |
634 | + |
635 | +def start_workers(threads, num_threads, src, dst, debug=False): |
636 | + """Start all of the worker threads.""" |
637 | + for i in range(num_threads): |
638 | + t = threading.Thread(target=worker, args=(i, src, dst, debug)) |
639 | + t.start() |
640 | + threads.append(t) |
641 | + |
642 | + |
643 | +def stop_workers(threads, src): |
644 | + """Send the workers a None object, causing them to stop work. |
645 | + We enqueue one stop object for each worker.""" |
646 | + for i in range(len(threads)): |
647 | + src.put(None) |
648 | + |
649 | + |
650 | +def calculate_score(delays): |
651 | + """Return the rms, mean, standard deviation, and overall |
652 | + score for the passed list of delay values.""" |
653 | + score = 0 |
654 | + if len(delays) > 0: |
655 | + r = rms(delays) |
656 | + m = statistics.mean(delays) |
657 | + s = statistics.pstdev(delays, m) |
658 | + source_score = get_delay_score(r) |
659 | + score += source_score |
660 | + else: |
661 | + r = m = s = score = 0 |
662 | + return (r, m, s, score) |
663 | + |
664 | + |
665 | +def calculate_results(q, verbose=False): |
666 | + """Get the scores for all the hosts. |
667 | + Return a dict mapping each host to its cumulative score.""" |
668 | + results = {} |
669 | + while not q.empty(): |
670 | + (host, delays) = q.get() |
671 | + (rms, mean, stdev, score) = calculate_score(delays) |
672 | + delaystrings = [str(x) for x in delays] |
673 | + if verbose: |
674 | + print('%s score=%.3f rms=%.3f mean=%.3f stdevp=%.3f [%s]' % |
675 | + (host, score, rms, mean, stdev, ", ".join(delaystrings))) |
676 | + if host in results: |
677 | + results[host] += score |
678 | + else: |
679 | + results[host] = score |
680 | + return results |
681 | + |
682 | + |
683 | +def wait_workers(threads): |
684 | + """Wait for the given list of threads to complete.""" |
685 | + for t in threads: |
686 | + t.join() |
687 | + |
688 | + |
689 | +def run_checks(hosts, debug=False, numthreads=None, verbose=False): |
690 | + """Perform a check of the listed hosts. |
691 | + Can take several seconds per host.""" |
692 | + sources = queue.Queue() |
693 | + results = queue.Queue() |
694 | + threads = [] |
695 | + for h in hosts: |
696 | + sources.put(h) |
697 | + if numthreads is None: |
698 | + numthreads = min(len(hosts), MAX_THREADS) |
699 | + start_workers(threads, numthreads, sources, results, debug) |
700 | + sources.join() |
701 | + stop_workers(threads, sources) |
702 | + # wait_workers(threads) |
703 | + return calculate_results(results, verbose) |
704 | + |
705 | + |
706 | +def get_source_score(hosts, debug=False, numthreads=None, verbose=False): |
707 | + """Check NTP connectivity to the given list of sources - return a single overall score""" |
708 | + results = run_checks(hosts, debug, numthreads, verbose) |
709 | + if results is None: |
710 | + return 0 |
711 | + |
712 | + total = 0 |
713 | + for host in results: |
714 | + total += results[host] |
715 | + return total |
716 | + |
717 | + |
718 | +def display_results(results): |
719 | + """Sort the results dict by score, highest first, and print them.""" |
720 | + # http://stackoverflow.com/a/2258273 |
721 | + result = sorted(results.items(), key=lambda x: x[1], reverse=True) |
722 | + for i in result: |
723 | + print("%s %.3f" % (i[0], i[1])) |
724 | + |
725 | + |
726 | +def get_args(): |
727 | + parser = argparse.ArgumentParser(description='Get NTP server/peer/pool scores') |
728 | + parser.add_argument('--debug', '-d', action='store_true', help='Enable thread debug output') |
729 | + parser.add_argument('--verbose', '-v', action='store_true', help='Display scoring detail') |
730 | + parser.add_argument('hosts', nargs=argparse.REMAINDER, help='List of hosts to check') |
731 | + return parser.parse_args() |
732 | + |
733 | + |
734 | +if __name__ == '__main__': |
735 | + args = get_args() |
736 | + results = run_checks(args.hosts, debug=args.debug, verbose=args.verbose) |
737 | + if results: |
738 | + display_results(results) |
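End to end, the new module's per-host scoring math is compact: take the RMS of the measured delays and score it as `-log(rms)`. A minimal standalone sketch of that arithmetic (mirroring `rms()`, `calculate_score()`, and `get_delay_score()` from the hunk above; not a drop-in for the charm code):

```python
import math

def rms(values):
    # root mean square; NaN for an empty list, as the unit tests expect
    if not values:
        return float('nan')
    return math.sqrt(sum(v * v for v in values) / len(values))

def score_host(delays):
    # mirrors calculate_score(): -log of the RMS delay, 0 with no samples
    if not delays:
        return 0.0
    return -math.log(rms(delays))

# the six delays parsed from the sample ntpdate output in the unit tests
delays = [0.36655, 0.36487, 0.28406, 0.37044, 0.36781, 0.36554]
print('score = %.3f' % score_host(delays))
```

Lower RMS delay yields a higher score, so nearby, consistent sources dominate when auto_peers picks upstreams.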
739 | diff --git a/tests/10-deploy-test.py b/tests/10-deploy-test.py |
740 | index db65f8c..efa5d99 100755 |
741 | --- a/tests/10-deploy-test.py |
742 | +++ b/tests/10-deploy-test.py |
743 | @@ -39,7 +39,7 @@ except amulet.helpers.TimeoutError: |
744 | message = 'The environment did not setup in %d seconds.' % seconds |
745 | # The SKIP status enables skip or fail the test based on configuration. |
746 | amulet.raise_status(amulet.SKIP, msg=message) |
747 | -except: |
748 | +except Exception: |
749 | raise |
750 | |
751 | # Unable to get the sentry unit for ntp because it is a subordinate. |
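The tests/10-deploy-test.py change swaps a bare `except:` for `except Exception:`. Because this handler only re-raises, behaviour is identical here, but the narrower form (which is also what flake8's E722 asks for) matters in handlers that do real work, since a bare except also intercepts BaseException subclasses such as KeyboardInterrupt and SystemExit. A toy illustration of the difference:

```python
caught_by_handler = False
try:
    try:
        raise KeyboardInterrupt  # BaseException, not an Exception subclass
    except Exception:
        caught_by_handler = True  # never reached
except KeyboardInterrupt:
    pass  # the interrupt propagates past the narrower handler

print(caught_by_handler)
```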
752 | diff --git a/unit_tests/test_ntp_hooks.py b/unit_tests/test_ntp_hooks.py |
753 | new file mode 100644 |
754 | index 0000000..e69de29 |
755 | --- /dev/null |
756 | +++ b/unit_tests/test_ntp_hooks.py |
757 | diff --git a/unit_tests/test_ntp_scoring.py b/unit_tests/test_ntp_scoring.py |
758 | new file mode 100644 |
759 | index 0000000..5840252 |
760 | --- /dev/null |
761 | +++ b/unit_tests/test_ntp_scoring.py |
762 | @@ -0,0 +1,106 @@ |
763 | +#!/usr/bin/env python3 |
764 | + |
765 | +from random import shuffle |
766 | +from unittest.mock import Mock, patch |
767 | +import unittest |
768 | + |
769 | +import ntp_scoring |
770 | + |
771 | + |
772 | +class TestNtpScoring(unittest.TestCase): |
773 | + |
774 | + def setUp(self): |
775 | + patcher = patch('charmhelpers.core.hookenv.log') |
776 | + patcher.start() |
777 | + self.addCleanup(patcher.stop) |
778 | + patcher = patch('charmhelpers.core.unitdata.kv') |
779 | + patcher.start() |
780 | + self.addCleanup(patcher.stop) |
781 | + |
782 | + @patch('ntp_source_score.run_cmd') |
783 | + def testGetVirtTypeValues(self, run_cmd): |
784 | + def virt_test(expected, return_value): |
785 | + run_cmd.return_value = [return_value] |
786 | + self.assertEqual(ntp_scoring.get_virt_type(), expected) |
787 | + run_cmd.assert_called_once_with('facter virtual') |
788 | + run_cmd.reset_mock() |
789 | + |
790 | + virt_test('container', 'docker') |
791 | + virt_test('container', 'lxc') |
792 | + virt_test('container', 'openvz') |
793 | + virt_test('physical', 'physical') |
794 | + virt_test('physical', 'xen0') |
795 | + virt_test('vm', '') |
796 | + virt_test('vm', []) |
797 | + virt_test('vm', 1.23) |
798 | + virt_test('vm', 'a') |
799 | + virt_test('vm', 'kvm') |
800 | + virt_test('vm', None) |
801 | + virt_test('vm', 'something-else') |
802 | + virt_test('vm', 'The quick brown fox jumps over the lazy dogs') |
803 | + |
804 | + @patch('ntp_source_score.run_cmd') |
805 | + def testGetVirtTypeEmptyList(self, run_cmd): |
806 | + run_cmd.return_value = [] |
807 | + self.assertEqual(ntp_scoring.get_virt_type(), 'vm') |
808 | + run_cmd.assert_called_once_with('facter virtual') |
809 | + |
810 | + @patch('ntp_source_score.run_cmd') |
811 | + def testGetVirtTypeWrongType(self, run_cmd): |
812 | + run_cmd.return_value = {} |
813 | + self.assertEqual(ntp_scoring.get_virt_type(), 'vm') |
814 | + run_cmd.assert_called_once_with('facter virtual') |
815 | + |
816 | + @patch('ntp_source_score.run_cmd') |
817 | + def testGetVirtMultiplier(self, run_cmd): |
818 | + def multiplier_test(expected, return_value): |
819 | + run_cmd.return_value = [return_value] |
820 | + self.assertEqual(ntp_scoring.get_virt_multiplier(), expected) |
821 | + run_cmd.assert_called_once_with('facter virtual') |
822 | + run_cmd.reset_mock() |
823 | + |
824 | + multiplier_test(-1, 'docker') |
825 | + multiplier_test(-1, 'lxc') |
826 | + multiplier_test(-1, 'openvz') |
827 | + multiplier_test(1.25, 'physical') |
828 | + multiplier_test(1.25, 'xen0') |
829 | + multiplier_test(1, '') |
830 | + multiplier_test(1, []) |
831 | + multiplier_test(1, 1.23) |
832 | + multiplier_test(1, 'a') |
833 | + multiplier_test(1, 'kvm') |
834 | + multiplier_test(1, None) |
835 | + multiplier_test(1, 'something-else') |
836 | + multiplier_test(1, 'The quick brown fox jumps over the lazy dogs') |
837 | + |
838 | + def testGetPackageDivisor(self): |
839 | + |
840 | + def test_divisor(expected, pslist, precision=6): |
841 | + def fake_pslist(): |
842 | + """yield a list of objects for which name() returns the given list""" |
843 | + shuffle(pslist) |
844 | + for p in pslist: |
845 | + m = Mock() |
846 | + m.name.return_value = p |
847 | + yield m |
848 | + |
849 | + with patch('psutil.process_iter', side_effect=fake_pslist): |
850 | + divisor = round(ntp_scoring.get_package_divisor(), precision) |
851 | + self.assertEqual(round(expected, precision), divisor) |
852 | + |
853 | + with self.assertRaises(TypeError): |
854 | + test_divisor(1, None) |
855 | + |
856 | + test_divisor(1, []) |
857 | + test_divisor(1, ['a', 'b', 'c']) |
858 | + test_divisor(1, 'The quick brown fox jumps over the lazy dogs'.split()) |
859 | + test_divisor(1.1, 'The quick brown fox jumps over the lazy dogs'.split() + ['swift-1']) |
860 | + test_divisor(1.1, ['swift-1']) |
861 | + test_divisor(1.1, ['ceph-1', 'ceph-2']) |
862 | + test_divisor(1.25, ['ceph-osd-1', 'ceph-osd-2', 'ceph-osd-3']) |
863 | + test_divisor(1.25, ['nova-compute-1', 'nova-compute-2', 'nova-compute-3', 'nova-compute-4']) |
864 | + test_divisor(1.1 * 1.25, ['swift-1', 'nova-compute-2']) |
865 | + test_divisor(1.1 * 1.25, ['systemd', 'bind', 'swift-1', 'nova-compute-2', 'test']) |
866 | + test_divisor(1.1 * 1.25 * 1.1, ['swift-1', 'nova-compute-2', 'ceph-3']) |
867 | + test_divisor(1.1 * 1.25 * 1.25, ['swift-1', 'nova-compute-2', 'ceph-osd-3']) |
868 | + test_divisor(1.1 * 1.25 * 1.1 * 1.25, ['swift-1', 'nova-compute-2', 'ceph-3', 'ceph-osd-4']) |
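The divisor behaviour these tests pin down can be read off directly: `swift-*` and plain `ceph-*` process names each contribute a 1.1 factor, `ceph-osd-*` and `nova-compute-*` a 1.25 factor, each category counts at most once however many matching processes run, and the factors multiply. A hypothetical reimplementation consistent with the expectations above (the real logic lives in `ntp_scoring.get_package_divisor`, which is not shown in this diff; the prefix/factor table here is inferred from the tests alone):

```python
# prefix -> factor table inferred from the unit tests (most specific first)
FACTORS = [
    ('nova-compute', 1.25),
    ('ceph-osd', 1.25),
    ('ceph', 1.1),   # plain ceph, checked only after ceph-osd fails
    ('swift', 1.1),
]

def package_divisor(process_names):
    """Multiply one factor per matched category, ignoring duplicates."""
    matched = set()
    for name in process_names:
        for prefix, _ in FACTORS:
            if name.startswith(prefix):
                matched.add(prefix)
                break
    divisor = 1.0
    for prefix, factor in FACTORS:
        if prefix in matched:
            divisor *= factor
    return divisor

print(package_divisor(['swift-1', 'nova-compute-2', 'ceph-3', 'ceph-osd-4']))
```

Presumably a larger divisor lowers the host's final score, biasing auto_peers away from busy storage and compute hosts.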
869 | diff --git a/unit_tests/test_ntp_source_score.py b/unit_tests/test_ntp_source_score.py |
870 | new file mode 100644 |
871 | index 0000000..378faa0 |
872 | --- /dev/null |
873 | +++ b/unit_tests/test_ntp_source_score.py |
874 | @@ -0,0 +1,167 @@ |
875 | +#!/usr/bin/env python3 |
876 | + |
877 | +from unittest.mock import patch |
878 | +import math |
879 | +import unittest |
880 | + |
881 | +from ntp_source_score import ( |
882 | + get_delay_score, |
883 | + get_source_delays, |
884 | + rms, |
885 | + run_cmd, |
886 | +) |
887 | + |
888 | +ntpdate_output = """ |
889 | +... |
890 | +reference time: dda179ee.3ec34fdd Mon, Oct 30 2017 20:14:06.245 |
891 | +originate timestamp: dda17a5b.af7c528b Mon, Oct 30 2017 20:15:55.685 |
892 | +transmit timestamp: dda17a5b.80b4dc04 Mon, Oct 30 2017 20:15:55.502 |
893 | +filter delay: 0.54126 0.36757 0.36655 0.36743 |
894 | + 0.00000 0.00000 0.00000 0.00000 |
895 | +filter offset: 0.099523 0.012978 0.011831 0.011770 |
896 | + 0.000000 0.000000 0.000000 0.000000 |
897 | +delay 0.36655, dispersion 0.01126 |
898 | +offset 0.011831 |
899 | +... |
900 | +reference time: dda17695.69e65b2f Mon, Oct 30 2017 19:59:49.413 |
901 | +originate timestamp: dda17a5b.afcec2dd Mon, Oct 30 2017 20:15:55.686 |
902 | +transmit timestamp: dda17a5b.80bb2488 Mon, Oct 30 2017 20:15:55.502 |
903 | +filter delay: 0.36520 0.36487 0.36647 0.36604 |
904 | + 0.00000 0.00000 0.00000 0.00000 |
905 | +filter offset: 0.012833 0.013758 0.013731 0.013629 |
906 | + 0.000000 0.000000 0.000000 0.000000 |
907 | +delay 0.36487, dispersion 0.00049 |
908 | +offset 0.013758 |
909 | +... |
910 | +reference time: dda1782c.6aec9646 Mon, Oct 30 2017 20:06:36.417 |
911 | +originate timestamp: dda17a5b.d2d04ef4 Mon, Oct 30 2017 20:15:55.823 |
912 | +transmit timestamp: dda17a5b.b37c4098 Mon, Oct 30 2017 20:15:55.701 |
913 | +filter delay: 0.28581 0.28406 0.28551 0.28596 |
914 | + 0.00000 0.00000 0.00000 0.00000 |
915 | +filter offset: -0.00802 -0.00854 -0.00791 -0.00787 |
916 | + 0.000000 0.000000 0.000000 0.000000 |
917 | +delay 0.28406, dispersion 0.00050 |
918 | +offset -0.008544 |
919 | +... |
920 | +reference time: dda17735.4a03e3ca Mon, Oct 30 2017 20:02:29.289 |
921 | +originate timestamp: dda17a5c.1634d231 Mon, Oct 30 2017 20:15:56.086 |
922 | +transmit timestamp: dda17a5b.e6934fad Mon, Oct 30 2017 20:15:55.900 |
923 | +filter delay: 0.37044 0.37077 0.37050 0.37086 |
924 | + 0.00000 0.00000 0.00000 0.00000 |
925 | +filter offset: 0.013993 0.013624 0.013425 0.013362 |
926 | + 0.000000 0.000000 0.000000 0.000000 |
927 | +delay 0.37044, dispersion 0.00046 |
928 | +offset 0.013993 |
929 | +... |
930 | +reference time: dda17695.69e65b2f Mon, Oct 30 2017 19:59:49.413 |
931 | +originate timestamp: dda17a5c.4944bb52 Mon, Oct 30 2017 20:15:56.286 |
932 | +transmit timestamp: dda17a5c.19cf5199 Mon, Oct 30 2017 20:15:56.100 |
933 | +filter delay: 0.36873 0.36823 0.36911 0.36781 |
934 | + 0.00000 0.00000 0.00000 0.00000 |
935 | +filter offset: 0.014635 0.014599 0.014166 0.014239 |
936 | + 0.000000 0.000000 0.000000 0.000000 |
937 | +delay 0.36781, dispersion 0.00026 |
938 | +offset 0.014239 |
939 | +... |
940 | +reference time: dda179ee.3ec34fdd Mon, Oct 30 2017 20:14:06.245 |
941 | +originate timestamp: dda17a5c.7bbd3828 Mon, Oct 30 2017 20:15:56.483 |
942 | +transmit timestamp: dda17a5c.4cf92e99 Mon, Oct 30 2017 20:15:56.300 |
943 | +filter delay: 0.36554 0.36617 0.36673 0.36618 |
944 | + 0.00000 0.00000 0.00000 0.00000 |
945 | +filter offset: 0.012466 0.012691 0.012863 0.012346 |
946 | + 0.000000 0.000000 0.000000 0.000000 |
947 | +delay 0.36554, dispersion 0.00018 |
948 | +offset 0.012466 |
949 | +... |
950 | +""" |
951 | +ntpdate_delays = [0.36655, 0.36487, 0.28406, 0.37044, 0.36781, 0.36554] |
952 | + |
953 | + |
954 | +class TestNtpSourceScore(unittest.TestCase): |
955 | + |
956 | + def test_rms(self): |
957 | + self.assertEqual(rms([0, 0, 0, 0, 0]), 0) |
958 | + self.assertEqual(rms([0, 1, 0, 1, 0]), math.sqrt(0.4)) |
959 | + self.assertEqual(rms([1, 1, 1, 1, 1]), 1) |
960 | + self.assertEqual(rms([1, 2, 3, 4, 5]), math.sqrt(11)) |
961 | + self.assertEqual(rms([0.01, 0.02]), math.sqrt(0.00025)) |
962 | + self.assertEqual(rms([0.02766, 0.0894, 0.02657, 0.02679]), math.sqrt(0.00254527615)) |
963 | + self.assertEqual(rms([80, 50, 30]), math.sqrt(3266.66666666666666667)) |
964 | + self.assertEqual(rms([81, 53, 32]), math.sqrt(3464.66666666666666667)) |
965 | + self.assertEqual(rms([81.1, 53.9, 32.3]), math.sqrt(3508.57)) |
966 | + self.assertEqual(rms([81.14, 53.93, 32.30]), math.sqrt(3511.8115)) |
967 | + self.assertEqual(rms([81.141, 53.935, 32.309]), math.sqrt(3512.23919566666666667)) |
968 | + self.assertTrue(math.isnan(rms([]))) |
969 | + with self.assertRaises(TypeError): |
970 | + rms(['a', 'b', 'c']) |
971 | + |
972 | + @patch('subprocess.check_output') |
973 | + def test_run_cmd(self, patched): |
974 | + patched.return_value = b'a\nb\nc\n' |
975 | + self.assertEqual(run_cmd('ls'), ['a', 'b', 'c', '']) |
976 | + |
977 | + patched.return_value = b'4.13.0-14-generic\n' |
978 | + self.assertEqual(run_cmd('uname -r'), ['4.13.0-14-generic', '']) |
979 | + |
980 | + self.assertEqual(patched.call_count, 2) |
981 | + |
982 | + def test_get_source_delays(self): |
983 | + |
984 | + @patch('ntp_source_score.run_cmd') |
985 | + def test_source_delay(data, expect, patched): |
986 | + patched.return_value = data |
987 | + self.assertEqual(get_source_delays('ntp.example.com'), expect) |
988 | + patched.assert_called_once_with('ntpdate -d -t 0.2 ntp.example.com') |
989 | + |
990 | + @patch('ntp_source_score.run_cmd') |
991 | + def test_source_delay_error(data, e, patched): |
992 | + patched.return_value = data |
993 | + with self.assertRaises(e): |
994 | + get_source_delays('ntp.example.com') |
995 | + patched.assert_called_once_with('ntpdate -d -t 0.2 ntp.example.com') |
996 | + |
997 | + test_source_delay([], []) |
998 | + test_source_delay('', []) |
999 | + test_source_delay('123', []) |
1000 | + test_source_delay(['123 678', '234 asdf', 'yaled 345 901'], []) |
1001 | + test_source_delay(['123 678', 'delay 345 901', '234 asdf'], [345]) |
1002 | + test_source_delay(['delay 123 678', 'delay 234 asdf', 'delay 345 901'], [123, 234, 345]) |
1003 | + test_source_delay(ntpdate_output.split('\n'), ntpdate_delays) |
1004 | + |
1005 | + test_source_delay_error(None, TypeError) |
1006 | + test_source_delay_error(123, TypeError) |
1007 | + |
1008 | + def test_get_delay_score_error(self): |
1009 | + # You can't have a negative or zero response time |
1010 | + with self.assertRaises(ValueError): |
1011 | + get_delay_score(-100) |
1012 | + with self.assertRaises(ValueError): |
1013 | + get_delay_score(-1) |
1014 | + with self.assertRaises(ValueError): |
1015 | + get_delay_score(-0.1) |
1016 | + with self.assertRaises(ValueError): |
1017 | + get_delay_score(0) |
1018 | + |
1019 | + def test_get_delay_scores(self): |
1020 | + scores = [ |
1021 | + get_delay_score(0.001), # 1ms delay |
1022 | + get_delay_score(0.01), |
1023 | + get_delay_score(0.025), |
1024 | + get_delay_score(0.05), |
1025 | + get_delay_score(0.1), |
1026 | + get_delay_score(0.333), # anything beyond this should never happen |
1027 | + get_delay_score(0.999), |
1028 | + get_delay_score(1), |
1029 | + get_delay_score(3), # 3s delay - probably on the moon |
1030 | + get_delay_score(10), |
1031 | + get_delay_score(9999), # 2.79h delay - are you orbiting Saturn or something? |
1032 | + ] |
1033 | + |
1034 | + for i in range(len(scores)): |
1035 | + # all lower delays should get a higher score |
1036 | + for higher in range(i): |
1037 | + self.assertLess(scores[i], scores[higher]) |
1038 | + # all higher delays should get a lower score |
1039 | + if i < len(scores) - 1: |
1040 | + for lower in range(i + 1, len(scores)): |
1041 | + self.assertGreater(scores[i], scores[lower]) |
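For reference, the parsing contract these tests encode is small: each ntpdate output line whose first whitespace-separated field is `delay` contributes its second field, truncated at the first comma, as a float, and non-positive values are dropped. A standalone sketch of that rule (a slightly defensive paraphrase of `get_source_delays`, with the `ntpdate` invocation itself left out; the real charm code may handle malformed fields differently):

```python
def parse_delays(lines):
    """Extract positive delay values from ntpdate -d output lines."""
    delays = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == 'delay':
            try:
                # "delay 0.36655, dispersion 0.01126" -> 0.36655
                value = float(fields[1].split(',')[0])
            except ValueError:
                continue  # defensive choice here, an assumption
            if value > 0:
                delays.append(value)
    return delays

print(parse_delays(['delay 0.36655, dispersion 0.01126',
                    'offset 0.011831']))
```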
I will rebase before merge, but the contents shouldn't change from this.