Merge ~lihuiguo/charm-nrpe:bug/1901260 into charm-nrpe:master

Proposed by Linda Guo
Status: Rejected
Rejected by: Haw Loeung
Proposed branch: ~lihuiguo/charm-nrpe:bug/1901260
Merge into: charm-nrpe:master
Diff against target: 142 lines (+108/-0)
4 files modified
config.yaml (+5/-0)
files/plugins/check_snap_refresh.py (+69/-0)
hooks/nrpe_helpers.py (+13/-0)
tests/functional/tests/nrpe_tests.py (+21/-0)
Reviewer                   Review Type                 Date Requested   Status
James Troup (community)                                                 Needs Fixing
Joe Guo (community)                                                     Needs Fixing
BootStack Reviewers        mr tracking; do not claim                    Pending
BootStack Reviewers                                                     Pending
BootStack Reviewers                                                     Pending
Review via email: mp+405155@code.launchpad.net

Commit message

    Added snap refresh check

    check_snap_refresh.py to check if snap refresh is successful
    Alerts if snap refresh failed.

    LP#1901260

Description of the change

Added snap refresh check

Xav Paice (xavpaice) wrote :

1 nit

~lihuiguo/charm-nrpe:bug/1901260 updated
405910a... by Linda Guo

fixed typo

79e6f9e... by Linda Guo

skip if last refresh time is n/a

last refresh time unavailable on newly deployed machine,
skip snap refresh checking

5a9a4b9... by Linda Guo

 trigger a snap refresh on installing the check

run snap refresh when installing the check,
so that newly deployed machines do not start off
with a warning

Joe Guo (guoqiao) wrote :

2 inline comments, only the `/` vs `//` matters.

review: Needs Fixing
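
For readers following the `/` vs `//` remark: in Python 3, `/` is always true division and returns a float, while `//` is floor division and returns an int for int operands. The sketch below shows the difference using the timer value `00:00~24:00/4` quoted in the plugin's comments. The helper name `interval_hours` and its `floor` flag are hypothetical and are not part of the branch.

```python
def interval_hours(timer, floor=True):
    """Return the hours between refreshes for a timer such as '00:00~24:00/4'."""
    frequency = int(timer.split("/")[-1])
    # '//' keeps the result an int; '/' always produces a float in Python 3.
    return 24 // frequency if floor else 24 / frequency


int_hours = interval_hours("00:00~24:00/4")
float_hours = interval_hours("00:00~24:00/4", floor=False)

# The float variant leaks into the alert text as "6.0 hours".
print("more than {} hours".format(int_hours))
print("more than {} hours".format(float_hours))
```

Both values work as `timedelta(hours=...)` arguments, so the visible effect is mainly in the formatted alert message.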
~lihuiguo/charm-nrpe:bug/1901260 updated
950d4fc... by Linda Guo

Updated according to Joe's comment

Linda Guo (lihuiguo) wrote :

Fixed Joe's comments

James Troup (elmo) wrote :

I think we need to be more careful in parsing the timer output; I just checked a random etcd unit and saw this:

ubuntu@juju-647be6-10-lxd-0:~$ snap refresh --abs-time --time
timer: wed2
last: 2021-06-09T00:02:00Z
next: 2021-07-14T00:00:00Z
ubuntu@juju-647be6-10-lxd-0:~$

See also: https://snapcraft.io/docs/timer-string-format

review: Needs Fixing
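
To illustrate the concern with the output quoted above: the branch's `get_interval` cannot tell `wed2` apart from a four-times-a-day timer, because `int("wed2")` raises and the helper silently falls back to 6 hours. Separately, the `last:` value ends in `Z` rather than a `+HH:MM` offset, so the plugin's `split('+')` followed by `strptime` would raise on it and land in the generic `except` branch. The sketch below reproduces the first problem and shows one possible way to parse both timestamp styles into timezone-aware values. It assumes Python 3.7 or later for `datetime.fromisoformat`; `parse_snap_time` is a hypothetical helper, not part of the branch, and the sketch does not attempt to implement the full timer string format.

```python
from datetime import datetime


def get_interval(timer):
    """Copy of the branch's helper: hours between refreshes."""
    try:
        frequency = timer.split("/")[-1]
        interval = 24 // int(frequency)
    except Exception:
        # Silent fallback: any timer without a numeric "/N" suffix ends up here.
        interval = 6
    return interval


def parse_snap_time(value):
    """Parse an --abs-time timestamp into an aware datetime, or None for n/a."""
    value = value.strip()
    if value == "n/a":
        return None
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 onwards,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


# Both timers yield the same interval, although "wed2" is a monthly schedule.
print(get_interval("00:00~24:00/4"))
print(get_interval("wed2"))

last = parse_snap_time("2021-06-09T00:02:00Z")
nxt = parse_snap_time("2021-07-14T00:00:00Z")
# The real gap between refreshes on this unit is about five weeks.
print((nxt - last).days)
```

Comparing aware datetimes also avoids mixing a local-time `last` value with a naive `datetime.now()`.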
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Unmerged commits

950d4fc... by Linda Guo

Updated according to Joe's comment

5a9a4b9... by Linda Guo

 trigger a snap refresh on installing the check

run snap refresh when installing the check,
so that newly deployed machines do not start off
with a warning

79e6f9e... by Linda Guo

skip if last refresh time is n/a

last refresh time unavailable on newly deployed machine,
skip snap refresh checking

405910a... by Linda Guo

fixed typo

4750431... by Linda Guo

fixed typo

c25bd0d... by Linda Guo

Merge branch 'master' of git+ssh://git.launchpad.net/charm-nrpe into bug/1901260

ab631fc... by Linda Guo

Added snap refresh check

check_snap_refresh.py to check if snap refresh is successful
Alerts if snap refresh failed.

LP#1901260

Preview Diff

diff --git a/config.yaml b/config.yaml
index 58ff320..a4e1844 100644
--- a/config.yaml
+++ b/config.yaml
@@ -193,3 +193,8 @@ options:
       beginning with the string /sys.
       The check is disabled on all LXD units, and also for non-container units if this parameter is
       set to ''.
+  snap_refresh_check:
+    default: False
+    type: boolean
+    description: |
+      To enable or disable snap refresh failure check.
\ No newline at end of file
diff --git a/files/plugins/check_snap_refresh.py b/files/plugins/check_snap_refresh.py
new file mode 100755
index 0000000..348cc60
--- /dev/null
+++ b/files/plugins/check_snap_refresh.py
@@ -0,0 +1,69 @@
+#!/usr/bin/env python3
+"""Check snap refresh failure and alert."""
+#
+# Copyright 2021 Canonical Ltd
+#
+#
+# Check snap refresh and alert.
+#
+import datetime
+import os
+import subprocess
+import sys
+
+
+def get_interval(timer):
+    """Get snap refresh interval."""
+    try:
+        frequency = timer.split("/")[-1]
+        interval = 24 // int(frequency)
+    except Exception:
+        # default 6 hours
+        interval = 6
+    return interval
+
+
+def main():
+    """Check snap refresh is successful or not."""
+    try:
+        # check if snap is installed. If not, then skip.
+        if not os.path.exists('/usr/bin/snap'):
+            print("OK: snap isn't installed")
+            sys.exit(0)
+        refresh_time = {}
+        # snap refresh output looks like
+        # $snap refresh --abs-time --time
+        # timer: 00:00~24:00/4
+        # last: 2021-03-29T15:34:00+13:00
+        # next: 2021-03-29T22:24:00+13:00
+        lines = subprocess.getoutput("snap refresh --abs-time --time").splitlines()
+        if len(lines) < 3:
+            print("WARNING: snap command returned unexpected result")
+            sys.exit(2)
+        for line in lines:
+            k, v = line.split(":", 1)
+            refresh_time[k] = v
+        interval = get_interval(refresh_time['timer'])
+        last_refresh = refresh_time['last'].split('+')[0].strip().replace('T', ' ')
+        # last refresh time unavailable, maybe the machine was just installed
+        # charm would run 'snap refresh' when installing this check, ignore here
+        if last_refresh == "n/a":
+            print("OK: last snap refresh time is unavailable.")
+            sys.exit(0)
+        last_refresh_dtime = datetime.datetime.strptime(last_refresh,
+                                                        '%Y-%m-%d %H:%M:%S')
+        expect_refresh_dtime = last_refresh_dtime + datetime.timedelta(hours=interval)
+        if datetime.datetime.now() > expect_refresh_dtime:
+            print("CRITICAL: snap hasn't been refreshed for more than {} hours!"
+                  .format(interval))
+            sys.exit(2)
+        else:
+            print("OK: next snap refresh time {}".format(refresh_time['next']))
+            sys.exit(0)
+    except Exception as e:
+        print("WARNING: {}".format(str(e)))
+        sys.exit(2)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/hooks/nrpe_helpers.py b/hooks/nrpe_helpers.py
index e4b51fd..240925b 100644
--- a/hooks/nrpe_helpers.py
+++ b/hooks/nrpe_helpers.py
@@ -507,6 +507,19 @@ class SubordinateCheckDefinitions(dict):
             }
             checks.append(check_ro_filesystem)
 
+        if hookenv.config("snap_refresh_check"):
+            snap_refresh_check = {
+                "description": "Snap refresh check",
+                "cmd_name": "check_snap_refresh",
+                "cmd_exec": local_plugin_dir + "check_snap_refresh.py",
+                "cmd_params": " ",
+            }
+            checks.append(snap_refresh_check)
+            try:
+                subprocess.check_call(["sudo", "snap", "refresh"])
+            except subprocess.CalledProcessError as err:
+                hookenv.log(err.output)
+
         if hookenv.config("lacp_bonds").strip():
             for bond_iface in hookenv.config("lacp_bonds").strip().split():
                 if os.path.exists("/sys/class/net/{}".format(bond_iface)):
diff --git a/tests/functional/tests/nrpe_tests.py b/tests/functional/tests/nrpe_tests.py
index f887fdb..5f09f42 100644
--- a/tests/functional/tests/nrpe_tests.py
+++ b/tests/functional/tests/nrpe_tests.py
@@ -224,3 +224,24 @@ class TestNrpe(TestBase):
             raise model.CommandRunFailed(cmd, result)
         content = result.get("Stdout")
         self.assertTrue(line in content)
+
+    def test_06_snap_refresh(self):
+        """Check snap refresh checks are applied."""
+        model.set_application_config(self.application_name,
+                                     {"snap_refresh_check": "True"})
+        model.block_until_all_units_idle()
+        cmd = "cat /etc/nagios/nrpe.d/check_snap_refresh.cfg"
+        line = (
+            "command[check_snap_refresh]=/usr/local/lib/nagios/plugins/"
+            "check_snap_refresh.py"
+        )
+        result = model.run_on_unit(self.lead_unit_name, cmd)
+        code = result.get("Code")
+        if code != "0":
+            logging.warning(
+                "Unable to find nrpe check at "
+                "/etc/nagios/nrpe.d/check_snap_refresh.cfg"
+            )
+            raise model.CommandRunFailed(cmd, result)
+        content = result.get("Stdout")
+        self.assertTrue(line in content)
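
One detail in the plugin above is that two code paths print a `WARNING:` label while exiting with status 2, which Nagios conventionally treats as CRITICAL (0 is OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch of one way to keep the label and the exit status in step follows. The names `format_status`, `report`, and `LABELS` are hypothetical and do not appear in the branch, and mapping an unparseable result to UNKNOWN is an assumption about the desired behaviour, not something the reviewers asked for.

```python
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def format_status(code, message):
    """Build the one-line plugin output from the exit code."""
    return "{}: {}".format(LABELS[code], message)


def report(code, message):
    """Print the status line and exit with the matching code."""
    print(format_status(code, message))
    sys.exit(code)


# Hypothetical mapping: an unparseable result is reported as UNKNOWN.
print(format_status(UNKNOWN, "snap command returned unexpected result"))
```

Because the label is derived from the code, a call such as `report(CRITICAL, "...")` cannot print one severity and exit with another.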
