Merge ~aieri/charm-prometheus2:lp1891942-properly-reload-prom-process-via-sighup into charm-prometheus2:master

Proposed by Andrea Ieri
Status: Merged
Approved by: Jeremy Lounder
Approved revision: ed9cebd991e402d4d4fd1199ea1bdc872f819b0d
Merged at revision: ed9cebd991e402d4d4fd1199ea1bdc872f819b0d
Proposed branch: ~aieri/charm-prometheus2:lp1891942-properly-reload-prom-process-via-sighup
Merge into: charm-prometheus2:master
Diff against target: 117 lines (+31/-16)
3 files modified
src/reactive/prometheus.py (+8/-12)
src/tests/functional/tests/tests_prometheus.py (+23/-1)
src/tests/unit/test_reactive_prometheus.py (+0/-3)
Reviewer Review Type Date Requested Status
Jeremy Lounder (community) Approve
Drew Freiberger Pending
Jose Guedez Pending
Alvaro Uria Pending
Review via email: mp+390044@code.launchpad.net

This proposal supersedes a proposal from 2020-08-24.

Commit message

Properly reload all Prometheus processes. Fixes LP#1891942.

This solution is very similar to that which was *removed* in 30e80. The removed solution did not identify and reload all Prometheus processes. As per the official Prometheus documentation SIGHUP can be used to reload Prometheus: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

An alternative solution was considered where Prometheus would be reloaded using the HTTP API. This solution was not implemented because it had additional security implications that needed to be considered.

To post a comment you must log in.
Revision history for this message
Jose Guedez (jfguedez) wrote : Posted in a previous version of this proposal

LGTM +1

review: Approve
Revision history for this message
Joel Sing (jsing) : Posted in a previous version of this proposal
Revision history for this message
Alvaro Uria (aluria) wrote : Posted in a previous version of this proposal

By security implications, I understand you mean that "--web.enable-lifecycle" arg is not safe because it would allow anybody to reload and shutdown the Prometheus instance. I see, the reload_prometheus function fix in this MP uses SIGHUP instead of the HTTP API endpoint, which looks good (please review comment from Joel to simplify the SIGHUP oneliner).

I also see that the daemon-args Juju config option [2] is empty by default. This means that the previous approach of using /-/reload could not work by default. Should we notify via status_set when --web.enable-lifecycle is used? I think banning its use would be too agressive, but some users may have set it up and with this fix, the daemon arg option would not be needed anymore.

I'll mark the MP as Needs fixing to use pkill to reload Prometheus. Once fixed, I will run all tests before approval.

Thank you,
-Alvaro.

1. https://prometheus.io/docs/operating/security/
2. https://git.launchpad.net/~exsdev/charm-prometheus2/tree/src/config.yaml?h=lp1891942-properly-reload-prom-process-via-sighup#n2

review: Needs Fixing
Revision history for this message
Drew Freiberger (afreiberger) wrote : Posted in a previous version of this proposal

tested joel's recommendation +1 to update to pkill -HUP -f ...

review: Needs Fixing
Revision history for this message
Drew Freiberger (afreiberger) wrote : Posted in a previous version of this proposal

LGTM!

review: Approve
Revision history for this message
Alvaro Uria (aluria) wrote : Posted in a previous version of this proposal

lint run-test: commands[1] | black --check --exclude '/(\.eggs|\.git|\.tox|\.venv|\.build|dist|charmhelpers|mod)/' .
would reformat /home/circleci/project/charm-prometheus2/src/reactive/prometheus.py
Oh no! 💥 💔 💥
1 file would be reformatted, 3 files would be left unchanged.

review: Needs Fixing
Revision history for this message
Jeremy Lounder (jldev) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/src/reactive/prometheus.py b/src/reactive/prometheus.py
2index b176191..15bcd28 100644
3--- a/src/reactive/prometheus.py
4+++ b/src/reactive/prometheus.py
5@@ -1,30 +1,25 @@
6 """Charm Prometheus hooks."""
7 import glob
8 import grp
9+from hashlib import sha1
10 import os
11+from pathlib import Path
12 import pwd
13 import re
14 import shutil
15 import socket
16 import subprocess
17 import time
18-from hashlib import sha1
19-from pathlib import Path
20 from urllib.parse import urlparse
21
22 from charmhelpers import fetch
23 from charmhelpers.contrib.charmsupport import nrpe
24 from charmhelpers.core import hookenv, host, unitdata
25 from charmhelpers.core.templating import render
26-
27 from charms.layer import snap
28 from charms.reactive import hook, is_state, remove_state, set_state, when, when_not
29 from charms.reactive.helpers import any_file_changed, data_changed
30-
31 from jinja2 import Template
32-
33-import requests
34-
35 import yaml
36
37
38@@ -416,7 +411,9 @@ def write_prometheus_config_yml():
39
40 if config.changed("prometheus_registration_listen"):
41 write_rendered_template(
42- source="promreg.yaml.j2", target=PATHS["promreg_yml"], context=options,
43+ source="promreg.yaml.j2",
44+ target=PATHS["promreg_yml"],
45+ context=options,
46 )
47 hookenv.open_port(config["prometheus_registration_port"])
48 set_state("promreg.do-restart")
49@@ -575,10 +572,9 @@ def restart_prometheus():
50 def reload_prometheus():
51 """Reload prometheus configuration."""
52 if host.service_running(PATHS["service_name"]):
53- # Use /-/reload endpoint to reload config gracefully
54- port = unitdata.kv().get("prometheus.port", 9090)
55- url = "http://127.0.0.1:{}/-/reload".format(port)
56- requests.get(url)
57+ subprocess.call(
58+ "pkill -HUP -f '/snap/prometheus/.*/bin/prometheus '", shell=True
59+ )
60 remove_state("prometheus.do-reload")
61 hookenv.status_set("active", "Ready")
62 else:
63diff --git a/src/tests/functional/tests/tests_prometheus.py b/src/tests/functional/tests/tests_prometheus.py
64index 306b760..9794ce0 100644
65--- a/src/tests/functional/tests/tests_prometheus.py
66+++ b/src/tests/functional/tests/tests_prometheus.py
67@@ -1,9 +1,10 @@
68 #!/usr/bin/env python3
69 """Zaza functional tests for prometheus."""
70+import json
71 import unittest
72
73+import requests
74 import yaml
75-
76 from zaza import model
77 from zaza.charm_lifecycle import utils as lifecycle_utils
78
79@@ -68,3 +69,24 @@ class BasicPrometheusCharmTest(PrometheusCharmTestBase):
80 label_addr = result["Stdout"].strip()
81 telegraf_addr = model.get_lead_unit_ip("telegraf")
82 self.assertEqual(label_addr, telegraf_addr)
83+
84+ def test_service_reload_after_change(self):
85+ """Test service reload."""
86+ model.add_unit("ubuntu", wait_appear=True)
87+ model.block_until_all_units_idle(timeout=600)
88+
89+ action = model.run_on_unit("ubuntu/1", "hostname -f")
90+ new_target = action["Stdout"].strip()
91+
92+ prom_ip = model.get_lead_unit_ip("prometheus2")
93+ prom_port = 9090
94+ url = "http://{}:{}/api/v1/targets".format(prom_ip, prom_port)
95+ response = requests.get(url)
96+ content = json.loads(response.content)
97+
98+ target_found = False
99+ for targets in content["data"]["activeTargets"]:
100+ if targets["labels"].get("dns_name", None) == new_target:
101+ target_found = True
102+
103+ self.assertTrue(target_found)
104diff --git a/src/tests/unit/test_reactive_prometheus.py b/src/tests/unit/test_reactive_prometheus.py
105index 30e1ca9..84e3144 100644
106--- a/src/tests/unit/test_reactive_prometheus.py
107+++ b/src/tests/unit/test_reactive_prometheus.py
108@@ -7,11 +7,8 @@ import unittest
109
110 from charmhelpers.core import hookenv, host, unitdata
111 from charmhelpers.core.templating import render
112-
113 from charms.reactive import bus
114-
115 import mock
116-
117 import yaml
118
119

Subscribers

People subscribed via source and target branches

to all changes: