Merge ~al3jandrosg/charm-telegraf:LP1976521 into charm-telegraf:master
- Git
- lp:~al3jandrosg/charm-telegraf
- LP1976521
- Merge into master
Status: | Needs review | ||||
---|---|---|---|---|---|
Proposed branch: | ~al3jandrosg/charm-telegraf:LP1976521 | ||||
Merge into: | charm-telegraf:master | ||||
Diff against target: |
307 lines (+216/-0) 4 files modified
src/config.yaml (+28/-0) src/reactive/telegraf.py (+144/-0) src/templates/base_inputs.conf (+12/-0) src/tests/unit/test_telegraf.py (+32/-0) |
||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
BootStack Reviewers | Pending | ||
BootStack Reviewers | Pending | ||
Review via email: mp+460245@code.launchpad.net |
Commit message
Adding support for intel_powerstat
This patch adds support for intel_powerstat by:
- fixing the permissions issues when attempting to read
`/sys/devices/
- adding charm configs: `collect_
`package_metrics`, `cpu_metrics`
Fixes [LP#1976521](https:/
Description of the change
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote : | # |
Alejandro Santoyo Gonzalez (al3jandrosg) wrote : | # |
Lint tests failing but that has nothing to do with the patch:
Lint tests without patch applied:
~/MyData/
Running lint checks
lint: install_deps> python -I -m pip install black colorama flake8 flake8-colors flake8-docstrings isort pep8-naming pyproject-flake8
lint: commands[0]> pflake8
lint: commands[1]> black --check --diff --color .
--- /home/alejandro
+++ /home/alejandro
@@ -1736,12 +1736,11 @@
class TestGrafanaDash
@patch(
- def test_register_
- ...
+ def test_register_
@patch(
@patch(
@patch(
@patch(
would reformat /home/alejandro
Oh no! 💥 💔 💥
1 file would be reformatted, 15 files would be left unchanged.
lint: exit 1 (0.81 seconds) /home/alejandro
lint: FAIL code 1 (3.60=setup[
evaluation failed :( (3.66 seconds)
make: *** [Makefile:75: lint] Error 1
after the patch was applied:
alejandro@
Running lint checks
lint: commands[0]> pflake8
lint: commands[1]> black --check --diff --color .
--- /home/alejandro
+++ /home/alejandro
@@ -1736,12 +1736,11 @@
class TestGrafanaDash
@patch(
- def test_register_
- ...
+ def test_register_
@patch(
@patch(
@patch(
@patch(
would reformat /home/alejandro
Oh no! 💥 💔 💥
1 file would be reformatted, 15 files would be left unchanged.
lint: exit 1 (0.83 seconds) /home/alejandro
lint: FAIL code 1 (1.26=setup[
evaluation failed :( (1.30 seconds)
make: *** [Makefile:75: lint] Error 1
Alejandro Santoyo Gonzalez (al3jandrosg) wrote (last edit ): | # |
Lots of func test issues due to charmcraft problems. Eventually
managed to get that going and the Zaza model deployed
correctly with the below versioning on a Focal host:
sudo apt update; sudo snap refresh lxd --channel=
sudo snap install juju --channel=
juju bootstrap localhost controller; juju add-model test
sudo snap install charmcraft --channel=
sudo apt install -y make tox
git clone https:/
make functional
However, this now fails as neutron-gateway is unable to start
because of the below:
Feb 8 10:37:22 juju-32f109-7 ovs-ctl[9436]: modprobe: FATAL: Module openvswitch not found in directory /lib/modules/
However, this is not related to the patch (the steps above
fail with the out-of-the-box charm), hence why I submitted
it as is. I did check that the patch was working, i.e.,
permissions changed correctly, charm config options doing
their job.
Robert Gildein (rgildein) wrote : | # |
In general, it looks fine to me, but I would really like to see more unit tests,
since we do not have proper func either.
Chi Wai CHAN (raychan96) wrote : | # |
It might be related to the environment (lxd cloud) not your patch, because telegraf charm functional test requires openstack related charms, and that may not work well on container. We will try it from our side with VMs. Or if you could use VMs, you can also try it again without waiting for us.
Chi Wai CHAN (raychan96) wrote : | # |
I've tested your patch in our environment with VMs, it doesn't have the issue you mentioned in your functional test. https:/
The charm is in maintenance mode, so you should build and test the charm with the following setup at least (though there might be some other versions still work, I did not test it thoroughly):
```
juju 3.1.7 25751 3.1/stable canonical✓ -
charmcraft 2.0.0 1033 2.0/stable canonical✓ classic
```
and a juju controller at version 3.1.7
Alejandro Santoyo Gonzalez (al3jandrosg) wrote : | # |
Using the recommended versioning but the deployment (with the unchanged charm-telegraf repo, no patch added yet) fails as the charm cannot install:
```
charm 3.0.6 712 latest/stable canonical** classic
charmcraft 2.0.0 1033 2.0/stable canonical** classic
juju 3.1.7 25751 3.1/stable canonical** -
```
status:
Unit Workload Agent Machine Public address Ports Message
ubuntu/0* active idle 0 10.5.1.103
telegraf/0* error idle 10.5.1.103 hook failed: "install"
I'm facing lots of dependency errors. First it complains about:
2024-03-25 15:14:18 WARNING unit.telegraf/
I will upload the unit log.
This is unfortunately taking quite a bit of time. Is the automated/repo func testing going to be fixed any time soon?
Alejandro Santoyo Gonzalez (al3jandrosg) wrote : | # |
2024-03-25 15:17:39 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "install" hook
2024-03-25 15:21:41 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "install" hook
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
2024-03-25 15:21:58 WARNING unit.telegraf/
Alejandro Santoyo Gonzalez (al3jandrosg) : | # |
Eric Chen (eric-chen) wrote (last edit ): | # |
Unfortunately, we currently have no plans to restore the CI server due to the complete loss of our infrastructure. Additionally, this project is in maintenance mode and is likely to be deprecated within the year. All system metrics should be provided with node exporter. I recommend that you also raise your request there.
https:/
I could find a member to take a look again since it's close to the time of 24.04 release.
Chi Wai CHAN (raychan96) wrote : | # |
This is unfortunately caused by external dependency: https:/
Unmerged commits
- bb7a195... by Alejandro Santoyo Gonzalez
-
Adding support for intel_powerstat
This patch adds support for intel_powerstat by:
- fixing the permissions issues when attempting to read
`/sys/devices/virtual/ powercap/ intel-rapl/ `
- adding charm configs: `collect_intel_powerstat _metrics` ,
`package_metrics`, `cpu_metrics`Fixes [LP#1976521](https:/
/bugs.launchpad .net/charm- telegraf/ +bug/1976521)
Preview Diff
1 | diff --git a/src/config.yaml b/src/config.yaml |
2 | index 282a163..a66b279 100644 |
3 | --- a/src/config.yaml |
4 | +++ b/src/config.yaml |
5 | @@ -268,3 +268,31 @@ options: |
6 | intel-cmt-cat package on ppa:canonical-bootstack/public. Also, Telegraf needs |
7 | to include a patch for https://github.com/influxdata/telegraf/issues/9380, currently |
8 | available in ppa:guoqiao/telegraf. |
9 | + collect_intel_powerstat_metrics: |
10 | + default: false |
11 | + type: boolean |
12 | + description: > |
13 | + Enable the collection of power statistics on Intel-based platforms using |
14 | + the telegraf intel_powerstat plugin. |
15 | + . |
16 | + There are certain requisites to run this plugin, including having a |
17 | + kernel >= v3.13, for `uncore_frequency` metrics `intel-uncore-frequency` |
18 | + module is required (available since kernel v5.6). |
19 | + . |
20 | + See https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md |
21 | + package_metrics: |
22 | + default: "current_power_consumption" |
23 | + type: string |
24 | + description: > |
25 | + If set, the comma-separated list of package metrics to be monitored by the |
26 | + plugin. |
27 | + |
28 | + ex: "current_power_consumption,current_dram_power_consumption,thermal_design_power" |
29 | + cpu_metrics: |
30 | + default: "cpu_temperature" |
31 | + type: string |
32 | + description: > |
33 | + If set, the comma-separated list of per-CPU metrics to be monitored by the |
34 | + plugin. |
35 | + |
36 | + ex: "cpu_frequency,cpu_temperature" |
37 | \ No newline at end of file |
38 | diff --git a/src/reactive/telegraf.py b/src/reactive/telegraf.py |
39 | index eea17e6..3df37a0 100644 |
40 | --- a/src/reactive/telegraf.py |
41 | +++ b/src/reactive/telegraf.py |
42 | @@ -116,6 +116,16 @@ RDT_KERNEL_MODULE_NAME = "msr" |
43 | TLS_CERT = pathlib.Path("/etc/telegraf/tls/server.crt") |
44 | TLS_KEY = pathlib.Path("/etc/telegraf/tls/server.key") |
45 | |
46 | +# constants related to Intel Powerstat metrics support |
47 | +POWERSTAT_MINIMUM_KERNEL_VERSION = (3, 13, 0) |
48 | +POWERSTAT_KERNEL_VERSION_RESTRICTED_PERMS = (5, 4, 77) |
49 | +POWERSTAT_KERNEL_MODULE_NAMES = { # keys are kernel versions, values are module names |
50 | + (4, 0, 0): ["msr", "intel_rapl"], |
51 | + (5, 0, 0): ["rapl", "msr", "intel_rapl_common", "intel_rapl_msr"], |
52 | + (5, 6, 0): ["intel-uncore-frequency"], |
53 | +} |
54 | +POWERSTAT_ADD_PERMS_PATH = "/sys/devices/virtual/powercap/intel-rapl/" |
55 | + |
56 | |
57 | class InvalidInstallMethodError(Exception): |
58 | pass |
59 | @@ -129,6 +139,10 @@ class InvalidIntelRDTConfigurationError(Exception): |
60 | pass |
61 | |
62 | |
63 | +class InvalidIntelPowerstatConfigurationError(Exception): |
64 | + pass |
65 | + |
66 | + |
67 | class Address(Enum): |
68 | IPV4 = 1 |
69 | IPV6 = 2 |
70 | @@ -140,6 +154,17 @@ def is_intel_rdt_enabled(): |
71 | return config.get("collect_intel_rdt_metrics", False) and is_metal() |
72 | |
73 | |
74 | +def is_intel_powerstat_enabled(): |
75 | + config = hookenv.config() |
76 | + if not is_metal(): |
77 | + hookenv.log( |
78 | + """WARNING: Intel Powerstat can only be enabled on baremetal. |
79 | + Skipping.""" |
80 | + ) |
81 | + |
82 | + return config.get("collect_intel_powerstat_metrics", False) and is_metal() |
83 | + |
84 | + |
85 | def write_telegraf_file(path, content): |
86 | return host.write_file( |
87 | path, |
88 | @@ -403,6 +428,77 @@ def check_valid_intel_rdt_configuration(): |
89 | return None |
90 | |
91 | |
92 | +def check_valid_intel_powerstat_configuration(): |
93 | + """ |
94 | + Check that the requirements for Intel Powerstat are met. |
95 | + |
96 | + Will raise a InvalidIntelPowerstatConfigurationError exception when a |
97 | + validation issue is encountered, otherwise None |
98 | + """ |
99 | + current_kernel_version = get_current_kern_ver() |
100 | + if current_kernel_version: |
101 | + # check the minimum kernel version requirement |
102 | + if current_kernel_version < POWERSTAT_MINIMUM_KERNEL_VERSION: |
103 | + raise InvalidIntelPowerstatConfigurationError( |
104 | + "unsupported kernel version: {}, need version higher than {}".format( |
105 | + current_kernel_version, POWERSTAT_MINIMUM_KERNEL_VERSION |
106 | + ) |
107 | + ) |
108 | + |
109 | + # check that the required modules are loaded |
110 | + kernel_version = get_intel_powerstat_kern_ver() |
111 | + if not kernel_version: |
112 | + raise InvalidIntelPowerstatConfigurationError( |
113 | + "Kernel version '{}' is not supported".format( |
114 | + ".".join(map(str, current_kernel_version)) |
115 | + ) |
116 | + ) |
117 | + |
118 | + for mod in POWERSTAT_KERNEL_MODULE_NAMES[kernel_version]: |
119 | + if not kernel.is_module_loaded(mod): |
120 | + raise InvalidIntelPowerstatConfigurationError( |
121 | + "required module '{}' is not loaded".format(mod) |
122 | + ) |
123 | + else: |
124 | + linux_release = platform.release() # format is like '5.4.0-73-generic' |
125 | + raise InvalidIntelPowerstatConfigurationError( |
126 | + "Incompatible platform.release output: {}".format(linux_release) |
127 | + ) |
128 | + return None |
129 | + |
130 | + |
131 | +def get_current_kern_ver(): |
132 | + # check that we meet the minimum kernel version |
133 | + linux_release = platform.release() # format is like '5.4.0-73-generic' |
134 | + re_kernel_version = r"^(\d+)\.(\d+)\.(\d+)" |
135 | + match = re.match(re_kernel_version, linux_release) |
136 | + |
137 | + if match: |
138 | + return tuple(int(d) for d in match.groups()) |
139 | + |
140 | + return None |
141 | + |
142 | + |
143 | +def get_intel_powerstat_kern_ver(): |
144 | + """ |
145 | + Return kernel version. |
146 | + |
147 | + Return which kernel version to use as key to parse |
148 | + POWERSTAT_KERNEL_MODULE_NAMES in order to check |
149 | + which modules need to be loaded. |
150 | + """ |
151 | + current_kernel_version = get_current_kern_ver() |
152 | + kernel_version = None |
153 | + if current_kernel_version >= (5, 6, 0): |
154 | + kernel_version = (5, 6, 0) |
155 | + elif current_kernel_version >= (5, 0, 0): |
156 | + kernel_version = (5, 0, 0) |
157 | + elif current_kernel_version >= (4, 0, 0): |
158 | + kernel_version = (4, 0, 0) |
159 | + |
160 | + return kernel_version |
161 | + |
162 | + |
163 | def get_cpu_cores(): |
164 | """Get the list of available cores for the cpu(s).""" |
165 | # should return something like ["0-23"] |
166 | @@ -473,6 +569,15 @@ def get_base_inputs(): |
167 | else: |
168 | intel_rdt_cores = None |
169 | |
170 | + # handle the Intel Powerstat collection parameters |
171 | + intel_powerstat = is_intel_powerstat_enabled() |
172 | + if intel_powerstat: |
173 | + package_metrics = config["package_metrics"] |
174 | + cpu_metrics = config["cpu_metrics"] |
175 | + else: |
176 | + package_metrics = None |
177 | + cpu_metrics = None |
178 | + |
179 | return { |
180 | "extra_options": extra_options["inputs"], |
181 | "bcache": is_bcache(), |
182 | @@ -485,6 +590,9 @@ def get_base_inputs(): |
183 | "ipmi_sensor": ipmi_sensor, |
184 | "intel_rdt": intel_rdt, |
185 | "intel_rdt_cores": intel_rdt_cores, |
186 | + "intel_powerstat": intel_powerstat, |
187 | + "package_metrics": package_metrics, |
188 | + "cpu_metrics": cpu_metrics, |
189 | } |
190 | |
191 | |
192 | @@ -893,6 +1001,36 @@ def configure_telegraf(): # noqa: C901 |
193 | else: |
194 | remove_sudoers_file(sudoers_filename) |
195 | |
196 | + if is_intel_powerstat_enabled(): |
197 | + hookenv.log("Intel Powerstat enabled, enabling modules and running checks") |
198 | + # load and persist the required module(s) |
199 | + kernel_version = get_intel_powerstat_kern_ver() |
200 | + try: |
201 | + for mod in POWERSTAT_KERNEL_MODULE_NAMES[kernel_version]: |
202 | + kernel.modprobe(mod, persist=True) |
203 | + except subprocess.CalledProcessError: |
204 | + error_msg = "modprobe {} failed".format(mod) |
205 | + hookenv.log(error_msg, level=hookenv.ERROR) |
206 | + hookenv.status_set("blocked", error_msg) |
207 | + return |
208 | + |
209 | + try: |
210 | + check_valid_intel_powerstat_configuration() |
211 | + except InvalidIntelPowerstatConfigurationError() as e: |
212 | + # on error we abort configuration and block the charm |
213 | + error_msg = "Intel Powerstat configuration failed." |
214 | + hookenv.log(e, level=hookenv.ERROR) |
215 | + hookenv.status_set("blocked", error_msg) |
216 | + return |
217 | + |
218 | + # change the permissions for POWERSTAT_ADD_PERMS_PATH |
219 | + if kernel_version >= POWERSTAT_KERNEL_VERSION_RESTRICTED_PERMS: |
220 | + cmd = ["chmod", "-R", "a+rx", POWERSTAT_ADD_PERMS_PATH] |
221 | + hookenv.log( |
222 | + "Setting permissions for Powerstat to work: {}".format(" ".join(cmd)) |
223 | + ) |
224 | + subprocess.call(cmd) |
225 | + |
226 | telegraf_exec_metrics = os.path.join(get_files_dir(), "telegraf_exec_metrics.py") |
227 | cmd = [ |
228 | telegraf_exec_metrics, |
229 | @@ -1072,6 +1210,12 @@ def handle_config_changes(): |
230 | else: |
231 | clear_flag("telegraf.intel_rdt.enabled") |
232 | |
233 | + # handle the Intel Powerstat metrics collection |
234 | + if is_intel_powerstat_enabled(): |
235 | + set_flag("telegraf.intel_powerstat.enabled") |
236 | + else: |
237 | + clear_flag("telegraf.intel_powerstat.enabled") |
238 | + |
239 | clear_flag("telegraf.configured") |
240 | clear_flag("telegraf.apt.configured") |
241 | clear_flag("telegraf.snap.configured") |
242 | diff --git a/src/templates/base_inputs.conf b/src/templates/base_inputs.conf |
243 | index 9e178be..117acdd 100644 |
244 | --- a/src/templates/base_inputs.conf |
245 | +++ b/src/templates/base_inputs.conf |
246 | @@ -179,6 +179,18 @@ use_sudo = true |
247 | {%- endif %} |
248 | {% endif %} |
249 | |
250 | +{% if "intel_powerstat" not in disabled_plugins %} |
251 | +{% if intel_powerstat -%} |
252 | +[[inputs.intel_powerstat]] |
253 | +{% if package_metrics %} |
254 | +package_metrics = {{ package_metrics.split(',') | tojson }} |
255 | +{% endif %} |
256 | +{% if cpu_metrics %} |
257 | +cpu_metrics = {{ cpu_metrics.split(',') | tojson }} |
258 | +{% endif %} |
259 | +{%- endif %} |
260 | +{% endif %} |
261 | + |
262 | [[inputs.exec]] |
263 | commands = [ |
264 | "/usr/bin/awk '{ print $1 }' /proc/sys/fs/file-nr" |
265 | diff --git a/src/tests/unit/test_telegraf.py b/src/tests/unit/test_telegraf.py |
266 | index 2ba2a0e..0493a94 100644 |
267 | --- a/src/tests/unit/test_telegraf.py |
268 | +++ b/src/tests/unit/test_telegraf.py |
269 | @@ -2044,6 +2044,38 @@ def test_check_valid_intel_rdt_configuration_pqos(monkeypatch): |
270 | telegraf.check_valid_intel_rdt_configuration() |
271 | |
272 | |
273 | +def test_collect_intel_powerstat_metrics(monkeypatch, config): |
274 | + monkeypatch.setattr(telegraf, "is_container", lambda: False) |
275 | + monkeypatch.setattr(telegraf, "is_metal", lambda: True) |
276 | + config["collect_intel_powerstat_metrics"] = True |
277 | + config["package_metrics"] = "current_power_consumption,thermal_design_power" |
278 | + config["cpu_metrics"] = "cpu_frequency" |
279 | + monkeypatch.setattr( |
280 | + telegraf, "check_valid_intel_powerstat_configuration", lambda: "" |
281 | + ) |
282 | + monkeypatch.setattr(kernel, "modprobe", lambda module, persist: "msr") |
283 | + telegraf.configure_telegraf() |
284 | + |
285 | + expected = """ |
286 | +[[inputs.intel_powerstat]] |
287 | +package_metrics = ["current_power_consumption", "thermal_design_power"] |
288 | +cpu_metrics = ["cpu_frequency"] |
289 | +""" |
290 | + config_file = base_dir().join("telegraf.conf") |
291 | + assert expected in config_file.read() |
292 | + |
293 | + |
294 | +def test_check_valid_intel_powerstat_configuration_kernel_version(monkeypatch): |
295 | + monkeypatch.setattr(telegraf, "is_container", lambda: False) |
296 | + monkeypatch.setattr(telegraf, "is_metal", lambda: True) |
297 | + monkeypatch.setattr(platform, "release", lambda: "3.3.0-73-generic") |
298 | + with pytest.raises( |
299 | + telegraf.InvalidIntelPowerstatConfigurationError, |
300 | + match="unsupported kernel version", |
301 | + ): |
302 | + telegraf.check_valid_intel_powerstat_configuration() |
303 | + |
304 | + |
305 | def test_configure_tls(monkeypatch): |
306 | tnp_key_location = pathlib.Path("/tmp/fake-key") |
307 | tmp_cert_location = pathlib.Path("/tmp/fake-crt") |
This merge proposal is being monitored by mergebot. Change the status to Approved to merge.