Telegraf Charm

Merge ~jfguedez/charm-telegraf:feature/intel-cmt-cat into charm-telegraf:master

Proposed by Jose Guedez on 2021-07-01

Status:	Merged
Approved by:	Xav Paice on 2021-07-21
Approved revision:	ed52df526595684011a7df9946549e34519ce560
Merged at revision:	fb9446ae1a98b8b25478dfb0048a07ed0ec59bda
Proposed branch:	~jfguedez/charm-telegraf:feature/intel-cmt-cat
Merge into:	charm-telegraf:master
Diff against target:	1061 lines (+854/-28), 6 files modified:
  src/config.yaml (+18/-1)
  src/reactive/telegraf.py (+199/-23)
  src/templates/base_inputs.conf (+7/-0)
  src/templates/dashboards/grafana/IntelRDT.json.j2 (+508/-0)
  src/templates/sudoers/telegraf_intel_rdt.tmpl (+4/-0)
  src/tests/unit/test_telegraf.py (+118/-4)

Reviewer	Review Type	Date Requested	Status
🤖 prod-jenkaas-bootstack	continuous-integration		Approve on 2021-07-16
Celia Wang		2021-07-01	Approve on 2021-07-16
Joe Guo (community)			Needs Fixing on 2021-07-14
Junien F			Approve on 2021-07-05
Edin S (community)			Approve on 2021-07-02
Canonical IS Reviewers		2021-07-01	Pending
Review via email: mp+405064@code.launchpad.net

Commit message

Add support for Memory Bandwidth Monitoring (Intel RDT)

Revision history for this message

🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote on 2021-07-01:

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-01:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-01:

FAILED: Continuous integration, rev:e9451b441b29327ad9f48df2dce179d208b8afc8
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/10/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/68/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/265/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/10//rebuild

review: Needs Fixing (continuous-integration)

Jose Guedez (jfguedez) wrote on 2021-07-02 (last edit on 2021-07-02):

It seems that the CI might have some issues. I don't think it has had a successful build yet. The failing tests are unrelated to the changes afaict.

When I run the unit tests locally, there are no failures before or after the change, fwiw: https://pastebin.canonical.com/p/hQ8wMrQ5WV/

James Troup (elmo) wrote on 2021-07-02:

LGTM, one minor comment inline.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-02:

A CI job is currently in progress. A follow up comment will be added when it completes.

Jose Guedez (jfguedez) wrote on 2021-07-02:

@James Troup: thanks, I replied inline and will add the comment back.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-02:

FAILED: Continuous integration, rev:913ac81e42365a2b55c5f6859c11b7c802836153
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/15/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/73/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/270/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/15//rebuild

review: Needs Fixing (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-02:

A CI job is currently in progress. A follow up comment will be added when it completes.

Edin S (exsdev) wrote on 2021-07-02:

LGTM

review: Approve

Jose Guedez (jfguedez) wrote on 2021-07-02:

All unit tests pass - https://pastebin.ubuntu.com/p/ddG4SJV2YJ/
There's an issue with the CI that is being addressed in https://code.launchpad.net/~xavpaice/charm-telegraf/+git/charm-telegraf/+merge/405068

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-02:

FAILED: Continuous integration, rev:f549208353e42fb97c3c5ba65ef40bed108a9e71
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/16/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/74/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/271/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/16//rebuild

review: Needs Fixing (continuous-integration)

Junien F (axino) wrote on 2021-07-02:

See below about the sudoers file, thanks!

review: Needs Fixing

Joe Guo (guoqiao) wrote on 2021-07-05:

For the function `check_valid_intel_rdt_configuration`, a few issues:

1) To report issues, it uses both an exception and a string message; maybe use just one. I would prefer exceptions.

2) It returns an empty string (falsy) to mean OK, which may be misleading or misused.

3) For the kernel version comparison, I noticed[0] there are versions like `5.13`?

Can we also use `fetch.apt_pkg.version_compare` to do this?

[0]: https://en.wikipedia.org/wiki/Linux_kernel_version_history

review: Needs Fixing
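Points 1) and 2) argue for exceptions over sentinel strings. Below is a minimal sketch of that style. It uses a hypothetical `check_requirements` helper and `InvalidConfigurationError` class, not the charm's actual code. Success is simply the absence of an exception, and the caller turns a failure into a blocked status.

```python
class InvalidConfigurationError(Exception):
    """Hypothetical error type raised when a requirement is not met."""


def check_requirements(kernel_version, min_kernel_version, module_loaded):
    """Raise InvalidConfigurationError on the first failed check, else return None.

    Callers never interpret a return value, so an empty string can
    never be mistaken for "ok".
    """
    if kernel_version < min_kernel_version:
        raise InvalidConfigurationError(
            "unsupported kernel version: {}, need {} or newer".format(
                kernel_version, min_kernel_version
            )
        )
    if not module_loaded:
        raise InvalidConfigurationError("required module is not loaded")


def configure(kernel_version, module_loaded):
    """Return a (status, message) pair, mimicking a charm's workload status."""
    try:
        check_requirements(kernel_version, (5, 4), module_loaded)
    except InvalidConfigurationError as e:
        return "blocked", "Cannot configure Intel RDT: {}".format(e)
    return "active", ""
```

For example, `configure((4, 15), True)` yields a `blocked` status whose message names the kernel version.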

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-05:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-05:

FAILED: Continuous integration, rev:c3b6d8aaf372e3e8884bc2248cf16c15cbe35e9e
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/21/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/83/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/282/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/21//rebuild

review: Needs Fixing (continuous-integration)

Jose Guedez (jfguedez) wrote on 2021-07-05:

@axino:

Thanks. In this case the plugin executes the sudo command only once, when the telegraf service starts. However, I did add the extra commands to the sudoers file to avoid logging the command. Please take another look.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-05:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-05:

FAILED: Continuous integration, rev:6b146fc34faa36114bddd9c7701c45417606f69b
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/22/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/84/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/283/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/22//rebuild

review: Needs Fixing (continuous-integration)

Jose Guedez (jfguedez) wrote on 2021-07-05 (last edit on 2021-07-05):

@guoqiao

Thanks, please see comments inline. Addressed in the latest push.

> For the function `check_valid_intel_rdt_configuration`, a few issues:
>
> 1) To report issues, it uses both an exception and a string message; maybe use
> just one. I would prefer exceptions.
>
> 2) It returns an empty string (falsy) to mean OK, which may be misleading or misused.
>

I had originally wanted to use the exception as a separate mechanism, but I can see how it would be confusing. Definitely agree with the empty string, so I switched it to use exceptions.

> 3) For the kernel version comparison, I noticed[0] there are versions like `5.13`?
>
> Can we also use `fetch.apt_pkg.version_compare` to do this?
>
> [0]: https://en.wikipedia.org/wiki/Linux_kernel_version_history

According to [0], there is always a third number. However, at least in Ubuntu it seems to always be zero, so it has no meaning, as it doesn't match the third number from upstream. You can see the full Ubuntu/upstream table here [1]; they all seem to have the third number. The first two numbers (major, minor) always match the kernel version, so I changed the validation to use only those (e.g. 5.4), which should be enough for our purposes here.

As for using apt_pkg versions for this: you could have multiple kernels installed, some suitable and some not (for example, on bionic you need the HWE kernel), so it is more reliable to check the version of the running kernel.

I believe the comments/changes address the issues you brought up. Please take a look again, thanks.

[0] https://ubuntu.com/kernel
[1] https://people.canonical.com/~kernel/info/kernel-version-map.html
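The approach described above compares only the (major, minor) of the running kernel, as integers. A minimal sketch of it follows. It reuses the regex from `check_valid_intel_rdt_configuration` in the merged diff, but the function names here are illustrative. It also shows why the `5.13` concern matters: compared as plain strings, `5.13` would sort below `5.4`, while integer tuples order them correctly.

```python
import re

# Minimum (major, minor), matching RDT_MINIMUM_KERNEL_VERSION in the charm.
MINIMUM_KERNEL_VERSION = (5, 4)


def kernel_major_minor(release):
    """Extract (major, minor) from a platform.release()-style string.

    e.g. '5.4.0-73-generic' -> (5, 4). The third component is ignored
    because on Ubuntu it does not track the upstream patch level.
    """
    match = re.match(r"^(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unexpected kernel release string: {}".format(release))
    return tuple(int(part) for part in match.groups())


def kernel_is_supported(release):
    # Tuples compare element by element as integers, so (5, 13) > (5, 4).
    return kernel_major_minor(release) >= MINIMUM_KERNEL_VERSION
```

In the charm the input would come from `platform.release()` on the unit. The release strings used to exercise this are only examples.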

Junien F (axino) wrote on 2021-07-05:

Thanks for the sudoers change!

review: Approve

Joe Guo (guoqiao) wrote on 2021-07-05:

Hi Jose,

Thanks for the quick change. Another small question:

The doc[0] says the required minimum pqos version is `4.0.0`.
But here in the code we are using `RDT_MINIMUM_PKG_VERSION = "4.1-1ppa3"`.

I understand we have to use a PPA to backport for bionic, but as I understand it, wouldn't it be more generic and reliable to use `4.0.0` here?

[0]: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_rdt/README.md
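It helps to see how version strings such as `4.0.0` and `4.1-1ppa3` order against each other. Below is a simplified pure-Python port of dpkg's version ordering. It compares the epoch, then the upstream version, then the Debian revision. Within a part, `~` sorts before anything (even the end of the string), and letters sort before other symbols. The charm itself calls `fetch.apt_pkg.version_compare`. This sketch is for illustration only and is not python-apt's implementation.

```python
import string


def _order(c):
    """Sort weight of one character in dpkg ordering ('' means end of string)."""
    if c == "" or c in string.digits:
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def _verrevcmp(a, b):
    """Compare two upstream or revision strings; returns <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Compare the leading non-digit runs character by character.
        while (i < len(a) and a[i] not in string.digits) or (
            j < len(b) and b[j] not in string.digits
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit runs numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        while (
            i < len(a)
            and j < len(b)
            and a[i] in string.digits
            and b[j] in string.digits
        ):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in string.digits:
            return 1  # a has the longer number
        if j < len(b) and b[j] in string.digits:
            return -1  # b has the longer number
        if first_diff:
            return first_diff
    return 0


def _split(version):
    """Split 'epoch:upstream-revision' into (epoch, upstream, revision)."""
    epoch = 0
    if ":" in version:
        epoch_str, version = version.split(":", 1)
        epoch = int(epoch_str)
    if "-" in version:
        upstream, revision = version.rsplit("-", 1)
    else:
        upstream, revision = version, ""
    return epoch, upstream, revision


def version_compare(a, b):
    """Return <0, 0 or >0 depending on whether a sorts before, equal to or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
```

With these rules, `version_compare("4.1-1ppa3", "4.0.0")` is positive. `"4.1-1"` sorts before `"4.1-1ppa3"`, while `"4.1-1~ppa3"` would sort before `"4.1-1"`.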

Joe Guo (guoqiao) wrote on 2021-07-06:

Jose has explained the version issue in chat. +1.

review: Approve

Joe Guo (guoqiao) wrote on 2021-07-06:

+1, but worth noting:

According to the doc[0], telegraf currently cannot stop the RDT plugin when sudo=true.

Two potential solutions are suggested there.

Until the final fix is released on the telegraf side, we may need to provide a workaround for the charm to work.

[0]: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_rdt/README.md

review: Approve

Joe Guo (guoqiao) wrote on 2021-07-14:

Hi Jose:

I am testing this patch in an LXD container (for the case where the conditions are unmet), and I noticed that `kernel.modprobe` raises an exception: https://pastebin.canonical.com/p/xYBJXsSqZH/

Instead of creating a new patch, could you apply the following change to your code and re-push, so we can keep the review history here, please?

diff --git a/src/reactive/telegraf.py b/src/reactive/telegraf.py
index 0b1e71d..8bfd801 100644
--- a/src/reactive/telegraf.py
+++ b/src/reactive/telegraf.py
@@ -807,7 +807,14 @@ def configure_telegraf():  # noqa: C901
     if config["collect_intel_rdt_metrics"]:
         hookenv.log("Intel RDT enabled, enabling module and running checks")
         # load and persist the required module
-        kernel.modprobe(RDT_KERNEL_MODULE_NAME, persist=True)
+        try:
+            kernel.modprobe(RDT_KERNEL_MODULE_NAME, persist=True)
+        except subprocess.CalledProcessError:
+            error_msg = "modprobe {} failed".format(RDT_KERNEL_MODULE_NAME)
+            hookenv.log(error_msg, level=hookenv.ERROR)
+            hookenv.status_set("blocked", error_msg)
+            return
+
         try:
             check_valid_intel_rdt_configuration()
         except InvalidIntelRDTConfiguration as e:

review: Needs Fixing
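The pattern in the suggested change can be sketched without charmhelpers: catch the failure of a setup step and turn it into a `blocked` status instead of letting the hook crash. In this sketch, the parameters `in_container`, `load_module` and `validate`, and the `(status, message)` return value, are hypothetical stand-ins for `is_container`, `kernel.modprobe`, `check_valid_intel_rdt_configuration` and `hookenv.status_set`.

```python
import subprocess


def setup_intel_rdt(in_container, load_module, validate):
    """Run the RDT setup steps in order and report a charm-style status.

    in_container: bool, e.g. the result of a container check.
    load_module: callable taking a module name; raises
        subprocess.CalledProcessError when modprobe fails.
    validate: callable raising ValueError that describes an unmet requirement.
    Returns ("blocked", reason) on the first failure, else ("active", "").
    """
    if in_container:
        return "blocked", "Intel RDT cannot be enabled in a container"
    try:
        load_module("msr")
    except subprocess.CalledProcessError:
        return "blocked", "modprobe msr failed"
    try:
        validate()
    except ValueError as e:
        return "blocked", "Cannot configure Intel RDT: {}".format(e)
    return "active", ""


def failing_modprobe(module):
    """Stand-in for a modprobe call that fails, as in an LXD container."""
    raise subprocess.CalledProcessError(1, ["modprobe", module])
```

Calling `setup_intel_rdt(False, failing_modprobe, lambda: None)` returns `("blocked", "modprobe msr failed")` rather than raising.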

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-14:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-14:

FAILED: Continuous integration, rev:38b6c858a18b1a896e0cd78e137e10f8b222d5db
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/25/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/94/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/302/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/25//rebuild

review: Needs Fixing (continuous-integration)

Joe Guo (guoqiao) wrote on 2021-07-14:

@jfguedez CI failed and there is an unresolved merge conflict in the code.

review: Needs Fixing

Joe Guo (guoqiao) wrote on 2021-07-14:

Re: the `modprobe msr` failure in lxc/lxd, I am able to reproduce it with:

lxc launch ubuntu:20.04 ubuntu
lxc exec ubuntu -- bash
root@ubuntu:~# modprobe msr
modprobe: FATAL: Module msr not found in directory /lib/modules/5.8.0-59-generic
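The merged change avoids this by checking charmhelpers' `host.is_container()` before attempting `modprobe`. Below is a standalone sketch of a comparable check. It assumes `systemd-detect-virt --container`, which exits 0 when it detects container virtualization such as LXD and non-zero otherwise. The charmhelpers helper may use other detection paths, and the fallback for a missing tool is an assumption of this sketch.

```python
import shutil
import subprocess


def running_in_container():
    """Return True if systemd-detect-virt reports container virtualization."""
    tool = shutil.which("systemd-detect-virt")
    if tool is None:
        # Assumption: without the tool we cannot tell, so report False.
        return False
    result = subprocess.run(
        [tool, "--container"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```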

Celia Wang (ziyiwang) on 2021-07-15:

review: Needs Fixing

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

FAILED: Continuous integration, rev:66114d61675adfcce90f52255d84bf24015504c1
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/26/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/96/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/304/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/26//rebuild

review: Needs Fixing (continuous-integration)

Joe Guo (guoqiao) wrote on 2021-07-16:

New changes pushed:

1) Rebased against master to resolve conflicts.
2) Block the charm if collect_intel_rdt_metrics is enabled but `is_container` returns true.
3) Rename the RDT option `sudo` to `use_sudo`.

For 3), the purpose is consistency with the existing plugins.
Upstream patch: https://github.com/influxdata/telegraf/pull/9501

A new PPA built with the above patch:

ppa:guoqiao/telegraf or https://launchpad.net/~guoqiao/+archive/ubuntu/telegraf

To use the PPA:

juju config telegraf install_sources="[ppa:guoqiao/telegraf, ppa:canonical-bootstack/public]"

New reviews appreciated!
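With the option renamed, the plugin section rendered from `base_inputs.conf` in the merged diff should look like the output of the sketch below. It reads the `On-line CPU(s) list:` field of `lscpu --json`, as `get_cpu_cores` does. The helper names and the sample `lscpu` data are hypothetical.

```python
import json

# Sample `lscpu --json` output, trimmed to the one relevant entry (illustrative).
SAMPLE_LSCPU = json.dumps(
    {"lscpu": [{"field": "On-line CPU(s) list:", "data": "0-23"}]}
)


def online_cores(lscpu_output):
    """Return the online CPU list as a TOML array string, e.g. '["0-23"]'."""
    for entry in json.loads(lscpu_output)["lscpu"]:
        if entry["field"] == "On-line CPU(s) list:":
            return json.dumps([entry["data"]])
    raise ValueError("no 'On-line CPU(s) list:' field in lscpu output")


def render_intel_rdt_stanza(cores):
    """Render an [[inputs.intel_rdt]] section using the `use_sudo` option."""
    return "\n".join(
        [
            "[[inputs.intel_rdt]]",
            "cores = {}".format(cores),
            "use_sudo = true",
        ]
    )
```

Rendering `online_cores(SAMPLE_LSCPU)` yields a section with `cores = ["0-23"]` and `use_sudo = true`.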

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

FAILED: Continuous integration, rev:0e452420071d238f38b0538cce7fc40229b8220a
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/27/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/97/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/307/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/27//rebuild

review: Needs Fixing (continuous-integration)

Joe Guo (guoqiao) wrote on 2021-07-16:

Unit tests pass on my local machine but fail on CI.

I have triggered another CI job on master to see how it behaves:

https://code.launchpad.net/~guoqiao/charm-telegraf/+git/charm-telegraf/+merge/405790

Celia Wang (ziyiwang) wrote on 2021-07-16:

lgtm

review: Approve

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

FAILED: Continuous integration, rev:1f2837854757dc1a8613e4c92a97a0d56193cc10
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/33/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/103/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/313/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/33//rebuild

review: Needs Fixing (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

FAILED: Continuous integration, rev:1f2837854757dc1a8613e4c92a97a0d56193cc10
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/34/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/104/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/314/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/34//rebuild

review: Needs Fixing (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

A CI job is currently in progress. A follow up comment will be added when it completes.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

FAILED: Continuous integration, rev:ed52df526595684011a7df9946549e34519ce560
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/35/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/105/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/315/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/35//rebuild

review: Needs Fixing (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2021-07-16:

PASSED: Continuous integration, rev:ed52df526595684011a7df9946549e34519ce560
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/36/
Executed test runs:
SUCCESS: https://jenkins.canonical.com/bootstack/job/lp-charm-test-unit-withcompiler/106/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/316/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-telegraf-ci/36//rebuild

review: Approve (continuous-integration)

🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote on 2021-07-21:

Change successfully merged at revision fb9446ae1a98b8b25478dfb0048a07ed0ec59bda

Preview Diff

 diff --git a/src/config.yaml b/src/config.yaml
 index e4f6f04..eee3fbe 100644
 --- a/src/config.yaml
 +++ b/src/config.yaml
@@ -233,4 +233,21 @@ options:
      description: >
          Enable the collection of IPMI sensor metrics, using the ipmi sensor telegraf
          input plugin. Collecting these metrics requires sudo access - enabling
--        this option will install an appropriate, locked-down sudoers file.
 \ No newline at end of file
++        this option will install an appropriate, locked-down sudoers file.
++  collect_intel_rdt_metrics:
++    default: false
++    type: boolean
++    description: >
++        Enable the collection of Intel memory bandwidth metrics, using the
++        telegraf intel_rdt input plugin. Collecting these metrics requires sudo
++        access - enabling this option will install an appropriate, locked-down
++        sudoers file.
++        .
++        There are certain requisites to run this plugin, including having a
++        kernel >= v5.4, Intel RDT tools >= v4.1 available in a repository, and
++        a supported CPU (as reported by the Intel utility `pqos`)
++        .
++        Currently the charm will configure monitoring of all detected cores.
++        .
++        See https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_rdt/README.md
++        for info on the telegraf intel_rdt plugin.
 diff --git a/src/reactive/telegraf.py b/src/reactive/telegraf.py
 index ef89a47..7a19d16 100644
 --- a/src/reactive/telegraf.py
 +++ b/src/reactive/telegraf.py
@@ -22,6 +22,7 @@ import io
  import ipaddress
  import json
  import os
++import platform
  import re
  import socket
  import subprocess
@@ -29,9 +30,9 @@ import sys
  import time
  from distutils.version import LooseVersion
--from charmhelpers import context
++from charmhelpers import context, fetch
  from charmhelpers.contrib.charmsupport import nrpe
--from charmhelpers.core import hookenv, host, unitdata
++from charmhelpers.core import hookenv, host, kernel, unitdata
  from charmhelpers.core.host import is_container
  from charmhelpers.core.templating import render
@@ -67,9 +68,18 @@ CONFIG_FILE = "telegraf.conf"
  CONFIG_DIR = "telegraf.d"
--GRAFANA_DASHBOARD_TELEGRAF_FILE_NAME = "Telegraf.json.j2"
--
--GRAFANA_DASHBOARD_NAME = "telegraf"
++GRAFANA_DASHBOARD_CONFIG = {
++    "telegraf": {
++        "template_file": "Telegraf.json.j2",
++        "context_vars": {
++            # TODO: Figure out if metrics exist and then set bools accordingly.
++            # For now, setting bools to true.
++            "bonds_enabled": True,
++            "bcache_enabled": True,
++            "conntrack_enabled": True,
++        },
++    },
++}
  SNAP_SERVICE = "snap.telegraf.telegraf"
  DEB_SERVICE = "telegraf"
@@ -83,6 +93,11 @@ DEB_USER = "telegraf"
  # Utilities #
++# constants related to RDT metrics support
++RDT_MINIMUM_KERNEL_VERSION = (5, 4)
++RDT_MINIMUM_PKG_VERSION = "4.1-1ppa3"
++RDT_KERNEL_MODULE_NAME = "msr"
++
  class InvalidInstallMethodError(Exception):
      pass
@@ -92,6 +107,10 @@ class InvalidPrometheusIPRangeError(Exception):
      pass
++class InvalidIntelRDTConfigurationError(Exception):
++    pass
++
++
  def write_telegraf_file(path, content):
      return host.write_file(
          path,
@@ -283,6 +302,96 @@ def get_remote_unit_name():
                      return rel["__unit__"]
++def check_valid_intel_rdt_configuration():
++    """
++    Check that the requirements for RDT are met.
++
++    Will raise a InvalidIntelRDTConfigurationError exception when a validation
++    issue is encountered, otherwise None
++    """
++    # check that we meet the minimum kernel version
++    linux_release = platform.release()  # format is like '5.4.0-73-generic'
++    re_kernel_version = r"^(\d+)\.(\d+)"
++    match = re.match(re_kernel_version, linux_release)
++
++    if match:
++        current_kernel_version = tuple(int(d) for d in match.groups())
++        if current_kernel_version < RDT_MINIMUM_KERNEL_VERSION:
++            raise InvalidIntelRDTConfigurationError(
++                "unsupported kernel version: {}, need version higher than {}".format(
++                    current_kernel_version, RDT_MINIMUM_KERNEL_VERSION
++                )
++            )
++    else:
++        raise InvalidIntelRDTConfigurationError(
++            "Incompatible platform.release output: {}".format(linux_release)
++        )
++
++    # check that package `intel-cmt-cat` is installed
++    current_pkg_version = fetch.get_installed_version("intel-cmt-cat")
++    if not current_pkg_version:
++        raise InvalidIntelRDTConfigurationError(
++            "package 'intel-cmt-cat' is not installed yet"
++        )
++
++    current_pkg_version_str = current_pkg_version["ver_str"]
++
++    # check that package `intel-cmt-cat` is recent enough
++    if (
++        fetch.apt_pkg.version_compare(current_pkg_version_str, RDT_MINIMUM_PKG_VERSION)
++        < 0  # noqa: W503
++    ):
++        base_error_msg = "package 'intel-cmt-cat' is older than required"
++        raise InvalidIntelRDTConfigurationError(
++            "{}: '{}' (installed '{}')".format(
++                base_error_msg, RDT_MINIMUM_PKG_VERSION, current_pkg_version_str
++            )
++        )
++
++    # check that the required module is loaded
++    if not kernel.is_module_loaded(RDT_KERNEL_MODULE_NAME):
++        raise InvalidIntelRDTConfigurationError(
++            "required module '{}' is not loaded".format(RDT_KERNEL_MODULE_NAME)
++        )
++
++    # check that the `pqos` utility reports no issues
++    # this performs a sanity check on the RDT utility configuration
++    command = ["sudo", "pqos", "-d"]
++    try:
++        subprocess.check_call(command)
++        # this performs a sanity check on the RDT utility configuration
++    except subprocess.CalledProcessError as error:
++        hookenv.log(
++            "pqos -d call failed:\n{}".format(error.output.decode("utf8")),
++            level=hookenv.ERROR,
++        )
++        raise InvalidIntelRDTConfigurationError("pqos -d failed, see logs for details")
++
++    return None
++
++
++def get_cpu_cores():
++    """Get the list of available cores for the cpu(s)."""
++    # should return something like ["0-23"]
++    command = ["lscpu", "--json"]
++    try:
++        lscpu_output = subprocess.check_output(command).decode("utf8")
++    except subprocess.CalledProcessError as error:
++        hookenv.log(
++            "lscpu call failed:\n{}".format(error.output.decode("utf8")),
++            level=hookenv.ERROR,
++        )
++        raise error
++
++    lscpu_json = json.loads(lscpu_output)
++
++    for data_pair in lscpu_json["lscpu"]:
++        if data_pair["field"] == "On-line CPU(s) list:":
++            return '["{}"]'.format(data_pair["data"])
++
++    raise Exception("Incompatible lscpu output: {}".format(lscpu_output))
++
++
  def get_disabled_plugins():
      """Return consolidated list of all plugins to be disabled."""
      config = hookenv.config()
@@ -324,6 +433,13 @@ def get_base_inputs():
      ipmi_sensor = config["collect_ipmi_sensor_metrics"]
      disabled_plugins = get_disabled_plugins()
++    # handle the Intel RDT collection parameters
++    intel_rdt = config["collect_intel_rdt_metrics"]
++    if intel_rdt:
++        intel_rdt_cores = get_cpu_cores()
++    else:
++        intel_rdt_cores = None
++
      return {
          "extra_options": extra_options["inputs"],
          "bcache": is_bcache(),
@@ -334,6 +450,8 @@ def get_base_inputs():
          "iptables": iptables,
          "smart": smart,
          "ipmi_sensor": ipmi_sensor,
++        "intel_rdt": intel_rdt,
++        "intel_rdt_cores": intel_rdt_cores,
+     }
@@ -694,6 +812,38 @@ def configure_telegraf():  # noqa: C901
      else:
          remove_sudoers_file(sudoers_filename)
++    # handle the configuration of intel_rdt
++    sudoers_filename = "telegraf_intel_rdt"
++    if config["collect_intel_rdt_metrics"]:
++        hookenv.log("Intel RDT enabled, enabling module and running checks")
++
++        if is_container():
++            error_msg = "Intel RDT can not be enabled in container"
++            hookenv.log(error_msg, level=hookenv.WARNING)
++            hookenv.status_set("blocked", error_msg)
++            return
++
++        # load and persist the required module
++        try:
++            kernel.modprobe(RDT_KERNEL_MODULE_NAME, persist=True)
++        except subprocess.CalledProcessError:
++            error_msg = "modprobe {} failed".format(RDT_KERNEL_MODULE_NAME)
++            hookenv.log(error_msg, level=hookenv.ERROR)
++            hookenv.status_set("blocked", error_msg)
++            return
++
++        try:
++            check_valid_intel_rdt_configuration()
++        except InvalidIntelRDTConfigurationError as e:
++            # on error we abort configuration and block the charm
++            error_msg = "Cannot configure Intel RDT: {}".format(e)
++            hookenv.log(error_msg, level=hookenv.ERROR)
++            hookenv.status_set("blocked", error_msg)
++            return
++        render_sudoers_file(sudoers_filename)
++    else:
++        remove_sudoers_file(sudoers_filename)
++
      telegraf_exec_metrics = os.path.join(get_files_dir(), "telegraf_exec_metrics.py")
      cmd = [
          telegraf_exec_metrics,
@@ -720,7 +870,12 @@ def configure_telegraf():  # noqa: C901
      for service in [DEB_SERVICE, SNAP_SERVICE]:
          if service == get_service():
              host.service_resume(service)
--            host.service_reload(service)
++            # skip reload when Intel RDT is enabled, as it stops the plugin from
++            # publishing data. The service will be restarted via the flag
++            # "telegraf.needs_reload" on changes later
++            if not config["collect_intel_rdt_metrics"]:
++                hookenv.log("reloading service: {}".format(service), level="DEBUG")
++                host.service_reload(service)
          else:
              try:
                  host.service_pause(service)
@@ -846,6 +1001,13 @@ def handle_config_changes():
      ):
          clear_flag("plugins.prometheus-client.configured")
          clear_flag("prometheus-client.relation.configured")
++
++    # handle the Intel RDT/MBM metrics collection
++    if config.get("collect_intel_rdt_metrics"):
++        set_flag("telegraf.intel_rdt.enabled")
++    else:
++        clear_flag("telegraf.intel_rdt.enabled")
++
      clear_flag("telegraf.configured")
      clear_flag("telegraf.apt.configured")
      clear_flag("telegraf.snap.configured")
@@ -1556,32 +1718,40 @@ def prometheus_client_departed():
+ )
  @when_not("grafana.configured")
  def register_grafana_dashboard():
++    config = hookenv.config()
      grafana = endpoint_from_flag("endpoint.dashboards.joined")
--    hookenv.log("Loading grafana dashboard", level=hookenv.DEBUG)
--    dashboard = _load_grafana_dashboard()
--    digest = hashlib.md5(dashboard.encode("utf8")).hexdigest()
--    dashboard_dict = json.loads(dashboard)
--    dashboard_dict["digest"] = digest
--    hookenv.log(
--        "Rendered dashboard dict:\n{}".format(dashboard_dict), level=hookenv.DEBUG
--    )
--    grafana.register_dashboard(name=GRAFANA_DASHBOARD_NAME, dashboard=dashboard_dict)
--    hookenv.log('Grafana dashboard "{}" registered.'.format(GRAFANA_DASHBOARD_NAME))
++    grafana_dashboard_config = GRAFANA_DASHBOARD_CONFIG.copy()
++
++    # if RDT is enabled inject the relevant dashboard config
++    if config["collect_intel_rdt_metrics"]:
++        grafana_dashboard_config["Intel RDT"] = {"template_file": "IntelRDT.json.j2"}
++
++    # process all the configured dashboards
++    for dashboard_name, dashboard_data in grafana_dashboard_config.items():
++        hookenv.log(
++            "Loading grafana dashboard: {}".format(dashboard_name), level=hookenv.DEBUG
++        )
++        dashboard = _load_grafana_dashboard(dashboard_data)
++        digest = hashlib.md5(dashboard.encode("utf8")).hexdigest()
++        dashboard_dict = json.loads(dashboard)
++        dashboard_dict["digest"] = digest
++        hookenv.log(
++            "Rendered dashboard dict:\n{}".format(dashboard_dict), level=hookenv.DEBUG
++        )
++        grafana.register_dashboard(name=dashboard_name, dashboard=dashboard_dict)
++        hookenv.log('Grafana dashboard "{}" registered.'.format(dashboard_name))
++
      set_flag("grafana.configured")
--def _load_grafana_dashboard():
++def _load_grafana_dashboard(dashboard_data):
      prometheus_datasource = "{} - Juju generated source".format(
          hookenv.config().get("prometheus_datasource", "prometheus")
+     )
      dashboard_context = dict(datasource=prometheus_datasource)
--    # TODO: Figure out if metrics exist and then set bools accordingly.
--    # For now, setting bools to true.
--    dashboard_context["bonds_enabled"] = True
--    dashboard_context["bcache_enabled"] = True
--    dashboard_context["conntrack_enabled"] = True
++    dashboard_context.update(dashboard_data.get("context_vars", {}))
      return render_custom(
--        source=GRAFANA_DASHBOARD_TELEGRAF_FILE_NAME,
++        source=dashboard_data["template_file"],
          render_context=dashboard_context,
          variable_start_string="<<",
          variable_end_string=">>",
@@ -1731,3 +1901,9 @@ def configure_nagios(nagios):
  @when_not("apt.nvme-cli.installed")
  def install_smart_metrics_packages():
      apt.queue_install(["smartmontools", "nvme-cli"])
++
++
++@when("telegraf.intel_rdt.enabled")
++@when_not("apt.installed.intel-cmt-cat")
++def install_intel_rdt_packages():
++    apt.queue_install(["intel-cmt-cat"])
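
The reworked `register_grafana_dashboard()` works in three steps. It copies a base dashboard mapping, adds an "Intel RDT" entry only when `collect_intel_rdt_metrics` is set, and then renders, hashes and registers each entry. The Python sketch below isolates that data flow without charmhelpers or Grafana. `BASE_DASHBOARD_CONFIG` and its `telegraf.json.j2` and `context_vars` contents are hypothetical stand-ins for `GRAFANA_DASHBOARD_CONFIG`, whose real definition is not part of this excerpt.

```python
import hashlib
import json

# Hypothetical stand-in for the charm's GRAFANA_DASHBOARD_CONFIG constant;
# the real contents of that constant are not shown in this diff.
BASE_DASHBOARD_CONFIG = {
    "telegraf": {
        "template_file": "telegraf.json.j2",  # assumed file name
        "context_vars": {"bonds_enabled": True},
    },
}


def build_dashboard_config(collect_intel_rdt_metrics):
    """Return the dashboards to register, adding Intel RDT only when enabled."""
    # Shallow copy: adding a key never mutates the module-level constant,
    # but the nested per-dashboard dicts are still shared with it.
    config = BASE_DASHBOARD_CONFIG.copy()
    if collect_intel_rdt_metrics:
        config["Intel RDT"] = {"template_file": "IntelRDT.json.j2"}
    return config


def build_render_context(datasource, dashboard_data):
    """Merge the shared datasource name with optional per-dashboard variables."""
    context = {"datasource": "{} - Juju generated source".format(datasource)}
    context.update(dashboard_data.get("context_vars", {}))
    return context


def attach_digest(rendered):
    """Parse a rendered dashboard and tag it with the MD5 of its source text."""
    dashboard = json.loads(rendered)
    dashboard["digest"] = hashlib.md5(rendered.encode("utf8")).hexdigest()
    return dashboard
```

Because `.copy()` is shallow, adding the "Intel RDT" key leaves the module-level constant untouched. The nested per-dashboard dicts are still shared, though, so any code that mutated `dashboard_data` in place would leak changes back into the constant. The digest is computed over the rendered text, so it changes only when that text changes. Presumably this is what lets the Grafana side notice updated dashboards.
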
 diff --git a/src/templates/base_inputs.conf b/src/templates/base_inputs.conf
 index 7f42543..8feea45 100644
 --- a/src/templates/base_inputs.conf
 +++ b/src/templates/base_inputs.conf
@@ -170,6 +170,13 @@ use_sudo = true
  {%- endif %}
  {% endif %}
++{% if "intel_rdt" not in disabled_plugins %}
++{% if intel_rdt -%}
++[[inputs.intel_rdt]]
++cores = {{ intel_rdt_cores }}
++use_sudo = true
++{%- endif %}
++{% endif %}
  [[inputs.exec]]
  commands = [
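
The `cores` option above receives `intel_rdt_cores` verbatim, so that value must already be a TOML array literal. Judging from `test_get_cpu_cores`, the charm derives it from the `On-line CPU(s) list:` field of `lscpu` JSON output and returns the string `'["0-23"]'`. The following is a minimal sketch of that parsing, assuming that input shape. `online_cpus_as_toml_array` is a hypothetical name, not the charm's `get_cpu_cores()`.

```python
import json


def online_cpus_as_toml_array(lscpu_json):
    """Turn lscpu JSON output (bytes or str) into the value rendered as `cores = ...`.

    Hypothetical helper modelled on what test_get_cpu_cores expects; the
    charm's actual get_cpu_cores() body is not shown in this diff.
    """
    for entry in json.loads(lscpu_json)["lscpu"]:
        if entry["field"] == "On-line CPU(s) list:":
            # A JSON array of strings is also a valid TOML array literal.
            return json.dumps([entry["data"]])
    raise ValueError("no 'On-line CPU(s) list:' field in lscpu output")
```

With a single "0-23" element, telegraf should treat all online cores as one core group. That gives one set of readings per host rather than one per core.
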
 diff --git a/src/templates/dashboards/grafana/IntelRDT.json.j2 b/src/templates/dashboards/grafana/IntelRDT.json.j2
 new file mode 100644
 index 0000000..5a915c4
 --- /dev/null
 +++ b/src/templates/dashboards/grafana/IntelRDT.json.j2
@@ -0,0 +1,508 @@
++{
++  "annotations": {
++    "list": [
++      {
++        "builtIn": 1,
++        "datasource": "-- Grafana --",
++        "enable": true,
++        "hide": true,
++        "iconColor": "rgba(0, 211, 255, 1)",
++        "name": "Annotations & Alerts",
++        "type": "dashboard"
++      }
++    ]
++  },
++  "editable": true,
++  "gnetId": null,
++  "graphTooltip": 0,
++  "id": null,
++  "iteration": 1625126969668,
++  "links": [],
++  "panels": [
++    {
++      "collapsed": false,
++      "datasource": null,
++      "gridPos": {
++        "h": 1,
++        "w": 24,
++        "x": 0,
++        "y": 0
++      },
++      "id": 8,
++      "panels": [],
++      "title": "Memory Bandwidth",
++      "type": "row"
++    },
++    {
++      "datasource": "<< datasource >>",
++      "fieldConfig": {
++        "defaults": {
++          "color": {
++            "mode": "palette-classic"
++          },
++          "custom": {
++            "axisLabel": "MB/s",
++            "axisPlacement": "auto",
++            "barAlignment": 0,
++            "drawStyle": "line",
++            "fillOpacity": 0,
++            "gradientMode": "none",
++            "hideFrom": {
++              "legend": false,
++              "tooltip": false,
++              "viz": false
++            },
++            "lineInterpolation": "linear",
++            "lineWidth": 1,
++            "pointSize": 5,
++            "scaleDistribution": {
++              "type": "linear"
++            },
++            "showPoints": "auto",
++            "spanNulls": false,
++            "stacking": {
++              "group": "A",
++              "mode": "none"
++            },
++            "thresholdsStyle": {
++              "mode": "off"
++            }
++          },
++          "mappings": [],
++          "thresholds": {
++            "mode": "absolute",
++            "steps": [
++              {
++                "color": "green",
++                "value": null
++              },
++              {
++                "color": "red",
++                "value": 80
++              }
++            ]
++          }
++        },
++        "overrides": []
++      },
++      "gridPos": {
++        "h": 11,
++        "w": 24,
++        "x": 0,
++        "y": 1
++      },
++      "id": 2,
++      "options": {
++        "legend": {
++          "calcs": [],
++          "displayMode": "list",
++          "placement": "bottom"
++        },
++        "tooltip": {
++          "mode": "single"
++        }
++      },
++      "targets": [
++        {
++          "exemplar": true,
++          "expr": "{name=\"MBL\"}",
++          "interval": "",
++          "legendFormat": "{{ host }}",
++          "queryType": "randomWalk",
++          "refId": "Memory Bandwidth"
++        }
++      ],
++      "title": "MBL",
++      "type": "timeseries"
++    },
++    {
++      "datasource": "<< datasource >>",
++      "fieldConfig": {
++        "defaults": {
++          "color": {
++            "mode": "palette-classic"
++          },
++          "custom": {
++            "axisLabel": "MB/s",
++            "axisPlacement": "auto",
++            "barAlignment": 0,
++            "drawStyle": "line",
++            "fillOpacity": 0,
++            "gradientMode": "none",
++            "hideFrom": {
++              "legend": false,
++              "tooltip": false,
++              "viz": false
++            },
++            "lineInterpolation": "linear",
++            "lineWidth": 1,
++            "pointSize": 5,
++            "scaleDistribution": {
++              "type": "linear"
++            },
++            "showPoints": "auto",
++            "spanNulls": false,
++            "stacking": {
++              "group": "A",
++              "mode": "none"
++            },
++            "thresholdsStyle": {
++              "mode": "off"
++            }
++          },
++          "mappings": [],
++          "thresholds": {
++            "mode": "absolute",
++            "steps": [
++              {
++                "color": "green",
++                "value": null
++              },
++              {
++                "color": "red",
++                "value": 80
++              }
++            ]
++          }
++        },
++        "overrides": []
++      },
++      "gridPos": {
++        "h": 11,
++        "w": 24,
++        "x": 0,
++        "y": 12
++      },
++      "id": 11,
++      "options": {
++        "legend": {
++          "calcs": [],
++          "displayMode": "list",
++          "placement": "bottom"
++        },
++        "tooltip": {
++          "mode": "single"
++        }
++      },
++      "targets": [
++        {
++          "exemplar": true,
++          "expr": "{name=\"MBR\"}",
++          "interval": "",
++          "legendFormat": "{{ host }}",
++          "queryType": "randomWalk",
++          "refId": "Memory Bandwidth"
++        }
++      ],
++      "title": "MBR",
++      "type": "timeseries"
++    },
++    {
++      "datasource": "<< datasource >>",
++      "fieldConfig": {
++        "defaults": {
++          "color": {
++            "mode": "palette-classic"
++          },
++          "custom": {
++            "axisLabel": "MB/s",
++            "axisPlacement": "auto",
++            "barAlignment": 0,
++            "drawStyle": "line",
++            "fillOpacity": 0,
++            "gradientMode": "none",
++            "hideFrom": {
++              "legend": false,
++              "tooltip": false,
++              "viz": false
++            },
++            "lineInterpolation": "linear",
++            "lineWidth": 1,
++            "pointSize": 5,
++            "scaleDistribution": {
++              "type": "linear"
++            },
++            "showPoints": "auto",
++            "spanNulls": false,
++            "stacking": {
++              "group": "A",
++              "mode": "none"
++            },
++            "thresholdsStyle": {
++              "mode": "off"
++            }
++          },
++          "mappings": [],
++          "thresholds": {
++            "mode": "absolute",
++            "steps": [
++              {
++                "color": "green",
++                "value": null
++              },
++              {
++                "color": "red",
++                "value": 80
++              }
++            ]
++          }
++        },
++        "overrides": []
++      },
++      "gridPos": {
++        "h": 11,
++        "w": 24,
++        "x": 0,
++        "y": 23
++      },
++      "id": 10,
++      "options": {
++        "legend": {
++          "calcs": [],
++          "displayMode": "list",
++          "placement": "bottom"
++        },
++        "tooltip": {
++          "mode": "single"
++        }
++      },
++      "targets": [
++        {
++          "exemplar": true,
++          "expr": "{name=\"MBT\"}",
++          "interval": "",
++          "legendFormat": "{{ host }}",
++          "queryType": "randomWalk",
++          "refId": "Memory Bandwidth"
++        }
++      ],
++      "title": "MBT",
++      "type": "timeseries"
++    },
++    {
++      "collapsed": true,
++      "datasource": null,
++      "gridPos": {
++        "h": 1,
++        "w": 24,
++        "x": 0,
++        "y": 34
++      },
++      "id": 6,
++      "panels": [
++        {
++          "datasource": "<< datasource >>",
++          "fieldConfig": {
++            "defaults": {
++              "color": {
++                "mode": "palette-classic"
++              },
++              "custom": {
++                "axisLabel": "",
++                "axisPlacement": "auto",
++                "barAlignment": 0,
++                "drawStyle": "line",
++                "fillOpacity": 0,
++                "gradientMode": "none",
++                "hideFrom": {
++                  "legend": false,
++                  "tooltip": false,
++                  "viz": false
++                },
++                "lineInterpolation": "linear",
++                "lineWidth": 1,
++                "pointSize": 5,
++                "scaleDistribution": {
++                  "type": "linear"
++                },
++                "showPoints": "auto",
++                "spanNulls": false,
++                "stacking": {
++                  "group": "A",
++                  "mode": "none"
++                },
++                "thresholdsStyle": {
++                  "mode": "off"
++                }
++              },
++              "mappings": [],
++              "thresholds": {
++                "mode": "absolute",
++                "steps": [
++                  {
++                    "color": "green",
++                    "value": null
++                  },
++                  {
++                    "color": "red",
++                    "value": 80
++                  }
++                ]
++              },
++              "unit": "deckbytes"
++            },
++            "overrides": []
++          },
++          "gridPos": {
++            "h": 11,
++            "w": 24,
++            "x": 0,
++            "y": 2
++          },
++          "id": 4,
++          "options": {
++            "legend": {
++              "calcs": [],
++              "displayMode": "list",
++              "placement": "bottom"
++            },
++            "tooltip": {
++              "mode": "single"
++            }
++          },
++          "targets": [
++            {
++              "exemplar": true,
++              "expr": "rdt_metric{name=\"LLC\", host=~\"$host\"}",
++              "interval": "",
++              "legendFormat": "{{ host }}",
++              "queryType": "randomWalk",
++              "refId": "A"
++            }
++          ],
++          "title": "LLC",
++          "type": "timeseries"
++        },
++        {
++          "datasource": "<< datasource >>",
++          "fieldConfig": {
++            "defaults": {
++              "color": {
++                "mode": "palette-classic"
++              },
++              "custom": {
++                "axisLabel": "",
++                "axisPlacement": "auto",
++                "barAlignment": 0,
++                "drawStyle": "line",
++                "fillOpacity": 0,
++                "gradientMode": "none",
++                "hideFrom": {
++                  "legend": false,
++                  "tooltip": false,
++                  "viz": false
++                },
++                "lineInterpolation": "linear",
++                "lineWidth": 1,
++                "pointSize": 5,
++                "scaleDistribution": {
++                  "type": "linear"
++                },
++                "showPoints": "auto",
++                "spanNulls": false,
++                "stacking": {
++                  "group": "A",
++                  "mode": "none"
++                },
++                "thresholdsStyle": {
++                  "mode": "off"
++                }
++              },
++              "mappings": [],
++              "thresholds": {
++                "mode": "absolute",
++                "steps": [
++                  {
++                    "color": "green",
++                    "value": null
++                  },
++                  {
++                    "color": "red",
++                    "value": 80
++                  }
++                ]
++              },
++              "unit": "short"
++            },
++            "overrides": []
++          },
++          "gridPos": {
++            "h": 11,
++            "w": 24,
++            "x": 0,
++            "y": 13
++          },
++          "id": 9,
++          "options": {
++            "legend": {
++              "calcs": [],
++              "displayMode": "list",
++              "placement": "bottom"
++            },
++            "tooltip": {
++              "mode": "single"
++            }
++          },
++          "targets": [
++            {
++              "exemplar": true,
++              "expr": "rdt_metric{name=\"LLC_Misses\", host=~\"$host\"}",
++              "interval": "",
++              "legendFormat": "{{ host }}",
++              "queryType": "randomWalk",
++              "refId": "A"
++            }
++          ],
++          "title": "LLC Misses",
++          "type": "timeseries"
++        }
++      ],
++      "title": "Cache Occupancy",
++      "type": "row"
++    }
++  ],
++  "refresh": "",
++  "schemaVersion": 30,
++  "style": "dark",
++  "tags": [],
++  "templating": {
++    "list": [
++      {
++        "allValue": null,
++        "current": {
++          "selected": false,
++          "text": "controller:ubuntu-1",
++          "value": "controller:ubuntu-1"
++        },
++        "datasource": "<< datasource >>",
++        "definition": "label_values(host)",
++        "description": null,
++        "error": null,
++        "hide": 0,
++        "includeAll": false,
++        "label": null,
++        "multi": true,
++        "name": "host",
++        "options": [],
++        "query": {
++          "query": "label_values(host)",
++          "refId": "StandardVariableQuery"
++        },
++        "refresh": 1,
++        "regex": "",
++        "skipUrlSync": false,
++        "sort": 0,
++        "type": "query"
++      }
++    ]
++  },
++  "time": {
++    "from": "now-6h",
++    "to": "now"
++  },
++  "timepicker": {},
++  "timezone": "utc",
++  "title": "Intel RDT - Memory Bandwidth Monitoring",
++  "uid": "GWblKcRnd",
++  "version": 5
++}
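
The dashboard is rendered with `<<` and `>>` as variable delimiters (see `_load_grafana_dashboard()`). As a result, `<< datasource >>` is substituted while Grafana's own `{{ host }}` legend syntax reaches Grafana untouched. The snippet below illustrates that split with a plain regular expression. It is an approximation for illustration only, not Jinja2 and not the charm's `render_custom()`.

```python
import json
import re

# Rough stand-in for Jinja2 rendering with variable_start_string="<<" and
# variable_end_string=">>"; this regex substitution is NOT render_custom(),
# only an illustration of which placeholders get replaced.
PLACEHOLDER = re.compile(r"<<\s*(\w+)\s*>>")


def render_angle_placeholders(template, context):
    """Replace << name >> placeholders and leave {{ ... }} text alone."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)
```

The regex stand-in hides one caveat. Unless `render_custom()` also overrides Jinja2's block and comment delimiters (this diff does not show whether it does), any `{%` or `{#` sequence in the JSON would still be parsed as template syntax. Dashboard templates therefore need to avoid those sequences.
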
 diff --git a/src/templates/sudoers/telegraf_intel_rdt.tmpl b/src/templates/sudoers/telegraf_intel_rdt.tmpl
 new file mode 100644
 index 0000000..0324a77
 --- /dev/null
 +++ b/src/templates/sudoers/telegraf_intel_rdt.tmpl
@@ -0,0 +1,4 @@
++Cmnd_Alias PQOS = /usr/sbin/pqos -r --iface-os --mon-file-type=csv --mon-interval=*
++{{ telegraf_user }} ALL=(root) NOPASSWD: PQOS
++Defaults!PQOS !logfile, !syslog, !pam_session
++
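
The `--mon-interval=*` wildcard needs some care. According to sudoers(5), command-line arguments are matched as a single concatenated string, and `*` also matches white space. The rule therefore accepts any interval value and also any arguments appended after it. The sketch below approximates that matching with Python's `fnmatch.fnmatchcase`, which stands in for sudo's matcher. `--hypothetical-extra-arg` is an invented flag used only to show the effect.

```python
from fnmatch import fnmatchcase

# Approximation only: sudoers(5) matches command-line arguments as one
# concatenated string with fnmatch-style globbing, where "*" also matches
# spaces. fnmatchcase is used here as a stand-in for sudo's own matcher.
PQOS_ARGS_PATTERN = "-r --iface-os --mon-file-type=csv --mon-interval=*"


def args_allowed(args):
    """Return True if the argument list would match the PQOS Cmnd_Alias."""
    return fnmatchcase(" ".join(args), PQOS_ARGS_PATTERN)
```

The merge proposal does not say whether accepting trailing arguments is intended.
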
 diff --git a/src/tests/unit/test_telegraf.py b/src/tests/unit/test_telegraf.py
 index 4178a34..251a9b9 100644
 --- a/src/tests/unit/test_telegraf.py
 +++ b/src/tests/unit/test_telegraf.py
@@ -20,6 +20,7 @@ import getpass
  import grp
  import json
  import os
++import platform
  import shutil
  import subprocess
  import sys
@@ -27,9 +28,11 @@ from textwrap import dedent
  from unittest import mock
  from unittest.mock import MagicMock, call, patch
--from charmhelpers.core import host
++from charmhelpers import fetch
++from charmhelpers.core import host, kernel
  from charmhelpers.core.hookenv import Config
  from charmhelpers.core.templating import render
++from charmhelpers.fetch import apt_pkg
  import charms
  from charms.reactive import RelationBase, bus, helpers, set_flag
@@ -1465,7 +1468,9 @@ class TestGrafanaDashboard:
          mock_render,
      ):
          expected_datasource = "my_prometheus"
--        fake_config = dict(prometheus_datasource=expected_datasource)
++        fake_config = dict(
++            prometheus_datasource=expected_datasource, collect_intel_rdt_metrics=False
++        )
          expected_dashboard_context = dict(
              datasource="{} - Juju generated source".format(expected_datasource),
              bonds_enabled=True,
@@ -1484,15 +1489,19 @@ class TestGrafanaDashboard:
          mock_render.return_value = mock_rendered_content
          telegraf.register_grafana_dashboard()
++        dashboard_name = "telegraf"
++        dashboard_filename = telegraf.GRAFANA_DASHBOARD_CONFIG[dashboard_name][
++            "template_file"
++        ]
          mock_render.assert_called_once_with(
--            source=telegraf.GRAFANA_DASHBOARD_TELEGRAF_FILE_NAME,
++            source=dashboard_filename,
              render_context=expected_dashboard_context,
              variable_start_string="<<",
              variable_end_string=">>",
          )
          mock_grafana.register_dashboard.assert_called_once_with(
--            name=telegraf.GRAFANA_DASHBOARD_NAME, dashboard=mock_dashboard_dict
++            name=dashboard_name, dashboard=mock_dashboard_dict
          )
          mock_set_flag.assert_called_once_with("grafana.configured")
@@ -1617,3 +1626,108 @@ def test_collect_ipmi_sensor_metrics(monkeypatch, config):
  """
      config_file = base_dir().join("telegraf.conf")
      assert expected in config_file.read()
++
++
++def test_collect_intel_rdt_metrics(monkeypatch, config):
++    monkeypatch.setattr(telegraf, "is_container", lambda: False)
++    config["collect_intel_rdt_metrics"] = True
++    monkeypatch.setattr(telegraf, "get_cpu_cores", lambda: '["0-23"]')
++    monkeypatch.setattr(telegraf, "check_valid_intel_rdt_configuration", lambda: "")
++    monkeypatch.setattr(kernel, "modprobe", lambda module, persist: None)
++    telegraf.configure_telegraf()
++
++    expected = """
++[[inputs.intel_rdt]]
++cores = ["0-23"]
++use_sudo = true
++"""
++    config_file = base_dir().join("telegraf.conf")
++    assert expected in config_file.read()
++
++
++def test_get_cpu_cores(monkeypatch):
++    lscpu_output = """
++{
++   "lscpu": [
++      {"field": "Architecture:", "data": "x86_64"},
++      {"field": "On-line CPU(s) list:", "data": "0-23"}
++   ]
++}
++""".encode(
++        "utf8"
++    )
++    monkeypatch.setattr(subprocess, "check_output", lambda cmd: lscpu_output)
++    cores = telegraf.get_cpu_cores()
++    assert cores == '["0-23"]'
++
++
++def test_check_valid_intel_rdt_configuration_kernel_version(monkeypatch):
++    monkeypatch.setattr(telegraf, "is_container", lambda: False)
++    monkeypatch.setattr(platform, "release", lambda: "4.4.0-73-generic")
++    with pytest.raises(
++        telegraf.InvalidIntelRDTConfigurationError, match="unsupported kernel version"
++    ):
++        telegraf.check_valid_intel_rdt_configuration()
++
++
++def test_check_valid_intel_rdt_configuration_pkg_present(monkeypatch):
++    monkeypatch.setattr(telegraf, "is_container", lambda: False)
++    monkeypatch.setattr(platform, "release", lambda: "5.4.0-73-generic")
++    monkeypatch.setattr(fetch, "get_installed_version", lambda pkg: None)
++    with pytest.raises(
++        telegraf.InvalidIntelRDTConfigurationError,
++        match="package 'intel-cmt-cat' is not installed yet",
++    ):
++        telegraf.check_valid_intel_rdt_configuration()
++
++
++def test_check_valid_intel_rdt_configuration_pkg_version(monkeypatch):
++    monkeypatch.setattr(telegraf, "is_container", lambda: False)
++    monkeypatch.setattr(platform, "release", lambda: "5.4.0-73-generic")
++    monkeypatch.setattr(fetch, "get_installed_version", lambda pkg: {"ver_str": "0.0"})
++    monkeypatch.setattr(apt_pkg, "version_compare", lambda a, b: -1)
++    with pytest.raises(
++        telegraf.InvalidIntelRDTConfigurationError,
++        match="package 'intel-cmt-cat' is older than required",
++    ):
++        telegraf.check_valid_intel_rdt_configuration()
++
++
++def test_check_valid_intel_rdt_configuration_kernel_module(monkeypatch):
++    monkeypatch.setattr(telegraf, "is_container", lambda: False)
++    monkeypatch.setattr(platform, "release", lambda: "5.4.0-73-generic")
++    monkeypatch.setattr(kernel, "is_module_loaded", lambda module: False)
++    monkeypatch.setattr(
++        fetch,
++        "get_installed_version",
++        lambda pkg: {"ver_str": telegraf.RDT_MINIMUM_PKG_VERSION},
++    )
++    monkeypatch.setattr(apt_pkg, "version_compare", lambda a, b: 0)
++    with pytest.raises(
++        telegraf.InvalidIntelRDTConfigurationError,
++        match="required module",
++    ):
++        telegraf.check_valid_intel_rdt_configuration()
++
++
++def test_check_valid_intel_rdt_configuration_pqos(monkeypatch):
++    def mock_check_call(*args, **kwargs):
++        raise subprocess.CalledProcessError(
++            cmd="fake", returncode=1, output="fail".encode("utf8")
++        )
++
++    monkeypatch.setattr(telegraf, "is_container", lambda: False)
++    monkeypatch.setattr(platform, "release", lambda: "5.4.0-73-generic")
++    monkeypatch.setattr(kernel, "is_module_loaded", lambda module: True)
++    monkeypatch.setattr(
++        fetch,
++        "get_installed_version",
++        lambda pkg: {"ver_str": telegraf.RDT_MINIMUM_PKG_VERSION},
++    )
++    monkeypatch.setattr(apt_pkg, "version_compare", lambda a, b: 0)
++    monkeypatch.setattr(subprocess, "check_call", mock_check_call)
++    with pytest.raises(
++        telegraf.InvalidIntelRDTConfigurationError,
++        match="pqos -d failed",
++    ):
++        telegraf.check_valid_intel_rdt_configuration()
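
Read together, the `test_check_valid_intel_rdt_configuration_*` cases imply an ordered set of checks:

1. Kernel version.
2. Presence of `intel-cmt-cat`.
3. Package version.
4. A required kernel module.
5. A final `pqos -d` probe.

The sketch below expresses that order with injected inputs so it runs without charmhelpers. It makes several simplifying assumptions:

- The thresholds `MIN_KERNEL` and `MIN_PKG_VERSION` are hypothetical. The tests only show that 4.4 is rejected and 5.4 accepted.
- Version tuples stand in for `apt_pkg.version_compare`.
- The container check suggested by the `is_container` stubs is left out.

```python
import subprocess


class InvalidIntelRDTConfigurationError(Exception):
    """Raised when the unit cannot run the telegraf intel_rdt input."""


# Hypothetical thresholds chosen for illustration only.
MIN_KERNEL = (5, 0)
MIN_PKG_VERSION = (4, 0)


def kernel_version(release):
    """Return (major, minor) from a release string like '5.4.0-73-generic'."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def check_rdt_prerequisites(kernel_release, pkg_version, module_loaded, run_pqos_detect):
    """Raise on the first failed prerequisite, in the order the tests imply.

    kernel_release  -- e.g. the value returned by platform.release()
    pkg_version     -- installed intel-cmt-cat version as a tuple, or None
    module_loaded   -- whether the required kernel module is loaded
    run_pqos_detect -- callable raising CalledProcessError if the probe fails
    """
    if kernel_version(kernel_release) < MIN_KERNEL:
        raise InvalidIntelRDTConfigurationError(
            "unsupported kernel version: {}".format(kernel_release)
        )
    if pkg_version is None:
        raise InvalidIntelRDTConfigurationError(
            "package 'intel-cmt-cat' is not installed yet"
        )
    if pkg_version < MIN_PKG_VERSION:
        raise InvalidIntelRDTConfigurationError(
            "package 'intel-cmt-cat' is older than required"
        )
    if not module_loaded:
        raise InvalidIntelRDTConfigurationError("required module is not loaded")
    try:
        run_pqos_detect()
    except subprocess.CalledProcessError as err:
        raise InvalidIntelRDTConfigurationError(
            "pqos -d failed: {}".format(err)
        ) from err
```

A caller could pass something like `lambda: subprocess.check_call(["pqos", "-d"])` as the probe. The exact command line the charm uses is not visible in this excerpt. Because the package check comes first, the probe only runs once `intel-cmt-cat`, which ships `pqos`, is known to be installed.
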