Merge ~spencerrunde/landscape-charm:cos-integration into landscape-charm:main

Proposed by Spencer Runde
Status: Merged
Approved by: Spencer Runde
Approved revision: bbdc76a234cbb107120165a96b88a82dbc9553b5
Merge reported by: Spencer Runde
Merged at revision: bbdc76a234cbb107120165a96b88a82dbc9553b5
Proposed branch: ~spencerrunde/landscape-charm:cos-integration
Merge into: landscape-charm:main
Diff against target: 1050 lines (+941/-1)
5 files modified
lib/charms/grafana_agent/v0/cos_agent.py (+842/-0)
metadata.yaml (+2/-0)
requirements-dev.txt (+4/-0)
src/charm.py (+52/-0)
tests/test_charm.py (+41/-1)
Reviewers:
Mitch Burton: Approve
Kevin Nasto: Pending
Review via email: mp+454658@code.launchpad.net

Commit message

add basic COS integration

Description of the change

Add the Grafana Agent library for a basic level of integration. Currently supports logging through Loki. Creates symlinks for log files to work around limited support for log scraping in COS. Creates the `/var/log/landscape-server` directory and sets its ownership to syslog:landscape to work around a bug in which
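As a rough illustration of the symlink workaround described above, the sketch below links every log file in a service log directory into a parent directory. It is not the code from `src/charm.py`: the helper name `link_logs` and the `<directory>-<file>` link naming are assumptions, the latter inferred only from the `filename` label used in the manual test query.

```python
import os
from pathlib import Path


def link_logs(source_dir: Path, link_dir: Path, pattern: str = "*.log") -> list:
    """Create one symlink in link_dir for each matching log file in source_dir.

    Hypothetical helper: link names follow "<source_dir name>-<log file name>",
    which is an assumed convention, not one confirmed by the charm code.
    """
    link_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for log_file in sorted(source_dir.glob(pattern)):
        link = link_dir / f"{source_dir.name}-{log_file.name}"
        if link.is_symlink() or link.exists():
            continue  # leave existing entries alone so repeated calls are safe
        os.symlink(log_file, link)
        created.append(link)
    return created
```

Called as `link_logs(Path("/var/log/landscape-server"), Path("/var/log"))`, this would expose `appserver.log` as `/var/log/landscape-server-appserver.log`. Changing ownership to syslog:landscape could be done separately with `shutil.chown`, which needs sufficient privileges and is left out here.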

Manual Testing:

Set up COS and Landscape Server:

1. Check out the branch
2. `cd landscape-charm` (if not already there)
3. `make build`
4. `juju deploy grafana-agent --channel edge`
5. `juju relate grafana-agent landscape-server`
6. Set up COS-Lite according to this guide: https://charmhub.io/prometheus-k8s/docs/deploy-cos-lite?channel=edge#heading--deploy-the-cos-lite-bundle
7. Follow Step 4 of the Grafana Machine Charm integration guide: https://charmhub.io/topics/canonical-observability-stack/tutorials/instrumenting-machine-charms#step-4-relate-grafana-agent-to-cos-lite-prometheus-loki-and-grafana
8. Navigate to Grafana dashboard - identify it using the COS-Lite guide again: https://charmhub.io/prometheus-k8s/docs/deploy-cos-lite?channel=edge#heading--browse-dashboards

Verify Logs:

1. Try retrieving logs in the dashboard by navigating to "Explore" (the compass in the left sidebar).
2. Select the Loki instance from the dropdown in the upper left.
3. Use the "code" mode of the query constructor. Construct a query like `{juju_application="landscape-server", filename="/var/log/landscape-server-appserver.log"}` and confirm that service logs (appserver, message-system, pingserver, etc.) are being logged. You may also check that syslog logs appear. Grafana should suggest filenames that have already appeared, so you don't have to sift through too much.
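The same check can be scripted instead of typed into Grafana. The sketch below only builds the LogQL selector from the steps above and a URL for Loki's `query_range` HTTP endpoint; it does not send the request. The helper names and the `http://loki.example:3100` address are hypothetical placeholders, not values taken from this change.

```python
from urllib.parse import urlencode


def logql_selector(**labels: str) -> str:
    """Build a LogQL stream selector such as {app="x", filename="y"}."""
    parts = []
    for name, value in labels.items():
        # Escape backslashes and double quotes so the value stays one string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


def query_range_url(base_url: str, query: str, limit: int = 10) -> str:
    """Return a URL for Loki's query_range HTTP endpoint."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


# Label values copied from the manual test steps.
QUERY = logql_selector(
    juju_application="landscape-server",
    filename="/var/log/landscape-server-appserver.log",
)
# Hypothetical address; substitute the Loki endpoint of your own deployment.
URL = query_range_url("http://loki.example:3100", QUERY)
```

Fetching `URL` with any HTTP client should return matching log lines if the logs are reaching Loki; swapping the `filename` value lets you check the other services in turn.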

Mitch Burton (mitchburton) wrote :

Not the world's best installation experience, but I think it'll do for now/MVP. Tested it out on a microk8s cloud (after fighting for a while with the setup there).

+1 LGTM with 1 minor inline comment.

review: Approve
bbdc76a... by Spencer Runde

add basic COS integration

Preview Diff

1diff --git a/lib/charms/grafana_agent/v0/cos_agent.py b/lib/charms/grafana_agent/v0/cos_agent.py
2new file mode 100644
3index 0000000..d3130b2
4--- /dev/null
5+++ b/lib/charms/grafana_agent/v0/cos_agent.py
6@@ -0,0 +1,842 @@
7+# Copyright 2023 Canonical Ltd.
8+# See LICENSE file for licensing details.
9+
10+r"""## Overview.
11+
12+This library can be used to manage the cos_agent relation interface:
13+
14+- `COSAgentProvider`: Use in machine charms that need to have a workload's metrics
15+ or logs scraped, or forward rule files or dashboards to Prometheus, Loki or Grafana through
16+ the Grafana Agent machine charm.
17+
18+- `COSAgentRequirer`: Used in the Grafana Agent machine charm to manage the requirer side of
19+ the `cos_agent` interface.
20+
21+
22+## COSAgentProvider Library Usage
23+
24+The Grafana Agent machine Charmed Operator interacts with its clients using the cos_agent library.
25+Charms seeking to send telemetry must do so using the `COSAgentProvider` object from
26+this charm library.
27+
28+Using the `COSAgentProvider` object only requires instantiating it,
29+typically in the `__init__` method of your charm (the one which sends telemetry).
30+
31+The constructor of `COSAgentProvider` has only one required and nine optional parameters:
32+
33+```python
34+ def __init__(
35+ self,
36+ charm: CharmType,
37+ relation_name: str = DEFAULT_RELATION_NAME,
38+ metrics_endpoints: Optional[List[_MetricsEndpointDict]] = None,
39+ metrics_rules_dir: str = "./src/prometheus_alert_rules",
40+ logs_rules_dir: str = "./src/loki_alert_rules",
41+ recurse_rules_dirs: bool = False,
42+ log_slots: Optional[List[str]] = None,
43+ dashboard_dirs: Optional[List[str]] = None,
44+ refresh_events: Optional[List] = None,
45+ scrape_configs: Optional[Union[List[Dict], Callable]] = None,
46+ ):
47+```
48+
49+### Parameters
50+
51+- `charm`: The instance of the charm that instantiates `COSAgentProvider`, typically `self`.
52+
53+- `relation_name`: If your charmed operator uses a relation name other than `cos-agent` to use
54+ the `cos_agent` interface, this is where you have to specify that.
55+
56+- `metrics_endpoints`: In this parameter you can specify the metrics endpoints that Grafana Agent
57+ machine Charmed Operator will scrape. The configs of this list will be merged with the configs
58+ from `scrape_configs`.
59+
60+- `metrics_rules_dir`: The directory in which the Charmed Operator stores its metrics alert rules
61+ files.
62+
63+- `logs_rules_dir`: The directory in which the Charmed Operator stores its logs alert rules files.
64+
65+- `recurse_rules_dirs`: This parameter sets whether the Grafana Agent machine Charmed Operator
66+ searches for alert rules files recursively in the previous two directories.
67+
68+- `log_slots`: Snap slots to connect to for scraping logs in the form ["snap-name:slot", ...].
69+
70+- `dashboard_dirs`: List of directories where the dashboards are stored in the Charmed Operator.
71+
72+- `refresh_events`: List of events on which to refresh relation data.
73+
74+- `scrape_configs`: List of standard scrape_configs dicts or a callable that returns the list in
75+ case the configs need to be generated dynamically. The contents of this list will be merged
76+ with the configs from `metrics_endpoints`.
77+
78+
79+### Example 1 - Minimal instrumentation:
80+
81+In order to use this object the following should be in the `charm.py` file.
82+
83+```python
84+from charms.grafana_agent.v0.cos_agent import COSAgentProvider
85+...
86+class TelemetryProviderCharm(CharmBase):
87+ def __init__(self, *args):
88+ ...
89+ self._grafana_agent = COSAgentProvider(self)
90+```
91+
92+### Example 2 - Full instrumentation:
93+
94+In order to use this object the following should be in the `charm.py` file.
95+
96+```python
97+from charms.grafana_agent.v0.cos_agent import COSAgentProvider
98+...
99+class TelemetryProviderCharm(CharmBase):
100+ def __init__(self, *args):
101+ ...
102+ self._grafana_agent = COSAgentProvider(
103+ self,
104+ relation_name="custom-cos-agent",
105+ metrics_endpoints=[
106+ # specify "path" and "port" to scrape from localhost
107+ {"path": "/metrics", "port": 9000},
108+ {"path": "/metrics", "port": 9001},
109+ {"path": "/metrics", "port": 9002},
110+ ],
111+ metrics_rules_dir="./src/alert_rules/prometheus",
112+ logs_rules_dir="./src/alert_rules/loki",
113+ recurse_rules_dirs=True,
114+ log_slots=["my-app:slot"],
115+ dashboard_dirs=["./src/dashboards_1", "./src/dashboards_2"],
116+ refresh_events=["update-status", "upgrade-charm"],
117+ scrape_configs=[
118+ {
119+ "job_name": "custom_job",
120+ "metrics_path": "/metrics",
121+ "authorization": {"credentials": "bearer-token"},
122+ "static_configs": [
123+ {
124+ "targets": ["localhost:9003"],
125+ "labels": {"key": "value"},
126+ },
127+ ],
128+ },
129+ ]
130+ )
131+```
132+
133+### Example 3 - Dynamic scrape configs generation:
134+
135+Pass a function to the `scrape_configs` to decouple the generation of the configs
136+from the instantiation of the COSAgentProvider object.
137+
138+```python
139+from charms.grafana_agent.v0.cos_agent import COSAgentProvider
140+...
141+
142+class TelemetryProviderCharm(CharmBase):
143+ def generate_scrape_configs(self):
144+ return [
145+ {
146+ "job_name": "custom",
147+ "metrics_path": "/metrics",
148+ "static_configs": [{"targets": ["localhost:9000"]}],
149+ },
150+ ]
151+
152+ def __init__(self, *args):
153+ ...
154+ self._grafana_agent = COSAgentProvider(
155+ self,
156+ scrape_configs=self.generate_scrape_configs,
157+ )
158+```
159+
160+## COSAgentRequirer Library Usage
161+
162+This object may be used by any Charmed Operator that gathers telemetry data by
163+implementing the requirer side of the `cos_agent` interface,
164+for instance the Grafana Agent machine Charmed Operator.
165+
166+For this purpose the charm needs to instantiate the `COSAgentRequirer` object with one mandatory
167+and two optional arguments.
168+
169+### Parameters
170+
171+- `charm`: A reference to the parent (Grafana Agent machine) charm.
172+
173+- `relation_name`: The name of the relation that the charm uses to interact
174+ with its clients that provide telemetry data using the `COSAgentProvider` object.
175+
176+ If provided, this relation name must match a provided relation in metadata.yaml with the
177+ `cos_agent` interface.
178+ The default value of this argument is "cos-agent".
179+
180+- `refresh_events`: List of events on which to refresh relation data.
181+
182+
183+### Example 1 - Minimal instrumentation:
184+
185+In order to use this object the following should be in the `charm.py` file.
186+
187+```python
188+from charms.grafana_agent.v0.cos_agent import COSAgentRequirer
189+...
190+class GrafanaAgentMachineCharm(GrafanaAgentCharm):
191+ def __init__(self, *args):
192+ ...
193+ self._cos = COSAgentRequirer(self)
194+```
195+
196+
197+### Example 2 - Full instrumentation:
198+
199+In order to use this object the following should be in the `charm.py` file.
200+
201+```python
202+from charms.grafana_agent.v0.cos_agent import COSAgentRequirer
203+...
204+class GrafanaAgentMachineCharm(GrafanaAgentCharm):
205+ def __init__(self, *args):
206+ ...
207+ self._cos = COSAgentRequirer(
208+ self,
209+ relation_name="cos-agent-consumer",
210+ refresh_events=["update-status", "upgrade-charm"],
211+ )
212+```
213+"""
214+
215+import base64
216+import json
217+import logging
218+import lzma
219+from collections import namedtuple
220+from itertools import chain
221+from pathlib import Path
222+from typing import TYPE_CHECKING, Any, Callable, ClassVar, Dict, List, Optional, Set, Union
223+
224+import pydantic
225+from cosl import JujuTopology
226+from cosl.rules import AlertRules
227+from ops.charm import RelationChangedEvent
228+from ops.framework import EventBase, EventSource, Object, ObjectEvents
229+from ops.model import Relation, Unit
230+from ops.testing import CharmType
231+
232+if TYPE_CHECKING:
233+ try:
234+ from typing import TypedDict
235+
236+ class _MetricsEndpointDict(TypedDict):
237+ path: str
238+ port: int
239+
240+ except ModuleNotFoundError:
241+ _MetricsEndpointDict = Dict # pyright: ignore
242+
243+LIBID = "dc15fa84cef84ce58155fb84f6c6213a"
244+LIBAPI = 0
245+LIBPATCH = 6
246+
247+PYDEPS = ["cosl", "pydantic < 2"]
248+
249+DEFAULT_RELATION_NAME = "cos-agent"
250+DEFAULT_PEER_RELATION_NAME = "peers"
251+DEFAULT_SCRAPE_CONFIG = {
252+ "static_configs": [{"targets": ["localhost:80"]}],
253+ "metrics_path": "/metrics",
254+}
255+
256+logger = logging.getLogger(__name__)
257+SnapEndpoint = namedtuple("SnapEndpoint", "owner, name")
258+
259+
260+class GrafanaDashboard(str):
261+ """Grafana Dashboard encoded json; lzma-compressed."""
262+
263+ # TODO Replace this with a custom type when pydantic v2 released (end of 2023 Q1?)
264+ # https://github.com/pydantic/pydantic/issues/4887
265+ @staticmethod
266+ def _serialize(raw_json: Union[str, bytes]) -> "GrafanaDashboard":
267+ if not isinstance(raw_json, bytes):
268+ raw_json = raw_json.encode("utf-8")
269+ encoded = base64.b64encode(lzma.compress(raw_json)).decode("utf-8")
270+ return GrafanaDashboard(encoded)
271+
272+ def _deserialize(self) -> Dict:
273+ try:
274+ raw = lzma.decompress(base64.b64decode(self.encode("utf-8"))).decode()
275+ return json.loads(raw)
276+ except json.decoder.JSONDecodeError as e:
277+ logger.error("Invalid Dashboard format: %s", e)
278+ return {}
279+
280+ def __repr__(self):
281+ """Return string representation of self."""
282+ return "<GrafanaDashboard>"
283+
284+
285+class CosAgentProviderUnitData(pydantic.BaseModel):
286+ """Unit databag model for `cos-agent` relation."""
287+
288+ # The following entries are the same for all units of the same principal.
289+ # Note that the same grafana agent subordinate may be related to several apps.
290+ # this needs to make its way to the gagent leader
291+ metrics_alert_rules: dict
292+ log_alert_rules: dict
293+ dashboards: List[GrafanaDashboard]
294+ subordinate: Optional[bool]
295+
296+ # The following entries may vary across units of the same principal app.
297+ # this data does not need to be forwarded to the gagent leader
298+ metrics_scrape_jobs: List[Dict]
299+ log_slots: List[str]
300+
301+ # when this whole datastructure is dumped into a databag, it will be nested under this key.
302+ # while not strictly necessary (we could have it 'flattened out' into the databag),
303+ # this simplifies working with the model.
304+ KEY: ClassVar[str] = "config"
305+
306+
307+class CosAgentPeersUnitData(pydantic.BaseModel):
308+ """Unit databag model for `peers` cos-agent machine charm peer relation."""
309+
310+ # We need the principal unit name and relation metadata to be able to render identifiers
311+ # (e.g. topology) on the leader side, after all the data moves into peer data (the grafana
312+ # agent leader can only see its own principal, because it is a subordinate charm).
313+ principal_unit_name: str
314+ principal_relation_id: str
315+ principal_relation_name: str
316+
317+ # The only data that is forwarded to the leader is data that needs to go into the app databags
318+ # of the outgoing o11y relations.
319+ metrics_alert_rules: Optional[dict]
320+ log_alert_rules: Optional[dict]
321+ dashboards: Optional[List[GrafanaDashboard]]
322+
323+ # when this whole datastructure is dumped into a databag, it will be nested under this key.
324+ # while not strictly necessary (we could have it 'flattened out' into the databag),
325+ # this simplifies working with the model.
326+ KEY: ClassVar[str] = "config"
327+
328+ @property
329+ def app_name(self) -> str:
330+ """Parse out the app name from the unit name.
331+
332+ TODO: Switch to using `model_post_init` when pydantic v2 is released?
333+ https://github.com/pydantic/pydantic/issues/1729#issuecomment-1300576214
334+ """
335+ return self.principal_unit_name.split("/")[0]
336+
337+
338+class COSAgentProvider(Object):
339+ """Integration endpoint wrapper for the provider side of the cos_agent interface."""
340+
341+ def __init__(
342+ self,
343+ charm: CharmType,
344+ relation_name: str = DEFAULT_RELATION_NAME,
345+ metrics_endpoints: Optional[List["_MetricsEndpointDict"]] = None,
346+ metrics_rules_dir: str = "./src/prometheus_alert_rules",
347+ logs_rules_dir: str = "./src/loki_alert_rules",
348+ recurse_rules_dirs: bool = False,
349+ log_slots: Optional[List[str]] = None,
350+ dashboard_dirs: Optional[List[str]] = None,
351+ refresh_events: Optional[List] = None,
352+ *,
353+ scrape_configs: Optional[Union[List[dict], Callable]] = None,
354+ ):
355+ """Create a COSAgentProvider instance.
356+
357+ Args:
358+ charm: The `CharmBase` instance that is instantiating this object.
359+ relation_name: The name of the relation to communicate over.
360+ metrics_endpoints: List of endpoints in the form [{"path": path, "port": port}, ...].
361+ This argument is a simplified form of the `scrape_configs`.
362+ The contents of this list will be merged with the contents of `scrape_configs`.
363+ metrics_rules_dir: Directory where the metrics rules are stored.
364+ logs_rules_dir: Directory where the logs rules are stored.
365+ recurse_rules_dirs: Whether to recurse into rule paths.
366+ log_slots: Snap slots to connect to for scraping logs
367+ in the form ["snap-name:slot", ...].
368+ dashboard_dirs: List of directories where the dashboards are stored.
369+ refresh_events: List of events on which to refresh relation data.
370+ scrape_configs: List of standard scrape_configs dicts or a callable
371+ that returns the list in case the configs need to be generated dynamically.
372+ The contents of this list will be merged with the contents of `metrics_endpoints`.
373+ """
374+ super().__init__(charm, relation_name)
375+ dashboard_dirs = dashboard_dirs or ["./src/grafana_dashboards"]
376+
377+ self._charm = charm
378+ self._relation_name = relation_name
379+ self._metrics_endpoints = metrics_endpoints or []
380+ self._scrape_configs = scrape_configs or []
381+ self._metrics_rules = metrics_rules_dir
382+ self._logs_rules = logs_rules_dir
383+ self._recursive = recurse_rules_dirs
384+ self._log_slots = log_slots or []
385+ self._dashboard_dirs = dashboard_dirs
386+ self._refresh_events = refresh_events or [self._charm.on.config_changed]
387+
388+ events = self._charm.on[relation_name]
389+ self.framework.observe(events.relation_joined, self._on_refresh)
390+ self.framework.observe(events.relation_changed, self._on_refresh)
391+ for event in self._refresh_events:
392+ self.framework.observe(event, self._on_refresh)
393+
394+ def _on_refresh(self, event):
395+ """Trigger the class to update relation data."""
396+ relations = self._charm.model.relations[self._relation_name]
397+
398+ for relation in relations:
399+ # Before a principal is related to the grafana-agent subordinate, we'd get
400+ # ModelError: ERROR cannot read relation settings: unit "zk/2": settings not found
401+ # Add a guard to make sure it doesn't happen.
402+ if relation.data and self._charm.unit in relation.data:
403+ # Subordinate relations can communicate only over unit data.
404+ try:
405+ data = CosAgentProviderUnitData(
406+ metrics_alert_rules=self._metrics_alert_rules,
407+ log_alert_rules=self._log_alert_rules,
408+ dashboards=self._dashboards,
409+ metrics_scrape_jobs=self._scrape_jobs,
410+ log_slots=self._log_slots,
411+ subordinate=self._charm.meta.subordinate,
412+ )
413+ relation.data[self._charm.unit][data.KEY] = data.json()
414+ except (
415+ pydantic.ValidationError,
416+ json.decoder.JSONDecodeError,
417+ ) as e:
418+ logger.error("Invalid relation data provided: %s", e)
419+
420+ @property
421+ def _scrape_jobs(self) -> List[Dict]:
422+ """Return a prometheus_scrape-like data structure for jobs.
423+
424+ https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
425+ """
426+ if callable(self._scrape_configs):
427+ scrape_configs = self._scrape_configs()
428+ else:
429+ # Create a copy of the user scrape_configs, since we will mutate this object
430+ scrape_configs = self._scrape_configs.copy()
431+
432+ # Convert "metrics_endpoints" to standard scrape_configs, and add them in
433+ for endpoint in self._metrics_endpoints:
434+ scrape_configs.append(
435+ {
436+ "metrics_path": endpoint["path"],
437+ "static_configs": [{"targets": [f"localhost:{endpoint['port']}"]}],
438+ }
439+ )
440+
441+ scrape_configs = scrape_configs or [DEFAULT_SCRAPE_CONFIG]
442+
443+ # Augment job name to include the app name and a unique id (index)
444+ for idx, scrape_config in enumerate(scrape_configs):
445+ scrape_config["job_name"] = "_".join(
446+ [self._charm.app.name, str(idx), scrape_config.get("job_name", "default")]
447+ )
448+
449+ return scrape_configs
450+
451+ @property
452+ def _metrics_alert_rules(self) -> Dict:
453+ """Use (for now) the prometheus_scrape AlertRules to initialize this."""
454+ alert_rules = AlertRules(
455+ query_type="promql", topology=JujuTopology.from_charm(self._charm)
456+ )
457+ alert_rules.add_path(self._metrics_rules, recursive=self._recursive)
458+ return alert_rules.as_dict()
459+
460+ @property
461+ def _log_alert_rules(self) -> Dict:
462+ """Use (for now) the loki_push_api AlertRules to initialize this."""
463+ alert_rules = AlertRules(query_type="logql", topology=JujuTopology.from_charm(self._charm))
464+ alert_rules.add_path(self._logs_rules, recursive=self._recursive)
465+ return alert_rules.as_dict()
466+
467+ @property
468+ def _dashboards(self) -> List[GrafanaDashboard]:
469+ dashboards: List[GrafanaDashboard] = []
470+ for d in self._dashboard_dirs:
471+ for path in Path(d).glob("*"):
472+ dashboard = GrafanaDashboard._serialize(path.read_bytes())
473+ dashboards.append(dashboard)
474+ return dashboards
475+
476+
477+class COSAgentDataChanged(EventBase):
478+ """Event emitted by `COSAgentRequirer` when relation data changes."""
479+
480+
481+class COSAgentValidationError(EventBase):
482+ """Event emitted by `COSAgentRequirer` when there is an error in the relation data."""
483+
484+ def __init__(self, handle, message: str = ""):
485+ super().__init__(handle)
486+ self.message = message
487+
488+ def snapshot(self) -> Dict:
489+ """Save COSAgentValidationError source information."""
490+ return {"message": self.message}
491+
492+ def restore(self, snapshot):
493+ """Restore COSAgentValidationError source information."""
494+ self.message = snapshot["message"]
495+
496+
497+class COSAgentRequirerEvents(ObjectEvents):
498+ """`COSAgentRequirer` events."""
499+
500+ data_changed = EventSource(COSAgentDataChanged)
501+ validation_error = EventSource(COSAgentValidationError)
502+
503+
504+class MultiplePrincipalsError(Exception):
505+ """Custom exception for when there are multiple principal applications."""
506+
507+ pass
508+
509+
510+class COSAgentRequirer(Object):
511+ """Integration endpoint wrapper for the Requirer side of the cos_agent interface."""
512+
513+ on = COSAgentRequirerEvents() # pyright: ignore
514+
515+ def __init__(
516+ self,
517+ charm: CharmType,
518+ *,
519+ relation_name: str = DEFAULT_RELATION_NAME,
520+ peer_relation_name: str = DEFAULT_PEER_RELATION_NAME,
521+ refresh_events: Optional[List[str]] = None,
522+ ):
523+ """Create a COSAgentRequirer instance.
524+
525+ Args:
526+ charm: The `CharmBase` instance that is instantiating this object.
527+ relation_name: The name of the relation to communicate over.
528+ peer_relation_name: The name of the peer relation to communicate over.
529+ refresh_events: List of events on which to refresh relation data.
530+ """
531+ super().__init__(charm, relation_name)
532+ self._charm = charm
533+ self._relation_name = relation_name
534+ self._peer_relation_name = peer_relation_name
535+ self._refresh_events = refresh_events or [self._charm.on.config_changed]
536+
537+ events = self._charm.on[relation_name]
538+ self.framework.observe(
539+ events.relation_joined, self._on_relation_data_changed
540+ ) # TODO: do we need this?
541+ self.framework.observe(events.relation_changed, self._on_relation_data_changed)
542+ for event in self._refresh_events:
543+ self.framework.observe(event, self.trigger_refresh) # pyright: ignore
544+
545+ # Peer relation events
546+ # A peer relation is needed as it is the only mechanism for exchanging data across
547+ # subordinate units.
548+ # self.framework.observe(
549+ # self.on[self._peer_relation_name].relation_joined, self._on_peer_relation_joined
550+ # )
551+ peer_events = self._charm.on[peer_relation_name]
552+ self.framework.observe(peer_events.relation_changed, self._on_peer_relation_changed)
553+
554+ @property
555+ def peer_relation(self) -> Optional["Relation"]:
556+ """Helper function for obtaining the peer relation object.
557+
558+ Returns: peer relation object
559+ (NOTE: would return None if called too early, e.g. during install).
560+ """
561+ return self.model.get_relation(self._peer_relation_name)
562+
563+ def _on_peer_relation_changed(self, _):
564+ # Peer data is used for forwarding data from principal units to the grafana agent
565+ # subordinate leader, for updating the app data of the outgoing o11y relations.
566+ if self._charm.unit.is_leader():
567+ self.on.data_changed.emit() # pyright: ignore
568+
569+ def _on_relation_data_changed(self, event: RelationChangedEvent):
570+ # Peer data is the only means of communication between subordinate units.
571+ if not self.peer_relation:
572+ event.defer()
573+ return
574+
575+ cos_agent_relation = event.relation
576+ if not event.unit or not cos_agent_relation.data.get(event.unit):
577+ return
578+ principal_unit = event.unit
579+
580+ # Coherence check
581+ units = cos_agent_relation.units
582+ if len(units) > 1:
583+ # should never happen
584+ raise ValueError(
585+ f"unexpected error: subordinate relation {cos_agent_relation} "
586+ f"should have exactly one unit"
587+ )
588+
589+ if not (raw := cos_agent_relation.data[principal_unit].get(CosAgentProviderUnitData.KEY)):
590+ return
591+
592+ if not (provider_data := self._validated_provider_data(raw)):
593+ return
594+
595+ # Copy data from the principal relation to the peer relation, so the leader could
596+ # follow up.
597+ # Save the originating unit name, so it could be used for topology later on by the leader.
598+ data = CosAgentPeersUnitData( # peer relation databag model
599+ principal_unit_name=event.unit.name,
600+ principal_relation_id=str(event.relation.id),
601+ principal_relation_name=event.relation.name,
602+ metrics_alert_rules=provider_data.metrics_alert_rules,
603+ log_alert_rules=provider_data.log_alert_rules,
604+ dashboards=provider_data.dashboards,
605+ )
606+ self.peer_relation.data[self._charm.unit][
607+ f"{CosAgentPeersUnitData.KEY}-{event.unit.name}"
608+ ] = data.json()
609+
610+ # We can't easily tell if the data that was changed is limited to only the data
611+ # that goes into peer relation (in which case, if this is not a leader unit, we wouldn't
612+ # need to emit `on.data_changed`), so we're emitting `on.data_changed` either way.
613+ self.on.data_changed.emit() # pyright: ignore
614+
615+ def _validated_provider_data(self, raw) -> Optional[CosAgentProviderUnitData]:
616+ try:
617+ return CosAgentProviderUnitData(**json.loads(raw))
618+ except (pydantic.ValidationError, json.decoder.JSONDecodeError) as e:
619+ self.on.validation_error.emit(message=str(e)) # pyright: ignore
620+ return None
621+
622+ def trigger_refresh(self, _):
623+ """Trigger a refresh of relation data."""
624+ # FIXME: Figure out what we should do here
625+ self.on.data_changed.emit() # pyright: ignore
626+
627+ @property
628+ def _principal_unit(self) -> Optional[Unit]:
629+ """Return the principal unit for a relation.
630+
631+ Assumes that the relation is of type subordinate.
632+ Relies on the fact that, for subordinate relations, the only remote unit visible to
633+ *this unit* is the principal unit that this unit is attached to.
634+ """
635+ if relations := self._principal_relations:
636+ # Technically it's a list, but for subordinates there can only be one relation
637+ principal_relation = next(iter(relations))
638+ if units := principal_relation.units:
639+ # Technically it's a list, but for subordinates there can only be one
640+ return next(iter(units))
641+
642+ return None
643+
644+ @property
645+ def _principal_relations(self):
646+ relations = []
647+ for relation in self._charm.model.relations[self._relation_name]:
648+ if not json.loads(relation.data[next(iter(relation.units))]["config"]).get(
649+ "subordinate", False
650+ ):
651+ relations.append(relation)
652+ if len(relations) > 1:
653+ logger.error(
654+ "Multiple applications claiming to be principal. Update the cos-agent library in the client application charms."
655+ )
656+ raise MultiplePrincipalsError("Multiple principal applications.")
657+ return relations
658+
659+ @property
660+ def _remote_data(self) -> List[CosAgentProviderUnitData]:
661+ """Return a list of remote data from each of the related units.
662+
663+ Assumes that the relation is of type subordinate.
664+ Relies on the fact that, for subordinate relations, the only remote unit visible to
665+ *this unit* is the principal unit that this unit is attached to.
666+ """
667+ all_data = []
668+
669+ for relation in self._charm.model.relations[self._relation_name]:
670+ if not relation.units:
671+ continue
672+ unit = next(iter(relation.units))
673+ if not (raw := relation.data[unit].get(CosAgentProviderUnitData.KEY)):
674+ continue
675+ if not (provider_data := self._validated_provider_data(raw)):
676+ continue
677+ all_data.append(provider_data)
678+
679+ return all_data
680+
681+ def _gather_peer_data(self) -> List[CosAgentPeersUnitData]:
682+ """Collect data from the peers.
683+
684+ Returns a trimmed-down list of CosAgentPeersUnitData.
685+ """
686+ relation = self.peer_relation
687+
688+ # Ensure that whatever context we're running this in, we take the necessary precautions:
689+ if not relation or not relation.data or not relation.app:
690+ return []
691+
692+ # Iterate over all peer unit data and only collect every principal once.
693+ peer_data: List[CosAgentPeersUnitData] = []
694+ app_names: Set[str] = set()
695+
696+ for unit in chain((self._charm.unit,), relation.units):
697+ if not relation.data.get(unit):
698+ continue
699+
700+ for unit_name in relation.data.get(unit): # pyright: ignore
701+ if not unit_name.startswith(CosAgentPeersUnitData.KEY):
702+ continue
703+ raw = relation.data[unit].get(unit_name)
704+ if raw is None:
705+ continue
706+ data = CosAgentPeersUnitData(**json.loads(raw))
707+ # Have we already seen this principal app?
708+ if (app_name := data.app_name) in app_names:
709+ continue
710+ peer_data.append(data)
711+ app_names.add(app_name)
712+
713+ return peer_data
714+
715+ @property
716+ def metrics_alerts(self) -> Dict[str, Any]:
717+ """Fetch metrics alerts."""
718+ alert_rules = {}
719+
720+ seen_apps: List[str] = []
721+ for data in self._gather_peer_data():
722+ if rules := data.metrics_alert_rules:
723+ app_name = data.app_name
724+ if app_name in seen_apps:
725+ continue # dedup!
726+ seen_apps.append(app_name)
727+ # This is only used for naming the file, so be as specific as we can be
728+ identifier = JujuTopology(
729+ model=self._charm.model.name,
730+ model_uuid=self._charm.model.uuid,
731+ application=app_name,
732+ # For the topology unit, we could use `data.principal_unit_name`, but that unit
733+ # name may not be very stable: `_gather_peer_data` de-duplicates by app name so
734+ # the exact unit name that turns up first in the iterator may vary from time to
735+ # time. So using the grafana-agent unit name instead.
736+ unit=self._charm.unit.name,
737+ ).identifier
738+
739+ alert_rules[identifier] = rules
740+
741+ return alert_rules
742+
743+ @property
744+ def metrics_jobs(self) -> List[Dict]:
745+ """Parse the relation data contents and extract the metrics jobs."""
746+ scrape_jobs = []
747+ for data in self._remote_data:
748+ for job in data.metrics_scrape_jobs:
749+ # In #220, relation schema changed from a simplified dict to the standard
750+ # `scrape_configs`.
751+ # This is to ensure backwards compatibility with Providers older than v0.5.
752+ if "path" in job and "port" in job and "job_name" in job:
753+ job = {
754+ "job_name": job["job_name"],
755+ "metrics_path": job["path"],
756+ "static_configs": [{"targets": [f"localhost:{job['port']}"]}],
757+ }
758+
759+ scrape_jobs.append(job)
760+
761+ return scrape_jobs
762+
763+ @property
764+ def snap_log_endpoints(self) -> List[SnapEndpoint]:
765+ """Fetch logging endpoints exposed by related snaps."""
766+ plugs = []
767+ for data in self._remote_data:
768+ targets = data.log_slots
769+ if targets:
770+ for target in targets:
771+ if target in plugs:
772+ logger.warning(
773+ f"plug {target} already listed. "
774+ "The same snap is being passed from multiple "
775+ "endpoints; this should not happen."
776+ )
777+ else:
778+ plugs.append(target)
779+
780+ endpoints = []
781+ for plug in plugs:
782+ if ":" not in plug:
783+ logger.error(f"invalid plug definition received: {plug}. Ignoring...")
784+ else:
785+ endpoint = SnapEndpoint(*plug.split(":"))
786+ endpoints.append(endpoint)
787+ return endpoints
788+
789+ @property
790+ def logs_alerts(self) -> Dict[str, Any]:
791+ """Fetch log alerts."""
792+ alert_rules = {}
793+ seen_apps: List[str] = []
794+
795+ for data in self._gather_peer_data():
796+ if rules := data.log_alert_rules:
797+ # This is only used for naming the file, so be as specific as we can be
798+ app_name = data.app_name
799+ if app_name in seen_apps:
800+ continue # dedup!
801+ seen_apps.append(app_name)
802+
803+ identifier = JujuTopology(
804+ model=self._charm.model.name,
805+ model_uuid=self._charm.model.uuid,
806+ application=app_name,
807+ # For the topology unit, we could use `data.principal_unit_name`, but that unit
808+ # name may not be very stable: `_gather_peer_data` de-duplicates by app name so
809+ # the exact unit name that turns up first in the iterator may vary from time to
810+ # time. So using the grafana-agent unit name instead.
811+ unit=self._charm.unit.name,
812+ ).identifier
813+
814+ alert_rules[identifier] = rules
815+
816+ return alert_rules
817+
818+ @property
819+ def dashboards(self) -> List[Dict[str, str]]:
820+ """Fetch dashboards as encoded content.
821+
822+ Dashboards are assumed not to vary across units of the same primary.
823+ """
824+ dashboards: List[Dict[str, Any]] = []
825+
826+ seen_apps: List[str] = []
827+ for data in self._gather_peer_data():
828+ app_name = data.app_name
829+ if app_name in seen_apps:
830+ continue # dedup!
831+ seen_apps.append(app_name)
832+
833+ for encoded_dashboard in data.dashboards or ():
834+ content = GrafanaDashboard(encoded_dashboard)._deserialize()
835+
836+ title = content.get("title", "no_title")
837+
838+ dashboards.append(
839+ {
840+ "relation_id": data.principal_relation_id,
841+ # We have the remote charm name - use it for the identifier
842+ "charm": f"{data.principal_relation_name}-{app_name}",
843+ "content": content,
844+ "title": title,
845+ }
846+ )
847+
848+ return dashboards
849diff --git a/metadata.yaml b/metadata.yaml
850index efc7563..61ddaad 100644
851--- a/metadata.yaml
852+++ b/metadata.yaml
853@@ -31,6 +31,8 @@ provides:
854 nrpe-external-master:
855 interface: nrpe-external-master
856 scope: container
857+ cos-agent:
858+ interface: cos_agent
859
860 peers:
861 replicas:
862diff --git a/requirements-dev.txt b/requirements-dev.txt
863index 4f2a3f5..671f33c 100644
864--- a/requirements-dev.txt
865+++ b/requirements-dev.txt
866@@ -1,3 +1,7 @@
867 -r requirements.txt
868 coverage
869 flake8
870+
871+# Grafana Agent Library
872+cosl
873+pydantic < 2
874diff --git a/src/charm.py b/src/charm.py
875index c5f5c59..7819018 100755
876--- a/src/charm.py
877+++ b/src/charm.py
878@@ -24,6 +24,7 @@ from charms.operator_libs_linux.v0 import apt
879 from charms.operator_libs_linux.v0.apt import PackageError, PackageNotFoundError
880 from charms.operator_libs_linux.v0.passwd import group_exists, user_exists
881 from charms.operator_libs_linux.v0.systemd import service_reload
882+from charms.grafana_agent.v0.cos_agent import COSAgentProvider
883
884 from ops.charm import (
885 ActionEvent,
886@@ -100,6 +101,50 @@ OIDC_CONFIG_VALS = (
887 "oidc_logout_url",
888 )
889
890+LOG_DIRECTORY = "/var/log/landscape-server"
891+LOG_FILES = [
892+ "api.log",
893+ "appserver.log",
894+ "async-frontend.log",
895+ "job-handler.log",
896+ "landscape-profiles.log",
897+ "message-server.log",
898+ "package-search.log",
899+ "package-upload.log",
900+ "pingserver.log",
901+]
902+
903+
904+def _create_logfile_symlinks(log_dir, log_files):
905+ """Symlink /var/log/<application>/<logfile> to /var/log/<application>-<logfile>.
906+
907+ Temporary method to support logging with COS. COS is limited to scraping
908+ logs directly under /var/log/
909+ """
910+ base_dir, prefix = os.path.split(log_dir)
911+ for logfile in log_files:
912+ src = os.path.join(log_dir, f"{logfile}")
913+ dst = os.path.join(base_dir, f"{prefix}-{logfile}")
914+ try:
915+ os.symlink(src, dst)
916+ except FileExistsError:
917+ pass
918+
919+
920+def _set_log_dir_permissions(log_dir):
921+ """Set permissions on /var/log/landscape-server to drwxrwxr-x.
922+
923+ Set ownership to syslog:landscape.
924+ """
925+ try:
926+ os.chmod(log_dir, 0o775)
927+ except FileNotFoundError:
928+ os.makedirs(log_dir)
929+ os.chmod(log_dir, 0o775)
930+ uid = 104 # syslog
931+ gid = 116 # landscape
932+ os.chown(log_dir, uid, gid)
933+
934
935 class LandscapeServerCharm(CharmBase):
936 """Charm the service."""
937@@ -178,6 +223,8 @@ class LandscapeServerCharm(CharmBase):
938 self.landscape_uid = user_exists("landscape").pw_uid
939 self.root_gid = group_exists("root").gr_gid
940
941+ self._grafana_agent = COSAgentProvider(self)
942+
943 def _on_config_changed(self, _) -> None:
944 prev_status = self.unit.status
945
946@@ -297,6 +344,11 @@ class LandscapeServerCharm(CharmBase):
947 self.unit.status = MaintenanceStatus("Installing SSL certificate")
948 write_ssl_cert(config_ssl_cert)
949
950+ _set_log_dir_permissions(LOG_DIRECTORY)
951+
952+ # For logging integration with COS
953+ _create_logfile_symlinks(LOG_DIRECTORY, LOG_FILES)
954+
955 # Write the license file, if it exists.
956 license_file = self.model.config.get("license_file")
957
958diff --git a/tests/test_charm.py b/tests/test_charm.py
959index 0b1477d..a4d2b2d 100644
960--- a/tests/test_charm.py
961+++ b/tests/test_charm.py
962@@ -4,6 +4,7 @@
963 # Learn more about testing at: https://juju.is/docs/sdk/testing
964
965 import os
966+import stat
967 import unittest
968 from grp import struct_group
969 from io import BytesIO
970@@ -24,7 +25,8 @@ from charms.operator_libs_linux.v0.apt import (
971
972 from charm import (
973 DEFAULT_SERVICES, HAPROXY_CONFIG_FILE, LANDSCAPE_PACKAGES, LEADER_SERVICES, LSCTL,
974- NRPE_D_DIR, SCHEMA_SCRIPT, HASH_ID_DATABASES, LandscapeServerCharm)
975+ NRPE_D_DIR, SCHEMA_SCRIPT, HASH_ID_DATABASES, LandscapeServerCharm, _create_logfile_symlinks,
976+ _set_log_dir_permissions)
977
978
979 class TestCharm(unittest.TestCase):
980@@ -65,6 +67,8 @@ class TestCharm(unittest.TestCase):
981 apt=DEFAULT,
982 prepend_default_settings=DEFAULT,
983 update_service_conf=DEFAULT,
984+ _create_logfile_symlinks=DEFAULT,
985+ _set_log_dir_permissions=DEFAULT,
986 )
987 ppa = harness.model.config.get("landscape_ppa")
988
989@@ -141,6 +145,8 @@ class TestCharm(unittest.TestCase):
990 write_ssl_cert=DEFAULT,
991 update_service_conf=DEFAULT,
992 prepend_default_settings=DEFAULT,
993+ _create_logfile_symlinks=DEFAULT,
994+ _set_log_dir_permissions=DEFAULT,
995 )
996
997 peer_relation_id = harness.add_relation("replicas", "landscape-server")
998@@ -170,6 +176,8 @@ class TestCharm(unittest.TestCase):
999 write_license_file=DEFAULT,
1000 prepend_default_settings=DEFAULT,
1001 update_service_conf=DEFAULT,
1002+ _create_logfile_symlinks=DEFAULT,
1003+ _set_log_dir_permissions=DEFAULT,
1004 )
1005
1006 with patches as mocks:
1007@@ -193,6 +201,8 @@ class TestCharm(unittest.TestCase):
1008 update_service_conf=DEFAULT,
1009 prepend_default_settings=DEFAULT,
1010 write_license_file=DEFAULT,
1011+ _create_logfile_symlinks=DEFAULT,
1012+ _set_log_dir_permissions=DEFAULT,
1013 ) as mocks:
1014 harness.begin_with_initial_hooks()
1015
1016@@ -1320,3 +1330,33 @@ class TestBootstrapAccount(unittest.TestCase):
1017 text=True
1018 )
1019 event.fail.assert_called_once()
1020+
1021+ def test_log_symlink_src_does_not_exist(self):
1022+ """Creating a symlink that points to a non-existent src doesn't raise an Exception"""
1023+ with TemporaryDirectory() as tmpdir:
1024+ log_dir = os.path.join(tmpdir, "var", "log", "landscape-server")
1025+ os.makedirs(log_dir)
1026+ log_files = ["test1.log", "test2.log", "test3.log"]
1027+ _create_logfile_symlinks(log_dir, log_files)
1028+
1029+ def test_log_symlink_already_exists(self):
1030+ """Creating symlinks that already exist doesn't raise an Exception"""
1031+ with TemporaryDirectory() as tmpdir:
1032+ log_dir = os.path.join(tmpdir, "var", "log", "landscape-server")
1033+ os.makedirs(log_dir)
1034+ file_names = ["test1.log", "test2.log", "test3.log"]
1035+ for file_name in file_names:
1036+ log_file = os.path.join(log_dir, file_name)
1037+ open(log_file, "wb").close() # create empty files
1038+ _create_logfile_symlinks(log_dir, file_names)
1039+ _create_logfile_symlinks(log_dir, file_names)
1040+
1041+ def test_set_log_dir_permissions_dir_does_not_exist(self):
1042+ """Setting permissions on a non-existent log dir creates the dir and sets permissions"""
1043+ with TemporaryDirectory() as tmpdir:
1044+ with patch("os.chown") as _: # chown to new user requires root
1045+ log_dir = os.path.join(tmpdir, "var", "log", "landscape-server")
1046+ _set_log_dir_permissions(log_dir)
1047+ self.assertTrue(os.path.exists(log_dir))
1048+ permissions = os.stat(log_dir).st_mode
1049+ self.assertEqual(0o775, stat.S_IMODE(permissions)) # 775 = drwxrwxr-x file perms
1050\ No newline at end of file
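
The two log helpers added in src/charm.py can be exercised outside the charm. The sketch below is a minimal, standalone approximation rather than the merged code: `create_logfile_symlinks` and `ensure_log_dir` are illustrative names, the `os.chown` to syslog:landscape is left out because it needs root, and `ensure_log_dir` uses `os.makedirs(..., exist_ok=True)` in place of the try/except in the diff. It assumes a POSIX system where unprivileged users may create symlinks.

```python
import os
import stat
import tempfile


def create_logfile_symlinks(log_dir, log_files):
    """Link <base>/<app>/<logfile> to <base>/<app>-<logfile>; keep existing links."""
    base_dir, prefix = os.path.split(log_dir)
    created = []
    for logfile in log_files:
        src = os.path.join(log_dir, logfile)
        dst = os.path.join(base_dir, f"{prefix}-{logfile}")
        try:
            # The source does not have to exist yet; the link may dangle.
            os.symlink(src, dst)
        except FileExistsError:
            continue  # an earlier run already created this link
        created.append(dst)
    return created


def ensure_log_dir(log_dir, mode=0o775):
    """Create log_dir if needed and return the permission bits that were set."""
    os.makedirs(log_dir, exist_ok=True)
    os.chmod(log_dir, mode)  # explicit chmod, so the process umask is irrelevant
    return stat.S_IMODE(os.stat(log_dir).st_mode)


with tempfile.TemporaryDirectory() as tmpdir:
    log_dir = os.path.join(tmpdir, "var", "log", "landscape-server")
    mode = ensure_log_dir(log_dir)
    first = create_logfile_symlinks(log_dir, ["api.log", "pingserver.log"])
    # A second call finds the links already in place and creates nothing.
    second = create_logfile_symlinks(log_dir, ["api.log", "pingserver.log"])
    link_names = sorted(os.path.basename(path) for path in first)
    # The log files were never written, so every link is dangling.
    dangling = all(os.path.islink(p) and not os.path.exists(p) for p in first)
```

When checking the mode in a unit test, note that `assertTrue(x, msg)` only checks that `x` is truthy and treats the second argument as the failure message, so a permissions check needs an equality comparison on `stat.S_IMODE(...)`.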
