Merge ~lcvcode/charm-nagios:LP1939068-duplicates into charm-nagios:master
- Git
- lp:~lcvcode/charm-nagios
- LP1939068-duplicates
- Merge into master
Status: | Merged | ||||
---|---|---|---|---|---|
Merged at revision: | 67fe2db9f283fb4fdf584578597e8520d566b92d | ||||
Proposed branch: | ~lcvcode/charm-nagios:LP1939068-duplicates | ||||
Merge into: | charm-nagios:master | ||||
Prerequisite: | ~lcvcode/charm-nagios:LP1939068 | ||||
Diff against target: |
153 lines (+51/-16) 1 file modified
hooks/monitors_relation_changed.py (+51/-16) |
||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Paul Goins | Needs Fixing | ||
🤖 prod-jenkaas-bootstack (community) | continuous-integration | Approve | |
Zachary Zehring (community) | Needs Fixing | ||
Drew Freiberger (community) | Needs Fixing | ||
James Troup | Pending | ||
Review via email: mp+406873@code.launchpad.net |
Commit message
Force uniqueness in cases of host_name duplication
Description of the change
This change ensures that overlapping host names in nagios are made unique so that parent-child relationships are never ambiguous.
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote : | # |
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote : | # |
Unable to determine commit message from repository - please click "Set commit message" and enter the commit message manually.
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
A CI job is currently in progress. A follow up comment will be added when it completes.
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
FAILED: Continuous integration, rev:b4f0578daee
No commit message was specified in the merge proposal. Click on the following link and set the commit message (if you want jenkins to rebuild you need to trigger it yourself):
https:/
https:/
Executed test runs:
SUCCESS: https:/
None: https:/
Click here to trigger a rebuild:
https:/
Drew Freiberger (afreiberger) wrote : | # |
Please see comments inline.
Zachary Zehring (zzehring) wrote : | # |
Added some comments.
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
PASSED: Continuous integration, rev:b4f0578daee
https:/
Executed test runs:
SUCCESS: https:/
None: https:/
Click here to trigger a rebuild:
https:/
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
A CI job is currently in progress. A follow up comment will be added when it completes.
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
FAILED: Continuous integration, rev:f85717778d5
https:/
Executed test runs:
FAILURE: https:/
None: https:/
Click here to trigger a rebuild:
https:/
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
A CI job is currently in progress. A follow up comment will be added when it completes.
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
PASSED: Continuous integration, rev:8c6d53f56e4
https:/
Executed test runs:
SUCCESS: https:/
None: https:/
Click here to trigger a rebuild:
https:/
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
A CI job is currently in progress. A follow up comment will be added when it completes.
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote : | # |
PASSED: Continuous integration, rev:137f7a67d44
https:/
Executed test runs:
SUCCESS: https:/
None: https:/
Click here to trigger a rebuild:
https:/
Paul Goins (vultaire) wrote : | # |
I'm +1 for the change as it fills a gap in the case that nagios_host_context is specified with the same value on multiple models.
However, I believe it is superior to use distinct nagios_
For example, if you have keystone deployed to both an OpenStack and K8s model; and if hostnames are autogenerated as e.g. keystone-0, keystone-1, etc.; then including the model name in the host context would generate hostnames like e.g. <prefix>
Paul Goins (vultaire) wrote : | # |
During a 1:1, I remembered: I had concerns over how the status was being set here, and whether it might be being clobbered elsewhere in the charm. I very much support this MR as a stopgap to avoid having services clobber the wrong host in Nagios, especially if we have a message indicating that duplicate hostnames have been found, as that may guide toward reconfiguring the nagios_host_context to something more appropriate to avoid the collisions.
Temporarily marking as abstain; intending to check this tomorrow.
Paul Goins (vultaire) wrote : | # |
status_set is only really called in 2 places - one on charm upgrade, the other in the main() used by basically all other hooks. With that in mind, this is probably good enough in my opinion. I'm +1.
Paul Goins (vultaire) wrote : | # |
Just FYI: added a warning log message as part of a new MR which builds upon this one as a prerequisite: https:/
To be clear: this MR still needs one more +1 before merge.
Paul Goins (vultaire) wrote : | # |
Sorry for the thrash here.
I've tried to deploy this on serverstack, with multiple related models and nrpe related to apps on top-level nodes and to LXDs, and I'm seeing these issues:
* For a hostname collision at the parent level, I do see that I get records for both hosts; one has a suffix appended. However, the host group names are not consistent. For example, host juju-host-0 is associated with host group juju-host, while host juju-host-0-[1] is associated with host group juju-host-0. This is potentially confusing, especially since the latter host group matches the former host which isn't even part of that host group.
* If I have an app of the same name installed onto containers in two different models, they collide and I only see one in nagios.
This is with the following deployment: https:/
(Summary: nagios on one model, plus 2 other models, each with cs:ubuntu deployed as host and LXD, with nrpe related to each.)
This was done on serverstack (i.e. OpenStack), not a MAAS or LXD based controller.
I'm not sure what exactly in the code needs fixing yet, but I'm -1'ing for now.
Preview Diff
1 | diff --git a/hooks/monitors_relation_changed.py b/hooks/monitors_relation_changed.py | |||
2 | index 9297cdf..441fe4e 100755 | |||
3 | --- a/hooks/monitors_relation_changed.py | |||
4 | +++ b/hooks/monitors_relation_changed.py | |||
5 | @@ -28,6 +28,7 @@ from charmhelpers.core.hookenv import ( | |||
6 | 28 | related_units, | 28 | related_units, |
7 | 29 | relation_get, | 29 | relation_get, |
8 | 30 | relation_ids, | 30 | relation_ids, |
9 | 31 | status_set, | ||
10 | 31 | ) | 32 | ) |
11 | 32 | 33 | ||
12 | 33 | from common import ( | 34 | from common import ( |
13 | @@ -41,8 +42,10 @@ from common import ( | |||
14 | 41 | 42 | ||
15 | 42 | import yaml | 43 | import yaml |
16 | 43 | 44 | ||
18 | 44 | REQUIRED_REL_DATA_KEYS = ["target-address", "monitors", "target-id"] | 45 | MACHINE_ID_KEY = "machine_id" |
19 | 45 | MODEL_ID_KEY = "model_id" | 46 | MODEL_ID_KEY = "model_id" |
20 | 47 | TARGET_ID_KEY = "target-id" | ||
21 | 48 | REQUIRED_REL_DATA_KEYS = ["target-address", "monitors", TARGET_ID_KEY] | ||
22 | 46 | 49 | ||
23 | 47 | 50 | ||
24 | 48 | def _prepare_relation_data(unit, rid): | 51 | def _prepare_relation_data(unit, rid): |
25 | @@ -58,7 +61,7 @@ def _prepare_relation_data(unit, rid): | |||
26 | 58 | 61 | ||
27 | 59 | if rid.split(":")[0] == "nagios": | 62 | if rid.split(":")[0] == "nagios": |
28 | 60 | # Fake it for the more generic 'nagios' relation | 63 | # Fake it for the more generic 'nagios' relation |
30 | 61 | relation_data["target-id"] = unit.replace("/", "-") | 64 | relation_data[TARGET_ID_KEY] = unit.replace("/", "-") |
31 | 62 | relation_data["monitors"] = {"monitors": {"remote": {}}} | 65 | relation_data["monitors"] = {"monitors": {"remote": {}}} |
32 | 63 | 66 | ||
33 | 64 | if not relation_data.get("target-address"): | 67 | if not relation_data.get("target-address"): |
34 | @@ -101,7 +104,7 @@ def main(argv): # noqa: C901 | |||
35 | 101 | # | 104 | # |
36 | 102 | 105 | ||
37 | 103 | if len(argv) > 1: | 106 | if len(argv) > 1: |
39 | 104 | relation_settings = {"monitors": open(argv[1]).read(), "target-id": argv[2]} | 107 | relation_settings = {"monitors": open(argv[1]).read(), TARGET_ID_KEY: argv[2]} |
40 | 105 | 108 | ||
41 | 106 | if len(argv) > 3: | 109 | if len(argv) > 3: |
42 | 107 | relation_settings["target-address"] = argv[3] | 110 | relation_settings["target-address"] = argv[3] |
43 | @@ -114,13 +117,13 @@ def main(argv): # noqa: C901 | |||
44 | 114 | 117 | ||
45 | 115 | for relid, units in all_relations.iteritems(): | 118 | for relid, units in all_relations.iteritems(): |
46 | 116 | for unit, relation_settings in units.items(): | 119 | for unit, relation_settings in units.items(): |
49 | 117 | if "target-id" in relation_settings: | 120 | if TARGET_ID_KEY in relation_settings: |
50 | 118 | targets_with_addresses.add(relation_settings["target-id"]) | 121 | targets_with_addresses.add(relation_settings.get(TARGET_ID_KEY)) |
51 | 119 | new_all_relations = {} | 122 | new_all_relations = {} |
52 | 120 | 123 | ||
53 | 121 | for relid, units in all_relations.iteritems(): | 124 | for relid, units in all_relations.iteritems(): |
54 | 122 | for unit, relation_settings in units.items(): | 125 | for unit, relation_settings in units.items(): |
56 | 123 | if relation_settings["target-id"] in targets_with_addresses: | 126 | if relation_settings.get(TARGET_ID_KEY) in targets_with_addresses: |
57 | 124 | if relid not in new_all_relations: | 127 | if relid not in new_all_relations: |
58 | 125 | new_all_relations[relid] = {} | 128 | new_all_relations[relid] = {} |
59 | 126 | new_all_relations[relid][unit] = relation_settings | 129 | new_all_relations[relid][unit] = relation_settings |
60 | @@ -129,23 +132,55 @@ def main(argv): # noqa: C901 | |||
61 | 129 | 132 | ||
62 | 130 | initialize_inprogress_config() | 133 | initialize_inprogress_config() |
63 | 131 | 134 | ||
64 | 135 | uniq_hostnames = set() | ||
65 | 136 | duplicate_hostnames = defaultdict(int) | ||
66 | 137 | |||
67 | 138 | def record_hostname(hostname): | ||
68 | 139 | if hostname not in uniq_hostnames: | ||
69 | 140 | uniq_hostnames.add(hostname) | ||
70 | 141 | else: | ||
71 | 142 | duplicate_hostnames[hostname] += 1 | ||
72 | 143 | |||
73 | 132 | # make a dict of machine ids to target-id hostnames | 144 | # make a dict of machine ids to target-id hostnames |
74 | 133 | all_hosts = {} | 145 | all_hosts = {} |
75 | 134 | for relid, units in all_relations.items(): | 146 | for relid, units in all_relations.items(): |
76 | 135 | for unit, relation_settings in units.iteritems(): | 147 | for unit, relation_settings in units.iteritems(): |
79 | 136 | machine_id = relation_settings.get("machine_id", None) | 148 | machine_id = relation_settings.get(MACHINE_ID_KEY) |
80 | 137 | model_id = relation_settings.get(MODEL_ID_KEY, None) | 149 | model_id = relation_settings.get(MODEL_ID_KEY) |
81 | 150 | target_id = relation_settings.get(TARGET_ID_KEY) | ||
82 | 138 | 151 | ||
84 | 139 | if not machine_id: | 152 | if not machine_id or not target_id: |
85 | 140 | continue | 153 | continue |
86 | 141 | 154 | ||
87 | 155 | # Check for duplicate hostnames and amend them if needed | ||
88 | 156 | record_hostname(target_id) | ||
89 | 157 | if target_id in duplicate_hostnames: | ||
90 | 158 | # Duplicate hostname has been found | ||
91 | 159 | # Append "-[<number of duplicates>]" hostname | ||
92 | 160 | # Example: | ||
93 | 161 | # 1st hostname of "juju-ubuntu-0" is unchanged | ||
94 | 162 | # 2nd hostname of "juju-ubuntu-0" is changed to "juju-ubuntu-0-[1]" | ||
95 | 163 | # 3rd hostname of "juju-ubuntu-0" is changed to "juju-ubuntu-0-[2]" | ||
96 | 164 | target_id += "-[{}]".format(duplicate_hostnames[target_id]) | ||
97 | 165 | relation_settings[TARGET_ID_KEY] = target_id | ||
98 | 166 | |||
99 | 142 | # Backwards compatible hostname from machine id | 167 | # Backwards compatible hostname from machine id |
100 | 143 | if not model_id: | 168 | if not model_id: |
102 | 144 | all_hosts[machine_id] = relation_settings["target-id"] | 169 | all_hosts[machine_id] = target_id |
103 | 145 | # New hostname from machine id using model id | 170 | # New hostname from machine id using model id |
104 | 146 | else: | 171 | else: |
105 | 147 | all_hosts.setdefault(model_id, {}) | 172 | all_hosts.setdefault(model_id, {}) |
107 | 148 | all_hosts[model_id][machine_id] = relation_settings["target-id"] | 173 | all_hosts[model_id][machine_id] = target_id |
108 | 174 | |||
109 | 175 | if duplicate_hostnames: | ||
110 | 176 | status_set( | ||
111 | 177 | "active", | ||
112 | 178 | "Duplicate host names detected: {}".format( | ||
113 | 179 | ", ".join(duplicate_hostnames.keys()) | ||
114 | 180 | ), | ||
115 | 181 | ) | ||
116 | 182 | else: | ||
117 | 183 | status_set("active", "ready") | ||
118 | 149 | 184 | ||
119 | 150 | for relid, units in all_relations.items(): | 185 | for relid, units in all_relations.items(): |
120 | 151 | apply_relation_config(relid, units, all_hosts) | 186 | apply_relation_config(relid, units, all_hosts) |
121 | @@ -158,11 +193,11 @@ def main(argv): # noqa: C901 | |||
122 | 158 | def apply_relation_config(relid, units, all_hosts): # noqa: C901 | 193 | def apply_relation_config(relid, units, all_hosts): # noqa: C901 |
123 | 159 | for unit, relation_settings in units.iteritems(): | 194 | for unit, relation_settings in units.iteritems(): |
124 | 160 | monitors = relation_settings["monitors"] | 195 | monitors = relation_settings["monitors"] |
127 | 161 | target_id = relation_settings["target-id"] | 196 | target_id = relation_settings[TARGET_ID_KEY] |
128 | 162 | machine_id = relation_settings.get("machine_id", None) | 197 | machine_id = relation_settings.get(MACHINE_ID_KEY) |
129 | 163 | parent_host = None | 198 | parent_host = None |
130 | 164 | 199 | ||
132 | 165 | model_id = relation_settings.get(MODEL_ID_KEY, None) | 200 | model_id = relation_settings.get(MODEL_ID_KEY) |
133 | 166 | 201 | ||
134 | 167 | if machine_id: | 202 | if machine_id: |
135 | 168 | 203 | ||
136 | @@ -173,7 +208,7 @@ def apply_relation_config(relid, units, all_hosts): # noqa: C901 | |||
137 | 173 | # Get hostname using model id | 208 | # Get hostname using model id |
138 | 174 | if model_id: | 209 | if model_id: |
139 | 175 | model_hosts = all_hosts.get(model_id, {}) | 210 | model_hosts = all_hosts.get(model_id, {}) |
141 | 176 | parent_host = model_hosts.get(parent_machine, None) | 211 | parent_host = model_hosts.get(parent_machine) |
142 | 177 | 212 | ||
143 | 178 | # Get hostname without model id | 213 | # Get hostname without model id |
144 | 179 | # this conserves backwards compatibility with older | 214 | # this conserves backwards compatibility with older |
145 | @@ -184,7 +219,7 @@ def apply_relation_config(relid, units, all_hosts): # noqa: C901 | |||
146 | 184 | # If not set, we don't mess with it, as multiple services may feed | 219 | # If not set, we don't mess with it, as multiple services may feed |
147 | 185 | # monitors in for a particular address. Generally a primary will set | 220 | # monitors in for a particular address. Generally a primary will set |
148 | 186 | # this to its own private-address | 221 | # this to its own private-address |
150 | 187 | target_address = relation_settings.get("target-address", None) | 222 | target_address = relation_settings.get("target-address") |
151 | 188 | 223 | ||
152 | 189 | if type(monitors) != dict: | 224 | if type(monitors) != dict: |
153 | 190 | monitors = yaml.safe_load(monitors) | 225 | monitors = yaml.safe_load(monitors) |
This merge proposal is being monitored by mergebot. Change the status to Approved to merge.