Merge ~yoshikadokawa/charm-nrpe:master-check-lacpdu into ~nrpe-charmers/charm-nrpe:master

Proposed by Yoshi Kadokawa
Status: Superseded
Proposed branch: ~yoshikadokawa/charm-nrpe:master-check-lacpdu
Merge into: ~nrpe-charmers/charm-nrpe:master
Diff against target: 85 lines (+46/-1)
1 file modified
files/plugins/check_lacp_bond.py (+46/-1)
Reviewer Review Type Date Requested Status
Alvaro Uria (community) Needs Information
Canonical IS Reviewers Pending
Review via email: mp+383577@code.launchpad.net

This proposal has been superseded by a proposal from 2020-07-26.

Commit message

Add additional checks for LACP bond status

Add checks if the LACPDU port state matches for the underlaying interfaces.

LP: #1877255

To post a comment you must log in.
Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Revision history for this message
Alvaro Uria (aluria) wrote :

* LACP Port States (aka. Actor states) shared in binary format can be found at [1].
* Bond netlink code on latest stable [2]
* Partner state - what is considered partner state? Is LACP considered to be configured with 2 slaves (max and min number)?

I would like to understand use cases for this change and why the aggregator ID mismatch is not enough. On the other hand, there could be more than 2 slaves, so I'm not clear on what the "partner" would be in such cases. Or, there are deployments that initially use a single interface (no real purpose for bonding except for a near future reconfig to add a second slave). Would that mean that no "partner" would exist and "WARNING" would be alerted?

Once we have use cases (what it could have been spotted with this fix), I'd also like to know if these checks could be made optional (ie. check_lacp_bond.py --skip-extra-port-state).

With regard to the code, there are a couple of for loops over the same values (list of slaves). Could a single for loop be used?

Thank you.

1. http://movingpackets.net/2017/10/17/decoding-lacp-port-state/
2. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/bonding/bond_netlink.c?h=v5.6.12

review: Needs Information
Revision history for this message
Yoshi Kadokawa (yoshikadokawa) wrote :

> * Partner state - what is considered partner state? Is LACP considered to be
> configured with 2 slaves (max and min number)?
For each underlying interface of an LACP bond, it has an actor state and partner state.
Actor state is the status of the host side, and partner state is the status coming from the switch side. Basically these have to be identical.

> I would like to understand use cases for this change and why the aggregator ID
> mismatch is not enough. On the other hand, there could be more than 2 slaves,
> so I'm not clear on what the "partner" would be in such cases. Or, there are
> deployments that initially use a single interface (no real purpose for bonding
> except for a near future reconfig to add a second slave). Would that mean that
> no "partner" would exist and "WARNING" would be alerted?
In my experience, aggregator ID mismatch is a more critical situation, which is usually one interface is DOWN/Unstable or not configured as LACP, etc.
And of course, the actor port state and partner port state will be in a mismatch as well.
However, there are use cases when aggregator ID are identical for the underlying interfaces, but a mismatch between actor port state and partner port state.
This is often seen with different LACP port rate configuration between host conf and switch conf.
For example, slow(30s) is configured on the switch side and fast(1s) is configured on the host side.

> Once we have use cases (what it could have been spotted with this fix), I'd
> also like to know if these checks could be made optional (ie.
> check_lacp_bond.py --skip-extra-port-state).
Since we already have the aggregator ID mismatch check, I think this can be optional.
I'll update it to make this optional.

> With regard to the code, there are a couple of for loops over the same values
> (list of slaves). Could a single for loop be used?
I did this to separate the output for aggregator ID mismatch and port state mismatch. If aggregator ID mismatch happens, there will be port state mismatch as well, and since the priority for aggregator ID mismatch is higher, I wanted to make sure that the warning message for aggregator ID comes up in Nagios.
However, if I make this port state mismatch check optional, I can put this check into the same for loop, and show the message only if the option is true.

Thank you

Unmerged commits

6a41da3... by Yoshi Kadokawa

add details of lacpdu port mismatch

LP: #1877255

f79a231... by Yoshi Kadokawa

add LACPDU port state check

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/files/plugins/check_lacp_bond.py b/files/plugins/check_lacp_bond.py
2index 3e39986..b6e27d7 100755
3--- a/files/plugins/check_lacp_bond.py
4+++ b/files/plugins/check_lacp_bond.py
5@@ -17,6 +17,30 @@ from nagios_plugin3 import (
6 try_check,
7 )
8
9+# LACPDU port states in binary
10+LACPDU_ACTIVE = 0b1
11+LACPDU_RATE = 0b10
12+LACPDU_AGGREGATED = 0b100
13+LACPDU_SYNC = 0b1000
14+LACPDU_COLLECT = 0b10000
15+LACPDU_DIST = 0b100000
16+LACPDU_DEFAULT = 0b1000000
17+LACPDU_EXPIRED = 0b10000000
18+
19+
20+def check_lacpdu_port(actor_port, partner_port):
21+ diff = int(actor_port) ^ int(partner_port)
22+ msg = ''
23+ if diff & LACPDU_RATE:
24+ msg += 'lacp rate mismatch,'
25+ if diff & LACPDU_AGGREGATED:
26+ msg += 'not aggregated,'
27+ if diff & LACPDU_SYNC:
28+ msg += 'not in sync,'
29+ if diff & LACPDU_COLLECT:
30+ msg += 'not collecting,'
31+ return msg
32+
33
34 def check_lacp_bond(iface):
35 """Checks LACP bonds are correctly configured (AD Aggregator IDs match)
36@@ -25,6 +49,12 @@ def check_lacp_bond(iface):
37 BOND_SLAVES_TEMPLATE = '/sys/class/net/{0}/bonding/slaves'
38 BOND_MODE_TEMPLATE = '/sys/class/net/{0}/bonding/mode'
39 SLAVE_TEMPLATE = '/sys/class/net/{0}/bonding_slave/ad_aggregator_id'
40+ ACTOR_PORT_STATE = (
41+ '/sys/class/net/{0}/bonding_slave/ad_actor_oper_port_state'
42+ )
43+ PARTNET_PORT_STATE = (
44+ '/sys/class/net/{0}/bonding_slave/ad_partner_oper_port_state'
45+ )
46
47 bond_aggr = BOND_AGGR_TEMPLATE.format(iface)
48 bond_slaves = BOND_SLAVES_TEMPLATE.format(iface)
49@@ -44,7 +74,7 @@ def check_lacp_bond(iface):
50
51 with open(bond_slaves) as fd:
52 slaves = fd.readline().strip().split(' ')
53-
54+ # Check aggregator ID
55 for slave in slaves:
56 with open(SLAVE_TEMPLATE.format(slave)) as fd:
57 slave_aggr_value = fd.readline().strip()
58@@ -58,6 +88,20 @@ def check_lacp_bond(iface):
59 msg = msg.format(iface, bond_aggr_value,
60 slave, slave_aggr_value)
61 raise WarnError(msg)
62+ # Check LACPDU port state
63+ for slave in slaves:
64+ with open(ACTOR_PORT_STATE.format(slave)) as fd:
65+ actor_port_value = fd.readline().strip()
66+ with open(PARTNET_PORT_STATE.format(slave)) as fd:
67+ partner_port_value = fd.readline().strip()
68+ if actor_port_value != partner_port_value:
69+ res = check_lacpdu_port(actor_port_value, partner_port_value)
70+ msg = 'WARNING: LACPDU port state mismatch '
71+ msg += '({0}: {1} - actor_port_state={2}, '
72+ msg += 'partner_port_state={3})'
73+ msg = msg.format(res, slave, actor_port_value,
74+ partner_port_value)
75+ raise WarnError(msg)
76
77 else:
78 msg = 'CRITICAL: {} is not a bonding interface'.format(iface)
79@@ -90,5 +134,6 @@ def main():
80 try_check(check_lacp_bond,
81 args.iface)
82
83+
84 if __name__ == '__main__':
85 main()

Subscribers

People subscribed via source and target branches