Merge ~tamilmani1989/cloud-init:azure_networking into cloud-init:master
- Git
- lp:~tamilmani1989/cloud-init
- azure_networking
- Merge into master
Status: | Merged |
---|---|
Approved by: | Ryan Harper |
Approved revision: | 56559c2548f858046a0775d21190cb167463cdcb |
Merge reported by: | Server Team CI bot |
Merged at revision: | not available |
Proposed branch: | ~tamilmani1989/cloud-init:azure_networking |
Merge into: | cloud-init:master |
Diff against target: |
859 lines (+678/-16) 4 files modified
cloudinit/sources/DataSourceAzure.py (+27/-4) cloudinit/sources/helpers/netlink.py (+250/-0) cloudinit/sources/helpers/tests/test_netlink.py (+373/-0) tests/unittests/test_datasource/test_azure.py (+28/-12) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Server Team CI bot | continuous-integration | Approve | |
Scott Moser | Approve | ||
Sushant Sharma (community) | Approve | ||
Ryan Harper | Approve | ||
Chad Smith | Approve | ||
Paul Meyer (community) | Needs Fixing | ||
Douglas Jordan | Pending | ||
Review via email: mp+336392@code.launchpad.net |
Commit message
azure: detect vnet migration via netlink media change event
Replace Azure pre-provision polling on IMDS with a blocking call
which watches for netlink link state change messages. The media
change event happens when a pre-provisioned VM has been activated
and is connected to the user's virtual network and cloud-init can
then resume operation to complete image instantiation.
Description of the change
This PR enables moving a virtual machine (VM) in Azure from one virtual network (Vnet) to another. It adds a new Azure-specific module that listens for the Azure host node to indicate that the VM has moved to a different network (by listening for nic connect/
Robert Schweikert (rjschwei) wrote : | # |
Sushant Sharma (sushantsharma) : | # |
Douglas Jordan (dojordan) wrote : | # |
Some comments inline
Ryan Harper (raharper) wrote : | # |
Can you describe in what scenario the current code which polls the IMDS url for failures is inadequate?
Is there a way we can test the current code on Azure directly?
Sushant Sharma (sushantsharma) wrote : | # |
> Can you describe in what scenario the current code which polls the IMDS url
> for failures is inadequate?
>
> Is there a way we can test the current code on Azure directly?
Thanks Ryan. This is a valid question, and we are in the process of testing in Azure with all components involved (including an updated host). Having said that, the IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of being updated on the host, or any random communication failure (since this is a network call).
I think relying on a clean deterministic event should be preferred.
Furthermore, I believe the code that polls host for new provisioning data should just do that, and not worry about setting up networking. We should aim for clear and concise goals for every piece of code.
Ryan Harper (raharper) wrote : | # |
> > Can you describe in what scenario the current code which polls the IMDS url
> > for failures is inadequate?
> >
> > Is there a way we can test the current code on Azure directly?
>
> Thanks Ryan, This is a valid question, and we are in process of testing in
> Azure with all components involved (including updated host). Having said that,
> IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of getting
> updated on the host, or any random communication failure (since this is a
> network call).
Which of these failures are not handled already by the retry logic? If we have any failure attempting to fetch the URL from IMDS, then the retry logic will acquire a new lease; so I'm struggling to see which scenario is not covered.
> I think relying on a clean deterministic event should be preferred.
I agree; I'd much rather see use of the HyperV Keystore here for indicating both that a VM has been tagged as pre-provision, and when its network has been moved.
> Furthermore, I believe the code that polls host for new provisioning data
> should just do that, and not worry about setting up networking.
The _poll_imds is not setting up networking for the system; rather, it's setting up a temporary interface and DHCP'ing on it to acquire information at a specific time for a specific use case: preprovisioning.
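The temporary-interface pattern described here can be sketched as follows. Note this is an illustration only: `ephemeral_dhcp`, `poll_imds`, and `EphemeralNetError` are hypothetical stand-ins (the real context manager is `EphemeralDHCPv4` in `cloudinit.net.dhcp`, and the real loop retries on `UrlError`):

```python
import contextlib

class EphemeralNetError(Exception):
    """Stand-in for the URL/DHCP failures the real loop retries on."""

@contextlib.contextmanager
def ephemeral_dhcp(iface):
    # Sketch of the EphemeralDHCPv4 idea: configure a throwaway
    # interface, hand the lease to the caller, always tear it down.
    lease = {'interface': iface, 'fixed-address': '169.254.169.2'}  # placeholder
    try:
        yield lease
    finally:
        lease.clear()  # real code removes the address/routes it added

def poll_imds(iface, fetch):
    # Retry the metadata fetch, rebuilding the ephemeral network each
    # time the fetch fails, mirroring the while-loop in DataSourceAzure.
    while True:
        with ephemeral_dhcp(iface) as lease:
            try:
                return fetch(lease)
            except EphemeralNetError:
                continue  # teardown ran in finally; DHCP again and retry
```

The point is that the network setup is scoped to the fetch and torn down on every failure, rather than configuring the system's networking.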
Sushant Sharma (sushantsharma) wrote : | # |
> > > Can you describe in what scenario the current code which polls the IMDS
> url
> > > for failures is inadequate?
> > >
> > > Is there a way we can test the current code on Azure directly?
> >
> > Thanks Ryan, This is a valid question, and we are in process of testing in
> > Azure with all components involved (including updated host). Having said
> that,
> > IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of getting
> > updated on the host, or any random communication failure (since this is a
> > network call).
>
> Which of these failures are not handled already by the retry logic? If we
> have any failure attempting to fetch the URL from IMDS, then then the retry
> logic will acquire a new lease; so I'm struggling to see which scenario is not
> covered.
>
Let's remember that the time to move a VM into the new network is also critical for our customers. If we can save any amount of time (even if it is a few seconds on average) that we spend on polling and retrying DHCP on failure, we should prefer that. Having said that, let us run some tests and see how much we save with the netlink-based solution versus the existing checked-in code. We will report the results and hopefully can make the case again for the current PR.
Thanks,
Sushant
Douglas Jordan (dojordan) wrote : | # |
Added a few (mostly style) comments
Tamilmani Manoharan (tamilmani1989) wrote : | # |
I addressed review comments
Douglas Jordan (dojordan) : | # |
Douglas Jordan (dojordan) : | # |
Paul Meyer (paul-meyer) wrote : | # |
Overall LGTM, except for the lack of unit tests for the new code.
You could change the 0/-1 return code from create_
Tamilmani Manoharan (tamilmani1989) wrote : | # |
Hi Paul, I added a unit test for the network switch. Can you please review?
Chad Smith (chad.smith) wrote : | # |
Updated my comment and patch suggestions; please feel free to disagree where you think points are irrelevant or wrong.
Here are a couple of thoughts on your branch to aid in the testability of its features:
The complete patch suggestion is here http://
I've itemized the general thoughts here for your review to see what you think
1. Generally, let's limit what lives in a try/except block to code that we expect will actually generate exceptions. I moved the general increment logic outside of the struct.unpack calls and moved the try/except socket.error checks to around read_netlink_socket, as I didn't think socket.errors would be raised by anything else in wait_for_
2. I suggest breaking up the NetworkEvents class, as it seems more a collection of functions than a class with state. When I see one-line class methods that don't actually depend on any instance attributes, it's a hint that it might be an empty level of indirection we just don't need (like select_
3. Rename networkevents.py -> netlink.py to make the module a bit more specific
4. Personal preference is to use namedtuples instead of classes to host simple structured attributes like your rta_attr definition
5. I'd like to see simple unit tests on each function per the simple example I added in cloudinit/
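Point 4 above can be illustrated with a minimal sketch. The `RTAAttr` fields match the namedtuple that ended up in the merged netlink.py; `parse_rta` itself is a simplified illustration, not the merged helper:

```python
from collections import namedtuple
import struct

# A namedtuple gives simple structured attributes without the
# boilerplate of a stateless class.
RTAAttr = namedtuple('RTAAttr', ['length', 'rta_type', 'data'])

def parse_rta(buf, offset):
    # Each rtattr starts with two unsigned shorts (total length, type);
    # the payload follows the 4-byte header. Simplified for illustration.
    length, rta_type = struct.unpack_from('HH', buf, offset)
    data = buf[offset + 4:offset + length]
    return RTAAttr(length, rta_type, data)

# An IFLA_IFNAME (type 3) attribute carrying b"eth0\0":
# 4-byte header + 5 payload bytes, so length = 9.
attr = parse_rta(struct.pack('HH', 9, 3) + b'eth0\x00', 0)
```

Consumers then read `attr.rta_type` and `attr.data` by name instead of indexing into a tuple or instantiating a behaviour-free class.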
Chad Smith (chad.smith) wrote : | # |
Also here's a cloudinit/
Server Team CI bot (server-team-bot) wrote : | # |
FAILED: Continuous integration, rev:f561d8f6a14
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
FAILED: Ubuntu LTS: Build
Server Team CI bot (server-team-bot) wrote : | # |
PASSED: Continuous integration, rev:7712d6fa901
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
IN_PROGRESS: Declarative: Post Actions
Joshua Powers (powersj) wrote : | # |
CI passed on the latest commit. There are conflicts below that need to be worked out; moving it back to Needs Review at Tamilmani's request.
- 0c14e44... by Tamilmani Manoharan
-
Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking
# Conflicts:
# cloudinit/sources/DataSourceAzure.py
- 144e6d8... by Tamilmani Manoharan
-
fixed indentation error
Ryan Harper (raharper) wrote : | # |
Thanks for working on the refactor. This looks a lot better and it's much easier to read and write unittests.
I've quite a few inline comments; please look through those. The biggest query is around how we know when the vnet migration has started, to prevent us from getting stuck forever if it has already started before we attempt to listen for events.
Tamilmani Manoharan (tamilmani1989) wrote : | # |
> Thanks for working on the refactor. This looks a lot better and it's much
> easier to read and write unittests.
>
> I've quite a few inline comments, please look through those. The biggest
> query is around how do we know when the vnet migration has started to prevent
> us from getting stuck forever if it's already started before we attempt to
> listen for events.
Thanks for reviewing it. Unless cloud-init reports ready to the Azure fabric, the vnet switch won't happen. It's the way of informing the top-level services that the VM is ready for the vnet switch. So I have created and bound the netlink socket before that call, so it won't miss any event that happens after that.
Tamilmani Manoharan (tamilmani1989) wrote : | # |
I have a few concerns over some review comments, which I replied to inline. Other than that, I'm fine with the other comments and will work on addressing those.
Ryan Harper (raharper) wrote : | # |
> > Thanks for working on the refactor. This looks a lot better and it's much
> > easier to read and write unittests.
> >
> > I've quite a few inline comments, please look through those. The biggest
> > query is around how do we know when the vnet migration has started to
> prevent
> > us from getting stuck forever if it's already started before we attempt to
> > listen for events.
>
> Thanks for reviewing it. Unless cloud-init report ready to azure fabric, vnet
> switch won't happen. Its way of informing the top level services that vm is
> ready for vnet switch. So I have created and bound netlink socket before that
> call so it won't miss any event that happens after that.
OK, so, the sequence then is:
1) crawl initial metadata and detect if pre-provision VM, if so write a REPROVISION marker file
2) in _poll_imds create nl socket
3) write a REPORT_READY marker file and call _report_ready()
4) _report_ready() will invoke the shim
5) wait_for_
The window is now between (2) and (5); and the question is what does the netlink socket do with the incoming messages that we won't process until (5)?
From reading the upstream docs on netlink, the default netlink socket size is 32kb, and while the typical messages may be small, the payload for messages is arbitrary. I know that DELLINK and NEWLINK are small but other messages may come through and if we're not processing then the socket could become full.
I suspect that in practice there isn't a lot of netlink traffic on the node with only a single interface up at this point but I'm wondering if we shouldn't employ something that allows us to:
a) spawn a thread to start doing the "Wait" which would start reading messages as they come in on the socket *before* we send the READY flag to the fabric
b) after sending the ready flag, we would join() the thread which would require the main program to wait until the thread exited (ie, we detected the media change event).
Thoughts?
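The (a)/(b) ordering proposed above could look roughly like this; `provision_with_listener` and the toy callbacks are hypothetical, shown only to make the ordering concrete, and the real netlink wait helper and report-ready shim would be passed in instead:

```python
import threading

def provision_with_listener(wait_for_media_change, send_ready):
    # (a) start draining netlink messages *before* READY is sent, so
    # the socket buffer cannot fill while the fabric reacts.
    listener = threading.Thread(target=wait_for_media_change, daemon=True)
    listener.start()
    send_ready()      # tell the fabric the VM is ready for the vnet switch
    listener.join()   # (b) block until the media change has been observed

# Toy stand-ins to show the ordering; the listener is guaranteed to have
# finished once join() returns.
events = []
provision_with_listener(lambda: events.append('waited'),
                        lambda: events.append('ready'))
```

With this shape the main thread never leaves the socket unread for long, at the cost of introducing thread lifecycle handling into the datasource.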
Ryan Harper (raharper) : | # |
Tamilmani Manoharan (tamilmani1989) wrote : | # |
> > > Thanks for working on the refactor. This looks a lot better and it's much
> > > easier to read and write unittests.
> > >
> > > I've quite a few inline comments, please look through those. The biggest
> > > query is around how do we know when the vnet migration has started to
> > prevent
> > > us from getting stuck forever if it's already started before we attempt to
> > > listen for events.
> >
> > Thanks for reviewing it. Unless cloud-init report ready to azure fabric,
> vnet
> > switch won't happen. Its way of informing the top level services that vm is
> > ready for vnet switch. So I have created and bound netlink socket before
> that
> > call so it won't miss any event that happens after that.
>
> OK, so, the sequence then is:
>
> 1) crawl initial metadata and detect if pre-provision VM, if so write a
> REPROVISION marker file
> 2) in _poll_imds create nl socket
> 3) write a REPORT_READY marker file and call _report_ready()
> 4) _report_ready() will invoke the shim
> 5) wait_for_
>
> The window is now between (2) and (5); and the question is what does the
> netlink socket do with the incoming messages that we won't process until (5)?
>
> From reading the upstream docs on netlink, the default netlink socket size is
> 32kb, and while the typical messages may be small, the payload for messages is
> arbitrary. I know that DELLINK and NEWLINK are small but other messages may
> come through and if we're not processing then the socket could become full.
>
> I suspect that in practice there isn't a lot of netlink traffic on the node
> with only a single interface up at this point but I'm wondering if we
> shouldn't employ something that allows us to:
>
> a) spawn a thread to start doing the "Wait" which would start reading messages
> as they come in on the socket *before* we send the READY flag to the fabric
> b) after sending the ready flag, we would join() the thread which would
> require the main program to wait until the thread exited (ie, we detected the
> media change event).
>
> Thoughts?
I don't think we receive that many netlink messages, since this happens before systemd-networkd starts. We manually set up an ephemeral network (one interface) to send report ready. Also, the netlink code listens only for RTMGRP_LINK (network interface create/
Tamilmani Manoharan (tamilmani1989) : | # |
Ryan Harper (raharper) wrote : | # |
OK. Let's update the create_
It would be nice if we could bind to those events and only on a specific ifindex (we could look up the currently up interface) but it's sufficient to filter in the wait_for_
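A minimal sketch of the suggested socket setup, assuming a Linux host; the merged helper (create_bound_netlink_socket in the diff below) additionally wraps socket errors in NetlinkCreateSocketError and binds with the process pid:

```python
import socket

RTMGRP_LINK = 1  # from <linux/rtnetlink.h>: link create/delete/up/down events

def create_link_socket():
    # Subscribe only to the RTMGRP_LINK multicast group so unrelated
    # rtnetlink traffic (addresses, routes) never reaches this socket.
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # pid 0: let the kernel assign the port id
    sock.setblocking(False)      # we only select()/recv(), never send
    return sock
```

Filtering down to a single ifindex is not possible at bind time, which is why the interface-name check still has to happen in the wait loop.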
Tamilmani Manoharan (tamilmani1989) wrote : | # |
> OK. Let's update the create_
> details, binding to only RTNLGRP_LINK (which only includes
> RTM_NEWLINK/
>
> It would be nice if we could bind to those events and only on a specific
> ifindex (we could look up the currently up interface) but it's sufficient to
> filter in the wait_for_
OK, will update it.
- 58ef420... by Tamilmani Manoharan
-
addressed review comments
- 3d3efee... by Tamilmani Manoharan
-
reverted back the line break change
- 148c88c... by Tamilmani Manoharan
-
added few more unit tests.
- 7e2bba6... by Tamilmani Manoharan
-
Added a test case to validate down up scenario
- 45806ec... by Tamilmani Manoharan
-
cleanup and retry on netlink socket creation error
- cc64c51... by Tamilmani Manoharan
-
optimized test code
Ryan Harper (raharper) wrote : | # |
A couple more in-line requests below. Thanks for incorporating the requested changes.
Tamilmani Manoharan (tamilmani1989) : | # |
- 6e4cfb7... by Tamilmani Manoharan
-
addressed iteration 3 review comments
- 91a4304... by Tamilmani Manoharan
-
Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking
# Conflicts:
# cloudinit/sources/DataSourceAzure.py
# tests/unittests/test_datasource/test_azure.py
- feb9f10... by Tamilmani Manoharan
-
fixed azure UTs regression with recent changes in master
Tamilmani Manoharan (tamilmani1989) wrote : | # |
> I couple more in-line requests below. Thanks for incorporating the requested
> changes.
Addressed your comments. Please take a look.
Ryan Harper (raharper) : | # |
Tamilmani Manoharan (tamilmani1989) wrote : | # |
Thanks for reviewing. I will address your comments. Do you have any concerns other than these comments? Can you sign off once I address the comments below?
- 59e5fe9... by Tamilmani Manoharan
-
addressed iteration 4 review comments
- 9c6eb4a... by Tamilmani Manoharan
-
added comments
Tamilmani Manoharan (tamilmani1989) : | # |
- b507379... by Tamilmani Manoharan
-
addressed iteration 5 review comments
Fixed code style error
Server Team CI bot (server-team-bot) wrote : | # |
PASSED: Continuous integration, rev:b5073795740
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
IN_PROGRESS: Declarative: Post Actions
Chad Smith (chad.smith) : | # |
Chad Smith (chad.smith) wrote : | # |
Leaving this approved pending your comments regarding ignoring interim state transitions for operstates != DOWN, DORMANT or UP.
- 2fc6e87... by Tamilmani Manoharan
-
addressed iteration 6 comments
- 75c8987... by Tamilmani Manoharan
-
dummy commit to sync with launchpad
Tamilmani Manoharan (tamilmani1989) wrote : | # |
I addressed your comments.
- ad8ebf4... by Tamilmani Manoharan
-
moved log inside function as per review comment
Chad Smith (chad.smith) wrote : | # |
Looks good, and unit tests pass. 2 Minor inline followups.
Thanks Tamilmani.
Tamilmani Manoharan (tamilmani1989) : | # |
Ryan Harper (raharper) wrote : | # |
Please review the commit message I added.
I've a couple in-line comments on what we return when reading the attrs, please comment there as well.
- f12813e... by Tamilmani Manoharan
-
fixed typo
Tamilmani Manoharan (tamilmani1989) : | # |
Ryan Harper (raharper) wrote : | # |
One more follow-up on the rta processing.
Tamilmani Manoharan (tamilmani1989) : | # |
- 205864e... by Tamilmani Manoharan
-
return interfaceoperstate as none in case of failure
Ryan Harper (raharper) wrote : | # |
Some comments on debug messages. And another thing occurred to me about the contents we get back from the socket: I'm wondering if that is only ever a single message or possibly a sequence of messages, and if so, don't we need to loop over the data?
Tamilmani Manoharan (tamilmani1989) wrote : | # |
For netlink down/up events we won't get a multipart message, and I already tested that. Multipart messages will be received only if the payload is too big.
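Independent of multipart (NLM_F_MULTI) messages, a single recv() can still return several complete netlink messages back to back, which is what the commits that follow address. A sketch of walking the buffer by each header's nlmsg_len (illustrative; the merged code also carries partial trailing bytes over to the next read):

```python
import struct

NLMSGHDR_FMT = 'IHHII'  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)

def split_netlink_messages(buf):
    # Walk the buffer one message at a time: the first u32 of each
    # header is the total length of that message, including the header.
    offset = 0
    msgs = []
    while offset + NLMSGHDR_SIZE <= len(buf):
        msg_len = struct.unpack_from('I', buf, offset)[0]
        if msg_len < NLMSGHDR_SIZE or offset + msg_len > len(buf):
            break  # truncated/partial message: stop, caller keeps the rest
        msgs.append(buf[offset:offset + msg_len])
        offset += msg_len  # nlmsg_len in link messages is 4-byte aligned
    return msgs
```

Without this loop, any message after the first in a recv() buffer would silently be dropped, which could include the very DOWN/UP pair being waited on.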
- dcf4c86... by Tamilmani Manoharan
-
Addressed reading multiple netlink messages in single receive.
- 978f7ed... by Tamilmani Manoharan
-
added test case for receiving multiple netlink messages in single receive call
- d3fd194... by Tamilmani Manoharan
-
reverted changes in test_azure
- fd355fc... by Tamilmani Manoharan
-
adhere to tox standard
Ryan Harper (raharper) wrote : | # |
Thanks for handling multiple messages in the buffer. Some in-line comments and suggestions. I'd like to add a test case that ensures we parse the socket data buffer to completion, and that a second read of the buffer returns the media change sequence, so we validate that we handle entering and exiting both sets of loops.
Scott Moser (smoser) wrote : | # |
One small comment inline.
Generally, I wonder what happens here on FreeBSD.
- 006bf63... by Tamilmani Manoharan
-
Added testcase for reading partial messages
Tamilmani Manoharan (tamilmani1989) wrote : | # |
I responded to your comments.
- 437b25b... by Tamilmani Manoharan
-
modified logs
Tamilmani Manoharan (tamilmani1989) : | # |
Scott Moser (smoser) wrote : | # |
What about FreeBSD?
What happens there? Do we hit your newly added code, or is it avoided?
Tamilmani Manoharan (tamilmani1989) wrote : | # |
> what about freebsd?
> what happens there? do we hit your newly added code or is it avoided.
Yes, it will be hit on FreeBSD also. Is there any problem with that?
- 85c5a6d... by Tamilmani Manoharan
-
1. Check for FreeBSD
2. Handle assertion errors
Tamilmani Manoharan (tamilmani1989) wrote : | # |
Scott/Ryan, addressed your comments. Can you sign off?
Ryan Harper (raharper) : | # |
Tamilmani Manoharan (tamilmani1989) : | # |
- c600588... by Tamilmani Manoharan
-
addressed reading partial messages from buffer issue and handled netlink socket close in case of exception
- 77dfb5f... by Tamilmani Manoharan
-
removed blank line at eof
Ryan Harper (raharper) : | # |
- 8157985... by Tamilmani Manoharan
-
removed saving data to another variable
Ryan Harper (raharper) wrote : | # |
Just a spelling fix in the unittests.
I think the core logic in wait_() is solid. Do you have a way to test this branch on an instance doing the pre-provision logic, just to ensure we're not outsmarting ourselves with the additional parsing logic?
Tamilmani Manoharan (tamilmani1989) wrote : | # |
> Just a spelling fix in the unittests.
>
> I think the core logic in wait_() is solid. Do you have a way to test this
> branch on an instances doing the pre-provision logic just to ensure we're not
> outsmarting ourselves with the additional parsing logic ?
Yes, I already tested that multiple times. I push the image (with my changes) as a VHD to the Azure marketplace and then deploy a PPS VM using that.
- 4d7dde8... by Tamilmani Manoharan
-
fixed spell error
Ryan Harper (raharper) wrote : | # |
Thank you for the hard work and many changes.
- ea309e3... by Tamilmani Manoharan
-
Fixed code style errors
Sushant Sharma (sushantsharma) wrote : | # |
Thanks a lot Tamilmani!
Scott Moser (smoser) : | # |
Server Team CI bot (server-team-bot) wrote : | # |
PASSED: Continuous integration, rev:ea309e371cb
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
IN_PROGRESS: Declarative: Post Actions
- 5917a00... by Tamilmani Manoharan
-
Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking
- 9bc4a08... by Tamilmani Manoharan
-
There are 3 fixes:
1. If a timeout happens, read returns None; concatenate data only after that check.
2. While resolving merge conflicts, the else part was accidentally removed. Restored the else condition.
3. Also reverted testcase changes.
- 56559c2... by Tamilmani Manoharan
-
Fixed code style error
Server Team CI bot (server-team-bot) wrote : | # |
FAILED: Continuous integration, rev:9bc4a08cbfb
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Server Team CI bot (server-team-bot) wrote : | # |
PASSED: Continuous integration, rev:56559c2548f
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
IN_PROGRESS: Declarative: Post Actions
Preview Diff
1 | diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py |
2 | index be82ec4..e076d5d 100644 |
3 | --- a/cloudinit/sources/DataSourceAzure.py |
4 | +++ b/cloudinit/sources/DataSourceAzure.py |
5 | @@ -22,6 +22,7 @@ from cloudinit.event import EventType |
6 | from cloudinit.net.dhcp import EphemeralDHCPv4 |
7 | from cloudinit import sources |
8 | from cloudinit.sources.helpers.azure import get_metadata_from_fabric |
9 | +from cloudinit.sources.helpers import netlink |
10 | from cloudinit.url_helper import UrlError, readurl, retry_on_url_exc |
11 | from cloudinit import util |
12 | |
13 | @@ -409,6 +410,10 @@ class DataSourceAzure(sources.DataSource): |
14 | |
15 | perform_reprovision = reprovision or self._should_reprovision(ret) |
16 | if perform_reprovision: |
17 | + if util.is_FreeBSD(): |
18 | + msg = "Free BSD is not supported for PPS VMs" |
19 | + LOG.error(msg) |
20 | + raise sources.InvalidMetaDataException(msg) |
21 | ret = self._reprovision() |
22 | imds_md = get_metadata_from_imds( |
23 | self.fallback_interface, retries=3) |
24 | @@ -523,8 +528,8 @@ class DataSourceAzure(sources.DataSource): |
25 | response. Then return the returned JSON object.""" |
26 | url = IMDS_URL + "reprovisiondata?api-version=2017-04-02" |
27 | headers = {"Metadata": "true"} |
28 | + nl_sock = None |
29 | report_ready = bool(not os.path.isfile(REPORTED_READY_MARKER_FILE)) |
30 | - LOG.debug("Start polling IMDS") |
31 | |
32 | def exc_cb(msg, exception): |
33 | if isinstance(exception, UrlError) and exception.code == 404: |
34 | @@ -533,12 +538,19 @@ class DataSourceAzure(sources.DataSource): |
35 | # call DHCP and setup the ephemeral network to acquire the new IP. |
36 | return False |
37 | |
38 | + LOG.debug("Wait for vnetswitch to happen") |
39 | while True: |
40 | try: |
41 | # Save our EphemeralDHCPv4 context so we avoid repeated dhcp |
42 | self._ephemeral_dhcp_ctx = EphemeralDHCPv4() |
43 | lease = self._ephemeral_dhcp_ctx.obtain_lease() |
44 | if report_ready: |
45 | + try: |
46 | + nl_sock = netlink.create_bound_netlink_socket() |
47 | + except netlink.NetlinkCreateSocketError as e: |
48 | + LOG.warning(e) |
49 | + self._ephemeral_dhcp_ctx.clean_network() |
50 | + return |
51 | path = REPORTED_READY_MARKER_FILE |
52 | LOG.info( |
53 | "Creating a marker file to report ready: %s", path) |
54 | @@ -546,13 +558,24 @@ class DataSourceAzure(sources.DataSource): |
55 | pid=os.getpid(), time=time())) |
56 | self._report_ready(lease=lease) |
57 | report_ready = False |
58 | - return readurl(url, timeout=1, headers=headers, |
59 | - exception_cb=exc_cb, infinite=True, |
60 | - log_req_resp=False).contents |
61 | + try: |
62 | + netlink.wait_for_media_disconnect_connect( |
63 | + nl_sock, lease['interface']) |
64 | + except AssertionError as error: |
65 | + LOG.error(error) |
66 | + return |
67 | + self._ephemeral_dhcp_ctx.clean_network() |
68 | + else: |
69 | + return readurl(url, timeout=1, headers=headers, |
70 | + exception_cb=exc_cb, infinite=True, |
71 | + log_req_resp=False).contents |
72 | except UrlError: |
73 | # Teardown our EphemeralDHCPv4 context on failure as we retry |
74 | self._ephemeral_dhcp_ctx.clean_network() |
75 | pass |
76 | + finally: |
77 | + if nl_sock: |
78 | + nl_sock.close() |
79 | |
80 | def _report_ready(self, lease): |
81 | """Tells the fabric provisioning has completed """ |
82 | diff --git a/cloudinit/sources/helpers/netlink.py b/cloudinit/sources/helpers/netlink.py |
83 | new file mode 100644 |
84 | index 0000000..d377ae3 |
85 | --- /dev/null |
86 | +++ b/cloudinit/sources/helpers/netlink.py |
87 | @@ -0,0 +1,250 @@ |
88 | +# Author: Tamilmani Manoharan <tamanoha@microsoft.com> |
89 | +# |
90 | +# This file is part of cloud-init. See LICENSE file for license information. |
91 | + |
92 | +from cloudinit import log as logging |
93 | +from cloudinit import util |
94 | +from collections import namedtuple |
95 | + |
96 | +import os |
97 | +import select |
98 | +import socket |
99 | +import struct |
100 | + |
101 | +LOG = logging.getLogger(__name__) |
102 | + |
103 | +# http://man7.org/linux/man-pages/man7/netlink.7.html |
104 | +RTMGRP_LINK = 1 |
105 | +NLMSG_NOOP = 1 |
106 | +NLMSG_ERROR = 2 |
107 | +NLMSG_DONE = 3 |
108 | +RTM_NEWLINK = 16 |
109 | +RTM_DELLINK = 17 |
110 | +RTM_GETLINK = 18 |
111 | +RTM_SETLINK = 19 |
112 | +MAX_SIZE = 65535 |
113 | +RTA_DATA_OFFSET = 32 |
114 | +MSG_TYPE_OFFSET = 16 |
115 | +SELECT_TIMEOUT = 60 |
116 | + |
117 | +NLMSGHDR_FMT = "IHHII" |
118 | +IFINFOMSG_FMT = "BHiII" |
119 | +NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT) |
120 | +IFINFOMSG_SIZE = struct.calcsize(IFINFOMSG_FMT) |
121 | +RTATTR_START_OFFSET = NLMSGHDR_SIZE + IFINFOMSG_SIZE |
122 | +RTA_DATA_START_OFFSET = 4 |
123 | +PAD_ALIGNMENT = 4 |
124 | + |
125 | +IFLA_IFNAME = 3 |
126 | +IFLA_OPERSTATE = 16 |
127 | + |
128 | +# https://www.kernel.org/doc/Documentation/networking/operstates.txt |
129 | +OPER_UNKNOWN = 0 |
130 | +OPER_NOTPRESENT = 1 |
131 | +OPER_DOWN = 2 |
132 | +OPER_LOWERLAYERDOWN = 3 |
133 | +OPER_TESTING = 4 |
134 | +OPER_DORMANT = 5 |
135 | +OPER_UP = 6 |
136 | + |
137 | +RTAAttr = namedtuple('RTAAttr', ['length', 'rta_type', 'data']) |
138 | +InterfaceOperstate = namedtuple('InterfaceOperstate', ['ifname', 'operstate']) |
139 | +NetlinkHeader = namedtuple('NetlinkHeader', ['length', 'type', 'flags', 'seq', |
140 | + 'pid']) |
141 | + |
142 | + |
143 | +class NetlinkCreateSocketError(RuntimeError): |
144 | + '''Raised if netlink socket fails during create or bind.''' |
145 | + pass |
146 | + |
147 | + |
148 | +def create_bound_netlink_socket(): |
149 | + '''Creates netlink socket and bind on netlink group to catch interface |
150 | + down/up events. The socket will bound only on RTMGRP_LINK (which only |
151 | + includes RTM_NEWLINK/RTM_DELLINK/RTM_GETLINK events). The socket is set to |
152 | + non-blocking mode since we're only receiving messages. |
153 | + |
154 | + :returns: netlink socket in non-blocking mode |
155 | + :raises: NetlinkCreateSocketError |
156 | + ''' |
157 | + try: |
158 | + netlink_socket = socket.socket(socket.AF_NETLINK, |
159 | + socket.SOCK_RAW, |
160 | + socket.NETLINK_ROUTE) |
161 | + netlink_socket.bind((os.getpid(), RTMGRP_LINK)) |
162 | + netlink_socket.setblocking(0) |
163 | + except socket.error as e: |
164 | + msg = "Exception during netlink socket create: %s" % e |
165 | + raise NetlinkCreateSocketError(msg) |
166 | + LOG.debug("Created netlink socket") |
167 | + return netlink_socket |
168 | + |
169 | + |
170 | +def get_netlink_msg_header(data): |
171 | + '''Gets netlink message type and length |
172 | + |
173 | + :param: data read from netlink socket |
174 | + :returns: netlink message type |
175 | + :raises: AssertionError if data is None or data is not >= NLMSGHDR_SIZE |
176 | + struct nlmsghdr { |
177 | + __u32 nlmsg_len; /* Length of message including header */ |
178 | + __u16 nlmsg_type; /* Type of message content */ |
179 | + __u16 nlmsg_flags; /* Additional flags */ |
180 | + __u32 nlmsg_seq; /* Sequence number */ |
181 | + __u32 nlmsg_pid; /* Sender port ID */ |
182 | + }; |
183 | + ''' |
184 | + assert (data is not None), ("data is none") |
185 | + assert (len(data) >= NLMSGHDR_SIZE), ( |
186 | + "data is smaller than netlink message header") |
187 | + msg_len, msg_type, flags, seq, pid = struct.unpack(NLMSGHDR_FMT, |
188 | + data[:MSG_TYPE_OFFSET]) |
189 | + LOG.debug("Got netlink msg of type %d", msg_type) |
190 | + return NetlinkHeader(msg_len, msg_type, flags, seq, pid) |
191 | + |
192 | + |
193 | +def read_netlink_socket(netlink_socket, timeout=None): |
194 | + '''Select and read from the netlink socket if ready. |
195 | + |
196 | + :param: netlink_socket: specify which socket object to read from |
197 | + :param: timeout: specify a timeout value (integer) to wait while reading, |
198 | + if none, it will block indefinitely until socket ready for read |
199 | + :returns: string of data read (max length = <MAX_SIZE>) from socket, |
200 | + if no data read, returns None |
201 | + :raises: AssertionError if netlink_socket is None |
202 | + ''' |
203 | + assert (netlink_socket is not None), ("netlink socket is none") |
204 | + read_set, _, _ = select.select([netlink_socket], [], [], timeout) |
205 | + # Incase of timeout,read_set doesn't contain netlink socket. |
206 | + # just return from this function |
207 | + if netlink_socket not in read_set: |
208 | + return None |
209 | + LOG.debug("netlink socket ready for read") |
210 | + data = netlink_socket.recv(MAX_SIZE) |
211 | + if data is None: |
212 | + LOG.error("Reading from Netlink socket returned no data") |
213 | + return data |
214 | + |
215 | + |
216 | +def unpack_rta_attr(data, offset): |
217 | + '''Unpack a single rta attribute. |
218 | + |
219 | + :param: data: string of data read from netlink socket |
220 | + :param: offset: starting offset of RTA Attribute |
221 | + :return: RTAAttr object with length, type and data. On error, return None. |
222 | + :raises: AssertionError if data is None or offset is not integer. |
223 | + ''' |
224 | + assert (data is not None), ("data is none") |
225 | + assert (type(offset) == int), ("offset is not integer") |
226 | + assert (offset >= RTATTR_START_OFFSET), ( |
227 | + "rta offset is less than expected length") |
228 | + length = rta_type = 0 |
229 | + attr_data = None |
230 | + try: |
231 | + length = struct.unpack_from("H", data, offset=offset)[0] |
232 | + rta_type = struct.unpack_from("H", data, offset=offset+2)[0] |
233 | + except struct.error: |
234 | + return None # Should mean our offset is >= remaining data |
235 | + |
236 | + # Unpack just the attribute's data. Offset by 4 to skip length/type header |
237 | + attr_data = data[offset+RTA_DATA_START_OFFSET:offset+length] |
238 | + return RTAAttr(length, rta_type, attr_data) |
239 | + |
240 | + |
241 | +def read_rta_oper_state(data): |
242 | + '''Reads Interface name and operational state from RTA Data. |
243 | + |
244 | + :param: data: string of data read from netlink socket |
245 | + :returns: InterfaceOperstate object containing if_name and oper_state. |
246 | + None if data does not contain valid IFLA_OPERSTATE and |
247 | + IFLA_IFNAME messages. |
248 | + :raises: AssertionError if data is None or length of data is |
249 | + smaller than RTATTR_START_OFFSET. |
250 | + ''' |
251 | + assert (data is not None), ("data is none") |
252 | + assert (len(data) > RTATTR_START_OFFSET), ( |
253 | + "length of data is smaller than RTATTR_START_OFFSET") |
254 | + ifname = operstate = None |
255 | + offset = RTATTR_START_OFFSET |
256 | + while offset <= len(data): |
257 | + attr = unpack_rta_attr(data, offset) |
258 | + if not attr or attr.length == 0: |
259 | + break |
260 | + # Each attribute is 4-byte aligned. Determine pad length. |
261 | + padlen = (PAD_ALIGNMENT - |
262 | + (attr.length % PAD_ALIGNMENT)) % PAD_ALIGNMENT |
263 | + offset += attr.length + padlen |
264 | + |
265 | + if attr.rta_type == IFLA_OPERSTATE: |
266 | + operstate = ord(attr.data) |
267 | + elif attr.rta_type == IFLA_IFNAME: |
268 | + interface_name = util.decode_binary(attr.data, 'utf-8') |
269 | + ifname = interface_name.strip('\0') |
270 | + if not ifname or operstate is None: |
271 | + return None |
272 | + LOG.debug("rta attrs: ifname %s operstate %d", ifname, operstate) |
273 | + return InterfaceOperstate(ifname, operstate) |
274 | + |
275 | + |
276 | +def wait_for_media_disconnect_connect(netlink_socket, ifname): |
277 | + '''Block until media disconnect and connect has happened on an interface. |
278 | +    Listens on the netlink socket to receive netlink events and, when the |
279 | +    carrier changes from down to up, considers the media switch event to |
280 | +    have happened and returns from this function. |
281 | + |
282 | + :param: netlink_socket: netlink_socket to receive events |
283 | + :param: ifname: Interface name to lookout for netlink events |
284 | + :raises: AssertionError if netlink_socket is None or ifname is None. |
285 | + ''' |
286 | + assert (netlink_socket is not None), ("netlink socket is none") |
287 | + assert (ifname is not None), ("interface name is none") |
288 | + assert (len(ifname) > 0), ("interface name cannot be empty") |
289 | + carrier = OPER_UP |
290 | + prevCarrier = OPER_UP |
291 | + data = bytes() |
292 | + LOG.debug("Wait for media disconnect and reconnect to happen") |
293 | + while True: |
294 | + recv_data = read_netlink_socket(netlink_socket, SELECT_TIMEOUT) |
295 | + if recv_data is None: |
296 | + continue |
297 | + LOG.debug('read %d bytes from socket', len(recv_data)) |
298 | + data += recv_data |
299 | + LOG.debug('Length of data after concat %d', len(data)) |
300 | + offset = 0 |
301 | + datalen = len(data) |
302 | + while offset < datalen: |
303 | + nl_msg = data[offset:] |
304 | + if len(nl_msg) < NLMSGHDR_SIZE: |
305 | + LOG.debug("Data is smaller than netlink header") |
306 | + break |
307 | + nlheader = get_netlink_msg_header(nl_msg) |
308 | + if len(nl_msg) < nlheader.length: |
309 | + LOG.debug("Partial data. Smaller than netlink message") |
310 | + break |
311 | + padlen = (nlheader.length+PAD_ALIGNMENT-1) & ~(PAD_ALIGNMENT-1) |
312 | + offset = offset + padlen |
313 | + LOG.debug('offset to next netlink message: %d', offset) |
314 | + # Ignore any messages not new link or del link |
315 | + if nlheader.type not in [RTM_NEWLINK, RTM_DELLINK]: |
316 | + continue |
317 | + interface_state = read_rta_oper_state(nl_msg) |
318 | + if interface_state is None: |
319 | + LOG.debug('Failed to read rta attributes: %s', interface_state) |
320 | + continue |
321 | + if interface_state.ifname != ifname: |
322 | + LOG.debug( |
323 | + "Ignored netlink event on interface %s. Waiting for %s.", |
324 | + interface_state.ifname, ifname) |
325 | + continue |
326 | + if interface_state.operstate not in [OPER_UP, OPER_DOWN]: |
327 | + continue |
328 | + prevCarrier = carrier |
329 | + carrier = interface_state.operstate |
330 | + # check for carrier down, up sequence |
331 | + isVnetSwitch = (prevCarrier == OPER_DOWN) and (carrier == OPER_UP) |
332 | + if isVnetSwitch: |
333 | + LOG.debug("Media switch happened on %s.", ifname) |
334 | + return |
335 | + data = data[offset:] |
336 | + |
337 | +# vi: ts=4 expandtab |
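The header handling and 4-byte message alignment above can be sketched with the stdlib alone. A minimal illustration follows; the constants (header format, `RTM_NEWLINK` value) are restated here from the helper above and the standard rtnetlink values, not imported, so treat this as a sketch rather than the module itself:

```python
import struct

# Values mirroring the helper above (restated for illustration):
# struct nlmsghdr is 16 bytes -- length, type, flags, seq, pid.
NLMSGHDR_FMT = "=LHHLL"
NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)  # 16
PAD_ALIGNMENT = 4
RTM_NEWLINK = 16  # standard rtnetlink message type

# Pack a fake RTM_NEWLINK header the way the unit tests below do.
raw = struct.pack(NLMSGHDR_FMT, 48, RTM_NEWLINK, 0, 3, 0)
msg_len, msg_type, flags, seq, pid = struct.unpack(NLMSGHDR_FMT, raw)
assert (msg_len, msg_type, seq) == (48, RTM_NEWLINK, 3)

def aligned(length, alignment=PAD_ALIGNMENT):
    # Round up to the next 4-byte boundary, as the receive loop does to
    # locate the offset of the next netlink message in the buffer.
    return (length + alignment - 1) & ~(alignment - 1)

assert aligned(48) == 48 and aligned(50) == 52
```

Because netlink delivers a stream of variable-length messages, the receive loop has to advance by the aligned length rather than the raw `msg_len`, which is why the `padlen` rounding appears in `wait_for_media_disconnect_connect`.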
338 | diff --git a/cloudinit/sources/helpers/tests/test_netlink.py b/cloudinit/sources/helpers/tests/test_netlink.py |
339 | new file mode 100644 |
340 | index 0000000..c2898a1 |
341 | --- /dev/null |
342 | +++ b/cloudinit/sources/helpers/tests/test_netlink.py |
343 | @@ -0,0 +1,373 @@ |
344 | +# Author: Tamilmani Manoharan <tamanoha@microsoft.com> |
345 | +# |
346 | +# This file is part of cloud-init. See LICENSE file for license information. |
347 | + |
348 | +from cloudinit.tests.helpers import CiTestCase, mock |
349 | +import socket |
350 | +import struct |
351 | +import codecs |
352 | +from cloudinit.sources.helpers.netlink import ( |
353 | + NetlinkCreateSocketError, create_bound_netlink_socket, read_netlink_socket, |
354 | + read_rta_oper_state, unpack_rta_attr, wait_for_media_disconnect_connect, |
355 | + OPER_DOWN, OPER_UP, OPER_DORMANT, OPER_LOWERLAYERDOWN, OPER_NOTPRESENT, |
356 | + OPER_TESTING, OPER_UNKNOWN, RTATTR_START_OFFSET, RTM_NEWLINK, RTM_SETLINK, |
357 | + RTM_GETLINK, MAX_SIZE) |
358 | + |
359 | + |
360 | +def int_to_bytes(i): |
361 | +    '''Convert an integer to its binary representation, e.g. 1 to \x01''' |
362 | + hex_value = '{0:x}'.format(i) |
363 | + hex_value = '0' * (len(hex_value) % 2) + hex_value |
364 | + return codecs.decode(hex_value, 'hex_codec') |
365 | + |
366 | + |
367 | +class TestCreateBoundNetlinkSocket(CiTestCase): |
368 | + |
369 | + @mock.patch('cloudinit.sources.helpers.netlink.socket.socket') |
370 | + def test_socket_error_on_create(self, m_socket): |
371 | +        '''create_bound_netlink_socket catches socket creation exceptions |
372 | + |
373 | +        and raises NetlinkCreateSocketError when socket creation errors.''' |
374 | + m_socket.side_effect = socket.error("Fake socket failure") |
375 | + with self.assertRaises(NetlinkCreateSocketError) as ctx_mgr: |
376 | + create_bound_netlink_socket() |
377 | + self.assertEqual( |
378 | + 'Exception during netlink socket create: Fake socket failure', |
379 | + str(ctx_mgr.exception)) |
380 | + |
381 | + |
382 | +class TestReadNetlinkSocket(CiTestCase): |
383 | + |
384 | + @mock.patch('cloudinit.sources.helpers.netlink.socket.socket') |
385 | + @mock.patch('cloudinit.sources.helpers.netlink.select.select') |
386 | + def test_read_netlink_socket(self, m_select, m_socket): |
387 | + '''read_netlink_socket able to receive data''' |
388 | + data = 'netlinktest' |
389 | + m_select.return_value = [m_socket], None, None |
390 | + m_socket.recv.return_value = data |
391 | + recv_data = read_netlink_socket(m_socket, 2) |
392 | + m_select.assert_called_with([m_socket], [], [], 2) |
393 | + m_socket.recv.assert_called_with(MAX_SIZE) |
394 | + self.assertIsNotNone(recv_data) |
395 | + self.assertEqual(recv_data, data) |
396 | + |
397 | + @mock.patch('cloudinit.sources.helpers.netlink.socket.socket') |
398 | + @mock.patch('cloudinit.sources.helpers.netlink.select.select') |
399 | + def test_netlink_read_timeout(self, m_select, m_socket): |
400 | + '''read_netlink_socket should timeout if nothing to read''' |
401 | + m_select.return_value = [], None, None |
402 | + data = read_netlink_socket(m_socket, 1) |
403 | + m_select.assert_called_with([m_socket], [], [], 1) |
404 | + self.assertEqual(m_socket.recv.call_count, 0) |
405 | + self.assertIsNone(data) |
406 | + |
407 | + def test_read_invalid_socket(self): |
408 | + '''read_netlink_socket raises assert error if socket is invalid''' |
409 | + socket = None |
410 | + with self.assertRaises(AssertionError) as context: |
411 | + read_netlink_socket(socket, 1) |
412 | + self.assertTrue('netlink socket is none' in str(context.exception)) |
413 | + |
414 | + |
415 | +class TestParseNetlinkMessage(CiTestCase): |
416 | + |
417 | + def test_read_rta_oper_state(self): |
418 | +        '''read_rta_oper_state can parse a netlink message and extract data''' |
419 | + ifname = "eth0" |
420 | + bytes = ifname.encode("utf-8") |
421 | + buf = bytearray(48) |
422 | + struct.pack_into("HH4sHHc", buf, RTATTR_START_OFFSET, 8, 3, bytes, 5, |
423 | + 16, int_to_bytes(OPER_DOWN)) |
424 | + interface_state = read_rta_oper_state(buf) |
425 | + self.assertEqual(interface_state.ifname, ifname) |
426 | + self.assertEqual(interface_state.operstate, OPER_DOWN) |
427 | + |
428 | + def test_read_none_data(self): |
429 | + '''read_rta_oper_state raises assert error if data is none''' |
430 | + data = None |
431 | + with self.assertRaises(AssertionError) as context: |
432 | + read_rta_oper_state(data) |
433 | +        self.assertIn('data is none', str(context.exception)) |
434 | + |
435 | + def test_read_invalid_rta_operstate_none(self): |
436 | + '''read_rta_oper_state returns none if operstate is none''' |
437 | + ifname = "eth0" |
438 | + buf = bytearray(40) |
439 | + bytes = ifname.encode("utf-8") |
440 | + struct.pack_into("HH4s", buf, RTATTR_START_OFFSET, 8, 3, bytes) |
441 | + interface_state = read_rta_oper_state(buf) |
442 | + self.assertIsNone(interface_state) |
443 | + |
444 | + def test_read_invalid_rta_ifname_none(self): |
445 | + '''read_rta_oper_state returns none if ifname is none''' |
446 | + buf = bytearray(40) |
447 | + struct.pack_into("HHc", buf, RTATTR_START_OFFSET, 5, 16, |
448 | + int_to_bytes(OPER_DOWN)) |
449 | + interface_state = read_rta_oper_state(buf) |
450 | + self.assertIsNone(interface_state) |
451 | + |
452 | + def test_read_invalid_data_len(self): |
453 | + '''raise assert error if data size is smaller than required size''' |
454 | + buf = bytearray(32) |
455 | + with self.assertRaises(AssertionError) as context: |
456 | + read_rta_oper_state(buf) |
457 | + self.assertTrue('length of data is smaller than RTATTR_START_OFFSET' in |
458 | + str(context.exception)) |
459 | + |
460 | + def test_unpack_rta_attr_none_data(self): |
461 | + '''unpack_rta_attr raises assert error if data is none''' |
462 | + data = None |
463 | + with self.assertRaises(AssertionError) as context: |
464 | + unpack_rta_attr(data, RTATTR_START_OFFSET) |
465 | + self.assertTrue('data is none' in str(context.exception)) |
466 | + |
467 | + def test_unpack_rta_attr_invalid_offset(self): |
468 | + '''unpack_rta_attr raises assert error if offset is invalid''' |
469 | + data = bytearray(48) |
470 | + with self.assertRaises(AssertionError) as context: |
471 | + unpack_rta_attr(data, "offset") |
472 | + self.assertTrue('offset is not integer' in str(context.exception)) |
473 | + with self.assertRaises(AssertionError) as context: |
474 | + unpack_rta_attr(data, 31) |
475 | + self.assertTrue('rta offset is less than expected length' in |
476 | + str(context.exception)) |
477 | + |
478 | + |
479 | +@mock.patch('cloudinit.sources.helpers.netlink.socket.socket') |
480 | +@mock.patch('cloudinit.sources.helpers.netlink.read_netlink_socket') |
481 | +class TestWaitForMediaDisconnectConnect(CiTestCase): |
482 | + with_logs = True |
483 | + |
484 | + def _media_switch_data(self, ifname, msg_type, operstate): |
485 | + '''construct netlink data with specified fields''' |
486 | + if ifname and operstate is not None: |
487 | + data = bytearray(48) |
488 | + bytes = ifname.encode("utf-8") |
489 | + struct.pack_into("HH4sHHc", data, RTATTR_START_OFFSET, 8, 3, |
490 | + bytes, 5, 16, int_to_bytes(operstate)) |
491 | + elif ifname: |
492 | + data = bytearray(40) |
493 | + bytes = ifname.encode("utf-8") |
494 | + struct.pack_into("HH4s", data, RTATTR_START_OFFSET, 8, 3, bytes) |
495 | + elif operstate: |
496 | + data = bytearray(40) |
497 | + struct.pack_into("HHc", data, RTATTR_START_OFFSET, 5, 16, |
498 | + int_to_bytes(operstate)) |
499 | + struct.pack_into("=LHHLL", data, 0, len(data), msg_type, 0, 0, 0) |
500 | + return data |
501 | + |
502 | + def test_media_down_up_scenario(self, m_read_netlink_socket, |
503 | + m_socket): |
504 | + '''Test for media down up sequence for required interface name''' |
505 | + ifname = "eth0" |
506 | + # construct data for Oper State down |
507 | + data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN) |
508 | + # construct data for Oper State up |
509 | + data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP) |
510 | + m_read_netlink_socket.side_effect = [data_op_down, data_op_up] |
511 | + wait_for_media_disconnect_connect(m_socket, ifname) |
512 | + self.assertEqual(m_read_netlink_socket.call_count, 2) |
513 | + |
514 | + def test_wait_for_media_switch_diff_interface(self, m_read_netlink_socket, |
515 | + m_socket): |
516 | + '''wait_for_media_disconnect_connect ignores unexpected interfaces. |
517 | + |
518 | +    The first two messages are for other interfaces and the last two are |
519 | +    for the expected interface, so the function exits only after receiving |
520 | +    the last 2 messages and the call count for m_read_netlink_socket |
521 | +    has to be 4. |
522 | + ''' |
523 | + other_ifname = "eth1" |
524 | + expected_ifname = "eth0" |
525 | + data_op_down_eth1 = self._media_switch_data( |
526 | + other_ifname, RTM_NEWLINK, OPER_DOWN) |
527 | + data_op_up_eth1 = self._media_switch_data( |
528 | + other_ifname, RTM_NEWLINK, OPER_UP) |
529 | + data_op_down_eth0 = self._media_switch_data( |
530 | + expected_ifname, RTM_NEWLINK, OPER_DOWN) |
531 | + data_op_up_eth0 = self._media_switch_data( |
532 | + expected_ifname, RTM_NEWLINK, OPER_UP) |
533 | + m_read_netlink_socket.side_effect = [data_op_down_eth1, |
534 | + data_op_up_eth1, |
535 | + data_op_down_eth0, |
536 | + data_op_up_eth0] |
537 | + wait_for_media_disconnect_connect(m_socket, expected_ifname) |
538 | + self.assertIn('Ignored netlink event on interface %s' % other_ifname, |
539 | + self.logs.getvalue()) |
540 | + self.assertEqual(m_read_netlink_socket.call_count, 4) |
541 | + |
542 | + def test_invalid_msgtype_getlink(self, m_read_netlink_socket, m_socket): |
543 | + '''wait_for_media_disconnect_connect ignores GETLINK events. |
544 | + |
545 | + The first two messages are for oper down and up for RTM_GETLINK type |
546 | + which netlink module will ignore. The last 2 messages are RTM_NEWLINK |
547 | + with oper state down and up messages. Therefore the call count for |
548 | + m_read_netlink_socket has to be 4 ignoring first 2 messages |
549 | + of RTM_GETLINK |
550 | + ''' |
551 | + ifname = "eth0" |
552 | + data_getlink_down = self._media_switch_data( |
553 | + ifname, RTM_GETLINK, OPER_DOWN) |
554 | + data_getlink_up = self._media_switch_data( |
555 | + ifname, RTM_GETLINK, OPER_UP) |
556 | + data_newlink_down = self._media_switch_data( |
557 | + ifname, RTM_NEWLINK, OPER_DOWN) |
558 | + data_newlink_up = self._media_switch_data( |
559 | + ifname, RTM_NEWLINK, OPER_UP) |
560 | + m_read_netlink_socket.side_effect = [data_getlink_down, |
561 | + data_getlink_up, |
562 | + data_newlink_down, |
563 | + data_newlink_up] |
564 | + wait_for_media_disconnect_connect(m_socket, ifname) |
565 | + self.assertEqual(m_read_netlink_socket.call_count, 4) |
566 | + |
567 | + def test_invalid_msgtype_setlink(self, m_read_netlink_socket, m_socket): |
568 | + '''wait_for_media_disconnect_connect ignores SETLINK events. |
569 | + |
570 | +        The first two messages are oper down and up of RTM_SETLINK type, |
571 | +        which it will ignore. The 3rd and 4th messages are RTM_NEWLINK with |
572 | +        down and up operstates. The function should exit after the 4th |
573 | +        message since it then sees the down->up sequence, so the call count |
574 | +        for m_read_netlink_socket has to be 4, ignoring the first 2 |
575 | +        RTM_SETLINK messages and the last 2 queued RTM_NEWLINK messages. |
576 | + ''' |
577 | + ifname = "eth0" |
578 | + data_setlink_down = self._media_switch_data( |
579 | + ifname, RTM_SETLINK, OPER_DOWN) |
580 | + data_setlink_up = self._media_switch_data( |
581 | + ifname, RTM_SETLINK, OPER_UP) |
582 | + data_newlink_down = self._media_switch_data( |
583 | + ifname, RTM_NEWLINK, OPER_DOWN) |
584 | + data_newlink_up = self._media_switch_data( |
585 | + ifname, RTM_NEWLINK, OPER_UP) |
586 | + m_read_netlink_socket.side_effect = [data_setlink_down, |
587 | + data_setlink_up, |
588 | + data_newlink_down, |
589 | + data_newlink_up, |
590 | + data_newlink_down, |
591 | + data_newlink_up] |
592 | + wait_for_media_disconnect_connect(m_socket, ifname) |
593 | + self.assertEqual(m_read_netlink_socket.call_count, 4) |
594 | + |
595 | + def test_netlink_invalid_switch_scenario(self, m_read_netlink_socket, |
596 | + m_socket): |
597 | + '''returns only if it receives UP event after a DOWN event''' |
598 | + ifname = "eth0" |
599 | + data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN) |
600 | + data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP) |
601 | + data_op_dormant = self._media_switch_data(ifname, RTM_NEWLINK, |
602 | + OPER_DORMANT) |
603 | + data_op_notpresent = self._media_switch_data(ifname, RTM_NEWLINK, |
604 | + OPER_NOTPRESENT) |
605 | + data_op_lowerdown = self._media_switch_data(ifname, RTM_NEWLINK, |
606 | + OPER_LOWERLAYERDOWN) |
607 | + data_op_testing = self._media_switch_data(ifname, RTM_NEWLINK, |
608 | + OPER_TESTING) |
609 | + data_op_unknown = self._media_switch_data(ifname, RTM_NEWLINK, |
610 | + OPER_UNKNOWN) |
611 | + m_read_netlink_socket.side_effect = [data_op_up, data_op_up, |
612 | + data_op_dormant, data_op_up, |
613 | + data_op_notpresent, data_op_up, |
614 | + data_op_lowerdown, data_op_up, |
615 | + data_op_testing, data_op_up, |
616 | + data_op_unknown, data_op_up, |
617 | + data_op_down, data_op_up] |
618 | + wait_for_media_disconnect_connect(m_socket, ifname) |
619 | + self.assertEqual(m_read_netlink_socket.call_count, 14) |
620 | + |
621 | + def test_netlink_valid_inbetween_transitions(self, m_read_netlink_socket, |
622 | + m_socket): |
623 | + '''wait_for_media_disconnect_connect handles in between transitions''' |
624 | + ifname = "eth0" |
625 | + data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN) |
626 | + data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP) |
627 | + data_op_dormant = self._media_switch_data(ifname, RTM_NEWLINK, |
628 | + OPER_DORMANT) |
629 | + data_op_unknown = self._media_switch_data(ifname, RTM_NEWLINK, |
630 | + OPER_UNKNOWN) |
631 | + m_read_netlink_socket.side_effect = [data_op_down, data_op_dormant, |
632 | + data_op_unknown, data_op_up] |
633 | + wait_for_media_disconnect_connect(m_socket, ifname) |
634 | + self.assertEqual(m_read_netlink_socket.call_count, 4) |
635 | + |
636 | + def test_netlink_invalid_operstate(self, m_read_netlink_socket, m_socket): |
637 | + '''wait_for_media_disconnect_connect should handle invalid operstates. |
638 | + |
639 | +        The function should not fail, and should return even if it |
640 | +        receives invalid operstates; it always waits for a down->up sequence. |
641 | + ''' |
642 | + ifname = "eth0" |
643 | + data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN) |
644 | + data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP) |
645 | + data_op_invalid = self._media_switch_data(ifname, RTM_NEWLINK, 7) |
646 | + m_read_netlink_socket.side_effect = [data_op_invalid, data_op_up, |
647 | + data_op_down, data_op_invalid, |
648 | + data_op_up] |
649 | + wait_for_media_disconnect_connect(m_socket, ifname) |
650 | + self.assertEqual(m_read_netlink_socket.call_count, 5) |
651 | + |
652 | + def test_wait_invalid_socket(self, m_read_netlink_socket, m_socket): |
653 | + '''wait_for_media_disconnect_connect handle none netlink socket.''' |
654 | + socket = None |
655 | + ifname = "eth0" |
656 | + with self.assertRaises(AssertionError) as context: |
657 | + wait_for_media_disconnect_connect(socket, ifname) |
658 | + self.assertTrue('netlink socket is none' in str(context.exception)) |
659 | + |
660 | + def test_wait_invalid_ifname(self, m_read_netlink_socket, m_socket): |
661 | + '''wait_for_media_disconnect_connect handle none interface name''' |
662 | + ifname = None |
663 | + with self.assertRaises(AssertionError) as context: |
664 | + wait_for_media_disconnect_connect(m_socket, ifname) |
665 | + self.assertTrue('interface name is none' in str(context.exception)) |
666 | + ifname = "" |
667 | + with self.assertRaises(AssertionError) as context: |
668 | + wait_for_media_disconnect_connect(m_socket, ifname) |
669 | + self.assertTrue('interface name cannot be empty' in |
670 | + str(context.exception)) |
671 | + |
672 | + def test_wait_invalid_rta_attr(self, m_read_netlink_socket, m_socket): |
673 | +        '''wait_for_media_disconnect_connect handles invalid rta data''' |
674 | + ifname = "eth0" |
675 | + data_invalid1 = self._media_switch_data(None, RTM_NEWLINK, OPER_DOWN) |
676 | + data_invalid2 = self._media_switch_data(ifname, RTM_NEWLINK, None) |
677 | + data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN) |
678 | + data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP) |
679 | + m_read_netlink_socket.side_effect = [data_invalid1, data_invalid2, |
680 | + data_op_down, data_op_up] |
681 | + wait_for_media_disconnect_connect(m_socket, ifname) |
682 | + self.assertEqual(m_read_netlink_socket.call_count, 4) |
683 | + |
684 | + def test_read_multiple_netlink_msgs(self, m_read_netlink_socket, m_socket): |
685 | + '''Read multiple messages in single receive call''' |
686 | + ifname = "eth0" |
687 | + bytes = ifname.encode("utf-8") |
688 | + data = bytearray(96) |
689 | + struct.pack_into("=LHHLL", data, 0, 48, RTM_NEWLINK, 0, 0, 0) |
690 | + struct.pack_into("HH4sHHc", data, RTATTR_START_OFFSET, 8, 3, |
691 | + bytes, 5, 16, int_to_bytes(OPER_DOWN)) |
692 | + struct.pack_into("=LHHLL", data, 48, 48, RTM_NEWLINK, 0, 0, 0) |
693 | + struct.pack_into("HH4sHHc", data, 48 + RTATTR_START_OFFSET, 8, |
694 | + 3, bytes, 5, 16, int_to_bytes(OPER_UP)) |
695 | + m_read_netlink_socket.return_value = data |
696 | + wait_for_media_disconnect_connect(m_socket, ifname) |
697 | + self.assertEqual(m_read_netlink_socket.call_count, 1) |
698 | + |
699 | + def test_read_partial_netlink_msgs(self, m_read_netlink_socket, m_socket): |
700 | + '''Read partial messages in receive call''' |
701 | + ifname = "eth0" |
702 | + bytes = ifname.encode("utf-8") |
703 | + data1 = bytearray(112) |
704 | + data2 = bytearray(32) |
705 | + struct.pack_into("=LHHLL", data1, 0, 48, RTM_NEWLINK, 0, 0, 0) |
706 | + struct.pack_into("HH4sHHc", data1, RTATTR_START_OFFSET, 8, 3, |
707 | + bytes, 5, 16, int_to_bytes(OPER_DOWN)) |
708 | + struct.pack_into("=LHHLL", data1, 48, 48, RTM_NEWLINK, 0, 0, 0) |
709 | + struct.pack_into("HH4sHHc", data1, 80, 8, 3, bytes, 5, 16, |
710 | + int_to_bytes(OPER_DOWN)) |
711 | + struct.pack_into("=LHHLL", data1, 96, 48, RTM_NEWLINK, 0, 0, 0) |
712 | + struct.pack_into("HH4sHHc", data2, 16, 8, 3, bytes, 5, 16, |
713 | + int_to_bytes(OPER_UP)) |
714 | + m_read_netlink_socket.side_effect = [data1, data2] |
715 | + wait_for_media_disconnect_connect(m_socket, ifname) |
716 | + self.assertEqual(m_read_netlink_socket.call_count, 2) |
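The `_media_switch_data` helper in these tests packs rtattr TLVs by hand. A self-contained sketch of that layout, and of how `unpack_rta_attr` reads it back, follows; the attribute type values and the 32-byte rtattr start offset are restated assumptions mirroring the module above (nlmsghdr plus ifinfomsg), not imports:

```python
import struct

# Assumed values mirroring the module above: each rtattr is a 16-bit
# length, a 16-bit type, then the payload. IFLA_IFNAME (3) carries the
# interface name; IFLA_OPERSTATE (16) carries a one-byte state.
IFLA_IFNAME, IFLA_OPERSTATE, OPER_DOWN = 3, 16, 2
RTATTR_START_OFFSET = 32  # nlmsghdr (16 bytes) + ifinfomsg (16 bytes)

buf = bytearray(48)
struct.pack_into("HH4sHHc", buf, RTATTR_START_OFFSET,
                 8, IFLA_IFNAME, b"eth0",
                 5, IFLA_OPERSTATE, bytes([OPER_DOWN]))

# Read the first attribute back the way unpack_rta_attr does: length and
# type from the 4-byte attribute header, then the payload up to length.
length, rta_type = struct.unpack_from("HH", buf, RTATTR_START_OFFSET)
payload = bytes(buf[RTATTR_START_OFFSET + 4:RTATTR_START_OFFSET + length])
assert (length, rta_type, payload) == (8, IFLA_IFNAME, b"eth0")
```

This is also why `read_rta_oper_state` walks the buffer by `attr.length` plus padding: attributes are packed back to back, each 4-byte aligned, so the name and operstate attributes land at different offsets in the same message.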
717 | diff --git a/tests/unittests/test_datasource/test_azure.py b/tests/unittests/test_datasource/test_azure.py |
718 | index 5ea7ae5..417d86a 100644 |
719 | --- a/tests/unittests/test_datasource/test_azure.py |
720 | +++ b/tests/unittests/test_datasource/test_azure.py |
721 | @@ -564,6 +564,8 @@ fdescfs /dev/fd fdescfs rw 0 0 |
722 | self.assertEqual(1, report_ready_func.call_count) |
723 | |
724 | @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file') |
725 | + @mock.patch('cloudinit.sources.helpers.netlink.' |
726 | + 'wait_for_media_disconnect_connect') |
727 | @mock.patch( |
728 | 'cloudinit.sources.DataSourceAzure.DataSourceAzure._report_ready') |
729 | @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network') |
730 | @@ -572,7 +574,7 @@ fdescfs /dev/fd fdescfs rw 0 0 |
731 | def test_crawl_metadata_on_reprovision_reports_ready_using_lease( |
732 | self, m_readurl, m_dhcp, |
733 | m_net, report_ready_func, |
734 | - m_write): |
735 | + m_media_switch, m_write): |
736 | """If reprovisioning, report ready using the obtained lease""" |
737 | ovfenv = construct_valid_ovf_env( |
738 | platform_settings={"PreprovisionedVm": "True"}) |
739 | @@ -586,6 +588,7 @@ fdescfs /dev/fd fdescfs rw 0 0 |
740 | 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0', |
741 | 'unknown-245': '624c3620'} |
742 | m_dhcp.return_value = [lease] |
743 | + m_media_switch.return_value = None |
744 | |
745 | reprovision_ovfenv = construct_valid_ovf_env() |
746 | m_readurl.return_value = url_helper.StringResponse( |
747 | @@ -1676,6 +1679,8 @@ class TestPreprovisioningShouldReprovision(CiTestCase): |
748 | |
749 | @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network') |
750 | @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery') |
751 | +@mock.patch('cloudinit.sources.helpers.netlink.' |
752 | + 'wait_for_media_disconnect_connect') |
753 | @mock.patch('requests.Session.request') |
754 | @mock.patch(MOCKPATH + 'DataSourceAzure._report_ready') |
755 | class TestPreprovisioningPollIMDS(CiTestCase): |
756 | @@ -1689,7 +1694,8 @@ class TestPreprovisioningPollIMDS(CiTestCase): |
757 | |
758 | @mock.patch(MOCKPATH + 'EphemeralDHCPv4') |
759 | def test_poll_imds_re_dhcp_on_timeout(self, m_dhcpv4, report_ready_func, |
760 | - fake_resp, m_dhcp, m_net): |
761 | + fake_resp, m_media_switch, m_dhcp, |
762 | + m_net): |
763 | """The poll_imds will retry DHCP on IMDS timeout.""" |
764 | report_file = self.tmp_path('report_marker', self.tmp) |
765 | lease = { |
766 | @@ -1697,7 +1703,7 @@ class TestPreprovisioningPollIMDS(CiTestCase): |
767 | 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0', |
768 | 'unknown-245': '624c3620'} |
769 | m_dhcp.return_value = [lease] |
770 | - |
771 | + m_media_switch.return_value = None |
772 | dhcp_ctx = mock.MagicMock(lease=lease) |
773 | dhcp_ctx.obtain_lease.return_value = lease |
774 | m_dhcpv4.return_value = dhcp_ctx |
775 | @@ -1723,11 +1729,12 @@ class TestPreprovisioningPollIMDS(CiTestCase): |
776 | dsa._poll_imds() |
777 | self.assertEqual(report_ready_func.call_count, 1) |
778 | report_ready_func.assert_called_with(lease=lease) |
779 | - self.assertEqual(2, m_dhcpv4.call_count, 'Expected 2 DHCP calls') |
780 | + self.assertEqual(3, m_dhcpv4.call_count, 'Expected 3 DHCP calls') |
781 | self.assertEqual(3, self.tries, 'Expected 3 total reads from IMDS') |
782 | |
783 | - def test_poll_imds_report_ready_false(self, report_ready_func, |
784 | - fake_resp, m_dhcp, m_net): |
785 | + def test_poll_imds_report_ready_false(self, |
786 | + report_ready_func, fake_resp, |
787 | + m_media_switch, m_dhcp, m_net): |
788 | """The poll_imds should not call reporting ready |
789 | when flag is false""" |
790 | report_file = self.tmp_path('report_marker', self.tmp) |
791 | @@ -1736,6 +1743,7 @@ class TestPreprovisioningPollIMDS(CiTestCase): |
792 | 'interface': 'eth9', 'fixed-address': '192.168.2.9', |
793 | 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0', |
794 | 'unknown-245': '624c3620'}] |
795 | + m_media_switch.return_value = None |
796 | dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths) |
797 | with mock.patch(MOCKPATH + 'REPORTED_READY_MARKER_FILE', report_file): |
798 | dsa._poll_imds() |
799 | @@ -1745,6 +1753,8 @@ class TestPreprovisioningPollIMDS(CiTestCase): |
800 | @mock.patch(MOCKPATH + 'util.subp') |
801 | @mock.patch(MOCKPATH + 'util.write_file') |
802 | @mock.patch(MOCKPATH + 'util.is_FreeBSD') |
803 | +@mock.patch('cloudinit.sources.helpers.netlink.' |
804 | + 'wait_for_media_disconnect_connect') |
805 | @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network') |
806 | @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery') |
807 | @mock.patch('requests.Session.request') |
808 | @@ -1757,10 +1767,13 @@ class TestAzureDataSourcePreprovisioning(CiTestCase): |
809 | self.paths = helpers.Paths({'cloud_dir': tmp}) |
810 | dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d |
811 | |
812 | - def test_poll_imds_returns_ovf_env(self, fake_resp, m_dhcp, m_net, |
813 | + def test_poll_imds_returns_ovf_env(self, fake_resp, |
814 | + m_dhcp, m_net, |
815 | + m_media_switch, |
816 | m_is_bsd, write_f, subp): |
817 | """The _poll_imds method should return the ovf_env.xml.""" |
818 | m_is_bsd.return_value = False |
819 | + m_media_switch.return_value = None |
820 | m_dhcp.return_value = [{ |
821 | 'interface': 'eth9', 'fixed-address': '192.168.2.9', |
822 | 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0'}] |
823 | @@ -1778,16 +1791,19 @@ class TestAzureDataSourcePreprovisioning(CiTestCase): |
824 | 'Cloud-Init/%s' % vs() |
825 | }, method='GET', timeout=1, |
826 | url=full_url)]) |
827 | - self.assertEqual(m_dhcp.call_count, 1) |
828 | + self.assertEqual(m_dhcp.call_count, 2) |
829 | m_net.assert_any_call( |
830 | broadcast='192.168.2.255', interface='eth9', ip='192.168.2.9', |
831 | prefix_or_mask='255.255.255.0', router='192.168.2.1') |
832 | - self.assertEqual(m_net.call_count, 1) |
833 | + self.assertEqual(m_net.call_count, 2) |
834 | |
835 | - def test__reprovision_calls__poll_imds(self, fake_resp, m_dhcp, m_net, |
836 | + def test__reprovision_calls__poll_imds(self, fake_resp, |
837 | + m_dhcp, m_net, |
838 | + m_media_switch, |
839 | m_is_bsd, write_f, subp): |
840 | """The _reprovision method should call poll IMDS.""" |
841 | m_is_bsd.return_value = False |
842 | + m_media_switch.return_value = None |
843 | m_dhcp.return_value = [{ |
844 | 'interface': 'eth9', 'fixed-address': '192.168.2.9', |
845 | 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0', |
846 | @@ -1811,11 +1827,11 @@ class TestAzureDataSourcePreprovisioning(CiTestCase): |
847 | 'User-Agent': |
848 | 'Cloud-Init/%s' % vs()}, |
849 | method='GET', timeout=1, url=full_url)]) |
850 | - self.assertEqual(m_dhcp.call_count, 1) |
851 | + self.assertEqual(m_dhcp.call_count, 2) |
852 | m_net.assert_any_call( |
853 | broadcast='192.168.2.255', interface='eth9', ip='192.168.2.9', |
854 | prefix_or_mask='255.255.255.0', router='192.168.2.1') |
855 | - self.assertEqual(m_net.call_count, 1) |
856 | + self.assertEqual(m_net.call_count, 2) |
857 | |
858 | |
859 | class TestRemoveUbuntuNetworkConfigScripts(CiTestCase): |
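Abstracting away the byte-level parsing, the core of `wait_for_media_disconnect_connect` is a small carrier state machine. A hedged, self-contained sketch of that logic (the operstate values are the standard Linux IF_OPER constants, matching those the module defines; `media_switch_seen` is a name invented here for illustration):

```python
OPER_UP, OPER_DOWN = 6, 2  # standard Linux IF_OPER values

def media_switch_seen(operstates):
    # Mirrors the loop in wait_for_media_disconnect_connect: only a DOWN
    # followed (eventually) by an UP ends the wait; transitional states
    # (unknown, dormant, lowerlayerdown, ...) are skipped entirely.
    carrier = prev_carrier = OPER_UP
    for state in operstates:
        if state not in (OPER_UP, OPER_DOWN):
            continue
        prev_carrier, carrier = carrier, state
        if prev_carrier == OPER_DOWN and carrier == OPER_UP:
            return True
    return False

assert media_switch_seen([OPER_DOWN, 5, 0, OPER_UP])  # disconnect, then connect
assert not media_switch_seen([OPER_UP, OPER_UP])      # no disconnect observed
```

Starting both trackers at `OPER_UP` is what makes an initial UP event a no-op: the function only returns once it has witnessed an actual disconnect/reconnect pair, which is the Azure pre-provisioning signal this branch relies on.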
The other question that's hidden in here is whether we should "just admit that we need dbus" and start properly depending on it. We already depend on dbus for "set_hostname". The problem is that we "don't really want to depend on dbus", and that creates other problems, i.e. hostname setting only works if it runs after dbus is already operational (for systemd-based distributions). In this case, the waiting on the netlink socket could be replaced by using dbus notifications.