Merge ~tamilmani1989/cloud-init:azure_networking into cloud-init:master

Proposed by Tamilmani Manoharan
Status: Merged
Approved by: Ryan Harper
Approved revision: 56559c2548f858046a0775d21190cb167463cdcb
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~tamilmani1989/cloud-init:azure_networking
Merge into: cloud-init:master
Diff against target: 859 lines (+678/-16)
4 files modified
cloudinit/sources/DataSourceAzure.py (+27/-4)
cloudinit/sources/helpers/netlink.py (+250/-0)
cloudinit/sources/helpers/tests/test_netlink.py (+373/-0)
tests/unittests/test_datasource/test_azure.py (+28/-12)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
Scott Moser Approve
Sushant Sharma (community) Approve
Ryan Harper Approve
Chad Smith Approve
Paul Meyer (community) Needs Fixing
Douglas Jordan Pending
Review via email: mp+336392@code.launchpad.net

Commit message

azure: detect vnet migration via netlink media change event

Replace Azure pre-provision polling on IMDS with a blocking call
which watches for netlink link state change messages. The media
change event happens when a pre-provisioned VM has been activated
and is connected to the user's virtual network, at which point
cloud-init can resume operation to complete image instantiation.

Description of the change

This PR enables moving a virtual machine (VM) in Azure from one virtual network (VNet) to another. It adds a new Azure-specific module that listens for the Azure host node to indicate that the VM has moved to a different network (by listening for NIC connect/disconnect events). Once the VM move is detected (via a netlink callback), the module triggers a re-DHCP to get an IP address from the new VNet.

Revision history for this message
Robert Schweikert (rjschwei) wrote :

The other question that's hidden in here is whether we should "just admit that we need dbus" and start properly depending on it. We already depend on dbus for "set_hostname". The problem is that we "don't really want to depend on dbus", and that creates other problems, i.e. hostname setting only works if it runs after dbus is already operational (for systemd-based distributions). In this case the waiting on the socket could be replaced by using dbus notifications.

Revision history for this message
Sushant Sharma (sushantsharma) :
Revision history for this message
Douglas Jordan (dojordan) wrote :

Some comments inline

Revision history for this message
Ryan Harper (raharper) wrote :

Can you describe in what scenario the current code which polls the IMDS url for failures is inadequate?

Is there a way we can test the current code on Azure directly?

review: Needs Fixing
Revision history for this message
Sushant Sharma (sushantsharma) wrote :

> Can you describe in what scenario the current code which polls the IMDS url
> for failures is inadequate?
>
> Is there a way we can test the current code on Azure directly?

Thanks Ryan, this is a valid question, and we are in the process of testing in Azure with all components involved (including an updated host). Having said that, the IMDS poll can fail for multiple reasons, e.g., IMDS being in the middle of an update on the host, or any random communication failure (since this is a network call).
I think relying on a clean, deterministic event should be preferred.
Furthermore, I believe the code that polls the host for new provisioning data should just do that, and not worry about setting up networking. We should aim for clear and concise goals for every piece of code.

Revision history for this message
Ryan Harper (raharper) wrote :

> > Can you describe in what scenario the current code which polls the IMDS url
> > for failures is inadequate?
> >
> > Is there a way we can test the current code on Azure directly?
>
> Thanks Ryan, This is a valid question, and we are in process of testing in
> Azure with all components involved (including updated host). Having said that,
> IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of getting
> updated on the host, or any random communication failure (since this is a
> network call).

Which of these failures are not handled already by the retry logic? If we have any failure attempting to fetch the URL from IMDS, then the retry logic will acquire a new lease; so I'm struggling to see which scenario is not covered.

> I think relying on a clean deterministic event should be preferred.

I agree; I'd much rather see use of the HyperV Keystore here for indicating both that a VM has been tagged as pre-provision, and when its network has been moved.

> Furthermore, I believe the code that polls host for new provisioning data
> should just do that, and not worry about setting up networking.

The _poll_imds is not setting up networking for the system; rather it's setting up a temporary interface and DHCP'ing on it to acquire information at a specific time for a specific use case: preprovisioning.

Revision history for this message
Sushant Sharma (sushantsharma) wrote :

> > > Can you describe in what scenario the current code which polls the IMDS
> url
> > > for failures is inadequate?
> > >
> > > Is there a way we can test the current code on Azure directly?
> >
> > Thanks Ryan, This is a valid question, and we are in process of testing in
> > Azure with all components involved (including updated host). Having said
> that,
> > IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of getting
> > updated on the host, or any random communication failure (since this is a
> > network call).
>
> Which of these failures are not handled already by the retry logic? If we
> have any failure attempting to fetch the URL from IMDS, then then the retry
> logic will acquire a new lease; so I'm struggling to see which scenario is not
> covered.
>

Let's remember that the time to move a VM into the new network is also critical for our customers. If we can save any amount of time (even a few seconds on average) that we spend on polling and retrying DHCP on failure, we should prefer that. Having said that, let us run some tests and see how much we save with the solution that depends on netlink versus the existing checked-in code. We'll report the results, and hopefully we can make the case again for the current PR.

Thanks,
Sushant

Revision history for this message
Douglas Jordan (dojordan) wrote :

Added a few (mostly style) comments

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I addressed review comments

Revision history for this message
Douglas Jordan (dojordan) :
Revision history for this message
Douglas Jordan (dojordan) :
Revision history for this message
Paul Meyer (paul-meyer) wrote :

Overall LGTM, except for the lack of unit tests for the new code.

You could change the 0/-1 return code from create_bind_netlink_socket to be either an object property (netevent.is_listening) or throwing an exception with handling in the calling code.
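A minimal sketch of the exception-based variant Paul suggests, using the names that eventually landed in this branch; the injectable `sock_factory` parameter is purely illustrative, so the pattern can be exercised without a real netlink socket:

```python
import socket

RTMGRP_LINK = 1  # multicast group for interface up/down events


class NetlinkCreateSocketError(RuntimeError):
    """Raised when the netlink socket cannot be created or bound."""


def create_bound_netlink_socket(sock_factory=socket.socket):
    """Create a non-blocking netlink socket, raising an exception
    instead of returning a 0/-1 status code on failure."""
    try:
        # AF_NETLINK/NETLINK_ROUTE exist only on Linux; fall back to the
        # numeric values so this sketch at least imports elsewhere
        af = getattr(socket, "AF_NETLINK", 16)
        proto = getattr(socket, "NETLINK_ROUTE", 0)
        nl_sock = sock_factory(af, socket.SOCK_RAW, proto)
        nl_sock.bind((0, RTMGRP_LINK))  # pid 0: kernel assigns a port id
        nl_sock.setblocking(0)
    except OSError as e:
        raise NetlinkCreateSocketError(
            "Exception during netlink socket create: %s" % e)
    return nl_sock
```

Callers then use a plain try/except NetlinkCreateSocketError rather than checking a sentinel return value, which is the shape the merged _poll_imds code ended up with.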

review: Needs Fixing
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

Hi Paul, I added unit test for network switch. Can you please review?

Revision history for this message
Chad Smith (chad.smith) wrote :

Updated my comment and patch suggestions; please feel free to disagree where you think points are irrelevant or wrong.

Here are a couple of thoughts on your branch to aid in the testability of its features:

The complete patch suggestion is here http://paste.ubuntu.com/p/pDYvtFQN8g/

I've itemized the general thoughts here for your review to see what you think:
  1. Generally, let's limit what lives in a try/except block to code that we expect will actually generate exceptions. I moved general increment logic outside of the struct.unpack calls and moved the try/except socket.error checks to around read_netlink_socket, as I didn't think socket.error would be raised by anything else in wait_for_media_disconnect_connect.
  2. I suggest breaking up the NetworkEvents class, as it seems more a collection of functions than a class with state. When I see one-line class methods that don't actually depend on any instance attributes, it's a hint that it might be an empty level of indirection we just don't need (like the select_netlink_socket, read_socket and close_socket methods).
  3. Rename networkevents.py -> netlink.py to make the module a bit more specific.
  4. Personal preference is to use namedtuples instead of classes to hold simple structured attributes like your rta_attr definition.
  5. I'd like to see simple unit tests on each function, per the simple example I added in cloudinit/sources/helpers/tests/test_netlink.py.
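For item 4, a sketch of the namedtuple approach for the rtattr record; the format string and 4-byte data offset follow the kernel's struct rtattr layout, and the helper name is illustrative:

```python
import struct
from collections import namedtuple

# struct rtattr { unsigned short rta_len; unsigned short rta_type; }
RTA_DATA_START_OFFSET = 4
RTAAttr = namedtuple('RTAAttr', ['length', 'rta_type', 'data'])


def unpack_rta_attr(data, offset):
    """Read one route attribute from a netlink payload at offset."""
    length, rta_type = struct.unpack_from("HH", data, offset)
    attr_data = data[offset + RTA_DATA_START_OFFSET:offset + length]
    return RTAAttr(length, rta_type, attr_data)
```

Fields are then accessed as attr.rta_type / attr.data instead of going through a hand-rolled class with no behavior of its own.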

review: Needs Information
Revision history for this message
Chad Smith (chad.smith) wrote :

Also here's a cloudinit/sources/helpers/tests/test_netlink.py basic single test example

http://paste.ubuntu.com/p/SHTwRRZPD2/

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:f561d8f6a14a1136c99820fefdd78a3c65a405dd
https://jenkins.ubuntu.com/server/job/cloud-init-ci/429/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    FAILED: Ubuntu LTS: Build

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/429/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:7712d6fa901ea6e132fd3bd7a526af566e5d3aa6
https://jenkins.ubuntu.com/server/job/cloud-init-ci/439/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/439/rebuild

review: Approve (continuous-integration)
Revision history for this message
Joshua Powers (powersj) wrote :

CI passed on the latest commit; there are conflicts below that need to be worked out. Moving it back to needs-review at Tamilmani's request.

0c14e44... by Tamilmani Manoharan

Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking

# Conflicts:
# cloudinit/sources/DataSourceAzure.py

144e6d8... by Tamilmani Manoharan

fixed indentation error

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for working on the refactor. This looks a lot better and it's much easier to read and write unittests.

I've quite a few inline comments; please look through those. The biggest query is how we know when the vnet migration has started, to prevent us from getting stuck forever if it has already started before we attempt to listen for events.

review: Needs Fixing
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> Thanks for working on the refactor. This looks a lot better and it's much
> easier to read and write unittests.
>
> I've quite a few inline comments, please look through those. The biggest
> query is around how do we know when the vnet migration has started to prevent
> us from getting stuck forever if it's already started before we attempt to
> listen for events.

Thanks for reviewing it. Unless cloud-init reports ready to the Azure fabric, the vnet switch won't happen; that report is the way of informing the top-level services that the VM is ready for the vnet switch. So I have created and bound the netlink socket before that call, so it won't miss any event that happens after that.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I have a few concerns about some review comments, which I replied to inline. Other than that, I'm fine with the other comments and will work on addressing them.

Revision history for this message
Ryan Harper (raharper) wrote :

> > Thanks for working on the refactor. This looks a lot better and it's much
> > easier to read and write unittests.
> >
> > I've quite a few inline comments, please look through those. The biggest
> > query is around how do we know when the vnet migration has started to
> prevent
> > us from getting stuck forever if it's already started before we attempt to
> > listen for events.
>
> Thanks for reviewing it. Unless cloud-init report ready to azure fabric, vnet
> switch won't happen. Its way of informing the top level services that vm is
> ready for vnet switch. So I have created and bound netlink socket before that
> call so it won't miss any event that happens after that.

OK, so, the sequence then is:

1) crawl initial metadata and detect if pre-provision VM, if so write a REPROVISION marker file
2) in _poll_imds create nl socket
3) write a REPORT_READY marker file and call _report_ready()
4) _report_ready() will invoke the shim
5) wait_for_media_disconnect()

The window is now between (2) and (5); and the question is what does the netlink socket do with the incoming messages that we won't process until (5)?

From reading the upstream docs on netlink, the default netlink socket size is 32kb, and while the typical messages may be small, the payload for messages is arbitrary. I know that DELLINK and NEWLINK are small, but other messages may come through, and if we're not processing them, the socket buffer could fill up.

I suspect that in practice there isn't a lot of netlink traffic on the node with only a single interface up at this point but I'm wondering if we shouldn't employ something that allows us to:

a) spawn a thread to start doing the "wait", which would start reading messages as they come in on the socket *before* we send the READY flag to the fabric
b) after sending the ready flag, we would join() the thread, which would require the main program to wait until the thread exited (i.e., we detected the media change event).

Thoughts?
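A rough sketch of the thread-based ordering Ryan describes in (a)/(b). This is not what ultimately landed (the merged code instead binds the socket before report_ready and relies on the kernel buffering events); a socketpair stands in for the netlink socket, and all names here are illustrative:

```python
import socket
import threading


def wait_for_media_disconnect_connect(nl_sock):
    # stand-in for the real netlink wait loop: block until data arrives
    nl_sock.setblocking(True)
    return nl_sock.recv(64)


def reprovision(nl_sock, report_ready):
    result = {}

    def _drain():
        # (a) start consuming netlink messages *before* READY is sent,
        # so nothing is lost (or left filling the buffer) in the window
        result['event'] = wait_for_media_disconnect_connect(nl_sock)

    waiter = threading.Thread(target=_drain)
    waiter.start()
    report_ready()  # tell the fabric the VM is ready for the vnet switch
    waiter.join()   # (b) block until the media change event was seen
    return result.get('event')


# demo: one end of a socketpair plays the kernel's role
kernel_side, vm_side = socket.socketpair()
event = reprovision(vm_side, report_ready=lambda: kernel_side.send(b'DELLINK'))
```

Because the receive buffer holds the message even if it arrives before recv() is called, the join() reliably returns once the simulated event is sent.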

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> > > Thanks for working on the refactor. This looks a lot better and it's much
> > > easier to read and write unittests.
> > >
> > > I've quite a few inline comments, please look through those. The biggest
> > > query is around how do we know when the vnet migration has started to
> > prevent
> > > us from getting stuck forever if it's already started before we attempt to
> > > listen for events.
> >
> > Thanks for reviewing it. Unless cloud-init report ready to azure fabric,
> vnet
> > switch won't happen. Its way of informing the top level services that vm is
> > ready for vnet switch. So I have created and bound netlink socket before
> that
> > call so it won't miss any event that happens after that.
>
> OK, so, the sequence then is:
>
> 1) crawl initial metadata and detect if pre-provision VM, if so write a
> REPROVISION marker file
> 2) in _poll_imds create nl socket
> 3) write a REPORT_READY marker file and call _report_ready()
> 4) _report_ready() will invoke the shim
> 5) wait_for_media_disconnect()
>
> The window is now between (2) and (5); and the question is what does the
> netlink socket do with the incoming messages that we won't process until (5)?
>
> From reading the upstream docs on netlink, the default netlink socket size is
> 32kb, and while the typical messages may be small, the payload for messages is
> arbitrary. I know that DELLINK and NEWLINK are small but other messages may
> come through and if we're not processing then the socket could become full.
>
> I suspect that in practice there isn't a lot of netlink traffic on the node
> with only a single interface up at this point but I'm wondering if we
> shouldn't employ something that allows us to:
>
> a) spawn a thread to start doing the "Wait" which would start reading messages
> as they come in on the socket *before* we send the READY flag to the fabric
> b) after sending the ready flag, we would join() the thread which would
> require the main program to wait until the thread exited (ie, we detected the
> media change event).
>
> Thoughts?

I don't think we receive that many netlink messages, since this happens before systemd-networkd starts; we manually set up an ephemeral network (one interface) to send report ready. Also, the netlink code listens only for RTMGRP_LINK (network interface create/delete/up/down events), not for any IP address or route changes. There will be only one interface at that time, and it reads only the above events for that interface.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Ryan Harper (raharper) wrote :

OK. Let's update the create_netlink_socket() method with some of these details, binding to only RTNLGRP_LINK (which only includes RTM_NEWLINK/RTM_DELLINK/RTM_GETLINK).

It would be nice if we could bind to those events and only on a specific ifindex (we could look up the currently up interface) but it's sufficient to filter in the wait_for_media_disconnect_connect().

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> OK. Let's update the create_netlink_socket() method with some of these
> details, binding to only RTNLGRP_LINK (which only includes
> RTM_NEWLINK/RTM_DELLINK/RTN_GETLINK).
>
> It would be nice if we could bind to those events and only on a specific
> ifindex (we could look up the currently up interface) but it's sufficient to
> filter in the wait_for_media_disconnect_connect().

ok will update it.

58ef420... by Tamilmani Manoharan

addressed review comments

3d3efee... by Tamilmani Manoharan

reverted back the line break change

148c88c... by Tamilmani Manoharan

added few more unit tests.

7e2bba6... by Tamilmani Manoharan

Added a test case to validate down up scenario

45806ec... by Tamilmani Manoharan

cleanup and retry on netlink socket creation error

cc64c51... by Tamilmani Manoharan

optimized test code

Revision history for this message
Ryan Harper (raharper) wrote :

A couple more in-line requests below. Thanks for incorporating the requested changes.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
6e4cfb7... by Tamilmani Manoharan

addressed iteration 3 review comments

91a4304... by Tamilmani Manoharan

Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking

# Conflicts:
# cloudinit/sources/DataSourceAzure.py
# tests/unittests/test_datasource/test_azure.py

feb9f10... by Tamilmani Manoharan

fixed azure UTs regression with recent changes in master

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> I couple more in-line requests below. Thanks for incorporating the requested
> changes.

Addressed your comments. Please take a look.

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

Thanks for reviewing. I will address your comments. Do you have any concerns other than these? Can you sign off once I address the comments below?

59e5fe9... by Tamilmani Manoharan

addressed iteration 4 review comments

9c6eb4a... by Tamilmani Manoharan

added comments

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
b507379... by Tamilmani Manoharan

addressed iteration 5 review comments
Fixed code style error

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:b5073795740154066b791b2a9ab1f8a624a8bb3e
https://jenkins.ubuntu.com/server/job/cloud-init-ci/454/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/454/rebuild

review: Approve (continuous-integration)
Revision history for this message
Chad Smith (chad.smith) :
Revision history for this message
Chad Smith (chad.smith) wrote :

Leaving this approved pending your comments regarding ignoring interim state transitions for operstates != DOWN, DORMANT or UP.
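The operstate filtering Chad refers to can be sketched as a tiny state machine over the kernel's operstate values. This is a simplified model of the wait loop's decision logic only, not the real socket-reading code:

```python
# Values per https://www.kernel.org/doc/Documentation/networking/operstates.txt
OPER_DOWN, OPER_DORMANT, OPER_UP = 2, 5, 6


def saw_media_disconnect_connect(operstates):
    """Return True once a DOWN/DORMANT is later followed by UP.

    Any other operstate (UNKNOWN, NOTPRESENT, TESTING,
    LOWERLAYERDOWN) is an interim transition and is ignored."""
    disconnected = False
    for state in operstates:
        if state in (OPER_DOWN, OPER_DORMANT):
            disconnected = True
        elif state == OPER_UP and disconnected:
            return True
        # interim states fall through without changing anything
    return False
```

So a sequence like UP, DOWN, TESTING, UP still counts as one disconnect/connect, while UP alone does not.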

review: Approve
2fc6e87... by Tamilmani Manoharan

addressed iteration 6 comments

75c8987... by Tamilmani Manoharan

dummy commit to sync with launchpad

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I addressed your comments.

ad8ebf4... by Tamilmani Manoharan

moved log inside function as per review comment

Revision history for this message
Chad Smith (chad.smith) wrote :

Looks good, and unit tests pass. Two minor inline follow-ups.
 Thanks Tamilmani.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Ryan Harper (raharper) wrote :

Please review the commit message I added.

I've a couple in-line comments on what we return when reading the attrs, please comment there as well.

review: Needs Information
f12813e... by Tamilmani Manoharan

fixed typo

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Ryan Harper (raharper) wrote :

One more follow-up on the rta processing.

review: Needs Information
Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
205864e... by Tamilmani Manoharan

return interfaceoperstate as none in case of failure

Revision history for this message
Ryan Harper (raharper) wrote :

Some comments on debug messages. Another thing occurred to me about the contents we get back from the socket: is that only ever a single message, or possibly a sequence of messages? And if so, don't we need to loop over the data?

review: Needs Information
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

For netlink down/up events we won't get a multipart message, and I already tested that. Multipart messages will be received only if the payload is too big.
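Handling several messages per recv() is mostly a matter of walking the buffer by nlmsg_len, with 4-byte alignment between messages (header layout per netlink(7)); the helper below is a sketch of that loop, not the code from this branch:

```python
import struct

# nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR_FMT = "IHHII"
NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)  # 16 bytes


def split_netlink_messages(data):
    """Yield (msg_type, payload) for each message packed in one buffer."""
    offset = 0
    while offset + NLMSGHDR_SIZE <= len(data):
        msg_len, msg_type, _flags, _seq, _pid = struct.unpack_from(
            NLMSGHDR_FMT, data, offset)
        if msg_len < NLMSGHDR_SIZE or offset + msg_len > len(data):
            break  # partial message: the caller must read more bytes first
        yield msg_type, data[offset + NLMSGHDR_SIZE:offset + msg_len]
        offset += (msg_len + 3) & ~3  # NLMSG_ALIGN: 4-byte boundaries
```

Stopping on a short tail (rather than raising) lets the caller carry the partial bytes into the next read, which is the case the later commits about partial messages address.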

dcf4c86... by Tamilmani Manoharan

Addressed reading multiple netlink messages in single receive.

978f7ed... by Tamilmani Manoharan

added test case for receiving multiple netlink messages in single receive call

d3fd194... by Tamilmani Manoharan

reverted changes in test_azure

fd355fc... by Tamilmani Manoharan

adhere to tox standard

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for handling multiple messages in the buffer. Some in-line comments and suggestions. I'd like to add a test case that ensures we parse the socket data buffer to completion and that a second read of the buffer returns the media change sequence, so we validate that we handle entering and exiting both sets of loops.

Revision history for this message
Scott Moser (smoser) wrote :

One small comment inline.
Generally, I wonder what happens here on FreeBSD.

006bf63... by Tamilmani Manoharan

Added testcase for reading partial messages

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I responded to your comments.

437b25b... by Tamilmani Manoharan

modified logs

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Scott Moser (smoser) wrote :

What about FreeBSD?
What happens there? Do we hit your newly added code, or is it avoided?

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> what about freebsd?
> what happens there? do we hit your newly added code or is it avoided.

Yes, it will be hit on FreeBSD as well. Is there any problem with that?

85c5a6d... by Tamilmani Manoharan

1. Check for free bsd
2. Handle assertion errors

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

Scott/Ryan, I addressed your comments. Can you sign off?

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
c600588... by Tamilmani Manoharan

addressed reading partial messages from buffer issue and handled netlink socket close in case of exception

77dfb5f... by Tamilmani Manoharan

removed blank line at eof

Revision history for this message
Ryan Harper (raharper) :
8157985... by Tamilmani Manoharan

removed saving data to another variable

Revision history for this message
Ryan Harper (raharper) wrote :

Just a spelling fix in the unittests.

I think the core logic in wait_() is solid. Do you have a way to test this branch on an instance doing the pre-provision logic, just to ensure we're not outsmarting ourselves with the additional parsing logic?

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> Just a spelling fix in the unittests.
>
> I think the core logic in wait_() is solid. Do you have a way to test this
> branch on an instances doing the pre-provision logic just to ensure we're not
> outsmarting ourselves with the additional parsing logic ?

Yes, I already tested that multiple times. I push an image (with my changes) as a VHD to the Azure marketplace and then deploy a PPS VM using it.

4d7dde8... by Tamilmani Manoharan

fixed spell error

Revision history for this message
Ryan Harper (raharper) wrote :

Thank you for the hard work and many changes.

review: Approve
ea309e3... by Tamilmani Manoharan

Fixed code style errors

Revision history for this message
Sushant Sharma (sushantsharma) wrote :

Thanks a lot Tamilmani!

review: Approve
Revision history for this message
Scott Moser (smoser) :
review: Approve
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:ea309e371cb803edfe9f76a33261bfd02b4b5dda
https://jenkins.ubuntu.com/server/job/cloud-init-ci/467/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/467/rebuild

review: Approve (continuous-integration)
5917a00... by Tamilmani Manoharan

Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking

9bc4a08... by Tamilmani Manoharan

There are 3 fixes:
1. If a timeout happens, read returns None; concatenate data only after that check.
2. While resolving merge conflicts, the else branch was accidentally removed. Restored the else condition.
3. Also reverted testcase changes.

56559c2... by Tamilmani Manoharan

Fixed code style error

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:9bc4a08cbfb3cddc44e640d5543aaeef7fb186d5
https://jenkins.ubuntu.com/server/job/cloud-init-ci/469/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/469/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:56559c2548f858046a0775d21190cb167463cdcb
https://jenkins.ubuntu.com/server/job/cloud-init-ci/470/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/470/rebuild

review: Approve (continuous-integration)

Preview Diff

1diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
2index be82ec4..e076d5d 100644
3--- a/cloudinit/sources/DataSourceAzure.py
4+++ b/cloudinit/sources/DataSourceAzure.py
5@@ -22,6 +22,7 @@ from cloudinit.event import EventType
6 from cloudinit.net.dhcp import EphemeralDHCPv4
7 from cloudinit import sources
8 from cloudinit.sources.helpers.azure import get_metadata_from_fabric
9+from cloudinit.sources.helpers import netlink
10 from cloudinit.url_helper import UrlError, readurl, retry_on_url_exc
11 from cloudinit import util
12
13@@ -409,6 +410,10 @@ class DataSourceAzure(sources.DataSource):
14
15 perform_reprovision = reprovision or self._should_reprovision(ret)
16 if perform_reprovision:
17+ if util.is_FreeBSD():
18+ msg = "Free BSD is not supported for PPS VMs"
19+ LOG.error(msg)
20+ raise sources.InvalidMetaDataException(msg)
21 ret = self._reprovision()
22 imds_md = get_metadata_from_imds(
23 self.fallback_interface, retries=3)
24@@ -523,8 +528,8 @@ class DataSourceAzure(sources.DataSource):
25 response. Then return the returned JSON object."""
26 url = IMDS_URL + "reprovisiondata?api-version=2017-04-02"
27 headers = {"Metadata": "true"}
28+ nl_sock = None
29 report_ready = bool(not os.path.isfile(REPORTED_READY_MARKER_FILE))
30- LOG.debug("Start polling IMDS")
31
32 def exc_cb(msg, exception):
33 if isinstance(exception, UrlError) and exception.code == 404:
34@@ -533,12 +538,19 @@ class DataSourceAzure(sources.DataSource):
35 # call DHCP and setup the ephemeral network to acquire the new IP.
36 return False
37
38+ LOG.debug("Wait for vnetswitch to happen")
39 while True:
40 try:
41 # Save our EphemeralDHCPv4 context so we avoid repeated dhcp
42 self._ephemeral_dhcp_ctx = EphemeralDHCPv4()
43 lease = self._ephemeral_dhcp_ctx.obtain_lease()
44 if report_ready:
45+ try:
46+ nl_sock = netlink.create_bound_netlink_socket()
47+ except netlink.NetlinkCreateSocketError as e:
48+ LOG.warning(e)
49+ self._ephemeral_dhcp_ctx.clean_network()
50+ return
51 path = REPORTED_READY_MARKER_FILE
52 LOG.info(
53 "Creating a marker file to report ready: %s", path)
54@@ -546,13 +558,24 @@ class DataSourceAzure(sources.DataSource):
55 pid=os.getpid(), time=time()))
56 self._report_ready(lease=lease)
57 report_ready = False
58- return readurl(url, timeout=1, headers=headers,
59- exception_cb=exc_cb, infinite=True,
60- log_req_resp=False).contents
61+ try:
62+ netlink.wait_for_media_disconnect_connect(
63+ nl_sock, lease['interface'])
64+ except AssertionError as error:
65+ LOG.error(error)
66+ return
67+ self._ephemeral_dhcp_ctx.clean_network()
68+ else:
69+ return readurl(url, timeout=1, headers=headers,
70+ exception_cb=exc_cb, infinite=True,
71+ log_req_resp=False).contents
72 except UrlError:
73 # Teardown our EphemeralDHCPv4 context on failure as we retry
74 self._ephemeral_dhcp_ctx.clean_network()
75 pass
76+ finally:
77+ if nl_sock:
78+ nl_sock.close()
79
80 def _report_ready(self, lease):
81 """Tells the fabric provisioning has completed """
82diff --git a/cloudinit/sources/helpers/netlink.py b/cloudinit/sources/helpers/netlink.py
83new file mode 100644
84index 0000000..d377ae3
85--- /dev/null
86+++ b/cloudinit/sources/helpers/netlink.py
87@@ -0,0 +1,250 @@
88+# Author: Tamilmani Manoharan <tamanoha@microsoft.com>
89+#
90+# This file is part of cloud-init. See LICENSE file for license information.
91+
92+from cloudinit import log as logging
93+from cloudinit import util
94+from collections import namedtuple
95+
96+import os
97+import select
98+import socket
99+import struct
100+
101+LOG = logging.getLogger(__name__)
102+
103+# http://man7.org/linux/man-pages/man7/netlink.7.html
104+RTMGRP_LINK = 1
105+NLMSG_NOOP = 1
106+NLMSG_ERROR = 2
107+NLMSG_DONE = 3
108+RTM_NEWLINK = 16
109+RTM_DELLINK = 17
110+RTM_GETLINK = 18
111+RTM_SETLINK = 19
112+MAX_SIZE = 65535
113+RTA_DATA_OFFSET = 32
114+MSG_TYPE_OFFSET = 16
115+SELECT_TIMEOUT = 60
116+
117+NLMSGHDR_FMT = "IHHII"
118+IFINFOMSG_FMT = "BHiII"
119+NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)
120+IFINFOMSG_SIZE = struct.calcsize(IFINFOMSG_FMT)
121+RTATTR_START_OFFSET = NLMSGHDR_SIZE + IFINFOMSG_SIZE
122+RTA_DATA_START_OFFSET = 4
123+PAD_ALIGNMENT = 4
124+
125+IFLA_IFNAME = 3
126+IFLA_OPERSTATE = 16
127+
128+# https://www.kernel.org/doc/Documentation/networking/operstates.txt
129+OPER_UNKNOWN = 0
130+OPER_NOTPRESENT = 1
131+OPER_DOWN = 2
132+OPER_LOWERLAYERDOWN = 3
133+OPER_TESTING = 4
134+OPER_DORMANT = 5
135+OPER_UP = 6
136+
137+RTAAttr = namedtuple('RTAAttr', ['length', 'rta_type', 'data'])
138+InterfaceOperstate = namedtuple('InterfaceOperstate', ['ifname', 'operstate'])
139+NetlinkHeader = namedtuple('NetlinkHeader', ['length', 'type', 'flags', 'seq',
140+ 'pid'])
141+
142+
143+class NetlinkCreateSocketError(RuntimeError):
144+ '''Raised if netlink socket fails during create or bind.'''
145+ pass
146+
147+
148+def create_bound_netlink_socket():
149+ '''Creates a netlink socket and binds it to a netlink group to catch
150+ interface down/up events. The socket is bound only to RTMGRP_LINK
151+ (which covers RTM_NEWLINK/RTM_DELLINK/RTM_GETLINK events). The socket
152+ is set to non-blocking mode since we're only receiving messages.
153+
154+ :returns: netlink socket in non-blocking mode
155+ :raises: NetlinkCreateSocketError
156+ '''
157+ try:
158+ netlink_socket = socket.socket(socket.AF_NETLINK,
159+ socket.SOCK_RAW,
160+ socket.NETLINK_ROUTE)
161+ netlink_socket.bind((os.getpid(), RTMGRP_LINK))
162+ netlink_socket.setblocking(0)
163+ except socket.error as e:
164+ msg = "Exception during netlink socket create: %s" % e
165+ raise NetlinkCreateSocketError(msg)
166+ LOG.debug("Created netlink socket")
167+ return netlink_socket
168+
169+
170+def get_netlink_msg_header(data):
171+ '''Gets netlink message length, type and other header fields
172+
173+ :param: data: string of data read from netlink socket
174+ :returns: NetlinkHeader namedtuple (length, type, flags, seq, pid)
175+ :raises: AssertionError if data is None or data is not >= NLMSGHDR_SIZE
176+ struct nlmsghdr {
177+ __u32 nlmsg_len; /* Length of message including header */
178+ __u16 nlmsg_type; /* Type of message content */
179+ __u16 nlmsg_flags; /* Additional flags */
180+ __u32 nlmsg_seq; /* Sequence number */
181+ __u32 nlmsg_pid; /* Sender port ID */
182+ };
183+ '''
184+ assert (data is not None), ("data is none")
185+ assert (len(data) >= NLMSGHDR_SIZE), (
186+ "data is smaller than netlink message header")
187+ msg_len, msg_type, flags, seq, pid = struct.unpack(NLMSGHDR_FMT,
188+ data[:MSG_TYPE_OFFSET])
189+ LOG.debug("Got netlink msg of type %d", msg_type)
190+ return NetlinkHeader(msg_len, msg_type, flags, seq, pid)
191+
192+
193+def read_netlink_socket(netlink_socket, timeout=None):
194+ '''Select and read from the netlink socket if ready.
195+
196+ :param: netlink_socket: specify which socket object to read from
197+ :param: timeout: specify a timeout value (integer) to wait while reading;
198+ if None, block indefinitely until the socket is ready for read
199+ :returns: string of data read (max length = <MAX_SIZE>) from socket;
200+ None if no data was read
201+ :raises: AssertionError if netlink_socket is None
202+ '''
203+ assert (netlink_socket is not None), ("netlink socket is none")
204+ read_set, _, _ = select.select([netlink_socket], [], [], timeout)
205+ # In case of timeout, read_set doesn't contain the netlink socket;
206+ # just return from this function
207+ if netlink_socket not in read_set:
208+ return None
209+ LOG.debug("netlink socket ready for read")
210+ data = netlink_socket.recv(MAX_SIZE)
211+ if data is None:
212+ LOG.error("Reading from Netlink socket returned no data")
213+ return data
214+
215+
216+def unpack_rta_attr(data, offset):
217+ '''Unpack a single rta attribute.
218+
219+ :param: data: string of data read from netlink socket
220+ :param: offset: starting offset of RTA Attribute
221+ :return: RTAAttr object with length, type and data. On error, return None.
222+ :raises: AssertionError if data is None or offset is not integer.
223+ '''
224+ assert (data is not None), ("data is none")
225+ assert (type(offset) == int), ("offset is not integer")
226+ assert (offset >= RTATTR_START_OFFSET), (
227+ "rta offset is less than expected length")
228+ length = rta_type = 0
229+ attr_data = None
230+ try:
231+ length = struct.unpack_from("H", data, offset=offset)[0]
232+ rta_type = struct.unpack_from("H", data, offset=offset+2)[0]
233+ except struct.error:
234+ return None # Should mean our offset is >= remaining data
235+
236+ # Unpack just the attribute's data. Offset by 4 to skip length/type header
237+ attr_data = data[offset+RTA_DATA_START_OFFSET:offset+length]
238+ return RTAAttr(length, rta_type, attr_data)
239+
240+
241+def read_rta_oper_state(data):
242+ '''Reads Interface name and operational state from RTA Data.
243+
244+ :param: data: string of data read from netlink socket
245+ :returns: InterfaceOperstate object containing if_name and oper_state.
246+ None if data does not contain valid IFLA_OPERSTATE and
247+ IFLA_IFNAME messages.
248+ :raises: AssertionError if data is None or length of data is
249+ smaller than RTATTR_START_OFFSET.
250+ '''
251+ assert (data is not None), ("data is none")
252+ assert (len(data) > RTATTR_START_OFFSET), (
253+ "length of data is smaller than RTATTR_START_OFFSET")
254+ ifname = operstate = None
255+ offset = RTATTR_START_OFFSET
256+ while offset <= len(data):
257+ attr = unpack_rta_attr(data, offset)
258+ if not attr or attr.length == 0:
259+ break
260+ # Each attribute is 4-byte aligned. Determine pad length.
261+ padlen = (PAD_ALIGNMENT -
262+ (attr.length % PAD_ALIGNMENT)) % PAD_ALIGNMENT
263+ offset += attr.length + padlen
264+
265+ if attr.rta_type == IFLA_OPERSTATE:
266+ operstate = ord(attr.data)
267+ elif attr.rta_type == IFLA_IFNAME:
268+ interface_name = util.decode_binary(attr.data, 'utf-8')
269+ ifname = interface_name.strip('\0')
270+ if not ifname or operstate is None:
271+ return None
272+ LOG.debug("rta attrs: ifname %s operstate %d", ifname, operstate)
273+ return InterfaceOperstate(ifname, operstate)
274+
275+
276+def wait_for_media_disconnect_connect(netlink_socket, ifname):
277+ '''Block until a media disconnect and reconnect has happened on an
278+ interface. Listens on the netlink socket for link events and, once the
279+ carrier changes from down to up on the given interface, considers the
280+ event to have happened and returns from this function.
281+
282+ :param: netlink_socket: netlink_socket to receive events
283+ :param: ifname: Interface name to lookout for netlink events
284+ :raises: AssertionError if netlink_socket is None or ifname is None.
285+ '''
286+ assert (netlink_socket is not None), ("netlink socket is none")
287+ assert (ifname is not None), ("interface name is none")
288+ assert (len(ifname) > 0), ("interface name cannot be empty")
289+ carrier = OPER_UP
290+ prevCarrier = OPER_UP
291+ data = bytes()
292+ LOG.debug("Wait for media disconnect and reconnect to happen")
293+ while True:
294+ recv_data = read_netlink_socket(netlink_socket, SELECT_TIMEOUT)
295+ if recv_data is None:
296+ continue
297+ LOG.debug('read %d bytes from socket', len(recv_data))
298+ data += recv_data
299+ LOG.debug('Length of data after concat %d', len(data))
300+ offset = 0
301+ datalen = len(data)
302+ while offset < datalen:
303+ nl_msg = data[offset:]
304+ if len(nl_msg) < NLMSGHDR_SIZE:
305+ LOG.debug("Data is smaller than netlink header")
306+ break
307+ nlheader = get_netlink_msg_header(nl_msg)
308+ if len(nl_msg) < nlheader.length:
309+ LOG.debug("Partial data. Smaller than netlink message")
310+ break
311+ padlen = (nlheader.length+PAD_ALIGNMENT-1) & ~(PAD_ALIGNMENT-1)
312+ offset = offset + padlen
313+ LOG.debug('offset to next netlink message: %d', offset)
314+ # Ignore any messages not new link or del link
315+ if nlheader.type not in [RTM_NEWLINK, RTM_DELLINK]:
316+ continue
317+ interface_state = read_rta_oper_state(nl_msg)
318+ if interface_state is None:
319+ LOG.debug('Failed to read rta attributes: %s', interface_state)
320+ continue
321+ if interface_state.ifname != ifname:
322+ LOG.debug(
323+ "Ignored netlink event on interface %s. Waiting for %s.",
324+ interface_state.ifname, ifname)
325+ continue
326+ if interface_state.operstate not in [OPER_UP, OPER_DOWN]:
327+ continue
328+ prevCarrier = carrier
329+ carrier = interface_state.operstate
330+ # check for carrier down, up sequence
331+ isVnetSwitch = (prevCarrier == OPER_DOWN) and (carrier == OPER_UP)
332+ if isVnetSwitch:
333+ LOG.debug("Media switch happened on %s.", ifname)
334+ return
335+ data = data[offset:]
336+
337+# vi: ts=4 expandtab
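The byte-level framing this module decodes can be sketched outside the diff. The following standalone example (illustrative names, not part of the module) packs a minimal RTM_NEWLINK message carrying IFLA_IFNAME and IFLA_OPERSTATE rtattrs and walks the attribute chain back out, the same technique get_netlink_msg_header and read_rta_oper_state use:

```python
import struct

NLMSGHDR_FMT = "IHHII"                       # struct nlmsghdr
NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)
IFINFOMSG_SIZE = struct.calcsize("BHiII")    # struct ifinfomsg
RTM_NEWLINK = 16
IFLA_IFNAME = 3
IFLA_OPERSTATE = 16
OPER_DOWN = 2


def build_newlink_msg(ifname, operstate):
    """Pack a minimal RTM_NEWLINK message with IFLA_IFNAME and
    IFLA_OPERSTATE attributes, 4-byte aligned as the kernel emits them."""
    def rtattr(rta_type, payload):
        length = 4 + len(payload)            # rta header is two u16 fields
        pad = (-length) % 4                  # align the next attr to 4 bytes
        return struct.pack("HH", length, rta_type) + payload + b"\0" * pad
    attrs = (rtattr(IFLA_IFNAME, ifname.encode() + b"\0") +
             rtattr(IFLA_OPERSTATE, bytes([operstate])))
    body = struct.pack("BHiII", 0, 0, 0, 0, 0) + attrs  # ifinfomsg + rtattrs
    return struct.pack(NLMSGHDR_FMT, NLMSGHDR_SIZE + len(body),
                       RTM_NEWLINK, 0, 0, 0) + body


def parse_oper_state(data):
    """Walk the rtattr chain and pull out (ifname, operstate)."""
    offset = NLMSGHDR_SIZE + IFINFOMSG_SIZE
    ifname = operstate = None
    while offset + 4 <= len(data):
        length, rta_type = struct.unpack_from("HH", data, offset)
        if length < 4:
            break
        payload = data[offset + 4:offset + length]
        if rta_type == IFLA_IFNAME:
            ifname = payload.rstrip(b"\0").decode()
        elif rta_type == IFLA_OPERSTATE:
            operstate = payload[0]
        offset += length + ((-length) % 4)   # 4-byte alignment between attrs
    return ifname, operstate


msg = build_newlink_msg("eth0", OPER_DOWN)
print(parse_oper_state(msg))  # -> ('eth0', 2)
```

This mirrors how the module's tests fabricate messages with struct.pack_into and then hand them to read_rta_oper_state.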
338diff --git a/cloudinit/sources/helpers/tests/test_netlink.py b/cloudinit/sources/helpers/tests/test_netlink.py
339new file mode 100644
340index 0000000..c2898a1
341--- /dev/null
342+++ b/cloudinit/sources/helpers/tests/test_netlink.py
343@@ -0,0 +1,373 @@
344+# Author: Tamilmani Manoharan <tamanoha@microsoft.com>
345+#
346+# This file is part of cloud-init. See LICENSE file for license information.
347+
348+from cloudinit.tests.helpers import CiTestCase, mock
349+import socket
350+import struct
351+import codecs
352+from cloudinit.sources.helpers.netlink import (
353+ NetlinkCreateSocketError, create_bound_netlink_socket, read_netlink_socket,
354+ read_rta_oper_state, unpack_rta_attr, wait_for_media_disconnect_connect,
355+ OPER_DOWN, OPER_UP, OPER_DORMANT, OPER_LOWERLAYERDOWN, OPER_NOTPRESENT,
356+ OPER_TESTING, OPER_UNKNOWN, RTATTR_START_OFFSET, RTM_NEWLINK, RTM_SETLINK,
357+ RTM_GETLINK, MAX_SIZE)
358+
359+
360+def int_to_bytes(i):
361+ '''Convert an integer to its byte representation, e.g. 1 to \x01'''
362+ hex_value = '{0:x}'.format(i)
363+ hex_value = '0' * (len(hex_value) % 2) + hex_value
364+ return codecs.decode(hex_value, 'hex_codec')
365+
366+
367+class TestCreateBoundNetlinkSocket(CiTestCase):
368+
369+ @mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
370+ def test_socket_error_on_create(self, m_socket):
371+ '''create_bound_netlink_socket raises NetlinkCreateSocketError
372+ when socket creation or bind fails, instead of leaking the
373+ underlying socket.error.'''
374+ m_socket.side_effect = socket.error("Fake socket failure")
375+ with self.assertRaises(NetlinkCreateSocketError) as ctx_mgr:
376+ create_bound_netlink_socket()
377+ self.assertEqual(
378+ 'Exception during netlink socket create: Fake socket failure',
379+ str(ctx_mgr.exception))
380+
381+
382+class TestReadNetlinkSocket(CiTestCase):
383+
384+ @mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
385+ @mock.patch('cloudinit.sources.helpers.netlink.select.select')
386+ def test_read_netlink_socket(self, m_select, m_socket):
387+ '''read_netlink_socket able to receive data'''
388+ data = 'netlinktest'
389+ m_select.return_value = [m_socket], None, None
390+ m_socket.recv.return_value = data
391+ recv_data = read_netlink_socket(m_socket, 2)
392+ m_select.assert_called_with([m_socket], [], [], 2)
393+ m_socket.recv.assert_called_with(MAX_SIZE)
394+ self.assertIsNotNone(recv_data)
395+ self.assertEqual(recv_data, data)
396+
397+ @mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
398+ @mock.patch('cloudinit.sources.helpers.netlink.select.select')
399+ def test_netlink_read_timeout(self, m_select, m_socket):
400+ '''read_netlink_socket should timeout if nothing to read'''
401+ m_select.return_value = [], None, None
402+ data = read_netlink_socket(m_socket, 1)
403+ m_select.assert_called_with([m_socket], [], [], 1)
404+ self.assertEqual(m_socket.recv.call_count, 0)
405+ self.assertIsNone(data)
406+
407+ def test_read_invalid_socket(self):
408+ '''read_netlink_socket raises assert error if socket is invalid'''
409+ socket = None
410+ with self.assertRaises(AssertionError) as context:
411+ read_netlink_socket(socket, 1)
412+ self.assertTrue('netlink socket is none' in str(context.exception))
413+
414+
415+class TestParseNetlinkMessage(CiTestCase):
416+
417+ def test_read_rta_oper_state(self):
418+ '''read_rta_oper_state can parse a netlink message and extract data'''
419+ ifname = "eth0"
420+ bytes = ifname.encode("utf-8")
421+ buf = bytearray(48)
422+ struct.pack_into("HH4sHHc", buf, RTATTR_START_OFFSET, 8, 3, bytes, 5,
423+ 16, int_to_bytes(OPER_DOWN))
424+ interface_state = read_rta_oper_state(buf)
425+ self.assertEqual(interface_state.ifname, ifname)
426+ self.assertEqual(interface_state.operstate, OPER_DOWN)
427+
428+ def test_read_none_data(self):
429+ '''read_rta_oper_state raises assert error if data is none'''
430+ data = None
431+ with self.assertRaises(AssertionError) as context:
432+ read_rta_oper_state(data)
433+ self.assertIn('data is none', str(context.exception))
434+
435+ def test_read_invalid_rta_operstate_none(self):
436+ '''read_rta_oper_state returns none if operstate is none'''
437+ ifname = "eth0"
438+ buf = bytearray(40)
439+ bytes = ifname.encode("utf-8")
440+ struct.pack_into("HH4s", buf, RTATTR_START_OFFSET, 8, 3, bytes)
441+ interface_state = read_rta_oper_state(buf)
442+ self.assertIsNone(interface_state)
443+
444+ def test_read_invalid_rta_ifname_none(self):
445+ '''read_rta_oper_state returns none if ifname is none'''
446+ buf = bytearray(40)
447+ struct.pack_into("HHc", buf, RTATTR_START_OFFSET, 5, 16,
448+ int_to_bytes(OPER_DOWN))
449+ interface_state = read_rta_oper_state(buf)
450+ self.assertIsNone(interface_state)
451+
452+ def test_read_invalid_data_len(self):
453+ '''raise assert error if data size is smaller than required size'''
454+ buf = bytearray(32)
455+ with self.assertRaises(AssertionError) as context:
456+ read_rta_oper_state(buf)
457+ self.assertTrue('length of data is smaller than RTATTR_START_OFFSET' in
458+ str(context.exception))
459+
460+ def test_unpack_rta_attr_none_data(self):
461+ '''unpack_rta_attr raises assert error if data is none'''
462+ data = None
463+ with self.assertRaises(AssertionError) as context:
464+ unpack_rta_attr(data, RTATTR_START_OFFSET)
465+ self.assertTrue('data is none' in str(context.exception))
466+
467+ def test_unpack_rta_attr_invalid_offset(self):
468+ '''unpack_rta_attr raises assert error if offset is invalid'''
469+ data = bytearray(48)
470+ with self.assertRaises(AssertionError) as context:
471+ unpack_rta_attr(data, "offset")
472+ self.assertTrue('offset is not integer' in str(context.exception))
473+ with self.assertRaises(AssertionError) as context:
474+ unpack_rta_attr(data, 31)
475+ self.assertTrue('rta offset is less than expected length' in
476+ str(context.exception))
477+
478+
479+@mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
480+@mock.patch('cloudinit.sources.helpers.netlink.read_netlink_socket')
481+class TestWaitForMediaDisconnectConnect(CiTestCase):
482+ with_logs = True
483+
484+ def _media_switch_data(self, ifname, msg_type, operstate):
485+ '''construct netlink data with specified fields'''
486+ if ifname and operstate is not None:
487+ data = bytearray(48)
488+ bytes = ifname.encode("utf-8")
489+ struct.pack_into("HH4sHHc", data, RTATTR_START_OFFSET, 8, 3,
490+ bytes, 5, 16, int_to_bytes(operstate))
491+ elif ifname:
492+ data = bytearray(40)
493+ bytes = ifname.encode("utf-8")
494+ struct.pack_into("HH4s", data, RTATTR_START_OFFSET, 8, 3, bytes)
495+ elif operstate:
496+ data = bytearray(40)
497+ struct.pack_into("HHc", data, RTATTR_START_OFFSET, 5, 16,
498+ int_to_bytes(operstate))
499+ struct.pack_into("=LHHLL", data, 0, len(data), msg_type, 0, 0, 0)
500+ return data
501+
502+ def test_media_down_up_scenario(self, m_read_netlink_socket,
503+ m_socket):
504+ '''Test for media down up sequence for required interface name'''
505+ ifname = "eth0"
506+ # construct data for Oper State down
507+ data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
508+ # construct data for Oper State up
509+ data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
510+ m_read_netlink_socket.side_effect = [data_op_down, data_op_up]
511+ wait_for_media_disconnect_connect(m_socket, ifname)
512+ self.assertEqual(m_read_netlink_socket.call_count, 2)
513+
514+ def test_wait_for_media_switch_diff_interface(self, m_read_netlink_socket,
515+ m_socket):
516+ '''wait_for_media_disconnect_connect ignores unexpected interfaces.
517+
518+ The first two messages are for other interfaces and last two are for
519+ expected interface. So the function exits only after receiving the
520+ last 2 messages and therefore the call count for m_read_netlink_socket
521+ has to be 4.
522+ '''
523+ other_ifname = "eth1"
524+ expected_ifname = "eth0"
525+ data_op_down_eth1 = self._media_switch_data(
526+ other_ifname, RTM_NEWLINK, OPER_DOWN)
527+ data_op_up_eth1 = self._media_switch_data(
528+ other_ifname, RTM_NEWLINK, OPER_UP)
529+ data_op_down_eth0 = self._media_switch_data(
530+ expected_ifname, RTM_NEWLINK, OPER_DOWN)
531+ data_op_up_eth0 = self._media_switch_data(
532+ expected_ifname, RTM_NEWLINK, OPER_UP)
533+ m_read_netlink_socket.side_effect = [data_op_down_eth1,
534+ data_op_up_eth1,
535+ data_op_down_eth0,
536+ data_op_up_eth0]
537+ wait_for_media_disconnect_connect(m_socket, expected_ifname)
538+ self.assertIn('Ignored netlink event on interface %s' % other_ifname,
539+ self.logs.getvalue())
540+ self.assertEqual(m_read_netlink_socket.call_count, 4)
541+
542+ def test_invalid_msgtype_getlink(self, m_read_netlink_socket, m_socket):
543+ '''wait_for_media_disconnect_connect ignores GETLINK events.
544+
545+ The first two messages are oper down and up for RTM_GETLINK type,
546+ which the netlink module will ignore. The last 2 messages are
547+ RTM_NEWLINK with oper state down and up. Therefore the call count for
548+ m_read_netlink_socket has to be 4, ignoring the first 2 messages
549+ of RTM_GETLINK
550+ '''
551+ ifname = "eth0"
552+ data_getlink_down = self._media_switch_data(
553+ ifname, RTM_GETLINK, OPER_DOWN)
554+ data_getlink_up = self._media_switch_data(
555+ ifname, RTM_GETLINK, OPER_UP)
556+ data_newlink_down = self._media_switch_data(
557+ ifname, RTM_NEWLINK, OPER_DOWN)
558+ data_newlink_up = self._media_switch_data(
559+ ifname, RTM_NEWLINK, OPER_UP)
560+ m_read_netlink_socket.side_effect = [data_getlink_down,
561+ data_getlink_up,
562+ data_newlink_down,
563+ data_newlink_up]
564+ wait_for_media_disconnect_connect(m_socket, ifname)
565+ self.assertEqual(m_read_netlink_socket.call_count, 4)
566+
567+ def test_invalid_msgtype_setlink(self, m_read_netlink_socket, m_socket):
568+ '''wait_for_media_disconnect_connect ignores SETLINK events.
569+
570+ The first two messages are oper down and up for RTM_SETLINK type,
571+ which it will ignore. The 3rd and 4th messages are RTM_NEWLINK with
572+ down and up messages. This function should exit after the 4th message
573+ since it sees the down->up scenario. So the call count for
574+ m_read_netlink_socket has to be 4, ignoring the first 2 messages
575+ of RTM_SETLINK and the last 2 messages of RTM_NEWLINK
576+ '''
577+ ifname = "eth0"
578+ data_setlink_down = self._media_switch_data(
579+ ifname, RTM_SETLINK, OPER_DOWN)
580+ data_setlink_up = self._media_switch_data(
581+ ifname, RTM_SETLINK, OPER_UP)
582+ data_newlink_down = self._media_switch_data(
583+ ifname, RTM_NEWLINK, OPER_DOWN)
584+ data_newlink_up = self._media_switch_data(
585+ ifname, RTM_NEWLINK, OPER_UP)
586+ m_read_netlink_socket.side_effect = [data_setlink_down,
587+ data_setlink_up,
588+ data_newlink_down,
589+ data_newlink_up,
590+ data_newlink_down,
591+ data_newlink_up]
592+ wait_for_media_disconnect_connect(m_socket, ifname)
593+ self.assertEqual(m_read_netlink_socket.call_count, 4)
594+
595+ def test_netlink_invalid_switch_scenario(self, m_read_netlink_socket,
596+ m_socket):
597+ '''returns only if it receives UP event after a DOWN event'''
598+ ifname = "eth0"
599+ data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
600+ data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
601+ data_op_dormant = self._media_switch_data(ifname, RTM_NEWLINK,
602+ OPER_DORMANT)
603+ data_op_notpresent = self._media_switch_data(ifname, RTM_NEWLINK,
604+ OPER_NOTPRESENT)
605+ data_op_lowerdown = self._media_switch_data(ifname, RTM_NEWLINK,
606+ OPER_LOWERLAYERDOWN)
607+ data_op_testing = self._media_switch_data(ifname, RTM_NEWLINK,
608+ OPER_TESTING)
609+ data_op_unknown = self._media_switch_data(ifname, RTM_NEWLINK,
610+ OPER_UNKNOWN)
611+ m_read_netlink_socket.side_effect = [data_op_up, data_op_up,
612+ data_op_dormant, data_op_up,
613+ data_op_notpresent, data_op_up,
614+ data_op_lowerdown, data_op_up,
615+ data_op_testing, data_op_up,
616+ data_op_unknown, data_op_up,
617+ data_op_down, data_op_up]
618+ wait_for_media_disconnect_connect(m_socket, ifname)
619+ self.assertEqual(m_read_netlink_socket.call_count, 14)
620+
621+ def test_netlink_valid_inbetween_transitions(self, m_read_netlink_socket,
622+ m_socket):
623+ '''wait_for_media_disconnect_connect handles in between transitions'''
624+ ifname = "eth0"
625+ data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
626+ data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
627+ data_op_dormant = self._media_switch_data(ifname, RTM_NEWLINK,
628+ OPER_DORMANT)
629+ data_op_unknown = self._media_switch_data(ifname, RTM_NEWLINK,
630+ OPER_UNKNOWN)
631+ m_read_netlink_socket.side_effect = [data_op_down, data_op_dormant,
632+ data_op_unknown, data_op_up]
633+ wait_for_media_disconnect_connect(m_socket, ifname)
634+ self.assertEqual(m_read_netlink_socket.call_count, 4)
635+
636+ def test_netlink_invalid_operstate(self, m_read_netlink_socket, m_socket):
637+ '''wait_for_media_disconnect_connect should handle invalid operstates.
638+
639+ The function should neither fail nor return when it receives invalid
640+ operstates; it should always wait for the down/up sequence.
641+ '''
642+ ifname = "eth0"
643+ data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
644+ data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
645+ data_op_invalid = self._media_switch_data(ifname, RTM_NEWLINK, 7)
646+ m_read_netlink_socket.side_effect = [data_op_invalid, data_op_up,
647+ data_op_down, data_op_invalid,
648+ data_op_up]
649+ wait_for_media_disconnect_connect(m_socket, ifname)
650+ self.assertEqual(m_read_netlink_socket.call_count, 5)
651+
652+ def test_wait_invalid_socket(self, m_read_netlink_socket, m_socket):
653+ '''wait_for_media_disconnect_connect handles a None netlink socket.'''
654+ socket = None
655+ ifname = "eth0"
656+ with self.assertRaises(AssertionError) as context:
657+ wait_for_media_disconnect_connect(socket, ifname)
658+ self.assertTrue('netlink socket is none' in str(context.exception))
659+
660+ def test_wait_invalid_ifname(self, m_read_netlink_socket, m_socket):
661+ '''wait_for_media_disconnect_connect handles a None interface name'''
662+ ifname = None
663+ with self.assertRaises(AssertionError) as context:
664+ wait_for_media_disconnect_connect(m_socket, ifname)
665+ self.assertTrue('interface name is none' in str(context.exception))
666+ ifname = ""
667+ with self.assertRaises(AssertionError) as context:
668+ wait_for_media_disconnect_connect(m_socket, ifname)
669+ self.assertTrue('interface name cannot be empty' in
670+ str(context.exception))
671+
672+ def test_wait_invalid_rta_attr(self, m_read_netlink_socket, m_socket):
672+ '''wait_for_media_disconnect_connect handles invalid rta data'''
674+ ifname = "eth0"
675+ data_invalid1 = self._media_switch_data(None, RTM_NEWLINK, OPER_DOWN)
676+ data_invalid2 = self._media_switch_data(ifname, RTM_NEWLINK, None)
677+ data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
678+ data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
679+ m_read_netlink_socket.side_effect = [data_invalid1, data_invalid2,
680+ data_op_down, data_op_up]
681+ wait_for_media_disconnect_connect(m_socket, ifname)
682+ self.assertEqual(m_read_netlink_socket.call_count, 4)
683+
684+ def test_read_multiple_netlink_msgs(self, m_read_netlink_socket, m_socket):
685+ '''Read multiple messages in single receive call'''
686+ ifname = "eth0"
687+ bytes = ifname.encode("utf-8")
688+ data = bytearray(96)
689+ struct.pack_into("=LHHLL", data, 0, 48, RTM_NEWLINK, 0, 0, 0)
690+ struct.pack_into("HH4sHHc", data, RTATTR_START_OFFSET, 8, 3,
691+ bytes, 5, 16, int_to_bytes(OPER_DOWN))
692+ struct.pack_into("=LHHLL", data, 48, 48, RTM_NEWLINK, 0, 0, 0)
693+ struct.pack_into("HH4sHHc", data, 48 + RTATTR_START_OFFSET, 8,
694+ 3, bytes, 5, 16, int_to_bytes(OPER_UP))
695+ m_read_netlink_socket.return_value = data
696+ wait_for_media_disconnect_connect(m_socket, ifname)
697+ self.assertEqual(m_read_netlink_socket.call_count, 1)
698+
699+ def test_read_partial_netlink_msgs(self, m_read_netlink_socket, m_socket):
700+ '''Read partial messages in receive call'''
701+ ifname = "eth0"
702+ bytes = ifname.encode("utf-8")
703+ data1 = bytearray(112)
704+ data2 = bytearray(32)
705+ struct.pack_into("=LHHLL", data1, 0, 48, RTM_NEWLINK, 0, 0, 0)
706+ struct.pack_into("HH4sHHc", data1, RTATTR_START_OFFSET, 8, 3,
707+ bytes, 5, 16, int_to_bytes(OPER_DOWN))
708+ struct.pack_into("=LHHLL", data1, 48, 48, RTM_NEWLINK, 0, 0, 0)
709+ struct.pack_into("HH4sHHc", data1, 80, 8, 3, bytes, 5, 16,
710+ int_to_bytes(OPER_DOWN))
711+ struct.pack_into("=LHHLL", data1, 96, 48, RTM_NEWLINK, 0, 0, 0)
712+ struct.pack_into("HH4sHHc", data2, 16, 8, 3, bytes, 5, 16,
713+ int_to_bytes(OPER_UP))
714+ m_read_netlink_socket.side_effect = [data1, data2]
715+ wait_for_media_disconnect_connect(m_socket, ifname)
716+ self.assertEqual(m_read_netlink_socket.call_count, 2)
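The state machine these tests exercise is small enough to illustrate on its own. This sketch (a standalone rewrite for illustration, not the module's code) mirrors the carrier/prevCarrier tracking in wait_for_media_disconnect_connect, ignoring every operstate other than UP and DOWN:

```python
OPER_DOWN, OPER_UP = 2, 6


def saw_disconnect_connect(events):
    """Return True once a DOWN followed (eventually) by an UP is seen,
    skipping intermediate states such as dormant or unknown."""
    carrier = prev = OPER_UP
    for state in events:
        if state not in (OPER_UP, OPER_DOWN):
            continue                         # dormant/unknown etc. ignored
        prev, carrier = carrier, state
        if prev == OPER_DOWN and carrier == OPER_UP:
            return True                      # media switch detected
    return False


print(saw_disconnect_connect([6, 5, 2, 0, 6]))  # True: down then up
print(saw_disconnect_connect([6, 6, 6]))        # False: never went down
```

The same transitions drive test_netlink_invalid_switch_scenario and test_netlink_valid_inbetween_transitions above.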
717diff --git a/tests/unittests/test_datasource/test_azure.py b/tests/unittests/test_datasource/test_azure.py
718index 5ea7ae5..417d86a 100644
719--- a/tests/unittests/test_datasource/test_azure.py
720+++ b/tests/unittests/test_datasource/test_azure.py
721@@ -564,6 +564,8 @@ fdescfs /dev/fd fdescfs rw 0 0
722 self.assertEqual(1, report_ready_func.call_count)
723
724 @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
725+ @mock.patch('cloudinit.sources.helpers.netlink.'
726+ 'wait_for_media_disconnect_connect')
727 @mock.patch(
728 'cloudinit.sources.DataSourceAzure.DataSourceAzure._report_ready')
729 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
730@@ -572,7 +574,7 @@ fdescfs /dev/fd fdescfs rw 0 0
731 def test_crawl_metadata_on_reprovision_reports_ready_using_lease(
732 self, m_readurl, m_dhcp,
733 m_net, report_ready_func,
734- m_write):
735+ m_media_switch, m_write):
736 """If reprovisioning, report ready using the obtained lease"""
737 ovfenv = construct_valid_ovf_env(
738 platform_settings={"PreprovisionedVm": "True"})
739@@ -586,6 +588,7 @@ fdescfs /dev/fd fdescfs rw 0 0
740 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
741 'unknown-245': '624c3620'}
742 m_dhcp.return_value = [lease]
743+ m_media_switch.return_value = None
744
745 reprovision_ovfenv = construct_valid_ovf_env()
746 m_readurl.return_value = url_helper.StringResponse(
747@@ -1676,6 +1679,8 @@ class TestPreprovisioningShouldReprovision(CiTestCase):
748
749 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
750 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
751+@mock.patch('cloudinit.sources.helpers.netlink.'
752+ 'wait_for_media_disconnect_connect')
753 @mock.patch('requests.Session.request')
754 @mock.patch(MOCKPATH + 'DataSourceAzure._report_ready')
755 class TestPreprovisioningPollIMDS(CiTestCase):
756@@ -1689,7 +1694,8 @@ class TestPreprovisioningPollIMDS(CiTestCase):
757
758 @mock.patch(MOCKPATH + 'EphemeralDHCPv4')
759 def test_poll_imds_re_dhcp_on_timeout(self, m_dhcpv4, report_ready_func,
760- fake_resp, m_dhcp, m_net):
761+ fake_resp, m_media_switch, m_dhcp,
762+ m_net):
763 """The poll_imds will retry DHCP on IMDS timeout."""
764 report_file = self.tmp_path('report_marker', self.tmp)
765 lease = {
766@@ -1697,7 +1703,7 @@ class TestPreprovisioningPollIMDS(CiTestCase):
767 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
768 'unknown-245': '624c3620'}
769 m_dhcp.return_value = [lease]
770-
771+ m_media_switch.return_value = None
772 dhcp_ctx = mock.MagicMock(lease=lease)
773 dhcp_ctx.obtain_lease.return_value = lease
774 m_dhcpv4.return_value = dhcp_ctx
775@@ -1723,11 +1729,12 @@ class TestPreprovisioningPollIMDS(CiTestCase):
776 dsa._poll_imds()
777 self.assertEqual(report_ready_func.call_count, 1)
778 report_ready_func.assert_called_with(lease=lease)
779- self.assertEqual(2, m_dhcpv4.call_count, 'Expected 2 DHCP calls')
780+ self.assertEqual(3, m_dhcpv4.call_count, 'Expected 3 DHCP calls')
781 self.assertEqual(3, self.tries, 'Expected 3 total reads from IMDS')
782
783- def test_poll_imds_report_ready_false(self, report_ready_func,
784- fake_resp, m_dhcp, m_net):
785+ def test_poll_imds_report_ready_false(self,
786+ report_ready_func, fake_resp,
787+ m_media_switch, m_dhcp, m_net):
788 """The poll_imds should not call reporting ready
789 when flag is false"""
790 report_file = self.tmp_path('report_marker', self.tmp)
791@@ -1736,6 +1743,7 @@ class TestPreprovisioningPollIMDS(CiTestCase):
792 'interface': 'eth9', 'fixed-address': '192.168.2.9',
793 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
794 'unknown-245': '624c3620'}]
795+ m_media_switch.return_value = None
796 dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
797 with mock.patch(MOCKPATH + 'REPORTED_READY_MARKER_FILE', report_file):
798 dsa._poll_imds()
799@@ -1745,6 +1753,8 @@ class TestPreprovisioningPollIMDS(CiTestCase):
800 @mock.patch(MOCKPATH + 'util.subp')
801 @mock.patch(MOCKPATH + 'util.write_file')
802 @mock.patch(MOCKPATH + 'util.is_FreeBSD')
803+@mock.patch('cloudinit.sources.helpers.netlink.'
804+ 'wait_for_media_disconnect_connect')
805 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
806 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
807 @mock.patch('requests.Session.request')
808@@ -1757,10 +1767,13 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
809 self.paths = helpers.Paths({'cloud_dir': tmp})
810 dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d
811
812- def test_poll_imds_returns_ovf_env(self, fake_resp, m_dhcp, m_net,
813+ def test_poll_imds_returns_ovf_env(self, fake_resp,
814+ m_dhcp, m_net,
815+ m_media_switch,
816 m_is_bsd, write_f, subp):
817 """The _poll_imds method should return the ovf_env.xml."""
818 m_is_bsd.return_value = False
819+ m_media_switch.return_value = None
820 m_dhcp.return_value = [{
821 'interface': 'eth9', 'fixed-address': '192.168.2.9',
822 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0'}]
823@@ -1778,16 +1791,19 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
824 'Cloud-Init/%s' % vs()
825 }, method='GET', timeout=1,
826 url=full_url)])
827- self.assertEqual(m_dhcp.call_count, 1)
828+ self.assertEqual(m_dhcp.call_count, 2)
829 m_net.assert_any_call(
830 broadcast='192.168.2.255', interface='eth9', ip='192.168.2.9',
831 prefix_or_mask='255.255.255.0', router='192.168.2.1')
832- self.assertEqual(m_net.call_count, 1)
833+ self.assertEqual(m_net.call_count, 2)
834
835- def test__reprovision_calls__poll_imds(self, fake_resp, m_dhcp, m_net,
836+ def test__reprovision_calls__poll_imds(self, fake_resp,
837+ m_dhcp, m_net,
838+ m_media_switch,
839 m_is_bsd, write_f, subp):
840 """The _reprovision method should call poll IMDS."""
841 m_is_bsd.return_value = False
842+ m_media_switch.return_value = None
843 m_dhcp.return_value = [{
844 'interface': 'eth9', 'fixed-address': '192.168.2.9',
845 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
846@@ -1811,11 +1827,11 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
847 'User-Agent':
848 'Cloud-Init/%s' % vs()},
849 method='GET', timeout=1, url=full_url)])
850- self.assertEqual(m_dhcp.call_count, 1)
851+ self.assertEqual(m_dhcp.call_count, 2)
852 m_net.assert_any_call(
853 broadcast='192.168.2.255', interface='eth9', ip='192.168.2.9',
854 prefix_or_mask='255.255.255.0', router='192.168.2.1')
855- self.assertEqual(m_net.call_count, 1)
856+ self.assertEqual(m_net.call_count, 2)
857
858
859 class TestRemoveUbuntuNetworkConfigScripts(CiTestCase):
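One detail worth calling out from the receive loop in wait_for_media_disconnect_connect: a partial trailing message is kept in the buffer for the next recv, and complete messages are skipped over with NLMSG_ALIGN-style rounding. The rounding itself is a small bit trick, shown here as a standalone sketch:

```python
PAD_ALIGNMENT = 4


def nlmsg_align(length):
    """Round a netlink message length up to the next 4-byte boundary,
    the same bit trick the wait loop uses to find the next message."""
    return (length + PAD_ALIGNMENT - 1) & ~(PAD_ALIGNMENT - 1)


for n in (48, 49, 51, 52):
    print(n, "->", nlmsg_align(n))  # 48->48, 49->52, 51->52, 52->52
```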
