Merge ~tamilmani1989/cloud-init:azure_networking into cloud-init:master

Proposed by Tamilmani Manoharan
Status: Merged
Approved by: Ryan Harper
Approved revision: 56559c2548f858046a0775d21190cb167463cdcb
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~tamilmani1989/cloud-init:azure_networking
Merge into: cloud-init:master
Diff against target: 859 lines (+678/-16)
4 files modified
cloudinit/sources/DataSourceAzure.py (+27/-4)
cloudinit/sources/helpers/netlink.py (+250/-0)
cloudinit/sources/helpers/tests/test_netlink.py (+373/-0)
tests/unittests/test_datasource/test_azure.py (+28/-12)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
Scott Moser Approve
Sushant Sharma (community) Approve
Ryan Harper Approve
Chad Smith Approve
Paul Meyer (community) Needs Fixing
Douglas Jordan Pending
Review via email: mp+336392@code.launchpad.net

Commit message

azure: detect vnet migration via netlink media change event

Replace Azure pre-provision polling on IMDS with a blocking call
which watches for netlink link state change messages. The media
change event happens when a pre-provisioned VM has been activated
and is connected to the user's virtual network, and cloud-init can
then resume operation to complete image instantiation.

Description of the change

This PR enables moving a virtual machine (VM) in Azure from one virtual network (VNet) to another. It adds a new Azure-specific module that listens for the Azure host node to indicate that the VM has moved to a different network (by listening for NIC disconnect/connect events). Once the VM move is detected (via the netlink callback), the module triggers a re-DHCP to get an IP address from the new VNet.

Revision history for this message
Robert Schweikert (rjschwei) wrote :

The other question that's hidden in here is whether we should "just admit that we need dbus" and start properly depending on it. We already depend on dbus for "set_hostname". The problem is that we "don't really want to depend on dbus", and that creates other problems, i.e. hostname setting only works if it runs after dbus is already operational (for systemd-based distributions). In this case the waiting on the socket could be replaced by using dbus notifications.

Revision history for this message
Sushant Sharma (sushantsharma) :
Revision history for this message
Douglas Jordan (dojordan) wrote :

Some comments inline

Revision history for this message
Ryan Harper (raharper) wrote :

Can you describe in what scenario the current code which polls the IMDS url for failures is inadequate?

Is there a way we can test the current code on Azure directly?

review: Needs Fixing
Revision history for this message
Sushant Sharma (sushantsharma) wrote :

> Can you describe in what scenario the current code which polls the IMDS url
> for failures is inadequate?
>
> Is there a way we can test the current code on Azure directly?

Thanks Ryan. This is a valid question, and we are in the process of testing in Azure with all components involved (including an updated host). Having said that, the IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of being updated on the host, or any random communication failure (since this is a network call).
I think relying on a clean, deterministic event should be preferred.
Furthermore, I believe the code that polls the host for new provisioning data should just do that, and not worry about setting up networking. We should aim for clear and concise goals for every piece of code.

Revision history for this message
Ryan Harper (raharper) wrote :

> > Can you describe in what scenario the current code which polls the IMDS url
> > for failures is inadequate?
> >
> > Is there a way we can test the current code on Azure directly?
>
> Thanks Ryan, This is a valid question, and we are in process of testing in
> Azure with all components involved (including updated host). Having said that,
> IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of getting
> updated on the host, or any random communication failure (since this is a
> network call).

Which of these failures are not handled already by the retry logic? If we have any failure attempting to fetch the URL from IMDS, then the retry logic will acquire a new lease; so I'm struggling to see which scenario is not covered.

> I think relying on a clean deterministic event should be preferred.

I agree; I'd much rather see use of the HyperV Keystore here for indicating both that a VM has been tagged as pre-provision, and when its network has been moved.

> Furthermore, I believe the code that polls host for new provisioning data
> should just do that, and not worry about setting up networking.

The _poll_imds is not setting up networking for the system; rather it's setting up a temporary interface and DHCP'ing on it to acquire information at a specific time for a specific use-case: preprovisioning.
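For readers following the thread, the ephemeral-DHCP pattern _poll_imds relies on (per the diff at the bottom of this page) looks roughly like this minimal sketch; the IMDS fetch in the middle is elided:

    from cloudinit.net.dhcp import EphemeralDHCPv4

    # Bring up a throwaway DHCP lease just long enough to talk to IMDS,
    # then tear the ephemeral network back down again.
    dhcp_ctx = EphemeralDHCPv4()
    try:
        lease = dhcp_ctx.obtain_lease()  # dict with 'interface', 'unknown-245', ...
        # ... fetch reprovision data from IMDS over this temporary link ...
    finally:
        dhcp_ctx.clean_network()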

Revision history for this message
Sushant Sharma (sushantsharma) wrote :

> > > Can you describe in what scenario the current code which polls the IMDS
> url
> > > for failures is inadequate?
> > >
> > > Is there a way we can test the current code on Azure directly?
> >
> > Thanks Ryan, This is a valid question, and we are in process of testing in
> > Azure with all components involved (including updated host). Having said
> that,
> > IMDS poll can fail for multiple reasons, e.g., IMDS in the middle of getting
> > updated on the host, or any random communication failure (since this is a
> > network call).
>
> Which of these failures are not handled already by the retry logic? If we
> have any failure attempting to fetch the URL from IMDS, then then the retry
> logic will acquire a new lease; so I'm struggling to see which scenario is not
> covered.
>

Let's remember that the time to move a VM into the new network is also critical for our customers. If we can save any amount of time (even if it is a few seconds on average) that we spend on polling and retrying DHCP on failure, we should prefer that. Having said that, let us run some tests and see how much we are saving with the solution that depends on netlink versus the existing checked-in code. Will report the results and hopefully we can make the case again for the current PR.

Thanks,
Sushant

Revision history for this message
Douglas Jordan (dojordan) wrote :

Added a few (mostly style) comments

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I addressed review comments

Revision history for this message
Douglas Jordan (dojordan) :
Revision history for this message
Douglas Jordan (dojordan) :
Revision history for this message
Paul Meyer (paul-meyer) wrote :

Overall LGTM, except for the lack of unit tests for the new code.

You could change the 0/-1 return code from create_bind_netlink_socket to be either an object property (netevent.is_listening) or an exception that is raised and handled in the calling code.
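For illustration, a minimal sketch of the exception-based variant Paul describes (the merged branch ends up taking essentially this route with NetlinkCreateSocketError, as the diff below shows):

    import os
    import socket

    RTMGRP_LINK = 1

    class NetlinkCreateSocketError(RuntimeError):
        '''Raised if the netlink socket cannot be created or bound.'''

    def create_bound_netlink_socket():
        '''Return a bound, non-blocking NETLINK_ROUTE socket, or raise.'''
        try:
            sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                                 socket.NETLINK_ROUTE)
            sock.bind((os.getpid(), RTMGRP_LINK))
            sock.setblocking(0)
        except socket.error as e:
            # Callers handle the failure instead of checking a -1 return code.
            raise NetlinkCreateSocketError(
                "Exception during netlink socket create: %s" % e)
        return sock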

review: Needs Fixing
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

Hi Paul, I added a unit test for the network switch. Can you please review?

Revision history for this message
Chad Smith (chad.smith) wrote :

Updated my comment and patch suggestions; please feel free to disagree where you think points are irrelevant or wrong.

Here are a couple of thoughts on your branch to aid testability of the branch features:

The complete patch suggestion is here http://paste.ubuntu.com/p/pDYvtFQN8g/

I've itemized the general thoughts here for your review to see what you think:
  1. Generally, let's limit what lives in a try/except block to code that we expect will actually generate exceptions. I moved general increment logic outside of the struct.unpack calls and moved the try/except socket.error checks to around read_netlink_socket, as I didn't think socket.error would be raised by anything else in wait_for_media_disconnect_connect.

  2. I suggest breaking up the NetworkEvents class, as it seems more a collection of functions than a class with state. When I see one-line class methods that don't actually depend on any instance attributes, it's a hint that it might be an empty level of indirection we just don't need (like the select_netlink_socket, read_socket and close_socket methods).

  3. Rename networkevents.py -> netlink.py as a module to make it a bit more specific.
  4. Personal preference is to use namedtuples instead of classes to host simple structured attributes like your rta_attr definition (see the sketch after this list).
  5. I'd like to see simple unit tests on each function, per the simple example I added in cloudinit/sources/helpers/tests/test_netlink.py.
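A minimal sketch of the namedtuple suggestion in point 4 (this mirrors the RTAAttr/InterfaceOperstate definitions the merged netlink.py ends up using):

    from collections import namedtuple

    # Plain value holders: no methods, no mutable state, just named fields.
    RTAAttr = namedtuple('RTAAttr', ['length', 'rta_type', 'data'])
    InterfaceOperstate = namedtuple('InterfaceOperstate', ['ifname', 'operstate'])

    attr = RTAAttr(length=8, rta_type=3, data=b'eth0')
    print(attr.rta_type)  # fields are accessed by name, without class boilerplate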

review: Needs Information
Revision history for this message
Chad Smith (chad.smith) wrote :

Also here's a cloudinit/sources/helpers/tests/test_netlink.py basic single test example

http://paste.ubuntu.com/p/SHTwRRZPD2/

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:f561d8f6a14a1136c99820fefdd78a3c65a405dd
https://jenkins.ubuntu.com/server/job/cloud-init-ci/429/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    FAILED: Ubuntu LTS: Build

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/429/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:7712d6fa901ea6e132fd3bd7a526af566e5d3aa6
https://jenkins.ubuntu.com/server/job/cloud-init-ci/439/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/439/rebuild

review: Approve (continuous-integration)
Revision history for this message
Joshua Powers (powersj) wrote :

CI passed on the latest commit. There are conflicts below that need to be worked out; moving it back to needs review at Tamilmani's request.

0c14e44... by Tamilmani Manoharan

Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking

# Conflicts:
# cloudinit/sources/DataSourceAzure.py

144e6d8... by Tamilmani Manoharan

fixed indentation error

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for working on the refactor. This looks a lot better and it's much easier to read and write unittests.

I've quite a few inline comments, please look through those. The biggest query is around how do we know when the vnet migration has started to prevent us from getting stuck forever if it's already started before we attempt to listen for events.

review: Needs Fixing
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> Thanks for working on the refactor. This looks a lot better and it's much
> easier to read and write unittests.
>
> I've quite a few inline comments, please look through those. The biggest
> query is around how do we know when the vnet migration has started to prevent
> us from getting stuck forever if it's already started before we attempt to
> listen for events.

Thanks for reviewing it. Unless cloud-init reports ready to the Azure fabric, the vnet switch won't happen. It's the way of informing the top-level services that the VM is ready for the vnet switch. So I have created and bound the netlink socket before that call, so it won't miss any event that happens after that.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I have a few concerns over some review comments, which I replied to inline. Other than that, I'm fine with the other comments and will work on addressing those.

Revision history for this message
Ryan Harper (raharper) wrote :

> > Thanks for working on the refactor. This looks a lot better and it's much
> > easier to read and write unittests.
> >
> > I've quite a few inline comments, please look through those. The biggest
> > query is around how do we know when the vnet migration has started to
> prevent
> > us from getting stuck forever if it's already started before we attempt to
> > listen for events.
>
> Thanks for reviewing it. Unless cloud-init report ready to azure fabric, vnet
> switch won't happen. Its way of informing the top level services that vm is
> ready for vnet switch. So I have created and bound netlink socket before that
> call so it won't miss any event that happens after that.

OK, so, the sequence then is:

1) crawl initial metadata and detect if pre-provision VM, if so write a REPROVISION marker file
2) in _poll_imds create nl socket
3) write a REPORT_READY marker file and call _report_ready()
4) _report_ready() will invoke the shim
5) wait_for_media_disconnect()

The window is now between (2) and (5); and the question is what does the netlink socket do with the incoming messages that we won't process until (5)?

From reading the upstream docs on netlink, the default netlink socket size is 32kb, and while the typical messages may be small, the payload for messages is arbitrary. I know that DELLINK and NEWLINK are small but other messages may come through and if we're not processing then the socket could become full.

I suspect that in practice there isn't a lot of netlink traffic on the node with only a single interface up at this point but I'm wondering if we shouldn't employ something that allows us to:

a) spawn a thread to start doing the "Wait" which would start reading messages as they come in on the socket *before* we send the READY flag to the fabric
b) after sending the ready flag, we would join() the thread which would require the main program to wait until the thread exited (ie, we detected the media change event).

Thoughts?
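
A rough sketch of the threaded variant Ryan is floating here, assuming the netlink helpers discussed in this branch; report_ready_cb is a hypothetical stand-in for the shim call, and this is a proposal under discussion, not what the branch implements:

    import threading

    from cloudinit.sources.helpers.netlink import (
        create_bound_netlink_socket, wait_for_media_disconnect_connect)

    def report_ready_after_listener_started(ifname, report_ready_cb):
        nl_sock = create_bound_netlink_socket()
        # (a) start draining/parsing netlink messages *before* READY is sent
        waiter = threading.Thread(
            target=wait_for_media_disconnect_connect, args=(nl_sock, ifname))
        waiter.start()
        # (b) signal the fabric, then block until the thread has observed the
        # media disconnect/connect sequence
        report_ready_cb()
        waiter.join()
        nl_sock.close()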

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> > > Thanks for working on the refactor. This looks a lot better and it's much
> > > easier to read and write unittests.
> > >
> > > I've quite a few inline comments, please look through those. The biggest
> > > query is around how do we know when the vnet migration has started to
> > prevent
> > > us from getting stuck forever if it's already started before we attempt to
> > > listen for events.
> >
> > Thanks for reviewing it. Unless cloud-init report ready to azure fabric,
> vnet
> > switch won't happen. Its way of informing the top level services that vm is
> > ready for vnet switch. So I have created and bound netlink socket before
> that
> > call so it won't miss any event that happens after that.
>
> OK, so, the sequence then is:
>
> 1) crawl initial metadata and detect if pre-provision VM, if so write a
> REPROVISION marker file
> 2) in _poll_imds create nl socket
> 3) write a REPORT_READY marker file and call _report_ready()
> 4) _report_ready() will invoke the shim
> 5) wait_for_media_disconnect()
>
> The window is now between (2) and (5); and the question is what does the
> netlink socket do with the incoming messages that we won't process until (5)?
>
> From reading the upstream docs on netlink, the default netlink socket size is
> 32kb, and while the typical messages may be small, the payload for messages is
> arbitrary. I know that DELLINK and NEWLINK are small but other messages may
> come through and if we're not processing then the socket could become full.
>
> I suspect that in practice there isn't a lot of netlink traffic on the node
> with only a single interface up at this point but I'm wondering if we
> shouldn't employ something that allows us to:
>
> a) spawn a thread to start doing the "Wait" which would start reading messages
> as they come in on the socket *before* we send the READY flag to the fabric
> b) after sending the ready flag, we would join() the thread which would
> require the main program to wait until the thread exited (ie, we detected the
> media change event).
>
> Thoughts?

I don't think we receive that many netlink messages, since this happens before systemd-networkd starts. We manually set up an ephemeral network (one interface) to send report ready. Also, the netlink code listens only for RTMGRP_LINK (network interface create/delete/up/down events) and not for any IP address or route changes. There will be only one interface at that time, and it reads only the above events for that interface.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Ryan Harper (raharper) wrote :

OK. Let's update the create_netlink_socket() method with some of these details, binding only to RTNLGRP_LINK (which only includes RTM_NEWLINK/RTM_DELLINK/RTM_GETLINK).

It would be nice if we could bind to those events and only on a specific ifindex (we could look up the currently up interface) but it's sufficient to filter in the wait_for_media_disconnect_connect().

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> OK. Let's update the create_netlink_socket() method with some of these
> details, binding to only RTNLGRP_LINK (which only includes
> RTM_NEWLINK/RTM_DELLINK/RTN_GETLINK).
>
> It would be nice if we could bind to those events and only on a specific
> ifindex (we could look up the currently up interface) but it's sufficient to
> filter in the wait_for_media_disconnect_connect().

ok will update it.

58ef420... by Tamilmani Manoharan

addressed review comments

3d3efee... by Tamilmani Manoharan

reverted back the line break change

148c88c... by Tamilmani Manoharan

added few more unit tests.

7e2bba6... by Tamilmani Manoharan

Added a test case to validate down up scenario

45806ec... by Tamilmani Manoharan

cleanup and retry on netlink socket creation error

cc64c51... by Tamilmani Manoharan

optimized test code

Revision history for this message
Ryan Harper (raharper) wrote :

A couple more in-line requests below. Thanks for incorporating the requested changes.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
6e4cfb7... by Tamilmani Manoharan

addressed iteration 3 review comments

91a4304... by Tamilmani Manoharan

Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking

# Conflicts:
# cloudinit/sources/DataSourceAzure.py
# tests/unittests/test_datasource/test_azure.py

feb9f10... by Tamilmani Manoharan

fixed azure UTs regression with recent changes in master

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> I couple more in-line requests below. Thanks for incorporating the requested
> changes.

Addressed your comments. Please take a look.

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

Thanks for reviewing. I will address your comments. Do you have any concerns other than these? Can you sign off once I address the comments below?

59e5fe9... by Tamilmani Manoharan

addressed iteration 4 review comments

9c6eb4a... by Tamilmani Manoharan

added comments

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
b507379... by Tamilmani Manoharan

addressed iteration 5 review comments
Fixed code style error

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:b5073795740154066b791b2a9ab1f8a624a8bb3e
https://jenkins.ubuntu.com/server/job/cloud-init-ci/454/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/454/rebuild

review: Approve (continuous-integration)
Revision history for this message
Chad Smith (chad.smith) :
Revision history for this message
Chad Smith (chad.smith) wrote :

Leaving this approved pending your comments regarding ignoring interim state transitions for operstates != DOWN, DORMANT or UP.

review: Approve
2fc6e87... by Tamilmani Manoharan

addressed iteration 6 comments

75c8987... by Tamilmani Manoharan

dummy commit to sync with launchpad

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I addressed your comments.

ad8ebf4... by Tamilmani Manoharan

moved log inside function as per review comment

Revision history for this message
Chad Smith (chad.smith) wrote :

Looks good, and unit tests pass. 2 Minor inline followups.
 Thanks Tamilmani.

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Ryan Harper (raharper) wrote :

Please review the commit message I added.

I've a couple of in-line comments on what we return when reading the attrs; please comment there as well.

review: Needs Information
f12813e... by Tamilmani Manoharan

fixed typo

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Ryan Harper (raharper) wrote :

One more follow-up on the rta processing.

review: Needs Information
Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
205864e... by Tamilmani Manoharan

return interfaceoperstate as none in case of failure

Revision history for this message
Ryan Harper (raharper) wrote :

Some comments on debug messages. And another thing occurred to me about the contents we get back from the socket: I'm wondering if that is only ever a single message or possibly a sequence of messages, and if so, don't we need to loop over the data?

review: Needs Information
Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

For netlink down/up events we won't get a multipart message, and I already tested it. Multipart messages will be received only if the payload is too big.
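
For context, a minimal sketch of walking several nlmsghdr records packed into one receive buffer (header format and 4-byte alignment as described in netlink(7)); the branch's wait_for_media_disconnect_connect loop does the real parsing, so this is only an illustration:

    import struct

    NLMSGHDR_FMT = "IHHII"                        # struct nlmsghdr layout
    NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)
    PAD_ALIGNMENT = 4

    def iter_netlink_messages(data):
        '''Yield (msg_type, payload) for each complete netlink message.'''
        offset = 0
        while offset + NLMSGHDR_SIZE <= len(data):
            msg_len, msg_type, _flags, _seq, _pid = struct.unpack_from(
                NLMSGHDR_FMT, data, offset)
            if offset + msg_len > len(data):
                break  # partial message; caller should recv more and retry
            yield msg_type, data[offset + NLMSGHDR_SIZE:offset + msg_len]
            # each message is padded out to a 4-byte boundary
            offset += (msg_len + PAD_ALIGNMENT - 1) & ~(PAD_ALIGNMENT - 1)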

dcf4c86... by Tamilmani Manoharan

Addressed reading multiple netlink messages in single receive.

978f7ed... by Tamilmani Manoharan

added test case for receiving multiple netlink messages in single receive call

d3fd194... by Tamilmani Manoharan

reverted changes in test_azure

fd355fc... by Tamilmani Manoharan

adhere to tox standard

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for handling multiple messages in the buffer. Some in-line comments and suggestions. I'd like to add a test case that ensures we parse the socket data buffer to completion, and that a second read of the buffer returns the media change sequence, so we validate that we handle entering and exiting both sets of loops.

Revision history for this message
Scott Moser (smoser) wrote :

One small comment inline.
Generally, I wonder what happens here on FreeBSD.

006bf63... by Tamilmani Manoharan

Added testcase for reading partial messages

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

I responded for your comments.

437b25b... by Tamilmani Manoharan

modified logs

Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
Revision history for this message
Scott Moser (smoser) wrote :

What about FreeBSD?
What happens there? Do we hit your newly added code, or is it avoided?

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> what about freebsd?
> what happens there? do we hit your newly added code or is it avoided.

Yes, it will hit for FreeBSD also. Is there any problem with that?

85c5a6d... by Tamilmani Manoharan

1. Check for free bsd
2. Handle assertion errors

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

Scott/Ryan, addressed your comments. Can you sign off?

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Tamilmani Manoharan (tamilmani1989) :
c600588... by Tamilmani Manoharan

addressed reading partial messages from buffer issue and handled netlink socket close in case of exception

77dfb5f... by Tamilmani Manoharan

removed blank line at eof

Revision history for this message
Ryan Harper (raharper) :
8157985... by Tamilmani Manoharan

removed saving data to another variable

Revision history for this message
Ryan Harper (raharper) wrote :

Just a spelling fix in the unittests.

I think the core logic in wait_() is solid. Do you have a way to test this branch on an instance doing the pre-provision logic, just to ensure we're not outsmarting ourselves with the additional parsing logic?

Revision history for this message
Tamilmani Manoharan (tamilmani1989) wrote :

> Just a spelling fix in the unittests.
>
> I think the core logic in wait_() is solid. Do you have a way to test this
> branch on an instances doing the pre-provision logic just to ensure we're not
> outsmarting ourselves with the additional parsing logic ?

Yes, I already tested that multiple times. I push an image (with my changes) as a VHD to the Azure marketplace and then deploy a PPS VM using that.

4d7dde8... by Tamilmani Manoharan

fixed spell error

Revision history for this message
Ryan Harper (raharper) wrote :

Thank you for the hard work and many changes.

review: Approve
ea309e3... by Tamilmani Manoharan

Fixed code style errors

Revision history for this message
Sushant Sharma (sushantsharma) wrote :

Thanks a lot Tamilmani!

review: Approve
Revision history for this message
Scott Moser (smoser) :
review: Approve
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:ea309e371cb803edfe9f76a33261bfd02b4b5dda
https://jenkins.ubuntu.com/server/job/cloud-init-ci/467/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/467/rebuild

review: Approve (continuous-integration)
5917a00... by Tamilmani Manoharan

Merge branch 'master' of https://git.launchpad.net/cloud-init into azure_networking

9bc4a08... by Tamilmani Manoharan

There are 3 fixes:
1. If a timeout happens, read returns None; concatenate data only after that check.
2. While resolving merge conflicts, the else part was accidentally removed. Restored the else condition.
3. Also reverted the testcase changes.

56559c2... by Tamilmani Manoharan

Fixed code style error

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:9bc4a08cbfb3cddc44e640d5543aaeef7fb186d5
https://jenkins.ubuntu.com/server/job/cloud-init-ci/469/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/469/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:56559c2548f858046a0775d21190cb167463cdcb
https://jenkins.ubuntu.com/server/job/cloud-init-ci/470/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/470/rebuild

review: Approve (continuous-integration)

Preview Diff

diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
index be82ec4..e076d5d 100644
--- a/cloudinit/sources/DataSourceAzure.py
+++ b/cloudinit/sources/DataSourceAzure.py
@@ -22,6 +22,7 @@ from cloudinit.event import EventType
 from cloudinit.net.dhcp import EphemeralDHCPv4
 from cloudinit import sources
 from cloudinit.sources.helpers.azure import get_metadata_from_fabric
+from cloudinit.sources.helpers import netlink
 from cloudinit.url_helper import UrlError, readurl, retry_on_url_exc
 from cloudinit import util
 
@@ -409,6 +410,10 @@ class DataSourceAzure(sources.DataSource):
 
         perform_reprovision = reprovision or self._should_reprovision(ret)
         if perform_reprovision:
+            if util.is_FreeBSD():
+                msg = "Free BSD is not supported for PPS VMs"
+                LOG.error(msg)
+                raise sources.InvalidMetaDataException(msg)
             ret = self._reprovision()
             imds_md = get_metadata_from_imds(
                 self.fallback_interface, retries=3)
@@ -523,8 +528,8 @@ class DataSourceAzure(sources.DataSource):
         response. Then return the returned JSON object."""
         url = IMDS_URL + "reprovisiondata?api-version=2017-04-02"
         headers = {"Metadata": "true"}
+        nl_sock = None
         report_ready = bool(not os.path.isfile(REPORTED_READY_MARKER_FILE))
-        LOG.debug("Start polling IMDS")
 
         def exc_cb(msg, exception):
             if isinstance(exception, UrlError) and exception.code == 404:
@@ -533,12 +538,19 @@ class DataSourceAzure(sources.DataSource):
             # call DHCP and setup the ephemeral network to acquire the new IP.
             return False
 
+        LOG.debug("Wait for vnetswitch to happen")
         while True:
             try:
                 # Save our EphemeralDHCPv4 context so we avoid repeated dhcp
                 self._ephemeral_dhcp_ctx = EphemeralDHCPv4()
                 lease = self._ephemeral_dhcp_ctx.obtain_lease()
                 if report_ready:
+                    try:
+                        nl_sock = netlink.create_bound_netlink_socket()
+                    except netlink.NetlinkCreateSocketError as e:
+                        LOG.warning(e)
+                        self._ephemeral_dhcp_ctx.clean_network()
+                        return
                     path = REPORTED_READY_MARKER_FILE
                     LOG.info(
                         "Creating a marker file to report ready: %s", path)
@@ -546,13 +558,24 @@ class DataSourceAzure(sources.DataSource):
                         pid=os.getpid(), time=time()))
                     self._report_ready(lease=lease)
                     report_ready = False
-                return readurl(url, timeout=1, headers=headers,
-                               exception_cb=exc_cb, infinite=True,
-                               log_req_resp=False).contents
+                    try:
+                        netlink.wait_for_media_disconnect_connect(
+                            nl_sock, lease['interface'])
+                    except AssertionError as error:
+                        LOG.error(error)
+                        return
+                    self._ephemeral_dhcp_ctx.clean_network()
+                else:
+                    return readurl(url, timeout=1, headers=headers,
+                                   exception_cb=exc_cb, infinite=True,
+                                   log_req_resp=False).contents
             except UrlError:
                 # Teardown our EphemeralDHCPv4 context on failure as we retry
                 self._ephemeral_dhcp_ctx.clean_network()
                 pass
+            finally:
+                if nl_sock:
+                    nl_sock.close()
 
     def _report_ready(self, lease):
         """Tells the fabric provisioning has completed """
diff --git a/cloudinit/sources/helpers/netlink.py b/cloudinit/sources/helpers/netlink.py
new file mode 100644
index 0000000..d377ae3
--- /dev/null
+++ b/cloudinit/sources/helpers/netlink.py
@@ -0,0 +1,250 @@
1# Author: Tamilmani Manoharan <tamanoha@microsoft.com>
2#
3# This file is part of cloud-init. See LICENSE file for license information.
4
5from cloudinit import log as logging
6from cloudinit import util
7from collections import namedtuple
8
9import os
10import select
11import socket
12import struct
13
14LOG = logging.getLogger(__name__)
15
16# http://man7.org/linux/man-pages/man7/netlink.7.html
17RTMGRP_LINK = 1
18NLMSG_NOOP = 1
19NLMSG_ERROR = 2
20NLMSG_DONE = 3
21RTM_NEWLINK = 16
22RTM_DELLINK = 17
23RTM_GETLINK = 18
24RTM_SETLINK = 19
25MAX_SIZE = 65535
26RTA_DATA_OFFSET = 32
27MSG_TYPE_OFFSET = 16
28SELECT_TIMEOUT = 60
29
30NLMSGHDR_FMT = "IHHII"
31IFINFOMSG_FMT = "BHiII"
32NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)
33IFINFOMSG_SIZE = struct.calcsize(IFINFOMSG_FMT)
34RTATTR_START_OFFSET = NLMSGHDR_SIZE + IFINFOMSG_SIZE
35RTA_DATA_START_OFFSET = 4
36PAD_ALIGNMENT = 4
37
38IFLA_IFNAME = 3
39IFLA_OPERSTATE = 16
40
41# https://www.kernel.org/doc/Documentation/networking/operstates.txt
42OPER_UNKNOWN = 0
43OPER_NOTPRESENT = 1
44OPER_DOWN = 2
45OPER_LOWERLAYERDOWN = 3
46OPER_TESTING = 4
47OPER_DORMANT = 5
48OPER_UP = 6
49
50RTAAttr = namedtuple('RTAAttr', ['length', 'rta_type', 'data'])
51InterfaceOperstate = namedtuple('InterfaceOperstate', ['ifname', 'operstate'])
52NetlinkHeader = namedtuple('NetlinkHeader', ['length', 'type', 'flags', 'seq',
53 'pid'])
54
55
56class NetlinkCreateSocketError(RuntimeError):
57 '''Raised if netlink socket fails during create or bind.'''
58 pass
59
60
61def create_bound_netlink_socket():
62 '''Creates netlink socket and bind on netlink group to catch interface
63 down/up events. The socket will bound only on RTMGRP_LINK (which only
64 includes RTM_NEWLINK/RTM_DELLINK/RTM_GETLINK events). The socket is set to
65 non-blocking mode since we're only receiving messages.
66
67 :returns: netlink socket in non-blocking mode
68 :raises: NetlinkCreateSocketError
69 '''
70 try:
71 netlink_socket = socket.socket(socket.AF_NETLINK,
72 socket.SOCK_RAW,
73 socket.NETLINK_ROUTE)
74 netlink_socket.bind((os.getpid(), RTMGRP_LINK))
75 netlink_socket.setblocking(0)
76 except socket.error as e:
77 msg = "Exception during netlink socket create: %s" % e
78 raise NetlinkCreateSocketError(msg)
79 LOG.debug("Created netlink socket")
80 return netlink_socket
81
82
83def get_netlink_msg_header(data):
84 '''Gets netlink message type and length
85
86 :param: data read from netlink socket
87 :returns: netlink message type
88 :raises: AssertionError if data is None or data is not >= NLMSGHDR_SIZE
89 struct nlmsghdr {
90 __u32 nlmsg_len; /* Length of message including header */
91 __u16 nlmsg_type; /* Type of message content */
92 __u16 nlmsg_flags; /* Additional flags */
93 __u32 nlmsg_seq; /* Sequence number */
94 __u32 nlmsg_pid; /* Sender port ID */
95 };
96 '''
97 assert (data is not None), ("data is none")
98 assert (len(data) >= NLMSGHDR_SIZE), (
99 "data is smaller than netlink message header")
100 msg_len, msg_type, flags, seq, pid = struct.unpack(NLMSGHDR_FMT,
101 data[:MSG_TYPE_OFFSET])
102 LOG.debug("Got netlink msg of type %d", msg_type)
103 return NetlinkHeader(msg_len, msg_type, flags, seq, pid)
104
105
106def read_netlink_socket(netlink_socket, timeout=None):
107 '''Select and read from the netlink socket if ready.
108
109 :param: netlink_socket: specify which socket object to read from
110 :param: timeout: specify a timeout value (integer) to wait while reading,
111 if none, it will block indefinitely until socket ready for read
112 :returns: string of data read (max length = <MAX_SIZE>) from socket,
113 if no data read, returns None
114 :raises: AssertionError if netlink_socket is None
115 '''
116 assert (netlink_socket is not None), ("netlink socket is none")
117 read_set, _, _ = select.select([netlink_socket], [], [], timeout)
118 # Incase of timeout,read_set doesn't contain netlink socket.
119 # just return from this function
120 if netlink_socket not in read_set:
121 return None
122 LOG.debug("netlink socket ready for read")
123 data = netlink_socket.recv(MAX_SIZE)
124 if data is None:
125 LOG.error("Reading from Netlink socket returned no data")
126 return data
127
128
129def unpack_rta_attr(data, offset):
130 '''Unpack a single rta attribute.
131
132 :param: data: string of data read from netlink socket
133 :param: offset: starting offset of RTA Attribute
134 :return: RTAAttr object with length, type and data. On error, return None.
135 :raises: AssertionError if data is None or offset is not integer.
136 '''
137 assert (data is not None), ("data is none")
138 assert (type(offset) == int), ("offset is not integer")
139 assert (offset >= RTATTR_START_OFFSET), (
140 "rta offset is less than expected length")
141 length = rta_type = 0
142 attr_data = None
143 try:
144 length = struct.unpack_from("H", data, offset=offset)[0]
145 rta_type = struct.unpack_from("H", data, offset=offset+2)[0]
146 except struct.error:
147 return None # Should mean our offset is >= remaining data
148
149 # Unpack just the attribute's data. Offset by 4 to skip length/type header
150 attr_data = data[offset+RTA_DATA_START_OFFSET:offset+length]
151 return RTAAttr(length, rta_type, attr_data)
152
153
154def read_rta_oper_state(data):
155 '''Reads Interface name and operational state from RTA Data.
156
157 :param: data: string of data read from netlink socket
158 :returns: InterfaceOperstate object containing if_name and oper_state.
159 None if data does not contain valid IFLA_OPERSTATE and
160 IFLA_IFNAME messages.
161 :raises: AssertionError if data is None or length of data is
162 smaller than RTATTR_START_OFFSET.
163 '''
164 assert (data is not None), ("data is none")
165 assert (len(data) > RTATTR_START_OFFSET), (
166 "length of data is smaller than RTATTR_START_OFFSET")
167 ifname = operstate = None
168 offset = RTATTR_START_OFFSET
169 while offset <= len(data):
170 attr = unpack_rta_attr(data, offset)
171 if not attr or attr.length == 0:
172 break
173 # Each attribute is 4-byte aligned. Determine pad length.
174 padlen = (PAD_ALIGNMENT -
175 (attr.length % PAD_ALIGNMENT)) % PAD_ALIGNMENT
176 offset += attr.length + padlen
177
178 if attr.rta_type == IFLA_OPERSTATE:
179 operstate = ord(attr.data)
180 elif attr.rta_type == IFLA_IFNAME:
181 interface_name = util.decode_binary(attr.data, 'utf-8')
182 ifname = interface_name.strip('\0')
183 if not ifname or operstate is None:
184 return None
185 LOG.debug("rta attrs: ifname %s operstate %d", ifname, operstate)
186 return InterfaceOperstate(ifname, operstate)
187
188
189def wait_for_media_disconnect_connect(netlink_socket, ifname):
190 '''Block until media disconnect and connect has happened on an interface.
191 Listens on netlink socket to receive netlink events and when the carrier
192 changes from 0 to 1, it considers event has happened and
193 return from this function
194
195 :param: netlink_socket: netlink_socket to receive events
196 :param: ifname: Interface name to lookout for netlink events
197 :raises: AssertionError if netlink_socket is None or ifname is None.
198 '''
199 assert (netlink_socket is not None), ("netlink socket is none")
200 assert (ifname is not None), ("interface name is none")
201 assert (len(ifname) > 0), ("interface name cannot be empty")
202 carrier = OPER_UP
203 prevCarrier = OPER_UP
204 data = bytes()
205 LOG.debug("Wait for media disconnect and reconnect to happen")
206 while True:
207 recv_data = read_netlink_socket(netlink_socket, SELECT_TIMEOUT)
208 if recv_data is None:
209 continue
210 LOG.debug('read %d bytes from socket', len(recv_data))
211 data += recv_data
212 LOG.debug('Length of data after concat %d', len(data))
213 offset = 0
214 datalen = len(data)
215 while offset < datalen:
216 nl_msg = data[offset:]
217 if len(nl_msg) < NLMSGHDR_SIZE:
218 LOG.debug("Data is smaller than netlink header")
219 break
220 nlheader = get_netlink_msg_header(nl_msg)
221 if len(nl_msg) < nlheader.length:
222 LOG.debug("Partial data. Smaller than netlink message")
223 break
224 padlen = (nlheader.length+PAD_ALIGNMENT-1) & ~(PAD_ALIGNMENT-1)
225 offset = offset + padlen
226 LOG.debug('offset to next netlink message: %d', offset)
227 # Ignore any messages not new link or del link
228 if nlheader.type not in [RTM_NEWLINK, RTM_DELLINK]:
229 continue
230 interface_state = read_rta_oper_state(nl_msg)
231 if interface_state is None:
232 LOG.debug('Failed to read rta attributes: %s', interface_state)
233 continue
234 if interface_state.ifname != ifname:
235 LOG.debug(
236 "Ignored netlink event on interface %s. Waiting for %s.",
237 interface_state.ifname, ifname)
238 continue
239 if interface_state.operstate not in [OPER_UP, OPER_DOWN]:
240 continue
241 prevCarrier = carrier
242 carrier = interface_state.operstate
243 # check for carrier down, up sequence
244 isVnetSwitch = (prevCarrier == OPER_DOWN) and (carrier == OPER_UP)
245 if isVnetSwitch:
246 LOG.debug("Media switch happened on %s.", ifname)
247 return
248 data = data[offset:]
249
250# vi: ts=4 expandtab
diff --git a/cloudinit/sources/helpers/tests/test_netlink.py b/cloudinit/sources/helpers/tests/test_netlink.py
new file mode 100644
index 0000000..c2898a1
--- /dev/null
+++ b/cloudinit/sources/helpers/tests/test_netlink.py
@@ -0,0 +1,373 @@
1# Author: Tamilmani Manoharan <tamanoha@microsoft.com>
2#
3# This file is part of cloud-init. See LICENSE file for license information.
4
5from cloudinit.tests.helpers import CiTestCase, mock
6import socket
7import struct
8import codecs
9from cloudinit.sources.helpers.netlink import (
10 NetlinkCreateSocketError, create_bound_netlink_socket, read_netlink_socket,
11 read_rta_oper_state, unpack_rta_attr, wait_for_media_disconnect_connect,
12 OPER_DOWN, OPER_UP, OPER_DORMANT, OPER_LOWERLAYERDOWN, OPER_NOTPRESENT,
13 OPER_TESTING, OPER_UNKNOWN, RTATTR_START_OFFSET, RTM_NEWLINK, RTM_SETLINK,
14 RTM_GETLINK, MAX_SIZE)
15
16
17def int_to_bytes(i):
18 '''convert integer to binary: eg: 1 to \x01'''
19 hex_value = '{0:x}'.format(i)
20 hex_value = '0' * (len(hex_value) % 2) + hex_value
21 return codecs.decode(hex_value, 'hex_codec')
22
23
24class TestCreateBoundNetlinkSocket(CiTestCase):
25
26 @mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
27 def test_socket_error_on_create(self, m_socket):
28 '''create_bound_netlink_socket catches socket creation exception'''
29
30 """NetlinkCreateSocketError is raised when socket creation errors."""
31 m_socket.side_effect = socket.error("Fake socket failure")
32 with self.assertRaises(NetlinkCreateSocketError) as ctx_mgr:
33 create_bound_netlink_socket()
34 self.assertEqual(
35 'Exception during netlink socket create: Fake socket failure',
36 str(ctx_mgr.exception))
37
38
39class TestReadNetlinkSocket(CiTestCase):
40
41 @mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
42 @mock.patch('cloudinit.sources.helpers.netlink.select.select')
43 def test_read_netlink_socket(self, m_select, m_socket):
44 '''read_netlink_socket able to receive data'''
45 data = 'netlinktest'
46 m_select.return_value = [m_socket], None, None
47 m_socket.recv.return_value = data
48 recv_data = read_netlink_socket(m_socket, 2)
49 m_select.assert_called_with([m_socket], [], [], 2)
50 m_socket.recv.assert_called_with(MAX_SIZE)
51 self.assertIsNotNone(recv_data)
52 self.assertEqual(recv_data, data)
53
54 @mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
55 @mock.patch('cloudinit.sources.helpers.netlink.select.select')
56 def test_netlink_read_timeout(self, m_select, m_socket):
57 '''read_netlink_socket should timeout if nothing to read'''
58 m_select.return_value = [], None, None
59 data = read_netlink_socket(m_socket, 1)
60 m_select.assert_called_with([m_socket], [], [], 1)
61 self.assertEqual(m_socket.recv.call_count, 0)
62 self.assertIsNone(data)
63
64 def test_read_invalid_socket(self):
65 '''read_netlink_socket raises assert error if socket is invalid'''
66 socket = None
67 with self.assertRaises(AssertionError) as context:
68 read_netlink_socket(socket, 1)
69 self.assertTrue('netlink socket is none' in str(context.exception))
70
71
72class TestParseNetlinkMessage(CiTestCase):
73
74 def test_read_rta_oper_state(self):
75 '''read_rta_oper_state could parse netlink message and extract data'''
76 ifname = "eth0"
77 bytes = ifname.encode("utf-8")
78 buf = bytearray(48)
79 struct.pack_into("HH4sHHc", buf, RTATTR_START_OFFSET, 8, 3, bytes, 5,
80 16, int_to_bytes(OPER_DOWN))
81 interface_state = read_rta_oper_state(buf)
82 self.assertEqual(interface_state.ifname, ifname)
83 self.assertEqual(interface_state.operstate, OPER_DOWN)
84
85 def test_read_none_data(self):
86 '''read_rta_oper_state raises assert error if data is none'''
87 data = None
88 with self.assertRaises(AssertionError) as context:
89 read_rta_oper_state(data)
90 self.assertTrue('data is none', str(context.exception))
91
92 def test_read_invalid_rta_operstate_none(self):
93 '''read_rta_oper_state returns none if operstate is none'''
94 ifname = "eth0"
95 buf = bytearray(40)
96 bytes = ifname.encode("utf-8")
97 struct.pack_into("HH4s", buf, RTATTR_START_OFFSET, 8, 3, bytes)
98 interface_state = read_rta_oper_state(buf)
99 self.assertIsNone(interface_state)
100
101 def test_read_invalid_rta_ifname_none(self):
102 '''read_rta_oper_state returns none if ifname is none'''
103 buf = bytearray(40)
104 struct.pack_into("HHc", buf, RTATTR_START_OFFSET, 5, 16,
105 int_to_bytes(OPER_DOWN))
106 interface_state = read_rta_oper_state(buf)
107 self.assertIsNone(interface_state)
108
109 def test_read_invalid_data_len(self):
110 '''raise assert error if data size is smaller than required size'''
111 buf = bytearray(32)
112 with self.assertRaises(AssertionError) as context:
113 read_rta_oper_state(buf)
114 self.assertTrue('length of data is smaller than RTATTR_START_OFFSET' in
115 str(context.exception))
116
117 def test_unpack_rta_attr_none_data(self):
118 '''unpack_rta_attr raises assert error if data is none'''
119 data = None
120 with self.assertRaises(AssertionError) as context:
121 unpack_rta_attr(data, RTATTR_START_OFFSET)
122 self.assertTrue('data is none' in str(context.exception))
123
124 def test_unpack_rta_attr_invalid_offset(self):
125 '''unpack_rta_attr raises assert error if offset is invalid'''
126 data = bytearray(48)
127 with self.assertRaises(AssertionError) as context:
128 unpack_rta_attr(data, "offset")
129 self.assertTrue('offset is not integer' in str(context.exception))
130 with self.assertRaises(AssertionError) as context:
131 unpack_rta_attr(data, 31)
132 self.assertTrue('rta offset is less than expected length' in
133 str(context.exception))
134
135
136@mock.patch('cloudinit.sources.helpers.netlink.socket.socket')
137@mock.patch('cloudinit.sources.helpers.netlink.read_netlink_socket')
138class TestWaitForMediaDisconnectConnect(CiTestCase):
139 with_logs = True
140
141 def _media_switch_data(self, ifname, msg_type, operstate):
142 '''construct netlink data with specified fields'''
143 if ifname and operstate is not None:
144 data = bytearray(48)
145 bytes = ifname.encode("utf-8")
146 struct.pack_into("HH4sHHc", data, RTATTR_START_OFFSET, 8, 3,
147 bytes, 5, 16, int_to_bytes(operstate))
148 elif ifname:
149 data = bytearray(40)
150 bytes = ifname.encode("utf-8")
151 struct.pack_into("HH4s", data, RTATTR_START_OFFSET, 8, 3, bytes)
152 elif operstate:
153 data = bytearray(40)
154 struct.pack_into("HHc", data, RTATTR_START_OFFSET, 5, 16,
155 int_to_bytes(operstate))
156 struct.pack_into("=LHHLL", data, 0, len(data), msg_type, 0, 0, 0)
157 return data
158
159 def test_media_down_up_scenario(self, m_read_netlink_socket,
160 m_socket):
161 '''Test for media down up sequence for required interface name'''
162 ifname = "eth0"
163 # construct data for Oper State down
164 data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
165 # construct data for Oper State up
166 data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
167 m_read_netlink_socket.side_effect = [data_op_down, data_op_up]
168 wait_for_media_disconnect_connect(m_socket, ifname)
169 self.assertEqual(m_read_netlink_socket.call_count, 2)
170
171 def test_wait_for_media_switch_diff_interface(self, m_read_netlink_socket,
172 m_socket):
173 '''wait_for_media_disconnect_connect ignores unexpected interfaces.
174
175 The first two messages are for other interfaces and last two are for
176 expected interface. So the function exit only after receiving last
177 2 messages and therefore the call count for m_read_netlink_socket
178 has to be 4
179 '''
180 other_ifname = "eth1"
181 expected_ifname = "eth0"
182 data_op_down_eth1 = self._media_switch_data(
183 other_ifname, RTM_NEWLINK, OPER_DOWN)
184 data_op_up_eth1 = self._media_switch_data(
185 other_ifname, RTM_NEWLINK, OPER_UP)
186 data_op_down_eth0 = self._media_switch_data(
187 expected_ifname, RTM_NEWLINK, OPER_DOWN)
188 data_op_up_eth0 = self._media_switch_data(
189 expected_ifname, RTM_NEWLINK, OPER_UP)
190 m_read_netlink_socket.side_effect = [data_op_down_eth1,
191 data_op_up_eth1,
192 data_op_down_eth0,
193 data_op_up_eth0]
194 wait_for_media_disconnect_connect(m_socket, expected_ifname)
195 self.assertIn('Ignored netlink event on interface %s' % other_ifname,
196 self.logs.getvalue())
197 self.assertEqual(m_read_netlink_socket.call_count, 4)
198
199 def test_invalid_msgtype_getlink(self, m_read_netlink_socket, m_socket):
200 '''wait_for_media_disconnect_connect ignores GETLINK events.
201
202 The first two messages are for oper down and up for RTM_GETLINK type
203 which netlink module will ignore. The last 2 messages are RTM_NEWLINK
204 with oper state down and up messages. Therefore the call count for
205 m_read_netlink_socket has to be 4 ignoring first 2 messages
206 of RTM_GETLINK
207 '''
208 ifname = "eth0"
209 data_getlink_down = self._media_switch_data(
210 ifname, RTM_GETLINK, OPER_DOWN)
211 data_getlink_up = self._media_switch_data(
212 ifname, RTM_GETLINK, OPER_UP)
213 data_newlink_down = self._media_switch_data(
214 ifname, RTM_NEWLINK, OPER_DOWN)
215 data_newlink_up = self._media_switch_data(
216 ifname, RTM_NEWLINK, OPER_UP)
217 m_read_netlink_socket.side_effect = [data_getlink_down,
218 data_getlink_up,
219 data_newlink_down,
220 data_newlink_up]
221 wait_for_media_disconnect_connect(m_socket, ifname)
222 self.assertEqual(m_read_netlink_socket.call_count, 4)
223
224 def test_invalid_msgtype_setlink(self, m_read_netlink_socket, m_socket):
225 '''wait_for_media_disconnect_connect ignores SETLINK events.
226
227 The first two messages are for oper down and up for RTM_GETLINK type
228 which it will ignore. 3rd and 4th messages are RTM_NEWLINK with down
229 and up messages. This function should exit after 4th messages since it
230 sees down->up scenario. So the call count for m_read_netlink_socket
231 has to be 4 ignoring first 2 messages of RTM_GETLINK and
232 last 2 messages of RTM_NEWLINK
233 '''
234 ifname = "eth0"
235 data_setlink_down = self._media_switch_data(
236 ifname, RTM_SETLINK, OPER_DOWN)
237 data_setlink_up = self._media_switch_data(
238 ifname, RTM_SETLINK, OPER_UP)
239 data_newlink_down = self._media_switch_data(
240 ifname, RTM_NEWLINK, OPER_DOWN)
241 data_newlink_up = self._media_switch_data(
242 ifname, RTM_NEWLINK, OPER_UP)
243 m_read_netlink_socket.side_effect = [data_setlink_down,
244 data_setlink_up,
245 data_newlink_down,
246 data_newlink_up,
247 data_newlink_down,
248 data_newlink_up]
249 wait_for_media_disconnect_connect(m_socket, ifname)
250 self.assertEqual(m_read_netlink_socket.call_count, 4)
251
252 def test_netlink_invalid_switch_scenario(self, m_read_netlink_socket,
253 m_socket):
254 '''returns only if it receives UP event after a DOWN event'''
255 ifname = "eth0"
256 data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
257 data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
258 data_op_dormant = self._media_switch_data(ifname, RTM_NEWLINK,
259 OPER_DORMANT)
260 data_op_notpresent = self._media_switch_data(ifname, RTM_NEWLINK,
261 OPER_NOTPRESENT)
262 data_op_lowerdown = self._media_switch_data(ifname, RTM_NEWLINK,
263 OPER_LOWERLAYERDOWN)
264 data_op_testing = self._media_switch_data(ifname, RTM_NEWLINK,
265 OPER_TESTING)
266 data_op_unknown = self._media_switch_data(ifname, RTM_NEWLINK,
267 OPER_UNKNOWN)
268 m_read_netlink_socket.side_effect = [data_op_up, data_op_up,
269 data_op_dormant, data_op_up,
270 data_op_notpresent, data_op_up,
271 data_op_lowerdown, data_op_up,
272 data_op_testing, data_op_up,
273 data_op_unknown, data_op_up,
274 data_op_down, data_op_up]
275 wait_for_media_disconnect_connect(m_socket, ifname)
276 self.assertEqual(m_read_netlink_socket.call_count, 14)
277
278 def test_netlink_valid_inbetween_transitions(self, m_read_netlink_socket,
279 m_socket):
280 '''wait_for_media_disconnect_connect handles in between transitions'''
281 ifname = "eth0"
282 data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
283 data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
284 data_op_dormant = self._media_switch_data(ifname, RTM_NEWLINK,
285 OPER_DORMANT)
286 data_op_unknown = self._media_switch_data(ifname, RTM_NEWLINK,
287 OPER_UNKNOWN)
288 m_read_netlink_socket.side_effect = [data_op_down, data_op_dormant,
289 data_op_unknown, data_op_up]
290 wait_for_media_disconnect_connect(m_socket, ifname)
291 self.assertEqual(m_read_netlink_socket.call_count, 4)
292
293 def test_netlink_invalid_operstate(self, m_read_netlink_socket, m_socket):
294 '''wait_for_media_disconnect_connect should handle invalid operstates.
295
296 The function should not fail and return even if it receives invalid
297 operstates. It always should wait for down up sequence.
298 '''
299 ifname = "eth0"
300 data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
301 data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
302 data_op_invalid = self._media_switch_data(ifname, RTM_NEWLINK, 7)
303 m_read_netlink_socket.side_effect = [data_op_invalid, data_op_up,
304 data_op_down, data_op_invalid,
305 data_op_up]
306 wait_for_media_disconnect_connect(m_socket, ifname)
307 self.assertEqual(m_read_netlink_socket.call_count, 5)
308
309 def test_wait_invalid_socket(self, m_read_netlink_socket, m_socket):
310 '''wait_for_media_disconnect_connect handle none netlink socket.'''
311 socket = None
312 ifname = "eth0"
313 with self.assertRaises(AssertionError) as context:
314 wait_for_media_disconnect_connect(socket, ifname)
315 self.assertTrue('netlink socket is none' in str(context.exception))
316
317 def test_wait_invalid_ifname(self, m_read_netlink_socket, m_socket):
318 '''wait_for_media_disconnect_connect handle none interface name'''
319 ifname = None
320 with self.assertRaises(AssertionError) as context:
321 wait_for_media_disconnect_connect(m_socket, ifname)
322 self.assertTrue('interface name is none' in str(context.exception))
323 ifname = ""
324 with self.assertRaises(AssertionError) as context:
325 wait_for_media_disconnect_connect(m_socket, ifname)
326 self.assertTrue('interface name cannot be empty' in
327 str(context.exception))
328
329 def test_wait_invalid_rta_attr(self, m_read_netlink_socket, m_socket):
330 ''' wait_for_media_disconnect_connect handles invalid rta data'''
331 ifname = "eth0"
332 data_invalid1 = self._media_switch_data(None, RTM_NEWLINK, OPER_DOWN)
333 data_invalid2 = self._media_switch_data(ifname, RTM_NEWLINK, None)
334 data_op_down = self._media_switch_data(ifname, RTM_NEWLINK, OPER_DOWN)
335 data_op_up = self._media_switch_data(ifname, RTM_NEWLINK, OPER_UP)
336 m_read_netlink_socket.side_effect = [data_invalid1, data_invalid2,
337 data_op_down, data_op_up]
338 wait_for_media_disconnect_connect(m_socket, ifname)
339 self.assertEqual(m_read_netlink_socket.call_count, 4)
340
341 def test_read_multiple_netlink_msgs(self, m_read_netlink_socket, m_socket):
342 '''Read multiple messages in single receive call'''
343 ifname = "eth0"
344 bytes = ifname.encode("utf-8")
345 data = bytearray(96)
346 struct.pack_into("=LHHLL", data, 0, 48, RTM_NEWLINK, 0, 0, 0)
347 struct.pack_into("HH4sHHc", data, RTATTR_START_OFFSET, 8, 3,
348 bytes, 5, 16, int_to_bytes(OPER_DOWN))
349 struct.pack_into("=LHHLL", data, 48, 48, RTM_NEWLINK, 0, 0, 0)
350 struct.pack_into("HH4sHHc", data, 48 + RTATTR_START_OFFSET, 8,
351 3, bytes, 5, 16, int_to_bytes(OPER_UP))
352 m_read_netlink_socket.return_value = data
353 wait_for_media_disconnect_connect(m_socket, ifname)
354 self.assertEqual(m_read_netlink_socket.call_count, 1)
355
356 def test_read_partial_netlink_msgs(self, m_read_netlink_socket, m_socket):
357 '''Read partial messages in receive call'''
358 ifname = "eth0"
359 bytes = ifname.encode("utf-8")
360 data1 = bytearray(112)
361 data2 = bytearray(32)
362 struct.pack_into("=LHHLL", data1, 0, 48, RTM_NEWLINK, 0, 0, 0)
363 struct.pack_into("HH4sHHc", data1, RTATTR_START_OFFSET, 8, 3,
364 bytes, 5, 16, int_to_bytes(OPER_DOWN))
365 struct.pack_into("=LHHLL", data1, 48, 48, RTM_NEWLINK, 0, 0, 0)
366 struct.pack_into("HH4sHHc", data1, 80, 8, 3, bytes, 5, 16,
367 int_to_bytes(OPER_DOWN))
368 struct.pack_into("=LHHLL", data1, 96, 48, RTM_NEWLINK, 0, 0, 0)
369 struct.pack_into("HH4sHHc", data2, 16, 8, 3, bytes, 5, 16,
370 int_to_bytes(OPER_UP))
371 m_read_netlink_socket.side_effect = [data1, data2]
372 wait_for_media_disconnect_connect(m_socket, ifname)
373 self.assertEqual(m_read_netlink_socket.call_count, 2)
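The tests above hand-pack raw rtnetlink messages: a 16-byte nlmsghdr (packed with "=LHHLL"), the ifinfomsg, and then two rtattr entries, IFLA_IFNAME (type 3) carrying the interface name and IFLA_OPERSTATE (type 16) carrying the operational state byte. The snippet below is an illustrative sketch of that layout, not part of the proposed helper; the constants are the standard Linux rtnetlink values, and RTATTR_START_OFFSET = 32 is an assumption derived from the 48-byte messages used in these tests (16-byte nlmsghdr plus 16-byte ifinfomsg).

import struct

# Standard rtnetlink constants (linux/rtnetlink.h, linux/if_link.h).
RTM_NEWLINK = 16
IFLA_IFNAME = 3
IFLA_OPERSTATE = 16
OPER_DOWN, OPER_UP = 2, 6
# Assumed offset of the first rtattr: 16-byte nlmsghdr + 16-byte ifinfomsg.
RTATTR_START_OFFSET = 32

def fake_media_switch_msg(ifname, operstate, msg_len=48):
    """Pack one RTM_NEWLINK message the same way the tests above do."""
    buf = bytearray(msg_len)
    # nlmsghdr: message length, type, flags, sequence number, pid.
    struct.pack_into("=LHHLL", buf, 0, msg_len, RTM_NEWLINK, 0, 0, 0)
    # Two rtattrs: (len=8, IFLA_IFNAME, name) and (len=5, IFLA_OPERSTATE, state).
    struct.pack_into("HH4sHHc", buf, RTATTR_START_OFFSET, 8, IFLA_IFNAME,
                     ifname.encode("utf-8"), 5, IFLA_OPERSTATE,
                     bytes([operstate]))
    return buf

# A fake_media_switch_msg(ifname, OPER_DOWN) followed by
# fake_media_switch_msg(ifname, OPER_UP) is the disconnect/connect
# sequence that wait_for_media_disconnect_connect() blocks on.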
diff --git a/tests/unittests/test_datasource/test_azure.py b/tests/unittests/test_datasource/test_azure.py
index 5ea7ae5..417d86a 100644
--- a/tests/unittests/test_datasource/test_azure.py
+++ b/tests/unittests/test_datasource/test_azure.py
@@ -564,6 +564,8 @@ fdescfs /dev/fd fdescfs rw 0 0
         self.assertEqual(1, report_ready_func.call_count)
 
     @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
+    @mock.patch('cloudinit.sources.helpers.netlink.'
+                'wait_for_media_disconnect_connect')
     @mock.patch(
         'cloudinit.sources.DataSourceAzure.DataSourceAzure._report_ready')
     @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
@@ -572,7 +574,7 @@ fdescfs /dev/fd fdescfs rw 0 0
     def test_crawl_metadata_on_reprovision_reports_ready_using_lease(
             self, m_readurl, m_dhcp,
             m_net, report_ready_func,
-            m_write):
+            m_media_switch, m_write):
         """If reprovisioning, report ready using the obtained lease"""
         ovfenv = construct_valid_ovf_env(
             platform_settings={"PreprovisionedVm": "True"})
@@ -586,6 +588,7 @@ fdescfs /dev/fd fdescfs rw 0 0
             'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
             'unknown-245': '624c3620'}
         m_dhcp.return_value = [lease]
+        m_media_switch.return_value = None
 
         reprovision_ovfenv = construct_valid_ovf_env()
         m_readurl.return_value = url_helper.StringResponse(
@@ -1676,6 +1679,8 @@ class TestPreprovisioningShouldReprovision(CiTestCase):
 
 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
+@mock.patch('cloudinit.sources.helpers.netlink.'
+            'wait_for_media_disconnect_connect')
 @mock.patch('requests.Session.request')
 @mock.patch(MOCKPATH + 'DataSourceAzure._report_ready')
 class TestPreprovisioningPollIMDS(CiTestCase):
@@ -1689,7 +1694,8 @@ class TestPreprovisioningPollIMDS(CiTestCase):
 
     @mock.patch(MOCKPATH + 'EphemeralDHCPv4')
     def test_poll_imds_re_dhcp_on_timeout(self, m_dhcpv4, report_ready_func,
-                                          fake_resp, m_dhcp, m_net):
+                                          fake_resp, m_media_switch, m_dhcp,
+                                          m_net):
         """The poll_imds will retry DHCP on IMDS timeout."""
         report_file = self.tmp_path('report_marker', self.tmp)
         lease = {
@@ -1697,7 +1703,7 @@ class TestPreprovisioningPollIMDS(CiTestCase):
             'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
             'unknown-245': '624c3620'}
         m_dhcp.return_value = [lease]
-
+        m_media_switch.return_value = None
         dhcp_ctx = mock.MagicMock(lease=lease)
         dhcp_ctx.obtain_lease.return_value = lease
         m_dhcpv4.return_value = dhcp_ctx
@@ -1723,11 +1729,12 @@ class TestPreprovisioningPollIMDS(CiTestCase):
             dsa._poll_imds()
         self.assertEqual(report_ready_func.call_count, 1)
         report_ready_func.assert_called_with(lease=lease)
-        self.assertEqual(2, m_dhcpv4.call_count, 'Expected 2 DHCP calls')
+        self.assertEqual(3, m_dhcpv4.call_count, 'Expected 3 DHCP calls')
         self.assertEqual(3, self.tries, 'Expected 3 total reads from IMDS')
 
-    def test_poll_imds_report_ready_false(self, report_ready_func,
-                                          fake_resp, m_dhcp, m_net):
+    def test_poll_imds_report_ready_false(self,
+                                          report_ready_func, fake_resp,
+                                          m_media_switch, m_dhcp, m_net):
         """The poll_imds should not call reporting ready
         when flag is false"""
         report_file = self.tmp_path('report_marker', self.tmp)
@@ -1736,6 +1743,7 @@ class TestPreprovisioningPollIMDS(CiTestCase):
             'interface': 'eth9', 'fixed-address': '192.168.2.9',
             'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
             'unknown-245': '624c3620'}]
+        m_media_switch.return_value = None
         dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
         with mock.patch(MOCKPATH + 'REPORTED_READY_MARKER_FILE', report_file):
             dsa._poll_imds()
@@ -1745,6 +1753,8 @@ class TestPreprovisioningPollIMDS(CiTestCase):
 @mock.patch(MOCKPATH + 'util.subp')
 @mock.patch(MOCKPATH + 'util.write_file')
 @mock.patch(MOCKPATH + 'util.is_FreeBSD')
+@mock.patch('cloudinit.sources.helpers.netlink.'
+            'wait_for_media_disconnect_connect')
 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
 @mock.patch('requests.Session.request')
@@ -1757,10 +1767,13 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
         self.paths = helpers.Paths({'cloud_dir': tmp})
         dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d
 
-    def test_poll_imds_returns_ovf_env(self, fake_resp, m_dhcp, m_net,
+    def test_poll_imds_returns_ovf_env(self, fake_resp,
+                                       m_dhcp, m_net,
+                                       m_media_switch,
                                        m_is_bsd, write_f, subp):
         """The _poll_imds method should return the ovf_env.xml."""
         m_is_bsd.return_value = False
+        m_media_switch.return_value = None
         m_dhcp.return_value = [{
             'interface': 'eth9', 'fixed-address': '192.168.2.9',
             'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0'}]
@@ -1778,16 +1791,19 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
                 'Cloud-Init/%s' % vs()
             }, method='GET', timeout=1,
             url=full_url)])
-        self.assertEqual(m_dhcp.call_count, 1)
+        self.assertEqual(m_dhcp.call_count, 2)
         m_net.assert_any_call(
             broadcast='192.168.2.255', interface='eth9', ip='192.168.2.9',
             prefix_or_mask='255.255.255.0', router='192.168.2.1')
-        self.assertEqual(m_net.call_count, 1)
+        self.assertEqual(m_net.call_count, 2)
 
-    def test__reprovision_calls__poll_imds(self, fake_resp, m_dhcp, m_net,
+    def test__reprovision_calls__poll_imds(self, fake_resp,
+                                           m_dhcp, m_net,
+                                           m_media_switch,
                                            m_is_bsd, write_f, subp):
         """The _reprovision method should call poll IMDS."""
         m_is_bsd.return_value = False
+        m_media_switch.return_value = None
         m_dhcp.return_value = [{
             'interface': 'eth9', 'fixed-address': '192.168.2.9',
             'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
@@ -1811,11 +1827,11 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
                 'User-Agent':
                 'Cloud-Init/%s' % vs()},
             method='GET', timeout=1, url=full_url)])
-        self.assertEqual(m_dhcp.call_count, 1)
+        self.assertEqual(m_dhcp.call_count, 2)
         m_net.assert_any_call(
             broadcast='192.168.2.255', interface='eth9', ip='192.168.2.9',
             prefix_or_mask='255.255.255.0', router='192.168.2.1')
-        self.assertEqual(m_net.call_count, 1)
+        self.assertEqual(m_net.call_count, 2)
 
 
 class TestRemoveUbuntuNetworkConfigScripts(CiTestCase):
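Taken together, the changes to these tests mirror the new pre-provisioning flow: wait_for_media_disconnect_connect is mocked to return None, and one additional DHCP setup is expected because networking is re-established after the media change. The sketch below is a simplified illustration of that flow, not the actual DataSourceAzure._poll_imds code; netlink_socket, report_ready and read_imds are stand-in parameters, while wait_for_media_disconnect_connect and EphemeralDHCPv4 are the names exercised by the tests.

from cloudinit.net.dhcp import EphemeralDHCPv4
from cloudinit.sources.helpers.netlink import wait_for_media_disconnect_connect

def poll_imds_sketch(netlink_socket, report_ready, read_imds, ifname="eth0"):
    # First ephemeral DHCP context: obtain a lease on the pre-provisioning
    # vnet and report ready so the platform knows the VM is parked.
    # (Assumes EphemeralDHCPv4's context manager yields the obtained lease.)
    with EphemeralDHCPv4() as lease:
        report_ready(lease=lease)
    # Block until the host disconnects and reconnects the NIC, the signal
    # that the VM has been moved onto the customer's virtual network.
    wait_for_media_disconnect_connect(netlink_socket, ifname)
    # The old lease belonged to the old vnet; re-DHCP (the additional call
    # the updated tests expect) and fetch the reprovisioning data from IMDS.
    with EphemeralDHCPv4():
        return read_imds()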
