Code review comment for ~dojordan/cloud-init:azure-preprovisioning

Douglas Jordan (dojordan) wrote :

Thanks for the comments, Scott.

The reason for bounce_network_with_azure_hostname is actually to get the new IP address. We can't use the DigitalOcean IPv4 link-local approach, because our IMDS server identifies which VM is talking to it by the MAC address and the IP address. When the control plane updates the DHCP server with the new IP, the VM has to trigger DHCP and acquire the new address. In our polling loop, we call bounce_network_with_azure_hostname to trigger DHCP when we get an exception, because IMDS will not even handle our request if the MAC/IP does not match what it expects. On Windows, we get around this by just disconnecting and reconnecting the NIC; on Ubuntu 16.04, however, we observed that the link had to be disconnected for a very long time (>10s) for this behavior to occur. Hopefully, with systemd-networkd, this issue will be fixed. For now, we are targeting 16.04 for pre-provisioning, and we will continue to test artful/bionic with the link state disconnect/reconnect approach.
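To make the polling loop concrete, here is a rough sketch of the retry pattern described above. It is not the code in the branch: `poll_until_reachable`, `fetch`, and `renew_lease` are hypothetical names, with `fetch` standing in for the IMDS request and `renew_lease` standing in for bounce_network_with_azure_hostname. The attempt limit and delay are illustrative assumptions as well.

```python
import time


def poll_until_reachable(fetch, renew_lease, max_attempts=10, delay=1.0,
                         sleep=time.sleep):
    """Call fetch() until it succeeds, renewing the DHCP lease after a failure.

    fetch and renew_lease are hypothetical callables supplied by the caller;
    they are placeholders, not cloud-init APIs.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as err:
            # The request fails while the MAC/IP pair is not the expected one.
            last_error = err
            if attempt == max_attempts:
                break
            # Trigger DHCP so the VM can pick up the newly assigned address.
            renew_lease()
            sleep(delay)
    raise RuntimeError(
        "metadata service unreachable after %d attempts" % max_attempts
    ) from last_error
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without real waiting.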

> There are some comments in line.
> I'm not sure I fully understand all of this.
>
> I could be wrong here, but I think you're using
> bounce_network_with_azure_hostname in order to interact with IMDS_URL.
>
> We have 2 ways now of bringing up ephemeral networking for this purpose:
> a.) Chad has recently added code in the EC2 metadata service using
> cloudinit/net/dhcp.py .
> b.) the DigitalOcean datasource has code to negotiate an IPv4 link-local
> address.
>
> If 'b' is sufficient, I'd prefer that, but either one I'd prefer to the
> bounce_network_ which I think may actually not work for you if you rebased to
> trunk.
>
> As mentioned in IRC, I'm still concerned about systemd giving up and deciding
> that boot has failed after some amount of time polling on a metadata service.
> As Douglas pointed out, cloud-init has timeouts set to 0 and is a 'oneshot',
> so *its* timeout is not an issue, but I think that things that it runs
> 'Before' or (pre-networking or other) might end up timing out.
