Merge ~chad.smith/cloud-init:aws-local-dhcp into cloud-init:master
| Status: | Merged |
|---|---|
| Merged at revision: | d5f855dd96ccbea77f61b0515b574ad2c43d116d |
| Proposed branch: | ~chad.smith/cloud-init:aws-local-dhcp |
| Merge into: | cloud-init:master |
| Prerequisite: | ~chad.smith/cloud-init:unittests-in-cloudinit-package |
| Diff against target: | 901 lines (+511/-91), 11 files modified |

Files modified:
cloudinit/net/__init__.py (+20/-33)
cloudinit/net/dhcp.py (+119/-0)
cloudinit/net/tests/test_dhcp.py (+144/-0)
cloudinit/net/tests/test_init.py (+1/-1)
cloudinit/sources/DataSourceAliYun.py (+6/-3)
cloudinit/sources/DataSourceEc2.py (+99/-22)
tests/unittests/helpers.py (+1/-1)
tests/unittests/test_datasource/test_aliyun.py (+6/-5)
tests/unittests/test_datasource/test_common.py (+1/-0)
tests/unittests/test_datasource/test_ec2.py (+112/-24)
tox.ini (+2/-2)
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Andrew Jorgensen (community) | | 2017-08-04 | Approve on 2017-08-07 |
| Server Team CI bot | continuous-integration | | Approve on 2017-08-07 |
| Scott Moser | | 2017-07-28 | Approve on 2017-08-04 |
Commit Message
ec2: Allow Ec2 to run in init-local using dhclient in a sandbox.
This branch is a prerequisite for IPv6 support in AWS: it allows the Ec2 datasource to query metadata version 2016-09-02 about whether or not it needs to configure IPv6 on interfaces. If version 2016-09-02 is not present, fall back to the min_metadata_
To query AWS' metadata address at 169.254.169.254, the instance must have a dhcp-allocated address configured. Configuring IPv4 link-local addresses results in timeouts from the metadata service. We introduced a DataSourceEc2Local subclass which performs a sandboxed dhclient discovery to obtain an authorized IP address on eth0 and then crawls metadata about the full instance network configuration.
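As a rough illustration of the sandboxed discovery described above, the sketch below builds a dhclient command line that keeps its lease and pid files in a scratch directory, and parses the lease file that dhclient would write. The function names, the file names, and the use of `-sf /bin/true` (so dhclient does not run its usual interface configuration script) are assumptions for illustration, not necessarily what this branch implements.

```python
import os
import re


def build_sandboxed_dhclient_cmd(dhclient_path, interface, workdir):
    """Return a dhclient argv that keeps all state inside workdir.

    Hypothetical helper: -1 tries once, -lf/-pf redirect the lease and
    pid files, -sf /bin/true replaces the configuration script.
    """
    lease_file = os.path.join(workdir, "dhcp.leases")
    pid_file = os.path.join(workdir, "dhclient.pid")
    return [dhclient_path, "-1", "-v", "-lf", lease_file,
            "-pf", pid_file, interface, "-sf", "/bin/true"]


def parse_leases(text):
    """Parse 'lease { ... }' blocks from a dhclient lease file into dicts."""
    leases = []
    for block in re.findall(r"lease\s*\{(.*?)\}", text, re.DOTALL):
        lease = {}
        for line in block.splitlines():
            line = line.strip().rstrip(";")
            if not line:
                continue
            # "option routers 10.0.0.1" is stored under the key "routers".
            if line.startswith("option "):
                line = line[len("option "):]
            key, _, value = line.partition(" ")
            lease[key] = value.strip().strip('"')
        leases.append(lease)
    return leases
```

The last lease in the parsed list would carry the address (`fixed-address`), netmask, and router needed to bring up the interface temporarily before crawling metadata.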
Since ec2 IPv6 metadata is not sufficient in itself to tell us all the IPv6 knowledge we need, it will only be used as a boolean to tell us which nics need IPv6. Cloud-init will then configure the desired interfaces for DHCPv6 versus DHCPv4.
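A minimal sketch of that boolean decision, assuming the per-interface metadata is a dict keyed by MAC address and that a non-empty `ipv6s` entry marks a nic as needing IPv6. The function name, the `ipv6s` key, and the version 1 network config layout used here are illustrative assumptions, not the code in this branch.

```python
def network_config_from_macs(macs_metadata, mac_to_name):
    """Build a network config dict choosing dhcp6 or dhcp4 per nic.

    macs_metadata: {mac: {...}} per-interface metadata (assumed layout).
    mac_to_name:   {mac: interface name} as found on the instance.
    """
    config = []
    for mac in sorted(macs_metadata):
        nic_md = macs_metadata[mac]
        # IPv6 metadata is only treated as a yes/no flag here.
        subnet_type = "dhcp6" if nic_md.get("ipv6s") else "dhcp4"
        config.append({
            "type": "physical",
            "name": mac_to_name[mac],
            "mac_address": mac,
            "subnets": [{"type": subnet_type}],
        })
    return {"version": 1, "config": config}
```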
Performance side note: Shifting the dhcp work into init-local for Ec2 actually gets us 1 second faster deployments by skipping the init-network phase of alternate datasource checks, because Ec2Local is configured in an earlier boot stage. In 3 test runs prior to this change, cloud-init runs were 5.5 seconds; with the change we now average 4.6 seconds.
This efficiency could be improved even further if we could avoid dhcp discovery altogether. That would be possible if there were some way to advertise the dhcp configuration via DMI/SMBIOS or system environment variables, so that we could talk to the metadata service from an AWS-authorized dhcp address without running discovery first.
Inspecting time costs of the dhclient setup/teardown in 3 live runs, the time cost for the dhcp setup round trip on AWS is:
test 1: 76 milliseconds
dhcp discovery + metadata: 0.347 seconds
metadata alone: 0.271 seconds
test 2: 88 milliseconds
dhcp discovery + metadata: 0.388 seconds
metadata alone: 0.300 seconds
test 3: 75 milliseconds
dhcp discovery + metadata: 0.366 seconds
metadata alone: 0.291 seconds
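The per-run dhcp cost quoted above is the difference between the two measurements in each run. A quick check in Python, using only the numbers listed:

```python
# (dhcp discovery + metadata, metadata alone) in seconds, per test run
runs = [(0.347, 0.271), (0.388, 0.300), (0.366, 0.291)]

# dhcp overhead per run, in whole milliseconds
overheads_ms = [round((both - alone) * 1000) for both, alone in runs]
mean_ms = sum(overheads_ms) / len(overheads_ms)

print(overheads_ms)       # [76, 88, 75]
print(round(mean_ms, 1))  # 79.7
```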
LP: #1709772
Description of the Change
ec2: Allow Ec2 to run in init-local using dhclient in a sandbox.
This branch is a prerequisite for IPv6 support in AWS by allowing Ec2 datasource to query the metadata source version 2016-09-02 about whether or not it needs to configure IPv6 on interfaces. If version 2016-09-02 is not present, fall back to the min_metadata_
To query AWS' metadata address at 169.254.169.254, the instance must have an AWS-dhcp-allocated address configured. Configuring IPv4 link-local addresses results in timeouts from the metadata service. So we now have a DataSourceEc2Local subclass which will perform a sandboxed dhclient discovery in order to obtain an authorized IP address, which is used to set up eth0 and curl metadata about the full instance network configuration.
A subsequent branch will inspect IPv6 configuration from the metadata harvested and properly write network IPv4/IPv6 configuration for the instance for all enabled interfaces.
Side note: The only way AWS supports querying ipv6 info from the vm is via queries of the metadata service. This logic adds an extra dhclient attempt in the init-local phase for AWS, so there is an additional time cost of around a 10th of a second for boots because of the sandboxed dhclient discovery runs. This time cost would be greater if AWS' dhcp service is slow to respond.
| Ryan Harper (raharper) wrote : | # |
This looks really good. Just some questions and clarifications in inline comments.
- 92e1952... by Chad Smith on 2017-08-03
FAILED: Continuous integration, rev:f08dc1ea917
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
| Scott Moser (smoser) wrote : | # |
There's nothing huge in my comments.
thanks.
- e2dc2b7... by Chad Smith on 2017-08-03
FAILED: Continuous integration, rev:a1268952438
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
- 222567d... by Chad Smith on 2017-08-03
FAILED: Continuous integration, rev:137c2881ed2
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
- e0fb53a... by Chad Smith on 2017-08-03
- 25b526c... by Chad Smith on 2017-08-04
- 23bb804... by Chad Smith on 2017-08-04
FAILED: Continuous integration, rev:d114d715a42
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
- c61035a... by Chad Smith on 2017-08-04
FAILED: Continuous integration, rev:1a01cae3f83
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
- 87d53b4... by Chad Smith on 2017-08-04
| Chad Smith (chad.smith) wrote : | # |
All comments addressed, and we now handle backward compatibility for old metadata versions.
PASSED: Continuous integration, rev:b43779d06d3
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: CentOS 6 & 7: Build & Test
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
| Scott Moser (smoser) wrote : | # |
I have one fun comment in line.
- da6658b... by Chad Smith on 2017-08-04
FAILED: Continuous integration, rev:e77a177c07a
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
| Scott Moser (smoser) wrote : | # |
I added Andrew to this merge specifically for his insight on knowing how to determine if a metadata version is available.
- 4eb786a... by Chad Smith on 2017-08-04
- 3bdde87... by Chad Smith on 2017-08-04
FAILED: Continuous integration, rev:d91194fb65d
https:/
Executed test runs:
SUCCESS: Checkout
FAILED: Unit & Style Tests
Click here to trigger a rebuild:
https:/
- 22f4f6f... by Chad Smith on 2017-08-04
| Scott Moser (smoser) wrote : | # |
I can approve this as is, assuming you've tested.
you'll need to fix the merge conflicts, rebase, squash...
i think it seems sane.
- 813f998... by Chad Smith on 2017-08-04
- 84bf898... by Chad Smith on 2017-08-04
PASSED: Continuous integration, rev:5ace7f3f788
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: CentOS 6 & 7: Build & Test
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
FAILED: Continuous integration, rev:67a0f861ed9
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
FAILED: CentOS 6 & 7: Build & Test
Click here to trigger a rebuild:
https:/
- b666d7e... by Chad Smith on 2017-08-04
PASSED: Continuous integration, rev:84bf898f49a
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: CentOS 6 & 7: Build & Test
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
FAILED: Continuous integration, rev:b666d7e8413
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
FAILED: CentOS 6 & 7: Build & Test
Click here to trigger a rebuild:
https:/
| Andrew Jorgensen (ajorgens) wrote : | # |
> I added Andrew to this merge specifically for his insight on knowing how to
> determine if a metadata version is available .
The most explicit way to discover if a version is available is to ask the instance meta-data service to tell you:
$ curl http://
1.0
2007-01-19
2007-03-01
2007-08-29
2007-10-10
2007-12-15
2008-02-01
2008-09-01
2009-04-04
2011-01-01
2011-05-01
2012-01-12
2014-02-25
2014-11-05
2015-10-20
2016-04-19
2016-06-30
2016-09-02
latest
I don't know if that is supported on work-alike clouds.
But you will get a 404 from EC2 if you ask for a version that is not supported:
$ curl --fail http://
curl: (22) The requested URL returned error: 404 Not Found
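The 404 behaviour described above suggests a direct probe of the version you want, with a fallback when it is missing. The following is a minimal sketch, not the code in this branch: the function names are hypothetical, the `meta-data/instance-id` path is an assumed choice of a cheap resource to request, and the base URL is passed in by the caller.

```python
import urllib.error
import urllib.request


def version_available(base_url, version, timeout=5,
                      opener=urllib.request.urlopen):
    """Return True if the metadata service answers for this version.

    A 404 is read as "version not supported"; any other HTTP error is
    re-raised so real failures are not mistaken for a missing version.
    """
    url = "%s/%s/meta-data/instance-id" % (base_url.rstrip("/"), version)
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
    response.close()
    return True


def choose_version(base_url, preferred, fallback, probe=version_available):
    """Use the preferred metadata version if present, else the fallback."""
    return preferred if probe(base_url, preferred) else fallback
```

Probing one specific version costs a single request either way and does not depend on work-alike clouds implementing the top-level listing.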
| Andrew Jorgensen (ajorgens) wrote : | # |
> This logic adds an extra dhclient attempt in init-local phase for AWS so there is an additional
> time cost of around a 10th of a second for boots because of the sandboxed dhclient discovery
> runs. This timecost would be greater if AWS' dhcp service is slow to respond.
I would want to see some comparisons of launch times in practice, rather than relying on theory here.
| Andrew Jorgensen (ajorgens) wrote : | # |
A clearer explanation in the commit message might be helpful too: IPv6 information is given via DHCPv6, but the only way to tell the difference between DHCPv6 failing to answer and IPv6 not being configured for the instance is by asking the instance metadata service.
| Chad Smith (chad.smith) wrote : | # |
> > I added Andrew to this merge specifically for his insight on knowing how to
> > determine if a metadata version is available .
>
> The most explicit way to discover if a version is available is to ask the
> instance meta-data service to tell you:
>
> $ curl http://
> 1.0
> 2007-01-19
...
> 2016-09-02
> latest
>
> I don't know if that is supported on work-alike clouds.
Andrew, thanks for this lookover. Yes, we were also concerned about whether work-alike clouds support this root-level curl, so we opted to avoid the top-level listing. A 404 at root level has the ~same~ cost (a 404 status response) as a direct curl against the specific version we hope to query, and the direct query gives us a discrete, complete answer about that specific version. I'd prefer only using the top-level listing (http://
- d462d39... by Chad Smith on 2017-08-07
| Andrew Jorgensen (ajorgens) wrote : | # |
I should have been more clear about the "launch times in practice" bit. In the first minutes of an instance's existence there can be delays in DHCPv4, networking, and even instance metadata. The only valid way to test the impact is to bake the changes onto an AMI and compare launch times with the vanilla AMI. Does that make sense? A reboot isn't sufficient because all the early setup of the instance is already done.
| Chad Smith (chad.smith) wrote : | # |
Roger, Andrew. I'll upload my own AMI with this changeset and compare it to a fresh new instance out of the gate, before any initial dhcp has run for the instance.
PASSED: Continuous integration, rev:d462d397d4a
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: CentOS 6 & 7: Build & Test
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
| Chad Smith (chad.smith) wrote : | # |
Andrew:
OK, I went through 6 more runs with fresh AMI creations: upstream (without Ec2Local) versus proposed (with dhcp discovery in Ec2Local). It looks like even in the fresh AMI case we still see that benefit.
cloud-init runtime of upstream without dhcp init-local setup or teardown:
test 1: 6.204 seconds
test 2: 7.179 seconds
test 3: 7.406 seconds
cloud-init runtime with the new init-local dhcp discovery setup & teardown from DataSourceEc2Local:
test 1: 6.255 seconds
test 2: 5.152 seconds
test 3: 5.411 seconds
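For a rough comparison, the figures above can be averaged as follows. This is only arithmetic on the six numbers reported here; with three runs per side it is indicative at best.

```python
from statistics import mean

# cloud-init runtimes in seconds, as reported above
upstream = [6.204, 7.179, 7.406]   # without Ec2Local
proposed = [6.255, 5.152, 5.411]   # with dhcp discovery in Ec2Local

mean_upstream = mean(upstream)
mean_proposed = mean(proposed)

print("upstream mean: %.3f s" % mean_upstream)
print("proposed mean: %.3f s" % mean_proposed)
print("difference:    %.3f s" % (mean_upstream - mean_proposed))
```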
| Andrew Jorgensen (ajorgens) wrote : | # |
Several unit tests hang forever on CentOS 7 in EC2 for me:
test_valid_
test_valid_
test_unknown_
test_ec2_
| Scott Moser (smoser) wrote : | # |
Andrew,
./tools/run-centos --unittest --keep 7
That ran to completion for me. There definitely are slower tests, and possibly buggily slow ones, but I don't see a forever hang.
| Andrew Jorgensen (ajorgens) wrote : | # |
Hi Scott, Did you run that in an EC2 instance? I suspect some interaction with the instance meta-data service and the new code is causing the hang.
| Andrew Jorgensen (ajorgens) wrote : | # |
Lars has a proposed fix for the hangs: https:/


PASSED: Continuous integration, rev:3b17a848a888090f61a7087eb617e455c905123e
https://jenkins.ubuntu.com/server/job/cloud-init-ci/108/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: CentOS 6 & 7: Build & Test
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/108/rebuild