Merge ~paul-meyer/cloud-init:lp1717611 into cloud-init:master
| Status: | Merged |
|---|---|
| Merged at revision: | eaadf52b1010cf189bde2a6abb3265b890f6d36d |
| Proposed branch: | ~paul-meyer/cloud-init:lp1717611 |
| Merge into: | cloud-init:master |
| Diff against target: | 29 lines (+7/-3), 1 file modified: cloudinit/sources/DataSourceAzure.py (+7/-3) |
| Related bugs: | |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Chad Smith | | | Approve on 2017-09-18 |
| Server Team CI bot | continuous-integration | | Approve on 2017-09-18 |
| Scott Moser | | 2017-09-18 | Approve on 2017-09-18 |
Commit Message
Azure: wait longer for SSH pub keys to arrive.
Currently the Azure data source waits up to 60 seconds. This has proven
not to be sufficient to provide resiliency to unrelated transient failures
in other parts of the infrastructure. Azure already has logic outside of
the VM to abort hung provisioning. This change lengthens the timeout to
15 minutes.
LP: #1717611
Description of the Change
Azure: wait longer for SSH pub keys to arrive.
Currently the Azure data source waits up to 60 seconds. This has proven not to be sufficient to provide resiliency to unrelated transient failures in other parts of the infrastructure. Azure already has logic outside of the VM to abort hung provisioning. This change lengthens the timeout to 15 minutes.
| Scott Moser (smoser) wrote : | # |
It still seems like it makes sense to have a timeout of some sort.
cloud-init is blocking boot at this point, and you may well have another way into your instance if cloud-init would let go.
Can't we just set this to 600 seconds or something?
| Paul Meyer (paul-meyer) wrote : | # |
Discussing internally with the provisioning folks...
| Ryan Harper (raharper) wrote : | # |
Is it possible to set this value via vendor-data? Surely then it may be tuned by the cloud without a change to cloud-init (even if we decide on 600 or 900 or whatever else) as a default value.
| Paul Meyer (paul-meyer) wrote : | # |
From historical data, we've seen infrastructure updates (that can take the wireserver temporarily offline) take 5+ minutes. How about 15 minutes to be on the safe side?
BTW, we're aware that this is not the best way to transport SSH keys, so we're working on that separately.
| Scott Moser (smoser) wrote : | # |
This is down an Azure-only path.
We could probably make it configurable through vendor-data, as Ryan suggested, but if the platform tells us they need it to be this high, then I'm willing to say "they know best".
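For readers weighing the vendor-data idea raised above, here is a minimal sketch of what a tunable timeout with a 15-minute default could look like. It is an illustration only, not cloud-init code: the `resolve_key_wait` helper and the `ssh_key_wait_seconds` key are hypothetical names invented for this example, and the merged change did not add such a setting.

```python
# Hypothetical sketch: neither the function nor the config key exists in cloud-init.
DEFAULT_KEY_WAIT_SECONDS = 900  # 15 minutes, the value settled on in this thread


def resolve_key_wait(ds_cfg, default=DEFAULT_KEY_WAIT_SECONDS):
    """Return the SSH key wait in seconds from a config dict, or the default.

    ds_cfg stands in for whatever merged datasource configuration the
    platform supplies; "ssh_key_wait_seconds" is an assumed key name.
    """
    raw = ds_cfg.get("ssh_key_wait_seconds", default)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Unparseable values fall back to the default rather than failing boot.
        return default
    # Keep some finite, positive timeout, as requested earlier in the review.
    return value if value > 0 else default
```

With this shape the platform could lower the wait to 600 seconds or raise it without a code change, while an absent or invalid value still yields the 900-second default.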
PASSED: Continuous integration, rev:edc8fba4def
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: MAAS Compatability Testing
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
| Chad Smith (chad.smith) wrote : | # |
Sure this works for me. We already log how long we've decided to maxwait. I see no problem with dropping the maxwait default on wait_for_files since there are only two callers, and they are in the Azure datasource.
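To make the point about dropping the `maxwait` default concrete, the following is a rough sketch of a polling helper in which the caller must state the timeout explicitly. It is not the code from this branch: `wait_for_files_sketch`, its parameters other than `maxwait`, and the example path are assumptions made for illustration.

```python
import os
import time


def wait_for_files_sketch(paths, maxwait, naplen=0.5):
    """Poll until every path exists or maxwait seconds have elapsed.

    maxwait has no default, so each caller must choose a timeout.
    Returns the set of paths still missing (empty on success).
    """
    need = set(paths)
    deadline = time.monotonic() + maxwait
    while True:
        # Drop every path that has appeared since the last check.
        need = {p for p in need if not os.path.exists(p)}
        if not need or time.monotonic() >= deadline:
            return need
        time.sleep(naplen)


# Hypothetical caller: wait up to 15 minutes for a key file to appear.
# missing = wait_for_files_sketch(["/path/to/expected-key.crt"], maxwait=900)
```

Because the return value is the set of files that never appeared, a caller can log exactly which keys were missing when the wait gave up.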


PASSED: Continuous integration, rev:63c7c48543ae779590668f8ccc9f571fd039bed1
https://jenkins.ubuntu.com/server/job/cloud-init-ci/323/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Unit & Style Tests
SUCCESS: Ubuntu LTS: Build
SUCCESS: Ubuntu LTS: Integration
SUCCESS: MAAS Compatability Testing
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/323/rebuild