Merge ~paul-meyer/cloud-init:lp1717611 into cloud-init:master

Proposed by Paul Meyer
Status: Merged
Merged at revision: eaadf52b1010cf189bde2a6abb3265b890f6d36d
Proposed branch: ~paul-meyer/cloud-init:lp1717611
Merge into: cloud-init:master
Diff against target: 29 lines (+7/-3)
1 file modified
cloudinit/sources/DataSourceAzure.py (+7/-3)
Reviewer             Review Type             Date Requested  Status
Chad Smith                                                   Approve
Server Team CI bot   continuous-integration                  Approve
Scott Moser                                                  Approve
Review via email: mp+330925@code.launchpad.net

Commit message

Azure: wait longer for SSH pub keys to arrive.

Currently the Azure data source waits up to 60 seconds. This has proven
not to be sufficient to provide resiliency to unrelated transient failures
in other parts of the infrastructure. Azure already has logic outside of
the VM to abort hung provisioning. This change lengthens the timeout to
15 minutes.

LP: #1717611

Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:63c7c48543ae779590668f8ccc9f571fd039bed1
https://jenkins.ubuntu.com/server/job/cloud-init-ci/323/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    SUCCESS: MAAS Compatability Testing
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/323/rebuild

review: Approve (continuous-integration)
Scott Moser (smoser) wrote :

It still seems like it makes sense to have a timeout of some sort.
cloud-init is blocking boot at this point, and you may well have another way into your instance if cloud-init would let go.

Can't we just set this to 600 seconds or something?

Paul Meyer (paul-meyer) wrote :

Discussing internally with the provisioning folks...

Ryan Harper (raharper) wrote :

Is it possible to set this value via vendor-data? Then the cloud could tune it without a change to cloud-init, whatever default value (600, 900, or something else) we decide on.
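The vendor-data suggestion above can be sketched as a small lookup with a built-in fallback. This is only an illustration: the key name `ssh_pubkey_wait_timeout`, the helper `resolve_pubkey_wait`, and the plain dict standing in for datasource configuration are all hypothetical and are not part of cloud-init or of this merge proposal.

```python
# Hypothetical sketch: neither the key name nor this helper exists in cloud-init.
DEFAULT_PUBKEY_WAIT = 900  # seconds; the value this merge proposal hard-codes


def resolve_pubkey_wait(ds_cfg, default=DEFAULT_PUBKEY_WAIT):
    """Return the wait timeout in seconds, preferring a configured override.

    ds_cfg is a plain dict standing in for datasource configuration that a
    cloud could supply (for example via vendor-data).
    """
    raw = ds_cfg.get("ssh_pubkey_wait_timeout", default)  # hypothetical key
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Unparseable override: fall back to the built-in default.
        return default
    # Reject zero, negative, and NaN values so boot never skips the wait.
    return value if value > 0 else default
```

With a helper like this, the call site would pass the resolved value instead of a literal, so the platform could raise or lower the wait without a cloud-init release.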

Paul Meyer (paul-meyer) wrote :

From historical data, we've seen infrastructure updates (that can take the wireserver temporarily offline) take 5+ minutes. How about 15 minutes to be on the safe side?
BTW, we're aware that this is not the best way to transport SSH keys, so we're working on that separately.

Scott Moser (smoser) wrote :

This is down an Azure-only path.
We could probably make it configurable in some mechanism through vendor-data as Ryan suggested, but... if the platform tells us they need it to be this high, then I'm willing to say "they know best".

review: Approve
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:edc8fba4deff06de8cb7127ae555e67a819197d0
https://jenkins.ubuntu.com/server/job/cloud-init-ci/325/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    SUCCESS: MAAS Compatability Testing
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/325/rebuild

review: Approve (continuous-integration)
Chad Smith (chad.smith) wrote :

Sure, this works for me. We already log how long we've decided to maxwait. I see no problem with dropping the maxwait default on wait_for_files, since there are only two callers and both are in the Azure datasource.

review: Approve

Preview Diff

diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
index b5a95a1..80c2bd1 100644
--- a/cloudinit/sources/DataSourceAzure.py
+++ b/cloudinit/sources/DataSourceAzure.py
@@ -317,9 +317,13 @@ class DataSourceAzure(sources.DataSource):
             LOG.debug("ssh authentication: "
                       "using fingerprint from fabirc")
 
-        missing = util.log_time(logfunc=LOG.debug, msg="waiting for files",
+        # wait very long for public SSH keys to arrive
+        # https://bugs.launchpad.net/cloud-init/+bug/1717611
+        missing = util.log_time(logfunc=LOG.debug,
+                                msg="waiting for SSH public key files",
                                 func=wait_for_files,
-                                args=(fp_files,))
+                                args=(fp_files, 900))
+
         if len(missing):
             LOG.warning("Did not find files, but going on: %s", missing)
 
@@ -656,7 +660,7 @@ def pubkeys_from_crt_files(flist):
     return pubkeys
 
 
-def wait_for_files(flist, maxwait=60, naplen=.5, log_pre=""):
+def wait_for_files(flist, maxwait, naplen=.5, log_pre=""):
     need = set(flist)
     waited = 0
     while True:
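
The diff shows only the first three lines of wait_for_files, so the following is a minimal, self-contained sketch of how a polling loop with a required maxwait might behave. It is an assumption about the general technique, not the actual cloud-init function body: the loop logic, the return type, and the omission of the log_pre parameter are illustrative choices.

```python
import os
import time


def wait_for_files(flist, maxwait, naplen=0.5):
    """Poll until every path in flist exists or maxwait seconds have elapsed.

    Illustrative sketch only, not the cloud-init implementation.
    Returns the set of paths still missing (empty when all appeared).
    """
    need = set(flist)
    waited = 0
    while True:
        # Drop every path that has shown up since the last poll.
        need -= {f for f in need if os.path.exists(f)}
        if not need:
            return need
        # Give up once another nap would exceed the caller's budget.
        if waited + naplen > maxwait:
            return need
        time.sleep(naplen)
        waited += naplen
```

Because maxwait has no default in this signature, a call such as wait_for_files(fp_files) raises TypeError, which forces each caller to state its own time budget explicitly, as the diff does with 900.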