ssh timeout for bootstrap could be configurable

Bug #1257649 reported by John A Meinel
This bug affects 13 people
Affects: juju-core
Status: Fix Released
Importance: Critical
Assigned to: Dimiter Naydenov

Bug Description

We're working on making 'juju bootstrap' try to connect to the machine that it is starting up. As part of this we have to decide things like how long we should wait before concluding that the instance is not going to come up.

Right now we just use a fixed 10min timeout + 5s retry delay. It seems like something that people might actually like to be able to configure in their environments.yaml, since it should be specific to each client <=> cloud pairing. (This cloud takes longer to start up, this one is known to have failed if it hasn't started in 10s, etc.)

It might also be a good fit for how we connect to the API. Although with synchronous bootstrap, we should probably be less likely to wait for the API to come up. (If you can't run anything until that machine is up, then you should never have to wait very long on every other command.)

At the very least, timeouts are rarely a "one size fits all" so allowing them to be configurable can be useful.

Andrew Wilkins (axwalk) wrote :

I think this is irrelevant now that https://code.launchpad.net/~axwalk/juju-core/waitssh-no-conn-timeout/+merge/198658 is merged in. Can we close this?

Curtis Hovey (sinzui)
Changed in juju-core:
importance: Wishlist → Low
Curtis Hovey (sinzui) wrote :

The maas lab reports that the 10 minute bootstrap limit is not enough time. They are currently looking at a hack to the code to give Juju enough time to start on maas + openstack. If we don't want this configurable, then some providers/substrates need different times.

Changed in juju-core:
milestone: none → 1.17.1
importance: Low → High
tags: added: maas-provider
Martin Packman (gz)
Changed in juju-core:
milestone: 1.17.1 → 1.18.0
Diogo Matsubara (matsubara) wrote :

Btw, there's no openstack involved. It's juju bootstrap on MAAS that takes longer than 10 min in the MAAS QA lab. With the timeout increased to 30 minutes, bootstrap proceeds further, but I couldn't confirm that it actually finishes because my juju binary was compiled on Trusty and I was deploying on Precise. For this reason, --upload-tools fails at some point later on in the bootstrap workflow. Curtis gave me another workaround: use the --metadata-source option so the MAAS instance can download the juju tools from that source instead of searching for them in S3 (the lab doesn't have access to that). Today I'll give it a try again, maybe by setting up a proper simplestreams mirror for the MAAS lab, and will report back. I think Curtis has enough details from our IRC conversation to file a proper bug regarding the use of --metadata-source in a network without wide access to the Internet. Thanks a lot for raising the importance of this bug.

Roger Peppe (rogpeppe)
Changed in juju-core:
importance: High → Critical
Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
Graham Binns (gmb) wrote :

It's been pointed out to me that there are machines in Canonical's Boston DC that take 10 minutes to *POST*. I'd suggest at least 1h if we're not going to have something configurable.

tags: added: micro-cluster
Changed in juju-core:
status: Triaged → In Progress
Changed in juju-core:
milestone: 1.18.0 → 2.0
milestone: 2.0 → 1.18.0
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.18.0 → 1.17.2
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released