wait() waits many hrs, or even infinity
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mojo: Continuous Delivery for Juju | Fix Released | High | Paul Collins |
Bug Description
juju-wait will wait indefinitely for a settled status, and may keep waiting even though a unit is in an error state.
This is preventing us from using the new-ish mojo built-in wait, as we can't have our metal tied up for several hours in CI on failing jobs.
The mojo repo contains an old copy of juju-wait in tree.
Quite a while back, we committed an optional max_wait to juju-wait to address that, and we have been running juju-wait trunk as a post-deploy phase to successfully wait for things to settle.
Mojo should consider refreshing its juju-wait code and plumbing the max_wait option through so that it can be set in a manifest.
This issue is likely rare in production deploys. We run into it because we use Mojo for charm testing, and sometimes those charms have issues. We need the tooling to be as tunable and as resilient to that as possible, as juju-wait from trunk already is.
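To illustrate the behaviour being asked for, here is a minimal sketch of a bounded wait loop. It is not the juju-wait or mojo implementation: `wait_until_settled`, `get_unit_states`, the exception names, and the simplified state strings ("idle", "executing", "error") are all hypothetical and stand in for however the caller reads `juju status`. Only the idea of an optional `max_wait` comes from the report above.

```python
import time


class WaitTimeout(Exception):
    """Raised when the environment has not settled within max_wait seconds."""


class UnitError(Exception):
    """Raised when any unit reports an error state."""


def wait_until_settled(get_unit_states, max_wait=None, poll_interval=4.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until every unit is idle, a unit errors, or max_wait elapses.

    get_unit_states is a hypothetical callable returning a dict such as
    {"mysql/0": "idle", "ceph/1": "executing"}.
    max_wait=None keeps the old behaviour of waiting without a limit.
    """
    deadline = None if max_wait is None else clock() + max_wait
    while True:
        states = get_unit_states()

        # Fail fast: an errored unit will not settle on its own.
        errored = sorted(u for u, s in states.items() if s == "error")
        if errored:
            raise UnitError("units in error state: " + ", ".join(errored))

        # Settled: at least one unit is known and all of them are idle.
        if states and all(s == "idle" for s in states.values()):
            return states

        # Give up once the optional deadline has passed.
        if deadline is not None and clock() >= deadline:
            raise WaitTimeout("not settled after %s seconds" % max_wait)

        sleep(poll_interval)
```

With something like this, a CI job could call `wait_until_settled(read_states, max_wait=3600)` and treat either exception as a failed deploy instead of holding the hardware for hours. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.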
Example:
#### mojo's juju-wait starts here
00:58:23.997 2016-09-22 08:12:33 [INFO] Waiting for environment to reach steady state
02:58:48.609 2016-09-22 10:12:58 [INFO] All units idle since 2016-09-22 10:12:25.640469Z (ceilometer/0, ceph-osd/0, ceph/0, ceph/1, ceph/2, cinder/0, glance/0, heat/0, keystone/0, mongodb/0, mysql/0, neutron-api/0, neutron-gateway/0, nova-cloud-
02:58:48.609 2016-09-22 10:12:58 [INFO] Environment has reached steady state
#### mojo's juju-wait claims ready state, even though multiple units are in an error state
02:58:48.610 2016-09-22 10:12:58 [INFO] Manifest comment:
02:58:48.610
02:58:48.610 #######
02:58:48.610 Check juju statuses are green and that hooks have finished
02:58:48.610 #######
02:58:48.610
02:58:48.610
02:58:48.610 2016-09-22 10:12:58 [INFO] Pulling secrets from /srv/mojo/
02:58:48.610 2016-09-22 10:12:58 [WARNING] Automatic secrets phase ran but secrets directory /srv/mojo/
02:58:48.611 2016-09-22 10:12:58 [INFO] Running script check_juju.py
03:43:51.354 2016-09-22 10:58:00 [WARNING] No debug log matching debug-logs found. Using default.
#### juju-wait trunk starts here
03:43:52.266 2016-09-22 10:58:01 [ERROR] INFO:root:Calling juju-wait
03:43:52.266 DEBUG:root:
03:43:52.266 DEBUG:root:
03:43:52.266 DEBUG:root:
Related branches
- Hristo Erinin (community): Approve
  Diff: 82 lines (+14/-7), 2 files modified: mojo/juju/status.py (+7/-4), mojo/phase.py (+7/-3)
- Thomas Cuthbert (community): Approve
  Diff: 14 lines (+2/-2), 1 file modified: mojo/juju/status.py (+2/-2)
description: | updated |
Changed in mojo: | |
status: | New → Confirmed |
importance: | Undecided → High |
Changed in mojo: | |
status: | In Progress → Fix Committed |
Changed in mojo: | |
status: | Fix Committed → Fix Released |
http://pastebin.ubuntu.com/23284241/ is a juju status --format=yaml from an environment where mojo has been waiting for over 12 hours.