Merge lp:~hloeung/charm-helpers/status-set-retry-a-few-times into lp:charm-helpers

Proposed by Haw Loeung
Status: Rejected
Rejected by: Haw Loeung
Proposed branch: lp:~hloeung/charm-helpers/status-set-retry-a-few-times
Merge into: lp:charm-helpers
Diff against target: 40 lines (+15/-7)
1 file modified
charmhelpers/core/hookenv.py (+15/-7)
To merge this branch: bzr merge lp:~hloeung/charm-helpers/status-set-retry-a-few-times
Reviewer: Stuart Bishop (community)
Review status: Needs Fixing
Review via email: mp+318438@code.launchpad.net

Description of the change

We're constantly seeing update-status hook failures that a 'juju resolved --retry' fixes. The latest example:

| ksplice/2 error idle 1.25.8 10.25.9.229 hook failed: "update-status"

The Juju logs show the following:

| 2017-02-28 06:20:49 INFO juju-log Application Version: 1.2.31
| 2017-02-28 06:20:49 INFO update-status error: connection is shut down
| 2017-02-28 06:20:49 INFO juju-log status-set failed: active Effective kernel 4.4.0-64-generic

This makes status_set() retry up to 10 times before actually failing.

Stuart Bishop (stub) wrote :

Is there a Juju bug for the problem this is working around? It should be cited in the comments so future-us knows when the workaround can be removed.

review: Needs Fixing

Unmerged revisions

696. By Haw Loeung

Updated status_set() to retry up to 10 times with random splay.

Preview Diff

=== modified file 'charmhelpers/core/hookenv.py'
--- charmhelpers/core/hookenv.py 2017-02-01 00:12:12 +0000
+++ charmhelpers/core/hookenv.py 2017-02-28 06:35:17 +0000
@@ -29,7 +29,9 @@
 import subprocess
 import sys
 import errno
+import random
 import tempfile
+import time
 from subprocess import CalledProcessError
 
 import six
@@ -813,13 +815,19 @@
             '{!r} is not a valid workload state'.format(workload_state)
         )
     cmd = ['status-set', workload_state, message]
-    try:
-        ret = subprocess.call(cmd)
-        if ret == 0:
-            return
-    except OSError as e:
-        if e.errno != errno.ENOENT:
-            raise
+    count = 1
+    while (count <= 10):
+        count += 1
+        try:
+            ret = subprocess.call(cmd)
+            if ret == 0:
+                return
+        except OSError as e:
+            if e.errno != errno.ENOENT:
+                raise
+        # For large environments, we could have a storm of update-status hooks
+        # firing (e.g. when jujud-machine-0 has been bounced) so let's splay.
+        time.sleep(random.randint(0, 60))
     log_message = 'status-set failed: {} {}'.format(workload_state,
                                                     message)
     log(log_message, level='INFO')
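
For comparison, here is a minimal sketch of the same retry-with-splay idea as a standalone helper. It is not part of this branch or of charmhelpers: the name call_with_retry and its parameters are hypothetical, and the call and sleep arguments exist only so the behaviour can be exercised without running a real command. It assumes the sleep in the diff above sits inside the loop, and it makes two deliberate changes to that logic: it does not sleep after the final attempt, and it stops immediately on ENOENT, since a missing status-set binary will not appear on retry.

```python
import errno
import random
import subprocess
import time


def call_with_retry(cmd, attempts=10, max_splay=60,
                    call=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0. Return True on success, False otherwise.

    Hypothetical helper for illustration; not a charmhelpers API.
    """
    for attempt in range(1, attempts + 1):
        try:
            if call(cmd) == 0:
                return True
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise
            # The command is not installed, so retrying cannot help.
            return False
        if attempt < attempts:
            # Random splay so that many units do not retry in lockstep.
            sleep(random.randint(0, max_splay))
    return False
```

With a helper like this, status_set() could return early when call_with_retry(cmd) is true and otherwise fall through to the existing 'status-set failed' log line. With the defaults, a command that keeps failing waits at most 9 x 60 = 540 seconds in total before giving up.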
