Merge lp:~johnsca/charms/trusty/cloudfoundry/better-basic-reconciler-status into lp:~cf-charmers/charms/trusty/cloudfoundry/trunk

Proposed by Cory Johns
Status: Merged
Merged at revision: 162
Proposed branch: lp:~johnsca/charms/trusty/cloudfoundry/better-basic-reconciler-status
Merge into: lp:~cf-charmers/charms/trusty/cloudfoundry/trunk
Diff against target: 110 lines (+34/-10)
5 files modified
charmhelpers/core/services/base.py (+2/-0)
charmhelpers/core/services/helpers.py (+1/-4)
cloudfoundry/health_checks.py (+19/-0)
cloudfoundry/jobs.py (+1/-1)
reconciler/app.py (+11/-5)
To merge this branch: bzr merge lp:~johnsca/charms/trusty/cloudfoundry/better-basic-reconciler-status
Reviewer Review Type Date Requested Status
Benjamin Saller (community) Approve
Whit Morriss (community) Approve
Review via email: mp+242372@code.launchpad.net

Description of the change

Since the monit status is so unreliable, switch the basic health check to use the self-reported Juju status until we have something that more deeply introspects.

To post a comment you must log in.
Revision history for this message
Cory Johns (johnsca) wrote :

Basically, relies on the extended hookenv.juju_status() report (which accounts for data_ready), if available, and the basic Juju status if not.

Should give us a much more accurate approximation of when the system is up, although there may still be a slight delay (due to, e.g., `monit restart all` taking some time).

Revision history for this message
Benjamin Saller (bcsaller) wrote :

This looks like a good intermediate step. Thanks for doing this.

review: Approve
Revision history for this message
Whit Morriss (whitmo) wrote :

This look good to me.

I think the next step might be for the service framework to leave some breadcrumbs ala the status spec. Basically a list of incomplete relationships from the perspective of the service definition.

review: Approve
164. By Cory Johns

Only log "Incomplete requirement" for reqs that are blocking

Revision history for this message
Cory Johns (johnsca) wrote :

Whit,

The extended status is stored in $CHARM_DIR/juju_status.json on each unit, and is retrieved as part of the health hook that the reconciler calls. It might be nice for this to be pushed up to the reconciler instead of pulled, but I don't think it's all that urgent.

Revision history for this message
Benjamin Saller (bcsaller) wrote :

still LGTM, thanks :)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'charmhelpers/core/services/base.py'
2--- charmhelpers/core/services/base.py 2014-10-31 16:11:51 +0000
3+++ charmhelpers/core/services/base.py 2014-11-20 17:56:06 +0000
4@@ -297,6 +297,8 @@
5 if getattr(req, 'manual', False):
6 manual = True
7 if hasattr(req, 'name'):
8+ hookenv.log('Incomplete requirement: %s' % req.name,
9+ hookenv.DEBUG)
10 blockers.append(req.name)
11 hookenv.juju_status(status, manual=manual, blockers=blockers)
12
13
14=== modified file 'charmhelpers/core/services/helpers.py'
15--- charmhelpers/core/services/helpers.py 2014-10-31 13:20:03 +0000
16+++ charmhelpers/core/services/helpers.py 2014-11-20 17:56:06 +0000
17@@ -45,10 +45,7 @@
18 """
19 Returns True if all of the `required_keys` are available from any units.
20 """
21- ready = len(self.get(self.name, [])) > 0
22- if not ready:
23- hookenv.log('Incomplete relation: {}'.format(self.__class__.__name__), hookenv.DEBUG)
24- return ready
25+ return len(self.get(self.name, [])) > 0
26
27 def _is_ready(self, unit_data):
28 """
29
30=== modified file 'cloudfoundry/health_checks.py'
31--- cloudfoundry/health_checks.py 2014-08-24 21:36:50 +0000
32+++ cloudfoundry/health_checks.py 2014-11-20 17:56:06 +0000
33@@ -1,3 +1,4 @@
34+from charmhelpers.core import hookenv
35 from cloudfoundry import tasks
36
37
38@@ -17,3 +18,21 @@
39 message='not all services running',
40 data={'services': summary})
41 return result
42+
43+
44+def status(service):
45+ result = {
46+ 'name': 'monit_summary',
47+ 'health': 'pass',
48+ 'message': None,
49+ 'data': {},
50+ }
51+ status = hookenv.juju_status()
52+ if status['status'] == 'error':
53+ return dict(result, health='fail', message=status['message'])
54+ elif status['status'] == 'blocked' and status['manual']:
55+ return dict(result, health='fail', message='Blocked: %s' % status['blockers'])
56+ elif status['status'] == 'up':
57+ return result
58+ else:
59+ return dict(result, health='warn', message='Working (%s)' % status['status'])
60
61=== modified file 'cloudfoundry/jobs.py'
62--- cloudfoundry/jobs.py 2014-10-03 15:47:17 +0000
63+++ cloudfoundry/jobs.py 2014-11-20 17:56:06 +0000
64@@ -68,7 +68,7 @@
65 service_def = service_data[charm_name]
66 results = []
67 health = 'pass'
68- checks = service_def.get('health', []) + [health_checks.monit_summary]
69+ checks = service_def.get('health', []) + [health_checks.status]
70 for health_check in checks:
71 result = health_check(service_def)
72 if result['health'] == 'fail':
73
74=== modified file 'reconciler/app.py'
75--- reconciler/app.py 2014-11-14 17:00:08 +0000
76+++ reconciler/app.py 2014-11-20 17:56:06 +0000
77@@ -83,13 +83,14 @@
78 units = service.get('Units', {}) or {}
79 for unit_name, unit in units.iteritems():
80 unit_addr = unit.get('PublicAddress')
81+ unit_state = unit.get('AgentState')
82 if unit_addr:
83 loop = tornado.ioloop.IOLoop.instance()
84 loop.add_callback(check_health, service_name,
85- unit_name, unit_addr)
86-
87-
88-def check_health(service_name, unit_name, unit_addr):
89+ unit_name, unit_addr, unit_state)
90+
91+
92+def check_health(service_name, unit_name, unit_addr, unit_state):
93 service = health.setdefault(service_name, {
94 'name': service_name,
95 'health': 'unknown',
96@@ -118,8 +119,13 @@
97 unit['health'] = 'fail'
98 unit['state'] = {'message': 'Unable to parse health: {}'.format(output)}
99 except subprocess.CalledProcessError as e:
100- unit['health'] = 'warn'
101 unit['state'] = {'message': 'Unable to retrieve health: {}'.format(e.output)}
102+ if unit_state == 'started':
103+ unit['health'] = 'pass'
104+ elif unit_state == 'error':
105+ unit['health'] = 'fail'
106+ else:
107+ unit['health'] = 'warn'
108
109 units_fail = [u['health'] == 'fail' for u in service['units'].values()]
110 units_not_pass = [u['health'] != 'pass' for u in service['units'].values()]

Subscribers

People subscribed via source and target branches