Merge lp:~hopem/charms/trusty/swift-proxy/lp1448884 into lp:~openstack-charmers-archive/charms/trusty/swift-proxy/next
Status: Work in progress
Proposed branch: lp:~hopem/charms/trusty/swift-proxy/lp1448884
Merge into: lp:~openstack-charmers-archive/charms/trusty/swift-proxy/next
Diff against target: 941 lines (+380/-140), 5 files modified:
  README.Swift_ring_management (+28/-0)
  hooks/swift_hooks.py (+128/-45)
  lib/swift_utils.py (+133/-65)
  unit_tests/test_swift_hooks.py (+11/-9)
  unit_tests/test_swift_utils.py (+80/-21)
To merge this branch: bzr merge lp:~hopem/charms/trusty/swift-proxy/lp1448884
Reviewer: OpenStack Charmers (status: Pending)
Review via email: mp+282296@code.launchpad.net
Commit message
Description of the change
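This branch refactors the ring/builder sync machinery so that it tolerates a leader switch during or after a sync (bug 1448884). The core mechanism is a token echo over the cluster relation: the leader publishes a stop-proxy-service request token, each peer stops swift-proxy and echoes the token back as an ack, and the sync only proceeds once every ack matches the token this unit actually sent. A minimal sketch of that check (simplified from the hook code in the diff below; not the charm code itself):

    def is_all_peers_stopped(rq_token, responses):
        """Return True only when every peer has acked the token we sent.

        'responses' holds one cluster relation settings dict per peer.
        If the leader changed mid-sync, the acks echo a token this unit
        never sent, so the check fails and the new leader restarts the
        sync rather than proceeding with stale rings.
        """
        acks = [r.get('stop-proxy-service-ack') for r in responses]
        if not acks or None in acks:
            return False  # still waiting on one or more peers
        return all(ack == rq_token for ack in acks)

For example, is_all_peers_stopped('t1', [{'stop-proxy-service-ack': 't1'}]) is True, while a leftover ack for a different token returns False.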
Edward Hope-Morley (hopem) wrote:
Hey Chris, sure thing and thanks for the review.
I've added README.Swift_ring_management to cover ring and builder management in the charm. We can always add more over time as-and-when.
First I'd like to stress that this is mostly a refactor (upgrade safe) that is aimed at making the ring sync process more robust and cleaner to facilitate easier debugging.
With regards to:
deleting nodes - for storage nodes, this is currently a manual process and something that we need to grow the ability to do in the charm (probably using actions). For proxy units, information about proxy units/hosts is not stored in the rings/builders, so as long as the rings are synced and the leader switches over, removing a proxy unit should not interrupt service.
repairing nodes - again, same as above
I've responded to your comment inline as well.
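For reference, the manual storage-node removal is roughly the following (an illustrative sketch only, run from /etc/swift on the leader proxy unit; the host address is an example and the exit-code handling is an assumption, not charm code):

    import subprocess

    BUILDERS = ['account.builder', 'container.builder', 'object.builder']

    def remove_storage_node(host_ip):
        """Drop every device on a storage node from all rings, then rebalance.

        swift-ring-builder accepts a search value; a bare IP matches all
        devices registered against that host.
        """
        for builder in BUILDERS:
            subprocess.check_call(['swift-ring-builder', builder,
                                   'remove', host_ip])
            # rebalance may exit non-zero as a warning (e.g. nothing
            # moved yet), so don't treat that as fatal here.
            subprocess.call(['swift-ring-builder', builder, 'rebalance'])

    remove_storage_node('192.168.1.10')  # example storage unit address

The updated rings then still have to be redistributed to every unit, which is exactly the sync path this branch makes more robust.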
uosci-testing-bot (uosci-testing-bot) wrote:
charm_lint_check #835 swift-proxy-next for hopem mp282296
LINT OK: passed
uosci-testing-bot (uosci-testing-bot) wrote:
charm_unit_test #739 swift-proxy-next for hopem mp282296
UNIT OK: passed
uosci-testing-bot (uosci-testing-bot) wrote:
charm_amulet_test #337 swift-proxy-next for hopem mp282296
AMULET OK: passed
Edward Hope-Morley (hopem) wrote:
Based on the new workflow I've migrated this to git for review - https:/
Unmerged revisions
136. By Edward Hope-Morley
     add README.Swift_ring_management

135. By Edward Hope-Morley
     sync /next

134. By Edward Hope-Morley
     [hopem,r=]
     Make the ring sync code clearer and implement
     tolerance for leader-switch during or after sync.
     Closes-Bug: 1448884
Preview Diff
1 | === added file 'README.Swift_ring_management' |
2 | --- README.Swift_ring_management 1970-01-01 00:00:00 +0000 |
3 | +++ README.Swift_ring_management 2016-02-18 13:23:54 +0000 |
4 | @@ -0,0 +1,28 @@ |
5 | +== Swift Charm Ring Management == |
6 | + |
7 | +Swift uses rings and builders to manage data distribution across storage devices |
8 | +in the cluster. More information on this can be found in the upstream |
9 | +documentation [0]. |
10 | + |
11 | +In order to function correctly, the rings and builders need to be the same across |
12 | +all swift proxy units and the rings need to be distributed to storage units. The |
13 | +swift proxy charm achieves this by electing a leader unit and having that unit |
14 | +manage ring and builder files by updating them when new nodes or devices are added |
15 | +to the cluster and distributing those files to other nodes in the cluster. |
16 | + |
17 | +Since over time the leader may change, ring syncs use acknowledgements from peer |
18 | +units to determine whether a synchronisation has completed. This way, if the |
19 | +leader switches to another unit, we know that the new leader has up-to-date |
20 | +ring and builder files. |
21 | + |
22 | +When new devices are added to storage units, the leader proxy unit is notified and |
23 | +adds them to the ring. Once complete, the leader unit will broadcast a notification |
24 | +that rings and builders are ready to be synced across the cluster (only proxy units |
25 | +get builders) and each unit in the cluster should then begin syncing from the leader |
26 | +and acknowledge receipt. |
27 | + |
28 | +During synchronisation, swift-proxy services are stopped in order to avoid |
29 | +serving requests while rings are being updated. |
30 | + |
31 | + |
32 | +[0] http://docs.openstack.org/developer/swift/overview_ring.html |
33 | |
34 | === modified file 'hooks/swift_hooks.py' |
35 | --- hooks/swift_hooks.py 2015-11-27 09:48:45 +0000 |
36 | +++ hooks/swift_hooks.py 2016-02-18 13:23:54 +0000 |
37 | @@ -33,6 +33,10 @@ |
38 | get_first_available_value, |
39 | all_responses_equal, |
40 | ensure_www_dir_permissions, |
41 | + sync_builders_and_rings_if_changed, |
42 | + cluster_sync_rings, |
43 | + is_most_recent_timestamp, |
44 | + timestamps_available, |
45 | is_paused, |
46 | pause_aware_restart_on_change, |
47 | assess_status, |
48 | @@ -143,7 +147,7 @@ |
49 | do_openstack_upgrade(CONFIGS) |
50 | status_set('maintenance', 'Running openstack upgrade') |
51 | |
52 | - status_set('maintenance', 'Updating and balancing rings') |
53 | + status_set('maintenance', 'Updating and (maybe) balancing rings') |
54 | update_rings(min_part_hours=config('min-hours')) |
55 | |
56 | if not config('disable-ring-balance') and is_elected_leader(SWIFT_HA_RES): |
57 | @@ -322,7 +326,7 @@ |
58 | private_addr = unit_get('private-address') |
59 | |
60 | |
61 | -def all_peers_stopped(responses): |
62 | +def is_all_peers_stopped(responses): |
63 | """Establish whether all peers have stopped their proxy services. |
64 | |
65 | Each peer unit will set stop-proxy-service-ack to rq value to indicate that |
66 | @@ -336,11 +340,15 @@ |
67 | ack_key = SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC_ACK |
68 | token = relation_get(attribute=rq_key, unit=local_unit()) |
69 | if not token or token != responses[0].get(ack_key): |
70 | - log("Unmatched token in ack (expected=%s, got=%s)" % |
71 | + log("Token mismatch, rq and ack tokens differ (expected=%s, " |
72 | + "got=%s)" % |
73 | (token, responses[0].get(ack_key)), level=DEBUG) |
74 | return False |
75 | |
76 | if not all_responses_equal(responses, ack_key): |
77 | + log("Not all ack responses are equal. Either we are still waiting " |
78 | + "for responses or we were not the request originator.", |
79 | + level=DEBUG) |
80 | return False |
81 | |
82 | return True |
83 | @@ -354,20 +362,44 @@ |
84 | log("Cluster changed by unit=%s (local is leader)" % (remote_unit()), |
85 | level=DEBUG) |
86 | |
87 | - # If we have received an ack, check other units |
88 | - settings = relation_get() or {} |
89 | - ack_key = SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC_ACK |
90 | - |
91 | - # Protect against leader changing mid-sync |
92 | - if settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC): |
93 | - log("Sync request received yet this is leader unit. This would " |
94 | - "indicate that the leader has changed mid-sync - stopping proxy " |
95 | - "and notifying peers", level=ERROR) |
96 | - service_stop('swift-proxy') |
97 | - SwiftProxyClusterRPC().notify_leader_changed() |
98 | - return |
99 | - elif ack_key in settings: |
100 | - token = settings[ack_key] |
101 | + rx_settings = relation_get() or {} |
102 | + tx_settings = relation_get(unit=local_unit()) or {} |
103 | + |
104 | + rx_rq_token = rx_settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC) |
105 | + rx_ack_token = rx_settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC_ACK) |
106 | + |
107 | + tx_rq_token = tx_settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC) |
108 | + tx_ack_token = tx_settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC_ACK) |
109 | + |
110 | + rx_leader_changed = \ |
111 | + rx_settings.get(SwiftProxyClusterRPC.KEY_NOTIFY_LEADER_CHANGED) |
112 | + if rx_leader_changed: |
113 | + log("Leader change notification received and this is leader so " |
114 | + "retrying sync.", level=INFO) |
115 | + # FIXME: check that we were previously part of a successful sync to |
116 | + # ensure we have good rings. |
117 | + cluster_sync_rings(peers_only=tx_settings.get('peers-only', False), |
118 | + token=rx_leader_changed) |
119 | + return |
120 | + |
121 | + rx_resync_request = \ |
122 | + rx_settings.get(SwiftProxyClusterRPC.KEY_REQUEST_RESYNC) |
123 | + resync_request_ack_key = SwiftProxyClusterRPC.KEY_REQUEST_RESYNC_ACK |
124 | + tx_resync_request_ack = tx_settings.get(resync_request_ack_key) |
125 | + if rx_resync_request and tx_resync_request_ack != rx_resync_request: |
126 | + log("Unit '%s' has requested a resync" % (remote_unit()), |
127 | + level=INFO) |
128 | + cluster_sync_rings(peers_only=True) |
129 | + relation_set(**{resync_request_ack_key: rx_resync_request}) |
130 | + return |
131 | + |
132 | + # If we have received an ack token ensure it is not associated with a |
133 | + # request we received from another peer. If it is, this would indicate |
134 | + # a leadership change during a sync and this unit will abort the sync or |
135 | + # attempt to restore the original leader so as to be able to complete the |
136 | + # sync. |
137 | + |
138 | + if rx_ack_token and rx_ack_token == tx_rq_token: |
139 | # Find out if all peer units have been stopped. |
140 | responses = [] |
141 | for rid in relation_ids('cluster'): |
142 | @@ -375,21 +407,38 @@ |
143 | responses.append(relation_get(rid=rid, unit=unit)) |
144 | |
145 | # Ensure all peers stopped before starting sync |
146 | - if all_peers_stopped(responses): |
147 | + if is_all_peers_stopped(responses): |
148 | key = 'peers-only' |
149 | if not all_responses_equal(responses, key, must_exist=False): |
150 | msg = ("Did not get equal response from every peer unit for " |
151 | "'%s'" % (key)) |
152 | raise SwiftProxyCharmException(msg) |
153 | |
154 | - peers_only = int(get_first_available_value(responses, key, |
155 | - default=0)) |
156 | + peers_only = bool(get_first_available_value(responses, key, |
157 | + default=0)) |
158 | log("Syncing rings and builders (peers-only=%s)" % (peers_only), |
159 | level=DEBUG) |
160 | - broadcast_rings_available(token, storage=not peers_only) |
161 | + broadcast_rings_available(broker_token=rx_ack_token, |
162 | + storage=not peers_only) |
163 | else: |
164 | log("Not all peer apis stopped - skipping sync until all peers " |
165 | "ready (got %s)" % (responses), level=INFO) |
166 | + elif ((rx_ack_token and (rx_ack_token == tx_ack_token)) or |
167 | + (rx_rq_token and (rx_rq_token == rx_ack_token))): |
168 | + log("It appears that the cluster leader has changed mid-sync - " |
169 | + "stopping proxy service", level=WARNING) |
170 | + service_stop('swift-proxy') |
171 | + broker = rx_settings.get('builder-broker') |
172 | + if broker: |
173 | + # If we get here, manual intervention will be required in order |
174 | + # to restore the cluster |
175 | + log("Failed to restore previous broker '%s' as leader" % |
176 | + (broker), level=ERROR) |
177 | + else: |
178 | + log("No builder-broker on rx_settings relation from '%s' - unable " |
179 | + "to attempt leader restore" % (remote_unit()), level=INFO) |
180 | + else: |
181 | + log("Not taking any sync actions", level=DEBUG) |
182 | |
183 | CONFIGS.write_all() |
184 | |
185 | @@ -401,31 +450,74 @@ |
186 | """ |
187 | log("Cluster changed by unit=%s (local is non-leader)" % (remote_unit()), |
188 | level=DEBUG) |
189 | - settings = relation_get() or {} |
190 | + rx_settings = relation_get() or {} |
191 | + tx_settings = relation_get(unit=local_unit()) or {} |
192 | + |
193 | + token = rx_settings.get(SwiftProxyClusterRPC.KEY_NOTIFY_LEADER_CHANGED) |
194 | + if token: |
195 | + log("Leader-changed notification received from peer unit. Since " |
196 | + "this most likely occurred during a ring sync proxies will " |
197 | + "be disabled until the leader is restored and a fresh sync " |
198 | + "request is set out", level=WARNING) |
199 | + service_stop("swift-proxy") |
200 | + return |
201 | + |
202 | + rx_rq_token = rx_settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC) |
203 | |
204 | # Check whether we have been requested to stop proxy service |
205 | - rq_key = SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC |
206 | - token = settings.get(rq_key, None) |
207 | - if token: |
208 | + if rx_rq_token: |
209 | log("Peer request to stop proxy service received (%s) - sending ack" % |
210 | - (token), level=INFO) |
211 | + (rx_rq_token), level=INFO) |
212 | service_stop('swift-proxy') |
213 | - peers_only = settings.get('peers-only', None) |
214 | - rq = SwiftProxyClusterRPC().stop_proxy_ack(echo_token=token, |
215 | + peers_only = rx_settings.get('peers-only', None) |
216 | + rq = SwiftProxyClusterRPC().stop_proxy_ack(echo_token=rx_rq_token, |
217 | echo_peers_only=peers_only) |
218 | relation_set(relation_settings=rq) |
219 | return |
220 | |
221 | # Check if there are any builder files we can sync from the leader. |
222 | - log("Non-leader peer - checking if updated rings available", level=DEBUG) |
223 | - broker = settings.get('builder-broker', None) |
224 | + broker = rx_settings.get('builder-broker', None) |
225 | + broker_token = rx_settings.get('broker-token', None) |
226 | + broker_timestamp = rx_settings.get('broker-timestamp', None) |
227 | + tx_ack_token = tx_settings.get(SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC_ACK) |
228 | if not broker: |
229 | - log("No update available", level=DEBUG) |
230 | + log("No ring/builder update available", level=DEBUG) |
231 | if not is_paused(): |
232 | service_start('swift-proxy') |
233 | - return |
234 | - |
235 | - builders_only = int(settings.get('sync-only-builders', 0)) |
236 | + |
237 | + return |
238 | + elif broker_token: |
239 | + if tx_ack_token: |
240 | + if broker_token == tx_ack_token: |
241 | + log("Broker and ACK tokens match (%s)" % (broker_token), |
242 | + level=DEBUG) |
243 | + else: |
244 | + log("Received ring/builder update notification but tokens do " |
245 | + "not match (broker-token=%s/ack-token=%s)" % |
246 | + (broker_token, tx_ack_token), level=WARNING) |
247 | + return |
248 | + else: |
249 | + log("Broker token available without handshake, assuming we just " |
250 | + "joined and rings won't change", level=DEBUG) |
251 | + else: |
252 | + log("Not taking any sync actions", level=DEBUG) |
253 | + return |
254 | + |
255 | + # If we upgrade from a cluster that did not use timestamps, the new |
256 | + # peer will need to request a re-sync from the leader |
257 | + if not is_most_recent_timestamp(broker_timestamp): |
258 | + if not timestamps_available(excluded_unit=remote_unit()): |
259 | + log("Requesting resync") |
260 | + rq = SwiftProxyClusterRPC().request_resync(broker_token) |
261 | + relation_set(relation_settings=rq) |
262 | + else: |
263 | + log("Did not receive most recent broker timestamp but timestamps " |
264 | + "are available - waiting for next timestamp", level=INFO) |
265 | + |
266 | + return |
267 | + |
268 | + log("Ring/builder update available", level=DEBUG) |
269 | + builders_only = int(rx_settings.get('sync-only-builders', 0)) |
270 | path = os.path.basename(get_www_dir()) |
271 | try: |
272 | sync_proxy_rings('http://%s/%s' % (broker, path), |
273 | @@ -433,7 +525,7 @@ |
274 | except CalledProcessError: |
275 | log("Ring builder sync failed, builders not yet available - " |
276 | "leader not ready?", level=WARNING) |
277 | - return None |
278 | + return |
279 | |
280 | # Re-enable the proxy once all builders and rings are synced |
281 | if fully_synced(): |
282 | @@ -449,16 +541,6 @@ |
283 | @hooks.hook('cluster-relation-changed') |
284 | @pause_aware_restart_on_change(restart_map()) |
285 | def cluster_changed(): |
286 | - key = SwiftProxyClusterRPC.KEY_NOTIFY_LEADER_CHANGED |
287 | - leader_changed = relation_get(attribute=key) |
288 | - if leader_changed: |
289 | - log("Leader changed notification received from peer unit. Since this " |
290 | - "most likely occurred during a ring sync proxies will be " |
291 | - "disabled until the leader is restored and a fresh sync request " |
292 | - "is set out", level=WARNING) |
293 | - service_stop("swift-proxy") |
294 | - return |
295 | - |
296 | if is_elected_leader(SWIFT_HA_RES): |
297 | cluster_leader_actions() |
298 | else: |
299 | @@ -466,6 +548,7 @@ |
300 | |
301 | |
302 | @hooks.hook('ha-relation-changed') |
303 | +@sync_builders_and_rings_if_changed |
304 | def ha_relation_changed(): |
305 | clustered = relation_get('clustered') |
306 | if clustered: |
307 | |
308 | === modified file 'lib/swift_utils.py' |
309 | --- lib/swift_utils.py 2015-11-06 18:33:58 +0000 |
310 | +++ lib/swift_utils.py 2016-02-18 13:23:54 +0000 |
311 | @@ -7,6 +7,7 @@ |
312 | import subprocess |
313 | import tempfile |
314 | import threading |
315 | +import time |
316 | import uuid |
317 | |
318 | from collections import OrderedDict |
319 | @@ -39,6 +40,7 @@ |
320 | INFO, |
321 | WARNING, |
322 | config, |
323 | + local_unit, |
324 | relation_get, |
325 | unit_get, |
326 | relation_set, |
327 | @@ -180,40 +182,56 @@ |
328 | KEY_STOP_PROXY_SVC = 'stop-proxy-service' |
329 | KEY_STOP_PROXY_SVC_ACK = 'stop-proxy-service-ack' |
330 | KEY_NOTIFY_LEADER_CHANGED = 'leader-changed-notification' |
331 | + KEY_REQUEST_RESYNC = 'resync-request' |
332 | + KEY_REQUEST_RESYNC_ACK = 'resync-request-ack' |
333 | |
334 | def __init__(self, version=1): |
335 | self._version = version |
336 | |
337 | + @property |
338 | + def _hostname(self): |
339 | + hostname = get_hostaddr() |
340 | + return format_ipv6_addr(hostname) or hostname |
341 | + |
342 | def template(self): |
343 | # Everything must be None by default so it gets dropped from the |
344 | # relation unless we want it to be set. |
345 | templates = {1: {'trigger': None, |
346 | 'broker-token': None, |
347 | 'builder-broker': None, |
348 | + 'broker-timestamp': None, |
349 | self.KEY_STOP_PROXY_SVC: None, |
350 | self.KEY_STOP_PROXY_SVC_ACK: None, |
351 | self.KEY_NOTIFY_LEADER_CHANGED: None, |
352 | + self.KEY_REQUEST_RESYNC: None, |
353 | 'peers-only': None, |
354 | 'sync-only-builders': None}} |
355 | return copy.deepcopy(templates[self._version]) |
356 | |
357 | - def stop_proxy_request(self, peers_only=False): |
358 | + def stop_proxy_request(self, peers_only=False, token=None): |
359 | """Request to stop peer proxy service. |
360 | |
361 | - NOTE: leader action |
362 | + A token can optionally be supplied in case we want to restart a sync |
363 | + e.g. following a leader change notification. |
364 | + |
365 | + NOTE: this action must only be performed by the cluster leader. |
366 | """ |
367 | rq = self.template() |
368 | - rq['trigger'] = str(uuid.uuid4()) |
369 | + if not token: |
370 | + token = str(uuid.uuid4()) |
371 | + |
372 | + rq['trigger'] = token |
373 | rq[self.KEY_STOP_PROXY_SVC] = rq['trigger'] |
374 | if peers_only: |
375 | rq['peers-only'] = 1 |
376 | |
377 | + rq['builder-broker'] = self._hostname |
378 | return rq |
379 | |
380 | def stop_proxy_ack(self, echo_token, echo_peers_only): |
381 | """Ack that peer proxy service is stopped. |
382 | |
383 | - NOTE: non-leader action |
384 | + NOTE: this action must NOT be performed by the cluster leader. |
385 | """ |
386 | rq = self.template() |
387 | rq['trigger'] = str(uuid.uuid4()) |
388 | @@ -222,11 +240,10 @@ |
389 | rq['peers-only'] = echo_peers_only |
390 | return rq |
391 | |
392 | - def sync_rings_request(self, broker_host, broker_token, |
393 | - builders_only=False): |
394 | + def sync_rings_request(self, broker_token, builders_only=False): |
395 | """Request for peer to sync rings. |
396 | |
397 | - NOTE: leader action |
398 | + NOTE: this action must only be performed by the cluster leader. |
399 | """ |
400 | rq = self.template() |
401 | rq['trigger'] = str(uuid.uuid4()) |
402 | @@ -235,17 +252,34 @@ |
403 | rq['sync-only-builders'] = 1 |
404 | |
405 | rq['broker-token'] = broker_token |
406 | - rq['builder-broker'] = broker_host |
407 | + rq['broker-timestamp'] = "%f" % time.time() |
408 | + rq['builder-broker'] = self._hostname |
409 | return rq |
410 | |
411 | - def notify_leader_changed(self): |
412 | + def notify_leader_changed(self, token): |
413 | """Notify peers that leader has changed. |
414 | |
415 | - NOTE: leader action |
416 | - """ |
417 | - rq = self.template() |
418 | - rq['trigger'] = str(uuid.uuid4()) |
419 | - rq[self.KEY_NOTIFY_LEADER_CHANGED] = rq['trigger'] |
420 | + The token passed in must be the one associated with the sync that we |
421 | + claim was interrupted. It will be re-used by the restored leader once |
422 | + it receives this notification. |
423 | + |
424 | + NOTE: this action must only be performed by the cluster leader that |
425 | + has relinquished its leader status as part of the current hook |
426 | + context. |
427 | + """ |
428 | + rq = self.template() |
429 | + rq['trigger'] = str(uuid.uuid4()) |
430 | + rq[self.KEY_NOTIFY_LEADER_CHANGED] = token |
431 | + return rq |
432 | + |
433 | + def request_resync(self, token): |
434 | + """Request re-sync from leader. |
435 | + |
436 | + NOTE: this action must not be performed by the cluster leader. |
437 | + """ |
438 | + rq = self.template() |
439 | + rq['trigger'] = str(uuid.uuid4()) |
440 | + rq[self.KEY_REQUEST_RESYNC] = token |
441 | return rq |
442 | |
443 | |
444 | @@ -704,29 +738,27 @@ |
445 | return sha.hexdigest() |
446 | |
447 | |
448 | -def get_broker_token(): |
449 | - """Get ack token from peers to be used as broker token. |
450 | - |
451 | - Must be equal across all peers. |
452 | - |
453 | - Returns token or None if not found. |
454 | +def non_null_unique(data): |
455 | + """Return True if data is a list containing all non-null values that are |
456 | + all equal. |
457 | """ |
458 | - responses = [] |
459 | - ack_key = SwiftProxyClusterRPC.KEY_STOP_PROXY_SVC_ACK |
460 | + return (all(data) and len(set(data)) > 1) |
461 | + |
462 | + |
463 | +def previously_synced(): |
464 | + """If a full sync is not known to have been performed from this unit, False |
465 | + is returned.""" |
466 | + broker = None |
467 | for rid in relation_ids('cluster'): |
468 | - for unit in related_units(rid): |
469 | - responses.append(relation_get(attribute=ack_key, rid=rid, |
470 | - unit=unit)) |
471 | - |
472 | - # If no acks exist we have probably never done a sync so make up a token |
473 | - if len(responses) == 0: |
474 | - return str(uuid.uuid4()) |
475 | - |
476 | - if not all(responses) or len(set(responses)) != 1: |
477 | - log("Not all ack tokens equal - %s" % (responses), level=DEBUG) |
478 | - return None |
479 | - |
480 | - return responses[0] |
481 | + broker = relation_get(attribute='builder-broker', rid=rid, |
482 | + unit=local_unit()) |
483 | + only_builders_synced = relation_get(attribute='sync-only-builders', |
484 | + rid=rid, unit=local_unit()) |
485 | + |
486 | + if broker and not only_builders_synced: |
487 | + return True |
488 | + |
489 | + return False |
490 | |
491 | |
492 | def sync_builders_and_rings_if_changed(f): |
493 | @@ -757,7 +789,8 @@ |
494 | rings_path = os.path.join(SWIFT_CONF_DIR, '*.%s' % |
495 | (SWIFT_RING_EXT)) |
496 | rings_ready = len(glob.glob(rings_path)) == len(SWIFT_RINGS) |
497 | - rings_changed = rings_after != rings_before |
498 | + rings_changed = ((rings_after != rings_before) or |
499 | + not previously_synced()) |
500 | builders_changed = builders_after != builders_before |
501 | if rings_changed or builders_changed: |
502 | # Copy builders and rings (if available) to the server dir. |
503 | @@ -766,9 +799,9 @@ |
504 | # Trigger sync |
505 | cluster_sync_rings(peers_only=not rings_changed) |
506 | else: |
507 | - cluster_sync_rings(peers_only=True, builders_only=True) |
508 | log("Rings not ready for sync - syncing builders", |
509 | level=DEBUG) |
510 | + cluster_sync_rings(peers_only=True, builders_only=True) |
511 | else: |
512 | log("Rings/builders unchanged - skipping sync", level=DEBUG) |
513 | |
514 | @@ -869,20 +902,31 @@ |
515 | "skipping", level=WARNING) |
516 | return |
517 | |
518 | - hostname = get_hostaddr() |
519 | - hostname = format_ipv6_addr(hostname) or hostname |
520 | + if not broker_token: |
521 | + log("No broker token - aborting sync", level=WARNING) |
522 | + return |
523 | + |
524 | + cluster_rids = relation_ids('cluster') |
525 | + if not cluster_rids: |
526 | + log("Cluster relation not yet available - skipping sync", level=DEBUG) |
527 | + return |
528 | + |
529 | + if builders_only: |
530 | + type = "builders" |
531 | + else: |
532 | + type = "builders & rings" |
533 | + |
534 | # Notify peers that builders are available |
535 | - log("Notifying peer(s) that rings are ready for sync.", level=INFO) |
536 | - rq = SwiftProxyClusterRPC().sync_rings_request(hostname, |
537 | - broker_token, |
538 | + log("Notifying peer(s) that %s are ready for sync." % type, level=INFO) |
539 | + rq = SwiftProxyClusterRPC().sync_rings_request(broker_token, |
540 | builders_only=builders_only) |
541 | - for rid in relation_ids('cluster'): |
542 | + for rid in cluster_rids: |
543 | log("Notifying rid=%s (%s)" % (rid, rq), level=DEBUG) |
544 | relation_set(relation_id=rid, relation_settings=rq) |
545 | |
546 | |
547 | -def broadcast_rings_available(broker_token, peers=True, storage=True, |
548 | - builders_only=False): |
549 | +def broadcast_rings_available(storage=True, builders_only=False, |
550 | + broker_token=None): |
551 | """Notify storage relations and cluster (peer) relations that rings and |
552 | builders are available for sync. |
553 | |
554 | @@ -895,14 +939,13 @@ |
555 | else: |
556 | log("Skipping notify storage relations", level=DEBUG) |
557 | |
558 | - if peers: |
559 | - notify_peers_builders_available(broker_token, |
560 | - builders_only=builders_only) |
561 | - else: |
562 | - log("Skipping notify peer relations", level=DEBUG) |
563 | - |
564 | - |
565 | -def cluster_sync_rings(peers_only=False, builders_only=False): |
566 | + # Always set peer info even if not clustered so that info is present when |
567 | + # units join |
568 | + notify_peers_builders_available(broker_token, |
569 | + builders_only=builders_only) |
570 | + |
571 | + |
572 | +def cluster_sync_rings(peers_only=False, builders_only=False, token=None): |
573 | """Notify peer relations that they should stop their proxy services. |
574 | |
575 | Peer units will then be expected to do a relation_set with |
576 | @@ -923,24 +966,21 @@ |
577 | # If we have no peer units just go ahead and broadcast to storage |
578 | # relations. If we have been instructed to only broadcast to peers this |
579 | # should do nothing. |
580 | - broker_token = get_broker_token() |
581 | - broadcast_rings_available(broker_token, peers=False, |
582 | + broadcast_rings_available(broker_token=str(uuid.uuid4()), |
583 | storage=not peers_only) |
584 | return |
585 | elif builders_only: |
586 | + if not token: |
587 | + token = str(uuid.uuid4()) |
588 | + |
589 | # No need to stop proxies if only syncing builders between peers. |
590 | - broker_token = get_broker_token() |
591 | - broadcast_rings_available(broker_token, storage=False, |
592 | - builders_only=builders_only) |
593 | + broadcast_rings_available(storage=False, builders_only=True, |
594 | + broker_token=token) |
595 | return |
596 | |
597 | - rel_ids = relation_ids('cluster') |
598 | - trigger = str(uuid.uuid4()) |
599 | - |
600 | - log("Sending request to stop proxy service to all peers (%s)" % (trigger), |
601 | - level=INFO) |
602 | - rq = SwiftProxyClusterRPC().stop_proxy_request(peers_only) |
603 | - for rid in rel_ids: |
604 | + log("Sending request to stop proxy service to all peers", level=INFO) |
605 | + rq = SwiftProxyClusterRPC().stop_proxy_request(peers_only, token=token) |
606 | + for rid in relation_ids('cluster'): |
607 | relation_set(relation_id=rid, relation_settings=rq) |
608 | |
609 | |
610 | @@ -996,6 +1036,34 @@ |
611 | return unit_get('private-address') |
612 | |
613 | |
614 | +def is_most_recent_timestamp(timestamp): |
615 | + ts = [] |
616 | + for rid in relation_ids('cluster'): |
617 | + for unit in related_units(rid): |
618 | + settings = relation_get(rid=rid, unit=unit) |
619 | + t = settings.get('broker-timestamp') |
620 | + if t: |
621 | + ts.append(t) |
622 | + |
623 | + if not ts or not timestamp: |
624 | + return False |
625 | + |
626 | + return timestamp >= max(ts) |
627 | + |
628 | + |
629 | +def timestamps_available(excluded_unit): |
630 | + for rid in relation_ids('cluster'): |
631 | + for unit in related_units(rid): |
632 | + if unit == excluded_unit: |
633 | + continue |
634 | + |
635 | + settings = relation_get(rid=rid, unit=unit) |
636 | + if settings.get('broker-timestamp'): |
637 | + return True |
638 | + |
639 | + return False |
640 | + |
641 | + |
642 | def is_paused(status_get=status_get): |
643 | """Is the unit paused?""" |
644 | with HookData()(): |
645 | |
646 | === modified file 'unit_tests/test_swift_hooks.py' |
647 | --- unit_tests/test_swift_hooks.py 2015-11-27 09:48:45 +0000 |
648 | +++ unit_tests/test_swift_hooks.py 2016-02-18 13:23:54 +0000 |
649 | @@ -8,41 +8,43 @@ |
650 | ) |
651 | |
652 | sys.path.append("hooks") |
653 | -with patch('charmhelpers.core.hookenv.log'): |
654 | - with patch('lib.swift_utils.is_paused') as is_paused: |
655 | - import swift_hooks |
656 | +# swift_hooks.py does a register_configs() on import |
657 | +with patch('charmhelpers.contrib.openstack.utils.apt_cache', lambda: None): |
658 | + with patch('charmhelpers.core.hookenv.log'): |
659 | + with patch('lib.swift_utils.is_paused') as is_paused: |
660 | + import swift_hooks |
661 | |
662 | |
663 | class SwiftHooksTestCase(unittest.TestCase): |
664 | |
665 | @patch("swift_hooks.relation_get") |
666 | @patch("swift_hooks.local_unit") |
667 | - def test_all_peers_stopped(self, mock_local_unit, mock_relation_get): |
668 | + def test_is_all_peers_stopped(self, mock_local_unit, mock_relation_get): |
669 | token1 = str(uuid.uuid4()) |
670 | token2 = str(uuid.uuid4()) |
671 | mock_relation_get.return_value = token1 |
672 | |
673 | responses = [{'some-other-key': token1}] |
674 | - self.assertFalse(swift_hooks.all_peers_stopped(responses)) |
675 | + self.assertFalse(swift_hooks.is_all_peers_stopped(responses)) |
676 | |
677 | responses = [{'stop-proxy-service-ack': token1}, |
678 | {'stop-proxy-service-ack': token2}] |
679 | - self.assertFalse(swift_hooks.all_peers_stopped(responses)) |
680 | + self.assertFalse(swift_hooks.is_all_peers_stopped(responses)) |
681 | |
682 | responses = [{'stop-proxy-service-ack': token1}, |
683 | {'stop-proxy-service-ack': token1}, |
684 | {'some-other-key': token1}] |
685 | - self.assertFalse(swift_hooks.all_peers_stopped(responses)) |
686 | + self.assertFalse(swift_hooks.is_all_peers_stopped(responses)) |
687 | |
688 | responses = [{'stop-proxy-service-ack': token1}, |
689 | {'stop-proxy-service-ack': token1}] |
690 | - self.assertTrue(swift_hooks.all_peers_stopped(responses)) |
691 | + self.assertTrue(swift_hooks.is_all_peers_stopped(responses)) |
692 | |
693 | mock_relation_get.return_value = token2 |
694 | |
695 | responses = [{'stop-proxy-service-ack': token1}, |
696 | {'stop-proxy-service-ack': token1}] |
697 | - self.assertFalse(swift_hooks.all_peers_stopped(responses)) |
698 | + self.assertFalse(swift_hooks.is_all_peers_stopped(responses)) |
699 | |
700 | @patch.object(swift_hooks, 'config') |
701 | @patch('charmhelpers.contrib.openstack.ip.config') |
702 | |
703 | === modified file 'unit_tests/test_swift_utils.py' |
704 | --- unit_tests/test_swift_utils.py 2015-11-03 13:52:37 +0000 |
705 | +++ unit_tests/test_swift_utils.py 2016-02-18 13:23:54 +0000 |
706 | @@ -22,7 +22,7 @@ |
707 | |
708 | class SwiftUtilsTestCase(unittest.TestCase): |
709 | |
710 | - @mock.patch('lib.swift_utils.get_broker_token') |
711 | + @mock.patch.object(swift_utils, 'previously_synced') |
712 | @mock.patch('lib.swift_utils.update_www_rings') |
713 | @mock.patch('lib.swift_utils.get_builders_checksum') |
714 | @mock.patch('lib.swift_utils.get_rings_checksum') |
715 | @@ -38,12 +38,12 @@ |
716 | mock_log, mock_balance_rings, |
717 | mock_get_rings_checksum, |
718 | mock_get_builders_checksum, mock_update_www_rings, |
719 | - mock_get_broker_token): |
720 | - mock_get_broker_token.return_value = "token1" |
721 | + mock_previously_synced): |
722 | |
723 | # Make sure same is returned for both so that we don't try to sync |
724 | mock_get_rings_checksum.return_value = None |
725 | mock_get_builders_checksum.return_value = None |
726 | + mock_previously_synced.return_value = True |
727 | |
728 | # Test blocker 1 |
729 | mock_is_elected_leader.return_value = False |
730 | @@ -77,25 +77,23 @@ |
731 | self.assertTrue(mock_set_min_hours.called) |
732 | self.assertTrue(mock_balance_rings.called) |
733 | |
734 | + @mock.patch('lib.swift_utils.previously_synced') |
735 | @mock.patch('lib.swift_utils._load_builder') |
736 | @mock.patch('lib.swift_utils.initialize_ring') |
737 | - @mock.patch('lib.swift_utils.get_broker_token') |
738 | @mock.patch('lib.swift_utils.update_www_rings') |
739 | @mock.patch('lib.swift_utils.get_builders_checksum') |
740 | @mock.patch('lib.swift_utils.get_rings_checksum') |
741 | @mock.patch('lib.swift_utils.balance_rings') |
742 | @mock.patch('lib.swift_utils.log') |
743 | @mock.patch('lib.swift_utils.is_elected_leader') |
744 | - def test_update_rings_multiple_devs(self, |
745 | - mock_is_elected_leader, |
746 | + def test_update_rings_multiple_devs(self, mock_is_elected_leader, |
747 | mock_log, mock_balance_rings, |
748 | mock_get_rings_checksum, |
749 | mock_get_builders_checksum, |
750 | mock_update_www_rings, |
751 | - mock_get_broker_token, |
752 | mock_initialize_ring, |
753 | mock_load_builder, |
754 | - ): |
755 | + mock_previously_synced): |
756 | # To avoid the need for swift.common.ring library, mock a basic |
757 | # rings dictionary, keyed by path. |
758 | # Each ring has enough logic to hold a dictionary with a single 'devs' |
759 | @@ -115,11 +113,13 @@ |
760 | |
761 | def add_dev(self, dev): |
762 | mock_rings[self.path]['devs'].append(dev) |
763 | + |
764 | return mock_ring(path) |
765 | |
766 | def mock_initialize_ring_fn(path, *args): |
767 | mock_rings.setdefault(path, {'devs': []}) |
768 | |
769 | + mock_is_elected_leader.return_value = True |
770 | mock_load_builder.side_effect = mock_load_builder_fn |
771 | mock_initialize_ring.side_effect = mock_initialize_ring_fn |
772 | |
773 | @@ -153,7 +153,6 @@ |
774 | swift_utils.update_rings(nodes) |
775 | self.assertFalse(mock_add_to_ring.called) |
776 | |
777 | - @mock.patch('lib.swift_utils.get_broker_token') |
778 | @mock.patch('lib.swift_utils.balance_rings') |
779 | @mock.patch('lib.swift_utils.log') |
780 | @mock.patch('lib.swift_utils.is_elected_leader') |
781 | @@ -165,9 +164,7 @@ |
782 | mock_config, |
783 | mock_is_elected_leader, |
784 | mock_log, |
785 | - mock_balance_rings, |
786 | - mock_get_broker_token): |
787 | - mock_get_broker_token.return_value = "token1" |
788 | + mock_balance_rings): |
789 | |
790 | @swift_utils.sync_builders_and_rings_if_changed |
791 | def mock_balance(): |
792 | @@ -200,6 +197,7 @@ |
793 | finally: |
794 | shutil.rmtree(tmpdir) |
795 | |
796 | + @mock.patch.object(swift_utils, 'get_hostaddr', lambda *args: '1.2.3.4') |
797 | @mock.patch('lib.swift_utils.uuid') |
798 | def test_cluster_rpc_stop_proxy_request(self, mock_uuid): |
799 | mock_uuid.uuid4.return_value = 'test-uuid' |
800 | @@ -207,9 +205,11 @@ |
801 | rq = rpc.stop_proxy_request(peers_only=True) |
802 | self.assertEqual({'trigger': 'test-uuid', |
803 | 'broker-token': None, |
804 | - 'builder-broker': None, |
805 | - 'peers-only': True, |
806 | + 'broker-timestamp': None, |
807 | + 'builder-broker': '1.2.3.4', |
808 | + 'peers-only': 1, |
809 | 'leader-changed-notification': None, |
810 | + 'resync-request': None, |
811 | 'stop-proxy-service': 'test-uuid', |
812 | 'stop-proxy-service-ack': None, |
813 | 'sync-only-builders': None}, rq) |
814 | @@ -217,13 +217,18 @@ |
815 | rq = rpc.stop_proxy_request() |
816 | self.assertEqual({'trigger': 'test-uuid', |
817 | 'broker-token': None, |
818 | - 'builder-broker': None, |
819 | + 'broker-timestamp': None, |
820 | + 'builder-broker': '1.2.3.4', |
821 | 'peers-only': None, |
822 | 'leader-changed-notification': None, |
823 | + 'resync-request': None, |
824 | 'stop-proxy-service': 'test-uuid', |
825 | 'stop-proxy-service-ack': None, |
826 | 'sync-only-builders': None}, rq) |
827 | |
828 | + template_keys = set(rpc.template()) |
829 | + self.assertTrue(set(rq.keys()).issubset(template_keys)) |
830 | + |
831 | @mock.patch('lib.swift_utils.uuid') |
832 | def test_cluster_rpc_stop_proxy_ack(self, mock_uuid): |
833 | mock_uuid.uuid4.return_value = 'token2' |
834 | @@ -232,40 +237,58 @@ |
835 | self.assertEqual({'trigger': 'token2', |
836 | 'broker-token': None, |
837 | 'builder-broker': None, |
838 | + 'broker-timestamp': None, |
839 | 'peers-only': '1', |
840 | 'leader-changed-notification': None, |
841 | + 'resync-request': None, |
842 | 'stop-proxy-service': None, |
843 | 'stop-proxy-service-ack': 'token1', |
844 | 'sync-only-builders': None}, rq) |
845 | |
846 | + template_keys = set(rpc.template()) |
847 | + self.assertTrue(set(rq.keys()).issubset(template_keys)) |
848 | + |
849 | + @mock.patch.object(swift_utils, 'get_hostaddr', lambda *args: '1.2.3.4') |
850 | + @mock.patch.object(swift_utils, 'time') |
851 | @mock.patch('lib.swift_utils.uuid') |
852 | - def test_cluster_rpc_sync_request(self, mock_uuid): |
853 | + def test_cluster_rpc_sync_request(self, mock_uuid, mock_time): |
854 | + mock_time.time = mock.Mock(return_value=float(1.234)) |
855 | mock_uuid.uuid4.return_value = 'token2' |
856 | rpc = swift_utils.SwiftProxyClusterRPC() |
857 | - rq = rpc.sync_rings_request('HostA', 'token1') |
858 | + rq = rpc.sync_rings_request('token1') |
859 | self.assertEqual({'trigger': 'token2', |
860 | 'broker-token': 'token1', |
861 | - 'builder-broker': 'HostA', |
862 | + 'broker-timestamp': '1.234000', |
863 | + 'builder-broker': '1.2.3.4', |
864 | 'peers-only': None, |
865 | 'leader-changed-notification': None, |
866 | + 'resync-request': None, |
867 | 'stop-proxy-service': None, |
868 | 'stop-proxy-service-ack': None, |
869 | 'sync-only-builders': None}, rq) |
870 | |
871 | + template_keys = set(rpc.template()) |
872 | + self.assertTrue(set(rq.keys()).issubset(template_keys)) |
873 | + |
874 | @mock.patch('lib.swift_utils.uuid') |
875 | def test_cluster_rpc_notify_leader_changed(self, mock_uuid): |
876 | - mock_uuid.uuid4.return_value = 'token1' |
877 | + mock_uuid.uuid4.return_value = 'e4b67426-6cc0-4aa3-829d-227999cd0a75' |
878 | rpc = swift_utils.SwiftProxyClusterRPC() |
879 | - rq = rpc.notify_leader_changed() |
880 | - self.assertEqual({'trigger': 'token1', |
881 | + rq = rpc.notify_leader_changed('token1') |
882 | + self.assertEqual({'trigger': 'e4b67426-6cc0-4aa3-829d-227999cd0a75', |
883 | 'broker-token': None, |
884 | 'builder-broker': None, |
885 | + 'broker-timestamp': None, |
886 | 'peers-only': None, |
887 | 'leader-changed-notification': 'token1', |
888 | 'stop-proxy-service': None, |
889 | 'stop-proxy-service-ack': None, |
890 | + 'resync-request': None, |
891 | 'sync-only-builders': None}, rq) |
892 | |
893 | + template_keys = set(rpc.template().keys()) |
894 | + self.assertTrue(set(rq.keys()).issubset(template_keys)) |
895 | + |
896 | def test_all_responses_equal(self): |
897 | responses = [{'a': 1, 'c': 3}] |
898 | self.assertTrue(swift_utils.all_responses_equal(responses, 'b', |
899 | @@ -301,6 +324,42 @@ |
900 | rsps = [] |
901 | self.assertIsNone(swift_utils.get_first_available_value(rsps, 'key3')) |
902 | |
903 | + @mock.patch.object(swift_utils, 'relation_get') |
904 | + @mock.patch.object(swift_utils, 'related_units', lambda arg: ['proxy/1']) |
905 | + @mock.patch.object(swift_utils, 'relation_ids', lambda arg: ['cluster:1']) |
906 | + def test_is_most_recent_timestamp(self, mock_rel_get): |
907 | + mock_rel_get.return_value = {'broker-timestamp': '1111'} |
908 | + self.assertTrue(swift_utils.is_most_recent_timestamp('1234')) |
909 | + mock_rel_get.return_value = {'broker-timestamp': '2234'} |
910 | + self.assertFalse(swift_utils.is_most_recent_timestamp('1234')) |
911 | + mock_rel_get.return_value = {} |
912 | + self.assertFalse(swift_utils.is_most_recent_timestamp('1234')) |
913 | + mock_rel_get.return_value = {'broker-timestamp': '2234'} |
914 | + self.assertFalse(swift_utils.is_most_recent_timestamp(None)) |
915 | + |
916 | + @mock.patch.object(swift_utils, 'relation_get') |
917 | + @mock.patch.object(swift_utils, 'related_units', lambda arg: ['proxy/1']) |
918 | + @mock.patch.object(swift_utils, 'relation_ids', lambda arg: ['cluster:1']) |
919 | + def test_timestamps_available(self, mock_rel_get): |
920 | + mock_rel_get.return_value = {} |
921 | + self.assertFalse(swift_utils.timestamps_available('proxy/1')) |
922 | + mock_rel_get.return_value = {'broker-timestamp': '1234'} |
923 | + self.assertFalse(swift_utils.timestamps_available('proxy/1')) |
924 | + mock_rel_get.return_value = {'broker-timestamp': '1234'} |
925 | + self.assertTrue(swift_utils.timestamps_available('proxy/2')) |
926 | + |
927 | + def _test_is_paused_unknown(self): |
928 | + fake_status_get = lambda: ("unknown", "") |
929 | + self.assertFalse(swift_utils.is_paused(status_get=fake_status_get)) |
930 | + |
931 | + def _test_is_paused_paused(self): |
932 | + fake_status_get = lambda: ("maintenance", "Paused") |
933 | + self.assertTrue(swift_utils.is_paused(status_get=fake_status_get)) |
934 | + |
935 | + def _test_is_paused_other_maintenance(self): |
936 | + fake_status_get = lambda: ("maintenance", "Hook") |
937 | + self.assertFalse(swift_utils.is_paused(status_get=fake_status_get)) |
938 | + |
939 | @mock.patch('lib.swift_utils.is_paused') |
940 | @mock.patch('lib.swift_utils.config') |
941 | @mock.patch('lib.swift_utils.set_os_workload_status') |
Ed, first off, this is fantastic work! This will greatly help the swift charm along. I have a few questions though.
Could you talk a little bit about what cases this solves? For example:
1. Adding a new node to the swift cluster
2. Deleting a node from the cluster
3. Reimaging a previous node and putting it back into service
4. A node goes offline for repairs and then comes back later ( say a day or a week )
Can this handle all these cases? It's hard to tell just from sifting through the code.
Are there potential race conditions with this that can leave the cluster in a broken state if the leader changes midway through the sync? Just curious.
I had a few little questions inline in the code about variable names.