Merge lp:~stub/charms/precise/postgresql/bug-1278731-hot-standby-allowed-units into lp:charms/postgresql

Proposed by Stuart Bishop
Status: Merged
Merged at revision: 87
Proposed branch: lp:~stub/charms/precise/postgresql/bug-1278731-hot-standby-allowed-units
Merge into: lp:charms/postgresql
Prerequisite: lp:~stub/charms/precise/postgresql/pg93
Diff against target: 80 lines (+30/-8)
2 files modified
hooks/hooks.py (+13/-4)
test.py (+17/-4)
To merge this branch: bzr merge lp:~stub/charms/precise/postgresql/bug-1278731-hot-standby-allowed-units
Reviewer Review Type Date Requested Status
Marco Ceppi (community) Approve
Review via email: mp+205725@code.launchpad.net

Description of the change

Fix for Bug #1278731.

It was possible for hot standby units to have their db relation hooks run before the master, in which case they would copy obsolete connection details from the master. Clients following the db relation protocol would rightfully ignore the obsolete credentials (in this case, an invalid database name) and wait for the server unit to catch up (which it never did).

Now the hot standby explicitly overrides the database name in this case, and blocks waiting for the master to create the database and for it to replicate.

To post a comment you must log in.
Revision history for this message
Stuart Bishop (stub) wrote :

It seems that with this fix, along with a little more sleep()ing, the replication tests are no longer flaky with the local provider. We might need to increase the sleeps further for other providers.

Revision history for this message
Marco Ceppi (marcoceppi) wrote :

LGTM, +1

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'hooks/hooks.py'
2--- hooks/hooks.py 2014-02-11 09:59:00 +0000
3+++ hooks/hooks.py 2014-02-11 09:59:01 +0000
4@@ -248,7 +248,8 @@
5
6 def run(command, exit_on_error=True, quiet=False):
7 '''Run a command and return the output.'''
8- log("Running {!r}".format(command), DEBUG)
9+ if not quiet:
10+ log("Running {!r}".format(command), DEBUG)
11 p = subprocess.Popen(
12 command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
13 shell=isinstance(command, basestring))
14@@ -1372,8 +1373,8 @@
15
16 # Update the peer relation, notifying any hot standby units
17 # to republish connection details to the client relation.
18- local_state['client_relations'] = ' '.join(
19- hookenv.relation_ids('db') + hookenv.relation_ids('db-admin'))
20+ local_state['client_relations'] = ' '.join(sorted(
21+ hookenv.relation_ids('db') + hookenv.relation_ids('db-admin')))
22 log("Client relations {}".format(local_state['client_relations']))
23 local_state.publish()
24
25@@ -1861,6 +1862,13 @@
26 connection_settings['host'] = hookenv.unit_private_ip()
27 connection_settings['port'] = get_service_port()
28 connection_settings['state'] = local_state['state']
29+ requested_db = hookenv.relation_get('database')
30+ # A hot standby might have seen a database name change before
31+ # the master, so override. This is no problem because we block
32+ # until this database has been created on the master and
33+ # replicated through to this unit.
34+ if requested_db:
35+ connection_settings['database'] = requested_db
36
37 # Block until users and database has replicated, so we know the
38 # connection details we publish are actually valid. This will
39@@ -1877,7 +1885,8 @@
40 connection_settings['database']))
41 time.sleep(10)
42
43- log("Connection settings {!r}".format(connection_settings), DEBUG)
44+ log("Relation {} connection settings {!r}".format(
45+ client_relation, connection_settings), DEBUG)
46 hookenv.relation_set(
47 client_relation, relation_settings=connection_settings)
48
49
50=== modified file 'test.py'
51--- test.py 2014-02-11 09:59:00 +0000
52+++ test.py 2014-02-11 09:59:01 +0000
53@@ -101,10 +101,23 @@
54 # Due to Bug #1191079, we need to send the whole remote command
55 # as a single argument.
56 ' '.join(psql_cmd + psql_args)]
57- out = run(self, cmd, input=sql)
58- result = [line.split(',') for line in out.splitlines()]
59- self.addDetail('sql', text_content(repr((sql, result))))
60- return result
61+ # The hooks are complex, and it is common that relations haven't
62+ # yet been setup despite sleep()ing after juju reports
63+ # everything is up. Increasing the sleep() in jujufixture
64+ # further is just getting crazy, so instead here we loop a few
65+ # times with a shorter wait.
66+ attempts = 0
67+ while True:
68+ attempts += 1
69+ try:
70+ out = run(self, cmd, input=sql)
71+ result = [line.split(',') for line in out.splitlines()]
72+ self.addDetail('sql', text_content(repr((sql, result))))
73+ return result
74+ except subprocess.CalledProcessError:
75+ if attempts >= 3:
76+ raise
77+ time.sleep(30)
78
79 def pg_ctlcluster(self, unit, command):
80 cmd = [

Subscribers

People subscribed via source and target branches