Merge lp:~stub/charms/precise/postgresql/bug-1278731-hot-standby-allowed-units into lp:charms/postgresql

Proposed by Stuart Bishop
Status: Merged
Merged at revision: 87
Proposed branch: lp:~stub/charms/precise/postgresql/bug-1278731-hot-standby-allowed-units
Merge into: lp:charms/postgresql
Prerequisite: lp:~stub/charms/precise/postgresql/pg93
Diff against target: 80 lines (+30/-8)
2 files modified
hooks/hooks.py (+13/-4)
test.py (+17/-4)
To merge this branch: bzr merge lp:~stub/charms/precise/postgresql/bug-1278731-hot-standby-allowed-units
Reviewer Review Type Date Requested Status
Marco Ceppi (community) Approve
Review via email: mp+205725@code.launchpad.net

Description of the change

Fix for Bug #1278731.

It was possible for hot standby units to have their db relation hooks run before the master, in which case they would copy obsolete connection details from the master. Clients following the db relation protocol would rightfully ignore the obsolete credentials (in this case, an invalid database name) and wait for the server unit to catch up (which it never did).

Now the hot standby explicitly overrides the database name in this case, and blocks waiting for the master to create the database and for it to replicate.

To post a comment you must log in.
Revision history for this message
Stuart Bishop (stub) wrote :

It seems that with this fix, along with a little more sleep()ing, the replication tests are no longer flaky with the local provider. We might need to increase the sleeps further for other providers.

Revision history for this message
Marco Ceppi (marcoceppi) wrote :

LGTM, +1

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'hooks/hooks.py'
--- hooks/hooks.py 2014-02-11 09:59:00 +0000
+++ hooks/hooks.py 2014-02-11 09:59:01 +0000
@@ -248,7 +248,8 @@
248248
249def run(command, exit_on_error=True, quiet=False):249def run(command, exit_on_error=True, quiet=False):
250 '''Run a command and return the output.'''250 '''Run a command and return the output.'''
251 log("Running {!r}".format(command), DEBUG)251 if not quiet:
252 log("Running {!r}".format(command), DEBUG)
252 p = subprocess.Popen(253 p = subprocess.Popen(
253 command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,254 command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
254 shell=isinstance(command, basestring))255 shell=isinstance(command, basestring))
@@ -1372,8 +1373,8 @@
13721373
1373 # Update the peer relation, notifying any hot standby units1374 # Update the peer relation, notifying any hot standby units
1374 # to republish connection details to the client relation.1375 # to republish connection details to the client relation.
1375 local_state['client_relations'] = ' '.join(1376 local_state['client_relations'] = ' '.join(sorted(
1376 hookenv.relation_ids('db') + hookenv.relation_ids('db-admin'))1377 hookenv.relation_ids('db') + hookenv.relation_ids('db-admin')))
1377 log("Client relations {}".format(local_state['client_relations']))1378 log("Client relations {}".format(local_state['client_relations']))
1378 local_state.publish()1379 local_state.publish()
13791380
@@ -1861,6 +1862,13 @@
1861 connection_settings['host'] = hookenv.unit_private_ip()1862 connection_settings['host'] = hookenv.unit_private_ip()
1862 connection_settings['port'] = get_service_port()1863 connection_settings['port'] = get_service_port()
1863 connection_settings['state'] = local_state['state']1864 connection_settings['state'] = local_state['state']
1865 requested_db = hookenv.relation_get('database')
1866 # A hot standby might have seen a database name change before
1867 # the master, so override. This is no problem because we block
1868 # until this database has been created on the master and
1869 # replicated through to this unit.
1870 if requested_db:
1871 connection_settings['database'] = requested_db
18641872
1865 # Block until users and database has replicated, so we know the1873 # Block until users and database has replicated, so we know the
1866 # connection details we publish are actually valid. This will1874 # connection details we publish are actually valid. This will
@@ -1877,7 +1885,8 @@
1877 connection_settings['database']))1885 connection_settings['database']))
1878 time.sleep(10)1886 time.sleep(10)
18791887
1880 log("Connection settings {!r}".format(connection_settings), DEBUG)1888 log("Relation {} connection settings {!r}".format(
1889 client_relation, connection_settings), DEBUG)
1881 hookenv.relation_set(1890 hookenv.relation_set(
1882 client_relation, relation_settings=connection_settings)1891 client_relation, relation_settings=connection_settings)
18831892
18841893
=== modified file 'test.py'
--- test.py 2014-02-11 09:59:00 +0000
+++ test.py 2014-02-11 09:59:01 +0000
@@ -101,10 +101,23 @@
101 # Due to Bug #1191079, we need to send the whole remote command101 # Due to Bug #1191079, we need to send the whole remote command
102 # as a single argument.102 # as a single argument.
103 ' '.join(psql_cmd + psql_args)]103 ' '.join(psql_cmd + psql_args)]
104 out = run(self, cmd, input=sql)104 # The hooks are complex, and it is common that relations haven't
105 result = [line.split(',') for line in out.splitlines()]105 # yet been setup despite sleep()ing after juju reports
106 self.addDetail('sql', text_content(repr((sql, result))))106 # everything is up. Increasing the sleep() in jujufixture
107 return result107 # further is just getting crazy, so instead here we loop a few
108 # times with a shorter wait.
109 attempts = 0
110 while True:
111 attempts += 1
112 try:
113 out = run(self, cmd, input=sql)
114 result = [line.split(',') for line in out.splitlines()]
115 self.addDetail('sql', text_content(repr((sql, result))))
116 return result
117 except subprocess.CalledProcessError:
118 if attempts >= 3:
119 raise
120 time.sleep(30)
108121
109 def pg_ctlcluster(self, unit, command):122 def pg_ctlcluster(self, unit, command):
110 cmd = [123 cmd = [

Subscribers

People subscribed via source and target branches