Merge lp:~stub/launchpad/branch-rewrite into lp:launchpad

Proposed by Stuart Bishop
Status: Merged
Approved by: Stuart Bishop
Approved revision: no longer in the source branch.
Merged at revision: 13863
Proposed branch: lp:~stub/launchpad/branch-rewrite
Merge into: lp:launchpad
Prerequisite: lp:~stub/launchpad/pgbouncer-fixture-noca
Diff against target: 284 lines (+132/-12)
6 files modified
lib/canonical/config/tests/test_database_config.py (+2/-1)
lib/canonical/launchpad/doc/canonical-config.txt (+2/-1)
lib/lp/codehosting/tests/test_rewrite.py (+89/-7)
lib/lp/testing/__init__.py (+24/-0)
lib/lp/testing/tests/test_pgsql.py (+1/-1)
scripts/branch-rewrite.py (+14/-2)
To merge this branch: bzr merge lp:~stub/launchpad/branch-rewrite
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) Approve
Review via email: mp+73563@code.launchpad.net

This proposal supersedes a proposal from 2011-08-29.

Commit message

[r=jtv][bug=836662] Make branch-rewrite.py survive database outages

Description of the change

= Summary =

branch-rewrite.py does not reconnect after a database outage.

== Proposed fix ==

Make it reconnect after a database outage.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote : Posted in a previous version of this proposal
Download full text (5.1 KiB)

Hi Stuart,

Generally good, but I have a few things that look worth fixing.

=== modified file 'lib/lp/codehosting/tests/test_rewrite.py'
--- lib/lp/codehosting/tests/test_rewrite.py 2011-08-12 11:37:08 +0000
+++ lib/lp/codehosting/tests/test_rewrite.py 2011-08-29 20:15:59 +0000

@@ -177,7 +181,8 @@
         transaction.commit()
         rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
         rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
- logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
+ logging_output_lines = self.getLoggerOutput(
+ rewriter).strip().split('\n')

Cleaning out lint, I see. Thanks.

@@ -274,3 +280,62 @@
         # The script produces logging output, but not to stderr.
         self.assertEqual('', err)
         self.assertEqual(expected_lines, output_lines)
+
+
+class TestBranchRewriterScriptHandlesDisconnects(TestCaseWithFactory):
+ """Ensure branch-rewrite.py survives fastdowntime deploys."""
+ layer = LaunchpadScriptLayer
+
+ def setUp(self):
+ super(TestBranchRewriterScriptHandlesDisconnects, self).setUp()
+ self.pgbouncer = PGBouncerFixture()
+ self.addCleanup(self.pgbouncer.cleanUp)
+ self.pgbouncer.setUp()

Couldn't those last three lines be replaced with a simple

    self.pgbouncer = self.useFixture(PGBouncerFixture())

?

In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.

+ def spawn(self):
+ script_file = os.path.join(
+ config.root, 'scripts', 'branch-rewrite.py')
+
+ self.rewriter_proc = subprocess.Popen(
+ [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE, bufsize=0)

Again, is it worth keeping self.rewriter_proc on the test class? It just hides state. Why not make it a return value?

Moreover, is this process guaranteed to clean itself up? If not, then please ensure that it is. For instance, you could add something like "self.addCleanup(kill, rewriter_proc)" to spawn().

+ def request(self, query):
+ self.rewriter_proc.stdin.write(query + '\n')
+ return self.rewriter_proc.stdout.readline().rstrip('\n')

Do we know that the readline() will never hang under reasonable circumstances?

+ def test_disconnect(self):
+ self.spawn()

Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here? Would it help to say something with "reconnects"?

+ # Everything should be working, and we get valid output.
+ out = self.request('foo')
+ assert out.endswith('/foo'), out

Why use assert here? Normally you'd say self.assertEndsWith(out, '/foo').

+ self.pgbouncer.stop()
+
+ # Now with pgbouncer down, we should get NULL messages and
+ # stderr spam, and this keeps happening. We test more than
+ # once to ensure that we will keep trying to reconnect even
+ # after several failures.
+ for count in range(5):
+ out = self.request('foo')

Is this race-free? I don't know ...

Read more...

review: Approve
Revision history for this message
Stuart Bishop (stub) wrote : Posted in a previous version of this proposal
Download full text (4.7 KiB)

On Wed, Aug 31, 2011 at 10:26 AM, Jeroen T. Vermeulen <email address hidden> wrote:
> === modified file 'lib/lp/codehosting/tests/test_rewrite.py'
> +class TestBranchRewriterScriptHandlesDisconnects(TestCaseWithFactory):
> +    """Ensure branch-rewrite.py survives fastdowntime deploys."""
> +    layer = LaunchpadScriptLayer
> +
> +    def setUp(self):
> +        super(TestBranchRewriterScriptHandlesDisconnects, self).setUp()
> +        self.pgbouncer = PGBouncerFixture()
> +        self.addCleanup(self.pgbouncer.cleanUp)
> +        self.pgbouncer.setUp()
>
> Couldn't those last three lines be replaced with a simple
>
>    self.pgbouncer = self.useFixture(PGBouncerFixture())
>
> ?
>
> In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.

Yes, useFixture is nicer here. I've removed setUp() entirely and am
using useFixture in the tests.

> +    def spawn(self):
> +        script_file = os.path.join(
> +            config.root, 'scripts', 'branch-rewrite.py')
> +
> +        self.rewriter_proc = subprocess.Popen(
> +            [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
> +            stderr=subprocess.PIPE, bufsize=0)
>
> Again, is it worth keeping self.rewriter_proc on the test class?  It just hides state.  Why not make it a return value?

If I don't hide state, I have to maintain state and pass it to the
request() helper :-)

> Moreover, is this process guaranteed to clean itself up?  If not, then please ensure that it is.  For instance, you could add something like "self.addCleanup(kill, rewriter_proc)" to spawn().

subprocess module handles this for us as soon as things go out of
scope. But it doesn't hurt to be explicit so I've added the cleanup.

> +    def request(self, query):
> +        self.rewriter_proc.stdin.write(query + '\n')
> +        return self.rewriter_proc.stdout.readline().rstrip('\n')
>
> Do we know that the readline() will never hang under reasonable circumstances?

> +    def test_disconnect(self):
> +        self.spawn()
>
> Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here?  Would it help to say something with "reconnects"?

I've change the test name to test_reconnects_when_disconnected.

> +        # Everything should be working, and we get valid output.
> +        out = self.request('foo')
> +        assert out.endswith('/foo'), out
>
> Why use assert here?  Normally you'd say self.assertEndsWith(out, '/foo').

Because I'm always forgetting which assert methods are approved,
standard, testutils extensions, LP extensions :-)

Fixed, along with the other occurrences.

> +        self.pgbouncer.stop()
> +
> +        # Now with pgbouncer down, we should get NULL messages and
> +        # stderr spam, and this keeps happening. We test more than
> +        # once to ensure that we will keep trying to reconnect even
> +        # after several failures.
> +        for count in range(5):
> +            out = self.request('foo')
>
> Is this race-free?  I don't know if pgbouncer is a separate process or not; if it is, does stop() block until it's taken effect...

Read more...

Revision history for this message
Stuart Bishop (stub) wrote : Posted in a previous version of this proposal

readline() blocking shouldn't be a problem, but I've implemented a nonblocking readline helper and made use of it.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote : Posted in a previous version of this proposal

I don't know enough to be of much help with the raise-or-swallow issue. What the best choice is there may depend on what goes on higher up in the call tree. If any kind of transactional integrity is expected across the failure, then the exception probably needs to be re-raised.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks for taking the trouble.

A tip for nonblocking_readline: the time calculations get simpler if you introduce a variable “deadline”:

 * deadline = start + timeout
 * “now < start - timeout” becomes “now < deadline”
 * “timeout - (now - start)” becomes “deadline - now”

(Both in their way involve computing the time remaining, but I'm not sure that making that explicit as well makes things any better.)

Jeroen

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/canonical/config/tests/test_database_config.py'
--- lib/canonical/config/tests/test_database_config.py 2010-10-17 02:38:59 +0000
+++ lib/canonical/config/tests/test_database_config.py 2011-09-03 05:39:35 +0000
@@ -28,7 +28,8 @@
28 self.assertEquals('librarian', config.librarian.dbuser)28 self.assertEquals('librarian', config.librarian.dbuser)
2929
30 dbconfig.setConfigSection('librarian')30 dbconfig.setConfigSection('librarian')
31 expected_db = 'dbname=%s' % DatabaseLayer._db_fixture.dbname31 expected_db = (
32 'dbname=%s host=localhost' % DatabaseLayer._db_fixture.dbname)
32 self.assertEquals(expected_db, dbconfig.rw_main_master)33 self.assertEquals(expected_db, dbconfig.rw_main_master)
33 self.assertEquals('librarian', dbconfig.dbuser)34 self.assertEquals('librarian', dbconfig.dbuser)
3435
3536
=== modified file 'lib/canonical/launchpad/doc/canonical-config.txt'
--- lib/canonical/launchpad/doc/canonical-config.txt 2010-12-22 14:50:08 +0000
+++ lib/canonical/launchpad/doc/canonical-config.txt 2011-09-03 05:39:35 +0000
@@ -15,7 +15,8 @@
1515
16 >>> from canonical.config import config16 >>> from canonical.config import config
17 >>> from canonical.testing.layers import DatabaseLayer17 >>> from canonical.testing.layers import DatabaseLayer
18 >>> expected = 'dbname=%s' % DatabaseLayer._db_fixture.dbname18 >>> expected = (
19 ... 'dbname=%s host=localhost' % DatabaseLayer._db_fixture.dbname)
19 >>> expected == config.database.rw_main_master20 >>> expected == config.database.rw_main_master
20 True21 True
21 >>> config.database.db_statement_timeout is None22 >>> config.database.db_statement_timeout is None
2223
=== modified file 'lib/lp/codehosting/tests/test_rewrite.py'
--- lib/lp/codehosting/tests/test_rewrite.py 2011-08-12 11:37:08 +0000
+++ lib/lp/codehosting/tests/test_rewrite.py 2011-09-03 05:39:35 +0000
@@ -1,4 +1,4 @@
1# Copyright 2009 Canonical Ltd. This software is licensed under the1# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).2# GNU Affero General Public License version 3 (see the file LICENSE).
33
4"""Tests for the dynamic RewriteMap used to serve branches over HTTP."""4"""Tests for the dynamic RewriteMap used to serve branches over HTTP."""
@@ -14,15 +14,21 @@
14from zope.security.proxy import removeSecurityProxy14from zope.security.proxy import removeSecurityProxy
1515
16from canonical.config import config16from canonical.config import config
17from canonical.testing.layers import DatabaseFunctionalLayer17from canonical.testing.layers import (
18 DatabaseFunctionalLayer,
19 DatabaseLayer,
20 )
18from lp.code.interfaces.codehosting import branch_id_alias21from lp.code.interfaces.codehosting import branch_id_alias
19from lp.codehosting.rewrite import BranchRewriter22from lp.codehosting.rewrite import BranchRewriter
20from lp.codehosting.vfs import branch_id_to_path23from lp.codehosting.vfs import branch_id_to_path
21from lp.services.log.logger import BufferLogger24from lp.services.log.logger import BufferLogger
22from lp.testing import (25from lp.testing import (
23 FakeTime,26 FakeTime,
27 nonblocking_readline,
28 TestCase,
24 TestCaseWithFactory,29 TestCaseWithFactory,
25 )30 )
31from lp.testing.fixture import PGBouncerFixture
2632
2733
28class TestBranchRewriter(TestCaseWithFactory):34class TestBranchRewriter(TestCaseWithFactory):
@@ -177,7 +183,8 @@
177 transaction.commit()183 transaction.commit()
178 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')184 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
179 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')185 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
180 logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')186 logging_output_lines = self.getLoggerOutput(
187 rewriter).strip().split('\n')
181 self.assertEqual(2, len(logging_output_lines))188 self.assertEqual(2, len(logging_output_lines))
182 self.assertIsNot(189 self.assertIsNot(
183 None,190 None,
@@ -194,7 +201,8 @@
194 self.fake_time.advance(201 self.fake_time.advance(
195 config.codehosting.branch_rewrite_cache_lifetime + 1)202 config.codehosting.branch_rewrite_cache_lifetime + 1)
196 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')203 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
197 logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')204 logging_output_lines = self.getLoggerOutput(
205 rewriter).strip().split('\n')
198 self.assertEqual(2, len(logging_output_lines))206 self.assertEqual(2, len(logging_output_lines))
199 self.assertIsNot(207 self.assertIsNot(
200 None,208 None,
@@ -246,7 +254,8 @@
246 # buffering, write a complete line of output.254 # buffering, write a complete line of output.
247 for input_line in input_lines:255 for input_line in input_lines:
248 proc.stdin.write(input_line + '\n')256 proc.stdin.write(input_line + '\n')
249 output_lines.append(proc.stdout.readline().rstrip('\n'))257 output_lines.append(
258 nonblocking_readline(proc.stdout, 60).rstrip('\n'))
250 # If we create a new branch after the branch-rewrite.py script has259 # If we create a new branch after the branch-rewrite.py script has
251 # connected to the database, or edit a branch name that has already260 # connected to the database, or edit a branch name that has already
252 # been rewritten, both are rewritten successfully.261 # been rewritten, both are rewritten successfully.
@@ -260,17 +269,90 @@
260 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'269 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'
261 % branch_id_to_path(new_branch.id))270 % branch_id_to_path(new_branch.id))
262 proc.stdin.write(new_branch_input + '\n')271 proc.stdin.write(new_branch_input + '\n')
263 output_lines.append(proc.stdout.readline().rstrip('\n'))272 output_lines.append(
273 nonblocking_readline(proc.stdout, 60).rstrip('\n'))
264274
265 edited_branch_input = '/%s/.bzr/README' % edited_branch.unique_name275 edited_branch_input = '/%s/.bzr/README' % edited_branch.unique_name
266 expected_lines.append(276 expected_lines.append(
267 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'277 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'
268 % branch_id_to_path(edited_branch.id))278 % branch_id_to_path(edited_branch.id))
269 proc.stdin.write(edited_branch_input + '\n')279 proc.stdin.write(edited_branch_input + '\n')
270 output_lines.append(proc.stdout.readline().rstrip('\n'))280 output_lines.append(
281 nonblocking_readline(proc.stdout, 60).rstrip('\n'))
271282
272 os.kill(proc.pid, signal.SIGINT)283 os.kill(proc.pid, signal.SIGINT)
273 err = proc.stderr.read()284 err = proc.stderr.read()
274 # The script produces logging output, but not to stderr.285 # The script produces logging output, but not to stderr.
275 self.assertEqual('', err)286 self.assertEqual('', err)
276 self.assertEqual(expected_lines, output_lines)287 self.assertEqual(expected_lines, output_lines)
288
289
290class TestBranchRewriterScriptHandlesDisconnects(TestCase):
291 """Ensure branch-rewrite.py survives fastdowntime deploys."""
292 layer = DatabaseLayer
293
294 def spawn(self):
295 script_file = os.path.join(
296 config.root, 'scripts', 'branch-rewrite.py')
297
298 self.rewriter_proc = subprocess.Popen(
299 [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
300 stderr=subprocess.PIPE, bufsize=0)
301
302 self.addCleanup(self.rewriter_proc.terminate)
303
304 def request(self, query):
305 self.rewriter_proc.stdin.write(query + '\n')
306 self.rewriter_proc.stdin.flush()
307
308 # 60 second timeout as we might need to wait for the script to
309 # finish starting up.
310 result = nonblocking_readline(self.rewriter_proc.stdout, 60)
311
312 if result.endswith('\n'):
313 return result[:-1]
314 self.fail(
315 "Incomplete line or no result retrieved from subprocess: %s"
316 % repr(result.getvalue()))
317
318 def test_reconnects_when_disconnected(self):
319 pgbouncer = self.useFixture(PGBouncerFixture())
320
321 self.spawn()
322
323 # Everything should be working, and we get valid output.
324 out = self.request('foo')
325 self.assertEndsWith(out, '/foo')
326
327 pgbouncer.stop()
328
329 # Now with pgbouncer down, we should get NULL messages and
330 # stderr spam, and this keeps happening. We test more than
331 # once to ensure that we will keep trying to reconnect even
332 # after several failures.
333 for count in range(5):
334 out = self.request('foo')
335 self.assertEqual(out, 'NULL')
336
337 pgbouncer.start()
338
339 # Everything should be working, and we get valid output.
340 out = self.request('foo')
341 self.assertEndsWith(out, '/foo')
342
343 def test_starts_with_db_down(self):
344 pgbouncer = self.useFixture(PGBouncerFixture())
345
346 # Start with the database down.
347 pgbouncer.stop()
348
349 self.spawn()
350
351 for count in range(5):
352 out = self.request('foo')
353 self.assertEqual(out, 'NULL')
354
355 pgbouncer.start()
356
357 out = self.request('foo')
358 self.assertEndsWith(out, '/foo')
277359
=== modified file 'lib/lp/testing/__init__.py'
--- lib/lp/testing/__init__.py 2011-08-19 18:20:58 +0000
+++ lib/lp/testing/__init__.py 2011-09-03 05:39:35 +0000
@@ -29,6 +29,7 @@
29 'logout',29 'logout',
30 'map_branch_contents',30 'map_branch_contents',
31 'normalize_whitespace',31 'normalize_whitespace',
32 'nonblocking_readline',
32 'oauth_access_token_for',33 'oauth_access_token_for',
33 'person_logged_in',34 'person_logged_in',
34 'quote_jquery_expression',35 'quote_jquery_expression',
@@ -69,6 +70,7 @@
69import os70import os
70from pprint import pformat71from pprint import pformat
71import re72import re
73from select import select
72import shutil74import shutil
73import subprocess75import subprocess
74import sys76import sys
@@ -1325,3 +1327,25 @@
1325def extract_lp_cache(text):1327def extract_lp_cache(text):
1326 match = re.search(r'<script>LP.cache = (\{.*\});</script>', text)1328 match = re.search(r'<script>LP.cache = (\{.*\});</script>', text)
1327 return simplejson.loads(match.group(1))1329 return simplejson.loads(match.group(1))
1330
1331
1332def nonblocking_readline(instream, timeout):
1333 """Non-blocking readline.
1334
1335 Files must provide a valid fileno() method. This is a test helper
1336 as it is inefficient and unlikely useful for production code.
1337 """
1338 result = StringIO()
1339 start = now = time.time()
1340 deadline = start + timeout
1341 while (now < deadline and not result.getvalue().endswith('\n')):
1342 rlist = select([instream], [], [], deadline - now)
1343 if rlist:
1344 # Reading 1 character at a time is inefficient, but means
1345 # we don't need to implement put-back.
1346 next_char = os.read(instream.fileno(), 1)
1347 if next_char == "":
1348 break # EOF
1349 result.write(next_char)
1350 now = time.time()
1351 return result.getvalue()
13281352
=== modified file 'lib/lp/testing/tests/test_pgsql.py'
--- lib/lp/testing/tests/test_pgsql.py 2011-02-19 13:50:19 +0000
+++ lib/lp/testing/tests/test_pgsql.py 2011-09-03 05:39:35 +0000
@@ -53,7 +53,7 @@
53 fixture.setUp()53 fixture.setUp()
54 self.addCleanup(fixture.dropDb)54 self.addCleanup(fixture.dropDb)
55 self.addCleanup(fixture.tearDown)55 self.addCleanup(fixture.tearDown)
56 expected_value = 'dbname=%s' % fixture.dbname56 expected_value = 'dbname=%s host=localhost' % fixture.dbname
57 self.assertEqual(expected_value, dbconfig.rw_main_master)57 self.assertEqual(expected_value, dbconfig.rw_main_master)
58 self.assertEqual(expected_value, dbconfig.rw_main_slave)58 self.assertEqual(expected_value, dbconfig.rw_main_slave)
59 with ConfigUseFixture(BaseLayer.appserver_config_name):59 with ConfigUseFixture(BaseLayer.appserver_config_name):
6060
=== modified file 'scripts/branch-rewrite.py'
--- scripts/branch-rewrite.py 2011-09-02 10:50:34 +0000
+++ scripts/branch-rewrite.py 2011-09-03 05:39:35 +0000
@@ -1,6 +1,6 @@
1#!/usr/bin/python -uS1#!/usr/bin/python -uS
2#2#
3# Copyright 2009 Canonical Ltd. This software is licensed under the3# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
4# GNU Affero General Public License version 3 (see the file LICENSE).4# GNU Affero General Public License version 3 (see the file LICENSE).
55
6# pylint: disable-msg=W04036# pylint: disable-msg=W0403
@@ -19,6 +19,8 @@
1919
20from canonical.database.sqlbase import ISOLATION_LEVEL_AUTOCOMMIT20from canonical.database.sqlbase import ISOLATION_LEVEL_AUTOCOMMIT
21from canonical.config import config21from canonical.config import config
22from canonical.launchpad.interfaces.lpstorm import ISlaveStore
23from lp.code.model.branch import Branch
22from lp.codehosting.rewrite import BranchRewriter24from lp.codehosting.rewrite import BranchRewriter
23from lp.services.log.loglevels import INFO, WARNING25from lp.services.log.loglevels import INFO, WARNING
24from lp.services.scripts.base import LaunchpadScript26from lp.services.scripts.base import LaunchpadScript
@@ -60,9 +62,19 @@
60 return62 return
61 except KeyboardInterrupt:63 except KeyboardInterrupt:
62 sys.exit()64 sys.exit()
63 except:65 except Exception:
64 self.logger.exception('Exception occurred:')66 self.logger.exception('Exception occurred:')
65 print "NULL"67 print "NULL"
68 # The exception might have been a DisconnectionError or
69 # similar. Cleanup such as database reconnection will
70 # not happen until the transaction is rolled back.
71 # XXX StuartBishop 2011-08-31 bug=819282: We are
72 # explicitly rolling back the store here as a workaround
73 # instead of using transaction.abort()
74 try:
75 ISlaveStore(Branch).rollback()
76 except Exception:
77 self.logger.exception('Exception occurred in rollback:')
6678
6779
68if __name__ == '__main__':80if __name__ == '__main__':