Merge into devel : branch-rewrite : Code : Launchpad itself

Status:

Merged

Approved by:

Stuart Bishop on 2011-09-03

Approved revision:

no longer in the source branch.

Merged at revision:

13863

Proposed branch:

lp:~stub/launchpad/branch-rewrite

Merge into:

lp:launchpad

Prerequisite:

lp:~stub/launchpad/pgbouncer-fixture-noca

Diff against target:

284 lines (+132/-12)

6 files modified

lib/canonical/config/tests/test_database_config.py (+2/-1)
lib/canonical/launchpad/doc/canonical-config.txt (+2/-1)
lib/lp/codehosting/tests/test_rewrite.py (+89/-7)
lib/lp/testing/__init__.py (+24/-0)
lib/lp/testing/tests/test_pgsql.py (+1/-1)
scripts/branch-rewrite.py (+14/-2)

To merge this branch:

bzr merge lp:~stub/launchpad/branch-rewrite

Critical

Fix Released

Reviewer	Review Type	Date Requested	Status
Jeroen T. Vermeulen (community)		2011-08-31	Approve on 2011-09-01
Review via email: mp+73563@code.launchpad.net

This proposal supersedes a proposal from 2011-08-29.

Commit message

[r=jtv][bug=836662] Make branch-rewrite.py survive database outages

Description of the change

= Summary =

branch-rewrite.py does not reconnect after a database outage.

== Proposed fix ==

Make it reconnect after a database outage.
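
The pattern behind the fix can be illustrated with a minimal sketch. FlakyStore and rewrite_lines are hypothetical stand-ins written for this illustration, not Launchpad code; the real script rolls back ISlaveStore(Branch). The assumption modelled here is that a failed transaction stays unusable until it is rolled back, so each failed lookup answers NULL and then rolls back, letting a later request reconnect.

```python
import logging

logger = logging.getLogger("branch-rewrite-sketch")


class FlakyStore:
    """Hypothetical stand-in for a store that can lose its connection."""

    def __init__(self):
        self.database_up = True
        self.needs_rollback = False

    def lookup(self, path):
        if self.needs_rollback:
            # A broken transaction stays broken until it is rolled back.
            raise RuntimeError("transaction in failed state")
        if not self.database_up:
            self.needs_rollback = True
            raise ConnectionError("database is down")
        return "file:///mirrors" + path

    def rollback(self):
        # Rolling back clears the failed state so a later lookup can work.
        self.needs_rollback = False


def rewrite_lines(store, lines):
    """Answer every request line; answer 'NULL' when the lookup fails."""
    answers = []
    for line in lines:
        try:
            answers.append(store.lookup(line))
        except Exception:
            logger.exception("Exception occurred:")
            answers.append("NULL")
            try:
                store.rollback()
            except Exception:
                # Swallowed here, mirroring the proposed change.
                logger.exception("Exception occurred in rollback:")
    return answers
```

Without the rollback call, every request after the first failure would keep hitting the failed transaction, even once the database is back.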

Revision history for this message

Jeroen T. Vermeulen (jtv) wrote on 2011-08-31: Posted in a previous version of this proposal

Hi Stuart,

Generally good, but I have a few things that look worth fixing.

=== modified file 'lib/lp/codehosting/tests/test_rewrite.py'
--- lib/lp/codehosting/tests/test_rewrite.py	2011-08-12 11:37:08 +0000
+++ lib/lp/codehosting/tests/test_rewrite.py	2011-08-29 20:15:59 +0000

@@ -177,7 +181,8 @@
         transaction.commit()
         rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
         rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
-        logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
+        logging_output_lines = self.getLoggerOutput(
+            rewriter).strip().split('\n')

Cleaning out lint, I see.  Thanks.

@@ -274,3 +280,62 @@
         # The script produces logging output, but not to stderr.
         self.assertEqual('', err)
         self.assertEqual(expected_lines, output_lines)
+
+
+class TestBranchRewriterScriptHandlesDisconnects(TestCaseWithFactory):
+    """Ensure branch-rewrite.py survives fastdowntime deploys."""
+    layer = LaunchpadScriptLayer
+
+    def setUp(self):
+        super(TestBranchRewriterScriptHandlesDisconnects, self).setUp()
+        self.pgbouncer = PGBouncerFixture()
+        self.addCleanup(self.pgbouncer.cleanUp)
+        self.pgbouncer.setUp()

Couldn't those last three lines be replaced with a simple

self.pgbouncer = self.useFixture(PGBouncerFixture())

?

In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.

+    def spawn(self):
+        script_file = os.path.join(
+            config.root, 'scripts', 'branch-rewrite.py')
+
+        self.rewriter_proc = subprocess.Popen(
+            [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE, bufsize=0)

Again, is it worth keeping self.rewriter_proc on the test class?  It just hides state.  Why not make it a return value?

Moreover, is this process guaranteed to clean itself up?  If not, then please ensure that it is.  For instance, you could add something like "self.addCleanup(kill, rewriter_proc)" to spawn().

+    def request(self, query):
+        self.rewriter_proc.stdin.write(query + '\n')
+        return self.rewriter_proc.stdout.readline().rstrip('\n')

Do we know that the readline() will never hang under reasonable circumstances?

+    def test_disconnect(self):
+        self.spawn()

Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here?  Would it help to say something with "reconnects"?

+        # Everything should be working, and we get valid output.
+        out = self.request('foo')
+        assert out.endswith('/foo'), out

Why use assert here?  Normally you'd say self.assertEndsWith(out, '/foo').

+        self.pgbouncer.stop()
+
+        # Now with pgbouncer down, we should get NULL messages and
+        # stderr spam, and this keeps happening. We test more than
+        # once to ensure that we will keep trying to reconnect even
+        # after several failures.
+        for count in range(5):
+            out = self.request('foo')

Is this race-free?  I don't know if pgbouncer is a separate process or not; if it is, does stop() block until it's taken effect?

+            assert out == 'NULL', out

Here too: why assert instead of self.assertEqual?

+        self.pgbouncer.start()

Here too: is this race-free?

+        # Everything should be working, and we get valid output.
+        out = self.request('foo')
+        assert out.endswith('/foo'), out

Here too: why assert instead of self.assertEndsWith?

+    def test_starts_with_db_down(self):
+        self.pgbouncer.stop()
+        self.spawn()
+
+        for count in range(5):
+            out = self.request('foo')
+            assert out == 'NULL', out

Another assert.

=== modified file 'scripts/branch-rewrite.py'
--- scripts/branch-rewrite.py	2010-11-06 12:50:22 +0000
+++ scripts/branch-rewrite.py	2011-08-29 20:15:59 +0000

@@ -60,9 +62,18 @@
                     return
             except KeyboardInterrupt:
                 sys.exit()
-            except:
+            except Exception:

Sensible.  This means you no longer try to do useful things when things go really bad.

                 self.logger.exception('Exception occurred:')
                 print "NULL"
+                # The exception might have been a DisconnectionError or
+                # similar. Cleanup such as database reconnection will
+                # not happen until the transaction is rolled back. We
+                # are explicitly rolling back the store here instead of
+                # using transaction.abort() due to Bug #819282.

This looks like it should properly be an XXX.  Could you format it as one?  XXX StuartBishop 2011-08-31 bug=819282: etc.

+                try:
+                    ISlaveStore(Branch).rollback()
+                except Exception:
+                    self.logger.exception('Exception occurred in rollback:')

Just to make sure: is the exception swallowed deliberately?  Or was it supposed to be re-raised, or sys.exit() called with an error code or whatever?

Jeroen

review: Approve

Stuart Bishop (stub) wrote on 2011-08-31: Posted in a previous version of this proposal

On Wed, Aug 31, 2011 at 10:26 AM, Jeroen T. Vermeulen <jtv@canonical.com> wrote:
> === modified file 'lib/lp/codehosting/tests/test_rewrite.py'
> +class TestBranchRewriterScriptHandlesDisconnects(TestCaseWithFactory):
> +    """Ensure branch-rewrite.py survives fastdowntime deploys."""
> +    layer = LaunchpadScriptLayer
> +
> +    def setUp(self):
> +        super(TestBranchRewriterScriptHandlesDisconnects, self).setUp()
> +        self.pgbouncer = PGBouncerFixture()
> +        self.addCleanup(self.pgbouncer.cleanUp)
> +        self.pgbouncer.setUp()
>
> Couldn't those last three lines be replaced with a simple
>
>    self.pgbouncer = self.useFixture(PGBouncerFixture())
>
> ?
>
> In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.

Yes, useFixture is nicer here. I've removed setUp() entirely and am
using useFixture in the tests.

> +    def spawn(self):
> +        script_file = os.path.join(
> +            config.root, 'scripts', 'branch-rewrite.py')
> +
> +        self.rewriter_proc = subprocess.Popen(
> +            [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
> +            stderr=subprocess.PIPE, bufsize=0)
>
> Again, is it worth keeping self.rewriter_proc on the test class?  It just hides state.  Why not make it a return value?

If I don't hide state, I have to maintain state and pass it to the
request() helper :-)

> Moreover, is this process guaranteed to clean itself up?  If not, then please ensure that it is.  For instance, you could add something like "self.addCleanup(kill, rewriter_proc)" to spawn().

subprocess module handles this for us as soon as things go out of
scope. But it doesn't hurt to be explicit so I've added the cleanup.
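
A minimal sketch of such an explicit cleanup, using only the standard library and Python 3. SpawnSketch, spawn_child and the echoing child program are hypothetical; the actual test spawns scripts/branch-rewrite.py and registers the process's terminate method with addCleanup.

```python
import subprocess
import sys
import unittest


class SpawnSketch(unittest.TestCase):
    """Hypothetical test showing a child process cleaned up explicitly."""

    def spawn_child(self):
        # The child echoes each input line back, standing in for the
        # rewriter script.
        code = (
            "import sys\n"
            "for line in iter(sys.stdin.readline, ''):\n"
            "    sys.stdout.write(line)\n"
            "    sys.stdout.flush()\n"
        )
        proc = subprocess.Popen(
            [sys.executable, "-c", code],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        # Cleanups run in reverse order of registration: terminate the
        # child, wait for it to exit, then close both pipes. They run
        # even when the test fails.
        self.addCleanup(proc.stdout.close)
        self.addCleanup(proc.stdin.close)
        self.addCleanup(proc.wait)
        self.addCleanup(proc.terminate)
        return proc

    def test_echo(self):
        proc = self.spawn_child()
        proc.stdin.write(b"foo\n")
        proc.stdin.flush()
        self.assertEqual(b"foo\n", proc.stdout.readline())
```

Registering the cleanup inside the spawn helper ties the child's lifetime to the test rather than to garbage collection.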

> +    def request(self, query):
> +        self.rewriter_proc.stdin.write(query + '\n')
> +        return self.rewriter_proc.stdout.readline().rstrip('\n')
>
> Do we know that the readline() will never hang under reasonable circumstances?

> +    def test_disconnect(self):
> +        self.spawn()
>
> Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here?  Would it help to say something with "reconnects"?

I've changed the test name to test_reconnects_when_disconnected.

> +        # Everything should be working, and we get valid output.
> +        out = self.request('foo')
> +        assert out.endswith('/foo'), out
>
> Why use assert here?  Normally you'd say self.assertEndsWith(out, '/foo').

Because I'm always forgetting which assert methods are approved,
standard, testutils extensions, LP extensions :-)

Fixed, along with the other occurrences.

> +        self.pgbouncer.stop()
> +
> +        # Now with pgbouncer down, we should get NULL messages and
> +        # stderr spam, and this keeps happening. We test more than
> +        # once to ensure that we will keep trying to reconnect even
> +        # after several failures.
> +        for count in range(5):
> +            out = self.request('foo')
>
> Is this race-free?  I don't know if pgbouncer is a separate process or not; if it is, does stop() block until it's taken effect?

It is race-free. The fixture provided by the pgbouncer package
lifeless put together blocks until the process has started up (and if
it doesn't, it is better to fix that than work around).

> +            assert out == 'NULL', out
>
> Here too: why assert instead of self.assertEqual?

Fixed.

> === modified file 'scripts/branch-rewrite.py'
>                 self.logger.exception('Exception occurred:')
>                 print "NULL"
> +                # The exception might have been a DisconnectionError or
> +                # similar. Cleanup such as database reconnection will
> +                # not happen until the transaction is rolled back. We
> +                # are explicitly rolling back the store here instead of
> +                # using transaction.abort() due to Bug #819282.
>
> This looks like it should properly be an XXX.  Could you format it as one?  XXX StuartBishop 2011-08-31 bug=819282: etc.

Yer, fixed.

> +                try:
> +                    ISlaveStore(Branch).rollback()
> +                except Exception:
> +                    self.logger.exception('Exception occurred in rollback:')
>
> Just to make sure: is the exception swallowed deliberately?  Or was it supposed to be re-raised, or sys.exit() called with an error code or whatever?

I don't know what the best behavior here is. Should we swallow the
failure in rollback, assuming it is a transient database issue? Or
should we die with an exception and hope Apache restarts the process?

-- 
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Stuart Bishop (stub) wrote on 2011-08-31: Posted in a previous version of this proposal

readline() blocking shouldn't be a problem, but I've implemented a nonblocking readline helper and made use of it.

Jeroen T. Vermeulen (jtv) wrote on 2011-09-01: Posted in a previous version of this proposal

I don't know enough to be of much help with the raise-or-swallow issue. What the best choice is there may depend on what goes on higher up in the call tree. If any kind of transactional integrity is expected across the failure, then the exception probably needs to be re-raised.

Jeroen T. Vermeulen (jtv) wrote on 2011-09-01:

Thanks for taking the trouble.

A tip for nonblocking_readline: the time calculations get simpler if you introduce a variable “deadline”:

* deadline = start + timeout
* “now < start + timeout” becomes “now < deadline”
* “timeout - (now - start)” becomes “deadline - now”

(Both in their way involve computing the time remaining, but I'm not sure that making that explicit as well makes things any better.)

Jeroen

review: Approve
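
The deadline tip can be sketched as a standalone Python 3 helper. This is an illustration under stated assumptions, not the helper that was merged: the name readline_with_timeout is hypothetical, it works on bytes, and it uses time.monotonic() instead of time.time(). It also unpacks the three lists that select() returns, since the return value as a whole is a non-empty tuple and therefore always true.

```python
import os
import select
import time


def readline_with_timeout(stream, timeout):
    """Read one line from stream, giving up after timeout seconds.

    Returns the bytes read so far; they end in a newline only if a
    complete line arrived before the deadline.
    """
    fd = stream.fileno()
    result = b""
    deadline = time.monotonic() + timeout
    while not result.endswith(b"\n"):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # select() returns three lists; only the readable one matters.
        readable, _, _ = select.select([fd], [], [], remaining)
        if readable:
            # One byte at a time avoids having to put back extra input.
            next_byte = os.read(fd, 1)
            if next_byte == b"":
                break  # EOF
            result += next_byte
    return result
```

With a single deadline variable, the loop condition and the select() timeout both come from one subtraction, which is the simplification suggested above.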

Preview Diff

 === modified file 'lib/canonical/config/tests/test_database_config.py'
 --- lib/canonical/config/tests/test_database_config.py	2010-10-17 02:38:59 +0000
 +++ lib/canonical/config/tests/test_database_config.py	2011-09-03 05:39:35 +0000
@@ -28,7 +28,8 @@
          self.assertEquals('librarian', config.librarian.dbuser)
          dbconfig.setConfigSection('librarian')
--        expected_db = 'dbname=%s' % DatabaseLayer._db_fixture.dbname
++        expected_db = (
++            'dbname=%s host=localhost' % DatabaseLayer._db_fixture.dbname)
          self.assertEquals(expected_db, dbconfig.rw_main_master)
          self.assertEquals('librarian', dbconfig.dbuser)
 === modified file 'lib/canonical/launchpad/doc/canonical-config.txt'
 --- lib/canonical/launchpad/doc/canonical-config.txt	2010-12-22 14:50:08 +0000
 +++ lib/canonical/launchpad/doc/canonical-config.txt	2011-09-03 05:39:35 +0000
@@ -15,7 +15,8 @@
      >>> from canonical.config import config
      >>> from canonical.testing.layers import DatabaseLayer
--    >>> expected = 'dbname=%s' % DatabaseLayer._db_fixture.dbname
++    >>> expected = (
++    ...     'dbname=%s host=localhost' % DatabaseLayer._db_fixture.dbname)
      >>> expected == config.database.rw_main_master
      True
      >>> config.database.db_statement_timeout is None
 === modified file 'lib/lp/codehosting/tests/test_rewrite.py'
 --- lib/lp/codehosting/tests/test_rewrite.py	2011-08-12 11:37:08 +0000
 +++ lib/lp/codehosting/tests/test_rewrite.py	2011-09-03 05:39:35 +0000
@@ -1,4 +1,4 @@
--# Copyright 2009 Canonical Ltd.  This software is licensed under the
++# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
  # GNU Affero General Public License version 3 (see the file LICENSE).
  """Tests for the dynamic RewriteMap used to serve branches over HTTP."""
@@ -14,15 +14,21 @@
  from zope.security.proxy import removeSecurityProxy
  from canonical.config import config
--from canonical.testing.layers import DatabaseFunctionalLayer
++from canonical.testing.layers import (
++    DatabaseFunctionalLayer,
++    DatabaseLayer,
++    )
  from lp.code.interfaces.codehosting import branch_id_alias
  from lp.codehosting.rewrite import BranchRewriter
  from lp.codehosting.vfs import branch_id_to_path
  from lp.services.log.logger import BufferLogger
  from lp.testing import (
      FakeTime,
++    nonblocking_readline,
++    TestCase,
      TestCaseWithFactory,
      )
++from lp.testing.fixture import PGBouncerFixture
  class TestBranchRewriter(TestCaseWithFactory):
@@ -177,7 +183,8 @@
          transaction.commit()
          rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
          rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
--        logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
++        logging_output_lines = self.getLoggerOutput(
++            rewriter).strip().split('\n')
          self.assertEqual(2, len(logging_output_lines))
          self.assertIsNot(
              None,
@@ -194,7 +201,8 @@
          self.fake_time.advance(
              config.codehosting.branch_rewrite_cache_lifetime + 1)
          rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
--        logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
++        logging_output_lines = self.getLoggerOutput(
++            rewriter).strip().split('\n')
          self.assertEqual(2, len(logging_output_lines))
          self.assertIsNot(
              None,
@@ -246,7 +254,8 @@
          # buffering, write a complete line of output.
          for input_line in input_lines:
              proc.stdin.write(input_line + '\n')
--            output_lines.append(proc.stdout.readline().rstrip('\n'))
++            output_lines.append(
++                nonblocking_readline(proc.stdout, 60).rstrip('\n'))
          # If we create a new branch after the branch-rewrite.py script has
          # connected to the database, or edit a branch name that has already
          # been rewritten, both are rewritten successfully.
@@ -260,17 +269,90 @@
              'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'
              % branch_id_to_path(new_branch.id))
          proc.stdin.write(new_branch_input + '\n')
--        output_lines.append(proc.stdout.readline().rstrip('\n'))
++        output_lines.append(
++            nonblocking_readline(proc.stdout, 60).rstrip('\n'))
          edited_branch_input = '/%s/.bzr/README' % edited_branch.unique_name
          expected_lines.append(
              'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'
              % branch_id_to_path(edited_branch.id))
          proc.stdin.write(edited_branch_input + '\n')
--        output_lines.append(proc.stdout.readline().rstrip('\n'))
++        output_lines.append(
++            nonblocking_readline(proc.stdout, 60).rstrip('\n'))
          os.kill(proc.pid, signal.SIGINT)
          err = proc.stderr.read()
          # The script produces logging output, but not to stderr.
          self.assertEqual('', err)
          self.assertEqual(expected_lines, output_lines)
++
++
++class TestBranchRewriterScriptHandlesDisconnects(TestCase):
++    """Ensure branch-rewrite.py survives fastdowntime deploys."""
++    layer = DatabaseLayer
++
++    def spawn(self):
++        script_file = os.path.join(
++            config.root, 'scripts', 'branch-rewrite.py')
++
++        self.rewriter_proc = subprocess.Popen(
++            [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
++            stderr=subprocess.PIPE, bufsize=0)
++
++        self.addCleanup(self.rewriter_proc.terminate)
++
++    def request(self, query):
++        self.rewriter_proc.stdin.write(query + '\n')
++        self.rewriter_proc.stdin.flush()
++
++        # 60 second timeout as we might need to wait for the script to
++        # finish starting up.
++        result = nonblocking_readline(self.rewriter_proc.stdout, 60)
++
++        if result.endswith('\n'):
++            return result[:-1]
++        self.fail(
++            "Incomplete line or no result retrieved from subprocess: %s"
++            % repr(result.getvalue()))
++
++    def test_reconnects_when_disconnected(self):
++        pgbouncer = self.useFixture(PGBouncerFixture())
++
++        self.spawn()
++
++        # Everything should be working, and we get valid output.
++        out = self.request('foo')
++        self.assertEndsWith(out, '/foo')
++
++        pgbouncer.stop()
++
++        # Now with pgbouncer down, we should get NULL messages and
++        # stderr spam, and this keeps happening. We test more than
++        # once to ensure that we will keep trying to reconnect even
++        # after several failures.
++        for count in range(5):
++            out = self.request('foo')
++            self.assertEqual(out, 'NULL')
++
++        pgbouncer.start()
++
++        # Everything should be working, and we get valid output.
++        out = self.request('foo')
++        self.assertEndsWith(out, '/foo')
++
++    def test_starts_with_db_down(self):
++        pgbouncer = self.useFixture(PGBouncerFixture())
++
++        # Start with the database down.
++        pgbouncer.stop()
++
++        self.spawn()
++
++        for count in range(5):
++            out = self.request('foo')
++            self.assertEqual(out, 'NULL')
++
++        pgbouncer.start()
++
++        out = self.request('foo')
++        self.assertEndsWith(out, '/foo')
 === modified file 'lib/lp/testing/__init__.py'
 --- lib/lp/testing/__init__.py	2011-08-19 18:20:58 +0000
 +++ lib/lp/testing/__init__.py	2011-09-03 05:39:35 +0000
@@ -29,6 +29,7 @@
      'logout',
      'map_branch_contents',
      'normalize_whitespace',
++    'nonblocking_readline',
      'oauth_access_token_for',
      'person_logged_in',
      'quote_jquery_expression',
@@ -69,6 +70,7 @@
  import os
  from pprint import pformat
  import re
++from select import select
  import shutil
  import subprocess
  import sys
@@ -1325,3 +1327,25 @@
  def extract_lp_cache(text):
      match = re.search(r'<script>LP.cache = (\{.*\});</script>', text)
      return simplejson.loads(match.group(1))
++
++
++def nonblocking_readline(instream, timeout):
++    """Non-blocking readline.
++
++    Files must provide a valid fileno() method. This is a test helper
++    as it is inefficient and unlikely useful for production code.
++    """
++    result = StringIO()
++    start = now = time.time()
++    deadline = start + timeout
++    while (now < deadline and not result.getvalue().endswith('\n')):
++        rlist = select([instream], [], [], deadline - now)
++        if rlist:
++            # Reading 1 character at a time is inefficient, but means
++            # we don't need to implement put-back.
++            next_char = os.read(instream.fileno(), 1)
++            if next_char == "":
++                break  # EOF
++            result.write(next_char)
++        now = time.time()
++    return result.getvalue()
 === modified file 'lib/lp/testing/tests/test_pgsql.py'
 --- lib/lp/testing/tests/test_pgsql.py	2011-02-19 13:50:19 +0000
 +++ lib/lp/testing/tests/test_pgsql.py	2011-09-03 05:39:35 +0000
@@ -53,7 +53,7 @@
          fixture.setUp()
          self.addCleanup(fixture.dropDb)
          self.addCleanup(fixture.tearDown)
--        expected_value = 'dbname=%s' % fixture.dbname
++        expected_value = 'dbname=%s host=localhost' % fixture.dbname
          self.assertEqual(expected_value, dbconfig.rw_main_master)
          self.assertEqual(expected_value, dbconfig.rw_main_slave)
          with ConfigUseFixture(BaseLayer.appserver_config_name):
 === modified file 'scripts/branch-rewrite.py'
 --- scripts/branch-rewrite.py	2011-09-02 10:50:34 +0000
 +++ scripts/branch-rewrite.py	2011-09-03 05:39:35 +0000
@@ -1,6 +1,6 @@
  #!/usr/bin/python -uS
  #
--# Copyright 2009 Canonical Ltd.  This software is licensed under the
++# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
  # GNU Affero General Public License version 3 (see the file LICENSE).
  # pylint: disable-msg=W0403
@@ -19,6 +19,8 @@
  from canonical.database.sqlbase import ISOLATION_LEVEL_AUTOCOMMIT
  from canonical.config import config
++from canonical.launchpad.interfaces.lpstorm import ISlaveStore
++from lp.code.model.branch import Branch
  from lp.codehosting.rewrite import BranchRewriter
  from lp.services.log.loglevels import INFO, WARNING
  from lp.services.scripts.base import LaunchpadScript
@@ -60,9 +62,19 @@
                      return
              except KeyboardInterrupt:
                  sys.exit()
--            except:
++            except Exception:
                  self.logger.exception('Exception occurred:')
                  print "NULL"
++                # The exception might have been a DisconnectionError or
++                # similar. Cleanup such as database reconnection will
++                # not happen until the transaction is rolled back.
++                # XXX StuartBishop 2011-08-31 bug=819282: We are
++                # explicitly rolling back the store here as a workaround
++                # instead of using transaction.abort()
++                try:
++                    ISlaveStore(Branch).rollback()
++                except Exception:
++                    self.logger.exception('Exception occurred in rollback:')
  if __name__ == '__main__':