Merge lp:~stub/launchpad/branch-rewrite into lp:launchpad

Proposed by Stuart Bishop
Status: Superseded
Proposed branch: lp:~stub/launchpad/branch-rewrite
Merge into: lp:launchpad
Prerequisite: lp:~stub/launchpad/pgbouncer-fixture
Diff against target: 355 lines (+198/-12)
5 files modified
lib/lp/codehosting/tests/test_rewrite.py (+88/-6)
lib/lp/testing/__init__.py (+23/-0)
lib/lp/testing/fixture.py (+16/-3)
lib/lp/testing/tests/test_fixture.py (+58/-2)
scripts/branch-rewrite.py (+13/-1)
To merge this branch: bzr merge lp:~stub/launchpad/branch-rewrite
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) Approve
Review via email: mp+73284@code.launchpad.net

This proposal has been superseded by a proposal from 2011-08-31.

Description of the change

= Summary =

branch-rewrite.py does not reconnect after a database outage.

== Proposed fix ==

Make it reconnect after a database outage.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Download full text (5.1 KiB)

Hi Stuart,

Generally good, but I have a few things that look worth fixing.

=== modified file 'lib/lp/codehosting/tests/test_rewrite.py'
--- lib/lp/codehosting/tests/test_rewrite.py 2011-08-12 11:37:08 +0000
+++ lib/lp/codehosting/tests/test_rewrite.py 2011-08-29 20:15:59 +0000

@@ -177,7 +181,8 @@
         transaction.commit()
         rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
         rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
- logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
+ logging_output_lines = self.getLoggerOutput(
+ rewriter).strip().split('\n')

Cleaning out lint, I see. Thanks.

@@ -274,3 +280,62 @@
         # The script produces logging output, but not to stderr.
         self.assertEqual('', err)
         self.assertEqual(expected_lines, output_lines)
+
+
+class TestBranchRewriterScriptHandlesDisconnects(TestCaseWithFactory):
+ """Ensure branch-rewrite.py survives fastdowntime deploys."""
+ layer = LaunchpadScriptLayer
+
+ def setUp(self):
+ super(TestBranchRewriterScriptHandlesDisconnects, self).setUp()
+ self.pgbouncer = PGBouncerFixture()
+ self.addCleanup(self.pgbouncer.cleanUp)
+ self.pgbouncer.setUp()

Couldn't those last three lines be replaced with a simple

    self.pgbouncer = self.useFixture(PGBouncerFixture())

?

In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.

+ def spawn(self):
+ script_file = os.path.join(
+ config.root, 'scripts', 'branch-rewrite.py')
+
+ self.rewriter_proc = subprocess.Popen(
+ [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE, bufsize=0)

Again, is it worth keeping self.rewriter_proc on the test class? It just hides state. Why not make it a return value?

Moreover, is this process guaranteed to clean itself up? If not, then please ensure that it is. For instance, you could add something like "self.addCleanup(kill, rewriter_proc)" to spawn().

+ def request(self, query):
+ self.rewriter_proc.stdin.write(query + '\n')
+ return self.rewriter_proc.stdout.readline().rstrip('\n')

Do we know that the readline() will never hang under reasonable circumstances?

+ def test_disconnect(self):
+ self.spawn()

Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here? Would it help to say something with "reconnects"?

+ # Everything should be working, and we get valid output.
+ out = self.request('foo')
+ assert out.endswith('/foo'), out

Why use assert here? Normally you'd say self.assertEndsWith(out, '/foo').

+ self.pgbouncer.stop()
+
+ # Now with pgbouncer down, we should get NULL messages and
+ # stderr spam, and this keeps happening. We test more than
+ # once to ensure that we will keep trying to reconnect even
+ # after several failures.
+ for count in range(5):
+ out = self.request('foo')

Is this race-free? I don't know ...

Read more...

review: Approve
Revision history for this message
Stuart Bishop (stub) wrote :
Download full text (4.7 KiB)

On Wed, Aug 31, 2011 at 10:26 AM, Jeroen T. Vermeulen <email address hidden> wrote:
> === modified file 'lib/lp/codehosting/tests/test_rewrite.py'
> +class TestBranchRewriterScriptHandlesDisconnects(TestCaseWithFactory):
> +    """Ensure branch-rewrite.py survives fastdowntime deploys."""
> +    layer = LaunchpadScriptLayer
> +
> +    def setUp(self):
> +        super(TestBranchRewriterScriptHandlesDisconnects, self).setUp()
> +        self.pgbouncer = PGBouncerFixture()
> +        self.addCleanup(self.pgbouncer.cleanUp)
> +        self.pgbouncer.setUp()
>
> Couldn't those last three lines be replaced with a simple
>
>    self.pgbouncer = self.useFixture(PGBouncerFixture())
>
> ?
>
> In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.

Yes, useFixture is nicer here. I've removed setUp() entirely and am
using useFixture in the tests.

> +    def spawn(self):
> +        script_file = os.path.join(
> +            config.root, 'scripts', 'branch-rewrite.py')
> +
> +        self.rewriter_proc = subprocess.Popen(
> +            [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
> +            stderr=subprocess.PIPE, bufsize=0)
>
> Again, is it worth keeping self.rewriter_proc on the test class?  It just hides state.  Why not make it a return value?

If I don't hide state, I have to maintain state and pass it to the
request() helper :-)

> Moreover, is this process guaranteed to clean itself up?  If not, then please ensure that it is.  For instance, you could add something like "self.addCleanup(kill, rewriter_proc)" to spawn().

subprocess module handles this for us as soon as things go out of
scope. But it doesn't hurt to be explicit so I've added the cleanup.

> +    def request(self, query):
> +        self.rewriter_proc.stdin.write(query + '\n')
> +        return self.rewriter_proc.stdout.readline().rstrip('\n')
>
> Do we know that the readline() will never hang under reasonable circumstances?

> +    def test_disconnect(self):
> +        self.spawn()
>
> Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here?  Would it help to say something with "reconnects"?

I've change the test name to test_reconnects_when_disconnected.

> +        # Everything should be working, and we get valid output.
> +        out = self.request('foo')
> +        assert out.endswith('/foo'), out
>
> Why use assert here?  Normally you'd say self.assertEndsWith(out, '/foo').

Because I'm always forgetting which assert methods are approved,
standard, testutils extensions, LP extensions :-)

Fixed, along with the other occurrences.

> +        self.pgbouncer.stop()
> +
> +        # Now with pgbouncer down, we should get NULL messages and
> +        # stderr spam, and this keeps happening. We test more than
> +        # once to ensure that we will keep trying to reconnect even
> +        # after several failures.
> +        for count in range(5):
> +            out = self.request('foo')
>
> Is this race-free?  I don't know if pgbouncer is a separate process or not; if it is, does stop() block until it's taken effect...

Read more...

Revision history for this message
Stuart Bishop (stub) wrote :

readline() blocking shouldn't be a problem, but I've implemented a nonblocking readline helper and made use of it.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

I don't know enough to be of much help with the raise-or-swallow issue. What the best choice is there may depend on what goes on higher up in the call tree. If any kind of transactional integrity is expected across the failure, then the exception probably needs to be re-raised.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/codehosting/tests/test_rewrite.py'
2--- lib/lp/codehosting/tests/test_rewrite.py 2011-08-12 11:37:08 +0000
3+++ lib/lp/codehosting/tests/test_rewrite.py 2011-08-31 16:53:36 +0000
4@@ -14,15 +14,21 @@
5 from zope.security.proxy import removeSecurityProxy
6
7 from canonical.config import config
8-from canonical.testing.layers import DatabaseFunctionalLayer
9+from canonical.testing.layers import (
10+ DatabaseFunctionalLayer,
11+ DatabaseLayer,
12+ )
13 from lp.code.interfaces.codehosting import branch_id_alias
14 from lp.codehosting.rewrite import BranchRewriter
15 from lp.codehosting.vfs import branch_id_to_path
16 from lp.services.log.logger import BufferLogger
17 from lp.testing import (
18 FakeTime,
19+ nonblocking_readline,
20+ TestCase,
21 TestCaseWithFactory,
22 )
23+from lp.testing.fixture import PGBouncerFixture
24
25
26 class TestBranchRewriter(TestCaseWithFactory):
27@@ -177,7 +183,8 @@
28 transaction.commit()
29 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
30 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
31- logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
32+ logging_output_lines = self.getLoggerOutput(
33+ rewriter).strip().split('\n')
34 self.assertEqual(2, len(logging_output_lines))
35 self.assertIsNot(
36 None,
37@@ -194,7 +201,8 @@
38 self.fake_time.advance(
39 config.codehosting.branch_rewrite_cache_lifetime + 1)
40 rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README')
41- logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n')
42+ logging_output_lines = self.getLoggerOutput(
43+ rewriter).strip().split('\n')
44 self.assertEqual(2, len(logging_output_lines))
45 self.assertIsNot(
46 None,
47@@ -246,7 +254,8 @@
48 # buffering, write a complete line of output.
49 for input_line in input_lines:
50 proc.stdin.write(input_line + '\n')
51- output_lines.append(proc.stdout.readline().rstrip('\n'))
52+ output_lines.append(
53+ nonblocking_readline(proc.stdout, 60).rstrip('\n'))
54 # If we create a new branch after the branch-rewrite.py script has
55 # connected to the database, or edit a branch name that has already
56 # been rewritten, both are rewritten successfully.
57@@ -260,17 +269,90 @@
58 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'
59 % branch_id_to_path(new_branch.id))
60 proc.stdin.write(new_branch_input + '\n')
61- output_lines.append(proc.stdout.readline().rstrip('\n'))
62+ output_lines.append(
63+ nonblocking_readline(proc.stdout, 60).rstrip('\n'))
64
65 edited_branch_input = '/%s/.bzr/README' % edited_branch.unique_name
66 expected_lines.append(
67 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README'
68 % branch_id_to_path(edited_branch.id))
69 proc.stdin.write(edited_branch_input + '\n')
70- output_lines.append(proc.stdout.readline().rstrip('\n'))
71+ output_lines.append(
72+ nonblocking_readline(proc.stdout, 60).rstrip('\n'))
73
74 os.kill(proc.pid, signal.SIGINT)
75 err = proc.stderr.read()
76 # The script produces logging output, but not to stderr.
77 self.assertEqual('', err)
78 self.assertEqual(expected_lines, output_lines)
79+
80+
81+class TestBranchRewriterScriptHandlesDisconnects(TestCase):
82+ """Ensure branch-rewrite.py survives fastdowntime deploys."""
83+ layer = DatabaseLayer
84+
85+ def spawn(self):
86+ script_file = os.path.join(
87+ config.root, 'scripts', 'branch-rewrite.py')
88+
89+ self.rewriter_proc = subprocess.Popen(
90+ [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
91+ stderr=subprocess.PIPE, bufsize=0)
92+
93+ self.addCleanup(self.rewriter_proc.terminate)
94+
95+ def request(self, query):
96+ self.rewriter_proc.stdin.write(query + '\n')
97+ self.rewriter_proc.stdin.flush()
98+
99+ # 60 second timeout as we might need to wait for the script to
100+ # finish starting up.
101+ result = nonblocking_readline(self.rewriter_proc.stdout, 60)
102+
103+ if result.endswith('\n'):
104+ return result[:-1]
105+ self.fail(
106+ "Incomplete line or no result retrieved from subprocess: %s"
107+ % repr(result.getvalue()))
108+
109+ def test_reconnects_when_disconnected(self):
110+ pgbouncer = self.useFixture(PGBouncerFixture())
111+
112+ self.spawn()
113+
114+ # Everything should be working, and we get valid output.
115+ out = self.request('foo')
116+ self.assertEndsWith(out, '/foo')
117+
118+ pgbouncer.stop()
119+
120+ # Now with pgbouncer down, we should get NULL messages and
121+ # stderr spam, and this keeps happening. We test more than
122+ # once to ensure that we will keep trying to reconnect even
123+ # after several failures.
124+ for count in range(5):
125+ out = self.request('foo')
126+ self.assertEqual(out, 'NULL')
127+
128+ pgbouncer.start()
129+
130+ # Everything should be working, and we get valid output.
131+ out = self.request('foo')
132+ self.assertEndsWith(out, '/foo')
133+
134+ def test_starts_with_db_down(self):
135+ pgbouncer = self.useFixture(PGBouncerFixture())
136+
137+ # Start with the database down.
138+ pgbouncer.stop()
139+
140+ self.spawn()
141+
142+ for count in range(5):
143+ out = self.request('foo')
144+ self.assertEqual(out, 'NULL')
145+
146+ pgbouncer.start()
147+
148+ out = self.request('foo')
149+ self.assertEndsWith(out, '/foo')
150
151=== modified file 'lib/lp/testing/__init__.py'
152--- lib/lp/testing/__init__.py 2011-08-19 18:20:58 +0000
153+++ lib/lp/testing/__init__.py 2011-08-31 16:53:36 +0000
154@@ -29,6 +29,7 @@
155 'logout',
156 'map_branch_contents',
157 'normalize_whitespace',
158+ 'nonblocking_readline',
159 'oauth_access_token_for',
160 'person_logged_in',
161 'quote_jquery_expression',
162@@ -69,6 +70,7 @@
163 import os
164 from pprint import pformat
165 import re
166+from select import select
167 import shutil
168 import subprocess
169 import sys
170@@ -1325,3 +1327,24 @@
171 def extract_lp_cache(text):
172 match = re.search(r'<script>LP.cache = (\{.*\});</script>', text)
173 return simplejson.loads(match.group(1))
174+
175+
176+def nonblocking_readline(instream, timeout):
177+ """Non-blocking readline.
178+
179+ Files must provide a valid fileno() method. This is a test helper
180+ as it is inefficient and unlikely useful for production code.
181+ """
182+ result = StringIO()
183+ start = now = time.time()
184+ while (now < start + timeout and not result.getvalue().endswith('\n')):
185+ rlist = select([instream], [], [], timeout - (now - start))
186+ if rlist:
187+ # Reading 1 character at a time is inefficient, but means
188+ # we don't need to implement put-back.
189+ next_char = os.read(instream.fileno(), 1)
190+ if next_char == "":
191+ break # EOF
192+ result.write(next_char)
193+ now = time.time()
194+ return result.getvalue()
195
196=== modified file 'lib/lp/testing/fixture.py'
197--- lib/lp/testing/fixture.py 2011-08-31 16:53:35 +0000
198+++ lib/lp/testing/fixture.py 2011-08-31 16:53:36 +0000
199@@ -91,8 +91,7 @@
200
201 # reconnect_store cleanup added first so it is run last, after
202 # the environment variables have been reset.
203- from canonical.testing.layers import reconnect_stores
204- self.addCleanup(reconnect_stores)
205+ self.addCleanup(self._maybe_reconnect_stores)
206
207 # Abuse the PGPORT environment variable to get things connecting
208 # via pgbouncer. Otherwise, we would need to temporarily
209@@ -100,7 +99,21 @@
210 self.useFixture(EnvironmentVariableFixture('PGPORT', str(self.port)))
211
212 # Reset database connections so they go through pgbouncer.
213- reconnect_stores()
214+ self._maybe_reconnect_stores()
215+
216+ def _maybe_reconnect_stores(self):
217+ """Force Storm Stores to reconnect if they are registered.
218+
219+ This is a noop if the Component Architecture is not loaded,
220+ as we are using a test layer that doesn't provide database
221+ connections.
222+ """
223+ from canonical.testing.layers import (
224+ reconnect_stores,
225+ is_ca_available,
226+ )
227+ if is_ca_available():
228+ reconnect_stores()
229
230
231 class ZopeAdapterFixture(Fixture):
232
233=== modified file 'lib/lp/testing/tests/test_fixture.py'
234--- lib/lp/testing/tests/test_fixture.py 2011-08-31 16:53:35 +0000
235+++ lib/lp/testing/tests/test_fixture.py 2011-08-31 16:53:36 +0000
236@@ -8,6 +8,7 @@
237 from textwrap import dedent
238
239 from fixtures import EnvironmentVariableFixture
240+import psycopg2
241 from storm.exceptions import DisconnectionError
242 from zope.component import (
243 adapts,
244@@ -18,8 +19,13 @@
245 Interface,
246 )
247
248+from canonical.config import dbconfig
249 from canonical.launchpad.interfaces.lpstorm import IMasterStore
250-from canonical.testing.layers import BaseLayer, LaunchpadZopelessLayer
251+from canonical.testing.layers import (
252+ BaseLayer,
253+ DatabaseLayer,
254+ LaunchpadZopelessLayer,
255+ )
256 from lp.registry.model.person import Person
257 from lp.testing import TestCase
258 from lp.testing.fixture import (
259@@ -95,7 +101,11 @@
260 self.assertIs(None, queryAdapter(context, IBar))
261
262
263-class TestPGBouncerFixture(TestCase):
264+class TestPGBouncerFixtureWithCA(TestCase):
265+ """PGBouncerFixture reconnect tests for Component Architecture layers.
266+
267+ Registered Storm Stores should be reconnected through pgbouncer.
268+ """
269 layer = LaunchpadZopelessLayer
270
271 def is_connected(self):
272@@ -149,3 +159,49 @@
273
274 # Database is working again.
275 assert self.is_connected()
276+
277+
278+class TestPGBouncerFixtureWithoutCA(TestCase):
279+ """PGBouncerFixture tests for non-Component Architecture layers."""
280+ layer = DatabaseLayer
281+
282+ def is_db_available(self):
283+ # Direct connection to the DB.
284+ con_str = dbconfig.rw_main_master + ' user=launchpad_main'
285+ try:
286+ con = psycopg2.connect(con_str)
287+ cur = con.cursor()
288+ cur.execute("SELECT id FROM Person LIMIT 1")
289+ con.close()
290+ return True
291+ except psycopg2.OperationalError:
292+ return False
293+
294+ def test_install_fixture(self):
295+ self.assert_(self.is_db_available())
296+
297+ with PGBouncerFixture() as pgbouncer:
298+ self.assert_(self.is_db_available())
299+
300+ pgbouncer.stop()
301+ self.assert_(not self.is_db_available())
302+
303+ # This confirms that we are again connecting directly to the
304+ # database, as the pgbouncer process was shutdown.
305+ self.assert_(self.is_db_available())
306+
307+ def test_install_fixture_with_restart(self):
308+ self.assert_(self.is_db_available())
309+
310+ with PGBouncerFixture() as pgbouncer:
311+ self.assert_(self.is_db_available())
312+
313+ pgbouncer.stop()
314+ self.assert_(not self.is_db_available())
315+
316+ pgbouncer.start()
317+ self.assert_(self.is_db_available())
318+
319+ # Note that because pgbouncer was left running, we can't confirm
320+ # that we are now connecting directly to the database.
321+ self.assert_(self.is_db_available())
322
323=== modified file 'scripts/branch-rewrite.py'
324--- scripts/branch-rewrite.py 2010-11-06 12:50:22 +0000
325+++ scripts/branch-rewrite.py 2011-08-31 16:53:36 +0000
326@@ -19,6 +19,8 @@
327
328 from canonical.database.sqlbase import ISOLATION_LEVEL_AUTOCOMMIT
329 from canonical.config import config
330+from canonical.launchpad.interfaces.lpstorm import ISlaveStore
331+from lp.code.model.branch import Branch
332 from lp.codehosting.rewrite import BranchRewriter
333 from lp.services.log.loglevels import INFO, WARNING
334 from lp.services.scripts.base import LaunchpadScript
335@@ -60,9 +62,19 @@
336 return
337 except KeyboardInterrupt:
338 sys.exit()
339- except:
340+ except Exception:
341 self.logger.exception('Exception occurred:')
342 print "NULL"
343+ # The exception might have been a DisconnectionError or
344+ # similar. Cleanup such as database reconnection will
345+ # not happen until the transaction is rolled back.
346+ # XXX StuartBishop 2011-08-31 bug=819282: We are
347+ # explicitly rolling back the store here as a workaround
348+ # instead of using transaction.abort()
349+ try:
350+ ISlaveStore(Branch).rollback()
351+ except Exception:
352+ self.logger.exception('Exception occurred in rollback:')
353
354
355 if __name__ == '__main__':