Merge lp:~stub/launchpad/branch-rewrite into lp:launchpad
- branch-rewrite
- Merge into devel
Status: | Superseded | ||||
---|---|---|---|---|---|
Proposed branch: | lp:~stub/launchpad/branch-rewrite | ||||
Merge into: | lp:launchpad | ||||
Prerequisite: | lp:~stub/launchpad/pgbouncer-fixture | ||||
Diff against target: |
355 lines (+198/-12) 5 files modified
lib/lp/codehosting/tests/test_rewrite.py (+88/-6) lib/lp/testing/__init__.py (+23/-0) lib/lp/testing/fixture.py (+16/-3) lib/lp/testing/tests/test_fixture.py (+58/-2) scripts/branch-rewrite.py (+13/-1) |
||||
To merge this branch: | bzr merge lp:~stub/launchpad/branch-rewrite | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Jeroen T. Vermeulen (community) | Approve | ||
Review via email:
|
This proposal has been superseded by a proposal from 2011-08-31.
Commit message
Description of the change
= Summary =
branch-rewrite.py does not reconnect after a database outage.
== Proposed fix ==
Make it reconnect after a database outage.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Stuart Bishop (stub) wrote : | # |
On Wed, Aug 31, 2011 at 10:26 AM, Jeroen T. Vermeulen <email address hidden> wrote:
> === modified file 'lib/lp/
> +class TestBranchRewri
> + """Ensure branch-rewrite.py survives fastdowntime deploys."""
> + layer = LaunchpadScript
> +
> + def setUp(self):
> + super(TestBranc
> + self.pgbouncer = PGBouncerFixture()
> + self.addCleanup
> + self.pgbouncer.
>
> Couldn't those last three lines be replaced with a simple
>
> self.pgbouncer = self.useFixture
>
> ?
>
> In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.
Yes, useFixture is nicer here. I've removed setUp() entirely and am
using useFixture in the tests.
> + def spawn(self):
> + script_file = os.path.join(
> + config.root, 'scripts', 'branch-
> +
> + self.rewriter_proc = subprocess.Popen(
> + [script_file], stdin=subproces
> + stderr=
>
> Again, is it worth keeping self.rewriter_proc on the test class? It just hides state. Why not make it a return value?
If I don't hide state, I have to maintain state and pass it to the
request() helper :-)
> Moreover, is this process guaranteed to clean itself up? If not, then please ensure that it is. For instance, you could add something like "self.addCleanu
subprocess module handles this for us as soon as things go out of
scope. But it doesn't hurt to be explicit so I've added the cleanup.
> + def request(self, query):
> + self.rewriter_
> + return self.rewriter_
>
> Do we know that the readline() will never hang under reasonable circumstances?
> + def test_disconnect
> + self.spawn()
>
> Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here? Would it help to say something with "reconnects"?
I've change the test name to test_reconnects
> + # Everything should be working, and we get valid output.
> + out = self.request('foo')
> + assert out.endswith(
>
> Why use assert here? Normally you'd say self.assertEnds
Because I'm always forgetting which assert methods are approved,
standard, testutils extensions, LP extensions :-)
Fixed, along with the other occurrences.
> + self.pgbouncer.
> +
> + # Now with pgbouncer down, we should get NULL messages and
> + # stderr spam, and this keeps happening. We test more than
> + # once to ensure that we will keep trying to reconnect even
> + # after several failures.
> + for count in range(5):
> + out = self.request('foo')
>
> Is this race-free? I don't know if pgbouncer is a separate process or not; if it is, does stop() block until it's taken effect...
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Stuart Bishop (stub) wrote : | # |
readline() blocking shouldn't be a problem, but I've implemented a nonblocking readline helper and made use of it.
![](/+icing/build/overlay/assets/skins/sam/images/close.gif)
Jeroen T. Vermeulen (jtv) wrote : | # |
I don't know enough to be of much help with the raise-or-swallow issue. What the best choice is there may depend on what goes on higher up in the call tree. If any kind of transactional integrity is expected across the failure, then the exception probably needs to be re-raised.
Preview Diff
1 | === modified file 'lib/lp/codehosting/tests/test_rewrite.py' |
2 | --- lib/lp/codehosting/tests/test_rewrite.py 2011-08-12 11:37:08 +0000 |
3 | +++ lib/lp/codehosting/tests/test_rewrite.py 2011-08-31 16:53:36 +0000 |
4 | @@ -14,15 +14,21 @@ |
5 | from zope.security.proxy import removeSecurityProxy |
6 | |
7 | from canonical.config import config |
8 | -from canonical.testing.layers import DatabaseFunctionalLayer |
9 | +from canonical.testing.layers import ( |
10 | + DatabaseFunctionalLayer, |
11 | + DatabaseLayer, |
12 | + ) |
13 | from lp.code.interfaces.codehosting import branch_id_alias |
14 | from lp.codehosting.rewrite import BranchRewriter |
15 | from lp.codehosting.vfs import branch_id_to_path |
16 | from lp.services.log.logger import BufferLogger |
17 | from lp.testing import ( |
18 | FakeTime, |
19 | + nonblocking_readline, |
20 | + TestCase, |
21 | TestCaseWithFactory, |
22 | ) |
23 | +from lp.testing.fixture import PGBouncerFixture |
24 | |
25 | |
26 | class TestBranchRewriter(TestCaseWithFactory): |
27 | @@ -177,7 +183,8 @@ |
28 | transaction.commit() |
29 | rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README') |
30 | rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README') |
31 | - logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n') |
32 | + logging_output_lines = self.getLoggerOutput( |
33 | + rewriter).strip().split('\n') |
34 | self.assertEqual(2, len(logging_output_lines)) |
35 | self.assertIsNot( |
36 | None, |
37 | @@ -194,7 +201,8 @@ |
38 | self.fake_time.advance( |
39 | config.codehosting.branch_rewrite_cache_lifetime + 1) |
40 | rewriter.rewriteLine('/' + branch.unique_name + '/.bzr/README') |
41 | - logging_output_lines = self.getLoggerOutput(rewriter).strip().split('\n') |
42 | + logging_output_lines = self.getLoggerOutput( |
43 | + rewriter).strip().split('\n') |
44 | self.assertEqual(2, len(logging_output_lines)) |
45 | self.assertIsNot( |
46 | None, |
47 | @@ -246,7 +254,8 @@ |
48 | # buffering, write a complete line of output. |
49 | for input_line in input_lines: |
50 | proc.stdin.write(input_line + '\n') |
51 | - output_lines.append(proc.stdout.readline().rstrip('\n')) |
52 | + output_lines.append( |
53 | + nonblocking_readline(proc.stdout, 60).rstrip('\n')) |
54 | # If we create a new branch after the branch-rewrite.py script has |
55 | # connected to the database, or edit a branch name that has already |
56 | # been rewritten, both are rewritten successfully. |
57 | @@ -260,17 +269,90 @@ |
58 | 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README' |
59 | % branch_id_to_path(new_branch.id)) |
60 | proc.stdin.write(new_branch_input + '\n') |
61 | - output_lines.append(proc.stdout.readline().rstrip('\n')) |
62 | + output_lines.append( |
63 | + nonblocking_readline(proc.stdout, 60).rstrip('\n')) |
64 | |
65 | edited_branch_input = '/%s/.bzr/README' % edited_branch.unique_name |
66 | expected_lines.append( |
67 | 'file:///var/tmp/bazaar.launchpad.dev/mirrors/%s/.bzr/README' |
68 | % branch_id_to_path(edited_branch.id)) |
69 | proc.stdin.write(edited_branch_input + '\n') |
70 | - output_lines.append(proc.stdout.readline().rstrip('\n')) |
71 | + output_lines.append( |
72 | + nonblocking_readline(proc.stdout, 60).rstrip('\n')) |
73 | |
74 | os.kill(proc.pid, signal.SIGINT) |
75 | err = proc.stderr.read() |
76 | # The script produces logging output, but not to stderr. |
77 | self.assertEqual('', err) |
78 | self.assertEqual(expected_lines, output_lines) |
79 | + |
80 | + |
81 | +class TestBranchRewriterScriptHandlesDisconnects(TestCase): |
82 | + """Ensure branch-rewrite.py survives fastdowntime deploys.""" |
83 | + layer = DatabaseLayer |
84 | + |
85 | + def spawn(self): |
86 | + script_file = os.path.join( |
87 | + config.root, 'scripts', 'branch-rewrite.py') |
88 | + |
89 | + self.rewriter_proc = subprocess.Popen( |
90 | + [script_file], stdin=subprocess.PIPE, stdout=subprocess.PIPE, |
91 | + stderr=subprocess.PIPE, bufsize=0) |
92 | + |
93 | + self.addCleanup(self.rewriter_proc.terminate) |
94 | + |
95 | + def request(self, query): |
96 | + self.rewriter_proc.stdin.write(query + '\n') |
97 | + self.rewriter_proc.stdin.flush() |
98 | + |
99 | + # 60 second timeout as we might need to wait for the script to |
100 | + # finish starting up. |
101 | + result = nonblocking_readline(self.rewriter_proc.stdout, 60) |
102 | + |
103 | + if result.endswith('\n'): |
104 | + return result[:-1] |
105 | + self.fail( |
106 | + "Incomplete line or no result retrieved from subprocess: %s" |
107 | + % repr(result.getvalue())) |
108 | + |
109 | + def test_reconnects_when_disconnected(self): |
110 | + pgbouncer = self.useFixture(PGBouncerFixture()) |
111 | + |
112 | + self.spawn() |
113 | + |
114 | + # Everything should be working, and we get valid output. |
115 | + out = self.request('foo') |
116 | + self.assertEndsWith(out, '/foo') |
117 | + |
118 | + pgbouncer.stop() |
119 | + |
120 | + # Now with pgbouncer down, we should get NULL messages and |
121 | + # stderr spam, and this keeps happening. We test more than |
122 | + # once to ensure that we will keep trying to reconnect even |
123 | + # after several failures. |
124 | + for count in range(5): |
125 | + out = self.request('foo') |
126 | + self.assertEqual(out, 'NULL') |
127 | + |
128 | + pgbouncer.start() |
129 | + |
130 | + # Everything should be working, and we get valid output. |
131 | + out = self.request('foo') |
132 | + self.assertEndsWith(out, '/foo') |
133 | + |
134 | + def test_starts_with_db_down(self): |
135 | + pgbouncer = self.useFixture(PGBouncerFixture()) |
136 | + |
137 | + # Start with the database down. |
138 | + pgbouncer.stop() |
139 | + |
140 | + self.spawn() |
141 | + |
142 | + for count in range(5): |
143 | + out = self.request('foo') |
144 | + self.assertEqual(out, 'NULL') |
145 | + |
146 | + pgbouncer.start() |
147 | + |
148 | + out = self.request('foo') |
149 | + self.assertEndsWith(out, '/foo') |
150 | |
151 | === modified file 'lib/lp/testing/__init__.py' |
152 | --- lib/lp/testing/__init__.py 2011-08-19 18:20:58 +0000 |
153 | +++ lib/lp/testing/__init__.py 2011-08-31 16:53:36 +0000 |
154 | @@ -29,6 +29,7 @@ |
155 | 'logout', |
156 | 'map_branch_contents', |
157 | 'normalize_whitespace', |
158 | + 'nonblocking_readline', |
159 | 'oauth_access_token_for', |
160 | 'person_logged_in', |
161 | 'quote_jquery_expression', |
162 | @@ -69,6 +70,7 @@ |
163 | import os |
164 | from pprint import pformat |
165 | import re |
166 | +from select import select |
167 | import shutil |
168 | import subprocess |
169 | import sys |
170 | @@ -1325,3 +1327,24 @@ |
171 | def extract_lp_cache(text): |
172 | match = re.search(r'<script>LP.cache = (\{.*\});</script>', text) |
173 | return simplejson.loads(match.group(1)) |
174 | + |
175 | + |
176 | +def nonblocking_readline(instream, timeout): |
177 | + """Non-blocking readline. |
178 | + |
179 | + Files must provide a valid fileno() method. This is a test helper |
180 | + as it is inefficient and unlikely useful for production code. |
181 | + """ |
182 | + result = StringIO() |
183 | + start = now = time.time() |
184 | + while (now < start + timeout and not result.getvalue().endswith('\n')): |
185 | + rlist = select([instream], [], [], timeout - (now - start)) |
186 | + if rlist: |
187 | + # Reading 1 character at a time is inefficient, but means |
188 | + # we don't need to implement put-back. |
189 | + next_char = os.read(instream.fileno(), 1) |
190 | + if next_char == "": |
191 | + break # EOF |
192 | + result.write(next_char) |
193 | + now = time.time() |
194 | + return result.getvalue() |
195 | |
196 | === modified file 'lib/lp/testing/fixture.py' |
197 | --- lib/lp/testing/fixture.py 2011-08-31 16:53:35 +0000 |
198 | +++ lib/lp/testing/fixture.py 2011-08-31 16:53:36 +0000 |
199 | @@ -91,8 +91,7 @@ |
200 | |
201 | # reconnect_store cleanup added first so it is run last, after |
202 | # the environment variables have been reset. |
203 | - from canonical.testing.layers import reconnect_stores |
204 | - self.addCleanup(reconnect_stores) |
205 | + self.addCleanup(self._maybe_reconnect_stores) |
206 | |
207 | # Abuse the PGPORT environment variable to get things connecting |
208 | # via pgbouncer. Otherwise, we would need to temporarily |
209 | @@ -100,7 +99,21 @@ |
210 | self.useFixture(EnvironmentVariableFixture('PGPORT', str(self.port))) |
211 | |
212 | # Reset database connections so they go through pgbouncer. |
213 | - reconnect_stores() |
214 | + self._maybe_reconnect_stores() |
215 | + |
216 | + def _maybe_reconnect_stores(self): |
217 | + """Force Storm Stores to reconnect if they are registered. |
218 | + |
219 | + This is a noop if the Component Architecture is not loaded, |
220 | + as we are using a test layer that doesn't provide database |
221 | + connections. |
222 | + """ |
223 | + from canonical.testing.layers import ( |
224 | + reconnect_stores, |
225 | + is_ca_available, |
226 | + ) |
227 | + if is_ca_available(): |
228 | + reconnect_stores() |
229 | |
230 | |
231 | class ZopeAdapterFixture(Fixture): |
232 | |
233 | === modified file 'lib/lp/testing/tests/test_fixture.py' |
234 | --- lib/lp/testing/tests/test_fixture.py 2011-08-31 16:53:35 +0000 |
235 | +++ lib/lp/testing/tests/test_fixture.py 2011-08-31 16:53:36 +0000 |
236 | @@ -8,6 +8,7 @@ |
237 | from textwrap import dedent |
238 | |
239 | from fixtures import EnvironmentVariableFixture |
240 | +import psycopg2 |
241 | from storm.exceptions import DisconnectionError |
242 | from zope.component import ( |
243 | adapts, |
244 | @@ -18,8 +19,13 @@ |
245 | Interface, |
246 | ) |
247 | |
248 | +from canonical.config import dbconfig |
249 | from canonical.launchpad.interfaces.lpstorm import IMasterStore |
250 | -from canonical.testing.layers import BaseLayer, LaunchpadZopelessLayer |
251 | +from canonical.testing.layers import ( |
252 | + BaseLayer, |
253 | + DatabaseLayer, |
254 | + LaunchpadZopelessLayer, |
255 | + ) |
256 | from lp.registry.model.person import Person |
257 | from lp.testing import TestCase |
258 | from lp.testing.fixture import ( |
259 | @@ -95,7 +101,11 @@ |
260 | self.assertIs(None, queryAdapter(context, IBar)) |
261 | |
262 | |
263 | -class TestPGBouncerFixture(TestCase): |
264 | +class TestPGBouncerFixtureWithCA(TestCase): |
265 | + """PGBouncerFixture reconnect tests for Component Architecture layers. |
266 | + |
267 | + Registered Storm Stores should be reconnected through pgbouncer. |
268 | + """ |
269 | layer = LaunchpadZopelessLayer |
270 | |
271 | def is_connected(self): |
272 | @@ -149,3 +159,49 @@ |
273 | |
274 | # Database is working again. |
275 | assert self.is_connected() |
276 | + |
277 | + |
278 | +class TestPGBouncerFixtureWithoutCA(TestCase): |
279 | + """PGBouncerFixture tests for non-Component Architecture layers.""" |
280 | + layer = DatabaseLayer |
281 | + |
282 | + def is_db_available(self): |
283 | + # Direct connection to the DB. |
284 | + con_str = dbconfig.rw_main_master + ' user=launchpad_main' |
285 | + try: |
286 | + con = psycopg2.connect(con_str) |
287 | + cur = con.cursor() |
288 | + cur.execute("SELECT id FROM Person LIMIT 1") |
289 | + con.close() |
290 | + return True |
291 | + except psycopg2.OperationalError: |
292 | + return False |
293 | + |
294 | + def test_install_fixture(self): |
295 | + self.assert_(self.is_db_available()) |
296 | + |
297 | + with PGBouncerFixture() as pgbouncer: |
298 | + self.assert_(self.is_db_available()) |
299 | + |
300 | + pgbouncer.stop() |
301 | + self.assert_(not self.is_db_available()) |
302 | + |
303 | + # This confirms that we are again connecting directly to the |
304 | + # database, as the pgbouncer process was shutdown. |
305 | + self.assert_(self.is_db_available()) |
306 | + |
307 | + def test_install_fixture_with_restart(self): |
308 | + self.assert_(self.is_db_available()) |
309 | + |
310 | + with PGBouncerFixture() as pgbouncer: |
311 | + self.assert_(self.is_db_available()) |
312 | + |
313 | + pgbouncer.stop() |
314 | + self.assert_(not self.is_db_available()) |
315 | + |
316 | + pgbouncer.start() |
317 | + self.assert_(self.is_db_available()) |
318 | + |
319 | + # Note that because pgbouncer was left running, we can't confirm |
320 | + # that we are now connecting directly to the database. |
321 | + self.assert_(self.is_db_available()) |
322 | |
323 | === modified file 'scripts/branch-rewrite.py' |
324 | --- scripts/branch-rewrite.py 2010-11-06 12:50:22 +0000 |
325 | +++ scripts/branch-rewrite.py 2011-08-31 16:53:36 +0000 |
326 | @@ -19,6 +19,8 @@ |
327 | |
328 | from canonical.database.sqlbase import ISOLATION_LEVEL_AUTOCOMMIT |
329 | from canonical.config import config |
330 | +from canonical.launchpad.interfaces.lpstorm import ISlaveStore |
331 | +from lp.code.model.branch import Branch |
332 | from lp.codehosting.rewrite import BranchRewriter |
333 | from lp.services.log.loglevels import INFO, WARNING |
334 | from lp.services.scripts.base import LaunchpadScript |
335 | @@ -60,9 +62,19 @@ |
336 | return |
337 | except KeyboardInterrupt: |
338 | sys.exit() |
339 | - except: |
340 | + except Exception: |
341 | self.logger.exception('Exception occurred:') |
342 | print "NULL" |
343 | + # The exception might have been a DisconnectionError or |
344 | + # similar. Cleanup such as database reconnection will |
345 | + # not happen until the transaction is rolled back. |
346 | + # XXX StuartBishop 2011-08-31 bug=819282: We are |
347 | + # explicitly rolling back the store here as a workaround |
348 | + # instead of using transaction.abort() |
349 | + try: |
350 | + ISlaveStore(Branch).rollback() |
351 | + except Exception: |
352 | + self.logger.exception('Exception occurred in rollback:') |
353 | |
354 | |
355 | if __name__ == '__main__': |
Hi Stuart,
Generally good, but I have a few things that look worth fixing.
=== modified file 'lib/lp/ codehosting/ tests/test_ rewrite. py' codehosting/ tests/test_ rewrite. py 2011-08-12 11:37:08 +0000 codehosting/ tests/test_ rewrite. py 2011-08-29 20:15:59 +0000
--- lib/lp/
+++ lib/lp/
@@ -177,7 +181,8 @@
transaction. commit( )
rewriter. rewriteLine( '/' + branch.unique_name + '/.bzr/README')
rewriter. rewriteLine( '/' + branch.unique_name + '/.bzr/README') output_ lines = self.getLoggerO utput(rewriter) .strip( ).split( '\n') output_ lines = self.getLoggerO utput( .strip( ).split( '\n')
- logging_
+ logging_
+ rewriter)
Cleaning out lint, I see. Thanks.
@@ -274,3 +280,62 @@
self. assertEqual( '', err)
self. assertEqual( expected_ lines, output_lines) terScriptHandle sDisconnects( TestCaseWithFac tory): Layer hRewriterScript HandlesDisconne cts, self).setUp() (self.pgbouncer .cleanUp) setUp()
# The script produces logging output, but not to stderr.
+
+
+class TestBranchRewri
+ """Ensure branch-rewrite.py survives fastdowntime deploys."""
+ layer = LaunchpadScript
+
+ def setUp(self):
+ super(TestBranc
+ self.pgbouncer = PGBouncerFixture()
+ self.addCleanup
+ self.pgbouncer.
Couldn't those last three lines be replaced with a simple
self.pgbouncer = self.useFixture (PGBouncerFixtu re())
?
In fact I'm not even sure it's worth a setUp with the super() dance, and carrying self.pgbouncer from setUp to the tests.
+ def spawn(self): rewrite. py') s.PIPE, stdout= subprocess. PIPE, subprocess. PIPE, bufsize=0)
+ script_file = os.path.join(
+ config.root, 'scripts', 'branch-
+
+ self.rewriter_proc = subprocess.Popen(
+ [script_file], stdin=subproces
+ stderr=
Again, is it worth keeping self.rewriter_proc on the test class? It just hides state. Why not make it a return value?
Moreover, is this process guaranteed to clean itself up? If not, then please ensure that it is. For instance, you could add something like "self.addCleanu p(kill, rewriter_proc)" to spawn().
+ def request(self, query): proc.stdin. write(query + '\n') proc.stdout. readline( ).rstrip( '\n')
+ self.rewriter_
+ return self.rewriter_
Do we know that the readline() will never hang under reasonable circumstances?
+ def test_disconnect (self):
+ self.spawn()
Sure "disconnect" is something that happens in this test, but is that one word really a good description of what you're testing here? Would it help to say something with "reconnects"?
+ # Everything should be working, and we get valid output. '/foo') , out
+ out = self.request('foo')
+ assert out.endswith(
Why use assert here? Normally you'd say self.assertEnds With(out, '/foo').
+ self.pgbouncer. stop()
+
+ # Now with pgbouncer down, we should get NULL messages and
+ # stderr spam, and this keeps happening. We test more than
+ # once to ensure that we will keep trying to reconnect even
+ # after several failures.
+ for count in range(5):
+ out = self.request('foo')
Is this race-free? I don't know ...