Merge lp:~jtv/launchpad/bug-717969 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Merged at revision: 14459
Proposed branch: lp:~jtv/launchpad/bug-717969
Merge into: lp:launchpad
Diff against target: 1482 lines (+430/-237)
12 files modified
lib/lp/buildmaster/interfaces/builder.py (+6/-4)
lib/lp/buildmaster/manager.py (+78/-50)
lib/lp/buildmaster/model/builder.py (+17/-8)
lib/lp/buildmaster/model/buildfarmjobbehavior.py (+63/-33)
lib/lp/buildmaster/model/packagebuild.py (+91/-57)
lib/lp/buildmaster/tests/test_builder.py (+1/-1)
lib/lp/buildmaster/tests/test_manager.py (+127/-40)
lib/lp/buildmaster/tests/test_packagebuild.py (+10/-20)
lib/lp/code/model/tests/test_sourcepackagerecipebuild.py (+6/-8)
lib/lp/services/database/transaction_policy.py (+5/-2)
lib/lp/soyuz/tests/test_binarypackagebuild.py (+11/-4)
lib/lp/translations/model/translationtemplatesbuildbehavior.py (+15/-10)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-717969
Reviewer Review Type Date Requested Status
Julian Edwards (community) Approve
Review via email: mp+83022@code.launchpad.net

Commit message

Read-only transactions by default in buildmaster.

Description of the change

= Bug 717969 =

This is an oversized branch. Don't just pick it up out of a sense of duty; if it's too big or you're not certain about the domain-specific stuff, contact one of the Soyuz experts — Julian, Steve, or William.

The buildmaster was showing very strange symptoms. In a nutshell, cases where observed where it went through steps A, B, and C; we could see that steps A and C were completed successfully; but the expected results from step B were just not there.

The cause turns out to be a dangerous mix of global state: our database transaction management is global to a thread (all of the thread's changes are committed or all are aborted) but our use of twisted makes a single python thread/process flit back and forth between servicing multiple logical threads of execution.

Each of those threads of execution might, from its point of view: make some changes to ORM-backed objects, then initiate an asynchronous request (talking to the librarian or a build slave), block for the result to arrive, make some more ORM changes, block again, and later commit or abort the database transaction.

That is purely from the point of view of a logical thread of execution. Twisted experts are quick to point out that “nothing blocks” in Twisted, but that is exactly the problem: while one logical thread of execution “blocks,” another gets to execute. And who's to say whether that other one might not commit or abort the first one's changes prematurely? With the numbers of builders we drive, it's bound to happen regularly.

We discussed a few architectural solutions: farming off the ORM changes to a worker thread, doing all the work in threads, and so on. The real problem is that the ORM changes are inlined in the buildmaster, but largely hidden away in the various build-related class hierarchies. Twisted is meant for glue between systems, not for interleaving complex and dispersed stateful machinery with the glue's callback structure.

Ideally, we should separate the “glue” from the “machinery.” But it's complex and fragile, so the next step from here is to get the code under control to the point where we can reason reliably about it. Once we know more about what the code really does, we'll have more freedom to reorganize it.

This branch represents that next step from here: make the buildmaster run in read-only transaction policy. It will not be able to make any changes to the database (or ORM-backed objects), except in sections of code that are explicitly wrapped in read-write transaction policies. Now we know exactly where the buildmaster may change the database — doing it anywhere else would be an error. We keep the read-write sections absolutely minimal, and try to avoid method calls where Twisted context switches might hide. Unfortunately neither Twisted nor Storm seems to have a facility for forbidding them altogether.

Every read-write section commits immediately. That means more fine-grained state changes. I can't be 100% certain that early commits will not produce any unwanted object states, and I can't be 100% certain that the read-write policies cover all code paths that need them. But we've run several types of build through the patched build farm, and we've stress-tested it against about 5 concurrent builders working all-out. As expected, we found code that still lacked an explicit read-write policy — but only one of them. There may be more, but only production use will find them all.

For Q/A, we once again perform builds on staging and/or dogfood, of as many kinds as we can. Include concurrent builds, and successes as well as failures. Verify that the build products (tarballs, packages, changes files, build logs) all go into the librarian, and that the builds end up in a proper terminal state. The contents of the build logs should be consistent with that end state.

As for tests… sorry, I just run them all!

No lint that I could help, copyrights updated, imports formatted as per policy. Thank you for reading this far.

Jeroen

To post a comment you must log in.
Revision history for this message
Steve Kowalik (stevenk) wrote :

Please also link bug 785679 to this MP, you've fixed it too since you've re-enabled the test.

This broadly looks okay, although I will admit to a large sense of fear and trepidation of this hitting production. You're absolutely right, though, the buildd-manager needs more tests and less magic.

I am wary of the amount of times you are calling transaction.commit() in this branch. Especially in tests, where you are calling into methods that could commit() first and avoid all that duplication.

I am holding off my vote for now, this branch requires some thought. And possibly a large amount of vodka.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

> Please also link bug 785679 to this MP, you've fixed it too since you've re-
> enabled the test.

Done. Apparently the test didn't use our custom setup for a Twisted-enabled test.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Thanks a *bunch* for tackling this, I realise it's a bit soul-sapping. If it helps, know that I have spent most of the day going through your code to make sure it's all OK :)

319 + transaction.commit()
320 + with DatabaseTransactionPolicy(read_only=False):
321 + library_file = getUtility(ILibraryFileAliasSet).create(
322 + filename, bytes_written, out_file,
323 + contentType=filenameToContentType(filename),
324 + restricted=private)
325 + transaction.commit()

Should that commit at 319 be an abort()? Or are you trying to make it bomb out deliberately if there's data getting written where it should not be?

349 transaction.commit()
350 + with DatabaseTransactionPolicy(read_only=False):
351 + candidate.markAsBuilding(self)
352 + transaction.commit()

Same here.

And in fact in quite a few places so I won't list them all.

586 -from lp.registry.interfaces.pocket import (
587 - PackagePublishingPocket,
588 - )
589 +from lp.registry.interfaces.pocket import PackagePublishingPocket

Why? :)

634 - d = method(librarian, slave_status, logger, send_notification)
635 - return d
636 + return None
637 + else:
638 + return method(librarian, slave_status, logger, send_notification)

I'd prefer if you left this with the local "d" variable because it makes it clear it's returning a Deferred. This is important because anything that reminds the caller is a good thing, especially morons like me who cocked it up once :)

945 - self.useFixture(BuilddSlaveTestSetup())

All of the other tests don't need this, and it's pretty slow to set up, which is why I recently moved it out of setUp(). Is there a good reason to put it back?

Finally - I'd love to see a test that tries to recreate the original bug. I think it's actually quite straightforward, I'll try and explain the steps:

 * Set up a build connected to a BrokenSlave so that when it's polled it'll abort().
 * Create and get a build at the stage where it's inside storeBuildInfo() but waiting on the Deferred to fire that returns the log.
 * Fire the reactor up so the outstanding Deferreds are processed - the abort() should get called before the log Deferred fires. That will abort all the work done in _handleStatus_OK() so is easy to test for.

Hope that makes sense!

review: Needs Fixing
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Download full text (3.7 KiB)

> Thanks a *bunch* for tackling this, I realise it's a bit soul-sapping. If it
> helps, know that I have spent most of the day going through your code to make
> sure it's all OK :)

That helps a lot. And I'm well aware that reviewing this stuff is more soul-sapping than writing it is. Thank you for wading through your 1290 lines of personal hell!

> 319 + transaction.commit()
> 320 + with DatabaseTransactionPolicy(read_only=False):
> 321 + library_file =
> getUtility(ILibraryFileAliasSet).create(
> 322 + filename, bytes_written, out_file,
> 323 + contentType=filenameToContentType(filename),
> 324 + restricted=private)
> 325 + transaction.commit()
>
> Should that commit at 319 be an abort()? Or are you trying to make it bomb
> out deliberately if there's data getting written where it should not be?

And there you put your finger on the deciding consideration, which I had totally forgotten in the meantime. Yes, thanks to DatabaseTransactionPolicy, a commit is actually better than an abort. It will reveal violations of the policy in a much more direct and helpful way than an abort would.

(Also, I believe the ORM makes abort slower because it needs to scan the live objects for changes that need undoing).

> 349 transaction.commit()
> 350 + with DatabaseTransactionPolicy(read_only=False):
> 351 + candidate.markAsBuilding(self)
> 352 + transaction.commit()
>
> Same here.
>
> And in fact in quite a few places so I won't list them all.

I went with the commit throughout, which only required 2 changes.

> 586 -from lp.registry.interfaces.pocket import (
> 587 - PackagePublishingPocket,
> 588 - )
> 589 +from lp.registry.interfaces.pocket import PackagePublishingPocket
>
> Why? :)

Automated: utilities/format-new-and-modified-imports.

> 634 - d = method(librarian, slave_status, logger,
> send_notification)
> 635 - return d
> 636 + return None
> 637 + else:
> 638 + return method(librarian, slave_status, logger,
> send_notification)
>
> I'd prefer if you left this with the local "d" variable because it makes it
> clear it's returning a Deferred. This is important because anything that
> reminds the caller is a good thing, especially morons like me who cocked it up
> once :)

You're right. Had I come across this a bit later in the branch, when I had more grey hairs but also more twisted/buildmanager experience, I would have kept the “d” as well. Fixed.

> 945 - self.useFixture(BuilddSlaveTestSetup())
>
> All of the other tests don't need this, and it's pretty slow to set up, which
> is why I recently moved it out of setUp(). Is there a good reason to put it
> back?

Probably a misguided step in changing the test cases over from that weirdly-named Twisted thing to our own TestCaseWithFactory. Fixed.

> Finally - I'd love to see a test that tries to recreate the original bug. I
> think it's actually quite s...

Read more...

Revision history for this message
Julian Edwards (julian-edwards) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/buildmaster/interfaces/builder.py'
2--- lib/lp/buildmaster/interfaces/builder.py 2011-07-14 21:49:37 +0000
3+++ lib/lp/buildmaster/interfaces/builder.py 2011-12-08 16:19:26 +0000
4@@ -1,4 +1,4 @@
5-# Copyright 2009 Canonical Ltd. This software is licensed under the
6+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
7 # GNU Affero General Public License version 3 (see the file LICENSE).
8
9 # pylint: disable-msg=E0211,E0213
10@@ -25,10 +25,10 @@
11 export_as_webservice_entry,
12 export_read_operation,
13 exported,
14+ operation_for_version,
15 operation_parameters,
16+ operation_returns_collection_of,
17 operation_returns_entry,
18- operation_returns_collection_of,
19- operation_for_version,
20 )
21 from lazr.restful.fields import (
22 Reference,
23@@ -50,12 +50,12 @@
24 from lp.app.validators.name import name_validator
25 from lp.app.validators.url import builder_url_validator
26 from lp.registry.interfaces.role import IHasOwner
27-from lp.soyuz.interfaces.processor import IProcessor
28 from lp.services.fields import (
29 Description,
30 PersonChoice,
31 Title,
32 )
33+from lp.soyuz.interfaces.processor import IProcessor
34
35
36 class BuildDaemonError(Exception):
37@@ -195,6 +195,8 @@
38
39 def setSlaveForTesting(proxy):
40 """Sets the RPC proxy through which to operate the build slave."""
41+ # XXX JeroenVermeulen 2011-11-09, bug=888010: Don't use this.
42+ # It's a trap. See bug for details.
43
44 def verifySlaveBuildCookie(slave_build_id):
45 """Verify that a slave's build cookie is consistent.
46
47=== modified file 'lib/lp/buildmaster/manager.py'
48--- lib/lp/buildmaster/manager.py 2011-10-31 09:23:47 +0000
49+++ lib/lp/buildmaster/manager.py 2011-12-08 16:19:26 +0000
50@@ -1,4 +1,4 @@
51-# Copyright 2009 Canonical Ltd. This software is licensed under the
52+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
53 # GNU Affero General Public License version 3 (see the file LICENSE).
54
55 """Soyuz buildd slave manager logic."""
56@@ -23,10 +23,6 @@
57 from zope.component import getUtility
58
59 from lp.buildmaster.enums import BuildStatus
60-from lp.buildmaster.interfaces.buildfarmjobbehavior import (
61- BuildBehaviorMismatch,
62- )
63-from lp.buildmaster.model.builder import Builder
64 from lp.buildmaster.interfaces.builder import (
65 BuildDaemonError,
66 BuildSlaveFailure,
67@@ -34,6 +30,11 @@
68 CannotFetchFile,
69 CannotResumeHost,
70 )
71+from lp.buildmaster.interfaces.buildfarmjobbehavior import (
72+ BuildBehaviorMismatch,
73+ )
74+from lp.buildmaster.model.builder import Builder
75+from lp.services.database.transaction_policy import DatabaseTransactionPolicy
76
77
78 BUILDD_MANAGER_LOG_NAME = "slave-scanner"
79@@ -111,13 +112,17 @@
80 # algorithm for polling.
81 SCAN_INTERVAL = 15
82
83- def __init__(self, builder_name, logger):
84+ def __init__(self, builder_name, logger, clock=None):
85 self.builder_name = builder_name
86 self.logger = logger
87+ if clock is None:
88+ clock = reactor
89+ self._clock = clock
90
91 def startCycle(self):
92 """Scan the builder and dispatch to it or deal with failures."""
93 self.loop = LoopingCall(self.singleCycle)
94+ self.loop.clock = self._clock
95 self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
96 return self.stopping_deferred
97
98@@ -138,51 +143,58 @@
99 1. Print the error in the log
100 2. Increment and assess failure counts on the builder and job.
101 """
102- # Make sure that pending database updates are removed as it
103- # could leave the database in an inconsistent state (e.g. The
104- # job says it's running but the buildqueue has no builder set).
105+ # Since this is a failure path, we could be in a broken
106+ # transaction. Get us a fresh one.
107 transaction.abort()
108
109 # If we don't recognise the exception include a stack trace with
110 # the error.
111 error_message = failure.getErrorMessage()
112- if failure.check(
113+ familiar_error = failure.check(
114 BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch,
115- CannotResumeHost, BuildDaemonError, CannotFetchFile):
116- self.logger.info("Scanning %s failed with: %s" % (
117- self.builder_name, error_message))
118+ CannotResumeHost, BuildDaemonError, CannotFetchFile)
119+ if familiar_error:
120+ self.logger.info(
121+ "Scanning %s failed with: %s",
122+ self.builder_name, error_message)
123 else:
124- self.logger.info("Scanning %s failed with: %s\n%s" % (
125+ self.logger.info(
126+ "Scanning %s failed with: %s\n%s",
127 self.builder_name, failure.getErrorMessage(),
128- failure.getTraceback()))
129+ failure.getTraceback())
130
131 # Decide if we need to terminate the job or fail the
132 # builder.
133 try:
134 builder = get_builder(self.builder_name)
135- builder.gotFailure()
136- if builder.currentjob is not None:
137- build_farm_job = builder.getCurrentBuildFarmJob()
138- build_farm_job.gotFailure()
139- self.logger.info(
140- "builder %s failure count: %s, "
141- "job '%s' failure count: %s" % (
142+ transaction.commit()
143+
144+ with DatabaseTransactionPolicy(read_only=False):
145+ builder.gotFailure()
146+
147+ if builder.currentjob is None:
148+ self.logger.info(
149+ "Builder %s failed a probe, count: %s",
150+ self.builder_name, builder.failure_count)
151+ else:
152+ build_farm_job = builder.getCurrentBuildFarmJob()
153+ build_farm_job.gotFailure()
154+ self.logger.info(
155+ "builder %s failure count: %s, "
156+ "job '%s' failure count: %s",
157 self.builder_name,
158 builder.failure_count,
159 build_farm_job.title,
160- build_farm_job.failure_count))
161- else:
162- self.logger.info(
163- "Builder %s failed a probe, count: %s" % (
164- self.builder_name, builder.failure_count))
165- assessFailureCounts(builder, failure.getErrorMessage())
166- transaction.commit()
167+ build_farm_job.failure_count)
168+
169+ assessFailureCounts(builder, failure.getErrorMessage())
170+ transaction.commit()
171 except:
172 # Catastrophic code failure! Not much we can do.
173+ transaction.abort()
174 self.logger.error(
175 "Miserable failure when trying to examine failure counts:\n",
176 exc_info=True)
177- transaction.abort()
178
179 def checkCancellation(self, builder):
180 """See if there is a pending cancellation request.
181@@ -236,14 +248,9 @@
182 """
183 # We need to re-fetch the builder object on each cycle as the
184 # Storm store is invalidated over transaction boundaries.
185-
186 self.builder = get_builder(self.builder_name)
187
188 def status_updated(ignored):
189- # Commit the changes done while possibly rescuing jobs, to
190- # avoid holding table locks.
191- transaction.commit()
192-
193 # See if we think there's an active build on the builder.
194 buildqueue = self.builder.getBuildQueue()
195
196@@ -253,14 +260,10 @@
197 return self.builder.updateBuild(buildqueue)
198
199 def build_updated(ignored):
200- # Commit changes done while updating the build, to avoid
201- # holding table locks.
202- transaction.commit()
203-
204 # If the builder is in manual mode, don't dispatch anything.
205 if self.builder.manual:
206 self.logger.debug(
207- '%s is in manual mode, not dispatching.' %
208+ '%s is in manual mode, not dispatching.',
209 self.builder.name)
210 return
211
212@@ -278,22 +281,33 @@
213 job = self.builder.currentjob
214 if job is not None and not self.builder.builderok:
215 self.logger.info(
216- "%s was made unavailable, resetting attached "
217- "job" % self.builder.name)
218- job.reset()
219+ "%s was made unavailable; resetting attached job.",
220+ self.builder.name)
221 transaction.commit()
222+ with DatabaseTransactionPolicy(read_only=False):
223+ job.reset()
224+ transaction.commit()
225 return
226
227 # See if there is a job we can dispatch to the builder slave.
228
229+ # XXX JeroenVermeulen 2011-10-11, bug=872112: The job's
230+ # failure count will be reset once the job has started
231+ # successfully. Because of intervening commits, you may see
232+ # a build with a nonzero failure count that's actually going
233+ # to succeed later (and have a failure count of zero). Or
234+ # it may fail yet end up with a lower failure count than you
235+ # saw earlier.
236 d = self.builder.findAndStartJob()
237
238 def job_started(candidate):
239 if self.builder.currentjob is not None:
240 # After a successful dispatch we can reset the
241 # failure_count.
242- self.builder.resetFailureCount()
243 transaction.commit()
244+ with DatabaseTransactionPolicy(read_only=False):
245+ self.builder.resetFailureCount()
246+ transaction.commit()
247 return self.builder.slave
248 else:
249 return None
250@@ -372,6 +386,7 @@
251 self.logger = self._setupLogger()
252 self.new_builders_scanner = NewBuildersScanner(
253 manager=self, clock=clock)
254+ self.transaction_policy = DatabaseTransactionPolicy(read_only=True)
255
256 def _setupLogger(self):
257 """Set up a 'slave-scanner' logger that redirects to twisted.
258@@ -390,16 +405,28 @@
259 logger.setLevel(level)
260 return logger
261
262+ def enterReadOnlyDatabasePolicy(self):
263+ """Set the database transaction policy to read-only.
264+
265+ Any previously pending changes are committed first.
266+ """
267+ transaction.commit()
268+ self.transaction_policy.__enter__()
269+
270+ def exitReadOnlyDatabasePolicy(self, *args):
271+ """Reset database transaction policy to the default read-write."""
272+ self.transaction_policy.__exit__(None, None, None)
273+
274 def startService(self):
275 """Service entry point, called when the application starts."""
276+ # Avoiding circular imports.
277+ from lp.buildmaster.interfaces.builder import IBuilderSet
278+
279+ self.enterReadOnlyDatabasePolicy()
280
281 # Get a list of builders and set up scanners on each one.
282-
283- # Avoiding circular imports.
284- from lp.buildmaster.interfaces.builder import IBuilderSet
285- builder_set = getUtility(IBuilderSet)
286- builders = [builder.name for builder in builder_set]
287- self.addScanForBuilders(builders)
288+ self.addScanForBuilders(
289+ [builder.name for builder in getUtility(IBuilderSet)])
290 self.new_builders_scanner.scheduleScan()
291
292 # Events will now fire in the SlaveScanner objects to scan each
293@@ -420,6 +447,7 @@
294 # stopped, so we can wait on them all at once here before
295 # exiting.
296 d = defer.DeferredList(deferreds, consumeErrors=True)
297+ d.addCallback(self.exitReadOnlyDatabasePolicy)
298 return d
299
300 def addScanForBuilders(self, builders):
301
302=== modified file 'lib/lp/buildmaster/model/builder.py'
303--- lib/lp/buildmaster/model/builder.py 2011-11-23 12:02:59 +0000
304+++ lib/lp/buildmaster/model/builder.py 2011-12-08 16:19:26 +0000
305@@ -1,4 +1,4 @@
306-# Copyright 2009,2011 Canonical Ltd. This software is licensed under the
307+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
308 # GNU Affero General Public License version 3 (see the file LICENSE).
309
310 # pylint: disable-msg=E0611,W0212
311@@ -76,6 +76,7 @@
312 specific_job_classes,
313 )
314 from lp.registry.interfaces.person import validate_public_person
315+from lp.services.database.transaction_policy import DatabaseTransactionPolicy
316 from lp.services.job.interfaces.job import JobStatus
317 from lp.services.job.model.job import Job
318 from lp.services.propertycache import (
319@@ -545,6 +546,8 @@
320
321 def setSlaveForTesting(self, proxy):
322 """See IBuilder."""
323+ # XXX JeroenVermeulen 2011-11-09, bug=888010: Don't use this.
324+ # It's a trap. See bug for details.
325 self._testing_slave = proxy
326 del get_property_cache(self).slave
327
328@@ -673,10 +676,13 @@
329 bytes_written = out_file.tell()
330 out_file.seek(0)
331
332- library_file = getUtility(ILibraryFileAliasSet).create(
333- filename, bytes_written, out_file,
334- contentType=filenameToContentType(filename),
335- restricted=private)
336+ transaction.commit()
337+ with DatabaseTransactionPolicy(read_only=False):
338+ library_file = getUtility(ILibraryFileAliasSet).create(
339+ filename, bytes_written, out_file,
340+ contentType=filenameToContentType(filename),
341+ restricted=private)
342+ transaction.commit()
343 finally:
344 # Remove the temporary file. getFile() closes the file
345 # object.
346@@ -714,7 +720,7 @@
347 def acquireBuildCandidate(self):
348 """Acquire a build candidate in an atomic fashion.
349
350- When retrieiving a candidate we need to mark it as building
351+ When retrieving a candidate we need to mark it as building
352 immediately so that it is not dispatched by another builder in the
353 build manager.
354
355@@ -724,12 +730,15 @@
356 can be in this code at the same time.
357
358 If there's ever more than one build manager running at once, then
359- this code will need some sort of mutex.
360+ this code will need some sort of mutex, or run in a single
361+ transaction.
362 """
363 candidate = self._findBuildCandidate()
364 if candidate is not None:
365- candidate.markAsBuilding(self)
366 transaction.commit()
367+ with DatabaseTransactionPolicy(read_only=False):
368+ candidate.markAsBuilding(self)
369+ transaction.commit()
370 return candidate
371
372 def _findBuildCandidate(self):
373
374=== modified file 'lib/lp/buildmaster/model/buildfarmjobbehavior.py'
375--- lib/lp/buildmaster/model/buildfarmjobbehavior.py 2011-06-09 10:50:25 +0000
376+++ lib/lp/buildmaster/model/buildfarmjobbehavior.py 2011-12-08 16:19:26 +0000
377@@ -1,4 +1,4 @@
378-# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
379+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
380 # GNU Affero General Public License version 3 (see the file LICENSE).
381
382 # pylint: disable-msg=E0211,E0213
383@@ -16,8 +16,8 @@
384 import socket
385 import xmlrpclib
386
387+import transaction
388 from twisted.internet import defer
389-
390 from zope.component import getUtility
391 from zope.interface import implements
392 from zope.security.proxy import removeSecurityProxy
393@@ -32,6 +32,7 @@
394 IBuildFarmJobBehavior,
395 )
396 from lp.services import encoding
397+from lp.services.database.transaction_policy import DatabaseTransactionPolicy
398 from lp.services.job.interfaces.job import JobStatus
399
400
401@@ -70,6 +71,25 @@
402 if slave_build_cookie != expected_cookie:
403 raise CorruptBuildCookie("Invalid slave build cookie.")
404
405+ def _getBuilderStatusHandler(self, status_text, logger):
406+ """Look up the handler method for a given builder status.
407+
408+ If status is not a known one, logs an error and returns None.
409+ """
410+ builder_status_handlers = {
411+ 'BuilderStatus.IDLE': self.updateBuild_IDLE,
412+ 'BuilderStatus.BUILDING': self.updateBuild_BUILDING,
413+ 'BuilderStatus.ABORTING': self.updateBuild_ABORTING,
414+ 'BuilderStatus.ABORTED': self.updateBuild_ABORTED,
415+ 'BuilderStatus.WAITING': self.updateBuild_WAITING,
416+ }
417+ handler = builder_status_handlers.get(status_text)
418+ if handler is None:
419+ logger.critical(
420+ "Builder on %s returned unknown status %s; failing it.",
421+ self._builder.url, status_text)
422+ return handler
423+
424 def updateBuild(self, queueItem):
425 """See `IBuildFarmJobBehavior`."""
426 logger = logging.getLogger('slave-scanner')
427@@ -77,6 +97,7 @@
428 d = self._builder.slaveStatus()
429
430 def got_failure(failure):
431+ transaction.abort()
432 failure.trap(xmlrpclib.Fault, socket.error)
433 info = failure.value
434 info = ("Could not contact the builder %s, caught a (%s)"
435@@ -84,27 +105,22 @@
436 raise BuildSlaveFailure(info)
437
438 def got_status(slave_status):
439- builder_status_handlers = {
440- 'BuilderStatus.IDLE': self.updateBuild_IDLE,
441- 'BuilderStatus.BUILDING': self.updateBuild_BUILDING,
442- 'BuilderStatus.ABORTING': self.updateBuild_ABORTING,
443- 'BuilderStatus.ABORTED': self.updateBuild_ABORTED,
444- 'BuilderStatus.WAITING': self.updateBuild_WAITING,
445- }
446-
447 builder_status = slave_status['builder_status']
448- if builder_status not in builder_status_handlers:
449- logger.critical(
450- "Builder on %s returned unknown status %s, failing it"
451- % (self._builder.url, builder_status))
452- self._builder.failBuilder(
453+ status_handler = self._getBuilderStatusHandler(
454+ builder_status, logger)
455+ if status_handler is None:
456+ error = (
457 "Unknown status code (%s) returned from status() probe."
458 % builder_status)
459- # XXX: This will leave the build and job in a bad state, but
460- # should never be possible, since our builder statuses are
461- # known.
462- queueItem._builder = None
463- queueItem.setDateStarted(None)
464+ transaction.commit()
465+ with DatabaseTransactionPolicy(read_only=False):
466+ self._builder.failBuilder(error)
467+ # XXX: This will leave the build and job in a bad
468+ # state, but should never be possible since our
469+ # builder statuses are known.
470+ queueItem._builder = None
471+ queueItem.setDateStarted(None)
472+ transaction.commit()
473 return
474
475 # Since logtail is a xmlrpclib.Binary container and it is
476@@ -114,9 +130,8 @@
477 # will simply remove the proxy.
478 logtail = removeSecurityProxy(slave_status.get('logtail'))
479
480- method = builder_status_handlers[builder_status]
481 return defer.maybeDeferred(
482- method, queueItem, slave_status, logtail, logger)
483+ status_handler, queueItem, slave_status, logtail, logger)
484
485 d.addErrback(got_failure)
486 d.addCallback(got_status)
487@@ -128,22 +143,32 @@
488 Log this and reset the record.
489 """
490 logger.warn(
491- "Builder %s forgot about buildqueue %d -- resetting buildqueue "
492- "record" % (queueItem.builder.url, queueItem.id))
493- queueItem.reset()
494+ "Builder %s forgot about buildqueue %d -- "
495+ "resetting buildqueue record.",
496+ queueItem.builder.url, queueItem.id)
497+ transaction.commit()
498+ with DatabaseTransactionPolicy(read_only=False):
499+ queueItem.reset()
500+ transaction.commit()
501
502 def updateBuild_BUILDING(self, queueItem, slave_status, logtail, logger):
503 """Build still building, collect the logtail"""
504- if queueItem.job.status != JobStatus.RUNNING:
505- queueItem.job.start()
506- queueItem.logtail = encoding.guess(str(logtail))
507+ transaction.commit()
508+ with DatabaseTransactionPolicy(read_only=False):
509+ if queueItem.job.status != JobStatus.RUNNING:
510+ queueItem.job.start()
511+ queueItem.logtail = encoding.guess(str(logtail))
512+ transaction.commit()
513
514 def updateBuild_ABORTING(self, queueItem, slave_status, logtail, logger):
515 """Build was ABORTED.
516
517 Master-side should wait until the slave finish the process correctly.
518 """
519- queueItem.logtail = "Waiting for slave process to be terminated"
520+ transaction.commit()
521+ with DatabaseTransactionPolicy(read_only=False):
522+ queueItem.logtail = "Waiting for slave process to be terminated"
523+ transaction.commit()
524
525 def updateBuild_ABORTED(self, queueItem, slave_status, logtail, logger):
526 """ABORTING process has successfully terminated.
527@@ -151,11 +176,16 @@
528 Clean the builder for another jobs.
529 """
530 d = queueItem.builder.cleanSlave()
531+
532 def got_cleaned(ignored):
533- queueItem.builder = None
534- if queueItem.job.status != JobStatus.FAILED:
535- queueItem.job.fail()
536- queueItem.specific_job.jobAborted()
537+ transaction.commit()
538+ with DatabaseTransactionPolicy(read_only=False):
539+ queueItem.builder = None
540+ if queueItem.job.status != JobStatus.FAILED:
541+ queueItem.job.fail()
542+ queueItem.specific_job.jobAborted()
543+ transaction.commit()
544+
545 return d.addCallback(got_cleaned)
546
547 def extractBuildStatus(self, slave_status):
548
549=== modified file 'lib/lp/buildmaster/model/packagebuild.py'
550--- lib/lp/buildmaster/model/packagebuild.py 2011-08-31 19:41:51 +0000
551+++ lib/lp/buildmaster/model/packagebuild.py 2011-12-08 16:19:26 +0000
552@@ -1,4 +1,4 @@
553-# Copyright 2010 Canonical Ltd. This software is licensed under the
554+# Copyright 2010-2011 Canonical Ltd. This software is licensed under the
555 # GNU Affero General Public License version 3 (see the file LICENSE).
556
557 __metaclass__ = type
558@@ -9,11 +9,11 @@
559 ]
560
561
562+from cStringIO import StringIO
563 import datetime
564 import logging
565 import os.path
566
567-from cStringIO import StringIO
568 from lazr.delegates import delegates
569 import pytz
570 from storm.expr import Desc
571@@ -24,6 +24,7 @@
572 Storm,
573 Unicode,
574 )
575+import transaction
576 from zope.component import getUtility
577 from zope.interface import (
578 classProvides,
579@@ -43,8 +44,8 @@
580 MAIN_STORE,
581 )
582 from lp.buildmaster.enums import (
583+ BuildFarmJobType,
584 BuildStatus,
585- BuildFarmJobType,
586 )
587 from lp.buildmaster.interfaces.buildfarmjob import IBuildFarmJobSource
588 from lp.buildmaster.interfaces.packagebuild import (
589@@ -57,9 +58,8 @@
590 BuildFarmJobDerived,
591 )
592 from lp.buildmaster.model.buildqueue import BuildQueue
593-from lp.registry.interfaces.pocket import (
594- PackagePublishingPocket,
595- )
596+from lp.registry.interfaces.pocket import PackagePublishingPocket
597+from lp.services.database.transaction_policy import DatabaseTransactionPolicy
598 from lp.soyuz.adapters.archivedependencies import (
599 default_component_dependency_name,
600 )
601@@ -179,19 +179,24 @@
602 def storeBuildInfo(build, librarian, slave_status):
603 """See `IPackageBuild`."""
604 def got_log(lfa_id):
605+ dependencies = slave_status.get('dependencies')
606+ if dependencies is not None:
607+ dependencies = unicode(dependencies)
608+
609 # log, builder and date_finished are read-only, so we must
610 # currently remove the security proxy to set them.
611 naked_build = removeSecurityProxy(build)
612- naked_build.log = lfa_id
613- naked_build.builder = build.buildqueue_record.builder
614- # XXX cprov 20060615 bug=120584: Currently buildduration includes
615- # the scanner latency, it should really be asking the slave for
616- # the duration spent building locally.
617- naked_build.date_finished = datetime.datetime.now(pytz.UTC)
618- if slave_status.get('dependencies') is not None:
619- build.dependencies = unicode(slave_status.get('dependencies'))
620- else:
621- build.dependencies = None
622+
623+ transaction.commit()
624+ with DatabaseTransactionPolicy(read_only=False):
625+ naked_build.log = lfa_id
626+ naked_build.builder = build.buildqueue_record.builder
627+ # XXX cprov 20060615 bug=120584: Currently buildduration
628+ # includes the scanner latency. It should really be asking
629+ # the slave for the duration spent building locally.
630+ naked_build.date_finished = datetime.datetime.now(pytz.UTC)
631+ build.dependencies = dependencies
632+ transaction.commit()
633
634 d = build.getLogFromSlave(build)
635 return d.addCallback(got_log)
636@@ -292,22 +297,41 @@
637
638 def handleStatus(self, status, librarian, slave_status):
639 """See `IPackageBuild`."""
640+ # Avoid circular imports.
641 from lp.buildmaster.manager import BUILDD_MANAGER_LOG_NAME
642+
643 logger = logging.getLogger(BUILDD_MANAGER_LOG_NAME)
644 send_notification = status in self.ALLOWED_STATUS_NOTIFICATIONS
645 method = getattr(self, '_handleStatus_' + status, None)
646 if method is None:
647- logger.critical("Unknown BuildStatus '%s' for builder '%s'"
648- % (status, self.buildqueue_record.builder.url))
649- return
650+ logger.critical(
651+ "Unknown BuildStatus '%s' for builder '%s'",
652+ status, self.buildqueue_record.builder.url)
653+ return None
654+
655 d = method(librarian, slave_status, logger, send_notification)
656 return d
657
658+ def _destroy_buildqueue_record(self, unused_arg):
659+ """Destroy this build's `BuildQueue` record."""
660+ transaction.commit()
661+ with DatabaseTransactionPolicy(read_only=False):
662+ self.buildqueue_record.destroySelf()
663+ transaction.commit()
664+
665 def _release_builder_and_remove_queue_item(self):
666 # Release the builder for another job.
667 d = self.buildqueue_record.builder.cleanSlave()
668 # Remove BuildQueue record.
669- return d.addCallback(lambda x: self.buildqueue_record.destroySelf())
670+ return d.addCallback(self._destroy_buildqueue_record)
671+
672+ def _notify_if_appropriate(self, appropriate=True, extra_info=None):
673+ """If `appropriate`, call `self.notify` in a write transaction."""
674+ if appropriate:
675+ transaction.commit()
676+ with DatabaseTransactionPolicy(read_only=False):
677+ self.notify(extra_info=extra_info)
678+ transaction.commit()
679
680 def _handleStatus_OK(self, librarian, slave_status, logger,
681 send_notification):
682@@ -323,16 +347,19 @@
683 self.buildqueue_record.specific_job.build.title,
684 self.buildqueue_record.builder.name))
685
686- # If this is a binary package build, discard it if its source is
687- # no longer published.
688+ # If this is a binary package build for a source that is no
689+ # longer published, discard it.
690 if self.build_farm_job_type == BuildFarmJobType.PACKAGEBUILD:
691 build = self.buildqueue_record.specific_job.build
692 if not build.current_source_publication:
693- build.status = BuildStatus.SUPERSEDED
694+ transaction.commit()
695+ with DatabaseTransactionPolicy(read_only=False):
696+ build.status = BuildStatus.SUPERSEDED
697+ transaction.commit()
698 return self._release_builder_and_remove_queue_item()
699
700- # Explode before collect a binary that is denied in this
701- # distroseries/pocket
702+ # Explode rather than collect a binary that is denied in this
703+ # distroseries/pocket.
704 if not self.archive.allowUpdatesToReleasePocket():
705 assert self.distro_series.canUploadToPocket(self.pocket), (
706 "%s (%s) can not be built for pocket %s: illegal status"
707@@ -377,18 +404,26 @@
708 # files from the slave.
709 if successful_copy_from_slave:
710 logger.info(
711- "Gathered %s %d completely. Moving %s to uploader queue."
712- % (self.__class__.__name__, self.id, upload_leaf))
713+ "Gathered %s %d completely. "
714+ "Moving %s to uploader queue.",
715+ self.__class__.__name__, self.id, upload_leaf)
716 target_dir = os.path.join(root, "incoming")
717- self.status = BuildStatus.UPLOADING
718+ resulting_status = BuildStatus.UPLOADING
719 else:
720 logger.warning(
721- "Copy from slave for build %s was unsuccessful.", self.id)
722- self.status = BuildStatus.FAILEDTOUPLOAD
723- if send_notification:
724- self.notify(
725- extra_info='Copy from slave was unsuccessful.')
726+ "Copy from slave for build %s was unsuccessful.",
727+ self.id)
728 target_dir = os.path.join(root, "failed")
729+ resulting_status = BuildStatus.FAILEDTOUPLOAD
730+
731+ transaction.commit()
732+ with DatabaseTransactionPolicy(read_only=False):
733+ self.status = resulting_status
734+ transaction.commit()
735+
736+ if not successful_copy_from_slave:
737+ self._notify_if_appropriate(
738+ send_notification, "Copy from slave was unsuccessful.")
739
740 if not os.path.exists(target_dir):
741 os.mkdir(target_dir)
742@@ -396,10 +431,6 @@
743 # Release the builder for another job.
744 d = self._release_builder_and_remove_queue_item()
745
746- # Commit so there are no race conditions with archiveuploader
747- # about self.status.
748- Store.of(self).commit()
749-
750 # Move the directory used to grab the binaries into
751 # the incoming directory so the upload processor never
752 # sees half-finished uploads.
753@@ -423,14 +454,15 @@
754 set the job status as FAILEDTOBUILD, store available info and
755 remove Buildqueue entry.
756 """
757- self.status = BuildStatus.FAILEDTOBUILD
758+ transaction.commit()
759+ with DatabaseTransactionPolicy(read_only=False):
760+ self.status = BuildStatus.FAILEDTOBUILD
761+ transaction.commit()
762
763 def build_info_stored(ignored):
764- if send_notification:
765- self.notify()
766+ self._notify_if_appropriate(send_notification)
767 d = self.buildqueue_record.builder.cleanSlave()
768- return d.addCallback(
769- lambda x: self.buildqueue_record.destroySelf())
770+ return d.addCallback(self._destroy_buildqueue_record)
771
772 d = self.storeBuildInfo(self, librarian, slave_status)
773 return d.addCallback(build_info_stored)
774@@ -448,11 +480,9 @@
775 def build_info_stored(ignored):
776 logger.critical("***** %s is MANUALDEPWAIT *****"
777 % self.buildqueue_record.builder.name)
778- if send_notification:
779- self.notify()
780+ self._notify_if_appropriate(send_notification)
781 d = self.buildqueue_record.builder.cleanSlave()
782- return d.addCallback(
783- lambda x: self.buildqueue_record.destroySelf())
784+ return d.addCallback(self._destroy_buildqueue_record)
785
786 d = self.storeBuildInfo(self, librarian, slave_status)
787 return d.addCallback(build_info_stored)
788@@ -468,17 +498,24 @@
789 self.status = BuildStatus.CHROOTWAIT
790
791 def build_info_stored(ignored):
792- logger.critical("***** %s is CHROOTWAIT *****" %
793- self.buildqueue_record.builder.name)
794- if send_notification:
795- self.notify()
796+ logger.critical(
797+ "***** %s is CHROOTWAIT *****",
798+ self.buildqueue_record.builder.name)
799+
800+ self._notify_if_appropriate(send_notification)
801 d = self.buildqueue_record.builder.cleanSlave()
802- return d.addCallback(
803- lambda x: self.buildqueue_record.destroySelf())
804+ return d.addCallback(self._destroy_buildqueue_record)
805
806 d = self.storeBuildInfo(self, librarian, slave_status)
807 return d.addCallback(build_info_stored)
808
809+ def _reset_buildqueue_record(self, ignored_arg=None):
810+ """Reset the `BuildQueue` record, in a write transaction."""
811+ transaction.commit()
812+ with DatabaseTransactionPolicy(read_only=False):
813+ self.buildqueue_record.reset()
814+ transaction.commit()
815+
816 def _handleStatus_BUILDERFAIL(self, librarian, slave_status, logger,
817 send_notification):
818 """Handle builder failures.
819@@ -492,11 +529,8 @@
820 self.buildqueue_record.builder.failBuilder(
821 "Builder returned BUILDERFAIL when asked for its status")
822
823- def build_info_stored(ignored):
824- # simply reset job
825- self.buildqueue_record.reset()
826 d = self.storeBuildInfo(self, librarian, slave_status)
827- return d.addCallback(build_info_stored)
828+ return d.addCallback(self._reset_buildqueue_record)
829
830 def _handleStatus_GIVENBACK(self, librarian, slave_status, logger,
831 send_notification):
832@@ -516,7 +550,7 @@
833 # the next Paris Summit, infinity has some ideas about how
834 # to use this content. For now we just ensure it's stored.
835 d = self.buildqueue_record.builder.cleanSlave()
836- self.buildqueue_record.reset()
837+ self._reset_buildqueue_record()
838 return d
839
840 d = self.storeBuildInfo(self, librarian, slave_status)
841
842=== modified file 'lib/lp/buildmaster/tests/test_builder.py'
843--- lib/lp/buildmaster/tests/test_builder.py 2011-11-20 23:37:23 +0000
844+++ lib/lp/buildmaster/tests/test_builder.py 2011-12-08 16:19:26 +0000
845@@ -155,7 +155,7 @@
846 d = lostbuilding_builder.updateStatus(BufferLogger())
847 def check_slave_status(failure):
848 self.assertIn('abort', slave.call_log)
849- # 'Fault' comes from the LostBuildingBrokenSlave, this is
850+ # 'Fault' comes from the LostBuildingBrokenSlave. This is
851 # just testing that the value is passed through.
852 self.assertIsInstance(failure.value, xmlrpclib.Fault)
853 return d.addBoth(check_slave_status)
854
855=== modified file 'lib/lp/buildmaster/tests/test_manager.py'
856--- lib/lp/buildmaster/tests/test_manager.py 2011-11-20 23:37:23 +0000
857+++ lib/lp/buildmaster/tests/test_manager.py 2011-12-08 16:19:26 +0000
858@@ -1,40 +1,36 @@
859-# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
860+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
861 # GNU Affero General Public License version 3 (see the file LICENSE).
862
863 """Tests for the renovated slave scanner aka BuilddManager."""
864
865+from collections import namedtuple
866 import os
867 import signal
868 import time
869 import xmlrpclib
870
871+from lpbuildd.tests import BuilddSlaveTestSetup
872 from testtools.deferredruntest import (
873 assert_fails_with,
874 AsynchronousDeferredRunTest,
875 )
876-
877 import transaction
878-
879 from twisted.internet import (
880 defer,
881 reactor,
882 task,
883 )
884-from twisted.internet.task import (
885- deferLater,
886- )
887+from twisted.internet.task import deferLater
888 from twisted.python.failure import Failure
889 from zope.component import getUtility
890 from zope.security.proxy import removeSecurityProxy
891
892-from lpbuildd.tests import BuilddSlaveTestSetup
893-
894 from canonical.config import config
895+from canonical.database.constants import UTC_NOW
896 from canonical.launchpad.ftests import (
897 ANONYMOUS,
898 login,
899 )
900-from lp.services.log.logger import BufferLogger
901 from canonical.testing.layers import (
902 LaunchpadScriptLayer,
903 LaunchpadZopelessLayer,
904@@ -50,14 +46,18 @@
905 SlaveScanner,
906 )
907 from lp.buildmaster.model.builder import Builder
908+from lp.buildmaster.model.packagebuild import PackageBuild
909 from lp.buildmaster.tests.harness import BuilddManagerTestSetup
910 from lp.buildmaster.tests.mock_slaves import (
911 BrokenSlave,
912 BuildingSlave,
913 make_publisher,
914 OkSlave,
915+ WaitingSlave,
916 )
917 from lp.registry.interfaces.distribution import IDistributionSet
918+from lp.services.database.transaction_policy import DatabaseTransactionPolicy
919+from lp.services.log.logger import BufferLogger
920 from lp.soyuz.interfaces.binarypackagebuild import IBinaryPackageBuildSet
921 from lp.testing import (
922 TestCase,
923@@ -65,10 +65,13 @@
924 )
925 from lp.testing.factory import LaunchpadObjectFactory
926 from lp.testing.fakemethod import FakeMethod
927-from lp.testing.sampledata import BOB_THE_BUILDER_NAME
928-
929-
930-class TestSlaveScannerScan(TestCase):
931+from lp.testing.sampledata import (
932+ BOB_THE_BUILDER_NAME,
933+ FROG_THE_BUILDER_NAME,
934+ )
935+
936+
937+class TestSlaveScannerScan(TestCaseWithFactory):
938 """Tests `SlaveScanner.scan` method.
939
940 This method uses the old framework for scanning and dispatching builds.
941@@ -83,6 +86,8 @@
942 'bob' builder.
943 """
944 super(TestSlaveScannerScan, self).setUp()
945+ self.read_only = DatabaseTransactionPolicy(read_only=True)
946+
947 # Creating the required chroots needed for dispatching.
948 test_publisher = make_publisher()
949 ubuntu = getUtility(IDistributionSet).getByName('ubuntu')
950@@ -90,6 +95,15 @@
951 test_publisher.setUpDefaultDistroSeries(hoary)
952 test_publisher.addFakeChroots()
953
954+ def _enterReadOnly(self):
955+ """Go into read-only transaction policy."""
956+ self.read_only.__enter__()
957+ self.addCleanup(self._exitReadOnly)
958+
959+ def _exitReadOnly(self):
960+ """Leave read-only transaction policy."""
961+ self.read_only.__exit__(None, None, None)
962+
963 def _resetBuilder(self, builder):
964 """Reset the given builder and its job."""
965
966@@ -100,6 +114,23 @@
967
968 transaction.commit()
969
970+ def getFreshBuilder(self, slave=None, name=BOB_THE_BUILDER_NAME,
971+ failure_count=0):
972+ """Return a builder.
973+
974+ The builder is taken from sample data, but reset to a usable state.
975+ Be careful: this is not a proper factory method. Identical calls
976+ return (and reset) the same builder. Don't rely on that though;
977+ maybe someday we'll have a proper factory here.
978+ """
979+ if slave is None:
980+ slave = OkSlave()
981+ builder = getUtility(IBuilderSet)[name]
982+ self._resetBuilder(builder)
983+ builder.setSlaveForTesting(slave)
984+ builder.failure_count = failure_count
985+ return builder
986+
987 def assertBuildingJob(self, job, builder, logtail=None):
988 """Assert the given job is building on the given builder."""
989 from lp.services.job.interfaces.job import JobStatus
990@@ -114,14 +145,14 @@
991 self.assertEqual(build.status, BuildStatus.BUILDING)
992 self.assertEqual(job.logtail, logtail)
993
994- def _getScanner(self, builder_name=None):
995+ def _getScanner(self, builder_name=None, clock=None):
996 """Instantiate a SlaveScanner object.
997
998 Replace its default logging handler by a testing version.
999 """
1000 if builder_name is None:
1001 builder_name = BOB_THE_BUILDER_NAME
1002- scanner = SlaveScanner(builder_name, BufferLogger())
1003+ scanner = SlaveScanner(builder_name, BufferLogger(), clock=clock)
1004 scanner.logger.name = 'slave-scanner'
1005
1006 return scanner
1007@@ -137,17 +168,15 @@
1008 def testScanDispatchForResetBuilder(self):
1009 # A job gets dispatched to the sampledata builder after it's reset.
1010
1011- # Reset sampledata builder.
1012- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
1013- self._resetBuilder(builder)
1014- builder.setSlaveForTesting(OkSlave())
1015- # Set this to 1 here so that _checkDispatch can make sure it's
1016- # reset to 0 after a successful dispatch.
1017- builder.failure_count = 1
1018+ # Obtain a builder. Initialize failure count to 1 so that
1019+ # _checkDispatch can make sure that a successful dispatch resets
1020+ # the count to 0.
1021+ builder = self.getFreshBuilder(failure_count=1)
1022
1023 # Run 'scan' and check its result.
1024 self.layer.txn.commit()
1025 self.layer.switchDbUser(config.builddmaster.dbuser)
1026+ self._enterReadOnly()
1027 scanner = self._getScanner()
1028 d = defer.maybeDeferred(scanner.scan)
1029 d.addCallback(self._checkDispatch, builder)
1030@@ -160,20 +189,18 @@
1031 to the asynchonous dispatcher and the builder remained active
1032 and IDLE.
1033 """
1034- self.assertTrue(slave is None, "Unexpected slave.")
1035+ self.assertIs(None, slave, "Unexpected slave.")
1036
1037 builder = getUtility(IBuilderSet).get(builder.id)
1038 self.assertTrue(builder.builderok)
1039- self.assertTrue(builder.currentjob is None)
1040+ self.assertIs(None, builder.currentjob)
1041
1042 def testNoDispatchForMissingChroots(self):
1043 # When a required chroot is not present the `scan` method
1044 # should not return any `RecordingSlaves` to be processed
1045 # and the builder used should remain active and IDLE.
1046
1047- # Reset sampledata builder.
1048- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
1049- self._resetBuilder(builder)
1050+ builder = self.getFreshBuilder()
1051
1052 # Remove hoary/i386 chroot.
1053 login('foo.bar@canonical.com')
1054@@ -186,6 +213,7 @@
1055
1056 # Run 'scan' and check its result.
1057 self.layer.switchDbUser(config.builddmaster.dbuser)
1058+ self._enterReadOnly()
1059 scanner = self._getScanner()
1060 d = defer.maybeDeferred(scanner.singleCycle)
1061 d.addCallback(self._checkNoDispatch, builder)
1062@@ -227,6 +255,7 @@
1063
1064 # Run 'scan' and check its result.
1065 self.layer.switchDbUser(config.builddmaster.dbuser)
1066+ self._enterReadOnly()
1067 scanner = self._getScanner()
1068 d = defer.maybeDeferred(scanner.scan)
1069 d.addCallback(self._checkJobRescued, builder, job)
1070@@ -262,25 +291,27 @@
1071
1072 # Run 'scan' and check its result.
1073 self.layer.switchDbUser(config.builddmaster.dbuser)
1074+ self._enterReadOnly()
1075 scanner = self._getScanner()
1076 d = defer.maybeDeferred(scanner.scan)
1077 d.addCallback(self._checkJobUpdated, builder, job)
1078 return d
1079
1080 def test_scan_with_nothing_to_dispatch(self):
1081- factory = LaunchpadObjectFactory()
1082- builder = factory.makeBuilder()
1083+ builder = self.factory.makeBuilder()
1084 builder.setSlaveForTesting(OkSlave())
1085+ transaction.commit()
1086+ self._enterReadOnly()
1087 scanner = self._getScanner(builder_name=builder.name)
1088 d = scanner.scan()
1089 return d.addCallback(self._checkNoDispatch, builder)
1090
1091 def test_scan_with_manual_builder(self):
1092 # Reset sampledata builder.
1093- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
1094- self._resetBuilder(builder)
1095- builder.setSlaveForTesting(OkSlave())
1096+ builder = self.getFreshBuilder()
1097 builder.manual = True
1098+ transaction.commit()
1099+ self._enterReadOnly()
1100 scanner = self._getScanner()
1101 d = scanner.scan()
1102 d.addCallback(self._checkNoDispatch, builder)
1103@@ -288,10 +319,10 @@
1104
1105 def test_scan_with_not_ok_builder(self):
1106 # Reset sampledata builder.
1107- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
1108- self._resetBuilder(builder)
1109- builder.setSlaveForTesting(OkSlave())
1110+ builder = self.getFreshBuilder()
1111 builder.builderok = False
1112+ transaction.commit()
1113+ self._enterReadOnly()
1114 scanner = self._getScanner()
1115 d = scanner.scan()
1116 # Because the builder is not ok, we can't use _checkNoDispatch.
1117@@ -300,25 +331,27 @@
1118 return d
1119
1120 def test_scan_of_broken_slave(self):
1121- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
1122- self._resetBuilder(builder)
1123- builder.setSlaveForTesting(BrokenSlave())
1124- builder.failure_count = 0
1125+ builder = self.getFreshBuilder(slave=BrokenSlave())
1126+ transaction.commit()
1127+ self._enterReadOnly()
1128 scanner = self._getScanner(builder_name=builder.name)
1129 d = scanner.scan()
1130 return assert_fails_with(d, xmlrpclib.Fault)
1131
1132 def _assertFailureCounting(self, builder_count, job_count,
1133 expected_builder_count, expected_job_count):
1134+ # Avoid circular imports.
1135+ from lp.buildmaster import manager as manager_module
1136+
1137 # If scan() fails with an exception, failure_counts should be
1138 # incremented. What we do with the results of the failure
1139 # counts is tested below separately, this test just makes sure that
1140 # scan() is setting the counts.
1141 def failing_scan():
1142 return defer.fail(Exception("fake exception"))
1143+
1144 scanner = self._getScanner()
1145 scanner.scan = failing_scan
1146- from lp.buildmaster import manager as manager_module
1147 self.patch(manager_module, 'assessFailureCounts', FakeMethod())
1148 builder = getUtility(IBuilderSet)[scanner.builder_name]
1149
1150@@ -466,6 +499,60 @@
1151 d.addCallback(check_cancelled, builder, buildqueue)
1152 return d
1153
1154+ def makeFakeFailure(self):
1155+ """Produce a fake failure for use with SlaveScanner._scanFailed."""
1156+ FakeFailure = namedtuple('FakeFailure', ['getErrorMessage', 'check'])
1157+ return FakeFailure(
1158+ FakeMethod(self.factory.getUniqueString()),
1159+ FakeMethod(True))
1160+
1161+ def test_interleaved_success_and_failure_do_not_interfere(self):
1162+ # It's possible for one builder to fail while another continues
1163+ # to function properly. When that happens, the failed builder
1164+ # may cause database changes to be rolled back. But that does
1165+ # not affect the functioning builder.
1166+ clock = task.Clock()
1167+
1168+ broken_builder = self.getFreshBuilder(
1169+ slave=BrokenSlave(), name=BOB_THE_BUILDER_NAME)
1170+ broken_scanner = self._getScanner(builder_name=broken_builder.name)
1171+ good_builder = self.getFreshBuilder(
1172+ slave=WaitingSlave(), name=FROG_THE_BUILDER_NAME)
1173+ good_build = self.factory.makeBinaryPackageBuild(
1174+ distroarchseries=self.factory.makeDistroArchSeries())
1175+
1176+ # The good build is being handled by the good builder.
1177+ buildqueue = good_build.queueBuild()
1178+ buildqueue.builder = good_builder
1179+
1180+ removeSecurityProxy(good_build.build_farm_job).date_started = UTC_NOW
1181+
1182+ # The good builder requests information from a successful build,
1183+ # and up receiving it, updates the build's metadata.
1184+ # Our dependencies string goes into the build, and its
1185+ # date_finished will be set.
1186+ dependencies = self.factory.getUniqueString()
1187+ PackageBuild.storeBuildInfo(
1188+ good_build, None, {'dependencies': dependencies})
1189+ clock.advance(1)
1190+
1191+ # The broken scanner experiences a failure before the good
1192+ # scanner is receiving its data. This aborts the ongoing
1193+ # transaction.
1194+ # As a somewhat weird example, if the builder changed its own
1195+ # title, that change will be rolled back.
1196+ original_broken_builder_title = broken_builder.title
1197+ broken_builder.title = self.factory.getUniqueString()
1198+ broken_scanner._scanFailed(self.makeFakeFailure())
1199+
1200+ # The work done by the good scanner is retained. The
1201+ # storeBuildInfo code committed it.
1202+ self.assertEqual(dependencies, good_build.dependencies)
1203+ self.assertIsNot(None, good_build.date_finished)
1204+
1205+ # The work done by the broken scanner is rolled back.
1206+ self.assertEqual(original_broken_builder_title, broken_builder.title)
1207+
1208
1209 class TestCancellationChecking(TestCaseWithFactory):
1210 """Unit tests for the checkCancellation method."""
1211
1212=== modified file 'lib/lp/buildmaster/tests/test_packagebuild.py'
1213--- lib/lp/buildmaster/tests/test_packagebuild.py 2011-09-10 08:59:08 +0000
1214+++ lib/lp/buildmaster/tests/test_packagebuild.py 2011-12-08 16:19:26 +0000
1215@@ -1,4 +1,4 @@
1216-# Copyright 2010 Canonical Ltd. This software is licensed under the
1217+# Copyright 2010-2011 Canonical Ltd. This software is licensed under the
1218 # GNU Affero General Public License version 3 (see the file LICENSE).
1219
1220 """Tests for `IPackageBuild`."""
1221@@ -22,9 +22,7 @@
1222 LaunchpadFunctionalLayer,
1223 LaunchpadZopelessLayer,
1224 )
1225-from lp.archiveuploader.uploadprocessor import (
1226- parse_build_upload_leaf_name,
1227- )
1228+from lp.archiveuploader.uploadprocessor import parse_build_upload_leaf_name
1229 from lp.buildmaster.enums import (
1230 BuildFarmJobType,
1231 BuildStatus,
1232@@ -34,12 +32,11 @@
1233 IPackageBuildSet,
1234 IPackageBuildSource,
1235 )
1236+from lp.buildmaster.model.builder import BuilderSlave
1237 from lp.buildmaster.model.buildfarmjob import BuildFarmJob
1238 from lp.buildmaster.model.packagebuild import PackageBuild
1239 from lp.buildmaster.tests.mock_slaves import WaitingSlave
1240-from lp.registry.interfaces.pocket import (
1241- PackagePublishingPocket,
1242- )
1243+from lp.registry.interfaces.pocket import PackagePublishingPocket
1244 from lp.testing import (
1245 login,
1246 login_person,
1247@@ -285,10 +282,7 @@
1248
1249
1250 class TestHandleStatusMixin:
1251- """Tests for `IPackageBuild`s handleStatus method.
1252-
1253- This should be run with a Trial TestCase.
1254- """
1255+ """Tests for `IPackageBuild`s handleStatus method."""
1256
1257 layer = LaunchpadZopelessLayer
1258
1259@@ -307,7 +301,7 @@
1260 self.build.buildqueue_record.setDateStarted(UTC_NOW)
1261 self.slave = WaitingSlave('BuildStatus.OK')
1262 self.slave.valid_file_hashes.append('test_file_hash')
1263- builder.setSlaveForTesting(self.slave)
1264+ self.patch(BuilderSlave, 'makeBuilderSlave', FakeMethod(self.slave))
1265
1266 # We overwrite the buildmaster root to use a temp directory.
1267 tempdir = tempfile.mkdtemp()
1268@@ -348,7 +342,7 @@
1269 def got_status(ignored):
1270 self.assertEqual(BuildStatus.FAILEDTOUPLOAD, self.build.status)
1271 self.assertResultCount(0, "failed")
1272- self.assertIdentical(None, self.build.buildqueue_record)
1273+ self.assertIs(None, self.build.buildqueue_record)
1274
1275 d = self.build.handleStatus('OK', None, {
1276 'filemap': {'/tmp/myfile.py': 'test_file_hash'},
1277@@ -390,14 +384,10 @@
1278
1279 def got_status(ignored):
1280 if expected_notification:
1281- self.failIf(
1282- len(pop_notifications()) == 0,
1283- "No notifications received")
1284+ self.assertNotEqual(
1285+ 0, len(pop_notifications()), "No notifications received.")
1286 else:
1287- self.failIf(
1288- len(pop_notifications()) > 0,
1289- "Notifications received")
1290-
1291+ self.assertContentEqual([], pop_notifications())
1292 d = self.build.handleStatus(status, None, {})
1293 return d.addCallback(got_status)
1294
1295
1296=== modified file 'lib/lp/code/model/tests/test_sourcepackagerecipebuild.py'
1297--- lib/lp/code/model/tests/test_sourcepackagerecipebuild.py 2011-11-09 11:36:44 +0000
1298+++ lib/lp/code/model/tests/test_sourcepackagerecipebuild.py 2011-12-08 16:19:26 +0000
1299@@ -13,8 +13,8 @@
1300
1301 from pytz import utc
1302 from storm.locals import Store
1303+from testtools.deferredruntest import AsynchronousDeferredRunTest
1304 import transaction
1305-from twisted.trial.unittest import TestCase as TrialTestCase
1306 from zope.component import getUtility
1307 from zope.security.proxy import removeSecurityProxy
1308
1309@@ -28,6 +28,7 @@
1310 from lp.app.errors import NotFoundError
1311 from lp.buildmaster.enums import BuildStatus
1312 from lp.buildmaster.interfaces.buildqueue import IBuildQueue
1313+from lp.buildmaster.model.builder import BuilderSlave
1314 from lp.buildmaster.model.buildfarmjob import BuildFarmJob
1315 from lp.buildmaster.model.packagebuild import PackageBuild
1316 from lp.buildmaster.tests.mock_slaves import WaitingSlave
1317@@ -588,14 +589,11 @@
1318 self.assertEquals(0, len(notifications))
1319
1320
1321-class TestBuildNotifications(TrialTestCase):
1322+class TestBuildNotifications(TestCaseWithFactory):
1323
1324 layer = LaunchpadZopelessLayer
1325
1326- def setUp(self):
1327- super(TestBuildNotifications, self).setUp()
1328- from lp.testing.factory import LaunchpadObjectFactory
1329- self.factory = LaunchpadObjectFactory()
1330+ run_tests_with = AsynchronousDeferredRunTest.make_factory(timeout=20)
1331
1332 def prepare_build(self, fake_successful_upload=False):
1333 queue_record = self.factory.makeSourcePackageRecipeBuildJob()
1334@@ -608,7 +606,7 @@
1335 result=True)
1336 queue_record.builder = self.factory.makeBuilder()
1337 slave = WaitingSlave('BuildStatus.OK')
1338- queue_record.builder.setSlaveForTesting(slave)
1339+ self.patch(BuilderSlave, 'makeBuilderSlave', FakeMethod(slave))
1340 return build
1341
1342 def assertDeferredNotifyCount(self, status, build, expected_count):
1343@@ -666,5 +664,5 @@
1344
1345
1346 class TestHandleStatusForSPRBuild(
1347- MakeSPRecipeBuildMixin, TestHandleStatusMixin, TrialTestCase):
1348+ MakeSPRecipeBuildMixin, TestHandleStatusMixin, TestCaseWithFactory):
1349 """IPackageBuild.handleStatus works with SPRecipe builds."""
1350
1351=== modified file 'lib/lp/services/database/transaction_policy.py'
1352--- lib/lp/services/database/transaction_policy.py 2011-09-29 05:49:18 +0000
1353+++ lib/lp/services/database/transaction_policy.py 2011-12-08 16:19:26 +0000
1354@@ -133,8 +133,11 @@
1355 def _isInTransaction(self):
1356 """Is our store currently in a transaction?"""
1357 pg_connection = self.store._connection._raw_connection
1358- status = pg_connection.get_transaction_status()
1359- return status != TRANSACTION_STATUS_IDLE
1360+ if pg_connection is None:
1361+ return False
1362+ else:
1363+ status = pg_connection.get_transaction_status()
1364+ return status != TRANSACTION_STATUS_IDLE
1365
1366 def _checkNoTransaction(self, error_msg):
1367 """Verify that no transaction is ongoing.
1368
1369=== modified file 'lib/lp/soyuz/tests/test_binarypackagebuild.py'
1370--- lib/lp/soyuz/tests/test_binarypackagebuild.py 2011-11-09 11:36:44 +0000
1371+++ lib/lp/soyuz/tests/test_binarypackagebuild.py 2011-12-08 16:19:26 +0000
1372@@ -1,4 +1,4 @@
1373-# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
1374+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
1375 # GNU Affero General Public License version 3 (see the file LICENSE).
1376
1377 """Test Build features."""
1378@@ -10,7 +10,7 @@
1379
1380 import pytz
1381 from storm.store import Store
1382-from twisted.trial.unittest import TestCase as TrialTestCase
1383+from testtools.deferredruntest import AsynchronousDeferredRunTest
1384 from zope.component import getUtility
1385 from zope.security.proxy import removeSecurityProxy
1386
1387@@ -25,6 +25,7 @@
1388 from lp.buildmaster.interfaces.builder import IBuilderSet
1389 from lp.buildmaster.interfaces.buildqueue import IBuildQueue
1390 from lp.buildmaster.interfaces.packagebuild import IPackageBuild
1391+from lp.buildmaster.model.builder import BuilderSlave
1392 from lp.buildmaster.model.buildqueue import BuildQueue
1393 from lp.buildmaster.tests.mock_slaves import WaitingSlave
1394 from lp.buildmaster.tests.test_packagebuild import (
1395@@ -53,6 +54,7 @@
1396 logout,
1397 TestCaseWithFactory,
1398 )
1399+from lp.testing.fakemethod import FakeMethod
1400
1401
1402 class TestBinaryPackageBuild(TestCaseWithFactory):
1403@@ -522,7 +524,9 @@
1404 self.build = gedit_src_hist.createMissingBuilds()[0]
1405
1406 self.builder = self.factory.makeBuilder()
1407- self.builder.setSlaveForTesting(WaitingSlave('BuildStatus.OK'))
1408+ self.patch(
1409+ BuilderSlave, 'makeBuilderSlave',
1410+ FakeMethod(WaitingSlave('BuildStatus.OK')))
1411 self.build.buildqueue_record.markAsBuilding(self.builder)
1412
1413 def testDependencies(self):
1414@@ -568,9 +572,12 @@
1415
1416
1417 class TestHandleStatusForBinaryPackageBuild(
1418- MakeBinaryPackageBuildMixin, TestHandleStatusMixin, TrialTestCase):
1419+ MakeBinaryPackageBuildMixin, TestHandleStatusMixin, TestCaseWithFactory):
1420 """IPackageBuild.handleStatus works with binary builds."""
1421
1422+ layer = LaunchpadZopelessLayer
1423+ run_tests_with = AsynchronousDeferredRunTest.make_factory(timeout=20)
1424+
1425
1426 class TestBinaryPackageBuildWebservice(TestCaseWithFactory):
1427 """Test cases for BinaryPackageBuild on the webservice.
1428
1429=== modified file 'lib/lp/translations/model/translationtemplatesbuildbehavior.py'
1430--- lib/lp/translations/model/translationtemplatesbuildbehavior.py 2011-10-19 08:00:29 +0000
1431+++ lib/lp/translations/model/translationtemplatesbuildbehavior.py 2011-12-08 16:19:26 +0000
1432@@ -1,4 +1,4 @@
1433-# Copyright 2010 Canonical Ltd. This software is licensed under the
1434+# Copyright 2010-2011 Canonical Ltd. This software is licensed under the
1435 # GNU Affero General Public License version 3 (see the file LICENSE).
1436
1437 """An `IBuildFarmJobBehavior` for `TranslationTemplatesBuildJob`.
1438@@ -13,9 +13,10 @@
1439
1440 import datetime
1441 import os
1442+import tempfile
1443+
1444 import pytz
1445-import tempfile
1446-
1447+import transaction
1448 from twisted.internet import defer
1449 from zope.component import getUtility
1450 from zope.interface import implements
1451@@ -28,6 +29,7 @@
1452 )
1453 from lp.buildmaster.model.buildfarmjobbehavior import BuildFarmJobBehaviorBase
1454 from lp.registry.interfaces.productseries import IProductSeriesSet
1455+from lp.services.database.transaction_policy import DatabaseTransactionPolicy
1456 from lp.translations.interfaces.translationimportqueue import (
1457 ITranslationImportQueue,
1458 )
1459@@ -132,13 +134,16 @@
1460 def storeBuildInfo(build, queue_item, build_status):
1461 """See `IPackageBuild`."""
1462 def got_log(lfa_id):
1463- build.build.log = lfa_id
1464- build.build.builder = queue_item.builder
1465- build.build.date_started = queue_item.date_started
1466- # XXX cprov 20060615 bug=120584: Currently buildduration includes
1467- # the scanner latency, it should really be asking the slave for
1468- # the duration spent building locally.
1469- build.build.date_finished = datetime.datetime.now(pytz.UTC)
1470+ transaction.commit()
1471+ with DatabaseTransactionPolicy(read_only=False):
1472+ build.build.log = lfa_id
1473+ build.build.builder = queue_item.builder
1474+ build.build.date_started = queue_item.date_started
1475+ # XXX cprov 20060615 bug=120584: Currently buildduration
1476+ # includes the scanner latency. It should really be
1477+ # asking the slave for the duration spent building locally.
1478+ build.build.date_finished = datetime.datetime.now(pytz.UTC)
1479+ transaction.commit()
1480
1481 d = build.getLogFromSlave(build, queue_item)
1482 return d.addCallback(got_log)