Merge lp:~michael.nelson/launchpad/499095-build-retry-depwait-stuck into lp:launchpad

Proposed by Michael Nelson
Status: Merged
Approved by: Brad Crittenden
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~michael.nelson/launchpad/499095-build-retry-depwait-stuck
Merge into: lp:launchpad
Diff against target: 87 lines (+15/-16)
2 files modified
lib/lp/buildmaster/buildergroup.py (+2/-8)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+13/-8)
To merge this branch: bzr merge lp:~michael.nelson/launchpad/499095-build-retry-depwait-stuck
Reviewer Review Type Date Requested Status
Brad Crittenden (community) code Approve
Review via email: mp+16865@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Michael Nelson (michael.nelson) wrote :

= Summary =
This is a fix for the 2nd cause of bug 499095, where we sometimes get builds on builders with inconsistent states (date_started is not set because the job was already in the RUNNING state).

For details of the cause, see the comments:
https://bugs.edge.launchpad.net/soyuz/+bug/499095/comments/4
https://bugs.edge.launchpad.net/soyuz/+bug/499095/comments/11

== Proposed fix ==

Ensure that the job status is correctly reset back to WAITING after buildStatus_GIVENBACK() and buildStatus_BUILDERFAIL().

== Pre-implementation notes ==

Worked with wgrant to diagnose and fix this second cause.

We really need to update the bm logs so we can see the relevant info. Much of the bm logging is done as warning/debug which doesn't seem to be logged currently.

== Implementation details ==

Nothing weird.

== Tests ==

bin/test -vvt buildd-slavescanner.txt

== Demo and Q/A ==

We can QA this on dogfood (wgrant already QA'd the core of the fix locally).

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/buildmaster/buildergroup.py
  lib/lp/soyuz/doc/buildd-slavescanner.txt

Revision history for this message
Brad Crittenden (bac) wrote :

Changes look reasonable Michael. Thanks for the fix.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/buildmaster/buildergroup.py'
2--- lib/lp/buildmaster/buildergroup.py 2009-12-03 15:07:17 +0000
3+++ lib/lp/buildmaster/buildergroup.py 2010-01-05 16:52:18 +0000
4@@ -521,10 +521,8 @@
5 "for its status"))
6 # simply reset job
7 build = getUtility(IBuildSet).getByQueueEntry(queueItem)
8- build.buildstate = BuildStatus.NEEDSBUILD
9 self.storeBuildInfo(queueItem, librarian, buildid, dependencies)
10- queueItem.builder = None
11- queueItem.setDateStarted(None)
12+ queueItem.reset()
13
14 def buildStatus_GIVENBACK(self, queueItem, librarian, buildid,
15 filemap=None, dependencies=None):
16@@ -537,16 +535,12 @@
17 self.logger.warning("***** %s is GIVENBACK by %s *****"
18 % (buildid, queueItem.builder.name))
19 build = getUtility(IBuildSet).getByQueueEntry(queueItem)
20- build.buildstate = BuildStatus.NEEDSBUILD
21 self.storeBuildInfo(queueItem, librarian, buildid, dependencies)
22 # XXX cprov 2006-05-30: Currently this information is not
23 # properly presented in the Web UI. We will discuss it in
24 # the next Paris Summit, infinity has some ideas about how
25 # to use this content. For now we just ensure it's stored.
26 queueItem.builder.cleanSlave()
27- queueItem.builder = None
28- queueItem.setDateStarted(None)
29- queueItem.logtail = None
30- queueItem.lastscore = 0
31+ queueItem.reset()
32
33
34
35=== modified file 'lib/lp/soyuz/doc/buildd-slavescanner.txt'
36--- lib/lp/soyuz/doc/buildd-slavescanner.txt 2009-12-14 20:36:19 +0000
37+++ lib/lp/soyuz/doc/buildd-slavescanner.txt 2010-01-05 16:52:18 +0000
38@@ -311,11 +311,10 @@
39 >>> build.buildstate.title
40 'Chroot problem'
41
42-WAITING - BUILDERFAIL -> builder has failed by internal error, job is
43-available for next build round:
44+WAITING - BUILDERFAIL -> builder has failed by internal error, job is available for next build round:
45
46 >>> bqItem6 = a_build.createBuildQueueEntry()
47- >>> setupBuildQueue(bqItem6, a_builder)
48+ >>> bqItem6.markAsBuilding(a_builder)
49 >>> last_stub_mail_count = len(stub.test_emails)
50
51 Create a mock slave so the builder can operate - one with a builder error.
52@@ -336,8 +335,11 @@
53 >>> check_mail_sent(last_stub_mail_count)
54 False
55 >>> build = getUtility(IBuildSet).getByQueueEntry(bqItem6)
56- >>> build.buildstate.title
57- 'Needs building'
58+ >>> print build.buildstate.title
59+ Needs building
60+ >>> job = bqItem6.specific_job.job
61+ >>> print job.status.title
62+ Waiting
63
64 Cleanup in preparation for the next test:
65
66@@ -556,7 +558,7 @@
67 WAITING -> GIVENBACK - slave requested build record to be rescheduled.
68
69 >>> bqItem11 = a_build.createBuildQueueEntry()
70- >>> setupBuildQueue(bqItem11, a_builder)
71+ >>> bqItem11.markAsBuilding(a_builder)
72 >>> last_stub_mail_count = len(stub.test_emails)
73
74 Create a mock slave so the builder gets the right responses for this test.
75@@ -580,8 +582,11 @@
76 >>> check_mail_sent(last_stub_mail_count)
77 False
78 >>> build = getUtility(IBuildSet).getByQueueEntry(bqItem11)
79- >>> build.buildstate.title
80- 'Needs building'
81+ >>> print build.buildstate.title
82+ Needs building
83+ >>> job = bqItem11.specific_job.job
84+ >>> print job.status.title
85+ Waiting
86
87 Cleanup in preparation for the next test:
88