Merge lp:~jelmer/launchpad/681974-building-commit into lp:launchpad

Proposed by Jelmer Vernooij on 2010-11-29
Status: Merged
Approved by: Jelmer Vernooij on 2010-11-29
Approved revision: no longer in the source branch.
Merged at revision: 12024
Proposed branch: lp:~jelmer/launchpad/681974-building-commit
Merge into: lp:launchpad
Diff against target: 39 lines (+5/-6)
2 files modified
lib/lp/buildmaster/model/builder.py (+0/-1)
lib/lp/buildmaster/model/packagebuild.py (+5/-5)
To merge this branch: bzr merge lp:~jelmer/launchpad/681974-building-commit
Reviewer | Review Type | Date Requested | Status
Jeroen T. Vermeulen (community) | code | 2010-11-29 | Approve on 2010-11-29
Review via email: mp+42081@code.launchpad.net

Commit Message

[r=jtv][ui=none][bug=681974] Commit rather than flush to prevent races between buildd manager and archive uploader.

Description of the Change

There is a race condition between the buildd manager and the upload processor, which we seem to have hit a couple of times. In some cases the buildd manager moves the upload directory into place before it commits the change to the job status, so the upload processor picks up the files while still seeing the old status and blows up.

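We did originally consider this, but the code only did a database flush rather than a commit. A flush sends the pending changes to the database but leaves the transaction open, so other processes cannot see the new status yet. This branch commits instead.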
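
As a rough illustration of why a flush is not enough, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for Launchpad's Storm store and its real database. The `build` table, the status strings and the function names are made up for the example; only the visibility behaviour is the point.

```python
import os
import sqlite3
import tempfile


def read_status(db_path):
    """Read the status over a fresh connection, as a separate process would."""
    reader = sqlite3.connect(db_path)
    try:
        # fetchall() exhausts the cursor so the read lock is released.
        rows = reader.execute(
            "SELECT status FROM build WHERE id = 1").fetchall()
        return rows[0][0]
    finally:
        reader.close()


def demo_flush_vs_commit():
    """Return what a second connection sees after a flush and after a commit."""
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, "demo.db")

        # Hypothetical schema, not Launchpad's.
        writer = sqlite3.connect(db_path)
        writer.execute(
            "CREATE TABLE build (id INTEGER PRIMARY KEY, status TEXT)")
        writer.execute("INSERT INTO build VALUES (1, 'BUILDING')")
        writer.commit()

        # Equivalent of a flush: the UPDATE has been sent to the database,
        # but the writer's transaction is still open.
        writer.execute("UPDATE build SET status = 'UPLOADING' WHERE id = 1")
        seen_after_flush = read_status(db_path)

        # Only the commit makes the new status visible to other connections.
        writer.commit()
        seen_after_commit = read_status(db_path)

        writer.close()
        return seen_after_flush, seen_after_commit
```

Calling `demo_flush_vs_commit()` should show the second connection still reading the old status after the flush, and the new status only after the commit.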

I haven't added any new tests since I couldn't think of a good way to test this. I'm interested in any ideas on how to test this.

Jeroen T. Vermeulen (jtv) wrote :

(18:00:31) jtv: good cover letter. I'm surprised it's the buildd manager confusing the upload processor by moving the file before committing, and not another way around.
(18:03:44) jelmer: This is happening with binary builds that the buildd manager has fetched from the builders.
(18:04:25) jelmer: There is another archive uploader instance that processes the source uploads from users, which the buildd manager then sends off to the builders.
(18:08:17) jtv: ah so the data flows in the opposite direction from what I thought.
(18:09:45) jtv: Depending on what the upload processor's transactions look like and even which isolation level it uses, there may _conceivably_ still be race conditions but even then this should narrow it down. And I'm not assuming that you got it wrong anyway. :)
(18:10:06) jtv: I'm also assuming that you're not committing in some untenable state, and therefore r=jtv.

review: Approve (code)

Preview Diff

=== modified file 'lib/lp/buildmaster/model/builder.py'
--- lib/lp/buildmaster/model/builder.py 2010-11-29 14:51:07 +0000
+++ lib/lp/buildmaster/model/builder.py 2010-12-06 04:48:14 +0000
@@ -19,7 +19,6 @@
 import socket
 import tempfile
 import transaction
-import urllib2
 import xmlrpclib

 from sqlobject import (

=== modified file 'lib/lp/buildmaster/model/packagebuild.py'
--- lib/lp/buildmaster/model/packagebuild.py 2010-11-22 20:53:55 +0000
+++ lib/lp/buildmaster/model/packagebuild.py 2010-12-06 04:48:14 +0000
@@ -360,18 +360,18 @@
 if not os.path.exists(target_dir):
     os.mkdir(target_dir)

-# Flush so there are no race conditions with archiveuploader about
+# Release the builder for another job.
+d = self.buildqueue_record.builder.cleanSlave()
+
+# Commit so there are no race conditions with archiveuploader about
 # self.status.
-Store.of(self).flush()
+Store.of(self).commit()

 # Move the directory used to grab the binaries into
 # the incoming directory so the upload processor never
 # sees half-finished uploads.
 os.rename(grab_dir, os.path.join(target_dir, upload_leaf))

-# Release the builder for another job.
-d = self.buildqueue_record.builder.cleanSlave()
-
 # Remove BuildQueue record.
 return d.addCallback(
     lambda x:self.buildqueue_record.destroySelf())
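
The ordering that the packagebuild.py hunk relies on (commit the status change first, then rename the directory into incoming) can be sketched in isolation as follows. This is not Launchpad code: `publish_upload`, its `commit` callback and the directory names are hypothetical, and the callback merely stands in for `Store.of(self).commit()`.

```python
import os
import tempfile


def publish_upload(grab_dir, incoming_dir, upload_leaf, commit):
    """Commit the status change, then move the upload into incoming."""
    # Make the new status visible to other processes first ...
    commit()
    # ... and only then make the upload visible, in a single rename.
    target = os.path.join(incoming_dir, upload_leaf)
    os.rename(grab_dir, target)
    return target


def demo_ordering():
    """Return (upload visible at commit time?, upload visible afterwards?)."""
    with tempfile.TemporaryDirectory() as root:
        grab_dir = os.path.join(root, "grab")
        incoming_dir = os.path.join(root, "incoming")
        os.mkdir(grab_dir)
        os.mkdir(incoming_dir)
        expected_target = os.path.join(incoming_dir, "upload-1")

        observed = []

        def fake_commit():
            # Record whether a consumer could already see the upload.
            observed.append(os.path.exists(expected_target))

        target = publish_upload(
            grab_dir, incoming_dir, "upload-1", fake_commit)
        return observed[0], os.path.isdir(target)
```

In this sketch the upload directory does not exist in incoming at the moment the commit runs, so a consumer that finds the directory can assume the status change has already been committed.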