Merge lp:~julian-edwards/launchpad/builderslave-resume into lp:launchpad
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 11801
Proposed branch: lp:~julian-edwards/launchpad/builderslave-resume
Merge into: lp:launchpad
Diff against target: 7192 lines (+2211/-3509), 24 files modified
  lib/lp/buildmaster/doc/builder.txt (+2/-118)
  lib/lp/buildmaster/interfaces/builder.py (+83/-62)
  lib/lp/buildmaster/manager.py (+205/-469)
  lib/lp/buildmaster/model/builder.py (+240/-224)
  lib/lp/buildmaster/model/buildfarmjobbehavior.py (+60/-52)
  lib/lp/buildmaster/model/packagebuild.py (+6/-0)
  lib/lp/buildmaster/tests/mock_slaves.py (+157/-32)
  lib/lp/buildmaster/tests/test_builder.py (+582/-154)
  lib/lp/buildmaster/tests/test_manager.py (+248/-782)
  lib/lp/buildmaster/tests/test_packagebuild.py (+12/-0)
  lib/lp/code/model/recipebuilder.py (+32/-28)
  lib/lp/soyuz/browser/tests/test_builder_views.py (+1/-1)
  lib/lp/soyuz/doc/buildd-dispatching.txt (+0/-371)
  lib/lp/soyuz/doc/buildd-slavescanner.txt (+0/-876)
  lib/lp/soyuz/model/binarypackagebuildbehavior.py (+59/-41)
  lib/lp/soyuz/tests/test_binarypackagebuildbehavior.py (+290/-8)
  lib/lp/soyuz/tests/test_doc.py (+0/-6)
  lib/lp/testing/factory.py (+8/-2)
  lib/lp/translations/doc/translationtemplatesbuildbehavior.txt (+0/-114)
  lib/lp/translations/model/translationtemplatesbuildbehavior.py (+20/-14)
  lib/lp/translations/stories/buildfarm/xx-build-summary.txt (+1/-1)
  lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+202/-153)
  lib/lp_sitecustomize.py (+3/-0)
  utilities/migrater/file-ownership.txt (+0/-1)
To merge this branch: bzr merge lp:~julian-edwards/launchpad/builderslave-resume
Related bugs:
Reviewer: Jonathan Lange (community), review status: Approve
Review via email: mp+36351@code.launchpad.net
Commit message
Description of the change
This is the integration branch for the "fully asynchronous build manager" changes.
Jonathan Lange (jml) wrote:
Jonathan Lange (jml):
Jonathan Lange (jml) wrote:
Branch has been pushed to since review. Definitely not approved.
Julian Edwards (julian-edwards) wrote:
This is now the integration branch. WIP.
Jonathan Lange (jml) wrote:
Some quick comments, almost all of which are shallow. This doesn't count as a proper code review. Importantly, I haven't rigorously checked that the deleted tests have been properly replaced, and I haven't checked a lot of the Deferred-related changes.
Will respond soon with a review of the new manager & test_manager.
> === modified file 'lib/lp/buildmaster/interfaces/builder.py'
> --- lib/lp/buildmaster/interfaces/builder.py
> +++ lib/lp/buildmaster/interfaces/builder.py
> @@ -154,11 +154,6 @@ class IBuilder(
>
Could you please go through this interface and:
* Make sure that all methods that return Deferreds are documented as doing
so.
* Possibly, group all of the methods that return Deferreds. I think it will
make the interface clearer.
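[For readers following along: the grouping convention being asked for here is what eventually lands in the preview diff below, with its "All methods below here return Deferred." marker. A minimal, hypothetical sketch of that style; the method names echo IBuilder but the interface itself is invented for illustration:]

    from zope.interface import Interface

    class IExampleBuilder(Interface):
        """Hypothetical, trimmed-down builder interface."""

        def failBuilder(reason):
            """Mark builder as failed for a given reason (synchronous)."""

        # All methods below here return Deferreds.

        def cleanSlave():
            """Clean any temporary files from the slave.

            :return: A Deferred that fires when the dialog with the slave
                is finished.  It does not have a return value.
            """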
> === modified file 'lib/lp/buildmaster/model/builder.py'
> --- lib/lp/buildmaster/model/builder.py
> +++ lib/lp/buildmaster/model/builder.py
...
> @@ -125,24 +112,17 @@ class BuilderSlave(
> # many false positives in your test run and will most likely break
> # production.
>
> # XXX: Have a documented interface for the XML-RPC server:
> # - what methods
> # - what return values expected
> # - what faults
> # (see XMLRPCBuildDSlave in lib/canonical/
>
I've filed bug https:/
I don't think having the XXX here helps.
> # XXX: Once we have a client object with a defined, tested interface, we
> # should make a test double that doesn't do any XML-RPC and can be used to
> # make testing easier & tests faster.
>
This XXX can be safely deleted, I think.
> def getFile(self, sha_sum):
> """Construct a file-like object to return the named file."""
> + # XXX: Change this to do non-blocking IO.
Please file a bug.
...
> + # Twisted API requires string but the configuration provides unicode.
> + resume_argv = [str(term) for term in resume_command.split()]
It's more explicit to do .encode('utf-8'), rather than str().
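[Why the distinction matters on Python 2, which this code targets: str() on a unicode object encodes with the default ASCII codec and raises UnicodeEncodeError on any non-ASCII character, while an explicit .encode('utf-8') names the codec and always succeeds. A minimal sketch with an invented command string:]

    # -*- coding: utf-8 -*-
    resume_command = u"ssh ppa@%(vm_host)s" % {'vm_host': u'pépé.example.com'}

    # str(term) would raise UnicodeEncodeError on the non-ASCII hostname;
    # encode('utf-8') makes the intended conversion explicit and safe.
    resume_argv = [term.encode('utf-8') for term in resume_command.split()]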
> def updateStatus(self, logger=None):
> """See `IBuilder`."""
> - updateBuilderStatus(self, logger)
> + # updateBuilderStatus returns a Deferred if the builder timed
> + # out, otherwise it returns a thing that we can wrap in a
> + # defer.succeed. maybeDeferred() handles this for us.
> + return defer.maybeDeferred(updateBuilderStatus, self, logger)
>
This comment seems bogus. As far as I can tell, updateBuilderStatus always
returns a Deferred.
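[For context: defer.maybeDeferred exists to normalise a callable that might return either a plain value or a Deferred, or raise synchronously. When the callee already always returns a Deferred, calling it directly is equivalent, which is why the wrapper gets dropped in the reply below. A small sketch with an invented function:]

    from twisted.internet import defer

    def probe(timed_out):
        # Sometimes returns a Deferred, sometimes a plain value.
        if timed_out:
            return defer.succeed('recovered from timeout')
        return 'ok'

    # maybeDeferred wraps both cases (and synchronous exceptions) into
    # a Deferred.  If probe() always returned a Deferred, a direct call
    # would do just as well.
    d = defer.maybeDeferred(probe, False)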
> - if builder_
> + d = self.resumeSlaveHost()
> + return d
> + else:
> + # XXX: This should really let the failure bubble up to the
> + # scan() method that does the failure counting.
> # Mark builder as 'failed'.
Are you going to fix this in this branch or in another? If so, when?
> logger.warn(
> - "Disabling builder: %s -- %s" % (self.url, error_message),
> - ...
Jonathan Lange (jml) wrote:
Hey Julian,
The new manager is far, far more readable than before. The hard work has paid off.
I mention a lot of small things in the comments below. I'd really appreciate it if you could address them all, since now is the best opportunity we'll have in a while to make this code comprehensible to others.
> # Copyright 2009 Canonical Ltd. This software is licensed under the
> # GNU Affero General Public License version 3 (see the file LICENSE).
>
> """Soyuz buildd slave manager logic."""
>
> __metaclass__ = type
>
> __all__ = [
> 'BuilddManager',
> 'BUILDD_MANAGER_LOG_NAME',
> 'buildd_success_result_map',
> ]
>
> import logging
>
> import transaction
> from twisted.application import service
> from twisted.internet import (
> defer,
> reactor,
> )
> from twisted.internet.task import LoopingCall
> from twisted.python import log
> from zope.component import getUtility
>
> from lp.buildmaster.enums import BuildStatus
> from lp.buildmaster.interfaces.buildfarmjobbehavior import (
> BuildBehaviorMismatch,
> )
> from lp.buildmaster.interfaces.builder import (
> BuildDaemonError,
> BuildSlaveFailure,
> CannotBuild,
> CannotResumeHost,
> )
>
>
> BUILDD_MANAGER_LOG_NAME = "slave-scanner"
>
>
> buildd_success_result_map = {
> 'ensurepresent': True,
> 'build': 'BuilderStatus.BUILDING',
> }
>
You can delete this now. Yay. (Don't forget the __all__ too).
> class SlaveScanner:
> """A manager for a single builder."""
>
> SCAN_INTERVAL = 5
>
Can you please add a comment explaining what this means, what unit it's in,
and hinting at why 5 is a good number for it?
> def startCycle(self):
> """Scan the builder and dispatch to it or deal with failures."""
> self.loop = LoopingCall(self._startCycle)
> self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
> return self.stopping_deferred
>
> def _startCycle(self):
> # Same as _startCycle but the next cycle is not scheduled. This
> # is so tests can initiate a single scan.
This comment is obsolete. Also, there's probably a better name than
"_startCycle", since this is pretty much doing the scan. Perhaps 'oneCycle'.
> def _scanFailed(self, failure):
> # Trap known exceptions and print a message without a
> # stack trace in that case, or if we don't know about it,
> # include the trace.
>
This comment is also obsolete. Although the traceback/no-traceback distinction is
still here, it's hardly the point of the method.
> # Paranoia.
> transaction.abort()
>
Can you please explain in the comment exactly what you are being paranoid
about?
> error_message = failure.getErrorMessage()
> if failure.check(
> BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch,
> CannotResumeHost, BuildDaemonError):
> self.logger.info("Scanning failed with: %s" % error_message)
> else:
> self.logger.info("Scanning failed with: %s\n%s" %
> (failure.getErrorMessage(), failure.getTraceback()))
>
> builder = get_builder(self.builder_name)
>
Shouldn...
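[The errback pattern under discussion, as it lands in the preview diff: abort any half-finished database work first, then log expected failure types tersely and unexpected ones with a traceback. A reduced sketch using a stand-in exception class; the real code checks several Launchpad-specific types:]

    import transaction
    from twisted.internet import defer

    class BuildSlaveFailure(Exception):
        """Stand-in for the real Launchpad exception."""

    def scan_failed(fail):
        # Discard pending database updates: a half-finished scan could
        # otherwise leave inconsistent state behind.
        transaction.abort()
        if fail.check(BuildSlaveFailure):
            # Expected failure: a terse one-line message will do.
            print('Scanning failed with: %s' % fail.getErrorMessage())
        else:
            # Unrecognised failure: keep the traceback for debugging.
            print('Scanning failed with: %s\n%s'
                  % (fail.getErrorMessage(), fail.getTraceback()))

    d = defer.fail(BuildSlaveFailure('slave went away'))
    d.addErrback(scan_failed)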
Julian Edwards (julian-edwards) wrote:
On Monday 18 October 2010 11:59:57 Jonathan Lange wrote:
> Some quick comments, almost all of which are shallow. This doesn't count as
> a proper code review. Importantly, I haven't rigorously checked that the
> deleted tests have been properly replaced, and I haven't checked a lot of
> the Deferred-related changes.
>
> Will respond soon with a review of the new manager & test_manager.
Cheers, I replied inline.
>
> > === modified file 'lib/lp/buildmaster/interfaces/builder.py'
> > --- lib/lp/buildmaster/interfaces/builder.py
> > +++ lib/lp/buildmaster/interfaces/builder.py
>
> > @@ -154,11 +154,6 @@ class IBuilder(
> Could you please go through this interface and:
>
> * Make sure that all methods that return Deferreds are documented as
> doing so.
>
> * Possibly, group all of the methods that return Deferreds. I think it
> will make the interface clearer.
I've done both of these as you suggest.
> > === modified file 'lib/lp/buildmaster/model/builder.py'
> > --- lib/lp/buildmaster/model/builder.py
> > +++ lib/lp/buildmaster/model/builder.py
>
> ...
>
> > @@ -125,24 +112,17 @@ class BuilderSlave(
> > # many false positives in your test run and will most likely break
> > # production.
> >
> >
> > # XXX: Have a documented interface for the XML-RPC server:
> > # - what methods
> > # - what return values expected
> > # - what faults
> > # (see XMLRPCBuildDSlave in lib/canonical/
>
> I've filed bug https:/
> I don't think having the XXX here helps.
Right, I've deleted it.
>
> > # XXX: Once we have a client object with a defined, tested
> > interface, we # should make a test double that doesn't do any
> > XML-RPC and can be used to # make testing easier & tests faster.
>
> This XXX can be safely deleted, I think.
Done.
>
> > def getFile(self, sha_sum):
> > """Construct a file-like object to return the named file."""
> >
> > + # XXX: Change this to do non-blocking IO.
>
> Please file a bug.
https:/
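[As an aside, the non-blocking getFile() that this bug asks for would presumably sit on Twisted's HTTP client rather than a blocking urllib call. A rough sketch; the librarian URL and local path are invented for illustration:]

    from twisted.internet import reactor
    from twisted.web.client import downloadPage

    def finished(ignored):
        # Success or failure, shut the reactor down for this sketch.
        reactor.stop()

    # URL and target path are made up; the real protocol addresses
    # files on the slave by sha1.
    d = downloadPage('http://librarian.example.com/sha1/deadbeef',
                     '/tmp/buildlog.gz')
    d.addBoth(finished)
    reactor.run()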
>
> ...
>
> > + # Twisted API requires string but the configuration provides
> > unicode. + resume_argv = [str(term) for term in
> > resume_command.split()]
>
> It's more explicit to do .encode('utf-8'), rather than str().
Not my code, but I've done that.
>
> > def updateStatus(self, logger=None):
> > """See `IBuilder`."""
> >
> > - updateBuilderStatus(self, logger)
> > + # updateBuilderStatus returns a Deferred if the builder timed
> > + # out, otherwise it returns a thing that we can wrap in a
> > + # defer.succeed. maybeDeferred() handles this for us.
> > + return defer.maybeDeferred(updateBuilderStatus, self, logger)
>
> This comment seems bogus. As far as I can tell, updateBuilderStatus always
> returns a Deferred.
It does, so I've removed the comment and the maybeDeferred().
>
> > - if builder_shoul...
Julian Edwards (julian-edwards) wrote:
On Monday 18 October 2010 13:21:45 you wrote:
> Review: Needs Fixing
> Hey Julian,
>
> The new manager is far, far more readable than before. The hard work has
> paid off.
\o/
> I mention a lot of small things in the comments below. I'd really
> appreciate it if you could address them all, since now is the best
> opportunity we'll have in a while to make this code comprehensible to
> others.
I shall endeavour to do so.
> > # Copyright 2009 Canonical Ltd. This software is licensed under the
> > # GNU Affero General Public License version 3 (see the file LICENSE).
> >
> > """Soyuz buildd slave manager logic."""
> >
> > __metaclass__ = type
> >
> > __all__ = [
> >
> > 'BuilddManager',
> > 'BUILDD_MANAGER_LOG_NAME',
> > 'buildd_success_result_map',
> > ]
> >
> > import logging
> >
> > import transaction
> > from twisted.application import service
> > from twisted.internet import (
> >
> > defer,
> > reactor,
> > )
> >
> > from twisted.internet.task import LoopingCall
> > from twisted.python import log
> > from zope.component import getUtility
> >
> > from lp.buildmaster.enums import BuildStatus
> > from lp.buildmaster.interfaces.buildfarmjobbehavior import (
> >
> > BuildBehaviorMismatch,
> > )
> >
> > from lp.buildmaster.interfaces.builder import (
> >
> > BuildDaemonError,
> > BuildSlaveFailure,
> > CannotBuild,
> > CannotResumeHost,
> > )
> >
> > BUILDD_MANAGER_LOG_NAME = "slave-scanner"
> >
> >
> > buildd_success_result_map = {
> >
> > 'ensurepresent': True,
> > 'build': 'BuilderStatus.BUILDING',
> > }
>
> You can delete this now. Yay. (Don't forget the __all__ too).
Right!
>
> > class SlaveScanner:
> > """A manager for a single builder."""
> >
> > SCAN_INTERVAL = 5
>
> Can you please add a comment explaining what this means, what unit it's in,
> and hinting at why 5 is a good number for it?
Done.
>
> > def startCycle(self):
> > """Scan the builder and dispatch to it or deal with failures."""
> > self.loop = LoopingCall(self._startCycle)
> > self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
> > return self.stopping_deferred
> >
> > def _startCycle(self):
> > # Same as _startCycle but the next cycle is not scheduled. This
> > # is so tests can initiate a single scan.
>
> This comment is obsolete. Also, there's probably a better name than
> "_startCycle", since this is pretty much doing the scan. Perhaps
> 'oneCycle'.
I named it singleCycle()
>
> > def _scanFailed(self, failure):
> > # Trap known exceptions and print a message without a
> > # stack trace in that case, or if we don't know about it,
> > # include the trace.
>
> This comment is also obsolete. Although the traceback/no-traceback distinction
> is still here, it's hardly the point of the method.
Spruced up as discussed in IRC.
>
> > # Paranoia.
> > transaction.abort()
>
> Can you please explain in the comment exactly what you are being paranoid
> about?
Yup, done.
>
> > error_message = failure.getErrorMessage()
> > if failure.check(
...
Jonathan Lange (jml) wrote:
On Mon, Oct 18, 2010 at 4:49 PM, Julian Edwards
<email address hidden> wrote:
> On Monday 18 October 2010 13:21:45 you wrote:
>> Review: Needs Fixing
>> Hey Julian,
>>
>> The new manager is far, far more readable than before. The hard work has
>> paid off.
>
> \o/
>
>> I mention a lot of small things in the comments below. I'd really
>> appreciate it if you could address them all, since now is the best
>> opportunity we'll have in a while to make this code comprehensible to
>> others.
>
> I shall endeavour to do so.
>
...
>>
>> > class SlaveScanner:
>> >     """A manager for a single builder."""
>> >
>> >     SCAN_INTERVAL = 5
>>
>> Can you please add a comment explaining what this means, what unit it's in,
>> and hinting at why 5 is a good number for it?
>
> Done.
>
...
>>
>> > Â Â Â Â """
>> > Â Â Â Â # We need to re-fetch the builder object on each cycle as the
>> > Â Â Â Â # Storm store is invalidated over transaction boundaries.
>>
>> This method is complicated enough that I think it would benefit from a
>> prose summary of what it does.
>>
>> For example:
>>
>>  If the builder is OK [XXX - I don't know what this actually means - jml],
>>  then update its status [XXX - this seems redundant, didn't we just check
>>  that it was OK? - jml]. If we think there's an active build on the
>>  builder, then check to see if it's done (builder.updateBuild). If the build
>>  is done, then check that it's available and not in manual mode, then
>>  dispatch the job to the build.
>>
>> Well, that's my best guess. I think it's only worth describing the happy
>> case, since the unusual cases will be easy enough to follow in the code.
>
> I've written something - let me know if you think it's easy enough to follow.
> It does require some knowledge of how the farm works and I don't think code
> comments are the best place for explaining that.
>
It's good. Much more helpful, thanks.
>> > class BuilddManager(service.Service):
>> >     """Main Buildd Manager service class."""
>> >
>> >     def __init__(self, clock=None):
>> >         self.builder_slaves = []
>> >         self.logger = self._setupLogger()
>>
>> Given that _setupLogger changes global state, it's better to put it in
>> startService.
>
> Nocando (which is a bit near Katmandu) - the NewBuildersScanner needs it
> there. (More below on this section of code)
>
>>
>> >         self.new_builders_scanner = NewBuildersScanner(
>> >
>> >             manager=self, clock=clock)
>> >
>> >     def _setupLogger(self):
>> >         """Setup a 'slave-scanner' logger that redirects to twisted.
>>
>> FWIW, "Setup" is a noun. "Set up" is a verb.
>
> Setup is not a word, it's a typo.
>
>> > class TestSlaveScannerScan(TrialTestCase):
>> >     """Tests `SlaveScanner.scan` method.
>> >
>> >     This method uses the old framework for scanning and dispatching
>> >     builds. """
>> >     layer = LaunchpadZopelessLayer
>> >
>> >     def setUp(self):
>> >         """Setup TwistedLayer, TrialTestCase and BuilddSlaveTest.
>> >
>> >         Also adjust the sampledata in a way a build can be dispatched to
>> >         'bob' builder.
>> >         """
>> >         from lp.soyuz.
Jonathan Lange (jml) wrote:
Add the XXXs as recommended in the last review comment and please land!
Jonathan Lange (jml) wrote:
Oh, forgot about the two WIP things mentioned in your reply.
Jonathan Lange (jml) wrote:
The new code looks good. You should add an explanation about why you don't disable builders as soon as they've got a failure, i.e. why the threshold exists at all.
Also, in my previous review just prior to commit r11699, I suggested adding several XXXs. Could you please do that.
Do both of these things, and then land.
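[The rationale being asked for is the one visible in the preview diff: builders get more slack than jobs because builders often recover from transient trouble such as flaky networks or human error, while a repeatedly failing job rarely does. The decision logic, reduced to a sketch; the threshold value here is illustrative, the real constant lives on Builder.FAILURE_THRESHOLD:]

    FAILURE_THRESHOLD = 5  # illustrative; the real constant is on Builder

    def assess_failure_counts(builder_failures, job_failures):
        """Sketch of the assessFailureCounts() logic in the preview diff."""
        if builder_failures == job_failures:
            # Can't tell which side is at fault; retry the job elsewhere.
            return ['reset job']
        if builder_failures > job_failures:
            # Builders get the benefit of the doubt until the threshold,
            # because they often recover from transient trouble.
            actions = ['reset job']
            if builder_failures >= FAILURE_THRESHOLD:
                actions.append('disable builder')
            return actions
        # Otherwise the job itself is the culprit; fail it for good.
        return ['fail job']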
Julian Edwards (julian-edwards) wrote:
On Tuesday 19 October 2010 16:39:43 you wrote:
> Review: Approve
> The new code looks good. You should add an explanation about why you don't
> disable builders as soon as they've got a failure, i.e. why the threshold
> exists at all.
Roger.
> Also, in my previous review just prior to commit r11699, I suggested adding
> several XXXs. Could you please do that.
Grar, I forgot to finish that, thanks for reminding me.
> Do both of these things, and then land.
I'm not going to land it right away. I want to soak test it on dogfood first,
borrowing some builders from production. Once I'm happy with it on there,
it's time to let loose the hounds.
Thanks for all your help on this branch! You'll notice the large list of bugs
I linked to it just now. We've fixed all of those in this branch, and
probably more.
Cheers.
Preview Diff
1 | === modified file 'lib/lp/buildmaster/doc/builder.txt' |
2 | --- lib/lp/buildmaster/doc/builder.txt 2010-09-23 12:35:21 +0000 |
3 | +++ lib/lp/buildmaster/doc/builder.txt 2010-10-25 19:14:01 +0000 |
4 | @@ -19,9 +19,6 @@ |
5 | As expected, it implements IBuilder. |
6 | |
7 | >>> from canonical.launchpad.webapp.testing import verifyObject |
8 | - >>> from lp.buildmaster.interfaces.builder import IBuilder |
9 | - >>> verifyObject(IBuilder, builder) |
10 | - True |
11 | |
12 | >>> print builder.name |
13 | bob |
14 | @@ -86,7 +83,7 @@ |
15 | The 'new' method will create a new builder in the database. |
16 | |
17 | >>> bnew = builderset.new(1, 'http://dummy.com:8221/', 'dummy', |
18 | - ... 'Dummy Title', 'eh ?', 1) |
19 | + ... 'Dummy Title', 'eh ?', 1) |
20 | >>> bnew.name |
21 | u'dummy' |
22 | |
23 | @@ -170,7 +167,7 @@ |
24 | >>> recipe_bq.processor = i386_family.processors[0] |
25 | >>> recipe_bq.virtualized = True |
26 | >>> transaction.commit() |
27 | - |
28 | + |
29 | >>> queue_sizes = builderset.getBuildQueueSizes() |
30 | >>> print queue_sizes['virt']['386'] |
31 | (1L, datetime.timedelta(0, 64)) |
32 | @@ -188,116 +185,3 @@ |
33 | |
34 | >>> print queue_sizes['virt']['386'] |
35 | (2L, datetime.timedelta(0, 128)) |
36 | - |
37 | - |
38 | -Resuming buildd slaves |
39 | -====================== |
40 | - |
41 | -Virtual slaves are resumed using a command specified in the |
42 | -configuration profile. Production configuration uses a SSH trigger |
43 | -account accessed via a private key available in the builddmaster |
44 | -machine (which used ftpmaster configuration profile) as in: |
45 | - |
46 | -{{{ |
47 | -ssh ~/.ssh/ppa-reset-key ppa@%(vm_host)s |
48 | -}}} |
49 | - |
50 | -The test configuration uses a fake command that can be performed in |
51 | -development machine and allow us to tests the important features used |
52 | -in production, as 'vm_host' variable replacement. |
53 | - |
54 | - >>> from canonical.config import config |
55 | - >>> config.builddmaster.vm_resume_command |
56 | - 'echo %(vm_host)s' |
57 | - |
58 | -Before performing the command, it checks if the builder is indeed |
59 | -virtual and raises CannotResumeHost if it isn't. |
60 | - |
61 | - >>> bob = getUtility(IBuilderSet)['bob'] |
62 | - >>> bob.resumeSlaveHost() |
63 | - Traceback (most recent call last): |
64 | - ... |
65 | - CannotResumeHost: Builder is not virtualized. |
66 | - |
67 | -For testing purposes resumeSlaveHost returns the stdout and stderr |
68 | -buffer resulted from the command. |
69 | - |
70 | - >>> frog = getUtility(IBuilderSet)['frog'] |
71 | - >>> out, err = frog.resumeSlaveHost() |
72 | - >>> print out.strip() |
73 | - localhost-host.ppa |
74 | - |
75 | -If the specified command fails, resumeSlaveHost also raises |
76 | -CannotResumeHost exception with the results stdout and stderr. |
77 | - |
78 | - # The command must have a vm_host dict key and when executed, |
79 | - # have a returncode that is not 0. |
80 | - >>> vm_resume_command = """ |
81 | - ... [builddmaster] |
82 | - ... vm_resume_command: test "%(vm_host)s = 'false'" |
83 | - ... """ |
84 | - >>> config.push('vm_resume_command', vm_resume_command) |
85 | - >>> frog.resumeSlaveHost() |
86 | - Traceback (most recent call last): |
87 | - ... |
88 | - CannotResumeHost: Resuming failed: |
89 | - OUT: |
90 | - <BLANKLINE> |
91 | - ERR: |
92 | - <BLANKLINE> |
93 | - |
94 | -Restore default value for resume command. |
95 | - |
96 | - >>> config_data = config.pop('vm_resume_command') |
97 | - |
98 | - |
99 | -Rescuing lost slaves |
100 | -==================== |
101 | - |
102 | -Builder.rescueIfLost() checks the build ID reported in the slave status |
103 | -against the database. If it isn't building what we think it should be, |
104 | -the current build will be aborted and the slave cleaned in preparation |
105 | -for a new task. The decision about the slave's correctness is left up |
106 | -to IBuildFarmJobBehavior.verifySlaveBuildCookie -- for these examples we |
107 | -will use a special behavior that just checks if the cookie reads 'good'. |
108 | - |
109 | - >>> import logging |
110 | - >>> from lp.buildmaster.interfaces.builder import CorruptBuildCookie |
111 | - >>> from lp.buildmaster.tests.mock_slaves import ( |
112 | - ... BuildingSlave, MockBuilder, OkSlave, WaitingSlave) |
113 | - |
114 | - >>> class TestBuildBehavior: |
115 | - ... def verifySlaveBuildCookie(self, cookie): |
116 | - ... if cookie != 'good': |
117 | - ... raise CorruptBuildCookie('Bad value') |
118 | - |
119 | - >>> def rescue_slave_if_lost(slave): |
120 | - ... builder = MockBuilder('mock', slave, TestBuildBehavior()) |
121 | - ... builder.rescueIfLost(logging.getLogger()) |
122 | - |
123 | -An idle slave is not rescued. |
124 | - |
125 | - >>> rescue_slave_if_lost(OkSlave()) |
126 | - |
127 | -Slaves building or having built the correct build are not rescued |
128 | -either. |
129 | - |
130 | - >>> rescue_slave_if_lost(BuildingSlave(build_id='good')) |
131 | - >>> rescue_slave_if_lost(WaitingSlave(build_id='good')) |
132 | - |
133 | -But if a slave is building the wrong ID, it is declared lost and |
134 | -an abort is attempted. MockSlave prints out a message when it is aborted |
135 | -or cleaned. |
136 | - |
137 | - >>> rescue_slave_if_lost(BuildingSlave(build_id='bad')) |
138 | - Aborting slave |
139 | - INFO:root:Builder 'mock' rescued from 'bad': 'Bad value' |
140 | - |
141 | -Slaves having completed an incorrect build are also declared lost, |
142 | -but there's no need to abort a completed build. Such builders are |
143 | -instead simply cleaned, ready for the next build. |
144 | - |
145 | - >>> rescue_slave_if_lost(WaitingSlave(build_id='bad')) |
146 | - Cleaning slave |
147 | - INFO:root:Builder 'mock' rescued from 'bad': 'Bad value' |
148 | - |
149 | |
150 | === modified file 'lib/lp/buildmaster/interfaces/builder.py' |
151 | --- lib/lp/buildmaster/interfaces/builder.py 2010-09-23 18:17:21 +0000 |
152 | +++ lib/lp/buildmaster/interfaces/builder.py 2010-10-25 19:14:01 +0000 |
153 | @@ -154,11 +154,6 @@ |
154 | |
155 | currentjob = Attribute("BuildQueue instance for job being processed.") |
156 | |
157 | - is_available = Bool( |
158 | - title=_("Whether or not a builder is available for building " |
159 | - "new jobs. "), |
160 | - required=False) |
161 | - |
162 | failure_count = Int( |
163 | title=_('Failure Count'), required=False, default=0, |
164 | description=_("Number of consecutive failures for this builder.")) |
165 | @@ -173,32 +168,74 @@ |
166 | def resetFailureCount(): |
167 | """Set the failure_count back to zero.""" |
168 | |
169 | - def checkSlaveAlive(): |
170 | - """Check that the buildd slave is alive. |
171 | - |
172 | - This pings the slave over the network via the echo method and looks |
173 | - for the sent message as the reply. |
174 | - |
175 | - :raises BuildDaemonError: When the slave is down. |
176 | + def failBuilder(reason): |
177 | + """Mark builder as failed for a given reason.""" |
178 | + |
179 | + def setSlaveForTesting(proxy): |
180 | + """Sets the RPC proxy through which to operate the build slave.""" |
181 | + |
182 | + def verifySlaveBuildCookie(slave_build_id): |
183 | + """Verify that a slave's build cookie is consistent. |
184 | + |
185 | + This should delegate to the current `IBuildFarmJobBehavior`. |
186 | + """ |
187 | + |
188 | + def transferSlaveFileToLibrarian(file_sha1, filename, private): |
189 | + """Transfer a file from the slave to the librarian. |
190 | + |
191 | + :param file_sha1: The file's sha1, which is how the file is addressed |
192 | + in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog' |
193 | + will cause the build log to be retrieved and gzipped. |
194 | + :param filename: The name of the file to be given to the librarian file |
195 | + alias. |
196 | + :param private: True if the build is for a private archive. |
197 | + :return: A librarian file alias. |
198 | + """ |
199 | + |
200 | + def getBuildQueue(): |
201 | + """Return a `BuildQueue` if there's an active job on this builder. |
202 | + |
203 | + :return: A BuildQueue, or None. |
204 | + """ |
205 | + |
206 | + def getCurrentBuildFarmJob(): |
207 | + """Return a `BuildFarmJob` for this builder.""" |
208 | + |
209 | + # All methods below here return Deferred. |
210 | + |
211 | + def isAvailable(): |
212 | + """Whether or not a builder is available for building new jobs. |
213 | + |
214 | + :return: A Deferred that fires with True or False, depending on |
215 | + whether the builder is available or not. |
216 | """ |
217 | |
218 | def rescueIfLost(logger=None): |
219 | """Reset the slave if its job information doesn't match the DB. |
220 | |
221 | - If the builder is BUILDING or WAITING but has a build ID string |
222 | - that doesn't match what is stored in the DB, we have to dismiss |
223 | - its current actions and clean the slave for another job, assuming |
224 | - the XMLRPC is working properly at this point. |
225 | + This checks the build ID reported in the slave status against the |
226 | + database. If it isn't building what we think it should be, the current |
227 | + build will be aborted and the slave cleaned in preparation for a new |
228 | + task. The decision about the slave's correctness is left up to |
229 | + `IBuildFarmJobBehavior.verifySlaveBuildCookie`. |
230 | + |
231 | + :return: A Deferred that fires when the dialog with the slave is |
232 | + finished. It does not have a return value. |
233 | """ |
234 | |
235 | def updateStatus(logger=None): |
236 | - """Update the builder's status by probing it.""" |
237 | + """Update the builder's status by probing it. |
238 | + |
239 | + :return: A Deferred that fires when the dialog with the slave is |
240 | + finished. It does not have a return value. |
241 | + """ |
242 | |
243 | def cleanSlave(): |
244 | - """Clean any temporary files from the slave.""" |
245 | - |
246 | - def failBuilder(reason): |
247 | - """Mark builder as failed for a given reason.""" |
248 | + """Clean any temporary files from the slave. |
249 | + |
250 | + :return: A Deferred that fires when the dialog with the slave is |
251 | + finished. It does not have a return value. |
252 | + """ |
253 | |
254 | def requestAbort(): |
255 | """Ask that a build be aborted. |
256 | @@ -206,6 +243,9 @@ |
257 | This takes place asynchronously: Actually killing everything running |
258 | can take some time so the slave status should be queried again to |
259 | detect when the abort has taken effect. (Look for status ABORTED). |
260 | + |
261 | + :return: A Deferred that fires when the dialog with the slave is |
262 | + finished. It does not have a return value. |
263 | """ |
264 | |
265 | def resumeSlaveHost(): |
266 | @@ -217,37 +257,35 @@ |
267 | :raises: CannotResumeHost: if builder is not virtual or if the |
268 | configuration command has failed. |
269 | |
270 | - :return: command stdout and stderr buffers as a tuple. |
271 | + :return: A Deferred that fires when the resume operation finishes, |
272 | + whose value is a (stdout, stderr) tuple for success, or a Failure |
273 | + whose value is a CannotResumeHost exception. |
274 | """ |
275 | |
276 | - def setSlaveForTesting(proxy): |
277 | - """Sets the RPC proxy through which to operate the build slave.""" |
278 | - |
279 | def slaveStatus(): |
280 | """Get the slave status for this builder. |
281 | |
282 | - :return: a dict containing at least builder_status, but potentially |
283 | - other values included by the current build behavior. |
284 | + :return: A Deferred which fires when the slave dialog is complete. |
285 | + Its value is a dict containing at least builder_status, but |
286 | + potentially other values included by the current build |
287 | + behavior. |
288 | """ |
289 | |
290 | def slaveStatusSentence(): |
291 | """Get the slave status sentence for this builder. |
292 | |
293 | - :return: A tuple with the first element containing the slave status, |
294 | - build_id-queue-id and then optionally more elements depending on |
295 | - the status. |
296 | - """ |
297 | - |
298 | - def verifySlaveBuildCookie(slave_build_id): |
299 | - """Verify that a slave's build cookie is consistent. |
300 | - |
301 | - This should delegate to the current `IBuildFarmJobBehavior`. |
302 | + :return: A Deferred which fires when the slave dialog is complete. |
303 | + Its value is a tuple with the first element containing the |
304 | + slave status, build_id-queue-id and then optionally more |
305 | + elements depending on the status. |
306 | """ |
307 | |
308 | def updateBuild(queueItem): |
309 | """Verify the current build job status. |
310 | |
311 | Perform the required actions for each state. |
312 | + |
313 | + :return: A Deferred that fires when the slave dialog is finished. |
314 | """ |
315 | |
316 | def startBuild(build_queue_item, logger): |
317 | @@ -255,21 +293,10 @@ |
318 | |
319 | :param build_queue_item: A BuildQueueItem to build. |
320 | :param logger: A logger to be used to log diagnostic information. |
321 | - :raises BuildSlaveFailure: When the build slave fails. |
322 | - :raises CannotBuild: When a build cannot be started for some reason |
323 | - other than the build slave failing. |
324 | - """ |
325 | - |
326 | - def transferSlaveFileToLibrarian(file_sha1, filename, private): |
327 | - """Transfer a file from the slave to the librarian. |
328 | - |
329 | - :param file_sha1: The file's sha1, which is how the file is addressed |
330 | - in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog' |
331 | - will cause the build log to be retrieved and gzipped. |
332 | - :param filename: The name of the file to be given to the librarian file |
333 | - alias. |
334 | - :param private: True if the build is for a private archive. |
335 | - :return: A librarian file alias. |
336 | + |
337 | + :return: A Deferred that fires after the dispatch has completed whose |
338 | + value is None, or a Failure that contains an exception |
339 | + explaining what went wrong. |
340 | """ |
341 | |
342 | def handleTimeout(logger, error_message): |
343 | @@ -284,6 +311,8 @@ |
344 | |
345 | :param logger: The logger object to be used for logging. |
346 | :param error_message: The error message to be used for logging. |
347 | + :return: A Deferred that fires after the virtual slave was resumed |
348 | + or immediately if it's a non-virtual slave. |
349 | """ |
350 | |
351 | def findAndStartJob(buildd_slave=None): |
352 | @@ -291,17 +320,9 @@ |
353 | |
354 | :param buildd_slave: An optional buildd slave that this builder should |
355 | talk to. |
356 | - :return: the `IBuildQueue` instance found or None if no job was found. |
357 | - """ |
358 | - |
359 | - def getBuildQueue(): |
360 | - """Return a `BuildQueue` if there's an active job on this builder. |
361 | - |
362 | - :return: A BuildQueue, or None. |
363 | - """ |
364 | - |
365 | - def getCurrentBuildFarmJob(): |
366 | - """Return a `BuildFarmJob` for this builder.""" |
367 | + :return: A Deferred whose value is the `IBuildQueue` instance |
368 | + found or None if no job was found. |
369 | + """ |
370 | |
371 | |
372 | class IBuilderSet(Interface): |
373 | |
374 | === modified file 'lib/lp/buildmaster/manager.py' |
375 | --- lib/lp/buildmaster/manager.py 2010-09-24 15:40:49 +0000 |
376 | +++ lib/lp/buildmaster/manager.py 2010-10-25 19:14:01 +0000 |
377 | @@ -10,13 +10,10 @@ |
378 | 'BuilddManager', |
379 | 'BUILDD_MANAGER_LOG_NAME', |
380 | 'FailDispatchResult', |
381 | - 'RecordingSlave', |
382 | 'ResetDispatchResult', |
383 | - 'buildd_success_result_map', |
384 | ] |
385 | |
386 | import logging |
387 | -import os |
388 | |
389 | import transaction |
390 | from twisted.application import service |
391 | @@ -24,129 +21,27 @@ |
392 | defer, |
393 | reactor, |
394 | ) |
395 | -from twisted.protocols.policies import TimeoutMixin |
396 | +from twisted.internet.task import LoopingCall |
397 | from twisted.python import log |
398 | -from twisted.python.failure import Failure |
399 | -from twisted.web import xmlrpc |
400 | from zope.component import getUtility |
401 | |
402 | -from canonical.config import config |
403 | -from canonical.launchpad.webapp import urlappend |
404 | -from lp.services.database import write_transaction |
405 | from lp.buildmaster.enums import BuildStatus |
406 | -from lp.services.twistedsupport.processmonitor import ProcessWithTimeout |
407 | +from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
408 | + BuildBehaviorMismatch, |
409 | + ) |
410 | +from lp.buildmaster.model.builder import Builder |
411 | +from lp.buildmaster.interfaces.builder import ( |
412 | + BuildDaemonError, |
413 | + BuildSlaveFailure, |
414 | + CannotBuild, |
415 | + CannotFetchFile, |
416 | + CannotResumeHost, |
417 | + ) |
418 | |
419 | |
420 | BUILDD_MANAGER_LOG_NAME = "slave-scanner" |
421 | |
422 | |
423 | -buildd_success_result_map = { |
424 | - 'ensurepresent': True, |
425 | - 'build': 'BuilderStatus.BUILDING', |
426 | - } |
427 | - |
428 | - |
429 | -class QueryWithTimeoutProtocol(xmlrpc.QueryProtocol, TimeoutMixin): |
430 | - """XMLRPC query protocol with a configurable timeout. |
431 | - |
432 | - XMLRPC queries using this protocol will be unconditionally closed |
433 | - when the timeout is elapsed. The timeout is fetched from the context |
434 | - Launchpad configuration file (`config.builddmaster.socket_timeout`). |
435 | - """ |
436 | - def connectionMade(self): |
437 | - xmlrpc.QueryProtocol.connectionMade(self) |
438 | - self.setTimeout(config.builddmaster.socket_timeout) |
439 | - |
440 | - |
441 | -class QueryFactoryWithTimeout(xmlrpc._QueryFactory): |
442 | - """XMLRPC client factory with timeout support.""" |
443 | - # Make this factory quiet. |
444 | - noisy = False |
445 | - # Use the protocol with timeout support. |
446 | - protocol = QueryWithTimeoutProtocol |
447 | - |
448 | - |
449 | -class RecordingSlave: |
450 | - """An RPC proxy for buildd slaves that records instructions to the latter. |
451 | - |
452 | - The idea here is to merely record the instructions that the slave-scanner |
453 | - issues to the buildd slaves and "replay" them a bit later in asynchronous |
454 | - and parallel fashion. |
455 | - |
456 | - By dealing with a number of buildd slaves in parallel we remove *the* |
457 | - major slave-scanner throughput issue while avoiding large-scale changes to |
458 | - its code base. |
459 | - """ |
460 | - |
461 | - def __init__(self, name, url, vm_host): |
462 | - self.name = name |
463 | - self.url = url |
464 | - self.vm_host = vm_host |
465 | - |
466 | - self.resume_requested = False |
467 | - self.calls = [] |
468 | - |
469 | - def __repr__(self): |
470 | - return '<%s:%s>' % (self.name, self.url) |
471 | - |
472 | - def cacheFile(self, logger, libraryfilealias): |
473 | - """Cache the file on the server.""" |
474 | - self.ensurepresent( |
475 | - libraryfilealias.content.sha1, libraryfilealias.http_url, '', '') |
476 | - |
477 | - def sendFileToSlave(self, *args): |
478 | - """Helper to send a file to this builder.""" |
479 | - return self.ensurepresent(*args) |
480 | - |
481 | - def ensurepresent(self, *args): |
482 | - """Download files needed for the build.""" |
483 | - self.calls.append(('ensurepresent', args)) |
484 | - result = buildd_success_result_map.get('ensurepresent') |
485 | - return [result, 'Download'] |
486 | - |
487 | - def build(self, *args): |
488 | - """Perform the build.""" |
489 | - # XXX: This method does not appear to be used. |
490 | - self.calls.append(('build', args)) |
491 | - result = buildd_success_result_map.get('build') |
492 | - return [result, args[0]] |
493 | - |
494 | - def resume(self): |
495 | - """Record the request to resume the builder.. |
496 | - |
497 | - Always succeed. |
498 | - |
499 | - :return: a (stdout, stderr, subprocess exitcode) triple |
500 | - """ |
501 | - self.resume_requested = True |
502 | - return ['', '', 0] |
503 | - |
504 | - def resumeSlave(self, clock=None): |
505 | - """Resume the builder in a asynchronous fashion. |
506 | - |
507 | - Used the configuration command-line in the same way |
508 | - `BuilddSlave.resume` does. |
509 | - |
510 | - Also use the builddmaster configuration 'socket_timeout' as |
511 | - the process timeout. |
512 | - |
513 | - :param clock: An optional twisted.internet.task.Clock to override |
514 | - the default clock. For use in tests. |
515 | - |
516 | - :return: a Deferred |
517 | - """ |
518 | - resume_command = config.builddmaster.vm_resume_command % { |
519 | - 'vm_host': self.vm_host} |
520 | - # Twisted API require string and the configuration provides unicode. |
521 | - resume_argv = [str(term) for term in resume_command.split()] |
522 | - |
523 | - d = defer.Deferred() |
524 | - p = ProcessWithTimeout( |
525 | - d, config.builddmaster.socket_timeout, clock=clock) |
526 | - p.spawnProcess(resume_argv[0], tuple(resume_argv)) |
527 | - return d |
528 | - |
529 | - |
530 | def get_builder(name): |
531 | """Helper to return the builder given the slave for this request.""" |
532 | # Avoiding circular imports. |
533 | @@ -159,9 +54,12 @@ |
534 | # builder.currentjob hides a complicated query, don't run it twice. |
535 | # See bug 623281. |
536 | current_job = builder.currentjob |
537 | - build_job = current_job.specific_job.build |
538 | + if current_job is None: |
539 | + job_failure_count = 0 |
540 | + else: |
541 | + job_failure_count = current_job.specific_job.build.failure_count |
542 | |
543 | - if builder.failure_count == build_job.failure_count: |
544 | + if builder.failure_count == job_failure_count and current_job is not None: |
545 | # If the failure count for the builder is the same as the |
546 | # failure count for the job being built, then we cannot |
547 | # tell whether the job or the builder is at fault. The best |
548 | @@ -170,17 +68,28 @@ |
549 | current_job.reset() |
550 | return |
551 | |
552 | - if builder.failure_count > build_job.failure_count: |
553 | + if builder.failure_count > job_failure_count: |
554 | # The builder has failed more than the jobs it's been |
555 | - # running, so let's disable it and re-schedule the build. |
556 | - builder.failBuilder(fail_notes) |
557 | - current_job.reset() |
558 | + # running. |
559 | + |
560 | + # Re-schedule the build if there is one. |
561 | + if current_job is not None: |
562 | + current_job.reset() |
563 | + |
564 | + # We are a little more tolerant with failing builders than |
565 | + # failing jobs because sometimes they get unresponsive due to |
566 | + # human error, flaky networks etc. We expect the builder to get |
567 | + # better, whereas jobs are very unlikely to get better. |
568 | + if builder.failure_count >= Builder.FAILURE_THRESHOLD: |
569 | + # It's also gone over the threshold so let's disable it. |
570 | + builder.failBuilder(fail_notes) |
571 | else: |
572 | # The job is the culprit! Override its status to 'failed' |
573 | # to make sure it won't get automatically dispatched again, |
574 | # and remove the buildqueue request. The failure should |
575 | # have already caused any relevant slave data to be stored |
576 | # on the build record so don't worry about that here. |
577 | + build_job = current_job.specific_job.build |
578 | build_job.status = BuildStatus.FAILEDTOBUILD |
579 | builder.currentjob.destroySelf() |
580 | |
581 | @@ -190,133 +99,108 @@ |
582 | # next buildd scan. |
583 | |
584 | |
585 | -class BaseDispatchResult: |
586 | - """Base class for *DispatchResult variations. |
587 | - |
588 | - It will be extended to represent dispatching results and allow |
589 | - homogeneous processing. |
590 | - """ |
591 | - |
592 | - def __init__(self, slave, info=None): |
593 | - self.slave = slave |
594 | - self.info = info |
595 | - |
596 | - def _cleanJob(self, job): |
597 | - """Clean up in case of builder reset or dispatch failure.""" |
598 | - if job is not None: |
599 | - job.reset() |
600 | - |
601 | - def assessFailureCounts(self): |
602 | - """View builder/job failure_count and work out which needs to die. |
603 | - |
604 | - :return: True if we disabled something, False if we did not. |
605 | - """ |
606 | - builder = get_builder(self.slave.name) |
607 | - assessFailureCounts(builder, self.info) |
608 | - |
609 | - def ___call__(self): |
610 | - raise NotImplementedError( |
611 | - "Call sites must define an evaluation method.") |
612 | - |
613 | - |
614 | -class FailDispatchResult(BaseDispatchResult): |
615 | - """Represents a communication failure while dispatching a build job.. |
616 | - |
617 | - When evaluated this object mark the corresponding `IBuilder` as |
618 | - 'NOK' with the given text as 'failnotes'. It also cleans up the running |
619 | - job (`IBuildQueue`). |
620 | - """ |
621 | - |
622 | - def __repr__(self): |
623 | - return '%r failure (%s)' % (self.slave, self.info) |
624 | - |
625 | - @write_transaction |
626 | - def __call__(self): |
627 | - self.assessFailureCounts() |
628 | - |
629 | - |
630 | -class ResetDispatchResult(BaseDispatchResult): |
631 | - """Represents a failure to reset a builder. |
632 | - |
633 | - When evaluated this object simply cleans up the running job |
634 | - (`IBuildQueue`) and marks the builder down. |
635 | - """ |
636 | - |
637 | - def __repr__(self): |
638 | - return '%r reset failure' % self.slave |
639 | - |
640 | - @write_transaction |
641 | - def __call__(self): |
642 | - builder = get_builder(self.slave.name) |
643 | - # Builders that fail to reset should be disabled as per bug |
644 | - # 563353. |
645 | - # XXX Julian bug=586362 |
646 | - # This is disabled until this code is not also used for dispatch |
647 | - # failures where we *don't* want to disable the builder. |
648 | - # builder.failBuilder(self.info) |
649 | - self._cleanJob(builder.currentjob) |
650 | - |
651 | - |
652 | class SlaveScanner: |
653 | """A manager for a single builder.""" |
654 | |
655 | + # The interval between each poll cycle, in seconds. We'd ideally |
656 | + # like this to be lower but 5 seems a reasonable compromise between |
657 | + # responsivity and load on the database server, since in each cycle |
658 | + # we can run quite a few queries. |
659 | SCAN_INTERVAL = 5 |
660 | |
661 | - # These are for the benefit of tests; see `TestingSlaveScanner`. |
662 | - # It pokes fake versions in here so that it can verify methods were |
663 | - # called. The tests should really be using FakeMethod() though. |
664 | - reset_result = ResetDispatchResult |
665 | - fail_result = FailDispatchResult |
666 | - |
667 | def __init__(self, builder_name, logger): |
668 | self.builder_name = builder_name |
669 | self.logger = logger |
670 | - self._deferred_list = [] |
671 | - |
672 | - def scheduleNextScanCycle(self): |
673 | - """Schedule another scan of the builder some time in the future.""" |
674 | - self._deferred_list = [] |
675 | - # XXX: Change this to use LoopingCall. |
676 | - reactor.callLater(self.SCAN_INTERVAL, self.startCycle) |
677 | |
678 | def startCycle(self): |
679 | """Scan the builder and dispatch to it or deal with failures.""" |
680 | + self.loop = LoopingCall(self.singleCycle) |
681 | + self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL) |
682 | + return self.stopping_deferred |
683 | + |
684 | + def stopCycle(self): |
685 | + """Terminate the LoopingCall.""" |
686 | + self.loop.stop() |
687 | + |
688 | + def singleCycle(self): |
689 | self.logger.debug("Scanning builder: %s" % self.builder_name) |
690 | - |
691 | + d = self.scan() |
692 | + |
693 | + d.addErrback(self._scanFailed) |
694 | + return d |
695 | + |
696 | + def _scanFailed(self, failure): |
697 | + """Deal with failures encountered during the scan cycle. |
698 | + |
699 | + 1. Print the error in the log |
700 | + 2. Increment and assess failure counts on the builder and job. |
701 | + """ |
702 | + # Make sure that pending database updates are removed as it |
703 | + # could leave the database in an inconsistent state (e.g. The |
704 | + # job says it's running but the buildqueue has no builder set). |
705 | + transaction.abort() |
706 | + |
707 | + # If we don't recognise the exception include a stack trace with |
708 | + # the error. |
709 | + error_message = failure.getErrorMessage() |
710 | + if failure.check( |
711 | + BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch, |
712 | + CannotResumeHost, BuildDaemonError, CannotFetchFile): |
713 | + self.logger.info("Scanning failed with: %s" % error_message) |
714 | + else: |
715 | + self.logger.info("Scanning failed with: %s\n%s" % |
716 | + (failure.getErrorMessage(), failure.getTraceback())) |
717 | + |
718 | + # Decide if we need to terminate the job or fail the |
719 | + # builder. |
720 | try: |
721 | - slave = self.scan() |
722 | - if slave is None: |
723 | - self.scheduleNextScanCycle() |
724 | + builder = get_builder(self.builder_name) |
725 | + builder.gotFailure() |
726 | + if builder.currentjob is not None: |
727 | + build_farm_job = builder.getCurrentBuildFarmJob() |
728 | + build_farm_job.gotFailure() |
729 | + self.logger.info( |
730 | + "builder %s failure count: %s, " |
731 | + "job '%s' failure count: %s" % ( |
732 | + self.builder_name, |
733 | + builder.failure_count, |
734 | + build_farm_job.title, |
735 | + build_farm_job.failure_count)) |
736 | else: |
737 | - # XXX: Ought to return Deferred. |
738 | - self.resumeAndDispatch(slave) |
739 | + self.logger.info( |
740 | + "Builder %s failed a probe, count: %s" % ( |
741 | + self.builder_name, builder.failure_count)) |
742 | + assessFailureCounts(builder, failure.getErrorMessage()) |
743 | + transaction.commit() |
744 | except: |
745 | - error = Failure() |
746 | - self.logger.info("Scanning failed with: %s\n%s" % |
747 | - (error.getErrorMessage(), error.getTraceback())) |
748 | - |
749 | - builder = get_builder(self.builder_name) |
750 | - |
751 | - # Decide if we need to terminate the job or fail the |
752 | - # builder. |
753 | - self._incrementFailureCounts(builder) |
754 | - self.logger.info( |
755 | - "builder failure count: %s, job failure count: %s" % ( |
756 | - builder.failure_count, |
757 | - builder.getCurrentBuildFarmJob().failure_count)) |
758 | - assessFailureCounts(builder, error.getErrorMessage()) |
759 | - transaction.commit() |
760 | - |
761 | - self.scheduleNextScanCycle() |
762 | - |
763 | - @write_transaction |
764 | + # Catastrophic code failure! Not much we can do. |
765 | + self.logger.error( |
766 | + "Miserable failure when trying to examine failure counts:\n", |
767 | + exc_info=True) |
768 | + transaction.abort() |
769 | + |
770 | def scan(self): |
771 | """Probe the builder and update/dispatch/collect as appropriate. |
772 | |
773 | - The whole method is wrapped in a transaction, but we do partial |
774 | - commits to avoid holding locks on tables. |
775 | - |
776 | - :return: A `RecordingSlave` if we dispatched a job to it, or None. |
777 | + There are several steps to scanning: |
778 | + |
779 | + 1. If the builder is marked as "ok" then probe it to see what state |
780 | + it's in. This is where lost jobs are rescued if we think the |
781 | + builder is doing something that it later tells us it's not, |
782 | + and also where the multi-phase abort procedure happens. |
783 | + See IBuilder.rescueIfLost, which is called by |
784 | + IBuilder.updateStatus(). |
785 | + 2. If the builder is still happy, we ask it if it has an active build |
786 | + and then either update the build in Launchpad or collect the |
787 | + completed build. (builder.updateBuild) |
788 | + 3. If the builder is not happy or it was marked as unavailable |
789 | + mid-build, we need to reset the job that we thought it had, so |
790 | + that the job is dispatched elsewhere. |
791 | + 4. If the builder is idle and we have another build ready, dispatch |
792 | + it. |
793 | + |
794 | + :return: A Deferred that fires when the scan is complete, whose |
795 | + value is A `BuilderSlave` if we dispatched a job to it, or None. |
796 | """ |
797 | # We need to re-fetch the builder object on each cycle as the |
798 | # Storm store is invalidated over transaction boundaries. |
799 | @@ -324,240 +208,72 @@ |
800 | self.builder = get_builder(self.builder_name) |
801 | |
802 | if self.builder.builderok: |
803 | - self.builder.updateStatus(self.logger) |
804 | - transaction.commit() |
805 | - |
806 | - # See if we think there's an active build on the builder. |
807 | - buildqueue = self.builder.getBuildQueue() |
808 | - |
809 | - # XXX Julian 2010-07-29 bug=611258 |
810 | - # We're not using the RecordingSlave until dispatching, which |
811 | - # means that this part blocks until we've received a response |
812 | - # from the builder. updateBuild() needs to be made |
813 | - # asyncronous. |
814 | - |
815 | - # Scan the slave and get the logtail, or collect the build if |
816 | - # it's ready. Yes, "updateBuild" is a bad name. |
817 | - if buildqueue is not None: |
818 | - self.builder.updateBuild(buildqueue) |
819 | - transaction.commit() |
820 | - |
821 | - # If the builder is in manual mode, don't dispatch anything. |
822 | - if self.builder.manual: |
823 | - self.logger.debug( |
824 | - '%s is in manual mode, not dispatching.' % self.builder.name) |
825 | - return None |
826 | - |
827 | - # If the builder is marked unavailable, don't dispatch anything. |
828 | - # Additionaly, because builders can be removed from the pool at |
829 | - # any time, we need to see if we think there was a build running |
830 | - # on it before it was marked unavailable. In this case we reset |
831 | - # the build thusly forcing it to get re-dispatched to another |
832 | - # builder. |
833 | - if not self.builder.is_available: |
834 | - job = self.builder.currentjob |
835 | - if job is not None and not self.builder.builderok: |
836 | - self.logger.info( |
837 | - "%s was made unavailable, resetting attached " |
838 | - "job" % self.builder.name) |
839 | - job.reset() |
840 | - transaction.commit() |
841 | - return None |
842 | - |
843 | - # See if there is a job we can dispatch to the builder slave. |
844 | - |
845 | - # XXX: Rather than use the slave actually associated with the builder |
846 | - # (which, incidentally, shouldn't be a property anyway), we make a new |
847 | - # RecordingSlave so we can get access to its asynchronous |
848 | - # "resumeSlave" method. Blech. |
849 | - slave = RecordingSlave( |
850 | - self.builder.name, self.builder.url, self.builder.vm_host) |
851 | - # XXX: Passing buildd_slave=slave overwrites the 'slave' property of |
852 | - # self.builder. Not sure why this is needed yet. |
853 | - self.builder.findAndStartJob(buildd_slave=slave) |
854 | - if self.builder.currentjob is not None: |
855 | - # After a successful dispatch we can reset the |
856 | - # failure_count. |
857 | - self.builder.resetFailureCount() |
858 | - transaction.commit() |
859 | - return slave |
860 | - |
861 | - return None |
862 | - |
863 | - def resumeAndDispatch(self, slave): |
864 | - """Chain the resume and dispatching Deferreds.""" |
865 | - # XXX: resumeAndDispatch makes Deferreds without returning them. |
866 | - if slave.resume_requested: |
867 | - # The slave needs to be reset before we can dispatch to |
868 | - # it (e.g. a virtual slave) |
869 | - |
870 | - # XXX: Two problems here. The first is that 'resumeSlave' only |
871 | - # exists on RecordingSlave (BuilderSlave calls it 'resume'). |
872 | - d = slave.resumeSlave() |
873 | - d.addBoth(self.checkResume, slave) |
874 | + d = self.builder.updateStatus(self.logger) |
875 | else: |
876 | - # No resume required, build dispatching can commence. |
877 | d = defer.succeed(None) |
878 | |
879 | - # Dispatch the build to the slave asynchronously. |
880 | - d.addCallback(self.initiateDispatch, slave) |
881 | - # Store this deferred so we can wait for it along with all |
882 | - # the others that will be generated by RecordingSlave during |
883 | - # the dispatch process, and chain a callback after they've |
884 | - # all fired. |
885 | - self._deferred_list.append(d) |
886 | - |
887 | - def initiateDispatch(self, resume_result, slave): |
888 | - """Start dispatching a build to a slave. |
889 | - |
890 | - If the previous task in chain (slave resuming) has failed it will |
891 | - receive a `ResetBuilderRequest` instance as 'resume_result' and |
892 | - will immediately return that so the subsequent callback can collect |
893 | - it. |
894 | - |
895 | - If the slave resuming succeeded, it starts the XMLRPC dialogue. The |
896 | - dialogue may consist of many calls to the slave before the build |
897 | - starts. Each call is done via a Deferred event, where slave calls |
898 | - are sent in callSlave(), and checked in checkDispatch() which will |
899 | - keep firing events via callSlave() until all the events are done or |
900 | - an error occurs. |
901 | - """ |
902 | - if resume_result is not None: |
903 | - self.slaveConversationEnded() |
904 | - return resume_result |
905 | - |
906 | - self.logger.info('Dispatching: %s' % slave) |
907 | - self.callSlave(slave) |
908 | - |
909 | - def _getProxyForSlave(self, slave): |
910 | - """Return a twisted.web.xmlrpc.Proxy for the buildd slave. |
911 | - |
912 | - Uses a protocol with timeout support, See QueryFactoryWithTimeout. |
913 | - """ |
914 | - proxy = xmlrpc.Proxy(str(urlappend(slave.url, 'rpc'))) |
915 | - proxy.queryFactory = QueryFactoryWithTimeout |
916 | - return proxy |
917 | - |
918 | - def callSlave(self, slave): |
919 | - """Dispatch the next XMLRPC for the given slave.""" |
920 | - if len(slave.calls) == 0: |
921 | - # That's the end of the dialogue with the slave. |
922 | - self.slaveConversationEnded() |
923 | - return |
924 | - |
925 | - # Get an XMLRPC proxy for the buildd slave. |
926 | - proxy = self._getProxyForSlave(slave) |
927 | - method, args = slave.calls.pop(0) |
928 | - d = proxy.callRemote(method, *args) |
929 | - d.addBoth(self.checkDispatch, method, slave) |
930 | - self._deferred_list.append(d) |
931 | - self.logger.debug('%s -> %s(%s)' % (slave, method, args)) |
932 | - |
933 | - def slaveConversationEnded(self): |
934 | - """After all the Deferreds are set up, chain a callback on them.""" |
935 | - dl = defer.DeferredList(self._deferred_list, consumeErrors=True) |
936 | - dl.addBoth(self.evaluateDispatchResult) |
937 | - return dl |
938 | - |
939 | - def evaluateDispatchResult(self, deferred_list_results): |
940 | - """Process the DispatchResult for this dispatch chain. |
941 | - |
942 | - After waiting for the Deferred chain to finish, we'll have a |
943 | - DispatchResult to evaluate, which deals with the result of |
944 | - dispatching. |
945 | - """ |
946 | - # The `deferred_list_results` is what we get when waiting on a |
947 | - # DeferredList. It's a list of tuples of (status, result) where |
948 | - # result is what the last callback in that chain returned. |
949 | - |
950 | - # If the result is an instance of BaseDispatchResult we need to |
951 | - # evaluate it, as there's further action required at the end of |
952 | - # the dispatch chain. None, resulting from successful chains, |
953 | - # are discarded. |
954 | - |
955 | - dispatch_results = [ |
956 | - result for status, result in deferred_list_results |
957 | - if isinstance(result, BaseDispatchResult)] |
958 | - |
959 | - for result in dispatch_results: |
960 | - self.logger.info("%r" % result) |
961 | - result() |
962 | - |
963 | - # At this point, we're done dispatching, so we can schedule the |
964 | - # next scan cycle. |
965 | - self.scheduleNextScanCycle() |
966 | - |
967 | - # For the test suite so that it can chain callback results. |
968 | - return deferred_list_results |
969 | - |
970 | - def checkResume(self, response, slave): |
971 | - """Check the result of resuming a slave. |
972 | - |
973 | - If there's a problem resuming, we return a ResetDispatchResult which |
974 | - will get evaluated at the end of the scan, or None if the resume |
975 | - was OK. |
976 | - |
977 | - :param response: the tuple that's constructed in |
978 | - ProcessWithTimeout.processEnded(), or a Failure that |
979 | - contains the tuple. |
980 | - :param slave: the slave object we're talking to |
981 | - """ |
982 | - if isinstance(response, Failure): |
983 | - out, err, code = response.value |
984 | - else: |
985 | - out, err, code = response |
986 | - if code == os.EX_OK: |
987 | - return None |
988 | - |
989 | - error_text = '%s\n%s' % (out, err) |
990 | - self.logger.error('%s resume failure: %s' % (slave, error_text)) |
991 | - return self.reset_result(slave, error_text) |
992 | - |
993 | - def _incrementFailureCounts(self, builder): |
994 | - builder.gotFailure() |
995 | - builder.getCurrentBuildFarmJob().gotFailure() |
996 | - |
997 | - def checkDispatch(self, response, method, slave): |
998 | - """Verify the results of a slave xmlrpc call. |
999 | - |
1000 | - If it failed and it compromises the slave then return a corresponding |
1001 | - `FailDispatchResult`, if it was a communication failure, simply |
1002 | - reset the slave by returning a `ResetDispatchResult`. |
1003 | - """ |
1004 | - from lp.buildmaster.interfaces.builder import IBuilderSet |
1005 | - builder = getUtility(IBuilderSet)[slave.name] |
1006 | - |
1007 | - # XXX these DispatchResult classes are badly named and do the |
1008 | - # same thing. We need to fix that. |
1009 | - self.logger.debug( |
1010 | - '%s response for "%s": %s' % (slave, method, response)) |
1011 | - |
1012 | - if isinstance(response, Failure): |
1013 | - self.logger.warn( |
1014 | - '%s communication failed (%s)' % |
1015 | - (slave, response.getErrorMessage())) |
1016 | - self.slaveConversationEnded() |
1017 | - self._incrementFailureCounts(builder) |
1018 | - return self.fail_result(slave) |
1019 | - |
1020 | - if isinstance(response, list) and len(response) == 2: |
1021 | - if method in buildd_success_result_map: |
1022 | - expected_status = buildd_success_result_map.get(method) |
1023 | - status, info = response |
1024 | - if status == expected_status: |
1025 | - self.callSlave(slave) |
1026 | + def status_updated(ignored): |
1027 | + # Commit the changes done while possibly rescuing jobs, to |
1028 | + # avoid holding table locks. |
1029 | + transaction.commit() |
1030 | + |
1031 | + # See if we think there's an active build on the builder. |
1032 | + buildqueue = self.builder.getBuildQueue() |
1033 | + |
1034 | + # Scan the slave and get the logtail, or collect the build if |
1035 | + # it's ready. Yes, "updateBuild" is a bad name. |
1036 | + if buildqueue is not None: |
1037 | + return self.builder.updateBuild(buildqueue) |
1038 | + |
1039 | + def build_updated(ignored): |
1040 | + # Commit changes done while updating the build, to avoid |
1041 | + # holding table locks. |
1042 | + transaction.commit() |
1043 | + |
1044 | + # If the builder is in manual mode, don't dispatch anything. |
1045 | + if self.builder.manual: |
1046 | + self.logger.debug( |
1047 | + '%s is in manual mode, not dispatching.' % |
1048 | + self.builder.name) |
1049 | + return |
1050 | + |
1051 | + # If the builder is marked unavailable, don't dispatch anything. |
1052 | + # Additionally, because builders can be removed from the pool at |
1053 | + # any time, we need to see if we think there was a build running |
1054 | + # on it before it was marked unavailable. In this case we reset |
1055 | + # the build, thus forcing it to get re-dispatched to another |
1056 | + # builder. |
1057 | + |
1058 | + return self.builder.isAvailable().addCallback(got_available) |
1059 | + |
1060 | + def got_available(available): |
1061 | + if not available: |
1062 | + job = self.builder.currentjob |
1063 | + if job is not None and not self.builder.builderok: |
1064 | + self.logger.info( |
1065 | + "%s was made unavailable, resetting attached " |
1066 | + "job" % self.builder.name) |
1067 | + job.reset() |
1068 | + transaction.commit() |
1069 | + return |
1070 | + |
1071 | + # See if there is a job we can dispatch to the builder slave. |
1072 | + |
1073 | + d = self.builder.findAndStartJob() |
1074 | + def job_started(candidate): |
1075 | + if self.builder.currentjob is not None: |
1076 | + # After a successful dispatch we can reset the |
1077 | + # failure_count. |
1078 | + self.builder.resetFailureCount() |
1079 | + transaction.commit() |
1080 | + return self.builder.slave |
1081 | + else: |
1082 | return None |
1083 | - else: |
1084 | - info = 'Unknown slave method: %s' % method |
1085 | - else: |
1086 | - info = 'Unexpected response: %s' % repr(response) |
1087 | - |
1088 | - self.logger.error( |
1089 | - '%s failed to dispatch (%s)' % (slave, info)) |
1090 | - |
1091 | - self.slaveConversationEnded() |
1092 | - self._incrementFailureCounts(builder) |
1093 | - return self.fail_result(slave, info) |
1094 | + return d.addCallback(job_started) |
1095 | + |
1096 | + d.addCallback(status_updated) |
1097 | + d.addCallback(build_updated) |
1098 | + return d |
1099 | |
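
The rewritten scan() above is the heart of the change: the old dispatch state machine becomes one Deferred chain of nested callbacks (status_updated, build_updated, got_available, job_started), with a transaction commit between stages. A minimal standalone sketch of the idiom, with made-up stage names rather than the real Launchpad calls:

    from twisted.internet import defer

    def update_status():
        return 'status done'

    def update_build(ignored):
        return 'build done'

    def dispatch(ignored):
        return 'dispatched'

    def scan():
        # maybeDeferred wraps synchronous returns and exceptions, so the
        # whole chain behaves uniformly whether a stage blocks or not.
        d = defer.maybeDeferred(update_status)
        d.addCallback(update_build)
        d.addCallback(dispatch)
        return d

Each stage runs only if the previous one succeeded; a Failure skips straight to the caller's errbacks, which is what lets scan()'s caller do the failure counting in one place.
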
1100 | |
1101 | class NewBuildersScanner: |
1102 | @@ -578,15 +294,21 @@ |
1103 | self.current_builders = [ |
1104 | builder.name for builder in getUtility(IBuilderSet)] |
1105 | |
1106 | + def stop(self): |
1107 | + """Terminate the LoopingCall.""" |
1108 | + self.loop.stop() |
1109 | + |
1110 | def scheduleScan(self): |
1111 | """Schedule a callback SCAN_INTERVAL seconds later.""" |
1112 | - return self._clock.callLater(self.SCAN_INTERVAL, self.scan) |
1113 | + self.loop = LoopingCall(self.scan) |
1114 | + self.loop.clock = self._clock |
1115 | + self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL) |
1116 | + return self.stopping_deferred |
1117 | |
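
scheduleScan() now hands periodic scheduling to twisted.internet.task.LoopingCall, and the injectable clock attribute is what makes the loop testable without real sleeps. A small sketch (the 15-second interval is arbitrary here, not the real SCAN_INTERVAL):

    from twisted.internet import task

    calls = []
    clock = task.Clock()
    loop = task.LoopingCall(calls.append, 'scan')
    loop.clock = clock                  # inject the fake reactor
    stopping_deferred = loop.start(15)  # runs once immediately...
    clock.advance(15)                   # ...and once per interval after
    assert calls == ['scan', 'scan']
    loop.stop()                         # fires stopping_deferred

The Deferred returned by start() is the same stopping_deferred the manager later waits on during shutdown.
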
1118 | def scan(self): |
1119 | """If a new builder appears, create a SlaveScanner for it.""" |
1120 | new_builders = self.checkForNewBuilders() |
1121 | self.manager.addScanForBuilders(new_builders) |
1122 | - self.scheduleScan() |
1123 | |
1124 | def checkForNewBuilders(self): |
1125 | """See if any new builders were added.""" |
1126 | @@ -609,10 +331,7 @@ |
1127 | manager=self, clock=clock) |
1128 | |
1129 | def _setupLogger(self): |
1130 | - """Setup a 'slave-scanner' logger that redirects to twisted. |
1131 | - |
1132 | - It is going to be used locally and within the thread running |
1133 | - the scan() method. |
1134 | + """Set up a 'slave-scanner' logger that redirects to twisted. |
1135 | |
1136 | Make it less verbose to avoid messing too much with the old code. |
1137 | """ |
1138 | @@ -643,12 +362,29 @@ |
1139 | # Events will now fire in the SlaveScanner objects to scan each |
1140 | # builder. |
1141 | |
1142 | + def stopService(self): |
1143 | + """Callback for when we need to shut down.""" |
1144 | + # XXX: lacks unit tests |
1145 | + # All the SlaveScanner objects need to be halted gracefully. |
1146 | + deferreds = [slave.stopping_deferred for slave in self.builder_slaves] |
1147 | + deferreds.append(self.new_builders_scanner.stopping_deferred) |
1148 | + |
1149 | + self.new_builders_scanner.stop() |
1150 | + for slave in self.builder_slaves: |
1151 | + slave.stopCycle() |
1152 | + |
1153 | + # The 'stopping_deferred's are called back when the loops are |
1154 | + # stopped, so we can wait on them all at once here before |
1155 | + # exiting. |
1156 | + d = defer.DeferredList(deferreds, consumeErrors=True) |
1157 | + return d |
1158 | + |
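
stopService() collects every loop's stopping_deferred into a single DeferredList so that shutdown waits until all scanners have wound down; consumeErrors=True keeps one failed scanner from masking the rest or leaving an unhandled error behind. The same shape in isolation, where loops stands in for the SlaveScanner objects plus the NewBuildersScanner:

    from twisted.internet import defer

    def stop_all(loops):
        # Grab the Deferreds first: stop() can fire them synchronously.
        deferreds = [loop.stopping_deferred for loop in loops]
        for loop in loops:
            loop.stop()
        return defer.DeferredList(deferreds, consumeErrors=True)
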
1159 | def addScanForBuilders(self, builders): |
1160 | """Set up scanner objects for the builders specified.""" |
1161 | for builder in builders: |
1162 | slave_scanner = SlaveScanner(builder, self.logger) |
1163 | self.builder_slaves.append(slave_scanner) |
1164 | - slave_scanner.scheduleNextScanCycle() |
1165 | + slave_scanner.startCycle() |
1166 | |
1167 | # Return the slave list for the benefit of tests. |
1168 | return self.builder_slaves |
1169 | |
1170 | === modified file 'lib/lp/buildmaster/model/builder.py' |
1171 | --- lib/lp/buildmaster/model/builder.py 2010-09-24 13:39:27 +0000 |
1172 | +++ lib/lp/buildmaster/model/builder.py 2010-10-25 19:14:01 +0000 |
1173 | @@ -13,12 +13,11 @@ |
1174 | ] |
1175 | |
1176 | import gzip |
1177 | -import httplib |
1178 | import logging |
1179 | import os |
1180 | import socket |
1181 | -import subprocess |
1182 | import tempfile |
1183 | +import transaction |
1184 | import urllib2 |
1185 | import xmlrpclib |
1186 | |
1187 | @@ -34,6 +33,13 @@ |
1188 | Count, |
1189 | Sum, |
1190 | ) |
1191 | + |
1192 | +from twisted.internet import ( |
1193 | + defer, |
1194 | + reactor as default_reactor, |
1195 | + ) |
1196 | +from twisted.web import xmlrpc |
1197 | + |
1198 | from zope.component import getUtility |
1199 | from zope.interface import implements |
1200 | |
1201 | @@ -58,7 +64,6 @@ |
1202 | from lp.buildmaster.interfaces.builder import ( |
1203 | BuildDaemonError, |
1204 | BuildSlaveFailure, |
1205 | - CannotBuild, |
1206 | CannotFetchFile, |
1207 | CannotResumeHost, |
1208 | CorruptBuildCookie, |
1209 | @@ -66,9 +71,6 @@ |
1210 | IBuilderSet, |
1211 | ) |
1212 | from lp.buildmaster.interfaces.buildfarmjob import IBuildFarmJobSet |
1213 | -from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
1214 | - BuildBehaviorMismatch, |
1215 | - ) |
1216 | from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
1217 | from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior |
1218 | from lp.buildmaster.model.buildqueue import ( |
1219 | @@ -78,9 +80,9 @@ |
1220 | from lp.registry.interfaces.person import validate_public_person |
1221 | from lp.services.job.interfaces.job import JobStatus |
1222 | from lp.services.job.model.job import Job |
1223 | -from lp.services.osutils import until_no_eintr |
1224 | from lp.services.propertycache import cachedproperty |
1225 | -from lp.services.twistedsupport.xmlrpc import BlockingProxy |
1226 | +from lp.services.twistedsupport.processmonitor import ProcessWithTimeout |
1227 | +from lp.services.twistedsupport import cancel_on_timeout |
1228 | # XXX Michael Nelson 2010-01-13 bug=491330 |
1229 | # These dependencies on soyuz will be removed when getBuildRecords() |
1230 | # is moved. |
1231 | @@ -92,25 +94,9 @@ |
1232 | from lp.soyuz.model.processor import Processor |
1233 | |
1234 | |
1235 | -class TimeoutHTTPConnection(httplib.HTTPConnection): |
1236 | - |
1237 | - def connect(self): |
1238 | - """Override the standard connect() methods to set a timeout""" |
1239 | - ret = httplib.HTTPConnection.connect(self) |
1240 | - self.sock.settimeout(config.builddmaster.socket_timeout) |
1241 | - return ret |
1242 | - |
1243 | - |
1244 | -class TimeoutHTTP(httplib.HTTP): |
1245 | - _connection_class = TimeoutHTTPConnection |
1246 | - |
1247 | - |
1248 | -class TimeoutTransport(xmlrpclib.Transport): |
1249 | - """XMLRPC Transport to setup a socket with defined timeout""" |
1250 | - |
1251 | - def make_connection(self, host): |
1252 | - host, extra_headers, x509 = self.get_host_info(host) |
1253 | - return TimeoutHTTP(host) |
1254 | +class QuietQueryFactory(xmlrpc._QueryFactory): |
1255 | + """XMLRPC client factory that doesn't splatter the log with junk.""" |
1256 | + noisy = False |
1257 | |
1258 | |
1259 | class BuilderSlave(object): |
1260 | @@ -125,24 +111,7 @@ |
1261 | # many false positives in your test run and will most likely break |
1262 | # production. |
1263 | |
1264 | - # XXX: This (BuilderSlave) should use composition, rather than |
1265 | - # inheritance. |
1266 | - |
1267 | - # XXX: Have a documented interface for the XML-RPC server: |
1268 | - # - what methods |
1269 | - # - what return values expected |
1270 | - # - what faults |
1271 | - # (see XMLRPCBuildDSlave in lib/canonical/buildd/slave.py). |
1272 | - |
1273 | - # XXX: Arguably, this interface should be asynchronous |
1274 | - # (i.e. Deferred-returning). This would mean that Builder (see below) |
1275 | - # would have to expect Deferreds. |
1276 | - |
1277 | - # XXX: Once we have a client object with a defined, tested interface, we |
1278 | - # should make a test double that doesn't do any XML-RPC and can be used to |
1279 | - # make testing easier & tests faster. |
1280 | - |
1281 | - def __init__(self, proxy, builder_url, vm_host): |
1282 | + def __init__(self, proxy, builder_url, vm_host, reactor=None): |
1283 | """Initialize a BuilderSlave. |
1284 | |
1285 | :param proxy: An XML-RPC proxy, implementing 'callRemote'. It must |
1286 | @@ -155,63 +124,87 @@ |
1287 | self._file_cache_url = urlappend(builder_url, 'filecache') |
1288 | self._server = proxy |
1289 | |
1290 | + if reactor is None: |
1291 | + self.reactor = default_reactor |
1292 | + else: |
1293 | + self.reactor = reactor |
1294 | + |
1295 | @classmethod |
1296 | - def makeBlockingSlave(cls, builder_url, vm_host): |
1297 | - rpc_url = urlappend(builder_url, 'rpc') |
1298 | - server_proxy = xmlrpclib.ServerProxy( |
1299 | - rpc_url, transport=TimeoutTransport(), allow_none=True) |
1300 | - return cls(BlockingProxy(server_proxy), builder_url, vm_host) |
1301 | + def makeBuilderSlave(cls, builder_url, vm_host, reactor=None, proxy=None): |
1302 | + """Create and return a `BuilderSlave`. |
1303 | + |
1304 | + :param builder_url: The URL of the slave buildd machine, |
1305 | + e.g. http://localhost:8221 |
1306 | + :param vm_host: If the slave is virtual, specify its host machine here. |
1307 | + :param reactor: Used by tests to override the Twisted reactor. |
1308 | + :param proxy: Used by tests to override the xmlrpc.Proxy. |
1309 | + """ |
1310 | + rpc_url = urlappend(builder_url.encode('utf-8'), 'rpc') |
1311 | + if proxy is None: |
1312 | + server_proxy = xmlrpc.Proxy(rpc_url, allowNone=True) |
1313 | + server_proxy.queryFactory = QuietQueryFactory |
1314 | + else: |
1315 | + server_proxy = proxy |
1316 | + return cls(server_proxy, builder_url, vm_host, reactor) |
1317 | + |
1318 | + def _with_timeout(self, d): |
1319 | + TIMEOUT = config.builddmaster.socket_timeout |
1320 | + return cancel_on_timeout(d, TIMEOUT, self.reactor) |
1321 | |
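
_with_timeout() delegates to cancel_on_timeout from lp.services.twistedsupport, whose implementation is not part of this diff. An equivalent sketch, assuming Twisted's Deferred.cancel() (available since 10.1), would look something like:

    def cancel_on_timeout(d, timeout, reactor):
        # Cancel d if it has not fired within `timeout` seconds (sketch).
        delayed_call = reactor.callLater(timeout, d.cancel)
        def cancel_timer(passthrough):
            # d fired (or was cancelled); drop the pending timer.
            if delayed_call.active():
                delayed_call.cancel()
            return passthrough
        return d.addBoth(cancel_timer)

Passing self.reactor through is what makes the timeout deterministic under a task.Clock in tests.
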
1322 | def abort(self): |
1323 | """Abort the current build.""" |
1324 | - return self._server.callRemote('abort') |
1325 | + return self._with_timeout(self._server.callRemote('abort')) |
1326 | |
1327 | def clean(self): |
1328 | """Clean up the waiting files and reset the slave's internal state.""" |
1329 | - return self._server.callRemote('clean') |
1330 | + return self._with_timeout(self._server.callRemote('clean')) |
1331 | |
1332 | def echo(self, *args): |
1333 | """Echo the arguments back.""" |
1334 | - return self._server.callRemote('echo', *args) |
1335 | + return self._with_timeout(self._server.callRemote('echo', *args)) |
1336 | |
1337 | def info(self): |
1338 | """Return the protocol version and the builder methods supported.""" |
1339 | - return self._server.callRemote('info') |
1340 | + return self._with_timeout(self._server.callRemote('info')) |
1341 | |
1342 | def status(self): |
1343 | """Return the status of the build daemon.""" |
1344 | - return self._server.callRemote('status') |
1345 | + return self._with_timeout(self._server.callRemote('status')) |
1346 | |
1347 | def ensurepresent(self, sha1sum, url, username, password): |
1348 | + # XXX: Nothing external calls this. Make it private. |
1349 | """Attempt to ensure the given file is present.""" |
1350 | - return self._server.callRemote( |
1351 | - 'ensurepresent', sha1sum, url, username, password) |
1352 | + return self._with_timeout(self._server.callRemote( |
1353 | + 'ensurepresent', sha1sum, url, username, password)) |
1354 | |
1355 | def getFile(self, sha_sum): |
1356 | """Construct a file-like object to return the named file.""" |
1357 | + # XXX 2010-10-18 bug=662631 |
1358 | + # Change this to do non-blocking IO. |
1359 | file_url = urlappend(self._file_cache_url, sha_sum) |
1360 | return urllib2.urlopen(file_url) |
1361 | |
1362 | - def resume(self): |
1363 | - """Resume a virtual builder. |
1364 | - |
1365 | - It uses the configuration command-line (replacing 'vm_host') and |
1366 | - return its output. |
1367 | - |
1368 | - :return: a (stdout, stderr, subprocess exitcode) triple |
1369 | + def resume(self, clock=None): |
1370 | + """Resume the builder in an asynchronous fashion. |
1371 | + |
1372 | + We use the builddmaster configuration 'socket_timeout' as |
1373 | + the process timeout. |
1374 | + |
1375 | + :param clock: An optional twisted.internet.task.Clock to override |
1376 | + the default clock. For use in tests. |
1377 | + |
1378 | + :return: a Deferred that returns a |
1379 | + (stdout, stderr, subprocess exitcode) triple |
1380 | """ |
1381 | - # XXX: This executes the vm_resume_command |
1382 | - # synchronously. RecordingSlave does so asynchronously. Since we |
1383 | - # always want to do this asynchronously, there's no need for the |
1384 | - # duplication. |
1385 | resume_command = config.builddmaster.vm_resume_command % { |
1386 | 'vm_host': self._vm_host} |
1387 | - resume_argv = resume_command.split() |
1388 | - resume_process = subprocess.Popen( |
1389 | - resume_argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE) |
1390 | - stdout, stderr = resume_process.communicate() |
1391 | - |
1392 | - return (stdout, stderr, resume_process.returncode) |
1393 | + # Twisted API requires string but the configuration provides unicode. |
1394 | + resume_argv = [term.encode('utf-8') for term in resume_command.split()] |
1395 | + d = defer.Deferred() |
1396 | + p = ProcessWithTimeout( |
1397 | + d, config.builddmaster.socket_timeout, clock=clock) |
1398 | + p.spawnProcess(resume_argv[0], tuple(resume_argv)) |
1399 | + return d |
1400 | |
1401 | def cacheFile(self, logger, libraryfilealias): |
1402 | """Make sure that the file at 'libraryfilealias' is on the slave. |
1403 | @@ -224,13 +217,15 @@ |
1404 | "Asking builder on %s to ensure it has file %s (%s, %s)" % ( |
1405 | self._file_cache_url, libraryfilealias.filename, url, |
1406 | libraryfilealias.content.sha1)) |
1407 | - self.sendFileToSlave(libraryfilealias.content.sha1, url) |
1408 | + return self.sendFileToSlave(libraryfilealias.content.sha1, url) |
1409 | |
1410 | def sendFileToSlave(self, sha1, url, username="", password=""): |
1411 | """Helper to send the file at 'url' with 'sha1' to this builder.""" |
1412 | - present, info = self.ensurepresent(sha1, url, username, password) |
1413 | - if not present: |
1414 | - raise CannotFetchFile(url, info) |
1415 | + d = self.ensurepresent(sha1, url, username, password) |
1416 | + def check_present((present, info)): |
1417 | + if not present: |
1418 | + raise CannotFetchFile(url, info) |
1419 | + return d.addCallback(check_present) |
1420 | |
1421 | def build(self, buildid, builder_type, chroot_sha1, filemap, args): |
1422 | """Build a thing on this build slave. |
1423 | @@ -243,19 +238,18 @@ |
1424 | :param args: A dictionary of extra arguments. The contents depend on |
1425 | the build job type. |
1426 | """ |
1427 | - try: |
1428 | - return self._server.callRemote( |
1429 | - 'build', buildid, builder_type, chroot_sha1, filemap, args) |
1430 | - except xmlrpclib.Fault, info: |
1431 | - raise BuildSlaveFailure(info) |
1432 | + d = self._with_timeout(self._server.callRemote( |
1433 | + 'build', buildid, builder_type, chroot_sha1, filemap, args)) |
1434 | + def got_fault(failure): |
1435 | + failure.trap(xmlrpclib.Fault) |
1436 | + raise BuildSlaveFailure(failure.value) |
1437 | + return d.addErrback(got_fault) |
1438 | |
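
The got_fault errback above relies on Failure.trap(), which re-raises the wrapped failure unless it matches one of the given exception types; only XML-RPC faults are translated into BuildSlaveFailure, and everything else propagates untouched. A self-contained illustration, using a stand-in exception class:

    import xmlrpclib
    from twisted.internet import defer

    class SlaveFailure(Exception):
        # Stand-in for BuildSlaveFailure, purely for illustration.
        pass

    def got_fault(failure):
        failure.trap(xmlrpclib.Fault)   # anything else re-raises
        raise SlaveFailure(failure.value)

    d = defer.fail(xmlrpclib.Fault(8002, 'boom'))
    d.addErrback(got_fault)             # d now carries a SlaveFailure
    d.addErrback(lambda f: f.trap(SlaveFailure))  # consume it
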
1439 | |
1440 | # This is a separate function since MockBuilder needs to use it too. |
1441 | # Do not use it -- (Mock)Builder.rescueIfLost should be used instead. |
1442 | def rescueBuilderIfLost(builder, logger=None): |
1443 | """See `IBuilder`.""" |
1444 | - status_sentence = builder.slaveStatusSentence() |
1445 | - |
1446 | # 'ident_position' dict relates the position of the job identifier |
1447 | # token in the sentence received from status(), according to the |
1448 | # two statuses we care about. See lib/canonical/buildd/slave.py |
1449 | @@ -265,61 +259,58 @@ |
1450 | 'BuilderStatus.WAITING': 2 |
1451 | } |
1452 | |
1453 | - # Isolate the BuilderStatus string, always the first token in |
1454 | - # see lib/canonical/buildd/slave.py and |
1455 | - # IBuilder.slaveStatusSentence(). |
1456 | - status = status_sentence[0] |
1457 | - |
1458 | - # If the cookie test below fails, it will request an abort of the |
1459 | - # builder. This will leave the builder in the aborted state and |
1460 | - # with no assigned job, and we should now "clean" the slave which |
1461 | - # will reset its state back to IDLE, ready to accept new builds. |
1462 | - # This situation is usually caused by a temporary loss of |
1463 | - # communications with the slave and the build manager had to reset |
1464 | - # the job. |
1465 | - if status == 'BuilderStatus.ABORTED' and builder.currentjob is None: |
1466 | - builder.cleanSlave() |
1467 | - if logger is not None: |
1468 | - logger.info( |
1469 | - "Builder '%s' cleaned up from ABORTED" % builder.name) |
1470 | - return |
1471 | - |
1472 | - # If slave is not building nor waiting, it's not in need of rescuing. |
1473 | - if status not in ident_position.keys(): |
1474 | - return |
1475 | - |
1476 | - slave_build_id = status_sentence[ident_position[status]] |
1477 | - |
1478 | - try: |
1479 | - builder.verifySlaveBuildCookie(slave_build_id) |
1480 | - except CorruptBuildCookie, reason: |
1481 | - if status == 'BuilderStatus.WAITING': |
1482 | - builder.cleanSlave() |
1483 | + d = builder.slaveStatusSentence() |
1484 | + |
1485 | + def got_status(status_sentence): |
1486 | + """After we get the status, clean if we have to. |
1487 | + |
1488 | + Always return status_sentence. |
1489 | + """ |
1490 | + # Isolate the BuilderStatus string, always the first token; |
1491 | + # see lib/canonical/buildd/slave.py and |
1492 | + # IBuilder.slaveStatusSentence(). |
1493 | + status = status_sentence[0] |
1494 | + |
1495 | + # If the cookie test below fails, it will request an abort of the |
1496 | + # builder. This will leave the builder in the aborted state and |
1497 | + # with no assigned job, and we should now "clean" the slave which |
1498 | + # will reset its state back to IDLE, ready to accept new builds. |
1499 | + # This situation is usually caused by a temporary loss of |
1500 | + # communications with the slave and the build manager had to reset |
1501 | + # the job. |
1502 | + if status == 'BuilderStatus.ABORTED' and builder.currentjob is None: |
1503 | + if logger is not None: |
1504 | + logger.info( |
1505 | + "Builder '%s' being cleaned up from ABORTED" % |
1506 | + (builder.name,)) |
1507 | + d = builder.cleanSlave() |
1508 | + return d.addCallback(lambda ignored: status_sentence) |
1509 | else: |
1510 | - builder.requestAbort() |
1511 | - if logger: |
1512 | - logger.info( |
1513 | - "Builder '%s' rescued from '%s': '%s'" % |
1514 | - (builder.name, slave_build_id, reason)) |
1515 | - |
1516 | - |
1517 | -def _update_builder_status(builder, logger=None): |
1518 | - """Really update the builder status.""" |
1519 | - try: |
1520 | - builder.checkSlaveAlive() |
1521 | - builder.rescueIfLost(logger) |
1522 | - # Catch only known exceptions. |
1523 | - # XXX cprov 2007-06-15 bug=120571: ValueError & TypeError catching is |
1524 | - # disturbing in this context. We should spend sometime sanitizing the |
1525 | - # exceptions raised in the Builder API since we already started the |
1526 | - # main refactoring of this area. |
1527 | - except (ValueError, TypeError, xmlrpclib.Fault, |
1528 | - BuildDaemonError), reason: |
1529 | - builder.failBuilder(str(reason)) |
1530 | - if logger: |
1531 | - logger.warn( |
1532 | - "%s (%s) marked as failed due to: %s", |
1533 | - builder.name, builder.url, builder.failnotes, exc_info=True) |
1534 | + return status_sentence |
1535 | + |
1536 | + def rescue_slave(status_sentence): |
1537 | + # If the slave is neither building nor waiting, it needs no rescuing. |
1538 | + status = status_sentence[0] |
1539 | + if status not in ident_position.keys(): |
1540 | + return |
1541 | + slave_build_id = status_sentence[ident_position[status]] |
1542 | + try: |
1543 | + builder.verifySlaveBuildCookie(slave_build_id) |
1544 | + except CorruptBuildCookie, reason: |
1545 | + if status == 'BuilderStatus.WAITING': |
1546 | + d = builder.cleanSlave() |
1547 | + else: |
1548 | + d = builder.requestAbort() |
1549 | + def log_rescue(ignored): |
1550 | + if logger: |
1551 | + logger.info( |
1552 | + "Builder '%s' rescued from '%s': '%s'" % |
1553 | + (builder.name, slave_build_id, reason)) |
1554 | + return d.addCallback(log_rescue) |
1555 | + |
1556 | + d.addCallback(got_status) |
1557 | + d.addCallback(rescue_slave) |
1558 | + return d |
1559 | |
1560 | |
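
rescueBuilderIfLost() leans on a key Deferred property: when a callback such as got_status or rescue_slave returns another Deferred, the outer chain pauses until the inner one fires and then resumes with its result. In miniature:

    from twisted.internet import defer

    inner = defer.Deferred()
    outer = defer.succeed(None)
    outer.addCallback(lambda ignored: inner)  # hand back a Deferred
    results = []
    outer.addCallback(results.append)
    assert results == []            # outer is paused, waiting on inner
    inner.callback('rescued')
    assert results == ['rescued']   # outer resumed with inner's result
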
1561 | def updateBuilderStatus(builder, logger=None): |
1562 | @@ -327,16 +318,7 @@ |
1563 | if logger: |
1564 | logger.debug('Checking %s' % builder.name) |
1565 | |
1566 | - MAX_EINTR_RETRIES = 42 # pulling a number out of my a$$ here |
1567 | - try: |
1568 | - return until_no_eintr( |
1569 | - MAX_EINTR_RETRIES, _update_builder_status, builder, logger=logger) |
1570 | - except socket.error, reason: |
1571 | - # In Python 2.6 we can use IOError instead. It also has |
1572 | - # reason.errno but we might be using 2.5 here so use the |
1573 | - # index hack. |
1574 | - error_message = str(reason) |
1575 | - builder.handleTimeout(logger, error_message) |
1576 | + return builder.rescueIfLost(logger) |
1577 | |
1578 | |
1579 | class Builder(SQLBase): |
1580 | @@ -364,6 +346,10 @@ |
1581 | active = BoolCol(dbName='active', notNull=True, default=True) |
1582 | failure_count = IntCol(dbName='failure_count', default=0, notNull=True) |
1583 | |
1584 | + # The number of times a builder can consecutively fail before we |
1585 | + # give up and mark it builderok=False. |
1586 | + FAILURE_THRESHOLD = 5 |
1587 | + |
1588 | def _getCurrentBuildBehavior(self): |
1589 | """Return the current build behavior.""" |
1590 | if not safe_hasattr(self, '_current_build_behavior'): |
1591 | @@ -409,18 +395,13 @@ |
1592 | """See `IBuilder`.""" |
1593 | self.failure_count = 0 |
1594 | |
1595 | - def checkSlaveAlive(self): |
1596 | - """See IBuilder.""" |
1597 | - if self.slave.echo("Test")[0] != "Test": |
1598 | - raise BuildDaemonError("Failed to echo OK") |
1599 | - |
1600 | def rescueIfLost(self, logger=None): |
1601 | """See `IBuilder`.""" |
1602 | - rescueBuilderIfLost(self, logger) |
1603 | + return rescueBuilderIfLost(self, logger) |
1604 | |
1605 | def updateStatus(self, logger=None): |
1606 | """See `IBuilder`.""" |
1607 | - updateBuilderStatus(self, logger) |
1608 | + return updateBuilderStatus(self, logger) |
1609 | |
1610 | def cleanSlave(self): |
1611 | """See IBuilder.""" |
1612 | @@ -440,20 +421,23 @@ |
1613 | def resumeSlaveHost(self): |
1614 | """See IBuilder.""" |
1615 | if not self.virtualized: |
1616 | - raise CannotResumeHost('Builder is not virtualized.') |
1617 | + return defer.fail(CannotResumeHost('Builder is not virtualized.')) |
1618 | |
1619 | if not self.vm_host: |
1620 | - raise CannotResumeHost('Undefined vm_host.') |
1621 | + return defer.fail(CannotResumeHost('Undefined vm_host.')) |
1622 | |
1623 | logger = self._getSlaveScannerLogger() |
1624 | logger.debug("Resuming %s (%s)" % (self.name, self.url)) |
1625 | |
1626 | - stdout, stderr, returncode = self.slave.resume() |
1627 | - if returncode != 0: |
1628 | + d = self.slave.resume() |
1629 | + def got_resume_ok((stdout, stderr, returncode)): |
1630 | + return stdout, stderr |
1631 | + def got_resume_bad(failure): |
1632 | + stdout, stderr, code = failure.value |
1633 | raise CannotResumeHost( |
1634 | "Resuming failed:\nOUT:\n%s\nERR:\n%s\n" % (stdout, stderr)) |
1635 | |
1636 | - return stdout, stderr |
1637 | + return d.addCallback(got_resume_ok).addErrback(got_resume_bad) |
1638 | |
1639 | @cachedproperty |
1640 | def slave(self): |
1641 | @@ -462,7 +446,7 @@ |
1642 | # the slave object, which is usually an XMLRPC client, with a |
1643 | # stub object that removes the need to actually create a buildd |
1644 | # slave in various states - which can be hard to create. |
1645 | - return BuilderSlave.makeBlockingSlave(self.url, self.vm_host) |
1646 | + return BuilderSlave.makeBuilderSlave(self.url, self.vm_host) |
1647 | |
1648 | def setSlaveForTesting(self, proxy): |
1649 | """See IBuilder.""" |
1650 | @@ -483,18 +467,23 @@ |
1651 | |
1652 | # If we are building a virtual build, resume the virtual machine. |
1653 | if self.virtualized: |
1654 | - self.resumeSlaveHost() |
1655 | + d = self.resumeSlaveHost() |
1656 | + else: |
1657 | + d = defer.succeed(None) |
1658 | |
1659 | - # Do it. |
1660 | - build_queue_item.markAsBuilding(self) |
1661 | - try: |
1662 | - self.current_build_behavior.dispatchBuildToSlave( |
1663 | + def resume_done(ignored): |
1664 | + return self.current_build_behavior.dispatchBuildToSlave( |
1665 | build_queue_item.id, logger) |
1666 | - except BuildSlaveFailure, e: |
1667 | - logger.debug("Disabling builder: %s" % self.url, exc_info=1) |
1668 | + |
1669 | + def eb_slave_failure(failure): |
1670 | + failure.trap(BuildSlaveFailure) |
1671 | + e = failure.value |
1672 | self.failBuilder( |
1673 | "Exception (%s) when setting up to new job" % (e,)) |
1674 | - except CannotFetchFile, e: |
1675 | + |
1676 | + def eb_cannot_fetch_file(failure): |
1677 | + failure.trap(CannotFetchFile) |
1678 | + e = failure.value |
1679 | message = """Slave '%s' (%s) was unable to fetch file. |
1680 | ****** URL ******** |
1681 | %s |
1682 | @@ -503,10 +492,19 @@ |
1683 | ******************* |
1684 | """ % (self.name, self.url, e.file_url, e.error_information) |
1685 | raise BuildDaemonError(message) |
1686 | - except socket.error, e: |
1687 | + |
1688 | + def eb_socket_error(failure): |
1689 | + failure.trap(socket.error) |
1690 | + e = failure.value |
1691 | error_message = "Exception (%s) when setting up new job" % (e,) |
1692 | - self.handleTimeout(logger, error_message) |
1693 | - raise BuildSlaveFailure |
1694 | + d = self.handleTimeout(logger, error_message) |
1695 | + return d.addBoth(lambda ignored: failure) |
1696 | + |
1697 | + d.addCallback(resume_done) |
1698 | + d.addErrback(eb_slave_failure) |
1699 | + d.addErrback(eb_cannot_fetch_file) |
1700 | + d.addErrback(eb_socket_error) |
1701 | + return d |
1702 | |
1703 | def failBuilder(self, reason): |
1704 | """See IBuilder""" |
1705 | @@ -534,22 +532,24 @@ |
1706 | |
1707 | def slaveStatus(self): |
1708 | """See IBuilder.""" |
1709 | - builder_version, builder_arch, mechanisms = self.slave.info() |
1710 | - status_sentence = self.slave.status() |
1711 | - |
1712 | - status = {'builder_status': status_sentence[0]} |
1713 | - |
1714 | - # Extract detailed status and log information if present. |
1715 | - # Although build_id is also easily extractable here, there is no |
1716 | - # valid reason for anything to use it, so we exclude it. |
1717 | - if status['builder_status'] == 'BuilderStatus.WAITING': |
1718 | - status['build_status'] = status_sentence[1] |
1719 | - else: |
1720 | - if status['builder_status'] == 'BuilderStatus.BUILDING': |
1721 | - status['logtail'] = status_sentence[2] |
1722 | - |
1723 | - self.current_build_behavior.updateSlaveStatus(status_sentence, status) |
1724 | - return status |
1725 | + d = self.slave.status() |
1726 | + def got_status(status_sentence): |
1727 | + status = {'builder_status': status_sentence[0]} |
1728 | + |
1729 | + # Extract detailed status and log information if present. |
1730 | + # Although build_id is also easily extractable here, there is no |
1731 | + # valid reason for anything to use it, so we exclude it. |
1732 | + if status['builder_status'] == 'BuilderStatus.WAITING': |
1733 | + status['build_status'] = status_sentence[1] |
1734 | + else: |
1735 | + if status['builder_status'] == 'BuilderStatus.BUILDING': |
1736 | + status['logtail'] = status_sentence[2] |
1737 | + |
1738 | + self.current_build_behavior.updateSlaveStatus( |
1739 | + status_sentence, status) |
1740 | + return status |
1741 | + |
1742 | + return d.addCallback(got_status) |
1743 | |
1744 | def slaveStatusSentence(self): |
1745 | """See IBuilder.""" |
1746 | @@ -562,13 +562,15 @@ |
1747 | |
1748 | def updateBuild(self, queueItem): |
1749 | """See `IBuilder`.""" |
1750 | - self.current_build_behavior.updateBuild(queueItem) |
1751 | + return self.current_build_behavior.updateBuild(queueItem) |
1752 | |
1753 | def transferSlaveFileToLibrarian(self, file_sha1, filename, private): |
1754 | """See IBuilder.""" |
1755 | out_file_fd, out_file_name = tempfile.mkstemp(suffix=".buildlog") |
1756 | out_file = os.fdopen(out_file_fd, "r+") |
1757 | try: |
1758 | + # XXX 2010-10-18 bug=662631 |
1759 | + # Change this to do non-blocking IO. |
1760 | slave_file = self.slave.getFile(file_sha1) |
1761 | copy_and_close(slave_file, out_file) |
1762 | # If the requested file is the 'buildlog' compress it using gzip |
1763 | @@ -599,18 +601,17 @@ |
1764 | |
1765 | return library_file.id |
1766 | |
1767 | - @property |
1768 | - def is_available(self): |
1769 | + def isAvailable(self): |
1770 | """See `IBuilder`.""" |
1771 | if not self.builderok: |
1772 | - return False |
1773 | - try: |
1774 | - slavestatus = self.slaveStatusSentence() |
1775 | - except (xmlrpclib.Fault, socket.error): |
1776 | - return False |
1777 | - if slavestatus[0] != BuilderStatus.IDLE: |
1778 | - return False |
1779 | - return True |
1780 | + return defer.succeed(False) |
1781 | + d = self.slaveStatusSentence() |
1782 | + def catch_fault(failure): |
1783 | + failure.trap(xmlrpclib.Fault, socket.error) |
1784 | + return False |
1785 | + def check_available(status): |
1786 | + return status[0] == BuilderStatus.IDLE |
1787 | + return d.addCallbacks(check_available, catch_fault) |
1788 | |
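
isAvailable() attaches the pair with addCallbacks(check_available, catch_fault) rather than separate addCallback/addErrback calls; the two occupy the same position in the chain, so exactly one of them runs, and catch_fault never sees an exception raised by check_available itself. For comparison:

    from twisted.internet import defer

    def check_available(status):
        return status[0] == 'BuilderStatus.IDLE'

    def catch_fault(failure):
        return False    # any probe error just means "not available"

    d = defer.succeed(('BuilderStatus.IDLE',))
    d.addCallbacks(check_available, catch_fault)   # fires with True
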
1789 | def _getSlaveScannerLogger(self): |
1790 | """Return the logger instance from buildd-slave-scanner.py.""" |
1791 | @@ -621,6 +622,27 @@ |
1792 | logger = logging.getLogger('slave-scanner') |
1793 | return logger |
1794 | |
1795 | + def acquireBuildCandidate(self): |
1796 | + """Acquire a build candidate in an atomic fashion. |
1797 | + |
1798 | + When retrieving a candidate we need to mark it as building |
1799 | + immediately so that it is not dispatched to another builder by the |
1800 | + build manager. |
1801 | + |
1802 | + We can consider this to be atomic because although the build manager |
1803 | + is a Twisted app and gives the appearance of doing lots of things at |
1804 | + once, it's still single-threaded so no more than one builder scan |
1805 | + can be in this code at the same time. |
1806 | + |
1807 | + If there's ever more than one build manager running at once, then |
1808 | + this code will need some sort of mutex. |
1809 | + """ |
1810 | + candidate = self._findBuildCandidate() |
1811 | + if candidate is not None: |
1812 | + candidate.markAsBuilding(self) |
1813 | + transaction.commit() |
1814 | + return candidate |
1815 | + |
1816 | def _findBuildCandidate(self): |
1817 | """Find a candidate job for dispatch to an idle buildd slave. |
1818 | |
1819 | @@ -700,52 +722,46 @@ |
1820 | :param candidate: The job to dispatch. |
1821 | """ |
1822 | logger = self._getSlaveScannerLogger() |
1823 | - try: |
1824 | - self.startBuild(candidate, logger) |
1825 | - except (BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch), err: |
1826 | - logger.warn('Could not build: %s' % err) |
1827 | + # Using maybeDeferred ensures that any exceptions are also |
1828 | + # wrapped up and caught later. |
1829 | + d = defer.maybeDeferred(self.startBuild, candidate, logger) |
1830 | + return d |
1831 | |
1832 | def handleTimeout(self, logger, error_message): |
1833 | """See IBuilder.""" |
1834 | - builder_should_be_failed = True |
1835 | - |
1836 | if self.virtualized: |
1837 | # Virtualized/PPA builder: attempt a reset. |
1838 | logger.warn( |
1839 | "Resetting builder: %s -- %s" % (self.url, error_message), |
1840 | exc_info=True) |
1841 | - try: |
1842 | - self.resumeSlaveHost() |
1843 | - except CannotResumeHost, err: |
1844 | - # Failed to reset builder. |
1845 | - logger.warn( |
1846 | - "Failed to reset builder: %s -- %s" % |
1847 | - (self.url, str(err)), exc_info=True) |
1848 | - else: |
1849 | - # Builder was reset, do *not* mark it as failed. |
1850 | - builder_should_be_failed = False |
1851 | - |
1852 | - if builder_should_be_failed: |
1853 | + d = self.resumeSlaveHost() |
1854 | + return d |
1855 | + else: |
1856 | + # XXX: This should really let the failure bubble up to the |
1857 | + # scan() method that does the failure counting. |
1858 | # Mark builder as 'failed'. |
1859 | logger.warn( |
1860 | - "Disabling builder: %s -- %s" % (self.url, error_message), |
1861 | - exc_info=True) |
1862 | + "Disabling builder: %s -- %s" % (self.url, error_message)) |
1863 | self.failBuilder(error_message) |
1864 | + return defer.succeed(None) |
1865 | |
1866 | def findAndStartJob(self, buildd_slave=None): |
1867 | """See IBuilder.""" |
1868 | + # XXX This method should be removed in favour of two separately |
1869 | + # called methods that find and dispatch the job. It will |
1870 | + # require a lot of test fixing. |
1871 | logger = self._getSlaveScannerLogger() |
1872 | - candidate = self._findBuildCandidate() |
1873 | + candidate = self.acquireBuildCandidate() |
1874 | |
1875 | if candidate is None: |
1876 | logger.debug("No build candidates available for builder.") |
1877 | - return None |
1878 | + return defer.succeed(None) |
1879 | |
1880 | if buildd_slave is not None: |
1881 | self.setSlaveForTesting(buildd_slave) |
1882 | |
1883 | - self._dispatchBuildCandidate(candidate) |
1884 | - return candidate |
1885 | + d = self._dispatchBuildCandidate(candidate) |
1886 | + return d.addCallback(lambda ignored: candidate) |
1887 | |
1888 | def getBuildQueue(self): |
1889 | """See `IBuilder`.""" |
1890 | |
1891 | === modified file 'lib/lp/buildmaster/model/buildfarmjobbehavior.py' |
1892 | --- lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-08-20 20:31:18 +0000 |
1893 | +++ lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-10-25 19:14:01 +0000 |
1894 | @@ -16,13 +16,18 @@ |
1895 | import socket |
1896 | import xmlrpclib |
1897 | |
1898 | +from twisted.internet import defer |
1899 | + |
1900 | from zope.component import getUtility |
1901 | from zope.interface import implements |
1902 | from zope.security.proxy import removeSecurityProxy |
1903 | |
1904 | from canonical import encoding |
1905 | from canonical.librarian.interfaces import ILibrarianClient |
1906 | -from lp.buildmaster.interfaces.builder import CorruptBuildCookie |
1907 | +from lp.buildmaster.interfaces.builder import ( |
1908 | + BuildSlaveFailure, |
1909 | + CorruptBuildCookie, |
1910 | + ) |
1911 | from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
1912 | BuildBehaviorMismatch, |
1913 | IBuildFarmJobBehavior, |
1914 | @@ -69,54 +74,53 @@ |
1915 | """See `IBuildFarmJobBehavior`.""" |
1916 | logger = logging.getLogger('slave-scanner') |
1917 | |
1918 | - try: |
1919 | - slave_status = self._builder.slaveStatus() |
1920 | - except (xmlrpclib.Fault, socket.error), info: |
1921 | - # XXX cprov 2005-06-29: |
1922 | - # Hmm, a problem with the xmlrpc interface, |
1923 | - # disable the builder ?? or simple notice the failure |
1924 | - # with a timestamp. |
1925 | + d = self._builder.slaveStatus() |
1926 | + |
1927 | + def got_failure(failure): |
1928 | + failure.trap(xmlrpclib.Fault, socket.error) |
1929 | + info = failure.value |
1930 | info = ("Could not contact the builder %s, caught a (%s)" |
1931 | % (queueItem.builder.url, info)) |
1932 | - logger.debug(info, exc_info=True) |
1933 | - # keep the job for scan |
1934 | - return |
1935 | - |
1936 | - builder_status_handlers = { |
1937 | - 'BuilderStatus.IDLE': self.updateBuild_IDLE, |
1938 | - 'BuilderStatus.BUILDING': self.updateBuild_BUILDING, |
1939 | - 'BuilderStatus.ABORTING': self.updateBuild_ABORTING, |
1940 | - 'BuilderStatus.ABORTED': self.updateBuild_ABORTED, |
1941 | - 'BuilderStatus.WAITING': self.updateBuild_WAITING, |
1942 | - } |
1943 | - |
1944 | - builder_status = slave_status['builder_status'] |
1945 | - if builder_status not in builder_status_handlers: |
1946 | - logger.critical( |
1947 | - "Builder on %s returned unknown status %s, failing it" |
1948 | - % (self._builder.url, builder_status)) |
1949 | - self._builder.failBuilder( |
1950 | - "Unknown status code (%s) returned from status() probe." |
1951 | - % builder_status) |
1952 | - # XXX: This will leave the build and job in a bad state, but |
1953 | - # should never be possible, since our builder statuses are |
1954 | - # known. |
1955 | - queueItem._builder = None |
1956 | - queueItem.setDateStarted(None) |
1957 | - return |
1958 | - |
1959 | - # Since logtail is a xmlrpclib.Binary container and it is returned |
1960 | - # from the IBuilder content class, it arrives protected by a Zope |
1961 | - # Security Proxy, which is not declared, thus empty. Before passing |
1962 | - # it to the status handlers we will simply remove the proxy. |
1963 | - logtail = removeSecurityProxy(slave_status.get('logtail')) |
1964 | - |
1965 | - method = builder_status_handlers[builder_status] |
1966 | - try: |
1967 | - method(queueItem, slave_status, logtail, logger) |
1968 | - except TypeError, e: |
1969 | - logger.critical("Received wrong number of args in response.") |
1970 | - logger.exception(e) |
1971 | + raise BuildSlaveFailure(info) |
1972 | + |
1973 | + def got_status(slave_status): |
1974 | + builder_status_handlers = { |
1975 | + 'BuilderStatus.IDLE': self.updateBuild_IDLE, |
1976 | + 'BuilderStatus.BUILDING': self.updateBuild_BUILDING, |
1977 | + 'BuilderStatus.ABORTING': self.updateBuild_ABORTING, |
1978 | + 'BuilderStatus.ABORTED': self.updateBuild_ABORTED, |
1979 | + 'BuilderStatus.WAITING': self.updateBuild_WAITING, |
1980 | + } |
1981 | + |
1982 | + builder_status = slave_status['builder_status'] |
1983 | + if builder_status not in builder_status_handlers: |
1984 | + logger.critical( |
1985 | + "Builder on %s returned unknown status %s, failing it" |
1986 | + % (self._builder.url, builder_status)) |
1987 | + self._builder.failBuilder( |
1988 | + "Unknown status code (%s) returned from status() probe." |
1989 | + % builder_status) |
1990 | + # XXX: This will leave the build and job in a bad state, but |
1991 | + # should never be possible, since our builder statuses are |
1992 | + # known. |
1993 | + queueItem._builder = None |
1994 | + queueItem.setDateStarted(None) |
1995 | + return |
1996 | + |
1997 | + # Since logtail is a xmlrpclib.Binary container and it is |
1998 | + # returned from the IBuilder content class, it arrives |
1999 | + # protected by a Zope Security Proxy, which is not declared, |
2000 | + # thus empty. Before passing it to the status handlers we |
2001 | + # will simply remove the proxy. |
2002 | + logtail = removeSecurityProxy(slave_status.get('logtail')) |
2003 | + |
2004 | + method = builder_status_handlers[builder_status] |
2005 | + return defer.maybeDeferred( |
2006 | + method, queueItem, slave_status, logtail, logger) |
2007 | + |
2008 | + d.addErrback(got_failure) |
2009 | + d.addCallback(got_status) |
2010 | + return d |
2011 | |
2012 | def updateBuild_IDLE(self, queueItem, slave_status, logtail, logger): |
2013 | """Somehow the builder forgot about the build job. |
2014 | @@ -146,11 +150,13 @@ |
2015 | |
2016 | Clean the builder for other jobs. |
2017 | """ |
2018 | - queueItem.builder.cleanSlave() |
2019 | - queueItem.builder = None |
2020 | - if queueItem.job.status != JobStatus.FAILED: |
2021 | - queueItem.job.fail() |
2022 | - queueItem.specific_job.jobAborted() |
2023 | + d = queueItem.builder.cleanSlave() |
2024 | + def got_cleaned(ignored): |
2025 | + queueItem.builder = None |
2026 | + if queueItem.job.status != JobStatus.FAILED: |
2027 | + queueItem.job.fail() |
2028 | + queueItem.specific_job.jobAborted() |
2029 | + return d.addCallback(got_cleaned) |
2030 | |
2031 | def extractBuildStatus(self, slave_status): |
2032 | """Read build status name. |
2033 | @@ -185,6 +191,8 @@ |
2034 | # XXX: dsilvers 2005-03-02: Confirm the builder has the right build? |
2035 | |
2036 | build = queueItem.specific_job.build |
2037 | + # XXX 2010-10-18 bug=662631 |
2038 | + # Change this to do non-blocking IO. |
2039 | build.handleStatus(build_status, librarian, slave_status) |
2040 | |
2041 | |
2042 | |
2043 | === modified file 'lib/lp/buildmaster/model/packagebuild.py' |
2044 | --- lib/lp/buildmaster/model/packagebuild.py 2010-10-02 11:41:43 +0000 |
2045 | +++ lib/lp/buildmaster/model/packagebuild.py 2010-10-25 19:14:01 +0000 |
2046 | @@ -165,6 +165,8 @@ |
2047 | def getLogFromSlave(package_build): |
2048 | """See `IPackageBuild`.""" |
2049 | builder = package_build.buildqueue_record.builder |
2050 | + # XXX 2010-10-18 bug=662631 |
2051 | + # Change this to do non-blocking IO. |
2052 | return builder.transferSlaveFileToLibrarian( |
2053 | SLAVE_LOG_FILENAME, |
2054 | package_build.buildqueue_record.getLogFileName(), |
2055 | @@ -180,6 +182,8 @@ |
2056 | # log, builder and date_finished are read-only, so we must |
2057 | # currently remove the security proxy to set them. |
2058 | naked_build = removeSecurityProxy(build) |
2059 | + # XXX 2010-10-18 bug=662631 |
2060 | + # Change this to do non-blocking IO. |
2061 | naked_build.log = build.getLogFromSlave(build) |
2062 | naked_build.builder = build.buildqueue_record.builder |
2063 | # XXX cprov 20060615 bug=120584: Currently buildduration includes |
2064 | @@ -276,6 +280,8 @@ |
2065 | logger.critical("Unknown BuildStatus '%s' for builder '%s'" |
2066 | % (status, self.buildqueue_record.builder.url)) |
2067 | return |
2068 | + # XXX 2010-10-18 bug=662631 |
2069 | + # Change this to do non-blocking IO. |
2070 | method(librarian, slave_status, logger) |
2071 | |
2072 | def _handleStatus_OK(self, librarian, slave_status, logger): |
2073 | |
2074 | === modified file 'lib/lp/buildmaster/tests/mock_slaves.py' |
2075 | --- lib/lp/buildmaster/tests/mock_slaves.py 2010-09-23 12:35:21 +0000 |
2076 | +++ lib/lp/buildmaster/tests/mock_slaves.py 2010-10-25 19:14:01 +0000 |
2077 | @@ -6,21 +6,40 @@ |
2078 | __metaclass__ = type |
2079 | |
2080 | __all__ = [ |
2081 | + 'AbortedSlave', |
2082 | + 'AbortingSlave', |
2083 | + 'BrokenSlave', |
2084 | + 'BuildingSlave', |
2085 | + 'CorruptBehavior', |
2086 | + 'DeadProxy', |
2087 | + 'LostBuildingBrokenSlave', |
2088 | 'MockBuilder', |
2089 | - 'LostBuildingBrokenSlave', |
2090 | - 'BrokenSlave', |
2091 | 'OkSlave', |
2092 | - 'BuildingSlave', |
2093 | - 'AbortedSlave', |
2094 | + 'SlaveTestHelpers', |
2095 | + 'TrivialBehavior', |
2096 | 'WaitingSlave', |
2097 | - 'AbortingSlave', |
2098 | ] |
2099 | |
2100 | +import fixtures |
2101 | +import os |
2102 | + |
2103 | from StringIO import StringIO |
2104 | import xmlrpclib |
2105 | |
2106 | -from lp.buildmaster.interfaces.builder import CannotFetchFile |
2107 | +from testtools.content import Content |
2108 | +from testtools.content_type import UTF8_TEXT |
2109 | + |
2110 | +from twisted.internet import defer |
2111 | +from twisted.web import xmlrpc |
2112 | + |
2113 | +from canonical.buildd.tests.harness import BuilddSlaveTestSetup |
2114 | + |
2115 | +from lp.buildmaster.interfaces.builder import ( |
2116 | + CannotFetchFile, |
2117 | + CorruptBuildCookie, |
2118 | + ) |
2119 | from lp.buildmaster.model.builder import ( |
2120 | + BuilderSlave, |
2121 | rescueBuilderIfLost, |
2122 | updateBuilderStatus, |
2123 | ) |
2124 | @@ -59,15 +78,9 @@ |
2125 | slave_build_id) |
2126 | |
2127 | def cleanSlave(self): |
2128 | - # XXX: This should not print anything. The print is only here to make |
2129 | - # doc/builder.txt a meaningful test. |
2130 | - print 'Cleaning slave' |
2131 | return self.slave.clean() |
2132 | |
2133 | def requestAbort(self): |
2134 | - # XXX: This should not print anything. The print is only here to make |
2135 | - # doc/builder.txt a meaningful test. |
2136 | - print 'Aborting slave' |
2137 | return self.slave.abort() |
2138 | |
2139 | def resumeSlave(self, logger): |
2140 | @@ -77,10 +90,10 @@ |
2141 | pass |
2142 | |
2143 | def rescueIfLost(self, logger=None): |
2144 | - rescueBuilderIfLost(self, logger) |
2145 | + return rescueBuilderIfLost(self, logger) |
2146 | |
2147 | def updateStatus(self, logger=None): |
2148 | - updateBuilderStatus(self, logger) |
2149 | + return defer.maybeDeferred(updateBuilderStatus, self, logger) |
2150 | |
2151 | |
2152 | # XXX: It would be *really* nice to run some set of tests against the real |
2153 | @@ -95,36 +108,44 @@ |
2154 | self.arch_tag = arch_tag |
2155 | |
2156 | def status(self): |
2157 | - return ('BuilderStatus.IDLE', '') |
2158 | + return defer.succeed(('BuilderStatus.IDLE', '')) |
2159 | |
2160 | def ensurepresent(self, sha1, url, user=None, password=None): |
2161 | self.call_log.append(('ensurepresent', url, user, password)) |
2162 | - return True, None |
2163 | + return defer.succeed((True, None)) |
2164 | |
2165 | def build(self, buildid, buildtype, chroot, filemap, args): |
2166 | self.call_log.append( |
2167 | ('build', buildid, buildtype, chroot, filemap.keys(), args)) |
2168 | info = 'OkSlave BUILDING' |
2169 | - return ('BuildStatus.Building', info) |
2170 | + return defer.succeed(('BuildStatus.Building', info)) |
2171 | |
2172 | def echo(self, *args): |
2173 | self.call_log.append(('echo',) + args) |
2174 | - return args |
2175 | + return defer.succeed(args) |
2176 | |
2177 | def clean(self): |
2178 | self.call_log.append('clean') |
2179 | + return defer.succeed(None) |
2180 | |
2181 | def abort(self): |
2182 | self.call_log.append('abort') |
2183 | + return defer.succeed(None) |
2184 | |
2185 | def info(self): |
2186 | self.call_log.append('info') |
2187 | - return ('1.0', self.arch_tag, 'debian') |
2188 | + return defer.succeed(('1.0', self.arch_tag, 'debian')) |
2189 | + |
2190 | + def resume(self): |
2191 | + self.call_log.append('resume') |
2192 | + return defer.succeed(("", "", 0)) |
2193 | |
2194 | def sendFileToSlave(self, sha1, url, username="", password=""): |
2195 | - present, info = self.ensurepresent(sha1, url, username, password) |
2196 | - if not present: |
2197 | - raise CannotFetchFile(url, info) |
2198 | + d = self.ensurepresent(sha1, url, username, password) |
2199 | + def check_present((present, info)): |
2200 | + if not present: |
2201 | + raise CannotFetchFile(url, info) |
2202 | + return d.addCallback(check_present) |
2203 | |
2204 | def cacheFile(self, logger, libraryfilealias): |
2205 | return self.sendFileToSlave( |
2206 | @@ -141,9 +162,11 @@ |
2207 | def status(self): |
2208 | self.call_log.append('status') |
2209 | buildlog = xmlrpclib.Binary("This is a build log") |
2210 | - return ('BuilderStatus.BUILDING', self.build_id, buildlog) |
2211 | + return defer.succeed( |
2212 | + ('BuilderStatus.BUILDING', self.build_id, buildlog)) |
2213 | |
2214 | def getFile(self, sum): |
2215 | + # XXX: This needs to be updated to return a Deferred. |
2216 | self.call_log.append('getFile') |
2217 | if sum == "buildlog": |
2218 | s = StringIO("This is a build log") |
2219 | @@ -155,11 +178,15 @@ |
2220 | """A mock slave that looks like it's currently waiting.""" |
2221 | |
2222 | def __init__(self, state='BuildStatus.OK', dependencies=None, |
2223 | - build_id='1-1'): |
2224 | + build_id='1-1', filemap=None): |
2225 | super(WaitingSlave, self).__init__() |
2226 | self.state = state |
2227 | self.dependencies = dependencies |
2228 | self.build_id = build_id |
2229 | + if filemap is None: |
2230 | + self.filemap = {} |
2231 | + else: |
2232 | + self.filemap = filemap |
2233 | |
2234 | # By default, the slave only has a buildlog, but callsites |
2235 | # can update this list as needed. |
2236 | @@ -167,10 +194,12 @@ |
2237 | |
2238 | def status(self): |
2239 | self.call_log.append('status') |
2240 | - return ('BuilderStatus.WAITING', self.state, self.build_id, {}, |
2241 | - self.dependencies) |
2242 | + return defer.succeed(( |
2243 | + 'BuilderStatus.WAITING', self.state, self.build_id, self.filemap, |
2244 | + self.dependencies)) |
2245 | |
2246 | def getFile(self, hash): |
2247 | + # XXX: This needs to be updated to return a Deferred. |
2248 | self.call_log.append('getFile') |
2249 | if hash in self.valid_file_hashes: |
2250 | content = "This is a %s" % hash |
2251 | @@ -184,15 +213,19 @@ |
2252 | |
2253 | def status(self): |
2254 | self.call_log.append('status') |
2255 | - return ('BuilderStatus.ABORTING', '1-1') |
2256 | + return defer.succeed(('BuilderStatus.ABORTING', '1-1')) |
2257 | |
2258 | |
2259 | class AbortedSlave(OkSlave): |
2260 | """A mock slave that looks like it's aborted.""" |
2261 | |
2262 | + def clean(self): |
2263 | + self.call_log.append('clean') |
2264 | + return defer.succeed(None) |
2265 | + |
2266 | def status(self): |
2267 | - self.call_log.append('status') |
2268 | - return ('BuilderStatus.ABORTED', '1-1') |
2269 | + self.call_log.append('status') |
2270 | + return defer.succeed(('BuilderStatus.ABORTED', '1-1')) |
2271 | |
2272 | |
2273 | class LostBuildingBrokenSlave: |
2274 | @@ -206,16 +239,108 @@ |
2275 | |
2276 | def status(self): |
2277 | self.call_log.append('status') |
2278 | - return ('BuilderStatus.BUILDING', '1000-10000') |
2279 | + return defer.succeed(('BuilderStatus.BUILDING', '1000-10000')) |
2280 | |
2281 | def abort(self): |
2282 | self.call_log.append('abort') |
2283 | - raise xmlrpclib.Fault(8002, "Could not abort") |
2284 | + return defer.fail(xmlrpclib.Fault(8002, "Could not abort")) |
2285 | |
2286 | |
2287 | class BrokenSlave: |
2288 | """A mock slave that reports that it is broken.""" |
2289 | |
2290 | + def __init__(self): |
2291 | + self.call_log = [] |
2292 | + |
2293 | def status(self): |
2294 | self.call_log.append('status') |
2295 | - raise xmlrpclib.Fault(8001, "Broken slave") |
2296 | + return defer.fail(xmlrpclib.Fault(8001, "Broken slave")) |
2297 | + |
2298 | + |
2299 | +class CorruptBehavior: |
2300 | + |
2301 | + def verifySlaveBuildCookie(self, cookie): |
2302 | + raise CorruptBuildCookie("Bad value: %r" % (cookie,)) |
2303 | + |
2304 | + |
2305 | +class TrivialBehavior: |
2306 | + |
2307 | + def verifySlaveBuildCookie(self, cookie): |
2308 | + pass |
2309 | + |
2310 | + |
2311 | +class DeadProxy(xmlrpc.Proxy): |
2312 | + """An xmlrpc.Proxy that doesn't actually send any messages. |
2313 | + |
2314 | + Used when you want to test timeouts, for example. |
2315 | + """ |
2316 | + |
2317 | + def callRemote(self, *args, **kwargs): |
2318 | + return defer.Deferred() |
2319 | + |
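
DeadProxy pairs naturally with a fake reactor: its callRemote() returns a Deferred that never fires, so advancing a task.Clock past the configured socket timeout exercises the cancellation path in _with_timeout(). A sketch of such a test; the helper signatures follow the code in this diff, but the snippet itself is illustrative:

    from twisted.internet import task
    from twisted.internet.defer import CancelledError

    from canonical.config import config
    from lp.buildmaster.model.builder import BuilderSlave
    from lp.buildmaster.tests.mock_slaves import DeadProxy

    clock = task.Clock()
    slave = BuilderSlave.makeBuilderSlave(
        'http://localhost:8221', 'vmhost', reactor=clock,
        proxy=DeadProxy('http://localhost:8221/rpc'))
    d = slave.echo('hello')       # DeadProxy never answers
    clock.advance(config.builddmaster.socket_timeout + 1)
    # The timeout cancelled d; its errback fires with CancelledError.
    d.addErrback(lambda f: f.trap(CancelledError))
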
2320 | + |
2321 | +class SlaveTestHelpers(fixtures.Fixture): |
2322 | + |
2323 | + # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`. |
2324 | + BASE_URL = 'http://localhost:8221' |
2325 | + TEST_URL = '%s/rpc/' % (BASE_URL,) |
2326 | + |
2327 | + def getServerSlave(self): |
2328 | + """Set up a test build slave server. |
2329 | + |
2330 | + :return: A `BuilddSlaveTestSetup` object. |
2331 | + """ |
2332 | + tachandler = BuilddSlaveTestSetup() |
2333 | + tachandler.setUp() |
2334 | + # Basically impossible to do this w/ TrialTestCase. But it would be |
2335 | + # really nice to keep it. |
2336 | + # |
2337 | + # def addLogFile(exc_info): |
2338 | + # self.addDetail( |
2339 | + # 'xmlrpc-log-file', |
2340 | + # Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read())) |
2341 | + # self.addOnException(addLogFile) |
2342 | + self.addCleanup(tachandler.tearDown) |
2343 | + return tachandler |
2344 | + |
2345 | + def getClientSlave(self, reactor=None, proxy=None): |
2346 | + """Return a `BuilderSlave` for use in testing. |
2347 | + |
2348 | + Points to a fixed URL that is also used by `BuilddSlaveTestSetup`. |
2349 | + """ |
2350 | + return BuilderSlave.makeBuilderSlave( |
2351 | + self.TEST_URL, 'vmhost', reactor, proxy) |
2352 | + |
2353 | + def makeCacheFile(self, tachandler, filename): |
2354 | + """Make a cache file available on the remote slave. |
2355 | + |
2356 | + :param tachandler: The TacTestSetup object used to start the remote |
2357 | + slave. |
2358 | + :param filename: The name of the file to create in the file cache |
2359 | + area. |
2360 | + """ |
2361 | + path = os.path.join(tachandler.root, 'filecache', filename) |
2362 | + fd = open(path, 'w') |
2363 | + fd.write('something') |
2364 | + fd.close() |
2365 | + self.addCleanup(os.unlink, path) |
2366 | + |
2367 | + def triggerGoodBuild(self, slave, build_id=None): |
2368 | + """Trigger a good build on 'slave'. |
2369 | + |
2370 | + :param slave: A `BuilderSlave` instance to trigger the build on. |
2371 | + :param build_id: The build identifier. If not specified, defaults to |
2372 | + an arbitrary string. |
2373 | + :type build_id: str |
2374 | + :return: The build id returned by the slave. |
2375 | + """ |
2376 | + if build_id is None: |
2377 | + build_id = 'random-build-id' |
2378 | + tachandler = self.getServerSlave() |
2379 | + chroot_file = 'fake-chroot' |
2380 | + dsc_file = 'thing' |
2381 | + self.makeCacheFile(tachandler, chroot_file) |
2382 | + self.makeCacheFile(tachandler, dsc_file) |
2383 | + return slave.build( |
2384 | + build_id, 'debian', chroot_file, {'.dsc': dsc_file}, |
2385 | + {'ogrecomponent': 'main'}) |
2386 | |
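SlaveTestHelpers packages the server and client slave setup as a fixtures.Fixture, so test classes compose it instead of inheriting from a shared base class. A sketch of the intended composition, assuming trial's TestCase (which provides addCleanup); the class and test below are illustrative, not part of the branch:

    from twisted.trial.unittest import TestCase as TrialTestCase

    class ExampleSlaveTest(TrialTestCase):

        def setUp(self):
            super(ExampleSlaveTest, self).setUp()
            self.slave_helper = SlaveTestHelpers()
            self.slave_helper.setUp()
            self.addCleanup(self.slave_helper.cleanUp)

        def test_echo(self):
            self.slave_helper.getServerSlave()
            slave = self.slave_helper.getClientSlave()
            d = slave.echo('ping')
            return d.addCallback(self.assertEqual, ['ping'])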
2387 | === modified file 'lib/lp/buildmaster/tests/test_builder.py' |
2388 | --- lib/lp/buildmaster/tests/test_builder.py 2010-10-06 09:06:30 +0000 |
2389 | +++ lib/lp/buildmaster/tests/test_builder.py 2010-10-25 19:14:01 +0000 |
2390 | @@ -3,20 +3,24 @@ |
2391 | |
2392 | """Test Builder features.""" |
2393 | |
2394 | -import errno |
2395 | import os |
2396 | -import socket |
2397 | +import signal |
2398 | import xmlrpclib |
2399 | |
2400 | -from testtools.content import Content |
2401 | -from testtools.content_type import UTF8_TEXT |
2402 | +from twisted.web.client import getPage |
2403 | + |
2404 | +from twisted.internet.defer import CancelledError |
2405 | +from twisted.internet.task import Clock |
2406 | +from twisted.python.failure import Failure |
2407 | +from twisted.trial.unittest import TestCase as TrialTestCase |
2408 | |
2409 | from zope.component import getUtility |
2410 | from zope.security.proxy import removeSecurityProxy |
2411 | |
2412 | from canonical.buildd.slave import BuilderStatus |
2413 | -from canonical.buildd.tests.harness import BuilddSlaveTestSetup |
2414 | +from canonical.config import config |
2415 | from canonical.database.sqlbase import flush_database_updates |
2416 | +from canonical.launchpad.scripts import QuietFakeLogger |
2417 | from canonical.launchpad.webapp.interfaces import ( |
2418 | DEFAULT_FLAVOR, |
2419 | IStoreSelector, |
2420 | @@ -24,21 +28,38 @@ |
2421 | ) |
2422 | from canonical.testing.layers import ( |
2423 | DatabaseFunctionalLayer, |
2424 | - LaunchpadZopelessLayer |
2425 | + LaunchpadZopelessLayer, |
2426 | + TwistedLaunchpadZopelessLayer, |
2427 | + TwistedLayer, |
2428 | ) |
2429 | from lp.buildmaster.enums import BuildStatus |
2430 | -from lp.buildmaster.interfaces.builder import IBuilder, IBuilderSet |
2431 | +from lp.buildmaster.interfaces.builder import ( |
2432 | + CannotFetchFile, |
2433 | + IBuilder, |
2434 | + IBuilderSet, |
2435 | + ) |
2436 | from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
2437 | IBuildFarmJobBehavior, |
2438 | ) |
2439 | from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
2440 | -from lp.buildmaster.model.builder import BuilderSlave |
2441 | +from lp.buildmaster.interfaces.builder import CannotResumeHost |
2442 | from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior |
2443 | from lp.buildmaster.model.buildqueue import BuildQueue |
2444 | from lp.buildmaster.tests.mock_slaves import ( |
2445 | AbortedSlave, |
2446 | + AbortingSlave, |
2447 | + BrokenSlave, |
2448 | + BuildingSlave, |
2449 | + CorruptBehavior, |
2450 | + DeadProxy, |
2451 | + LostBuildingBrokenSlave, |
2452 | MockBuilder, |
2453 | + OkSlave, |
2454 | + SlaveTestHelpers, |
2455 | + TrivialBehavior, |
2456 | + WaitingSlave, |
2457 | ) |
2458 | +from lp.services.job.interfaces.job import JobStatus |
2459 | from lp.soyuz.enums import ( |
2460 | ArchivePurpose, |
2461 | PackagePublishingStatus, |
2462 | @@ -49,9 +70,12 @@ |
2463 | ) |
2464 | from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
2465 | from lp.testing import ( |
2466 | - TestCase, |
2467 | + ANONYMOUS, |
2468 | + login_as, |
2469 | + logout, |
2470 | TestCaseWithFactory, |
2471 | ) |
2472 | +from lp.testing.factory import LaunchpadObjectFactory |
2473 | from lp.testing.fakemethod import FakeMethod |
2474 | |
2475 | |
2476 | @@ -92,42 +116,121 @@ |
2477 | bq = builder.getBuildQueue() |
2478 | self.assertIs(None, bq) |
2479 | |
2480 | - def test_updateBuilderStatus_catches_repeated_EINTR(self): |
2481 | - # A single EINTR return from a socket operation should cause the |
2482 | - # operation to be retried, not fail/reset the builder. |
2483 | - builder = removeSecurityProxy(self.factory.makeBuilder()) |
2484 | - builder.handleTimeout = FakeMethod() |
2485 | - builder.rescueIfLost = FakeMethod() |
2486 | - |
2487 | - def _fake_checkSlaveAlive(): |
2488 | - # Raise an EINTR error for all invocations. |
2489 | - raise socket.error(errno.EINTR, "fake eintr") |
2490 | - |
2491 | - builder.checkSlaveAlive = _fake_checkSlaveAlive |
2492 | - builder.updateStatus() |
2493 | - |
2494 | - # builder.updateStatus should eventually have called |
2495 | - # handleTimeout() |
2496 | - self.assertEqual(1, builder.handleTimeout.call_count) |
2497 | - |
2498 | - def test_updateBuilderStatus_catches_single_EINTR(self): |
2499 | - builder = removeSecurityProxy(self.factory.makeBuilder()) |
2500 | - builder.handleTimeout = FakeMethod() |
2501 | - builder.rescueIfLost = FakeMethod() |
2502 | - self.eintr_returned = False |
2503 | - |
2504 | - def _fake_checkSlaveAlive(): |
2505 | - # raise an EINTR error for the first invocation only. |
2506 | - if not self.eintr_returned: |
2507 | - self.eintr_returned = True |
2508 | - raise socket.error(errno.EINTR, "fake eintr") |
2509 | - |
2510 | - builder.checkSlaveAlive = _fake_checkSlaveAlive |
2511 | - builder.updateStatus() |
2512 | - |
2513 | - # builder.updateStatus should never call handleTimeout() for a |
2514 | - # single EINTR. |
2515 | - self.assertEqual(0, builder.handleTimeout.call_count) |
2516 | + |
2517 | +class TestBuilderWithTrial(TrialTestCase): |
2518 | + |
2519 | + layer = TwistedLaunchpadZopelessLayer |
2520 | + |
2521 | + def setUp(self): |
2522 | + super(TestBuilderWithTrial, self).setUp() |
2523 | + self.slave_helper = SlaveTestHelpers() |
2524 | + self.slave_helper.setUp() |
2525 | + self.addCleanup(self.slave_helper.cleanUp) |
2526 | + self.factory = LaunchpadObjectFactory() |
2527 | + login_as(ANONYMOUS) |
2528 | + self.addCleanup(logout) |
2529 | + |
2530 | + def test_updateStatus_aborts_lost_and_broken_slave(self): |
2531 | + # A slave that's 'lost' should be aborted; when the slave is |
2532 | + # broken, abort() should also raise a Fault. |
2533 | + slave = LostBuildingBrokenSlave() |
2534 | + lostbuilding_builder = MockBuilder( |
2535 | + 'Lost Building Broken Slave', slave, behavior=CorruptBehavior()) |
2536 | + d = lostbuilding_builder.updateStatus(QuietFakeLogger()) |
2537 | + def check_slave_status(failure): |
2538 | + self.assertIn('abort', slave.call_log) |
2539 | + # 'Fault' comes from the LostBuildingBrokenSlave, this is |
2540 | + # just testing that the value is passed through. |
2541 | + self.assertIsInstance(failure.value, xmlrpclib.Fault) |
2542 | + return d.addBoth(check_slave_status) |
2543 | + |
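The test above shows the convention used throughout the converted tests: trial waits on whatever Deferred the test method returns, so assertions move into callbacks attached to the chain instead of running inline. A self-contained sketch of the idiom:

    from twisted.internet import defer
    from twisted.trial.unittest import TestCase

    class DeferredIdiomTest(TestCase):
        # Illustrative only: return the Deferred so trial waits for it.

        def test_assertion_in_callback(self):
            d = defer.succeed(42)
            def check(result):
                self.assertEqual(42, result)
            return d.addCallback(check)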
2544 | + def test_resumeSlaveHost_nonvirtual(self): |
2545 | + builder = self.factory.makeBuilder(virtualized=False) |
2546 | + d = builder.resumeSlaveHost() |
2547 | + return self.assertFailure(d, CannotResumeHost) |
2548 | + |
2549 | + def test_resumeSlaveHost_no_vmhost(self): |
2550 | + builder = self.factory.makeBuilder(virtualized=True, vm_host=None) |
2551 | + d = builder.resumeSlaveHost() |
2552 | + return self.assertFailure(d, CannotResumeHost) |
2553 | + |
2554 | + def test_resumeSlaveHost_success(self): |
2555 | + reset_config = """ |
2556 | + [builddmaster] |
2557 | + vm_resume_command: /bin/echo -n parp""" |
2558 | + config.push('reset', reset_config) |
2559 | + self.addCleanup(config.pop, 'reset') |
2560 | + |
2561 | + builder = self.factory.makeBuilder(virtualized=True, vm_host="pop") |
2562 | + d = builder.resumeSlaveHost() |
2563 | + def got_resume(output): |
2564 | + self.assertEqual(('parp', ''), output) |
2565 | + return d.addCallback(got_resume) |
2566 | + |
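These tests lean on the config.push()/config.pop() overlay mechanism: push a named snippet to override a key, then pop it (usually via addCleanup) so later tests see the original value. A standalone sketch of the same pattern, assuming the canonical.config API used above; the overlay name is arbitrary:

    from canonical.config import config

    overlay = """
        [builddmaster]
        vm_resume_command: /bin/echo -n parp"""
    config.push('resume-overlay', overlay)
    try:
        # Reads '/bin/echo -n parp' until the overlay is popped.
        resume_command = config.builddmaster.vm_resume_command
    finally:
        config.pop('resume-overlay')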
2567 | + def test_resumeSlaveHost_command_failed(self): |
2568 | + reset_fail_config = """ |
2569 | + [builddmaster] |
2570 | + vm_resume_command: /bin/false""" |
2571 | + config.push('reset fail', reset_fail_config) |
2572 | + self.addCleanup(config.pop, 'reset fail') |
2573 | + builder = self.factory.makeBuilder(virtualized=True, vm_host="pop") |
2574 | + d = builder.resumeSlaveHost() |
2575 | + return self.assertFailure(d, CannotResumeHost) |
2576 | + |
2577 | + def test_handleTimeout_resume_failure(self): |
2578 | + reset_fail_config = """ |
2579 | + [builddmaster] |
2580 | + vm_resume_command: /bin/false""" |
2581 | + config.push('reset fail', reset_fail_config) |
2582 | + self.addCleanup(config.pop, 'reset fail') |
2583 | + builder = self.factory.makeBuilder(virtualized=True, vm_host="pop") |
2584 | + builder.builderok = True |
2585 | + d = builder.handleTimeout(QuietFakeLogger(), 'blah') |
2586 | + return self.assertFailure(d, CannotResumeHost) |
2587 | + |
2588 | + def _setupRecipeBuildAndBuilder(self): |
2589 | + # Helper function to make a builder capable of building a |
2590 | + # recipe, returning both. |
2591 | + processor = self.factory.makeProcessor(name="i386") |
2592 | + builder = self.factory.makeBuilder( |
2593 | + processor=processor, virtualized=True, vm_host="bladh") |
2594 | + builder.setSlaveForTesting(OkSlave()) |
2595 | + distroseries = self.factory.makeDistroSeries() |
2596 | + das = self.factory.makeDistroArchSeries( |
2597 | + distroseries=distroseries, architecturetag="i386", |
2598 | + processorfamily=processor.family) |
2599 | + chroot = self.factory.makeLibraryFileAlias() |
2600 | + das.addOrUpdateChroot(chroot) |
2601 | + distroseries.nominatedarchindep = das |
2602 | + build = self.factory.makeSourcePackageRecipeBuild( |
2603 | + distroseries=distroseries) |
2604 | + return builder, build |
2605 | + |
2606 | + def test_findAndStartJob_returns_candidate(self): |
2607 | + # findAndStartJob finds the next queued job using _findBuildCandidate. |
2608 | + # We don't care about the type of build at all. |
2609 | + builder, build = self._setupRecipeBuildAndBuilder() |
2610 | + candidate = build.queueBuild() |
2611 | + # _findBuildCandidate is tested elsewhere; we just make sure that |
2612 | + # findAndStartJob delegates to it. |
2613 | + removeSecurityProxy(builder)._findBuildCandidate = FakeMethod( |
2614 | + result=candidate) |
2615 | + d = builder.findAndStartJob() |
2616 | + return d.addCallback(self.assertEqual, candidate) |
2617 | + |
2618 | + def test_findAndStartJob_starts_job(self): |
2619 | + # findAndStartJob finds the next queued job using _findBuildCandidate |
2620 | + # and then starts it. |
2621 | + # We don't care about the type of build at all. |
2622 | + builder, build = self._setupRecipeBuildAndBuilder() |
2623 | + candidate = build.queueBuild() |
2624 | + removeSecurityProxy(builder)._findBuildCandidate = FakeMethod( |
2625 | + result=candidate) |
2626 | + d = builder.findAndStartJob() |
2627 | + def check_build_started(candidate): |
2628 | + self.assertEqual(candidate.builder, builder) |
2629 | + self.assertEqual(BuildStatus.BUILDING, build.status) |
2630 | + return d.addCallback(check_build_started) |
2631 | |
2632 | def test_slave(self): |
2633 | # Builder.slave is a BuilderSlave that points at the actual Builder. |
2634 | @@ -136,25 +239,147 @@ |
2635 | builder = removeSecurityProxy(self.factory.makeBuilder()) |
2636 | self.assertEqual(builder.url, builder.slave.url) |
2637 | |
2638 | - |
2639 | -class Test_rescueBuilderIfLost(TestCaseWithFactory): |
2640 | - """Tests for lp.buildmaster.model.builder.rescueBuilderIfLost.""" |
2641 | - |
2642 | - layer = LaunchpadZopelessLayer |
2643 | - |
2644 | def test_recovery_of_aborted_slave(self): |
2645 | # If a slave is in the ABORTED state, rescueBuilderIfLost should |
2646 | # clean it if we don't think it's currently building anything. |
2647 | # See bug 463046. |
2648 | aborted_slave = AbortedSlave() |
2649 | - # The slave's clean() method is normally an XMLRPC call, so we |
2650 | - # can just stub it out and check that it got called. |
2651 | - aborted_slave.clean = FakeMethod() |
2652 | builder = MockBuilder("mock_builder", aborted_slave) |
2653 | builder.currentjob = None |
2654 | - builder.rescueIfLost() |
2655 | - |
2656 | - self.assertEqual(1, aborted_slave.clean.call_count) |
2657 | + d = builder.rescueIfLost() |
2658 | + def check_slave_calls(ignored): |
2659 | + self.assertIn('clean', aborted_slave.call_log) |
2660 | + return d.addCallback(check_slave_calls) |
2661 | + |
2662 | + def test_recover_ok_slave(self): |
2663 | + # An idle slave is not rescued. |
2664 | + slave = OkSlave() |
2665 | + builder = MockBuilder("mock_builder", slave, TrivialBehavior()) |
2666 | + d = builder.rescueIfLost() |
2667 | + def check_slave_calls(ignored): |
2668 | + self.assertNotIn('abort', slave.call_log) |
2669 | + self.assertNotIn('clean', slave.call_log) |
2670 | + return d.addCallback(check_slave_calls) |
2671 | + |
2672 | + def test_recover_waiting_slave_with_good_id(self): |
2673 | + # rescueIfLost does not attempt to abort or clean a builder that is |
2674 | + # WAITING. |
2675 | + waiting_slave = WaitingSlave() |
2676 | + builder = MockBuilder("mock_builder", waiting_slave, TrivialBehavior()) |
2677 | + d = builder.rescueIfLost() |
2678 | + def check_slave_calls(ignored): |
2679 | + self.assertNotIn('abort', waiting_slave.call_log) |
2680 | + self.assertNotIn('clean', waiting_slave.call_log) |
2681 | + return d.addCallback(check_slave_calls) |
2682 | + |
2683 | + def test_recover_waiting_slave_with_bad_id(self): |
2684 | + # If a slave is WAITING with a build for us to get, and the build |
2685 | + # cookie cannot be verified, which means we don't recognize the build, |
2686 | + # then rescueIfLost should clean the slave, so that the |
2687 | + # builder is reset for a new build, and the corrupt build is |
2688 | + # discarded. |
2689 | + waiting_slave = WaitingSlave() |
2690 | + builder = MockBuilder("mock_builder", waiting_slave, CorruptBehavior()) |
2691 | + d = builder.rescueIfLost() |
2692 | + def check_slave_calls(ignored): |
2693 | + self.assertNotIn('abort', waiting_slave.call_log) |
2694 | + self.assertIn('clean', waiting_slave.call_log) |
2695 | + return d.addCallback(check_slave_calls) |
2696 | + |
2697 | + def test_recover_building_slave_with_good_id(self): |
2698 | + # rescueIfLost does not attempt to abort or clean a builder that is |
2699 | + # BUILDING. |
2700 | + building_slave = BuildingSlave() |
2701 | + builder = MockBuilder("mock_builder", building_slave, TrivialBehavior()) |
2702 | + d = builder.rescueIfLost() |
2703 | + def check_slave_calls(ignored): |
2704 | + self.assertNotIn('abort', building_slave.call_log) |
2705 | + self.assertNotIn('clean', building_slave.call_log) |
2706 | + return d.addCallback(check_slave_calls) |
2707 | + |
2708 | + def test_recover_building_slave_with_bad_id(self): |
2709 | + # If a slave is BUILDING with a build id we don't recognize, then we |
2710 | + # abort the build, thus stopping it in its tracks. |
2711 | + building_slave = BuildingSlave() |
2712 | + builder = MockBuilder("mock_builder", building_slave, CorruptBehavior()) |
2713 | + d = builder.rescueIfLost() |
2714 | + def check_slave_calls(ignored): |
2715 | + self.assertIn('abort', building_slave.call_log) |
2716 | + self.assertNotIn('clean', building_slave.call_log) |
2717 | + return d.addCallback(check_slave_calls) |
2718 | + |
2719 | + |
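Taken together, the recover tests above pin down a small decision table for rescueIfLost: if the behavior verifies the slave's build cookie, the slave is left alone; if the cookie is corrupt, a WAITING slave is cleaned and a BUILDING slave is aborted. A sketch of that logic for orientation (an illustration of what the tests require, not the branch's actual implementation; CorruptBuildCookie is the exception raised by CorruptBehavior in mock_slaves.py above):

    from twisted.internet import defer

    def rescue_if_lost_sketch(status, behavior, slave):
        # 'status' is the first element of the slave's status tuple.
        try:
            behavior.verifySlaveBuildCookie('the-cookie')
        except CorruptBuildCookie:
            if status == 'BuilderStatus.WAITING':
                return slave.clean()
            if status == 'BuilderStatus.BUILDING':
                return slave.abort()
        return defer.succeed(None)  # recognized build; nothing to rescue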
2720 | +class TestBuilderSlaveStatus(TestBuilderWithTrial): |
2721 | + |
2722 | + # Verify what IBuilder.slaveStatus returns with slaves in different |
2723 | + # states. |
2724 | + |
2725 | + def assertStatus(self, slave, builder_status=None, |
2726 | + build_status=None, logtail=False, filemap=None, |
2727 | + dependencies=None): |
2728 | + builder = self.factory.makeBuilder() |
2729 | + builder.setSlaveForTesting(slave) |
2730 | + d = builder.slaveStatus() |
2731 | + |
2732 | + def got_status(status_dict): |
2733 | + expected = {} |
2734 | + if builder_status is not None: |
2735 | + expected["builder_status"] = builder_status |
2736 | + if build_status is not None: |
2737 | + expected["build_status"] = build_status |
2738 | + if dependencies is not None: |
2739 | + expected["dependencies"] = dependencies |
2740 | + |
2741 | + # We don't care so much about the content of the logtail, |
2742 | + # just that it's there. |
2743 | + if logtail: |
2744 | + tail = status_dict.pop("logtail") |
2745 | + self.assertIsInstance(tail, xmlrpclib.Binary) |
2746 | + |
2747 | + self.assertEqual(expected, status_dict) |
2748 | + |
2749 | + return d.addCallback(got_status) |
2750 | + |
2751 | + def test_slaveStatus_idle_slave(self): |
2752 | + self.assertStatus( |
2753 | + OkSlave(), builder_status='BuilderStatus.IDLE') |
2754 | + |
2755 | + def test_slaveStatus_building_slave(self): |
2756 | + self.assertStatus( |
2757 | + BuildingSlave(), builder_status='BuilderStatus.BUILDING', |
2758 | + logtail=True) |
2759 | + |
2760 | + def test_slaveStatus_waiting_slave(self): |
2761 | + self.assertStatus( |
2762 | + WaitingSlave(), builder_status='BuilderStatus.WAITING', |
2763 | + build_status='BuildStatus.OK', filemap={}) |
2764 | + |
2765 | + def test_slaveStatus_aborting_slave(self): |
2766 | + self.assertStatus( |
2767 | + AbortingSlave(), builder_status='BuilderStatus.ABORTING') |
2768 | + |
2769 | + def test_slaveStatus_aborted_slave(self): |
2770 | + self.assertStatus( |
2771 | + AbortedSlave(), builder_status='BuilderStatus.ABORTED') |
2772 | + |
2773 | + def test_isAvailable_with_not_builderok(self): |
2774 | + # isAvailable() is a wrapper around slaveStatusSentence() |
2775 | + builder = self.factory.makeBuilder() |
2776 | + builder.builderok = False |
2777 | + d = builder.isAvailable() |
2778 | + return d.addCallback(self.assertFalse) |
2779 | + |
2780 | + def test_isAvailable_with_slave_fault(self): |
2781 | + builder = self.factory.makeBuilder() |
2782 | + builder.setSlaveForTesting(BrokenSlave()) |
2783 | + d = builder.isAvailable() |
2784 | + return d.addCallback(self.assertFalse) |
2785 | + |
2786 | + def test_isAvailable_with_slave_idle(self): |
2787 | + builder = self.factory.makeBuilder() |
2788 | + builder.setSlaveForTesting(OkSlave()) |
2789 | + d = builder.isAvailable() |
2790 | + return d.addCallback(self.assertTrue) |
2791 | |
2792 | |
2793 | class TestFindBuildCandidateBase(TestCaseWithFactory): |
2794 | @@ -188,6 +413,49 @@ |
2795 | builder.manual = False |
2796 | |
2797 | |
2798 | +class TestFindBuildCandidateGeneralCases(TestFindBuildCandidateBase): |
2799 | + # Test usage of findBuildCandidate not specific to any archive type. |
2800 | + |
2801 | + def test_findBuildCandidate_supersedes_builds(self): |
2802 | + # IBuilder._findBuildCandidate identifies if there are builds |
2803 | + # for superseded source package releases in the queue and marks |
2804 | + # the corresponding build record as SUPERSEDED. |
2805 | + archive = self.factory.makeArchive() |
2806 | + self.publisher.getPubSource( |
2807 | + sourcename="gedit", status=PackagePublishingStatus.PUBLISHED, |
2808 | + archive=archive).createMissingBuilds() |
2809 | + old_candidate = removeSecurityProxy( |
2810 | + self.frog_builder)._findBuildCandidate() |
2811 | + |
2812 | + # The candidate starts off as NEEDSBUILD: |
2813 | + build = getUtility(IBinaryPackageBuildSet).getByQueueEntry( |
2814 | + old_candidate) |
2815 | + self.assertEqual(BuildStatus.NEEDSBUILD, build.status) |
2816 | + |
2817 | + # Now supersede the source package: |
2818 | + publication = build.current_source_publication |
2819 | + publication.status = PackagePublishingStatus.SUPERSEDED |
2820 | + |
2821 | + # The candidate returned is now a different one: |
2822 | + new_candidate = removeSecurityProxy( |
2823 | + self.frog_builder)._findBuildCandidate() |
2824 | + self.assertNotEqual(new_candidate, old_candidate) |
2825 | + |
2826 | + # And the old_candidate is superseded: |
2827 | + self.assertEqual(BuildStatus.SUPERSEDED, build.status) |
2828 | + |
2829 | + def test_acquireBuildCandidate_marks_building(self): |
2830 | + # acquireBuildCandidate() should call _findBuildCandidate and |
2831 | + # mark the build as building. |
2832 | + archive = self.factory.makeArchive() |
2833 | + self.publisher.getPubSource( |
2834 | + sourcename="gedit", status=PackagePublishingStatus.PUBLISHED, |
2835 | + archive=archive).createMissingBuilds() |
2836 | + candidate = removeSecurityProxy( |
2837 | + self.frog_builder).acquireBuildCandidate() |
2838 | + self.assertEqual(JobStatus.RUNNING, candidate.job.status) |
2839 | + |
2840 | + |
2841 | class TestFindBuildCandidatePPAWithSingleBuilder(TestCaseWithFactory): |
2842 | |
2843 | layer = LaunchpadZopelessLayer |
2844 | @@ -320,6 +588,16 @@ |
2845 | build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job) |
2846 | self.failUnlessEqual('joesppa', build.archive.name) |
2847 | |
2848 | + def test_findBuildCandidate_with_disabled_archive(self): |
2849 | + # Disabled archives should not be considered for dispatching |
2850 | + # builds. |
2851 | + disabled_job = removeSecurityProxy(self.builder4)._findBuildCandidate() |
2852 | + build = getUtility(IBinaryPackageBuildSet).getByQueueEntry( |
2853 | + disabled_job) |
2854 | + build.archive.disable() |
2855 | + next_job = removeSecurityProxy(self.builder4)._findBuildCandidate() |
2856 | + self.assertNotEqual(disabled_job, next_job) |
2857 | + |
2858 | |
2859 | class TestFindBuildCandidatePrivatePPA(TestFindBuildCandidatePPABase): |
2860 | |
2861 | @@ -332,6 +610,14 @@ |
2862 | build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job) |
2863 | self.failUnlessEqual('joesppa', build.archive.name) |
2864 | |
2865 | + # If the source for the build is still pending, it won't be |
2866 | + # dispatched because the builder has to fetch the source files |
2867 | + # from the (password protected) repo area, not the librarian. |
2868 | + pub = build.current_source_publication |
2869 | + pub.status = PackagePublishingStatus.PENDING |
2870 | + candidate = removeSecurityProxy(self.builder4)._findBuildCandidate() |
2871 | + self.assertNotEqual(next_job.id, candidate.id) |
2872 | + |
2873 | |
2874 | class TestFindBuildCandidateDistroArchive(TestFindBuildCandidateBase): |
2875 | |
2876 | @@ -474,97 +760,48 @@ |
2877 | self.builder.current_build_behavior, BinaryPackageBuildBehavior) |
2878 | |
2879 | |
2880 | -class TestSlave(TestCase): |
2881 | +class TestSlave(TrialTestCase): |
2882 | """ |
2883 | Integration tests for BuilderSlave that verify how it works against a |
2884 | real slave server. |
2885 | """ |
2886 | |
2887 | + layer = TwistedLayer |
2888 | + |
2889 | + def setUp(self): |
2890 | + super(TestSlave, self).setUp() |
2891 | + self.slave_helper = SlaveTestHelpers() |
2892 | + self.slave_helper.setUp() |
2893 | + self.addCleanup(self.slave_helper.cleanUp) |
2894 | + |
2895 | # XXX: JonathanLange 2010-09-20 bug=643521: There are also tests for |
2896 | # BuilderSlave in buildd-slave.txt and in other places. The tests here |
2897 | # ought to become the canonical tests for BuilderSlave vs running buildd |
2898 | # XML-RPC server interaction. |
2899 | |
2900 | - # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`. |
2901 | - TEST_URL = 'http://localhost:8221/rpc/' |
2902 | - |
2903 | - def getServerSlave(self): |
2904 | - """Set up a test build slave server. |
2905 | - |
2906 | - :return: A `BuilddSlaveTestSetup` object. |
2907 | - """ |
2908 | - tachandler = BuilddSlaveTestSetup() |
2909 | - tachandler.setUp() |
2910 | - self.addCleanup(tachandler.tearDown) |
2911 | - def addLogFile(exc_info): |
2912 | - self.addDetail( |
2913 | - 'xmlrpc-log-file', |
2914 | - Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read())) |
2915 | - self.addOnException(addLogFile) |
2916 | - return tachandler |
2917 | - |
2918 | - def getClientSlave(self): |
2919 | - """Return a `BuilderSlave` for use in testing. |
2920 | - |
2921 | - Points to a fixed URL that is also used by `BuilddSlaveTestSetup`. |
2922 | - """ |
2923 | - return BuilderSlave.makeBlockingSlave(self.TEST_URL, 'vmhost') |
2924 | - |
2925 | - def makeCacheFile(self, tachandler, filename): |
2926 | - """Make a cache file available on the remote slave. |
2927 | - |
2928 | - :param tachandler: The TacTestSetup object used to start the remote |
2929 | - slave. |
2930 | - :param filename: The name of the file to create in the file cache |
2931 | - area. |
2932 | - """ |
2933 | - path = os.path.join(tachandler.root, 'filecache', filename) |
2934 | - fd = open(path, 'w') |
2935 | - fd.write('something') |
2936 | - fd.close() |
2937 | - self.addCleanup(os.unlink, path) |
2938 | - |
2939 | - def triggerGoodBuild(self, slave, build_id=None): |
2940 | - """Trigger a good build on 'slave'. |
2941 | - |
2942 | - :param slave: A `BuilderSlave` instance to trigger the build on. |
2943 | - :param build_id: The build identifier. If not specified, defaults to |
2944 | - an arbitrary string. |
2945 | - :type build_id: str |
2946 | - :return: The build id returned by the slave. |
2947 | - """ |
2948 | - if build_id is None: |
2949 | - build_id = self.getUniqueString() |
2950 | - tachandler = self.getServerSlave() |
2951 | - chroot_file = 'fake-chroot' |
2952 | - dsc_file = 'thing' |
2953 | - self.makeCacheFile(tachandler, chroot_file) |
2954 | - self.makeCacheFile(tachandler, dsc_file) |
2955 | - return slave.build( |
2956 | - build_id, 'debian', chroot_file, {'.dsc': dsc_file}, |
2957 | - {'ogrecomponent': 'main'}) |
2958 | - |
2959 | # XXX 2010-10-06 Julian bug=655559 |
2960 | # This is failing on buildbot but not locally; it's trying to abort |
2961 | # before the build has started. |
2962 | def disabled_test_abort(self): |
2963 | - slave = self.getClientSlave() |
2964 | + slave = self.slave_helper.getClientSlave() |
2965 | # We need to be in a BUILDING state before we can abort. |
2966 | - self.triggerGoodBuild(slave) |
2967 | - result = slave.abort() |
2968 | - self.assertEqual(result, BuilderStatus.ABORTING) |
2969 | + d = self.slave_helper.triggerGoodBuild(slave) |
2970 | + d.addCallback(lambda ignored: slave.abort()) |
2971 | + d.addCallback(self.assertEqual, BuilderStatus.ABORTING) |
2972 | + return d |
2973 | |
2974 | def test_build(self): |
2975 | # Calling 'build' with an expected builder type, a good build id, |
2976 | # valid chroot & filemaps works and returns a BuilderStatus of |
2977 | # BUILDING. |
2978 | build_id = 'some-id' |
2979 | - slave = self.getClientSlave() |
2980 | - result = self.triggerGoodBuild(slave, build_id) |
2981 | - self.assertEqual([BuilderStatus.BUILDING, build_id], result) |
2982 | + slave = self.slave_helper.getClientSlave() |
2983 | + d = self.slave_helper.triggerGoodBuild(slave, build_id) |
2984 | + return d.addCallback( |
2985 | + self.assertEqual, [BuilderStatus.BUILDING, build_id]) |
2986 | |
2987 | def test_clean(self): |
2988 | - slave = self.getClientSlave() |
2989 | + slave = self.slave_helper.getClientSlave() |
2990 | # XXX: JonathanLange 2010-09-21: Calling clean() on the slave requires |
2991 | # it to be in either the WAITING or ABORTED states, and both of these |
2992 | # states are very difficult to achieve in a test environment. For the |
2993 | @@ -574,57 +811,248 @@ |
2994 | def test_echo(self): |
2995 | # Calling 'echo' contacts the server which returns the arguments we |
2996 | # gave it. |
2997 | - self.getServerSlave() |
2998 | - slave = self.getClientSlave() |
2999 | - result = slave.echo('foo', 'bar', 42) |
3000 | - self.assertEqual(['foo', 'bar', 42], result) |
3001 | + self.slave_helper.getServerSlave() |
3002 | + slave = self.slave_helper.getClientSlave() |
3003 | + d = slave.echo('foo', 'bar', 42) |
3004 | + return d.addCallback(self.assertEqual, ['foo', 'bar', 42]) |
3005 | |
3006 | def test_info(self): |
3007 | # Calling 'info' gets some information about the slave. |
3008 | - self.getServerSlave() |
3009 | - slave = self.getClientSlave() |
3010 | - result = slave.info() |
3011 | + self.slave_helper.getServerSlave() |
3012 | + slave = self.slave_helper.getClientSlave() |
3013 | + d = slave.info() |
3014 | # We're testing the hard-coded values, since the version is hard-coded |
3015 | # into the remote slave, the supported build managers are hard-coded |
3016 | # into the tac file for the remote slave and config is returned from |
3017 | # the configuration file. |
3018 | - self.assertEqual( |
3019 | + return d.addCallback( |
3020 | + self.assertEqual, |
3021 | ['1.0', |
3022 | 'i386', |
3023 | ['sourcepackagerecipe', |
3024 | - 'translation-templates', 'binarypackage', 'debian']], |
3025 | - result) |
3026 | + 'translation-templates', 'binarypackage', 'debian']]) |
3027 | |
3028 | def test_initial_status(self): |
3029 | # Calling 'status' returns the current status of the slave. The |
3030 | # initial status is IDLE. |
3031 | - self.getServerSlave() |
3032 | - slave = self.getClientSlave() |
3033 | - status = slave.status() |
3034 | - self.assertEqual([BuilderStatus.IDLE, ''], status) |
3035 | + self.slave_helper.getServerSlave() |
3036 | + slave = self.slave_helper.getClientSlave() |
3037 | + d = slave.status() |
3038 | + return d.addCallback(self.assertEqual, [BuilderStatus.IDLE, '']) |
3039 | |
3040 | def test_status_after_build(self): |
3041 | # Calling 'status' returns the current status of the slave. After a |
3042 | # build has been triggered, the status is BUILDING. |
3043 | - slave = self.getClientSlave() |
3044 | + slave = self.slave_helper.getClientSlave() |
3045 | build_id = 'status-build-id' |
3046 | - self.triggerGoodBuild(slave, build_id) |
3047 | - status = slave.status() |
3048 | - self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2]) |
3049 | - [log_file] = status[2:] |
3050 | - self.assertIsInstance(log_file, xmlrpclib.Binary) |
3051 | + d = self.slave_helper.triggerGoodBuild(slave, build_id) |
3052 | + d.addCallback(lambda ignored: slave.status()) |
3053 | + def check_status(status): |
3054 | + self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2]) |
3055 | + [log_file] = status[2:] |
3056 | + self.assertIsInstance(log_file, xmlrpclib.Binary) |
3057 | + return d.addCallback(check_status) |
3058 | |
3059 | def test_ensurepresent_not_there(self): |
3060 | # ensurepresent checks to see if a file is there. |
3061 | - self.getServerSlave() |
3062 | - slave = self.getClientSlave() |
3063 | - result = slave.ensurepresent('blahblah', None, None, None) |
3064 | - self.assertEqual([False, 'No URL'], result) |
3065 | + self.slave_helper.getServerSlave() |
3066 | + slave = self.slave_helper.getClientSlave() |
3067 | + d = slave.ensurepresent('blahblah', None, None, None) |
3068 | + d.addCallback(self.assertEqual, [False, 'No URL']) |
3069 | + return d |
3070 | |
3071 | def test_ensurepresent_actually_there(self): |
3072 | # ensurepresent checks to see if a file is there. |
3073 | - tachandler = self.getServerSlave() |
3074 | - slave = self.getClientSlave() |
3075 | - self.makeCacheFile(tachandler, 'blahblah') |
3076 | - result = slave.ensurepresent('blahblah', None, None, None) |
3077 | - self.assertEqual([True, 'No URL'], result) |
3078 | + tachandler = self.slave_helper.getServerSlave() |
3079 | + slave = self.slave_helper.getClientSlave() |
3080 | + self.slave_helper.makeCacheFile(tachandler, 'blahblah') |
3081 | + d = slave.ensurepresent('blahblah', None, None, None) |
3082 | + d.addCallback(self.assertEqual, [True, 'No URL']) |
3083 | + return d |
3084 | + |
3085 | + def test_sendFileToSlave_not_there(self): |
3086 | + self.slave_helper.getServerSlave() |
3087 | + slave = self.slave_helper.getClientSlave() |
3088 | + d = slave.sendFileToSlave('blahblah', None, None, None) |
3089 | + return self.assertFailure(d, CannotFetchFile) |
3090 | + |
3091 | + def test_sendFileToSlave_actually_there(self): |
3092 | + tachandler = self.slave_helper.getServerSlave() |
3093 | + slave = self.slave_helper.getClientSlave() |
3094 | + self.slave_helper.makeCacheFile(tachandler, 'blahblah') |
3095 | + d = slave.sendFileToSlave('blahblah', None, None, None) |
3096 | + def check_present(ignored): |
3097 | + d = slave.ensurepresent('blahblah', None, None, None) |
3098 | + return d.addCallback(self.assertEqual, [True, 'No URL']) |
3099 | + d.addCallback(check_present) |
3100 | + return d |
3101 | + |
3102 | + def test_resumeHost_success(self): |
3103 | + # On a successful resume, resume() fires the returned deferred's |
3104 | + # callback with the process output (stdout, stderr, exit code). |
3105 | + self.slave_helper.getServerSlave() |
3106 | + slave = self.slave_helper.getClientSlave() |
3107 | + |
3108 | + # The configuration testing command-line. |
3109 | + self.assertEqual( |
3110 | + 'echo %(vm_host)s', config.builddmaster.vm_resume_command) |
3111 | + |
3112 | + # On success the exit code in the response is os.EX_OK. |
3113 | + def check_resume_success(response): |
3114 | + out, err, code = response |
3115 | + self.assertEqual(os.EX_OK, code) |
3116 | + # XXX: JonathanLange 2010-09-23: We should instead pass the |
3117 | + # expected vm_host into the client slave. Not doing this now, |
3118 | + # since the SlaveHelper is being moved around. |
3119 | + self.assertEqual("%s\n" % slave._vm_host, out) |
3120 | + d = slave.resume() |
3121 | + d.addBoth(check_resume_success) |
3122 | + return d |
3123 | + |
3124 | + def test_resumeHost_failure(self): |
3125 | + # On a failed resume, resume() fires the returned deferred's |
3126 | + # errback with a `ProcessTerminated` failure. |
3127 | + self.slave_helper.getServerSlave() |
3128 | + slave = self.slave_helper.getClientSlave() |
3129 | + |
3130 | + # Override the configuration command-line with one that will fail. |
3131 | + failed_config = """ |
3132 | + [builddmaster] |
3133 | + vm_resume_command: test "%(vm_host)s = 'no-sir'" |
3134 | + """ |
3135 | + config.push('failed_resume_command', failed_config) |
3136 | + self.addCleanup(config.pop, 'failed_resume_command') |
3137 | + |
3138 | + # On failures, the response is a twisted `Failure` object containing |
3139 | + # a tuple. |
3140 | + def check_resume_failure(failure): |
3141 | + out, err, code = failure.value |
3142 | + # The process will exit with a return code of "1". |
3143 | + self.assertEqual(code, 1) |
3144 | + d = slave.resume() |
3145 | + d.addBoth(check_resume_failure) |
3146 | + return d |
3147 | + |
3148 | + def test_resumeHost_timeout(self): |
3149 | + # On a resume timeout, resume() fires the returned deferred's |
3150 | + # errback with a `TimeoutError` failure. |
3151 | + self.slave_helper.getServerSlave() |
3152 | + slave = self.slave_helper.getClientSlave() |
3153 | + |
3154 | + # Override the configuration command-line with one that will timeout. |
3155 | + timeout_config = """ |
3156 | + [builddmaster] |
3157 | + vm_resume_command: sleep 5 |
3158 | + socket_timeout: 1 |
3159 | + """ |
3160 | + config.push('timeout_resume_command', timeout_config) |
3161 | + self.addCleanup(config.pop, 'timeout_resume_command') |
3162 | + |
3163 | + # On timeouts, the response is a twisted `Failure` object containing |
3164 | + # a `TimeoutError` error. |
3165 | + def check_resume_timeout(failure): |
3166 | + self.assertIsInstance(failure, Failure) |
3167 | + out, err, code = failure.value |
3168 | + self.assertEqual(code, signal.SIGKILL) |
3169 | + clock = Clock() |
3170 | + d = slave.resume(clock=clock) |
3171 | + # Move the clock beyond the socket_timeout but earlier than the |
3172 | + # sleep 5. This stops the test having to wait for the timeout. |
3173 | + # Fast tests FTW! |
3174 | + clock.advance(2) |
3175 | + d.addBoth(check_resume_timeout) |
3176 | + return d |
3177 | + |
3178 | + |
3179 | +class TestSlaveTimeouts(TrialTestCase): |
3180 | + # Testing that the methods that call callRemote() all time out |
3181 | + # as required. |
3182 | + |
3183 | + layer = TwistedLayer |
3184 | + |
3185 | + def setUp(self): |
3186 | + super(TestSlaveTimeouts, self).setUp() |
3187 | + self.slave_helper = SlaveTestHelpers() |
3188 | + self.slave_helper.setUp() |
3189 | + self.addCleanup(self.slave_helper.cleanUp) |
3190 | + self.clock = Clock() |
3191 | + self.proxy = DeadProxy("url") |
3192 | + self.slave = self.slave_helper.getClientSlave( |
3193 | + reactor=self.clock, proxy=self.proxy) |
3194 | + |
3195 | + def assertCancelled(self, d): |
3196 | + self.clock.advance(config.builddmaster.socket_timeout + 1) |
3197 | + return self.assertFailure(d, CancelledError) |
3198 | + |
3199 | + def test_timeout_abort(self): |
3200 | + return self.assertCancelled(self.slave.abort()) |
3201 | + |
3202 | + def test_timeout_clean(self): |
3203 | + return self.assertCancelled(self.slave.clean()) |
3204 | + |
3205 | + def test_timeout_echo(self): |
3206 | + return self.assertCancelled(self.slave.echo()) |
3207 | + |
3208 | + def test_timeout_info(self): |
3209 | + return self.assertCancelled(self.slave.info()) |
3210 | + |
3211 | + def test_timeout_status(self): |
3212 | + return self.assertCancelled(self.slave.status()) |
3213 | + |
3214 | + def test_timeout_ensurepresent(self): |
3215 | + return self.assertCancelled( |
3216 | + self.slave.ensurepresent(None, None, None, None)) |
3217 | + |
3218 | + def test_timeout_build(self): |
3219 | + return self.assertCancelled( |
3220 | + self.slave.build(None, None, None, None, None)) |
3221 | + |
3222 | + |
3223 | +class TestSlaveWithLibrarian(TrialTestCase): |
3224 | + """Tests that need more of Launchpad to run.""" |
3225 | + |
3226 | + layer = TwistedLaunchpadZopelessLayer |
3227 | + |
3228 | + def setUp(self): |
3229 | + super(TestSlaveWithLibrarian, self).setUp() |
3230 | + self.slave_helper = SlaveTestHelpers() |
3231 | + self.slave_helper.setUp() |
3232 | + self.addCleanup(self.slave_helper.cleanUp) |
3233 | + self.factory = LaunchpadObjectFactory() |
3234 | + login_as(ANONYMOUS) |
3235 | + self.addCleanup(logout) |
3236 | + |
3237 | + def test_ensurepresent_librarian(self): |
3238 | + # ensurepresent, when given an http URL for a file, downloads the |
3239 | + # file from that URL and reports that the file is present and was |
3240 | + # downloaded. |
3241 | + |
3242 | + # Use the Librarian because it's a "convenient" web server. |
3243 | + lf = self.factory.makeLibraryFileAlias( |
3244 | + 'HelloWorld.txt', content="Hello World") |
3245 | + self.layer.txn.commit() |
3246 | + self.slave_helper.getServerSlave() |
3247 | + slave = self.slave_helper.getClientSlave() |
3248 | + d = slave.ensurepresent( |
3249 | + lf.content.sha1, lf.http_url, "", "") |
3250 | + d.addCallback(self.assertEqual, [True, 'Download']) |
3251 | + return d |
3252 | + |
3253 | + def test_retrieve_files_from_filecache(self): |
3254 | + # Files that are present on the slave can be downloaded with a |
3255 | + # filename made from the sha1 of the content underneath the |
3256 | + # 'filecache' directory. |
3257 | + content = "Hello World" |
3258 | + lf = self.factory.makeLibraryFileAlias( |
3259 | + 'HelloWorld.txt', content=content) |
3260 | + self.layer.txn.commit() |
3261 | + expected_url = '%s/filecache/%s' % ( |
3262 | + self.slave_helper.BASE_URL, lf.content.sha1) |
3263 | + self.slave_helper.getServerSlave() |
3264 | + slave = self.slave_helper.getClientSlave() |
3265 | + d = slave.ensurepresent( |
3266 | + lf.content.sha1, lf.http_url, "", "") |
3267 | + def check_file(ignored): |
3268 | + d = getPage(expected_url.encode('utf8')) |
3269 | + return d.addCallback(self.assertEqual, content) |
3270 | + return d.addCallback(check_file) |
3271 | |
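test_retrieve_files_from_filecache documents the slave's plain-HTTP file cache: once ensurepresent has fetched a file, it is served at /filecache/<sha1-of-content>. A small sketch of deriving that URL for a given content string (a hypothetical helper mirroring the expected_url construction above):

    import hashlib

    def filecache_url(base_url, content):
        # Cached files are keyed by the SHA-1 hex digest of their content.
        return '%s/filecache/%s' % (
            base_url, hashlib.sha1(content).hexdigest())

    # filecache_url('http://localhost:8221', 'Hello World')
    # -> 'http://localhost:8221/filecache/0a4d55a8d778e5022fab701977c5d840bbc486d0'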
3272 | === modified file 'lib/lp/buildmaster/tests/test_manager.py' |
3273 | --- lib/lp/buildmaster/tests/test_manager.py 2010-09-28 11:05:14 +0000 |
3274 | +++ lib/lp/buildmaster/tests/test_manager.py 2010-10-25 19:14:01 +0000 |
3275 | @@ -6,6 +6,7 @@ |
3276 | import os |
3277 | import signal |
3278 | import time |
3279 | +import xmlrpclib |
3280 | |
3281 | import transaction |
3282 | |
3283 | @@ -14,9 +15,7 @@ |
3284 | reactor, |
3285 | task, |
3286 | ) |
3287 | -from twisted.internet.error import ConnectionClosed |
3288 | from twisted.internet.task import ( |
3289 | - Clock, |
3290 | deferLater, |
3291 | ) |
3292 | from twisted.python.failure import Failure |
3293 | @@ -30,577 +29,45 @@ |
3294 | ANONYMOUS, |
3295 | login, |
3296 | ) |
3297 | -from canonical.launchpad.scripts.logger import BufferLogger |
3298 | +from canonical.launchpad.scripts.logger import ( |
3299 | + QuietFakeLogger, |
3300 | + ) |
3301 | from canonical.testing.layers import ( |
3302 | LaunchpadScriptLayer, |
3303 | - LaunchpadZopelessLayer, |
3304 | + TwistedLaunchpadZopelessLayer, |
3305 | TwistedLayer, |
3306 | + ZopelessDatabaseLayer, |
3307 | ) |
3308 | from lp.buildmaster.enums import BuildStatus |
3309 | from lp.buildmaster.interfaces.builder import IBuilderSet |
3310 | from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
3311 | from lp.buildmaster.manager import ( |
3312 | - BaseDispatchResult, |
3313 | - buildd_success_result_map, |
3314 | + assessFailureCounts, |
3315 | BuilddManager, |
3316 | - FailDispatchResult, |
3317 | NewBuildersScanner, |
3318 | - RecordingSlave, |
3319 | - ResetDispatchResult, |
3320 | SlaveScanner, |
3321 | ) |
3322 | +from lp.buildmaster.model.builder import Builder |
3323 | from lp.buildmaster.tests.harness import BuilddManagerTestSetup |
3324 | -from lp.buildmaster.tests.mock_slaves import BuildingSlave |
3325 | +from lp.buildmaster.tests.mock_slaves import ( |
3326 | + BrokenSlave, |
3327 | + BuildingSlave, |
3328 | + OkSlave, |
3329 | + ) |
3330 | from lp.registry.interfaces.distribution import IDistributionSet |
3331 | from lp.soyuz.interfaces.binarypackagebuild import IBinaryPackageBuildSet |
3332 | -from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
3333 | -from lp.testing import TestCase as LaunchpadTestCase |
3334 | +from lp.testing import TestCaseWithFactory |
3335 | from lp.testing.factory import LaunchpadObjectFactory |
3336 | from lp.testing.fakemethod import FakeMethod |
3337 | from lp.testing.sampledata import BOB_THE_BUILDER_NAME |
3338 | |
3339 | |
3340 | -class TestRecordingSlaves(TrialTestCase): |
3341 | - """Tests for the recording slave class.""" |
3342 | - layer = TwistedLayer |
3343 | - |
3344 | - def setUp(self): |
3345 | - """Setup a fresh `RecordingSlave` for tests.""" |
3346 | - TrialTestCase.setUp(self) |
3347 | - self.slave = RecordingSlave( |
3348 | - 'foo', 'http://foo:8221/rpc', 'foo.host') |
3349 | - |
3350 | - def test_representation(self): |
3351 | - """`RecordingSlave` has a custom representation. |
3352 | - |
3353 | - It encloses builder name and xmlrpc url for debug purposes. |
3354 | - """ |
3355 | - self.assertEqual('<foo:http://foo:8221/rpc>', repr(self.slave)) |
3356 | - |
3357 | - def assert_ensurepresent(self, func): |
3358 | - """Helper function to test results from calling ensurepresent.""" |
3359 | - self.assertEqual( |
3360 | - [True, 'Download'], |
3361 | - func('boing', 'bar', 'baz')) |
3362 | - self.assertEqual( |
3363 | - [('ensurepresent', ('boing', 'bar', 'baz'))], |
3364 | - self.slave.calls) |
3365 | - |
3366 | - def test_ensurepresent(self): |
3367 | - """`RecordingSlave.ensurepresent` always succeeds. |
3368 | - |
3369 | - It returns the expected succeed code and records the interaction |
3370 | - information for later use. |
3371 | - """ |
3372 | - self.assert_ensurepresent(self.slave.ensurepresent) |
3373 | - |
3374 | - def test_sendFileToSlave(self): |
3375 | - """RecordingSlave.sendFileToSlave always succeeeds. |
3376 | - |
3377 | - It calls ensurepresent() and hence returns the same results. |
3378 | - """ |
3379 | - self.assert_ensurepresent(self.slave.sendFileToSlave) |
3380 | - |
3381 | - def test_build(self): |
3382 | - """`RecordingSlave.build` always succeeds. |
3383 | - |
3384 | - It returns the expected succeed code and records the interaction |
3385 | - information for later use. |
3386 | - """ |
3387 | - self.assertEqual( |
3388 | - ['BuilderStatus.BUILDING', 'boing'], |
3389 | - self.slave.build('boing', 'bar', 'baz')) |
3390 | - self.assertEqual( |
3391 | - [('build', ('boing', 'bar', 'baz'))], |
3392 | - self.slave.calls) |
3393 | - |
3394 | - def test_resume(self): |
3395 | - """`RecordingSlave.resume` always returns successs.""" |
3396 | - # Resume isn't requested in a just-instantiated RecordingSlave. |
3397 | - self.assertFalse(self.slave.resume_requested) |
3398 | - |
3399 | - # When resume is called, it returns the success list and mark |
3400 | - # the slave for resuming. |
3401 | - self.assertEqual(['', '', os.EX_OK], self.slave.resume()) |
3402 | - self.assertTrue(self.slave.resume_requested) |
3403 | - |
3404 | - def test_resumeHost_success(self): |
3405 | - # On a successful resume resumeHost() fires the returned deferred |
3406 | - # callback with 'None'. |
3407 | - |
3408 | - # The configuration testing command-line. |
3409 | - self.assertEqual( |
3410 | - 'echo %(vm_host)s', config.builddmaster.vm_resume_command) |
3411 | - |
3412 | - # On success the response is None. |
3413 | - def check_resume_success(response): |
3414 | - out, err, code = response |
3415 | - self.assertEqual(os.EX_OK, code) |
3416 | - self.assertEqual("%s\n" % self.slave.vm_host, out) |
3417 | - d = self.slave.resumeSlave() |
3418 | - d.addBoth(check_resume_success) |
3419 | - return d |
3420 | - |
3421 | - def test_resumeHost_failure(self): |
3422 | - # On a failed resume, 'resumeHost' fires the returned deferred |
3423 | - # errorback with the `ProcessTerminated` failure. |
3424 | - |
3425 | - # Override the configuration command-line with one that will fail. |
3426 | - failed_config = """ |
3427 | - [builddmaster] |
3428 | - vm_resume_command: test "%(vm_host)s = 'no-sir'" |
3429 | - """ |
3430 | - config.push('failed_resume_command', failed_config) |
3431 | - self.addCleanup(config.pop, 'failed_resume_command') |
3432 | - |
3433 | - # On failures, the response is a twisted `Failure` object containing |
3434 | - # a tuple. |
3435 | - def check_resume_failure(failure): |
3436 | - out, err, code = failure.value |
3437 | - # The process will exit with a return code of "1". |
3438 | - self.assertEqual(code, 1) |
3439 | - d = self.slave.resumeSlave() |
3440 | - d.addBoth(check_resume_failure) |
3441 | - return d |
3442 | - |
3443 | - def test_resumeHost_timeout(self): |
3444 | - # On a resume timeouts, 'resumeHost' fires the returned deferred |
3445 | - # errorback with the `TimeoutError` failure. |
3446 | - |
3447 | - # Override the configuration command-line with one that will timeout. |
3448 | - timeout_config = """ |
3449 | - [builddmaster] |
3450 | - vm_resume_command: sleep 5 |
3451 | - socket_timeout: 1 |
3452 | - """ |
3453 | - config.push('timeout_resume_command', timeout_config) |
3454 | - self.addCleanup(config.pop, 'timeout_resume_command') |
3455 | - |
3456 | - # On timeouts, the response is a twisted `Failure` object containing |
3457 | - # a `TimeoutError` error. |
3458 | - def check_resume_timeout(failure): |
3459 | - self.assertIsInstance(failure, Failure) |
3460 | - out, err, code = failure.value |
3461 | - self.assertEqual(code, signal.SIGKILL) |
3462 | - clock = Clock() |
3463 | - d = self.slave.resumeSlave(clock=clock) |
3464 | - # Move the clock beyond the socket_timeout but earlier than the |
3465 | - # sleep 5. This stops the test having to wait for the timeout. |
3466 | - # Fast tests FTW! |
3467 | - clock.advance(2) |
3468 | - d.addBoth(check_resume_timeout) |
3469 | - return d |
3470 | - |
3471 | - |
3472 | -class TestingXMLRPCProxy: |
3473 | - """This class mimics a twisted XMLRPC Proxy class.""" |
3474 | - |
3475 | - def __init__(self, failure_info=None): |
3476 | - self.calls = [] |
3477 | - self.failure_info = failure_info |
3478 | - self.works = failure_info is None |
3479 | - |
3480 | - def callRemote(self, *args): |
3481 | - self.calls.append(args) |
3482 | - if self.works: |
3483 | - result = buildd_success_result_map.get(args[0]) |
3484 | - else: |
3485 | - result = 'boing' |
3486 | - return defer.succeed([result, self.failure_info]) |
3487 | - |
3488 | - |
3489 | -class TestingResetDispatchResult(ResetDispatchResult): |
3490 | - """Override the evaluation method to simply annotate the call.""" |
3491 | - |
3492 | - def __init__(self, slave, info=None): |
3493 | - ResetDispatchResult.__init__(self, slave, info) |
3494 | - self.processed = False |
3495 | - |
3496 | - def __call__(self): |
3497 | - self.processed = True |
3498 | - |
3499 | - |
3500 | -class TestingFailDispatchResult(FailDispatchResult): |
3501 | - """Override the evaluation method to simply annotate the call.""" |
3502 | - |
3503 | - def __init__(self, slave, info=None): |
3504 | - FailDispatchResult.__init__(self, slave, info) |
3505 | - self.processed = False |
3506 | - |
3507 | - def __call__(self): |
3508 | - self.processed = True |
3509 | - |
3510 | - |
3511 | -class TestingSlaveScanner(SlaveScanner): |
3512 | - """Override the dispatch result factories """ |
3513 | - |
3514 | - reset_result = TestingResetDispatchResult |
3515 | - fail_result = TestingFailDispatchResult |
3516 | - |
3517 | - |
3518 | -class TestSlaveScanner(TrialTestCase): |
3519 | - """Tests for the actual build slave manager.""" |
3520 | - layer = LaunchpadZopelessLayer |
3521 | - |
3522 | - def setUp(self): |
3523 | - TrialTestCase.setUp(self) |
3524 | - self.manager = TestingSlaveScanner( |
3525 | - BOB_THE_BUILDER_NAME, BufferLogger()) |
3526 | - |
3527 | - self.fake_builder_url = 'http://bob.buildd:8221/' |
3528 | - self.fake_builder_host = 'bob.host' |
3529 | - |
3530 | - # We will use an instrumented SlaveScanner instance for tests in |
3531 | - # this context. |
3532 | - |
3533 | - # Stop cyclic execution and record the end of the cycle. |
3534 | - self.stopped = False |
3535 | - |
3536 | - def testNextCycle(): |
3537 | - self.stopped = True |
3538 | - |
3539 | - self.manager.scheduleNextScanCycle = testNextCycle |
3540 | - |
3541 | - # Return the testing Proxy version. |
3542 | - self.test_proxy = TestingXMLRPCProxy() |
3543 | - |
3544 | - def testGetProxyForSlave(slave): |
3545 | - return self.test_proxy |
3546 | - self.manager._getProxyForSlave = testGetProxyForSlave |
3547 | - |
3548 | - # Deactivate the 'scan' method. |
3549 | - def testScan(): |
3550 | - pass |
3551 | - self.manager.scan = testScan |
3552 | - |
3553 | - # Stop automatic collection of dispatching results. |
3554 | - def testslaveConversationEnded(): |
3555 | - pass |
3556 | - self._realslaveConversationEnded = self.manager.slaveConversationEnded |
3557 | - self.manager.slaveConversationEnded = testslaveConversationEnded |
3558 | - |
3559 | - def assertIsDispatchReset(self, result): |
3560 | - self.assertTrue( |
3561 | - isinstance(result, TestingResetDispatchResult), |
3562 | - 'Dispatch failure did not result in a ResetBuildResult object') |
3563 | - |
3564 | - def assertIsDispatchFail(self, result): |
3565 | - self.assertTrue( |
3566 | - isinstance(result, TestingFailDispatchResult), |
3567 | - 'Dispatch failure did not result in a FailBuildResult object') |
3568 | - |
3569 | - def test_checkResume(self): |
3570 | - """`SlaveScanner.checkResume` is chained after resume requests. |
3571 | - |
3572 | - If the resume request succeed it returns None, otherwise it returns |
3573 | - a `ResetBuildResult` (the one in the test context) that will be |
3574 | - collect and evaluated later. |
3575 | - |
3576 | - See `RecordingSlave.resumeHost` for more information about the resume |
3577 | - result contents. |
3578 | - """ |
3579 | - slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host') |
3580 | - |
3581 | - successful_response = ['', '', os.EX_OK] |
3582 | - result = self.manager.checkResume(successful_response, slave) |
3583 | - self.assertEqual( |
3584 | - None, result, 'Successful resume checks should return None') |
3585 | - |
3586 | - failed_response = ['stdout', 'stderr', 1] |
3587 | - result = self.manager.checkResume(failed_response, slave) |
3588 | - self.assertIsDispatchReset(result) |
3589 | - self.assertEqual( |
3590 | - '<foo:http://foo.buildd:8221/> reset failure', repr(result)) |
3591 | - self.assertEqual( |
3592 | - result.info, "stdout\nstderr") |
3593 | - |
3594 | - def test_fail_to_resume_slave_resets_slave(self): |
3595 | - # If an attempt to resume and dispatch a slave fails, we reset the |
3596 | - # slave by calling self.reset_result(slave)(). |
3597 | - |
3598 | - reset_result_calls = [] |
3599 | - |
3600 | - class LoggingResetResult(BaseDispatchResult): |
3601 | - """A DispatchResult that logs calls to itself. |
3602 | - |
3603 | - This *must* subclass BaseDispatchResult, otherwise finishCycle() |
3604 | - won't treat it like a dispatch result. |
3605 | - """ |
3606 | - |
3607 | - def __init__(self, slave, info=None): |
3608 | - self.slave = slave |
3609 | - |
3610 | - def __call__(self): |
3611 | - reset_result_calls.append(self.slave) |
3612 | - |
3613 | - # Make a failing slave that is requesting a resume. |
3614 | - slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host') |
3615 | - slave.resume_requested = True |
3616 | - slave.resumeSlave = lambda: deferLater( |
3617 | - reactor, 0, defer.fail, Failure(('out', 'err', 1))) |
3618 | - |
3619 | - # Make the manager log the reset result calls. |
3620 | - self.manager.reset_result = LoggingResetResult |
3621 | - |
3622 | - # We only care about this one slave. Reset the list of manager |
3623 | - # deferreds in case setUp did something unexpected. |
3624 | - self.manager._deferred_list = [] |
3625 | - |
3626 | - # Here, we're patching the slaveConversationEnded method so we can |
3627 | - # get an extra callback at the end of it, so we can |
3628 | - # verify that the reset_result was really called. |
3629 | - def _slaveConversationEnded(): |
3630 | - d = self._realslaveConversationEnded() |
3631 | - return d.addCallback( |
3632 | - lambda ignored: self.assertEqual([slave], reset_result_calls)) |
3633 | - self.manager.slaveConversationEnded = _slaveConversationEnded |
3634 | - |
3635 | - self.manager.resumeAndDispatch(slave) |
3636 | - |
3637 | - def test_failed_to_resume_slave_ready_for_reset(self): |
3638 | - # When a slave fails to resume, the manager has a Deferred in its |
3639 | - # Deferred list that is ready to fire with a ResetDispatchResult. |
3640 | - |
3641 | - # Make a failing slave that is requesting a resume. |
3642 | - slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host') |
3643 | - slave.resume_requested = True |
3644 | - slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1))) |
3645 | - |
3646 | - # We only care about this one slave. Reset the list of manager |
3647 | - # deferreds in case setUp did something unexpected. |
3648 | - self.manager._deferred_list = [] |
3649 | - # Restore the slaveConversationEnded method. It's very relevant to |
3650 | - # this test. |
3651 | - self.manager.slaveConversationEnded = self._realslaveConversationEnded |
3652 | - self.manager.resumeAndDispatch(slave) |
3653 | - [d] = self.manager._deferred_list |
3654 | - |
3655 | - # The Deferred for our failing slave should be ready to fire |
3656 | - # successfully with a ResetDispatchResult. |
3657 | - def check_result(result): |
3658 | - self.assertIsInstance(result, ResetDispatchResult) |
3659 | - self.assertEqual(slave, result.slave) |
3660 | - self.assertFalse(result.processed) |
3661 | - return d.addCallback(check_result) |
3662 | - |
3663 | - def _setUpSlaveAndBuilder(self, builder_failure_count=None, |
3664 | - job_failure_count=None): |
3665 | - # Helper function to set up a builder and its recording slave. |
3666 | - if builder_failure_count is None: |
3667 | - builder_failure_count = 0 |
3668 | - if job_failure_count is None: |
3669 | - job_failure_count = 0 |
3670 | - slave = RecordingSlave( |
3671 | - BOB_THE_BUILDER_NAME, self.fake_builder_url, |
3672 | - self.fake_builder_host) |
3673 | - bob_builder = getUtility(IBuilderSet)[slave.name] |
3674 | - bob_builder.failure_count = builder_failure_count |
3675 | - bob_builder.getCurrentBuildFarmJob().failure_count = job_failure_count |
3676 | - return slave, bob_builder |
3677 | - |
3678 | - def test_checkDispatch_success(self): |
3679 | - # SlaveScanner.checkDispatch returns None for a successful |
3680 | - # dispatch. |
3681 | - |
3682 | - """ |
3683 | - If the dispatch request fails or a unknown method is given, it |
3684 | - returns a `FailDispatchResult` (in the test context) that will |
3685 | - be evaluated later. |
3686 | - |
3687 | - Builders will be marked as failed if the following responses |
3688 | - categories are received. |
3689 | - |
3690 | - * Legitimate slave failures: when the response is a list with 2 |
3691 | - elements but the first element ('status') does not correspond to |
3692 | - the expected 'success' result. See `buildd_success_result_map`. |
3693 | - |
3694 | - * Unexpected (code) failures: when the given 'method' is unknown |
3695 | - or the response isn't a 2-element list or Failure instance. |
3696 | - |
3697 | - Communication failures (a twisted `Failure` instance) will simply |
3698 | - cause the builder to be reset and a `ResetDispatchResult` object |
3699 | - to be returned. In other words, network failures are ignored at |
3700 | - this stage; broken builders will be identified and marked as such |
3701 | - during the 'scan()' stage. |
3702 | - |
3703 | - On successful dispatch it returns None. |
3704 | - """ |
3705 | - slave, bob_builder = self._setUpSlaveAndBuilder( |
3706 | - builder_failure_count=0, job_failure_count=0) |
3707 | - |
3708 | - # Successful legitimate response, None is returned. |
3709 | - successful_response = [ |
3710 | - buildd_success_result_map.get('ensurepresent'), 'cool builder'] |
3711 | - result = self.manager.checkDispatch( |
3712 | - successful_response, 'ensurepresent', slave) |
3713 | - self.assertEqual( |
3714 | - None, result, 'Successful dispatch checks should return None') |
3715 | - |
3716 | - def test_checkDispatch_first_fail(self): |
3717 | - # Failed legitimate response, results in FailDispatchResult and |
3718 | - # failure_count on the job and the builder are both incremented. |
3719 | - slave, bob_builder = self._setUpSlaveAndBuilder( |
3720 | - builder_failure_count=0, job_failure_count=0) |
3721 | - |
3722 | - failed_response = [False, 'uncool builder'] |
3723 | - result = self.manager.checkDispatch( |
3724 | - failed_response, 'ensurepresent', slave) |
3725 | - self.assertIsDispatchFail(result) |
3726 | - self.assertEqual( |
3727 | - repr(result), |
3728 | - '<bob:%s> failure (uncool builder)' % self.fake_builder_url) |
3729 | - self.assertEqual(1, bob_builder.failure_count) |
3730 | - self.assertEqual( |
3731 | - 1, bob_builder.getCurrentBuildFarmJob().failure_count) |
3732 | - |
3733 | - def test_checkDispatch_second_reset_fail_by_builder(self): |
3734 | - # Twisted Failure response, results in a `FailDispatchResult`. |
3735 | - slave, bob_builder = self._setUpSlaveAndBuilder( |
3736 | - builder_failure_count=1, job_failure_count=0) |
3737 | - |
3738 | - twisted_failure = Failure(ConnectionClosed('Boom!')) |
3739 | - result = self.manager.checkDispatch( |
3740 | - twisted_failure, 'ensurepresent', slave) |
3741 | - self.assertIsDispatchFail(result) |
3742 | - self.assertEqual( |
3743 | - '<bob:%s> failure (None)' % self.fake_builder_url, repr(result)) |
3744 | - self.assertEqual(2, bob_builder.failure_count) |
3745 | - self.assertEqual( |
3746 | - 1, bob_builder.getCurrentBuildFarmJob().failure_count) |
3747 | - |
3748 | - def test_checkDispatch_second_comms_fail_by_builder(self): |
3749 | - # Unexpected response, results in a `FailDispatchResult`. |
3750 | - slave, bob_builder = self._setUpSlaveAndBuilder( |
3751 | - builder_failure_count=1, job_failure_count=0) |
3752 | - |
3753 | - unexpected_response = [1, 2, 3] |
3754 | - result = self.manager.checkDispatch( |
3755 | - unexpected_response, 'build', slave) |
3756 | - self.assertIsDispatchFail(result) |
3757 | - self.assertEqual( |
3758 | - '<bob:%s> failure ' |
3759 | - '(Unexpected response: [1, 2, 3])' % self.fake_builder_url, |
3760 | - repr(result)) |
3761 | - self.assertEqual(2, bob_builder.failure_count) |
3762 | - self.assertEqual( |
3763 | - 1, bob_builder.getCurrentBuildFarmJob().failure_count) |
3764 | - |
3765 | - def test_checkDispatch_second_comms_fail_by_job(self): |
3766 | - # Unknown method was given, results in a `FailDispatchResult`. |
3767 | - # This could be caused by a faulty job which would fail the job. |
3768 | - slave, bob_builder = self._setUpSlaveAndBuilder( |
3769 | - builder_failure_count=0, job_failure_count=1) |
3770 | - |
3771 | - successful_response = [ |
3772 | - buildd_success_result_map.get('ensurepresent'), 'cool builder'] |
3773 | - result = self.manager.checkDispatch( |
3774 | - successful_response, 'unknown-method', slave) |
3775 | - self.assertIsDispatchFail(result) |
3776 | - self.assertEqual( |
3777 | - '<bob:%s> failure ' |
3778 | - '(Unknown slave method: unknown-method)' % self.fake_builder_url, |
3779 | - repr(result)) |
3780 | - self.assertEqual(1, bob_builder.failure_count) |
3781 | - self.assertEqual( |
3782 | - 2, bob_builder.getCurrentBuildFarmJob().failure_count) |
3783 | - |
3784 | - def test_initiateDispatch(self): |
3785 | - """Check `dispatchBuild` in various scenarios. |
3786 | - |
3787 | - When there are no recording slaves (i.e. no build got dispatched |
3788 | - in scan()) it simply finishes the cycle. |
3789 | - |
3790 | - When there is a recording slave with pending slave calls, they are |
3791 | - performed and if they all succeed the cycle is finished with no |
3792 | - errors. |
3793 | - |
3794 | - On slave call failure the chain is stopped immediately and a |
3795 | - FailDispatchResult is collected while finishing the cycle. |
3796 | - """ |
3797 | - def check_no_events(results): |
3798 | - errors = [ |
3799 | - r for s, r in results if isinstance(r, BaseDispatchResult)] |
3800 | - self.assertEqual(0, len(errors)) |
3801 | - |
3802 | - def check_events(results): |
3803 | - [error] = [r for s, r in results if r is not None] |
3804 | - self.assertEqual( |
3805 | - '<bob:%s> failure (very broken slave)' |
3806 | - % self.fake_builder_url, |
3807 | - repr(error)) |
3808 | - self.assertTrue(error.processed) |
3809 | - |
3810 | - def _wait_on_deferreds_then_check_no_events(): |
3811 | - dl = self._realslaveConversationEnded() |
3812 | - dl.addCallback(check_no_events) |
3813 | - |
3814 | - def _wait_on_deferreds_then_check_events(): |
3815 | - dl = self._realslaveConversationEnded() |
3816 | - dl.addCallback(check_events) |
3817 | - |
3818 | - # A functional slave charged with some interactions. |
3819 | - slave = RecordingSlave( |
3820 | - BOB_THE_BUILDER_NAME, self.fake_builder_url, |
3821 | - self.fake_builder_host) |
3822 | - slave.ensurepresent('arg1', 'arg2', 'arg3') |
3823 | - slave.build('arg1', 'arg2', 'arg3') |
3824 | - |
3825 | - # If the previous step (resuming) has failed nothing gets dispatched. |
3826 | - reset_result = ResetDispatchResult(slave) |
3827 | - result = self.manager.initiateDispatch(reset_result, slave) |
3828 | - self.assertTrue(result is reset_result) |
3829 | - self.assertFalse(slave.resume_requested) |
3830 | - self.assertEqual(0, len(self.manager._deferred_list)) |
3831 | - |
3832 | - # Operation with the default (functional slave): no reset or |
3833 | - # failure results are triggered. |
3834 | - slave.resume() |
3835 | - result = self.manager.initiateDispatch(None, slave) |
3836 | - self.assertEqual(None, result) |
3837 | - self.assertTrue(slave.resume_requested) |
3838 | - self.assertEqual( |
3839 | - [('ensurepresent', 'arg1', 'arg2', 'arg3'), |
3840 | - ('build', 'arg1', 'arg2', 'arg3')], |
3841 | - self.test_proxy.calls) |
3842 | - self.assertEqual(2, len(self.manager._deferred_list)) |
3843 | - |
3844 | - # Monkey patch the slaveConversationEnded method so we can chain a |
3845 | - # callback to check the end of the result chain. |
3846 | - self.manager.slaveConversationEnded = \ |
3847 | - _wait_on_deferreds_then_check_no_events |
3848 | - events = self.manager.slaveConversationEnded() |
3849 | - |
3850 | - # Create a broken slave and insert interaction that will |
3851 | - # cause the builder to be marked as fail. |
3852 | - self.test_proxy = TestingXMLRPCProxy('very broken slave') |
3853 | - slave = RecordingSlave( |
3854 | - BOB_THE_BUILDER_NAME, self.fake_builder_url, |
3855 | - self.fake_builder_host) |
3856 | - slave.ensurepresent('arg1', 'arg2', 'arg3') |
3857 | - slave.build('arg1', 'arg2', 'arg3') |
3858 | - |
3859 | - result = self.manager.initiateDispatch(None, slave) |
3860 | - self.assertEqual(None, result) |
3861 | - self.assertEqual(3, len(self.manager._deferred_list)) |
3862 | - self.assertEqual( |
3863 | - [('ensurepresent', 'arg1', 'arg2', 'arg3')], |
3864 | - self.test_proxy.calls) |
3865 | - |
3866 | - # Monkey patch the slaveConversationEnded method so we can chain a |
3867 | - # callback to check the end of the result chain. |
3868 | - self.manager.slaveConversationEnded = \ |
3869 | - _wait_on_deferreds_then_check_events |
3870 | - events = self.manager.slaveConversationEnded() |
3871 | - |
3872 | - return events |
3873 | - |
3874 | - |
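Aside for readers of this diff: the removed tests above document how the old manager classified slave responses in checkDispatch(). A minimal sketch of that classification, reconstructed from the removed docstring and tests (buildd_success_result_map and the result names come from the tests; the removed implementation in lp.buildmaster.manager may have differed in detail):

    from twisted.python.failure import Failure

    def check_dispatch(response, method, success_map):
        # Communication failure: just reset the builder and let the
        # next scan() pass identify genuinely broken builders.
        if isinstance(response, Failure):
            return 'reset'
        # A legitimate [status, info] response from the slave.
        if isinstance(response, list) and len(response) == 2:
            if method not in success_map:
                return 'fail'    # unknown slave method
            if response[0] != success_map[method]:
                return 'fail'    # slave reported a failure
            return None          # successful dispatch
        return 'fail'            # unexpected response shape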
3875 | class TestSlaveScannerScan(TrialTestCase): |
3876 | """Tests `SlaveScanner.scan` method. |
3877 | |
3878 | This method uses the old framework for scanning and dispatching builds. |
3879 | """ |
3880 | - layer = LaunchpadZopelessLayer |
3881 | + layer = TwistedLaunchpadZopelessLayer |
3882 | |
3883 | def setUp(self): |
3884 | """Setup TwistedLayer, TrialTestCase and BuilddSlaveTest. |
3885 | @@ -608,19 +75,18 @@ |
3886 | Also adjust the sampledata in a way a build can be dispatched to |
3887 | 'bob' builder. |
3888 | """ |
3889 | + from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
3890 | TwistedLayer.testSetUp() |
3891 | TrialTestCase.setUp(self) |
3892 | self.slave = BuilddSlaveTestSetup() |
3893 | self.slave.setUp() |
3894 | |
3895 | # Creating the required chroots needed for dispatching. |
3896 | - login('foo.bar@canonical.com') |
3897 | test_publisher = SoyuzTestPublisher() |
3898 | ubuntu = getUtility(IDistributionSet).getByName('ubuntu') |
3899 | hoary = ubuntu.getSeries('hoary') |
3900 | test_publisher.setUpDefaultDistroSeries(hoary) |
3901 | test_publisher.addFakeChroots() |
3902 | - login(ANONYMOUS) |
3903 | |
3904 | def tearDown(self): |
3905 | self.slave.tearDown() |
3906 | @@ -628,8 +94,7 @@ |
3907 | TwistedLayer.testTearDown() |
3908 | |
3909 | def _resetBuilder(self, builder): |
3910 | - """Reset the given builder and it's job.""" |
3911 | - login('foo.bar@canonical.com') |
3912 | + """Reset the given builder and its job.""" |
3913 | |
3914 | builder.builderok = True |
3915 | job = builder.currentjob |
3916 | @@ -637,7 +102,6 @@ |
3917 | job.reset() |
3918 | |
3919 | transaction.commit() |
3920 | - login(ANONYMOUS) |
3921 | |
3922 | def assertBuildingJob(self, job, builder, logtail=None): |
3923 | """Assert the given job is building on the given builder.""" |
3924 | @@ -653,55 +117,25 @@ |
3925 | self.assertEqual(build.status, BuildStatus.BUILDING) |
3926 | self.assertEqual(job.logtail, logtail) |
3927 | |
3928 | - def _getManager(self): |
3929 | + def _getScanner(self, builder_name=None): |
3930 | """Instantiate a SlaveScanner object. |
3931 | |
3932 | Replace its default logging handler by a testing version. |
3933 | """ |
3934 | - manager = SlaveScanner(BOB_THE_BUILDER_NAME, BufferLogger()) |
3935 | - manager.logger.name = 'slave-scanner' |
3936 | + if builder_name is None: |
3937 | + builder_name = BOB_THE_BUILDER_NAME |
3938 | + scanner = SlaveScanner(builder_name, QuietFakeLogger()) |
3939 | + scanner.logger.name = 'slave-scanner' |
3940 | |
3941 | - return manager |
3942 | + return scanner |
3943 | |
3944 | def _checkDispatch(self, slave, builder): |
3945 | - """`SlaveScanner.scan` returns a `RecordingSlave`. |
3946 | - |
3947 | - The single slave returned should match the given builder and |
3948 | - contain interactions that should be performed asynchronously for |
3949 | - properly dispatching the sampledata job. |
3950 | - """ |
3951 | - self.assertFalse( |
3952 | - slave is None, "Unexpected recording_slaves.") |
3953 | - |
3954 | - self.assertEqual(slave.name, builder.name) |
3955 | - self.assertEqual(slave.url, builder.url) |
3956 | - self.assertEqual(slave.vm_host, builder.vm_host) |
3957 | + # SlaveScanner.scan returns a slave when a dispatch was |
3958 | + # successful. We also check that the builder has a job on it. |
3959 | + |
3960 | + self.assertTrue(slave is not None, "Expected a slave.") |
3961 | self.assertEqual(0, builder.failure_count) |
3962 | - |
3963 | - self.assertEqual( |
3964 | - [('ensurepresent', |
3965 | - ('0feca720e2c29dafb2c900713ba560e03b758711', |
3966 | - 'http://localhost:58000/93/fake_chroot.tar.gz', |
3967 | - '', '')), |
3968 | - ('ensurepresent', |
3969 | - ('4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33', |
3970 | - 'http://localhost:58000/43/alsa-utils_1.0.9a-4ubuntu1.dsc', |
3971 | - '', '')), |
3972 | - ('build', |
3973 | - ('6358a89e2215e19b02bf91e2e4d009640fae5cf8', |
3974 | - 'binarypackage', '0feca720e2c29dafb2c900713ba560e03b758711', |
3975 | - {'alsa-utils_1.0.9a-4ubuntu1.dsc': |
3976 | - '4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33'}, |
3977 | - {'arch_indep': True, |
3978 | - 'arch_tag': 'i386', |
3979 | - 'archive_private': False, |
3980 | - 'archive_purpose': 'PRIMARY', |
3981 | - 'archives': |
3982 | - ['deb http://ftpmaster.internal/ubuntu hoary main'], |
3983 | - 'build_debug_symbols': False, |
3984 | - 'ogrecomponent': 'main', |
3985 | - 'suite': u'hoary'}))], |
3986 | - slave.calls, "Job was not properly dispatched.") |
3987 | + self.assertTrue(builder.currentjob is not None) |
3988 | |
3989 | def testScanDispatchForResetBuilder(self): |
3990 | # A job gets dispatched to the sampledata builder after it's reset. |
3991 | @@ -709,26 +143,27 @@ |
3992 | # Reset sampledata builder. |
3993 | builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
3994 | self._resetBuilder(builder) |
3995 | + builder.setSlaveForTesting(OkSlave()) |
3996 | # Set this to 1 here so that _checkDispatch can make sure it's |
3997 | # reset to 0 after a successful dispatch. |
3998 | builder.failure_count = 1 |
3999 | |
4000 | # Run 'scan' and check its result. |
4001 | - LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4002 | - manager = self._getManager() |
4003 | - d = defer.maybeDeferred(manager.scan) |
4004 | + self.layer.txn.commit() |
4005 | + self.layer.switchDbUser(config.builddmaster.dbuser) |
4006 | + scanner = self._getScanner() |
4007 | + d = defer.maybeDeferred(scanner.scan) |
4008 | d.addCallback(self._checkDispatch, builder) |
4009 | return d |
4010 | |
4011 | - def _checkNoDispatch(self, recording_slave, builder): |
4012 | + def _checkNoDispatch(self, slave, builder): |
4013 | """Assert that no dispatch has occurred. |
4014 | |
4015 | - 'recording_slave' is None, so no interations would be passed |
4016 | + 'slave' is None, so no interactions would be passed |
4017 | to the asynchronous dispatcher and the builder remains active |
4018 | and IDLE. |
4019 | """ |
4020 | - self.assertTrue( |
4021 | - recording_slave is None, "Unexpected recording_slave.") |
4022 | + self.assertTrue(slave is None, "Unexpected slave.") |
4023 | |
4024 | builder = getUtility(IBuilderSet).get(builder.id) |
4025 | self.assertTrue(builder.builderok) |
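Several of these tests wrap the call in defer.maybeDeferred(scanner.scan). That helper normalizes a callable that may return a plain value, return a Deferred, or raise synchronously into a single Deferred, so one callback/errback chain handles all three cases. A minimal illustration (the exception type is arbitrary):

    from twisted.internet import defer

    def scan():
        raise RuntimeError("broken slave")   # synchronous failure

    d = defer.maybeDeferred(scan)            # failure goes to the errback chain
    d.addErrback(lambda failure: failure.trap(RuntimeError))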
4026 | @@ -753,9 +188,9 @@ |
4027 | login(ANONYMOUS) |
4028 | |
4029 | # Run 'scan' and check its result. |
4030 | - LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4031 | - manager = self._getManager() |
4032 | - d = defer.maybeDeferred(manager.scan) |
4033 | + self.layer.switchDbUser(config.builddmaster.dbuser) |
4034 | + scanner = self._getScanner() |
4035 | + d = defer.maybeDeferred(scanner.singleCycle) |
4036 | d.addCallback(self._checkNoDispatch, builder) |
4037 | return d |
4038 | |
4039 | @@ -793,9 +228,9 @@ |
4040 | login(ANONYMOUS) |
4041 | |
4042 | # Run 'scan' and check its result. |
4043 | - LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4044 | - manager = self._getManager() |
4045 | - d = defer.maybeDeferred(manager.scan) |
4046 | + self.layer.switchDbUser(config.builddmaster.dbuser) |
4047 | + scanner = self._getScanner() |
4048 | + d = defer.maybeDeferred(scanner.scan) |
4049 | d.addCallback(self._checkJobRescued, builder, job) |
4050 | return d |
4051 | |
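Note the d.addCallback(self._checkJobRescued, builder, job) idiom used here: extra positional arguments to addCallback are passed to the callback after the Deferred's result. A tiny illustration with stand-in values:

    from twisted.internet import defer

    def check(result, builder, job):
        # 'result' is what the Deferred fired with; 'builder' and
        # 'job' are the extra arguments given to addCallback below.
        return (result, builder, job)

    d = defer.succeed('slave')
    d.addCallback(check, 'bob', 'job-2')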
4052 | @@ -814,8 +249,6 @@ |
4053 | self.assertBuildingJob(job, builder, logtail='This is a build log') |
4054 | |
4055 | def testScanUpdatesBuildingJobs(self): |
4056 | - # The job assigned to a broken builder is rescued. |
4057 | - |
4058 | # Enable sampledata builder attached to an appropriate testing |
4059 | # slave. It will respond as if it was building the sampledata job. |
4060 | builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4061 | @@ -830,188 +263,174 @@ |
4062 | self.assertBuildingJob(job, builder) |
4063 | |
4064 | # Run 'scan' and check its result. |
4065 | - LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4066 | - manager = self._getManager() |
4067 | - d = defer.maybeDeferred(manager.scan) |
4068 | + self.layer.switchDbUser(config.builddmaster.dbuser) |
4069 | + scanner = self._getScanner() |
4070 | + d = defer.maybeDeferred(scanner.scan) |
4071 | d.addCallback(self._checkJobUpdated, builder, job) |
4072 | return d |
4073 | |
4074 | - def test_scan_assesses_failure_exceptions(self): |
4075 | + def test_scan_with_nothing_to_dispatch(self): |
4076 | + factory = LaunchpadObjectFactory() |
4077 | + builder = factory.makeBuilder() |
4078 | + builder.setSlaveForTesting(OkSlave()) |
4079 | + scanner = self._getScanner(builder_name=builder.name) |
4080 | + d = scanner.scan() |
4081 | + return d.addCallback(self._checkNoDispatch, builder) |
4082 | + |
4083 | + def test_scan_with_manual_builder(self): |
4084 | + # Reset sampledata builder. |
4085 | + builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4086 | + self._resetBuilder(builder) |
4087 | + builder.setSlaveForTesting(OkSlave()) |
4088 | + builder.manual = True |
4089 | + scanner = self._getScanner() |
4090 | + d = scanner.scan() |
4091 | + d.addCallback(self._checkNoDispatch, builder) |
4092 | + return d |
4093 | + |
4094 | + def test_scan_with_not_ok_builder(self): |
4095 | + # Reset sampledata builder. |
4096 | + builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4097 | + self._resetBuilder(builder) |
4098 | + builder.setSlaveForTesting(OkSlave()) |
4099 | + builder.builderok = False |
4100 | + scanner = self._getScanner() |
4101 | + d = scanner.scan() |
4102 | + # Because the builder is not ok, we can't use _checkNoDispatch. |
4103 | + d.addCallback( |
4104 | + lambda ignored: self.assertIdentical(None, builder.currentjob)) |
4105 | + return d |
4106 | + |
4107 | + def test_scan_of_broken_slave(self): |
4108 | + builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4109 | + self._resetBuilder(builder) |
4110 | + builder.setSlaveForTesting(BrokenSlave()) |
4111 | + builder.failure_count = 0 |
4112 | + scanner = self._getScanner(builder_name=builder.name) |
4113 | + d = scanner.scan() |
4114 | + return self.assertFailure(d, xmlrpclib.Fault) |
4115 | + |
4116 | + def _assertFailureCounting(self, builder_count, job_count, |
4117 | + expected_builder_count, expected_job_count): |
4118 | # If scan() fails with an exception, failure_counts should be |
4119 | - # incremented and tested. |
4120 | + # incremented. What we do with the results of the failure |
4121 | + # counts is tested below separately; this test just makes sure that |
4122 | + # scan() is setting the counts. |
4123 | def failing_scan(): |
4124 | - raise Exception("fake exception") |
4125 | - manager = self._getManager() |
4126 | - manager.scan = failing_scan |
4127 | - manager.scheduleNextScanCycle = FakeMethod() |
4128 | + return defer.fail(Exception("fake exception")) |
4129 | + scanner = self._getScanner() |
4130 | + scanner.scan = failing_scan |
4131 | from lp.buildmaster import manager as manager_module |
4132 | self.patch(manager_module, 'assessFailureCounts', FakeMethod()) |
4133 | - builder = getUtility(IBuilderSet)[manager.builder_name] |
4134 | - |
4135 | - # Failure counts start at zero. |
4136 | - self.assertEqual(0, builder.failure_count) |
4137 | - self.assertEqual( |
4138 | - 0, builder.currentjob.specific_job.build.failure_count) |
4139 | - |
4140 | - # startCycle() calls scan() which is our fake one that throws an |
4141 | + builder = getUtility(IBuilderSet)[scanner.builder_name] |
4142 | + |
4143 | + builder.failure_count = builder_count |
4144 | + builder.currentjob.specific_job.build.failure_count = job_count |
4145 | + # The _scanFailed() calls abort, so make sure our existing |
4146 | + # failure counts are persisted. |
4147 | + self.layer.txn.commit() |
4148 | + |
4149 | + # singleCycle() calls scan() which is our fake one that throws an |
4150 | # exception. |
4151 | - manager.startCycle() |
4152 | + d = scanner.singleCycle() |
4153 | |
4154 | # Failure counts should be updated, and the assessment method |
4155 | - # should have been called. |
4156 | - self.assertEqual(1, builder.failure_count) |
4157 | - self.assertEqual( |
4158 | - 1, builder.currentjob.specific_job.build.failure_count) |
4159 | - |
4160 | - self.assertEqual( |
4161 | - 1, manager_module.assessFailureCounts.call_count) |
4162 | - |
4163 | - |
4164 | -class TestDispatchResult(LaunchpadTestCase): |
4165 | - """Tests `BaseDispatchResult` variations. |
4166 | - |
4167 | - Variations of `BaseDispatchResult` when evaluated update the database |
4168 | - information according to their purpose. |
4169 | - """ |
4170 | - |
4171 | - layer = LaunchpadZopelessLayer |
4172 | - |
4173 | - def _getBuilder(self, name): |
4174 | - """Return a fixed `IBuilder` instance from the sampledata. |
4175 | - |
4176 | - Ensure it's active (builderok=True) and it has a in-progress job. |
4177 | - """ |
4178 | - login('foo.bar@canonical.com') |
4179 | - |
4180 | - builder = getUtility(IBuilderSet)[name] |
4181 | - builder.builderok = True |
4182 | - |
4183 | - job = builder.currentjob |
4184 | - build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job) |
4185 | - self.assertEqual( |
4186 | - 'i386 build of mozilla-firefox 0.9 in ubuntu hoary RELEASE', |
4187 | - build.title) |
4188 | - |
4189 | - self.assertEqual('BUILDING', build.status.name) |
4190 | - self.assertNotEqual(None, job.builder) |
4191 | - self.assertNotEqual(None, job.date_started) |
4192 | - self.assertNotEqual(None, job.logtail) |
4193 | - |
4194 | - transaction.commit() |
4195 | - |
4196 | - return builder, job.id |
4197 | - |
4198 | - def assertBuildqueueIsClean(self, buildqueue): |
4199 | - # Check that the buildqueue is reset. |
4200 | - self.assertEqual(None, buildqueue.builder) |
4201 | - self.assertEqual(None, buildqueue.date_started) |
4202 | - self.assertEqual(None, buildqueue.logtail) |
4203 | - |
4204 | - def assertBuilderIsClean(self, builder): |
4205 | - # Check that the builder is ready for a new build. |
4206 | - self.assertTrue(builder.builderok) |
4207 | - self.assertIs(None, builder.failnotes) |
4208 | - self.assertIs(None, builder.currentjob) |
4209 | - |
4210 | - def testResetDispatchResult(self): |
4211 | - # Test that `ResetDispatchResult` resets the builder and job. |
4212 | - builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME) |
4213 | - buildqueue_id = builder.currentjob.id |
4214 | - builder.builderok = True |
4215 | - builder.failure_count = 1 |
4216 | - |
4217 | - # Setup a interaction to satisfy 'write_transaction' decorator. |
4218 | - login(ANONYMOUS) |
4219 | - slave = RecordingSlave(builder.name, builder.url, builder.vm_host) |
4220 | - result = ResetDispatchResult(slave) |
4221 | - result() |
4222 | - |
4223 | - buildqueue = getUtility(IBuildQueueSet).get(buildqueue_id) |
4224 | - self.assertBuildqueueIsClean(buildqueue) |
4225 | - |
4226 | - # XXX Julian |
4227 | - # Disabled test until bug 586362 is fixed. |
4228 | - #self.assertFalse(builder.builderok) |
4229 | - self.assertBuilderIsClean(builder) |
4230 | - |
4231 | - def testFailDispatchResult(self): |
4232 | - # Test that `FailDispatchResult` calls assessFailureCounts() so |
4233 | - # that we know the builders and jobs are failed as necessary |
4234 | - # when a FailDispatchResult is called at the end of the dispatch |
4235 | - # chain. |
4236 | - builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME) |
4237 | - |
4238 | - # Setup a interaction to satisfy 'write_transaction' decorator. |
4239 | - login(ANONYMOUS) |
4240 | - slave = RecordingSlave(builder.name, builder.url, builder.vm_host) |
4241 | - result = FailDispatchResult(slave, 'does not work!') |
4242 | - result.assessFailureCounts = FakeMethod() |
4243 | - self.assertEqual(0, result.assessFailureCounts.call_count) |
4244 | - result() |
4245 | - self.assertEqual(1, result.assessFailureCounts.call_count) |
4246 | - |
4247 | - def _setup_failing_dispatch_result(self): |
4248 | - # assessFailureCounts should fail jobs or builders depending on |
4249 | - # whether it sees the failure_counts on each increasing. |
4250 | - builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME) |
4251 | - slave = RecordingSlave(builder.name, builder.url, builder.vm_host) |
4252 | - result = FailDispatchResult(slave, 'does not work!') |
4253 | - return builder, result |
4254 | - |
4255 | - def test_assessFailureCounts_equal_failures(self): |
4256 | - # Basic case where the failure counts are equal and the job is |
4257 | - # reset to try again & the builder is not failed. |
4258 | - builder, result = self._setup_failing_dispatch_result() |
4259 | - buildqueue = builder.currentjob |
4260 | - build = buildqueue.specific_job.build |
4261 | - builder.failure_count = 2 |
4262 | - build.failure_count = 2 |
4263 | - result.assessFailureCounts() |
4264 | - |
4265 | - self.assertBuilderIsClean(builder) |
4266 | - self.assertEqual('NEEDSBUILD', build.status.name) |
4267 | - self.assertBuildqueueIsClean(buildqueue) |
4268 | - |
4269 | - def test_assessFailureCounts_job_failed(self): |
4270 | - # Case where the job has failed more than the builder. |
4271 | - builder, result = self._setup_failing_dispatch_result() |
4272 | - buildqueue = builder.currentjob |
4273 | - build = buildqueue.specific_job.build |
4274 | - build.failure_count = 2 |
4275 | - builder.failure_count = 1 |
4276 | - result.assessFailureCounts() |
4277 | - |
4278 | - self.assertBuilderIsClean(builder) |
4279 | - self.assertEqual('FAILEDTOBUILD', build.status.name) |
4280 | - # The buildqueue should have been removed entirely. |
4281 | - self.assertEqual( |
4282 | - None, getUtility(IBuildQueueSet).getByBuilder(builder), |
4283 | - "Buildqueue was not removed when it should be.") |
4284 | - |
4285 | - def test_assessFailureCounts_builder_failed(self): |
4286 | - # Case where the builder has failed more than the job. |
4287 | - builder, result = self._setup_failing_dispatch_result() |
4288 | - buildqueue = builder.currentjob |
4289 | - build = buildqueue.specific_job.build |
4290 | - build.failure_count = 2 |
4291 | - builder.failure_count = 3 |
4292 | - result.assessFailureCounts() |
4293 | - |
4294 | - self.assertFalse(builder.builderok) |
4295 | - self.assertEqual('does not work!', builder.failnotes) |
4296 | - self.assertTrue(builder.currentjob is None) |
4297 | - self.assertEqual('NEEDSBUILD', build.status.name) |
4298 | - self.assertBuildqueueIsClean(buildqueue) |
4299 | + # should have been called. The actual behaviour is tested below |
4300 | + # in TestFailureAssessments. |
4301 | + def got_scan(ignored): |
4302 | + self.assertEqual(expected_builder_count, builder.failure_count) |
4303 | + self.assertEqual( |
4304 | + expected_job_count, |
4305 | + builder.currentjob.specific_job.build.failure_count) |
4306 | + self.assertEqual( |
4307 | + 1, manager_module.assessFailureCounts.call_count) |
4308 | + |
4309 | + return d.addCallback(got_scan) |
4310 | + |
4311 | + def test_scan_first_fail(self): |
4312 | + # The first failure of a job should result in the failure_count |
4313 | + # on the job and the builder both being incremented. |
4314 | + self._assertFailureCounting( |
4315 | + builder_count=0, job_count=0, expected_builder_count=1, |
4316 | + expected_job_count=1) |
4317 | + |
4318 | + def test_scan_second_builder_fail(self): |
4319 | + # A second scan failure should take the failure_count on the |
4320 | + # builder to 2 while the job's goes to 1. |
4321 | + self._assertFailureCounting( |
4322 | + builder_count=1, job_count=0, expected_builder_count=2, |
4323 | + expected_job_count=1) |
4324 | + |
4325 | + def test_scan_second_job_fail(self): |
4326 | + # A second failure of the same job should take its failure_count |
4327 | + # to 2 while the builder's goes to 1. |
4328 | + self._assertFailureCounting( |
4329 | + builder_count=0, job_count=1, expected_builder_count=1, |
4330 | + expected_job_count=2) |
4331 | + |
4332 | + def test_scanFailed_handles_lack_of_a_job_on_the_builder(self): |
4333 | + def failing_scan(): |
4334 | + return defer.fail(Exception("fake exception")) |
4335 | + scanner = self._getScanner() |
4336 | + scanner.scan = failing_scan |
4337 | + builder = getUtility(IBuilderSet)[scanner.builder_name] |
4338 | + builder.failure_count = Builder.FAILURE_THRESHOLD |
4339 | + builder.currentjob.reset() |
4340 | + self.layer.txn.commit() |
4341 | + |
4342 | + d = scanner.singleCycle() |
4343 | + |
4344 | + def scan_finished(ignored): |
4345 | + self.assertFalse(builder.builderok) |
4346 | + |
4347 | + return d.addCallback(scan_finished) |
4348 | + |
4349 | + def test_fail_to_resume_slave_resets_job(self): |
4350 | + # If an attempt to resume and dispatch a slave fails, it should |
4351 | + # reset the job via job.reset() |
4352 | + |
4353 | + # Make a slave with a failing resume() method. |
4354 | + slave = OkSlave() |
4355 | + slave.resume = lambda: deferLater( |
4356 | + reactor, 0, defer.fail, Failure(('out', 'err', 1))) |
4357 | + |
4358 | + # Reset sampledata builder. |
4359 | + builder = removeSecurityProxy( |
4360 | + getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]) |
4361 | + self._resetBuilder(builder) |
4362 | + self.assertEqual(0, builder.failure_count) |
4363 | + builder.setSlaveForTesting(slave) |
4364 | + builder.vm_host = "fake_vm_host" |
4365 | + |
4366 | + scanner = self._getScanner() |
4367 | + |
4368 | + # Get the next job that will be dispatched. |
4369 | + job = removeSecurityProxy(builder._findBuildCandidate()) |
4370 | + job.virtualized = True |
4371 | + builder.virtualized = True |
4372 | + d = scanner.singleCycle() |
4373 | + |
4374 | + def check(ignored): |
4375 | + # The failure_count will have been incremented on the |
4376 | + # builder; we can check that to see that a dispatch attempt |
4377 | + # did indeed occur. |
4378 | + self.assertEqual(1, builder.failure_count) |
4379 | + # There should also be no builder set on the job. |
4380 | + self.assertTrue(job.builder is None) |
4381 | + build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job) |
4382 | + self.assertEqual(build.status, BuildStatus.NEEDSBUILD) |
4383 | + |
4384 | + return d.addCallback(check) |
4385 | |
4386 | |
4387 | class TestBuilddManager(TrialTestCase): |
4388 | |
4389 | - layer = LaunchpadZopelessLayer |
4390 | + layer = TwistedLaunchpadZopelessLayer |
4391 | |
4392 | def _stub_out_scheduleNextScanCycle(self): |
4393 | # stub out the code that adds a callLater, so that later tests |
4394 | # don't get surprises. |
4395 | - self.patch(SlaveScanner, 'scheduleNextScanCycle', FakeMethod()) |
4396 | + self.patch(SlaveScanner, 'startCycle', FakeMethod()) |
4397 | |
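_stub_out_scheduleNextScanCycle patches out the method that would re-arm the scan loop with reactor.callLater, so a test cannot leave a stray delayed call behind. The tests in this file also rely on FakeMethod recording its invocations (they read .call_count); a minimal sketch of such a recording stub:

    class FakeMethod:
        # Stand-in callable that records calls instead of doing work.
        def __init__(self):
            self.call_count = 0

        def __call__(self, *args, **kwargs):
            self.call_count += 1

    # e.g. self.patch(SlaveScanner, 'startCycle', FakeMethod()) keeps
    # singleCycle() from scheduling another cycle via callLater.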
4398 | def test_addScanForBuilders(self): |
4399 | # Test that addScanForBuilders generates NewBuildersScanner objects. |
4400 | @@ -1040,10 +459,62 @@ |
4401 | self.assertNotEqual(0, manager.new_builders_scanner.scan.call_count) |
4402 | |
4403 | |
4404 | +class TestFailureAssessments(TestCaseWithFactory): |
4405 | + |
4406 | + layer = ZopelessDatabaseLayer |
4407 | + |
4408 | + def setUp(self): |
4409 | + TestCaseWithFactory.setUp(self) |
4410 | + self.builder = self.factory.makeBuilder() |
4411 | + self.build = self.factory.makeSourcePackageRecipeBuild() |
4412 | + self.buildqueue = self.build.queueBuild() |
4413 | + self.buildqueue.markAsBuilding(self.builder) |
4414 | + |
4415 | + def test_equal_failures_reset_job(self): |
4416 | + self.builder.gotFailure() |
4417 | + self.builder.getCurrentBuildFarmJob().gotFailure() |
4418 | + |
4419 | + assessFailureCounts(self.builder, "failnotes") |
4420 | + self.assertIs(None, self.builder.currentjob) |
4421 | + self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD) |
4422 | + |
4423 | + def test_job_failing_more_than_builder_fails_job(self): |
4424 | + self.builder.getCurrentBuildFarmJob().gotFailure() |
4425 | + |
4426 | + assessFailureCounts(self.builder, "failnotes") |
4427 | + self.assertIs(None, self.builder.currentjob) |
4428 | + self.assertEqual(self.build.status, BuildStatus.FAILEDTOBUILD) |
4429 | + |
4430 | + def test_builder_failing_more_than_job_but_under_fail_threshold(self): |
4431 | + self.builder.failure_count = Builder.FAILURE_THRESHOLD - 1 |
4432 | + |
4433 | + assessFailureCounts(self.builder, "failnotes") |
4434 | + self.assertIs(None, self.builder.currentjob) |
4435 | + self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD) |
4436 | + self.assertTrue(self.builder.builderok) |
4437 | + |
4438 | + def test_builder_failing_more_than_job_but_over_fail_threshold(self): |
4439 | + self.builder.failure_count = Builder.FAILURE_THRESHOLD |
4440 | + |
4441 | + assessFailureCounts(self.builder, "failnotes") |
4442 | + self.assertIs(None, self.builder.currentjob) |
4443 | + self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD) |
4444 | + self.assertFalse(self.builder.builderok) |
4445 | + self.assertEqual("failnotes", self.builder.failnotes) |
4446 | + |
4447 | + def test_builder_failing_with_no_attached_job(self): |
4448 | + self.buildqueue.reset() |
4449 | + self.builder.failure_count = Builder.FAILURE_THRESHOLD |
4450 | + |
4451 | + assessFailureCounts(self.builder, "failnotes") |
4452 | + self.assertFalse(self.builder.builderok) |
4453 | + self.assertEqual("failnotes", self.builder.failnotes) |
4454 | + |
4455 | + |
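Taken together, the TestFailureAssessments cases pin down the failure-assessment policy. A hedged sketch of that policy as the tests imply it (the real assessFailureCounts lives in lp.buildmaster.manager and may differ in detail; failure_threshold stands in for Builder.FAILURE_THRESHOLD):

    def assess_failure_counts(builder, fail_notes, failure_threshold):
        job = builder.currentjob
        if job is not None:
            build = job.specific_job.build
            if build.failure_count > builder.failure_count:
                # The job has failed more often than the builder:
                # blame the job and fail it outright.
                build.status = 'FAILEDTOBUILD'
                job.destroySelf()
                return
            # Otherwise give the job another chance elsewhere.
            job.reset()    # back to NEEDSBUILD, detached from builder
        if builder.failure_count >= failure_threshold:
            # The builder keeps failing: take it out of rotation.
            builder.builderok = False
            builder.failnotes = fail_notes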
4456 | class TestNewBuilders(TrialTestCase): |
4457 | """Test detecting of new builders.""" |
4458 | |
4459 | - layer = LaunchpadZopelessLayer |
4460 | + layer = TwistedLaunchpadZopelessLayer |
4461 | |
4462 | def _getScanner(self, manager=None, clock=None): |
4463 | return NewBuildersScanner(manager=manager, clock=clock) |
4464 | @@ -1084,11 +555,8 @@ |
4465 | new_builders, builder_scanner.checkForNewBuilders()) |
4466 | |
4467 | def test_scan(self): |
4468 | - # See if scan detects new builders and schedules the next scan. |
4469 | + # See if scan detects new builders. |
4470 | |
4471 | - # stub out the addScanForBuilders and scheduleScan methods since |
4472 | - # they use callLater; we only want to assert that they get |
4473 | - # called. |
4474 | def fake_checkForNewBuilders(): |
4475 | return "new_builders" |
4476 | |
4477 | @@ -1104,9 +572,6 @@ |
4478 | builder_scanner.scan() |
4479 | advance = NewBuildersScanner.SCAN_INTERVAL + 1 |
4480 | clock.advance(advance) |
4481 | - self.assertNotEqual( |
4482 | - 0, builder_scanner.scheduleScan.call_count, |
4483 | - "scheduleScan did not get called") |
4484 | |
4485 | |
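TestNewBuilders injects a clock into NewBuildersScanner, which is the standard Twisted technique for testing timed behaviour deterministically. A minimal illustration with twisted.internet.task.Clock:

    from twisted.internet import task

    clock = task.Clock()
    fired = []
    clock.callLater(15, fired.append, 'scan')
    clock.advance(16)      # advance simulated time; no real sleeping
    assert fired == ['scan']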
4486 | def is_file_growing(filepath, poll_interval=1, poll_repeat=10): |
4487 | @@ -1147,7 +612,7 @@ |
4488 | return False |
4489 | |
4490 | |
4491 | -class TestBuilddManagerScript(LaunchpadTestCase): |
4492 | +class TestBuilddManagerScript(TestCaseWithFactory): |
4493 | |
4494 | layer = LaunchpadScriptLayer |
4495 | |
4496 | @@ -1156,6 +621,7 @@ |
4497 | fixture = BuilddManagerTestSetup() |
4498 | fixture.setUp() |
4499 | fixture.tearDown() |
4500 | + self.layer.force_dirty_database() |
4501 | |
4502 | # XXX Julian 2010-08-06 bug=614275 |
4503 | # These next 2 tests are in the wrong place, they should be near the |
4504 | |
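For reference, test_fail_to_resume_slave_resets_job above fakes a failing resume() with deferLater. A minimal sketch of that helper, using an arbitrary exception in place of the test's Failure tuple:

    from twisted.internet import defer, reactor
    from twisted.internet.task import deferLater

    # deferLater(reactor, delay, f, *args) fires a Deferred with f's
    # result after 'delay' seconds; returning defer.fail() here makes
    # the outer Deferred fail asynchronously instead of raising inline.
    d = deferLater(reactor, 0, defer.fail, RuntimeError('resume failed'))
    d.addErrback(lambda failure: failure.trap(RuntimeError))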
4505 | === modified file 'lib/lp/buildmaster/tests/test_packagebuild.py' |
4506 | --- lib/lp/buildmaster/tests/test_packagebuild.py 2010-10-02 11:41:43 +0000 |
4507 | +++ lib/lp/buildmaster/tests/test_packagebuild.py 2010-10-25 19:14:01 +0000 |
4508 | @@ -99,6 +99,8 @@ |
4509 | self.assertRaises( |
4510 | NotImplementedError, self.package_build.verifySuccessfulUpload) |
4511 | self.assertRaises(NotImplementedError, self.package_build.notify) |
4512 | + # XXX 2010-10-18 bug=662631 |
4513 | + # Change this to do non-blocking IO. |
4514 | self.assertRaises( |
4515 | NotImplementedError, self.package_build.handleStatus, |
4516 | None, None, None) |
4517 | @@ -311,6 +313,8 @@ |
4518 | # A filemap with plain filenames should not cause a problem. |
4519 | # The call to handleStatus will attempt to get the file from |
4520 | # the slave resulting in a URL error in this test case. |
4521 | + # XXX 2010-10-18 bug=662631 |
4522 | + # Change this to do non-blocking IO. |
4523 | self.build.handleStatus('OK', None, { |
4524 | 'filemap': {'myfile.py': 'test_file_hash'}, |
4525 | }) |
4526 | @@ -321,6 +325,8 @@ |
4527 | def test_handleStatus_OK_absolute_filepath(self): |
4528 | # A filemap that tries to write to files outside of |
4529 | # the upload directory will result in a failed upload. |
4530 | + # XXX 2010-10-18 bug=662631 |
4531 | + # Change this to do non-blocking IO. |
4532 | self.build.handleStatus('OK', None, { |
4533 | 'filemap': {'/tmp/myfile.py': 'test_file_hash'}, |
4534 | }) |
4535 | @@ -331,6 +337,8 @@ |
4536 | def test_handleStatus_OK_relative_filepath(self): |
4537 | # A filemap that tries to write to files outside of |
4538 | # the upload directory will result in a failed upload. |
4539 | + # XXX 2010-10-18 bug=662631 |
4540 | + # Change this to do non-blocking IO. |
4541 | self.build.handleStatus('OK', None, { |
4542 | 'filemap': {'../myfile.py': 'test_file_hash'}, |
4543 | }) |
4544 | @@ -341,6 +349,8 @@ |
4545 | # The build log is set during handleStatus. |
4546 | removeSecurityProxy(self.build).log = None |
4547 | self.assertEqual(None, self.build.log) |
4548 | + # XXX 2010-10-18 bug=662631 |
4549 | + # Change this to do non-blocking IO. |
4550 | self.build.handleStatus('OK', None, { |
4551 | 'filemap': {'myfile.py': 'test_file_hash'}, |
4552 | }) |
4553 | @@ -350,6 +360,8 @@ |
4554 | # The date finished is updated during handleStatus_OK. |
4555 | removeSecurityProxy(self.build).date_finished = None |
4556 | self.assertEqual(None, self.build.date_finished) |
4557 | + # XXX 2010-10-18 bug=662631 |
4558 | + # Change this to do non-blocking IO. |
4559 | self.build.handleStatus('OK', None, { |
4560 | 'filemap': {'myfile.py': 'test_file_hash'}, |
4561 | }) |
4562 | |
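The XXX bug=662631 notes above flag handleStatus for its blocking IO. One plausible interim shape for that fix (an assumption on my part, not what the bug ultimately specifies) is to push the blocking call onto Twisted's thread pool so callers get a Deferred:

    from twisted.internet import threads

    def handle_status_async(build, status, librarian, slave_status):
        # Run the existing blocking handler in a worker thread and
        # return a Deferred instead of blocking the reactor.
        return threads.deferToThread(
            build.handleStatus, status, librarian, slave_status)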
4563 | === modified file 'lib/lp/code/model/recipebuilder.py' |
4564 | --- lib/lp/code/model/recipebuilder.py 2010-08-20 20:31:18 +0000 |
4565 | +++ lib/lp/code/model/recipebuilder.py 2010-10-25 19:14:01 +0000 |
4566 | @@ -117,38 +117,42 @@ |
4567 | raise CannotBuild("Unable to find distroarchseries for %s in %s" % |
4568 | (self._builder.processor.name, |
4569 | self.build.distroseries.displayname)) |
4570 | - |
4571 | + args = self._extraBuildArgs(distroarchseries, logger) |
4572 | chroot = distroarchseries.getChroot() |
4573 | if chroot is None: |
4574 | raise CannotBuild("Unable to find a chroot for %s" % |
4575 | distroarchseries.displayname) |
4576 | - self._builder.slave.cacheFile(logger, chroot) |
4577 | - |
4578 | - # Generate a string which can be used to cross-check when obtaining |
4579 | - # results so we know we are referring to the right database object in |
4580 | - # subsequent runs. |
4581 | - buildid = "%s-%s" % (self.build.id, build_queue_id) |
4582 | - cookie = self.buildfarmjob.generateSlaveBuildCookie() |
4583 | - chroot_sha1 = chroot.content.sha1 |
4584 | - logger.debug( |
4585 | - "Initiating build %s on %s" % (buildid, self._builder.url)) |
4586 | - |
4587 | - args = self._extraBuildArgs(distroarchseries, logger) |
4588 | - status, info = self._builder.slave.build( |
4589 | - cookie, "sourcepackagerecipe", chroot_sha1, {}, args) |
4590 | - message = """%s (%s): |
4591 | - ***** RESULT ***** |
4592 | - %s |
4593 | - %s: %s |
4594 | - ****************** |
4595 | - """ % ( |
4596 | - self._builder.name, |
4597 | - self._builder.url, |
4598 | - args, |
4599 | - status, |
4600 | - info, |
4601 | - ) |
4602 | - logger.info(message) |
4603 | + d = self._builder.slave.cacheFile(logger, chroot) |
4604 | + |
4605 | + def got_cache_file(ignored): |
4606 | + # Generate a string which can be used to cross-check when obtaining |
4607 | + # results so we know we are referring to the right database object in |
4608 | + # subsequent runs. |
4609 | + buildid = "%s-%s" % (self.build.id, build_queue_id) |
4610 | + cookie = self.buildfarmjob.generateSlaveBuildCookie() |
4611 | + chroot_sha1 = chroot.content.sha1 |
4612 | + logger.debug( |
4613 | + "Initiating build %s on %s" % (buildid, self._builder.url)) |
4614 | + |
4615 | + return self._builder.slave.build( |
4616 | + cookie, "sourcepackagerecipe", chroot_sha1, {}, args) |
4617 | + |
4618 | + def log_build_result((status, info)): |
4619 | + message = """%s (%s): |
4620 | + ***** RESULT ***** |
4621 | + %s |
4622 | + %s: %s |
4623 | + ****************** |
4624 | + """ % ( |
4625 | + self._builder.name, |
4626 | + self._builder.url, |
4627 | + args, |
4628 | + status, |
4629 | + info, |
4630 | + ) |
4631 | + logger.info(message) |
4632 | + |
4633 | + return d.addCallback(got_cache_file).addCallback(log_build_result) |
4634 | |
4635 | def verifyBuildRequest(self, logger): |
4636 | """Assert some pre-build checks. |
4637 | |
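The recipebuilder change above is representative of the whole branch: a blocking sequence (cache the chroot, start the build, log the result) becomes a Deferred chain. A minimal sketch of the pattern with stand-in functions (Python 2, matching the tuple-unpacking callback signature used in the diff):

    from twisted.internet import defer

    def cache_file():
        return defer.succeed(None)       # stands in for slave.cacheFile()

    def start_build(ignored):
        # Stands in for slave.build(); fires with (status, info).
        return defer.succeed(('BuildStatus.OK', 'build started'))

    def log_result((status, info)):      # Python 2 tuple parameter
        print '%s: %s' % (status, info)

    d = cache_file()
    d.addCallback(start_build).addCallback(log_result)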
4638 | === modified file 'lib/lp/soyuz/browser/tests/test_builder_views.py' |
4639 | --- lib/lp/soyuz/browser/tests/test_builder_views.py 2010-10-04 19:50:45 +0000 |
4640 | +++ lib/lp/soyuz/browser/tests/test_builder_views.py 2010-10-25 19:14:01 +0000 |
4641 | @@ -34,7 +34,7 @@ |
4642 | return view |
4643 | |
4644 | def test_posting_form_doesnt_call_slave_xmlrpc(self): |
4645 | - # Posting the +edit for should not call is_available, which |
4646 | + # Posting the +edit form should not call isAvailable, which |
4647 | # would do xmlrpc to a slave builder and is explicitly forbidden |
4648 | # in a webapp process. |
4649 | view = self.initialize_view() |
4650 | |
4651 | === removed file 'lib/lp/soyuz/doc/buildd-dispatching.txt' |
4652 | --- lib/lp/soyuz/doc/buildd-dispatching.txt 2010-10-18 22:24:59 +0000 |
4653 | +++ lib/lp/soyuz/doc/buildd-dispatching.txt 1970-01-01 00:00:00 +0000 |
4654 | @@ -1,371 +0,0 @@ |
4655 | -= Buildd Dispatching = |
4656 | - |
4657 | - >>> import transaction |
4658 | - >>> import logging |
4659 | - >>> logger = logging.getLogger() |
4660 | - >>> logger.setLevel(logging.DEBUG) |
4661 | - |
4662 | -The buildd dispatching basically consists of finding an available |
4663 | -slave in IDLE state, pushing any required files to it, then requesting |
4664 | -that it starts the build procedure. These tasks are implemented by the |
4665 | -BuilderSet and Builder classes. |
4666 | - |
4667 | -Setup the test builder: |
4668 | - |
4669 | - >>> from canonical.buildd.tests import BuilddSlaveTestSetup |
4670 | - >>> fixture = BuilddSlaveTestSetup() |
4671 | - >>> fixture.setUp() |
4672 | - |
4673 | -Setup a suitable chroot for Hoary i386: |
4674 | - |
4675 | - >>> from StringIO import StringIO |
4676 | - >>> from canonical.librarian.interfaces import ILibrarianClient |
4677 | - >>> librarian_client = getUtility(ILibrarianClient) |
4678 | - |
4679 | - >>> content = 'anything' |
4680 | - >>> alias_id = librarian_client.addFile( |
4681 | - ... 'foo.tar.gz', len(content), StringIO(content), 'text/plain') |
4682 | - |
4683 | - >>> from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet |
4684 | - >>> from lp.registry.interfaces.distribution import IDistributionSet |
4685 | - >>> from lp.registry.interfaces.pocket import PackagePublishingPocket |
4686 | - |
4687 | - >>> hoary = getUtility(IDistributionSet)['ubuntu']['hoary'] |
4688 | - >>> hoary_i386 = hoary['i386'] |
4689 | - |
4690 | - >>> chroot = getUtility(ILibraryFileAliasSet)[alias_id] |
4691 | - >>> pc = hoary_i386.addOrUpdateChroot(chroot=chroot) |
4692 | - |
4693 | -Activate builders present in sampledata, we need to be logged in as a |
4694 | -member of launchpad-buildd-admin: |
4695 | - |
4696 | - >>> from canonical.launchpad.ftests import login |
4697 | - >>> login('celso.providelo@canonical.com') |
4698 | - |
4699 | -Set IBuilder.builderok of all present builders: |
4700 | - |
4701 | - >>> from lp.buildmaster.interfaces.builder import IBuilderSet |
4702 | - >>> builder_set = getUtility(IBuilderSet) |
4703 | - |
4704 | - >>> builder_set.count() |
4705 | - 2 |
4706 | - |
4707 | - >>> from canonical.launchpad.ftests import syncUpdate |
4708 | - >>> for b in builder_set: |
4709 | - ... b.builderok = True |
4710 | - ... syncUpdate(b) |
4711 | - |
4712 | -Clean up previous BuildQueue results from sampledata: |
4713 | - |
4714 | - >>> from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
4715 | - >>> lost_job = getUtility(IBuildQueueSet).get(1) |
4716 | - >>> lost_job.builder.name |
4717 | - u'bob' |
4718 | - >>> lost_job.destroySelf() |
4719 | - >>> transaction.commit() |
4720 | - |
4721 | -If the specified buildd slave reset command (used inside resumeSlaveHost()) |
4722 | -fails, the builder will be marked as failed. |
4723 | - |
4724 | - >>> from canonical.config import config |
4725 | - >>> reset_fail_config = ''' |
4726 | - ... [builddmaster] |
4727 | - ... vm_resume_command: /bin/false''' |
4728 | - >>> config.push('reset fail', reset_fail_config) |
4729 | - >>> frog_builder = builder_set['frog'] |
4730 | - >>> frog_builder.handleTimeout(logger, 'The universe just collapsed') |
4731 | - WARNING:root:Resetting builder: http://localhost:9221/ -- The universe just collapsed |
4732 | - ... |
4733 | - WARNING:root:Failed to reset builder: http://localhost:9221/ -- Resuming failed: |
4734 | - ... |
4735 | - WARNING:root:Disabling builder: http://localhost:9221/ -- The universe just collapsed |
4736 | - ... |
4737 | - <BLANKLINE> |
4738 | - |
4739 | -Since we were unable to reset the 'frog' builder it was marked as 'failed'. |
4740 | - |
4741 | - >>> frog_builder.builderok |
4742 | - False |
4743 | - |
4744 | -Restore default value for resume command. |
4745 | - |
4746 | - >>> ignored_config = config.pop('reset fail') |
4747 | - |
4748 | -The 'bob' builder is available for build jobs. |
4749 | - |
4750 | - >>> bob_builder = builder_set['bob'] |
4751 | - >>> bob_builder.name |
4752 | - u'bob' |
4753 | - >>> bob_builder.virtualized |
4754 | - False |
4755 | - >>> bob_builder.is_available |
4756 | - True |
4757 | - >>> bob_builder.builderok |
4758 | - True |
4759 | - |
4760 | - |
4761 | -== Builder dispatching API == |
4762 | - |
4763 | -Now let's check the build candidates which will be considered for the |
4764 | -builder 'bob': |
4765 | - |
4766 | - >>> from zope.security.proxy import removeSecurityProxy |
4767 | - >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate() |
4768 | - |
4769 | -The single BuildQueue found is a non-virtual pending build: |
4770 | - |
4771 | - >>> job.id |
4772 | - 2 |
4773 | - >>> from lp.soyuz.interfaces.binarypackagebuild import ( |
4774 | - ... IBinaryPackageBuildSet) |
4775 | - >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job) |
4776 | - >>> build.status.name |
4777 | - 'NEEDSBUILD' |
4778 | - >>> job.builder is None |
4779 | - True |
4780 | - >>> job.date_started is None |
4781 | - True |
4782 | - >>> build.is_virtualized |
4783 | - False |
4784 | - |
4785 | -The build start time is not set yet either. |
4786 | - |
4787 | - >>> print build.date_first_dispatched |
4788 | - None |
4789 | - |
4790 | -Update the SourcePackageReleaseFile corresponding to this job: |
4791 | - |
4792 | - >>> content = 'anything' |
4793 | - >>> alias_id = librarian_client.addFile( |
4794 | - ... 'foo.dsc', len(content), StringIO(content), 'application/dsc') |
4795 | - |
4796 | - >>> sprf = build.source_package_release.files[0] |
4797 | - >>> naked_sprf = removeSecurityProxy(sprf) |
4798 | - >>> naked_sprf.libraryfile = getUtility(ILibraryFileAliasSet)[alias_id] |
4799 | - >>> flush_database_updates() |
4800 | - |
4801 | -Check the dispatching method itself: |
4802 | - |
4803 | - >>> dispatched_job = bob_builder.findAndStartJob() |
4804 | - >>> job == dispatched_job |
4805 | - True |
4806 | - >>> bob_builder.builderok = True |
4807 | - |
4808 | - >>> flush_database_updates() |
4809 | - |
4810 | -Verify if the job (BuildQueue) was updated appropriately: |
4811 | - |
4812 | - >>> job.builder.id == bob_builder.id |
4813 | - True |
4814 | - |
4815 | - >>> dispatched_build = getUtility( |
4816 | - ... IBinaryPackageBuildSet).getByQueueEntry(job) |
4817 | - >>> dispatched_build == build |
4818 | - True |
4819 | - |
4820 | - >>> build.status.name |
4821 | - 'BUILDING' |
4822 | - |
4823 | -Shutdown builder, mark the build record as failed and remove the |
4824 | -buildqueue record, so the build was eliminated: |
4825 | - |
4826 | - >>> fixture.tearDown() |
4827 | - |
4828 | - >>> from lp.buildmaster.enums import BuildStatus |
4829 | - >>> build.status = BuildStatus.FAILEDTOBUILD |
4830 | - >>> job.destroySelf() |
4831 | - >>> flush_database_updates() |
4832 | - |
4833 | - |
4834 | -== PPA build dispatching == |
4835 | - |
4836 | -Create a new Build record of the same source targeted for a PPA archive: |
4837 | - |
4838 | - >>> from lp.registry.interfaces.person import IPersonSet |
4839 | - >>> cprov = getUtility(IPersonSet).getByName('cprov') |
4840 | - |
4841 | - >>> ppa_build = sprf.sourcepackagerelease.createBuild( |
4842 | - ... hoary_i386, PackagePublishingPocket.RELEASE, cprov.archive) |
4843 | - |
4844 | -Create BuildQueue record and inspect some parameters: |
4845 | - |
4846 | - >>> ppa_job = ppa_build.queueBuild() |
4847 | - >>> ppa_job.id |
4848 | - 3 |
4849 | - >>> ppa_job.builder == None |
4850 | - True |
4851 | - >>> ppa_job.date_started == None |
4852 | - True |
4853 | - |
4854 | -The build job's archive requires virtualized builds. |
4855 | - |
4856 | - >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(ppa_job) |
4857 | - >>> build.archive.require_virtualized |
4858 | - True |
4859 | - |
4860 | -But the builder is not virtualized. |
4861 | - |
4862 | - >>> bob_builder.virtualized |
4863 | - False |
4864 | - |
4865 | -Hence, the builder will not be able to pick up the PPA build job created |
4866 | -above. |
4867 | - |
4868 | - >>> bob_builder.vm_host = 'localhost.ppa' |
4869 | - >>> syncUpdate(bob_builder) |
4870 | - |
4871 | - >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate() |
4872 | - >>> print job |
4873 | - None |
4874 | - |
4875 | -In order to enable 'bob' to find and build the PPA job, we have to |
4876 | -change it to virtualized. This is because PPA builds will only build |
4877 | -on virtualized builders. We also need to make sure this build's source |
4878 | -is published, or it will also be ignored (by superseding it). We can |
4879 | -do this by copying the existing publication in Ubuntu. |
4880 | - |
4881 | - >>> from lp.soyuz.model.publishing import ( |
4882 | - ... SourcePackagePublishingHistory) |
4883 | - >>> [old_pub] = SourcePackagePublishingHistory.selectBy( |
4884 | - ... distroseries=build.distro_series, |
4885 | - ... sourcepackagerelease=build.source_package_release) |
4886 | - >>> new_pub = old_pub.copyTo( |
4887 | - ... old_pub.distroseries, old_pub.pocket, build.archive) |
4888 | - |
4889 | - >>> bob_builder.virtualized = True |
4890 | - >>> syncUpdate(bob_builder) |
4891 | - |
4892 | - >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate() |
4893 | - >>> ppa_job.id == job.id |
4894 | - True |
4895 | - |
4896 | -For further details regarding IBuilder._findBuildCandidate() please see |
4897 | -lib/lp/soyuz/tests/test_builder.py. |
4898 | - |
4899 | -Start buildd-slave to be able to dispatch jobs. |
4900 | - |
4901 | - >>> fixture = BuilddSlaveTestSetup() |
4902 | - >>> fixture.setUp() |
4903 | - |
4904 | -Before dispatching we can check whether the builder is protected |
4905 | -against mistakes in code that result in an attempt to build a |
4906 | -virtual job on a non-virtual builder. |
4907 | - |
4908 | - >>> bob_builder.virtualized = False |
4909 | - >>> flush_database_updates() |
4910 | - >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(ppa_job) |
4911 | - Traceback (most recent call last): |
4912 | - ... |
4913 | - AssertionError: Attempt to build non-virtual item on a virtual builder. |
4914 | - |
4915 | -Mark the builder as virtual again, so we can dispatch the ppa job |
4916 | -successfully. |
4917 | - |
4918 | - >>> bob_builder.virtualized = True |
4919 | - >>> flush_database_updates() |
4920 | - |
4921 | - >>> dispatched_job = bob_builder.findAndStartJob() |
4922 | - >>> ppa_job == dispatched_job |
4923 | - True |
4924 | - |
4925 | - >>> flush_database_updates() |
4926 | - |
4927 | -PPA job is building. |
4928 | - |
4929 | - >>> ppa_job.builder.name |
4930 | - u'bob' |
4931 | - |
4932 | - >>> build.status.name |
4933 | - 'BUILDING' |
4934 | - |
4935 | -Shutdown builder slave, mark the ppa build record as failed, remove the |
4936 | -buildqueue record and make 'bob' builder non-virtual again, so the |
4937 | -environment is back to the initial state. |
4938 | - |
4939 | - >>> fixture.tearDown() |
4940 | - |
4941 | - >>> build.status = BuildStatus.FAILEDTOBUILD |
4942 | - >>> ppa_job.destroySelf() |
4943 | - >>> bob_builder.virtualized = False |
4944 | - >>> flush_database_updates() |
4945 | - |
4946 | - |
4947 | -== Security build dispatching == |
4948 | - |
4949 | -Setup chroot for warty/i386. |
4950 | - |
4951 | - >>> warty = getUtility(IDistributionSet)['ubuntu']['warty'] |
4952 | - >>> warty_i386 = warty['i386'] |
4953 | - >>> pc = warty_i386.addOrUpdateChroot(chroot=chroot) |
4954 | - |
4955 | -Create a new Build record for test source targeted to warty/i386 |
4956 | -architecture and SECURITY pocket: |
4957 | - |
4958 | - >>> sec_build = sprf.sourcepackagerelease.createBuild( |
4959 | - ... warty_i386, PackagePublishingPocket.SECURITY, hoary.main_archive) |
4960 | - |
4961 | -Create BuildQueue record and inspect some parameters: |
4962 | - |
4963 | - >>> sec_job = sec_build.queueBuild() |
4964 | - >>> sec_job.id |
4965 | - 4 |
4966 | - >>> print sec_job.builder |
4967 | - None |
4968 | - >>> print sec_job.date_started |
4969 | - None |
4970 | - >>> sec_build.is_virtualized |
4971 | - False |
4972 | - |
4973 | -In normal conditions the next available candidate would be the job |
4974 | -targeted to SECURITY pocket. However, the builders are forbidden to |
4975 | -accept such jobs until we have finished the EMBARGOED archive |
4976 | -implementation. |
4977 | - |
4978 | - >>> fixture = BuilddSlaveTestSetup() |
4979 | - >>> fixture.setUp() |
4980 | - >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(sec_job) |
4981 | - Traceback (most recent call last): |
4982 | - ... |
4983 | - AssertionError: Soyuz is not yet capable of building SECURITY uploads. |
4984 | - >>> fixture.tearDown() |
4985 | - |
4986 | -To solve this problem temporarily until we start building security |
4987 | -uploads, we will mark builds targeted to the SECURITY pocket as |
4988 | -FAILEDTOBUILD during the _findBuildCandidate look-up. |
4989 | - |
4990 | -We will also create another build candidate in breezy-autotest/i386 to |
4991 | -check if legitimate pending candidates will remain valid. |
4992 | - |
4993 | - >>> breezy = getUtility(IDistributionSet)['ubuntu']['breezy-autotest'] |
4994 | - >>> breezy_i386 = breezy['i386'] |
4995 | - >>> pc = breezy_i386.addOrUpdateChroot(chroot=chroot) |
4996 | - |
4997 | - >>> pending_build = sprf.sourcepackagerelease.createBuild( |
4998 | - ... breezy_i386, PackagePublishingPocket.UPDATES, hoary.main_archive) |
4999 | - >>> pending_job = pending_build.queueBuild() |
5000 | - |
Looks good. Thanks.
I think that as long as we have resumeSlave, we should keep the tests in 'lib/lp/buildmaster/tests/test_manager.py'. Please revert your changes to that file & land this branch.