Merge lp:~julian-edwards/launchpad/builderslave-resume into lp:launchpad

Proposed by Julian Edwards
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 11801
Proposed branch: lp:~julian-edwards/launchpad/builderslave-resume
Merge into: lp:launchpad
Diff against target: 7192 lines (+2211/-3509)
24 files modified
lib/lp/buildmaster/doc/builder.txt (+2/-118)
lib/lp/buildmaster/interfaces/builder.py (+83/-62)
lib/lp/buildmaster/manager.py (+205/-469)
lib/lp/buildmaster/model/builder.py (+240/-224)
lib/lp/buildmaster/model/buildfarmjobbehavior.py (+60/-52)
lib/lp/buildmaster/model/packagebuild.py (+6/-0)
lib/lp/buildmaster/tests/mock_slaves.py (+157/-32)
lib/lp/buildmaster/tests/test_builder.py (+582/-154)
lib/lp/buildmaster/tests/test_manager.py (+248/-782)
lib/lp/buildmaster/tests/test_packagebuild.py (+12/-0)
lib/lp/code/model/recipebuilder.py (+32/-28)
lib/lp/soyuz/browser/tests/test_builder_views.py (+1/-1)
lib/lp/soyuz/doc/buildd-dispatching.txt (+0/-371)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+0/-876)
lib/lp/soyuz/model/binarypackagebuildbehavior.py (+59/-41)
lib/lp/soyuz/tests/test_binarypackagebuildbehavior.py (+290/-8)
lib/lp/soyuz/tests/test_doc.py (+0/-6)
lib/lp/testing/factory.py (+8/-2)
lib/lp/translations/doc/translationtemplatesbuildbehavior.txt (+0/-114)
lib/lp/translations/model/translationtemplatesbuildbehavior.py (+20/-14)
lib/lp/translations/stories/buildfarm/xx-build-summary.txt (+1/-1)
lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+202/-153)
lib/lp_sitecustomize.py (+3/-0)
utilities/migrater/file-ownership.txt (+0/-1)
To merge this branch: bzr merge lp:~julian-edwards/launchpad/builderslave-resume
Reviewer Review Type Date Requested Status
Jonathan Lange (community) Approve
Review via email: mp+36351@code.launchpad.net

Description of the change

This is the integration branch for the "fully asynchronous build manager" changes.

Revision history for this message
Jonathan Lange (jml) wrote :

Looks good. Thanks.

I think that as long as we have resumeSlave, we should keep the tests in 'lib/lp/buildmaster/tests/test_manager.py'. Please revert your changes to that file & land this branch.

Revision history for this message
Jonathan Lange (jml) :
review: Approve
Revision history for this message
Jonathan Lange (jml) wrote :

Branch has been pushed to since review. Definitely not approved.

review: Abstain
Revision history for this message
Julian Edwards (julian-edwards) wrote :

This is now the integration branch. WIP.

Revision history for this message
Jonathan Lange (jml) wrote :

Some quick comments, almost all of which are shallow. This doesn't count as a proper code review. Importantly, I haven't rigorously checked that the deleted tests have been properly replaced, and I haven't checked a lot of the Deferred-transformation work for correctness.

Will respond soon with a review of the new manager & test_manager.

> === modified file 'lib/lp/buildmaster/interfaces/builder.py'
> --- lib/lp/buildmaster/interfaces/builder.py 2010-09-23 18:17:21 +0000
> +++ lib/lp/buildmaster/interfaces/builder.py 2010-10-08 17:29:52 +0000
> @@ -154,11 +154,6 @@ class IBuilder(IHasOwner):
>

Could you please go through this interface and:

  * Make sure that all methods that return Deferreds are documented as doing
    so.

  * Possibly, group all of the methods that return Deferreds. I think it will
    make the interface clearer.
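
For reference, the convention being asked for looks roughly like this (a
sketch only; the method and its docstring are taken from the final
interface change in the preview diff below):

    from zope.interface import Interface

    class IBuilder(Interface):
        # ...synchronous methods above...

        # All methods below here return Deferred.

        def cleanSlave():
            """Clean any temporary files from the slave.

            :return: A Deferred that fires when the dialog with the slave
                is finished. It does not have a return value.
            """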

> === modified file 'lib/lp/buildmaster/model/builder.py'
> --- lib/lp/buildmaster/model/builder.py 2010-09-24 13:39:27 +0000
> +++ lib/lp/buildmaster/model/builder.py 2010-10-18 10:09:54 +0000
...
> @@ -125,24 +112,17 @@ class BuilderSlave(object):
> # many false positives in your test run and will most likely break
> # production.
>

> # XXX: Have a documented interface for the XML-RPC server:
> # - what methods
> # - what return values expected
> # - what faults
> # (see XMLRPCBuildDSlave in lib/canonical/buildd/slave.py).
>

I've filed bug https://bugs.edge.launchpad.net/soyuz/+bug/662599 for this. I
don't think having the XXX here helps.

> # XXX: Once we have a client object with a defined, tested interface, we
> # should make a test double that doesn't do any XML-RPC and can be used to
> # make testing easier & tests faster.
>

This XXX can be safely deleted, I think.

> def getFile(self, sha_sum):
> """Construct a file-like object to return the named file."""
> + # XXX: Change this to do non-blocking IO.

Please file a bug.
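
For context, a non-blocking getFile would have to stream the file through
the reactor instead of blocking on urllib-style IO. A rough sketch using
Twisted's downloadPage; the _file_cache_url attribute is a hypothetical
stand-in for however the slave exposes its file cache, and this is not the
branch's code:

    from twisted.web.client import downloadPage

    def getFile(self, sha_sum, file_to_write):
        """Fetch the file with the given sha1 without blocking.

        :param file_to_write: An open file object to write the content to.
        :return: A Deferred that fires when the download completes.
        """
        # Hypothetical: the slave serves cached files over HTTP by sha1.
        file_url = '%s/filecache/%s' % (self._file_cache_url, sha_sum)
        return downloadPage(file_url, file_to_write)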

...
> + # Twisted API requires string but the configuration provides unicode.
> + resume_argv = [str(term) for term in resume_command.split()]

It's more explicit to do .encode('utf-8'), rather than str().
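
The difference matters as soon as the configuration can contain non-ASCII
text: str() falls back to the ASCII codec and raises UnicodeEncodeError,
while .encode('utf-8') states the intended byte encoding explicitly. A
standalone illustration, not code from the branch:

    resume_command = u'ssh -i ~/.ssh/ppa-reset-key ppa@vmhost'
    # Twisted's spawnProcess wants byte strings in its argv.
    resume_argv = [term.encode('utf-8') for term in resume_command.split()]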

> def updateStatus(self, logger=None):
> """See `IBuilder`."""
> - updateBuilderStatus(self, logger)
> + # updateBuilderStatus returns a Deferred if the builder timed
> + # out, otherwise it returns a thing that we can wrap in a
> + # defer.succeed. maybeDeferred() handles this for us.
> + return defer.maybeDeferred(updateBuilderStatus, self, logger)
>

This comment seems bogus. As far as I can tell, updateBuilderStatus always
returns a Deferred.
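
For reference, maybeDeferred only earns its keep when the wrapped callable
can return either a plain value or a Deferred; if the callee always returns
a Deferred, the wrapper adds nothing. A standalone illustration, not code
from the branch:

    from twisted.internet import defer

    def maybe_async(flag):
        # One code path returns a Deferred, the other a plain value.
        if flag:
            return defer.succeed('already async')
        return 'plain value'

    # maybeDeferred normalises both cases to a Deferred, so callers can
    # attach callbacks without checking the return type.
    d = defer.maybeDeferred(maybe_async, False)
    d.addCallback(lambda result: result.upper())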

> - if builder_should_be_failed:
> + d = self.resumeSlaveHost()
> + return d
> + else:
> + # XXX: This should really let the failure bubble up to the
> + # scan() method that does the failure counting.
> # Mark builder as 'failed'.

Are you going to fix this in this branch or in another? If so, when?

> logger.warn(
> - "Disabling builder: %s -- %s" % (self.url, error_message),
> - ...

Revision history for this message
Jonathan Lange (jml) wrote :

Hey Julian,

The new manager is far, far more readable than before. The hard work has paid off.

I mention a lot of small things in the comments below. I'd really appreciate it if you could address them all, since now is the best opportunity we'll have in a while to make this code comprehensible to others.

> # Copyright 2009 Canonical Ltd. This software is licensed under the
> # GNU Affero General Public License version 3 (see the file LICENSE).
>
> """Soyuz buildd slave manager logic."""
>
> __metaclass__ = type
>
> __all__ = [
> 'BuilddManager',
> 'BUILDD_MANAGER_LOG_NAME',
> 'buildd_success_result_map',
> ]
>
> import logging
>
> import transaction
> from twisted.application import service
> from twisted.internet import (
> defer,
> reactor,
> )
> from twisted.internet.task import LoopingCall
> from twisted.python import log
> from zope.component import getUtility
>
> from lp.buildmaster.enums import BuildStatus
> from lp.buildmaster.interfaces.buildfarmjobbehavior import (
> BuildBehaviorMismatch,
> )
> from lp.buildmaster.interfaces.builder import (
> BuildDaemonError,
> BuildSlaveFailure,
> CannotBuild,
> CannotResumeHost,
> )
>
>
> BUILDD_MANAGER_LOG_NAME = "slave-scanner"
>
>
> buildd_success_result_map = {
> 'ensurepresent': True,
> 'build': 'BuilderStatus.BUILDING',
> }
>

You can delete this now. Yay. (Don't forget the __all__ too).

> class SlaveScanner:
> """A manager for a single builder."""
>
> SCAN_INTERVAL = 5
>

Can you please add a comment explaining what this means, what unit it's in,
and hinting at why 5 is a good number for it?

> def startCycle(self):
> """Scan the builder and dispatch to it or deal with failures."""
> self.loop = LoopingCall(self._startCycle)
> self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
> return self.stopping_deferred
>
> def _startCycle(self):
> # Same as _startCycle but the next cycle is not scheduled. This
> # is so tests can initiate a single scan.

This comment is obsolete. Also, there's probably a better name than
"_startCycle", since this is pretty much doing the scan. Perhaps 'oneCycle'.

> def _scanFailed(self, failure):
> # Trap known exceptions and print a message without a
> # stack trace in that case, or if we don't know about it,
> # include the trace.
>

This comment is also obsolete. Although the traceback/no-traceback logic is
still here, it's hardly the point of the method.

> # Paranoia.
> transaction.abort()
>

Can you please explain in the comment exactly what you are being paranoid
about?

> error_message = failure.getErrorMessage()
> if failure.check(
> BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch,
> CannotResumeHost, BuildDaemonError):
> self.logger.info("Scanning failed with: %s" % error_message)
> else:
> self.logger.info("Scanning failed with: %s\n%s" %
> (failure.getErrorMessage(), failure.getTraceback()))
>
> builder = get_builder(self.builder_name)
>

Shouldn...

review: Needs Fixing
Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Monday 18 October 2010 11:59:57 Jonathan Lange wrote:
> Some quick comments, almost all of which are shallow. This doesn't count as
> a proper code review. Importantly, I haven't rigorously checked that the
> deleted tests have been properly replaced, and I haven't checked a lot of
> the Deferred-transformation work for correctness.
>
> Will respond soon with a review of the new manager & test_manager.

Cheers, I replied inline.

>
> > === modified file 'lib/lp/buildmaster/interfaces/builder.py'
> > --- lib/lp/buildmaster/interfaces/builder.py 2010-09-23 18:17:21 +0000
> > +++ lib/lp/buildmaster/interfaces/builder.py 2010-10-08 17:29:52 +0000
>
> > @@ -154,11 +154,6 @@ class IBuilder(IHasOwner):
> Could you please go through this interface and:
>
> * Make sure that all methods that return Deferreds are documented as
> doing so.
>
> * Possibly, group all of the methods that return Deferreds. I think it
> will make the interface clearer.

I've done both of these as you suggest.

> > === modified file 'lib/lp/buildmaster/model/builder.py'
> > --- lib/lp/buildmaster/model/builder.py 2010-09-24 13:39:27 +0000
> > +++ lib/lp/buildmaster/model/builder.py 2010-10-18 10:09:54 +0000
>
> ...
>
> > @@ -125,24 +112,17 @@ class BuilderSlave(object):
> > # many false positives in your test run and will most likely break
> > # production.
> >
> >
> > # XXX: Have a documented interface for the XML-RPC server:
> > # - what methods
> > # - what return values expected
> > # - what faults
> > # (see XMLRPCBuildDSlave in lib/canonical/buildd/slave.py).
>
> I've filed bug https://bugs.edge.launchpad.net/soyuz/+bug/662599 for this.
> I don't think having the XXX here helps.

Right, I've deleted it.

>
> > # XXX: Once we have a client object with a defined, tested interface, we
> > # should make a test double that doesn't do any XML-RPC and can be used to
> > # make testing easier & tests faster.
>
> This XXX can be safely deleted, I think.

Done.

>
> > def getFile(self, sha_sum):
> > """Construct a file-like object to return the named file."""
> >
> > + # XXX: Change this to do non-blocking IO.
>
> Please file a bug.

https://bugs.edge.launchpad.net/soyuz/+bug/662631

>
> ...
>
> > + # Twisted API requires string but the configuration provides unicode.
> > + resume_argv = [str(term) for term in resume_command.split()]
>
> It's more explicit to do .encode('utf-8'), rather than str().

Not my code, but I've done that.

>
> > def updateStatus(self, logger=None):
> > """See `IBuilder`."""
> >
> > - updateBuilderStatus(self, logger)
> > + # updateBuilderStatus returns a Deferred if the builder timed
> > + # out, otherwise it returns a thing that we can wrap in a
> > + # defer.succeed. maybeDeferred() handles this for us.
> > + return defer.maybeDeferred(updateBuilderStatus, self, logger)
>
> This comment seems bogus. As far as I can tell, updateBuilderStatus always
> returns a Deferred.

It does, so I've removed the comment and the maybeDeferred().

>
> > - if builder_shoul...


Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Monday 18 October 2010 13:21:45 you wrote:
> Review: Needs Fixing
> Hey Julian,
>
> The new manager is far, far more readable than before. The hard work has
> paid off.

\o/

> I mention a lot of small things in the comments below. I'd really
> appreciate it if you could address them all, since now is the best
> opportunity we'll have in a while to make this code comprehensible to
> others.

I shall endeavour to do so.

> > # Copyright 2009 Canonical Ltd. This software is licensed under the
> > # GNU Affero General Public License version 3 (see the file LICENSE).
> >
> > """Soyuz buildd slave manager logic."""
> >
> > __metaclass__ = type
> >
> > __all__ = [
> >
> > 'BuilddManager',
> > 'BUILDD_MANAGER_LOG_NAME',
> > 'buildd_success_result_map',
> > ]
> >
> > import logging
> >
> > import transaction
> > from twisted.application import service
> > from twisted.internet import (
> >
> > defer,
> > reactor,
> > )
> >
> > from twisted.internet.task import LoopingCall
> > from twisted.python import log
> > from zope.component import getUtility
> >
> > from lp.buildmaster.enums import BuildStatus
> > from lp.buildmaster.interfaces.buildfarmjobbehavior import (
> >
> > BuildBehaviorMismatch,
> > )
> >
> > from lp.buildmaster.interfaces.builder import (
> >
> > BuildDaemonError,
> > BuildSlaveFailure,
> > CannotBuild,
> > CannotResumeHost,
> > )
> >
> > BUILDD_MANAGER_LOG_NAME = "slave-scanner"
> >
> >
> > buildd_success_result_map = {
> >
> > 'ensurepresent': True,
> > 'build': 'BuilderStatus.BUILDING',
> > }
>
> You can delete this now. Yay. (Don't forget the __all__ too).

Right!

>
> > class SlaveScanner:
> > """A manager for a single builder."""
> >
> > SCAN_INTERVAL = 5
>
> Can you please add a comment explaining what this means, what unit it's in,
> and hinting at why 5 is a good number for it?

Done.

>
> > def startCycle(self):
> > """Scan the builder and dispatch to it or deal with failures."""
> > self.loop = LoopingCall(self._startCycle)
> > self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
> > return self.stopping_deferred
> >
> > def _startCycle(self):
> > # Same as _startCycle but the next cycle is not scheduled. This
> > # is so tests can initiate a single scan.
>
> This comment is obsolete. Also, there's probably a better name than
> "_startCycle", since this is pretty much doing the scan. Perhaps
> 'oneCycle'.

I named it singleCycle().

>
> > def _scanFailed(self, failure):
> > # Trap known exceptions and print a message without a
> > # stack trace in that case, or if we don't know about it,
> > # include the trace.
>
> This comment is also obsolete. Although the traceback/no-traceback logic
> is still here, it's hardly the point of the method.

Spruced up as discussed in IRC.

>
> > # Paranoia.
> > transaction.abort()
>
> Can you please explain in the comment exactly what you are being paranoid
> about?

Yup, done.

>
> > error_message = failure.getErrorMessage()
> > if failure.check(
...

Revision history for this message
Jonathan Lange (jml) wrote :

On Mon, Oct 18, 2010 at 4:49 PM, Julian Edwards
<email address hidden> wrote:
> On Monday 18 October 2010 13:21:45 you wrote:
>> Review: Needs Fixing
>> Hey Julian,
>>
>> The new manager is far, far more readable than before.  The hard work has
>> paid off.
>
> \o/
>
>> I mention a lot of small things in the comments below.  I'd really
>> appreciate it if you could address them all, since now is the best
>> opportunity we'll have in a while to make this code comprehensible to
>> others.
>
> I shall endeavour to do so.
>
...
>>
>> > class SlaveScanner:
>> >     """A manager for a single builder."""
>> >
>> >     SCAN_INTERVAL = 5
>>
>> Can you please add a comment explaining what this means, what unit it's in,
>> and hinting at why 5 is a good number for it?
>
> Done.
>

...
>>
>> >         """
>> >         # We need to re-fetch the builder object on each cycle as the
>> >         # Storm store is invalidated over transaction boundaries.
>>
>> This method is complicated enough that I think it would benefit from a
>> prose summary of what it does.
>>
>> For example:
>>
>>   If the builder is OK [XXX - I don't know what this actually means - jml],
>>   then update its status [XXX - this seems redundant, didn't we just check
>>   that it was OK? - jml].  If we think there's an active build on the
>>   builder, then check to see if it's done (builder.updateBuild).  If it
>>   is done, then check that it's available and not in manual mode, then
>>   dispatch the job to the build.
>>
>> Well, that's my best guess.  I think it's only worth describing the happy
>> case, since the unusual cases will be easy enough to follow in the code.
>
> I've written something - let me know if you think it's easy enough to follow.
> It does require some knowledge of how the farm works and I don't think code
> comments are the best place for explaining that.
>

It's good. Much more helpful, thanks.

>> > class BuilddManager(service.Service):
>> >     """Main Buildd Manager service class."""
>> >
>> >     def __init__(self, clock=None):
>> >         self.builder_slaves = []
>> >         self.logger = self._setupLogger()
>>
>> Given that _setupLogger changes global state, it's better to put it in
>> startService.
>
> Nocando (which is a bit near Katmandu) - the NewBuildersScanner needs it
> there.  (More below on this section of code)
>
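
For reference, the pattern jml is suggesting, sketched under the constraint
Julian mentions (a hypothetical shape, not the branch's code): a
twisted.application Service can postpone global side effects such as
installing log handlers until startService, which runs when the reactor
starts the service:

    from twisted.application import service

    class BuilddManager(service.Service):

        def startService(self):
            # Touch global state (logging handlers) only once the service
            # actually starts, not at construction time.
            self.logger = self._setupLogger()
            service.Service.startService(self)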

>>
>> >         self.new_builders_scanner = NewBuildersScanner(
>> >
>> >             manager=self, clock=clock)
>> >
>> >     def _setupLogger(self):
>> >         """Setup a 'slave-scanner' logger that redirects to twisted.
>>
>> FWIW, "Setup" is a noun. "Set up" is a verb.
>
> Setup is not a word, it's a typo.
>

>> > class TestSlaveScannerScan(TrialTestCase):
>> >     """Tests `SlaveScanner.scan` method.
>> >
>> >     This method uses the old framework for scanning and dispatching
>> >     builds. """
>> >     layer = LaunchpadZopelessLayer
>> >
>> >     def setUp(self):
>> >         """Setup TwistedLayer, TrialTestCase and BuilddSlaveTest.
>> >
>> >         Also adjust the sampledata in a way a build can be dispatched to
>> >         'bob' builder.
>> >         """
>> >         from lp.soyuz.tests.test_publishing import...


Revision history for this message
Jonathan Lange (jml) wrote :

Add the XXXs as recommended in the last review comment and please land!

review: Approve
Revision history for this message
Jonathan Lange (jml) wrote :

Oh, forgot about the two WIP things mentioned in your reply.

review: Needs Fixing
Revision history for this message
Jonathan Lange (jml) wrote :

The new code looks good. You should add an explanation about why you don't disable builders as soon as they've got a failure, i.e. why the threshold exists at all.

Also, in my previous review just prior to commit r11699, I suggested adding several XXXs. Could you please do that.

Do both of these things, and then land.

review: Approve
Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Tuesday 19 October 2010 16:39:43 you wrote:
> Review: Approve
> The new code looks good. You should add an explanation about why you don't
> disable builders as soon as they've got a failure, i.e. why the threshold
> exists at all.

Roger.

> Also, in my previous review just prior to commit r11699, I suggested adding
> several XXXs. Could you please do that.

Grar, I forgot to finish that, thanks for reminding me.

> Do both of these things, and then land.

I'm not going to land it right away. I want to soak test it on dogfood first,
borrowing some builders from production. Once I'm happy with it on there,
it's time to let loose the hounds.

Thanks for all your help on this branch! You'll notice the large list of bugs
I linked to it just now. We've fixed all of those in this branch, and
probably more.

Cheers.

Preview Diff

=== modified file 'lib/lp/buildmaster/doc/builder.txt'
--- lib/lp/buildmaster/doc/builder.txt 2010-09-23 12:35:21 +0000
+++ lib/lp/buildmaster/doc/builder.txt 2010-10-25 19:14:01 +0000
@@ -19,9 +19,6 @@
 As expected, it implements IBuilder.
 
     >>> from canonical.launchpad.webapp.testing import verifyObject
-    >>> from lp.buildmaster.interfaces.builder import IBuilder
-    >>> verifyObject(IBuilder, builder)
-    True
 
     >>> print builder.name
     bob
@@ -86,7 +83,7 @@
 The 'new' method will create a new builder in the database.
 
     >>> bnew = builderset.new(1, 'http://dummy.com:8221/', 'dummy',
     ...     'Dummy Title', 'eh ?', 1)
     >>> bnew.name
     u'dummy'
 
@@ -170,7 +167,7 @@
     >>> recipe_bq.processor = i386_family.processors[0]
     >>> recipe_bq.virtualized = True
     >>> transaction.commit()
-    
+
     >>> queue_sizes = builderset.getBuildQueueSizes()
     >>> print queue_sizes['virt']['386']
     (1L, datetime.timedelta(0, 64))
@@ -188,116 +185,3 @@
 
     >>> print queue_sizes['virt']['386']
     (2L, datetime.timedelta(0, 128))
-
-
-Resuming buildd slaves
-======================
-
-Virtual slaves are resumed using a command specified in the
-configuration profile. Production configuration uses a SSH trigger
-account accessed via a private key available in the builddmaster
-machine (which used ftpmaster configuration profile) as in:
-
-{{{
-ssh ~/.ssh/ppa-reset-key ppa@%(vm_host)s
-}}}
-
-The test configuration uses a fake command that can be performed in
-development machine and allow us to tests the important features used
-in production, as 'vm_host' variable replacement.
-
-    >>> from canonical.config import config
-    >>> config.builddmaster.vm_resume_command
-    'echo %(vm_host)s'
-
-Before performing the command, it checks if the builder is indeed
-virtual and raises CannotResumeHost if it isn't.
-
-    >>> bob = getUtility(IBuilderSet)['bob']
-    >>> bob.resumeSlaveHost()
-    Traceback (most recent call last):
-    ...
-    CannotResumeHost: Builder is not virtualized.
-
-For testing purposes resumeSlaveHost returns the stdout and stderr
-buffer resulted from the command.
-
-    >>> frog = getUtility(IBuilderSet)['frog']
-    >>> out, err = frog.resumeSlaveHost()
-    >>> print out.strip()
-    localhost-host.ppa
-
-If the specified command fails, resumeSlaveHost also raises
-CannotResumeHost exception with the results stdout and stderr.
-
-    # The command must have a vm_host dict key and when executed,
-    # have a returncode that is not 0.
-    >>> vm_resume_command = """
-    ... [builddmaster]
-    ... vm_resume_command: test "%(vm_host)s = 'false'"
-    ... """
-    >>> config.push('vm_resume_command', vm_resume_command)
-    >>> frog.resumeSlaveHost()
-    Traceback (most recent call last):
-    ...
-    CannotResumeHost: Resuming failed:
-    OUT:
-    <BLANKLINE>
-    ERR:
-    <BLANKLINE>
-
-Restore default value for resume command.
-
-    >>> config_data = config.pop('vm_resume_command')
-
-
-Rescuing lost slaves
-====================
-
-Builder.rescueIfLost() checks the build ID reported in the slave status
-against the database. If it isn't building what we think it should be,
-the current build will be aborted and the slave cleaned in preparation
-for a new task. The decision about the slave's correctness is left up
-to IBuildFarmJobBehavior.verifySlaveBuildCookie -- for these examples we
-will use a special behavior that just checks if the cookie reads 'good'.
-
-    >>> import logging
-    >>> from lp.buildmaster.interfaces.builder import CorruptBuildCookie
-    >>> from lp.buildmaster.tests.mock_slaves import (
-    ...     BuildingSlave, MockBuilder, OkSlave, WaitingSlave)
-
-    >>> class TestBuildBehavior:
-    ...     def verifySlaveBuildCookie(self, cookie):
-    ...         if cookie != 'good':
-    ...             raise CorruptBuildCookie('Bad value')
-
-    >>> def rescue_slave_if_lost(slave):
-    ...     builder = MockBuilder('mock', slave, TestBuildBehavior())
-    ...     builder.rescueIfLost(logging.getLogger())
-
-An idle slave is not rescued.
-
-    >>> rescue_slave_if_lost(OkSlave())
-
-Slaves building or having built the correct build are not rescued
-either.
-
-    >>> rescue_slave_if_lost(BuildingSlave(build_id='good'))
-    >>> rescue_slave_if_lost(WaitingSlave(build_id='good'))
-
-But if a slave is building the wrong ID, it is declared lost and
-an abort is attempted. MockSlave prints out a message when it is aborted
-or cleaned.
-
-    >>> rescue_slave_if_lost(BuildingSlave(build_id='bad'))
-    Aborting slave
-    INFO:root:Builder 'mock' rescued from 'bad': 'Bad value'
-
-Slaves having completed an incorrect build are also declared lost,
-but there's no need to abort a completed build. Such builders are
-instead simply cleaned, ready for the next build.
-
-    >>> rescue_slave_if_lost(WaitingSlave(build_id='bad'))
-    Cleaning slave
-    INFO:root:Builder 'mock' rescued from 'bad': 'Bad value'
-
 

=== modified file 'lib/lp/buildmaster/interfaces/builder.py'
--- lib/lp/buildmaster/interfaces/builder.py 2010-09-23 18:17:21 +0000
+++ lib/lp/buildmaster/interfaces/builder.py 2010-10-25 19:14:01 +0000
@@ -154,11 +154,6 @@
 
     currentjob = Attribute("BuildQueue instance for job being processed.")
 
-    is_available = Bool(
-        title=_("Whether or not a builder is available for building "
-                "new jobs. "),
-        required=False)
-
     failure_count = Int(
         title=_('Failure Count'), required=False, default=0,
         description=_("Number of consecutive failures for this builder."))
@@ -173,32 +168,74 @@
     def resetFailureCount():
         """Set the failure_count back to zero."""
 
-    def checkSlaveAlive():
-        """Check that the buildd slave is alive.
-
-        This pings the slave over the network via the echo method and looks
-        for the sent message as the reply.
-
-        :raises BuildDaemonError: When the slave is down.
+    def failBuilder(reason):
+        """Mark builder as failed for a given reason."""
+
+    def setSlaveForTesting(proxy):
+        """Sets the RPC proxy through which to operate the build slave."""
+
+    def verifySlaveBuildCookie(slave_build_id):
+        """Verify that a slave's build cookie is consistent.
+
+        This should delegate to the current `IBuildFarmJobBehavior`.
+        """
+
+    def transferSlaveFileToLibrarian(file_sha1, filename, private):
+        """Transfer a file from the slave to the librarian.
+
+        :param file_sha1: The file's sha1, which is how the file is addressed
+            in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog'
+            will cause the build log to be retrieved and gzipped.
+        :param filename: The name of the file to be given to the librarian file
+            alias.
+        :param private: True if the build is for a private archive.
+        :return: A librarian file alias.
+        """
+
+    def getBuildQueue():
+        """Return a `BuildQueue` if there's an active job on this builder.
+
+        :return: A BuildQueue, or None.
+        """
+
+    def getCurrentBuildFarmJob():
+        """Return a `BuildFarmJob` for this builder."""
+
+    # All methods below here return Deferred.
+
+    def isAvailable():
+        """Whether or not a builder is available for building new jobs.
+
+        :return: A Deferred that fires with True or False, depending on
+            whether the builder is available or not.
         """
 
     def rescueIfLost(logger=None):
         """Reset the slave if its job information doesn't match the DB.
 
-        If the builder is BUILDING or WAITING but has a build ID string
-        that doesn't match what is stored in the DB, we have to dismiss
-        its current actions and clean the slave for another job, assuming
-        the XMLRPC is working properly at this point.
+        This checks the build ID reported in the slave status against the
+        database. If it isn't building what we think it should be, the current
+        build will be aborted and the slave cleaned in preparation for a new
+        task. The decision about the slave's correctness is left up to
+        `IBuildFarmJobBehavior.verifySlaveBuildCookie`.
+
+        :return: A Deferred that fires when the dialog with the slave is
+            finished. It does not have a return value.
         """
 
     def updateStatus(logger=None):
-        """Update the builder's status by probing it."""
+        """Update the builder's status by probing it.
+
+        :return: A Deferred that fires when the dialog with the slave is
+            finished. It does not have a return value.
+        """
 
     def cleanSlave():
-        """Clean any temporary files from the slave."""
+        """Clean any temporary files from the slave.
 
-    def failBuilder(reason):
-        """Mark builder as failed for a given reason."""
+        :return: A Deferred that fires when the dialog with the slave is
+            finished. It does not have a return value.
+        """
 
     def requestAbort():
         """Ask that a build be aborted.
@@ -206,6 +243,9 @@
         This takes place asynchronously: Actually killing everything running
         can take some time so the slave status should be queried again to
         detect when the abort has taken effect. (Look for status ABORTED).
+
+        :return: A Deferred that fires when the dialog with the slave is
+            finished. It does not have a return value.
         """
 
     def resumeSlaveHost():
@@ -217,37 +257,35 @@
         :raises: CannotResumeHost: if builder is not virtual or if the
             configuration command has failed.
 
-        :return: command stdout and stderr buffers as a tuple.
+        :return: A Deferred that fires when the resume operation finishes,
+            whose value is a (stdout, stderr) tuple for success, or a Failure
+            whose value is a CannotResumeHost exception.
         """
 
-    def setSlaveForTesting(proxy):
-        """Sets the RPC proxy through which to operate the build slave."""
-
     def slaveStatus():
         """Get the slave status for this builder.
 
-        :return: a dict containing at least builder_status, but potentially
-            other values included by the current build behavior.
+        :return: A Deferred which fires when the slave dialog is complete.
+            Its value is a dict containing at least builder_status, but
+            potentially other values included by the current build
+            behavior.
         """
 
     def slaveStatusSentence():
        """Get the slave status sentence for this builder.
 
-        :return: A tuple with the first element containing the slave status,
-            build_id-queue-id and then optionally more elements depending on
-            the status.
-        """
-
-    def verifySlaveBuildCookie(slave_build_id):
-        """Verify that a slave's build cookie is consistent.
-
-        This should delegate to the current `IBuildFarmJobBehavior`.
+        :return: A Deferred which fires when the slave dialog is complete.
+            Its value is a tuple with the first element containing the
+            slave status, build_id-queue-id and then optionally more
+            elements depending on the status.
         """
 
     def updateBuild(queueItem):
         """Verify the current build job status.
 
         Perform the required actions for each state.
+
+        :return: A Deferred that fires when the slave dialog is finished.
         """
 
     def startBuild(build_queue_item, logger):
@@ -255,21 +293,10 @@
 
         :param build_queue_item: A BuildQueueItem to build.
         :param logger: A logger to be used to log diagnostic information.
-        :raises BuildSlaveFailure: When the build slave fails.
-        :raises CannotBuild: When a build cannot be started for some reason
-            other than the build slave failing.
-        """
-
-    def transferSlaveFileToLibrarian(file_sha1, filename, private):
-        """Transfer a file from the slave to the librarian.
-
-        :param file_sha1: The file's sha1, which is how the file is addressed
-            in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog'
-            will cause the build log to be retrieved and gzipped.
-        :param filename: The name of the file to be given to the librarian file
-            alias.
-        :param private: True if the build is for a private archive.
-        :return: A librarian file alias.
+
+        :return: A Deferred that fires after the dispatch has completed whose
+            value is None, or a Failure that contains an exception
+            explaining what went wrong.
         """
 
     def handleTimeout(logger, error_message):
@@ -284,6 +311,8 @@
 
         :param logger: The logger object to be used for logging.
         :param error_message: The error message to be used for logging.
+        :return: A Deferred that fires after the virtual slave was resumed
+            or immediately if it's a non-virtual slave.
         """
 
     def findAndStartJob(buildd_slave=None):
@@ -291,17 +320,9 @@
 
         :param buildd_slave: An optional buildd slave that this builder should
             talk to.
-        :return: the `IBuildQueue` instance found or None if no job was found.
-        """
-
-    def getBuildQueue():
-        """Return a `BuildQueue` if there's an active job on this builder.
-
-        :return: A BuildQueue, or None.
-        """
-
-    def getCurrentBuildFarmJob():
-        """Return a `BuildFarmJob` for this builder."""
+        :return: A Deferred whose value is the `IBuildQueue` instance
+            found or None if no job was found.
+        """
 
 
 class IBuilderSet(Interface):
 

=== modified file 'lib/lp/buildmaster/manager.py'
--- lib/lp/buildmaster/manager.py 2010-09-24 15:40:49 +0000
+++ lib/lp/buildmaster/manager.py 2010-10-25 19:14:01 +0000
@@ -10,13 +10,10 @@
     'BuilddManager',
     'BUILDD_MANAGER_LOG_NAME',
     'FailDispatchResult',
-    'RecordingSlave',
     'ResetDispatchResult',
-    'buildd_success_result_map',
     ]
 
 import logging
-import os
 
 import transaction
 from twisted.application import service
@@ -24,129 +21,27 @@
     defer,
     reactor,
     )
-from twisted.protocols.policies import TimeoutMixin
+from twisted.internet.task import LoopingCall
 from twisted.python import log
-from twisted.python.failure import Failure
-from twisted.web import xmlrpc
 from zope.component import getUtility
 
-from canonical.config import config
-from canonical.launchpad.webapp import urlappend
-from lp.services.database import write_transaction
 from lp.buildmaster.enums import BuildStatus
-from lp.services.twistedsupport.processmonitor import ProcessWithTimeout
+from lp.buildmaster.interfaces.buildfarmjobbehavior import (
+    BuildBehaviorMismatch,
+    )
+from lp.buildmaster.model.builder import Builder
+from lp.buildmaster.interfaces.builder import (
+    BuildDaemonError,
+    BuildSlaveFailure,
+    CannotBuild,
+    CannotFetchFile,
+    CannotResumeHost,
+    )
 
 
 BUILDD_MANAGER_LOG_NAME = "slave-scanner"
 
 
-buildd_success_result_map = {
-    'ensurepresent': True,
-    'build': 'BuilderStatus.BUILDING',
-    }
-
-
-class QueryWithTimeoutProtocol(xmlrpc.QueryProtocol, TimeoutMixin):
-    """XMLRPC query protocol with a configurable timeout.
-
-    XMLRPC queries using this protocol will be unconditionally closed
-    when the timeout is elapsed. The timeout is fetched from the context
-    Launchpad configuration file (`config.builddmaster.socket_timeout`).
-    """
-    def connectionMade(self):
-        xmlrpc.QueryProtocol.connectionMade(self)
-        self.setTimeout(config.builddmaster.socket_timeout)
-
-
-class QueryFactoryWithTimeout(xmlrpc._QueryFactory):
-    """XMLRPC client factory with timeout support."""
-    # Make this factory quiet.
-    noisy = False
-    # Use the protocol with timeout support.
-    protocol = QueryWithTimeoutProtocol
-
-
-class RecordingSlave:
-    """An RPC proxy for buildd slaves that records instructions to the latter.
-
-    The idea here is to merely record the instructions that the slave-scanner
-    issues to the buildd slaves and "replay" them a bit later in asynchronous
-    and parallel fashion.
-
-    By dealing with a number of buildd slaves in parallel we remove *the*
-    major slave-scanner throughput issue while avoiding large-scale changes to
-    its code base.
-    """
-
-    def __init__(self, name, url, vm_host):
-        self.name = name
-        self.url = url
-        self.vm_host = vm_host
-
-        self.resume_requested = False
-        self.calls = []
-
-    def __repr__(self):
-        return '<%s:%s>' % (self.name, self.url)
-
-    def cacheFile(self, logger, libraryfilealias):
-        """Cache the file on the server."""
-        self.ensurepresent(
-            libraryfilealias.content.sha1, libraryfilealias.http_url, '', '')
-
-    def sendFileToSlave(self, *args):
-        """Helper to send a file to this builder."""
-        return self.ensurepresent(*args)
-
-    def ensurepresent(self, *args):
-        """Download files needed for the build."""
-        self.calls.append(('ensurepresent', args))
-        result = buildd_success_result_map.get('ensurepresent')
-        return [result, 'Download']
-
-    def build(self, *args):
-        """Perform the build."""
-        # XXX: This method does not appear to be used.
-        self.calls.append(('build', args))
-        result = buildd_success_result_map.get('build')
-        return [result, args[0]]
-
-    def resume(self):
-        """Record the request to resume the builder..
-
-        Always succeed.
-
-        :return: a (stdout, stderr, subprocess exitcode) triple
-        """
-        self.resume_requested = True
-        return ['', '', 0]
-
-    def resumeSlave(self, clock=None):
-        """Resume the builder in a asynchronous fashion.
-
-        Used the configuration command-line in the same way
-        `BuilddSlave.resume` does.
-
-        Also use the builddmaster configuration 'socket_timeout' as
-        the process timeout.
-
-        :param clock: An optional twisted.internet.task.Clock to override
-            the default clock. For use in tests.
-
-        :return: a Deferred
-        """
-        resume_command = config.builddmaster.vm_resume_command % {
-            'vm_host': self.vm_host}
-        # Twisted API require string and the configuration provides unicode.
-        resume_argv = [str(term) for term in resume_command.split()]
-
-        d = defer.Deferred()
-        p = ProcessWithTimeout(
-            d, config.builddmaster.socket_timeout, clock=clock)
-        p.spawnProcess(resume_argv[0], tuple(resume_argv))
-        return d
-
-
 def get_builder(name):
     """Helper to return the builder given the slave for this request."""
     # Avoiding circular imports.
@@ -159,9 +54,12 @@
     # builder.currentjob hides a complicated query, don't run it twice.
     # See bug 623281.
     current_job = builder.currentjob
-    build_job = current_job.specific_job.build
+    if current_job is None:
+        job_failure_count = 0
+    else:
+        job_failure_count = current_job.specific_job.build.failure_count
 
-    if builder.failure_count == build_job.failure_count:
+    if builder.failure_count == job_failure_count and current_job is not None:
         # If the failure count for the builder is the same as the
         # failure count for the job being built, then we cannot
         # tell whether the job or the builder is at fault. The best
@@ -170,17 +68,28 @@
         current_job.reset()
         return
 
-    if builder.failure_count > build_job.failure_count:
+    if builder.failure_count > job_failure_count:
         # The builder has failed more than the jobs it's been
-        # running, so let's disable it and re-schedule the build.
-        builder.failBuilder(fail_notes)
-        current_job.reset()
+        # running.
+
+        # Re-schedule the build if there is one.
+        if current_job is not None:
+            current_job.reset()
+
+        # We are a little more tolerant with failing builders than
+        # failing jobs because sometimes they get unresponsive due to
+        # human error, flaky networks etc. We expect the builder to get
+        # better, whereas jobs are very unlikely to get better.
+        if builder.failure_count >= Builder.FAILURE_THRESHOLD:
+            # It's also gone over the threshold so let's disable it.
+            builder.failBuilder(fail_notes)
     else:
         # The job is the culprit! Override its status to 'failed'
         # to make sure it won't get automatically dispatched again,
        # and remove the buildqueue request. The failure should
         # have already caused any relevant slave data to be stored
         # on the build record so don't worry about that here.
+        build_job = current_job.specific_job.build
         build_job.status = BuildStatus.FAILEDTOBUILD
         builder.currentjob.destroySelf()
 
@@ -190,133 +99,108 @@
     # next buildd scan.
 
 
-class BaseDispatchResult:
-    """Base class for *DispatchResult variations.
-
-    It will be extended to represent dispatching results and allow
-    homogeneous processing.
-    """
-
-    def __init__(self, slave, info=None):
-        self.slave = slave
-        self.info = info
-
-    def _cleanJob(self, job):
-        """Clean up in case of builder reset or dispatch failure."""
-        if job is not None:
-            job.reset()
-
-    def assessFailureCounts(self):
-        """View builder/job failure_count and work out which needs to die.
-
-        :return: True if we disabled something, False if we did not.
-        """
-        builder = get_builder(self.slave.name)
-        assessFailureCounts(builder, self.info)
-
-    def ___call__(self):
-        raise NotImplementedError(
-            "Call sites must define an evaluation method.")
-
-
-class FailDispatchResult(BaseDispatchResult):
-    """Represents a communication failure while dispatching a build job..
-
-    When evaluated this object mark the corresponding `IBuilder` as
-    'NOK' with the given text as 'failnotes'. It also cleans up the running
-    job (`IBuildQueue`).
-    """
-
-    def __repr__(self):
-        return '%r failure (%s)' % (self.slave, self.info)
-
-    @write_transaction
-    def __call__(self):
-        self.assessFailureCounts()
-
-
-class ResetDispatchResult(BaseDispatchResult):
-    """Represents a failure to reset a builder.
-
-    When evaluated this object simply cleans up the running job
-    (`IBuildQueue`) and marks the builder down.
-    """
-
-    def __repr__(self):
-        return '%r reset failure' % self.slave
-
-    @write_transaction
-    def __call__(self):
-        builder = get_builder(self.slave.name)
-        # Builders that fail to reset should be disabled as per bug
-        # 563353.
-        # XXX Julian bug=586362
-        # This is disabled until this code is not also used for dispatch
-        # failures where we *don't* want to disable the builder.
-        # builder.failBuilder(self.info)
-        self._cleanJob(builder.currentjob)
-
-
 class SlaveScanner:
     """A manager for a single builder."""
 
+    # The interval between each poll cycle, in seconds. We'd ideally
+    # like this to be lower but 5 seems a reasonable compromise between
+    # responsivity and load on the database server, since in each cycle
+    # we can run quite a few queries.
     SCAN_INTERVAL = 5
 
-    # These are for the benefit of tests; see `TestingSlaveScanner`.
-    # It pokes fake versions in here so that it can verify methods were
-    # called. The tests should really be using FakeMethod() though.
-    reset_result = ResetDispatchResult
-    fail_result = FailDispatchResult
-
     def __init__(self, builder_name, logger):
         self.builder_name = builder_name
         self.logger = logger
-        self._deferred_list = []
-
-    def scheduleNextScanCycle(self):
-        """Schedule another scan of the builder some time in the future."""
-        self._deferred_list = []
-        # XXX: Change this to use LoopingCall.
-        reactor.callLater(self.SCAN_INTERVAL, self.startCycle)
 
     def startCycle(self):
         """Scan the builder and dispatch to it or deal with failures."""
+        self.loop = LoopingCall(self.singleCycle)
+        self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
+        return self.stopping_deferred
+
+    def stopCycle(self):
+        """Terminate the LoopingCall."""
+        self.loop.stop()
+
+    def singleCycle(self):
         self.logger.debug("Scanning builder: %s" % self.builder_name)
-
+        d = self.scan()
+
+        d.addErrback(self._scanFailed)
+        return d
+
+    def _scanFailed(self, failure):
+        """Deal with failures encountered during the scan cycle.
+
+        1. Print the error in the log
+        2. Increment and assess failure counts on the builder and job.
+        """
+        # Make sure that pending database updates are removed as it
+        # could leave the database in an inconsistent state (e.g. The
+        # job says it's running but the buildqueue has no builder set).
+        transaction.abort()
+
+        # If we don't recognise the exception include a stack trace with
+        # the error.
+        error_message = failure.getErrorMessage()
+        if failure.check(
+            BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch,
+            CannotResumeHost, BuildDaemonError, CannotFetchFile):
+            self.logger.info("Scanning failed with: %s" % error_message)
+        else:
+            self.logger.info("Scanning failed with: %s\n%s" %
+                (failure.getErrorMessage(), failure.getTraceback()))
+
+        # Decide if we need to terminate the job or fail the
+        # builder.
         try:
-            slave = self.scan()
-            if slave is None:
-                self.scheduleNextScanCycle()
+            builder = get_builder(self.builder_name)
+            builder.gotFailure()
+            if builder.currentjob is not None:
+                build_farm_job = builder.getCurrentBuildFarmJob()
+                build_farm_job.gotFailure()
+                self.logger.info(
+                    "builder %s failure count: %s, "
+                    "job '%s' failure count: %s" % (
+                        self.builder_name,
+                        builder.failure_count,
+                        build_farm_job.title,
+                        build_farm_job.failure_count))
             else:
-                # XXX: Ought to return Deferred.
-                self.resumeAndDispatch(slave)
+                self.logger.info(
+                    "Builder %s failed a probe, count: %s" % (
+                        self.builder_name, builder.failure_count))
+            assessFailureCounts(builder, failure.getErrorMessage())
+            transaction.commit()
         except:
-            error = Failure()
-            self.logger.info("Scanning failed with: %s\n%s" %
-                (error.getErrorMessage(), error.getTraceback()))
-
-            builder = get_builder(self.builder_name)
+            # Catastrophic code failure! Not much we can do.
+            self.logger.error(
                "Miserable failure when trying to examine failure counts:\n",
+                exc_info=True)
+            transaction.abort()
 
-            # Decide if we need to terminate the job or fail the
-            # builder.
-            self._incrementFailureCounts(builder)
-            self.logger.info(
-                "builder failure count: %s, job failure count: %s" % (
-                    builder.failure_count,
-                    builder.getCurrentBuildFarmJob().failure_count))
-            assessFailureCounts(builder, error.getErrorMessage())
-            transaction.commit()
-
-            self.scheduleNextScanCycle()
-
-    @write_transaction
     def scan(self):
         """Probe the builder and update/dispatch/collect as appropriate.
 
-        The whole method is wrapped in a transaction, but we do partial
-        commits to avoid holding locks on tables.
-
-        :return: A `RecordingSlave` if we dispatched a job to it, or None.
+        There are several steps to scanning:
+
+        1. If the builder is marked as "ok" then probe it to see what state
+           it's in. This is where lost jobs are rescued if we think the
+           builder is doing something that it later tells us it's not,
+           and also where the multi-phase abort procedure happens.
+           See IBuilder.rescueIfLost, which is called by
+           IBuilder.updateStatus().
+        2. If the builder is still happy, we ask it if it has an active build
+           and then either update the build in Launchpad or collect the
+           completed build. (builder.updateBuild)
+        3. If the builder is not happy or it was marked as unavailable
+           mid-build, we need to reset the job that we thought it had, so
+           that the job is dispatched elsewhere.
+        4. If the builder is idle and we have another build ready, dispatch
+           it.
+
+        :return: A Deferred that fires when the scan is complete, whose
+            value is A `BuilderSlave` if we dispatched a job to it, or None.
         """
         # We need to re-fetch the builder object on each cycle as the
        # Storm store is invalidated over transaction boundaries.
@@ -324,240 +208,72 @@
         self.builder = get_builder(self.builder_name)
 
         if self.builder.builderok:
-            self.builder.updateStatus(self.logger)
-            transaction.commit()
-
-        # See if we think there's an active build on the builder.
-        buildqueue = self.builder.getBuildQueue()
-
-        # XXX Julian 2010-07-29 bug=611258
-        # We're not using the RecordingSlave until dispatching, which
-        # means that this part blocks until we've received a response
-        # from the builder. updateBuild() needs to be made
-        # asyncronous.
-
-        # Scan the slave and get the logtail, or collect the build if
-        # it's ready. Yes, "updateBuild" is a bad name.
-        if buildqueue is not None:
-            self.builder.updateBuild(buildqueue)
-            transaction.commit()
-
-        # If the builder is in manual mode, don't dispatch anything.
-        if self.builder.manual:
-            self.logger.debug(
-                '%s is in manual mode, not dispatching.' % self.builder.name)
-            return None
-
-        # If the builder is marked unavailable, don't dispatch anything.
-        # Additionaly, because builders can be removed from the pool at
-        # any time, we need to see if we think there was a build running
-        # on it before it was marked unavailable. In this case we reset
-        # the build thusly forcing it to get re-dispatched to another
-        # builder.
-        if not self.builder.is_available:
-            job = self.builder.currentjob
-            if job is not None and not self.builder.builderok:
-                self.logger.info(
-                    "%s was made unavailable, resetting attached "
-                    "job" % self.builder.name)
-                job.reset()
-                transaction.commit()
-            return None
-
-        # See if there is a job we can dispatch to the builder slave.
-
-        # XXX: Rather than use the slave actually associated with the builder
-        # (which, incidentally, shouldn't be a property anyway), we make a new
-        # RecordingSlave so we can get access to its asynchronous
-        # "resumeSlave" method. Blech.
-        slave = RecordingSlave(
-            self.builder.name, self.builder.url, self.builder.vm_host)
-        # XXX: Passing buildd_slave=slave overwrites the 'slave' property of
-        # self.builder. Not sure why this is needed yet.
-        self.builder.findAndStartJob(buildd_slave=slave)
-        if self.builder.currentjob is not None:
-            # After a successful dispatch we can reset the
-            # failure_count.
-            self.builder.resetFailureCount()
-            transaction.commit()
-            return slave
-
-        return None
-
-    def resumeAndDispatch(self, slave):
-        """Chain the resume and dispatching Deferreds."""
-        # XXX: resumeAndDispatch makes Deferreds without returning them.
-        if slave.resume_requested:
-            # The slave needs to be reset before we can dispatch to
-            # it (e.g. a virtual slave)
-
-            # XXX: Two problems here. The first is that 'resumeSlave' only
-            # exists on RecordingSlave (BuilderSlave calls it 'resume').
-            d = slave.resumeSlave()
-            d.addBoth(self.checkResume, slave)
+            d = self.builder.updateStatus(self.logger)
         else:
-            # No resume required, build dispatching can commence.
             d = defer.succeed(None)
 
-        # Dispatch the build to the slave asynchronously.
-        d.addCallback(self.initiateDispatch, slave)
-        # Store this deferred so we can wait for it along with all
-        # the others that will be generated by RecordingSlave during
-        # the dispatch process, and chain a callback after they've
-        # all fired.
-        self._deferred_list.append(d)
+        def status_updated(ignored):
+            # Commit the changes done while possibly rescuing jobs, to
+            # avoid holding table locks.
+            transaction.commit()
+
+            # See if we think there's an active build on the builder.
+            buildqueue = self.builder.getBuildQueue()
 
-    def initiateDispatch(self, resume_result, slave):
-        """Start dispatching a build to a slave.
-
-        If the previous task in chain (slave resuming) has failed it will
-        receive a `ResetBuilderRequest` instance as 'resume_result' and
-        will immediately return that so the subsequent callback can collect
-        it.
-
-        If the slave resuming succeeded, it starts the XMLRPC dialogue. The
-        dialogue may consist of many calls to the slave before the build
-        starts. Each call is done via a Deferred event, where slave calls
-        are sent in callSlave(), and checked in checkDispatch() which will
-        keep firing events via callSlave() until all the events are done or
-        an error occurs.
-        """
-        if resume_result is not None:
-            self.slaveConversationEnded()
-            return resume_result
-
-        self.logger.info('Dispatching: %s' % slave)
-        self.callSlave(slave)
-
-    def _getProxyForSlave(self, slave):
-        """Return a twisted.web.xmlrpc.Proxy for the buildd slave.
-
-        Uses a protocol with timeout support, See QueryFactoryWithTimeout.
-        """
-        proxy = xmlrpc.Proxy(str(urlappend(slave.url, 'rpc')))
-        proxy.queryFactory = QueryFactoryWithTimeout
-        return proxy
-
-    def callSlave(self, slave):
-        """Dispatch the next XMLRPC for the given slave."""
-        if len(slave.calls) == 0:
-            # That's the end of the dialogue with the slave.
-            self.slaveConversationEnded()
-            return
-
-        # Get an XMLRPC proxy for the buildd slave.
-        proxy = self._getProxyForSlave(slave)
-        method, args = slave.calls.pop(0)
-        d = proxy.callRemote(method, *args)
-        d.addBoth(self.checkDispatch, method, slave)
-        self._deferred_list.append(d)
-        self.logger.debug('%s -> %s(%s)' % (slave, method, args))
-
-    def slaveConversationEnded(self):
-        """After all the Deferreds are set up, chain a callback on them."""
+            # Scan the slave and get the logtail, or collect the build if
+            # it's ready. Yes, "updateBuild" is a bad name.
+            if buildqueue is not None:
+                return self.builder.updateBuild(buildqueue)
+
+        def build_updated(ignored):
+            # Commit changes done while updating the build, to avoid
+            # holding table locks.
+            transaction.commit()
+
+            # If the builder is in manual mode, don't dispatch anything.
+            if self.builder.manual:
+                self.logger.debug(
+                    '%s is in manual mode, not dispatching.' %
+                    self.builder.name)
+                return
+
+            # If the builder is marked unavailable, don't dispatch anything.
+            # Additionaly, because builders can be removed from the pool at
+            # any time, we need to see if we think there was a build running
+            # on it before it was marked unavailable. In this case we reset
+            # the build thusly forcing it to get re-dispatched to another
+            # builder.
+
+            return self.builder.isAvailable().addCallback(got_available)
+
+        def got_available(available):
+            if not available:
+                job = self.builder.currentjob
+                if job is not None and not self.builder.builderok:
+                    self.logger.info(
+                        "%s was made unavailable, resetting attached "
+                        "job" % self.builder.name)
+                    job.reset()
+                    transaction.commit()
+                return
+
+            # See if there is a job we can dispatch to the builder slave.
+
+            d = self.builder.findAndStartJob()
+            def job_started(candidate):
+                if self.builder.currentjob is not None:
+                    # After a successful dispatch we can reset the
+                    # failure_count.
+                    self.builder.resetFailureCount()
+                    transaction.commit()
+                    return self.builder.slave
+                else:
-        dl = defer.DeferredList(self._deferred_list, consumeErrors=True)
-        dl.addBoth(self.evaluateDispatchResult)
-        return dl
-
-    def evaluateDispatchResult(self, deferred_list_results):
-        """Process the DispatchResult for this dispatch chain.
-
-        After waiting for the Deferred chain to finish, we'll have a
-        DispatchResult to evaluate, which deals with the result of
-        dispatching.
-        """
-        # The `deferred_list_results` is what we get when waiting on a
-        # DeferredList. It's a list of tuples of (status, result) where
-        # result is what the last callback in that chain returned.
-
-        # If the result is an instance of BaseDispatchResult we need to
-        # evaluate it, as there's further action required at the end of
-        # the dispatch chain. None, resulting from successful chains,
-        # are discarded.
-
-        dispatch_results = [
-            result for status, result in deferred_list_results
-            if isinstance(result, BaseDispatchResult)]
-
-        for result in dispatch_results:
-            self.logger.info("%r" % result)
-            result()
-
-        # At this point, we're done dispatching, so we can schedule the
-        # next scan cycle.
-        self.scheduleNextScanCycle()
-
-        # For the test suite so that it can chain callback results.
-        return deferred_list_results
-
-    def checkResume(self, response, slave):
-        """Check the result of resuming a slave.
-
-        If there's a problem resuming, we return a ResetDispatchResult which
-        will get evaluated at the end of the scan, or None if the resume
-        was OK.
-
-        :param response: the tuple that's constructed in
501 ProcessWithTimeout.processEnded(), or a Failure that
502 contains the tuple.
503 :param slave: the slave object we're talking to
504 """
505 if isinstance(response, Failure):
506 out, err, code = response.value
507 else:
508 out, err, code = response
509 if code == os.EX_OK:
510 return None
511
512 error_text = '%s\n%s' % (out, err)
513 self.logger.error('%s resume failure: %s' % (slave, error_text))
514 return self.reset_result(slave, error_text)
515
516 def _incrementFailureCounts(self, builder):
517 builder.gotFailure()
518 builder.getCurrentBuildFarmJob().gotFailure()
519
520 def checkDispatch(self, response, method, slave):
521 """Verify the results of a slave xmlrpc call.
522
523 If it failed and it compromises the slave then return a corresponding
524 `FailDispatchResult`, if it was a communication failure, simply
525 reset the slave by returning a `ResetDispatchResult`.
526 """
527 from lp.buildmaster.interfaces.builder import IBuilderSet
528 builder = getUtility(IBuilderSet)[slave.name]
529
530 # XXX these DispatchResult classes are badly named and do the
531 # same thing. We need to fix that.
532 self.logger.debug(
533 '%s response for "%s": %s' % (slave, method, response))
534
535 if isinstance(response, Failure):
536 self.logger.warn(
537 '%s communication failed (%s)' %
538 (slave, response.getErrorMessage()))
539 self.slaveConversationEnded()
540 self._incrementFailureCounts(builder)
541 return self.fail_result(slave)
542
543 if isinstance(response, list) and len(response) == 2:
544 if method in buildd_success_result_map:
545 expected_status = buildd_success_result_map.get(method)
546 status, info = response
547 if status == expected_status:
548 self.callSlave(slave)
549 return None271 return None
550 else:272 return d.addCallback(job_started)
551 info = 'Unknown slave method: %s' % method273
552 else:274 d.addCallback(status_updated)
553 info = 'Unexpected response: %s' % repr(response)275 d.addCallback(build_updated)
554276 return d
555 self.logger.error(
556 '%s failed to dispatch (%s)' % (slave, info))
557
558 self.slaveConversationEnded()
559 self._incrementFailureCounts(builder)
560 return self.fail_result(slave, info)
561277
562278
563class NewBuildersScanner:279class NewBuildersScanner:
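The control flow in the new scan() above leans on one Twisted rule that is easy to miss: when a callback returns a Deferred, the rest of the chain waits for that Deferred to fire. Here is a minimal, self-contained sketch of the same shape; the helper names are illustrative only, not from this branch:

    from twisted.internet import defer

    def update_build_on_slave():
        # Stand-in for builder.updateBuild(buildqueue), which performs
        # an XML-RPC round trip and returns a Deferred.
        return defer.succeed(None)

    def scan():
        d = defer.succeed(None)

        def status_updated(ignored):
            # Returning a Deferred here pauses the chain until it fires.
            return update_build_on_slave()

        def build_updated(ignored):
            # Runs only after update_build_on_slave() has completed.
            return 'dispatched'

        d.addCallback(status_updated)
        d.addCallback(build_updated)
        return d

    results = []
    scan().addCallback(results.append)
    assert results == ['dispatched']

This is why build_updated can safely assume the build-update round trip is finished: the chain simply does not advance until the intermediate Deferred fires.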
@@ -578,15 +294,21 @@
         self.current_builders = [
             builder.name for builder in getUtility(IBuilderSet)]
 
+    def stop(self):
+        """Terminate the LoopingCall."""
+        self.loop.stop()
+
     def scheduleScan(self):
         """Schedule a callback SCAN_INTERVAL seconds later."""
-        return self._clock.callLater(self.SCAN_INTERVAL, self.scan)
+        self.loop = LoopingCall(self.scan)
+        self.loop.clock = self._clock
+        self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
+        return self.stopping_deferred
 
     def scan(self):
         """If a new builder appears, create a SlaveScanner for it."""
         new_builders = self.checkForNewBuilders()
         self.manager.addScanForBuilders(new_builders)
-        self.scheduleScan()
 
     def checkForNewBuilders(self):
         """See if any new builders were added."""
@@ -609,10 +331,7 @@
             manager=self, clock=clock)
 
     def _setupLogger(self):
-        """Setup a 'slave-scanner' logger that redirects to twisted.
-
-        It is going to be used locally and within the thread running
-        the scan() method.
+        """Set up a 'slave-scanner' logger that redirects to twisted.
 
         Make it less verbose to avoid messing too much with the old code.
         """
@@ -643,12 +362,29 @@
         # Events will now fire in the SlaveScanner objects to scan each
         # builder.
 
+    def stopService(self):
+        """Callback for when we need to shut down."""
+        # XXX: lacks unit tests
+        # All the SlaveScanner objects need to be halted gracefully.
+        deferreds = [slave.stopping_deferred for slave in self.builder_slaves]
+        deferreds.append(self.new_builders_scanner.stopping_deferred)
+
+        self.new_builders_scanner.stop()
+        for slave in self.builder_slaves:
+            slave.stopCycle()
+
+        # The 'stopping_deferred's are called back when the loops are
+        # stopped, so we can wait on them all at once here before
+        # exiting.
+        d = defer.DeferredList(deferreds, consumeErrors=True)
+        return d
+
     def addScanForBuilders(self, builders):
         """Set up scanner objects for the builders specified."""
         for builder in builders:
             slave_scanner = SlaveScanner(builder, self.logger)
             self.builder_slaves.append(slave_scanner)
-            slave_scanner.scheduleNextScanCycle()
+            slave_scanner.startCycle()
 
         # Return the slave list for the benefit of tests.
         return self.builder_slaves
 
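The ordering inside stopService() matters: each loop's stopping_deferred has to be collected before stop() is called, and only then can the service wait on all of them at once. A sketch of that dance with three dummy loops:

    from twisted.internet import defer, task

    clock = task.Clock()
    loops = []
    for i in range(3):
        loop = task.LoopingCall(lambda: None)
        loop.clock = clock
        loops.append(loop)

    # Grab the stopping_deferreds first; they fire when each loop stops.
    stopping = [loop.start(5) for loop in loops]

    results = []
    d = defer.DeferredList(stopping, consumeErrors=True)
    d.addCallback(results.append)

    for loop in loops:
        loop.stop()

    # The DeferredList has now fired with one (success, value) pair per loop.
    assert len(results[0]) == 3

consumeErrors=True keeps one failed scanner from leaving an unhandled error in the log while the others shut down.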
=== modified file 'lib/lp/buildmaster/model/builder.py'
--- lib/lp/buildmaster/model/builder.py 2010-09-24 13:39:27 +0000
+++ lib/lp/buildmaster/model/builder.py 2010-10-25 19:14:01 +0000
@@ -13,12 +13,11 @@
     ]
 
 import gzip
-import httplib
 import logging
 import os
 import socket
-import subprocess
 import tempfile
+import transaction
 import urllib2
 import xmlrpclib
 
@@ -34,6 +33,13 @@
     Count,
     Sum,
     )
+
+from twisted.internet import (
+    defer,
+    reactor as default_reactor,
+    )
+from twisted.web import xmlrpc
+
 from zope.component import getUtility
 from zope.interface import implements
 
@@ -58,7 +64,6 @@
 from lp.buildmaster.interfaces.builder import (
     BuildDaemonError,
     BuildSlaveFailure,
-    CannotBuild,
     CannotFetchFile,
     CannotResumeHost,
     CorruptBuildCookie,
@@ -66,9 +71,6 @@
     IBuilderSet,
     )
 from lp.buildmaster.interfaces.buildfarmjob import IBuildFarmJobSet
-from lp.buildmaster.interfaces.buildfarmjobbehavior import (
-    BuildBehaviorMismatch,
-    )
 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
 from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior
 from lp.buildmaster.model.buildqueue import (
@@ -78,9 +80,9 @@
 from lp.registry.interfaces.person import validate_public_person
 from lp.services.job.interfaces.job import JobStatus
 from lp.services.job.model.job import Job
-from lp.services.osutils import until_no_eintr
 from lp.services.propertycache import cachedproperty
-from lp.services.twistedsupport.xmlrpc import BlockingProxy
+from lp.services.twistedsupport.processmonitor import ProcessWithTimeout
+from lp.services.twistedsupport import cancel_on_timeout
 # XXX Michael Nelson 2010-01-13 bug=491330
 # These dependencies on soyuz will be removed when getBuildRecords()
 # is moved.
@@ -92,25 +94,9 @@
 from lp.soyuz.model.processor import Processor
 
 
-class TimeoutHTTPConnection(httplib.HTTPConnection):
-
-    def connect(self):
-        """Override the standard connect() methods to set a timeout"""
-        ret = httplib.HTTPConnection.connect(self)
-        self.sock.settimeout(config.builddmaster.socket_timeout)
-        return ret
-
-
-class TimeoutHTTP(httplib.HTTP):
-    _connection_class = TimeoutHTTPConnection
-
-
-class TimeoutTransport(xmlrpclib.Transport):
-    """XMLRPC Transport to setup a socket with defined timeout"""
-
-    def make_connection(self, host):
-        host, extra_headers, x509 = self.get_host_info(host)
-        return TimeoutHTTP(host)
+class QuietQueryFactory(xmlrpc._QueryFactory):
+    """XMLRPC client factory that doesn't splatter the log with junk."""
+    noisy = False
 
 
 class BuilderSlave(object):
@@ -125,24 +111,7 @@
     # many false positives in your test run and will most likely break
     # production.
 
-    # XXX: This (BuilderSlave) should use composition, rather than
-    # inheritance.
-
-    # XXX: Have a documented interface for the XML-RPC server:
-    #  - what methods
-    #  - what return values expected
-    #  - what faults
-    #  (see XMLRPCBuildDSlave in lib/canonical/buildd/slave.py).
-
-    # XXX: Arguably, this interface should be asynchronous
-    # (i.e. Deferred-returning). This would mean that Builder (see below)
-    # would have to expect Deferreds.
-
-    # XXX: Once we have a client object with a defined, tested interface, we
-    # should make a test double that doesn't do any XML-RPC and can be used to
-    # make testing easier & tests faster.
-
-    def __init__(self, proxy, builder_url, vm_host):
+    def __init__(self, proxy, builder_url, vm_host, reactor=None):
         """Initialize a BuilderSlave.
 
         :param proxy: An XML-RPC proxy, implementing 'callRemote'. It must
@@ -155,63 +124,87 @@
         self._file_cache_url = urlappend(builder_url, 'filecache')
         self._server = proxy
 
+        if reactor is None:
+            self.reactor = default_reactor
+        else:
+            self.reactor = reactor
+
     @classmethod
-    def makeBlockingSlave(cls, builder_url, vm_host):
-        rpc_url = urlappend(builder_url, 'rpc')
-        server_proxy = xmlrpclib.ServerProxy(
-            rpc_url, transport=TimeoutTransport(), allow_none=True)
-        return cls(BlockingProxy(server_proxy), builder_url, vm_host)
+    def makeBuilderSlave(cls, builder_url, vm_host, reactor=None, proxy=None):
+        """Create and return a `BuilderSlave`.
+
+        :param builder_url: The URL of the slave buildd machine,
+            e.g. http://localhost:8221
+        :param vm_host: If the slave is virtual, specify its host machine here.
+        :param reactor: Used by tests to override the Twisted reactor.
+        :param proxy: Used by tests to override the xmlrpc.Proxy.
+        """
+        rpc_url = urlappend(builder_url.encode('utf-8'), 'rpc')
+        if proxy is None:
+            server_proxy = xmlrpc.Proxy(rpc_url, allowNone=True)
+            server_proxy.queryFactory = QuietQueryFactory
+        else:
+            server_proxy = proxy
+        return cls(server_proxy, builder_url, vm_host, reactor)
+
+    def _with_timeout(self, d):
+        TIMEOUT = config.builddmaster.socket_timeout
+        return cancel_on_timeout(d, TIMEOUT, self.reactor)
 
     def abort(self):
         """Abort the current build."""
-        return self._server.callRemote('abort')
+        return self._with_timeout(self._server.callRemote('abort'))
 
     def clean(self):
         """Clean up the waiting files and reset the slave's internal state."""
-        return self._server.callRemote('clean')
+        return self._with_timeout(self._server.callRemote('clean'))
 
     def echo(self, *args):
         """Echo the arguments back."""
-        return self._server.callRemote('echo', *args)
+        return self._with_timeout(self._server.callRemote('echo', *args))
 
     def info(self):
         """Return the protocol version and the builder methods supported."""
-        return self._server.callRemote('info')
+        return self._with_timeout(self._server.callRemote('info'))
 
     def status(self):
         """Return the status of the build daemon."""
-        return self._server.callRemote('status')
+        return self._with_timeout(self._server.callRemote('status'))
 
     def ensurepresent(self, sha1sum, url, username, password):
+        # XXX: Nothing external calls this. Make it private.
         """Attempt to ensure the given file is present."""
-        return self._server.callRemote(
-            'ensurepresent', sha1sum, url, username, password)
+        return self._with_timeout(self._server.callRemote(
+            'ensurepresent', sha1sum, url, username, password))
 
     def getFile(self, sha_sum):
         """Construct a file-like object to return the named file."""
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         file_url = urlappend(self._file_cache_url, sha_sum)
         return urllib2.urlopen(file_url)
 
-    def resume(self):
-        """Resume a virtual builder.
-
-        It uses the configuration command-line (replacing 'vm_host') and
-        return its output.
-
-        :return: a (stdout, stderr, subprocess exitcode) triple
+    def resume(self, clock=None):
+        """Resume the builder in an asynchronous fashion.
+
+        We use the builddmaster configuration 'socket_timeout' as
+        the process timeout.
+
+        :param clock: An optional twisted.internet.task.Clock to override
+            the default clock. For use in tests.
+
+        :return: a Deferred that returns a
+            (stdout, stderr, subprocess exitcode) triple
         """
-        # XXX: This executes the vm_resume_command
-        # synchronously. RecordingSlave does so asynchronously. Since we
-        # always want to do this asynchronously, there's no need for the
-        # duplication.
         resume_command = config.builddmaster.vm_resume_command % {
             'vm_host': self._vm_host}
-        resume_argv = resume_command.split()
-        resume_process = subprocess.Popen(
-            resume_argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-        stdout, stderr = resume_process.communicate()
-
-        return (stdout, stderr, resume_process.returncode)
+        # Twisted API requires string but the configuration provides unicode.
+        resume_argv = [term.encode('utf-8') for term in resume_command.split()]
+        d = defer.Deferred()
+        p = ProcessWithTimeout(
+            d, config.builddmaster.socket_timeout, clock=clock)
+        p.spawnProcess(resume_argv[0], tuple(resume_argv))
+        return d
 
     def cacheFile(self, logger, libraryfilealias):
         """Make sure that the file at 'libraryfilealias' is on the slave.
@@ -224,13 +217,15 @@
224 "Asking builder on %s to ensure it has file %s (%s, %s)" % (217 "Asking builder on %s to ensure it has file %s (%s, %s)" % (
225 self._file_cache_url, libraryfilealias.filename, url,218 self._file_cache_url, libraryfilealias.filename, url,
226 libraryfilealias.content.sha1))219 libraryfilealias.content.sha1))
227 self.sendFileToSlave(libraryfilealias.content.sha1, url)220 return self.sendFileToSlave(libraryfilealias.content.sha1, url)
228221
229 def sendFileToSlave(self, sha1, url, username="", password=""):222 def sendFileToSlave(self, sha1, url, username="", password=""):
230 """Helper to send the file at 'url' with 'sha1' to this builder."""223 """Helper to send the file at 'url' with 'sha1' to this builder."""
231 present, info = self.ensurepresent(sha1, url, username, password)224 d = self.ensurepresent(sha1, url, username, password)
232 if not present:225 def check_present((present, info)):
233 raise CannotFetchFile(url, info)226 if not present:
227 raise CannotFetchFile(url, info)
228 return d.addCallback(check_present)
234229
235 def build(self, buildid, builder_type, chroot_sha1, filemap, args):230 def build(self, buildid, builder_type, chroot_sha1, filemap, args):
236 """Build a thing on this build slave.231 """Build a thing on this build slave.
@@ -243,19 +238,18 @@
         :param args: A dictionary of extra arguments. The contents depend on
             the build job type.
         """
-        try:
-            return self._server.callRemote(
-                'build', buildid, builder_type, chroot_sha1, filemap, args)
-        except xmlrpclib.Fault, info:
-            raise BuildSlaveFailure(info)
+        d = self._with_timeout(self._server.callRemote(
+            'build', buildid, builder_type, chroot_sha1, filemap, args))
+        def got_fault(failure):
+            failure.trap(xmlrpclib.Fault)
+            raise BuildSlaveFailure(failure.value)
+        return d.addErrback(got_fault)
 
 
 # This is a separate function since MockBuilder needs to use it too.
 # Do not use it -- (Mock)Builder.rescueIfLost should be used instead.
 def rescueBuilderIfLost(builder, logger=None):
     """See `IBuilder`."""
-    status_sentence = builder.slaveStatusSentence()
-
     # 'ident_position' dict relates the position of the job identifier
     # token in the sentence received from status(), according the
     # two status we care about. See see lib/canonical/buildd/slave.py
@@ -265,61 +259,58 @@
         'BuilderStatus.WAITING': 2
         }
 
-    # Isolate the BuilderStatus string, always the first token in
-    # see lib/canonical/buildd/slave.py and
-    # IBuilder.slaveStatusSentence().
-    status = status_sentence[0]
-
-    # If the cookie test below fails, it will request an abort of the
-    # builder. This will leave the builder in the aborted state and
-    # with no assigned job, and we should now "clean" the slave which
-    # will reset its state back to IDLE, ready to accept new builds.
-    # This situation is usually caused by a temporary loss of
-    # communications with the slave and the build manager had to reset
-    # the job.
-    if status == 'BuilderStatus.ABORTED' and builder.currentjob is None:
-        builder.cleanSlave()
-        if logger is not None:
-            logger.info(
-                "Builder '%s' cleaned up from ABORTED" % builder.name)
-        return
-
-    # If slave is not building nor waiting, it's not in need of rescuing.
-    if status not in ident_position.keys():
-        return
-
-    slave_build_id = status_sentence[ident_position[status]]
-
-    try:
-        builder.verifySlaveBuildCookie(slave_build_id)
-    except CorruptBuildCookie, reason:
-        if status == 'BuilderStatus.WAITING':
-            builder.cleanSlave()
-        else:
-            builder.requestAbort()
-        if logger:
-            logger.info(
-                "Builder '%s' rescued from '%s': '%s'" %
-                (builder.name, slave_build_id, reason))
-
-
-def _update_builder_status(builder, logger=None):
-    """Really update the builder status."""
-    try:
-        builder.checkSlaveAlive()
-        builder.rescueIfLost(logger)
-    # Catch only known exceptions.
-    # XXX cprov 2007-06-15 bug=120571: ValueError & TypeError catching is
-    # disturbing in this context. We should spend sometime sanitizing the
-    # exceptions raised in the Builder API since we already started the
-    # main refactoring of this area.
-    except (ValueError, TypeError, xmlrpclib.Fault,
-            BuildDaemonError), reason:
-        builder.failBuilder(str(reason))
-        if logger:
-            logger.warn(
-                "%s (%s) marked as failed due to: %s",
-                builder.name, builder.url, builder.failnotes, exc_info=True)
+    d = builder.slaveStatusSentence()
+
+    def got_status(status_sentence):
+        """After we get the status, clean if we have to.
+
+        Always return status_sentence.
+        """
+        # Isolate the BuilderStatus string, always the first token in
+        # see lib/canonical/buildd/slave.py and
+        # IBuilder.slaveStatusSentence().
+        status = status_sentence[0]
+
+        # If the cookie test below fails, it will request an abort of the
+        # builder. This will leave the builder in the aborted state and
+        # with no assigned job, and we should now "clean" the slave which
+        # will reset its state back to IDLE, ready to accept new builds.
+        # This situation is usually caused by a temporary loss of
+        # communications with the slave and the build manager had to reset
+        # the job.
+        if status == 'BuilderStatus.ABORTED' and builder.currentjob is None:
+            if logger is not None:
+                logger.info(
+                    "Builder '%s' being cleaned up from ABORTED" %
+                    (builder.name,))
+            d = builder.cleanSlave()
+            return d.addCallback(lambda ignored: status_sentence)
+        else:
+            return status_sentence
+
+    def rescue_slave(status_sentence):
+        # If slave is not building nor waiting, it's not in need of rescuing.
+        status = status_sentence[0]
+        if status not in ident_position.keys():
+            return
+        slave_build_id = status_sentence[ident_position[status]]
+        try:
+            builder.verifySlaveBuildCookie(slave_build_id)
+        except CorruptBuildCookie, reason:
+            if status == 'BuilderStatus.WAITING':
+                d = builder.cleanSlave()
+            else:
+                d = builder.requestAbort()
+            def log_rescue(ignored):
+                if logger:
+                    logger.info(
+                        "Builder '%s' rescued from '%s': '%s'" %
+                        (builder.name, slave_build_id, reason))
+            return d.addCallback(log_rescue)
+
+    d.addCallback(got_status)
+    d.addCallback(rescue_slave)
+    return d
 
 
 def updateBuilderStatus(builder, logger=None):
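One idiom in got_status() above deserves a highlight: when a callback has side work to do (cleaning the slave) but the next callback in the chain still needs the original value, the side-effect Deferred is chained and the value re-emitted. In miniature:

    from twisted.internet import defer

    def clean_slave():
        return defer.succeed(None)  # stand-in for builder.cleanSlave()

    def got_status(status_sentence):
        if status_sentence[0] == 'BuilderStatus.ABORTED':
            d = clean_slave()
            # Re-emit status_sentence so rescue_slave still receives it.
            return d.addCallback(lambda ignored: status_sentence)
        return status_sentence

    def rescue_slave(status_sentence):
        assert status_sentence == ('BuilderStatus.ABORTED', None)

    d = defer.succeed(('BuilderStatus.ABORTED', None))
    d.addCallback(got_status)
    d.addCallback(rescue_slave)

Without the lambda, rescue_slave would receive cleanSlave()'s result (None) instead of the status sentence it needs.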
@@ -327,16 +318,7 @@
     if logger:
         logger.debug('Checking %s' % builder.name)
 
-    MAX_EINTR_RETRIES = 42 # pulling a number out of my a$$ here
-    try:
-        return until_no_eintr(
-            MAX_EINTR_RETRIES, _update_builder_status, builder, logger=logger)
-    except socket.error, reason:
-        # In Python 2.6 we can use IOError instead. It also has
-        # reason.errno but we might be using 2.5 here so use the
-        # index hack.
-        error_message = str(reason)
-        builder.handleTimeout(logger, error_message)
+    return builder.rescueIfLost(logger)
 
 
 class Builder(SQLBase):
@@ -364,6 +346,10 @@
     active = BoolCol(dbName='active', notNull=True, default=True)
     failure_count = IntCol(dbName='failure_count', default=0, notNull=True)
 
+    # The number of times a builder can consecutively fail before we
+    # give up and mark it builderok=False.
+    FAILURE_THRESHOLD = 5
+
     def _getCurrentBuildBehavior(self):
         """Return the current build behavior."""
         if not safe_hasattr(self, '_current_build_behavior'):
@@ -409,18 +395,13 @@
409 """See `IBuilder`."""395 """See `IBuilder`."""
410 self.failure_count = 0396 self.failure_count = 0
411397
412 def checkSlaveAlive(self):
413 """See IBuilder."""
414 if self.slave.echo("Test")[0] != "Test":
415 raise BuildDaemonError("Failed to echo OK")
416
417 def rescueIfLost(self, logger=None):398 def rescueIfLost(self, logger=None):
418 """See `IBuilder`."""399 """See `IBuilder`."""
419 rescueBuilderIfLost(self, logger)400 return rescueBuilderIfLost(self, logger)
420401
421 def updateStatus(self, logger=None):402 def updateStatus(self, logger=None):
422 """See `IBuilder`."""403 """See `IBuilder`."""
423 updateBuilderStatus(self, logger)404 return updateBuilderStatus(self, logger)
424405
425 def cleanSlave(self):406 def cleanSlave(self):
426 """See IBuilder."""407 """See IBuilder."""
@@ -440,20 +421,23 @@
     def resumeSlaveHost(self):
         """See IBuilder."""
         if not self.virtualized:
-            raise CannotResumeHost('Builder is not virtualized.')
+            return defer.fail(CannotResumeHost('Builder is not virtualized.'))
 
         if not self.vm_host:
-            raise CannotResumeHost('Undefined vm_host.')
+            return defer.fail(CannotResumeHost('Undefined vm_host.'))
 
         logger = self._getSlaveScannerLogger()
         logger.debug("Resuming %s (%s)" % (self.name, self.url))
 
-        stdout, stderr, returncode = self.slave.resume()
-        if returncode != 0:
+        d = self.slave.resume()
+        def got_resume_ok((stdout, stderr, returncode)):
+            return stdout, stderr
+        def got_resume_bad(failure):
+            stdout, stderr, code = failure.value
             raise CannotResumeHost(
                 "Resuming failed:\nOUT:\n%s\nERR:\n%s\n" % (stdout, stderr))
 
-        return stdout, stderr
+        return d.addCallback(got_resume_ok).addErrback(got_resume_bad)
 
     @cachedproperty
     def slave(self):
@@ -462,7 +446,7 @@
         # the slave object, which is usually an XMLRPC client, with a
         # stub object that removes the need to actually create a buildd
         # slave in various states - which can be hard to create.
-        return BuilderSlave.makeBlockingSlave(self.url, self.vm_host)
+        return BuilderSlave.makeBuilderSlave(self.url, self.vm_host)
 
     def setSlaveForTesting(self, proxy):
         """See IBuilder."""
@@ -483,18 +467,23 @@
 
         # If we are building a virtual build, resume the virtual machine.
         if self.virtualized:
-            self.resumeSlaveHost()
+            d = self.resumeSlaveHost()
+        else:
+            d = defer.succeed(None)
 
-        # Do it.
-        build_queue_item.markAsBuilding(self)
-        try:
-            self.current_build_behavior.dispatchBuildToSlave(
+        def resume_done(ignored):
+            return self.current_build_behavior.dispatchBuildToSlave(
                 build_queue_item.id, logger)
-        except BuildSlaveFailure, e:
-            logger.debug("Disabling builder: %s" % self.url, exc_info=1)
+
+        def eb_slave_failure(failure):
+            failure.trap(BuildSlaveFailure)
+            e = failure.value
             self.failBuilder(
                 "Exception (%s) when setting up to new job" % (e,))
-        except CannotFetchFile, e:
+
+        def eb_cannot_fetch_file(failure):
+            failure.trap(CannotFetchFile)
+            e = failure.value
             message = """Slave '%s' (%s) was unable to fetch file.
             ****** URL ********
             %s
@@ -503,10 +492,19 @@
             *******************
             """ % (self.name, self.url, e.file_url, e.error_information)
             raise BuildDaemonError(message)
-        except socket.error, e:
+
+        def eb_socket_error(failure):
+            failure.trap(socket.error)
+            e = failure.value
             error_message = "Exception (%s) when setting up new job" % (e,)
-            self.handleTimeout(logger, error_message)
-            raise BuildSlaveFailure
+            d = self.handleTimeout(logger, error_message)
+            return d.addBoth(lambda ignored: failure)
+
+        d.addCallback(resume_done)
+        d.addErrback(eb_slave_failure)
+        d.addErrback(eb_cannot_fetch_file)
+        d.addErrback(eb_socket_error)
+        return d
 
     def failBuilder(self, reason):
         """See IBuilder"""
@@ -534,22 +532,24 @@
 
     def slaveStatus(self):
         """See IBuilder."""
-        builder_version, builder_arch, mechanisms = self.slave.info()
-        status_sentence = self.slave.status()
-
-        status = {'builder_status': status_sentence[0]}
-
-        # Extract detailed status and log information if present.
-        # Although build_id is also easily extractable here, there is no
-        # valid reason for anything to use it, so we exclude it.
-        if status['builder_status'] == 'BuilderStatus.WAITING':
-            status['build_status'] = status_sentence[1]
-        else:
-            if status['builder_status'] == 'BuilderStatus.BUILDING':
-                status['logtail'] = status_sentence[2]
-
-        self.current_build_behavior.updateSlaveStatus(status_sentence, status)
-        return status
+        d = self.slave.status()
+        def got_status(status_sentence):
+            status = {'builder_status': status_sentence[0]}
+
+            # Extract detailed status and log information if present.
+            # Although build_id is also easily extractable here, there is no
+            # valid reason for anything to use it, so we exclude it.
+            if status['builder_status'] == 'BuilderStatus.WAITING':
+                status['build_status'] = status_sentence[1]
+            else:
+                if status['builder_status'] == 'BuilderStatus.BUILDING':
+                    status['logtail'] = status_sentence[2]
+
+            self.current_build_behavior.updateSlaveStatus(
+                status_sentence, status)
+            return status
+
+        return d.addCallback(got_status)
 
     def slaveStatusSentence(self):
         """See IBuilder."""
@@ -562,13 +562,15 @@
 
     def updateBuild(self, queueItem):
         """See `IBuilder`."""
-        self.current_build_behavior.updateBuild(queueItem)
+        return self.current_build_behavior.updateBuild(queueItem)
 
     def transferSlaveFileToLibrarian(self, file_sha1, filename, private):
         """See IBuilder."""
         out_file_fd, out_file_name = tempfile.mkstemp(suffix=".buildlog")
         out_file = os.fdopen(out_file_fd, "r+")
         try:
+            # XXX 2010-10-18 bug=662631
+            # Change this to do non-blocking IO.
             slave_file = self.slave.getFile(file_sha1)
             copy_and_close(slave_file, out_file)
             # If the requested file is the 'buildlog' compress it using gzip
@@ -599,18 +601,17 @@
 
         return library_file.id
 
-    @property
-    def is_available(self):
+    def isAvailable(self):
         """See `IBuilder`."""
         if not self.builderok:
-            return False
-        try:
-            slavestatus = self.slaveStatusSentence()
-        except (xmlrpclib.Fault, socket.error):
-            return False
-        if slavestatus[0] != BuilderStatus.IDLE:
-            return False
-        return True
+            return defer.succeed(False)
+        d = self.slaveStatusSentence()
+        def catch_fault(failure):
+            failure.trap(xmlrpclib.Fault, socket.error)
+            return False
+        def check_available(status):
+            return status[0] == BuilderStatus.IDLE
+        return d.addCallbacks(check_available, catch_fault)
 
     def _getSlaveScannerLogger(self):
         """Return the logger instance from buildd-slave-scanner.py."""
@@ -621,6 +622,27 @@
         logger = logging.getLogger('slave-scanner')
         return logger
 
+    def acquireBuildCandidate(self):
+        """Acquire a build candidate in an atomic fashion.
+
+        When retrieiving a candidate we need to mark it as building
+        immediately so that it is not dispatched by another builder in the
+        build manager.
+
+        We can consider this to be atomic because although the build manager
+        is a Twisted app and gives the appearance of doing lots of things at
+        once, it's still single-threaded so no more than one builder scan
+        can be in this code at the same time.
+
+        If there's ever more than one build manager running at once, then
+        this code will need some sort of mutex.
+        """
+        candidate = self._findBuildCandidate()
+        if candidate is not None:
+            candidate.markAsBuilding(self)
+            transaction.commit()
+        return candidate
+
     def _findBuildCandidate(self):
         """Find a candidate job for dispatch to an idle buildd slave.
 
@@ -700,52 +722,46 @@
         :param candidate: The job to dispatch.
         """
         logger = self._getSlaveScannerLogger()
-        try:
-            self.startBuild(candidate, logger)
-        except (BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch), err:
-            logger.warn('Could not build: %s' % err)
+        # Using maybeDeferred ensures that any exceptions are also
+        # wrapped up and caught later.
+        d = defer.maybeDeferred(self.startBuild, candidate, logger)
+        return d
 
     def handleTimeout(self, logger, error_message):
         """See IBuilder."""
-        builder_should_be_failed = True
-
         if self.virtualized:
             # Virtualized/PPA builder: attempt a reset.
             logger.warn(
                 "Resetting builder: %s -- %s" % (self.url, error_message),
                 exc_info=True)
-            try:
-                self.resumeSlaveHost()
-            except CannotResumeHost, err:
-                # Failed to reset builder.
-                logger.warn(
-                    "Failed to reset builder: %s -- %s" %
-                    (self.url, str(err)), exc_info=True)
-            else:
-                # Builder was reset, do *not* mark it as failed.
-                builder_should_be_failed = False
-
-        if builder_should_be_failed:
+            d = self.resumeSlaveHost()
+            return d
+        else:
+            # XXX: This should really let the failure bubble up to the
+            # scan() method that does the failure counting.
             # Mark builder as 'failed'.
             logger.warn(
-                "Disabling builder: %s -- %s" % (self.url, error_message),
-                exc_info=True)
+                "Disabling builder: %s -- %s" % (self.url, error_message))
             self.failBuilder(error_message)
+            return defer.succeed(None)
 
     def findAndStartJob(self, buildd_slave=None):
         """See IBuilder."""
+        # XXX This method should be removed in favour of two separately
+        # called methods that find and dispatch the job. It will
+        # require a lot of test fixing.
         logger = self._getSlaveScannerLogger()
-        candidate = self._findBuildCandidate()
+        candidate = self.acquireBuildCandidate()
 
         if candidate is None:
             logger.debug("No build candidates available for builder.")
-            return None
+            return defer.succeed(None)
 
         if buildd_slave is not None:
             self.setSlaveForTesting(buildd_slave)
 
-        self._dispatchBuildCandidate(candidate)
-        return candidate
+        d = self._dispatchBuildCandidate(candidate)
+        return d.addCallback(lambda ignored: candidate)
 
     def getBuildQueue(self):
         """See `IBuilder`."""
 
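The maybeDeferred() call in _dispatchBuildCandidate() is the hinge of that hunk: startBuild() can now raise synchronously or fail asynchronously, and either way the error arrives as an errback on the same Deferred. A compact demonstration:

    from twisted.internet import defer

    def flaky_start_build():
        # A synchronous exception, raised before any Deferred exists.
        raise ValueError('no suitable chroot')

    errors = []
    d = defer.maybeDeferred(flaky_start_build)
    d.addErrback(lambda failure: errors.append(failure.type))
    assert errors == [ValueError]

Without maybeDeferred, the synchronous raise would escape the scan cycle instead of being routed into the failure-counting errbacks.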
=== modified file 'lib/lp/buildmaster/model/buildfarmjobbehavior.py'
--- lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-08-20 20:31:18 +0000
+++ lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-10-25 19:14:01 +0000
@@ -16,13 +16,18 @@
 import socket
 import xmlrpclib
 
+from twisted.internet import defer
+
 from zope.component import getUtility
 from zope.interface import implements
 from zope.security.proxy import removeSecurityProxy
 
 from canonical import encoding
 from canonical.librarian.interfaces import ILibrarianClient
-from lp.buildmaster.interfaces.builder import CorruptBuildCookie
+from lp.buildmaster.interfaces.builder import (
+    BuildSlaveFailure,
+    CorruptBuildCookie,
+    )
 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
     BuildBehaviorMismatch,
     IBuildFarmJobBehavior,
@@ -69,54 +74,53 @@
69 """See `IBuildFarmJobBehavior`."""74 """See `IBuildFarmJobBehavior`."""
70 logger = logging.getLogger('slave-scanner')75 logger = logging.getLogger('slave-scanner')
7176
72 try:77 d = self._builder.slaveStatus()
73 slave_status = self._builder.slaveStatus()78
74 except (xmlrpclib.Fault, socket.error), info:79 def got_failure(failure):
75 # XXX cprov 2005-06-29:80 failure.trap(xmlrpclib.Fault, socket.error)
76 # Hmm, a problem with the xmlrpc interface,81 info = failure.value
77 # disable the builder ?? or simple notice the failure
78 # with a timestamp.
79 info = ("Could not contact the builder %s, caught a (%s)"82 info = ("Could not contact the builder %s, caught a (%s)"
80 % (queueItem.builder.url, info))83 % (queueItem.builder.url, info))
81 logger.debug(info, exc_info=True)84 raise BuildSlaveFailure(info)
82 # keep the job for scan85
83 return86 def got_status(slave_status):
8487 builder_status_handlers = {
85 builder_status_handlers = {88 'BuilderStatus.IDLE': self.updateBuild_IDLE,
86 'BuilderStatus.IDLE': self.updateBuild_IDLE,89 'BuilderStatus.BUILDING': self.updateBuild_BUILDING,
87 'BuilderStatus.BUILDING': self.updateBuild_BUILDING,90 'BuilderStatus.ABORTING': self.updateBuild_ABORTING,
88 'BuilderStatus.ABORTING': self.updateBuild_ABORTING,91 'BuilderStatus.ABORTED': self.updateBuild_ABORTED,
89 'BuilderStatus.ABORTED': self.updateBuild_ABORTED,92 'BuilderStatus.WAITING': self.updateBuild_WAITING,
90 'BuilderStatus.WAITING': self.updateBuild_WAITING,93 }
91 }94
9295 builder_status = slave_status['builder_status']
93 builder_status = slave_status['builder_status']96 if builder_status not in builder_status_handlers:
94 if builder_status not in builder_status_handlers:97 logger.critical(
95 logger.critical(98 "Builder on %s returned unknown status %s, failing it"
96 "Builder on %s returned unknown status %s, failing it"99 % (self._builder.url, builder_status))
97 % (self._builder.url, builder_status))100 self._builder.failBuilder(
98 self._builder.failBuilder(101 "Unknown status code (%s) returned from status() probe."
99 "Unknown status code (%s) returned from status() probe."102 % builder_status)
100 % builder_status)103 # XXX: This will leave the build and job in a bad state, but
101 # XXX: This will leave the build and job in a bad state, but104 # should never be possible, since our builder statuses are
102 # should never be possible, since our builder statuses are105 # known.
103 # known.106 queueItem._builder = None
104 queueItem._builder = None107 queueItem.setDateStarted(None)
105 queueItem.setDateStarted(None)108 return
106 return109
107110 # Since logtail is a xmlrpclib.Binary container and it is
108 # Since logtail is a xmlrpclib.Binary container and it is returned111 # returned from the IBuilder content class, it arrives
109 # from the IBuilder content class, it arrives protected by a Zope112 # protected by a Zope Security Proxy, which is not declared,
110 # Security Proxy, which is not declared, thus empty. Before passing113 # thus empty. Before passing it to the status handlers we
111 # it to the status handlers we will simply remove the proxy.114 # will simply remove the proxy.
112 logtail = removeSecurityProxy(slave_status.get('logtail'))115 logtail = removeSecurityProxy(slave_status.get('logtail'))
113116
114 method = builder_status_handlers[builder_status]117 method = builder_status_handlers[builder_status]
115 try:118 return defer.maybeDeferred(
116 method(queueItem, slave_status, logtail, logger)119 method, queueItem, slave_status, logtail, logger)
117 except TypeError, e:120
118 logger.critical("Received wrong number of args in response.")121 d.addErrback(got_failure)
119 logger.exception(e)122 d.addCallback(got_status)
123 return d
120124
121 def updateBuild_IDLE(self, queueItem, slave_status, logtail, logger):125 def updateBuild_IDLE(self, queueItem, slave_status, logtail, logger):
122 """Somehow the builder forgot about the build job.126 """Somehow the builder forgot about the build job.
@@ -146,11 +150,13 @@
 
         Clean the builder for another jobs.
         """
-        queueItem.builder.cleanSlave()
-        queueItem.builder = None
-        if queueItem.job.status != JobStatus.FAILED:
-            queueItem.job.fail()
-        queueItem.specific_job.jobAborted()
+        d = queueItem.builder.cleanSlave()
+        def got_cleaned(ignored):
+            queueItem.builder = None
+            if queueItem.job.status != JobStatus.FAILED:
+                queueItem.job.fail()
+            queueItem.specific_job.jobAborted()
+        return d.addCallback(got_cleaned)
 
     def extractBuildStatus(self, slave_status):
         """Read build status name.
@@ -185,6 +191,8 @@
         # XXX: dsilvers 2005-03-02: Confirm the builder has the right build?
 
         build = queueItem.specific_job.build
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         build.handleStatus(build_status, librarian, slave_status)
 
 
=== modified file 'lib/lp/buildmaster/model/packagebuild.py'
--- lib/lp/buildmaster/model/packagebuild.py 2010-10-02 11:41:43 +0000
+++ lib/lp/buildmaster/model/packagebuild.py 2010-10-25 19:14:01 +0000
@@ -165,6 +165,8 @@
     def getLogFromSlave(package_build):
         """See `IPackageBuild`."""
         builder = package_build.buildqueue_record.builder
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         return builder.transferSlaveFileToLibrarian(
             SLAVE_LOG_FILENAME,
             package_build.buildqueue_record.getLogFileName(),
@@ -180,6 +182,8 @@
         # log, builder and date_finished are read-only, so we must
         # currently remove the security proxy to set them.
         naked_build = removeSecurityProxy(build)
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         naked_build.log = build.getLogFromSlave(build)
         naked_build.builder = build.buildqueue_record.builder
         # XXX cprov 20060615 bug=120584: Currently buildduration includes
@@ -276,6 +280,8 @@
             logger.critical("Unknown BuildStatus '%s' for builder '%s'"
                             % (status, self.buildqueue_record.builder.url))
             return
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         method(librarian, slave_status, logger)
 
     def _handleStatus_OK(self, librarian, slave_status, logger):
 
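The repeated "XXX bug=662631: change this to do non-blocking IO" markers flag the librarian and file transfers that still block the reactor. One plausible interim approach, offered here only as a sketch under that assumption and not as what the branch does, is to push the blocking call onto the reactor's thread pool:

    from twisted.internet import threads

    def transfer_log_blocking():
        # Stand-in for the blocking librarian upload in getLogFromSlave().
        return 'library-file-id'

    # deferToThread runs the blocking call in a worker thread and hands
    # the result back to the reactor as a Deferred, keeping the event
    # loop free to keep scanning other builders.
    d = threads.deferToThread(transfer_log_blocking)
    d.addCallback(lambda file_id: file_id)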
=== modified file 'lib/lp/buildmaster/tests/mock_slaves.py'
--- lib/lp/buildmaster/tests/mock_slaves.py 2010-09-23 12:35:21 +0000
+++ lib/lp/buildmaster/tests/mock_slaves.py 2010-10-25 19:14:01 +0000
@@ -6,21 +6,40 @@
 __metaclass__ = type
 
 __all__ = [
+    'AbortedSlave',
+    'AbortingSlave',
+    'BrokenSlave',
+    'BuildingSlave',
+    'CorruptBehavior',
+    'DeadProxy',
+    'LostBuildingBrokenSlave',
     'MockBuilder',
-    'LostBuildingBrokenSlave',
-    'BrokenSlave',
     'OkSlave',
-    'BuildingSlave',
-    'AbortedSlave',
+    'SlaveTestHelpers',
+    'TrivialBehavior',
     'WaitingSlave',
-    'AbortingSlave',
     ]
 
+import fixtures
+import os
+
 from StringIO import StringIO
 import xmlrpclib
 
-from lp.buildmaster.interfaces.builder import CannotFetchFile
+from testtools.content import Content
+from testtools.content_type import UTF8_TEXT
+
+from twisted.internet import defer
+from twisted.web import xmlrpc
+
+from canonical.buildd.tests.harness import BuilddSlaveTestSetup
+
+from lp.buildmaster.interfaces.builder import (
+    CannotFetchFile,
+    CorruptBuildCookie,
+    )
 from lp.buildmaster.model.builder import (
+    BuilderSlave,
     rescueBuilderIfLost,
     updateBuilderStatus,
     )
@@ -59,15 +78,9 @@
             slave_build_id)
 
     def cleanSlave(self):
-        # XXX: This should not print anything. The print is only here to make
-        # doc/builder.txt a meaningful test.
-        print 'Cleaning slave'
         return self.slave.clean()
 
     def requestAbort(self):
-        # XXX: This should not print anything. The print is only here to make
-        # doc/builder.txt a meaningful test.
-        print 'Aborting slave'
         return self.slave.abort()
 
     def resumeSlave(self, logger):
@@ -77,10 +90,10 @@
         pass
 
     def rescueIfLost(self, logger=None):
-        rescueBuilderIfLost(self, logger)
+        return rescueBuilderIfLost(self, logger)
 
     def updateStatus(self, logger=None):
-        updateBuilderStatus(self, logger)
+        return defer.maybeDeferred(updateBuilderStatus, self, logger)
 
 
 # XXX: It would be *really* nice to run some set of tests against the real
@@ -95,36 +108,44 @@
         self.arch_tag = arch_tag
 
     def status(self):
-        return ('BuilderStatus.IDLE', '')
+        return defer.succeed(('BuilderStatus.IDLE', ''))
 
     def ensurepresent(self, sha1, url, user=None, password=None):
         self.call_log.append(('ensurepresent', url, user, password))
-        return True, None
+        return defer.succeed((True, None))
 
     def build(self, buildid, buildtype, chroot, filemap, args):
         self.call_log.append(
             ('build', buildid, buildtype, chroot, filemap.keys(), args))
         info = 'OkSlave BUILDING'
-        return ('BuildStatus.Building', info)
+        return defer.succeed(('BuildStatus.Building', info))
 
     def echo(self, *args):
         self.call_log.append(('echo',) + args)
-        return args
+        return defer.succeed(args)
 
     def clean(self):
         self.call_log.append('clean')
+        return defer.succeed(None)
 
     def abort(self):
         self.call_log.append('abort')
+        return defer.succeed(None)
 
     def info(self):
         self.call_log.append('info')
-        return ('1.0', self.arch_tag, 'debian')
+        return defer.succeed(('1.0', self.arch_tag, 'debian'))
+
+    def resume(self):
+        self.call_log.append('resume')
+        return defer.succeed(("", "", 0))
 
     def sendFileToSlave(self, sha1, url, username="", password=""):
-        present, info = self.ensurepresent(sha1, url, username, password)
-        if not present:
-            raise CannotFetchFile(url, info)
+        d = self.ensurepresent(sha1, url, username, password)
+        def check_present((present, info)):
+            if not present:
+                raise CannotFetchFile(url, info)
+        return d.addCallback(check_present)
 
     def cacheFile(self, logger, libraryfilealias):
         return self.sendFileToSlave(
@@ -141,9 +162,11 @@
     def status(self):
         self.call_log.append('status')
         buildlog = xmlrpclib.Binary("This is a build log")
-        return ('BuilderStatus.BUILDING', self.build_id, buildlog)
+        return defer.succeed(
+            ('BuilderStatus.BUILDING', self.build_id, buildlog))
 
     def getFile(self, sum):
+        # XXX: This needs to be updated to return a Deferred.
         self.call_log.append('getFile')
         if sum == "buildlog":
             s = StringIO("This is a build log")
@@ -155,11 +178,15 @@
155 """A mock slave that looks like it's currently waiting."""178 """A mock slave that looks like it's currently waiting."""
156179
157 def __init__(self, state='BuildStatus.OK', dependencies=None,180 def __init__(self, state='BuildStatus.OK', dependencies=None,
158 build_id='1-1'):181 build_id='1-1', filemap=None):
159 super(WaitingSlave, self).__init__()182 super(WaitingSlave, self).__init__()
160 self.state = state183 self.state = state
161 self.dependencies = dependencies184 self.dependencies = dependencies
162 self.build_id = build_id185 self.build_id = build_id
186 if filemap is None:
187 self.filemap = {}
188 else:
189 self.filemap = filemap
163190
164 # By default, the slave only has a buildlog, but callsites191 # By default, the slave only has a buildlog, but callsites
165 # can update this list as needed.192 # can update this list as needed.
@@ -167,10 +194,12 @@
 
     def status(self):
         self.call_log.append('status')
-        return ('BuilderStatus.WAITING', self.state, self.build_id, {},
-                self.dependencies)
+        return defer.succeed((
+            'BuilderStatus.WAITING', self.state, self.build_id, self.filemap,
+            self.dependencies))
 
     def getFile(self, hash):
+        # XXX: This needs to be updated to return a Deferred.
         self.call_log.append('getFile')
         if hash in self.valid_file_hashes:
             content = "This is a %s" % hash
@@ -184,15 +213,19 @@
 
     def status(self):
         self.call_log.append('status')
-        return ('BuilderStatus.ABORTING', '1-1')
+        return defer.succeed(('BuilderStatus.ABORTING', '1-1'))
 
 
 class AbortedSlave(OkSlave):
     """A mock slave that looks like it's aborted."""
 
+    def clean(self):
+        self.call_log.append('clean')
+        return defer.succeed(None)
+
     def status(self):
         self.call_log.append('status')
-        return ('BuilderStatus.ABORTED', '1-1')
+        return defer.succeed(('BuilderStatus.ABORTED', '1-1'))
 
 
 class LostBuildingBrokenSlave:
@@ -206,16 +239,108 @@
 
     def status(self):
         self.call_log.append('status')
-        return ('BuilderStatus.BUILDING', '1000-10000')
+        return defer.succeed(('BuilderStatus.BUILDING', '1000-10000'))
 
     def abort(self):
         self.call_log.append('abort')
-        raise xmlrpclib.Fault(8002, "Could not abort")
+        return defer.fail(xmlrpclib.Fault(8002, "Could not abort"))
 
 
 class BrokenSlave:
     """A mock slave that reports that it is broken."""
 
+    def __init__(self):
+        self.call_log = []
+
     def status(self):
         self.call_log.append('status')
-        raise xmlrpclib.Fault(8001, "Broken slave")
+        return defer.fail(xmlrpclib.Fault(8001, "Broken slave"))
+
+
+class CorruptBehavior:
+
+    def verifySlaveBuildCookie(self, cookie):
+        raise CorruptBuildCookie("Bad value: %r" % (cookie,))
+
+
+class TrivialBehavior:
+
+    def verifySlaveBuildCookie(self, cookie):
+        pass
+
+
+class DeadProxy(xmlrpc.Proxy):
+    """An xmlrpc.Proxy that doesn't actually send any messages.
+
+    Used when you want to test timeouts, for example.
+    """
+
+    def callRemote(self, *args, **kwargs):
+        return defer.Deferred()
+
+
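A never-firing Deferred is all a timeout test needs: the client's timeout machinery then has nothing racing against it. A minimal sketch of the idea, outside Launchpad (the helper name and the five-second delay are illustrative only, not taken from this branch):

    from twisted.internet import reactor

    def call_with_timeout(make_call, seconds=5):
        # `make_call` stands in for e.g. DeadProxy('url').callRemote('status').
        d = make_call()
        delayed_cancel = reactor.callLater(seconds, d.cancel)
        def stop_clock(passthrough):
            # Don't fire the cancel if the call completed in time.
            if delayed_cancel.active():
                delayed_cancel.cancel()
            return passthrough
        return d.addBoth(stop_clock)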
+class SlaveTestHelpers(fixtures.Fixture):
+
+    # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`.
+    BASE_URL = 'http://localhost:8221'
+    TEST_URL = '%s/rpc/' % (BASE_URL,)
+
+    def getServerSlave(self):
+        """Set up a test build slave server.
+
+        :return: A `BuilddSlaveTestSetup` object.
+        """
+        tachandler = BuilddSlaveTestSetup()
+        tachandler.setUp()
+        # Basically impossible to do this w/ TrialTestCase. But it would be
+        # really nice to keep it.
+        #
+        # def addLogFile(exc_info):
+        #     self.addDetail(
+        #         'xmlrpc-log-file',
+        #         Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read()))
+        # self.addOnException(addLogFile)
+        self.addCleanup(tachandler.tearDown)
+        return tachandler
+
+    def getClientSlave(self, reactor=None, proxy=None):
+        """Return a `BuilderSlave` for use in testing.
+
+        Points to a fixed URL that is also used by `BuilddSlaveTestSetup`.
+        """
+        return BuilderSlave.makeBuilderSlave(
+            self.TEST_URL, 'vmhost', reactor, proxy)
+
+    def makeCacheFile(self, tachandler, filename):
+        """Make a cache file available on the remote slave.
+
+        :param tachandler: The TacTestSetup object used to start the remote
+            slave.
+        :param filename: The name of the file to create in the file cache
+            area.
+        """
+        path = os.path.join(tachandler.root, 'filecache', filename)
+        fd = open(path, 'w')
+        fd.write('something')
+        fd.close()
+        self.addCleanup(os.unlink, path)
+
+    def triggerGoodBuild(self, slave, build_id=None):
+        """Trigger a good build on 'slave'.
+
+        :param slave: A `BuilderSlave` instance to trigger the build on.
+        :param build_id: The build identifier. If not specified, defaults to
+            an arbitrary string.
+        :type build_id: str
+        :return: The build id returned by the slave.
+        """
+        if build_id is None:
+            build_id = 'random-build-id'
+        tachandler = self.getServerSlave()
+        chroot_file = 'fake-chroot'
+        dsc_file = 'thing'
+        self.makeCacheFile(tachandler, chroot_file)
+        self.makeCacheFile(tachandler, dsc_file)
+        return slave.build(
+            build_id, 'debian', chroot_file, {'.dsc': dsc_file},
+            {'ogrecomponent': 'main'})
 
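Since every mock above now records its calls in call_log and returns Deferreds, tests assert on the log from a callback rather than on return values. An illustrative Trial test (not from the branch; the test name is made up):

    from twisted.trial.unittest import TestCase

    from lp.buildmaster.tests.mock_slaves import OkSlave

    class MockSlaveCallLogExample(TestCase):

        def test_clean_is_recorded(self):
            slave = OkSlave()
            d = slave.clean()
            def check(ignored):
                self.assertIn('clean', slave.call_log)
            return d.addCallback(check)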
=== modified file 'lib/lp/buildmaster/tests/test_builder.py'
--- lib/lp/buildmaster/tests/test_builder.py 2010-10-06 09:06:30 +0000
+++ lib/lp/buildmaster/tests/test_builder.py 2010-10-25 19:14:01 +0000
@@ -3,20 +3,24 @@
 
 """Test Builder features."""
 
-import errno
 import os
-import socket
+import signal
 import xmlrpclib
 
-from testtools.content import Content
-from testtools.content_type import UTF8_TEXT
+from twisted.web.client import getPage
+
+from twisted.internet.defer import CancelledError
+from twisted.internet.task import Clock
+from twisted.python.failure import Failure
+from twisted.trial.unittest import TestCase as TrialTestCase
 
 from zope.component import getUtility
 from zope.security.proxy import removeSecurityProxy
 
 from canonical.buildd.slave import BuilderStatus
-from canonical.buildd.tests.harness import BuilddSlaveTestSetup
+from canonical.config import config
 from canonical.database.sqlbase import flush_database_updates
+from canonical.launchpad.scripts import QuietFakeLogger
 from canonical.launchpad.webapp.interfaces import (
     DEFAULT_FLAVOR,
     IStoreSelector,
@@ -24,21 +28,38 @@
     )
 from canonical.testing.layers import (
     DatabaseFunctionalLayer,
-    LaunchpadZopelessLayer
+    LaunchpadZopelessLayer,
+    TwistedLaunchpadZopelessLayer,
+    TwistedLayer,
     )
 from lp.buildmaster.enums import BuildStatus
-from lp.buildmaster.interfaces.builder import IBuilder, IBuilderSet
+from lp.buildmaster.interfaces.builder import (
+    CannotFetchFile,
+    IBuilder,
+    IBuilderSet,
+    )
 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
     IBuildFarmJobBehavior,
     )
 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
-from lp.buildmaster.model.builder import BuilderSlave
+from lp.buildmaster.interfaces.builder import CannotResumeHost
 from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior
 from lp.buildmaster.model.buildqueue import BuildQueue
 from lp.buildmaster.tests.mock_slaves import (
     AbortedSlave,
+    AbortingSlave,
+    BrokenSlave,
+    BuildingSlave,
+    CorruptBehavior,
+    DeadProxy,
+    LostBuildingBrokenSlave,
     MockBuilder,
+    OkSlave,
+    SlaveTestHelpers,
+    TrivialBehavior,
+    WaitingSlave,
     )
+from lp.services.job.interfaces.job import JobStatus
 from lp.soyuz.enums import (
     ArchivePurpose,
     PackagePublishingStatus,
@@ -49,9 +70,12 @@
     )
 from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
 from lp.testing import (
-    TestCase,
+    ANONYMOUS,
+    login_as,
+    logout,
     TestCaseWithFactory,
     )
+from lp.testing.factory import LaunchpadObjectFactory
 from lp.testing.fakemethod import FakeMethod
 
 
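The imports above mark the shift to Twisted's Trial: a test may return a Deferred and Trial keeps the reactor spinning until it fires; a failed assertion in a callback fails the test. The skeleton, reduced to its essentials (names are illustrative):

    from twisted.internet import defer
    from twisted.trial.unittest import TestCase

    class DeferredResultExample(TestCase):

        def test_assert_in_callback(self):
            d = defer.succeed(42)
            # Trial waits on the returned Deferred; errbacks become failures.
            d.addCallback(self.assertEqual, 42)
            return d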
@@ -92,42 +116,121 @@
         bq = builder.getBuildQueue()
         self.assertIs(None, bq)
 
-    def test_updateBuilderStatus_catches_repeated_EINTR(self):
-        # A single EINTR return from a socket operation should cause the
-        # operation to be retried, not fail/reset the builder.
-        builder = removeSecurityProxy(self.factory.makeBuilder())
-        builder.handleTimeout = FakeMethod()
-        builder.rescueIfLost = FakeMethod()
-
-        def _fake_checkSlaveAlive():
-            # Raise an EINTR error for all invocations.
-            raise socket.error(errno.EINTR, "fake eintr")
-
-        builder.checkSlaveAlive = _fake_checkSlaveAlive
-        builder.updateStatus()
-
-        # builder.updateStatus should eventually have called
-        # handleTimeout()
-        self.assertEqual(1, builder.handleTimeout.call_count)
-
-    def test_updateBuilderStatus_catches_single_EINTR(self):
-        builder = removeSecurityProxy(self.factory.makeBuilder())
-        builder.handleTimeout = FakeMethod()
-        builder.rescueIfLost = FakeMethod()
-        self.eintr_returned = False
-
-        def _fake_checkSlaveAlive():
-            # raise an EINTR error for the first invocation only.
-            if not self.eintr_returned:
-                self.eintr_returned = True
-                raise socket.error(errno.EINTR, "fake eintr")
-
-        builder.checkSlaveAlive = _fake_checkSlaveAlive
-        builder.updateStatus()
-
-        # builder.updateStatus should never call handleTimeout() for a
-        # single EINTR.
-        self.assertEqual(0, builder.handleTimeout.call_count)
+
+class TestBuilderWithTrial(TrialTestCase):
+
+    layer = TwistedLaunchpadZopelessLayer
+
+    def setUp(self):
+        super(TestBuilderWithTrial, self).setUp()
+        self.slave_helper = SlaveTestHelpers()
+        self.slave_helper.setUp()
+        self.addCleanup(self.slave_helper.cleanUp)
+        self.factory = LaunchpadObjectFactory()
+        login_as(ANONYMOUS)
+        self.addCleanup(logout)
+
+    def test_updateStatus_aborts_lost_and_broken_slave(self):
+        # A slave that's 'lost' should be aborted; when the slave is
+        # broken then abort() should also throw a fault.
+        slave = LostBuildingBrokenSlave()
+        lostbuilding_builder = MockBuilder(
+            'Lost Building Broken Slave', slave, behavior=CorruptBehavior())
+        d = lostbuilding_builder.updateStatus(QuietFakeLogger())
+        def check_slave_status(failure):
+            self.assertIn('abort', slave.call_log)
+            # 'Fault' comes from the LostBuildingBrokenSlave, this is
+            # just testing that the value is passed through.
+            self.assertIsInstance(failure.value, xmlrpclib.Fault)
+        return d.addBoth(check_slave_status)
+
+    def test_resumeSlaveHost_nonvirtual(self):
+        builder = self.factory.makeBuilder(virtualized=False)
+        d = builder.resumeSlaveHost()
+        return self.assertFailure(d, CannotResumeHost)
+
+    def test_resumeSlaveHost_no_vmhost(self):
+        builder = self.factory.makeBuilder(virtualized=True, vm_host=None)
+        d = builder.resumeSlaveHost()
+        return self.assertFailure(d, CannotResumeHost)
+
+    def test_resumeSlaveHost_success(self):
+        reset_config = """
+            [builddmaster]
+            vm_resume_command: /bin/echo -n parp"""
+        config.push('reset', reset_config)
+        self.addCleanup(config.pop, 'reset')
+
+        builder = self.factory.makeBuilder(virtualized=True, vm_host="pop")
+        d = builder.resumeSlaveHost()
+        def got_resume(output):
+            self.assertEqual(('parp', ''), output)
+        return d.addCallback(got_resume)
+
+    def test_resumeSlaveHost_command_failed(self):
+        reset_fail_config = """
+            [builddmaster]
+            vm_resume_command: /bin/false"""
+        config.push('reset fail', reset_fail_config)
+        self.addCleanup(config.pop, 'reset fail')
+        builder = self.factory.makeBuilder(virtualized=True, vm_host="pop")
+        d = builder.resumeSlaveHost()
+        return self.assertFailure(d, CannotResumeHost)
+
+    def test_handleTimeout_resume_failure(self):
+        reset_fail_config = """
+            [builddmaster]
+            vm_resume_command: /bin/false"""
+        config.push('reset fail', reset_fail_config)
+        self.addCleanup(config.pop, 'reset fail')
+        builder = self.factory.makeBuilder(virtualized=True, vm_host="pop")
+        builder.builderok = True
+        d = builder.handleTimeout(QuietFakeLogger(), 'blah')
+        return self.assertFailure(d, CannotResumeHost)
+
+    def _setupRecipeBuildAndBuilder(self):
+        # Helper function to make a builder capable of building a
+        # recipe, returning both.
+        processor = self.factory.makeProcessor(name="i386")
+        builder = self.factory.makeBuilder(
+            processor=processor, virtualized=True, vm_host="bladh")
+        builder.setSlaveForTesting(OkSlave())
+        distroseries = self.factory.makeDistroSeries()
+        das = self.factory.makeDistroArchSeries(
+            distroseries=distroseries, architecturetag="i386",
+            processorfamily=processor.family)
+        chroot = self.factory.makeLibraryFileAlias()
+        das.addOrUpdateChroot(chroot)
+        distroseries.nominatedarchindep = das
+        build = self.factory.makeSourcePackageRecipeBuild(
+            distroseries=distroseries)
+        return builder, build
+
+    def test_findAndStartJob_returns_candidate(self):
+        # findAndStartJob finds the next queued job using _findBuildCandidate.
+        # We don't care about the type of build at all.
+        builder, build = self._setupRecipeBuildAndBuilder()
+        candidate = build.queueBuild()
+        # _findBuildCandidate is tested elsewhere, we just make sure that
+        # findAndStartJob delegates to it.
+        removeSecurityProxy(builder)._findBuildCandidate = FakeMethod(
+            result=candidate)
+        d = builder.findAndStartJob()
+        return d.addCallback(self.assertEqual, candidate)
+
+    def test_findAndStartJob_starts_job(self):
+        # findAndStartJob finds the next queued job using _findBuildCandidate
+        # and then starts it.
+        # We don't care about the type of build at all.
+        builder, build = self._setupRecipeBuildAndBuilder()
+        candidate = build.queueBuild()
+        removeSecurityProxy(builder)._findBuildCandidate = FakeMethod(
+            result=candidate)
+        d = builder.findAndStartJob()
+        def check_build_started(candidate):
+            self.assertEqual(candidate.builder, builder)
+            self.assertEqual(BuildStatus.BUILDING, build.status)
+        return d.addCallback(check_build_started)
 
     def test_slave(self):
         # Builder.slave is a BuilderSlave that points at the actual Builder.
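The resume tests above lean on the config push/pop stack to swap vm_resume_command without leaking the override into other tests. The idiom in isolation (the section name 'resume-example' is arbitrary):

    from canonical.config import config

    override = """
        [builddmaster]
        vm_resume_command: /bin/echo -n parp"""
    config.push('resume-example', override)
    try:
        # The overridden value is visible until the matching pop.
        assert 'echo' in config.builddmaster.vm_resume_command
    finally:
        config.pop('resume-example')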
@@ -136,25 +239,147 @@
         builder = removeSecurityProxy(self.factory.makeBuilder())
         self.assertEqual(builder.url, builder.slave.url)
 
-
-class Test_rescueBuilderIfLost(TestCaseWithFactory):
-    """Tests for lp.buildmaster.model.builder.rescueBuilderIfLost."""
-
-    layer = LaunchpadZopelessLayer
-
     def test_recovery_of_aborted_slave(self):
         # If a slave is in the ABORTED state, rescueBuilderIfLost should
         # clean it if we don't think it's currently building anything.
         # See bug 463046.
         aborted_slave = AbortedSlave()
-        # The slave's clean() method is normally an XMLRPC call, so we
-        # can just stub it out and check that it got called.
-        aborted_slave.clean = FakeMethod()
         builder = MockBuilder("mock_builder", aborted_slave)
         builder.currentjob = None
-        builder.rescueIfLost()
-
-        self.assertEqual(1, aborted_slave.clean.call_count)
+        d = builder.rescueIfLost()
+        def check_slave_calls(ignored):
+            self.assertIn('clean', aborted_slave.call_log)
+        return d.addCallback(check_slave_calls)
+
+    def test_recover_ok_slave(self):
+        # An idle slave is not rescued.
+        slave = OkSlave()
+        builder = MockBuilder("mock_builder", slave, TrivialBehavior())
+        d = builder.rescueIfLost()
+        def check_slave_calls(ignored):
+            self.assertNotIn('abort', slave.call_log)
+            self.assertNotIn('clean', slave.call_log)
+        return d.addCallback(check_slave_calls)
+
+    def test_recover_waiting_slave_with_good_id(self):
+        # rescueIfLost does not attempt to abort or clean a builder that is
+        # WAITING.
+        waiting_slave = WaitingSlave()
+        builder = MockBuilder("mock_builder", waiting_slave, TrivialBehavior())
+        d = builder.rescueIfLost()
+        def check_slave_calls(ignored):
+            self.assertNotIn('abort', waiting_slave.call_log)
+            self.assertNotIn('clean', waiting_slave.call_log)
+        return d.addCallback(check_slave_calls)
+
+    def test_recover_waiting_slave_with_bad_id(self):
+        # If a slave is WAITING with a build for us to get, and the build
+        # cookie cannot be verified, which means we don't recognize the build,
+        # then rescueBuilderIfLost should attempt to clean it, so that the
+        # builder is reset for a new build, and the corrupt build is
+        # discarded.
+        waiting_slave = WaitingSlave()
+        builder = MockBuilder("mock_builder", waiting_slave, CorruptBehavior())
+        d = builder.rescueIfLost()
+        def check_slave_calls(ignored):
+            self.assertNotIn('abort', waiting_slave.call_log)
+            self.assertIn('clean', waiting_slave.call_log)
+        return d.addCallback(check_slave_calls)
+
+    def test_recover_building_slave_with_good_id(self):
+        # rescueIfLost does not attempt to abort or clean a builder that is
+        # BUILDING.
+        building_slave = BuildingSlave()
+        builder = MockBuilder("mock_builder", building_slave, TrivialBehavior())
+        d = builder.rescueIfLost()
+        def check_slave_calls(ignored):
+            self.assertNotIn('abort', building_slave.call_log)
+            self.assertNotIn('clean', building_slave.call_log)
+        return d.addCallback(check_slave_calls)
+
+    def test_recover_building_slave_with_bad_id(self):
+        # If a slave is BUILDING with a build id we don't recognize, then we
+        # abort the build, thus stopping it in its tracks.
+        building_slave = BuildingSlave()
+        builder = MockBuilder("mock_builder", building_slave, CorruptBehavior())
+        d = builder.rescueIfLost()
+        def check_slave_calls(ignored):
+            self.assertIn('abort', building_slave.call_log)
+            self.assertNotIn('clean', building_slave.call_log)
+        return d.addCallback(check_slave_calls)
+
+
+class TestBuilderSlaveStatus(TestBuilderWithTrial):
+
+    # Verify what IBuilder.slaveStatus returns with slaves in different
+    # states.
+
+    def assertStatus(self, slave, builder_status=None,
+                     build_status=None, logtail=False, filemap=None,
+                     dependencies=None):
+        builder = self.factory.makeBuilder()
+        builder.setSlaveForTesting(slave)
+        d = builder.slaveStatus()
+
+        def got_status(status_dict):
+            expected = {}
+            if builder_status is not None:
+                expected["builder_status"] = builder_status
+            if build_status is not None:
+                expected["build_status"] = build_status
+            if dependencies is not None:
+                expected["dependencies"] = dependencies
+
+            # We don't care so much about the content of the logtail,
+            # just that it's there.
+            if logtail:
+                tail = status_dict.pop("logtail")
+                self.assertIsInstance(tail, xmlrpclib.Binary)
+
+            self.assertEqual(expected, status_dict)
+
+        return d.addCallback(got_status)
+
+    def test_slaveStatus_idle_slave(self):
+        self.assertStatus(
+            OkSlave(), builder_status='BuilderStatus.IDLE')
+
+    def test_slaveStatus_building_slave(self):
+        self.assertStatus(
+            BuildingSlave(), builder_status='BuilderStatus.BUILDING',
+            logtail=True)
+
+    def test_slaveStatus_waiting_slave(self):
+        self.assertStatus(
+            WaitingSlave(), builder_status='BuilderStatus.WAITING',
+            build_status='BuildStatus.OK', filemap={})
+
+    def test_slaveStatus_aborting_slave(self):
+        self.assertStatus(
+            AbortingSlave(), builder_status='BuilderStatus.ABORTING')
+
+    def test_slaveStatus_aborted_slave(self):
+        self.assertStatus(
+            AbortedSlave(), builder_status='BuilderStatus.ABORTED')
+
+    def test_isAvailable_with_not_builderok(self):
+        # isAvailable() is a wrapper around slaveStatusSentence()
+        builder = self.factory.makeBuilder()
+        builder.builderok = False
+        d = builder.isAvailable()
+        return d.addCallback(self.assertFalse)
+
+    def test_isAvailable_with_slave_fault(self):
+        builder = self.factory.makeBuilder()
+        builder.setSlaveForTesting(BrokenSlave())
+        d = builder.isAvailable()
+        return d.addCallback(self.assertFalse)
+
+    def test_isAvailable_with_slave_idle(self):
+        builder = self.factory.makeBuilder()
+        builder.setSlaveForTesting(OkSlave())
+        d = builder.isAvailable()
+        return d.addCallback(self.assertTrue)
 
 
 class TestFindBuildCandidateBase(TestCaseWithFactory):
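Taken together, the rescue tests above pin down a small decision table. A compressed restatement (an editorial sketch of the behaviour the assertions imply, not code from the branch):

    def rescue_action(builder_status, cookie_is_valid):
        # IDLE: nothing to rescue; a recognised cookie: leave the slave alone.
        if builder_status == 'BuilderStatus.IDLE' or cookie_is_valid:
            return None
        if builder_status == 'BuilderStatus.WAITING':
            return 'clean'    # discard the finished but unrecognised build
        if builder_status == 'BuilderStatus.BUILDING':
            return 'abort'    # stop the unrecognised build in its tracks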
@@ -188,6 +413,49 @@
         builder.manual = False
 
 
+class TestFindBuildCandidateGeneralCases(TestFindBuildCandidateBase):
+    # Test usage of findBuildCandidate not specific to any archive type.
+
+    def test_findBuildCandidate_supersedes_builds(self):
+        # IBuilder._findBuildCandidate identifies if there are builds
+        # for superseded source package releases in the queue and marks
+        # the corresponding build record as SUPERSEDED.
+        archive = self.factory.makeArchive()
+        self.publisher.getPubSource(
+            sourcename="gedit", status=PackagePublishingStatus.PUBLISHED,
+            archive=archive).createMissingBuilds()
+        old_candidate = removeSecurityProxy(
+            self.frog_builder)._findBuildCandidate()
+
+        # The candidate starts off as NEEDSBUILD:
+        build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(
+            old_candidate)
+        self.assertEqual(BuildStatus.NEEDSBUILD, build.status)
+
+        # Now supersede the source package:
+        publication = build.current_source_publication
+        publication.status = PackagePublishingStatus.SUPERSEDED
+
+        # The candidate returned is now a different one:
+        new_candidate = removeSecurityProxy(
+            self.frog_builder)._findBuildCandidate()
+        self.assertNotEqual(new_candidate, old_candidate)
+
+        # And the old_candidate is superseded:
+        self.assertEqual(BuildStatus.SUPERSEDED, build.status)
+
+    def test_acquireBuildCandidate_marks_building(self):
+        # acquireBuildCandidate() should call _findBuildCandidate and
+        # mark the build as building.
+        archive = self.factory.makeArchive()
+        self.publisher.getPubSource(
+            sourcename="gedit", status=PackagePublishingStatus.PUBLISHED,
+            archive=archive).createMissingBuilds()
+        candidate = removeSecurityProxy(
+            self.frog_builder).acquireBuildCandidate()
+        self.assertEqual(JobStatus.RUNNING, candidate.job.status)
+
+
 class TestFindBuildCandidatePPAWithSingleBuilder(TestCaseWithFactory):
 
     layer = LaunchpadZopelessLayer
@@ -320,6 +588,16 @@
         build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job)
         self.failUnlessEqual('joesppa', build.archive.name)
 
+    def test_findBuildCandidate_with_disabled_archive(self):
+        # Disabled archives should not be considered for dispatching
+        # builds.
+        disabled_job = removeSecurityProxy(self.builder4)._findBuildCandidate()
+        build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(
+            disabled_job)
+        build.archive.disable()
+        next_job = removeSecurityProxy(self.builder4)._findBuildCandidate()
+        self.assertNotEqual(disabled_job, next_job)
+
 
 class TestFindBuildCandidatePrivatePPA(TestFindBuildCandidatePPABase):
 
@@ -332,6 +610,14 @@
         build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job)
         self.failUnlessEqual('joesppa', build.archive.name)
 
+        # If the source for the build is still pending, it won't be
+        # dispatched because the builder has to fetch the source files
+        # from the (password protected) repo area, not the librarian.
+        pub = build.current_source_publication
+        pub.status = PackagePublishingStatus.PENDING
+        candidate = removeSecurityProxy(self.builder4)._findBuildCandidate()
+        self.assertNotEqual(next_job.id, candidate.id)
+
 
 class TestFindBuildCandidateDistroArchive(TestFindBuildCandidateBase):
 
@@ -474,97 +760,48 @@
             self.builder.current_build_behavior, BinaryPackageBuildBehavior)
 
 
-class TestSlave(TestCase):
+class TestSlave(TrialTestCase):
     """
     Integration tests for BuilderSlave that verify how it works against a
     real slave server.
     """
 
+    layer = TwistedLayer
+
+    def setUp(self):
+        super(TestSlave, self).setUp()
+        self.slave_helper = SlaveTestHelpers()
+        self.slave_helper.setUp()
+        self.addCleanup(self.slave_helper.cleanUp)
+
     # XXX: JonathanLange 2010-09-20 bug=643521: There are also tests for
     # BuilderSlave in buildd-slave.txt and in other places. The tests here
     # ought to become the canonical tests for BuilderSlave vs running buildd
     # XML-RPC server interaction.
 
-    # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`.
-    TEST_URL = 'http://localhost:8221/rpc/'
-
-    def getServerSlave(self):
-        """Set up a test build slave server.
-
-        :return: A `BuilddSlaveTestSetup` object.
-        """
-        tachandler = BuilddSlaveTestSetup()
-        tachandler.setUp()
-        self.addCleanup(tachandler.tearDown)
-        def addLogFile(exc_info):
-            self.addDetail(
-                'xmlrpc-log-file',
-                Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read()))
-        self.addOnException(addLogFile)
-        return tachandler
-
-    def getClientSlave(self):
-        """Return a `BuilderSlave` for use in testing.
-
-        Points to a fixed URL that is also used by `BuilddSlaveTestSetup`.
-        """
-        return BuilderSlave.makeBlockingSlave(self.TEST_URL, 'vmhost')
-
-    def makeCacheFile(self, tachandler, filename):
-        """Make a cache file available on the remote slave.
-
-        :param tachandler: The TacTestSetup object used to start the remote
-            slave.
-        :param filename: The name of the file to create in the file cache
-            area.
-        """
-        path = os.path.join(tachandler.root, 'filecache', filename)
-        fd = open(path, 'w')
-        fd.write('something')
-        fd.close()
-        self.addCleanup(os.unlink, path)
-
-    def triggerGoodBuild(self, slave, build_id=None):
-        """Trigger a good build on 'slave'.
-
-        :param slave: A `BuilderSlave` instance to trigger the build on.
-        :param build_id: The build identifier. If not specified, defaults to
-            an arbitrary string.
-        :type build_id: str
-        :return: The build id returned by the slave.
-        """
-        if build_id is None:
-            build_id = self.getUniqueString()
-        tachandler = self.getServerSlave()
-        chroot_file = 'fake-chroot'
-        dsc_file = 'thing'
-        self.makeCacheFile(tachandler, chroot_file)
-        self.makeCacheFile(tachandler, dsc_file)
-        return slave.build(
-            build_id, 'debian', chroot_file, {'.dsc': dsc_file},
-            {'ogrecomponent': 'main'})
-
     # XXX 2010-10-06 Julian bug=655559
     # This is failing on buildbot but not locally; it's trying to abort
     # before the build has started.
     def disabled_test_abort(self):
-        slave = self.getClientSlave()
+        slave = self.slave_helper.getClientSlave()
         # We need to be in a BUILDING state before we can abort.
-        self.triggerGoodBuild(slave)
-        result = slave.abort()
-        self.assertEqual(result, BuilderStatus.ABORTING)
+        d = self.slave_helper.triggerGoodBuild(slave)
+        d.addCallback(lambda ignored: slave.abort())
+        d.addCallback(self.assertEqual, BuilderStatus.ABORTING)
+        return d
 
     def test_build(self):
         # Calling 'build' with an expected builder type, a good build id,
         # valid chroot & filemaps works and returns a BuilderStatus of
         # BUILDING.
         build_id = 'some-id'
-        slave = self.getClientSlave()
-        result = self.triggerGoodBuild(slave, build_id)
-        self.assertEqual([BuilderStatus.BUILDING, build_id], result)
+        slave = self.slave_helper.getClientSlave()
+        d = self.slave_helper.triggerGoodBuild(slave, build_id)
+        return d.addCallback(
+            self.assertEqual, [BuilderStatus.BUILDING, build_id])
 
     def test_clean(self):
-        slave = self.getClientSlave()
+        slave = self.slave_helper.getClientSlave()
         # XXX: JonathanLange 2010-09-21: Calling clean() on the slave requires
         # it to be in either the WAITING or ABORTED states, and both of these
         # states are very difficult to achieve in a test environment. For the
@@ -574,57 +811,248 @@
     def test_echo(self):
         # Calling 'echo' contacts the server which returns the arguments we
         # gave it.
-        self.getServerSlave()
-        slave = self.getClientSlave()
-        result = slave.echo('foo', 'bar', 42)
-        self.assertEqual(['foo', 'bar', 42], result)
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.echo('foo', 'bar', 42)
+        return d.addCallback(self.assertEqual, ['foo', 'bar', 42])
 
     def test_info(self):
         # Calling 'info' gets some information about the slave.
-        self.getServerSlave()
-        slave = self.getClientSlave()
-        result = slave.info()
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.info()
         # We're testing the hard-coded values, since the version is hard-coded
         # into the remote slave, the supported build managers are hard-coded
         # into the tac file for the remote slave and config is returned from
         # the configuration file.
-        self.assertEqual(
+        return d.addCallback(
+            self.assertEqual,
             ['1.0',
              'i386',
              ['sourcepackagerecipe',
-              'translation-templates', 'binarypackage', 'debian']],
-            result)
+              'translation-templates', 'binarypackage', 'debian']])
 
     def test_initial_status(self):
         # Calling 'status' returns the current status of the slave. The
         # initial status is IDLE.
-        self.getServerSlave()
-        slave = self.getClientSlave()
-        status = slave.status()
-        self.assertEqual([BuilderStatus.IDLE, ''], status)
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.status()
+        return d.addCallback(self.assertEqual, [BuilderStatus.IDLE, ''])
 
     def test_status_after_build(self):
         # Calling 'status' returns the current status of the slave. After a
         # build has been triggered, the status is BUILDING.
-        slave = self.getClientSlave()
+        slave = self.slave_helper.getClientSlave()
         build_id = 'status-build-id'
-        self.triggerGoodBuild(slave, build_id)
-        status = slave.status()
-        self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2])
-        [log_file] = status[2:]
-        self.assertIsInstance(log_file, xmlrpclib.Binary)
+        d = self.slave_helper.triggerGoodBuild(slave, build_id)
+        d.addCallback(lambda ignored: slave.status())
+        def check_status(status):
+            self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2])
+            [log_file] = status[2:]
+            self.assertIsInstance(log_file, xmlrpclib.Binary)
+        return d.addCallback(check_status)
 
     def test_ensurepresent_not_there(self):
         # ensurepresent checks to see if a file is there.
-        self.getServerSlave()
-        slave = self.getClientSlave()
-        result = slave.ensurepresent('blahblah', None, None, None)
-        self.assertEqual([False, 'No URL'], result)
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.ensurepresent('blahblah', None, None, None)
+        d.addCallback(self.assertEqual, [False, 'No URL'])
+        return d
 
     def test_ensurepresent_actually_there(self):
         # ensurepresent checks to see if a file is there.
-        tachandler = self.getServerSlave()
-        slave = self.getClientSlave()
-        self.makeCacheFile(tachandler, 'blahblah')
-        result = slave.ensurepresent('blahblah', None, None, None)
-        self.assertEqual([True, 'No URL'], result)
+        tachandler = self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        self.slave_helper.makeCacheFile(tachandler, 'blahblah')
+        d = slave.ensurepresent('blahblah', None, None, None)
+        d.addCallback(self.assertEqual, [True, 'No URL'])
+        return d
+
+    def test_sendFileToSlave_not_there(self):
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.sendFileToSlave('blahblah', None, None, None)
+        return self.assertFailure(d, CannotFetchFile)
+
+    def test_sendFileToSlave_actually_there(self):
+        tachandler = self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        self.slave_helper.makeCacheFile(tachandler, 'blahblah')
+        d = slave.sendFileToSlave('blahblah', None, None, None)
+        def check_present(ignored):
+            d = slave.ensurepresent('blahblah', None, None, None)
+            return d.addCallback(self.assertEqual, [True, 'No URL'])
+        d.addCallback(check_present)
+        return d
+
+    def test_resumeHost_success(self):
+        # On a successful resume, resume() fires the returned deferred
+        # callback with 'None'.
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+
+        # The configuration testing command-line.
+        self.assertEqual(
+            'echo %(vm_host)s', config.builddmaster.vm_resume_command)
+
+        # On success the response is None.
+        def check_resume_success(response):
+            out, err, code = response
+            self.assertEqual(os.EX_OK, code)
+            # XXX: JonathanLange 2010-09-23: We should instead pass the
+            # expected vm_host into the client slave. Not doing this now,
+            # since the SlaveHelper is being moved around.
+            self.assertEqual("%s\n" % slave._vm_host, out)
+        d = slave.resume()
+        d.addBoth(check_resume_success)
+        return d
+
+    def test_resumeHost_failure(self):
+        # On a failed resume, 'resumeHost' fires the returned deferred
+        # errorback with the `ProcessTerminated` failure.
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+
+        # Override the configuration command-line with one that will fail.
+        failed_config = """
+        [builddmaster]
+        vm_resume_command: test "%(vm_host)s = 'no-sir'"
+        """
+        config.push('failed_resume_command', failed_config)
+        self.addCleanup(config.pop, 'failed_resume_command')
+
+        # On failures, the response is a twisted `Failure` object containing
+        # a tuple.
+        def check_resume_failure(failure):
+            out, err, code = failure.value
+            # The process will exit with a return code of "1".
+            self.assertEqual(code, 1)
+        d = slave.resume()
+        d.addBoth(check_resume_failure)
+        return d
+
+    def test_resumeHost_timeout(self):
+        # On a resume timeout, 'resumeHost' fires the returned deferred
+        # errorback with the `TimeoutError` failure.
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+
+        # Override the configuration command-line with one that will timeout.
+        timeout_config = """
+        [builddmaster]
+        vm_resume_command: sleep 5
+        socket_timeout: 1
+        """
+        config.push('timeout_resume_command', timeout_config)
+        self.addCleanup(config.pop, 'timeout_resume_command')
+
+        # On timeouts, the response is a twisted `Failure` object containing
+        # a `TimeoutError` error.
+        def check_resume_timeout(failure):
+            self.assertIsInstance(failure, Failure)
+            out, err, code = failure.value
+            self.assertEqual(code, signal.SIGKILL)
+        clock = Clock()
+        d = slave.resume(clock=clock)
+        # Move the clock beyond the socket_timeout but earlier than the
+        # sleep 5. This stops the test having to wait for the timeout.
+        # Fast tests FTW!
+        clock.advance(2)
+        d.addBoth(check_resume_timeout)
+        return d
+
+
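task.Clock is what keeps these timeout tests fast: timed calls fire synchronously when the fake clock is advanced past their delay, so nothing actually sleeps. A stand-alone illustration:

    from twisted.internet.task import Clock

    clock = Clock()
    fired = []
    clock.callLater(1, fired.append, 'timed out')
    clock.advance(2)
    assert fired == ['timed out']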
+class TestSlaveTimeouts(TrialTestCase):
+    # Testing that the methods that call callRemote() all time out
+    # as required.
+
+    layer = TwistedLayer
+
+    def setUp(self):
+        super(TestSlaveTimeouts, self).setUp()
+        self.slave_helper = SlaveTestHelpers()
+        self.slave_helper.setUp()
+        self.addCleanup(self.slave_helper.cleanUp)
+        self.clock = Clock()
+        self.proxy = DeadProxy("url")
+        self.slave = self.slave_helper.getClientSlave(
+            reactor=self.clock, proxy=self.proxy)
+
+    def assertCancelled(self, d):
+        self.clock.advance(config.builddmaster.socket_timeout + 1)
+        return self.assertFailure(d, CancelledError)
+
+    def test_timeout_abort(self):
+        return self.assertCancelled(self.slave.abort())
+
+    def test_timeout_clean(self):
+        return self.assertCancelled(self.slave.clean())
+
+    def test_timeout_echo(self):
+        return self.assertCancelled(self.slave.echo())
+
+    def test_timeout_info(self):
+        return self.assertCancelled(self.slave.info())
+
+    def test_timeout_status(self):
+        return self.assertCancelled(self.slave.status())
+
+    def test_timeout_ensurepresent(self):
+        return self.assertCancelled(
+            self.slave.ensurepresent(None, None, None, None))
+
+    def test_timeout_build(self):
+        return self.assertCancelled(
+            self.slave.build(None, None, None, None, None))
+
+
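assertCancelled works because cancelling a Deferred that has not yet fired errbacks it with CancelledError, and DeadProxy guarantees the Deferred is still pending when the fake clock passes socket_timeout. The mechanism in miniature (the one-second delay is illustrative):

    from twisted.internet.defer import CancelledError, Deferred
    from twisted.internet.task import Clock

    clock = Clock()
    d = Deferred()                  # never fires, like DeadProxy.callRemote()
    clock.callLater(1, d.cancel)    # the timeout
    failures = []
    d.addErrback(failures.append)
    clock.advance(2)
    assert failures[0].check(CancelledError)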
+class TestSlaveWithLibrarian(TrialTestCase):
+    """Tests that need more of Launchpad to run."""
+
+    layer = TwistedLaunchpadZopelessLayer
+
+    def setUp(self):
+        super(TestSlaveWithLibrarian, self).setUp()
+        self.slave_helper = SlaveTestHelpers()
+        self.slave_helper.setUp()
+        self.addCleanup(self.slave_helper.cleanUp)
+        self.factory = LaunchpadObjectFactory()
+        login_as(ANONYMOUS)
+        self.addCleanup(logout)
+
+    def test_ensurepresent_librarian(self):
+        # ensurepresent, when given an http URL for a file will download the
+        # file from that URL and report that the file is present, and it was
+        # downloaded.
+
+        # Use the Librarian because it's a "convenient" web server.
+        lf = self.factory.makeLibraryFileAlias(
+            'HelloWorld.txt', content="Hello World")
+        self.layer.txn.commit()
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.ensurepresent(
+            lf.content.sha1, lf.http_url, "", "")
+        d.addCallback(self.assertEqual, [True, 'Download'])
+        return d
+
+    def test_retrieve_files_from_filecache(self):
+        # Files that are present on the slave can be downloaded with a
+        # filename made from the sha1 of the content underneath the
+        # 'filecache' directory.
+        content = "Hello World"
+        lf = self.factory.makeLibraryFileAlias(
+            'HelloWorld.txt', content=content)
+        self.layer.txn.commit()
+        expected_url = '%s/filecache/%s' % (
+            self.slave_helper.BASE_URL, lf.content.sha1)
+        self.slave_helper.getServerSlave()
+        slave = self.slave_helper.getClientSlave()
+        d = slave.ensurepresent(
+            lf.content.sha1, lf.http_url, "", "")
+        def check_file(ignored):
+            d = getPage(expected_url.encode('utf8'))
+            return d.addCallback(self.assertEqual, content)
+        return d.addCallback(check_file)
 
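test_retrieve_files_from_filecache doubles as documentation for the slave's file API: anything ensurepresent() has fetched is exposed over HTTP under /filecache/<sha1>. A bare getPage fetch of such a URL (the URL layout mirrors SlaveTestHelpers.BASE_URL above; the helper name is illustrative):

    from twisted.web.client import getPage

    def fetch_from_filecache(base_url, sha1):
        # e.g. base_url='http://localhost:8221'
        d = getPage(('%s/filecache/%s' % (base_url, sha1)).encode('utf8'))
        return d  # fires with the file content as a string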
=== modified file 'lib/lp/buildmaster/tests/test_manager.py'
--- lib/lp/buildmaster/tests/test_manager.py 2010-09-28 11:05:14 +0000
+++ lib/lp/buildmaster/tests/test_manager.py 2010-10-25 19:14:01 +0000
@@ -6,6 +6,7 @@
 import os
 import signal
 import time
+import xmlrpclib
 
 import transaction
 
@@ -14,9 +15,7 @@
     reactor,
     task,
     )
-from twisted.internet.error import ConnectionClosed
 from twisted.internet.task import (
-    Clock,
     deferLater,
     )
 from twisted.python.failure import Failure
@@ -30,577 +29,45 @@
     ANONYMOUS,
     login,
     )
-from canonical.launchpad.scripts.logger import BufferLogger
+from canonical.launchpad.scripts.logger import (
+    QuietFakeLogger,
+    )
 from canonical.testing.layers import (
     LaunchpadScriptLayer,
-    LaunchpadZopelessLayer,
+    TwistedLaunchpadZopelessLayer,
     TwistedLayer,
+    ZopelessDatabaseLayer,
     )
 from lp.buildmaster.enums import BuildStatus
 from lp.buildmaster.interfaces.builder import IBuilderSet
 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
 from lp.buildmaster.manager import (
-    BaseDispatchResult,
-    buildd_success_result_map,
+    assessFailureCounts,
     BuilddManager,
-    FailDispatchResult,
     NewBuildersScanner,
-    RecordingSlave,
-    ResetDispatchResult,
     SlaveScanner,
     )
+from lp.buildmaster.model.builder import Builder
 from lp.buildmaster.tests.harness import BuilddManagerTestSetup
-from lp.buildmaster.tests.mock_slaves import BuildingSlave
+from lp.buildmaster.tests.mock_slaves import (
+    BrokenSlave,
+    BuildingSlave,
+    OkSlave,
+    )
 from lp.registry.interfaces.distribution import IDistributionSet
 from lp.soyuz.interfaces.binarypackagebuild import IBinaryPackageBuildSet
-from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
-from lp.testing import TestCase as LaunchpadTestCase
+from lp.testing import TestCaseWithFactory
 from lp.testing.factory import LaunchpadObjectFactory
 from lp.testing.fakemethod import FakeMethod
 from lp.testing.sampledata import BOB_THE_BUILDER_NAME
 
 
-class TestRecordingSlaves(TrialTestCase):
-    """Tests for the recording slave class."""
-    layer = TwistedLayer
-
-    def setUp(self):
-        """Setup a fresh `RecordingSlave` for tests."""
-        TrialTestCase.setUp(self)
-        self.slave = RecordingSlave(
-            'foo', 'http://foo:8221/rpc', 'foo.host')
-
-    def test_representation(self):
-        """`RecordingSlave` has a custom representation.
-
-        It encloses builder name and xmlrpc url for debug purposes.
-        """
-        self.assertEqual('<foo:http://foo:8221/rpc>', repr(self.slave))
-
-    def assert_ensurepresent(self, func):
-        """Helper function to test results from calling ensurepresent."""
-        self.assertEqual(
-            [True, 'Download'],
-            func('boing', 'bar', 'baz'))
-        self.assertEqual(
-            [('ensurepresent', ('boing', 'bar', 'baz'))],
-            self.slave.calls)
-
-    def test_ensurepresent(self):
-        """`RecordingSlave.ensurepresent` always succeeds.
-
-        It returns the expected succeed code and records the interaction
-        information for later use.
-        """
-        self.assert_ensurepresent(self.slave.ensurepresent)
-
-    def test_sendFileToSlave(self):
-        """RecordingSlave.sendFileToSlave always succeeeds.
-
-        It calls ensurepresent() and hence returns the same results.
-        """
-        self.assert_ensurepresent(self.slave.sendFileToSlave)
-
-    def test_build(self):
-        """`RecordingSlave.build` always succeeds.
-
-        It returns the expected succeed code and records the interaction
-        information for later use.
-        """
-        self.assertEqual(
-            ['BuilderStatus.BUILDING', 'boing'],
-            self.slave.build('boing', 'bar', 'baz'))
-        self.assertEqual(
-            [('build', ('boing', 'bar', 'baz'))],
-            self.slave.calls)
-
-    def test_resume(self):
-        """`RecordingSlave.resume` always returns successs."""
-        # Resume isn't requested in a just-instantiated RecordingSlave.
-        self.assertFalse(self.slave.resume_requested)
-
-        # When resume is called, it returns the success list and mark
-        # the slave for resuming.
-        self.assertEqual(['', '', os.EX_OK], self.slave.resume())
-        self.assertTrue(self.slave.resume_requested)
-
-    def test_resumeHost_success(self):
-        # On a successful resume resumeHost() fires the returned deferred
-        # callback with 'None'.
-
-        # The configuration testing command-line.
-        self.assertEqual(
-            'echo %(vm_host)s', config.builddmaster.vm_resume_command)
-
-        # On success the response is None.
-        def check_resume_success(response):
-            out, err, code = response
-            self.assertEqual(os.EX_OK, code)
-            self.assertEqual("%s\n" % self.slave.vm_host, out)
-        d = self.slave.resumeSlave()
-        d.addBoth(check_resume_success)
-        return d
-
-    def test_resumeHost_failure(self):
-        # On a failed resume, 'resumeHost' fires the returned deferred
-        # errorback with the `ProcessTerminated` failure.
-
-        # Override the configuration command-line with one that will fail.
-        failed_config = """
-        [builddmaster]
-        vm_resume_command: test "%(vm_host)s = 'no-sir'"
-        """
-        config.push('failed_resume_command', failed_config)
-        self.addCleanup(config.pop, 'failed_resume_command')
-
-        # On failures, the response is a twisted `Failure` object containing
-        # a tuple.
-        def check_resume_failure(failure):
-            out, err, code = failure.value
-            # The process will exit with a return code of "1".
-            self.assertEqual(code, 1)
-        d = self.slave.resumeSlave()
-        d.addBoth(check_resume_failure)
-        return d
-
-    def test_resumeHost_timeout(self):
-        # On a resume timeouts, 'resumeHost' fires the returned deferred
-        # errorback with the `TimeoutError` failure.
-
-        # Override the configuration command-line with one that will timeout.
-        timeout_config = """
-        [builddmaster]
-        vm_resume_command: sleep 5
-        socket_timeout: 1
-        """
-        config.push('timeout_resume_command', timeout_config)
-        self.addCleanup(config.pop, 'timeout_resume_command')
-
-        # On timeouts, the response is a twisted `Failure` object containing
-        # a `TimeoutError` error.
-        def check_resume_timeout(failure):
-            self.assertIsInstance(failure, Failure)
-            out, err, code = failure.value
-            self.assertEqual(code, signal.SIGKILL)
-        clock = Clock()
-        d = self.slave.resumeSlave(clock=clock)
-        # Move the clock beyond the socket_timeout but earlier than the
-        # sleep 5. This stops the test having to wait for the timeout.
-        # Fast tests FTW!
-        clock.advance(2)
-        d.addBoth(check_resume_timeout)
-        return d
-
-
-class TestingXMLRPCProxy:
-    """This class mimics a twisted XMLRPC Proxy class."""
-
-    def __init__(self, failure_info=None):
-        self.calls = []
-        self.failure_info = failure_info
-        self.works = failure_info is None
-
-    def callRemote(self, *args):
-        self.calls.append(args)
-        if self.works:
-            result = buildd_success_result_map.get(args[0])
-        else:
-            result = 'boing'
-        return defer.succeed([result, self.failure_info])
-
-
-class TestingResetDispatchResult(ResetDispatchResult):
-    """Override the evaluation method to simply annotate the call."""
-
-    def __init__(self, slave, info=None):
-        ResetDispatchResult.__init__(self, slave, info)
-        self.processed = False
-
-    def __call__(self):
-        self.processed = True
-
-
-class TestingFailDispatchResult(FailDispatchResult):
-    """Override the evaluation method to simply annotate the call."""
-
-    def __init__(self, slave, info=None):
-        FailDispatchResult.__init__(self, slave, info)
-        self.processed = False
-
-    def __call__(self):
-        self.processed = True
-
-
-class TestingSlaveScanner(SlaveScanner):
-    """Override the dispatch result factories """
-
-    reset_result = TestingResetDispatchResult
-    fail_result = TestingFailDispatchResult
-
-
-class TestSlaveScanner(TrialTestCase):
-    """Tests for the actual build slave manager."""
-    layer = LaunchpadZopelessLayer
-
-    def setUp(self):
-        TrialTestCase.setUp(self)
-        self.manager = TestingSlaveScanner(
-            BOB_THE_BUILDER_NAME, BufferLogger())
-
-        self.fake_builder_url = 'http://bob.buildd:8221/'
-        self.fake_builder_host = 'bob.host'
-
-        # We will use an instrumented SlaveScanner instance for tests in
-        # this context.
-
-        # Stop cyclic execution and record the end of the cycle.
-        self.stopped = False
-
-        def testNextCycle():
-            self.stopped = True
-
-        self.manager.scheduleNextScanCycle = testNextCycle
-
-        # Return the testing Proxy version.
-        self.test_proxy = TestingXMLRPCProxy()
-
-        def testGetProxyForSlave(slave):
-            return self.test_proxy
-        self.manager._getProxyForSlave = testGetProxyForSlave
-
-        # Deactivate the 'scan' method.
-        def testScan():
-            pass
-        self.manager.scan = testScan
-
-        # Stop automatic collection of dispatching results.
-        def testslaveConversationEnded():
-            pass
-        self._realslaveConversationEnded = self.manager.slaveConversationEnded
-        self.manager.slaveConversationEnded = testslaveConversationEnded
-
-    def assertIsDispatchReset(self, result):
-        self.assertTrue(
-            isinstance(result, TestingResetDispatchResult),
-            'Dispatch failure did not result in a ResetBuildResult object')
-
-    def assertIsDispatchFail(self, result):
-        self.assertTrue(
-            isinstance(result, TestingFailDispatchResult),
-            'Dispatch failure did not result in a FailBuildResult object')
-
-    def test_checkResume(self):
-        """`SlaveScanner.checkResume` is chained after resume requests.
-
-        If the resume request succeed it returns None, otherwise it returns
-        a `ResetBuildResult` (the one in the test context) that will be
-        collect and evaluated later.
-
-        See `RecordingSlave.resumeHost` for more information about the resume
-        result contents.
-        """
-        slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
-
-        successful_response = ['', '', os.EX_OK]
-        result = self.manager.checkResume(successful_response, slave)
-        self.assertEqual(
-            None, result, 'Successful resume checks should return None')
-
-        failed_response = ['stdout', 'stderr', 1]
-        result = self.manager.checkResume(failed_response, slave)
-        self.assertIsDispatchReset(result)
-        self.assertEqual(
-            '<foo:http://foo.buildd:8221/> reset failure', repr(result))
-        self.assertEqual(
-            result.info, "stdout\nstderr")
-
-    def test_fail_to_resume_slave_resets_slave(self):
-        # If an attempt to resume and dispatch a slave fails, we reset the
-        # slave by calling self.reset_result(slave)().
-
-        reset_result_calls = []
-
-        class LoggingResetResult(BaseDispatchResult):
-            """A DispatchResult that logs calls to itself.
-
-            This *must* subclass BaseDispatchResult, otherwise finishCycle()
-            won't treat it like a dispatch result.
-            """
-
-            def __init__(self, slave, info=None):
-                self.slave = slave
-
-            def __call__(self):
-                reset_result_calls.append(self.slave)
-
-        # Make a failing slave that is requesting a resume.
-        slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
-        slave.resume_requested = True
-        slave.resumeSlave = lambda: deferLater(
-            reactor, 0, defer.fail, Failure(('out', 'err', 1)))
-
-        # Make the manager log the reset result calls.
-        self.manager.reset_result = LoggingResetResult
-
-        # We only care about this one slave. Reset the list of manager
-        # deferreds in case setUp did something unexpected.
-        self.manager._deferred_list = []
-
-        # Here, we're patching the slaveConversationEnded method so we can
-        # get an extra callback at the end of it, so we can
-        # verify that the reset_result was really called.
-        def _slaveConversationEnded():
-            d = self._realslaveConversationEnded()
-            return d.addCallback(
-                lambda ignored: self.assertEqual([slave], reset_result_calls))
-        self.manager.slaveConversationEnded = _slaveConversationEnded
-
-        self.manager.resumeAndDispatch(slave)
-
-    def test_failed_to_resume_slave_ready_for_reset(self):
-        # When a slave fails to resume, the manager has a Deferred in its
-        # Deferred list that is ready to fire with a ResetDispatchResult.
-
-        # Make a failing slave that is requesting a resume.
-        slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
-        slave.resume_requested = True
-        slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1)))
-
-        # We only care about this one slave. Reset the list of manager
-        # deferreds in case setUp did something unexpected.
-        self.manager._deferred_list = []
-        # Restore the slaveConversationEnded method. It's very relevant to
-        # this test.
-        self.manager.slaveConversationEnded = self._realslaveConversationEnded
-        self.manager.resumeAndDispatch(slave)
-        [d] = self.manager._deferred_list
-
-        # The Deferred for our failing slave should be ready to fire
-        # successfully with a ResetDispatchResult.
-        def check_result(result):
-            self.assertIsInstance(result, ResetDispatchResult)
-            self.assertEqual(slave, result.slave)
-            self.assertFalse(result.processed)
-        return d.addCallback(check_result)
-
-    def _setUpSlaveAndBuilder(self, builder_failure_count=None,
-                              job_failure_count=None):
-        # Helper function to set up a builder and its recording slave.
-        if builder_failure_count is None:
-            builder_failure_count = 0
-        if job_failure_count is None:
-            job_failure_count = 0
-        slave = RecordingSlave(
-            BOB_THE_BUILDER_NAME, self.fake_builder_url,
-            self.fake_builder_host)
-        bob_builder = getUtility(IBuilderSet)[slave.name]
-        bob_builder.failure_count = builder_failure_count
-        bob_builder.getCurrentBuildFarmJob().failure_count = job_failure_count
-        return slave, bob_builder
-
-    def test_checkDispatch_success(self):
-        # SlaveScanner.checkDispatch returns None for a successful
-        # dispatch.
-
-        """
-        If the dispatch request fails or a unknown method is given, it
-        returns a `FailDispatchResult` (in the test context) that will
-        be evaluated later.
-
-        Builders will be marked as failed if the following responses
-        categories are received.
-
-         * Legitimate slave failures: when the response is a list with 2
-           elements but the first element ('status') does not correspond to
-           the expected 'success' result. See `buildd_success_result_map`.
-
-         * Unexpected (code) failures: when the given 'method' is unknown
-           or the response isn't a 2-element list or Failure instance.
-
-        Communication failures (a twisted `Failure` instance) will simply
-        cause the builder to be reset, a `ResetDispatchResult` object is
-        returned. In other words, network failures are ignored in this
-        stage, broken builders will be identified and marked as so
-        during 'scan()' stage.
-
-        On success dispatching it returns None.
-        """
-        slave, bob_builder = self._setUpSlaveAndBuilder(
-            builder_failure_count=0, job_failure_count=0)
-
-        # Successful legitimate response, None is returned.
-        successful_response = [
-            buildd_success_result_map.get('ensurepresent'), 'cool builder']
-        result = self.manager.checkDispatch(
-            successful_response, 'ensurepresent', slave)
-        self.assertEqual(
-            None, result, 'Successful dispatch checks should return None')
-
-    def test_checkDispatch_first_fail(self):
-        # Failed legitimate response, results in FailDispatchResult and
-        # failure_count on the job and the builder are both incremented.
-        slave, bob_builder = self._setUpSlaveAndBuilder(
-            builder_failure_count=0, job_failure_count=0)
-
-        failed_response = [False, 'uncool builder']
-        result = self.manager.checkDispatch(
-            failed_response, 'ensurepresent', slave)
-        self.assertIsDispatchFail(result)
-        self.assertEqual(
-            repr(result),
-            '<bob:%s> failure (uncool builder)' % self.fake_builder_url)
-        self.assertEqual(1, bob_builder.failure_count)
-        self.assertEqual(
-            1, bob_builder.getCurrentBuildFarmJob().failure_count)
-
-    def test_checkDispatch_second_reset_fail_by_builder(self):
-        # Twisted Failure response, results in a `FailDispatchResult`.
-        slave, bob_builder = self._setUpSlaveAndBuilder(
-            builder_failure_count=1, job_failure_count=0)
-
-        twisted_failure = Failure(ConnectionClosed('Boom!'))
-        result = self.manager.checkDispatch(
-            twisted_failure, 'ensurepresent', slave)
-        self.assertIsDispatchFail(result)
-        self.assertEqual(
-            '<bob:%s> failure (None)' % self.fake_builder_url, repr(result))
-        self.assertEqual(2, bob_builder.failure_count)
-        self.assertEqual(
-            1, bob_builder.getCurrentBuildFarmJob().failure_count)
-
-    def test_checkDispatch_second_comms_fail_by_builder(self):
-        # Unexpected response, results in a `FailDispatchResult`.
-        slave, bob_builder = self._setUpSlaveAndBuilder(
-            builder_failure_count=1, job_failure_count=0)
-
-        unexpected_response = [1, 2, 3]
-        result = self.manager.checkDispatch(
-            unexpected_response, 'build', slave)
-        self.assertIsDispatchFail(result)
-        self.assertEqual(
-            '<bob:%s> failure '
-            '(Unexpected response: [1, 2, 3])' % self.fake_builder_url,
-            repr(result))
-        self.assertEqual(2, bob_builder.failure_count)
-        self.assertEqual(
-            1, bob_builder.getCurrentBuildFarmJob().failure_count)
-
-    def test_checkDispatch_second_comms_fail_by_job(self):
-        # Unknown method was given, results in a `FailDispatchResult`.
-        # This could be caused by a faulty job which would fail the job.
-        slave, bob_builder = self._setUpSlaveAndBuilder(
-            builder_failure_count=0, job_failure_count=1)
-
-        successful_response = [
-            buildd_success_result_map.get('ensurepresent'), 'cool builder']
-        result = self.manager.checkDispatch(
-            successful_response, 'unknown-method', slave)
-        self.assertIsDispatchFail(result)
-        self.assertEqual(
-            '<bob:%s> failure '
-            '(Unknown slave method: unknown-method)' % self.fake_builder_url,
-            repr(result))
-        self.assertEqual(1, bob_builder.failure_count)
504 self.assertEqual(
505 2, bob_builder.getCurrentBuildFarmJob().failure_count)
506
507 def test_initiateDispatch(self):
508 """Check `dispatchBuild` in various scenarios.
509
510 When there are no recording slaves (i.e. no build got dispatched
511 in scan()) it simply finishes the cycle.
512
513 When there is a recording slave with pending slave calls, they are
514 performed and if they all succeed the cycle is finished with no
515 errors.
516
517 On slave call failure the chain is stopped immediately and an
518 FailDispatchResult is collected while finishing the cycle.
519 """
520 def check_no_events(results):
521 errors = [
522 r for s, r in results if isinstance(r, BaseDispatchResult)]
523 self.assertEqual(0, len(errors))
524
525 def check_events(results):
526 [error] = [r for s, r in results if r is not None]
527 self.assertEqual(
528 '<bob:%s> failure (very broken slave)'
529 % self.fake_builder_url,
530 repr(error))
531 self.assertTrue(error.processed)
532
533 def _wait_on_deferreds_then_check_no_events():
534 dl = self._realslaveConversationEnded()
535 dl.addCallback(check_no_events)
536
537 def _wait_on_deferreds_then_check_events():
538 dl = self._realslaveConversationEnded()
539 dl.addCallback(check_events)
540
541 # A functional slave charged with some interactions.
542 slave = RecordingSlave(
543 BOB_THE_BUILDER_NAME, self.fake_builder_url,
544 self.fake_builder_host)
545 slave.ensurepresent('arg1', 'arg2', 'arg3')
546 slave.build('arg1', 'arg2', 'arg3')
547
548 # If the previous step (resuming) has failed nothing gets dispatched.
549 reset_result = ResetDispatchResult(slave)
550 result = self.manager.initiateDispatch(reset_result, slave)
551 self.assertTrue(result is reset_result)
552 self.assertFalse(slave.resume_requested)
553 self.assertEqual(0, len(self.manager._deferred_list))
554
555 # Operation with the default (funcional slave), no resets or
556 # failures results are triggered.
557 slave.resume()
558 result = self.manager.initiateDispatch(None, slave)
559 self.assertEqual(None, result)
560 self.assertTrue(slave.resume_requested)
561 self.assertEqual(
562 [('ensurepresent', 'arg1', 'arg2', 'arg3'),
563 ('build', 'arg1', 'arg2', 'arg3')],
564 self.test_proxy.calls)
565 self.assertEqual(2, len(self.manager._deferred_list))
566
567 # Monkey patch the slaveConversationEnded method so we can chain a
568 # callback to check the end of the result chain.
569 self.manager.slaveConversationEnded = \
570 _wait_on_deferreds_then_check_no_events
571 events = self.manager.slaveConversationEnded()
572
573 # Create a broken slave and insert interaction that will
574 # cause the builder to be marked as fail.
575 self.test_proxy = TestingXMLRPCProxy('very broken slave')
576 slave = RecordingSlave(
577 BOB_THE_BUILDER_NAME, self.fake_builder_url,
578 self.fake_builder_host)
579 slave.ensurepresent('arg1', 'arg2', 'arg3')
580 slave.build('arg1', 'arg2', 'arg3')
581
582 result = self.manager.initiateDispatch(None, slave)
583 self.assertEqual(None, result)
584 self.assertEqual(3, len(self.manager._deferred_list))
585 self.assertEqual(
586 [('ensurepresent', 'arg1', 'arg2', 'arg3')],
587 self.test_proxy.calls)
588
589 # Monkey patch the slaveConversationEnded method so we can chain a
590 # callback to check the end of the result chain.
591 self.manager.slaveConversationEnded = \
592 _wait_on_deferreds_then_check_events
593 events = self.manager.slaveConversationEnded()
594
595 return events
596
597
 class TestSlaveScannerScan(TrialTestCase):
     """Tests `SlaveScanner.scan` method.

     This method uses the old framework for scanning and dispatching builds.
     """
-    layer = LaunchpadZopelessLayer
+    layer = TwistedLaunchpadZopelessLayer

     def setUp(self):
         """Setup TwistedLayer, TrialTestCase and BuilddSlaveTest.
@@ -608,19 +75,18 @@
         Also adjust the sampledata in a way a build can be dispatched to
         'bob' builder.
         """
+        from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
         TwistedLayer.testSetUp()
         TrialTestCase.setUp(self)
         self.slave = BuilddSlaveTestSetup()
         self.slave.setUp()

         # Creating the required chroots needed for dispatching.
-        login('foo.bar@canonical.com')
         test_publisher = SoyuzTestPublisher()
         ubuntu = getUtility(IDistributionSet).getByName('ubuntu')
         hoary = ubuntu.getSeries('hoary')
         test_publisher.setUpDefaultDistroSeries(hoary)
         test_publisher.addFakeChroots()
-        login(ANONYMOUS)

     def tearDown(self):
         self.slave.tearDown()
@@ -628,8 +94,7 @@
         TwistedLayer.testTearDown()

     def _resetBuilder(self, builder):
-        """Reset the given builder and it's job."""
-        login('foo.bar@canonical.com')
+        """Reset the given builder and its job."""

         builder.builderok = True
         job = builder.currentjob
@@ -637,7 +102,6 @@
             job.reset()

         transaction.commit()
-        login(ANONYMOUS)

     def assertBuildingJob(self, job, builder, logtail=None):
         """Assert the given job is building on the given builder."""
@@ -653,55 +117,25 @@
         self.assertEqual(build.status, BuildStatus.BUILDING)
         self.assertEqual(job.logtail, logtail)

-    def _getManager(self):
+    def _getScanner(self, builder_name=None):
         """Instantiate a SlaveScanner object.

         Replace its default logging handler by a testing version.
         """
-        manager = SlaveScanner(BOB_THE_BUILDER_NAME, BufferLogger())
-        manager.logger.name = 'slave-scanner'
+        if builder_name is None:
+            builder_name = BOB_THE_BUILDER_NAME
+        scanner = SlaveScanner(builder_name, QuietFakeLogger())
+        scanner.logger.name = 'slave-scanner'

-        return manager
+        return scanner

     def _checkDispatch(self, slave, builder):
-        """`SlaveScanner.scan` returns a `RecordingSlave`.
-
-        The single slave returned should match the given builder and
-        contain interactions that should be performed asynchronously for
-        properly dispatching the sampledata job.
-        """
-        self.assertFalse(
-            slave is None, "Unexpected recording_slaves.")
-
-        self.assertEqual(slave.name, builder.name)
-        self.assertEqual(slave.url, builder.url)
-        self.assertEqual(slave.vm_host, builder.vm_host)
+        # SlaveScanner.scan returns a slave when a dispatch was
+        # successful. We also check that the builder has a job on it.
+
+        self.assertTrue(slave is not None, "Expected a slave.")
         self.assertEqual(0, builder.failure_count)
-
-        self.assertEqual(
-            [('ensurepresent',
-              ('0feca720e2c29dafb2c900713ba560e03b758711',
-               'http://localhost:58000/93/fake_chroot.tar.gz',
-               '', '')),
-             ('ensurepresent',
-              ('4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33',
-               'http://localhost:58000/43/alsa-utils_1.0.9a-4ubuntu1.dsc',
-               '', '')),
-             ('build',
-              ('6358a89e2215e19b02bf91e2e4d009640fae5cf8',
-               'binarypackage', '0feca720e2c29dafb2c900713ba560e03b758711',
-               {'alsa-utils_1.0.9a-4ubuntu1.dsc':
-                '4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33'},
-               {'arch_indep': True,
-                'arch_tag': 'i386',
-                'archive_private': False,
-                'archive_purpose': 'PRIMARY',
-                'archives':
-                ['deb http://ftpmaster.internal/ubuntu hoary main'],
-                'build_debug_symbols': False,
-                'ogrecomponent': 'main',
-                'suite': u'hoary'}))],
-            slave.calls, "Job was not properly dispatched.")
+        self.assertTrue(builder.currentjob is not None)

     def testScanDispatchForResetBuilder(self):
         # A job gets dispatched to the sampledata builder after it's reset.
@@ -709,26 +143,27 @@
         # Reset sampledata builder.
         builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
         self._resetBuilder(builder)
+        builder.setSlaveForTesting(OkSlave())
         # Set this to 1 here so that _checkDispatch can make sure it's
         # reset to 0 after a successful dispatch.
         builder.failure_count = 1

         # Run 'scan' and check its result.
-        LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
-        manager = self._getManager()
-        d = defer.maybeDeferred(manager.scan)
+        self.layer.txn.commit()
+        self.layer.switchDbUser(config.builddmaster.dbuser)
+        scanner = self._getScanner()
+        d = defer.maybeDeferred(scanner.scan)
         d.addCallback(self._checkDispatch, builder)
         return d

-    def _checkNoDispatch(self, recording_slave, builder):
+    def _checkNoDispatch(self, slave, builder):
         """Assert that no dispatch has occurred.

-        'recording_slave' is None, so no interations would be passed
+        'slave' is None, so no interactions would be passed
         to the asynchronous dispatcher and the builder remained active
         and IDLE.
         """
-        self.assertTrue(
-            recording_slave is None, "Unexpected recording_slave.")
+        self.assertTrue(slave is None, "Unexpected slave.")

         builder = getUtility(IBuilderSet).get(builder.id)
         self.assertTrue(builder.builderok)
@@ -753,9 +188,9 @@
         login(ANONYMOUS)

         # Run 'scan' and check its result.
-        LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
-        manager = self._getManager()
-        d = defer.maybeDeferred(manager.scan)
+        self.layer.switchDbUser(config.builddmaster.dbuser)
+        scanner = self._getScanner()
+        d = defer.maybeDeferred(scanner.singleCycle)
         d.addCallback(self._checkNoDispatch, builder)
         return d

@@ -793,9 +228,9 @@
         login(ANONYMOUS)

         # Run 'scan' and check its result.
-        LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
-        manager = self._getManager()
-        d = defer.maybeDeferred(manager.scan)
+        self.layer.switchDbUser(config.builddmaster.dbuser)
+        scanner = self._getScanner()
+        d = defer.maybeDeferred(scanner.scan)
         d.addCallback(self._checkJobRescued, builder, job)
         return d

@@ -814,8 +249,6 @@
         self.assertBuildingJob(job, builder, logtail='This is a build log')

     def testScanUpdatesBuildingJobs(self):
-        # The job assigned to a broken builder is rescued.
-
         # Enable sampledata builder attached to an appropriate testing
         # slave. It will respond as if it was building the sampledata job.
         builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
@@ -830,188 +263,174 @@
         self.assertBuildingJob(job, builder)

         # Run 'scan' and check its result.
-        LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
-        manager = self._getManager()
-        d = defer.maybeDeferred(manager.scan)
+        self.layer.switchDbUser(config.builddmaster.dbuser)
+        scanner = self._getScanner()
+        d = defer.maybeDeferred(scanner.scan)
         d.addCallback(self._checkJobUpdated, builder, job)
         return d

-    def test_scan_assesses_failure_exceptions(self):
+    def test_scan_with_nothing_to_dispatch(self):
+        factory = LaunchpadObjectFactory()
+        builder = factory.makeBuilder()
+        builder.setSlaveForTesting(OkSlave())
+        scanner = self._getScanner(builder_name=builder.name)
+        d = scanner.scan()
+        return d.addCallback(self._checkNoDispatch, builder)
+
+    def test_scan_with_manual_builder(self):
+        # Reset sampledata builder.
+        builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
+        self._resetBuilder(builder)
+        builder.setSlaveForTesting(OkSlave())
+        builder.manual = True
+        scanner = self._getScanner()
+        d = scanner.scan()
+        d.addCallback(self._checkNoDispatch, builder)
+        return d
+
+    def test_scan_with_not_ok_builder(self):
+        # Reset sampledata builder.
+        builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
+        self._resetBuilder(builder)
+        builder.setSlaveForTesting(OkSlave())
+        builder.builderok = False
+        scanner = self._getScanner()
+        d = scanner.scan()
+        # Because the builder is not ok, we can't use _checkNoDispatch.
+        d.addCallback(
+            lambda ignored: self.assertIdentical(None, builder.currentjob))
+        return d
+
+    def test_scan_of_broken_slave(self):
+        builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
+        self._resetBuilder(builder)
+        builder.setSlaveForTesting(BrokenSlave())
+        builder.failure_count = 0
+        scanner = self._getScanner(builder_name=builder.name)
+        d = scanner.scan()
+        return self.assertFailure(d, xmlrpclib.Fault)
+
+    def _assertFailureCounting(self, builder_count, job_count,
+                               expected_builder_count, expected_job_count):
         # If scan() fails with an exception, failure_counts should be
-        # incremented and tested.
+        # incremented. What we do with the results of the failure
+        # counts is tested below separately, this test just makes sure that
+        # scan() is setting the counts.
         def failing_scan():
-            raise Exception("fake exception")
-        manager = self._getManager()
-        manager.scan = failing_scan
-        manager.scheduleNextScanCycle = FakeMethod()
+            return defer.fail(Exception("fake exception"))
+        scanner = self._getScanner()
+        scanner.scan = failing_scan
         from lp.buildmaster import manager as manager_module
         self.patch(manager_module, 'assessFailureCounts', FakeMethod())
-        builder = getUtility(IBuilderSet)[manager.builder_name]
+        builder = getUtility(IBuilderSet)[scanner.builder_name]

-        # Failure counts start at zero.
-        self.assertEqual(0, builder.failure_count)
-        self.assertEqual(
-            0, builder.currentjob.specific_job.build.failure_count)
-
-        # startCycle() calls scan() which is our fake one that throws an
+        builder.failure_count = builder_count
+        builder.currentjob.specific_job.build.failure_count = job_count
+        # The _scanFailed() calls abort, so make sure our existing
+        # failure counts are persisted.
+        self.layer.txn.commit()
+
+        # singleCycle() calls scan() which is our fake one that throws an
         # exception.
-        manager.startCycle()
+        d = scanner.singleCycle()

         # Failure counts should be updated, and the assessment method
-        # should have been called.
-        self.assertEqual(1, builder.failure_count)
-        self.assertEqual(
-            1, builder.currentjob.specific_job.build.failure_count)
-
-        self.assertEqual(
-            1, manager_module.assessFailureCounts.call_count)
-
-
-class TestDispatchResult(LaunchpadTestCase):
-    """Tests `BaseDispatchResult` variations.
+        # should have been called. The actual behaviour is tested below
+        # in TestFailureAssessments.
+        def got_scan(ignored):
+            self.assertEqual(expected_builder_count, builder.failure_count)
+            self.assertEqual(
+                expected_job_count,
+                builder.currentjob.specific_job.build.failure_count)
+            self.assertEqual(
+                1, manager_module.assessFailureCounts.call_count)
+
+        return d.addCallback(got_scan)

-    Variations of `BaseDispatchResult` when evaluated update the database
-    information according to their purpose.
-    """
-
-    layer = LaunchpadZopelessLayer
-
-    def _getBuilder(self, name):
-        """Return a fixed `IBuilder` instance from the sampledata.
-
-        Ensure it's active (builderok=True) and it has a in-progress job.
-        """
-        login('foo.bar@canonical.com')
-
-        builder = getUtility(IBuilderSet)[name]
-        builder.builderok = True
-
-        job = builder.currentjob
-        build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job)
-        self.assertEqual(
-            'i386 build of mozilla-firefox 0.9 in ubuntu hoary RELEASE',
-            build.title)
-
-        self.assertEqual('BUILDING', build.status.name)
-        self.assertNotEqual(None, job.builder)
-        self.assertNotEqual(None, job.date_started)
-        self.assertNotEqual(None, job.logtail)
-
-        transaction.commit()
-
-        return builder, job.id
+    def test_scan_first_fail(self):
+        # The first failure of a job should result in the failure_count
+        # on the job and the builder both being incremented.
+        self._assertFailureCounting(
+            builder_count=0, job_count=0, expected_builder_count=1,
+            expected_job_count=1)
+
+    def test_scan_second_builder_fail(self):
+        # The first failure of a job should result in the failure_count
+        # on the job and the builder both being incremented.
+        self._assertFailureCounting(
+            builder_count=1, job_count=0, expected_builder_count=2,
+            expected_job_count=1)
+
+    def test_scan_second_job_fail(self):
+        # The first failure of a job should result in the failure_count
+        # on the job and the builder both being incremented.
+        self._assertFailureCounting(
+            builder_count=0, job_count=1, expected_builder_count=1,
+            expected_job_count=2)
+
+    def test_scanFailed_handles_lack_of_a_job_on_the_builder(self):
+        def failing_scan():
+            return defer.fail(Exception("fake exception"))
+        scanner = self._getScanner()
+        scanner.scan = failing_scan
+        builder = getUtility(IBuilderSet)[scanner.builder_name]
+        builder.failure_count = Builder.FAILURE_THRESHOLD
+        builder.currentjob.reset()
+        self.layer.txn.commit()

-    def assertBuildqueueIsClean(self, buildqueue):
-        # Check that the buildqueue is reset.
-        self.assertEqual(None, buildqueue.builder)
-        self.assertEqual(None, buildqueue.date_started)
-        self.assertEqual(None, buildqueue.logtail)
-
-    def assertBuilderIsClean(self, builder):
-        # Check that the builder is ready for a new build.
-        self.assertTrue(builder.builderok)
-        self.assertIs(None, builder.failnotes)
-        self.assertIs(None, builder.currentjob)
-
-    def testResetDispatchResult(self):
-        # Test that `ResetDispatchResult` resets the builder and job.
-        builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME)
-        buildqueue_id = builder.currentjob.id
-        builder.builderok = True
-        builder.failure_count = 1
-
-        # Setup a interaction to satisfy 'write_transaction' decorator.
-        login(ANONYMOUS)
-        slave = RecordingSlave(builder.name, builder.url, builder.vm_host)
-        result = ResetDispatchResult(slave)
-        result()
-
-        buildqueue = getUtility(IBuildQueueSet).get(buildqueue_id)
-        self.assertBuildqueueIsClean(buildqueue)
-
-        # XXX Julian
-        # Disabled test until bug 586362 is fixed.
-        #self.assertFalse(builder.builderok)
-        self.assertBuilderIsClean(builder)
-
-    def testFailDispatchResult(self):
-        # Test that `FailDispatchResult` calls assessFailureCounts() so
-        # that we know the builders and jobs are failed as necessary
-        # when a FailDispatchResult is called at the end of the dispatch
-        # chain.
-        builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME)
-
-        # Setup a interaction to satisfy 'write_transaction' decorator.
-        login(ANONYMOUS)
-        slave = RecordingSlave(builder.name, builder.url, builder.vm_host)
-        result = FailDispatchResult(slave, 'does not work!')
-        result.assessFailureCounts = FakeMethod()
-        self.assertEqual(0, result.assessFailureCounts.call_count)
-        result()
-        self.assertEqual(1, result.assessFailureCounts.call_count)
-
-    def _setup_failing_dispatch_result(self):
-        # assessFailureCounts should fail jobs or builders depending on
-        # whether it sees the failure_counts on each increasing.
-        builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME)
-        slave = RecordingSlave(builder.name, builder.url, builder.vm_host)
-        result = FailDispatchResult(slave, 'does not work!')
-        return builder, result
-
-    def test_assessFailureCounts_equal_failures(self):
-        # Basic case where the failure counts are equal and the job is
-        # reset to try again & the builder is not failed.
-        builder, result = self._setup_failing_dispatch_result()
-        buildqueue = builder.currentjob
-        build = buildqueue.specific_job.build
-        builder.failure_count = 2
-        build.failure_count = 2
-        result.assessFailureCounts()
-
-        self.assertBuilderIsClean(builder)
-        self.assertEqual('NEEDSBUILD', build.status.name)
-        self.assertBuildqueueIsClean(buildqueue)
-
-    def test_assessFailureCounts_job_failed(self):
-        # Case where the job has failed more than the builder.
-        builder, result = self._setup_failing_dispatch_result()
-        buildqueue = builder.currentjob
-        build = buildqueue.specific_job.build
-        build.failure_count = 2
-        builder.failure_count = 1
-        result.assessFailureCounts()
-
-        self.assertBuilderIsClean(builder)
-        self.assertEqual('FAILEDTOBUILD', build.status.name)
-        # The buildqueue should have been removed entirely.
-        self.assertEqual(
-            None, getUtility(IBuildQueueSet).getByBuilder(builder),
-            "Buildqueue was not removed when it should be.")
-
-    def test_assessFailureCounts_builder_failed(self):
-        # Case where the builder has failed more than the job.
-        builder, result = self._setup_failing_dispatch_result()
-        buildqueue = builder.currentjob
-        build = buildqueue.specific_job.build
-        build.failure_count = 2
-        builder.failure_count = 3
-        result.assessFailureCounts()
-
-        self.assertFalse(builder.builderok)
-        self.assertEqual('does not work!', builder.failnotes)
-        self.assertTrue(builder.currentjob is None)
-        self.assertEqual('NEEDSBUILD', build.status.name)
-        self.assertBuildqueueIsClean(buildqueue)
+        d = scanner.singleCycle()
+
+        def scan_finished(ignored):
+            self.assertFalse(builder.builderok)
+
+        return d.addCallback(scan_finished)
+
+    def test_fail_to_resume_slave_resets_job(self):
+        # If an attempt to resume and dispatch a slave fails, it should
+        # reset the job via job.reset()
+
+        # Make a slave with a failing resume() method.
+        slave = OkSlave()
+        slave.resume = lambda: deferLater(
+            reactor, 0, defer.fail, Failure(('out', 'err', 1)))
+
+        # Reset sampledata builder.
+        builder = removeSecurityProxy(
+            getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME])
+        self._resetBuilder(builder)
+        self.assertEqual(0, builder.failure_count)
+        builder.setSlaveForTesting(slave)
+        builder.vm_host = "fake_vm_host"
+
+        scanner = self._getScanner()
+
+        # Get the next job that will be dispatched.
+        job = removeSecurityProxy(builder._findBuildCandidate())
+        job.virtualized = True
+        builder.virtualized = True
+        d = scanner.singleCycle()
+
+        def check(ignored):
+            # The failure_count will have been incremented on the
+            # builder, we can check that to see that a dispatch attempt
+            # did indeed occur.
+            self.assertEqual(1, builder.failure_count)
+            # There should also be no builder set on the job.
+            self.assertTrue(job.builder is None)
+            build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job)
+            self.assertEqual(build.status, BuildStatus.NEEDSBUILD)
+
+        return d.addCallback(check)


 class TestBuilddManager(TrialTestCase):

-    layer = LaunchpadZopelessLayer
+    layer = TwistedLaunchpadZopelessLayer

     def _stub_out_scheduleNextScanCycle(self):
         # stub out the code that adds a callLater, so that later tests
         # don't get surprises.
-        self.patch(SlaveScanner, 'scheduleNextScanCycle', FakeMethod())
+        self.patch(SlaveScanner, 'startCycle', FakeMethod())

     def test_addScanForBuilders(self):
         # Test that addScanForBuilders generates NewBuildersScanner objects.
@@ -1040,10 +459,62 @@
         self.assertNotEqual(0, manager.new_builders_scanner.scan.call_count)


+class TestFailureAssessments(TestCaseWithFactory):
+
+    layer = ZopelessDatabaseLayer
+
+    def setUp(self):
+        TestCaseWithFactory.setUp(self)
+        self.builder = self.factory.makeBuilder()
+        self.build = self.factory.makeSourcePackageRecipeBuild()
+        self.buildqueue = self.build.queueBuild()
+        self.buildqueue.markAsBuilding(self.builder)
+
+    def test_equal_failures_reset_job(self):
+        self.builder.gotFailure()
+        self.builder.getCurrentBuildFarmJob().gotFailure()
+
+        assessFailureCounts(self.builder, "failnotes")
+        self.assertIs(None, self.builder.currentjob)
+        self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD)
+
+    def test_job_failing_more_than_builder_fails_job(self):
+        self.builder.getCurrentBuildFarmJob().gotFailure()
+
+        assessFailureCounts(self.builder, "failnotes")
+        self.assertIs(None, self.builder.currentjob)
+        self.assertEqual(self.build.status, BuildStatus.FAILEDTOBUILD)
+
+    def test_builder_failing_more_than_job_but_under_fail_threshold(self):
+        self.builder.failure_count = Builder.FAILURE_THRESHOLD - 1
+
+        assessFailureCounts(self.builder, "failnotes")
+        self.assertIs(None, self.builder.currentjob)
+        self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD)
+        self.assertTrue(self.builder.builderok)
+
+    def test_builder_failing_more_than_job_but_over_fail_threshold(self):
+        self.builder.failure_count = Builder.FAILURE_THRESHOLD
+
+        assessFailureCounts(self.builder, "failnotes")
+        self.assertIs(None, self.builder.currentjob)
+        self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD)
+        self.assertFalse(self.builder.builderok)
+        self.assertEqual("failnotes", self.builder.failnotes)
+
+    def test_builder_failing_with_no_attached_job(self):
+        self.buildqueue.reset()
+        self.builder.failure_count = Builder.FAILURE_THRESHOLD
+
+        assessFailureCounts(self.builder, "failnotes")
+        self.assertFalse(self.builder.builderok)
+        self.assertEqual("failnotes", self.builder.failnotes)
+
+
 class TestNewBuilders(TrialTestCase):
     """Test detecting of new builders."""

-    layer = LaunchpadZopelessLayer
+    layer = TwistedLaunchpadZopelessLayer

     def _getScanner(self, manager=None, clock=None):
         return NewBuildersScanner(manager=manager, clock=clock)
@@ -1084,11 +555,8 @@
             new_builders, builder_scanner.checkForNewBuilders())

     def test_scan(self):
-        # See if scan detects new builders and schedules the next scan.
+        # See if scan detects new builders.

-        # stub out the addScanForBuilders and scheduleScan methods since
-        # they use callLater; we only want to assert that they get
-        # called.
         def fake_checkForNewBuilders():
             return "new_builders"

@@ -1104,9 +572,6 @@
         builder_scanner.scan()
         advance = NewBuildersScanner.SCAN_INTERVAL + 1
         clock.advance(advance)
-        self.assertNotEqual(
-            0, builder_scanner.scheduleScan.call_count,
-            "scheduleScan did not get called")


 def is_file_growing(filepath, poll_interval=1, poll_repeat=10):
@@ -1147,7 +612,7 @@
         return False


-class TestBuilddManagerScript(LaunchpadTestCase):
+class TestBuilddManagerScript(TestCaseWithFactory):

     layer = LaunchpadScriptLayer

@@ -1156,6 +621,7 @@
         fixture = BuilddManagerTestSetup()
         fixture.setUp()
         fixture.tearDown()
+        self.layer.force_dirty_database()

         # XXX Julian 2010-08-06 bug=614275
         # These next 2 tests are in the wrong place, they should be near the

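The rewritten tests above all lean on one Twisted Trial idiom: a test returns a Deferred, Trial spins the reactor until it fires, and the assertions live in callbacks (or in assertFailure for the error path). A minimal sketch of that idiom outside Launchpad, where the FakeScanner class and the value it fires with are invented purely for illustration:

    from twisted.internet import defer
    from twisted.trial.unittest import TestCase


    class FakeScanner(object):
        """Stands in for a scanner whose scan() fires with a slave name."""

        def scan(self):
            return defer.succeed('bob')


    class DeferredStyleTests(TestCase):

        def test_scan_fires_with_slave(self):
            d = FakeScanner().scan()

            def check(result):
                self.assertEqual('bob', result)

            # Trial waits on the returned Deferred before ending the test.
            return d.addCallback(check)

        def test_broken_scan_fails(self):
            # assertFailure passes only if the Deferred errbacks with the
            # given exception type, as in test_scan_of_broken_slave above.
            d = defer.fail(RuntimeError('slave fell over'))
            return self.assertFailure(d, RuntimeError)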
=== modified file 'lib/lp/buildmaster/tests/test_packagebuild.py'
--- lib/lp/buildmaster/tests/test_packagebuild.py 2010-10-02 11:41:43 +0000
+++ lib/lp/buildmaster/tests/test_packagebuild.py 2010-10-25 19:14:01 +0000
@@ -99,6 +99,8 @@
         self.assertRaises(
             NotImplementedError, self.package_build.verifySuccessfulUpload)
         self.assertRaises(NotImplementedError, self.package_build.notify)
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         self.assertRaises(
             NotImplementedError, self.package_build.handleStatus,
             None, None, None)
@@ -311,6 +313,8 @@
         # A filemap with plain filenames should not cause a problem.
         # The call to handleStatus will attempt to get the file from
         # the slave resulting in a URL error in this test case.
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         self.build.handleStatus('OK', None, {
             'filemap': {'myfile.py': 'test_file_hash'},
             })
@@ -321,6 +325,8 @@
     def test_handleStatus_OK_absolute_filepath(self):
         # A filemap that tries to write to files outside of
         # the upload directory will result in a failed upload.
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         self.build.handleStatus('OK', None, {
             'filemap': {'/tmp/myfile.py': 'test_file_hash'},
             })
@@ -331,6 +337,8 @@
     def test_handleStatus_OK_relative_filepath(self):
         # A filemap that tries to write to files outside of
         # the upload directory will result in a failed upload.
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         self.build.handleStatus('OK', None, {
             'filemap': {'../myfile.py': 'test_file_hash'},
             })
@@ -341,6 +349,8 @@
         # The build log is set during handleStatus.
         removeSecurityProxy(self.build).log = None
         self.assertEqual(None, self.build.log)
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         self.build.handleStatus('OK', None, {
             'filemap': {'myfile.py': 'test_file_hash'},
             })
@@ -350,6 +360,8 @@
         # The date finished is updated during handleStatus_OK.
         removeSecurityProxy(self.build).date_finished = None
         self.assertEqual(None, self.build.date_finished)
+        # XXX 2010-10-18 bug=662631
+        # Change this to do non-blocking IO.
         self.build.handleStatus('OK', None, {
             'filemap': {'myfile.py': 'test_file_hash'},
             })

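The recurring XXX bug=662631 comments flag that handleStatus still performs blocking file IO inside the reactor. One plausible way to discharge them, sketched here with invented helper names rather than the actual Launchpad code, is to push the blocking fetch onto the reactor's thread pool with deferToThread so the caller gets a Deferred back:

    import urllib2

    from twisted.internet import threads


    def fetch_file_blocking(url):
        # Blocking read; acceptable here because it runs in a worker thread.
        return urllib2.urlopen(url).read()


    def handle_status_async(url):
        # deferToThread schedules the blocking call on the thread pool and
        # returns a Deferred that fires with its result, keeping the
        # reactor free to keep scanning other builders.
        d = threads.deferToThread(fetch_file_blocking, url)
        d.addCallback(lambda payload: len(payload))
        return d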
=== modified file 'lib/lp/code/model/recipebuilder.py'
--- lib/lp/code/model/recipebuilder.py 2010-08-20 20:31:18 +0000
+++ lib/lp/code/model/recipebuilder.py 2010-10-25 19:14:01 +0000
@@ -117,38 +117,42 @@
             raise CannotBuild("Unable to find distroarchseries for %s in %s" %
                 (self._builder.processor.name,
                 self.build.distroseries.displayname))
-
+        args = self._extraBuildArgs(distroarchseries, logger)
         chroot = distroarchseries.getChroot()
         if chroot is None:
             raise CannotBuild("Unable to find a chroot for %s" %
                               distroarchseries.displayname)
-        self._builder.slave.cacheFile(logger, chroot)
+        d = self._builder.slave.cacheFile(logger, chroot)

-        # Generate a string which can be used to cross-check when obtaining
-        # results so we know we are referring to the right database object in
-        # subsequent runs.
-        buildid = "%s-%s" % (self.build.id, build_queue_id)
-        cookie = self.buildfarmjob.generateSlaveBuildCookie()
-        chroot_sha1 = chroot.content.sha1
-        logger.debug(
-            "Initiating build %s on %s" % (buildid, self._builder.url))
-
-        args = self._extraBuildArgs(distroarchseries, logger)
-        status, info = self._builder.slave.build(
-            cookie, "sourcepackagerecipe", chroot_sha1, {}, args)
-        message = """%s (%s):
-        ***** RESULT *****
-        %s
-        %s: %s
-        ******************
-        """ % (
-            self._builder.name,
-            self._builder.url,
-            args,
-            status,
-            info,
-            )
-        logger.info(message)
+        def got_cache_file(ignored):
+            # Generate a string which can be used to cross-check when
+            # obtaining results so we know we are referring to the right
+            # database object in subsequent runs.
+            buildid = "%s-%s" % (self.build.id, build_queue_id)
+            cookie = self.buildfarmjob.generateSlaveBuildCookie()
+            chroot_sha1 = chroot.content.sha1
+            logger.debug(
+                "Initiating build %s on %s" % (buildid, self._builder.url))
+
+            return self._builder.slave.build(
+                cookie, "sourcepackagerecipe", chroot_sha1, {}, args)
+
+        def log_build_result((status, info)):
+            message = """%s (%s):
+            ***** RESULT *****
+            %s
+            %s: %s
+            ******************
+            """ % (
+                self._builder.name,
+                self._builder.url,
+                args,
+                status,
+                info,
+                )
+            logger.info(message)
+
+        return d.addCallback(got_cache_file).addCallback(log_build_result)

     def verifyBuildRequest(self, logger):
         """Assert some pre-build checks.

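The recipebuilder change above is the same mechanical transformation applied throughout the branch: two sequential blocking slave calls become one Deferred chain, with each later step wrapped in a closure so it runs only after the previous call completes. Reduced to its shape, with a placeholder slave object rather than the real builder API:

    def dispatch(slave, logger):
        d = slave.cacheFile(logger, 'chroot')  # first round trip, a Deferred

        def got_cache_file(ignored):
            # Runs only once the chroot is cached on the slave.
            return slave.build('cookie')  # second round trip, also a Deferred

        def log_build_result((status, info)):
            # Python 2 tuple-parameter unpacking, matching the branch style.
            logger.info('%s: %s' % (status, info))

        return d.addCallback(got_cache_file).addCallback(log_build_result)

An errback added to the same chain would see a failure from either call, which is what lets the scanner account for dispatch failures in one place.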
=== modified file 'lib/lp/soyuz/browser/tests/test_builder_views.py'
--- lib/lp/soyuz/browser/tests/test_builder_views.py 2010-10-04 19:50:45 +0000
+++ lib/lp/soyuz/browser/tests/test_builder_views.py 2010-10-25 19:14:01 +0000
@@ -34,7 +34,7 @@
         return view

     def test_posting_form_doesnt_call_slave_xmlrpc(self):
-        # Posting the +edit for should not call is_available, which
+        # Posting the +edit form should not call isAvailable, which
         # would do xmlrpc to a slave builder and is explicitly forbidden
         # in a webapp process.
         view = self.initialize_view()

=== removed file 'lib/lp/soyuz/doc/buildd-dispatching.txt'
--- lib/lp/soyuz/doc/buildd-dispatching.txt 2010-10-18 22:24:59 +0000
+++ lib/lp/soyuz/doc/buildd-dispatching.txt 1970-01-01 00:00:00 +0000
@@ -1,371 +0,0 @@
-= Buildd Dispatching =
-
-    >>> import transaction
-    >>> import logging
-    >>> logger = logging.getLogger()
-    >>> logger.setLevel(logging.DEBUG)
-
-The buildd dispatching basically consists of finding a available
-slave in IDLE state, pushing any required files to it, then requesting
-that it starts the build procedure. These tasks are implemented by the
-BuilderSet and Builder classes.
-
-Setup the test builder:
-
-    >>> from canonical.buildd.tests import BuilddSlaveTestSetup
-    >>> fixture = BuilddSlaveTestSetup()
-    >>> fixture.setUp()
-
-Setup a suitable chroot for Hoary i386:
-
-    >>> from StringIO import StringIO
-    >>> from canonical.librarian.interfaces import ILibrarianClient
-    >>> librarian_client = getUtility(ILibrarianClient)
-
-    >>> content = 'anything'
-    >>> alias_id = librarian_client.addFile(
-    ...     'foo.tar.gz', len(content), StringIO(content), 'text/plain')
-
-    >>> from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
-    >>> from lp.registry.interfaces.distribution import IDistributionSet
-    >>> from lp.registry.interfaces.pocket import PackagePublishingPocket
-
-    >>> hoary = getUtility(IDistributionSet)['ubuntu']['hoary']
-    >>> hoary_i386 = hoary['i386']
-
-    >>> chroot = getUtility(ILibraryFileAliasSet)[alias_id]
-    >>> pc = hoary_i386.addOrUpdateChroot(chroot=chroot)
-
-Activate builders present in sampledata, we need to be logged in as a
-member of launchpad-buildd-admin:
-
-    >>> from canonical.launchpad.ftests import login
-    >>> login('celso.providelo@canonical.com')
-
-Set IBuilder.builderok of all present builders:
-
-    >>> from lp.buildmaster.interfaces.builder import IBuilderSet
-    >>> builder_set = getUtility(IBuilderSet)
-
-    >>> builder_set.count()
-    2
-
-    >>> from canonical.launchpad.ftests import syncUpdate
-    >>> for b in builder_set:
-    ...     b.builderok = True
-    ...     syncUpdate(b)
-
-Clean up previous BuildQueue results from sampledata:
-
-    >>> from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
-    >>> lost_job = getUtility(IBuildQueueSet).get(1)
-    >>> lost_job.builder.name
-    u'bob'
-    >>> lost_job.destroySelf()
-    >>> transaction.commit()
-
-If the specified buildd slave reset command (used inside resumeSlaveHost())
-fails, the slave will still be marked as failed.
-
-    >>> from canonical.config import config
-    >>> reset_fail_config = '''
-    ...     [builddmaster]
-    ...     vm_resume_command: /bin/false'''
-    >>> config.push('reset fail', reset_fail_config)
-    >>> frog_builder = builder_set['frog']
-    >>> frog_builder.handleTimeout(logger, 'The universe just collapsed')
-    WARNING:root:Resetting builder: http://localhost:9221/ -- The universe just collapsed
-    ...
-    WARNING:root:Failed to reset builder: http://localhost:9221/ -- Resuming failed:
-    ...
-    WARNING:root:Disabling builder: http://localhost:9221/ -- The universe just collapsed
-    ...
-    <BLANKLINE>
-
-Since we were unable to reset the 'frog' builder it was marked as 'failed'.
-
-    >>> frog_builder.builderok
-    False
-
-Restore default value for resume command.
-
-    >>> ignored_config = config.pop('reset fail')
-
-The 'bob' builder is available for build jobs.
-
-    >>> bob_builder = builder_set['bob']
-    >>> bob_builder.name
-    u'bob'
-    >>> bob_builder.virtualized
-    False
-    >>> bob_builder.is_available
-    True
-    >>> bob_builder.builderok
-    True
-
-
-== Builder dispatching API ==
-
-Now let's check the build candidates which will be considered for the
-builder 'bob':
-
-    >>> from zope.security.proxy import removeSecurityProxy
-    >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate()
-
-The single BuildQueue found is a non-virtual pending build:
-
-    >>> job.id
-    2
-    >>> from lp.soyuz.interfaces.binarypackagebuild import (
-    ...     IBinaryPackageBuildSet)
-    >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job)
-    >>> build.status.name
-    'NEEDSBUILD'
-    >>> job.builder is None
-    True
-    >>> job.date_started is None
-    True
-    >>> build.is_virtualized
-    False
-
-The build start time is not set yet either.
-
-    >>> print build.date_first_dispatched
-    None
-
-Update the SourcePackageReleaseFile corresponding to this job:
-
-    >>> content = 'anything'
-    >>> alias_id = librarian_client.addFile(
-    ...     'foo.dsc', len(content), StringIO(content), 'application/dsc')
-
-    >>> sprf = build.source_package_release.files[0]
-    >>> naked_sprf = removeSecurityProxy(sprf)
-    >>> naked_sprf.libraryfile = getUtility(ILibraryFileAliasSet)[alias_id]
-    >>> flush_database_updates()
-
-Check the dispatching method itself:
-
-    >>> dispatched_job = bob_builder.findAndStartJob()
-    >>> job == dispatched_job
-    True
-    >>> bob_builder.builderok = True
-
-    >>> flush_database_updates()
-
-Verify if the job (BuildQueue) was updated appropriately:
-
-    >>> job.builder.id == bob_builder.id
-    True
-
-    >>> dispatched_build = getUtility(
-    ...     IBinaryPackageBuildSet).getByQueueEntry(job)
-    >>> dispatched_build == build
-    True
-
-    >>> build.status.name
-    'BUILDING'
-
-Shutdown builder, mark the build record as failed and remove the
-buildqueue record, so the build was eliminated:
-
-    >>> fixture.tearDown()
-
-    >>> from lp.buildmaster.enums import BuildStatus
-    >>> build.status = BuildStatus.FAILEDTOBUILD
-    >>> job.destroySelf()
-    >>> flush_database_updates()
-
-
-== PPA build dispatching ==
-
-Create a new Build record of the same source targeted for a PPA archive:
-
-    >>> from lp.registry.interfaces.person import IPersonSet
-    >>> cprov = getUtility(IPersonSet).getByName('cprov')
-
-    >>> ppa_build = sprf.sourcepackagerelease.createBuild(
-    ...     hoary_i386, PackagePublishingPocket.RELEASE, cprov.archive)
-
-Create BuildQueue record and inspect some parameters:
-
-    >>> ppa_job = ppa_build.queueBuild()
-    >>> ppa_job.id
-    3
-    >>> ppa_job.builder == None
-    True
-    >>> ppa_job.date_started == None
-    True
-
-The build job's archive requires virtualized builds.
-
-    >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(ppa_job)
-    >>> build.archive.require_virtualized
-    True
-
-But the builder is not virtualized.
-
-    >>> bob_builder.virtualized
-    False
-
-Hence, the builder will not be able to pick up the PPA build job created
-above.
-
-    >>> bob_builder.vm_host = 'localhost.ppa'
-    >>> syncUpdate(bob_builder)
-
-    >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate()
-    >>> print job
-    None
-
-In order to enable 'bob' to find and build the PPA job, we have to
-change it to virtualized. This is because PPA builds will only build
-on virtualized builders. We also need to make sure this build's source
-is published, or it will also be ignored (by superseding it). We can
-do this by copying the existing publication in Ubuntu.
-
-    >>> from lp.soyuz.model.publishing import (
-    ...     SourcePackagePublishingHistory)
-    >>> [old_pub] = SourcePackagePublishingHistory.selectBy(
-    ...     distroseries=build.distro_series,
-    ...     sourcepackagerelease=build.source_package_release)
-    >>> new_pub = old_pub.copyTo(
-    ...     old_pub.distroseries, old_pub.pocket, build.archive)
-
-    >>> bob_builder.virtualized = True
-    >>> syncUpdate(bob_builder)
-
-    >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate()
-    >>> ppa_job.id == job.id
-    True
-
-For further details regarding IBuilder._findBuildCandidate() please see
-lib/lp/soyuz/tests/test_builder.py.
-
-Start buildd-slave to be able to dispatch jobs.
-
-    >>> fixture = BuilddSlaveTestSetup()
-    >>> fixture.setUp()
-
-Before dispatching we can check if the builder is protected against
-mistakes in code that results in a attempt to build a virtual job in
-a non-virtual build.
-
-    >>> bob_builder.virtualized = False
-    >>> flush_database_updates()
-    >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(ppa_job)
-    Traceback (most recent call last):
-    ...
-    AssertionError: Attempt to build non-virtual item on a virtual builder.
-
-Mark the builder as virtual again, so we can dispatch the ppa job
-successfully.
-
-    >>> bob_builder.virtualized = True
-    >>> flush_database_updates()
-
-    >>> dispatched_job = bob_builder.findAndStartJob()
-    >>> ppa_job == dispatched_job
-    True
-
-    >>> flush_database_updates()
-
-PPA job is building.
-
-    >>> ppa_job.builder.name
-    u'bob'
-
-    >>> build.status.name
-    'BUILDING'
-
-Shutdown builder slave, mark the ppa build record as failed, remove the
-buildqueue record and make 'bob' builder non-virtual again, so the
-environment is back to the initial state.
-
-    >>> fixture.tearDown()
-
-    >>> build.status = BuildStatus.FAILEDTOBUILD
-    >>> ppa_job.destroySelf()
-    >>> bob_builder.virtualized = False
-    >>> flush_database_updates()
-
-
-== Security build dispatching ==
-
-Setup chroot for warty/i386.
-
-    >>> warty = getUtility(IDistributionSet)['ubuntu']['warty']
-    >>> warty_i386 = warty['i386']
-    >>> pc = warty_i386.addOrUpdateChroot(chroot=chroot)
-
-Create a new Build record for test source targeted to warty/i386
-architecture and SECURITY pocket:
-
-    >>> sec_build = sprf.sourcepackagerelease.createBuild(
-    ...     warty_i386, PackagePublishingPocket.SECURITY, hoary.main_archive)
-
-Create BuildQueue record and inspect some parameters:
-
-    >>> sec_job = sec_build.queueBuild()
-    >>> sec_job.id
-    4
-    >>> print sec_job.builder
-    None
-    >>> print sec_job.date_started
-    None
-    >>> sec_build.is_virtualized
-    False
-
-In normal conditions the next available candidate would be the job
-targeted to SECURITY pocket. However, the builders are forbidden to
-accept such jobs until we have finished the EMBARGOED archive
-implementation.
-
-    >>> fixture = BuilddSlaveTestSetup()
-    >>> fixture.setUp()
-    >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(sec_job)
-    Traceback (most recent call last):
-    ...
-    AssertionError: Soyuz is not yet capable of building SECURITY uploads.
-    >>> fixture.tearDown()
-
-To solve this problem temporarily until we start building security
-uploads, we will mark builds targeted to the SECURITY pocket as
-FAILEDTOBUILD during the _findBuildCandidate look-up.
-
-We will also create another build candidate in breezy-autotest/i386 to
-check if legitimate pending candidates will remain valid.
-
-    >>> breezy = getUtility(IDistributionSet)['ubuntu']['breezy-autotest']
-    >>> breezy_i386 = breezy['i386']
-    >>> pc = breezy_i386.addOrUpdateChroot(chroot=chroot)
-
-    >>> pending_build = sprf.sourcepackagerelease.createBuild(
-    ...     breezy_i386, PackagePublishingPocket.UPDATES, hoary.main_archive)
-    >>> pending_job = pending_build.queueBuild()
The diff has been truncated for viewing.
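For reference, the resumeSlaveHost() behaviour the removed doctest exercised (running the configured vm_resume_command and treating a non-zero exit as failure) maps naturally onto Twisted's process utilities; the tests above simulate exactly this shape with Failure(('out', 'err', 1)). A hedged sketch, with an illustrative command argument rather than the real configuration value:

    from twisted.internet import defer
    from twisted.internet.utils import getProcessOutputAndValue


    def resume_slave(resume_command, vm_host):
        # Fires with (stdout, stderr, exit_code) when the process ends.
        d = getProcessOutputAndValue(resume_command, (vm_host,))

        def check((out, err, code)):
            if code != 0:
                # Propagate the same (out, err, code) tuple the tests use.
                return defer.fail(Exception((out, err, code)))
            return out

        return d.addCallback(check)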