Merge lp:~mars/launchpad/test-ghost-update into lp:~launchpad/launchpad/ghost-line

Proposed by Māris Fogels
Status: Merged
Approved by: Māris Fogels
Approved revision: 11785
Merged at revision: 11785
Proposed branch: lp:~mars/launchpad/test-ghost-update
Merge into: lp:~launchpad/launchpad/ghost-line
Diff against target: 7192 lines (+3508/-2210)
24 files modified
lib/lp/buildmaster/doc/builder.txt (+118/-2)
lib/lp/buildmaster/interfaces/builder.py (+62/-83)
lib/lp/buildmaster/manager.py (+468/-204)
lib/lp/buildmaster/model/builder.py (+224/-240)
lib/lp/buildmaster/model/buildfarmjobbehavior.py (+52/-60)
lib/lp/buildmaster/model/packagebuild.py (+0/-6)
lib/lp/buildmaster/tests/mock_slaves.py (+32/-157)
lib/lp/buildmaster/tests/test_builder.py (+154/-582)
lib/lp/buildmaster/tests/test_manager.py (+782/-248)
lib/lp/buildmaster/tests/test_packagebuild.py (+0/-12)
lib/lp/code/model/recipebuilder.py (+28/-32)
lib/lp/soyuz/browser/tests/test_builder_views.py (+1/-1)
lib/lp/soyuz/doc/buildd-dispatching.txt (+371/-0)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+876/-0)
lib/lp/soyuz/model/binarypackagebuildbehavior.py (+41/-59)
lib/lp/soyuz/tests/test_binarypackagebuildbehavior.py (+8/-290)
lib/lp/soyuz/tests/test_doc.py (+6/-0)
lib/lp/testing/factory.py (+2/-8)
lib/lp/translations/doc/translationtemplatesbuildbehavior.txt (+114/-0)
lib/lp/translations/model/translationtemplatesbuildbehavior.py (+14/-20)
lib/lp/translations/stories/buildfarm/xx-build-summary.txt (+1/-1)
lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+153/-202)
lib/lp_sitecustomize.py (+0/-3)
utilities/migrater/file-ownership.txt (+1/-0)
To merge this branch: bzr merge lp:~mars/launchpad/test-ghost-update
Reviewer Review Type Date Requested Status
Māris Fogels (community) Approve
Review via email: mp+42514@code.launchpad.net

Commit message

Test merge for bundle-merge command

Description of the change

Test merge

Revision history for this message
Māris Fogels (mars) :
review: Approve
Revision history for this message
Launchpad PQM Bot (launchpad-pqm) wrote :

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Revision history for this message
Māris Fogels (mars) :
review: Approve
Revision history for this message
Launchpad PQM Bot (launchpad-pqm) wrote :

The attempt to merge lp:~mars/launchpad/test-ghost-update into lp:~launchpad/launchpad/ghost-line failed. Below is the output from the failed tests.

rm -f lib/canonical/launchpad/icing/build/launchpad.js
rm -f -r lazr-js/build
rm -f -r bin
rm -f -r parts
rm -f -r develop-eggs
rm -f .installed.cfg
rm -f -r build
rm -f _pythonpath.py
make -C sourcecode/pygettextpo clean

make: *** sourcecode/pygettextpo: No such file or directory. Stop.
make: *** [clean] Error 2
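
The clean step fails because the top-level Makefile unconditionally recurses into sourcecode/pygettextpo, a checkout that is missing from this tree. As a rough sketch only (assuming the path and target names shown in the log above; the actual Launchpad Makefile may differ), guarding the recursion avoids the hard failure:

    # Hypothetical guard: only recurse if the optional checkout exists,
    # so `make clean` does not abort with Error 2 on a bare tree.
    if [ -d sourcecode/pygettextpo ]; then
        make -C sourcecode/pygettextpo clean
    fi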

lp:~mars/launchpad/test-ghost-update updated
11785. By Māris Fogels

Added a file for testing

Preview Diff

1=== modified file 'lib/lp/buildmaster/doc/builder.txt'
2--- lib/lp/buildmaster/doc/builder.txt 2010-09-24 12:10:52 +0000
3+++ lib/lp/buildmaster/doc/builder.txt 2010-12-07 16:24:04 +0000
4@@ -19,6 +19,9 @@
5 As expected, it implements IBuilder.
6
7 >>> from canonical.launchpad.webapp.testing import verifyObject
8+ >>> from lp.buildmaster.interfaces.builder import IBuilder
9+ >>> verifyObject(IBuilder, builder)
10+ True
11
12 >>> print builder.name
13 bob
14@@ -83,7 +86,7 @@
15 The 'new' method will create a new builder in the database.
16
17 >>> bnew = builderset.new(1, 'http://dummy.com:8221/', 'dummy',
18- ... 'Dummy Title', 'eh ?', 1)
19+ ... 'Dummy Title', 'eh ?', 1)
20 >>> bnew.name
21 u'dummy'
22
23@@ -167,7 +170,7 @@
24 >>> recipe_bq.processor = i386_family.processors[0]
25 >>> recipe_bq.virtualized = True
26 >>> transaction.commit()
27-
28+
29 >>> queue_sizes = builderset.getBuildQueueSizes()
30 >>> print queue_sizes['virt']['386']
31 (1L, datetime.timedelta(0, 64))
32@@ -185,3 +188,116 @@
33
34 >>> print queue_sizes['virt']['386']
35 (2L, datetime.timedelta(0, 128))
36+
37+
38+Resuming buildd slaves
39+======================
40+
41+Virtual slaves are resumed using a command specified in the
42+configuration profile. Production configuration uses an SSH trigger
43+account, accessed via a private key available on the builddmaster
44+machine (which uses the ftpmaster configuration profile), as in:
45+
46+{{{
47+ssh ~/.ssh/ppa-reset-key ppa@%(vm_host)s
48+}}}
49+
50+The test configuration uses a fake command that can be run on a
51+development machine and allows us to test the important features used
52+in production, such as 'vm_host' variable replacement.
53+
54+ >>> from canonical.config import config
55+ >>> config.builddmaster.vm_resume_command
56+ 'echo %(vm_host)s'
57+
58+Before performing the command, it checks if the builder is indeed
59+virtual and raises CannotResumeHost if it isn't.
60+
61+ >>> bob = getUtility(IBuilderSet)['bob']
62+ >>> bob.resumeSlaveHost()
63+ Traceback (most recent call last):
64+ ...
65+ CannotResumeHost: Builder is not virtualized.
66+
67+For testing purposes, resumeSlaveHost returns the stdout and stderr
68+buffers resulting from the command.
69+
70+ >>> frog = getUtility(IBuilderSet)['frog']
71+ >>> out, err = frog.resumeSlaveHost()
72+ >>> print out.strip()
73+ localhost-host.ppa
74+
75+If the specified command fails, resumeSlaveHost also raises
76+a CannotResumeHost exception with the resulting stdout and stderr.
77+
78+ # The command must have a vm_host dict key and when executed,
79+ # have a returncode that is not 0.
80+ >>> vm_resume_command = """
81+ ... [builddmaster]
82+ ... vm_resume_command: test "%(vm_host)s = 'false'"
83+ ... """
84+ >>> config.push('vm_resume_command', vm_resume_command)
85+ >>> frog.resumeSlaveHost()
86+ Traceback (most recent call last):
87+ ...
88+ CannotResumeHost: Resuming failed:
89+ OUT:
90+ <BLANKLINE>
91+ ERR:
92+ <BLANKLINE>
93+
94+Restore the default value for the resume command.
95+
96+ >>> config_data = config.pop('vm_resume_command')
97+
98+
99+Rescuing lost slaves
100+====================
101+
102+Builder.rescueIfLost() checks the build ID reported in the slave status
103+against the database. If it isn't building what we think it should be,
104+the current build will be aborted and the slave cleaned in preparation
105+for a new task. The decision about the slave's correctness is left up
106+to IBuildFarmJobBehavior.verifySlaveBuildCookie -- for these examples we
107+will use a special behavior that just checks if the cookie reads 'good'.
108+
109+ >>> import logging
110+ >>> from lp.buildmaster.interfaces.builder import CorruptBuildCookie
111+ >>> from lp.buildmaster.tests.mock_slaves import (
112+ ... BuildingSlave, MockBuilder, OkSlave, WaitingSlave)
113+
114+ >>> class TestBuildBehavior:
115+ ... def verifySlaveBuildCookie(self, cookie):
116+ ... if cookie != 'good':
117+ ... raise CorruptBuildCookie('Bad value')
118+
119+ >>> def rescue_slave_if_lost(slave):
120+ ... builder = MockBuilder('mock', slave, TestBuildBehavior())
121+ ... builder.rescueIfLost(logging.getLogger())
122+
123+An idle slave is not rescued.
124+
125+ >>> rescue_slave_if_lost(OkSlave())
126+
127+Slaves building or having built the correct build are not rescued
128+either.
129+
130+ >>> rescue_slave_if_lost(BuildingSlave(build_id='good'))
131+ >>> rescue_slave_if_lost(WaitingSlave(build_id='good'))
132+
133+But if a slave is building the wrong ID, it is declared lost and
134+an abort is attempted. MockSlave prints out a message when it is aborted
135+or cleaned.
136+
137+ >>> rescue_slave_if_lost(BuildingSlave(build_id='bad'))
138+ Aborting slave
139+ INFO:root:Builder 'mock' rescued from 'bad': 'Bad value'
140+
141+Slaves having completed an incorrect build are also declared lost,
142+but there's no need to abort a completed build. Such builders are
143+instead simply cleaned, ready for the next build.
144+
145+ >>> rescue_slave_if_lost(WaitingSlave(build_id='bad'))
146+ Cleaning slave
147+ INFO:root:Builder 'mock' rescued from 'bad': 'Bad value'
148+
149
150=== modified file 'lib/lp/buildmaster/interfaces/builder.py'
151--- lib/lp/buildmaster/interfaces/builder.py 2010-10-18 11:57:09 +0000
152+++ lib/lp/buildmaster/interfaces/builder.py 2010-12-07 16:24:04 +0000
153@@ -154,6 +154,11 @@
154
155 currentjob = Attribute("BuildQueue instance for job being processed.")
156
157+ is_available = Bool(
158+ title=_("Whether or not a builder is available for building "
159+ "new jobs. "),
160+ required=False)
161+
162 failure_count = Int(
163 title=_('Failure Count'), required=False, default=0,
164 description=_("Number of consecutive failures for this builder."))
165@@ -168,74 +173,32 @@
166 def resetFailureCount():
167 """Set the failure_count back to zero."""
168
169- def failBuilder(reason):
170- """Mark builder as failed for a given reason."""
171-
172- def setSlaveForTesting(proxy):
173- """Sets the RPC proxy through which to operate the build slave."""
174-
175- def verifySlaveBuildCookie(slave_build_id):
176- """Verify that a slave's build cookie is consistent.
177-
178- This should delegate to the current `IBuildFarmJobBehavior`.
179- """
180-
181- def transferSlaveFileToLibrarian(file_sha1, filename, private):
182- """Transfer a file from the slave to the librarian.
183-
184- :param file_sha1: The file's sha1, which is how the file is addressed
185- in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog'
186- will cause the build log to be retrieved and gzipped.
187- :param filename: The name of the file to be given to the librarian file
188- alias.
189- :param private: True if the build is for a private archive.
190- :return: A librarian file alias.
191- """
192-
193- def getBuildQueue():
194- """Return a `BuildQueue` if there's an active job on this builder.
195-
196- :return: A BuildQueue, or None.
197- """
198-
199- def getCurrentBuildFarmJob():
200- """Return a `BuildFarmJob` for this builder."""
201-
202- # All methods below here return Deferred.
203-
204- def isAvailable():
205- """Whether or not a builder is available for building new jobs.
206-
207- :return: A Deferred that fires with True or False, depending on
208- whether the builder is available or not.
209+ def checkSlaveAlive():
210+ """Check that the buildd slave is alive.
211+
212+ This pings the slave over the network via the echo method and looks
213+ for the sent message as the reply.
214+
215+ :raises BuildDaemonError: When the slave is down.
216 """
217
218 def rescueIfLost(logger=None):
219 """Reset the slave if its job information doesn't match the DB.
220
221- This checks the build ID reported in the slave status against the
222- database. If it isn't building what we think it should be, the current
223- build will be aborted and the slave cleaned in preparation for a new
224- task. The decision about the slave's correctness is left up to
225- `IBuildFarmJobBehavior.verifySlaveBuildCookie`.
226-
227- :return: A Deferred that fires when the dialog with the slave is
228- finished. It does not have a return value.
229+ If the builder is BUILDING or WAITING but has a build ID string
230+ that doesn't match what is stored in the DB, we have to dismiss
231+ its current actions and clean the slave for another job, assuming
232+ the XMLRPC is working properly at this point.
233 """
234
235 def updateStatus(logger=None):
236- """Update the builder's status by probing it.
237-
238- :return: A Deferred that fires when the dialog with the slave is
239- finished. It does not have a return value.
240- """
241+ """Update the builder's status by probing it."""
242
243 def cleanSlave():
244- """Clean any temporary files from the slave.
245-
246- :return: A Deferred that fires when the dialog with the slave is
247- finished. It does not have a return value.
248- """
249+ """Clean any temporary files from the slave."""
250+
251+ def failBuilder(reason):
252+ """Mark builder as failed for a given reason."""
253
254 def requestAbort():
255 """Ask that a build be aborted.
256@@ -243,9 +206,6 @@
257 This takes place asynchronously: Actually killing everything running
258 can take some time so the slave status should be queried again to
259 detect when the abort has taken effect. (Look for status ABORTED).
260-
261- :return: A Deferred that fires when the dialog with the slave is
262- finished. It does not have a return value.
263 """
264
265 def resumeSlaveHost():
266@@ -257,35 +217,37 @@
267 :raises: CannotResumeHost: if builder is not virtual or if the
268 configuration command has failed.
269
270- :return: A Deferred that fires when the resume operation finishes,
271- whose value is a (stdout, stderr) tuple for success, or a Failure
272- whose value is a CannotResumeHost exception.
273+ :return: command stdout and stderr buffers as a tuple.
274 """
275
276+ def setSlaveForTesting(proxy):
277+ """Sets the RPC proxy through which to operate the build slave."""
278+
279 def slaveStatus():
280 """Get the slave status for this builder.
281
282- :return: A Deferred which fires when the slave dialog is complete.
283- Its value is a dict containing at least builder_status, but
284- potentially other values included by the current build
285- behavior.
286+ :return: a dict containing at least builder_status, but potentially
287+ other values included by the current build behavior.
288 """
289
290 def slaveStatusSentence():
291 """Get the slave status sentence for this builder.
292
293- :return: A Deferred which fires when the slave dialog is complete.
294- Its value is a tuple with the first element containing the
295- slave status, build_id-queue-id and then optionally more
296- elements depending on the status.
297+ :return: A tuple with the first element containing the slave status,
298+ build_id-queue-id and then optionally more elements depending on
299+ the status.
300+ """
301+
302+ def verifySlaveBuildCookie(slave_build_id):
303+ """Verify that a slave's build cookie is consistent.
304+
305+ This should delegate to the current `IBuildFarmJobBehavior`.
306 """
307
308 def updateBuild(queueItem):
309 """Verify the current build job status.
310
311 Perform the required actions for each state.
312-
313- :return: A Deferred that fires when the slave dialog is finished.
314 """
315
316 def startBuild(build_queue_item, logger):
317@@ -293,10 +255,21 @@
318
319 :param build_queue_item: A BuildQueueItem to build.
320 :param logger: A logger to be used to log diagnostic information.
321-
322- :return: A Deferred that fires after the dispatch has completed whose
323- value is None, or a Failure that contains an exception
324- explaining what went wrong.
325+ :raises BuildSlaveFailure: When the build slave fails.
326+ :raises CannotBuild: When a build cannot be started for some reason
327+ other than the build slave failing.
328+ """
329+
330+ def transferSlaveFileToLibrarian(file_sha1, filename, private):
331+ """Transfer a file from the slave to the librarian.
332+
333+ :param file_sha1: The file's sha1, which is how the file is addressed
334+ in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog'
335+ will cause the build log to be retrieved and gzipped.
336+ :param filename: The name of the file to be given to the librarian file
337+ alias.
338+ :param private: True if the build is for a private archive.
339+ :return: A librarian file alias.
340 """
341
342 def handleTimeout(logger, error_message):
343@@ -311,8 +284,6 @@
344
345 :param logger: The logger object to be used for logging.
346 :param error_message: The error message to be used for logging.
347- :return: A Deferred that fires after the virtual slave was resumed
348- or immediately if it's a non-virtual slave.
349 """
350
351 def findAndStartJob(buildd_slave=None):
352@@ -320,9 +291,17 @@
353
354 :param buildd_slave: An optional buildd slave that this builder should
355 talk to.
356- :return: A Deferred whose value is the `IBuildQueue` instance
357- found or None if no job was found.
358- """
359+ :return: the `IBuildQueue` instance found or None if no job was found.
360+ """
361+
362+ def getBuildQueue():
363+ """Return a `BuildQueue` if there's an active job on this builder.
364+
365+ :return: A BuildQueue, or None.
366+ """
367+
368+ def getCurrentBuildFarmJob():
369+ """Return a `BuildFarmJob` for this builder."""
370
371
372 class IBuilderSet(Interface):
373
374=== modified file 'lib/lp/buildmaster/manager.py'
375--- lib/lp/buildmaster/manager.py 2010-10-20 12:28:46 +0000
376+++ lib/lp/buildmaster/manager.py 2010-12-07 16:24:04 +0000
377@@ -10,10 +10,13 @@
378 'BuilddManager',
379 'BUILDD_MANAGER_LOG_NAME',
380 'FailDispatchResult',
381+ 'RecordingSlave',
382 'ResetDispatchResult',
383+ 'buildd_success_result_map',
384 ]
385
386 import logging
387+import os
388
389 import transaction
390 from twisted.application import service
391@@ -21,27 +24,129 @@
392 defer,
393 reactor,
394 )
395-from twisted.internet.task import LoopingCall
396+from twisted.protocols.policies import TimeoutMixin
397 from twisted.python import log
398+from twisted.python.failure import Failure
399+from twisted.web import xmlrpc
400 from zope.component import getUtility
401
402+from canonical.config import config
403+from canonical.launchpad.webapp import urlappend
404+from lp.services.database import write_transaction
405 from lp.buildmaster.enums import BuildStatus
406-from lp.buildmaster.interfaces.buildfarmjobbehavior import (
407- BuildBehaviorMismatch,
408- )
409-from lp.buildmaster.model.builder import Builder
410-from lp.buildmaster.interfaces.builder import (
411- BuildDaemonError,
412- BuildSlaveFailure,
413- CannotBuild,
414- CannotFetchFile,
415- CannotResumeHost,
416- )
417+from lp.services.twistedsupport.processmonitor import ProcessWithTimeout
418
419
420 BUILDD_MANAGER_LOG_NAME = "slave-scanner"
421
422
423+buildd_success_result_map = {
424+ 'ensurepresent': True,
425+ 'build': 'BuilderStatus.BUILDING',
426+ }
427+
428+
429+class QueryWithTimeoutProtocol(xmlrpc.QueryProtocol, TimeoutMixin):
430+ """XMLRPC query protocol with a configurable timeout.
431+
432+ XMLRPC queries using this protocol will be unconditionally closed
433+ when the timeout is elapsed. The timeout is fetched from the context
434+ Launchpad configuration file (`config.builddmaster.socket_timeout`).
435+ """
436+ def connectionMade(self):
437+ xmlrpc.QueryProtocol.connectionMade(self)
438+ self.setTimeout(config.builddmaster.socket_timeout)
439+
440+
441+class QueryFactoryWithTimeout(xmlrpc._QueryFactory):
442+ """XMLRPC client factory with timeout support."""
443+ # Make this factory quiet.
444+ noisy = False
445+ # Use the protocol with timeout support.
446+ protocol = QueryWithTimeoutProtocol
447+
448+
449+class RecordingSlave:
450+ """An RPC proxy for buildd slaves that records instructions to the latter.
451+
452+ The idea here is to merely record the instructions that the slave-scanner
453+    issues to the buildd slaves and "replay" them a bit later in an
454+    asynchronous and parallel fashion.
455+
456+ By dealing with a number of buildd slaves in parallel we remove *the*
457+ major slave-scanner throughput issue while avoiding large-scale changes to
458+ its code base.
459+ """
460+
461+ def __init__(self, name, url, vm_host):
462+ self.name = name
463+ self.url = url
464+ self.vm_host = vm_host
465+
466+ self.resume_requested = False
467+ self.calls = []
468+
469+ def __repr__(self):
470+ return '<%s:%s>' % (self.name, self.url)
471+
472+ def cacheFile(self, logger, libraryfilealias):
473+ """Cache the file on the server."""
474+ self.ensurepresent(
475+ libraryfilealias.content.sha1, libraryfilealias.http_url, '', '')
476+
477+ def sendFileToSlave(self, *args):
478+ """Helper to send a file to this builder."""
479+ return self.ensurepresent(*args)
480+
481+ def ensurepresent(self, *args):
482+ """Download files needed for the build."""
483+ self.calls.append(('ensurepresent', args))
484+ result = buildd_success_result_map.get('ensurepresent')
485+ return [result, 'Download']
486+
487+ def build(self, *args):
488+ """Perform the build."""
489+ # XXX: This method does not appear to be used.
490+ self.calls.append(('build', args))
491+ result = buildd_success_result_map.get('build')
492+ return [result, args[0]]
493+
494+ def resume(self):
495+        """Record the request to resume the builder.
496+
497+ Always succeed.
498+
499+ :return: a (stdout, stderr, subprocess exitcode) triple
500+ """
501+ self.resume_requested = True
502+ return ['', '', 0]
503+
504+ def resumeSlave(self, clock=None):
505+        """Resume the builder in an asynchronous fashion.
506+
507+        Uses the configuration command line in the same way
508+        `BuilddSlave.resume` does.
509+
510+        Also uses the builddmaster configuration 'socket_timeout' as
511+        the process timeout.
512+
513+ :param clock: An optional twisted.internet.task.Clock to override
514+ the default clock. For use in tests.
515+
516+ :return: a Deferred
517+ """
518+ resume_command = config.builddmaster.vm_resume_command % {
519+ 'vm_host': self.vm_host}
520+        # Twisted APIs require strings; the configuration provides unicode.
521+ resume_argv = [str(term) for term in resume_command.split()]
522+
523+ d = defer.Deferred()
524+ p = ProcessWithTimeout(
525+ d, config.builddmaster.socket_timeout, clock=clock)
526+ p.spawnProcess(resume_argv[0], tuple(resume_argv))
527+ return d
528+
529+
530 def get_builder(name):
531 """Helper to return the builder given the slave for this request."""
532 # Avoiding circular imports.
533@@ -54,12 +159,9 @@
534 # builder.currentjob hides a complicated query, don't run it twice.
535 # See bug 623281.
536 current_job = builder.currentjob
537- if current_job is None:
538- job_failure_count = 0
539- else:
540- job_failure_count = current_job.specific_job.build.failure_count
541+ build_job = current_job.specific_job.build
542
543- if builder.failure_count == job_failure_count and current_job is not None:
544+ if builder.failure_count == build_job.failure_count:
545 # If the failure count for the builder is the same as the
546 # failure count for the job being built, then we cannot
547 # tell whether the job or the builder is at fault. The best
548@@ -68,28 +170,17 @@
549 current_job.reset()
550 return
551
552- if builder.failure_count > job_failure_count:
553+ if builder.failure_count > build_job.failure_count:
554 # The builder has failed more than the jobs it's been
555- # running.
556-
557- # Re-schedule the build if there is one.
558- if current_job is not None:
559- current_job.reset()
560-
561- # We are a little more tolerant with failing builders than
562- # failing jobs because sometimes they get unresponsive due to
563- # human error, flaky networks etc. We expect the builder to get
564- # better, whereas jobs are very unlikely to get better.
565- if builder.failure_count >= Builder.FAILURE_THRESHOLD:
566- # It's also gone over the threshold so let's disable it.
567- builder.failBuilder(fail_notes)
568+ # running, so let's disable it and re-schedule the build.
569+ builder.failBuilder(fail_notes)
570+ current_job.reset()
571 else:
572 # The job is the culprit! Override its status to 'failed'
573 # to make sure it won't get automatically dispatched again,
574 # and remove the buildqueue request. The failure should
575 # have already caused any relevant slave data to be stored
576 # on the build record so don't worry about that here.
577- build_job = current_job.specific_job.build
578 build_job.status = BuildStatus.FAILEDTOBUILD
579 builder.currentjob.destroySelf()
580
581@@ -99,108 +190,133 @@
582 # next buildd scan.
583
584
585+class BaseDispatchResult:
586+ """Base class for *DispatchResult variations.
587+
588+ It will be extended to represent dispatching results and allow
589+ homogeneous processing.
590+ """
591+
592+ def __init__(self, slave, info=None):
593+ self.slave = slave
594+ self.info = info
595+
596+ def _cleanJob(self, job):
597+ """Clean up in case of builder reset or dispatch failure."""
598+ if job is not None:
599+ job.reset()
600+
601+ def assessFailureCounts(self):
602+ """View builder/job failure_count and work out which needs to die.
603+
604+ :return: True if we disabled something, False if we did not.
605+ """
606+ builder = get_builder(self.slave.name)
607+ assessFailureCounts(builder, self.info)
608+
609+    def __call__(self):
610+ raise NotImplementedError(
611+ "Call sites must define an evaluation method.")
612+
613+
614+class FailDispatchResult(BaseDispatchResult):
615+    """Represents a communication failure while dispatching a build job.
616+
617+    When evaluated, this object marks the corresponding `IBuilder` as
618+ 'NOK' with the given text as 'failnotes'. It also cleans up the running
619+ job (`IBuildQueue`).
620+ """
621+
622+ def __repr__(self):
623+ return '%r failure (%s)' % (self.slave, self.info)
624+
625+ @write_transaction
626+ def __call__(self):
627+ self.assessFailureCounts()
628+
629+
630+class ResetDispatchResult(BaseDispatchResult):
631+ """Represents a failure to reset a builder.
632+
633+    When evaluated, this object simply cleans up the running job
634+ (`IBuildQueue`) and marks the builder down.
635+ """
636+
637+ def __repr__(self):
638+ return '%r reset failure' % self.slave
639+
640+ @write_transaction
641+ def __call__(self):
642+ builder = get_builder(self.slave.name)
643+ # Builders that fail to reset should be disabled as per bug
644+ # 563353.
645+ # XXX Julian bug=586362
646+ # This is disabled until this code is not also used for dispatch
647+ # failures where we *don't* want to disable the builder.
648+ # builder.failBuilder(self.info)
649+ self._cleanJob(builder.currentjob)
650+
651+
652 class SlaveScanner:
653 """A manager for a single builder."""
654
655- # The interval between each poll cycle, in seconds. We'd ideally
656- # like this to be lower but 5 seems a reasonable compromise between
657- # responsivity and load on the database server, since in each cycle
658- # we can run quite a few queries.
659 SCAN_INTERVAL = 5
660
661+ # These are for the benefit of tests; see `TestingSlaveScanner`.
662+ # It pokes fake versions in here so that it can verify methods were
663+ # called. The tests should really be using FakeMethod() though.
664+ reset_result = ResetDispatchResult
665+ fail_result = FailDispatchResult
666+
667 def __init__(self, builder_name, logger):
668 self.builder_name = builder_name
669 self.logger = logger
670+ self._deferred_list = []
671+
672+ def scheduleNextScanCycle(self):
673+ """Schedule another scan of the builder some time in the future."""
674+ self._deferred_list = []
675+ # XXX: Change this to use LoopingCall.
676+ reactor.callLater(self.SCAN_INTERVAL, self.startCycle)
677
678 def startCycle(self):
679 """Scan the builder and dispatch to it or deal with failures."""
680- self.loop = LoopingCall(self.singleCycle)
681- self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
682- return self.stopping_deferred
683-
684- def stopCycle(self):
685- """Terminate the LoopingCall."""
686- self.loop.stop()
687-
688- def singleCycle(self):
689 self.logger.debug("Scanning builder: %s" % self.builder_name)
690- d = self.scan()
691-
692- d.addErrback(self._scanFailed)
693- return d
694-
695- def _scanFailed(self, failure):
696- """Deal with failures encountered during the scan cycle.
697-
698- 1. Print the error in the log
699- 2. Increment and assess failure counts on the builder and job.
700- """
701- # Make sure that pending database updates are removed as it
702- # could leave the database in an inconsistent state (e.g. The
703- # job says it's running but the buildqueue has no builder set).
704- transaction.abort()
705-
706- # If we don't recognise the exception include a stack trace with
707- # the error.
708- error_message = failure.getErrorMessage()
709- if failure.check(
710- BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch,
711- CannotResumeHost, BuildDaemonError, CannotFetchFile):
712- self.logger.info("Scanning failed with: %s" % error_message)
713- else:
714+
715+ try:
716+ slave = self.scan()
717+ if slave is None:
718+ self.scheduleNextScanCycle()
719+ else:
720+ # XXX: Ought to return Deferred.
721+ self.resumeAndDispatch(slave)
722+ except:
723+ error = Failure()
724 self.logger.info("Scanning failed with: %s\n%s" %
725- (failure.getErrorMessage(), failure.getTraceback()))
726+ (error.getErrorMessage(), error.getTraceback()))
727
728- # Decide if we need to terminate the job or fail the
729- # builder.
730- try:
731 builder = get_builder(self.builder_name)
732- builder.gotFailure()
733- if builder.currentjob is not None:
734- build_farm_job = builder.getCurrentBuildFarmJob()
735- build_farm_job.gotFailure()
736- self.logger.info(
737- "builder %s failure count: %s, "
738- "job '%s' failure count: %s" % (
739- self.builder_name,
740- builder.failure_count,
741- build_farm_job.title,
742- build_farm_job.failure_count))
743- else:
744- self.logger.info(
745- "Builder %s failed a probe, count: %s" % (
746- self.builder_name, builder.failure_count))
747- assessFailureCounts(builder, failure.getErrorMessage())
748+
749+ # Decide if we need to terminate the job or fail the
750+ # builder.
751+ self._incrementFailureCounts(builder)
752+ self.logger.info(
753+ "builder failure count: %s, job failure count: %s" % (
754+ builder.failure_count,
755+ builder.getCurrentBuildFarmJob().failure_count))
756+ assessFailureCounts(builder, error.getErrorMessage())
757 transaction.commit()
758- except:
759- # Catastrophic code failure! Not much we can do.
760- self.logger.error(
761- "Miserable failure when trying to examine failure counts:\n",
762- exc_info=True)
763- transaction.abort()
764-
765+
766+ self.scheduleNextScanCycle()
767+
768+ @write_transaction
769 def scan(self):
770 """Probe the builder and update/dispatch/collect as appropriate.
771
772- There are several steps to scanning:
773-
774- 1. If the builder is marked as "ok" then probe it to see what state
775- it's in. This is where lost jobs are rescued if we think the
776- builder is doing something that it later tells us it's not,
777- and also where the multi-phase abort procedure happens.
778- See IBuilder.rescueIfLost, which is called by
779- IBuilder.updateStatus().
780- 2. If the builder is still happy, we ask it if it has an active build
781- and then either update the build in Launchpad or collect the
782- completed build. (builder.updateBuild)
783- 3. If the builder is not happy or it was marked as unavailable
784- mid-build, we need to reset the job that we thought it had, so
785- that the job is dispatched elsewhere.
786- 4. If the builder is idle and we have another build ready, dispatch
787- it.
788-
789- :return: A Deferred that fires when the scan is complete, whose
790- value is A `BuilderSlave` if we dispatched a job to it, or None.
791+ The whole method is wrapped in a transaction, but we do partial
792+ commits to avoid holding locks on tables.
793+
794+ :return: A `RecordingSlave` if we dispatched a job to it, or None.
795 """
796 # We need to re-fetch the builder object on each cycle as the
797 # Storm store is invalidated over transaction boundaries.
798@@ -208,72 +324,240 @@
799 self.builder = get_builder(self.builder_name)
800
801 if self.builder.builderok:
802- d = self.builder.updateStatus(self.logger)
803+ self.builder.updateStatus(self.logger)
804+ transaction.commit()
805+
806+ # See if we think there's an active build on the builder.
807+ buildqueue = self.builder.getBuildQueue()
808+
809+ # XXX Julian 2010-07-29 bug=611258
810+ # We're not using the RecordingSlave until dispatching, which
811+ # means that this part blocks until we've received a response
812+ # from the builder. updateBuild() needs to be made
813+            # asynchronous.
814+
815+ # Scan the slave and get the logtail, or collect the build if
816+ # it's ready. Yes, "updateBuild" is a bad name.
817+ if buildqueue is not None:
818+ self.builder.updateBuild(buildqueue)
819+ transaction.commit()
820+
821+ # If the builder is in manual mode, don't dispatch anything.
822+ if self.builder.manual:
823+ self.logger.debug(
824+ '%s is in manual mode, not dispatching.' % self.builder.name)
825+ return None
826+
827+ # If the builder is marked unavailable, don't dispatch anything.
828+        # Additionally, because builders can be removed from the pool at
829+ # any time, we need to see if we think there was a build running
830+ # on it before it was marked unavailable. In this case we reset
831+        # the build, thus forcing it to get re-dispatched to another
832+ # builder.
833+ if not self.builder.is_available:
834+ job = self.builder.currentjob
835+ if job is not None and not self.builder.builderok:
836+ self.logger.info(
837+ "%s was made unavailable, resetting attached "
838+ "job" % self.builder.name)
839+ job.reset()
840+ transaction.commit()
841+ return None
842+
843+ # See if there is a job we can dispatch to the builder slave.
844+
845+ # XXX: Rather than use the slave actually associated with the builder
846+ # (which, incidentally, shouldn't be a property anyway), we make a new
847+ # RecordingSlave so we can get access to its asynchronous
848+ # "resumeSlave" method. Blech.
849+ slave = RecordingSlave(
850+ self.builder.name, self.builder.url, self.builder.vm_host)
851+ # XXX: Passing buildd_slave=slave overwrites the 'slave' property of
852+ # self.builder. Not sure why this is needed yet.
853+ self.builder.findAndStartJob(buildd_slave=slave)
854+ if self.builder.currentjob is not None:
855+ # After a successful dispatch we can reset the
856+ # failure_count.
857+ self.builder.resetFailureCount()
858+ transaction.commit()
859+ return slave
860+
861+ return None
862+
863+ def resumeAndDispatch(self, slave):
864+ """Chain the resume and dispatching Deferreds."""
865+ # XXX: resumeAndDispatch makes Deferreds without returning them.
866+ if slave.resume_requested:
867+ # The slave needs to be reset before we can dispatch to
868+ # it (e.g. a virtual slave)
869+
870+ # XXX: Two problems here. The first is that 'resumeSlave' only
871+ # exists on RecordingSlave (BuilderSlave calls it 'resume').
872+ d = slave.resumeSlave()
873+ d.addBoth(self.checkResume, slave)
874 else:
875+ # No resume required, build dispatching can commence.
876 d = defer.succeed(None)
877
878- def status_updated(ignored):
879- # Commit the changes done while possibly rescuing jobs, to
880- # avoid holding table locks.
881- transaction.commit()
882-
883- # See if we think there's an active build on the builder.
884- buildqueue = self.builder.getBuildQueue()
885-
886- # Scan the slave and get the logtail, or collect the build if
887- # it's ready. Yes, "updateBuild" is a bad name.
888- if buildqueue is not None:
889- return self.builder.updateBuild(buildqueue)
890-
891- def build_updated(ignored):
892- # Commit changes done while updating the build, to avoid
893- # holding table locks.
894- transaction.commit()
895-
896- # If the builder is in manual mode, don't dispatch anything.
897- if self.builder.manual:
898- self.logger.debug(
899- '%s is in manual mode, not dispatching.' %
900- self.builder.name)
901- return
902-
903- # If the builder is marked unavailable, don't dispatch anything.
904- # Additionaly, because builders can be removed from the pool at
905- # any time, we need to see if we think there was a build running
906- # on it before it was marked unavailable. In this case we reset
907- # the build thusly forcing it to get re-dispatched to another
908- # builder.
909-
910- return self.builder.isAvailable().addCallback(got_available)
911-
912- def got_available(available):
913- if not available:
914- job = self.builder.currentjob
915- if job is not None and not self.builder.builderok:
916- self.logger.info(
917- "%s was made unavailable, resetting attached "
918- "job" % self.builder.name)
919- job.reset()
920- transaction.commit()
921- return
922-
923- # See if there is a job we can dispatch to the builder slave.
924-
925- d = self.builder.findAndStartJob()
926- def job_started(candidate):
927- if self.builder.currentjob is not None:
928- # After a successful dispatch we can reset the
929- # failure_count.
930- self.builder.resetFailureCount()
931- transaction.commit()
932- return self.builder.slave
933- else:
934+ # Dispatch the build to the slave asynchronously.
935+ d.addCallback(self.initiateDispatch, slave)
936+ # Store this deferred so we can wait for it along with all
937+ # the others that will be generated by RecordingSlave during
938+ # the dispatch process, and chain a callback after they've
939+ # all fired.
940+ self._deferred_list.append(d)
941+
942+ def initiateDispatch(self, resume_result, slave):
943+ """Start dispatching a build to a slave.
944+
945+        If the previous task in the chain (slave resuming) has failed, it will
946+ receive a `ResetBuilderRequest` instance as 'resume_result' and
947+ will immediately return that so the subsequent callback can collect
948+ it.
949+
950+ If the slave resuming succeeded, it starts the XMLRPC dialogue. The
951+ dialogue may consist of many calls to the slave before the build
952+ starts. Each call is done via a Deferred event, where slave calls
953+ are sent in callSlave(), and checked in checkDispatch() which will
954+ keep firing events via callSlave() until all the events are done or
955+ an error occurs.
956+ """
957+ if resume_result is not None:
958+ self.slaveConversationEnded()
959+ return resume_result
960+
961+ self.logger.info('Dispatching: %s' % slave)
962+ self.callSlave(slave)
963+
964+ def _getProxyForSlave(self, slave):
965+ """Return a twisted.web.xmlrpc.Proxy for the buildd slave.
966+
967+        Uses a protocol with timeout support; see QueryFactoryWithTimeout.
968+ """
969+ proxy = xmlrpc.Proxy(str(urlappend(slave.url, 'rpc')))
970+ proxy.queryFactory = QueryFactoryWithTimeout
971+ return proxy
972+
973+ def callSlave(self, slave):
974+ """Dispatch the next XMLRPC for the given slave."""
975+ if len(slave.calls) == 0:
976+ # That's the end of the dialogue with the slave.
977+ self.slaveConversationEnded()
978+ return
979+
980+ # Get an XMLRPC proxy for the buildd slave.
981+ proxy = self._getProxyForSlave(slave)
982+ method, args = slave.calls.pop(0)
983+ d = proxy.callRemote(method, *args)
984+ d.addBoth(self.checkDispatch, method, slave)
985+ self._deferred_list.append(d)
986+ self.logger.debug('%s -> %s(%s)' % (slave, method, args))
987+
988+ def slaveConversationEnded(self):
989+ """After all the Deferreds are set up, chain a callback on them."""
990+ dl = defer.DeferredList(self._deferred_list, consumeErrors=True)
991+ dl.addBoth(self.evaluateDispatchResult)
992+ return dl
993+
994+ def evaluateDispatchResult(self, deferred_list_results):
995+ """Process the DispatchResult for this dispatch chain.
996+
997+ After waiting for the Deferred chain to finish, we'll have a
998+ DispatchResult to evaluate, which deals with the result of
999+ dispatching.
1000+ """
1001+ # The `deferred_list_results` is what we get when waiting on a
1002+ # DeferredList. It's a list of tuples of (status, result) where
1003+ # result is what the last callback in that chain returned.
1004+
1005+ # If the result is an instance of BaseDispatchResult we need to
1006+ # evaluate it, as there's further action required at the end of
1007+        # the dispatch chain. None values, resulting from successful
1008+        # chains, are discarded.
1009+
1010+ dispatch_results = [
1011+ result for status, result in deferred_list_results
1012+ if isinstance(result, BaseDispatchResult)]
1013+
1014+ for result in dispatch_results:
1015+ self.logger.info("%r" % result)
1016+ result()
1017+
1018+ # At this point, we're done dispatching, so we can schedule the
1019+ # next scan cycle.
1020+ self.scheduleNextScanCycle()
1021+
1022+ # For the test suite so that it can chain callback results.
1023+ return deferred_list_results
1024+
1025+ def checkResume(self, response, slave):
1026+ """Check the result of resuming a slave.
1027+
1028+ If there's a problem resuming, we return a ResetDispatchResult which
1029+ will get evaluated at the end of the scan, or None if the resume
1030+ was OK.
1031+
1032+ :param response: the tuple that's constructed in
1033+ ProcessWithTimeout.processEnded(), or a Failure that
1034+ contains the tuple.
1035+ :param slave: the slave object we're talking to
1036+ """
1037+ if isinstance(response, Failure):
1038+ out, err, code = response.value
1039+ else:
1040+ out, err, code = response
1041+ if code == os.EX_OK:
1042+ return None
1043+
1044+ error_text = '%s\n%s' % (out, err)
1045+ self.logger.error('%s resume failure: %s' % (slave, error_text))
1046+ return self.reset_result(slave, error_text)
1047+
1048+ def _incrementFailureCounts(self, builder):
1049+ builder.gotFailure()
1050+ builder.getCurrentBuildFarmJob().gotFailure()
1051+
1052+ def checkDispatch(self, response, method, slave):
1053+ """Verify the results of a slave xmlrpc call.
1054+
1055+        If it failed in a way that compromises the slave, then return a
1056+        corresponding `FailDispatchResult`; if it was a communication failure,
1057+ reset the slave by returning a `ResetDispatchResult`.
1058+ """
1059+ from lp.buildmaster.interfaces.builder import IBuilderSet
1060+ builder = getUtility(IBuilderSet)[slave.name]
1061+
1062+ # XXX these DispatchResult classes are badly named and do the
1063+ # same thing. We need to fix that.
1064+ self.logger.debug(
1065+ '%s response for "%s": %s' % (slave, method, response))
1066+
1067+ if isinstance(response, Failure):
1068+ self.logger.warn(
1069+ '%s communication failed (%s)' %
1070+ (slave, response.getErrorMessage()))
1071+ self.slaveConversationEnded()
1072+ self._incrementFailureCounts(builder)
1073+ return self.fail_result(slave)
1074+
1075+ if isinstance(response, list) and len(response) == 2:
1076+ if method in buildd_success_result_map:
1077+ expected_status = buildd_success_result_map.get(method)
1078+ status, info = response
1079+ if status == expected_status:
1080+ self.callSlave(slave)
1081 return None
1082- return d.addCallback(job_started)
1083-
1084- d.addCallback(status_updated)
1085- d.addCallback(build_updated)
1086- return d
1087+ else:
1088+ info = 'Unknown slave method: %s' % method
1089+ else:
1090+ info = 'Unexpected response: %s' % repr(response)
1091+
1092+ self.logger.error(
1093+ '%s failed to dispatch (%s)' % (slave, info))
1094+
1095+ self.slaveConversationEnded()
1096+ self._incrementFailureCounts(builder)
1097+ return self.fail_result(slave, info)
1098
1099
1100 class NewBuildersScanner:
1101@@ -294,21 +578,15 @@
1102 self.current_builders = [
1103 builder.name for builder in getUtility(IBuilderSet)]
1104
1105- def stop(self):
1106- """Terminate the LoopingCall."""
1107- self.loop.stop()
1108-
1109 def scheduleScan(self):
1110 """Schedule a callback SCAN_INTERVAL seconds later."""
1111- self.loop = LoopingCall(self.scan)
1112- self.loop.clock = self._clock
1113- self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL)
1114- return self.stopping_deferred
1115+ return self._clock.callLater(self.SCAN_INTERVAL, self.scan)
1116
1117 def scan(self):
1118 """If a new builder appears, create a SlaveScanner for it."""
1119 new_builders = self.checkForNewBuilders()
1120 self.manager.addScanForBuilders(new_builders)
1121+ self.scheduleScan()
1122
1123 def checkForNewBuilders(self):
1124 """See if any new builders were added."""
1125@@ -331,7 +609,10 @@
1126 manager=self, clock=clock)
1127
1128 def _setupLogger(self):
1129- """Set up a 'slave-scanner' logger that redirects to twisted.
1130+ """Setup a 'slave-scanner' logger that redirects to twisted.
1131+
1132+ It is going to be used locally and within the thread running
1133+ the scan() method.
1134
1135 Make it less verbose to avoid messing too much with the old code.
1136 """
1137@@ -362,29 +643,12 @@
1138 # Events will now fire in the SlaveScanner objects to scan each
1139 # builder.
1140
1141- def stopService(self):
1142- """Callback for when we need to shut down."""
1143- # XXX: lacks unit tests
1144- # All the SlaveScanner objects need to be halted gracefully.
1145- deferreds = [slave.stopping_deferred for slave in self.builder_slaves]
1146- deferreds.append(self.new_builders_scanner.stopping_deferred)
1147-
1148- self.new_builders_scanner.stop()
1149- for slave in self.builder_slaves:
1150- slave.stopCycle()
1151-
1152- # The 'stopping_deferred's are called back when the loops are
1153- # stopped, so we can wait on them all at once here before
1154- # exiting.
1155- d = defer.DeferredList(deferreds, consumeErrors=True)
1156- return d
1157-
1158 def addScanForBuilders(self, builders):
1159 """Set up scanner objects for the builders specified."""
1160 for builder in builders:
1161 slave_scanner = SlaveScanner(builder, self.logger)
1162 self.builder_slaves.append(slave_scanner)
1163- slave_scanner.startCycle()
1164+ slave_scanner.scheduleNextScanCycle()
1165
1166 # Return the slave list for the benefit of tests.
1167 return self.builder_slaves
1168
1169=== modified file 'lib/lp/buildmaster/model/builder.py'
1170--- lib/lp/buildmaster/model/builder.py 2010-10-20 11:54:27 +0000
1171+++ lib/lp/buildmaster/model/builder.py 2010-12-07 16:24:04 +0000
1172@@ -13,11 +13,12 @@
1173 ]
1174
1175 import gzip
1176+import httplib
1177 import logging
1178 import os
1179 import socket
1180+import subprocess
1181 import tempfile
1182-import transaction
1183 import urllib2
1184 import xmlrpclib
1185
1186@@ -33,13 +34,6 @@
1187 Count,
1188 Sum,
1189 )
1190-
1191-from twisted.internet import (
1192- defer,
1193- reactor as default_reactor,
1194- )
1195-from twisted.web import xmlrpc
1196-
1197 from zope.component import getUtility
1198 from zope.interface import implements
1199
1200@@ -64,6 +58,7 @@
1201 from lp.buildmaster.interfaces.builder import (
1202 BuildDaemonError,
1203 BuildSlaveFailure,
1204+ CannotBuild,
1205 CannotFetchFile,
1206 CannotResumeHost,
1207 CorruptBuildCookie,
1208@@ -71,6 +66,9 @@
1209 IBuilderSet,
1210 )
1211 from lp.buildmaster.interfaces.buildfarmjob import IBuildFarmJobSet
1212+from lp.buildmaster.interfaces.buildfarmjobbehavior import (
1213+ BuildBehaviorMismatch,
1214+ )
1215 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
1216 from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior
1217 from lp.buildmaster.model.buildqueue import (
1218@@ -80,9 +78,9 @@
1219 from lp.registry.interfaces.person import validate_public_person
1220 from lp.services.job.interfaces.job import JobStatus
1221 from lp.services.job.model.job import Job
1222+from lp.services.osutils import until_no_eintr
1223 from lp.services.propertycache import cachedproperty
1224-from lp.services.twistedsupport.processmonitor import ProcessWithTimeout
1225-from lp.services.twistedsupport import cancel_on_timeout
1226+from lp.services.twistedsupport.xmlrpc import BlockingProxy
1227 # XXX Michael Nelson 2010-01-13 bug=491330
1228 # These dependencies on soyuz will be removed when getBuildRecords()
1229 # is moved.
1230@@ -94,9 +92,25 @@
1231 from lp.soyuz.model.processor import Processor
1232
1233
1234-class QuietQueryFactory(xmlrpc._QueryFactory):
1235- """XMLRPC client factory that doesn't splatter the log with junk."""
1236- noisy = False
1237+class TimeoutHTTPConnection(httplib.HTTPConnection):
1238+
1239+ def connect(self):
1240+        """Override the standard connect() method to set a timeout."""
1241+ ret = httplib.HTTPConnection.connect(self)
1242+ self.sock.settimeout(config.builddmaster.socket_timeout)
1243+ return ret
1244+
1245+
1246+class TimeoutHTTP(httplib.HTTP):
1247+ _connection_class = TimeoutHTTPConnection
1248+
1249+
1250+class TimeoutTransport(xmlrpclib.Transport):
1251+    """XMLRPC Transport to set up a socket with a defined timeout."""
1252+
1253+ def make_connection(self, host):
1254+ host, extra_headers, x509 = self.get_host_info(host)
1255+ return TimeoutHTTP(host)
1256
1257
1258 class BuilderSlave(object):
1259@@ -111,7 +125,24 @@
1260 # many false positives in your test run and will most likely break
1261 # production.
1262
1263- def __init__(self, proxy, builder_url, vm_host, reactor=None):
1264+ # XXX: This (BuilderSlave) should use composition, rather than
1265+ # inheritance.
1266+
1267+ # XXX: Have a documented interface for the XML-RPC server:
1268+ # - what methods
1269+ # - what return values expected
1270+ # - what faults
1271+ # (see XMLRPCBuildDSlave in lib/canonical/buildd/slave.py).
1272+
1273+ # XXX: Arguably, this interface should be asynchronous
1274+ # (i.e. Deferred-returning). This would mean that Builder (see below)
1275+ # would have to expect Deferreds.
1276+
1277+ # XXX: Once we have a client object with a defined, tested interface, we
1278+ # should make a test double that doesn't do any XML-RPC and can be used to
1279+ # make testing easier & tests faster.
1280+
1281+ def __init__(self, proxy, builder_url, vm_host):
1282 """Initialize a BuilderSlave.
1283
1284 :param proxy: An XML-RPC proxy, implementing 'callRemote'. It must
1285@@ -124,87 +155,63 @@
1286 self._file_cache_url = urlappend(builder_url, 'filecache')
1287 self._server = proxy
1288
1289- if reactor is None:
1290- self.reactor = default_reactor
1291- else:
1292- self.reactor = reactor
1293-
1294 @classmethod
1295- def makeBuilderSlave(cls, builder_url, vm_host, reactor=None, proxy=None):
1296- """Create and return a `BuilderSlave`.
1297-
1298- :param builder_url: The URL of the slave buildd machine,
1299- e.g. http://localhost:8221
1300- :param vm_host: If the slave is virtual, specify its host machine here.
1301- :param reactor: Used by tests to override the Twisted reactor.
1302- :param proxy: Used By tests to override the xmlrpc.Proxy.
1303- """
1304- rpc_url = urlappend(builder_url.encode('utf-8'), 'rpc')
1305- if proxy is None:
1306- server_proxy = xmlrpc.Proxy(rpc_url, allowNone=True)
1307- server_proxy.queryFactory = QuietQueryFactory
1308- else:
1309- server_proxy = proxy
1310- return cls(server_proxy, builder_url, vm_host, reactor)
1311-
1312- def _with_timeout(self, d):
1313- TIMEOUT = config.builddmaster.socket_timeout
1314- return cancel_on_timeout(d, TIMEOUT, self.reactor)
1315+ def makeBlockingSlave(cls, builder_url, vm_host):
1316+ rpc_url = urlappend(builder_url, 'rpc')
1317+ server_proxy = xmlrpclib.ServerProxy(
1318+ rpc_url, transport=TimeoutTransport(), allow_none=True)
1319+ return cls(BlockingProxy(server_proxy), builder_url, vm_host)
1320
1321 def abort(self):
1322 """Abort the current build."""
1323- return self._with_timeout(self._server.callRemote('abort'))
1324+ return self._server.callRemote('abort')
1325
1326 def clean(self):
1327 """Clean up the waiting files and reset the slave's internal state."""
1328- return self._with_timeout(self._server.callRemote('clean'))
1329+ return self._server.callRemote('clean')
1330
1331 def echo(self, *args):
1332 """Echo the arguments back."""
1333- return self._with_timeout(self._server.callRemote('echo', *args))
1334+ return self._server.callRemote('echo', *args)
1335
1336 def info(self):
1337 """Return the protocol version and the builder methods supported."""
1338- return self._with_timeout(self._server.callRemote('info'))
1339+ return self._server.callRemote('info')
1340
1341 def status(self):
1342 """Return the status of the build daemon."""
1343- return self._with_timeout(self._server.callRemote('status'))
1344+ return self._server.callRemote('status')
1345
1346 def ensurepresent(self, sha1sum, url, username, password):
1347- # XXX: Nothing external calls this. Make it private.
1348 """Attempt to ensure the given file is present."""
1349- return self._with_timeout(self._server.callRemote(
1350- 'ensurepresent', sha1sum, url, username, password))
1351+ return self._server.callRemote(
1352+ 'ensurepresent', sha1sum, url, username, password)
1353
1354 def getFile(self, sha_sum):
1355 """Construct a file-like object to return the named file."""
1356- # XXX 2010-10-18 bug=662631
1357- # Change this to do non-blocking IO.
1358 file_url = urlappend(self._file_cache_url, sha_sum)
1359 return urllib2.urlopen(file_url)
1360
1361- def resume(self, clock=None):
1362- """Resume the builder in an asynchronous fashion.
1363-
1364- We use the builddmaster configuration 'socket_timeout' as
1365- the process timeout.
1366-
1367- :param clock: An optional twisted.internet.task.Clock to override
1368- the default clock. For use in tests.
1369-
1370- :return: a Deferred that returns a
1371- (stdout, stderr, subprocess exitcode) triple
1372+ def resume(self):
1373+ """Resume a virtual builder.
1374+
1375+        It uses the configuration command line (replacing 'vm_host') and
1376+        returns its output.
1377+
1378+ :return: a (stdout, stderr, subprocess exitcode) triple
1379 """
1380+ # XXX: This executes the vm_resume_command
1381+ # synchronously. RecordingSlave does so asynchronously. Since we
1382+ # always want to do this asynchronously, there's no need for the
1383+ # duplication.
1384 resume_command = config.builddmaster.vm_resume_command % {
1385 'vm_host': self._vm_host}
1386- # Twisted API requires string but the configuration provides unicode.
1387- resume_argv = [term.encode('utf-8') for term in resume_command.split()]
1388- d = defer.Deferred()
1389- p = ProcessWithTimeout(
1390- d, config.builddmaster.socket_timeout, clock=clock)
1391- p.spawnProcess(resume_argv[0], tuple(resume_argv))
1392- return d
1393+ resume_argv = resume_command.split()
1394+ resume_process = subprocess.Popen(
1395+ resume_argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
1396+ stdout, stderr = resume_process.communicate()
1397+
1398+ return (stdout, stderr, resume_process.returncode)
1399
1400 def cacheFile(self, logger, libraryfilealias):
1401 """Make sure that the file at 'libraryfilealias' is on the slave.
1402@@ -217,15 +224,13 @@
1403 "Asking builder on %s to ensure it has file %s (%s, %s)" % (
1404 self._file_cache_url, libraryfilealias.filename, url,
1405 libraryfilealias.content.sha1))
1406- return self.sendFileToSlave(libraryfilealias.content.sha1, url)
1407+ self.sendFileToSlave(libraryfilealias.content.sha1, url)
1408
1409 def sendFileToSlave(self, sha1, url, username="", password=""):
1410 """Helper to send the file at 'url' with 'sha1' to this builder."""
1411- d = self.ensurepresent(sha1, url, username, password)
1412- def check_present((present, info)):
1413- if not present:
1414- raise CannotFetchFile(url, info)
1415- return d.addCallback(check_present)
1416+ present, info = self.ensurepresent(sha1, url, username, password)
1417+ if not present:
1418+ raise CannotFetchFile(url, info)
1419
1420 def build(self, buildid, builder_type, chroot_sha1, filemap, args):
1421 """Build a thing on this build slave.
1422@@ -238,18 +243,19 @@
1423 :param args: A dictionary of extra arguments. The contents depend on
1424 the build job type.
1425 """
1426- d = self._with_timeout(self._server.callRemote(
1427- 'build', buildid, builder_type, chroot_sha1, filemap, args))
1428- def got_fault(failure):
1429- failure.trap(xmlrpclib.Fault)
1430- raise BuildSlaveFailure(failure.value)
1431- return d.addErrback(got_fault)
1432+ try:
1433+ return self._server.callRemote(
1434+ 'build', buildid, builder_type, chroot_sha1, filemap, args)
1435+ except xmlrpclib.Fault, info:
1436+ raise BuildSlaveFailure(info)
1437
1438
1439 # This is a separate function since MockBuilder needs to use it too.
1440 # Do not use it -- (Mock)Builder.rescueIfLost should be used instead.
1441 def rescueBuilderIfLost(builder, logger=None):
1442 """See `IBuilder`."""
1443+ status_sentence = builder.slaveStatusSentence()
1444+
1445 # 'ident_position' dict relates the position of the job identifier
1446 # token in the sentence received from status(), according the
1447 # two status we care about. See see lib/canonical/buildd/slave.py
1448@@ -259,58 +265,61 @@
1449 'BuilderStatus.WAITING': 2
1450 }
1451
1452- d = builder.slaveStatusSentence()
1453-
1454- def got_status(status_sentence):
1455- """After we get the status, clean if we have to.
1456-
1457- Always return status_sentence.
1458- """
1459- # Isolate the BuilderStatus string, always the first token in
1460- # see lib/canonical/buildd/slave.py and
1461- # IBuilder.slaveStatusSentence().
1462- status = status_sentence[0]
1463-
1464- # If the cookie test below fails, it will request an abort of the
1465- # builder. This will leave the builder in the aborted state and
1466- # with no assigned job, and we should now "clean" the slave which
1467- # will reset its state back to IDLE, ready to accept new builds.
1468- # This situation is usually caused by a temporary loss of
1469- # communications with the slave and the build manager had to reset
1470- # the job.
1471- if status == 'BuilderStatus.ABORTED' and builder.currentjob is None:
1472- if logger is not None:
1473- logger.info(
1474- "Builder '%s' being cleaned up from ABORTED" %
1475- (builder.name,))
1476- d = builder.cleanSlave()
1477- return d.addCallback(lambda ignored: status_sentence)
1478+ # Isolate the BuilderStatus string, always the first token; see
1479+ # lib/canonical/buildd/slave.py and
1480+ # IBuilder.slaveStatusSentence().
1481+ status = status_sentence[0]
1482+
1483+ # If the cookie test below fails, it will request an abort of the
1484+ # builder. This will leave the builder in the aborted state and
1485+ # with no assigned job, and we should now "clean" the slave, which
1486+ # will reset its state back to IDLE, ready to accept new builds.
1487+ # This situation is usually caused by a temporary loss of
1488+ # communications with the slave, after which the build manager
1489+ # had to reset the job.
1490+ if status == 'BuilderStatus.ABORTED' and builder.currentjob is None:
1491+ builder.cleanSlave()
1492+ if logger is not None:
1493+ logger.info(
1494+ "Builder '%s' cleaned up from ABORTED" % builder.name)
1495+ return
1496+
1497+ # If slave is not building nor waiting, it's not in need of rescuing.
1498+ if status not in ident_position.keys():
1499+ return
1500+
1501+ slave_build_id = status_sentence[ident_position[status]]
1502+
1503+ try:
1504+ builder.verifySlaveBuildCookie(slave_build_id)
1505+ except CorruptBuildCookie, reason:
1506+ if status == 'BuilderStatus.WAITING':
1507+ builder.cleanSlave()
1508 else:
1509- return status_sentence
1510-
1511- def rescue_slave(status_sentence):
1512- # If slave is not building nor waiting, it's not in need of rescuing.
1513- status = status_sentence[0]
1514- if status not in ident_position.keys():
1515- return
1516- slave_build_id = status_sentence[ident_position[status]]
1517- try:
1518- builder.verifySlaveBuildCookie(slave_build_id)
1519- except CorruptBuildCookie, reason:
1520- if status == 'BuilderStatus.WAITING':
1521- d = builder.cleanSlave()
1522- else:
1523- d = builder.requestAbort()
1524- def log_rescue(ignored):
1525- if logger:
1526- logger.info(
1527- "Builder '%s' rescued from '%s': '%s'" %
1528- (builder.name, slave_build_id, reason))
1529- return d.addCallback(log_rescue)
1530-
1531- d.addCallback(got_status)
1532- d.addCallback(rescue_slave)
1533- return d
1534+ builder.requestAbort()
1535+ if logger:
1536+ logger.info(
1537+ "Builder '%s' rescued from '%s': '%s'" %
1538+ (builder.name, slave_build_id, reason))
1539+
1540+
1541+def _update_builder_status(builder, logger=None):
1542+ """Really update the builder status."""
1543+ try:
1544+ builder.checkSlaveAlive()
1545+ builder.rescueIfLost(logger)
1546+ # Catch only known exceptions.
1547+ # XXX cprov 2007-06-15 bug=120571: ValueError & TypeError catching is
1548+ # disturbing in this context. We should spend some time sanitizing the
1549+ # exceptions raised in the Builder API since we already started the
1550+ # main refactoring of this area.
1551+ except (ValueError, TypeError, xmlrpclib.Fault,
1552+ BuildDaemonError), reason:
1553+ builder.failBuilder(str(reason))
1554+ if logger:
1555+ logger.warn(
1556+ "%s (%s) marked as failed due to: %s",
1557+ builder.name, builder.url, builder.failnotes, exc_info=True)
1558
1559
1560 def updateBuilderStatus(builder, logger=None):
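Condensed, the synchronous rescue flow introduced above has this shape; the builder argument is assumed to provide the IBuilder methods named in the diff:

    from lp.buildmaster.interfaces.builder import CorruptBuildCookie

    IDENT_POSITION = {
        'BuilderStatus.BUILDING': 1,
        'BuilderStatus.WAITING': 2,
        }

    def rescue_if_lost(builder):
        sentence = builder.slaveStatusSentence()
        status = sentence[0]
        if status == 'BuilderStatus.ABORTED' and builder.currentjob is None:
            builder.cleanSlave()  # reset an orphaned ABORTED slave
        elif status in IDENT_POSITION:
            cookie = sentence[IDENT_POSITION[status]]
            try:
                builder.verifySlaveBuildCookie(cookie)
            except CorruptBuildCookie:
                if status == 'BuilderStatus.WAITING':
                    builder.cleanSlave()  # discard the stale result
                else:
                    builder.requestAbort()  # stop the unrecognised build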
1561@@ -318,7 +327,16 @@
1562 if logger:
1563 logger.debug('Checking %s' % builder.name)
1564
1565- return builder.rescueIfLost(logger)
1566+ MAX_EINTR_RETRIES = 42 # an arbitrary upper bound on EINTR retries
1567+ try:
1568+ return until_no_eintr(
1569+ MAX_EINTR_RETRIES, _update_builder_status, builder, logger=logger)
1570+ except socket.error, reason:
1571+ # In Python 2.6 we can use IOError instead. It also has
1572+ # reason.errno but we might be using 2.5 here so use the
1573+ # index hack.
1574+ error_message = str(reason)
1575+ builder.handleTimeout(logger, error_message)
1576
1577
1578 class Builder(SQLBase):
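The new updateBuilderStatus leans on an until_no_eintr helper that this diff does not show. Purely as an assumption about its contract, a minimal retry loop of that kind could look like:

    import errno
    import socket

    def until_no_eintr(retries, function, *args, **kwargs):
        # Retry 'function' while it fails with EINTR; any other error,
        # or exhausting the retry budget, propagates to the caller.
        for attempt in range(retries):
            try:
                return function(*args, **kwargs)
            except socket.error, e:
                if e.args[0] != errno.EINTR:
                    raise
                if attempt == retries - 1:
                    raise

This is consistent with the tests added later in the diff: a single EINTR is retried silently, while repeated EINTRs escape as a socket.error and end in handleTimeout().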
1579@@ -346,10 +364,6 @@
1580 active = BoolCol(dbName='active', notNull=True, default=True)
1581 failure_count = IntCol(dbName='failure_count', default=0, notNull=True)
1582
1583- # The number of times a builder can consecutively fail before we
1584- # give up and mark it builderok=False.
1585- FAILURE_THRESHOLD = 5
1586-
1587 def _getCurrentBuildBehavior(self):
1588 """Return the current build behavior."""
1589 if not safe_hasattr(self, '_current_build_behavior'):
1590@@ -395,13 +409,18 @@
1591 """See `IBuilder`."""
1592 self.failure_count = 0
1593
1594+ def checkSlaveAlive(self):
1595+ """See IBuilder."""
1596+ if self.slave.echo("Test")[0] != "Test":
1597+ raise BuildDaemonError("Failed to echo OK")
1598+
1599 def rescueIfLost(self, logger=None):
1600 """See `IBuilder`."""
1601- return rescueBuilderIfLost(self, logger)
1602+ rescueBuilderIfLost(self, logger)
1603
1604 def updateStatus(self, logger=None):
1605 """See `IBuilder`."""
1606- return updateBuilderStatus(self, logger)
1607+ updateBuilderStatus(self, logger)
1608
1609 def cleanSlave(self):
1610 """See IBuilder."""
1611@@ -421,23 +440,20 @@
1612 def resumeSlaveHost(self):
1613 """See IBuilder."""
1614 if not self.virtualized:
1615- return defer.fail(CannotResumeHost('Builder is not virtualized.'))
1616+ raise CannotResumeHost('Builder is not virtualized.')
1617
1618 if not self.vm_host:
1619- return defer.fail(CannotResumeHost('Undefined vm_host.'))
1620+ raise CannotResumeHost('Undefined vm_host.')
1621
1622 logger = self._getSlaveScannerLogger()
1623 logger.debug("Resuming %s (%s)" % (self.name, self.url))
1624
1625- d = self.slave.resume()
1626- def got_resume_ok((stdout, stderr, returncode)):
1627- return stdout, stderr
1628- def got_resume_bad(failure):
1629- stdout, stderr, code = failure.value
1630+ stdout, stderr, returncode = self.slave.resume()
1631+ if returncode != 0:
1632 raise CannotResumeHost(
1633 "Resuming failed:\nOUT:\n%s\nERR:\n%s\n" % (stdout, stderr))
1634
1635- return d.addCallback(got_resume_ok).addErrback(got_resume_bad)
1636+ return stdout, stderr
1637
1638 @cachedproperty
1639 def slave(self):
1640@@ -446,7 +462,7 @@
1641 # the slave object, which is usually an XMLRPC client, with a
1642 # stub object that removes the need to actually create a buildd
1643 # slave in various states - which can be hard to create.
1644- return BuilderSlave.makeBuilderSlave(self.url, self.vm_host)
1645+ return BuilderSlave.makeBlockingSlave(self.url, self.vm_host)
1646
1647 def setSlaveForTesting(self, proxy):
1648 """See IBuilder."""
1649@@ -467,23 +483,18 @@
1650
1651 # If we are building a virtual build, resume the virtual machine.
1652 if self.virtualized:
1653- d = self.resumeSlaveHost()
1654- else:
1655- d = defer.succeed(None)
1656+ self.resumeSlaveHost()
1657
1658- def resume_done(ignored):
1659- return self.current_build_behavior.dispatchBuildToSlave(
1660+ # Mark the job as building, then dispatch it to the slave.
1661+ build_queue_item.markAsBuilding(self)
1662+ try:
1663+ self.current_build_behavior.dispatchBuildToSlave(
1664 build_queue_item.id, logger)
1665-
1666- def eb_slave_failure(failure):
1667- failure.trap(BuildSlaveFailure)
1668- e = failure.value
1669+ except BuildSlaveFailure, e:
1670+ logger.debug("Disabling builder: %s" % self.url, exc_info=1)
1671 self.failBuilder(
1672 "Exception (%s) when setting up to new job" % (e,))
1673-
1674- def eb_cannot_fetch_file(failure):
1675- failure.trap(CannotFetchFile)
1676- e = failure.value
1677+ except CannotFetchFile, e:
1678 message = """Slave '%s' (%s) was unable to fetch file.
1679 ****** URL ********
1680 %s
1681@@ -492,19 +503,10 @@
1682 *******************
1683 """ % (self.name, self.url, e.file_url, e.error_information)
1684 raise BuildDaemonError(message)
1685-
1686- def eb_socket_error(failure):
1687- failure.trap(socket.error)
1688- e = failure.value
1689+ except socket.error, e:
1690 error_message = "Exception (%s) when setting up new job" % (e,)
1691- d = self.handleTimeout(logger, error_message)
1692- return d.addBoth(lambda ignored: failure)
1693-
1694- d.addCallback(resume_done)
1695- d.addErrback(eb_slave_failure)
1696- d.addErrback(eb_cannot_fetch_file)
1697- d.addErrback(eb_socket_error)
1698- return d
1699+ self.handleTimeout(logger, error_message)
1700+ raise BuildSlaveFailure
1701
1702 def failBuilder(self, reason):
1703 """See IBuilder"""
1704@@ -532,24 +534,22 @@
1705
1706 def slaveStatus(self):
1707 """See IBuilder."""
1708- d = self.slave.status()
1709- def got_status(status_sentence):
1710- status = {'builder_status': status_sentence[0]}
1711-
1712- # Extract detailed status and log information if present.
1713- # Although build_id is also easily extractable here, there is no
1714- # valid reason for anything to use it, so we exclude it.
1715- if status['builder_status'] == 'BuilderStatus.WAITING':
1716- status['build_status'] = status_sentence[1]
1717- else:
1718- if status['builder_status'] == 'BuilderStatus.BUILDING':
1719- status['logtail'] = status_sentence[2]
1720-
1721- self.current_build_behavior.updateSlaveStatus(
1722- status_sentence, status)
1723- return status
1724-
1725- return d.addCallback(got_status)
1726+ builder_version, builder_arch, mechanisms = self.slave.info()
1727+ status_sentence = self.slave.status()
1728+
1729+ status = {'builder_status': status_sentence[0]}
1730+
1731+ # Extract detailed status and log information if present.
1732+ # Although build_id is also easily extractable here, there is no
1733+ # valid reason for anything to use it, so we exclude it.
1734+ if status['builder_status'] == 'BuilderStatus.WAITING':
1735+ status['build_status'] = status_sentence[1]
1736+ else:
1737+ if status['builder_status'] == 'BuilderStatus.BUILDING':
1738+ status['logtail'] = status_sentence[2]
1739+
1740+ self.current_build_behavior.updateSlaveStatus(status_sentence, status)
1741+ return status
1742
1743 def slaveStatusSentence(self):
1744 """See IBuilder."""
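slaveStatus now assembles its result dict inline from the status sentence. With a canned BUILDING sentence standing in for a live slave, the assembly reduces to:

    status_sentence = ('BuilderStatus.BUILDING', '1-1', 'tail of the log')

    status = {'builder_status': status_sentence[0]}
    if status['builder_status'] == 'BuilderStatus.WAITING':
        status['build_status'] = status_sentence[1]
    elif status['builder_status'] == 'BuilderStatus.BUILDING':
        status['logtail'] = status_sentence[2]
    # status == {'builder_status': 'BuilderStatus.BUILDING',
    #            'logtail': 'tail of the log'}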
1745@@ -562,15 +562,13 @@
1746
1747 def updateBuild(self, queueItem):
1748 """See `IBuilder`."""
1749- return self.current_build_behavior.updateBuild(queueItem)
1750+ self.current_build_behavior.updateBuild(queueItem)
1751
1752 def transferSlaveFileToLibrarian(self, file_sha1, filename, private):
1753 """See IBuilder."""
1754 out_file_fd, out_file_name = tempfile.mkstemp(suffix=".buildlog")
1755 out_file = os.fdopen(out_file_fd, "r+")
1756 try:
1757- # XXX 2010-10-18 bug=662631
1758- # Change this to do non-blocking IO.
1759 slave_file = self.slave.getFile(file_sha1)
1760 copy_and_close(slave_file, out_file)
1761 # If the requested file is the 'buildlog' compress it using gzip
1762@@ -601,17 +599,18 @@
1763
1764 return library_file.id
1765
1766- def isAvailable(self):
1767+ @property
1768+ def is_available(self):
1769 """See `IBuilder`."""
1770 if not self.builderok:
1771- return defer.succeed(False)
1772- d = self.slaveStatusSentence()
1773- def catch_fault(failure):
1774- failure.trap(xmlrpclib.Fault, socket.error)
1775- return False
1776- def check_available(status):
1777- return status[0] == BuilderStatus.IDLE
1778- return d.addCallbacks(check_available, catch_fault)
1779+ return False
1780+ try:
1781+ slavestatus = self.slaveStatusSentence()
1782+ except (xmlrpclib.Fault, socket.error):
1783+ return False
1784+ if slavestatus[0] != BuilderStatus.IDLE:
1785+ return False
1786+ return True
1787
1788 def _getSlaveScannerLogger(self):
1789 """Return the logger instance from buildd-slave-scanner.py."""
1790@@ -622,27 +621,6 @@
1791 logger = logging.getLogger('slave-scanner')
1792 return logger
1793
1794- def acquireBuildCandidate(self):
1795- """Acquire a build candidate in an atomic fashion.
1796-
1797- When retrieving a candidate we need to mark it as building
1798- immediately so that it is not dispatched by another builder in the
1799- build manager.
1800-
1801- We can consider this to be atomic because although the build manager
1802- is a Twisted app and gives the appearance of doing lots of things at
1803- once, it's still single-threaded so no more than one builder scan
1804- can be in this code at the same time.
1805-
1806- If there's ever more than one build manager running at once, then
1807- this code will need some sort of mutex.
1808- """
1809- candidate = self._findBuildCandidate()
1810- if candidate is not None:
1811- candidate.markAsBuilding(self)
1812- transaction.commit()
1813- return candidate
1814-
1815 def _findBuildCandidate(self):
1816 """Find a candidate job for dispatch to an idle buildd slave.
1817
1818@@ -722,46 +700,52 @@
1819 :param candidate: The job to dispatch.
1820 """
1821 logger = self._getSlaveScannerLogger()
1822- # Using maybeDeferred ensures that any exceptions are also
1823- # wrapped up and caught later.
1824- d = defer.maybeDeferred(self.startBuild, candidate, logger)
1825- return d
1826+ try:
1827+ self.startBuild(candidate, logger)
1828+ except (BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch), err:
1829+ logger.warn('Could not build: %s' % err)
1830
1831 def handleTimeout(self, logger, error_message):
1832 """See IBuilder."""
1833+ builder_should_be_failed = True
1834+
1835 if self.virtualized:
1836 # Virtualized/PPA builder: attempt a reset.
1837 logger.warn(
1838 "Resetting builder: %s -- %s" % (self.url, error_message),
1839 exc_info=True)
1840- d = self.resumeSlaveHost()
1841- return d
1842- else:
1843- # XXX: This should really let the failure bubble up to the
1844- # scan() method that does the failure counting.
1845+ try:
1846+ self.resumeSlaveHost()
1847+ except CannotResumeHost, err:
1848+ # Failed to reset builder.
1849+ logger.warn(
1850+ "Failed to reset builder: %s -- %s" %
1851+ (self.url, str(err)), exc_info=True)
1852+ else:
1853+ # Builder was reset, do *not* mark it as failed.
1854+ builder_should_be_failed = False
1855+
1856+ if builder_should_be_failed:
1857 # Mark builder as 'failed'.
1858 logger.warn(
1859- "Disabling builder: %s -- %s" % (self.url, error_message))
1860+ "Disabling builder: %s -- %s" % (self.url, error_message),
1861+ exc_info=True)
1862 self.failBuilder(error_message)
1863- return defer.succeed(None)
1864
1865 def findAndStartJob(self, buildd_slave=None):
1866 """See IBuilder."""
1867- # XXX This method should be removed in favour of two separately
1868- # called methods that find and dispatch the job. It will
1869- # require a lot of test fixing.
1870 logger = self._getSlaveScannerLogger()
1871- candidate = self.acquireBuildCandidate()
1872+ candidate = self._findBuildCandidate()
1873
1874 if candidate is None:
1875 logger.debug("No build candidates available for builder.")
1876- return defer.succeed(None)
1877+ return None
1878
1879 if buildd_slave is not None:
1880 self.setSlaveForTesting(buildd_slave)
1881
1882- d = self._dispatchBuildCandidate(candidate)
1883- return d.addCallback(lambda ignored: candidate)
1884+ self._dispatchBuildCandidate(candidate)
1885+ return candidate
1886
1887 def getBuildQueue(self):
1888 """See `IBuilder`."""
1889
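The handleTimeout hunk above encodes a simple policy: attempt to reset virtualized builders, and only mark a builder as failed when no reset was possible. Schematically, under the same assumptions about the builder object as the sketches above:

    from lp.buildmaster.interfaces.builder import CannotResumeHost

    def handle_timeout(builder, error_message):
        should_fail = True
        if builder.virtualized:
            try:
                builder.resumeSlaveHost()
            except CannotResumeHost:
                pass  # reset failed; fall through and fail the builder
            else:
                should_fail = False  # reset worked; keep the builder
        if should_fail:
            builder.failBuilder(error_message)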
1890=== modified file 'lib/lp/buildmaster/model/buildfarmjobbehavior.py'
1891--- lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-10-20 11:54:27 +0000
1892+++ lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-12-07 16:24:04 +0000
1893@@ -16,18 +16,13 @@
1894 import socket
1895 import xmlrpclib
1896
1897-from twisted.internet import defer
1898-
1899 from zope.component import getUtility
1900 from zope.interface import implements
1901 from zope.security.proxy import removeSecurityProxy
1902
1903 from canonical import encoding
1904 from canonical.librarian.interfaces import ILibrarianClient
1905-from lp.buildmaster.interfaces.builder import (
1906- BuildSlaveFailure,
1907- CorruptBuildCookie,
1908- )
1909+from lp.buildmaster.interfaces.builder import CorruptBuildCookie
1910 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
1911 BuildBehaviorMismatch,
1912 IBuildFarmJobBehavior,
1913@@ -74,53 +69,54 @@
1914 """See `IBuildFarmJobBehavior`."""
1915 logger = logging.getLogger('slave-scanner')
1916
1917- d = self._builder.slaveStatus()
1918-
1919- def got_failure(failure):
1920- failure.trap(xmlrpclib.Fault, socket.error)
1921- info = failure.value
1922+ try:
1923+ slave_status = self._builder.slaveStatus()
1924+ except (xmlrpclib.Fault, socket.error), info:
1925+ # XXX cprov 2005-06-29:
1926+ # Hmm, a problem with the xmlrpc interface.
1927+ # Should we disable the builder, or simply note the failure
1928+ # with a timestamp?
1929 info = ("Could not contact the builder %s, caught a (%s)"
1930 % (queueItem.builder.url, info))
1931- raise BuildSlaveFailure(info)
1932-
1933- def got_status(slave_status):
1934- builder_status_handlers = {
1935- 'BuilderStatus.IDLE': self.updateBuild_IDLE,
1936- 'BuilderStatus.BUILDING': self.updateBuild_BUILDING,
1937- 'BuilderStatus.ABORTING': self.updateBuild_ABORTING,
1938- 'BuilderStatus.ABORTED': self.updateBuild_ABORTED,
1939- 'BuilderStatus.WAITING': self.updateBuild_WAITING,
1940- }
1941-
1942- builder_status = slave_status['builder_status']
1943- if builder_status not in builder_status_handlers:
1944- logger.critical(
1945- "Builder on %s returned unknown status %s, failing it"
1946- % (self._builder.url, builder_status))
1947- self._builder.failBuilder(
1948- "Unknown status code (%s) returned from status() probe."
1949- % builder_status)
1950- # XXX: This will leave the build and job in a bad state, but
1951- # should never be possible, since our builder statuses are
1952- # known.
1953- queueItem._builder = None
1954- queueItem.setDateStarted(None)
1955- return
1956-
1957- # Since logtail is a xmlrpclib.Binary container and it is
1958- # returned from the IBuilder content class, it arrives
1959- # protected by a Zope Security Proxy, which is not declared,
1960- # thus empty. Before passing it to the status handlers we
1961- # will simply remove the proxy.
1962- logtail = removeSecurityProxy(slave_status.get('logtail'))
1963-
1964- method = builder_status_handlers[builder_status]
1965- return defer.maybeDeferred(
1966- method, queueItem, slave_status, logtail, logger)
1967-
1968- d.addErrback(got_failure)
1969- d.addCallback(got_status)
1970- return d
1971+ logger.debug(info, exc_info=True)
1972+ # Keep the job for the next scan.
1973+ return
1974+
1975+ builder_status_handlers = {
1976+ 'BuilderStatus.IDLE': self.updateBuild_IDLE,
1977+ 'BuilderStatus.BUILDING': self.updateBuild_BUILDING,
1978+ 'BuilderStatus.ABORTING': self.updateBuild_ABORTING,
1979+ 'BuilderStatus.ABORTED': self.updateBuild_ABORTED,
1980+ 'BuilderStatus.WAITING': self.updateBuild_WAITING,
1981+ }
1982+
1983+ builder_status = slave_status['builder_status']
1984+ if builder_status not in builder_status_handlers:
1985+ logger.critical(
1986+ "Builder on %s returned unknown status %s, failing it"
1987+ % (self._builder.url, builder_status))
1988+ self._builder.failBuilder(
1989+ "Unknown status code (%s) returned from status() probe."
1990+ % builder_status)
1991+ # XXX: This will leave the build and job in a bad state, but
1992+ # should never be possible, since our builder statuses are
1993+ # known.
1994+ queueItem._builder = None
1995+ queueItem.setDateStarted(None)
1996+ return
1997+
1998+ # Since logtail is an xmlrpclib.Binary container and it is returned
1999+ # from the IBuilder content class, it arrives protected by a Zope
2000+ # Security Proxy, which is not declared, thus empty. Before passing
2001+ # it to the status handlers we will simply remove the proxy.
2002+ logtail = removeSecurityProxy(slave_status.get('logtail'))
2003+
2004+ method = builder_status_handlers[builder_status]
2005+ try:
2006+ method(queueItem, slave_status, logtail, logger)
2007+ except TypeError, e:
2008+ logger.critical("Received wrong number of args in response.")
2009+ logger.exception(e)
2010
2011 def updateBuild_IDLE(self, queueItem, slave_status, logtail, logger):
2012 """Somehow the builder forgot about the build job.
2013@@ -150,13 +146,11 @@
2014
2015 Clean the builder for other jobs.
2016 """
2017- d = queueItem.builder.cleanSlave()
2018- def got_cleaned(ignored):
2019- queueItem.builder = None
2020- if queueItem.job.status != JobStatus.FAILED:
2021- queueItem.job.fail()
2022- queueItem.specific_job.jobAborted()
2023- return d.addCallback(got_cleaned)
2024+ queueItem.builder.cleanSlave()
2025+ queueItem.builder = None
2026+ if queueItem.job.status != JobStatus.FAILED:
2027+ queueItem.job.fail()
2028+ queueItem.specific_job.jobAborted()
2029
2030 def extractBuildStatus(self, slave_status):
2031 """Read build status name.
2032@@ -191,8 +185,6 @@
2033 # XXX: dsilvers 2005-03-02: Confirm the builder has the right build?
2034
2035 build = queueItem.specific_job.build
2036- # XXX 2010-10-18 bug=662631
2037- # Change this to do non-blocking IO.
2038 build.handleStatus(build_status, librarian, slave_status)
2039
2040
2041
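The updateBuild rewrite above keeps the existing dispatch-table idiom: map each builder status string to a handler and treat anything else as a hard failure. A tiny self-contained sketch of that shape, with hypothetical handlers:

    def handle_idle(queue_item, slave_status):
        print 'idle:', queue_item

    def handle_building(queue_item, slave_status):
        print 'building:', queue_item

    HANDLERS = {
        'BuilderStatus.IDLE': handle_idle,
        'BuilderStatus.BUILDING': handle_building,
        }

    def dispatch(queue_item, slave_status):
        status = slave_status['builder_status']
        try:
            handler = HANDLERS[status]
        except KeyError:
            raise ValueError('Unknown status code (%s)' % status)
        handler(queue_item, slave_status)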
2042=== modified file 'lib/lp/buildmaster/model/packagebuild.py'
2043--- lib/lp/buildmaster/model/packagebuild.py 2010-10-26 20:43:50 +0000
2044+++ lib/lp/buildmaster/model/packagebuild.py 2010-12-07 16:24:04 +0000
2045@@ -163,8 +163,6 @@
2046 def getLogFromSlave(package_build):
2047 """See `IPackageBuild`."""
2048 builder = package_build.buildqueue_record.builder
2049- # XXX 2010-10-18 bug=662631
2050- # Change this to do non-blocking IO.
2051 return builder.transferSlaveFileToLibrarian(
2052 SLAVE_LOG_FILENAME,
2053 package_build.buildqueue_record.getLogFileName(),
2054@@ -180,8 +178,6 @@
2055 # log, builder and date_finished are read-only, so we must
2056 # currently remove the security proxy to set them.
2057 naked_build = removeSecurityProxy(build)
2058- # XXX 2010-10-18 bug=662631
2059- # Change this to do non-blocking IO.
2060 naked_build.log = build.getLogFromSlave(build)
2061 naked_build.builder = build.buildqueue_record.builder
2062 # XXX cprov 20060615 bug=120584: Currently buildduration includes
2063@@ -278,8 +274,6 @@
2064 logger.critical("Unknown BuildStatus '%s' for builder '%s'"
2065 % (status, self.buildqueue_record.builder.url))
2066 return
2067- # XXX 2010-10-18 bug=662631
2068- # Change this to do non-blocking IO.
2069 method(librarian, slave_status, logger)
2070
2071 def _handleStatus_OK(self, librarian, slave_status, logger):
2072
2073=== modified file 'lib/lp/buildmaster/tests/mock_slaves.py'
2074--- lib/lp/buildmaster/tests/mock_slaves.py 2010-10-14 15:37:56 +0000
2075+++ lib/lp/buildmaster/tests/mock_slaves.py 2010-12-07 16:24:04 +0000
2076@@ -6,40 +6,21 @@
2077 __metaclass__ = type
2078
2079 __all__ = [
2080- 'AbortedSlave',
2081- 'AbortingSlave',
2082+ 'MockBuilder',
2083+ 'LostBuildingBrokenSlave',
2084 'BrokenSlave',
2085+ 'OkSlave',
2086 'BuildingSlave',
2087- 'CorruptBehavior',
2088- 'DeadProxy',
2089- 'LostBuildingBrokenSlave',
2090- 'MockBuilder',
2091- 'OkSlave',
2092- 'SlaveTestHelpers',
2093- 'TrivialBehavior',
2094+ 'AbortedSlave',
2095 'WaitingSlave',
2096+ 'AbortingSlave',
2097 ]
2098
2099-import fixtures
2100-import os
2101-
2102 from StringIO import StringIO
2103 import xmlrpclib
2104
2105-from testtools.content import Content
2106-from testtools.content_type import UTF8_TEXT
2107-
2108-from twisted.internet import defer
2109-from twisted.web import xmlrpc
2110-
2111-from canonical.buildd.tests.harness import BuilddSlaveTestSetup
2112-
2113-from lp.buildmaster.interfaces.builder import (
2114- CannotFetchFile,
2115- CorruptBuildCookie,
2116- )
2117+from lp.buildmaster.interfaces.builder import CannotFetchFile
2118 from lp.buildmaster.model.builder import (
2119- BuilderSlave,
2120 rescueBuilderIfLost,
2121 updateBuilderStatus,
2122 )
2123@@ -78,9 +59,15 @@
2124 slave_build_id)
2125
2126 def cleanSlave(self):
2127+ # XXX: This should not print anything. The print is only here to make
2128+ # doc/builder.txt a meaningful test.
2129+ print 'Cleaning slave'
2130 return self.slave.clean()
2131
2132 def requestAbort(self):
2133+ # XXX: This should not print anything. The print is only here to make
2134+ # doc/builder.txt a meaningful test.
2135+ print 'Aborting slave'
2136 return self.slave.abort()
2137
2138 def resumeSlave(self, logger):
2139@@ -90,10 +77,10 @@
2140 pass
2141
2142 def rescueIfLost(self, logger=None):
2143- return rescueBuilderIfLost(self, logger)
2144+ rescueBuilderIfLost(self, logger)
2145
2146 def updateStatus(self, logger=None):
2147- return defer.maybeDeferred(updateBuilderStatus, self, logger)
2148+ updateBuilderStatus(self, logger)
2149
2150
2151 # XXX: It would be *really* nice to run some set of tests against the real
2152@@ -108,44 +95,36 @@
2153 self.arch_tag = arch_tag
2154
2155 def status(self):
2156- return defer.succeed(('BuilderStatus.IDLE', ''))
2157+ return ('BuilderStatus.IDLE', '')
2158
2159 def ensurepresent(self, sha1, url, user=None, password=None):
2160 self.call_log.append(('ensurepresent', url, user, password))
2161- return defer.succeed((True, None))
2162+ return True, None
2163
2164 def build(self, buildid, buildtype, chroot, filemap, args):
2165 self.call_log.append(
2166 ('build', buildid, buildtype, chroot, filemap.keys(), args))
2167 info = 'OkSlave BUILDING'
2168- return defer.succeed(('BuildStatus.Building', info))
2169+ return ('BuildStatus.Building', info)
2170
2171 def echo(self, *args):
2172 self.call_log.append(('echo',) + args)
2173- return defer.succeed(args)
2174+ return args
2175
2176 def clean(self):
2177 self.call_log.append('clean')
2178- return defer.succeed(None)
2179
2180 def abort(self):
2181 self.call_log.append('abort')
2182- return defer.succeed(None)
2183
2184 def info(self):
2185 self.call_log.append('info')
2186- return defer.succeed(('1.0', self.arch_tag, 'debian'))
2187-
2188- def resume(self):
2189- self.call_log.append('resume')
2190- return defer.succeed(("", "", 0))
2191+ return ('1.0', self.arch_tag, 'debian')
2192
2193 def sendFileToSlave(self, sha1, url, username="", password=""):
2194- d = self.ensurepresent(sha1, url, username, password)
2195- def check_present((present, info)):
2196- if not present:
2197- raise CannotFetchFile(url, info)
2198- return d.addCallback(check_present)
2199+ present, info = self.ensurepresent(sha1, url, username, password)
2200+ if not present:
2201+ raise CannotFetchFile(url, info)
2202
2203 def cacheFile(self, logger, libraryfilealias):
2204 return self.sendFileToSlave(
2205@@ -162,11 +141,9 @@
2206 def status(self):
2207 self.call_log.append('status')
2208 buildlog = xmlrpclib.Binary("This is a build log")
2209- return defer.succeed(
2210- ('BuilderStatus.BUILDING', self.build_id, buildlog))
2211+ return ('BuilderStatus.BUILDING', self.build_id, buildlog)
2212
2213 def getFile(self, sum):
2214- # XXX: This needs to be updated to return a Deferred.
2215 self.call_log.append('getFile')
2216 if sum == "buildlog":
2217 s = StringIO("This is a build log")
2218@@ -178,15 +155,11 @@
2219 """A mock slave that looks like it's currently waiting."""
2220
2221 def __init__(self, state='BuildStatus.OK', dependencies=None,
2222- build_id='1-1', filemap=None):
2223+ build_id='1-1'):
2224 super(WaitingSlave, self).__init__()
2225 self.state = state
2226 self.dependencies = dependencies
2227 self.build_id = build_id
2228- if filemap is None:
2229- self.filemap = {}
2230- else:
2231- self.filemap = filemap
2232
2233 # By default, the slave only has a buildlog, but callsites
2234 # can update this list as needed.
2235@@ -194,12 +167,10 @@
2236
2237 def status(self):
2238 self.call_log.append('status')
2239- return defer.succeed((
2240- 'BuilderStatus.WAITING', self.state, self.build_id, self.filemap,
2241- self.dependencies))
2242+ return ('BuilderStatus.WAITING', self.state, self.build_id, {},
2243+ self.dependencies)
2244
2245 def getFile(self, hash):
2246- # XXX: This needs to be updated to return a Deferred.
2247 self.call_log.append('getFile')
2248 if hash in self.valid_file_hashes:
2249 content = "This is a %s" % hash
2250@@ -213,19 +184,15 @@
2251
2252 def status(self):
2253 self.call_log.append('status')
2254- return defer.succeed(('BuilderStatus.ABORTING', '1-1'))
2255+ return ('BuilderStatus.ABORTING', '1-1')
2256
2257
2258 class AbortedSlave(OkSlave):
2259 """A mock slave that looks like it's aborted."""
2260
2261- def clean(self):
2262+ def status(self):
2263 self.call_log.append('status')
2264- return defer.succeed(None)
2265-
2266- def status(self):
2267- self.call_log.append('clean')
2268- return defer.succeed(('BuilderStatus.ABORTED', '1-1'))
2269+ return ('BuilderStatus.ABORTED', '1-1')
2270
2271
2272 class LostBuildingBrokenSlave:
2273@@ -239,108 +206,16 @@
2274
2275 def status(self):
2276 self.call_log.append('status')
2277- return defer.succeed(('BuilderStatus.BUILDING', '1000-10000'))
2278+ return ('BuilderStatus.BUILDING', '1000-10000')
2279
2280 def abort(self):
2281 self.call_log.append('abort')
2282- return defer.fail(xmlrpclib.Fault(8002, "Could not abort"))
2283+ raise xmlrpclib.Fault(8002, "Could not abort")
2284
2285
2286 class BrokenSlave:
2287 """A mock slave that reports that it is broken."""
2288
2289 def __init__(self):
2290 self.call_log = []
2291 
2292 def status(self):
2293 self.call_log.append('status')
2294- return defer.fail(xmlrpclib.Fault(8001, "Broken slave"))
2295-
2296-
2297-class CorruptBehavior:
2298-
2299- def verifySlaveBuildCookie(self, cookie):
2300- raise CorruptBuildCookie("Bad value: %r" % (cookie,))
2301-
2302-
2303-class TrivialBehavior:
2304-
2305- def verifySlaveBuildCookie(self, cookie):
2306- pass
2307-
2308-
2309-class DeadProxy(xmlrpc.Proxy):
2310- """An xmlrpc.Proxy that doesn't actually send any messages.
2311-
2312- Used when you want to test timeouts, for example.
2313- """
2314-
2315- def callRemote(self, *args, **kwargs):
2316- return defer.Deferred()
2317-
2318-
2319-class SlaveTestHelpers(fixtures.Fixture):
2320-
2321- # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`.
2322- BASE_URL = 'http://localhost:8221'
2323- TEST_URL = '%s/rpc/' % (BASE_URL,)
2324-
2325- def getServerSlave(self):
2326- """Set up a test build slave server.
2327-
2328- :return: A `BuilddSlaveTestSetup` object.
2329- """
2330- tachandler = BuilddSlaveTestSetup()
2331- tachandler.setUp()
2332- # Basically impossible to do this w/ TrialTestCase. But it would be
2333- # really nice to keep it.
2334- #
2335- # def addLogFile(exc_info):
2336- # self.addDetail(
2337- # 'xmlrpc-log-file',
2338- # Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read()))
2339- # self.addOnException(addLogFile)
2340- self.addCleanup(tachandler.tearDown)
2341- return tachandler
2342-
2343- def getClientSlave(self, reactor=None, proxy=None):
2344- """Return a `BuilderSlave` for use in testing.
2345-
2346- Points to a fixed URL that is also used by `BuilddSlaveTestSetup`.
2347- """
2348- return BuilderSlave.makeBuilderSlave(
2349- self.TEST_URL, 'vmhost', reactor, proxy)
2350-
2351- def makeCacheFile(self, tachandler, filename):
2352- """Make a cache file available on the remote slave.
2353-
2354- :param tachandler: The TacTestSetup object used to start the remote
2355- slave.
2356- :param filename: The name of the file to create in the file cache
2357- area.
2358- """
2359- path = os.path.join(tachandler.root, 'filecache', filename)
2360- fd = open(path, 'w')
2361- fd.write('something')
2362- fd.close()
2363- self.addCleanup(os.unlink, path)
2364-
2365- def triggerGoodBuild(self, slave, build_id=None):
2366- """Trigger a good build on 'slave'.
2367-
2368- :param slave: A `BuilderSlave` instance to trigger the build on.
2369- :param build_id: The build identifier. If not specified, defaults to
2370- an arbitrary string.
2371- :type build_id: str
2372- :return: The build id returned by the slave.
2373- """
2374- if build_id is None:
2375- build_id = 'random-build-id'
2376- tachandler = self.getServerSlave()
2377- chroot_file = 'fake-chroot'
2378- dsc_file = 'thing'
2379- self.makeCacheFile(tachandler, chroot_file)
2380- self.makeCacheFile(tachandler, dsc_file)
2381- return slave.build(
2382- build_id, 'debian', chroot_file, {'.dsc': dsc_file},
2383- {'ogrecomponent': 'main'})
2384+ raise xmlrpclib.Fault(8001, "Broken slave")
2385
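With the mocks now blocking, a test can drive them directly and inspect call_log afterwards. A sketch mirroring the aborted-slave recovery test later in this diff:

    slave = OkSlave()
    builder = MockBuilder('mock_builder', slave)
    builder.rescueIfLost()
    # An idle slave needs no rescue, so nothing was aborted or cleaned.
    assert 'abort' not in slave.call_log
    assert 'clean' not in slave.call_log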
2386=== modified file 'lib/lp/buildmaster/tests/test_builder.py'
2387--- lib/lp/buildmaster/tests/test_builder.py 2010-10-18 16:44:22 +0000
2388+++ lib/lp/buildmaster/tests/test_builder.py 2010-12-07 16:24:04 +0000
2389@@ -3,24 +3,20 @@
2390
2391 """Test Builder features."""
2392
2393+import errno
2394 import os
2395-import signal
2396+import socket
2397 import xmlrpclib
2398
2399-from twisted.web.client import getPage
2400-
2401-from twisted.internet.defer import CancelledError
2402-from twisted.internet.task import Clock
2403-from twisted.python.failure import Failure
2404-from twisted.trial.unittest import TestCase as TrialTestCase
2405+from testtools.content import Content
2406+from testtools.content_type import UTF8_TEXT
2407
2408 from zope.component import getUtility
2409 from zope.security.proxy import removeSecurityProxy
2410
2411 from canonical.buildd.slave import BuilderStatus
2412-from canonical.config import config
2413+from canonical.buildd.tests.harness import BuilddSlaveTestSetup
2414 from canonical.database.sqlbase import flush_database_updates
2415-from canonical.launchpad.scripts import QuietFakeLogger
2416 from canonical.launchpad.webapp.interfaces import (
2417 DEFAULT_FLAVOR,
2418 IStoreSelector,
2419@@ -28,38 +24,21 @@
2420 )
2421 from canonical.testing.layers import (
2422 DatabaseFunctionalLayer,
2423- LaunchpadZopelessLayer,
2424- TwistedLaunchpadZopelessLayer,
2425- TwistedLayer,
2426+ LaunchpadZopelessLayer
2427 )
2428 from lp.buildmaster.enums import BuildStatus
2429-from lp.buildmaster.interfaces.builder import (
2430- CannotFetchFile,
2431- IBuilder,
2432- IBuilderSet,
2433- )
2434+from lp.buildmaster.interfaces.builder import IBuilder, IBuilderSet
2435 from lp.buildmaster.interfaces.buildfarmjobbehavior import (
2436 IBuildFarmJobBehavior,
2437 )
2438 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
2439-from lp.buildmaster.interfaces.builder import CannotResumeHost
2440+from lp.buildmaster.model.builder import BuilderSlave
2441 from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior
2442 from lp.buildmaster.model.buildqueue import BuildQueue
2443 from lp.buildmaster.tests.mock_slaves import (
2444 AbortedSlave,
2445- AbortingSlave,
2446- BrokenSlave,
2447- BuildingSlave,
2448- CorruptBehavior,
2449- DeadProxy,
2450- LostBuildingBrokenSlave,
2451 MockBuilder,
2452- OkSlave,
2453- SlaveTestHelpers,
2454- TrivialBehavior,
2455- WaitingSlave,
2456 )
2457-from lp.services.job.interfaces.job import JobStatus
2458 from lp.soyuz.enums import (
2459 ArchivePurpose,
2460 PackagePublishingStatus,
2461@@ -70,12 +49,9 @@
2462 )
2463 from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
2464 from lp.testing import (
2465- ANONYMOUS,
2466- login_as,
2467- logout,
2468+ TestCase,
2469 TestCaseWithFactory,
2470 )
2471-from lp.testing.factory import LaunchpadObjectFactory
2472 from lp.testing.fakemethod import FakeMethod
2473
2474
2475@@ -116,121 +92,42 @@
2476 bq = builder.getBuildQueue()
2477 self.assertIs(None, bq)
2478
2479-
2480-class TestBuilderWithTrial(TrialTestCase):
2481-
2482- layer = TwistedLaunchpadZopelessLayer
2483-
2484- def setUp(self):
2485- super(TestBuilderWithTrial, self)
2486- self.slave_helper = SlaveTestHelpers()
2487- self.slave_helper.setUp()
2488- self.addCleanup(self.slave_helper.cleanUp)
2489- self.factory = LaunchpadObjectFactory()
2490- login_as(ANONYMOUS)
2491- self.addCleanup(logout)
2492-
2493- def test_updateStatus_aborts_lost_and_broken_slave(self):
2494- # A slave that's 'lost' should be aborted; when the slave is
2495- # broken then abort() should also throw a fault.
2496- slave = LostBuildingBrokenSlave()
2497- lostbuilding_builder = MockBuilder(
2498- 'Lost Building Broken Slave', slave, behavior=CorruptBehavior())
2499- d = lostbuilding_builder.updateStatus(QuietFakeLogger())
2500- def check_slave_status(failure):
2501- self.assertIn('abort', slave.call_log)
2502- # 'Fault' comes from the LostBuildingBrokenSlave, this is
2503- # just testing that the value is passed through.
2504- self.assertIsInstance(failure.value, xmlrpclib.Fault)
2505- return d.addBoth(check_slave_status)
2506-
2507- def test_resumeSlaveHost_nonvirtual(self):
2508- builder = self.factory.makeBuilder(virtualized=False)
2509- d = builder.resumeSlaveHost()
2510- return self.assertFailure(d, CannotResumeHost)
2511-
2512- def test_resumeSlaveHost_no_vmhost(self):
2513- builder = self.factory.makeBuilder(virtualized=True, vm_host=None)
2514- d = builder.resumeSlaveHost()
2515- return self.assertFailure(d, CannotResumeHost)
2516-
2517- def test_resumeSlaveHost_success(self):
2518- reset_config = """
2519- [builddmaster]
2520- vm_resume_command: /bin/echo -n parp"""
2521- config.push('reset', reset_config)
2522- self.addCleanup(config.pop, 'reset')
2523-
2524- builder = self.factory.makeBuilder(virtualized=True, vm_host="pop")
2525- d = builder.resumeSlaveHost()
2526- def got_resume(output):
2527- self.assertEqual(('parp', ''), output)
2528- return d.addCallback(got_resume)
2529-
2530- def test_resumeSlaveHost_command_failed(self):
2531- reset_fail_config = """
2532- [builddmaster]
2533- vm_resume_command: /bin/false"""
2534- config.push('reset fail', reset_fail_config)
2535- self.addCleanup(config.pop, 'reset fail')
2536- builder = self.factory.makeBuilder(virtualized=True, vm_host="pop")
2537- d = builder.resumeSlaveHost()
2538- return self.assertFailure(d, CannotResumeHost)
2539-
2540- def test_handleTimeout_resume_failure(self):
2541- reset_fail_config = """
2542- [builddmaster]
2543- vm_resume_command: /bin/false"""
2544- config.push('reset fail', reset_fail_config)
2545- self.addCleanup(config.pop, 'reset fail')
2546- builder = self.factory.makeBuilder(virtualized=True, vm_host="pop")
2547- builder.builderok = True
2548- d = builder.handleTimeout(QuietFakeLogger(), 'blah')
2549- return self.assertFailure(d, CannotResumeHost)
2550-
2551- def _setupRecipeBuildAndBuilder(self):
2552- # Helper function to make a builder capable of building a
2553- # recipe, returning both.
2554- processor = self.factory.makeProcessor(name="i386")
2555- builder = self.factory.makeBuilder(
2556- processor=processor, virtualized=True, vm_host="bladh")
2557- builder.setSlaveForTesting(OkSlave())
2558- distroseries = self.factory.makeDistroSeries()
2559- das = self.factory.makeDistroArchSeries(
2560- distroseries=distroseries, architecturetag="i386",
2561- processorfamily=processor.family)
2562- chroot = self.factory.makeLibraryFileAlias()
2563- das.addOrUpdateChroot(chroot)
2564- distroseries.nominatedarchindep = das
2565- build = self.factory.makeSourcePackageRecipeBuild(
2566- distroseries=distroseries)
2567- return builder, build
2568-
2569- def test_findAndStartJob_returns_candidate(self):
2570- # findAndStartJob finds the next queued job using _findBuildCandidate.
2571- # We don't care about the type of build at all.
2572- builder, build = self._setupRecipeBuildAndBuilder()
2573- candidate = build.queueBuild()
2574- # _findBuildCandidate is tested elsewhere, we just make sure that
2575- # findAndStartJob delegates to it.
2576- removeSecurityProxy(builder)._findBuildCandidate = FakeMethod(
2577- result=candidate)
2578- d = builder.findAndStartJob()
2579- return d.addCallback(self.assertEqual, candidate)
2580-
2581- def test_findAndStartJob_starts_job(self):
2582- # findAndStartJob finds the next queued job using _findBuildCandidate
2583- # and then starts it.
2584- # We don't care about the type of build at all.
2585- builder, build = self._setupRecipeBuildAndBuilder()
2586- candidate = build.queueBuild()
2587- removeSecurityProxy(builder)._findBuildCandidate = FakeMethod(
2588- result=candidate)
2589- d = builder.findAndStartJob()
2590- def check_build_started(candidate):
2591- self.assertEqual(candidate.builder, builder)
2592- self.assertEqual(BuildStatus.BUILDING, build.status)
2593- return d.addCallback(check_build_started)
2594+ def test_updateBuilderStatus_catches_repeated_EINTR(self):
2595+ # Repeated EINTR returns from a socket operation should exhaust
2596+ # the retries and end up timing out the builder.
2597+ builder = removeSecurityProxy(self.factory.makeBuilder())
2598+ builder.handleTimeout = FakeMethod()
2599+ builder.rescueIfLost = FakeMethod()
2600+
2601+ def _fake_checkSlaveAlive():
2602+ # Raise an EINTR error for all invocations.
2603+ raise socket.error(errno.EINTR, "fake eintr")
2604+
2605+ builder.checkSlaveAlive = _fake_checkSlaveAlive
2606+ builder.updateStatus()
2607+
2608+ # builder.updateStatus should eventually have called
2609+ # handleTimeout()
2610+ self.assertEqual(1, builder.handleTimeout.call_count)
2611+
2612+ def test_updateBuilderStatus_catches_single_EINTR(self):
2613+ builder = removeSecurityProxy(self.factory.makeBuilder())
2614+ builder.handleTimeout = FakeMethod()
2615+ builder.rescueIfLost = FakeMethod()
2616+ self.eintr_returned = False
2617+
2618+ def _fake_checkSlaveAlive():
2619+ # Raise an EINTR error for the first invocation only.
2620+ if not self.eintr_returned:
2621+ self.eintr_returned = True
2622+ raise socket.error(errno.EINTR, "fake eintr")
2623+
2624+ builder.checkSlaveAlive = _fake_checkSlaveAlive
2625+ builder.updateStatus()
2626+
2627+ # builder.updateStatus should never call handleTimeout() for a
2628+ # single EINTR.
2629+ self.assertEqual(0, builder.handleTimeout.call_count)
2630
2631 def test_slave(self):
2632 # Builder.slave is a BuilderSlave that points at the actual Builder.
2633@@ -239,147 +136,25 @@
2634 builder = removeSecurityProxy(self.factory.makeBuilder())
2635 self.assertEqual(builder.url, builder.slave.url)
2636
2637+
2638+class Test_rescueBuilderIfLost(TestCaseWithFactory):
2639+ """Tests for lp.buildmaster.model.builder.rescueBuilderIfLost."""
2640+
2641+ layer = LaunchpadZopelessLayer
2642+
2643 def test_recovery_of_aborted_slave(self):
2644 # If a slave is in the ABORTED state, rescueBuilderIfLost should
2645 # clean it if we don't think it's currently building anything.
2646 # See bug 463046.
2647 aborted_slave = AbortedSlave()
2648+ # The slave's clean() method is normally an XMLRPC call, so we
2649+ # can just stub it out and check that it got called.
2650+ aborted_slave.clean = FakeMethod()
2651 builder = MockBuilder("mock_builder", aborted_slave)
2652 builder.currentjob = None
2653- d = builder.rescueIfLost()
2654- def check_slave_calls(ignored):
2655- self.assertIn('clean', aborted_slave.call_log)
2656- return d.addCallback(check_slave_calls)
2657-
2658- def test_recover_ok_slave(self):
2659- # An idle slave is not rescued.
2660- slave = OkSlave()
2661- builder = MockBuilder("mock_builder", slave, TrivialBehavior())
2662- d = builder.rescueIfLost()
2663- def check_slave_calls(ignored):
2664- self.assertNotIn('abort', slave.call_log)
2665- self.assertNotIn('clean', slave.call_log)
2666- return d.addCallback(check_slave_calls)
2667-
2668- def test_recover_waiting_slave_with_good_id(self):
2669- # rescueIfLost does not attempt to abort or clean a builder that is
2670- # WAITING.
2671- waiting_slave = WaitingSlave()
2672- builder = MockBuilder("mock_builder", waiting_slave, TrivialBehavior())
2673- d = builder.rescueIfLost()
2674- def check_slave_calls(ignored):
2675- self.assertNotIn('abort', waiting_slave.call_log)
2676- self.assertNotIn('clean', waiting_slave.call_log)
2677- return d.addCallback(check_slave_calls)
2678-
2679- def test_recover_waiting_slave_with_bad_id(self):
2680- # If a slave is WAITING with a build for us to get, and the build
2681- # cookie cannot be verified, which means we don't recognize the build,
2682- # then rescueBuilderIfLost should attempt to abort it, so that the
2683- # builder is reset for a new build, and the corrupt build is
2684- # discarded.
2685- waiting_slave = WaitingSlave()
2686- builder = MockBuilder("mock_builder", waiting_slave, CorruptBehavior())
2687- d = builder.rescueIfLost()
2688- def check_slave_calls(ignored):
2689- self.assertNotIn('abort', waiting_slave.call_log)
2690- self.assertIn('clean', waiting_slave.call_log)
2691- return d.addCallback(check_slave_calls)
2692-
2693- def test_recover_building_slave_with_good_id(self):
2694- # rescueIfLost does not attempt to abort or clean a builder that is
2695- # BUILDING.
2696- building_slave = BuildingSlave()
2697- builder = MockBuilder("mock_builder", building_slave, TrivialBehavior())
2698- d = builder.rescueIfLost()
2699- def check_slave_calls(ignored):
2700- self.assertNotIn('abort', building_slave.call_log)
2701- self.assertNotIn('clean', building_slave.call_log)
2702- return d.addCallback(check_slave_calls)
2703-
2704- def test_recover_building_slave_with_bad_id(self):
2705- # If a slave is BUILDING with a build id we don't recognize, then we
2706- # abort the build, thus stopping it in its tracks.
2707- building_slave = BuildingSlave()
2708- builder = MockBuilder("mock_builder", building_slave, CorruptBehavior())
2709- d = builder.rescueIfLost()
2710- def check_slave_calls(ignored):
2711- self.assertIn('abort', building_slave.call_log)
2712- self.assertNotIn('clean', building_slave.call_log)
2713- return d.addCallback(check_slave_calls)
2714-
2715-
2716-class TestBuilderSlaveStatus(TestBuilderWithTrial):
2717-
2718- # Verify what IBuilder.slaveStatus returns with slaves in different
2719- # states.
2720-
2721- def assertStatus(self, slave, builder_status=None,
2722- build_status=None, logtail=False, filemap=None,
2723- dependencies=None):
2724- builder = self.factory.makeBuilder()
2725- builder.setSlaveForTesting(slave)
2726- d = builder.slaveStatus()
2727-
2728- def got_status(status_dict):
2729- expected = {}
2730- if builder_status is not None:
2731- expected["builder_status"] = builder_status
2732- if build_status is not None:
2733- expected["build_status"] = build_status
2734- if dependencies is not None:
2735- expected["dependencies"] = dependencies
2736-
2737- # We don't care so much about the content of the logtail,
2738- # just that it's there.
2739- if logtail:
2740- tail = status_dict.pop("logtail")
2741- self.assertIsInstance(tail, xmlrpclib.Binary)
2742-
2743- self.assertEqual(expected, status_dict)
2744-
2745- return d.addCallback(got_status)
2746-
2747- def test_slaveStatus_idle_slave(self):
2748- self.assertStatus(
2749- OkSlave(), builder_status='BuilderStatus.IDLE')
2750-
2751- def test_slaveStatus_building_slave(self):
2752- self.assertStatus(
2753- BuildingSlave(), builder_status='BuilderStatus.BUILDING',
2754- logtail=True)
2755-
2756- def test_slaveStatus_waiting_slave(self):
2757- self.assertStatus(
2758- WaitingSlave(), builder_status='BuilderStatus.WAITING',
2759- build_status='BuildStatus.OK', filemap={})
2760-
2761- def test_slaveStatus_aborting_slave(self):
2762- self.assertStatus(
2763- AbortingSlave(), builder_status='BuilderStatus.ABORTING')
2764-
2765- def test_slaveStatus_aborted_slave(self):
2766- self.assertStatus(
2767- AbortedSlave(), builder_status='BuilderStatus.ABORTED')
2768-
2769- def test_isAvailable_with_not_builderok(self):
2770- # isAvailable() is a wrapper around slaveStatusSentence()
2771- builder = self.factory.makeBuilder()
2772- builder.builderok = False
2773- d = builder.isAvailable()
2774- return d.addCallback(self.assertFalse)
2775-
2776- def test_isAvailable_with_slave_fault(self):
2777- builder = self.factory.makeBuilder()
2778- builder.setSlaveForTesting(BrokenSlave())
2779- d = builder.isAvailable()
2780- return d.addCallback(self.assertFalse)
2781-
2782- def test_isAvailable_with_slave_idle(self):
2783- builder = self.factory.makeBuilder()
2784- builder.setSlaveForTesting(OkSlave())
2785- d = builder.isAvailable()
2786- return d.addCallback(self.assertTrue)
2787+ builder.rescueIfLost()
2788+
2789+ self.assertEqual(1, aborted_slave.clean.call_count)
2790
2791
2792 class TestFindBuildCandidateBase(TestCaseWithFactory):
2793@@ -413,49 +188,6 @@
2794 builder.manual = False
2795
2796
2797-class TestFindBuildCandidateGeneralCases(TestFindBuildCandidateBase):
2798- # Test usage of findBuildCandidate not specific to any archive type.
2799-
2800- def test_findBuildCandidate_supersedes_builds(self):
2801- # IBuilder._findBuildCandidate identifies if there are builds
2802- # for superseded source package releases in the queue and marks
2803- # the corresponding build record as SUPERSEDED.
2804- archive = self.factory.makeArchive()
2805- self.publisher.getPubSource(
2806- sourcename="gedit", status=PackagePublishingStatus.PUBLISHED,
2807- archive=archive).createMissingBuilds()
2808- old_candidate = removeSecurityProxy(
2809- self.frog_builder)._findBuildCandidate()
2810-
2811- # The candidate starts off as NEEDSBUILD:
2812- build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(
2813- old_candidate)
2814- self.assertEqual(BuildStatus.NEEDSBUILD, build.status)
2815-
2816- # Now supersede the source package:
2817- publication = build.current_source_publication
2818- publication.status = PackagePublishingStatus.SUPERSEDED
2819-
2820- # The candidate returned is now a different one:
2821- new_candidate = removeSecurityProxy(
2822- self.frog_builder)._findBuildCandidate()
2823- self.assertNotEqual(new_candidate, old_candidate)
2824-
2825- # And the old_candidate is superseded:
2826- self.assertEqual(BuildStatus.SUPERSEDED, build.status)
2827-
2828- def test_acquireBuildCandidate_marks_building(self):
2829- # acquireBuildCandidate() should call _findBuildCandidate and
2830- # mark the build as building.
2831- archive = self.factory.makeArchive()
2832- self.publisher.getPubSource(
2833- sourcename="gedit", status=PackagePublishingStatus.PUBLISHED,
2834- archive=archive).createMissingBuilds()
2835- candidate = removeSecurityProxy(
2836- self.frog_builder).acquireBuildCandidate()
2837- self.assertEqual(JobStatus.RUNNING, candidate.job.status)
2838-
2839-
2840 class TestFindBuildCandidatePPAWithSingleBuilder(TestCaseWithFactory):
2841
2842 layer = LaunchpadZopelessLayer
2843@@ -588,16 +320,6 @@
2844 build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job)
2845 self.failUnlessEqual('joesppa', build.archive.name)
2846
2847- def test_findBuildCandidate_with_disabled_archive(self):
2848- # Disabled archives should not be considered for dispatching
2849- # builds.
2850- disabled_job = removeSecurityProxy(self.builder4)._findBuildCandidate()
2851- build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(
2852- disabled_job)
2853- build.archive.disable()
2854- next_job = removeSecurityProxy(self.builder4)._findBuildCandidate()
2855- self.assertNotEqual(disabled_job, next_job)
2856-
2857
2858 class TestFindBuildCandidatePrivatePPA(TestFindBuildCandidatePPABase):
2859
2860@@ -610,14 +332,6 @@
2861 build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job)
2862 self.failUnlessEqual('joesppa', build.archive.name)
2863
2864- # If the source for the build is still pending, it won't be
2865- # dispatched because the builder has to fetch the source files
2866- # from the (password protected) repo area, not the librarian.
2867- pub = build.current_source_publication
2868- pub.status = PackagePublishingStatus.PENDING
2869- candidate = removeSecurityProxy(self.builder4)._findBuildCandidate()
2870- self.assertNotEqual(next_job.id, candidate.id)
2871-
2872
2873 class TestFindBuildCandidateDistroArchive(TestFindBuildCandidateBase):
2874
2875@@ -760,48 +474,97 @@
2876 self.builder.current_build_behavior, BinaryPackageBuildBehavior)
2877
2878
2879-class TestSlave(TrialTestCase):
2880+class TestSlave(TestCase):
2881 """
2882 Integration tests for BuilderSlave that verify how it works against a
2883 real slave server.
2884 """
2885
2886- layer = TwistedLayer
2887-
2888- def setUp(self):
2889- super(TestSlave, self).setUp()
2890- self.slave_helper = SlaveTestHelpers()
2891- self.slave_helper.setUp()
2892- self.addCleanup(self.slave_helper.cleanUp)
2893-
2894 # XXX: JonathanLange 2010-09-20 bug=643521: There are also tests for
2895 # BuilderSlave in buildd-slave.txt and in other places. The tests here
2896 # ought to become the canonical tests for BuilderSlave vs running buildd
2897 # XML-RPC server interaction.
2898
2899+ # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`.
2900+ TEST_URL = 'http://localhost:8221/rpc/'
2901+
2902+ def getServerSlave(self):
2903+ """Set up a test build slave server.
2904+
2905+ :return: A `BuilddSlaveTestSetup` object.
2906+ """
2907+ tachandler = BuilddSlaveTestSetup()
2908+ tachandler.setUp()
2909+ self.addCleanup(tachandler.tearDown)
2910+ def addLogFile(exc_info):
2911+ self.addDetail(
2912+ 'xmlrpc-log-file',
2913+ Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read()))
2914+ self.addOnException(addLogFile)
2915+ return tachandler
2916+
2917+ def getClientSlave(self):
2918+ """Return a `BuilderSlave` for use in testing.
2919+
2920+ Points to a fixed URL that is also used by `BuilddSlaveTestSetup`.
2921+ """
2922+ return BuilderSlave.makeBlockingSlave(self.TEST_URL, 'vmhost')
2923+
2924+ def makeCacheFile(self, tachandler, filename):
2925+ """Make a cache file available on the remote slave.
2926+
2927+ :param tachandler: The TacTestSetup object used to start the remote
2928+ slave.
2929+ :param filename: The name of the file to create in the file cache
2930+ area.
2931+ """
2932+ path = os.path.join(tachandler.root, 'filecache', filename)
2933+ fd = open(path, 'w')
2934+ fd.write('something')
2935+ fd.close()
2936+ self.addCleanup(os.unlink, path)
2937+
2938+ def triggerGoodBuild(self, slave, build_id=None):
2939+ """Trigger a good build on 'slave'.
2940+
2941+ :param slave: A `BuilderSlave` instance to trigger the build on.
2942+ :param build_id: The build identifier. If not specified, defaults to
2943+ an arbitrary string.
2944+ :type build_id: str
2945+ :return: The build id returned by the slave.
2946+ """
2947+ if build_id is None:
2948+ build_id = self.getUniqueString()
2949+ tachandler = self.getServerSlave()
2950+ chroot_file = 'fake-chroot'
2951+ dsc_file = 'thing'
2952+ self.makeCacheFile(tachandler, chroot_file)
2953+ self.makeCacheFile(tachandler, dsc_file)
2954+ return slave.build(
2955+ build_id, 'debian', chroot_file, {'.dsc': dsc_file},
2956+ {'ogrecomponent': 'main'})
2957+
2958 # XXX 2010-10-06 Julian bug=655559
2959 # This is failing on buildbot but not locally; it's trying to abort
2960 # before the build has started.
2961 def disabled_test_abort(self):
2962- slave = self.slave_helper.getClientSlave()
2963+ slave = self.getClientSlave()
2964 # We need to be in a BUILDING state before we can abort.
2965- d = self.slave_helper.triggerGoodBuild(slave)
2966- d.addCallback(lambda ignored: slave.abort())
2967- d.addCallback(self.assertEqual, BuilderStatus.ABORTING)
2968- return d
2969+ self.triggerGoodBuild(slave)
2970+ result = slave.abort()
2971+ self.assertEqual(result, BuilderStatus.ABORTING)
2972
2973 def test_build(self):
2974 # Calling 'build' with an expected builder type, a good build id,
2975 # valid chroot & filemaps works and returns a BuilderStatus of
2976 # BUILDING.
2977 build_id = 'some-id'
2978- slave = self.slave_helper.getClientSlave()
2979- d = self.slave_helper.triggerGoodBuild(slave, build_id)
2980- return d.addCallback(
2981- self.assertEqual, [BuilderStatus.BUILDING, build_id])
2982+ slave = self.getClientSlave()
2983+ result = self.triggerGoodBuild(slave, build_id)
2984+ self.assertEqual([BuilderStatus.BUILDING, build_id], result)
2985
2986 def test_clean(self):
2987- slave = self.slave_helper.getClientSlave()
2988+ slave = self.getClientSlave()
2989 # XXX: JonathanLange 2010-09-21: Calling clean() on the slave requires
2990 # it to be in either the WAITING or ABORTED states, and both of these
2991 # states are very difficult to achieve in a test environment. For the
2992@@ -811,248 +574,57 @@
2993 def test_echo(self):
2994 # Calling 'echo' contacts the server which returns the arguments we
2995 # gave it.
2996- self.slave_helper.getServerSlave()
2997- slave = self.slave_helper.getClientSlave()
2998- d = slave.echo('foo', 'bar', 42)
2999- return d.addCallback(self.assertEqual, ['foo', 'bar', 42])
3000+ self.getServerSlave()
3001+ slave = self.getClientSlave()
3002+ result = slave.echo('foo', 'bar', 42)
3003+ self.assertEqual(['foo', 'bar', 42], result)
3004
3005 def test_info(self):
3006 # Calling 'info' gets some information about the slave.
3007- self.slave_helper.getServerSlave()
3008- slave = self.slave_helper.getClientSlave()
3009- d = slave.info()
3010+ self.getServerSlave()
3011+ slave = self.getClientSlave()
3012+ result = slave.info()
3013 # We're testing the hard-coded values, since the version is hard-coded
3014 # into the remote slave, the supported build managers are hard-coded
3015 # into the tac file for the remote slave and config is returned from
3016 # the configuration file.
3017- return d.addCallback(
3018- self.assertEqual,
3019+ self.assertEqual(
3020 ['1.0',
3021 'i386',
3022 ['sourcepackagerecipe',
3023- 'translation-templates', 'binarypackage', 'debian']])
3024+ 'translation-templates', 'binarypackage', 'debian']],
3025+ result)
3026
3027 def test_initial_status(self):
3028 # Calling 'status' returns the current status of the slave. The
3029 # initial status is IDLE.
3030- self.slave_helper.getServerSlave()
3031- slave = self.slave_helper.getClientSlave()
3032- d = slave.status()
3033- return d.addCallback(self.assertEqual, [BuilderStatus.IDLE, ''])
3034+ self.getServerSlave()
3035+ slave = self.getClientSlave()
3036+ status = slave.status()
3037+ self.assertEqual([BuilderStatus.IDLE, ''], status)
3038
3039 def test_status_after_build(self):
3040 # Calling 'status' returns the current status of the slave. After a
3041 # build has been triggered, the status is BUILDING.
3042- slave = self.slave_helper.getClientSlave()
3043+ slave = self.getClientSlave()
3044 build_id = 'status-build-id'
3045- d = self.slave_helper.triggerGoodBuild(slave, build_id)
3046- d.addCallback(lambda ignored: slave.status())
3047- def check_status(status):
3048- self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2])
3049- [log_file] = status[2:]
3050- self.assertIsInstance(log_file, xmlrpclib.Binary)
3051- return d.addCallback(check_status)
3052+ self.triggerGoodBuild(slave, build_id)
3053+ status = slave.status()
3054+ self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2])
3055+ [log_file] = status[2:]
3056+ self.assertIsInstance(log_file, xmlrpclib.Binary)
3057
3058 def test_ensurepresent_not_there(self):
3059 # ensurepresent checks to see if a file is there.
3060- self.slave_helper.getServerSlave()
3061- slave = self.slave_helper.getClientSlave()
3062- d = slave.ensurepresent('blahblah', None, None, None)
3063- d.addCallback(self.assertEqual, [False, 'No URL'])
3064- return d
3065+ self.getServerSlave()
3066+ slave = self.getClientSlave()
3067+ result = slave.ensurepresent('blahblah', None, None, None)
3068+ self.assertEqual([False, 'No URL'], result)
3069
3070 def test_ensurepresent_actually_there(self):
3071 # ensurepresent checks to see if a file is there.
3072- tachandler = self.slave_helper.getServerSlave()
3073- slave = self.slave_helper.getClientSlave()
3074- self.slave_helper.makeCacheFile(tachandler, 'blahblah')
3075- d = slave.ensurepresent('blahblah', None, None, None)
3076- d.addCallback(self.assertEqual, [True, 'No URL'])
3077- return d
3078-
3079- def test_sendFileToSlave_not_there(self):
3080- self.slave_helper.getServerSlave()
3081- slave = self.slave_helper.getClientSlave()
3082- d = slave.sendFileToSlave('blahblah', None, None, None)
3083- return self.assertFailure(d, CannotFetchFile)
3084-
3085- def test_sendFileToSlave_actually_there(self):
3086- tachandler = self.slave_helper.getServerSlave()
3087- slave = self.slave_helper.getClientSlave()
3088- self.slave_helper.makeCacheFile(tachandler, 'blahblah')
3089- d = slave.sendFileToSlave('blahblah', None, None, None)
3090- def check_present(ignored):
3091- d = slave.ensurepresent('blahblah', None, None, None)
3092- return d.addCallback(self.assertEqual, [True, 'No URL'])
3093- d.addCallback(check_present)
3094- return d
3095-
3096- def test_resumeHost_success(self):
3097- # On a successful resume resume() fires the returned deferred
3098- # callback with 'None'.
3099- self.slave_helper.getServerSlave()
3100- slave = self.slave_helper.getClientSlave()
3101-
3102- # The configuration testing command-line.
3103- self.assertEqual(
3104- 'echo %(vm_host)s', config.builddmaster.vm_resume_command)
3105-
3106- # On success the response is None.
3107- def check_resume_success(response):
3108- out, err, code = response
3109- self.assertEqual(os.EX_OK, code)
3110- # XXX: JonathanLange 2010-09-23: We should instead pass the
3111- # expected vm_host into the client slave. Not doing this now,
3112- # since the SlaveHelper is being moved around.
3113- self.assertEqual("%s\n" % slave._vm_host, out)
3114- d = slave.resume()
3115- d.addBoth(check_resume_success)
3116- return d
3117-
3118- def test_resumeHost_failure(self):
3119- # On a failed resume, 'resumeHost' fires the returned deferred
3120- # errorback with the `ProcessTerminated` failure.
3121- self.slave_helper.getServerSlave()
3122- slave = self.slave_helper.getClientSlave()
3123-
3124- # Override the configuration command-line with one that will fail.
3125- failed_config = """
3126- [builddmaster]
3127- vm_resume_command: test "%(vm_host)s = 'no-sir'"
3128- """
3129- config.push('failed_resume_command', failed_config)
3130- self.addCleanup(config.pop, 'failed_resume_command')
3131-
3132- # On failures, the response is a twisted `Failure` object containing
3133- # a tuple.
3134- def check_resume_failure(failure):
3135- out, err, code = failure.value
3136- # The process will exit with a return code of "1".
3137- self.assertEqual(code, 1)
3138- d = slave.resume()
3139- d.addBoth(check_resume_failure)
3140- return d
3141-
3142- def test_resumeHost_timeout(self):
3143- # On a resume timeouts, 'resumeHost' fires the returned deferred
3144- # errorback with the `TimeoutError` failure.
3145- self.slave_helper.getServerSlave()
3146- slave = self.slave_helper.getClientSlave()
3147-
3148- # Override the configuration command-line with one that will timeout.
3149- timeout_config = """
3150- [builddmaster]
3151- vm_resume_command: sleep 5
3152- socket_timeout: 1
3153- """
3154- config.push('timeout_resume_command', timeout_config)
3155- self.addCleanup(config.pop, 'timeout_resume_command')
3156-
3157- # On timeouts, the response is a twisted `Failure` object containing
3158- # a `TimeoutError` error.
3159- def check_resume_timeout(failure):
3160- self.assertIsInstance(failure, Failure)
3161- out, err, code = failure.value
3162- self.assertEqual(code, signal.SIGKILL)
3163- clock = Clock()
3164- d = slave.resume(clock=clock)
3165- # Move the clock beyond the socket_timeout but earlier than the
3166- # sleep 5. This stops the test having to wait for the timeout.
3167- # Fast tests FTW!
3168- clock.advance(2)
3169- d.addBoth(check_resume_timeout)
3170- return d
3171-
3172-
3173-class TestSlaveTimeouts(TrialTestCase):
3174- # Testing that the methods that call callRemote() all time out
3175- # as required.
3176-
3177- layer = TwistedLayer
3178-
3179- def setUp(self):
3180- super(TestSlaveTimeouts, self).setUp()
3181- self.slave_helper = SlaveTestHelpers()
3182- self.slave_helper.setUp()
3183- self.addCleanup(self.slave_helper.cleanUp)
3184- self.clock = Clock()
3185- self.proxy = DeadProxy("url")
3186- self.slave = self.slave_helper.getClientSlave(
3187- reactor=self.clock, proxy=self.proxy)
3188-
3189- def assertCancelled(self, d):
3190- self.clock.advance(config.builddmaster.socket_timeout + 1)
3191- return self.assertFailure(d, CancelledError)
3192-
3193- def test_timeout_abort(self):
3194- return self.assertCancelled(self.slave.abort())
3195-
3196- def test_timeout_clean(self):
3197- return self.assertCancelled(self.slave.clean())
3198-
3199- def test_timeout_echo(self):
3200- return self.assertCancelled(self.slave.echo())
3201-
3202- def test_timeout_info(self):
3203- return self.assertCancelled(self.slave.info())
3204-
3205- def test_timeout_status(self):
3206- return self.assertCancelled(self.slave.status())
3207-
3208- def test_timeout_ensurepresent(self):
3209- return self.assertCancelled(
3210- self.slave.ensurepresent(None, None, None, None))
3211-
3212- def test_timeout_build(self):
3213- return self.assertCancelled(
3214- self.slave.build(None, None, None, None, None))
3215-
3216-
3217-class TestSlaveWithLibrarian(TrialTestCase):
3218- """Tests that need more of Launchpad to run."""
3219-
3220- layer = TwistedLaunchpadZopelessLayer
3221-
3222- def setUp(self):
3223- super(TestSlaveWithLibrarian, self)
3224- self.slave_helper = SlaveTestHelpers()
3225- self.slave_helper.setUp()
3226- self.addCleanup(self.slave_helper.cleanUp)
3227- self.factory = LaunchpadObjectFactory()
3228- login_as(ANONYMOUS)
3229- self.addCleanup(logout)
3230-
3231- def test_ensurepresent_librarian(self):
3232- # ensurepresent, when given an http URL for a file will download the
3233- # file from that URL and report that the file is present, and it was
3234- # downloaded.
3235-
3236- # Use the Librarian because it's a "convenient" web server.
3237- lf = self.factory.makeLibraryFileAlias(
3238- 'HelloWorld.txt', content="Hello World")
3239- self.layer.txn.commit()
3240- self.slave_helper.getServerSlave()
3241- slave = self.slave_helper.getClientSlave()
3242- d = slave.ensurepresent(
3243- lf.content.sha1, lf.http_url, "", "")
3244- d.addCallback(self.assertEqual, [True, 'Download'])
3245- return d
3246-
3247- def test_retrieve_files_from_filecache(self):
3248- # Files that are present on the slave can be downloaded with a
3249- # filename made from the sha1 of the content underneath the
3250- # 'filecache' directory.
3251- content = "Hello World"
3252- lf = self.factory.makeLibraryFileAlias(
3253- 'HelloWorld.txt', content=content)
3254- self.layer.txn.commit()
3255- expected_url = '%s/filecache/%s' % (
3256- self.slave_helper.BASE_URL, lf.content.sha1)
3257- self.slave_helper.getServerSlave()
3258- slave = self.slave_helper.getClientSlave()
3259- d = slave.ensurepresent(
3260- lf.content.sha1, lf.http_url, "", "")
3261- def check_file(ignored):
3262- d = getPage(expected_url.encode('utf8'))
3263- return d.addCallback(self.assertEqual, content)
3264- return d.addCallback(check_file)
3265+ tachandler = self.getServerSlave()
3266+ slave = self.getClientSlave()
3267+ self.makeCacheFile(tachandler, 'blahblah')
3268+ result = slave.ensurepresent('blahblah', None, None, None)
3269+ self.assertEqual([True, 'No URL'], result)
3270
3271=== modified file 'lib/lp/buildmaster/tests/test_manager.py'
3272--- lib/lp/buildmaster/tests/test_manager.py 2010-10-19 13:58:21 +0000
3273+++ lib/lp/buildmaster/tests/test_manager.py 2010-12-07 16:24:04 +0000
3274@@ -6,7 +6,6 @@
3275 import os
3276 import signal
3277 import time
3278-import xmlrpclib
3279
3280 import transaction
3281
3282@@ -15,7 +14,9 @@
3283 reactor,
3284 task,
3285 )
3286+from twisted.internet.error import ConnectionClosed
3287 from twisted.internet.task import (
3288+ Clock,
3289 deferLater,
3290 )
3291 from twisted.python.failure import Failure
3292@@ -29,45 +30,577 @@
3293 ANONYMOUS,
3294 login,
3295 )
3296-from canonical.launchpad.scripts.logger import (
3297- QuietFakeLogger,
3298- )
3299+from canonical.launchpad.scripts.logger import BufferLogger
3300 from canonical.testing.layers import (
3301 LaunchpadScriptLayer,
3302- TwistedLaunchpadZopelessLayer,
3303+ LaunchpadZopelessLayer,
3304 TwistedLayer,
3305- ZopelessDatabaseLayer,
3306 )
3307 from lp.buildmaster.enums import BuildStatus
3308 from lp.buildmaster.interfaces.builder import IBuilderSet
3309 from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
3310 from lp.buildmaster.manager import (
3311- assessFailureCounts,
3312+ BaseDispatchResult,
3313+ buildd_success_result_map,
3314 BuilddManager,
3315+ FailDispatchResult,
3316 NewBuildersScanner,
3317+ RecordingSlave,
3318+ ResetDispatchResult,
3319 SlaveScanner,
3320 )
3321-from lp.buildmaster.model.builder import Builder
3322 from lp.buildmaster.tests.harness import BuilddManagerTestSetup
3323-from lp.buildmaster.tests.mock_slaves import (
3324- BrokenSlave,
3325- BuildingSlave,
3326- OkSlave,
3327- )
3328+from lp.buildmaster.tests.mock_slaves import BuildingSlave
3329 from lp.registry.interfaces.distribution import IDistributionSet
3330 from lp.soyuz.interfaces.binarypackagebuild import IBinaryPackageBuildSet
3331-from lp.testing import TestCaseWithFactory
3332+from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
3333+from lp.testing import TestCase as LaunchpadTestCase
3334 from lp.testing.factory import LaunchpadObjectFactory
3335 from lp.testing.fakemethod import FakeMethod
3336 from lp.testing.sampledata import BOB_THE_BUILDER_NAME
3337
3338
3339+class TestRecordingSlaves(TrialTestCase):
3340+ """Tests for the recording slave class."""
3341+ layer = TwistedLayer
3342+
3343+ def setUp(self):
3344+ """Set up a fresh `RecordingSlave` for tests."""
3345+ TrialTestCase.setUp(self)
3346+ self.slave = RecordingSlave(
3347+ 'foo', 'http://foo:8221/rpc', 'foo.host')
3348+
3349+ def test_representation(self):
3350+ """`RecordingSlave` has a custom representation.
3351+
3352+ It encloses the builder name and XML-RPC URL for debugging purposes.
3353+ """
3354+ self.assertEqual('<foo:http://foo:8221/rpc>', repr(self.slave))
3355+
3356+ def assert_ensurepresent(self, func):
3357+ """Helper function to test results from calling ensurepresent."""
3358+ self.assertEqual(
3359+ [True, 'Download'],
3360+ func('boing', 'bar', 'baz'))
3361+ self.assertEqual(
3362+ [('ensurepresent', ('boing', 'bar', 'baz'))],
3363+ self.slave.calls)
3364+
3365+ def test_ensurepresent(self):
3366+ """`RecordingSlave.ensurepresent` always succeeds.
3367+
3368+ It returns the expected success code and records the interaction
3369+ information for later use.
3370+ """
3371+ self.assert_ensurepresent(self.slave.ensurepresent)
3372+
3373+ def test_sendFileToSlave(self):
3374+ """RecordingSlave.sendFileToSlave always succeeds.
3375+
3376+ It calls ensurepresent() and hence returns the same results.
3377+ """
3378+ self.assert_ensurepresent(self.slave.sendFileToSlave)
3379+
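# A hedged sketch of the record-and-succeed pattern these two tests pin
# down; this is an assumed shape, not the actual RecordingSlave class:

class RecordingSlaveSketch:
    def __init__(self):
        self.calls = []

    def ensurepresent(self, *args):
        # Record the interaction for later asynchronous replay, and
        # report immediate success, as the docstrings above describe.
        self.calls.append(('ensurepresent', args))
        return [True, 'Download']

    # sendFileToSlave delegates to ensurepresent, so it records the
    # same call and returns the same result.
    sendFileToSlave = ensurepresent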
3380+ def test_build(self):
3381+ """`RecordingSlave.build` always succeeds.
3382+
3383+ It returns the expected success code and records the interaction
3384+ information for later use.
3385+ """
3386+ self.assertEqual(
3387+ ['BuilderStatus.BUILDING', 'boing'],
3388+ self.slave.build('boing', 'bar', 'baz'))
3389+ self.assertEqual(
3390+ [('build', ('boing', 'bar', 'baz'))],
3391+ self.slave.calls)
3392+
3393+ def test_resume(self):
3394+ """`RecordingSlave.resume` always returns success."""
3395+ # Resume isn't requested in a just-instantiated RecordingSlave.
3396+ self.assertFalse(self.slave.resume_requested)
3397+
3398+ # When resume is called, it returns the success list and marks
3399+ # the slave for resuming.
3400+ self.assertEqual(['', '', os.EX_OK], self.slave.resume())
3401+ self.assertTrue(self.slave.resume_requested)
3402+
3403+ def test_resumeHost_success(self):
3404+ # On a successful resume, resumeHost() fires the returned deferred
3405+ # callback with 'None'.
3406+
3407+ # The configuration testing command-line.
3408+ self.assertEqual(
3409+ 'echo %(vm_host)s', config.builddmaster.vm_resume_command)
3410+
3411+ # On success the response is None.
3412+ def check_resume_success(response):
3413+ out, err, code = response
3414+ self.assertEqual(os.EX_OK, code)
3415+ self.assertEqual("%s\n" % self.slave.vm_host, out)
3416+ d = self.slave.resumeSlave()
3417+ d.addBoth(check_resume_success)
3418+ return d
3419+
3420+ def test_resumeHost_failure(self):
3421+ # On a failed resume, 'resumeHost' fires the returned deferred
3422+ # errorback with the `ProcessTerminated` failure.
3423+
3424+ # Override the configuration command-line with one that will fail.
3425+ failed_config = """
3426+ [builddmaster]
3427+ vm_resume_command: test "%(vm_host)s = 'no-sir'"
3428+ """
3429+ config.push('failed_resume_command', failed_config)
3430+ self.addCleanup(config.pop, 'failed_resume_command')
3431+
3432+ # On failures, the response is a twisted `Failure` object containing
3433+ # a tuple.
3434+ def check_resume_failure(failure):
3435+ out, err, code = failure.value
3436+ # The process will exit with a return code of "1".
3437+ self.assertEqual(code, 1)
3438+ d = self.slave.resumeSlave()
3439+ d.addBoth(check_resume_failure)
3440+ return d
3441+
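# The configuration-override idiom used by this test, in isolation; a
# sketch of lines that would sit inside a test method, assuming the same
# 'config' object this module already imports:

overlay = """
[builddmaster]
vm_resume_command: /bin/true
"""
config.push('example_overlay', overlay)         # layer the override on top
self.addCleanup(config.pop, 'example_overlay')  # pop it again on teardown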
3442+ def test_resumeHost_timeout(self):
3443+ # On a resume timeout, 'resumeHost' fires the returned deferred
3444+ # errorback with the `TimeoutError` failure.
3445+
3446+ # Override the configuration command-line with one that will timeout.
3447+ timeout_config = """
3448+ [builddmaster]
3449+ vm_resume_command: sleep 5
3450+ socket_timeout: 1
3451+ """
3452+ config.push('timeout_resume_command', timeout_config)
3453+ self.addCleanup(config.pop, 'timeout_resume_command')
3454+
3455+ # On timeouts, the response is a twisted `Failure` object containing
3456+ # a `TimeoutError` error.
3457+ def check_resume_timeout(failure):
3458+ self.assertIsInstance(failure, Failure)
3459+ out, err, code = failure.value
3460+ self.assertEqual(code, signal.SIGKILL)
3461+ clock = Clock()
3462+ d = self.slave.resumeSlave(clock=clock)
3463+ # Move the clock beyond the socket_timeout but earlier than the
3464+ # sleep 5. This stops the test having to wait for the timeout.
3465+ # Fast tests FTW!
3466+ clock.advance(2)
3467+ d.addBoth(check_resume_timeout)
3468+ return d
3469+
3470+
3471+class TestingXMLRPCProxy:
3472+ """This class mimics a twisted XMLRPC Proxy class."""
3473+
3474+ def __init__(self, failure_info=None):
3475+ self.calls = []
3476+ self.failure_info = failure_info
3477+ self.works = failure_info is None
3478+
3479+ def callRemote(self, *args):
3480+ self.calls.append(args)
3481+ if self.works:
3482+ result = buildd_success_result_map.get(args[0])
3483+ else:
3484+ result = 'boing'
3485+ return defer.succeed([result, self.failure_info])
3486+
3487+
3488+class TestingResetDispatchResult(ResetDispatchResult):
3489+ """Override the evaluation method to simply annotate the call."""
3490+
3491+ def __init__(self, slave, info=None):
3492+ ResetDispatchResult.__init__(self, slave, info)
3493+ self.processed = False
3494+
3495+ def __call__(self):
3496+ self.processed = True
3497+
3498+
3499+class TestingFailDispatchResult(FailDispatchResult):
3500+ """Override the evaluation method to simply annotate the call."""
3501+
3502+ def __init__(self, slave, info=None):
3503+ FailDispatchResult.__init__(self, slave, info)
3504+ self.processed = False
3505+
3506+ def __call__(self):
3507+ self.processed = True
3508+
3509+
3510+class TestingSlaveScanner(SlaveScanner):
3511+ """Override the dispatch result factories."""
3512+
3513+ reset_result = TestingResetDispatchResult
3514+ fail_result = TestingFailDispatchResult
3515+
3516+
3517+class TestSlaveScanner(TrialTestCase):
3518+ """Tests for the actual build slave manager."""
3519+ layer = LaunchpadZopelessLayer
3520+
3521+ def setUp(self):
3522+ TrialTestCase.setUp(self)
3523+ self.manager = TestingSlaveScanner(
3524+ BOB_THE_BUILDER_NAME, BufferLogger())
3525+
3526+ self.fake_builder_url = 'http://bob.buildd:8221/'
3527+ self.fake_builder_host = 'bob.host'
3528+
3529+ # We will use an instrumented SlaveScanner instance for tests in
3530+ # this context.
3531+
3532+ # Stop cyclic execution and record the end of the cycle.
3533+ self.stopped = False
3534+
3535+ def testNextCycle():
3536+ self.stopped = True
3537+
3538+ self.manager.scheduleNextScanCycle = testNextCycle
3539+
3540+ # Return the testing Proxy version.
3541+ self.test_proxy = TestingXMLRPCProxy()
3542+
3543+ def testGetProxyForSlave(slave):
3544+ return self.test_proxy
3545+ self.manager._getProxyForSlave = testGetProxyForSlave
3546+
3547+ # Deactivate the 'scan' method.
3548+ def testScan():
3549+ pass
3550+ self.manager.scan = testScan
3551+
3552+ # Stop automatic collection of dispatching results.
3553+ def testslaveConversationEnded():
3554+ pass
3555+ self._realslaveConversationEnded = self.manager.slaveConversationEnded
3556+ self.manager.slaveConversationEnded = testslaveConversationEnded
3557+
3558+ def assertIsDispatchReset(self, result):
3559+ self.assertTrue(
3560+ isinstance(result, TestingResetDispatchResult),
3561+ 'Dispatch failure did not result in a ResetBuildResult object')
3562+
3563+ def assertIsDispatchFail(self, result):
3564+ self.assertTrue(
3565+ isinstance(result, TestingFailDispatchResult),
3566+ 'Dispatch failure did not result in a FailBuildResult object')
3567+
3568+ def test_checkResume(self):
3569+ """`SlaveScanner.checkResume` is chained after resume requests.
3570+
3571+ If the resume request succeeds it returns None; otherwise it returns
3572+ a `ResetBuildResult` (the one in the test context) that will be
3573+ collected and evaluated later.
3574+
3575+ See `RecordingSlave.resumeHost` for more information about the resume
3576+ result contents.
3577+ """
3578+ slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
3579+
3580+ successful_response = ['', '', os.EX_OK]
3581+ result = self.manager.checkResume(successful_response, slave)
3582+ self.assertEqual(
3583+ None, result, 'Successful resume checks should return None')
3584+
3585+ failed_response = ['stdout', 'stderr', 1]
3586+ result = self.manager.checkResume(failed_response, slave)
3587+ self.assertIsDispatchReset(result)
3588+ self.assertEqual(
3589+ '<foo:http://foo.buildd:8221/> reset failure', repr(result))
3590+ self.assertEqual(
3591+ result.info, "stdout\nstderr")
3592+
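# A hedged sketch of the resume check exercised above; the names follow
# the test, but this is an assumed shape, not the real SlaveScanner code:
import os

def check_resume(scanner, response, slave):
    out, err, code = response
    if code == os.EX_OK:
        return None  # successful resume: nothing to collect later
    # A failed resume wraps stdout/stderr in a reset result that the
    # manager collects and evaluates at the end of the cycle.
    return scanner.reset_result(slave, '%s\n%s' % (out, err))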
3593+ def test_fail_to_resume_slave_resets_slave(self):
3594+ # If an attempt to resume and dispatch a slave fails, we reset the
3595+ # slave by calling self.reset_result(slave)().
3596+
3597+ reset_result_calls = []
3598+
3599+ class LoggingResetResult(BaseDispatchResult):
3600+ """A DispatchResult that logs calls to itself.
3601+
3602+ This *must* subclass BaseDispatchResult, otherwise finishCycle()
3603+ won't treat it like a dispatch result.
3604+ """
3605+
3606+ def __init__(self, slave, info=None):
3607+ self.slave = slave
3608+
3609+ def __call__(self):
3610+ reset_result_calls.append(self.slave)
3611+
3612+ # Make a failing slave that is requesting a resume.
3613+ slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
3614+ slave.resume_requested = True
3615+ slave.resumeSlave = lambda: deferLater(
3616+ reactor, 0, defer.fail, Failure(('out', 'err', 1)))
3617+
3618+ # Make the manager log the reset result calls.
3619+ self.manager.reset_result = LoggingResetResult
3620+
3621+ # We only care about this one slave. Reset the list of manager
3622+ # deferreds in case setUp did something unexpected.
3623+ self.manager._deferred_list = []
3624+
3625+ # Here, we're patching the slaveConversationEnded method so we can
3626+ # get an extra callback at the end of it, so we can
3627+ # verify that the reset_result was really called.
3628+ def _slaveConversationEnded():
3629+ d = self._realslaveConversationEnded()
3630+ return d.addCallback(
3631+ lambda ignored: self.assertEqual([slave], reset_result_calls))
3632+ self.manager.slaveConversationEnded = _slaveConversationEnded
3633+
3634+ self.manager.resumeAndDispatch(slave)
3635+
3636+ def test_failed_to_resume_slave_ready_for_reset(self):
3637+ # When a slave fails to resume, the manager has a Deferred in its
3638+ # Deferred list that is ready to fire with a ResetDispatchResult.
3639+
3640+ # Make a failing slave that is requesting a resume.
3641+ slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
3642+ slave.resume_requested = True
3643+ slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1)))
3644+
3645+ # We only care about this one slave. Reset the list of manager
3646+ # deferreds in case setUp did something unexpected.
3647+ self.manager._deferred_list = []
3648+ # Restore the slaveConversationEnded method. It's very relevant to
3649+ # this test.
3650+ self.manager.slaveConversationEnded = self._realslaveConversationEnded
3651+ self.manager.resumeAndDispatch(slave)
3652+ [d] = self.manager._deferred_list
3653+
3654+ # The Deferred for our failing slave should be ready to fire
3655+ # successfully with a ResetDispatchResult.
3656+ def check_result(result):
3657+ self.assertIsInstance(result, ResetDispatchResult)
3658+ self.assertEqual(slave, result.slave)
3659+ self.assertFalse(result.processed)
3660+ return d.addCallback(check_result)
3661+
3662+ def _setUpSlaveAndBuilder(self, builder_failure_count=None,
3663+ job_failure_count=None):
3664+ # Helper function to set up a builder and its recording slave.
3665+ if builder_failure_count is None:
3666+ builder_failure_count = 0
3667+ if job_failure_count is None:
3668+ job_failure_count = 0
3669+ slave = RecordingSlave(
3670+ BOB_THE_BUILDER_NAME, self.fake_builder_url,
3671+ self.fake_builder_host)
3672+ bob_builder = getUtility(IBuilderSet)[slave.name]
3673+ bob_builder.failure_count = builder_failure_count
3674+ bob_builder.getCurrentBuildFarmJob().failure_count = job_failure_count
3675+ return slave, bob_builder
3676+
3677+ def test_checkDispatch_success(self):
3678+ # SlaveScanner.checkDispatch returns None for a successful
3679+ # dispatch.
3680+
3681+ """
3682+ If the dispatch request fails or an unknown method is given, it
3683+ returns a `FailDispatchResult` (in the test context) that will
3684+ be evaluated later.
3685+
3686+ Builders will be marked as failed if the following response
3687+ categories are received.
3688+
3689+ * Legitimate slave failures: when the response is a list with 2
3690+ elements but the first element ('status') does not correspond to
3691+ the expected 'success' result. See `buildd_success_result_map`.
3692+
3693+ * Unexpected (code) failures: when the given 'method' is unknown
3694+ or the response isn't a 2-element list or Failure instance.
3695+
3696+ Communication failures (a twisted `Failure` instance) will simply
3697+ cause the builder to be reset, and a `ResetDispatchResult` object is
3698+ returned. In other words, network failures are ignored in this
3699+ stage, broken builders will be identified and marked as such
3700+ during the 'scan()' stage.
3701+
3702+ On successful dispatch it returns None.
3703+ """
3704+ slave, bob_builder = self._setUpSlaveAndBuilder(
3705+ builder_failure_count=0, job_failure_count=0)
3706+
3707+ # Successful legitimate response, None is returned.
3708+ successful_response = [
3709+ buildd_success_result_map.get('ensurepresent'), 'cool builder']
3710+ result = self.manager.checkDispatch(
3711+ successful_response, 'ensurepresent', slave)
3712+ self.assertEqual(
3713+ None, result, 'Successful dispatch checks should return None')
3714+
3715+ def test_checkDispatch_first_fail(self):
3716+ # Failed legitimate response, results in FailDispatchResult and
3717+ # failure_count on the job and the builder are both incremented.
3718+ slave, bob_builder = self._setUpSlaveAndBuilder(
3719+ builder_failure_count=0, job_failure_count=0)
3720+
3721+ failed_response = [False, 'uncool builder']
3722+ result = self.manager.checkDispatch(
3723+ failed_response, 'ensurepresent', slave)
3724+ self.assertIsDispatchFail(result)
3725+ self.assertEqual(
3726+ repr(result),
3727+ '<bob:%s> failure (uncool builder)' % self.fake_builder_url)
3728+ self.assertEqual(1, bob_builder.failure_count)
3729+ self.assertEqual(
3730+ 1, bob_builder.getCurrentBuildFarmJob().failure_count)
3731+
3732+ def test_checkDispatch_second_reset_fail_by_builder(self):
3733+ # Twisted Failure response, results in a `FailDispatchResult`.
3734+ slave, bob_builder = self._setUpSlaveAndBuilder(
3735+ builder_failure_count=1, job_failure_count=0)
3736+
3737+ twisted_failure = Failure(ConnectionClosed('Boom!'))
3738+ result = self.manager.checkDispatch(
3739+ twisted_failure, 'ensurepresent', slave)
3740+ self.assertIsDispatchFail(result)
3741+ self.assertEqual(
3742+ '<bob:%s> failure (None)' % self.fake_builder_url, repr(result))
3743+ self.assertEqual(2, bob_builder.failure_count)
3744+ self.assertEqual(
3745+ 1, bob_builder.getCurrentBuildFarmJob().failure_count)
3746+
3747+ def test_checkDispatch_second_comms_fail_by_builder(self):
3748+ # Unexpected response, results in a `FailDispatchResult`.
3749+ slave, bob_builder = self._setUpSlaveAndBuilder(
3750+ builder_failure_count=1, job_failure_count=0)
3751+
3752+ unexpected_response = [1, 2, 3]
3753+ result = self.manager.checkDispatch(
3754+ unexpected_response, 'build', slave)
3755+ self.assertIsDispatchFail(result)
3756+ self.assertEqual(
3757+ '<bob:%s> failure '
3758+ '(Unexpected response: [1, 2, 3])' % self.fake_builder_url,
3759+ repr(result))
3760+ self.assertEqual(2, bob_builder.failure_count)
3761+ self.assertEqual(
3762+ 1, bob_builder.getCurrentBuildFarmJob().failure_count)
3763+
3764+ def test_checkDispatch_second_comms_fail_by_job(self):
3765+ # Unknown method was given, results in a `FailDispatchResult`.
3766+ # This could be caused by a faulty job which would fail the job.
3767+ slave, bob_builder = self._setUpSlaveAndBuilder(
3768+ builder_failure_count=0, job_failure_count=1)
3769+
3770+ successful_response = [
3771+ buildd_success_result_map.get('ensurepresent'), 'cool builder']
3772+ result = self.manager.checkDispatch(
3773+ successful_response, 'unknown-method', slave)
3774+ self.assertIsDispatchFail(result)
3775+ self.assertEqual(
3776+ '<bob:%s> failure '
3777+ '(Unknown slave method: unknown-method)' % self.fake_builder_url,
3778+ repr(result))
3779+ self.assertEqual(1, bob_builder.failure_count)
3780+ self.assertEqual(
3781+ 2, bob_builder.getCurrentBuildFarmJob().failure_count)
3782+
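# Condensing the four checkDispatch tests above into a hedged sketch
# (an assumed shape; the failure counting and the guard for malformed
# responses such as [1, 2, 3] are elided here):
from twisted.python.failure import Failure

def check_dispatch(scanner, response, method, slave):
    if isinstance(response, Failure):
        return scanner.fail_result(slave)  # communication failure
    if method not in buildd_success_result_map:
        return scanner.fail_result(
            slave, 'Unknown slave method: %s' % method)
    status, info = response
    if status == buildd_success_result_map[method]:
        return None  # legitimate success: nothing to collect
    return scanner.fail_result(slave, info)  # legitimate slave failure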
3783+ def test_initiateDispatch(self):
3784+ """Check `initiateDispatch` in various scenarios.
3785+
3786+ When there are no recording slaves (i.e. no build got dispatched
3787+ in scan()) it simply finishes the cycle.
3788+
3789+ When there is a recording slave with pending slave calls, they are
3790+ performed and if they all succeed the cycle is finished with no
3791+ errors.
3792+
3793+ On slave call failure the chain is stopped immediately and a
3794+ FailDispatchResult is collected while finishing the cycle.
3795+ """
3796+ def check_no_events(results):
3797+ errors = [
3798+ r for s, r in results if isinstance(r, BaseDispatchResult)]
3799+ self.assertEqual(0, len(errors))
3800+
3801+ def check_events(results):
3802+ [error] = [r for s, r in results if r is not None]
3803+ self.assertEqual(
3804+ '<bob:%s> failure (very broken slave)'
3805+ % self.fake_builder_url,
3806+ repr(error))
3807+ self.assertTrue(error.processed)
3808+
3809+ def _wait_on_deferreds_then_check_no_events():
3810+ dl = self._realslaveConversationEnded()
3811+ dl.addCallback(check_no_events)
3812+
3813+ def _wait_on_deferreds_then_check_events():
3814+ dl = self._realslaveConversationEnded()
3815+ dl.addCallback(check_events)
3816+
3817+ # A functional slave charged with some interactions.
3818+ slave = RecordingSlave(
3819+ BOB_THE_BUILDER_NAME, self.fake_builder_url,
3820+ self.fake_builder_host)
3821+ slave.ensurepresent('arg1', 'arg2', 'arg3')
3822+ slave.build('arg1', 'arg2', 'arg3')
3823+
3824+ # If the previous step (resuming) has failed nothing gets dispatched.
3825+ reset_result = ResetDispatchResult(slave)
3826+ result = self.manager.initiateDispatch(reset_result, slave)
3827+ self.assertTrue(result is reset_result)
3828+ self.assertFalse(slave.resume_requested)
3829+ self.assertEqual(0, len(self.manager._deferred_list))
3830+
3831+ # Operation with the default (functional) slave; no reset or
3832+ # failure results are triggered.
3833+ slave.resume()
3834+ result = self.manager.initiateDispatch(None, slave)
3835+ self.assertEqual(None, result)
3836+ self.assertTrue(slave.resume_requested)
3837+ self.assertEqual(
3838+ [('ensurepresent', 'arg1', 'arg2', 'arg3'),
3839+ ('build', 'arg1', 'arg2', 'arg3')],
3840+ self.test_proxy.calls)
3841+ self.assertEqual(2, len(self.manager._deferred_list))
3842+
3843+ # Monkey patch the slaveConversationEnded method so we can chain a
3844+ # callback to check the end of the result chain.
3845+ self.manager.slaveConversationEnded = \
3846+ _wait_on_deferreds_then_check_no_events
3847+ events = self.manager.slaveConversationEnded()
3848+
3849+ # Create a broken slave and insert interaction that will
3850+ # cause the builder to be marked as fail.
3851+ self.test_proxy = TestingXMLRPCProxy('very broken slave')
3852+ slave = RecordingSlave(
3853+ BOB_THE_BUILDER_NAME, self.fake_builder_url,
3854+ self.fake_builder_host)
3855+ slave.ensurepresent('arg1', 'arg2', 'arg3')
3856+ slave.build('arg1', 'arg2', 'arg3')
3857+
3858+ result = self.manager.initiateDispatch(None, slave)
3859+ self.assertEqual(None, result)
3860+ self.assertEqual(3, len(self.manager._deferred_list))
3861+ self.assertEqual(
3862+ [('ensurepresent', 'arg1', 'arg2', 'arg3')],
3863+ self.test_proxy.calls)
3864+
3865+ # Monkey patch the slaveConversationEnded method so we can chain a
3866+ # callback to check the end of the result chain.
3867+ self.manager.slaveConversationEnded = \
3868+ _wait_on_deferreds_then_check_events
3869+ events = self.manager.slaveConversationEnded()
3870+
3871+ return events
3872+
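# The three scenarios from the docstring above, condensed into a hedged
# sketch; '_replayCalls' is a stand-in name, not the real helper:

def initiate_dispatch(scanner, resume_result, slave):
    if resume_result is not None:
        # Resuming failed: dispatch nothing; the reset result will be
        # collected and evaluated with the other dispatch results.
        return resume_result
    # Replay the recorded slave calls asynchronously; a failure anywhere
    # in the chain stops it and queues a FailDispatchResult instead.
    scanner._replayCalls(slave, slave.calls)
    return None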
3873+
3874 class TestSlaveScannerScan(TrialTestCase):
3875 """Tests `SlaveScanner.scan` method.
3876
3877 This method uses the old framework for scanning and dispatching builds.
3878 """
3879- layer = TwistedLaunchpadZopelessLayer
3880+ layer = LaunchpadZopelessLayer
3881
3882 def setUp(self):
3883 """Setup TwistedLayer, TrialTestCase and BuilddSlaveTest.
3884@@ -75,18 +608,19 @@
3885 Also adjust the sampledata in a way a build can be dispatched to
3886 'bob' builder.
3887 """
3888- from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
3889 TwistedLayer.testSetUp()
3890 TrialTestCase.setUp(self)
3891 self.slave = BuilddSlaveTestSetup()
3892 self.slave.setUp()
3893
3894 # Creating the required chroots needed for dispatching.
3895+ login('foo.bar@canonical.com')
3896 test_publisher = SoyuzTestPublisher()
3897 ubuntu = getUtility(IDistributionSet).getByName('ubuntu')
3898 hoary = ubuntu.getSeries('hoary')
3899 test_publisher.setUpDefaultDistroSeries(hoary)
3900 test_publisher.addFakeChroots()
3901+ login(ANONYMOUS)
3902
3903 def tearDown(self):
3904 self.slave.tearDown()
3905@@ -94,7 +628,8 @@
3906 TwistedLayer.testTearDown()
3907
3908 def _resetBuilder(self, builder):
3909- """Reset the given builder and its job."""
3910+ """Reset the given builder and its job."""
3911+ login('foo.bar@canonical.com')
3912
3913 builder.builderok = True
3914 job = builder.currentjob
3915@@ -102,6 +637,7 @@
3916 job.reset()
3917
3918 transaction.commit()
3919+ login(ANONYMOUS)
3920
3921 def assertBuildingJob(self, job, builder, logtail=None):
3922 """Assert the given job is building on the given builder."""
3923@@ -117,25 +653,55 @@
3924 self.assertEqual(build.status, BuildStatus.BUILDING)
3925 self.assertEqual(job.logtail, logtail)
3926
3927- def _getScanner(self, builder_name=None):
3928+ def _getManager(self):
3929 """Instantiate a SlaveScanner object.
3930
3931 Replace its default logging handler by a testing version.
3932 """
3933- if builder_name is None:
3934- builder_name = BOB_THE_BUILDER_NAME
3935- scanner = SlaveScanner(builder_name, QuietFakeLogger())
3936- scanner.logger.name = 'slave-scanner'
3937+ manager = SlaveScanner(BOB_THE_BUILDER_NAME, BufferLogger())
3938+ manager.logger.name = 'slave-scanner'
3939
3940- return scanner
3941+ return manager
3942
3943 def _checkDispatch(self, slave, builder):
3944- # SlaveScanner.scan returns a slave when a dispatch was
3945- # successful. We also check that the builder has a job on it.
3946-
3947- self.assertTrue(slave is not None, "Expected a slave.")
3948+ """`SlaveScanner.scan` returns a `RecordingSlave`.
3949+
3950+ The single slave returned should match the given builder and
3951+ contain interactions that should be performed asynchronously for
3952+ properly dispatching the sampledata job.
3953+ """
3954+ self.assertFalse(
3955+ slave is None, "Expected a recording slave.")
3956+
3957+ self.assertEqual(slave.name, builder.name)
3958+ self.assertEqual(slave.url, builder.url)
3959+ self.assertEqual(slave.vm_host, builder.vm_host)
3960 self.assertEqual(0, builder.failure_count)
3961- self.assertTrue(builder.currentjob is not None)
3962+
3963+ self.assertEqual(
3964+ [('ensurepresent',
3965+ ('0feca720e2c29dafb2c900713ba560e03b758711',
3966+ 'http://localhost:58000/93/fake_chroot.tar.gz',
3967+ '', '')),
3968+ ('ensurepresent',
3969+ ('4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33',
3970+ 'http://localhost:58000/43/alsa-utils_1.0.9a-4ubuntu1.dsc',
3971+ '', '')),
3972+ ('build',
3973+ ('6358a89e2215e19b02bf91e2e4d009640fae5cf8',
3974+ 'binarypackage', '0feca720e2c29dafb2c900713ba560e03b758711',
3975+ {'alsa-utils_1.0.9a-4ubuntu1.dsc':
3976+ '4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33'},
3977+ {'arch_indep': True,
3978+ 'arch_tag': 'i386',
3979+ 'archive_private': False,
3980+ 'archive_purpose': 'PRIMARY',
3981+ 'archives':
3982+ ['deb http://ftpmaster.internal/ubuntu hoary main'],
3983+ 'build_debug_symbols': False,
3984+ 'ogrecomponent': 'main',
3985+ 'suite': u'hoary'}))],
3986+ slave.calls, "Job was not properly dispatched.")
3987
3988 def testScanDispatchForResetBuilder(self):
3989 # A job gets dispatched to the sampledata builder after it's reset.
3990@@ -143,27 +709,26 @@
3991 # Reset sampledata builder.
3992 builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
3993 self._resetBuilder(builder)
3994- builder.setSlaveForTesting(OkSlave())
3995 # Set this to 1 here so that _checkDispatch can make sure it's
3996 # reset to 0 after a successful dispatch.
3997 builder.failure_count = 1
3998
3999 # Run 'scan' and check its result.
4000- self.layer.txn.commit()
4001- self.layer.switchDbUser(config.builddmaster.dbuser)
4002- scanner = self._getScanner()
4003- d = defer.maybeDeferred(scanner.scan)
4004+ LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
4005+ manager = self._getManager()
4006+ d = defer.maybeDeferred(manager.scan)
4007 d.addCallback(self._checkDispatch, builder)
4008 return d
4009
4010- def _checkNoDispatch(self, slave, builder):
4011+ def _checkNoDispatch(self, recording_slave, builder):
4012 """Assert that no dispatch has occurred.
4013
4014- 'slave' is None, so no interations would be passed
4015+ 'recording_slave' is None, so no interactions would be passed
4016 to the asynchronous dispatcher and the builder remained active
4017 and IDLE.
4018 """
4019- self.assertTrue(slave is None, "Unexpected slave.")
4020+ self.assertTrue(
4021+ recording_slave is None, "Unexpected recording_slave.")
4022
4023 builder = getUtility(IBuilderSet).get(builder.id)
4024 self.assertTrue(builder.builderok)
4025@@ -188,9 +753,9 @@
4026 login(ANONYMOUS)
4027
4028 # Run 'scan' and check its result.
4029- self.layer.switchDbUser(config.builddmaster.dbuser)
4030- scanner = self._getScanner()
4031- d = defer.maybeDeferred(scanner.singleCycle)
4032+ LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
4033+ manager = self._getManager()
4034+ d = defer.maybeDeferred(manager.scan)
4035 d.addCallback(self._checkNoDispatch, builder)
4036 return d
4037
4038@@ -228,9 +793,9 @@
4039 login(ANONYMOUS)
4040
4041 # Run 'scan' and check its result.
4042- self.layer.switchDbUser(config.builddmaster.dbuser)
4043- scanner = self._getScanner()
4044- d = defer.maybeDeferred(scanner.scan)
4045+ LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
4046+ manager = self._getManager()
4047+ d = defer.maybeDeferred(manager.scan)
4048 d.addCallback(self._checkJobRescued, builder, job)
4049 return d
4050
4051@@ -249,6 +814,8 @@
4052 self.assertBuildingJob(job, builder, logtail='This is a build log')
4053
4054 def testScanUpdatesBuildingJobs(self):
4055+ # The job assigned to a broken builder is rescued.
4056+
4057 # Enable sampledata builder attached to an appropriate testing
4058 # slave. It will respond as if it was building the sampledata job.
4059 builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
4060@@ -263,174 +830,188 @@
4061 self.assertBuildingJob(job, builder)
4062
4063 # Run 'scan' and check its result.
4064- self.layer.switchDbUser(config.builddmaster.dbuser)
4065- scanner = self._getScanner()
4066- d = defer.maybeDeferred(scanner.scan)
4067+ LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser)
4068+ manager = self._getManager()
4069+ d = defer.maybeDeferred(manager.scan)
4070 d.addCallback(self._checkJobUpdated, builder, job)
4071 return d
4072
4073- def test_scan_with_nothing_to_dispatch(self):
4074- factory = LaunchpadObjectFactory()
4075- builder = factory.makeBuilder()
4076- builder.setSlaveForTesting(OkSlave())
4077- scanner = self._getScanner(builder_name=builder.name)
4078- d = scanner.scan()
4079- return d.addCallback(self._checkNoDispatch, builder)
4080-
4081- def test_scan_with_manual_builder(self):
4082- # Reset sampledata builder.
4083- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
4084- self._resetBuilder(builder)
4085- builder.setSlaveForTesting(OkSlave())
4086- builder.manual = True
4087- scanner = self._getScanner()
4088- d = scanner.scan()
4089- d.addCallback(self._checkNoDispatch, builder)
4090- return d
4091-
4092- def test_scan_with_not_ok_builder(self):
4093- # Reset sampledata builder.
4094- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
4095- self._resetBuilder(builder)
4096- builder.setSlaveForTesting(OkSlave())
4097- builder.builderok = False
4098- scanner = self._getScanner()
4099- d = scanner.scan()
4100- # Because the builder is not ok, we can't use _checkNoDispatch.
4101- d.addCallback(
4102- lambda ignored: self.assertIdentical(None, builder.currentjob))
4103- return d
4104-
4105- def test_scan_of_broken_slave(self):
4106- builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
4107- self._resetBuilder(builder)
4108- builder.setSlaveForTesting(BrokenSlave())
4109- builder.failure_count = 0
4110- scanner = self._getScanner(builder_name=builder.name)
4111- d = scanner.scan()
4112- return self.assertFailure(d, xmlrpclib.Fault)
4113-
4114- def _assertFailureCounting(self, builder_count, job_count,
4115- expected_builder_count, expected_job_count):
4116+ def test_scan_assesses_failure_exceptions(self):
4117 # If scan() fails with an exception, failure_counts should be
4118- # incremented. What we do with the results of the failure
4119- # counts is tested below separately, this test just makes sure that
4120- # scan() is setting the counts.
4121+ # incremented and tested.
4122 def failing_scan():
4123- return defer.fail(Exception("fake exception"))
4124- scanner = self._getScanner()
4125- scanner.scan = failing_scan
4126+ raise Exception("fake exception")
4127+ manager = self._getManager()
4128+ manager.scan = failing_scan
4129+ manager.scheduleNextScanCycle = FakeMethod()
4130 from lp.buildmaster import manager as manager_module
4131 self.patch(manager_module, 'assessFailureCounts', FakeMethod())
4132- builder = getUtility(IBuilderSet)[scanner.builder_name]
4133-
4134- builder.failure_count = builder_count
4135- builder.currentjob.specific_job.build.failure_count = job_count
4136- # The _scanFailed() calls abort, so make sure our existing
4137- # failure counts are persisted.
4138- self.layer.txn.commit()
4139-
4140- # singleCycle() calls scan() which is our fake one that throws an
4141+ builder = getUtility(IBuilderSet)[manager.builder_name]
4142+
4143+ # Failure counts start at zero.
4144+ self.assertEqual(0, builder.failure_count)
4145+ self.assertEqual(
4146+ 0, builder.currentjob.specific_job.build.failure_count)
4147+
4148+ # startCycle() calls scan() which is our fake one that throws an
4149 # exception.
4150- d = scanner.singleCycle()
4151+ manager.startCycle()
4152
4153 # Failure counts should be updated, and the assessment method
4154- # should have been called. The actual behaviour is tested below
4155- # in TestFailureAssessments.
4156- def got_scan(ignored):
4157- self.assertEqual(expected_builder_count, builder.failure_count)
4158- self.assertEqual(
4159- expected_job_count,
4160- builder.currentjob.specific_job.build.failure_count)
4161- self.assertEqual(
4162- 1, manager_module.assessFailureCounts.call_count)
4163-
4164- return d.addCallback(got_scan)
4165-
4166- def test_scan_first_fail(self):
4167- # The first failure of a job should result in the failure_count
4168- # on the job and the builder both being incremented.
4169- self._assertFailureCounting(
4170- builder_count=0, job_count=0, expected_builder_count=1,
4171- expected_job_count=1)
4172-
4173- def test_scan_second_builder_fail(self):
4174- # The first failure of a job should result in the failure_count
4175- # on the job and the builder both being incremented.
4176- self._assertFailureCounting(
4177- builder_count=1, job_count=0, expected_builder_count=2,
4178- expected_job_count=1)
4179-
4180- def test_scan_second_job_fail(self):
4181- # The first failure of a job should result in the failure_count
4182- # on the job and the builder both being incremented.
4183- self._assertFailureCounting(
4184- builder_count=0, job_count=1, expected_builder_count=1,
4185- expected_job_count=2)
4186-
4187- def test_scanFailed_handles_lack_of_a_job_on_the_builder(self):
4188- def failing_scan():
4189- return defer.fail(Exception("fake exception"))
4190- scanner = self._getScanner()
4191- scanner.scan = failing_scan
4192- builder = getUtility(IBuilderSet)[scanner.builder_name]
4193- builder.failure_count = Builder.FAILURE_THRESHOLD
4194- builder.currentjob.reset()
4195- self.layer.txn.commit()
4196-
4197- d = scanner.singleCycle()
4198-
4199- def scan_finished(ignored):
4200- self.assertFalse(builder.builderok)
4201-
4202- return d.addCallback(scan_finished)
4203-
4204- def test_fail_to_resume_slave_resets_job(self):
4205- # If an attempt to resume and dispatch a slave fails, it should
4206- # reset the job via job.reset()
4207-
4208- # Make a slave with a failing resume() method.
4209- slave = OkSlave()
4210- slave.resume = lambda: deferLater(
4211- reactor, 0, defer.fail, Failure(('out', 'err', 1)))
4212-
4213- # Reset sampledata builder.
4214- builder = removeSecurityProxy(
4215- getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME])
4216- self._resetBuilder(builder)
4217- self.assertEqual(0, builder.failure_count)
4218- builder.setSlaveForTesting(slave)
4219- builder.vm_host = "fake_vm_host"
4220-
4221- scanner = self._getScanner()
4222-
4223- # Get the next job that will be dispatched.
4224- job = removeSecurityProxy(builder._findBuildCandidate())
4225- job.virtualized = True
4226- builder.virtualized = True
4227- d = scanner.singleCycle()
4228-
4229- def check(ignored):
4230- # The failure_count will have been incremented on the
4231- # builder, we can check that to see that a dispatch attempt
4232- # did indeed occur.
4233- self.assertEqual(1, builder.failure_count)
4234- # There should also be no builder set on the job.
4235- self.assertTrue(job.builder is None)
4236- build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job)
4237- self.assertEqual(build.status, BuildStatus.NEEDSBUILD)
4238-
4239- return d.addCallback(check)
4240+ # should have been called.
4241+ self.assertEqual(1, builder.failure_count)
4242+ self.assertEqual(
4243+ 1, builder.currentjob.specific_job.build.failure_count)
4244+
4245+ self.assertEqual(
4246+ 1, manager_module.assessFailureCounts.call_count)
4247+
4248+
4249+class TestDispatchResult(LaunchpadTestCase):
4250+ """Tests `BaseDispatchResult` variations.
4251+
4252+ Variations of `BaseDispatchResult` when evaluated update the database
4253+ information according to their purpose.
4254+ """
4255+
4256+ layer = LaunchpadZopelessLayer
4257+
4258+ def _getBuilder(self, name):
4259+ """Return a fixed `IBuilder` instance from the sampledata.
4260+
4261+ Ensure it's active (builderok=True) and it has an in-progress job.
4262+ """
4263+ login('foo.bar@canonical.com')
4264+
4265+ builder = getUtility(IBuilderSet)[name]
4266+ builder.builderok = True
4267+
4268+ job = builder.currentjob
4269+ build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job)
4270+ self.assertEqual(
4271+ 'i386 build of mozilla-firefox 0.9 in ubuntu hoary RELEASE',
4272+ build.title)
4273+
4274+ self.assertEqual('BUILDING', build.status.name)
4275+ self.assertNotEqual(None, job.builder)
4276+ self.assertNotEqual(None, job.date_started)
4277+ self.assertNotEqual(None, job.logtail)
4278+
4279+ transaction.commit()
4280+
4281+ return builder, job.id
4282+
4283+ def assertBuildqueueIsClean(self, buildqueue):
4284+ # Check that the buildqueue is reset.
4285+ self.assertEqual(None, buildqueue.builder)
4286+ self.assertEqual(None, buildqueue.date_started)
4287+ self.assertEqual(None, buildqueue.logtail)
4288+
4289+ def assertBuilderIsClean(self, builder):
4290+ # Check that the builder is ready for a new build.
4291+ self.assertTrue(builder.builderok)
4292+ self.assertIs(None, builder.failnotes)
4293+ self.assertIs(None, builder.currentjob)
4294+
4295+ def testResetDispatchResult(self):
4296+ # Test that `ResetDispatchResult` resets the builder and job.
4297+ builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME)
4298+ buildqueue_id = builder.currentjob.id
4299+ builder.builderok = True
4300+ builder.failure_count = 1
4301+
4302+ # Set up an interaction to satisfy the 'write_transaction' decorator.
4303+ login(ANONYMOUS)
4304+ slave = RecordingSlave(builder.name, builder.url, builder.vm_host)
4305+ result = ResetDispatchResult(slave)
4306+ result()
4307+
4308+ buildqueue = getUtility(IBuildQueueSet).get(buildqueue_id)
4309+ self.assertBuildqueueIsClean(buildqueue)
4310+
4311+ # XXX Julian
4312+ # Disabled test until bug 586362 is fixed.
4313+ #self.assertFalse(builder.builderok)
4314+ self.assertBuilderIsClean(builder)
4315+
4316+ def testFailDispatchResult(self):
4317+ # Test that `FailDispatchResult` calls assessFailureCounts() so
4318+ # that we know the builders and jobs are failed as necessary
4319+ # when a FailDispatchResult is called at the end of the dispatch
4320+ # chain.
4321+ builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME)
4322+
4323+ # Set up an interaction to satisfy the 'write_transaction' decorator.
4324+ login(ANONYMOUS)
4325+ slave = RecordingSlave(builder.name, builder.url, builder.vm_host)
4326+ result = FailDispatchResult(slave, 'does not work!')
4327+ result.assessFailureCounts = FakeMethod()
4328+ self.assertEqual(0, result.assessFailureCounts.call_count)
4329+ result()
4330+ self.assertEqual(1, result.assessFailureCounts.call_count)
4331+
4332+ def _setup_failing_dispatch_result(self):
4333+ # assessFailureCounts should fail jobs or builders depending on
4334+ # whether it sees the failure_counts on each increasing.
4335+ builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME)
4336+ slave = RecordingSlave(builder.name, builder.url, builder.vm_host)
4337+ result = FailDispatchResult(slave, 'does not work!')
4338+ return builder, result
4339+
4340+ def test_assessFailureCounts_equal_failures(self):
4341+ # Basic case where the failure counts are equal and the job is
4342+ # reset to try again & the builder is not failed.
4343+ builder, result = self._setup_failing_dispatch_result()
4344+ buildqueue = builder.currentjob
4345+ build = buildqueue.specific_job.build
4346+ builder.failure_count = 2
4347+ build.failure_count = 2
4348+ result.assessFailureCounts()
4349+
4350+ self.assertBuilderIsClean(builder)
4351+ self.assertEqual('NEEDSBUILD', build.status.name)
4352+ self.assertBuildqueueIsClean(buildqueue)
4353+
4354+ def test_assessFailureCounts_job_failed(self):
4355+ # Case where the job has failed more than the builder.
4356+ builder, result = self._setup_failing_dispatch_result()
4357+ buildqueue = builder.currentjob
4358+ build = buildqueue.specific_job.build
4359+ build.failure_count = 2
4360+ builder.failure_count = 1
4361+ result.assessFailureCounts()
4362+
4363+ self.assertBuilderIsClean(builder)
4364+ self.assertEqual('FAILEDTOBUILD', build.status.name)
4365+ # The buildqueue should have been removed entirely.
4366+ self.assertEqual(
4367+ None, getUtility(IBuildQueueSet).getByBuilder(builder),
4368+ "Buildqueue was not removed when it should be.")
4369+
4370+ def test_assessFailureCounts_builder_failed(self):
4371+ # Case where the builder has failed more than the job.
4372+ builder, result = self._setup_failing_dispatch_result()
4373+ buildqueue = builder.currentjob
4374+ build = buildqueue.specific_job.build
4375+ build.failure_count = 2
4376+ builder.failure_count = 3
4377+ result.assessFailureCounts()
4378+
4379+ self.assertFalse(builder.builderok)
4380+ self.assertEqual('does not work!', builder.failnotes)
4381+ self.assertTrue(builder.currentjob is None)
4382+ self.assertEqual('NEEDSBUILD', build.status.name)
4383+ self.assertBuildqueueIsClean(buildqueue)
4384
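# A hedged reconstruction of the rule the three tests above pin down; an
# assumed shape, with failBuilder() and destroySelf() taken to exist as
# shown here rather than quoted from the real implementation:

def assess_failure_counts(builder, build, buildqueue, info):
    if build.failure_count > builder.failure_count:
        # The job has failed more often than the builder: fail it
        # outright and drop its queue entry.
        build.status = BuildStatus.FAILEDTOBUILD
        buildqueue.destroySelf()
        return
    if builder.failure_count > build.failure_count:
        # The builder has failed more often: mark it broken, keeping
        # the failure notes for the admins.
        builder.failBuilder(info)
    # Equal counts, or a failed builder: requeue the job to try again.
    buildqueue.reset()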
4385
4386 class TestBuilddManager(TrialTestCase):
4387
4388- layer = TwistedLaunchpadZopelessLayer
4389+ layer = LaunchpadZopelessLayer
4390
4391 def _stub_out_scheduleNextScanCycle(self):
4392 # stub out the code that adds a callLater, so that later tests
4393 # don't get surprises.
4394- self.patch(SlaveScanner, 'startCycle', FakeMethod())
4395+ self.patch(SlaveScanner, 'scheduleNextScanCycle', FakeMethod())
4396
4397 def test_addScanForBuilders(self):
4398 # Test that addScanForBuilders generates NewBuildersScanner objects.
4399@@ -459,62 +1040,10 @@
4400 self.assertNotEqual(0, manager.new_builders_scanner.scan.call_count)
4401
4402
4403-class TestFailureAssessments(TestCaseWithFactory):
4404-
4405- layer = ZopelessDatabaseLayer
4406-
4407- def setUp(self):
4408- TestCaseWithFactory.setUp(self)
4409- self.builder = self.factory.makeBuilder()
4410- self.build = self.factory.makeSourcePackageRecipeBuild()
4411- self.buildqueue = self.build.queueBuild()
4412- self.buildqueue.markAsBuilding(self.builder)
4413-
4414- def test_equal_failures_reset_job(self):
4415- self.builder.gotFailure()
4416- self.builder.getCurrentBuildFarmJob().gotFailure()
4417-
4418- assessFailureCounts(self.builder, "failnotes")
4419- self.assertIs(None, self.builder.currentjob)
4420- self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD)
4421-
4422- def test_job_failing_more_than_builder_fails_job(self):
4423- self.builder.getCurrentBuildFarmJob().gotFailure()
4424-
4425- assessFailureCounts(self.builder, "failnotes")
4426- self.assertIs(None, self.builder.currentjob)
4427- self.assertEqual(self.build.status, BuildStatus.FAILEDTOBUILD)
4428-
4429- def test_builder_failing_more_than_job_but_under_fail_threshold(self):
4430- self.builder.failure_count = Builder.FAILURE_THRESHOLD - 1
4431-
4432- assessFailureCounts(self.builder, "failnotes")
4433- self.assertIs(None, self.builder.currentjob)
4434- self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD)
4435- self.assertTrue(self.builder.builderok)
4436-
4437- def test_builder_failing_more_than_job_but_over_fail_threshold(self):
4438- self.builder.failure_count = Builder.FAILURE_THRESHOLD
4439-
4440- assessFailureCounts(self.builder, "failnotes")
4441- self.assertIs(None, self.builder.currentjob)
4442- self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD)
4443- self.assertFalse(self.builder.builderok)
4444- self.assertEqual("failnotes", self.builder.failnotes)
4445-
4446- def test_builder_failing_with_no_attached_job(self):
4447- self.buildqueue.reset()
4448- self.builder.failure_count = Builder.FAILURE_THRESHOLD
4449-
4450- assessFailureCounts(self.builder, "failnotes")
4451- self.assertFalse(self.builder.builderok)
4452- self.assertEqual("failnotes", self.builder.failnotes)
4453-
4454-
4455 class TestNewBuilders(TrialTestCase):
4456 """Test detecting of new builders."""
4457
4458- layer = TwistedLaunchpadZopelessLayer
4459+ layer = LaunchpadZopelessLayer
4460
4461 def _getScanner(self, manager=None, clock=None):
4462 return NewBuildersScanner(manager=manager, clock=clock)
4463@@ -555,8 +1084,11 @@
4464 new_builders, builder_scanner.checkForNewBuilders())
4465
4466 def test_scan(self):
4467- # See if scan detects new builders.
4468+ # See if scan detects new builders and schedules the next scan.
4469
4470+ # Stub out the addScanForBuilders and scheduleScan methods since
4471+ # they use callLater; we only want to assert that they get
4472+ # called.
4473 def fake_checkForNewBuilders():
4474 return "new_builders"
4475
4476@@ -572,6 +1104,9 @@
4477 builder_scanner.scan()
4478 advance = NewBuildersScanner.SCAN_INTERVAL + 1
4479 clock.advance(advance)
4480+ self.assertNotEqual(
4481+ 0, builder_scanner.scheduleScan.call_count,
4482+ "scheduleScan did not get called")
4483
4484
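# The Clock idiom test_scan relies on, in isolation; Clock is the real
# twisted.internet.task.Clock, and the no-op callback is illustrative:
from twisted.internet.task import Clock

clock = Clock()
clock.callLater(5, lambda: None)  # schedule work five "seconds" ahead
clock.advance(6)                  # fire it synchronously, no real wait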
4485 def is_file_growing(filepath, poll_interval=1, poll_repeat=10):
4486@@ -612,7 +1147,7 @@
4487 return False
4488
4489
4490-class TestBuilddManagerScript(TestCaseWithFactory):
4491+class TestBuilddManagerScript(LaunchpadTestCase):
4492
4493 layer = LaunchpadScriptLayer
4494
4495@@ -621,7 +1156,6 @@
4496 fixture = BuilddManagerTestSetup()
4497 fixture.setUp()
4498 fixture.tearDown()
4499- self.layer.force_dirty_database()
4500
4501 # XXX Julian 2010-08-06 bug=614275
4502 # These next 2 tests are in the wrong place, they should be near the
4503
4504=== modified file 'lib/lp/buildmaster/tests/test_packagebuild.py'
4505--- lib/lp/buildmaster/tests/test_packagebuild.py 2010-10-26 20:43:50 +0000
4506+++ lib/lp/buildmaster/tests/test_packagebuild.py 2010-12-07 16:24:04 +0000
4507@@ -97,8 +97,6 @@
4508 self.assertRaises(
4509 NotImplementedError, self.package_build.verifySuccessfulUpload)
4510 self.assertRaises(NotImplementedError, self.package_build.notify)
4511- # XXX 2010-10-18 bug=662631
4512- # Change this to do non-blocking IO.
4513 self.assertRaises(
4514 NotImplementedError, self.package_build.handleStatus,
4515 None, None, None)
4516@@ -311,8 +309,6 @@
4517 # A filemap with plain filenames should not cause a problem.
4518 # The call to handleStatus will attempt to get the file from
4519 # the slave resulting in a URL error in this test case.
4520- # XXX 2010-10-18 bug=662631
4521- # Change this to do non-blocking IO.
4522 self.build.handleStatus('OK', None, {
4523 'filemap': {'myfile.py': 'test_file_hash'},
4524 })
4525@@ -323,8 +319,6 @@
4526 def test_handleStatus_OK_absolute_filepath(self):
4527 # A filemap that tries to write to files outside of
4528 # the upload directory will result in a failed upload.
4529- # XXX 2010-10-18 bug=662631
4530- # Change this to do non-blocking IO.
4531 self.build.handleStatus('OK', None, {
4532 'filemap': {'/tmp/myfile.py': 'test_file_hash'},
4533 })
4534@@ -335,8 +329,6 @@
4535 def test_handleStatus_OK_relative_filepath(self):
4536 # A filemap that tries to write to files outside of
4537 # the upload directory will result in a failed upload.
4538- # XXX 2010-10-18 bug=662631
4539- # Change this to do non-blocking IO.
4540 self.build.handleStatus('OK', None, {
4541 'filemap': {'../myfile.py': 'test_file_hash'},
4542 })
4543@@ -347,8 +339,6 @@
4544 # The build log is set during handleStatus.
4545 removeSecurityProxy(self.build).log = None
4546 self.assertEqual(None, self.build.log)
4547- # XXX 2010-10-18 bug=662631
4548- # Change this to do non-blocking IO.
4549 self.build.handleStatus('OK', None, {
4550 'filemap': {'myfile.py': 'test_file_hash'},
4551 })
4552@@ -358,8 +348,6 @@
4553 # The date finished is updated during handleStatus_OK.
4554 removeSecurityProxy(self.build).date_finished = None
4555 self.assertEqual(None, self.build.date_finished)
4556- # XXX 2010-10-18 bug=662631
4557- # Change this to do non-blocking IO.
4558 self.build.handleStatus('OK', None, {
4559 'filemap': {'myfile.py': 'test_file_hash'},
4560 })
4561
4562=== modified file 'lib/lp/code/model/recipebuilder.py'
4563--- lib/lp/code/model/recipebuilder.py 2010-09-24 12:47:12 +0000
4564+++ lib/lp/code/model/recipebuilder.py 2010-12-07 16:24:04 +0000
4565@@ -117,42 +117,38 @@
4566 raise CannotBuild("Unable to find distroarchseries for %s in %s" %
4567 (self._builder.processor.name,
4568 self.build.distroseries.displayname))
4569- args = self._extraBuildArgs(distroarchseries, logger)
4570+
4571 chroot = distroarchseries.getChroot()
4572 if chroot is None:
4573 raise CannotBuild("Unable to find a chroot for %s" %
4574 distroarchseries.displayname)
4575- d = self._builder.slave.cacheFile(logger, chroot)
4576-
4577- def got_cache_file(ignored):
4578- # Generate a string which can be used to cross-check when obtaining
4579- # results so we know we are referring to the right database object in
4580- # subsequent runs.
4581- buildid = "%s-%s" % (self.build.id, build_queue_id)
4582- cookie = self.buildfarmjob.generateSlaveBuildCookie()
4583- chroot_sha1 = chroot.content.sha1
4584- logger.debug(
4585- "Initiating build %s on %s" % (buildid, self._builder.url))
4586-
4587- return self._builder.slave.build(
4588- cookie, "sourcepackagerecipe", chroot_sha1, {}, args)
4589-
4590- def log_build_result((status, info)):
4591- message = """%s (%s):
4592- ***** RESULT *****
4593- %s
4594- %s: %s
4595- ******************
4596- """ % (
4597- self._builder.name,
4598- self._builder.url,
4599- args,
4600- status,
4601- info,
4602- )
4603- logger.info(message)
4604-
4605- return d.addCallback(got_cache_file).addCallback(log_build_result)
4606+ self._builder.slave.cacheFile(logger, chroot)
4607+
4608+ # Generate a string which can be used to cross-check when obtaining
4609+ # results so we know we are referring to the right database object in
4610+ # subsequent runs.
4611+ buildid = "%s-%s" % (self.build.id, build_queue_id)
4612+ cookie = self.buildfarmjob.generateSlaveBuildCookie()
4613+ chroot_sha1 = chroot.content.sha1
4614+ logger.debug(
4615+ "Initiating build %s on %s" % (buildid, self._builder.url))
4616+
4617+ args = self._extraBuildArgs(distroarchseries, logger)
4618+ status, info = self._builder.slave.build(
4619+ cookie, "sourcepackagerecipe", chroot_sha1, {}, args)
4620+ message = """%s (%s):
4621+ ***** RESULT *****
4622+ %s
4623+ %s: %s
4624+ ******************
4625+ """ % (
4626+ self._builder.name,
4627+ self._builder.url,
4628+ args,
4629+ status,
4630+ info,
4631+ )
4632+ logger.info(message)
4633
4634 def verifyBuildRequest(self, logger):
4635 """Assert some pre-build checks.
4636
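For contrast with the blocking calls this hunk introduces, the Deferred style
being removed chained each step as a callback. Condensed (Python 2, abridged
from the removed lines above):

    # Each step runs only when the previous step's result arrives.
    d = self._builder.slave.cacheFile(logger, chroot)

    def got_cache_file(ignored):
        return self._builder.slave.build(
            cookie, "sourcepackagerecipe", chroot_sha1, {}, args)

    def log_build_result((status, info)):
        logger.info("%s: %s" % (status, info))

    return d.addCallback(got_cache_file).addCallback(log_build_result)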
4637=== modified file 'lib/lp/soyuz/browser/tests/test_builder_views.py'
4638--- lib/lp/soyuz/browser/tests/test_builder_views.py 2010-10-06 12:20:03 +0000
4639+++ lib/lp/soyuz/browser/tests/test_builder_views.py 2010-12-07 16:24:04 +0000
4640@@ -34,7 +34,7 @@
4641 return view
4642
4643 def test_posting_form_doesnt_call_slave_xmlrpc(self):
4644- # Posting the +edit for should not call isAvailable, which
4645+ # Posting the +edit form should not call is_available, which
4646 # would do xmlrpc to a slave builder and is explicitly forbidden
4647 # in a webapp process.
4648 view = self.initialize_view()
4649
4650=== added file 'lib/lp/soyuz/doc/buildd-dispatching.txt'
4651--- lib/lp/soyuz/doc/buildd-dispatching.txt 1970-01-01 00:00:00 +0000
4652+++ lib/lp/soyuz/doc/buildd-dispatching.txt 2010-12-07 16:24:04 +0000
4653@@ -0,0 +1,371 @@
4654+= Buildd Dispatching =
4655+
4656+ >>> import transaction
4657+ >>> import logging
4658+ >>> logger = logging.getLogger()
4659+ >>> logger.setLevel(logging.DEBUG)
4660+
4661+Buildd dispatching consists of finding an available slave in IDLE
4662+state, pushing any required files to it, and then requesting that it
4663+start the build procedure. These tasks are implemented by the
4664+BuilderSet and Builder classes.
4665+
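Roughly, one pass of that cycle looks like this (a sketch reusing method and
attribute names that appear later in this document; the control flow itself is
an assumption, not the shipped implementation):

    from zope.component import getUtility
    from lp.buildmaster.interfaces.builder import IBuilderSet

    # Hypothetical outline of a single dispatch pass over all builders.
    for builder in getUtility(IBuilderSet):
        if not (builder.builderok and builder.is_available):
            continue  # skip disabled or busy builders
        candidate = builder._findBuildCandidate()  # next pending job, if any
        if candidate is not None:
            # Push the required files to the slave and start the build.
            builder._dispatchBuildCandidate(candidate)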
4666+Set up the test builder:
4667+
4668+ >>> from canonical.buildd.tests import BuilddSlaveTestSetup
4669+ >>> fixture = BuilddSlaveTestSetup()
4670+ >>> fixture.setUp()
4671+
4672+Set up a suitable chroot for Hoary i386:
4673+
4674+ >>> from StringIO import StringIO
4675+ >>> from canonical.librarian.interfaces import ILibrarianClient
4676+ >>> librarian_client = getUtility(ILibrarianClient)
4677+
4678+ >>> content = 'anything'
4679+ >>> alias_id = librarian_client.addFile(
4680+ ... 'foo.tar.gz', len(content), StringIO(content), 'text/plain')
4681+
4682+ >>> from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
4683+ >>> from lp.registry.interfaces.distribution import IDistributionSet
4684+ >>> from lp.registry.interfaces.pocket import PackagePublishingPocket
4685+
4686+ >>> hoary = getUtility(IDistributionSet)['ubuntu']['hoary']
4687+ >>> hoary_i386 = hoary['i386']
4688+
4689+ >>> chroot = getUtility(ILibraryFileAliasSet)[alias_id]
4690+ >>> pc = hoary_i386.addOrUpdateChroot(chroot=chroot)
4691+
4692+Activate the builders present in the sampledata; we need to be logged
4693+in as a member of launchpad-buildd-admin:
4694+
4695+ >>> from canonical.launchpad.ftests import login
4696+ >>> login('celso.providelo@canonical.com')
4697+
4698+Set IBuilder.builderok of all present builders:
4699+
4700+ >>> from lp.buildmaster.interfaces.builder import IBuilderSet
4701+ >>> builder_set = getUtility(IBuilderSet)
4702+
4703+ >>> builder_set.count()
4704+ 2
4705+
4706+ >>> from canonical.launchpad.ftests import syncUpdate
4707+ >>> for b in builder_set:
4708+ ... b.builderok = True
4709+ ... syncUpdate(b)
4710+
4711+Clean up previous BuildQueue results from sampledata:
4712+
4713+ >>> from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet
4714+ >>> lost_job = getUtility(IBuildQueueSet).get(1)
4715+ >>> lost_job.builder.name
4716+ u'bob'
4717+ >>> lost_job.destroySelf()
4718+ >>> transaction.commit()
4719+
4720+If the specified buildd slave reset command (used inside resumeSlaveHost())
4721+fails, the builder will still be marked as failed.
4722+
4723+ >>> from canonical.config import config
4724+ >>> reset_fail_config = '''
4725+ ... [builddmaster]
4726+ ... vm_resume_command: /bin/false'''
4727+ >>> config.push('reset fail', reset_fail_config)
4728+ >>> frog_builder = builder_set['frog']
4729+ >>> frog_builder.handleTimeout(logger, 'The universe just collapsed')
4730+ WARNING:root:Resetting builder: http://localhost:9221/ -- The universe just collapsed
4731+ ...
4732+ WARNING:root:Failed to reset builder: http://localhost:9221/ -- Resuming failed:
4733+ ...
4734+ WARNING:root:Disabling builder: http://localhost:9221/ -- The universe just collapsed
4735+ ...
4736+ <BLANKLINE>
4737+
4738+Since we were unable to reset the 'frog' builder, it was marked as 'failed'.
4739+
4740+ >>> frog_builder.builderok
4741+ False
4742+
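The sequence of warnings suggests roughly this shape for handleTimeout (a
sketch: resumeSlaveHost is named in the text above, but the exception name and
the rest are assumptions):

    def handleTimeout(self, logger, error_message):
        # Try to reset the virtual slave host; if resuming fails,
        # disable the builder and record why in failnotes.
        logger.warn("Resetting builder: %s -- %s" % (self.url, error_message))
        try:
            self.resumeSlaveHost()
        except CannotResumeHost as err:
            logger.warn(
                "Failed to reset builder: %s -- %s" % (self.url, err))
            logger.warn(
                "Disabling builder: %s -- %s" % (self.url, error_message))
            self.builderok = False
            self.failnotes = error_message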
4743+Restore the default value for the resume command.
4744+
4745+ >>> ignored_config = config.pop('reset fail')
4746+
4747+The 'bob' builder is available for build jobs.
4748+
4749+ >>> bob_builder = builder_set['bob']
4750+ >>> bob_builder.name
4751+ u'bob'
4752+ >>> bob_builder.virtualized
4753+ False
4754+ >>> bob_builder.is_available
4755+ True
4756+ >>> bob_builder.builderok
4757+ True
4758+
4759+
4760+== Builder dispatching API ==
4761+
4762+Now let's check the build candidates which will be considered for the
4763+builder 'bob':
4764+
4765+ >>> from zope.security.proxy import removeSecurityProxy
4766+ >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate()
4767+
4768+The single BuildQueue found is a non-virtual pending build:
4769+
4770+ >>> job.id
4771+ 2
4772+ >>> from lp.soyuz.interfaces.binarypackagebuild import (
4773+ ... IBinaryPackageBuildSet)
4774+ >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job)
4775+ >>> build.status.name
4776+ 'NEEDSBUILD'
4777+ >>> job.builder is None
4778+ True
4779+ >>> job.date_started is None
4780+ True
4781+ >>> build.is_virtualized
4782+ False
4783+
4784+The build start time is not set yet either.
4785+
4786+ >>> print build.date_first_dispatched
4787+ None
4788+
4789+Update the SourcePackageReleaseFile corresponding to this job:
4790+
4791+ >>> content = 'anything'
4792+ >>> alias_id = librarian_client.addFile(
4793+ ... 'foo.dsc', len(content), StringIO(content), 'application/dsc')
4794+
4795+ >>> sprf = build.source_package_release.files[0]
4796+ >>> naked_sprf = removeSecurityProxy(sprf)
4797+ >>> naked_sprf.libraryfile = getUtility(ILibraryFileAliasSet)[alias_id]
4798+ >>> flush_database_updates()
4799+
4800+Check the dispatching method itself:
4801+
4802+ >>> dispatched_job = bob_builder.findAndStartJob()
4803+ >>> job == dispatched_job
4804+ True
4805+ >>> bob_builder.builderok = True
4806+
4807+ >>> flush_database_updates()
4808+
4809+Verify that the job (BuildQueue) was updated appropriately:
4810+
4811+ >>> job.builder.id == bob_builder.id
4812+ True
4813+
4814+ >>> dispatched_build = getUtility(
4815+ ... IBinaryPackageBuildSet).getByQueueEntry(job)
4816+ >>> dispatched_build == build
4817+ True
4818+
4819+ >>> build.status.name
4820+ 'BUILDING'
4821+
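findAndStartJob ties together the candidate look-up and the dispatch step
exercised here; schematically (an assumed sketch, not the real method body):

    def findAndStartJob(self):
        # Pick the next eligible BuildQueue entry and dispatch it.
        candidate = self._findBuildCandidate()
        if candidate is not None:
            self._dispatchBuildCandidate(candidate)
        return candidate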
4822+Shut down the builder, mark the build record as failed, and remove the
4823+buildqueue record, so that the build is eliminated:
4824+
4825+ >>> fixture.tearDown()
4826+
4827+ >>> from lp.buildmaster.enums import BuildStatus
4828+ >>> build.status = BuildStatus.FAILEDTOBUILD
4829+ >>> job.destroySelf()
4830+ >>> flush_database_updates()
4831+
4832+
4833+== PPA build dispatching ==
4834+
4835+Create a new Build record of the same source, targeted at a PPA archive:
4836+
4837+ >>> from lp.registry.interfaces.person import IPersonSet
4838+ >>> cprov = getUtility(IPersonSet).getByName('cprov')
4839+
4840+ >>> ppa_build = sprf.sourcepackagerelease.createBuild(
4841+ ... hoary_i386, PackagePublishingPocket.RELEASE, cprov.archive)
4842+
4843+Create a BuildQueue record and inspect some parameters:
4844+
4845+ >>> ppa_job = ppa_build.queueBuild()
4846+ >>> ppa_job.id
4847+ 3
4848+ >>> ppa_job.builder == None
4849+ True
4850+ >>> ppa_job.date_started == None
4851+ True
4852+
4853+The build job's archive requires virtualized builds.
4854+
4855+ >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(ppa_job)
4856+ >>> build.archive.require_virtualized
4857+ True
4858+
4859+But the builder is not virtualized.
4860+
4861+ >>> bob_builder.virtualized
4862+ False
4863+
4864+Hence, the builder will not be able to pick up the PPA build job created
4865+above.
4866+
4867+ >>> bob_builder.vm_host = 'localhost.ppa'
4868+ >>> syncUpdate(bob_builder)
4869+
4870+ >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate()
4871+ >>> print job
4872+ None
4873+
4874+To enable 'bob' to find and build the PPA job, we have to make it
4875+virtualized, because PPA builds run only on virtualized builders. We
4876+also need to make sure this build's source is published, or the
4877+candidate will be ignored and superseded. We can do this by copying
4878+the existing publication in Ubuntu.
4879+
4880+ >>> from lp.soyuz.model.publishing import (
4881+ ... SourcePackagePublishingHistory)
4882+ >>> [old_pub] = SourcePackagePublishingHistory.selectBy(
4883+ ... distroseries=build.distro_series,
4884+ ... sourcepackagerelease=build.source_package_release)
4885+ >>> new_pub = old_pub.copyTo(
4886+ ... old_pub.distroseries, old_pub.pocket, build.archive)
4887+
4888+ >>> bob_builder.virtualized = True
4889+ >>> syncUpdate(bob_builder)
4890+
4891+ >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate()
4892+ >>> ppa_job.id == job.id
4893+ True
4894+
4895+For further details regarding IBuilder._findBuildCandidate() please see
4896+lib/lp/soyuz/tests/test_builder.py.
4897+
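The virtualization matching shown in this section boils down to a predicate of
this shape (the name is hypothetical, and the real look-up also weighs scores
and pockets):

    def _virtualization_matches(builder, build):
        # PPA (virtual) jobs may only land on virtualized builders;
        # non-virtual jobs stay on non-virtual builders.
        return build.archive.require_virtualized == builder.virtualized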
4898+Start the buildd-slave so that we are able to dispatch jobs.
4899+
4900+ >>> fixture = BuilddSlaveTestSetup()
4901+ >>> fixture.setUp()
4902+
4903+Before dispatching, we can check that the builder is protected against
4904+mistakes in code that result in an attempt to build a virtual job on
4905+a non-virtual builder.
4906+
4907+ >>> bob_builder.virtualized = False
4908+ >>> flush_database_updates()
4909+ >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(ppa_job)
4910+ Traceback (most recent call last):
4911+ ...
4912+ AssertionError: Attempt to build non-virtual item on a virtual builder.
4913+
4914+Mark the builder as virtual again, so that we can dispatch the PPA
4915+job successfully.
4916+
4917+ >>> bob_builder.virtualized = True
4918+ >>> flush_database_updates()
4919+
4920+ >>> dispatched_job = bob_builder.findAndStartJob()
4921+ >>> ppa_job == dispatched_job
4922+ True
4923+
4924+ >>> flush_database_updates()
4925+
4926+The PPA job is building.
4927+
4928+ >>> ppa_job.builder.name
4929+ u'bob'
4930+
4931+ >>> build.status.name
4932+ 'BUILDING'
4933+
4934+Shut down the builder slave, mark the PPA build record as failed,
4935+remove the buildqueue record, and make the 'bob' builder non-virtual
4936+again, so that the environment is back to its initial state.
4937+
4938+ >>> fixture.tearDown()
4939+
4940+ >>> build.status = BuildStatus.FAILEDTOBUILD
4941+ >>> ppa_job.destroySelf()
4942+ >>> bob_builder.virtualized = False
4943+ >>> flush_database_updates()
4944+
4945+
4946+== Security build dispatching ==
4947+
4948+Set up a chroot for warty/i386.
4949+
4950+ >>> warty = getUtility(IDistributionSet)['ubuntu']['warty']
4951+ >>> warty_i386 = warty['i386']
4952+ >>> pc = warty_i386.addOrUpdateChroot(chroot=chroot)
4953+
4954+Create a new Build record for the test source, targeted at the
4955+warty/i386 architecture and the SECURITY pocket:
4956+
4957+ >>> sec_build = sprf.sourcepackagerelease.createBuild(
4958+ ... warty_i386, PackagePublishingPocket.SECURITY, hoary.main_archive)
4959+
4960+Create a BuildQueue record and inspect some parameters:
4961+
4962+ >>> sec_job = sec_build.queueBuild()
4963+ >>> sec_job.id
4964+ 4
4965+ >>> print sec_job.builder
4966+ None
4967+ >>> print sec_job.date_started
4968+ None
4969+ >>> sec_build.is_virtualized
4970+ False
4971+
4972+Under normal conditions the next available candidate would be the job
4973+targeted at the SECURITY pocket. However, builders are forbidden to
4974+accept such jobs until we have finished the EMBARGOED archive
4975+implementation.
4976+
4977+ >>> fixture = BuilddSlaveTestSetup()
4978+ >>> fixture.setUp()
4979+ >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(sec_job)
4980+ Traceback (most recent call last):
4981+ ...
4982+ AssertionError: Soyuz is not yet capable of building SECURITY uploads.
4983+ >>> fixture.tearDown()
4984+
4985+As a temporary workaround until we start building security uploads,
4986+builds targeted at the SECURITY pocket are marked FAILEDTOBUILD
4987+during the _findBuildCandidate look-up.
4988+
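Conceptually the look-up applies a guard of this shape while scanning (a
sketch of the temporary behaviour; the local names are hypothetical):

    # Fail SECURITY-pocket candidates outright and keep scanning.
    if candidate_build.pocket == PackagePublishingPocket.SECURITY:
        candidate_build.status = BuildStatus.FAILEDTOBUILD
        candidate_job.destroySelf()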
4989+We will also create another build candidate in breezy-autotest/i386 to
4990+check that legitimate pending candidates remain valid.
4991+
4992+ >>> breezy = getUtility(IDistributionSet)['ubuntu']['breezy-autotest']
4993+ >>> breezy_i386 = breezy['i386']
4994+ >>> pc = breezy_i386.addOrUpdateChroot(chroot=chroot)
4995+
4996+ >>> pending_build = sprf.sourcepackagerelease.createBuild(
4997+ ... breezy_i386, PackagePublishingPocket.UPDATES, hoary.main_archive)
4998+ >>> pending_job = pending_build.queueBuild()
4999+
5000+We set the score of the security job to ensure it is considered
The diff has been truncated for viewing.
