Merge lp:~mars/launchpad/test-ghost-update into lp:~launchpad/launchpad/ghost-line
Proposed by: Māris Fogels
Status: Merged
Approved by: Māris Fogels
Approved revision: 11785
Merged at revision: 11785
Proposed branch: lp:~mars/launchpad/test-ghost-update
Merge into: lp:~launchpad/launchpad/ghost-line
Diff against target: 7192 lines (+3508/-2210), 24 files modified:

- lib/lp/buildmaster/doc/builder.txt (+118/-2)
- lib/lp/buildmaster/interfaces/builder.py (+62/-83)
- lib/lp/buildmaster/manager.py (+468/-204)
- lib/lp/buildmaster/model/builder.py (+224/-240)
- lib/lp/buildmaster/model/buildfarmjobbehavior.py (+52/-60)
- lib/lp/buildmaster/model/packagebuild.py (+0/-6)
- lib/lp/buildmaster/tests/mock_slaves.py (+32/-157)
- lib/lp/buildmaster/tests/test_builder.py (+154/-582)
- lib/lp/buildmaster/tests/test_manager.py (+782/-248)
- lib/lp/buildmaster/tests/test_packagebuild.py (+0/-12)
- lib/lp/code/model/recipebuilder.py (+28/-32)
- lib/lp/soyuz/browser/tests/test_builder_views.py (+1/-1)
- lib/lp/soyuz/doc/buildd-dispatching.txt (+371/-0)
- lib/lp/soyuz/doc/buildd-slavescanner.txt (+876/-0)
- lib/lp/soyuz/model/binarypackagebuildbehavior.py (+41/-59)
- lib/lp/soyuz/tests/test_binarypackagebuildbehavior.py (+8/-290)
- lib/lp/soyuz/tests/test_doc.py (+6/-0)
- lib/lp/testing/factory.py (+2/-8)
- lib/lp/translations/doc/translationtemplatesbuildbehavior.txt (+114/-0)
- lib/lp/translations/model/translationtemplatesbuildbehavior.py (+14/-20)
- lib/lp/translations/stories/buildfarm/xx-build-summary.txt (+1/-1)
- lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+153/-202)
- lib/lp_sitecustomize.py (+0/-3)
- utilities/migrater/file-ownership.txt (+1/-0)

To merge this branch: bzr merge lp:~mars/launchpad/test-ghost-update
Related bugs:

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Māris Fogels (community) | Approve | | |

Review via email: mp+42514@code.launchpad.net
Commit message
Test merge for bundle-merge command
Description of the change
Test merge
Revision history for this message
Māris Fogels (mars):
review: Approve
Revision history for this message
Launchpad PQM Bot (launchpad-pqm) wrote:
Revision history for this message
Māris Fogels (mars):
review: Approve
Revision history for this message
Launchpad PQM Bot (launchpad-pqm) wrote:
The attempt to merge lp:~mars/launchpad/test-ghost-update into lp:~launchpad/launchpad/ghost-line failed. Below is the output from the failed tests.
rm -f lib/canonical/
rm -f -r lazr-js/build
rm -f -r bin
rm -f -r parts
rm -f -r develop-eggs
rm -f .installed.cfg
rm -f -r build
rm -f _pythonpath.py
make -C sourcecode/
make: *** sourcecode/
make: *** [clean] Error 2
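The tail of that log is the standard GNU make error-propagation pattern: the top-level `clean` target recurses into `sourcecode/`, the sub-make fails, and the parent exits with `Error 2`. A minimal sketch of that pattern, using illustrative paths rather than the real Launchpad Makefiles:

```shell
# Minimal reproduction of the propagation pattern above; the paths and
# targets are illustrative, not the actual Launchpad build tree.
workdir=$(mktemp -d)
mkdir -p "$workdir/sourcecode"

# Parent Makefile: "clean" removes a file, then recurses into sourcecode/.
# printf with '\t' emits the literal tabs make requires before commands;
# single quotes keep $(MAKE) literal so make, not the shell, expands it.
printf 'clean:\n\trm -f _pythonpath.py\n\t$(MAKE) -C sourcecode clean\n' \
    > "$workdir/Makefile"

# Child Makefile whose clean target fails.
printf 'clean:\n\texit 1\n' > "$workdir/sourcecode/Makefile"

# GNU make reports the child failure and the parent exits with status 2,
# mirroring the "make: *** [clean] Error 2" line in the log.
make -C "$workdir" clean
echo "exit status: $?"
```

GNU make uses exit status 2 whenever any target's recipe fails, which is why the parent and child failures both surface as `Error 2` in the PQM output.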
11785. By Māris Fogels: Added a file for testing
Preview Diff
1 | === modified file 'lib/lp/buildmaster/doc/builder.txt' |
2 | --- lib/lp/buildmaster/doc/builder.txt 2010-09-24 12:10:52 +0000 |
3 | +++ lib/lp/buildmaster/doc/builder.txt 2010-12-07 16:24:04 +0000 |
4 | @@ -19,6 +19,9 @@ |
5 | As expected, it implements IBuilder. |
6 | |
7 | >>> from canonical.launchpad.webapp.testing import verifyObject |
8 | + >>> from lp.buildmaster.interfaces.builder import IBuilder |
9 | + >>> verifyObject(IBuilder, builder) |
10 | + True |
11 | |
12 | >>> print builder.name |
13 | bob |
14 | @@ -83,7 +86,7 @@ |
15 | The 'new' method will create a new builder in the database. |
16 | |
17 | >>> bnew = builderset.new(1, 'http://dummy.com:8221/', 'dummy', |
18 | - ... 'Dummy Title', 'eh ?', 1) |
19 | + ... 'Dummy Title', 'eh ?', 1) |
20 | >>> bnew.name |
21 | u'dummy' |
22 | |
23 | @@ -167,7 +170,7 @@ |
24 | >>> recipe_bq.processor = i386_family.processors[0] |
25 | >>> recipe_bq.virtualized = True |
26 | >>> transaction.commit() |
27 | - |
28 | + |
29 | >>> queue_sizes = builderset.getBuildQueueSizes() |
30 | >>> print queue_sizes['virt']['386'] |
31 | (1L, datetime.timedelta(0, 64)) |
32 | @@ -185,3 +188,116 @@ |
33 | |
34 | >>> print queue_sizes['virt']['386'] |
35 | (2L, datetime.timedelta(0, 128)) |
36 | + |
37 | + |
38 | +Resuming buildd slaves |
39 | +====================== |
40 | + |
41 | +Virtual slaves are resumed using a command specified in the |
42 | +configuration profile. The production configuration uses an SSH trigger |
43 | +account accessed via a private key available on the builddmaster |
44 | +machine (which uses the ftpmaster configuration profile), as in: |
45 | + |
46 | +{{{ |
47 | +ssh ~/.ssh/ppa-reset-key ppa@%(vm_host)s |
48 | +}}} |
49 | + |
50 | +The test configuration uses a fake command that can be run on a |
51 | +development machine and allows us to test the important features used |
52 | +in production, such as 'vm_host' variable replacement. |
53 | + |
54 | + >>> from canonical.config import config |
55 | + >>> config.builddmaster.vm_resume_command |
56 | + 'echo %(vm_host)s' |
57 | + |
58 | +Before performing the command, it checks if the builder is indeed |
59 | +virtual and raises CannotResumeHost if it isn't. |
60 | + |
61 | + >>> bob = getUtility(IBuilderSet)['bob'] |
62 | + >>> bob.resumeSlaveHost() |
63 | + Traceback (most recent call last): |
64 | + ... |
65 | + CannotResumeHost: Builder is not virtualized. |
66 | + |
67 | +For testing purposes resumeSlaveHost returns the stdout and stderr |
68 | +buffers resulting from the command. |
69 | + |
70 | + >>> frog = getUtility(IBuilderSet)['frog'] |
71 | + >>> out, err = frog.resumeSlaveHost() |
72 | + >>> print out.strip() |
73 | + localhost-host.ppa |
74 | + |
75 | +If the specified command fails, resumeSlaveHost raises a |
76 | +CannotResumeHost exception containing the resulting stdout and stderr. |
77 | + |
78 | + # The command must have a vm_host dict key and when executed, |
79 | + # have a returncode that is not 0. |
80 | + >>> vm_resume_command = """ |
81 | + ... [builddmaster] |
82 | + ... vm_resume_command: test "%(vm_host)s = 'false'" |
83 | + ... """ |
84 | + >>> config.push('vm_resume_command', vm_resume_command) |
85 | + >>> frog.resumeSlaveHost() |
86 | + Traceback (most recent call last): |
87 | + ... |
88 | + CannotResumeHost: Resuming failed: |
89 | + OUT: |
90 | + <BLANKLINE> |
91 | + ERR: |
92 | + <BLANKLINE> |
93 | + |
94 | +Restore default value for resume command. |
95 | + |
96 | + >>> config_data = config.pop('vm_resume_command') |
97 | + |
98 | + |
99 | +Rescuing lost slaves |
100 | +==================== |
101 | + |
102 | +Builder.rescueIfLost() checks the build ID reported in the slave status |
103 | +against the database. If it isn't building what we think it should be, |
104 | +the current build will be aborted and the slave cleaned in preparation |
105 | +for a new task. The decision about the slave's correctness is left up |
106 | +to IBuildFarmJobBehavior.verifySlaveBuildCookie -- for these examples we |
107 | +will use a special behavior that just checks if the cookie reads 'good'. |
108 | + |
109 | + >>> import logging |
110 | + >>> from lp.buildmaster.interfaces.builder import CorruptBuildCookie |
111 | + >>> from lp.buildmaster.tests.mock_slaves import ( |
112 | + ... BuildingSlave, MockBuilder, OkSlave, WaitingSlave) |
113 | + |
114 | + >>> class TestBuildBehavior: |
115 | + ... def verifySlaveBuildCookie(self, cookie): |
116 | + ... if cookie != 'good': |
117 | + ... raise CorruptBuildCookie('Bad value') |
118 | + |
119 | + >>> def rescue_slave_if_lost(slave): |
120 | + ... builder = MockBuilder('mock', slave, TestBuildBehavior()) |
121 | + ... builder.rescueIfLost(logging.getLogger()) |
122 | + |
123 | +An idle slave is not rescued. |
124 | + |
125 | + >>> rescue_slave_if_lost(OkSlave()) |
126 | + |
127 | +Slaves building or having built the correct build are not rescued |
128 | +either. |
129 | + |
130 | + >>> rescue_slave_if_lost(BuildingSlave(build_id='good')) |
131 | + >>> rescue_slave_if_lost(WaitingSlave(build_id='good')) |
132 | + |
133 | +But if a slave is building the wrong ID, it is declared lost and |
134 | +an abort is attempted. MockSlave prints out a message when it is aborted |
135 | +or cleaned. |
136 | + |
137 | + >>> rescue_slave_if_lost(BuildingSlave(build_id='bad')) |
138 | + Aborting slave |
139 | + INFO:root:Builder 'mock' rescued from 'bad': 'Bad value' |
140 | + |
141 | +Slaves having completed an incorrect build are also declared lost, |
142 | +but there's no need to abort a completed build. Such builders are |
143 | +instead simply cleaned, ready for the next build. |
144 | + |
145 | + >>> rescue_slave_if_lost(WaitingSlave(build_id='bad')) |
146 | + Cleaning slave |
147 | + INFO:root:Builder 'mock' rescued from 'bad': 'Bad value' |
148 | + |
149 | |
150 | === modified file 'lib/lp/buildmaster/interfaces/builder.py' |
151 | --- lib/lp/buildmaster/interfaces/builder.py 2010-10-18 11:57:09 +0000 |
152 | +++ lib/lp/buildmaster/interfaces/builder.py 2010-12-07 16:24:04 +0000 |
153 | @@ -154,6 +154,11 @@ |
154 | |
155 | currentjob = Attribute("BuildQueue instance for job being processed.") |
156 | |
157 | + is_available = Bool( |
158 | + title=_("Whether or not a builder is available for building " |
159 | + "new jobs. "), |
160 | + required=False) |
161 | + |
162 | failure_count = Int( |
163 | title=_('Failure Count'), required=False, default=0, |
164 | description=_("Number of consecutive failures for this builder.")) |
165 | @@ -168,74 +173,32 @@ |
166 | def resetFailureCount(): |
167 | """Set the failure_count back to zero.""" |
168 | |
169 | - def failBuilder(reason): |
170 | - """Mark builder as failed for a given reason.""" |
171 | - |
172 | - def setSlaveForTesting(proxy): |
173 | - """Sets the RPC proxy through which to operate the build slave.""" |
174 | - |
175 | - def verifySlaveBuildCookie(slave_build_id): |
176 | - """Verify that a slave's build cookie is consistent. |
177 | - |
178 | - This should delegate to the current `IBuildFarmJobBehavior`. |
179 | - """ |
180 | - |
181 | - def transferSlaveFileToLibrarian(file_sha1, filename, private): |
182 | - """Transfer a file from the slave to the librarian. |
183 | - |
184 | - :param file_sha1: The file's sha1, which is how the file is addressed |
185 | - in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog' |
186 | - will cause the build log to be retrieved and gzipped. |
187 | - :param filename: The name of the file to be given to the librarian file |
188 | - alias. |
189 | - :param private: True if the build is for a private archive. |
190 | - :return: A librarian file alias. |
191 | - """ |
192 | - |
193 | - def getBuildQueue(): |
194 | - """Return a `BuildQueue` if there's an active job on this builder. |
195 | - |
196 | - :return: A BuildQueue, or None. |
197 | - """ |
198 | - |
199 | - def getCurrentBuildFarmJob(): |
200 | - """Return a `BuildFarmJob` for this builder.""" |
201 | - |
202 | - # All methods below here return Deferred. |
203 | - |
204 | - def isAvailable(): |
205 | - """Whether or not a builder is available for building new jobs. |
206 | - |
207 | - :return: A Deferred that fires with True or False, depending on |
208 | - whether the builder is available or not. |
209 | + def checkSlaveAlive(): |
210 | + """Check that the buildd slave is alive. |
211 | + |
212 | + This pings the slave over the network via the echo method and looks |
213 | + for the sent message as the reply. |
214 | + |
215 | + :raises BuildDaemonError: When the slave is down. |
216 | """ |
217 | |
218 | def rescueIfLost(logger=None): |
219 | """Reset the slave if its job information doesn't match the DB. |
220 | |
221 | - This checks the build ID reported in the slave status against the |
222 | - database. If it isn't building what we think it should be, the current |
223 | - build will be aborted and the slave cleaned in preparation for a new |
224 | - task. The decision about the slave's correctness is left up to |
225 | - `IBuildFarmJobBehavior.verifySlaveBuildCookie`. |
226 | - |
227 | - :return: A Deferred that fires when the dialog with the slave is |
228 | - finished. It does not have a return value. |
229 | + If the builder is BUILDING or WAITING but has a build ID string |
230 | + that doesn't match what is stored in the DB, we have to dismiss |
231 | + its current actions and clean the slave for another job, assuming |
232 | + the XMLRPC is working properly at this point. |
233 | """ |
234 | |
235 | def updateStatus(logger=None): |
236 | - """Update the builder's status by probing it. |
237 | - |
238 | - :return: A Deferred that fires when the dialog with the slave is |
239 | - finished. It does not have a return value. |
240 | - """ |
241 | + """Update the builder's status by probing it.""" |
242 | |
243 | def cleanSlave(): |
244 | - """Clean any temporary files from the slave. |
245 | - |
246 | - :return: A Deferred that fires when the dialog with the slave is |
247 | - finished. It does not have a return value. |
248 | - """ |
249 | + """Clean any temporary files from the slave.""" |
250 | + |
251 | + def failBuilder(reason): |
252 | + """Mark builder as failed for a given reason.""" |
253 | |
254 | def requestAbort(): |
255 | """Ask that a build be aborted. |
256 | @@ -243,9 +206,6 @@ |
257 | This takes place asynchronously: Actually killing everything running |
258 | can take some time so the slave status should be queried again to |
259 | detect when the abort has taken effect. (Look for status ABORTED). |
260 | - |
261 | - :return: A Deferred that fires when the dialog with the slave is |
262 | - finished. It does not have a return value. |
263 | """ |
264 | |
265 | def resumeSlaveHost(): |
266 | @@ -257,35 +217,37 @@ |
267 | :raises: CannotResumeHost: if builder is not virtual or if the |
268 | configuration command has failed. |
269 | |
270 | - :return: A Deferred that fires when the resume operation finishes, |
271 | - whose value is a (stdout, stderr) tuple for success, or a Failure |
272 | - whose value is a CannotResumeHost exception. |
273 | + :return: command stdout and stderr buffers as a tuple. |
274 | """ |
275 | |
276 | + def setSlaveForTesting(proxy): |
277 | + """Sets the RPC proxy through which to operate the build slave.""" |
278 | + |
279 | def slaveStatus(): |
280 | """Get the slave status for this builder. |
281 | |
282 | - :return: A Deferred which fires when the slave dialog is complete. |
283 | - Its value is a dict containing at least builder_status, but |
284 | - potentially other values included by the current build |
285 | - behavior. |
286 | + :return: a dict containing at least builder_status, but potentially |
287 | + other values included by the current build behavior. |
288 | """ |
289 | |
290 | def slaveStatusSentence(): |
291 | """Get the slave status sentence for this builder. |
292 | |
293 | - :return: A Deferred which fires when the slave dialog is complete. |
294 | - Its value is a tuple with the first element containing the |
295 | - slave status, build_id-queue-id and then optionally more |
296 | - elements depending on the status. |
297 | + :return: A tuple with the first element containing the slave status, |
298 | + build_id-queue-id and then optionally more elements depending on |
299 | + the status. |
300 | + """ |
301 | + |
302 | + def verifySlaveBuildCookie(slave_build_id): |
303 | + """Verify that a slave's build cookie is consistent. |
304 | + |
305 | + This should delegate to the current `IBuildFarmJobBehavior`. |
306 | """ |
307 | |
308 | def updateBuild(queueItem): |
309 | """Verify the current build job status. |
310 | |
311 | Perform the required actions for each state. |
312 | - |
313 | - :return: A Deferred that fires when the slave dialog is finished. |
314 | """ |
315 | |
316 | def startBuild(build_queue_item, logger): |
317 | @@ -293,10 +255,21 @@ |
318 | |
319 | :param build_queue_item: A BuildQueueItem to build. |
320 | :param logger: A logger to be used to log diagnostic information. |
321 | - |
322 | - :return: A Deferred that fires after the dispatch has completed whose |
323 | - value is None, or a Failure that contains an exception |
324 | - explaining what went wrong. |
325 | + :raises BuildSlaveFailure: When the build slave fails. |
326 | + :raises CannotBuild: When a build cannot be started for some reason |
327 | + other than the build slave failing. |
328 | + """ |
329 | + |
330 | + def transferSlaveFileToLibrarian(file_sha1, filename, private): |
331 | + """Transfer a file from the slave to the librarian. |
332 | + |
333 | + :param file_sha1: The file's sha1, which is how the file is addressed |
334 | + in the slave XMLRPC protocol. Specially, the file_sha1 'buildlog' |
335 | + will cause the build log to be retrieved and gzipped. |
336 | + :param filename: The name of the file to be given to the librarian file |
337 | + alias. |
338 | + :param private: True if the build is for a private archive. |
339 | + :return: A librarian file alias. |
340 | """ |
341 | |
342 | def handleTimeout(logger, error_message): |
343 | @@ -311,8 +284,6 @@ |
344 | |
345 | :param logger: The logger object to be used for logging. |
346 | :param error_message: The error message to be used for logging. |
347 | - :return: A Deferred that fires after the virtual slave was resumed |
348 | - or immediately if it's a non-virtual slave. |
349 | """ |
350 | |
351 | def findAndStartJob(buildd_slave=None): |
352 | @@ -320,9 +291,17 @@ |
353 | |
354 | :param buildd_slave: An optional buildd slave that this builder should |
355 | talk to. |
356 | - :return: A Deferred whose value is the `IBuildQueue` instance |
357 | - found or None if no job was found. |
358 | - """ |
359 | + :return: the `IBuildQueue` instance found or None if no job was found. |
360 | + """ |
361 | + |
362 | + def getBuildQueue(): |
363 | + """Return a `BuildQueue` if there's an active job on this builder. |
364 | + |
365 | + :return: A BuildQueue, or None. |
366 | + """ |
367 | + |
368 | + def getCurrentBuildFarmJob(): |
369 | + """Return a `BuildFarmJob` for this builder.""" |
370 | |
371 | |
372 | class IBuilderSet(Interface): |
373 | |
374 | === modified file 'lib/lp/buildmaster/manager.py' |
375 | --- lib/lp/buildmaster/manager.py 2010-10-20 12:28:46 +0000 |
376 | +++ lib/lp/buildmaster/manager.py 2010-12-07 16:24:04 +0000 |
377 | @@ -10,10 +10,13 @@ |
378 | 'BuilddManager', |
379 | 'BUILDD_MANAGER_LOG_NAME', |
380 | 'FailDispatchResult', |
381 | + 'RecordingSlave', |
382 | 'ResetDispatchResult', |
383 | + 'buildd_success_result_map', |
384 | ] |
385 | |
386 | import logging |
387 | +import os |
388 | |
389 | import transaction |
390 | from twisted.application import service |
391 | @@ -21,27 +24,129 @@ |
392 | defer, |
393 | reactor, |
394 | ) |
395 | -from twisted.internet.task import LoopingCall |
396 | +from twisted.protocols.policies import TimeoutMixin |
397 | from twisted.python import log |
398 | +from twisted.python.failure import Failure |
399 | +from twisted.web import xmlrpc |
400 | from zope.component import getUtility |
401 | |
402 | +from canonical.config import config |
403 | +from canonical.launchpad.webapp import urlappend |
404 | +from lp.services.database import write_transaction |
405 | from lp.buildmaster.enums import BuildStatus |
406 | -from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
407 | - BuildBehaviorMismatch, |
408 | - ) |
409 | -from lp.buildmaster.model.builder import Builder |
410 | -from lp.buildmaster.interfaces.builder import ( |
411 | - BuildDaemonError, |
412 | - BuildSlaveFailure, |
413 | - CannotBuild, |
414 | - CannotFetchFile, |
415 | - CannotResumeHost, |
416 | - ) |
417 | +from lp.services.twistedsupport.processmonitor import ProcessWithTimeout |
418 | |
419 | |
420 | BUILDD_MANAGER_LOG_NAME = "slave-scanner" |
421 | |
422 | |
423 | +buildd_success_result_map = { |
424 | + 'ensurepresent': True, |
425 | + 'build': 'BuilderStatus.BUILDING', |
426 | + } |
427 | + |
428 | + |
429 | +class QueryWithTimeoutProtocol(xmlrpc.QueryProtocol, TimeoutMixin): |
430 | + """XMLRPC query protocol with a configurable timeout. |
431 | + |
432 | + XMLRPC queries using this protocol will be unconditionally closed |
433 | + when the timeout is elapsed. The timeout is fetched from the context |
434 | + Launchpad configuration file (`config.builddmaster.socket_timeout`). |
435 | + """ |
436 | + def connectionMade(self): |
437 | + xmlrpc.QueryProtocol.connectionMade(self) |
438 | + self.setTimeout(config.builddmaster.socket_timeout) |
439 | + |
440 | + |
441 | +class QueryFactoryWithTimeout(xmlrpc._QueryFactory): |
442 | + """XMLRPC client factory with timeout support.""" |
443 | + # Make this factory quiet. |
444 | + noisy = False |
445 | + # Use the protocol with timeout support. |
446 | + protocol = QueryWithTimeoutProtocol |
447 | + |
448 | + |
449 | +class RecordingSlave: |
450 | + """An RPC proxy for buildd slaves that records instructions to the latter. |
451 | + |
452 | + The idea here is to merely record the instructions that the slave-scanner |
453 | + issues to the buildd slaves and "replay" them a bit later in asynchronous |
454 | + and parallel fashion. |
455 | + |
456 | + By dealing with a number of buildd slaves in parallel we remove *the* |
457 | + major slave-scanner throughput issue while avoiding large-scale changes to |
458 | + its code base. |
459 | + """ |
460 | + |
461 | + def __init__(self, name, url, vm_host): |
462 | + self.name = name |
463 | + self.url = url |
464 | + self.vm_host = vm_host |
465 | + |
466 | + self.resume_requested = False |
467 | + self.calls = [] |
468 | + |
469 | + def __repr__(self): |
470 | + return '<%s:%s>' % (self.name, self.url) |
471 | + |
472 | + def cacheFile(self, logger, libraryfilealias): |
473 | + """Cache the file on the server.""" |
474 | + self.ensurepresent( |
475 | + libraryfilealias.content.sha1, libraryfilealias.http_url, '', '') |
476 | + |
477 | + def sendFileToSlave(self, *args): |
478 | + """Helper to send a file to this builder.""" |
479 | + return self.ensurepresent(*args) |
480 | + |
481 | + def ensurepresent(self, *args): |
482 | + """Download files needed for the build.""" |
483 | + self.calls.append(('ensurepresent', args)) |
484 | + result = buildd_success_result_map.get('ensurepresent') |
485 | + return [result, 'Download'] |
486 | + |
487 | + def build(self, *args): |
488 | + """Perform the build.""" |
489 | + # XXX: This method does not appear to be used. |
490 | + self.calls.append(('build', args)) |
491 | + result = buildd_success_result_map.get('build') |
492 | + return [result, args[0]] |
493 | + |
494 | + def resume(self): |
495 | + """Record the request to resume the builder.. |
496 | + |
497 | + Always succeed. |
498 | + |
499 | + :return: a (stdout, stderr, subprocess exitcode) triple |
500 | + """ |
501 | + self.resume_requested = True |
502 | + return ['', '', 0] |
503 | + |
504 | + def resumeSlave(self, clock=None): |
505 | + """Resume the builder in a asynchronous fashion. |
506 | + |
507 | + Uses the configuration command line in the same way |
508 | + `BuilddSlave.resume` does. |
509 | + |
510 | + Also use the builddmaster configuration 'socket_timeout' as |
511 | + the process timeout. |
512 | + |
513 | + :param clock: An optional twisted.internet.task.Clock to override |
514 | + the default clock. For use in tests. |
515 | + |
516 | + :return: a Deferred |
517 | + """ |
518 | + resume_command = config.builddmaster.vm_resume_command % { |
519 | + 'vm_host': self.vm_host} |
520 | + # The Twisted API requires strings; the configuration provides unicode. |
521 | + resume_argv = [str(term) for term in resume_command.split()] |
522 | + |
523 | + d = defer.Deferred() |
524 | + p = ProcessWithTimeout( |
525 | + d, config.builddmaster.socket_timeout, clock=clock) |
526 | + p.spawnProcess(resume_argv[0], tuple(resume_argv)) |
527 | + return d |
528 | + |
529 | + |
530 | def get_builder(name): |
531 | """Helper to return the builder given the slave for this request.""" |
532 | # Avoiding circular imports. |
533 | @@ -54,12 +159,9 @@ |
534 | # builder.currentjob hides a complicated query, don't run it twice. |
535 | # See bug 623281. |
536 | current_job = builder.currentjob |
537 | - if current_job is None: |
538 | - job_failure_count = 0 |
539 | - else: |
540 | - job_failure_count = current_job.specific_job.build.failure_count |
541 | + build_job = current_job.specific_job.build |
542 | |
543 | - if builder.failure_count == job_failure_count and current_job is not None: |
544 | + if builder.failure_count == build_job.failure_count: |
545 | # If the failure count for the builder is the same as the |
546 | # failure count for the job being built, then we cannot |
547 | # tell whether the job or the builder is at fault. The best |
548 | @@ -68,28 +170,17 @@ |
549 | current_job.reset() |
550 | return |
551 | |
552 | - if builder.failure_count > job_failure_count: |
553 | + if builder.failure_count > build_job.failure_count: |
554 | # The builder has failed more than the jobs it's been |
555 | - # running. |
556 | - |
557 | - # Re-schedule the build if there is one. |
558 | - if current_job is not None: |
559 | - current_job.reset() |
560 | - |
561 | - # We are a little more tolerant with failing builders than |
562 | - # failing jobs because sometimes they get unresponsive due to |
563 | - # human error, flaky networks etc. We expect the builder to get |
564 | - # better, whereas jobs are very unlikely to get better. |
565 | - if builder.failure_count >= Builder.FAILURE_THRESHOLD: |
566 | - # It's also gone over the threshold so let's disable it. |
567 | - builder.failBuilder(fail_notes) |
568 | + # running, so let's disable it and re-schedule the build. |
569 | + builder.failBuilder(fail_notes) |
570 | + current_job.reset() |
571 | else: |
572 | # The job is the culprit! Override its status to 'failed' |
573 | # to make sure it won't get automatically dispatched again, |
574 | # and remove the buildqueue request. The failure should |
575 | # have already caused any relevant slave data to be stored |
576 | # on the build record so don't worry about that here. |
577 | - build_job = current_job.specific_job.build |
578 | build_job.status = BuildStatus.FAILEDTOBUILD |
579 | builder.currentjob.destroySelf() |
580 | |
581 | @@ -99,108 +190,133 @@ |
582 | # next buildd scan. |
583 | |
584 | |
585 | +class BaseDispatchResult: |
586 | + """Base class for *DispatchResult variations. |
587 | + |
588 | + It will be extended to represent dispatching results and allow |
589 | + homogeneous processing. |
590 | + """ |
591 | + |
592 | + def __init__(self, slave, info=None): |
593 | + self.slave = slave |
594 | + self.info = info |
595 | + |
596 | + def _cleanJob(self, job): |
597 | + """Clean up in case of builder reset or dispatch failure.""" |
598 | + if job is not None: |
599 | + job.reset() |
600 | + |
601 | + def assessFailureCounts(self): |
602 | + """View builder/job failure_count and work out which needs to die. |
603 | + |
604 | + :return: True if we disabled something, False if we did not. |
605 | + """ |
606 | + builder = get_builder(self.slave.name) |
607 | + assessFailureCounts(builder, self.info) |
608 | + |
609 | + def __call__(self): |
610 | + raise NotImplementedError( |
611 | + "Call sites must define an evaluation method.") |
612 | + |
613 | + |
614 | +class FailDispatchResult(BaseDispatchResult): |
615 | + """Represents a communication failure while dispatching a build job.. |
616 | + |
617 | + When evaluated, this object marks the corresponding `IBuilder` as |
618 | + 'NOK' with the given text as 'failnotes'. It also cleans up the running |
619 | + job (`IBuildQueue`). |
620 | + """ |
621 | + |
622 | + def __repr__(self): |
623 | + return '%r failure (%s)' % (self.slave, self.info) |
624 | + |
625 | + @write_transaction |
626 | + def __call__(self): |
627 | + self.assessFailureCounts() |
628 | + |
629 | + |
630 | +class ResetDispatchResult(BaseDispatchResult): |
631 | + """Represents a failure to reset a builder. |
632 | + |
633 | + When evaluated, this object simply cleans up the running job |
634 | + (`IBuildQueue`) and marks the builder down. |
635 | + """ |
636 | + |
637 | + def __repr__(self): |
638 | + return '%r reset failure' % self.slave |
639 | + |
640 | + @write_transaction |
641 | + def __call__(self): |
642 | + builder = get_builder(self.slave.name) |
643 | + # Builders that fail to reset should be disabled as per bug |
644 | + # 563353. |
645 | + # XXX Julian bug=586362 |
646 | + # This is disabled until this code is not also used for dispatch |
647 | + # failures where we *don't* want to disable the builder. |
648 | + # builder.failBuilder(self.info) |
649 | + self._cleanJob(builder.currentjob) |
650 | + |
651 | + |
652 | class SlaveScanner: |
653 | """A manager for a single builder.""" |
654 | |
655 | - # The interval between each poll cycle, in seconds. We'd ideally |
656 | - # like this to be lower but 5 seems a reasonable compromise between |
657 | - # responsivity and load on the database server, since in each cycle |
658 | - # we can run quite a few queries. |
659 | SCAN_INTERVAL = 5 |
660 | |
661 | + # These are for the benefit of tests; see `TestingSlaveScanner`. |
662 | + # It pokes fake versions in here so that it can verify methods were |
663 | + # called. The tests should really be using FakeMethod() though. |
664 | + reset_result = ResetDispatchResult |
665 | + fail_result = FailDispatchResult |
666 | + |
667 | def __init__(self, builder_name, logger): |
668 | self.builder_name = builder_name |
669 | self.logger = logger |
670 | + self._deferred_list = [] |
671 | + |
672 | + def scheduleNextScanCycle(self): |
673 | + """Schedule another scan of the builder some time in the future.""" |
674 | + self._deferred_list = [] |
675 | + # XXX: Change this to use LoopingCall. |
676 | + reactor.callLater(self.SCAN_INTERVAL, self.startCycle) |
677 | |
678 | def startCycle(self): |
679 | """Scan the builder and dispatch to it or deal with failures.""" |
680 | - self.loop = LoopingCall(self.singleCycle) |
681 | - self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL) |
682 | - return self.stopping_deferred |
683 | - |
684 | - def stopCycle(self): |
685 | - """Terminate the LoopingCall.""" |
686 | - self.loop.stop() |
687 | - |
688 | - def singleCycle(self): |
689 | self.logger.debug("Scanning builder: %s" % self.builder_name) |
690 | - d = self.scan() |
691 | - |
692 | - d.addErrback(self._scanFailed) |
693 | - return d |
694 | - |
695 | - def _scanFailed(self, failure): |
696 | - """Deal with failures encountered during the scan cycle. |
697 | - |
698 | - 1. Print the error in the log |
699 | - 2. Increment and assess failure counts on the builder and job. |
700 | - """ |
701 | - # Make sure that pending database updates are removed as it |
702 | - # could leave the database in an inconsistent state (e.g. The |
703 | - # job says it's running but the buildqueue has no builder set). |
704 | - transaction.abort() |
705 | - |
706 | - # If we don't recognise the exception include a stack trace with |
707 | - # the error. |
708 | - error_message = failure.getErrorMessage() |
709 | - if failure.check( |
710 | - BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch, |
711 | - CannotResumeHost, BuildDaemonError, CannotFetchFile): |
712 | - self.logger.info("Scanning failed with: %s" % error_message) |
713 | - else: |
714 | + |
715 | + try: |
716 | + slave = self.scan() |
717 | + if slave is None: |
718 | + self.scheduleNextScanCycle() |
719 | + else: |
720 | + # XXX: Ought to return Deferred. |
721 | + self.resumeAndDispatch(slave) |
722 | + except: |
723 | + error = Failure() |
724 | self.logger.info("Scanning failed with: %s\n%s" % |
725 | - (failure.getErrorMessage(), failure.getTraceback())) |
726 | + (error.getErrorMessage(), error.getTraceback())) |
727 | |
728 | - # Decide if we need to terminate the job or fail the |
729 | - # builder. |
730 | - try: |
731 | builder = get_builder(self.builder_name) |
732 | - builder.gotFailure() |
733 | - if builder.currentjob is not None: |
734 | - build_farm_job = builder.getCurrentBuildFarmJob() |
735 | - build_farm_job.gotFailure() |
736 | - self.logger.info( |
737 | - "builder %s failure count: %s, " |
738 | - "job '%s' failure count: %s" % ( |
739 | - self.builder_name, |
740 | - builder.failure_count, |
741 | - build_farm_job.title, |
742 | - build_farm_job.failure_count)) |
743 | - else: |
744 | - self.logger.info( |
745 | - "Builder %s failed a probe, count: %s" % ( |
746 | - self.builder_name, builder.failure_count)) |
747 | - assessFailureCounts(builder, failure.getErrorMessage()) |
748 | + |
749 | + # Decide if we need to terminate the job or fail the |
750 | + # builder. |
751 | + self._incrementFailureCounts(builder) |
752 | + self.logger.info( |
753 | + "builder failure count: %s, job failure count: %s" % ( |
754 | + builder.failure_count, |
755 | + builder.getCurrentBuildFarmJob().failure_count)) |
756 | + assessFailureCounts(builder, error.getErrorMessage()) |
757 | transaction.commit() |
758 | - except: |
759 | - # Catastrophic code failure! Not much we can do. |
760 | - self.logger.error( |
761 | - "Miserable failure when trying to examine failure counts:\n", |
762 | - exc_info=True) |
763 | - transaction.abort() |
764 | - |
765 | + |
766 | + self.scheduleNextScanCycle() |
767 | + |
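The new singleCycle body above replaces the Deferred errback chain with a plain try/except that always reschedules the next cycle. A minimal stdlib-only sketch of that shape (the function and parameter names here are hypothetical, not Launchpad APIs):

```python
import traceback

def single_cycle(scan, schedule_next, logger):
    """Run one scan cycle, logging any error with its traceback.

    Mirrors the try/except above: on failure we log the message plus
    traceback and still reschedule, so one bad cycle never stops the
    scanning loop.
    """
    try:
        scan()
    except Exception as error:
        logger.info("Scanning failed with: %s\n%s" %
                    (error, traceback.format_exc()))
    schedule_next()
```

The key design point carried over from the diff is that `schedule_next()` sits outside the try/except, so it runs on both the success and failure paths.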
768 | + @write_transaction |
769 | def scan(self): |
770 | """Probe the builder and update/dispatch/collect as appropriate. |
771 | |
772 | - There are several steps to scanning: |
773 | - |
774 | - 1. If the builder is marked as "ok" then probe it to see what state |
775 | - it's in. This is where lost jobs are rescued if we think the |
776 | - builder is doing something that it later tells us it's not, |
777 | - and also where the multi-phase abort procedure happens. |
778 | - See IBuilder.rescueIfLost, which is called by |
779 | - IBuilder.updateStatus(). |
780 | - 2. If the builder is still happy, we ask it if it has an active build |
781 | - and then either update the build in Launchpad or collect the |
782 | - completed build. (builder.updateBuild) |
783 | - 3. If the builder is not happy or it was marked as unavailable |
784 | - mid-build, we need to reset the job that we thought it had, so |
785 | - that the job is dispatched elsewhere. |
786 | - 4. If the builder is idle and we have another build ready, dispatch |
787 | - it. |
788 | - |
789 | - :return: A Deferred that fires when the scan is complete, whose |
790 | - value is A `BuilderSlave` if we dispatched a job to it, or None. |
791 | + The whole method is wrapped in a transaction, but we do partial |
792 | + commits to avoid holding locks on tables. |
793 | + |
794 | + :return: A `RecordingSlave` if we dispatched a job to it, or None. |
795 | """ |
796 | # We need to re-fetch the builder object on each cycle as the |
797 | # Storm store is invalidated over transaction boundaries. |
798 | @@ -208,72 +324,240 @@ |
799 | self.builder = get_builder(self.builder_name) |
800 | |
801 | if self.builder.builderok: |
802 | - d = self.builder.updateStatus(self.logger) |
803 | + self.builder.updateStatus(self.logger) |
804 | + transaction.commit() |
805 | + |
806 | + # See if we think there's an active build on the builder. |
807 | + buildqueue = self.builder.getBuildQueue() |
808 | + |
809 | + # XXX Julian 2010-07-29 bug=611258 |
810 | + # We're not using the RecordingSlave until dispatching, which |
811 | + # means that this part blocks until we've received a response |
812 | + # from the builder. updateBuild() needs to be made |
813 | + asynchronous. |
814 | + |
815 | + # Scan the slave and get the logtail, or collect the build if |
816 | + # it's ready. Yes, "updateBuild" is a bad name. |
817 | + if buildqueue is not None: |
818 | + self.builder.updateBuild(buildqueue) |
819 | + transaction.commit() |
820 | + |
821 | + # If the builder is in manual mode, don't dispatch anything. |
822 | + if self.builder.manual: |
823 | + self.logger.debug( |
824 | + '%s is in manual mode, not dispatching.' % self.builder.name) |
825 | + return None |
826 | + |
827 | + # If the builder is marked unavailable, don't dispatch anything. |
828 | + # Additionally, because builders can be removed from the pool at |
829 | + # any time, we need to see if we think there was a build running |
830 | + # on it before it was marked unavailable. In this case we reset |
831 | + # the build, thus forcing it to get re-dispatched to another |
832 | + # builder. |
833 | + if not self.builder.is_available: |
834 | + job = self.builder.currentjob |
835 | + if job is not None and not self.builder.builderok: |
836 | + self.logger.info( |
837 | + "%s was made unavailable, resetting attached " |
838 | + "job" % self.builder.name) |
839 | + job.reset() |
840 | + transaction.commit() |
841 | + return None |
842 | + |
843 | + # See if there is a job we can dispatch to the builder slave. |
844 | + |
845 | + # XXX: Rather than use the slave actually associated with the builder |
846 | + # (which, incidentally, shouldn't be a property anyway), we make a new |
847 | + # RecordingSlave so we can get access to its asynchronous |
848 | + # "resumeSlave" method. Blech. |
849 | + slave = RecordingSlave( |
850 | + self.builder.name, self.builder.url, self.builder.vm_host) |
851 | + # XXX: Passing buildd_slave=slave overwrites the 'slave' property of |
852 | + # self.builder. Not sure why this is needed yet. |
853 | + self.builder.findAndStartJob(buildd_slave=slave) |
854 | + if self.builder.currentjob is not None: |
855 | + # After a successful dispatch we can reset the |
856 | + # failure_count. |
857 | + self.builder.resetFailureCount() |
858 | + transaction.commit() |
859 | + return slave |
860 | + |
861 | + return None |
862 | + |
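The rewritten scan() above is essentially a ladder of early returns: manual builders get nothing, unavailable builders may need their job reset, and only a healthy idle builder gets a dispatch. A pure-function sketch of that decision ladder (the `builder` attribute names follow the diff; the function itself is hypothetical):

```python
def decide_dispatch(builder):
    """Return 'dispatch', 'reset-job', or 'skip' for a builder state.

    A condensed sketch of the early-return ladder in scan(): manual
    and unavailable builders never get new work, and a builder that
    went bad mid-build has its attached job reset so it can be
    re-dispatched elsewhere.
    """
    if builder.manual:
        return 'skip'
    if not builder.is_available:
        if builder.currentjob is not None and not builder.builderok:
            return 'reset-job'
        return 'skip'
    return 'dispatch'
```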
863 | + def resumeAndDispatch(self, slave): |
864 | + """Chain the resume and dispatching Deferreds.""" |
865 | + # XXX: resumeAndDispatch makes Deferreds without returning them. |
866 | + if slave.resume_requested: |
867 | + # The slave needs to be reset before we can dispatch to |
868 | + # it (e.g. a virtual slave) |
869 | + |
870 | + # XXX: Two problems here. The first is that 'resumeSlave' only |
871 | + # exists on RecordingSlave (BuilderSlave calls it 'resume'). |
872 | + d = slave.resumeSlave() |
873 | + d.addBoth(self.checkResume, slave) |
874 | else: |
875 | + # No resume required, build dispatching can commence. |
876 | d = defer.succeed(None) |
877 | |
878 | - def status_updated(ignored): |
879 | - # Commit the changes done while possibly rescuing jobs, to |
880 | - # avoid holding table locks. |
881 | - transaction.commit() |
882 | - |
883 | - # See if we think there's an active build on the builder. |
884 | - buildqueue = self.builder.getBuildQueue() |
885 | - |
886 | - # Scan the slave and get the logtail, or collect the build if |
887 | - # it's ready. Yes, "updateBuild" is a bad name. |
888 | - if buildqueue is not None: |
889 | - return self.builder.updateBuild(buildqueue) |
890 | - |
891 | - def build_updated(ignored): |
892 | - # Commit changes done while updating the build, to avoid |
893 | - # holding table locks. |
894 | - transaction.commit() |
895 | - |
896 | - # If the builder is in manual mode, don't dispatch anything. |
897 | - if self.builder.manual: |
898 | - self.logger.debug( |
899 | - '%s is in manual mode, not dispatching.' % |
900 | - self.builder.name) |
901 | - return |
902 | - |
903 | - # If the builder is marked unavailable, don't dispatch anything. |
904 | - # Additionaly, because builders can be removed from the pool at |
905 | - # any time, we need to see if we think there was a build running |
906 | - # on it before it was marked unavailable. In this case we reset |
907 | - # the build thusly forcing it to get re-dispatched to another |
908 | - # builder. |
909 | - |
910 | - return self.builder.isAvailable().addCallback(got_available) |
911 | - |
912 | - def got_available(available): |
913 | - if not available: |
914 | - job = self.builder.currentjob |
915 | - if job is not None and not self.builder.builderok: |
916 | - self.logger.info( |
917 | - "%s was made unavailable, resetting attached " |
918 | - "job" % self.builder.name) |
919 | - job.reset() |
920 | - transaction.commit() |
921 | - return |
922 | - |
923 | - # See if there is a job we can dispatch to the builder slave. |
924 | - |
925 | - d = self.builder.findAndStartJob() |
926 | - def job_started(candidate): |
927 | - if self.builder.currentjob is not None: |
928 | - # After a successful dispatch we can reset the |
929 | - # failure_count. |
930 | - self.builder.resetFailureCount() |
931 | - transaction.commit() |
932 | - return self.builder.slave |
933 | - else: |
934 | + # Dispatch the build to the slave asynchronously. |
935 | + d.addCallback(self.initiateDispatch, slave) |
936 | + # Store this deferred so we can wait for it along with all |
937 | + # the others that will be generated by RecordingSlave during |
938 | + # the dispatch process, and chain a callback after they've |
939 | + # all fired. |
940 | + self._deferred_list.append(d) |
941 | + |
942 | + def initiateDispatch(self, resume_result, slave): |
943 | + """Start dispatching a build to a slave. |
944 | + |
945 | + If the previous task in chain (slave resuming) has failed it will |
946 | + receive a `ResetBuilderRequest` instance as 'resume_result' and |
947 | + will immediately return that so the subsequent callback can collect |
948 | + it. |
949 | + |
950 | + If the slave resuming succeeded, it starts the XMLRPC dialogue. The |
951 | + dialogue may consist of many calls to the slave before the build |
952 | + starts. Each call is done via a Deferred event, where slave calls |
953 | + are sent in callSlave(), and checked in checkDispatch() which will |
954 | + keep firing events via callSlave() until all the events are done or |
955 | + an error occurs. |
956 | + """ |
957 | + if resume_result is not None: |
958 | + self.slaveConversationEnded() |
959 | + return resume_result |
960 | + |
961 | + self.logger.info('Dispatching: %s' % slave) |
962 | + self.callSlave(slave) |
963 | + |
964 | + def _getProxyForSlave(self, slave): |
965 | + """Return a twisted.web.xmlrpc.Proxy for the buildd slave. |
966 | + |
967 | + Uses a protocol with timeout support; see QueryFactoryWithTimeout. |
968 | + """ |
969 | + proxy = xmlrpc.Proxy(str(urlappend(slave.url, 'rpc'))) |
970 | + proxy.queryFactory = QueryFactoryWithTimeout |
971 | + return proxy |
972 | + |
973 | + def callSlave(self, slave): |
974 | + """Dispatch the next XMLRPC for the given slave.""" |
975 | + if len(slave.calls) == 0: |
976 | + # That's the end of the dialogue with the slave. |
977 | + self.slaveConversationEnded() |
978 | + return |
979 | + |
980 | + # Get an XMLRPC proxy for the buildd slave. |
981 | + proxy = self._getProxyForSlave(slave) |
982 | + method, args = slave.calls.pop(0) |
983 | + d = proxy.callRemote(method, *args) |
984 | + d.addBoth(self.checkDispatch, method, slave) |
985 | + self._deferred_list.append(d) |
986 | + self.logger.debug('%s -> %s(%s)' % (slave, method, args)) |
987 | + |
988 | + def slaveConversationEnded(self): |
989 | + """After all the Deferreds are set up, chain a callback on them.""" |
990 | + dl = defer.DeferredList(self._deferred_list, consumeErrors=True) |
991 | + dl.addBoth(self.evaluateDispatchResult) |
992 | + return dl |
993 | + |
994 | + def evaluateDispatchResult(self, deferred_list_results): |
995 | + """Process the DispatchResult for this dispatch chain. |
996 | + |
997 | + After waiting for the Deferred chain to finish, we'll have a |
998 | + DispatchResult to evaluate, which deals with the result of |
999 | + dispatching. |
1000 | + """ |
1001 | + # The `deferred_list_results` is what we get when waiting on a |
1002 | + # DeferredList. It's a list of tuples of (status, result) where |
1003 | + # result is what the last callback in that chain returned. |
1004 | + |
1005 | + # If the result is an instance of BaseDispatchResult we need to |
1006 | + # evaluate it, as there's further action required at the end of |
1007 | + # the dispatch chain. None, resulting from successful chains, |
1008 | + # are discarded. |
1009 | + |
1010 | + dispatch_results = [ |
1011 | + result for status, result in deferred_list_results |
1012 | + if isinstance(result, BaseDispatchResult)] |
1013 | + |
1014 | + for result in dispatch_results: |
1015 | + self.logger.info("%r" % result) |
1016 | + result() |
1017 | + |
1018 | + # At this point, we're done dispatching, so we can schedule the |
1019 | + # next scan cycle. |
1020 | + self.scheduleNextScanCycle() |
1021 | + |
1022 | + # For the test suite so that it can chain callback results. |
1023 | + return deferred_list_results |
1024 | + |
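evaluateDispatchResult() above filters the `(status, result)` pairs a DeferredList yields, keeping only `BaseDispatchResult` instances and invoking each. The filtering itself is plain data processing, sketched here with a stand-in result class (the real one lives in lp.buildmaster.manager):

```python
class BaseDispatchResult:
    """Stand-in for the manager's dispatch-result class: calling the
    instance performs its follow-up action."""

    def __init__(self, log):
        self.log = log

    def __call__(self):
        self.log.append(repr(self))


def evaluate_results(deferred_list_results):
    """Keep only BaseDispatchResult instances from the (status, result)
    pairs and invoke each; None results from successful chains are
    discarded, as in evaluateDispatchResult() above."""
    dispatch_results = [
        result for status, result in deferred_list_results
        if isinstance(result, BaseDispatchResult)]
    for result in dispatch_results:
        result()
    return dispatch_results
```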
1025 | + def checkResume(self, response, slave): |
1026 | + """Check the result of resuming a slave. |
1027 | + |
1028 | + If there's a problem resuming, we return a ResetDispatchResult which |
1029 | + will get evaluated at the end of the scan, or None if the resume |
1030 | + was OK. |
1031 | + |
1032 | + :param response: the tuple that's constructed in |
1033 | + ProcessWithTimeout.processEnded(), or a Failure that |
1034 | + contains the tuple. |
1035 | + :param slave: the slave object we're talking to |
1036 | + """ |
1037 | + if isinstance(response, Failure): |
1038 | + out, err, code = response.value |
1039 | + else: |
1040 | + out, err, code = response |
1041 | + if code == os.EX_OK: |
1042 | + return None |
1043 | + |
1044 | + error_text = '%s\n%s' % (out, err) |
1045 | + self.logger.error('%s resume failure: %s' % (slave, error_text)) |
1046 | + return self.reset_result(slave, error_text) |
1047 | + |
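checkResume() above unpacks the `(stdout, stderr, exitcode)` triple from the resume process and treats `os.EX_OK` as success. The core of that check, minus the Failure unwrapping and the ResetDispatchResult construction (function name is hypothetical):

```python
import os

def check_resume(response):
    """Unpack the (stdout, stderr, exitcode) triple from a slave
    resume and return None on success, or the combined error text
    otherwise -- the core of checkResume() above."""
    out, err, code = response
    if code == os.EX_OK:
        return None
    return '%s\n%s' % (out, err)
```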
1048 | + def _incrementFailureCounts(self, builder): |
1049 | + builder.gotFailure() |
1050 | + builder.getCurrentBuildFarmJob().gotFailure() |
1051 | + |
1052 | + def checkDispatch(self, response, method, slave): |
1053 | + """Verify the results of a slave xmlrpc call. |
1054 | + |
1055 | + If it failed and it compromises the slave then return a corresponding |
1056 | + `FailDispatchResult`, if it was a communication failure, simply |
1057 | + reset the slave by returning a `ResetDispatchResult`. |
1058 | + """ |
1059 | + from lp.buildmaster.interfaces.builder import IBuilderSet |
1060 | + builder = getUtility(IBuilderSet)[slave.name] |
1061 | + |
1062 | + # XXX these DispatchResult classes are badly named and do the |
1063 | + # same thing. We need to fix that. |
1064 | + self.logger.debug( |
1065 | + '%s response for "%s": %s' % (slave, method, response)) |
1066 | + |
1067 | + if isinstance(response, Failure): |
1068 | + self.logger.warn( |
1069 | + '%s communication failed (%s)' % |
1070 | + (slave, response.getErrorMessage())) |
1071 | + self.slaveConversationEnded() |
1072 | + self._incrementFailureCounts(builder) |
1073 | + return self.fail_result(slave) |
1074 | + |
1075 | + if isinstance(response, list) and len(response) == 2: |
1076 | + if method in buildd_success_result_map: |
1077 | + expected_status = buildd_success_result_map.get(method) |
1078 | + status, info = response |
1079 | + if status == expected_status: |
1080 | + self.callSlave(slave) |
1081 | return None |
1082 | - return d.addCallback(job_started) |
1083 | - |
1084 | - d.addCallback(status_updated) |
1085 | - d.addCallback(build_updated) |
1086 | - return d |
1087 | + else: |
1088 | + info = 'Unknown slave method: %s' % method |
1089 | + else: |
1090 | + info = 'Unexpected response: %s' % repr(response) |
1091 | + |
1092 | + self.logger.error( |
1093 | + '%s failed to dispatch (%s)' % (slave, info)) |
1094 | + |
1095 | + self.slaveConversationEnded() |
1096 | + self._incrementFailureCounts(builder) |
1097 | + return self.fail_result(slave, info) |
1098 | |
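checkDispatch() above validates each XML-RPC response as a two-element `[status, info]` list whose status must match the entry in `buildd_success_result_map` for that method. The validation ladder, with the failure bookkeeping stripped out and a hypothetical success map standing in for the real one:

```python
# Hypothetical stand-in for buildd_success_result_map; the real map
# lives alongside the manager code.
success_result_map = {
    'ensurepresent': True,
    'build': 'BuilderStatus.BUILDING',
    }

def check_dispatch(response, method):
    """Return None if the two-element [status, info] response matches
    the expected status for `method`, else an error string -- the same
    validation ladder as checkDispatch() above."""
    if isinstance(response, list) and len(response) == 2:
        if method in success_result_map:
            status, info = response
            if status == success_result_map[method]:
                return None
            return 'Failed: %s' % info
        return 'Unknown slave method: %s' % method
    return 'Unexpected response: %r' % (response,)
```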
1099 | |
1100 | class NewBuildersScanner: |
1101 | @@ -294,21 +578,15 @@ |
1102 | self.current_builders = [ |
1103 | builder.name for builder in getUtility(IBuilderSet)] |
1104 | |
1105 | - def stop(self): |
1106 | - """Terminate the LoopingCall.""" |
1107 | - self.loop.stop() |
1108 | - |
1109 | def scheduleScan(self): |
1110 | """Schedule a callback SCAN_INTERVAL seconds later.""" |
1111 | - self.loop = LoopingCall(self.scan) |
1112 | - self.loop.clock = self._clock |
1113 | - self.stopping_deferred = self.loop.start(self.SCAN_INTERVAL) |
1114 | - return self.stopping_deferred |
1115 | + return self._clock.callLater(self.SCAN_INTERVAL, self.scan) |
1116 | |
1117 | def scan(self): |
1118 | """If a new builder appears, create a SlaveScanner for it.""" |
1119 | new_builders = self.checkForNewBuilders() |
1120 | self.manager.addScanForBuilders(new_builders) |
1121 | + self.scheduleScan() |
1122 | |
1123 | def checkForNewBuilders(self): |
1124 | """See if any new builders were added.""" |
1125 | @@ -331,7 +609,10 @@ |
1126 | manager=self, clock=clock) |
1127 | |
1128 | def _setupLogger(self): |
1129 | - """Set up a 'slave-scanner' logger that redirects to twisted. |
1130 | + """Set up a 'slave-scanner' logger that redirects to twisted. |
1131 | + |
1132 | + It is going to be used locally and within the thread running |
1133 | + the scan() method. |
1134 | |
1135 | Make it less verbose to avoid messing too much with the old code. |
1136 | """ |
1137 | @@ -362,29 +643,12 @@ |
1138 | # Events will now fire in the SlaveScanner objects to scan each |
1139 | # builder. |
1140 | |
1141 | - def stopService(self): |
1142 | - """Callback for when we need to shut down.""" |
1143 | - # XXX: lacks unit tests |
1144 | - # All the SlaveScanner objects need to be halted gracefully. |
1145 | - deferreds = [slave.stopping_deferred for slave in self.builder_slaves] |
1146 | - deferreds.append(self.new_builders_scanner.stopping_deferred) |
1147 | - |
1148 | - self.new_builders_scanner.stop() |
1149 | - for slave in self.builder_slaves: |
1150 | - slave.stopCycle() |
1151 | - |
1152 | - # The 'stopping_deferred's are called back when the loops are |
1153 | - # stopped, so we can wait on them all at once here before |
1154 | - # exiting. |
1155 | - d = defer.DeferredList(deferreds, consumeErrors=True) |
1156 | - return d |
1157 | - |
1158 | def addScanForBuilders(self, builders): |
1159 | """Set up scanner objects for the builders specified.""" |
1160 | for builder in builders: |
1161 | slave_scanner = SlaveScanner(builder, self.logger) |
1162 | self.builder_slaves.append(slave_scanner) |
1163 | - slave_scanner.startCycle() |
1164 | + slave_scanner.scheduleNextScanCycle() |
1165 | |
1166 | # Return the slave list for the benefit of tests. |
1167 | return self.builder_slaves |
1168 | |
1169 | === modified file 'lib/lp/buildmaster/model/builder.py' |
1170 | --- lib/lp/buildmaster/model/builder.py 2010-10-20 11:54:27 +0000 |
1171 | +++ lib/lp/buildmaster/model/builder.py 2010-12-07 16:24:04 +0000 |
1172 | @@ -13,11 +13,12 @@ |
1173 | ] |
1174 | |
1175 | import gzip |
1176 | +import httplib |
1177 | import logging |
1178 | import os |
1179 | import socket |
1180 | +import subprocess |
1181 | import tempfile |
1182 | -import transaction |
1183 | import urllib2 |
1184 | import xmlrpclib |
1185 | |
1186 | @@ -33,13 +34,6 @@ |
1187 | Count, |
1188 | Sum, |
1189 | ) |
1190 | - |
1191 | -from twisted.internet import ( |
1192 | - defer, |
1193 | - reactor as default_reactor, |
1194 | - ) |
1195 | -from twisted.web import xmlrpc |
1196 | - |
1197 | from zope.component import getUtility |
1198 | from zope.interface import implements |
1199 | |
1200 | @@ -64,6 +58,7 @@ |
1201 | from lp.buildmaster.interfaces.builder import ( |
1202 | BuildDaemonError, |
1203 | BuildSlaveFailure, |
1204 | + CannotBuild, |
1205 | CannotFetchFile, |
1206 | CannotResumeHost, |
1207 | CorruptBuildCookie, |
1208 | @@ -71,6 +66,9 @@ |
1209 | IBuilderSet, |
1210 | ) |
1211 | from lp.buildmaster.interfaces.buildfarmjob import IBuildFarmJobSet |
1212 | +from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
1213 | + BuildBehaviorMismatch, |
1214 | + ) |
1215 | from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
1216 | from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior |
1217 | from lp.buildmaster.model.buildqueue import ( |
1218 | @@ -80,9 +78,9 @@ |
1219 | from lp.registry.interfaces.person import validate_public_person |
1220 | from lp.services.job.interfaces.job import JobStatus |
1221 | from lp.services.job.model.job import Job |
1222 | +from lp.services.osutils import until_no_eintr |
1223 | from lp.services.propertycache import cachedproperty |
1224 | -from lp.services.twistedsupport.processmonitor import ProcessWithTimeout |
1225 | -from lp.services.twistedsupport import cancel_on_timeout |
1226 | +from lp.services.twistedsupport.xmlrpc import BlockingProxy |
1227 | # XXX Michael Nelson 2010-01-13 bug=491330 |
1228 | # These dependencies on soyuz will be removed when getBuildRecords() |
1229 | # is moved. |
1230 | @@ -94,9 +92,25 @@ |
1231 | from lp.soyuz.model.processor import Processor |
1232 | |
1233 | |
1234 | -class QuietQueryFactory(xmlrpc._QueryFactory): |
1235 | - """XMLRPC client factory that doesn't splatter the log with junk.""" |
1236 | - noisy = False |
1237 | +class TimeoutHTTPConnection(httplib.HTTPConnection): |
1238 | + |
1239 | + def connect(self): |
1240 | + """Override the standard connect() method to set a timeout""" |
1241 | + ret = httplib.HTTPConnection.connect(self) |
1242 | + self.sock.settimeout(config.builddmaster.socket_timeout) |
1243 | + return ret |
1244 | + |
1245 | + |
1246 | +class TimeoutHTTP(httplib.HTTP): |
1247 | + _connection_class = TimeoutHTTPConnection |
1248 | + |
1249 | + |
1250 | +class TimeoutTransport(xmlrpclib.Transport): |
1251 | + """XMLRPC Transport to set up a socket with a defined timeout""" |
1252 | + |
1253 | + def make_connection(self, host): |
1254 | + host, extra_headers, x509 = self.get_host_info(host) |
1255 | + return TimeoutHTTP(host) |
1256 | |
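The TimeoutHTTP classes above target Python 2's httplib/xmlrpclib API. For reference, on Python 3 the same effect needs no HTTP subclassing, because `http.client.HTTPConnection` accepts a timeout directly; a sketch of the equivalent transport:

```python
import http.client
import xmlrpc.client

class TimeoutTransport(xmlrpc.client.Transport):
    """Transport whose connections carry a socket timeout -- the
    Python 3 equivalent of the TimeoutHTTP classes above."""

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        # Reuse a cached connection for the same host, as the base
        # class does.
        if self._connection and host == self._connection[0]:
            return self._connection[1]
        chost, self._extra_headers, _x509 = self.get_host_info(host)
        self._connection = host, http.client.HTTPConnection(
            chost, timeout=self.timeout)
        return self._connection[1]
```

Constructing the transport makes no network calls; the timeout only takes effect when the connection is actually used.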
1257 | |
1258 | class BuilderSlave(object): |
1259 | @@ -111,7 +125,24 @@ |
1260 | # many false positives in your test run and will most likely break |
1261 | # production. |
1262 | |
1263 | - def __init__(self, proxy, builder_url, vm_host, reactor=None): |
1264 | + # XXX: This (BuilderSlave) should use composition, rather than |
1265 | + # inheritance. |
1266 | + |
1267 | + # XXX: Have a documented interface for the XML-RPC server: |
1268 | + # - what methods |
1269 | + # - what return values expected |
1270 | + # - what faults |
1271 | + # (see XMLRPCBuildDSlave in lib/canonical/buildd/slave.py). |
1272 | + |
1273 | + # XXX: Arguably, this interface should be asynchronous |
1274 | + # (i.e. Deferred-returning). This would mean that Builder (see below) |
1275 | + # would have to expect Deferreds. |
1276 | + |
1277 | + # XXX: Once we have a client object with a defined, tested interface, we |
1278 | + # should make a test double that doesn't do any XML-RPC and can be used to |
1279 | + # make testing easier & tests faster. |
1280 | + |
1281 | + def __init__(self, proxy, builder_url, vm_host): |
1282 | """Initialize a BuilderSlave. |
1283 | |
1284 | :param proxy: An XML-RPC proxy, implementing 'callRemote'. It must |
1285 | @@ -124,87 +155,63 @@ |
1286 | self._file_cache_url = urlappend(builder_url, 'filecache') |
1287 | self._server = proxy |
1288 | |
1289 | - if reactor is None: |
1290 | - self.reactor = default_reactor |
1291 | - else: |
1292 | - self.reactor = reactor |
1293 | - |
1294 | @classmethod |
1295 | - def makeBuilderSlave(cls, builder_url, vm_host, reactor=None, proxy=None): |
1296 | - """Create and return a `BuilderSlave`. |
1297 | - |
1298 | - :param builder_url: The URL of the slave buildd machine, |
1299 | - e.g. http://localhost:8221 |
1300 | - :param vm_host: If the slave is virtual, specify its host machine here. |
1301 | - :param reactor: Used by tests to override the Twisted reactor. |
1302 | - :param proxy: Used By tests to override the xmlrpc.Proxy. |
1303 | - """ |
1304 | - rpc_url = urlappend(builder_url.encode('utf-8'), 'rpc') |
1305 | - if proxy is None: |
1306 | - server_proxy = xmlrpc.Proxy(rpc_url, allowNone=True) |
1307 | - server_proxy.queryFactory = QuietQueryFactory |
1308 | - else: |
1309 | - server_proxy = proxy |
1310 | - return cls(server_proxy, builder_url, vm_host, reactor) |
1311 | - |
1312 | - def _with_timeout(self, d): |
1313 | - TIMEOUT = config.builddmaster.socket_timeout |
1314 | - return cancel_on_timeout(d, TIMEOUT, self.reactor) |
1315 | + def makeBlockingSlave(cls, builder_url, vm_host): |
1316 | + rpc_url = urlappend(builder_url, 'rpc') |
1317 | + server_proxy = xmlrpclib.ServerProxy( |
1318 | + rpc_url, transport=TimeoutTransport(), allow_none=True) |
1319 | + return cls(BlockingProxy(server_proxy), builder_url, vm_host) |
1320 | |
1321 | def abort(self): |
1322 | """Abort the current build.""" |
1323 | - return self._with_timeout(self._server.callRemote('abort')) |
1324 | + return self._server.callRemote('abort') |
1325 | |
1326 | def clean(self): |
1327 | """Clean up the waiting files and reset the slave's internal state.""" |
1328 | - return self._with_timeout(self._server.callRemote('clean')) |
1329 | + return self._server.callRemote('clean') |
1330 | |
1331 | def echo(self, *args): |
1332 | """Echo the arguments back.""" |
1333 | - return self._with_timeout(self._server.callRemote('echo', *args)) |
1334 | + return self._server.callRemote('echo', *args) |
1335 | |
1336 | def info(self): |
1337 | """Return the protocol version and the builder methods supported.""" |
1338 | - return self._with_timeout(self._server.callRemote('info')) |
1339 | + return self._server.callRemote('info') |
1340 | |
1341 | def status(self): |
1342 | """Return the status of the build daemon.""" |
1343 | - return self._with_timeout(self._server.callRemote('status')) |
1344 | + return self._server.callRemote('status') |
1345 | |
1346 | def ensurepresent(self, sha1sum, url, username, password): |
1347 | - # XXX: Nothing external calls this. Make it private. |
1348 | """Attempt to ensure the given file is present.""" |
1349 | - return self._with_timeout(self._server.callRemote( |
1350 | - 'ensurepresent', sha1sum, url, username, password)) |
1351 | + return self._server.callRemote( |
1352 | + 'ensurepresent', sha1sum, url, username, password) |
1353 | |
1354 | def getFile(self, sha_sum): |
1355 | """Construct a file-like object to return the named file.""" |
1356 | - # XXX 2010-10-18 bug=662631 |
1357 | - # Change this to do non-blocking IO. |
1358 | file_url = urlappend(self._file_cache_url, sha_sum) |
1359 | return urllib2.urlopen(file_url) |
1360 | |
1361 | - def resume(self, clock=None): |
1362 | - """Resume the builder in an asynchronous fashion. |
1363 | - |
1364 | - We use the builddmaster configuration 'socket_timeout' as |
1365 | - the process timeout. |
1366 | - |
1367 | - :param clock: An optional twisted.internet.task.Clock to override |
1368 | - the default clock. For use in tests. |
1369 | - |
1370 | - :return: a Deferred that returns a |
1371 | - (stdout, stderr, subprocess exitcode) triple |
1372 | + def resume(self): |
1373 | + """Resume a virtual builder. |
1374 | + |
1375 | + It runs the configured resume command (substituting 'vm_host') and |
1376 | + returns its output. |
1377 | + |
1378 | + :return: a (stdout, stderr, subprocess exitcode) triple |
1379 | """ |
1380 | + # XXX: This executes the vm_resume_command |
1381 | + # synchronously. RecordingSlave does so asynchronously. Since we |
1382 | + # always want to do this asynchronously, there's no need for the |
1383 | + # duplication. |
1384 | resume_command = config.builddmaster.vm_resume_command % { |
1385 | 'vm_host': self._vm_host} |
1386 | - # Twisted API requires string but the configuration provides unicode. |
1387 | - resume_argv = [term.encode('utf-8') for term in resume_command.split()] |
1388 | - d = defer.Deferred() |
1389 | - p = ProcessWithTimeout( |
1390 | - d, config.builddmaster.socket_timeout, clock=clock) |
1391 | - p.spawnProcess(resume_argv[0], tuple(resume_argv)) |
1392 | - return d |
1393 | + resume_argv = resume_command.split() |
1394 | + resume_process = subprocess.Popen( |
1395 | + resume_argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE) |
1396 | + stdout, stderr = resume_process.communicate() |
1397 | + |
1398 | + return (stdout, stderr, resume_process.returncode) |
1399 | |
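The new resume() above swaps the Twisted ProcessWithTimeout for a blocking subprocess call that returns the same `(stdout, stderr, exitcode)` triple. The pattern in isolation (function name hypothetical; the real method builds the command from `config.builddmaster.vm_resume_command`):

```python
import subprocess

def run_resume_command(resume_command):
    """Run a resume command synchronously and return the
    (stdout, stderr, exitcode) triple, matching the contract of
    BuilderSlave.resume() above."""
    resume_process = subprocess.Popen(
        resume_command.split(),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = resume_process.communicate()
    return (stdout, stderr, resume_process.returncode)
```

Note that `communicate()` reads both pipes to completion, so large output cannot deadlock the way separate `.read()` calls on each pipe could.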
1400 | def cacheFile(self, logger, libraryfilealias): |
1401 | """Make sure that the file at 'libraryfilealias' is on the slave. |
1402 | @@ -217,15 +224,13 @@ |
1403 | "Asking builder on %s to ensure it has file %s (%s, %s)" % ( |
1404 | self._file_cache_url, libraryfilealias.filename, url, |
1405 | libraryfilealias.content.sha1)) |
1406 | - return self.sendFileToSlave(libraryfilealias.content.sha1, url) |
1407 | + self.sendFileToSlave(libraryfilealias.content.sha1, url) |
1408 | |
1409 | def sendFileToSlave(self, sha1, url, username="", password=""): |
1410 | """Helper to send the file at 'url' with 'sha1' to this builder.""" |
1411 | - d = self.ensurepresent(sha1, url, username, password) |
1412 | - def check_present((present, info)): |
1413 | - if not present: |
1414 | - raise CannotFetchFile(url, info) |
1415 | - return d.addCallback(check_present) |
1416 | + present, info = self.ensurepresent(sha1, url, username, password) |
1417 | + if not present: |
1418 | + raise CannotFetchFile(url, info) |
1419 | |
1420 | def build(self, buildid, builder_type, chroot_sha1, filemap, args): |
1421 | """Build a thing on this build slave. |
1422 | @@ -238,18 +243,19 @@ |
1423 | :param args: A dictionary of extra arguments. The contents depend on |
1424 | the build job type. |
1425 | """ |
1426 | - d = self._with_timeout(self._server.callRemote( |
1427 | - 'build', buildid, builder_type, chroot_sha1, filemap, args)) |
1428 | - def got_fault(failure): |
1429 | - failure.trap(xmlrpclib.Fault) |
1430 | - raise BuildSlaveFailure(failure.value) |
1431 | - return d.addErrback(got_fault) |
1432 | + try: |
1433 | + return self._server.callRemote( |
1434 | + 'build', buildid, builder_type, chroot_sha1, filemap, args) |
1435 | + except xmlrpclib.Fault, info: |
1436 | + raise BuildSlaveFailure(info) |
1437 | |
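The build() rewrite above converts the synchronous Deferred errback into a plain try/except that translates `xmlrpclib.Fault` into the buildmaster's own exception (the diff uses the Python 2 `except xmlrpclib.Fault, info:` spelling). The translation pattern on Python 3, with a stand-in for the real exception from lp.buildmaster.interfaces.builder:

```python
import xmlrpc.client

class BuildSlaveFailure(Exception):
    """Stand-in for the real exception in
    lp.buildmaster.interfaces.builder."""

def call_build(proxy_call):
    """Invoke an XML-RPC call and translate any Fault into the
    buildmaster's own exception type, as build() does above."""
    try:
        return proxy_call()
    except xmlrpc.client.Fault as info:
        raise BuildSlaveFailure(info)
```

Wrapping the low-level Fault keeps XML-RPC details out of the callers, which only need to know that the slave failed.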
1438 | |
1439 | # This is a separate function since MockBuilder needs to use it too. |
1440 | # Do not use it -- (Mock)Builder.rescueIfLost should be used instead. |
1441 | def rescueBuilderIfLost(builder, logger=None): |
1442 | """See `IBuilder`.""" |
1443 | + status_sentence = builder.slaveStatusSentence() |
1444 | + |
1445 | # 'ident_position' dict relates the position of the job identifier |
1446 | # token in the sentence received from status(), according the |
1447 | # two status we care about. See see lib/canonical/buildd/slave.py |
1448 | @@ -259,58 +265,61 @@ |
1449 | 'BuilderStatus.WAITING': 2 |
1450 | } |
1451 | |
1452 | - d = builder.slaveStatusSentence() |
1453 | - |
1454 | - def got_status(status_sentence): |
1455 | - """After we get the status, clean if we have to. |
1456 | - |
1457 | - Always return status_sentence. |
1458 | - """ |
1459 | - # Isolate the BuilderStatus string, always the first token in |
1460 | - # see lib/canonical/buildd/slave.py and |
1461 | - # IBuilder.slaveStatusSentence(). |
1462 | - status = status_sentence[0] |
1463 | - |
1464 | - # If the cookie test below fails, it will request an abort of the |
1465 | - # builder. This will leave the builder in the aborted state and |
1466 | - # with no assigned job, and we should now "clean" the slave which |
1467 | - # will reset its state back to IDLE, ready to accept new builds. |
1468 | - # This situation is usually caused by a temporary loss of |
1469 | - # communications with the slave and the build manager had to reset |
1470 | - # the job. |
1471 | - if status == 'BuilderStatus.ABORTED' and builder.currentjob is None: |
1472 | - if logger is not None: |
1473 | - logger.info( |
1474 | - "Builder '%s' being cleaned up from ABORTED" % |
1475 | - (builder.name,)) |
1476 | - d = builder.cleanSlave() |
1477 | - return d.addCallback(lambda ignored: status_sentence) |
1478 | + # Isolate the BuilderStatus string, always the first token in |
1479 | + # see lib/canonical/buildd/slave.py and |
1480 | + # IBuilder.slaveStatusSentence(). |
1481 | + status = status_sentence[0] |
1482 | + |
1483 | + # If the cookie test below fails, it will request an abort of the |
1484 | + # builder. This will leave the builder in the aborted state and |
1485 | + # with no assigned job, and we should now "clean" the slave which |
1486 | + # will reset its state back to IDLE, ready to accept new builds. |
1487 | + # This situation is usually caused by a temporary loss of |
1488 | + # communications with the slave and the build manager had to reset |
1489 | + # the job. |
1490 | + if status == 'BuilderStatus.ABORTED' and builder.currentjob is None: |
1491 | + builder.cleanSlave() |
1492 | + if logger is not None: |
1493 | + logger.info( |
1494 | + "Builder '%s' cleaned up from ABORTED" % builder.name) |
1495 | + return |
1496 | + |
1497 | + # If slave is not building nor waiting, it's not in need of rescuing. |
1498 | + if status not in ident_position.keys(): |
1499 | + return |
1500 | + |
1501 | + slave_build_id = status_sentence[ident_position[status]] |
1502 | + |
1503 | + try: |
1504 | + builder.verifySlaveBuildCookie(slave_build_id) |
1505 | + except CorruptBuildCookie, reason: |
1506 | + if status == 'BuilderStatus.WAITING': |
1507 | + builder.cleanSlave() |
1508 | else: |
1509 | - return status_sentence |
1510 | - |
1511 | - def rescue_slave(status_sentence): |
1512 | - # If slave is not building nor waiting, it's not in need of rescuing. |
1513 | - status = status_sentence[0] |
1514 | - if status not in ident_position.keys(): |
1515 | - return |
1516 | - slave_build_id = status_sentence[ident_position[status]] |
1517 | - try: |
1518 | - builder.verifySlaveBuildCookie(slave_build_id) |
1519 | - except CorruptBuildCookie, reason: |
1520 | - if status == 'BuilderStatus.WAITING': |
1521 | - d = builder.cleanSlave() |
1522 | - else: |
1523 | - d = builder.requestAbort() |
1524 | - def log_rescue(ignored): |
1525 | - if logger: |
1526 | - logger.info( |
1527 | - "Builder '%s' rescued from '%s': '%s'" % |
1528 | - (builder.name, slave_build_id, reason)) |
1529 | - return d.addCallback(log_rescue) |
1530 | - |
1531 | - d.addCallback(got_status) |
1532 | - d.addCallback(rescue_slave) |
1533 | - return d |
1534 | + builder.requestAbort() |
1535 | + if logger: |
1536 | + logger.info( |
1537 | + "Builder '%s' rescued from '%s': '%s'" % |
1538 | + (builder.name, slave_build_id, reason)) |
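The hunk above flattens a Twisted Deferred-callback chain into straight-line control flow: fetch the status sentence once, early-return for states that need no rescue, and verify the build cookie in a plain try/except. A minimal standalone sketch of that rescue logic — all class and method names here are stubs standing in for the real Launchpad objects, not the actual API:

```python
class CorruptBuildCookie(Exception):
    """Raised when the slave's build cookie does not match its job."""


class StubBuilder:
    """Stand-in for a builder; records rescue actions for inspection."""

    def __init__(self, status_sentence, cookie_ok=True):
        self._sentence = status_sentence
        self._cookie_ok = cookie_ok
        self.actions = []

    def slaveStatusSentence(self):
        return self._sentence

    def verifySlaveBuildCookie(self, slave_build_id):
        if not self._cookie_ok:
            raise CorruptBuildCookie("mismatch: %r" % (slave_build_id,))

    def cleanSlave(self):
        self.actions.append('clean')

    def requestAbort(self):
        self.actions.append('abort')


def rescue_if_lost(builder):
    # Position of the job identifier token in the status sentence,
    # for the two states we care about (mirrors the diff above).
    ident_position = {'BuilderStatus.BUILDING': 1, 'BuilderStatus.WAITING': 2}
    sentence = builder.slaveStatusSentence()
    status = sentence[0]
    if status not in ident_position:
        return  # idle/aborted slaves need no rescuing
    slave_build_id = sentence[ident_position[status]]
    try:
        builder.verifySlaveBuildCookie(slave_build_id)
    except CorruptBuildCookie:
        # A WAITING slave can be cleaned directly; a BUILDING one
        # must be asked to abort first.
        if status == 'BuilderStatus.WAITING':
            builder.cleanSlave()
        else:
            builder.requestAbort()
```

The synchronous version reads top to bottom, whereas the old code needed two nested callback closures to express the same branching.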
1539 | + |
1540 | + |
1541 | +def _update_builder_status(builder, logger=None): |
1542 | + """Really update the builder status.""" |
1543 | + try: |
1544 | + builder.checkSlaveAlive() |
1545 | + builder.rescueIfLost(logger) |
1546 | + # Catch only known exceptions. |
1547 | + # XXX cprov 2007-06-15 bug=120571: ValueError & TypeError catching is |
1548 | + # disturbing in this context. We should spend sometime sanitizing the |
1549 | + # exceptions raised in the Builder API since we already started the |
1550 | + # main refactoring of this area. |
1551 | + except (ValueError, TypeError, xmlrpclib.Fault, |
1552 | + BuildDaemonError), reason: |
1553 | + builder.failBuilder(str(reason)) |
1554 | + if logger: |
1555 | + logger.warn( |
1556 | + "%s (%s) marked as failed due to: %s", |
1557 | + builder.name, builder.url, builder.failnotes, exc_info=True) |
1558 | |
1559 | |
1560 | def updateBuilderStatus(builder, logger=None): |
1561 | @@ -318,7 +327,16 @@ |
1562 | if logger: |
1563 | logger.debug('Checking %s' % builder.name) |
1564 | |
1565 | - return builder.rescueIfLost(logger) |
1566 | + MAX_EINTR_RETRIES = 42 # pulling a number out of my a$$ here |
1567 | + try: |
1568 | + return until_no_eintr( |
1569 | + MAX_EINTR_RETRIES, _update_builder_status, builder, logger=logger) |
1570 | + except socket.error, reason: |
1571 | + # In Python 2.6 we can use IOError instead. It also has |
1572 | + # reason.errno but we might be using 2.5 here so use the |
1573 | + # index hack. |
1574 | + error_message = str(reason) |
1575 | + builder.handleTimeout(logger, error_message) |
1576 | |
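The new `updateBuilderStatus` wraps the probe in `until_no_eintr`, retrying calls that fail because a signal interrupted a system call. The real helper lives elsewhere in the Launchpad tree; a hypothetical sketch of what such a retry wrapper looks like (Python 3 syntax, simplified):

```python
import errno
import socket


def until_no_eintr(retries, function, *args, **kwargs):
    """Call function(*args, **kwargs), retrying on EINTR socket errors.

    Sketch only: the real Launchpad helper may differ in detail.
    Any non-EINTR error propagates immediately; after `retries`
    interrupted attempts the last error is re-raised.
    """
    last_error = None
    for _ in range(retries):
        try:
            return function(*args, **kwargs)
        except socket.error as e:
            if e.args[0] != errno.EINTR:
                raise
            last_error = e  # interrupted system call: try again
    raise last_error
```

Socket errors that survive the retries are then turned into `handleTimeout` calls by the caller, as the hunk above shows.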
1577 | |
1578 | class Builder(SQLBase): |
1579 | @@ -346,10 +364,6 @@ |
1580 | active = BoolCol(dbName='active', notNull=True, default=True) |
1581 | failure_count = IntCol(dbName='failure_count', default=0, notNull=True) |
1582 | |
1583 | - # The number of times a builder can consecutively fail before we |
1584 | - # give up and mark it builderok=False. |
1585 | - FAILURE_THRESHOLD = 5 |
1586 | - |
1587 | def _getCurrentBuildBehavior(self): |
1588 | """Return the current build behavior.""" |
1589 | if not safe_hasattr(self, '_current_build_behavior'): |
1590 | @@ -395,13 +409,18 @@ |
1591 | """See `IBuilder`.""" |
1592 | self.failure_count = 0 |
1593 | |
1594 | + def checkSlaveAlive(self): |
1595 | + """See IBuilder.""" |
1596 | + if self.slave.echo("Test")[0] != "Test": |
1597 | + raise BuildDaemonError("Failed to echo OK") |
1598 | + |
1599 | def rescueIfLost(self, logger=None): |
1600 | """See `IBuilder`.""" |
1601 | - return rescueBuilderIfLost(self, logger) |
1602 | + rescueBuilderIfLost(self, logger) |
1603 | |
1604 | def updateStatus(self, logger=None): |
1605 | """See `IBuilder`.""" |
1606 | - return updateBuilderStatus(self, logger) |
1607 | + updateBuilderStatus(self, logger) |
1608 | |
1609 | def cleanSlave(self): |
1610 | """See IBuilder.""" |
1611 | @@ -421,23 +440,20 @@ |
1612 | def resumeSlaveHost(self): |
1613 | """See IBuilder.""" |
1614 | if not self.virtualized: |
1615 | - return defer.fail(CannotResumeHost('Builder is not virtualized.')) |
1616 | + raise CannotResumeHost('Builder is not virtualized.') |
1617 | |
1618 | if not self.vm_host: |
1619 | - return defer.fail(CannotResumeHost('Undefined vm_host.')) |
1620 | + raise CannotResumeHost('Undefined vm_host.') |
1621 | |
1622 | logger = self._getSlaveScannerLogger() |
1623 | logger.debug("Resuming %s (%s)" % (self.name, self.url)) |
1624 | |
1625 | - d = self.slave.resume() |
1626 | - def got_resume_ok((stdout, stderr, returncode)): |
1627 | - return stdout, stderr |
1628 | - def got_resume_bad(failure): |
1629 | - stdout, stderr, code = failure.value |
1630 | + stdout, stderr, returncode = self.slave.resume() |
1631 | + if returncode != 0: |
1632 | raise CannotResumeHost( |
1633 | "Resuming failed:\nOUT:\n%s\nERR:\n%s\n" % (stdout, stderr)) |
1634 | |
1635 | - return d.addCallback(got_resume_ok).addErrback(got_resume_bad) |
1636 | + return stdout, stderr |
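`resumeSlaveHost` changes from returning pre-failed Deferreds to raising `CannotResumeHost` directly, and from an errback to a plain returncode check. A self-contained sketch of the new flow, with a stub in place of the real slave object (names are stand-ins):

```python
class CannotResumeHost(Exception):
    """Raised when a virtualized builder's VM cannot be resumed."""


def resume_slave_host(slave, virtualized, vm_host):
    """Synchronous resume flow, per the hunk above.

    `slave.resume()` is assumed to return (stdout, stderr, returncode),
    mirroring the diff.
    """
    if not virtualized:
        raise CannotResumeHost('Builder is not virtualized.')
    if not vm_host:
        raise CannotResumeHost('Undefined vm_host.')
    stdout, stderr, returncode = slave.resume()
    if returncode != 0:
        raise CannotResumeHost(
            "Resuming failed:\nOUT:\n%s\nERR:\n%s\n" % (stdout, stderr))
    return stdout, stderr
```

Callers that previously had to attach errbacks can now use ordinary try/except around the call.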
1637 | |
1638 | @cachedproperty |
1639 | def slave(self): |
1640 | @@ -446,7 +462,7 @@ |
1641 | # the slave object, which is usually an XMLRPC client, with a |
1642 | # stub object that removes the need to actually create a buildd |
1643 | # slave in various states - which can be hard to create. |
1644 | - return BuilderSlave.makeBuilderSlave(self.url, self.vm_host) |
1645 | + return BuilderSlave.makeBlockingSlave(self.url, self.vm_host) |
1646 | |
1647 | def setSlaveForTesting(self, proxy): |
1648 | """See IBuilder.""" |
1649 | @@ -467,23 +483,18 @@ |
1650 | |
1651 | # If we are building a virtual build, resume the virtual machine. |
1652 | if self.virtualized: |
1653 | - d = self.resumeSlaveHost() |
1654 | - else: |
1655 | - d = defer.succeed(None) |
1656 | + self.resumeSlaveHost() |
1657 | |
1658 | - def resume_done(ignored): |
1659 | - return self.current_build_behavior.dispatchBuildToSlave( |
1660 | + # Do it. |
1661 | + build_queue_item.markAsBuilding(self) |
1662 | + try: |
1663 | + self.current_build_behavior.dispatchBuildToSlave( |
1664 | build_queue_item.id, logger) |
1665 | - |
1666 | - def eb_slave_failure(failure): |
1667 | - failure.trap(BuildSlaveFailure) |
1668 | - e = failure.value |
1669 | + except BuildSlaveFailure, e: |
1670 | + logger.debug("Disabling builder: %s" % self.url, exc_info=1) |
1671 | self.failBuilder( |
1672 | "Exception (%s) when setting up to new job" % (e,)) |
1673 | - |
1674 | - def eb_cannot_fetch_file(failure): |
1675 | - failure.trap(CannotFetchFile) |
1676 | - e = failure.value |
1677 | + except CannotFetchFile, e: |
1678 | message = """Slave '%s' (%s) was unable to fetch file. |
1679 | ****** URL ******** |
1680 | %s |
1681 | @@ -492,19 +503,10 @@ |
1682 | ******************* |
1683 | """ % (self.name, self.url, e.file_url, e.error_information) |
1684 | raise BuildDaemonError(message) |
1685 | - |
1686 | - def eb_socket_error(failure): |
1687 | - failure.trap(socket.error) |
1688 | - e = failure.value |
1689 | + except socket.error, e: |
1690 | error_message = "Exception (%s) when setting up new job" % (e,) |
1691 | - d = self.handleTimeout(logger, error_message) |
1692 | - return d.addBoth(lambda ignored: failure) |
1693 | - |
1694 | - d.addCallback(resume_done) |
1695 | - d.addErrback(eb_slave_failure) |
1696 | - d.addErrback(eb_cannot_fetch_file) |
1697 | - d.addErrback(eb_socket_error) |
1698 | - return d |
1699 | + self.handleTimeout(logger, error_message) |
1700 | + raise BuildSlaveFailure |
1701 | |
1702 | def failBuilder(self, reason): |
1703 | """See IBuilder""" |
1704 | @@ -532,24 +534,22 @@ |
1705 | |
1706 | def slaveStatus(self): |
1707 | """See IBuilder.""" |
1708 | - d = self.slave.status() |
1709 | - def got_status(status_sentence): |
1710 | - status = {'builder_status': status_sentence[0]} |
1711 | - |
1712 | - # Extract detailed status and log information if present. |
1713 | - # Although build_id is also easily extractable here, there is no |
1714 | - # valid reason for anything to use it, so we exclude it. |
1715 | - if status['builder_status'] == 'BuilderStatus.WAITING': |
1716 | - status['build_status'] = status_sentence[1] |
1717 | - else: |
1718 | - if status['builder_status'] == 'BuilderStatus.BUILDING': |
1719 | - status['logtail'] = status_sentence[2] |
1720 | - |
1721 | - self.current_build_behavior.updateSlaveStatus( |
1722 | - status_sentence, status) |
1723 | - return status |
1724 | - |
1725 | - return d.addCallback(got_status) |
1726 | + builder_version, builder_arch, mechanisms = self.slave.info() |
1727 | + status_sentence = self.slave.status() |
1728 | + |
1729 | + status = {'builder_status': status_sentence[0]} |
1730 | + |
1731 | + # Extract detailed status and log information if present. |
1732 | + # Although build_id is also easily extractable here, there is no |
1733 | + # valid reason for anything to use it, so we exclude it. |
1734 | + if status['builder_status'] == 'BuilderStatus.WAITING': |
1735 | + status['build_status'] = status_sentence[1] |
1736 | + else: |
1737 | + if status['builder_status'] == 'BuilderStatus.BUILDING': |
1738 | + status['logtail'] = status_sentence[2] |
1739 | + |
1740 | + self.current_build_behavior.updateSlaveStatus(status_sentence, status) |
1741 | + return status |
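The rewritten `slaveStatus` builds its dict inline from the status sentence instead of inside a `got_status` callback. The index layout is the part worth remembering: WAITING sentences carry the build status at index 1, BUILDING sentences carry the log tail at index 2. A sketch of just that parsing step (the real method also merges in behavior-specific fields via `updateSlaveStatus`, omitted here):

```python
def slave_status_dict(status_sentence):
    """Translate a raw status sentence into the status dict.

    Sketch of the parsing in the hunk above; behavior-specific
    post-processing is intentionally left out.
    """
    status = {'builder_status': status_sentence[0]}
    if status['builder_status'] == 'BuilderStatus.WAITING':
        # WAITING: (state, build_status, build_id, filemap, deps)
        status['build_status'] = status_sentence[1]
    elif status['builder_status'] == 'BuilderStatus.BUILDING':
        # BUILDING: (state, build_id, logtail)
        status['logtail'] = status_sentence[2]
    return status
```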
1742 | |
1743 | def slaveStatusSentence(self): |
1744 | """See IBuilder.""" |
1745 | @@ -562,15 +562,13 @@ |
1746 | |
1747 | def updateBuild(self, queueItem): |
1748 | """See `IBuilder`.""" |
1749 | - return self.current_build_behavior.updateBuild(queueItem) |
1750 | + self.current_build_behavior.updateBuild(queueItem) |
1751 | |
1752 | def transferSlaveFileToLibrarian(self, file_sha1, filename, private): |
1753 | """See IBuilder.""" |
1754 | out_file_fd, out_file_name = tempfile.mkstemp(suffix=".buildlog") |
1755 | out_file = os.fdopen(out_file_fd, "r+") |
1756 | try: |
1757 | - # XXX 2010-10-18 bug=662631 |
1758 | - # Change this to do non-blocking IO. |
1759 | slave_file = self.slave.getFile(file_sha1) |
1760 | copy_and_close(slave_file, out_file) |
1761 | # If the requested file is the 'buildlog' compress it using gzip |
1762 | @@ -601,17 +599,18 @@ |
1763 | |
1764 | return library_file.id |
1765 | |
1766 | - def isAvailable(self): |
1767 | + @property |
1768 | + def is_available(self): |
1769 | """See `IBuilder`.""" |
1770 | if not self.builderok: |
1771 | - return defer.succeed(False) |
1772 | - d = self.slaveStatusSentence() |
1773 | - def catch_fault(failure): |
1774 | - failure.trap(xmlrpclib.Fault, socket.error) |
1775 | - return False |
1776 | - def check_available(status): |
1777 | - return status[0] == BuilderStatus.IDLE |
1778 | - return d.addCallbacks(check_available, catch_fault) |
1779 | + return False |
1780 | + try: |
1781 | + slavestatus = self.slaveStatusSentence() |
1782 | + except (xmlrpclib.Fault, socket.error): |
1783 | + return False |
1784 | + if slavestatus[0] != BuilderStatus.IDLE: |
1785 | + return False |
1786 | + return True |
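`isAvailable()` becomes the property `is_available`, and the `addCallbacks(check_available, catch_fault)` pair becomes a try/except in which any probe failure simply means "not available". A runnable sketch of the pattern using a stub class (all names hypothetical; `xmlrpc.client` is the Python 3 spelling of the `xmlrpclib` used in the diff):

```python
import socket
import xmlrpc.client as xmlrpclib  # 'xmlrpclib' in the Python 2 code above


class BuilderAvailability:
    """Stub demonstrating the is_available probe from the hunk above."""

    def __init__(self, builderok, probe):
        self.builderok = builderok
        self._probe = probe  # callable returning a status sentence

    @property
    def is_available(self):
        if not self.builderok:
            return False
        try:
            sentence = self._probe()
        except (xmlrpclib.Fault, socket.error):
            # A failed probe is treated as "not available", not an error.
            return False
        return sentence[0] == 'BuilderStatus.IDLE'
```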
1787 | |
1788 | def _getSlaveScannerLogger(self): |
1789 | """Return the logger instance from buildd-slave-scanner.py.""" |
1790 | @@ -622,27 +621,6 @@ |
1791 | logger = logging.getLogger('slave-scanner') |
1792 | return logger |
1793 | |
1794 | - def acquireBuildCandidate(self): |
1795 | - """Acquire a build candidate in an atomic fashion. |
1796 | - |
1797 | - When retrieiving a candidate we need to mark it as building |
1798 | - immediately so that it is not dispatched by another builder in the |
1799 | - build manager. |
1800 | - |
1801 | - We can consider this to be atomic because although the build manager |
1802 | - is a Twisted app and gives the appearance of doing lots of things at |
1803 | - once, it's still single-threaded so no more than one builder scan |
1804 | - can be in this code at the same time. |
1805 | - |
1806 | - If there's ever more than one build manager running at once, then |
1807 | - this code will need some sort of mutex. |
1808 | - """ |
1809 | - candidate = self._findBuildCandidate() |
1810 | - if candidate is not None: |
1811 | - candidate.markAsBuilding(self) |
1812 | - transaction.commit() |
1813 | - return candidate |
1814 | - |
1815 | def _findBuildCandidate(self): |
1816 | """Find a candidate job for dispatch to an idle buildd slave. |
1817 | |
1818 | @@ -722,46 +700,52 @@ |
1819 | :param candidate: The job to dispatch. |
1820 | """ |
1821 | logger = self._getSlaveScannerLogger() |
1822 | - # Using maybeDeferred ensures that any exceptions are also |
1823 | - # wrapped up and caught later. |
1824 | - d = defer.maybeDeferred(self.startBuild, candidate, logger) |
1825 | - return d |
1826 | + try: |
1827 | + self.startBuild(candidate, logger) |
1828 | + except (BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch), err: |
1829 | + logger.warn('Could not build: %s' % err) |
1830 | |
1831 | def handleTimeout(self, logger, error_message): |
1832 | """See IBuilder.""" |
1833 | + builder_should_be_failed = True |
1834 | + |
1835 | if self.virtualized: |
1836 | # Virtualized/PPA builder: attempt a reset. |
1837 | logger.warn( |
1838 | "Resetting builder: %s -- %s" % (self.url, error_message), |
1839 | exc_info=True) |
1840 | - d = self.resumeSlaveHost() |
1841 | - return d |
1842 | - else: |
1843 | - # XXX: This should really let the failure bubble up to the |
1844 | - # scan() method that does the failure counting. |
1845 | + try: |
1846 | + self.resumeSlaveHost() |
1847 | + except CannotResumeHost, err: |
1848 | + # Failed to reset builder. |
1849 | + logger.warn( |
1850 | + "Failed to reset builder: %s -- %s" % |
1851 | + (self.url, str(err)), exc_info=True) |
1852 | + else: |
1853 | + # Builder was reset, do *not* mark it as failed. |
1854 | + builder_should_be_failed = False |
1855 | + |
1856 | + if builder_should_be_failed: |
1857 | # Mark builder as 'failed'. |
1858 | logger.warn( |
1859 | - "Disabling builder: %s -- %s" % (self.url, error_message)) |
1860 | + "Disabling builder: %s -- %s" % (self.url, error_message), |
1861 | + exc_info=True) |
1862 | self.failBuilder(error_message) |
1863 | - return defer.succeed(None) |
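The reworked `handleTimeout` uses a `builder_should_be_failed` flag so there is a single fail path: a virtualized builder is failed only when its reset also fails, while a non-virtualized builder is failed unconditionally. A stripped-down sketch of that decision flow, with callables standing in for the real methods (hypothetical, simplified signatures):

```python
class CannotResumeHost(Exception):
    """Raised when resetting a virtualized builder fails."""


def handle_timeout(virtualized, resume, fail):
    """Reset-else-fail flow from handleTimeout() in the hunk above.

    `resume` may raise CannotResumeHost; `fail` marks the builder failed.
    """
    builder_should_be_failed = True
    if virtualized:
        try:
            resume()
        except CannotResumeHost:
            pass  # reset failed: fall through and fail the builder
        else:
            builder_should_be_failed = False  # reset worked, keep builder
    if builder_should_be_failed:
        fail()
```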
1864 | |
1865 | def findAndStartJob(self, buildd_slave=None): |
1866 | """See IBuilder.""" |
1867 | - # XXX This method should be removed in favour of two separately |
1868 | - # called methods that find and dispatch the job. It will |
1869 | - # require a lot of test fixing. |
1870 | logger = self._getSlaveScannerLogger() |
1871 | - candidate = self.acquireBuildCandidate() |
1872 | + candidate = self._findBuildCandidate() |
1873 | |
1874 | if candidate is None: |
1875 | logger.debug("No build candidates available for builder.") |
1876 | - return defer.succeed(None) |
1877 | + return None |
1878 | |
1879 | if buildd_slave is not None: |
1880 | self.setSlaveForTesting(buildd_slave) |
1881 | |
1882 | - d = self._dispatchBuildCandidate(candidate) |
1883 | - return d.addCallback(lambda ignored: candidate) |
1884 | + self._dispatchBuildCandidate(candidate) |
1885 | + return candidate |
1886 | |
1887 | def getBuildQueue(self): |
1888 | """See `IBuilder`.""" |
1889 | |
1890 | === modified file 'lib/lp/buildmaster/model/buildfarmjobbehavior.py' |
1891 | --- lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-10-20 11:54:27 +0000 |
1892 | +++ lib/lp/buildmaster/model/buildfarmjobbehavior.py 2010-12-07 16:24:04 +0000 |
1893 | @@ -16,18 +16,13 @@ |
1894 | import socket |
1895 | import xmlrpclib |
1896 | |
1897 | -from twisted.internet import defer |
1898 | - |
1899 | from zope.component import getUtility |
1900 | from zope.interface import implements |
1901 | from zope.security.proxy import removeSecurityProxy |
1902 | |
1903 | from canonical import encoding |
1904 | from canonical.librarian.interfaces import ILibrarianClient |
1905 | -from lp.buildmaster.interfaces.builder import ( |
1906 | - BuildSlaveFailure, |
1907 | - CorruptBuildCookie, |
1908 | - ) |
1909 | +from lp.buildmaster.interfaces.builder import CorruptBuildCookie |
1910 | from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
1911 | BuildBehaviorMismatch, |
1912 | IBuildFarmJobBehavior, |
1913 | @@ -74,53 +69,54 @@ |
1914 | """See `IBuildFarmJobBehavior`.""" |
1915 | logger = logging.getLogger('slave-scanner') |
1916 | |
1917 | - d = self._builder.slaveStatus() |
1918 | - |
1919 | - def got_failure(failure): |
1920 | - failure.trap(xmlrpclib.Fault, socket.error) |
1921 | - info = failure.value |
1922 | + try: |
1923 | + slave_status = self._builder.slaveStatus() |
1924 | + except (xmlrpclib.Fault, socket.error), info: |
1925 | + # XXX cprov 2005-06-29: |
1926 | + # Hmm, a problem with the xmlrpc interface, |
1927 | + # disable the builder ?? or simple notice the failure |
1928 | + # with a timestamp. |
1929 | info = ("Could not contact the builder %s, caught a (%s)" |
1930 | % (queueItem.builder.url, info)) |
1931 | - raise BuildSlaveFailure(info) |
1932 | - |
1933 | - def got_status(slave_status): |
1934 | - builder_status_handlers = { |
1935 | - 'BuilderStatus.IDLE': self.updateBuild_IDLE, |
1936 | - 'BuilderStatus.BUILDING': self.updateBuild_BUILDING, |
1937 | - 'BuilderStatus.ABORTING': self.updateBuild_ABORTING, |
1938 | - 'BuilderStatus.ABORTED': self.updateBuild_ABORTED, |
1939 | - 'BuilderStatus.WAITING': self.updateBuild_WAITING, |
1940 | - } |
1941 | - |
1942 | - builder_status = slave_status['builder_status'] |
1943 | - if builder_status not in builder_status_handlers: |
1944 | - logger.critical( |
1945 | - "Builder on %s returned unknown status %s, failing it" |
1946 | - % (self._builder.url, builder_status)) |
1947 | - self._builder.failBuilder( |
1948 | - "Unknown status code (%s) returned from status() probe." |
1949 | - % builder_status) |
1950 | - # XXX: This will leave the build and job in a bad state, but |
1951 | - # should never be possible, since our builder statuses are |
1952 | - # known. |
1953 | - queueItem._builder = None |
1954 | - queueItem.setDateStarted(None) |
1955 | - return |
1956 | - |
1957 | - # Since logtail is a xmlrpclib.Binary container and it is |
1958 | - # returned from the IBuilder content class, it arrives |
1959 | - # protected by a Zope Security Proxy, which is not declared, |
1960 | - # thus empty. Before passing it to the status handlers we |
1961 | - # will simply remove the proxy. |
1962 | - logtail = removeSecurityProxy(slave_status.get('logtail')) |
1963 | - |
1964 | - method = builder_status_handlers[builder_status] |
1965 | - return defer.maybeDeferred( |
1966 | - method, queueItem, slave_status, logtail, logger) |
1967 | - |
1968 | - d.addErrback(got_failure) |
1969 | - d.addCallback(got_status) |
1970 | - return d |
1971 | + logger.debug(info, exc_info=True) |
1972 | + # keep the job for scan |
1973 | + return |
1974 | + |
1975 | + builder_status_handlers = { |
1976 | + 'BuilderStatus.IDLE': self.updateBuild_IDLE, |
1977 | + 'BuilderStatus.BUILDING': self.updateBuild_BUILDING, |
1978 | + 'BuilderStatus.ABORTING': self.updateBuild_ABORTING, |
1979 | + 'BuilderStatus.ABORTED': self.updateBuild_ABORTED, |
1980 | + 'BuilderStatus.WAITING': self.updateBuild_WAITING, |
1981 | + } |
1982 | + |
1983 | + builder_status = slave_status['builder_status'] |
1984 | + if builder_status not in builder_status_handlers: |
1985 | + logger.critical( |
1986 | + "Builder on %s returned unknown status %s, failing it" |
1987 | + % (self._builder.url, builder_status)) |
1988 | + self._builder.failBuilder( |
1989 | + "Unknown status code (%s) returned from status() probe." |
1990 | + % builder_status) |
1991 | + # XXX: This will leave the build and job in a bad state, but |
1992 | + # should never be possible, since our builder statuses are |
1993 | + # known. |
1994 | + queueItem._builder = None |
1995 | + queueItem.setDateStarted(None) |
1996 | + return |
1997 | + |
1998 | + # Since logtail is a xmlrpclib.Binary container and it is returned |
1999 | + # from the IBuilder content class, it arrives protected by a Zope |
2000 | + # Security Proxy, which is not declared, thus empty. Before passing |
2001 | + # it to the status handlers we will simply remove the proxy. |
2002 | + logtail = removeSecurityProxy(slave_status.get('logtail')) |
2003 | + |
2004 | + method = builder_status_handlers[builder_status] |
2005 | + try: |
2006 | + method(queueItem, slave_status, logtail, logger) |
2007 | + except TypeError, e: |
2008 | + logger.critical("Received wrong number of args in response.") |
2009 | + logger.exception(e) |
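`updateBuild` keeps its dispatch-table shape through the rewrite: a dict maps each `BuilderStatus.*` string to a handler, and an unknown status fails the builder rather than raising. The lookup-or-fail core can be sketched on its own (names are stand-ins for the real methods):

```python
def dispatch_builder_status(builder_status, handlers, fail_builder):
    """Status-dispatch pattern from updateBuild() in the hunk above.

    `handlers` maps 'BuilderStatus.*' strings to zero-argument callables;
    `fail_builder` receives an explanatory message for unknown statuses.
    """
    handler = handlers.get(builder_status)
    if handler is None:
        fail_builder(
            "Unknown status code (%s) returned from status() probe."
            % builder_status)
        return None
    return handler()
```

Using `dict.get` plus an explicit fail path keeps the "known statuses" set in one place, which is why the rewrite preserved it instead of switching to if/elif chains.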
2010 | |
2011 | def updateBuild_IDLE(self, queueItem, slave_status, logtail, logger): |
2012 | """Somehow the builder forgot about the build job. |
2013 | @@ -150,13 +146,11 @@ |
2014 | |
2015 | Clean the builder for another jobs. |
2016 | """ |
2017 | - d = queueItem.builder.cleanSlave() |
2018 | - def got_cleaned(ignored): |
2019 | - queueItem.builder = None |
2020 | - if queueItem.job.status != JobStatus.FAILED: |
2021 | - queueItem.job.fail() |
2022 | - queueItem.specific_job.jobAborted() |
2023 | - return d.addCallback(got_cleaned) |
2024 | + queueItem.builder.cleanSlave() |
2025 | + queueItem.builder = None |
2026 | + if queueItem.job.status != JobStatus.FAILED: |
2027 | + queueItem.job.fail() |
2028 | + queueItem.specific_job.jobAborted() |
2029 | |
2030 | def extractBuildStatus(self, slave_status): |
2031 | """Read build status name. |
2032 | @@ -191,8 +185,6 @@ |
2033 | # XXX: dsilvers 2005-03-02: Confirm the builder has the right build? |
2034 | |
2035 | build = queueItem.specific_job.build |
2036 | - # XXX 2010-10-18 bug=662631 |
2037 | - # Change this to do non-blocking IO. |
2038 | build.handleStatus(build_status, librarian, slave_status) |
2039 | |
2040 | |
2041 | |
2042 | === modified file 'lib/lp/buildmaster/model/packagebuild.py' |
2043 | --- lib/lp/buildmaster/model/packagebuild.py 2010-10-26 20:43:50 +0000 |
2044 | +++ lib/lp/buildmaster/model/packagebuild.py 2010-12-07 16:24:04 +0000 |
2045 | @@ -163,8 +163,6 @@ |
2046 | def getLogFromSlave(package_build): |
2047 | """See `IPackageBuild`.""" |
2048 | builder = package_build.buildqueue_record.builder |
2049 | - # XXX 2010-10-18 bug=662631 |
2050 | - # Change this to do non-blocking IO. |
2051 | return builder.transferSlaveFileToLibrarian( |
2052 | SLAVE_LOG_FILENAME, |
2053 | package_build.buildqueue_record.getLogFileName(), |
2054 | @@ -180,8 +178,6 @@ |
2055 | # log, builder and date_finished are read-only, so we must |
2056 | # currently remove the security proxy to set them. |
2057 | naked_build = removeSecurityProxy(build) |
2058 | - # XXX 2010-10-18 bug=662631 |
2059 | - # Change this to do non-blocking IO. |
2060 | naked_build.log = build.getLogFromSlave(build) |
2061 | naked_build.builder = build.buildqueue_record.builder |
2062 | # XXX cprov 20060615 bug=120584: Currently buildduration includes |
2063 | @@ -278,8 +274,6 @@ |
2064 | logger.critical("Unknown BuildStatus '%s' for builder '%s'" |
2065 | % (status, self.buildqueue_record.builder.url)) |
2066 | return |
2067 | - # XXX 2010-10-18 bug=662631 |
2068 | - # Change this to do non-blocking IO. |
2069 | method(librarian, slave_status, logger) |
2070 | |
2071 | def _handleStatus_OK(self, librarian, slave_status, logger): |
2072 | |
2073 | === modified file 'lib/lp/buildmaster/tests/mock_slaves.py' |
2074 | --- lib/lp/buildmaster/tests/mock_slaves.py 2010-10-14 15:37:56 +0000 |
2075 | +++ lib/lp/buildmaster/tests/mock_slaves.py 2010-12-07 16:24:04 +0000 |
2076 | @@ -6,40 +6,21 @@ |
2077 | __metaclass__ = type |
2078 | |
2079 | __all__ = [ |
2080 | - 'AbortedSlave', |
2081 | - 'AbortingSlave', |
2082 | + 'MockBuilder', |
2083 | + 'LostBuildingBrokenSlave', |
2084 | 'BrokenSlave', |
2085 | + 'OkSlave', |
2086 | 'BuildingSlave', |
2087 | - 'CorruptBehavior', |
2088 | - 'DeadProxy', |
2089 | - 'LostBuildingBrokenSlave', |
2090 | - 'MockBuilder', |
2091 | - 'OkSlave', |
2092 | - 'SlaveTestHelpers', |
2093 | - 'TrivialBehavior', |
2094 | + 'AbortedSlave', |
2095 | 'WaitingSlave', |
2096 | + 'AbortingSlave', |
2097 | ] |
2098 | |
2099 | -import fixtures |
2100 | -import os |
2101 | - |
2102 | from StringIO import StringIO |
2103 | import xmlrpclib |
2104 | |
2105 | -from testtools.content import Content |
2106 | -from testtools.content_type import UTF8_TEXT |
2107 | - |
2108 | -from twisted.internet import defer |
2109 | -from twisted.web import xmlrpc |
2110 | - |
2111 | -from canonical.buildd.tests.harness import BuilddSlaveTestSetup |
2112 | - |
2113 | -from lp.buildmaster.interfaces.builder import ( |
2114 | - CannotFetchFile, |
2115 | - CorruptBuildCookie, |
2116 | - ) |
2117 | +from lp.buildmaster.interfaces.builder import CannotFetchFile |
2118 | from lp.buildmaster.model.builder import ( |
2119 | - BuilderSlave, |
2120 | rescueBuilderIfLost, |
2121 | updateBuilderStatus, |
2122 | ) |
2123 | @@ -78,9 +59,15 @@ |
2124 | slave_build_id) |
2125 | |
2126 | def cleanSlave(self): |
2127 | + # XXX: This should not print anything. The print is only here to make |
2128 | + # doc/builder.txt a meaningful test. |
2129 | + print 'Cleaning slave' |
2130 | return self.slave.clean() |
2131 | |
2132 | def requestAbort(self): |
2133 | + # XXX: This should not print anything. The print is only here to make |
2134 | + # doc/builder.txt a meaningful test. |
2135 | + print 'Aborting slave' |
2136 | return self.slave.abort() |
2137 | |
2138 | def resumeSlave(self, logger): |
2139 | @@ -90,10 +77,10 @@ |
2140 | pass |
2141 | |
2142 | def rescueIfLost(self, logger=None): |
2143 | - return rescueBuilderIfLost(self, logger) |
2144 | + rescueBuilderIfLost(self, logger) |
2145 | |
2146 | def updateStatus(self, logger=None): |
2147 | - return defer.maybeDeferred(updateBuilderStatus, self, logger) |
2148 | + updateBuilderStatus(self, logger) |
2149 | |
2150 | |
2151 | # XXX: It would be *really* nice to run some set of tests against the real |
2152 | @@ -108,44 +95,36 @@ |
2153 | self.arch_tag = arch_tag |
2154 | |
2155 | def status(self): |
2156 | - return defer.succeed(('BuilderStatus.IDLE', '')) |
2157 | + return ('BuilderStatus.IDLE', '') |
2158 | |
2159 | def ensurepresent(self, sha1, url, user=None, password=None): |
2160 | self.call_log.append(('ensurepresent', url, user, password)) |
2161 | - return defer.succeed((True, None)) |
2162 | + return True, None |
2163 | |
2164 | def build(self, buildid, buildtype, chroot, filemap, args): |
2165 | self.call_log.append( |
2166 | ('build', buildid, buildtype, chroot, filemap.keys(), args)) |
2167 | info = 'OkSlave BUILDING' |
2168 | - return defer.succeed(('BuildStatus.Building', info)) |
2169 | + return ('BuildStatus.Building', info) |
2170 | |
2171 | def echo(self, *args): |
2172 | self.call_log.append(('echo',) + args) |
2173 | - return defer.succeed(args) |
2174 | + return args |
2175 | |
2176 | def clean(self): |
2177 | self.call_log.append('clean') |
2178 | - return defer.succeed(None) |
2179 | |
2180 | def abort(self): |
2181 | self.call_log.append('abort') |
2182 | - return defer.succeed(None) |
2183 | |
2184 | def info(self): |
2185 | self.call_log.append('info') |
2186 | - return defer.succeed(('1.0', self.arch_tag, 'debian')) |
2187 | - |
2188 | - def resume(self): |
2189 | - self.call_log.append('resume') |
2190 | - return defer.succeed(("", "", 0)) |
2191 | + return ('1.0', self.arch_tag, 'debian') |
2192 | |
2193 | def sendFileToSlave(self, sha1, url, username="", password=""): |
2194 | - d = self.ensurepresent(sha1, url, username, password) |
2195 | - def check_present((present, info)): |
2196 | - if not present: |
2197 | - raise CannotFetchFile(url, info) |
2198 | - return d.addCallback(check_present) |
2199 | + present, info = self.ensurepresent(sha1, url, username, password) |
2200 | + if not present: |
2201 | + raise CannotFetchFile(url, info) |
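The mock's `sendFileToSlave` loses its `check_present` callback: the tuple from `ensurepresent` is unpacked directly and a missing file raises immediately. A self-contained sketch of the synchronous version, with `ensurepresent` passed in as a callable so the stub stays independent of the slave class (stand-in names):

```python
class CannotFetchFile(Exception):
    """Raised when ensurepresent() reports the file is not present."""


def send_file_to_slave(ensurepresent, sha1, url, username='', password=''):
    """Synchronous sendFileToSlave(), per the hunk above.

    `ensurepresent` is assumed to return a (present, info) tuple,
    mirroring the diff.
    """
    present, info = ensurepresent(sha1, url, username, password)
    if not present:
        raise CannotFetchFile(url, info)
```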
2202 | |
2203 | def cacheFile(self, logger, libraryfilealias): |
2204 | return self.sendFileToSlave( |
2205 | @@ -162,11 +141,9 @@ |
2206 | def status(self): |
2207 | self.call_log.append('status') |
2208 | buildlog = xmlrpclib.Binary("This is a build log") |
2209 | - return defer.succeed( |
2210 | - ('BuilderStatus.BUILDING', self.build_id, buildlog)) |
2211 | + return ('BuilderStatus.BUILDING', self.build_id, buildlog) |
2212 | |
2213 | def getFile(self, sum): |
2214 | - # XXX: This needs to be updated to return a Deferred. |
2215 | self.call_log.append('getFile') |
2216 | if sum == "buildlog": |
2217 | s = StringIO("This is a build log") |
2218 | @@ -178,15 +155,11 @@ |
2219 | """A mock slave that looks like it's currently waiting.""" |
2220 | |
2221 | def __init__(self, state='BuildStatus.OK', dependencies=None, |
2222 | - build_id='1-1', filemap=None): |
2223 | + build_id='1-1'): |
2224 | super(WaitingSlave, self).__init__() |
2225 | self.state = state |
2226 | self.dependencies = dependencies |
2227 | self.build_id = build_id |
2228 | - if filemap is None: |
2229 | - self.filemap = {} |
2230 | - else: |
2231 | - self.filemap = filemap |
2232 | |
2233 | # By default, the slave only has a buildlog, but callsites |
2234 | # can update this list as needed. |
2235 | @@ -194,12 +167,10 @@ |
2236 | |
2237 | def status(self): |
2238 | self.call_log.append('status') |
2239 | - return defer.succeed(( |
2240 | - 'BuilderStatus.WAITING', self.state, self.build_id, self.filemap, |
2241 | - self.dependencies)) |
2242 | + return ('BuilderStatus.WAITING', self.state, self.build_id, {}, |
2243 | + self.dependencies) |
2244 | |
2245 | def getFile(self, hash): |
2246 | - # XXX: This needs to be updated to return a Deferred. |
2247 | self.call_log.append('getFile') |
2248 | if hash in self.valid_file_hashes: |
2249 | content = "This is a %s" % hash |
2250 | @@ -213,19 +184,15 @@ |
2251 | |
2252 | def status(self): |
2253 | self.call_log.append('status') |
2254 | - return defer.succeed(('BuilderStatus.ABORTING', '1-1')) |
2255 | + return ('BuilderStatus.ABORTING', '1-1') |
2256 | |
2257 | |
2258 | class AbortedSlave(OkSlave): |
2259 | """A mock slave that looks like it's aborted.""" |
2260 | |
2261 | - def clean(self): |
2262 | + def status(self): |
2263 | self.call_log.append('status') |
2264 | - return defer.succeed(None) |
2265 | - |
2266 | - def status(self): |
2267 | - self.call_log.append('clean') |
2268 | - return defer.succeed(('BuilderStatus.ABORTED', '1-1')) |
2269 | + return ('BuilderStatus.ABORTED', '1-1') |
2270 | |
2271 | |
2272 | class LostBuildingBrokenSlave: |
2273 | @@ -239,108 +206,16 @@ |
2274 | |
2275 | def status(self): |
2276 | self.call_log.append('status') |
2277 | - return defer.succeed(('BuilderStatus.BUILDING', '1000-10000')) |
2278 | + return ('BuilderStatus.BUILDING', '1000-10000') |
2279 | |
2280 | def abort(self): |
2281 | self.call_log.append('abort') |
2282 | - return defer.fail(xmlrpclib.Fault(8002, "Could not abort")) |
2283 | + raise xmlrpclib.Fault(8002, "Could not abort") |
2284 | |
2285 | |
2286 | class BrokenSlave: |
2287 | """A mock slave that reports that it is broken.""" |
2288 | |
2289 | - def __init__(self): |
2290 | - self.call_log = [] |
2291 | - |
2292 | def status(self): |
2293 | self.call_log.append('status') |
2294 | - return defer.fail(xmlrpclib.Fault(8001, "Broken slave")) |
2295 | - |
2296 | - |
2297 | -class CorruptBehavior: |
2298 | - |
2299 | - def verifySlaveBuildCookie(self, cookie): |
2300 | - raise CorruptBuildCookie("Bad value: %r" % (cookie,)) |
2301 | - |
2302 | - |
2303 | -class TrivialBehavior: |
2304 | - |
2305 | - def verifySlaveBuildCookie(self, cookie): |
2306 | - pass |
2307 | - |
2308 | - |
2309 | -class DeadProxy(xmlrpc.Proxy): |
2310 | - """An xmlrpc.Proxy that doesn't actually send any messages. |
2311 | - |
2312 | - Used when you want to test timeouts, for example. |
2313 | - """ |
2314 | - |
2315 | - def callRemote(self, *args, **kwargs): |
2316 | - return defer.Deferred() |
2317 | - |
2318 | - |
2319 | -class SlaveTestHelpers(fixtures.Fixture): |
2320 | - |
2321 | - # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`. |
2322 | - BASE_URL = 'http://localhost:8221' |
2323 | - TEST_URL = '%s/rpc/' % (BASE_URL,) |
2324 | - |
2325 | - def getServerSlave(self): |
2326 | - """Set up a test build slave server. |
2327 | - |
2328 | - :return: A `BuilddSlaveTestSetup` object. |
2329 | - """ |
2330 | - tachandler = BuilddSlaveTestSetup() |
2331 | - tachandler.setUp() |
2332 | - # Basically impossible to do this w/ TrialTestCase. But it would be |
2333 | - # really nice to keep it. |
2334 | - # |
2335 | - # def addLogFile(exc_info): |
2336 | - # self.addDetail( |
2337 | - # 'xmlrpc-log-file', |
2338 | - # Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read())) |
2339 | - # self.addOnException(addLogFile) |
2340 | - self.addCleanup(tachandler.tearDown) |
2341 | - return tachandler |
2342 | - |
2343 | - def getClientSlave(self, reactor=None, proxy=None): |
2344 | - """Return a `BuilderSlave` for use in testing. |
2345 | - |
2346 | - Points to a fixed URL that is also used by `BuilddSlaveTestSetup`. |
2347 | - """ |
2348 | - return BuilderSlave.makeBuilderSlave( |
2349 | - self.TEST_URL, 'vmhost', reactor, proxy) |
2350 | - |
2351 | - def makeCacheFile(self, tachandler, filename): |
2352 | - """Make a cache file available on the remote slave. |
2353 | - |
2354 | - :param tachandler: The TacTestSetup object used to start the remote |
2355 | - slave. |
2356 | - :param filename: The name of the file to create in the file cache |
2357 | - area. |
2358 | - """ |
2359 | - path = os.path.join(tachandler.root, 'filecache', filename) |
2360 | - fd = open(path, 'w') |
2361 | - fd.write('something') |
2362 | - fd.close() |
2363 | - self.addCleanup(os.unlink, path) |
2364 | - |
2365 | - def triggerGoodBuild(self, slave, build_id=None): |
2366 | - """Trigger a good build on 'slave'. |
2367 | - |
2368 | - :param slave: A `BuilderSlave` instance to trigger the build on. |
2369 | - :param build_id: The build identifier. If not specified, defaults to |
2370 | - an arbitrary string. |
2371 | - :type build_id: str |
2372 | - :return: The build id returned by the slave. |
2373 | - """ |
2374 | - if build_id is None: |
2375 | - build_id = 'random-build-id' |
2376 | - tachandler = self.getServerSlave() |
2377 | - chroot_file = 'fake-chroot' |
2378 | - dsc_file = 'thing' |
2379 | - self.makeCacheFile(tachandler, chroot_file) |
2380 | - self.makeCacheFile(tachandler, dsc_file) |
2381 | - return slave.build( |
2382 | - build_id, 'debian', chroot_file, {'.dsc': dsc_file}, |
2383 | - {'ogrecomponent': 'main'}) |
2384 | + raise xmlrpclib.Fault(8001, "Broken slave") |
2385 | |
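Aside: the mock-slave changes above convert Deferred-returning methods into plain synchronous calls that log their own names before acting. The resulting pattern can be sketched as follows (an illustrative stand-in, not Launchpad's `mock_slaves` code; `xmlrpc.client` is the Python 3 spelling of the `xmlrpclib` module used in the diff):

```python
from xmlrpc.client import Fault  # spelled "xmlrpclib" in the Python 2 code above


class BrokenSlaveSketch:
    """Synchronous mock slave in the style of mock_slaves.BrokenSlave.

    Each method records its name in call_log before acting, so a test
    can later assert exactly which slave operations were attempted.
    """

    def __init__(self):
        self.call_log = []

    def status(self):
        # Log the call, then fail the way a broken XML-RPC slave would.
        self.call_log.append('status')
        raise Fault(8001, "Broken slave")
```

Tests then drive the mock and assert on `call_log`, exactly as the `assertIn('abort', slave.call_log)` checks elsewhere in this diff do.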
2386 | === modified file 'lib/lp/buildmaster/tests/test_builder.py' |
2387 | --- lib/lp/buildmaster/tests/test_builder.py 2010-10-18 16:44:22 +0000 |
2388 | +++ lib/lp/buildmaster/tests/test_builder.py 2010-12-07 16:24:04 +0000 |
2389 | @@ -3,24 +3,20 @@ |
2390 | |
2391 | """Test Builder features.""" |
2392 | |
2393 | +import errno |
2394 | import os |
2395 | -import signal |
2396 | +import socket |
2397 | import xmlrpclib |
2398 | |
2399 | -from twisted.web.client import getPage |
2400 | - |
2401 | -from twisted.internet.defer import CancelledError |
2402 | -from twisted.internet.task import Clock |
2403 | -from twisted.python.failure import Failure |
2404 | -from twisted.trial.unittest import TestCase as TrialTestCase |
2405 | +from testtools.content import Content |
2406 | +from testtools.content_type import UTF8_TEXT |
2407 | |
2408 | from zope.component import getUtility |
2409 | from zope.security.proxy import removeSecurityProxy |
2410 | |
2411 | from canonical.buildd.slave import BuilderStatus |
2412 | -from canonical.config import config |
2413 | +from canonical.buildd.tests.harness import BuilddSlaveTestSetup |
2414 | from canonical.database.sqlbase import flush_database_updates |
2415 | -from canonical.launchpad.scripts import QuietFakeLogger |
2416 | from canonical.launchpad.webapp.interfaces import ( |
2417 | DEFAULT_FLAVOR, |
2418 | IStoreSelector, |
2419 | @@ -28,38 +24,21 @@ |
2420 | ) |
2421 | from canonical.testing.layers import ( |
2422 | DatabaseFunctionalLayer, |
2423 | - LaunchpadZopelessLayer, |
2424 | - TwistedLaunchpadZopelessLayer, |
2425 | - TwistedLayer, |
2426 | + LaunchpadZopelessLayer |
2427 | ) |
2428 | from lp.buildmaster.enums import BuildStatus |
2429 | -from lp.buildmaster.interfaces.builder import ( |
2430 | - CannotFetchFile, |
2431 | - IBuilder, |
2432 | - IBuilderSet, |
2433 | - ) |
2434 | +from lp.buildmaster.interfaces.builder import IBuilder, IBuilderSet |
2435 | from lp.buildmaster.interfaces.buildfarmjobbehavior import ( |
2436 | IBuildFarmJobBehavior, |
2437 | ) |
2438 | from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
2439 | -from lp.buildmaster.interfaces.builder import CannotResumeHost |
2440 | +from lp.buildmaster.model.builder import BuilderSlave |
2441 | from lp.buildmaster.model.buildfarmjobbehavior import IdleBuildBehavior |
2442 | from lp.buildmaster.model.buildqueue import BuildQueue |
2443 | from lp.buildmaster.tests.mock_slaves import ( |
2444 | AbortedSlave, |
2445 | - AbortingSlave, |
2446 | - BrokenSlave, |
2447 | - BuildingSlave, |
2448 | - CorruptBehavior, |
2449 | - DeadProxy, |
2450 | - LostBuildingBrokenSlave, |
2451 | MockBuilder, |
2452 | - OkSlave, |
2453 | - SlaveTestHelpers, |
2454 | - TrivialBehavior, |
2455 | - WaitingSlave, |
2456 | ) |
2457 | -from lp.services.job.interfaces.job import JobStatus |
2458 | from lp.soyuz.enums import ( |
2459 | ArchivePurpose, |
2460 | PackagePublishingStatus, |
2461 | @@ -70,12 +49,9 @@ |
2462 | ) |
2463 | from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
2464 | from lp.testing import ( |
2465 | - ANONYMOUS, |
2466 | - login_as, |
2467 | - logout, |
2468 | + TestCase, |
2469 | TestCaseWithFactory, |
2470 | ) |
2471 | -from lp.testing.factory import LaunchpadObjectFactory |
2472 | from lp.testing.fakemethod import FakeMethod |
2473 | |
2474 | |
2475 | @@ -116,121 +92,42 @@ |
2476 | bq = builder.getBuildQueue() |
2477 | self.assertIs(None, bq) |
2478 | |
2479 | - |
2480 | -class TestBuilderWithTrial(TrialTestCase): |
2481 | - |
2482 | - layer = TwistedLaunchpadZopelessLayer |
2483 | - |
2484 | - def setUp(self): |
2485 | - super(TestBuilderWithTrial, self) |
2486 | - self.slave_helper = SlaveTestHelpers() |
2487 | - self.slave_helper.setUp() |
2488 | - self.addCleanup(self.slave_helper.cleanUp) |
2489 | - self.factory = LaunchpadObjectFactory() |
2490 | - login_as(ANONYMOUS) |
2491 | - self.addCleanup(logout) |
2492 | - |
2493 | - def test_updateStatus_aborts_lost_and_broken_slave(self): |
2494 | - # A slave that's 'lost' should be aborted; when the slave is |
2495 | - # broken then abort() should also throw a fault. |
2496 | - slave = LostBuildingBrokenSlave() |
2497 | - lostbuilding_builder = MockBuilder( |
2498 | - 'Lost Building Broken Slave', slave, behavior=CorruptBehavior()) |
2499 | - d = lostbuilding_builder.updateStatus(QuietFakeLogger()) |
2500 | - def check_slave_status(failure): |
2501 | - self.assertIn('abort', slave.call_log) |
2502 | - # 'Fault' comes from the LostBuildingBrokenSlave, this is |
2503 | - # just testing that the value is passed through. |
2504 | - self.assertIsInstance(failure.value, xmlrpclib.Fault) |
2505 | - return d.addBoth(check_slave_status) |
2506 | - |
2507 | - def test_resumeSlaveHost_nonvirtual(self): |
2508 | - builder = self.factory.makeBuilder(virtualized=False) |
2509 | - d = builder.resumeSlaveHost() |
2510 | - return self.assertFailure(d, CannotResumeHost) |
2511 | - |
2512 | - def test_resumeSlaveHost_no_vmhost(self): |
2513 | - builder = self.factory.makeBuilder(virtualized=True, vm_host=None) |
2514 | - d = builder.resumeSlaveHost() |
2515 | - return self.assertFailure(d, CannotResumeHost) |
2516 | - |
2517 | - def test_resumeSlaveHost_success(self): |
2518 | - reset_config = """ |
2519 | - [builddmaster] |
2520 | - vm_resume_command: /bin/echo -n parp""" |
2521 | - config.push('reset', reset_config) |
2522 | - self.addCleanup(config.pop, 'reset') |
2523 | - |
2524 | - builder = self.factory.makeBuilder(virtualized=True, vm_host="pop") |
2525 | - d = builder.resumeSlaveHost() |
2526 | - def got_resume(output): |
2527 | - self.assertEqual(('parp', ''), output) |
2528 | - return d.addCallback(got_resume) |
2529 | - |
2530 | - def test_resumeSlaveHost_command_failed(self): |
2531 | - reset_fail_config = """ |
2532 | - [builddmaster] |
2533 | - vm_resume_command: /bin/false""" |
2534 | - config.push('reset fail', reset_fail_config) |
2535 | - self.addCleanup(config.pop, 'reset fail') |
2536 | - builder = self.factory.makeBuilder(virtualized=True, vm_host="pop") |
2537 | - d = builder.resumeSlaveHost() |
2538 | - return self.assertFailure(d, CannotResumeHost) |
2539 | - |
2540 | - def test_handleTimeout_resume_failure(self): |
2541 | - reset_fail_config = """ |
2542 | - [builddmaster] |
2543 | - vm_resume_command: /bin/false""" |
2544 | - config.push('reset fail', reset_fail_config) |
2545 | - self.addCleanup(config.pop, 'reset fail') |
2546 | - builder = self.factory.makeBuilder(virtualized=True, vm_host="pop") |
2547 | - builder.builderok = True |
2548 | - d = builder.handleTimeout(QuietFakeLogger(), 'blah') |
2549 | - return self.assertFailure(d, CannotResumeHost) |
2550 | - |
2551 | - def _setupRecipeBuildAndBuilder(self): |
2552 | - # Helper function to make a builder capable of building a |
2553 | - # recipe, returning both. |
2554 | - processor = self.factory.makeProcessor(name="i386") |
2555 | - builder = self.factory.makeBuilder( |
2556 | - processor=processor, virtualized=True, vm_host="bladh") |
2557 | - builder.setSlaveForTesting(OkSlave()) |
2558 | - distroseries = self.factory.makeDistroSeries() |
2559 | - das = self.factory.makeDistroArchSeries( |
2560 | - distroseries=distroseries, architecturetag="i386", |
2561 | - processorfamily=processor.family) |
2562 | - chroot = self.factory.makeLibraryFileAlias() |
2563 | - das.addOrUpdateChroot(chroot) |
2564 | - distroseries.nominatedarchindep = das |
2565 | - build = self.factory.makeSourcePackageRecipeBuild( |
2566 | - distroseries=distroseries) |
2567 | - return builder, build |
2568 | - |
2569 | - def test_findAndStartJob_returns_candidate(self): |
2570 | - # findAndStartJob finds the next queued job using _findBuildCandidate. |
2571 | - # We don't care about the type of build at all. |
2572 | - builder, build = self._setupRecipeBuildAndBuilder() |
2573 | - candidate = build.queueBuild() |
2574 | - # _findBuildCandidate is tested elsewhere, we just make sure that |
2575 | - # findAndStartJob delegates to it. |
2576 | - removeSecurityProxy(builder)._findBuildCandidate = FakeMethod( |
2577 | - result=candidate) |
2578 | - d = builder.findAndStartJob() |
2579 | - return d.addCallback(self.assertEqual, candidate) |
2580 | - |
2581 | - def test_findAndStartJob_starts_job(self): |
2582 | - # findAndStartJob finds the next queued job using _findBuildCandidate |
2583 | - # and then starts it. |
2584 | - # We don't care about the type of build at all. |
2585 | - builder, build = self._setupRecipeBuildAndBuilder() |
2586 | - candidate = build.queueBuild() |
2587 | - removeSecurityProxy(builder)._findBuildCandidate = FakeMethod( |
2588 | - result=candidate) |
2589 | - d = builder.findAndStartJob() |
2590 | - def check_build_started(candidate): |
2591 | - self.assertEqual(candidate.builder, builder) |
2592 | - self.assertEqual(BuildStatus.BUILDING, build.status) |
2593 | - return d.addCallback(check_build_started) |
2594 | + def test_updateBuilderStatus_catches_repeated_EINTR(self): |
2595 | + # A single EINTR return from a socket operation should cause the |
2596 | + # operation to be retried, not fail/reset the builder. |
2597 | + builder = removeSecurityProxy(self.factory.makeBuilder()) |
2598 | + builder.handleTimeout = FakeMethod() |
2599 | + builder.rescueIfLost = FakeMethod() |
2600 | + |
2601 | + def _fake_checkSlaveAlive(): |
2602 | + # Raise an EINTR error for all invocations. |
2603 | + raise socket.error(errno.EINTR, "fake eintr") |
2604 | + |
2605 | + builder.checkSlaveAlive = _fake_checkSlaveAlive |
2606 | + builder.updateStatus() |
2607 | + |
2608 | + # builder.updateStatus should eventually have called |
2609 | + # handleTimeout() |
2610 | + self.assertEqual(1, builder.handleTimeout.call_count) |
2611 | + |
2612 | + def test_updateBuilderStatus_catches_single_EINTR(self): |
2613 | + builder = removeSecurityProxy(self.factory.makeBuilder()) |
2614 | + builder.handleTimeout = FakeMethod() |
2615 | + builder.rescueIfLost = FakeMethod() |
2616 | + self.eintr_returned = False |
2617 | + |
2618 | + def _fake_checkSlaveAlive(): |
2619 | + # raise an EINTR error for the first invocation only. |
2620 | + if not self.eintr_returned: |
2621 | + self.eintr_returned = True |
2622 | + raise socket.error(errno.EINTR, "fake eintr") |
2623 | + |
2624 | + builder.checkSlaveAlive = _fake_checkSlaveAlive |
2625 | + builder.updateStatus() |
2626 | + |
2627 | + # builder.updateStatus should never call handleTimeout() for a |
2628 | + # single EINTR. |
2629 | + self.assertEqual(0, builder.handleTimeout.call_count) |
2630 | |
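The two EINTR tests above pin down the intended semantics: a single interrupted socket call is retried transparently, while repeated interruptions end in `handleTimeout()` rather than an exception. A minimal retry loop with that shape (a hypothetical helper illustrating the asserted behaviour, not the actual `updateStatus()` implementation) might look like:

```python
import errno
import socket


def retry_on_eintr(operation, handle_timeout, max_attempts=5):
    """Retry ``operation`` while it is interrupted by EINTR.

    A single interruption is retried transparently; if every attempt
    raises EINTR, ``handle_timeout`` is called instead of propagating
    the error. Hypothetical sketch, not Launchpad code.
    """
    for _ in range(max_attempts):
        try:
            return operation()
        except socket.error as e:
            # Any error other than EINTR is a real failure.
            if e.errno != errno.EINTR:
                raise
    # Persistent EINTR: treat the slave as timed out.
    return handle_timeout()
```

This mirrors the tests: a one-off EINTR never reaches the timeout handler, while the `_fake_checkSlaveAlive` that always raises eventually does.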
2631 | def test_slave(self): |
2632 | # Builder.slave is a BuilderSlave that points at the actual Builder. |
2633 | @@ -239,147 +136,25 @@ |
2634 | builder = removeSecurityProxy(self.factory.makeBuilder()) |
2635 | self.assertEqual(builder.url, builder.slave.url) |
2636 | |
2637 | + |
2638 | +class Test_rescueBuilderIfLost(TestCaseWithFactory): |
2639 | + """Tests for lp.buildmaster.model.builder.rescueBuilderIfLost.""" |
2640 | + |
2641 | + layer = LaunchpadZopelessLayer |
2642 | + |
2643 | def test_recovery_of_aborted_slave(self): |
2644 | # If a slave is in the ABORTED state, rescueBuilderIfLost should |
2645 | # clean it if we don't think it's currently building anything. |
2646 | # See bug 463046. |
2647 | aborted_slave = AbortedSlave() |
2648 | + # The slave's clean() method is normally an XMLRPC call, so we |
2649 | + # can just stub it out and check that it got called. |
2650 | + aborted_slave.clean = FakeMethod() |
2651 | builder = MockBuilder("mock_builder", aborted_slave) |
2652 | builder.currentjob = None |
2653 | - d = builder.rescueIfLost() |
2654 | - def check_slave_calls(ignored): |
2655 | - self.assertIn('clean', aborted_slave.call_log) |
2656 | - return d.addCallback(check_slave_calls) |
2657 | - |
2658 | - def test_recover_ok_slave(self): |
2659 | - # An idle slave is not rescued. |
2660 | - slave = OkSlave() |
2661 | - builder = MockBuilder("mock_builder", slave, TrivialBehavior()) |
2662 | - d = builder.rescueIfLost() |
2663 | - def check_slave_calls(ignored): |
2664 | - self.assertNotIn('abort', slave.call_log) |
2665 | - self.assertNotIn('clean', slave.call_log) |
2666 | - return d.addCallback(check_slave_calls) |
2667 | - |
2668 | - def test_recover_waiting_slave_with_good_id(self): |
2669 | - # rescueIfLost does not attempt to abort or clean a builder that is |
2670 | - # WAITING. |
2671 | - waiting_slave = WaitingSlave() |
2672 | - builder = MockBuilder("mock_builder", waiting_slave, TrivialBehavior()) |
2673 | - d = builder.rescueIfLost() |
2674 | - def check_slave_calls(ignored): |
2675 | - self.assertNotIn('abort', waiting_slave.call_log) |
2676 | - self.assertNotIn('clean', waiting_slave.call_log) |
2677 | - return d.addCallback(check_slave_calls) |
2678 | - |
2679 | - def test_recover_waiting_slave_with_bad_id(self): |
2680 | - # If a slave is WAITING with a build for us to get, and the build |
2681 | - # cookie cannot be verified, which means we don't recognize the build, |
2682 | - # then rescueBuilderIfLost should attempt to abort it, so that the |
2683 | - # builder is reset for a new build, and the corrupt build is |
2684 | - # discarded. |
2685 | - waiting_slave = WaitingSlave() |
2686 | - builder = MockBuilder("mock_builder", waiting_slave, CorruptBehavior()) |
2687 | - d = builder.rescueIfLost() |
2688 | - def check_slave_calls(ignored): |
2689 | - self.assertNotIn('abort', waiting_slave.call_log) |
2690 | - self.assertIn('clean', waiting_slave.call_log) |
2691 | - return d.addCallback(check_slave_calls) |
2692 | - |
2693 | - def test_recover_building_slave_with_good_id(self): |
2694 | - # rescueIfLost does not attempt to abort or clean a builder that is |
2695 | - # BUILDING. |
2696 | - building_slave = BuildingSlave() |
2697 | - builder = MockBuilder("mock_builder", building_slave, TrivialBehavior()) |
2698 | - d = builder.rescueIfLost() |
2699 | - def check_slave_calls(ignored): |
2700 | - self.assertNotIn('abort', building_slave.call_log) |
2701 | - self.assertNotIn('clean', building_slave.call_log) |
2702 | - return d.addCallback(check_slave_calls) |
2703 | - |
2704 | - def test_recover_building_slave_with_bad_id(self): |
2705 | - # If a slave is BUILDING with a build id we don't recognize, then we |
2706 | - # abort the build, thus stopping it in its tracks. |
2707 | - building_slave = BuildingSlave() |
2708 | - builder = MockBuilder("mock_builder", building_slave, CorruptBehavior()) |
2709 | - d = builder.rescueIfLost() |
2710 | - def check_slave_calls(ignored): |
2711 | - self.assertIn('abort', building_slave.call_log) |
2712 | - self.assertNotIn('clean', building_slave.call_log) |
2713 | - return d.addCallback(check_slave_calls) |
2714 | - |
2715 | - |
2716 | -class TestBuilderSlaveStatus(TestBuilderWithTrial): |
2717 | - |
2718 | - # Verify what IBuilder.slaveStatus returns with slaves in different |
2719 | - # states. |
2720 | - |
2721 | - def assertStatus(self, slave, builder_status=None, |
2722 | - build_status=None, logtail=False, filemap=None, |
2723 | - dependencies=None): |
2724 | - builder = self.factory.makeBuilder() |
2725 | - builder.setSlaveForTesting(slave) |
2726 | - d = builder.slaveStatus() |
2727 | - |
2728 | - def got_status(status_dict): |
2729 | - expected = {} |
2730 | - if builder_status is not None: |
2731 | - expected["builder_status"] = builder_status |
2732 | - if build_status is not None: |
2733 | - expected["build_status"] = build_status |
2734 | - if dependencies is not None: |
2735 | - expected["dependencies"] = dependencies |
2736 | - |
2737 | - # We don't care so much about the content of the logtail, |
2738 | - # just that it's there. |
2739 | - if logtail: |
2740 | - tail = status_dict.pop("logtail") |
2741 | - self.assertIsInstance(tail, xmlrpclib.Binary) |
2742 | - |
2743 | - self.assertEqual(expected, status_dict) |
2744 | - |
2745 | - return d.addCallback(got_status) |
2746 | - |
2747 | - def test_slaveStatus_idle_slave(self): |
2748 | - self.assertStatus( |
2749 | - OkSlave(), builder_status='BuilderStatus.IDLE') |
2750 | - |
2751 | - def test_slaveStatus_building_slave(self): |
2752 | - self.assertStatus( |
2753 | - BuildingSlave(), builder_status='BuilderStatus.BUILDING', |
2754 | - logtail=True) |
2755 | - |
2756 | - def test_slaveStatus_waiting_slave(self): |
2757 | - self.assertStatus( |
2758 | - WaitingSlave(), builder_status='BuilderStatus.WAITING', |
2759 | - build_status='BuildStatus.OK', filemap={}) |
2760 | - |
2761 | - def test_slaveStatus_aborting_slave(self): |
2762 | - self.assertStatus( |
2763 | - AbortingSlave(), builder_status='BuilderStatus.ABORTING') |
2764 | - |
2765 | - def test_slaveStatus_aborted_slave(self): |
2766 | - self.assertStatus( |
2767 | - AbortedSlave(), builder_status='BuilderStatus.ABORTED') |
2768 | - |
2769 | - def test_isAvailable_with_not_builderok(self): |
2770 | - # isAvailable() is a wrapper around slaveStatusSentence() |
2771 | - builder = self.factory.makeBuilder() |
2772 | - builder.builderok = False |
2773 | - d = builder.isAvailable() |
2774 | - return d.addCallback(self.assertFalse) |
2775 | - |
2776 | - def test_isAvailable_with_slave_fault(self): |
2777 | - builder = self.factory.makeBuilder() |
2778 | - builder.setSlaveForTesting(BrokenSlave()) |
2779 | - d = builder.isAvailable() |
2780 | - return d.addCallback(self.assertFalse) |
2781 | - |
2782 | - def test_isAvailable_with_slave_idle(self): |
2783 | - builder = self.factory.makeBuilder() |
2784 | - builder.setSlaveForTesting(OkSlave()) |
2785 | - d = builder.isAvailable() |
2786 | - return d.addCallback(self.assertTrue) |
2787 | + builder.rescueIfLost() |
2788 | + |
2789 | + self.assertEqual(1, aborted_slave.clean.call_count) |
2790 | |
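The rescue test above stubs out `clean()` with `FakeMethod` from `lp.testing.fakemethod` and asserts on its `call_count`. A minimal recording stub with that interface (a sketch consistent with the usage visible in this diff; the real `FakeMethod` may differ in detail) is:

```python
class RecordingStub:
    """Callable stub that records its invocations.

    Construct with an optional canned ``result``, call it any number
    of times, then assert on ``call_count`` or the recorded ``calls``.
    """

    def __init__(self, result=None):
        self.calls = []
        self.result = result

    def __call__(self, *args, **kwargs):
        # Record the exact arguments, then return the canned result.
        self.calls.append((args, kwargs))
        return self.result

    @property
    def call_count(self):
        return len(self.calls)
```

Replacing a network-backed method such as `clean()` with a stub like this keeps the test fast and lets it assert that the rescue path invoked the method exactly once.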
2791 | |
2792 | class TestFindBuildCandidateBase(TestCaseWithFactory): |
2793 | @@ -413,49 +188,6 @@ |
2794 | builder.manual = False |
2795 | |
2796 | |
2797 | -class TestFindBuildCandidateGeneralCases(TestFindBuildCandidateBase): |
2798 | - # Test usage of findBuildCandidate not specific to any archive type. |
2799 | - |
2800 | - def test_findBuildCandidate_supersedes_builds(self): |
2801 | - # IBuilder._findBuildCandidate identifies if there are builds |
2802 | - # for superseded source package releases in the queue and marks |
2803 | - # the corresponding build record as SUPERSEDED. |
2804 | - archive = self.factory.makeArchive() |
2805 | - self.publisher.getPubSource( |
2806 | - sourcename="gedit", status=PackagePublishingStatus.PUBLISHED, |
2807 | - archive=archive).createMissingBuilds() |
2808 | - old_candidate = removeSecurityProxy( |
2809 | - self.frog_builder)._findBuildCandidate() |
2810 | - |
2811 | - # The candidate starts off as NEEDSBUILD: |
2812 | - build = getUtility(IBinaryPackageBuildSet).getByQueueEntry( |
2813 | - old_candidate) |
2814 | - self.assertEqual(BuildStatus.NEEDSBUILD, build.status) |
2815 | - |
2816 | - # Now supersede the source package: |
2817 | - publication = build.current_source_publication |
2818 | - publication.status = PackagePublishingStatus.SUPERSEDED |
2819 | - |
2820 | - # The candidate returned is now a different one: |
2821 | - new_candidate = removeSecurityProxy( |
2822 | - self.frog_builder)._findBuildCandidate() |
2823 | - self.assertNotEqual(new_candidate, old_candidate) |
2824 | - |
2825 | - # And the old_candidate is superseded: |
2826 | - self.assertEqual(BuildStatus.SUPERSEDED, build.status) |
2827 | - |
2828 | - def test_acquireBuildCandidate_marks_building(self): |
2829 | - # acquireBuildCandidate() should call _findBuildCandidate and |
2830 | - # mark the build as building. |
2831 | - archive = self.factory.makeArchive() |
2832 | - self.publisher.getPubSource( |
2833 | - sourcename="gedit", status=PackagePublishingStatus.PUBLISHED, |
2834 | - archive=archive).createMissingBuilds() |
2835 | - candidate = removeSecurityProxy( |
2836 | - self.frog_builder).acquireBuildCandidate() |
2837 | - self.assertEqual(JobStatus.RUNNING, candidate.job.status) |
2838 | - |
2839 | - |
2840 | class TestFindBuildCandidatePPAWithSingleBuilder(TestCaseWithFactory): |
2841 | |
2842 | layer = LaunchpadZopelessLayer |
2843 | @@ -588,16 +320,6 @@ |
2844 | build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job) |
2845 | self.failUnlessEqual('joesppa', build.archive.name) |
2846 | |
2847 | - def test_findBuildCandidate_with_disabled_archive(self): |
2848 | - # Disabled archives should not be considered for dispatching |
2849 | - # builds. |
2850 | - disabled_job = removeSecurityProxy(self.builder4)._findBuildCandidate() |
2851 | - build = getUtility(IBinaryPackageBuildSet).getByQueueEntry( |
2852 | - disabled_job) |
2853 | - build.archive.disable() |
2854 | - next_job = removeSecurityProxy(self.builder4)._findBuildCandidate() |
2855 | - self.assertNotEqual(disabled_job, next_job) |
2856 | - |
2857 | |
2858 | class TestFindBuildCandidatePrivatePPA(TestFindBuildCandidatePPABase): |
2859 | |
2860 | @@ -610,14 +332,6 @@ |
2861 | build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(next_job) |
2862 | self.failUnlessEqual('joesppa', build.archive.name) |
2863 | |
2864 | - # If the source for the build is still pending, it won't be |
2865 | - # dispatched because the builder has to fetch the source files |
2866 | - # from the (password protected) repo area, not the librarian. |
2867 | - pub = build.current_source_publication |
2868 | - pub.status = PackagePublishingStatus.PENDING |
2869 | - candidate = removeSecurityProxy(self.builder4)._findBuildCandidate() |
2870 | - self.assertNotEqual(next_job.id, candidate.id) |
2871 | - |
2872 | |
2873 | class TestFindBuildCandidateDistroArchive(TestFindBuildCandidateBase): |
2874 | |
2875 | @@ -760,48 +474,97 @@ |
2876 | self.builder.current_build_behavior, BinaryPackageBuildBehavior) |
2877 | |
2878 | |
2879 | -class TestSlave(TrialTestCase): |
2880 | +class TestSlave(TestCase): |
2881 | """ |
2882 | Integration tests for BuilderSlave that verify how it works against a |
2883 | real slave server. |
2884 | """ |
2885 | |
2886 | - layer = TwistedLayer |
2887 | - |
2888 | - def setUp(self): |
2889 | - super(TestSlave, self).setUp() |
2890 | - self.slave_helper = SlaveTestHelpers() |
2891 | - self.slave_helper.setUp() |
2892 | - self.addCleanup(self.slave_helper.cleanUp) |
2893 | - |
2894 | # XXX: JonathanLange 2010-09-20 bug=643521: There are also tests for |
2895 | # BuilderSlave in buildd-slave.txt and in other places. The tests here |
2896 | # ought to become the canonical tests for BuilderSlave vs running buildd |
2897 | # XML-RPC server interaction. |
2898 | |
2899 | + # The URL for the XML-RPC service set up by `BuilddSlaveTestSetup`. |
2900 | + TEST_URL = 'http://localhost:8221/rpc/' |
2901 | + |
2902 | + def getServerSlave(self): |
2903 | + """Set up a test build slave server. |
2904 | + |
2905 | + :return: A `BuilddSlaveTestSetup` object. |
2906 | + """ |
2907 | + tachandler = BuilddSlaveTestSetup() |
2908 | + tachandler.setUp() |
2909 | + self.addCleanup(tachandler.tearDown) |
2910 | + def addLogFile(exc_info): |
2911 | + self.addDetail( |
2912 | + 'xmlrpc-log-file', |
2913 | + Content(UTF8_TEXT, lambda: open(tachandler.logfile, 'r').read())) |
2914 | + self.addOnException(addLogFile) |
2915 | + return tachandler |
2916 | + |
2917 | + def getClientSlave(self): |
2918 | + """Return a `BuilderSlave` for use in testing. |
2919 | + |
2920 | + Points to a fixed URL that is also used by `BuilddSlaveTestSetup`. |
2921 | + """ |
2922 | + return BuilderSlave.makeBlockingSlave(self.TEST_URL, 'vmhost') |
2923 | + |
2924 | + def makeCacheFile(self, tachandler, filename): |
2925 | + """Make a cache file available on the remote slave. |
2926 | + |
2927 | + :param tachandler: The TacTestSetup object used to start the remote |
2928 | + slave. |
2929 | + :param filename: The name of the file to create in the file cache |
2930 | + area. |
2931 | + """ |
2932 | + path = os.path.join(tachandler.root, 'filecache', filename) |
2933 | + fd = open(path, 'w') |
2934 | + fd.write('something') |
2935 | + fd.close() |
2936 | + self.addCleanup(os.unlink, path) |
2937 | + |
2938 | + def triggerGoodBuild(self, slave, build_id=None): |
2939 | + """Trigger a good build on 'slave'. |
2940 | + |
2941 | + :param slave: A `BuilderSlave` instance to trigger the build on. |
2942 | + :param build_id: The build identifier. If not specified, defaults to |
2943 | + an arbitrary string. |
2944 | + :type build_id: str |
2945 | + :return: The build id returned by the slave. |
2946 | + """ |
2947 | + if build_id is None: |
2948 | + build_id = self.getUniqueString() |
2949 | + tachandler = self.getServerSlave() |
2950 | + chroot_file = 'fake-chroot' |
2951 | + dsc_file = 'thing' |
2952 | + self.makeCacheFile(tachandler, chroot_file) |
2953 | + self.makeCacheFile(tachandler, dsc_file) |
2954 | + return slave.build( |
2955 | + build_id, 'debian', chroot_file, {'.dsc': dsc_file}, |
2956 | + {'ogrecomponent': 'main'}) |
2957 | + |
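`makeCacheFile` above drops a throwaway file into the slave's `filecache` directory and registers an unlink cleanup with the test case. Outside the `BuilddSlaveTestSetup` harness the same pattern can be exercised against any directory (a standalone sketch under that assumption, returning the cleanup instead of calling `addCleanup`):

```python
import os
import tempfile


def make_cache_file(cache_root, filename, content='something'):
    """Create ``filename`` under ``cache_root``; return (path, cleanup).

    Standalone sketch of the makeCacheFile helper: the caller is
    responsible for invoking the returned cleanup callable (the tests
    above hand this job to addCleanup).
    """
    path = os.path.join(cache_root, filename)
    with open(path, 'w') as f:
        f.write(content)
    return path, lambda: os.unlink(path)
```

`triggerGoodBuild` relies on exactly this: it seeds `fake-chroot` and `thing` into the cache so that the subsequent `slave.build(...)` call can resolve its chroot and `.dsc` filemap entries.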
2958 | # XXX 2010-10-06 Julian bug=655559 |
2959 | # This is failing on buildbot but not locally; it's trying to abort |
2960 | # before the build has started. |
2961 | def disabled_test_abort(self): |
2962 | - slave = self.slave_helper.getClientSlave() |
2963 | + slave = self.getClientSlave() |
2964 | # We need to be in a BUILDING state before we can abort. |
2965 | - d = self.slave_helper.triggerGoodBuild(slave) |
2966 | - d.addCallback(lambda ignored: slave.abort()) |
2967 | - d.addCallback(self.assertEqual, BuilderStatus.ABORTING) |
2968 | - return d |
2969 | + self.triggerGoodBuild(slave) |
2970 | + result = slave.abort() |
2971 | + self.assertEqual(result, BuilderStatus.ABORTING) |
2972 | |
2973 | def test_build(self): |
2974 | # Calling 'build' with an expected builder type, a good build id, |
2975 | # valid chroot & filemaps works and returns a BuilderStatus of |
2976 | # BUILDING. |
2977 | build_id = 'some-id' |
2978 | - slave = self.slave_helper.getClientSlave() |
2979 | - d = self.slave_helper.triggerGoodBuild(slave, build_id) |
2980 | - return d.addCallback( |
2981 | - self.assertEqual, [BuilderStatus.BUILDING, build_id]) |
2982 | + slave = self.getClientSlave() |
2983 | + result = self.triggerGoodBuild(slave, build_id) |
2984 | + self.assertEqual([BuilderStatus.BUILDING, build_id], result) |
2985 | |
2986 | def test_clean(self): |
2987 | - slave = self.slave_helper.getClientSlave() |
2988 | + slave = self.getClientSlave() |
2989 | # XXX: JonathanLange 2010-09-21: Calling clean() on the slave requires |
2990 | # it to be in either the WAITING or ABORTED states, and both of these |
2991 | # states are very difficult to achieve in a test environment. For the |
2992 | @@ -811,248 +574,57 @@ |
2993 | def test_echo(self): |
2994 | # Calling 'echo' contacts the server which returns the arguments we |
2995 | # gave it. |
2996 | - self.slave_helper.getServerSlave() |
2997 | - slave = self.slave_helper.getClientSlave() |
2998 | - d = slave.echo('foo', 'bar', 42) |
2999 | - return d.addCallback(self.assertEqual, ['foo', 'bar', 42]) |
3000 | + self.getServerSlave() |
3001 | + slave = self.getClientSlave() |
3002 | + result = slave.echo('foo', 'bar', 42) |
3003 | + self.assertEqual(['foo', 'bar', 42], result) |
3004 | |
3005 | def test_info(self): |
3006 | # Calling 'info' gets some information about the slave. |
3007 | - self.slave_helper.getServerSlave() |
3008 | - slave = self.slave_helper.getClientSlave() |
3009 | - d = slave.info() |
3010 | + self.getServerSlave() |
3011 | + slave = self.getClientSlave() |
3012 | + result = slave.info() |
3013 | # We're testing the hard-coded values, since the version is hard-coded |
3014 | # into the remote slave, the supported build managers are hard-coded |
3015 | # into the tac file for the remote slave and config is returned from |
3016 | # the configuration file. |
3017 | - return d.addCallback( |
3018 | - self.assertEqual, |
3019 | + self.assertEqual( |
3020 | ['1.0', |
3021 | 'i386', |
3022 | ['sourcepackagerecipe', |
3023 | - 'translation-templates', 'binarypackage', 'debian']]) |
3024 | + 'translation-templates', 'binarypackage', 'debian']], |
3025 | + result) |
3026 | |
3027 | def test_initial_status(self): |
3028 | # Calling 'status' returns the current status of the slave. The |
3029 | # initial status is IDLE. |
3030 | - self.slave_helper.getServerSlave() |
3031 | - slave = self.slave_helper.getClientSlave() |
3032 | - d = slave.status() |
3033 | - return d.addCallback(self.assertEqual, [BuilderStatus.IDLE, '']) |
3034 | + self.getServerSlave() |
3035 | + slave = self.getClientSlave() |
3036 | + status = slave.status() |
3037 | + self.assertEqual([BuilderStatus.IDLE, ''], status) |
3038 | |
3039 | def test_status_after_build(self): |
3040 | # Calling 'status' returns the current status of the slave. After a |
3041 | # build has been triggered, the status is BUILDING. |
3042 | - slave = self.slave_helper.getClientSlave() |
3043 | + slave = self.getClientSlave() |
3044 | build_id = 'status-build-id' |
3045 | - d = self.slave_helper.triggerGoodBuild(slave, build_id) |
3046 | - d.addCallback(lambda ignored: slave.status()) |
3047 | - def check_status(status): |
3048 | - self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2]) |
3049 | - [log_file] = status[2:] |
3050 | - self.assertIsInstance(log_file, xmlrpclib.Binary) |
3051 | - return d.addCallback(check_status) |
3052 | + self.triggerGoodBuild(slave, build_id) |
3053 | + status = slave.status() |
3054 | + self.assertEqual([BuilderStatus.BUILDING, build_id], status[:2]) |
3055 | + [log_file] = status[2:] |
3056 | + self.assertIsInstance(log_file, xmlrpclib.Binary) |
3057 | |
3058 | def test_ensurepresent_not_there(self): |
3059 | # ensurepresent checks to see if a file is there. |
3060 | - self.slave_helper.getServerSlave() |
3061 | - slave = self.slave_helper.getClientSlave() |
3062 | - d = slave.ensurepresent('blahblah', None, None, None) |
3063 | - d.addCallback(self.assertEqual, [False, 'No URL']) |
3064 | - return d |
3065 | + self.getServerSlave() |
3066 | + slave = self.getClientSlave() |
3067 | + result = slave.ensurepresent('blahblah', None, None, None) |
3068 | + self.assertEqual([False, 'No URL'], result) |
3069 | |
3070 | def test_ensurepresent_actually_there(self): |
3071 | # ensurepresent checks to see if a file is there. |
3072 | - tachandler = self.slave_helper.getServerSlave() |
3073 | - slave = self.slave_helper.getClientSlave() |
3074 | - self.slave_helper.makeCacheFile(tachandler, 'blahblah') |
3075 | - d = slave.ensurepresent('blahblah', None, None, None) |
3076 | - d.addCallback(self.assertEqual, [True, 'No URL']) |
3077 | - return d |
3078 | - |
3079 | - def test_sendFileToSlave_not_there(self): |
3080 | - self.slave_helper.getServerSlave() |
3081 | - slave = self.slave_helper.getClientSlave() |
3082 | - d = slave.sendFileToSlave('blahblah', None, None, None) |
3083 | - return self.assertFailure(d, CannotFetchFile) |
3084 | - |
3085 | - def test_sendFileToSlave_actually_there(self): |
3086 | - tachandler = self.slave_helper.getServerSlave() |
3087 | - slave = self.slave_helper.getClientSlave() |
3088 | - self.slave_helper.makeCacheFile(tachandler, 'blahblah') |
3089 | - d = slave.sendFileToSlave('blahblah', None, None, None) |
3090 | - def check_present(ignored): |
3091 | - d = slave.ensurepresent('blahblah', None, None, None) |
3092 | - return d.addCallback(self.assertEqual, [True, 'No URL']) |
3093 | - d.addCallback(check_present) |
3094 | - return d |
3095 | - |
3096 | - def test_resumeHost_success(self): |
3097 | - # On a successful resume resume() fires the returned deferred |
3098 | - # callback with 'None'. |
3099 | - self.slave_helper.getServerSlave() |
3100 | - slave = self.slave_helper.getClientSlave() |
3101 | - |
3102 | - # The configuration testing command-line. |
3103 | - self.assertEqual( |
3104 | - 'echo %(vm_host)s', config.builddmaster.vm_resume_command) |
3105 | - |
3106 | - # On success the response is None. |
3107 | - def check_resume_success(response): |
3108 | - out, err, code = response |
3109 | - self.assertEqual(os.EX_OK, code) |
3110 | - # XXX: JonathanLange 2010-09-23: We should instead pass the |
3111 | - # expected vm_host into the client slave. Not doing this now, |
3112 | - # since the SlaveHelper is being moved around. |
3113 | - self.assertEqual("%s\n" % slave._vm_host, out) |
3114 | - d = slave.resume() |
3115 | - d.addBoth(check_resume_success) |
3116 | - return d |
3117 | - |
3118 | - def test_resumeHost_failure(self): |
3119 | - # On a failed resume, 'resumeHost' fires the returned deferred |
3120 | - # errorback with the `ProcessTerminated` failure. |
3121 | - self.slave_helper.getServerSlave() |
3122 | - slave = self.slave_helper.getClientSlave() |
3123 | - |
3124 | - # Override the configuration command-line with one that will fail. |
3125 | - failed_config = """ |
3126 | - [builddmaster] |
3127 | - vm_resume_command: test "%(vm_host)s = 'no-sir'" |
3128 | - """ |
3129 | - config.push('failed_resume_command', failed_config) |
3130 | - self.addCleanup(config.pop, 'failed_resume_command') |
3131 | - |
3132 | - # On failures, the response is a twisted `Failure` object containing |
3133 | - # a tuple. |
3134 | - def check_resume_failure(failure): |
3135 | - out, err, code = failure.value |
3136 | - # The process will exit with a return code of "1". |
3137 | - self.assertEqual(code, 1) |
3138 | - d = slave.resume() |
3139 | - d.addBoth(check_resume_failure) |
3140 | - return d |
3141 | - |
3142 | - def test_resumeHost_timeout(self): |
3143 | - # On a resume timeouts, 'resumeHost' fires the returned deferred |
3144 | - # errorback with the `TimeoutError` failure. |
3145 | - self.slave_helper.getServerSlave() |
3146 | - slave = self.slave_helper.getClientSlave() |
3147 | - |
3148 | - # Override the configuration command-line with one that will timeout. |
3149 | - timeout_config = """ |
3150 | - [builddmaster] |
3151 | - vm_resume_command: sleep 5 |
3152 | - socket_timeout: 1 |
3153 | - """ |
3154 | - config.push('timeout_resume_command', timeout_config) |
3155 | - self.addCleanup(config.pop, 'timeout_resume_command') |
3156 | - |
3157 | - # On timeouts, the response is a twisted `Failure` object containing |
3158 | - # a `TimeoutError` error. |
3159 | - def check_resume_timeout(failure): |
3160 | - self.assertIsInstance(failure, Failure) |
3161 | - out, err, code = failure.value |
3162 | - self.assertEqual(code, signal.SIGKILL) |
3163 | - clock = Clock() |
3164 | - d = slave.resume(clock=clock) |
3165 | - # Move the clock beyond the socket_timeout but earlier than the |
3166 | - # sleep 5. This stops the test having to wait for the timeout. |
3167 | - # Fast tests FTW! |
3168 | - clock.advance(2) |
3169 | - d.addBoth(check_resume_timeout) |
3170 | - return d |
3171 | - |
3172 | - |
3173 | -class TestSlaveTimeouts(TrialTestCase): |
3174 | - # Testing that the methods that call callRemote() all time out |
3175 | - # as required. |
3176 | - |
3177 | - layer = TwistedLayer |
3178 | - |
3179 | - def setUp(self): |
3180 | - super(TestSlaveTimeouts, self).setUp() |
3181 | - self.slave_helper = SlaveTestHelpers() |
3182 | - self.slave_helper.setUp() |
3183 | - self.addCleanup(self.slave_helper.cleanUp) |
3184 | - self.clock = Clock() |
3185 | - self.proxy = DeadProxy("url") |
3186 | - self.slave = self.slave_helper.getClientSlave( |
3187 | - reactor=self.clock, proxy=self.proxy) |
3188 | - |
3189 | - def assertCancelled(self, d): |
3190 | - self.clock.advance(config.builddmaster.socket_timeout + 1) |
3191 | - return self.assertFailure(d, CancelledError) |
3192 | - |
3193 | - def test_timeout_abort(self): |
3194 | - return self.assertCancelled(self.slave.abort()) |
3195 | - |
3196 | - def test_timeout_clean(self): |
3197 | - return self.assertCancelled(self.slave.clean()) |
3198 | - |
3199 | - def test_timeout_echo(self): |
3200 | - return self.assertCancelled(self.slave.echo()) |
3201 | - |
3202 | - def test_timeout_info(self): |
3203 | - return self.assertCancelled(self.slave.info()) |
3204 | - |
3205 | - def test_timeout_status(self): |
3206 | - return self.assertCancelled(self.slave.status()) |
3207 | - |
3208 | - def test_timeout_ensurepresent(self): |
3209 | - return self.assertCancelled( |
3210 | - self.slave.ensurepresent(None, None, None, None)) |
3211 | - |
3212 | - def test_timeout_build(self): |
3213 | - return self.assertCancelled( |
3214 | - self.slave.build(None, None, None, None, None)) |
3215 | - |
3216 | - |
3217 | -class TestSlaveWithLibrarian(TrialTestCase): |
3218 | - """Tests that need more of Launchpad to run.""" |
3219 | - |
3220 | - layer = TwistedLaunchpadZopelessLayer |
3221 | - |
3222 | - def setUp(self): |
3223 | - super(TestSlaveWithLibrarian, self) |
3224 | - self.slave_helper = SlaveTestHelpers() |
3225 | - self.slave_helper.setUp() |
3226 | - self.addCleanup(self.slave_helper.cleanUp) |
3227 | - self.factory = LaunchpadObjectFactory() |
3228 | - login_as(ANONYMOUS) |
3229 | - self.addCleanup(logout) |
3230 | - |
3231 | - def test_ensurepresent_librarian(self): |
3232 | - # ensurepresent, when given an http URL for a file will download the |
3233 | - # file from that URL and report that the file is present, and it was |
3234 | - # downloaded. |
3235 | - |
3236 | - # Use the Librarian because it's a "convenient" web server. |
3237 | - lf = self.factory.makeLibraryFileAlias( |
3238 | - 'HelloWorld.txt', content="Hello World") |
3239 | - self.layer.txn.commit() |
3240 | - self.slave_helper.getServerSlave() |
3241 | - slave = self.slave_helper.getClientSlave() |
3242 | - d = slave.ensurepresent( |
3243 | - lf.content.sha1, lf.http_url, "", "") |
3244 | - d.addCallback(self.assertEqual, [True, 'Download']) |
3245 | - return d |
3246 | - |
3247 | - def test_retrieve_files_from_filecache(self): |
3248 | - # Files that are present on the slave can be downloaded with a |
3249 | - # filename made from the sha1 of the content underneath the |
3250 | - # 'filecache' directory. |
3251 | - content = "Hello World" |
3252 | - lf = self.factory.makeLibraryFileAlias( |
3253 | - 'HelloWorld.txt', content=content) |
3254 | - self.layer.txn.commit() |
3255 | - expected_url = '%s/filecache/%s' % ( |
3256 | - self.slave_helper.BASE_URL, lf.content.sha1) |
3257 | - self.slave_helper.getServerSlave() |
3258 | - slave = self.slave_helper.getClientSlave() |
3259 | - d = slave.ensurepresent( |
3260 | - lf.content.sha1, lf.http_url, "", "") |
3261 | - def check_file(ignored): |
3262 | - d = getPage(expected_url.encode('utf8')) |
3263 | - return d.addCallback(self.assertEqual, content) |
3264 | - return d.addCallback(check_file) |
3265 | + tachandler = self.getServerSlave() |
3266 | + slave = self.getClientSlave() |
3267 | + self.makeCacheFile(tachandler, 'blahblah') |
3268 | + result = slave.ensurepresent('blahblah', None, None, None) |
3269 | + self.assertEqual([True, 'No URL'], result) |
3270 | |
3271 | === modified file 'lib/lp/buildmaster/tests/test_manager.py' |
3272 | --- lib/lp/buildmaster/tests/test_manager.py 2010-10-19 13:58:21 +0000 |
3273 | +++ lib/lp/buildmaster/tests/test_manager.py 2010-12-07 16:24:04 +0000 |
3274 | @@ -6,7 +6,6 @@ |
3275 | import os |
3276 | import signal |
3277 | import time |
3278 | -import xmlrpclib |
3279 | |
3280 | import transaction |
3281 | |
3282 | @@ -15,7 +14,9 @@ |
3283 | reactor, |
3284 | task, |
3285 | ) |
3286 | +from twisted.internet.error import ConnectionClosed |
3287 | from twisted.internet.task import ( |
3288 | + Clock, |
3289 | deferLater, |
3290 | ) |
3291 | from twisted.python.failure import Failure |
3292 | @@ -29,45 +30,577 @@ |
3293 | ANONYMOUS, |
3294 | login, |
3295 | ) |
3296 | -from canonical.launchpad.scripts.logger import ( |
3297 | - QuietFakeLogger, |
3298 | - ) |
3299 | +from canonical.launchpad.scripts.logger import BufferLogger |
3300 | from canonical.testing.layers import ( |
3301 | LaunchpadScriptLayer, |
3302 | - TwistedLaunchpadZopelessLayer, |
3303 | + LaunchpadZopelessLayer, |
3304 | TwistedLayer, |
3305 | - ZopelessDatabaseLayer, |
3306 | ) |
3307 | from lp.buildmaster.enums import BuildStatus |
3308 | from lp.buildmaster.interfaces.builder import IBuilderSet |
3309 | from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
3310 | from lp.buildmaster.manager import ( |
3311 | - assessFailureCounts, |
3312 | + BaseDispatchResult, |
3313 | + buildd_success_result_map, |
3314 | BuilddManager, |
3315 | + FailDispatchResult, |
3316 | NewBuildersScanner, |
3317 | + RecordingSlave, |
3318 | + ResetDispatchResult, |
3319 | SlaveScanner, |
3320 | ) |
3321 | -from lp.buildmaster.model.builder import Builder |
3322 | from lp.buildmaster.tests.harness import BuilddManagerTestSetup |
3323 | -from lp.buildmaster.tests.mock_slaves import ( |
3324 | - BrokenSlave, |
3325 | - BuildingSlave, |
3326 | - OkSlave, |
3327 | - ) |
3328 | +from lp.buildmaster.tests.mock_slaves import BuildingSlave |
3329 | from lp.registry.interfaces.distribution import IDistributionSet |
3330 | from lp.soyuz.interfaces.binarypackagebuild import IBinaryPackageBuildSet |
3331 | -from lp.testing import TestCaseWithFactory |
3332 | +from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
3333 | +from lp.testing import TestCase as LaunchpadTestCase |
3334 | from lp.testing.factory import LaunchpadObjectFactory |
3335 | from lp.testing.fakemethod import FakeMethod |
3336 | from lp.testing.sampledata import BOB_THE_BUILDER_NAME |
3337 | |
3338 | |
3339 | +class TestRecordingSlaves(TrialTestCase): |
3340 | + """Tests for the recording slave class.""" |
3341 | + layer = TwistedLayer |
3342 | + |
3343 | + def setUp(self): |
3344 | + """Setup a fresh `RecordingSlave` for tests.""" |
3345 | + TrialTestCase.setUp(self) |
3346 | + self.slave = RecordingSlave( |
3347 | + 'foo', 'http://foo:8221/rpc', 'foo.host') |
3348 | + |
3349 | + def test_representation(self): |
3350 | + """`RecordingSlave` has a custom representation. |
3351 | + |
3352 | + It includes the builder name and XML-RPC URL for debugging. |
3353 | + """ |
3354 | + self.assertEqual('<foo:http://foo:8221/rpc>', repr(self.slave)) |
3355 | + |
3356 | + def assert_ensurepresent(self, func): |
3357 | + """Helper function to test results from calling ensurepresent.""" |
3358 | + self.assertEqual( |
3359 | + [True, 'Download'], |
3360 | + func('boing', 'bar', 'baz')) |
3361 | + self.assertEqual( |
3362 | + [('ensurepresent', ('boing', 'bar', 'baz'))], |
3363 | + self.slave.calls) |
3364 | + |
3365 | + def test_ensurepresent(self): |
3366 | + """`RecordingSlave.ensurepresent` always succeeds. |
3367 | + |
3368 | + It returns the expected success code and records the interaction |
3369 | + information for later use. |
3370 | + """ |
3371 | + self.assert_ensurepresent(self.slave.ensurepresent) |
3372 | + |
3373 | + def test_sendFileToSlave(self): |
3374 | + """RecordingSlave.sendFileToSlave always succeeds. |
3375 | + |
3376 | + It calls ensurepresent() and hence returns the same results. |
3377 | + """ |
3378 | + self.assert_ensurepresent(self.slave.sendFileToSlave) |
3379 | + |
3380 | + def test_build(self): |
3381 | + """`RecordingSlave.build` always succeeds. |
3382 | + |
3383 | + It returns the expected success code and records the interaction |
3384 | + information for later use. |
3385 | + """ |
3386 | + self.assertEqual( |
3387 | + ['BuilderStatus.BUILDING', 'boing'], |
3388 | + self.slave.build('boing', 'bar', 'baz')) |
3389 | + self.assertEqual( |
3390 | + [('build', ('boing', 'bar', 'baz'))], |
3391 | + self.slave.calls) |
3392 | + |
3393 | + def test_resume(self): |
3394 | + """`RecordingSlave.resume` always returns success.""" |
3395 | + # Resume isn't requested in a just-instantiated RecordingSlave. |
3396 | + self.assertFalse(self.slave.resume_requested) |
3397 | + |
3398 | + # When resume is called, it returns the success list and marks |
3399 | + # the slave for resuming. |
3400 | + self.assertEqual(['', '', os.EX_OK], self.slave.resume()) |
3401 | + self.assertTrue(self.slave.resume_requested) |
3402 | + |
3403 | + def test_resumeHost_success(self): |
3404 | + # On a successful resume, resumeHost() fires the returned deferred |
3405 | + # callback with 'None'. |
3406 | + |
3407 | + # The configuration testing command-line. |
3408 | + self.assertEqual( |
3409 | + 'echo %(vm_host)s', config.builddmaster.vm_resume_command) |
3410 | + |
3411 | + # On success the response is None. |
3412 | + def check_resume_success(response): |
3413 | + out, err, code = response |
3414 | + self.assertEqual(os.EX_OK, code) |
3415 | + self.assertEqual("%s\n" % self.slave.vm_host, out) |
3416 | + d = self.slave.resumeSlave() |
3417 | + d.addBoth(check_resume_success) |
3418 | + return d |
3419 | + |
3420 | + def test_resumeHost_failure(self): |
3421 | + # On a failed resume, 'resumeHost' fires the returned deferred |
3422 | + # errorback with the `ProcessTerminated` failure. |
3423 | + |
3424 | + # Override the configuration command-line with one that will fail. |
3425 | + failed_config = """ |
3426 | + [builddmaster] |
3427 | + vm_resume_command: test "%(vm_host)s = 'no-sir'" |
3428 | + """ |
3429 | + config.push('failed_resume_command', failed_config) |
3430 | + self.addCleanup(config.pop, 'failed_resume_command') |
3431 | + |
3432 | + # On failures, the response is a twisted `Failure` object containing |
3433 | + # a tuple. |
3434 | + def check_resume_failure(failure): |
3435 | + out, err, code = failure.value |
3436 | + # The process will exit with a return code of "1". |
3437 | + self.assertEqual(code, 1) |
3438 | + d = self.slave.resumeSlave() |
3439 | + d.addBoth(check_resume_failure) |
3440 | + return d |
3441 | + |
3442 | + def test_resumeHost_timeout(self): |
3443 | + # On a resume timeout, 'resumeHost' fires the returned deferred |
3444 | + # errorback with the `TimeoutError` failure. |
3445 | + |
3446 | + # Override the configuration command-line with one that will timeout. |
3447 | + timeout_config = """ |
3448 | + [builddmaster] |
3449 | + vm_resume_command: sleep 5 |
3450 | + socket_timeout: 1 |
3451 | + """ |
3452 | + config.push('timeout_resume_command', timeout_config) |
3453 | + self.addCleanup(config.pop, 'timeout_resume_command') |
3454 | + |
3455 | + # On timeouts, the response is a twisted `Failure` object containing |
3456 | + # a `TimeoutError` error. |
3457 | + def check_resume_timeout(failure): |
3458 | + self.assertIsInstance(failure, Failure) |
3459 | + out, err, code = failure.value |
3460 | + self.assertEqual(code, signal.SIGKILL) |
3461 | + clock = Clock() |
3462 | + d = self.slave.resumeSlave(clock=clock) |
3463 | + # Move the clock beyond the socket_timeout but earlier than the |
3464 | + # sleep 5. This stops the test having to wait for the timeout. |
3465 | + # Fast tests FTW! |
3466 | + clock.advance(2) |
3467 | + d.addBoth(check_resume_timeout) |
3468 | + return d |
3469 | + |
3470 | + |
3471 | +class TestingXMLRPCProxy: |
3472 | + """This class mimics a twisted XMLRPC Proxy class.""" |
3473 | + |
3474 | + def __init__(self, failure_info=None): |
3475 | + self.calls = [] |
3476 | + self.failure_info = failure_info |
3477 | + self.works = failure_info is None |
3478 | + |
3479 | + def callRemote(self, *args): |
3480 | + self.calls.append(args) |
3481 | + if self.works: |
3482 | + result = buildd_success_result_map.get(args[0]) |
3483 | + else: |
3484 | + result = 'boing' |
3485 | + return defer.succeed([result, self.failure_info]) |
3486 | + |
3487 | + |
3488 | +class TestingResetDispatchResult(ResetDispatchResult): |
3489 | + """Override the evaluation method to simply annotate the call.""" |
3490 | + |
3491 | + def __init__(self, slave, info=None): |
3492 | + ResetDispatchResult.__init__(self, slave, info) |
3493 | + self.processed = False |
3494 | + |
3495 | + def __call__(self): |
3496 | + self.processed = True |
3497 | + |
3498 | + |
3499 | +class TestingFailDispatchResult(FailDispatchResult): |
3500 | + """Override the evaluation method to simply annotate the call.""" |
3501 | + |
3502 | + def __init__(self, slave, info=None): |
3503 | + FailDispatchResult.__init__(self, slave, info) |
3504 | + self.processed = False |
3505 | + |
3506 | + def __call__(self): |
3507 | + self.processed = True |
3508 | + |
3509 | + |
3510 | +class TestingSlaveScanner(SlaveScanner): |
3511 | + """Override the dispatch result factories.""" |
3512 | + |
3513 | + reset_result = TestingResetDispatchResult |
3514 | + fail_result = TestingFailDispatchResult |
3515 | + |
3516 | + |
3517 | +class TestSlaveScanner(TrialTestCase): |
3518 | + """Tests for the actual build slave manager.""" |
3519 | + layer = LaunchpadZopelessLayer |
3520 | + |
3521 | + def setUp(self): |
3522 | + TrialTestCase.setUp(self) |
3523 | + self.manager = TestingSlaveScanner( |
3524 | + BOB_THE_BUILDER_NAME, BufferLogger()) |
3525 | + |
3526 | + self.fake_builder_url = 'http://bob.buildd:8221/' |
3527 | + self.fake_builder_host = 'bob.host' |
3528 | + |
3529 | + # We will use an instrumented SlaveScanner instance for tests in |
3530 | + # this context. |
3531 | + |
3532 | + # Stop cyclic execution and record the end of the cycle. |
3533 | + self.stopped = False |
3534 | + |
3535 | + def testNextCycle(): |
3536 | + self.stopped = True |
3537 | + |
3538 | + self.manager.scheduleNextScanCycle = testNextCycle |
3539 | + |
3540 | + # Return the testing Proxy version. |
3541 | + self.test_proxy = TestingXMLRPCProxy() |
3542 | + |
3543 | + def testGetProxyForSlave(slave): |
3544 | + return self.test_proxy |
3545 | + self.manager._getProxyForSlave = testGetProxyForSlave |
3546 | + |
3547 | + # Deactivate the 'scan' method. |
3548 | + def testScan(): |
3549 | + pass |
3550 | + self.manager.scan = testScan |
3551 | + |
3552 | + # Stop automatic collection of dispatching results. |
3553 | + def testslaveConversationEnded(): |
3554 | + pass |
3555 | + self._realslaveConversationEnded = self.manager.slaveConversationEnded |
3556 | + self.manager.slaveConversationEnded = testslaveConversationEnded |
3557 | + |
3558 | + def assertIsDispatchReset(self, result): |
3559 | + self.assertTrue( |
3560 | + isinstance(result, TestingResetDispatchResult), |
3561 | + 'Dispatch failure did not result in a ResetBuildResult object') |
3562 | + |
3563 | + def assertIsDispatchFail(self, result): |
3564 | + self.assertTrue( |
3565 | + isinstance(result, TestingFailDispatchResult), |
3566 | + 'Dispatch failure did not result in a FailBuildResult object') |
3567 | + |
3568 | + def test_checkResume(self): |
3569 | + """`SlaveScanner.checkResume` is chained after resume requests. |
3570 | + |
3571 | + If the resume request succeeds it returns None; otherwise it returns |
3572 | + a `ResetBuildResult` (the one in the test context) that will be |
3573 | + collected and evaluated later. |
3574 | + |
3575 | + See `RecordingSlave.resumeHost` for more information about the resume |
3576 | + result contents. |
3577 | + """ |
3578 | + slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host') |
3579 | + |
3580 | + successful_response = ['', '', os.EX_OK] |
3581 | + result = self.manager.checkResume(successful_response, slave) |
3582 | + self.assertEqual( |
3583 | + None, result, 'Successful resume checks should return None') |
3584 | + |
3585 | + failed_response = ['stdout', 'stderr', 1] |
3586 | + result = self.manager.checkResume(failed_response, slave) |
3587 | + self.assertIsDispatchReset(result) |
3588 | + self.assertEqual( |
3589 | + '<foo:http://foo.buildd:8221/> reset failure', repr(result)) |
3590 | + self.assertEqual( |
3591 | + result.info, "stdout\nstderr") |
3592 | + |
3593 | + def test_fail_to_resume_slave_resets_slave(self): |
3594 | + # If an attempt to resume and dispatch a slave fails, we reset the |
3595 | + # slave by calling self.reset_result(slave)(). |
3596 | + |
3597 | + reset_result_calls = [] |
3598 | + |
3599 | + class LoggingResetResult(BaseDispatchResult): |
3600 | + """A DispatchResult that logs calls to itself. |
3601 | + |
3602 | + This *must* subclass BaseDispatchResult, otherwise finishCycle() |
3603 | + won't treat it like a dispatch result. |
3604 | + """ |
3605 | + |
3606 | + def __init__(self, slave, info=None): |
3607 | + self.slave = slave |
3608 | + |
3609 | + def __call__(self): |
3610 | + reset_result_calls.append(self.slave) |
3611 | + |
3612 | + # Make a failing slave that is requesting a resume. |
3613 | + slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host') |
3614 | + slave.resume_requested = True |
3615 | + slave.resumeSlave = lambda: deferLater( |
3616 | + reactor, 0, defer.fail, Failure(('out', 'err', 1))) |
3617 | + |
3618 | + # Make the manager log the reset result calls. |
3619 | + self.manager.reset_result = LoggingResetResult |
3620 | + |
3621 | + # We only care about this one slave. Reset the list of manager |
3622 | + # deferreds in case setUp did something unexpected. |
3623 | + self.manager._deferred_list = [] |
3624 | + |
3625 | + # Here, we're patching the slaveConversationEnded method so we can |
3626 | + # get an extra callback at the end of it, so we can |
3627 | + # verify that the reset_result was really called. |
3628 | + def _slaveConversationEnded(): |
3629 | + d = self._realslaveConversationEnded() |
3630 | + return d.addCallback( |
3631 | + lambda ignored: self.assertEqual([slave], reset_result_calls)) |
3632 | + self.manager.slaveConversationEnded = _slaveConversationEnded |
3633 | + |
3634 | + self.manager.resumeAndDispatch(slave) |
3635 | + |
3636 | + def test_failed_to_resume_slave_ready_for_reset(self): |
3637 | + # When a slave fails to resume, the manager has a Deferred in its |
3638 | + # Deferred list that is ready to fire with a ResetDispatchResult. |
3639 | + |
3640 | + # Make a failing slave that is requesting a resume. |
3641 | + slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host') |
3642 | + slave.resume_requested = True |
3643 | + slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1))) |
3644 | + |
3645 | + # We only care about this one slave. Reset the list of manager |
3646 | + # deferreds in case setUp did something unexpected. |
3647 | + self.manager._deferred_list = [] |
3648 | + # Restore the slaveConversationEnded method. It's very relevant to |
3649 | + # this test. |
3650 | + self.manager.slaveConversationEnded = self._realslaveConversationEnded |
3651 | + self.manager.resumeAndDispatch(slave) |
3652 | + [d] = self.manager._deferred_list |
3653 | + |
3654 | + # The Deferred for our failing slave should be ready to fire |
3655 | + # successfully with a ResetDispatchResult. |
3656 | + def check_result(result): |
3657 | + self.assertIsInstance(result, ResetDispatchResult) |
3658 | + self.assertEqual(slave, result.slave) |
3659 | + self.assertFalse(result.processed) |
3660 | + return d.addCallback(check_result) |
3661 | + |
3662 | + def _setUpSlaveAndBuilder(self, builder_failure_count=None, |
3663 | + job_failure_count=None): |
3664 | + # Helper function to set up a builder and its recording slave. |
3665 | + if builder_failure_count is None: |
3666 | + builder_failure_count = 0 |
3667 | + if job_failure_count is None: |
3668 | + job_failure_count = 0 |
3669 | + slave = RecordingSlave( |
3670 | + BOB_THE_BUILDER_NAME, self.fake_builder_url, |
3671 | + self.fake_builder_host) |
3672 | + bob_builder = getUtility(IBuilderSet)[slave.name] |
3673 | + bob_builder.failure_count = builder_failure_count |
3674 | + bob_builder.getCurrentBuildFarmJob().failure_count = job_failure_count |
3675 | + return slave, bob_builder |
3676 | + |
3677 | + def test_checkDispatch_success(self): |
3678 | + # SlaveScanner.checkDispatch returns None for a successful |
3679 | + # dispatch. |
3680 | + |
3681 | + """ |
3682 | + If the dispatch request fails or an unknown method is given, it |
3683 | + returns a `FailDispatchResult` (in the test context) that will |
3684 | + be evaluated later. |
3685 | + |
3686 | + Builders will be marked as failed if the following responses |
3687 | + Builders will be marked as failed if the following response |
3688 | + |
3689 | + * Legitimate slave failures: when the response is a list with 2 |
3690 | + elements but the first element ('status') does not correspond to |
3691 | + the expected 'success' result. See `buildd_success_result_map`. |
3692 | + |
3693 | + * Unexpected (code) failures: when the given 'method' is unknown |
3694 | + or the response isn't a 2-element list or Failure instance. |
3695 | + |
3696 | + Communication failures (a twisted `Failure` instance) will simply |
3697 | + cause the builder to be reset; a `ResetDispatchResult` object is |
3698 | + returned. In other words, network failures are ignored at this |
3699 | + stage; broken builders will be identified and marked as such |
3700 | + during the 'scan()' stage. |
3701 | + |
3702 | + On successful dispatch it returns None. |
3703 | + """ |
3704 | + slave, bob_builder = self._setUpSlaveAndBuilder( |
3705 | + builder_failure_count=0, job_failure_count=0) |
3706 | + |
3707 | + # Successful legitimate response, None is returned. |
3708 | + successful_response = [ |
3709 | + buildd_success_result_map.get('ensurepresent'), 'cool builder'] |
3710 | + result = self.manager.checkDispatch( |
3711 | + successful_response, 'ensurepresent', slave) |
3712 | + self.assertEqual( |
3713 | + None, result, 'Successful dispatch checks should return None') |
3714 | + |
3715 | + def test_checkDispatch_first_fail(self): |
3716 | + # Failed legitimate response, results in FailDispatchResult and |
3717 | + # failure_count on the job and the builder are both incremented. |
3718 | + slave, bob_builder = self._setUpSlaveAndBuilder( |
3719 | + builder_failure_count=0, job_failure_count=0) |
3720 | + |
3721 | + failed_response = [False, 'uncool builder'] |
3722 | + result = self.manager.checkDispatch( |
3723 | + failed_response, 'ensurepresent', slave) |
3724 | + self.assertIsDispatchFail(result) |
3725 | + self.assertEqual( |
3726 | + repr(result), |
3727 | + '<bob:%s> failure (uncool builder)' % self.fake_builder_url) |
3728 | + self.assertEqual(1, bob_builder.failure_count) |
3729 | + self.assertEqual( |
3730 | + 1, bob_builder.getCurrentBuildFarmJob().failure_count) |
3731 | + |
3732 | + def test_checkDispatch_second_reset_fail_by_builder(self): |
3733 | + # Twisted Failure response, results in a `FailDispatchResult`. |
3734 | + slave, bob_builder = self._setUpSlaveAndBuilder( |
3735 | + builder_failure_count=1, job_failure_count=0) |
3736 | + |
3737 | + twisted_failure = Failure(ConnectionClosed('Boom!')) |
3738 | + result = self.manager.checkDispatch( |
3739 | + twisted_failure, 'ensurepresent', slave) |
3740 | + self.assertIsDispatchFail(result) |
3741 | + self.assertEqual( |
3742 | + '<bob:%s> failure (None)' % self.fake_builder_url, repr(result)) |
3743 | + self.assertEqual(2, bob_builder.failure_count) |
3744 | + self.assertEqual( |
3745 | + 1, bob_builder.getCurrentBuildFarmJob().failure_count) |
3746 | + |
3747 | + def test_checkDispatch_second_comms_fail_by_builder(self): |
3748 | + # Unexpected response, results in a `FailDispatchResult`. |
3749 | + slave, bob_builder = self._setUpSlaveAndBuilder( |
3750 | + builder_failure_count=1, job_failure_count=0) |
3751 | + |
3752 | + unexpected_response = [1, 2, 3] |
3753 | + result = self.manager.checkDispatch( |
3754 | + unexpected_response, 'build', slave) |
3755 | + self.assertIsDispatchFail(result) |
3756 | + self.assertEqual( |
3757 | + '<bob:%s> failure ' |
3758 | + '(Unexpected response: [1, 2, 3])' % self.fake_builder_url, |
3759 | + repr(result)) |
3760 | + self.assertEqual(2, bob_builder.failure_count) |
3761 | + self.assertEqual( |
3762 | + 1, bob_builder.getCurrentBuildFarmJob().failure_count) |
3763 | + |
3764 | + def test_checkDispatch_second_comms_fail_by_job(self): |
3765 | + # Unknown method was given, results in a `FailDispatchResult`. |
3766 | + # This could be caused by a faulty job which would fail the job. |
3767 | + slave, bob_builder = self._setUpSlaveAndBuilder( |
3768 | + builder_failure_count=0, job_failure_count=1) |
3769 | + |
3770 | + successful_response = [ |
3771 | + buildd_success_result_map.get('ensurepresent'), 'cool builder'] |
3772 | + result = self.manager.checkDispatch( |
3773 | + successful_response, 'unknown-method', slave) |
3774 | + self.assertIsDispatchFail(result) |
3775 | + self.assertEqual( |
3776 | + '<bob:%s> failure ' |
3777 | + '(Unknown slave method: unknown-method)' % self.fake_builder_url, |
3778 | + repr(result)) |
3779 | + self.assertEqual(1, bob_builder.failure_count) |
3780 | + self.assertEqual( |
3781 | + 2, bob_builder.getCurrentBuildFarmJob().failure_count) |
3782 | + |
3783 | + def test_initiateDispatch(self): |
3784 | + """Check `initiateDispatch` in various scenarios. |
3785 | + |
3786 | + When there are no recording slaves (i.e. no build got dispatched |
3787 | + in scan()) it simply finishes the cycle. |
3788 | + |
3789 | + When there is a recording slave with pending slave calls, they are |
3790 | + performed and if they all succeed the cycle is finished with no |
3791 | + errors. |
3792 | + |
3793 | + On slave call failure the chain is stopped immediately and a |
3794 | + FailDispatchResult is collected while finishing the cycle. |
3795 | + """ |
3796 | + def check_no_events(results): |
3797 | + errors = [ |
3798 | + r for s, r in results if isinstance(r, BaseDispatchResult)] |
3799 | + self.assertEqual(0, len(errors)) |
3800 | + |
3801 | + def check_events(results): |
3802 | + [error] = [r for s, r in results if r is not None] |
3803 | + self.assertEqual( |
3804 | + '<bob:%s> failure (very broken slave)' |
3805 | + % self.fake_builder_url, |
3806 | + repr(error)) |
3807 | + self.assertTrue(error.processed) |
3808 | + |
3809 | + def _wait_on_deferreds_then_check_no_events(): |
3810 | + dl = self._realslaveConversationEnded() |
3811 | + dl.addCallback(check_no_events) |
3812 | + |
3813 | + def _wait_on_deferreds_then_check_events(): |
3814 | + dl = self._realslaveConversationEnded() |
3815 | + dl.addCallback(check_events) |
3816 | + |
3817 | + # A functional slave charged with some interactions. |
3818 | + slave = RecordingSlave( |
3819 | + BOB_THE_BUILDER_NAME, self.fake_builder_url, |
3820 | + self.fake_builder_host) |
3821 | + slave.ensurepresent('arg1', 'arg2', 'arg3') |
3822 | + slave.build('arg1', 'arg2', 'arg3') |
3823 | + |
3824 | + # If the previous step (resuming) has failed nothing gets dispatched. |
3825 | + reset_result = ResetDispatchResult(slave) |
3826 | + result = self.manager.initiateDispatch(reset_result, slave) |
3827 | + self.assertTrue(result is reset_result) |
3828 | + self.assertFalse(slave.resume_requested) |
3829 | + self.assertEqual(0, len(self.manager._deferred_list)) |
3830 | + |
3831 | + # Operating with the default (functional) slave, no reset or |
3832 | + # failure results are triggered. |
3833 | + slave.resume() |
3834 | + result = self.manager.initiateDispatch(None, slave) |
3835 | + self.assertEqual(None, result) |
3836 | + self.assertTrue(slave.resume_requested) |
3837 | + self.assertEqual( |
3838 | + [('ensurepresent', 'arg1', 'arg2', 'arg3'), |
3839 | + ('build', 'arg1', 'arg2', 'arg3')], |
3840 | + self.test_proxy.calls) |
3841 | + self.assertEqual(2, len(self.manager._deferred_list)) |
3842 | + |
3843 | + # Monkey patch the slaveConversationEnded method so we can chain a |
3844 | + # callback to check the end of the result chain. |
3845 | + self.manager.slaveConversationEnded = \ |
3846 | + _wait_on_deferreds_then_check_no_events |
3847 | + events = self.manager.slaveConversationEnded() |
3848 | + |
3849 | + # Create a broken slave and insert an interaction that will |
3850 | + # cause the builder to be marked as failed. |
3851 | + self.test_proxy = TestingXMLRPCProxy('very broken slave') |
3852 | + slave = RecordingSlave( |
3853 | + BOB_THE_BUILDER_NAME, self.fake_builder_url, |
3854 | + self.fake_builder_host) |
3855 | + slave.ensurepresent('arg1', 'arg2', 'arg3') |
3856 | + slave.build('arg1', 'arg2', 'arg3') |
3857 | + |
3858 | + result = self.manager.initiateDispatch(None, slave) |
3859 | + self.assertEqual(None, result) |
3860 | + self.assertEqual(3, len(self.manager._deferred_list)) |
3861 | + self.assertEqual( |
3862 | + [('ensurepresent', 'arg1', 'arg2', 'arg3')], |
3863 | + self.test_proxy.calls) |
3864 | + |
3865 | + # Monkey patch the slaveConversationEnded method so we can chain a |
3866 | + # callback to check the end of the result chain. |
3867 | + self.manager.slaveConversationEnded = \ |
3868 | + _wait_on_deferreds_then_check_events |
3869 | + events = self.manager.slaveConversationEnded() |
3870 | + |
3871 | + return events |
3872 | + |
3873 | + |
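The dispatch tests above drive a `RecordingSlave`, which queues slave calls (`ensurepresent`, `build`) during the synchronous scan so they can be replayed asynchronously afterwards. A minimal stdlib sketch of that record-then-replay pattern (class and method names here are illustrative, not Launchpad's actual implementation):

```python
class CallRecorder:
    """Record method calls for later replay instead of performing them."""

    def __init__(self, name):
        self.name = name
        self.calls = []              # queued (method, args) tuples
        self.resume_requested = False

    def __getattr__(self, method):
        # Any unknown attribute becomes a recording stub.
        def record(*args):
            self.calls.append((method, args))
        return record

    def resume(self):
        # 'resume' is performed eagerly, not queued.
        self.resume_requested = True

    def replay(self, proxy):
        # Replay all queued calls against a real (or test) proxy.
        return [getattr(proxy, method)(*args) for method, args in self.calls]
```

In the tests above, the replay target is an XML-RPC proxy (`TestingXMLRPCProxy`); here any object exposing the recorded method names works, which is what makes the scan itself side-effect free.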
3874 | class TestSlaveScannerScan(TrialTestCase): |
3875 | """Tests `SlaveScanner.scan` method. |
3876 | |
3877 | This method uses the old framework for scanning and dispatching builds. |
3878 | """ |
3879 | - layer = TwistedLaunchpadZopelessLayer |
3880 | + layer = LaunchpadZopelessLayer |
3881 | |
3882 | def setUp(self): |
3883 | """Setup TwistedLayer, TrialTestCase and BuilddSlaveTest. |
3884 | @@ -75,18 +608,19 @@ |
3885 | Also adjust the sampledata in a way a build can be dispatched to |
3886 | 'bob' builder. |
3887 | """ |
3888 | - from lp.soyuz.tests.test_publishing import SoyuzTestPublisher |
3889 | TwistedLayer.testSetUp() |
3890 | TrialTestCase.setUp(self) |
3891 | self.slave = BuilddSlaveTestSetup() |
3892 | self.slave.setUp() |
3893 | |
3894 | # Creating the required chroots needed for dispatching. |
3895 | + login('foo.bar@canonical.com') |
3896 | test_publisher = SoyuzTestPublisher() |
3897 | ubuntu = getUtility(IDistributionSet).getByName('ubuntu') |
3898 | hoary = ubuntu.getSeries('hoary') |
3899 | test_publisher.setUpDefaultDistroSeries(hoary) |
3900 | test_publisher.addFakeChroots() |
3901 | + login(ANONYMOUS) |
3902 | |
3903 | def tearDown(self): |
3904 | self.slave.tearDown() |
3905 | @@ -94,7 +628,8 @@ |
3906 | TwistedLayer.testTearDown() |
3907 | |
3908 | def _resetBuilder(self, builder): |
3909 | - """Reset the given builder and its job.""" |
3910 | + """Reset the given builder and its job.""" |
3911 | + login('foo.bar@canonical.com') |
3912 | |
3913 | builder.builderok = True |
3914 | job = builder.currentjob |
3915 | @@ -102,6 +637,7 @@ |
3916 | job.reset() |
3917 | |
3918 | transaction.commit() |
3919 | + login(ANONYMOUS) |
3920 | |
3921 | def assertBuildingJob(self, job, builder, logtail=None): |
3922 | """Assert the given job is building on the given builder.""" |
3923 | @@ -117,25 +653,55 @@ |
3924 | self.assertEqual(build.status, BuildStatus.BUILDING) |
3925 | self.assertEqual(job.logtail, logtail) |
3926 | |
3927 | - def _getScanner(self, builder_name=None): |
3928 | + def _getManager(self): |
3929 | """Instantiate a SlaveScanner object. |
3930 | |
3931 | Replace its default logging handler by a testing version. |
3932 | """ |
3933 | - if builder_name is None: |
3934 | - builder_name = BOB_THE_BUILDER_NAME |
3935 | - scanner = SlaveScanner(builder_name, QuietFakeLogger()) |
3936 | - scanner.logger.name = 'slave-scanner' |
3937 | + manager = SlaveScanner(BOB_THE_BUILDER_NAME, BufferLogger()) |
3938 | + manager.logger.name = 'slave-scanner' |
3939 | |
3940 | - return scanner |
3941 | + return manager |
3942 | |
3943 | def _checkDispatch(self, slave, builder): |
3944 | - # SlaveScanner.scan returns a slave when a dispatch was |
3945 | - # successful. We also check that the builder has a job on it. |
3946 | - |
3947 | - self.assertTrue(slave is not None, "Expected a slave.") |
3948 | + """`SlaveScanner.scan` returns a `RecordingSlave`. |
3949 | + |
3950 | + The single slave returned should match the given builder and |
3951 | + contain interactions that should be performed asynchronously |
3952 | + to properly dispatch the sampledata job. |
3953 | + """ |
3954 | + self.assertFalse( |
3955 | + slave is None, "Expected a recording slave.") |
3956 | + |
3957 | + self.assertEqual(slave.name, builder.name) |
3958 | + self.assertEqual(slave.url, builder.url) |
3959 | + self.assertEqual(slave.vm_host, builder.vm_host) |
3960 | self.assertEqual(0, builder.failure_count) |
3961 | - self.assertTrue(builder.currentjob is not None) |
3962 | + |
3963 | + self.assertEqual( |
3964 | + [('ensurepresent', |
3965 | + ('0feca720e2c29dafb2c900713ba560e03b758711', |
3966 | + 'http://localhost:58000/93/fake_chroot.tar.gz', |
3967 | + '', '')), |
3968 | + ('ensurepresent', |
3969 | + ('4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33', |
3970 | + 'http://localhost:58000/43/alsa-utils_1.0.9a-4ubuntu1.dsc', |
3971 | + '', '')), |
3972 | + ('build', |
3973 | + ('6358a89e2215e19b02bf91e2e4d009640fae5cf8', |
3974 | + 'binarypackage', '0feca720e2c29dafb2c900713ba560e03b758711', |
3975 | + {'alsa-utils_1.0.9a-4ubuntu1.dsc': |
3976 | + '4e3961baf4f56fdbc95d0dd47f3c5bc275da8a33'}, |
3977 | + {'arch_indep': True, |
3978 | + 'arch_tag': 'i386', |
3979 | + 'archive_private': False, |
3980 | + 'archive_purpose': 'PRIMARY', |
3981 | + 'archives': |
3982 | + ['deb http://ftpmaster.internal/ubuntu hoary main'], |
3983 | + 'build_debug_symbols': False, |
3984 | + 'ogrecomponent': 'main', |
3985 | + 'suite': u'hoary'}))], |
3986 | + slave.calls, "Job was not properly dispatched.") |
3987 | |
3988 | def testScanDispatchForResetBuilder(self): |
3989 | # A job gets dispatched to the sampledata builder after it's reset. |
3990 | @@ -143,27 +709,26 @@ |
3991 | # Reset sampledata builder. |
3992 | builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
3993 | self._resetBuilder(builder) |
3994 | - builder.setSlaveForTesting(OkSlave()) |
3995 | # Set this to 1 here so that _checkDispatch can make sure it's |
3996 | # reset to 0 after a successful dispatch. |
3997 | builder.failure_count = 1 |
3998 | |
3999 | # Run 'scan' and check its result. |
4000 | - self.layer.txn.commit() |
4001 | - self.layer.switchDbUser(config.builddmaster.dbuser) |
4002 | - scanner = self._getScanner() |
4003 | - d = defer.maybeDeferred(scanner.scan) |
4004 | + LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4005 | + manager = self._getManager() |
4006 | + d = defer.maybeDeferred(manager.scan) |
4007 | d.addCallback(self._checkDispatch, builder) |
4008 | return d |
4009 | |
4010 | - def _checkNoDispatch(self, slave, builder): |
4011 | + def _checkNoDispatch(self, recording_slave, builder): |
4012 | """Assert that no dispatch has occurred. |
4013 | |
4014 | - 'slave' is None, so no interations would be passed |
4015 | + 'recording_slave' is None, so no interactions would be passed |
4016 | + to the asynchronous dispatcher and the builder remains active |
4017 | and IDLE. |
4018 | """ |
4019 | - self.assertTrue(slave is None, "Unexpected slave.") |
4020 | + self.assertTrue( |
4021 | + recording_slave is None, "Unexpected recording_slave.") |
4022 | |
4023 | builder = getUtility(IBuilderSet).get(builder.id) |
4024 | self.assertTrue(builder.builderok) |
4025 | @@ -188,9 +753,9 @@ |
4026 | login(ANONYMOUS) |
4027 | |
4028 | # Run 'scan' and check its result. |
4029 | - self.layer.switchDbUser(config.builddmaster.dbuser) |
4030 | - scanner = self._getScanner() |
4031 | - d = defer.maybeDeferred(scanner.singleCycle) |
4032 | + LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4033 | + manager = self._getManager() |
4034 | + d = defer.maybeDeferred(manager.scan) |
4035 | d.addCallback(self._checkNoDispatch, builder) |
4036 | return d |
4037 | |
4038 | @@ -228,9 +793,9 @@ |
4039 | login(ANONYMOUS) |
4040 | |
4041 | # Run 'scan' and check its result. |
4042 | - self.layer.switchDbUser(config.builddmaster.dbuser) |
4043 | - scanner = self._getScanner() |
4044 | - d = defer.maybeDeferred(scanner.scan) |
4045 | + LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4046 | + manager = self._getManager() |
4047 | + d = defer.maybeDeferred(manager.scan) |
4048 | d.addCallback(self._checkJobRescued, builder, job) |
4049 | return d |
4050 | |
4051 | @@ -249,6 +814,8 @@ |
4052 | self.assertBuildingJob(job, builder, logtail='This is a build log') |
4053 | |
4054 | def testScanUpdatesBuildingJobs(self): |
4055 | + # The job assigned to a broken builder is rescued. |
4056 | + |
4057 | # Enable sampledata builder attached to an appropriate testing |
4058 | # slave. It will respond as if it was building the sampledata job. |
4059 | builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4060 | @@ -263,174 +830,188 @@ |
4061 | self.assertBuildingJob(job, builder) |
4062 | |
4063 | # Run 'scan' and check its result. |
4064 | - self.layer.switchDbUser(config.builddmaster.dbuser) |
4065 | - scanner = self._getScanner() |
4066 | - d = defer.maybeDeferred(scanner.scan) |
4067 | + LaunchpadZopelessLayer.switchDbUser(config.builddmaster.dbuser) |
4068 | + manager = self._getManager() |
4069 | + d = defer.maybeDeferred(manager.scan) |
4070 | d.addCallback(self._checkJobUpdated, builder, job) |
4071 | return d |
4072 | |
4073 | - def test_scan_with_nothing_to_dispatch(self): |
4074 | - factory = LaunchpadObjectFactory() |
4075 | - builder = factory.makeBuilder() |
4076 | - builder.setSlaveForTesting(OkSlave()) |
4077 | - scanner = self._getScanner(builder_name=builder.name) |
4078 | - d = scanner.scan() |
4079 | - return d.addCallback(self._checkNoDispatch, builder) |
4080 | - |
4081 | - def test_scan_with_manual_builder(self): |
4082 | - # Reset sampledata builder. |
4083 | - builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4084 | - self._resetBuilder(builder) |
4085 | - builder.setSlaveForTesting(OkSlave()) |
4086 | - builder.manual = True |
4087 | - scanner = self._getScanner() |
4088 | - d = scanner.scan() |
4089 | - d.addCallback(self._checkNoDispatch, builder) |
4090 | - return d |
4091 | - |
4092 | - def test_scan_with_not_ok_builder(self): |
4093 | - # Reset sampledata builder. |
4094 | - builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4095 | - self._resetBuilder(builder) |
4096 | - builder.setSlaveForTesting(OkSlave()) |
4097 | - builder.builderok = False |
4098 | - scanner = self._getScanner() |
4099 | - d = scanner.scan() |
4100 | - # Because the builder is not ok, we can't use _checkNoDispatch. |
4101 | - d.addCallback( |
4102 | - lambda ignored: self.assertIdentical(None, builder.currentjob)) |
4103 | - return d |
4104 | - |
4105 | - def test_scan_of_broken_slave(self): |
4106 | - builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME] |
4107 | - self._resetBuilder(builder) |
4108 | - builder.setSlaveForTesting(BrokenSlave()) |
4109 | - builder.failure_count = 0 |
4110 | - scanner = self._getScanner(builder_name=builder.name) |
4111 | - d = scanner.scan() |
4112 | - return self.assertFailure(d, xmlrpclib.Fault) |
4113 | - |
4114 | - def _assertFailureCounting(self, builder_count, job_count, |
4115 | - expected_builder_count, expected_job_count): |
4116 | + def test_scan_assesses_failure_exceptions(self): |
4117 | # If scan() fails with an exception, failure_counts should be |
4118 | - # incremented. What we do with the results of the failure |
4119 | - # counts is tested below separately, this test just makes sure that |
4120 | - # scan() is setting the counts. |
4121 | + # incremented and assessed. |
4122 | def failing_scan(): |
4123 | - return defer.fail(Exception("fake exception")) |
4124 | - scanner = self._getScanner() |
4125 | - scanner.scan = failing_scan |
4126 | + raise Exception("fake exception") |
4127 | + manager = self._getManager() |
4128 | + manager.scan = failing_scan |
4129 | + manager.scheduleNextScanCycle = FakeMethod() |
4130 | from lp.buildmaster import manager as manager_module |
4131 | self.patch(manager_module, 'assessFailureCounts', FakeMethod()) |
4132 | - builder = getUtility(IBuilderSet)[scanner.builder_name] |
4133 | - |
4134 | - builder.failure_count = builder_count |
4135 | - builder.currentjob.specific_job.build.failure_count = job_count |
4136 | - # The _scanFailed() calls abort, so make sure our existing |
4137 | - # failure counts are persisted. |
4138 | - self.layer.txn.commit() |
4139 | - |
4140 | - # singleCycle() calls scan() which is our fake one that throws an |
4141 | + builder = getUtility(IBuilderSet)[manager.builder_name] |
4142 | + |
4143 | + # Failure counts start at zero. |
4144 | + self.assertEqual(0, builder.failure_count) |
4145 | + self.assertEqual( |
4146 | + 0, builder.currentjob.specific_job.build.failure_count) |
4147 | + |
4148 | + # startCycle() calls scan() which is our fake one that throws an |
4149 | # exception. |
4150 | - d = scanner.singleCycle() |
4151 | + manager.startCycle() |
4152 | |
4153 | # Failure counts should be updated, and the assessment method |
4154 | - # should have been called. The actual behaviour is tested below |
4155 | - # in TestFailureAssessments. |
4156 | - def got_scan(ignored): |
4157 | - self.assertEqual(expected_builder_count, builder.failure_count) |
4158 | - self.assertEqual( |
4159 | - expected_job_count, |
4160 | - builder.currentjob.specific_job.build.failure_count) |
4161 | - self.assertEqual( |
4162 | - 1, manager_module.assessFailureCounts.call_count) |
4163 | - |
4164 | - return d.addCallback(got_scan) |
4165 | - |
4166 | - def test_scan_first_fail(self): |
4167 | - # The first failure of a job should result in the failure_count |
4168 | - # on the job and the builder both being incremented. |
4169 | - self._assertFailureCounting( |
4170 | - builder_count=0, job_count=0, expected_builder_count=1, |
4171 | - expected_job_count=1) |
4172 | - |
4173 | - def test_scan_second_builder_fail(self): |
4174 | - # The first failure of a job should result in the failure_count |
4175 | - # on the job and the builder both being incremented. |
4176 | - self._assertFailureCounting( |
4177 | - builder_count=1, job_count=0, expected_builder_count=2, |
4178 | - expected_job_count=1) |
4179 | - |
4180 | - def test_scan_second_job_fail(self): |
4181 | - # The first failure of a job should result in the failure_count |
4182 | - # on the job and the builder both being incremented. |
4183 | - self._assertFailureCounting( |
4184 | - builder_count=0, job_count=1, expected_builder_count=1, |
4185 | - expected_job_count=2) |
4186 | - |
4187 | - def test_scanFailed_handles_lack_of_a_job_on_the_builder(self): |
4188 | - def failing_scan(): |
4189 | - return defer.fail(Exception("fake exception")) |
4190 | - scanner = self._getScanner() |
4191 | - scanner.scan = failing_scan |
4192 | - builder = getUtility(IBuilderSet)[scanner.builder_name] |
4193 | - builder.failure_count = Builder.FAILURE_THRESHOLD |
4194 | - builder.currentjob.reset() |
4195 | - self.layer.txn.commit() |
4196 | - |
4197 | - d = scanner.singleCycle() |
4198 | - |
4199 | - def scan_finished(ignored): |
4200 | - self.assertFalse(builder.builderok) |
4201 | - |
4202 | - return d.addCallback(scan_finished) |
4203 | - |
4204 | - def test_fail_to_resume_slave_resets_job(self): |
4205 | - # If an attempt to resume and dispatch a slave fails, it should |
4206 | - # reset the job via job.reset() |
4207 | - |
4208 | - # Make a slave with a failing resume() method. |
4209 | - slave = OkSlave() |
4210 | - slave.resume = lambda: deferLater( |
4211 | - reactor, 0, defer.fail, Failure(('out', 'err', 1))) |
4212 | - |
4213 | - # Reset sampledata builder. |
4214 | - builder = removeSecurityProxy( |
4215 | - getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]) |
4216 | - self._resetBuilder(builder) |
4217 | - self.assertEqual(0, builder.failure_count) |
4218 | - builder.setSlaveForTesting(slave) |
4219 | - builder.vm_host = "fake_vm_host" |
4220 | - |
4221 | - scanner = self._getScanner() |
4222 | - |
4223 | - # Get the next job that will be dispatched. |
4224 | - job = removeSecurityProxy(builder._findBuildCandidate()) |
4225 | - job.virtualized = True |
4226 | - builder.virtualized = True |
4227 | - d = scanner.singleCycle() |
4228 | - |
4229 | - def check(ignored): |
4230 | - # The failure_count will have been incremented on the |
4231 | - # builder, we can check that to see that a dispatch attempt |
4232 | - # did indeed occur. |
4233 | - self.assertEqual(1, builder.failure_count) |
4234 | - # There should also be no builder set on the job. |
4235 | - self.assertTrue(job.builder is None) |
4236 | - build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job) |
4237 | - self.assertEqual(build.status, BuildStatus.NEEDSBUILD) |
4238 | - |
4239 | - return d.addCallback(check) |
4240 | + # should have been called. |
4241 | + self.assertEqual(1, builder.failure_count) |
4242 | + self.assertEqual( |
4243 | + 1, builder.currentjob.specific_job.build.failure_count) |
4244 | + |
4245 | + self.assertEqual( |
4246 | + 1, manager_module.assessFailureCounts.call_count) |
4247 | + |
4248 | + |
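Several tests in this file stub collaborators with `FakeMethod` and then assert on `call_count`. A minimal sketch of such a callable test double (this approximates `lp.testing.fakemethod.FakeMethod`; the exact Launchpad implementation may differ):

```python
class FakeMethod:
    """A callable test double that records its calls.

    Stands in for a real method so tests can assert how it was used
    without triggering real side effects.
    """

    def __init__(self, result=None, failure=None):
        self.result = result      # value to return when called
        self.failure = failure    # exception to raise instead, if set
        self.calls = []           # recorded (args, kwargs) pairs

    def __call__(self, *args, **kwargs):
        self.calls.append((args, kwargs))
        if self.failure is not None:
            raise self.failure
        return self.result

    @property
    def call_count(self):
        return len(self.calls)
```

This is the pattern behind assertions like `self.assertEqual(1, manager_module.assessFailureCounts.call_count)` above: patch the real function with the double, run the code under test, then inspect the double.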
4249 | +class TestDispatchResult(LaunchpadTestCase): |
4250 | + """Tests `BaseDispatchResult` variations. |
4251 | + |
4252 | + Variations of `BaseDispatchResult` when evaluated update the database |
4253 | + information according to their purpose. |
4254 | + """ |
4255 | + |
4256 | + layer = LaunchpadZopelessLayer |
4257 | + |
4258 | + def _getBuilder(self, name): |
4259 | + """Return a fixed `IBuilder` instance from the sampledata. |
4260 | + |
4261 | + Ensure it's active (builderok=True) and it has an in-progress job. |
4262 | + """ |
4263 | + login('foo.bar@canonical.com') |
4264 | + |
4265 | + builder = getUtility(IBuilderSet)[name] |
4266 | + builder.builderok = True |
4267 | + |
4268 | + job = builder.currentjob |
4269 | + build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job) |
4270 | + self.assertEqual( |
4271 | + 'i386 build of mozilla-firefox 0.9 in ubuntu hoary RELEASE', |
4272 | + build.title) |
4273 | + |
4274 | + self.assertEqual('BUILDING', build.status.name) |
4275 | + self.assertNotEqual(None, job.builder) |
4276 | + self.assertNotEqual(None, job.date_started) |
4277 | + self.assertNotEqual(None, job.logtail) |
4278 | + |
4279 | + transaction.commit() |
4280 | + |
4281 | + return builder, job.id |
4282 | + |
4283 | + def assertBuildqueueIsClean(self, buildqueue): |
4284 | + # Check that the buildqueue is reset. |
4285 | + self.assertEqual(None, buildqueue.builder) |
4286 | + self.assertEqual(None, buildqueue.date_started) |
4287 | + self.assertEqual(None, buildqueue.logtail) |
4288 | + |
4289 | + def assertBuilderIsClean(self, builder): |
4290 | + # Check that the builder is ready for a new build. |
4291 | + self.assertTrue(builder.builderok) |
4292 | + self.assertIs(None, builder.failnotes) |
4293 | + self.assertIs(None, builder.currentjob) |
4294 | + |
4295 | + def testResetDispatchResult(self): |
4296 | + # Test that `ResetDispatchResult` resets the builder and job. |
4297 | + builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME) |
4298 | + buildqueue_id = builder.currentjob.id |
4299 | + builder.builderok = True |
4300 | + builder.failure_count = 1 |
4301 | + |
4302 | + # Set up an interaction to satisfy the 'write_transaction' decorator. |
4303 | + login(ANONYMOUS) |
4304 | + slave = RecordingSlave(builder.name, builder.url, builder.vm_host) |
4305 | + result = ResetDispatchResult(slave) |
4306 | + result() |
4307 | + |
4308 | + buildqueue = getUtility(IBuildQueueSet).get(buildqueue_id) |
4309 | + self.assertBuildqueueIsClean(buildqueue) |
4310 | + |
4311 | + # XXX Julian |
4312 | + # Disabled test until bug 586362 is fixed. |
4313 | + #self.assertFalse(builder.builderok) |
4314 | + self.assertBuilderIsClean(builder) |
4315 | + |
4316 | + def testFailDispatchResult(self): |
4317 | + # Test that `FailDispatchResult` calls assessFailureCounts() so |
4318 | + # that we know the builders and jobs are failed as necessary |
4319 | + # when a FailDispatchResult is called at the end of the dispatch |
4320 | + # chain. |
4321 | + builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME) |
4322 | + |
4323 | + # Set up an interaction to satisfy the 'write_transaction' decorator. |
4324 | + login(ANONYMOUS) |
4325 | + slave = RecordingSlave(builder.name, builder.url, builder.vm_host) |
4326 | + result = FailDispatchResult(slave, 'does not work!') |
4327 | + result.assessFailureCounts = FakeMethod() |
4328 | + self.assertEqual(0, result.assessFailureCounts.call_count) |
4329 | + result() |
4330 | + self.assertEqual(1, result.assessFailureCounts.call_count) |
4331 | + |
4332 | + def _setup_failing_dispatch_result(self): |
4333 | + # assessFailureCounts should fail jobs or builders depending on |
4334 | + # how the failure_count on each compares. |
4335 | + builder, job_id = self._getBuilder(BOB_THE_BUILDER_NAME) |
4336 | + slave = RecordingSlave(builder.name, builder.url, builder.vm_host) |
4337 | + result = FailDispatchResult(slave, 'does not work!') |
4338 | + return builder, result |
4339 | + |
4340 | + def test_assessFailureCounts_equal_failures(self): |
4341 | + # Basic case where the failure counts are equal and the job is |
4342 | + # reset to try again & the builder is not failed. |
4343 | + builder, result = self._setup_failing_dispatch_result() |
4344 | + buildqueue = builder.currentjob |
4345 | + build = buildqueue.specific_job.build |
4346 | + builder.failure_count = 2 |
4347 | + build.failure_count = 2 |
4348 | + result.assessFailureCounts() |
4349 | + |
4350 | + self.assertBuilderIsClean(builder) |
4351 | + self.assertEqual('NEEDSBUILD', build.status.name) |
4352 | + self.assertBuildqueueIsClean(buildqueue) |
4353 | + |
4354 | + def test_assessFailureCounts_job_failed(self): |
4355 | + # Case where the job has failed more than the builder. |
4356 | + builder, result = self._setup_failing_dispatch_result() |
4357 | + buildqueue = builder.currentjob |
4358 | + build = buildqueue.specific_job.build |
4359 | + build.failure_count = 2 |
4360 | + builder.failure_count = 1 |
4361 | + result.assessFailureCounts() |
4362 | + |
4363 | + self.assertBuilderIsClean(builder) |
4364 | + self.assertEqual('FAILEDTOBUILD', build.status.name) |
4365 | + # The buildqueue should have been removed entirely. |
4366 | + self.assertEqual( |
4367 | + None, getUtility(IBuildQueueSet).getByBuilder(builder), |
4368 | + "Buildqueue was not removed when it should have been.") |
4369 | + |
4370 | + def test_assessFailureCounts_builder_failed(self): |
4371 | + # Case where the builder has failed more than the job. |
4372 | + builder, result = self._setup_failing_dispatch_result() |
4373 | + buildqueue = builder.currentjob |
4374 | + build = buildqueue.specific_job.build |
4375 | + build.failure_count = 2 |
4376 | + builder.failure_count = 3 |
4377 | + result.assessFailureCounts() |
4378 | + |
4379 | + self.assertFalse(builder.builderok) |
4380 | + self.assertEqual('does not work!', builder.failnotes) |
4381 | + self.assertTrue(builder.currentjob is None) |
4382 | + self.assertEqual('NEEDSBUILD', build.status.name) |
4383 | + self.assertBuildqueueIsClean(buildqueue) |
4384 | |
4385 | |
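The three `assessFailureCounts` cases above reduce to one comparison: if the job has failed more often than the builder it is the likely culprit; if the counts are equal the job is retried; otherwise the builder is suspect, and is disabled once it crosses a threshold. A hedged sketch of that decision logic (names and the threshold value are illustrative, not the exact Launchpad code):

```python
FAILURE_THRESHOLD = 3  # illustrative; Builder.FAILURE_THRESHOLD in Launchpad

def assess_failure_counts(builder, job, notes):
    """Decide whether to blame the job or the builder for a failure."""
    if builder.failure_count == job.failure_count:
        # Equal blame: reset the job so it can be retried elsewhere.
        job.reset()
    elif job.failure_count > builder.failure_count:
        # The job keeps failing: mark the build itself as failed.
        job.fail()
    else:
        # The builder is failing more than the job: requeue the job,
        # and disable the builder once it crosses the threshold.
        job.reset()
        if builder.failure_count >= FAILURE_THRESHOLD:
            builder.builderok = False
            builder.failnotes = notes
```

Each branch corresponds to one of the `test_assessFailureCounts_*` tests above: equal counts leave the builder clean and the build in NEEDSBUILD, a higher job count produces FAILEDTOBUILD, and a higher builder count eventually disables the builder with the failure notes attached.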
4386 | class TestBuilddManager(TrialTestCase): |
4387 | |
4388 | - layer = TwistedLaunchpadZopelessLayer |
4389 | + layer = LaunchpadZopelessLayer |
4390 | |
4391 | def _stub_out_scheduleNextScanCycle(self): |
4392 | # stub out the code that adds a callLater, so that later tests |
4393 | # don't get surprises. |
4394 | - self.patch(SlaveScanner, 'startCycle', FakeMethod()) |
4395 | + self.patch(SlaveScanner, 'scheduleNextScanCycle', FakeMethod()) |
4396 | |
4397 | def test_addScanForBuilders(self): |
4398 | # Test that addScanForBuilders generates NewBuildersScanner objects. |
4399 | @@ -459,62 +1040,10 @@ |
4400 | self.assertNotEqual(0, manager.new_builders_scanner.scan.call_count) |
4401 | |
4402 | |
4403 | -class TestFailureAssessments(TestCaseWithFactory): |
4404 | - |
4405 | - layer = ZopelessDatabaseLayer |
4406 | - |
4407 | - def setUp(self): |
4408 | - TestCaseWithFactory.setUp(self) |
4409 | - self.builder = self.factory.makeBuilder() |
4410 | - self.build = self.factory.makeSourcePackageRecipeBuild() |
4411 | - self.buildqueue = self.build.queueBuild() |
4412 | - self.buildqueue.markAsBuilding(self.builder) |
4413 | - |
4414 | - def test_equal_failures_reset_job(self): |
4415 | - self.builder.gotFailure() |
4416 | - self.builder.getCurrentBuildFarmJob().gotFailure() |
4417 | - |
4418 | - assessFailureCounts(self.builder, "failnotes") |
4419 | - self.assertIs(None, self.builder.currentjob) |
4420 | - self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD) |
4421 | - |
4422 | - def test_job_failing_more_than_builder_fails_job(self): |
4423 | - self.builder.getCurrentBuildFarmJob().gotFailure() |
4424 | - |
4425 | - assessFailureCounts(self.builder, "failnotes") |
4426 | - self.assertIs(None, self.builder.currentjob) |
4427 | - self.assertEqual(self.build.status, BuildStatus.FAILEDTOBUILD) |
4428 | - |
4429 | - def test_builder_failing_more_than_job_but_under_fail_threshold(self): |
4430 | - self.builder.failure_count = Builder.FAILURE_THRESHOLD - 1 |
4431 | - |
4432 | - assessFailureCounts(self.builder, "failnotes") |
4433 | - self.assertIs(None, self.builder.currentjob) |
4434 | - self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD) |
4435 | - self.assertTrue(self.builder.builderok) |
4436 | - |
4437 | - def test_builder_failing_more_than_job_but_over_fail_threshold(self): |
4438 | - self.builder.failure_count = Builder.FAILURE_THRESHOLD |
4439 | - |
4440 | - assessFailureCounts(self.builder, "failnotes") |
4441 | - self.assertIs(None, self.builder.currentjob) |
4442 | - self.assertEqual(self.build.status, BuildStatus.NEEDSBUILD) |
4443 | - self.assertFalse(self.builder.builderok) |
4444 | - self.assertEqual("failnotes", self.builder.failnotes) |
4445 | - |
4446 | - def test_builder_failing_with_no_attached_job(self): |
4447 | - self.buildqueue.reset() |
4448 | - self.builder.failure_count = Builder.FAILURE_THRESHOLD |
4449 | - |
4450 | - assessFailureCounts(self.builder, "failnotes") |
4451 | - self.assertFalse(self.builder.builderok) |
4452 | - self.assertEqual("failnotes", self.builder.failnotes) |
4453 | - |
4454 | - |
4455 | class TestNewBuilders(TrialTestCase): |
4456 | """Test detecting of new builders.""" |
4457 | |
4458 | - layer = TwistedLaunchpadZopelessLayer |
4459 | + layer = LaunchpadZopelessLayer |
4460 | |
4461 | def _getScanner(self, manager=None, clock=None): |
4462 | return NewBuildersScanner(manager=manager, clock=clock) |
4463 | @@ -555,8 +1084,11 @@ |
4464 | new_builders, builder_scanner.checkForNewBuilders()) |
4465 | |
4466 | def test_scan(self): |
4467 | - # See if scan detects new builders. |
4468 | + # See if scan detects new builders and schedules the next scan. |
4469 | |
4470 | + # stub out the addScanForBuilders and scheduleScan methods since |
4471 | + # they use callLater; we only want to assert that they get |
4472 | + # called. |
4473 | def fake_checkForNewBuilders(): |
4474 | return "new_builders" |
4475 | |
4476 | @@ -572,6 +1104,9 @@ |
4477 | builder_scanner.scan() |
4478 | advance = NewBuildersScanner.SCAN_INTERVAL + 1 |
4479 | clock.advance(advance) |
4480 | + self.assertNotEqual( |
4481 | + 0, builder_scanner.scheduleScan.call_count, |
4482 | + "scheduleScan did not get called") |
4483 | |
4484 | |
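`TestNewBuilders.test_scan` advances a fake clock past `SCAN_INTERVAL` to prove that the scan re-arms itself via `scheduleScan`. The pattern (Twisted's `task.Clock`) can be sketched with the stdlib: delayed calls go into a queue, and `advance()` fires everything that has come due. Names here are illustrative, not Twisted's API:

```python
import heapq

class FakeClock:
    """Deterministic stand-in for a reactor clock in tests."""

    def __init__(self):
        self.now = 0.0
        self._pending = []   # heap of (due_time, seq, callable)
        self._seq = 0        # tie-breaker so heapq never compares callables

    def call_later(self, delay, func):
        heapq.heappush(self._pending, (self.now + delay, self._seq, func))
        self._seq += 1

    def advance(self, amount):
        # Move time forward and run every call that has come due.
        self.now += amount
        while self._pending and self._pending[0][0] <= self.now:
            _, _, func = heapq.heappop(self._pending)
            func()
```

Injecting such a clock is what lets the test step time forward by `SCAN_INTERVAL + 1` synchronously instead of sleeping through a real scan interval.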
4485 | def is_file_growing(filepath, poll_interval=1, poll_repeat=10): |
4486 | @@ -612,7 +1147,7 @@ |
4487 | return False |
4488 | |
4489 | |
4490 | -class TestBuilddManagerScript(TestCaseWithFactory): |
4491 | +class TestBuilddManagerScript(LaunchpadTestCase): |
4492 | |
4493 | layer = LaunchpadScriptLayer |
4494 | |
4495 | @@ -621,7 +1156,6 @@ |
4496 | fixture = BuilddManagerTestSetup() |
4497 | fixture.setUp() |
4498 | fixture.tearDown() |
4499 | - self.layer.force_dirty_database() |
4500 | |
4501 | # XXX Julian 2010-08-06 bug=614275 |
4502 | # These next 2 tests are in the wrong place, they should be near the |
4503 | |
4504 | === modified file 'lib/lp/buildmaster/tests/test_packagebuild.py' |
4505 | --- lib/lp/buildmaster/tests/test_packagebuild.py 2010-10-26 20:43:50 +0000 |
4506 | +++ lib/lp/buildmaster/tests/test_packagebuild.py 2010-12-07 16:24:04 +0000 |
4507 | @@ -97,8 +97,6 @@ |
4508 | self.assertRaises( |
4509 | NotImplementedError, self.package_build.verifySuccessfulUpload) |
4510 | self.assertRaises(NotImplementedError, self.package_build.notify) |
4511 | - # XXX 2010-10-18 bug=662631 |
4512 | - # Change this to do non-blocking IO. |
4513 | self.assertRaises( |
4514 | NotImplementedError, self.package_build.handleStatus, |
4515 | None, None, None) |
4516 | @@ -311,8 +309,6 @@ |
4517 | # A filemap with plain filenames should not cause a problem. |
4518 | # The call to handleStatus will attempt to get the file from |
4519 | # the slave resulting in a URL error in this test case. |
4520 | - # XXX 2010-10-18 bug=662631 |
4521 | - # Change this to do non-blocking IO. |
4522 | self.build.handleStatus('OK', None, { |
4523 | 'filemap': {'myfile.py': 'test_file_hash'}, |
4524 | }) |
4525 | @@ -323,8 +319,6 @@ |
4526 | def test_handleStatus_OK_absolute_filepath(self): |
4527 | # A filemap that tries to write to files outside of |
4528 | # the upload directory will result in a failed upload. |
4529 | - # XXX 2010-10-18 bug=662631 |
4530 | - # Change this to do non-blocking IO. |
4531 | self.build.handleStatus('OK', None, { |
4532 | 'filemap': {'/tmp/myfile.py': 'test_file_hash'}, |
4533 | }) |
4534 | @@ -335,8 +329,6 @@ |
4535 | def test_handleStatus_OK_relative_filepath(self): |
4536 | # A filemap that tries to write to files outside of |
4537 | # the upload directory will result in a failed upload. |
4538 | - # XXX 2010-10-18 bug=662631 |
4539 | - # Change this to do non-blocking IO. |
4540 | self.build.handleStatus('OK', None, { |
4541 | 'filemap': {'../myfile.py': 'test_file_hash'}, |
4542 | }) |
4543 | @@ -347,8 +339,6 @@ |
4544 | # The build log is set during handleStatus. |
4545 | removeSecurityProxy(self.build).log = None |
4546 | self.assertEqual(None, self.build.log) |
4547 | - # XXX 2010-10-18 bug=662631 |
4548 | - # Change this to do non-blocking IO. |
4549 | self.build.handleStatus('OK', None, { |
4550 | 'filemap': {'myfile.py': 'test_file_hash'}, |
4551 | }) |
4552 | @@ -358,8 +348,6 @@ |
4553 | # The date finished is updated during handleStatus_OK. |
4554 | removeSecurityProxy(self.build).date_finished = None |
4555 | self.assertEqual(None, self.build.date_finished) |
4556 | - # XXX 2010-10-18 bug=662631 |
4557 | - # Change this to do non-blocking IO. |
4558 | self.build.handleStatus('OK', None, { |
4559 | 'filemap': {'myfile.py': 'test_file_hash'}, |
4560 | }) |
4561 | |
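The test_packagebuild.py hunks above drop stale XXX comments from tests that feed `handleStatus` filemaps containing plain, absolute, and `../`-relative paths; only plain filenames should land inside the upload directory. A minimal sketch of that validation rule (`filemap_path_is_safe` is a hypothetical helper, not the Launchpad implementation):

```python
import os


def filemap_path_is_safe(path):
    """Reject paths that would escape the upload directory.

    Mirrors the rule exercised by the handleStatus tests above: plain
    filenames are fine; absolute paths and paths that climb out via
    '..' must cause a failed upload.
    """
    if os.path.isabs(path):
        return False
    normalized = os.path.normpath(path)
    return not normalized.startswith('..')


assert filemap_path_is_safe('myfile.py')
assert not filemap_path_is_safe('/tmp/myfile.py')
assert not filemap_path_is_safe('../myfile.py')
```

Normalizing before checking matters: a path like `a/../../x` only reveals its escape after `os.path.normpath` collapses it.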
4562 | === modified file 'lib/lp/code/model/recipebuilder.py' |
4563 | --- lib/lp/code/model/recipebuilder.py 2010-09-24 12:47:12 +0000 |
4564 | +++ lib/lp/code/model/recipebuilder.py 2010-12-07 16:24:04 +0000 |
4565 | @@ -117,42 +117,38 @@ |
4566 | raise CannotBuild("Unable to find distroarchseries for %s in %s" % |
4567 | (self._builder.processor.name, |
4568 | self.build.distroseries.displayname)) |
4569 | - args = self._extraBuildArgs(distroarchseries, logger) |
4570 | + |
4571 | chroot = distroarchseries.getChroot() |
4572 | if chroot is None: |
4573 | raise CannotBuild("Unable to find a chroot for %s" % |
4574 | distroarchseries.displayname) |
4575 | - d = self._builder.slave.cacheFile(logger, chroot) |
4576 | - |
4577 | - def got_cache_file(ignored): |
4578 | - # Generate a string which can be used to cross-check when obtaining |
4579 | - # results so we know we are referring to the right database object in |
4580 | - # subsequent runs. |
4581 | - buildid = "%s-%s" % (self.build.id, build_queue_id) |
4582 | - cookie = self.buildfarmjob.generateSlaveBuildCookie() |
4583 | - chroot_sha1 = chroot.content.sha1 |
4584 | - logger.debug( |
4585 | - "Initiating build %s on %s" % (buildid, self._builder.url)) |
4586 | - |
4587 | - return self._builder.slave.build( |
4588 | - cookie, "sourcepackagerecipe", chroot_sha1, {}, args) |
4589 | - |
4590 | - def log_build_result((status, info)): |
4591 | - message = """%s (%s): |
4592 | - ***** RESULT ***** |
4593 | - %s |
4594 | - %s: %s |
4595 | - ****************** |
4596 | - """ % ( |
4597 | - self._builder.name, |
4598 | - self._builder.url, |
4599 | - args, |
4600 | - status, |
4601 | - info, |
4602 | - ) |
4603 | - logger.info(message) |
4604 | - |
4605 | - return d.addCallback(got_cache_file).addCallback(log_build_result) |
4606 | + self._builder.slave.cacheFile(logger, chroot) |
4607 | + |
4608 | + # Generate a string which can be used to cross-check when obtaining |
4609 | + # results so we know we are referring to the right database object in |
4610 | + # subsequent runs. |
4611 | + buildid = "%s-%s" % (self.build.id, build_queue_id) |
4612 | + cookie = self.buildfarmjob.generateSlaveBuildCookie() |
4613 | + chroot_sha1 = chroot.content.sha1 |
4614 | + logger.debug( |
4615 | + "Initiating build %s on %s" % (buildid, self._builder.url)) |
4616 | + |
4617 | + args = self._extraBuildArgs(distroarchseries, logger) |
4618 | + status, info = self._builder.slave.build( |
4619 | + cookie, "sourcepackagerecipe", chroot_sha1, {}, args) |
4620 | + message = """%s (%s): |
4621 | + ***** RESULT ***** |
4622 | + %s |
4623 | + %s: %s |
4624 | + ****************** |
4625 | + """ % ( |
4626 | + self._builder.name, |
4627 | + self._builder.url, |
4628 | + args, |
4629 | + status, |
4630 | + info, |
4631 | + ) |
4632 | + logger.info(message) |
4633 | |
4634 | def verifyBuildRequest(self, logger): |
4635 | """Assert some pre-build checks. |
4636 | |
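The recipebuilder.py hunk flattens a Deferred callback chain (`cacheFile`, then `got_cache_file`, then `log_build_result`) into straight-line synchronous calls. A toy sketch of the same refactor, where `StubSlave` and `dispatch` are hypothetical stand-ins for the slave XML-RPC proxy and the dispatch method:

```python
class StubSlave:
    """Stand-in for the builder-slave XML-RPC proxy (hypothetical)."""

    def cacheFile(self, logger, chroot):
        logger.append('cached %s' % chroot)

    def build(self, cookie, kind, chroot_sha1, filemap, args):
        return ('BuildStatus.OK', cookie)


def dispatch(slave, build_id, queue_id, chroot, logger):
    # Each former Deferred callback becomes an ordinary statement.
    slave.cacheFile(logger, chroot)
    # The cookie cross-checks that later results refer to the right
    # database object in subsequent runs, as the diff's comment says.
    cookie = '%s-%s' % (build_id, queue_id)
    status, info = slave.build(
        cookie, 'sourcepackagerecipe', 'fake-sha1', {}, {})
    logger.append('%s: %s' % (status, info))
    return status, info


log = []
status, info = dispatch(StubSlave(), 1, 2, 'chroot.tar.gz', log)
assert (status, info) == ('BuildStatus.OK', '1-2')
```

The synchronous form trades Twisted's non-blocking IO for readability; the removed XXX comments elsewhere in this diff tracked exactly that trade-off.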
4637 | === modified file 'lib/lp/soyuz/browser/tests/test_builder_views.py' |
4638 | --- lib/lp/soyuz/browser/tests/test_builder_views.py 2010-10-06 12:20:03 +0000 |
4639 | +++ lib/lp/soyuz/browser/tests/test_builder_views.py 2010-12-07 16:24:04 +0000 |
4640 | @@ -34,7 +34,7 @@ |
4641 | return view |
4642 | |
4643 | def test_posting_form_doesnt_call_slave_xmlrpc(self): |
4644 | - # Posting the +edit for should not call isAvailable, which |
4645 | + # Posting the +edit for should not call is_available, which |
4646 | # would do xmlrpc to a slave builder and is explicitly forbidden |
4647 | # in a webapp process. |
4648 | view = self.initialize_view() |
4649 | |
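The one-line fix above tracks the rename of the builder's `isAvailable()` method to an `is_available` attribute. Converting a zero-argument method to a property is the usual Python move; a minimal sketch (this `Builder` class is illustrative, not Launchpad's model):

```python
class Builder:
    """Illustrative builder record (not Launchpad's model class)."""

    def __init__(self, builderok):
        self.builderok = builderok

    @property
    def is_available(self):
        # Formerly an isAvailable() method.  Exposed as a property it
        # reads like plain state, which makes it easier to audit that
        # webapp code never triggers slave XML-RPC by "just reading" it.
        return self.builderok


assert Builder(True).is_available
assert not Builder(False).is_available
```

Callers drop the parentheses but are otherwise unchanged, which is why the test only needed a one-word edit.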
4650 | === added file 'lib/lp/soyuz/doc/buildd-dispatching.txt' |
4651 | --- lib/lp/soyuz/doc/buildd-dispatching.txt 1970-01-01 00:00:00 +0000 |
4652 | +++ lib/lp/soyuz/doc/buildd-dispatching.txt 2010-12-07 16:24:04 +0000 |
4653 | @@ -0,0 +1,371 @@ |
4654 | += Buildd Dispatching = |
4655 | + |
4656 | + >>> import transaction |
4657 | + >>> import logging |
4658 | + >>> logger = logging.getLogger() |
4659 | + >>> logger.setLevel(logging.DEBUG) |
4660 | + |
4661 | +The buildd dispatching basically consists of finding an available |

4662 | +slave in IDLE state, pushing any required files to it, then requesting |
4663 | +that it starts the build procedure. These tasks are implemented by the |
4664 | +BuilderSet and Builder classes. |
4665 | + |
4666 | +Setup the test builder: |
4667 | + |
4668 | + >>> from canonical.buildd.tests import BuilddSlaveTestSetup |
4669 | + >>> fixture = BuilddSlaveTestSetup() |
4670 | + >>> fixture.setUp() |
4671 | + |
4672 | +Setup a suitable chroot for Hoary i386: |
4673 | + |
4674 | + >>> from StringIO import StringIO |
4675 | + >>> from canonical.librarian.interfaces import ILibrarianClient |
4676 | + >>> librarian_client = getUtility(ILibrarianClient) |
4677 | + |
4678 | + >>> content = 'anything' |
4679 | + >>> alias_id = librarian_client.addFile( |
4680 | + ... 'foo.tar.gz', len(content), StringIO(content), 'text/plain') |
4681 | + |
4682 | + >>> from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet |
4683 | + >>> from lp.registry.interfaces.distribution import IDistributionSet |
4684 | + >>> from lp.registry.interfaces.pocket import PackagePublishingPocket |
4685 | + |
4686 | + >>> hoary = getUtility(IDistributionSet)['ubuntu']['hoary'] |
4687 | + >>> hoary_i386 = hoary['i386'] |
4688 | + |
4689 | + >>> chroot = getUtility(ILibraryFileAliasSet)[alias_id] |
4690 | + >>> pc = hoary_i386.addOrUpdateChroot(chroot=chroot) |
4691 | + |
4692 | +To activate the builders present in sampledata, we need to be logged |
4693 | +in as a member of launchpad-buildd-admin: |
4694 | + |
4695 | + >>> from canonical.launchpad.ftests import login |
4696 | + >>> login('celso.providelo@canonical.com') |
4697 | + |
4698 | +Set IBuilder.builderok of all present builders: |
4699 | + |
4700 | + >>> from lp.buildmaster.interfaces.builder import IBuilderSet |
4701 | + >>> builder_set = getUtility(IBuilderSet) |
4702 | + |
4703 | + >>> builder_set.count() |
4704 | + 2 |
4705 | + |
4706 | + >>> from canonical.launchpad.ftests import syncUpdate |
4707 | + >>> for b in builder_set: |
4708 | + ... b.builderok = True |
4709 | + ... syncUpdate(b) |
4710 | + |
4711 | +Clean up previous BuildQueue results from sampledata: |
4712 | + |
4713 | + >>> from lp.buildmaster.interfaces.buildqueue import IBuildQueueSet |
4714 | + >>> lost_job = getUtility(IBuildQueueSet).get(1) |
4715 | + >>> lost_job.builder.name |
4716 | + u'bob' |
4717 | + >>> lost_job.destroySelf() |
4718 | + >>> transaction.commit() |
4719 | + |
4720 | +If the specified buildd slave reset command (used inside resumeSlaveHost()) |
4721 | +fails, the slave will still be marked as failed. |
4722 | + |
4723 | + >>> from canonical.config import config |
4724 | + >>> reset_fail_config = ''' |
4725 | + ... [builddmaster] |
4726 | + ... vm_resume_command: /bin/false''' |
4727 | + >>> config.push('reset fail', reset_fail_config) |
4728 | + >>> frog_builder = builder_set['frog'] |
4729 | + >>> frog_builder.handleTimeout(logger, 'The universe just collapsed') |
4730 | + WARNING:root:Resetting builder: http://localhost:9221/ -- The universe just collapsed |
4731 | + ... |
4732 | + WARNING:root:Failed to reset builder: http://localhost:9221/ -- Resuming failed: |
4733 | + ... |
4734 | + WARNING:root:Disabling builder: http://localhost:9221/ -- The universe just collapsed |
4735 | + ... |
4736 | + <BLANKLINE> |
4737 | + |
4738 | +Since we were unable to reset the 'frog' builder it was marked as 'failed'. |
4739 | + |
4740 | + >>> frog_builder.builderok |
4741 | + False |
4742 | + |
4743 | +Restore default value for resume command. |
4744 | + |
4745 | + >>> ignored_config = config.pop('reset fail') |
4746 | + |
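The doctest above overrides `vm_resume_command` with `config.push()` and restores it with `config.pop()`. A toy sketch of that stack-of-overrides pattern (`ConfigStack` is a hypothetical simplification; the real `canonical.config` API parses ini-style text):

```python
class ConfigStack:
    """Toy sketch of the push/pop config-override pattern used above."""

    def __init__(self, base):
        self._stack = [('base', dict(base))]

    @property
    def current(self):
        return self._stack[-1][1]

    def push(self, name, overrides):
        # A pushed layer sees the previous values plus its overrides.
        merged = dict(self.current)
        merged.update(overrides)
        self._stack.append((name, merged))

    def pop(self, name):
        popped, values = self._stack.pop()
        assert popped == name, 'unbalanced config pop: %s' % popped
        return values


config = ConfigStack({'vm_resume_command': '/bin/true'})
config.push('reset fail', {'vm_resume_command': '/bin/false'})
assert config.current['vm_resume_command'] == '/bin/false'
ignored = config.pop('reset fail')
assert config.current['vm_resume_command'] == '/bin/true'
```

Naming each pushed layer (here `'reset fail'`) lets the pop verify that overrides are unwound in the order they were applied, which keeps tests from leaking configuration into each other.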
4747 | +The 'bob' builder is available for build jobs. |
4748 | + |
4749 | + >>> bob_builder = builder_set['bob'] |
4750 | + >>> bob_builder.name |
4751 | + u'bob' |
4752 | + >>> bob_builder.virtualized |
4753 | + False |
4754 | + >>> bob_builder.is_available |
4755 | + True |
4756 | + >>> bob_builder.builderok |
4757 | + True |
4758 | + |
4759 | + |
4760 | +== Builder dispatching API == |
4761 | + |
4762 | +Now let's check the build candidates which will be considered for the |
4763 | +builder 'bob': |
4764 | + |
4765 | + >>> from zope.security.proxy import removeSecurityProxy |
4766 | + >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate() |
4767 | + |
4768 | +The single BuildQueue found is a non-virtual pending build: |
4769 | + |
4770 | + >>> job.id |
4771 | + 2 |
4772 | + >>> from lp.soyuz.interfaces.binarypackagebuild import ( |
4773 | + ... IBinaryPackageBuildSet) |
4774 | + >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(job) |
4775 | + >>> build.status.name |
4776 | + 'NEEDSBUILD' |
4777 | + >>> job.builder is None |
4778 | + True |
4779 | + >>> job.date_started is None |
4780 | + True |
4781 | + >>> build.is_virtualized |
4782 | + False |
4783 | + |
4784 | +The build start time is not set yet either. |
4785 | + |
4786 | + >>> print build.date_first_dispatched |
4787 | + None |
4788 | + |
4789 | +Update the SourcePackageReleaseFile corresponding to this job: |
4790 | + |
4791 | + >>> content = 'anything' |
4792 | + >>> alias_id = librarian_client.addFile( |
4793 | + ... 'foo.dsc', len(content), StringIO(content), 'application/dsc') |
4794 | + |
4795 | + >>> sprf = build.source_package_release.files[0] |
4796 | + >>> naked_sprf = removeSecurityProxy(sprf) |
4797 | + >>> naked_sprf.libraryfile = getUtility(ILibraryFileAliasSet)[alias_id] |
4798 | + >>> flush_database_updates() |
4799 | + |
4800 | +Check the dispatching method itself: |
4801 | + |
4802 | + >>> dispatched_job = bob_builder.findAndStartJob() |
4803 | + >>> job == dispatched_job |
4804 | + True |
4805 | + >>> bob_builder.builderok = True |
4806 | + |
4807 | + >>> flush_database_updates() |
4808 | + |
4809 | +Verify that the job (BuildQueue) was updated appropriately: |
4810 | + |
4811 | + >>> job.builder.id == bob_builder.id |
4812 | + True |
4813 | + |
4814 | + >>> dispatched_build = getUtility( |
4815 | + ... IBinaryPackageBuildSet).getByQueueEntry(job) |
4816 | + >>> dispatched_build == build |
4817 | + True |
4818 | + |
4819 | + >>> build.status.name |
4820 | + 'BUILDING' |
4821 | + |
4822 | +Shut down the builder, mark the build record as failed, and remove |
4823 | +the buildqueue record, so the build is eliminated: |
4824 | + |
4825 | + >>> fixture.tearDown() |
4826 | + |
4827 | + >>> from lp.buildmaster.enums import BuildStatus |
4828 | + >>> build.status = BuildStatus.FAILEDTOBUILD |
4829 | + >>> job.destroySelf() |
4830 | + >>> flush_database_updates() |
4831 | + |
4832 | + |
4833 | +== PPA build dispatching == |
4834 | + |
4835 | +Create a new Build record of the same source targeted for a PPA archive: |
4836 | + |
4837 | + >>> from lp.registry.interfaces.person import IPersonSet |
4838 | + >>> cprov = getUtility(IPersonSet).getByName('cprov') |
4839 | + |
4840 | + >>> ppa_build = sprf.sourcepackagerelease.createBuild( |
4841 | + ... hoary_i386, PackagePublishingPocket.RELEASE, cprov.archive) |
4842 | + |
4843 | +Create BuildQueue record and inspect some parameters: |
4844 | + |
4845 | + >>> ppa_job = ppa_build.queueBuild() |
4846 | + >>> ppa_job.id |
4847 | + 3 |
4848 | + >>> ppa_job.builder == None |
4849 | + True |
4850 | + >>> ppa_job.date_started == None |
4851 | + True |
4852 | + |
4853 | +The build job's archive requires virtualized builds. |
4854 | + |
4855 | + >>> build = getUtility(IBinaryPackageBuildSet).getByQueueEntry(ppa_job) |
4856 | + >>> build.archive.require_virtualized |
4857 | + True |
4858 | + |
4859 | +But the builder is not virtualized. |
4860 | + |
4861 | + >>> bob_builder.virtualized |
4862 | + False |
4863 | + |
4864 | +Hence, the builder will not be able to pick up the PPA build job created |
4865 | +above. |
4866 | + |
4867 | + >>> bob_builder.vm_host = 'localhost.ppa' |
4868 | + >>> syncUpdate(bob_builder) |
4869 | + |
4870 | + >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate() |
4871 | + >>> print job |
4872 | + None |
4873 | + |
4874 | +In order to enable 'bob' to find and build the PPA job, we have to |
4875 | +change it to virtualized. This is because PPA builds will only build |
4876 | +on virtualized builders. We also need to make sure this build's source |
4877 | +is published, or it will be ignored (and marked superseded). We can |
4878 | +ensure this by copying the existing publication in Ubuntu. |
4879 | + |
4880 | + >>> from lp.soyuz.model.publishing import ( |
4881 | + ... SourcePackagePublishingHistory) |
4882 | + >>> [old_pub] = SourcePackagePublishingHistory.selectBy( |
4883 | + ... distroseries=build.distro_series, |
4884 | + ... sourcepackagerelease=build.source_package_release) |
4885 | + >>> new_pub = old_pub.copyTo( |
4886 | + ... old_pub.distroseries, old_pub.pocket, build.archive) |
4887 | + |
4888 | + >>> bob_builder.virtualized = True |
4889 | + >>> syncUpdate(bob_builder) |
4890 | + |
4891 | + >>> job = removeSecurityProxy(bob_builder)._findBuildCandidate() |
4892 | + >>> ppa_job.id == job.id |
4893 | + True |
4894 | + |
4895 | +For further details regarding IBuilder._findBuildCandidate() please see |
4896 | +lib/lp/soyuz/tests/test_builder.py. |
4897 | + |
4898 | +Start buildd-slave to be able to dispatch jobs. |
4899 | + |
4900 | + >>> fixture = BuilddSlaveTestSetup() |
4901 | + >>> fixture.setUp() |
4902 | + |
4903 | +Before dispatching, we can check that the builder is protected |
4904 | +against mistakes in code that result in an attempt to build a |
4905 | +virtual job on a non-virtual builder. |
4906 | + |
4907 | + >>> bob_builder.virtualized = False |
4908 | + >>> flush_database_updates() |
4909 | + >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(ppa_job) |
4910 | + Traceback (most recent call last): |
4911 | + ... |
4912 | + AssertionError: Attempt to build non-virtual item on a virtual builder. |
4913 | + |
4914 | +Mark the builder as virtual again, so we can dispatch the ppa job |
4915 | +successfully. |
4916 | + |
4917 | + >>> bob_builder.virtualized = True |
4918 | + >>> flush_database_updates() |
4919 | + |
4920 | + >>> dispatched_job = bob_builder.findAndStartJob() |
4921 | + >>> ppa_job == dispatched_job |
4922 | + True |
4923 | + |
4924 | + >>> flush_database_updates() |
4925 | + |
4926 | +PPA job is building. |
4927 | + |
4928 | + >>> ppa_job.builder.name |
4929 | + u'bob' |
4930 | + |
4931 | + >>> build.status.name |
4932 | + 'BUILDING' |
4933 | + |
4934 | +Shut down the builder slave, mark the PPA build record as failed, |
4935 | +remove the buildqueue record, and make the 'bob' builder non-virtual |
4936 | +again, so the environment is back to its initial state. |
4937 | + |
4938 | + >>> fixture.tearDown() |
4939 | + |
4940 | + >>> build.status = BuildStatus.FAILEDTOBUILD |
4941 | + >>> ppa_job.destroySelf() |
4942 | + >>> bob_builder.virtualized = False |
4943 | + >>> flush_database_updates() |
4944 | + |
4945 | + |
4946 | +== Security build dispatching == |
4947 | + |
4948 | +Setup chroot for warty/i386. |
4949 | + |
4950 | + >>> warty = getUtility(IDistributionSet)['ubuntu']['warty'] |
4951 | + >>> warty_i386 = warty['i386'] |
4952 | + >>> pc = warty_i386.addOrUpdateChroot(chroot=chroot) |
4953 | + |
4954 | +Create a new Build record for test source targeted to warty/i386 |
4955 | +architecture and SECURITY pocket: |
4956 | + |
4957 | + >>> sec_build = sprf.sourcepackagerelease.createBuild( |
4958 | + ... warty_i386, PackagePublishingPocket.SECURITY, hoary.main_archive) |
4959 | + |
4960 | +Create BuildQueue record and inspect some parameters: |
4961 | + |
4962 | + >>> sec_job = sec_build.queueBuild() |
4963 | + >>> sec_job.id |
4964 | + 4 |
4965 | + >>> print sec_job.builder |
4966 | + None |
4967 | + >>> print sec_job.date_started |
4968 | + None |
4969 | + >>> sec_build.is_virtualized |
4970 | + False |
4971 | + |
4972 | +Under normal conditions, the next available candidate would be the |
4973 | +job targeted to the SECURITY pocket. However, the builders are forbidden to |
4974 | +accept such jobs until we have finished the EMBARGOED archive |
4975 | +implementation. |
4976 | + |
4977 | + >>> fixture = BuilddSlaveTestSetup() |
4978 | + >>> fixture.setUp() |
4979 | + >>> removeSecurityProxy(bob_builder)._dispatchBuildCandidate(sec_job) |
4980 | + Traceback (most recent call last): |
4981 | + ... |
4982 | + AssertionError: Soyuz is not yet capable of building SECURITY uploads. |
4983 | + >>> fixture.tearDown() |
4984 | + |
4985 | +To solve this problem temporarily until we start building security |
4986 | +uploads, we will mark builds targeted to the SECURITY pocket as |
4987 | +FAILEDTOBUILD during the _findBuildCandidate look-up. |
4988 | + |
4989 | +We will also create another build candidate in breezy-autotest/i386 to |
4990 | +check if legitimate pending candidates will remain valid. |
4991 | + |
4992 | + >>> breezy = getUtility(IDistributionSet)['ubuntu']['breezy-autotest'] |
4993 | + >>> breezy_i386 = breezy['i386'] |
4994 | + >>> pc = breezy_i386.addOrUpdateChroot(chroot=chroot) |
4995 | + |
4996 | + >>> pending_build = sprf.sourcepackagerelease.createBuild( |
4997 | + ... breezy_i386, PackagePublishingPocket.UPDATES, hoary.main_archive) |
4998 | + >>> pending_job = pending_build.queueBuild() |
4999 | + |
5000 | +We set the score of the security job to ensure it is considered |
The diff has been truncated for viewing.