Merge lp:~jml/launchpad/broken-resume-handling into lp:launchpad
Status: | Merged |
---|---|
Approved by: | Julian Edwards |
Approved revision: | no longer in the source branch. |
Merged at revision: | not available |
Proposed branch: | lp:~jml/launchpad/broken-resume-handling |
Merge into: | lp:launchpad |
Diff against target: |
92 lines (+64/-1) 2 files modified
lib/lp/buildmaster/manager.py (+0/-1) lib/lp/buildmaster/tests/test_manager.py (+64/-0) |
To merge this branch: | bzr merge lp:~jml/launchpad/broken-resume-handling |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Julian Edwards (community) | Approve | ||
Review via email: mp+24004@code.launchpad.net |
Description of the change
This branch fixes a bug in the build master code that Julian discovered while trying to land his own buildmaster branch.
I don't really understand the problem domain very well, but I can string words together meaningfully.
When a slave fails to resume, we need to reset it. I think. This work is done in finishCycle(), which calls an object returned by one of the callbacks, calling this object does the reset.
EXCEPT, if something goes wrong in the callbacks, say a ValueError is raised, then finishCycle() won't even call the object that resets the slave, because it won't *have* the object that resets the slave. It will have a Failure instead.
This branch adds some tests for this whole code path, and fixes the case where a slave was being removed from a list twice, thus raising a ValueError.
More generally, the build manager could be written more defensively by being more explicitly a state machine and clearly separating success and error paths, saving addBoth only for cleanup cases.
Thanks for helping me write the tests for this! The change looks good. With any luck my own branch will start working now!
Cheers.