Merge lp:~flacoste/ubuntu-archive-tools/copy-build-scheduler into lp:ubuntu-archive-tools

Proposed by Francis J. Lacoste
Status: Merged
Merge reported by: Jamie Strandboge
Merged at revision: not available
Proposed branch: lp:~flacoste/ubuntu-archive-tools/copy-build-scheduler
Merge into: lp:ubuntu-archive-tools
Diff against target: 178 lines (+174/-0)
1 file modified
copy-build-scheduler.py (+174/-0)
To merge this branch: bzr merge lp:~flacoste/ubuntu-archive-tools/copy-build-scheduler
Reviewer Review Type Date Requested Status
Jamie Strandboge Approve
Julian Edwards (community) Needs Fixing
Robert Collins Pending
Review via email: mp+69370@code.launchpad.net

Description of the change

This branch adds the copy-build-scheduler.py script. It can be used by a buildd administrator to process copy archive rebuilds in a timely manner.

Normally, copy builds have the lowest priority and are only processed when the build farm is otherwise idle. That's fine for some cases, but to feed the results of a rebuild back into the normal Ubuntu QA cycle, the rebuild must be processed more promptly.

This script loops until all the builds in the archive are completed. On each pass it removes the copy archive penalty from some of the archive's pending builds, but only a few at a time so as not to take over the build farm. The default ratio is 0.25, meaning that up to (builders * 0.25) builds per processor will be rescored at any one time. That should make sure that the rebuilds progress.
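
As a rough illustration of the ratio arithmetic described above, here is a minimal sketch. The helper name copy_build_capacity is hypothetical (the script does this inside determine_builder_capacity), and the floor of one builder mirrors the script's "at least 1" rule.

```python
def copy_build_capacity(builder_count, ratio=0.25):
    """Number of builders to devote to copy builds for one processor.

    Hypothetical helper: scale the builder count by the ratio, round,
    and never go below one builder, as the script does.
    """
    return max(1, int(round(builder_count * ratio)))


if __name__ == "__main__":
    for count in (12, 4, 1):
        print(count, "builders ->", copy_build_capacity(count))
```

With 12 builders and the default ratio this gives 3, matching the example in the script's header comment.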

This should fix bug #805634.

Robert Collins (lifeless) wrote :

I have two specific comments.

Firstly the queue capacity changes over time; so that should be refreshed periodically - and it can change -fast- when QA nab machines for testing.

Secondly, I think 1 minute is too fast a polling time, something like 5 minutes would be a bit gentler. (I wish we had webhooks :P)

280. By Francis J. Lacoste

Increase time between schedule runs, and recompute capacity every 5 schedule iterations.

281. By Francis J. Lacoste

Typo.

Francis J. Lacoste (flacoste) wrote :

The queue capacity is actually pretty constant, except when HW enablement grabs machines for testing, which shouldn't happen during copy rebuilds that require timely processing. (They are now supposed to coordinate with platform to make sure they don't reduce capacity when it's most needed.)

Nonetheless, I did move the capacity computation into the loop, and it will be recomputed every 25 minutes.

I also increased the time between schedule runs to 5 minutes. The ideal interval is probably about the time it takes for a short build to complete, but 5 minutes will probably work fine anyway.
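
The 25-minute figure follows from refreshing capacity on every fifth pass of a 5-minute loop. A small sketch of that arithmetic, assuming the time spent on API calls within each pass is negligible; the name refresh_minutes and the constant REFRESH_EVERY are illustrative and not part of the script:

```python
SCHEDULE_PERIOD = 5  # minutes between schedule runs, as in the script
REFRESH_EVERY = 5    # passes between capacity refreshes (illustrative name)


def refresh_minutes(iterations):
    """Elapsed minutes at which capacity would be recomputed."""
    return [
        i * SCHEDULE_PERIOD
        for i in range(iterations)
        if i % REFRESH_EVERY == 0
    ]


if __name__ == "__main__":
    # The first pass always refreshes, then every fifth pass after that.
    print(refresh_minutes(11))
```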

Thanks for the review.

Julian Edwards (julian-edwards) wrote :

120 + archive = distribution.getArchive(name=args[0])
121 + if archive is None:
122 + parser.error('unknown archive: %s' % args[0])

It might be worth adding extra validation here that the selected archive's is_copy property is True.

142 + for build in pending_builds:
143 + if builds_to_rescore <= 0:
144 + break
145 +
146 + if build.arch_tag != processor:
147 + continue

You're going to iterate over the batch once for each architecture here; that'll create a lot of extra API requests, won't it? It might be worth sorting the batch into a dict of build lists indexed on arch_tag first, perhaps?
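
A minimal sketch of that grouping idea, so the batch is walked only once. The Build namedtuple is a hypothetical stand-in for a Launchpad build record, and group_by_arch is an illustrative name that does not appear in the branch:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a Launchpad build record; only the two
# attributes used below are modelled.
Build = namedtuple("Build", ["title", "arch_tag"])


def group_by_arch(builds):
    """Index builds by arch_tag in a single pass over the batch."""
    grouped = defaultdict(list)
    for build in builds:
        grouped[build.arch_tag].append(build)
    return grouped


if __name__ == "__main__":
    pending = [
        Build("foo i386", "i386"),
        Build("foo amd64", "amd64"),
        Build("bar i386", "i386"),
    ]
    grouped = group_by_arch(pending)
    for processor in ("i386", "amd64"):
        # .get() avoids creating empty entries for unseen processors.
        for build in grouped.get(processor, []):
            print(processor, build.title)
```

Whether this actually reduces API requests depends on how launchpadlib fetches the collection, which is not shown here.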

Rob said:
> Firstly the queue capacity changes over time; so that should be refreshed periodically - and it can change -fast- when QA nab machines for testing.

Well spotted :)

review: Needs Fixing
Robert Collins (lifeless) wrote :

This should be landable now, no?

Francis J. Lacoste (flacoste) wrote :

I think so, but I've got no news from the ubuntu-archive-tools maintainers (and they are the only ones who can land the change).

Jamie Strandboge (jdstrand) wrote :

I'll merge this; it is a simple leaf application that shouldn't affect any other tools in ubuntu-archive-tools. I can only give a (very) high-level review, and things seem OK. Can someone comment on Julian's request for extra error checking?

review: Approve
Francis J. Lacoste (flacoste) wrote :

Thanks Jamie.

That error check cannot be added because is_copy isn't exported in the API.

Preview Diff

=== added file 'copy-build-scheduler.py'
--- copy-build-scheduler.py 1970-01-01 00:00:00 +0000
+++ copy-build-scheduler.py 2011-07-27 14:58:35 +0000
@@ -0,0 +1,174 @@
+#! /usr/bin/python
+# Copyright 2011 Canonical Ltd.
+#
+# This script can be used to reschedule some of the copy archive
+# builds so that they are processed like regular PPA builds.
+#
+# Copy archive builds have a huge penalty applied to them which means
+# that they are only processed when there is nothing else being processed
+# by the build farm. That's usually fine, but for some rebuilds, we want
+# more timely processing, while at the same time, we do want to continue to
+# service regular PPA builds.
+#
+# This script will try to have a portion of the build farm processing copy
+# builds. It does that by rescoring builds to the normal build priority
+# range. But will only rescore a few builds at a time, so as not to take over
+# the build pool. By default, it won't rescore more than 1/4 the number of
+# available builders. So for example, if there are 12 i386 builders, only
+# 3 builds at a time will have a "normal priority".
+import logging
+import sys
+import time
+
+from launchpadlib.launchpad import Launchpad
+from optparse import OptionParser
+
+
+API_NAME = 'copy-build-scheduler'
+
+NEEDS_BUILDING = 'Needs building'
+BUILDING = 'Currently building'
+COPY_ARCHIVE_SCORE_PENALTY = 2600
+# Number of minutes to wait between schedule runs.
+SCHEDULE_PERIOD = 5
+
+def determine_builder_capacity(lp, options):
+    """Find how many builders to use for copy builds by processor."""
+    capacity = {}
+    for processor in options.processors:
+        queue = lp.builders.getBuildersForQueue(
+            processor='/+processors/%s' % processor, virtualized=True)
+        max_capacity = len(queue)
+        capacity[processor] = round(max_capacity * options.builder_ratio)
+        # Make sure at least 1 builder is allocated.
+        if capacity[processor] == 0:
+            capacity[processor] = 1
+        logging.info(
+            'Will use %d out of %d %s builders', capacity[processor],
+            max_capacity, processor)
+    return capacity
+
+
+def get_archive_used_builders_capacity(archive):
+    """Return the number of builds currently being done for the archive."""
+    capacity = {}
+    building = archive.getBuildRecords(build_state=BUILDING)
+    for build in building:
+        capacity.setdefault(build.arch_tag, 0)
+        capacity[build.arch_tag] += 1
+    return capacity
+
+
+def main(argv):
+    parser = OptionParser(
+        usage="%prog [options] copy-archive-name")
+    parser.add_option(
+        '--lp-instance', default='production', dest='lp_instance',
+        help="Select the Launchpad instance to run against. Defaults to "
+             "'production'")
+    parser.add_option(
+        '-v', '--verbose', default=0, action='count', dest='verbose',
+        help="Increase verbosity of the script. -v prints info messages, "
+             "-vv will print debug messages.")
+    parser.add_option(
+        '-c', '--credentials', default=None, action='store',
+        dest='credentials',
+        help="Use the OAuth credentials in FILE instead of the desktop "
+             "one.", metavar='FILE')
+    parser.add_option(
+        '-d', '--distribution', default='ubuntu', action='store',
+        dest='distribution',
+        help="The archive distribution. Defaults to 'ubuntu'.")
+    parser.add_option(
+        '-p', '--processor', default=('i386', 'amd64'), action='append',
+        dest='processors',
+        help="The processor for which to schedule builds. "
+             "Defaults to i386 and amd64.")
+    parser.add_option(
+        '-r', '--ratio', default=0.25, action='store', type='float',
+        dest='builder_ratio',
+        help="The ratio of builders that you want to use for the copy "
+             "builds. Defaults to 25% of the available builders.")
+    options, args = parser.parse_args(argv[1:])
+    if len(args) != 1:
+        parser.error('Missing archive name.')
+
+    if options.verbose >= 2:
+        log_level = logging.DEBUG
+    elif options.verbose == 1:
+        log_level = logging.INFO
+    else:
+        log_level = logging.WARNING
+    logging.basicConfig(level=log_level)
+
+    if options.builder_ratio >= 1 or options.builder_ratio < 0:
+        parser.error(
+            'ratio should be a float between 0 and 1: %s' %
+            options.builder_ratio)
+
+    lp = Launchpad.login_with(
+        API_NAME, options.lp_instance,
+        credentials_file=options.credentials,
+        version='devel')
+
+    try:
+        distribution = lp.distributions[options.distribution]
+    except KeyError:
+        parser.error('unknown distribution: %s' % options.distribution)
+
+    archive = distribution.getArchive(name=args[0])
+    if archive is None:
+        parser.error('unknown archive: %s' % args[0])
+
+    iteration = 0
+    while True:
+        # Every 5 schedule runs - and on the first - compute available
+        # capacity.
+        if (iteration % 5) == 0:
+            capacity = determine_builder_capacity(lp, options)
+        iteration += 1
+
+        pending_builds = archive.getBuildRecords(build_state=NEEDS_BUILDING)
+        logging.debug('Found %d pending builds.' % len(pending_builds))
+        if len(pending_builds) == 0:
+            logging.info('No more builds pending. We are done.')
+            break
+
+        used_capacity = get_archive_used_builders_capacity(archive)
+
+        # For each processor, rescore up as many builds as we have
+        # capacity for.
+        for processor in options.processors:
+            builds_to_rescore = (
+                capacity[processor] - used_capacity.get(processor, 0))
+            logging.debug(
+                'Will try to rescore %d %s builds', builds_to_rescore,
+                processor)
+            for build in pending_builds:
+                if builds_to_rescore <= 0:
+                    break
+
+                if build.arch_tag != processor:
+                    continue
+
+                if build.score < 0:
+                    # Only rescore builds that look like the negative
+                    # copy archive modifier has been applied.
+                    logging.info('Rescoring %s' % build.title)
+                    # This should make them considered like a regular build.
+                    build.rescore(
+                        score=build.score + COPY_ARCHIVE_SCORE_PENALTY)
+                else:
+                    logging.debug('%s already rescored', build.title)
+
+                # If the score was already above 0, it was probably
+                # rescored already, count it against our limit anyway.
+                builds_to_rescore -= 1
+
+        # Reschedule in a while.
+        logging.debug('Sleeping for %d minutes.', SCHEDULE_PERIOD)
+        time.sleep(SCHEDULE_PERIOD * 60)
+
+
+if __name__ == '__main__':
+    main(sys.argv)
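
One detail in the option parsing above may be worth a second look: optparse's 'append' action calls .append() on whatever value the option already holds, and a tuple default has no such method, so passing -p explicitly would likely raise AttributeError. A minimal sketch of one way around it, using a None default that is filled in after parsing; the names parse_processors and DEFAULT_PROCESSORS are hypothetical and not part of the merged script:

```python
from optparse import OptionParser

DEFAULT_PROCESSORS = ['i386', 'amd64']  # hypothetical constant


def parse_processors(argv):
    """Parse -p/--processor flags, falling back to the defaults."""
    parser = OptionParser()
    # With default=None, optparse starts a fresh list on the first -p.
    parser.add_option(
        '-p', '--processor', default=None, action='append',
        dest='processors')
    options, _args = parser.parse_args(argv)
    if options.processors is None:
        # No -p given: use a copy so the defaults are never mutated.
        options.processors = list(DEFAULT_PROCESSORS)
    return options.processors


if __name__ == '__main__':
    print(parse_processors([]))
    print(parse_processors(['-p', 'armel']))
```

With this approach an explicit -p replaces the defaults rather than adding to them; whether that is the intended behaviour is a design choice the proposal does not settle.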
