Merge ~nacc/git-ubuntu:modernize-scripts-v2 into git-ubuntu:master
Status: Merged
Approved by: Nish Aravamudan
Approved revision: 9ed2e8a2ac18665cbc7d966a74b1cf0deee32680
Merged at revision: af30fd0a33d1ea36b514420c5d8867dbfb2d7160
Proposed branch: ~nacc/git-ubuntu:modernize-scripts-v2
Merge into: git-ubuntu:master
Prerequisite: ~nacc/git-ubuntu:lp1730734-cache-importer-progress
Diff against target: 1613 lines (+1122/-290), 10 files modified
- dev/null (+0/-199)
- gitubuntu/importer.py (+3/-50)
- gitubuntu/source-package-blacklist.txt (+7/-0)
- gitubuntu/source-package-whitelist.txt (+0/-7)
- gitubuntu/source_information.py (+2/-16)
- man/man1/git-ubuntu-import.1 (+2/-18)
- scripts/import-source-packages.py (+377/-0)
- scripts/scriptutils.py (+191/-0)
- scripts/source-package-walker.py (+272/-0)
- scripts/update-repository-alias.py (+268/-0)
Related bugs:

Reviewer | Review Type | Status
---|---|---
Robie Basak | | Approve
Server Team CI bot | continuous-integration | Approve

Review via email: mp+333500@code.launchpad.net
This proposal supersedes a proposal from 2017-11-01.
Commit message
Description of the change
Make Jenkins happy.
Server Team CI bot (server-team-bot) wrote: Posted in a previous version of this proposal
Robie Basak (racb) wrote: Posted in a previous version of this proposal
Why not put all the code into the gitubuntu module, with thin wrappers in bin/, like git-ubuntu, rather than have a separate infrastructure in scripts/ with a different additional sys.path hack? Or is there some kind of snap-related difficulty with this?
Robie Basak (racb) wrote: Posted in a previous version of this proposal
If we're going with pylint and treating these as new, then none of this is currently pylint clean. I'm not saying it necessarily should be. What's your opinion?
Nish Aravamudan (nacc) wrote: Posted in a previous version of this proposal
On Tue, Nov 7, 2017 at 6:24 AM, Robie Basak <email address hidden> wrote:
> Why not put all the code into the gitubuntu module, with thin wrappers in bin/, like git-ubuntu,
I can do this, but I don't want these scripts in the snap. Every
default change would imply a new snap version, which forces everyone
to download a new update. While the xdelta gunk from the store should
make that minimal, in practice it does not.
> rather than have a separate infrastructure in scripts/ with a different additional sys.path hack?
I can move it to bin/, but it was mostly for keeping the code isolated
and clean. It's not something everyone is going to need to run. I
realize, though, that the alias script, at least, should maybe be in
the snap.
Robie Basak (racb) wrote: Posted in a previous version of this proposal
On Tue, Nov 07, 2017 at 04:04:53PM -0000, Nish Aravamudan wrote:
> On Tue, Nov 7, 2017 at 6:24 AM, Robie Basak <email address hidden> wrote:
> > Why not put all the code into the gitubuntu module, with thin wrappers in bin/, like git-ubuntu,
>
> I can do this, but I don't want these scripts in the snap. Every
> default change would imply a new snap version, which forces everyone
> to download a new update. While the xdelta gunk from the store should
> make that minimal, in practice it does not.
Is this because edge is automatically generated? Could we start using
"beta" instead, and push to beta only when there's some other real
change? Or does this make other pain worse?
Nish Aravamudan (nacc) wrote: Posted in a previous version of this proposal
On Tue, Nov 7, 2017 at 6:28 AM, Robie Basak <email address hidden> wrote:
> If we're going with pylint and treating these as new, then none of this is currently pylint clean. I'm not saying it necessarily should be. What's your opinion?
I'm fine with targeting clean runs, but we need a pylintrc, I think,
in order to have it agree with our formatting (4-space indentation
throughout). And I think the 'too many arguments' and 'too many
variables' rules are dumb, but I'll read why they exist first. I
pushed a few cleanups that I had missed previously.
Nish Aravamudan (nacc) wrote: Posted in a previous version of this proposal
On Tue, Nov 7, 2017 at 8:12 AM, Robie Basak <email address hidden> wrote:
> On Tue, Nov 07, 2017 at 04:04:53PM -0000, Nish Aravamudan wrote:
>> On Tue, Nov 7, 2017 at 6:24 AM, Robie Basak <email address hidden> wrote:
>> > Why not put all the code into the gitubuntu module, with thin wrappers in bin/, like git-ubuntu,
>>
>> I can do this, but I don't want these scripts in the snap. Every
>> default change would imply a new snap version, which forces everyone
>> to download a new update. While the xdelta gunk from the store should
>> make that minimal, in practice it does not.
>
> Is this because edge is automatically generated? Could we start using
> "beta" instead, and push to beta only when there's some other real
> change? Or does this make other pain worse?
Well, and also, these are not really part of the application, they are
part of the wrapper of the application.
Adding another branch for the beta channel would be fine, but then we
have even more management to deal with (merge from edge to beta, merge
from beta to stable).
Server Team CI bot (server-team-bot) wrote: Posted in a previous version of this proposal
PASSED: Continuous integration, rev:5b794876d4e
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Style Check
SUCCESS: Unit Tests
SUCCESS: Integration Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Robie Basak (racb) wrote: Posted in a previous version of this proposal
Looks good in general! I like the multiprocessing approach.
See my review branch for comments that are easier to describe in code. Individual explanations are in individual commit messages: https:/
I've left a couple of inline comments.
Some other general comments:
I'm not keen on the return spec of import_srcpkg. It seems rather arbitrary and unnatural, and it's not even convenient, as you still have to massage it in the caller. That massaging is also duplicated in the two callers. I suggest you change the return spec to (pkgname, success_bool) instead. That would save you from having to zip it back up in the caller.
If you still want a (success_list, fail_list) tuple to process further up the stack more easily, that can't be done by changing the return type of import_srcpkg further, so you'd still end up with some duplication. To fix that (if you still want success_list, fail_list), I suggest wrapping import_srcpkg and putting the wrapper in scriptutils. The wrapper could run the multiprocessing pool, call the real import_srcpkg, and generate the (success_list, fail_list) result.
An example of getting from a (pkgname, success_bool) list to a (success_list, fail_list):
return (
    [pkg for pkg, success in results if success],
    [pkg for pkg, success in results if not success],
)
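The suggested wrapper can be sketched as follows. This is a hypothetical illustration, not the branch's actual code: import_srcpkg here is a trivial stand-in for the real importer, and a thread pool is used for brevity where the real scripts use a multiprocessing pool.

```python
from concurrent.futures import ThreadPoolExecutor


def import_srcpkg(pkgname):
    """Stand-in for the real importer: returns (pkgname, success_bool).

    Illustrative only -- here, names starting with 'fail-' "fail".
    """
    return (pkgname, not pkgname.startswith('fail-'))


def import_srcpkgs(pkgnames, num_workers=4):
    """Fan imports out over a worker pool, then split the
    (pkgname, success_bool) results into (success_list, fail_list)."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(import_srcpkg, pkgnames))
    return (
        [pkg for pkg, success in results if success],
        [pkg for pkg, success in results if not success],
    )
```

With this shape, both callers share one wrapper and neither has to re-zip the results.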
General comments on source-
Parsing Sources.xz alarmed me; perhaps add a comment that it's a cjwatson suggestion because the API way will be very slow and there's no better API for it :)
Can we make the components a list, because we'll be adding more later? Otherwise there's code duplication already due to the hardcoding like this.
We also need to import from other pockets and other series (at least the supported ones). Eg. source packages that have been deleted before this cycle, and source packages that were added to older stable releases in the other pockets directly such as HWE stacks.
Please parse using debian.deb822 or similar:
10:42 <rbasak> While you're here (I'm review nacc's MP), any opinion on the actual parsing, as implemented in Python? nacc is doing some Python-based parsing of the Sources file. Which feels ugly, but shelling out to grep-dctrl would also be ugly.
10:42 <rbasak> What would you do?
10:42 <cjwatson> rbasak: I'd use python-debian
10:42 <cjwatson> The stuff in debian.deb822 is generally fine for this
10:43 <rbasak> Thanks!
10:43 <cjwatson> rbasak: (python-apt is also fine; python-debian sometimes makes use of that for speed. Use whichever interface is more comfortable.)
10:44 <cjwatson> rbasak: germinate uses python-apt, so possibly I preferred that at one point. I think when I wrote that python-debian was significantly less good.
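As a rough illustration of the paragraph parsing a Sources file needs, here is a minimal stdlib sketch. The real code should use debian.deb822 as recommended above; this toy parser and the SAMPLE data are illustrative only and skip edge cases (comments, folded fields) that python-debian handles.

```python
def iter_source_paragraphs(text):
    """Yield one dict of field -> value per blank-line-separated
    paragraph of deb822-style data. Simplified sketch only."""
    para, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # Blank line terminates the current paragraph.
            if para:
                yield para
            para, key = {}, None
        elif line[0] in ' \t' and key:
            # Continuation line belongs to the previous field.
            para[key] += '\n' + line.strip()
        else:
            key, _, value = line.partition(':')
            para[key] = value.strip()
    if para:
        yield para


# Hypothetical sample data in the Sources format.
SAMPLE = """\
Package: bash
Version: 4.4-5ubuntu1
Section: shells

Package: lxd
Version: 2.18-0ubuntu6
Section: universe/admin
"""
```

In practice debian.deb822.Sources.iter_paragraphs() replaces all of this.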
Robie Basak (racb) wrote: Posted in a previous version of this proposal
PS. all my suggestions are entirely untested :)
Server Team CI bot (server-team-bot) wrote:
PASSED: Continuous integration, rev:56ea91693c5
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Style Check
SUCCESS: Unit Tests
SUCCESS: Integration Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Server Team CI bot (server-team-bot) wrote:
PASSED: Continuous integration, rev:325412302b1
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Style Check
SUCCESS: Unit Tests
SUCCESS: Integration Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Server Team CI bot (server-team-bot) wrote:
PASSED: Continuous integration, rev:9ed2e8a2ac1
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Style Check
SUCCESS: Unit Tests
SUCCESS: Integration Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Robie Basak (racb) wrote:
import_
Nish Aravamudan (nacc) wrote:
On Mon, Nov 13, 2017 at 11:36 PM, Robie Basak <email address hidden> wrote:
> Review: Approve
>
> import_
Responses inline.
>
> Diff comments:
>
>> diff --git a/scripts/
>> new file mode 100644
>> index 0000000..c89507e
>> --- /dev/null
>> +++ b/scripts/
>> @@ -0,0 +1,191 @@
>> +from collections import namedtuple
>> +import functools
>> +import hashlib
>> +import multiprocessing
>> +import os
>> +import sys
>> +import subprocess
>> +import time
>> +
>> +import pkg_resources
>> +
>> +# We expect to be running from a git repository in master for this
>> +# script, because the snap's python code is not exposed except within
>> +# the snap
>> +try:
>> + REALPATH = os.readlink(
>> +except OSError:
>> + REALPATH = __file__
>> +sys.path.insert(
>> + 0,
>> + os.path.abspath(
>> + os.path.
>> + )
>> +)
>> +
>> +from gitubuntu.run import run
>> +
>> +Defaults = namedtuple(
>> + 'Defaults',
>> + [
>> + 'num_workers',
>> + 'whitelist',
>> + 'blacklist',
>> + 'phasing_universe',
>> + 'phasing_main',
>> + 'dry_run',
>> + 'use_whitelist',
>> + ],
>> +)
>> +
>> +DEFAULTS = Defaults(
>> + num_workers=10,
>> + whitelist=
>> + 'gitubuntu',
>> + 'source-
>> + ),
>> + blacklist=
>> + 'gitubuntu',
>> + 'source-
>> + ),
>> + phasing_universe=0,
>> + phasing_main=1,
>> + dry_run=False,
>> + use_whitelist=True,
>> +)
>> +
>> +
>> +def should_
>> + pkgname,
>> + component,
>> + whitelist,
>> + blacklist,
>> + phasing_main,
>> + phasing_universe,
>> +):
>> + """should_
>> +
>> + The phasing is implemented similarly to update-manager. If the
>> + md5sum of the source package name is less than the (appropriate
>> + percentage * 2^128) (the maximum representable md5sum), the source
>> + package name is in the appropriate phasing set.
>> +
>> + Arguments:
>> + pkgname - string name of a source package
>> + component - string archive component of @pgkname
>> + whitelist - a list of of packages to always import
>> + blacklist - a list of packages to never import
>> + phasing_main - a integer percentage of all packages in main to import
>> + phasing_universe - a integer percentage of all packages in universe to import
>> +
>> + Returns:
>> + True if @pkgname should be imported, False if not.
>> + """
>> + if pkgname in blacklist:
>> + return False
>> + if pkgname in whitelist:
>> + return True
>> + md5sum = int(
>> + hashlib.md5(
>> + pkgname.
>> + ).hexdigest(),
>> + 16,
>> + )
>> + if component == 'main':
>> + if md5sum <= (phasing_main / 100) * (2**128):
>> + return Tru...
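The phasing check described in the docstring above (the quoted diff is truncated) can be sketched in isolation. in_phasing_set is an illustrative name, not the branch's actual helper; it shows the update-manager-style mechanism: the md5 of the package name, read as a 128-bit integer, is compared against the matching fraction of the maximum representable md5sum.

```python
import hashlib


def in_phasing_set(pkgname, phasing_percentage):
    """Return True if pkgname falls within phasing_percentage.

    The md5 of the package name, taken as a 128-bit integer, is
    deterministic, so a given package is stably in or out of the
    set for a given percentage.
    """
    md5sum = int(hashlib.md5(pkgname.encode('utf-8')).hexdigest(), 16)
    return md5sum <= (phasing_percentage / 100) * (2 ** 128)
```

At 100% every package passes; at 0% effectively none do; intermediate percentages select a stable pseudo-random subset.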
Robie Basak (racb) wrote:
On Tue, Nov 14, 2017 at 03:30:56PM -0000, Nish Aravamudan wrote:
> On Mon, Nov 13, 2017 at 11:36 PM, Robie Basak <email address hidden> wrote:
> Heh, and I apparently have an aversion to the below method. In C, the
> following can result in less efficient code (iirc), on some
> architectures.
>
> I'm happy to make your change, though.
IMHO, extra state is more error-prone, and so are bindings that get
redefined (ie. "variables" that change value). OTOH, Python is already
pretty non-performant, and we don't care much for that, in favour of
readability and less-error-
performance issue, or if there are two equivalent ways of doing things
and one is more performant without harming anything else.
> > Style: no trailing comma
>
> I find this a bit confusing, if I had done
>
> pocket_suffixes = [
> '-proposed',
> '-updates',
> '-security',
> ''
> ]
>
> I think our style would have indicated it was wrong because it was missing
> a trailing comma. Why does it matter if it's inline versus multiline for the
> style rule to be applied? Can this be clarified in the style document (it also
> means moving from inline to multi-line is more prone to missing this, IMO).
Python convention is to use [x, y, z] (with no other whitespace or
commas) AIUI. I'm not sure about multi-line, but Python allows the
trailing comma specifically (AIUI) to allow the style of not having to
edit the previous line to add a new one. So in my head, the trailing
comma only applies to the multi-line case.
I'm interested to know what others on the team think. But yeah, I'll
happily edit the style document if you agree.
Preview Diff
1 | diff --git a/bin/import-cron b/bin/import-cron |
2 | deleted file mode 100755 |
3 | index 0044b74..0000000 |
4 | --- a/bin/import-cron |
5 | +++ /dev/null |
6 | @@ -1,199 +0,0 @@ |
7 | -#!/usr/bin/env python3 |
8 | - |
9 | -from copy import copy |
10 | -import os |
11 | -import subprocess |
12 | -import sys |
13 | -import tempfile |
14 | -import time |
15 | -# we know that our relative path doesn't include the module |
16 | -try: |
17 | - realpath = os.readlink(__file__) |
18 | -except OSError: |
19 | - realpath = __file__ |
20 | -sys.path.insert( |
21 | - 0, |
22 | - os.path.abspath( |
23 | - os.path.join(os.path.dirname(realpath), os.path.pardir) |
24 | - ) |
25 | -) |
26 | -from gitubuntu.source_information import launchpad_login |
27 | -from gitubuntu.run import run |
28 | -try: |
29 | - pkg = 'python3-pkg-resources' |
30 | - import pkg_resources |
31 | -except ImportError: |
32 | - logging.error('Is %s installed?', pkg) |
33 | - sys.exit(1) |
34 | - |
35 | -# we might want this to be contentfully stored in the importer's git |
36 | -# repository? |
37 | -import_scan_timestamps = dict() |
38 | -import_scan_timestamps['debian'] = time.time() - (24 * 60 * 60) |
39 | -import_scan_timestamps['ubuntu'] = time.time() - (24 * 60 * 60) |
40 | -import_scan_links = dict() |
41 | -import_scan_links['debian'] = None |
42 | -import_scan_links['ubuntu'] = None |
43 | - |
44 | -import_cron_log = os.path.join(tempfile.gettempdir(), 'import-cron-log') |
45 | -import_cron_packages = pkg_resources.resource_filename('gitubuntu', |
46 | - 'import-cron-packages.txt') |
47 | - |
48 | -def write_timestamps(): |
49 | - global import_scan_timestamps |
50 | - global import_scan_links |
51 | - |
52 | - with open(import_cron_log, 'w+') as f: |
53 | - for dist_name in ['debian', 'ubuntu']: |
54 | - f.write('%s timestamp: %f\n' % (dist_name, import_scan_timestamps[dist_name])) |
55 | - f.write('%s link: %s\n' % (dist_name, import_scan_links[dist_name])) |
56 | - |
57 | -def read_timestamps(): |
58 | - global import_scan_timestamps |
59 | - global import_scan_links |
60 | - |
61 | - # This could be more defensive |
62 | - try: |
63 | - with open(import_cron_log, 'r') as f: |
64 | - for line in f: |
65 | - dist_name, log_type, value = line.split() |
66 | - if 'timestamp' in log_type: |
67 | - import_scan_timestamps[dist_name] = float(value) |
68 | - elif 'link' in log_type: |
69 | - import_scan_links[dist_name] = value |
70 | - except IOError: |
71 | - pass |
72 | - |
73 | -def update_timestamp(dist_name, spphr): |
74 | - global import_scan_timestamps |
75 | - global import_scan_links |
76 | - |
77 | - if spphr.date_published: |
78 | - timestamp = spphr.date_published.timestamp() |
79 | - link = spphr.self_link |
80 | - import_scan_timestamps[dist_name] = timestamp |
81 | - import_scan_links[dist_name] = str(link) |
82 | - return True |
83 | - return False |
84 | - |
85 | -def import_new_published_sources(launchpad, packages, imported_srcpkgs, failed_srcpkgs): |
86 | - global import_scan_timestamps |
87 | - global import_scan_links |
88 | - args = {'order_by_date':True} |
89 | - |
90 | - for dist_name in ['debian', 'ubuntu']: |
91 | - print('Examining publishes in %s since %f' % (dist_name, import_scan_timestamps[dist_name])) |
92 | - dist = launchpad.distributions[dist_name] |
93 | - spph = dist.main_archive.getPublishedSources(**args) |
94 | - if len(spph) == 0: |
95 | - print('No publishing data found in %s' % dist_name) |
96 | - continue |
97 | - if import_scan_timestamps[dist_name]: |
98 | - _spph = list() |
99 | - for spphr in spph: |
100 | - # only check if we should stop iterating if there is a |
101 | - # timestamp to compare to |
102 | - if spphr.date_published: |
103 | - # stop iterating (backwards chronologically) when we see |
104 | - # a publish timestamp before the last cron run |
105 | - if spphr.date_published.timestamp() < import_scan_timestamps[dist_name]: |
106 | - break |
107 | - if spphr.source_package_name not in packages: |
108 | - continue |
109 | - _spph.append(spphr) |
110 | - # if no packages in our whitelist have been updated in the |
111 | - # scan window, then bump the last scan date manually to the |
112 | - # last valid timestamp in the window |
113 | - if len(_spph) == 0: |
114 | - print('No new relevant publishes found in %s relative to %f' % (dist_name, import_scan_timestamps[dist_name])) |
115 | - for spphr in spph: |
116 | - if update_timestamp(dist_name, spphr): |
117 | - break |
118 | - spph = _spph |
119 | - spph_iter = reversed(spph) |
120 | - caught_up = False |
121 | - for spphr in spph_iter: |
122 | - # find the matching upload to the last publish seen |
123 | - if not caught_up: |
124 | - if not import_scan_links[dist_name] or str(spphr.self_link) == import_scan_links[dist_name]: |
125 | - caught_up = True |
126 | - continue |
127 | - |
128 | - if spphr.source_package_name in failed_srcpkgs: |
129 | - update_timestamp(dist_name, spphr) |
130 | - continue |
131 | - |
132 | - # try up to 3 times before declaring failure, in case of |
133 | - # racing with the publisher/transient download failure |
134 | - success = False |
135 | - for i in range(3): |
136 | - if spphr.source_package_name in imported_srcpkgs: |
137 | - success = True |
138 | - break |
139 | - print('git ubuntu import -l usd-importer-bot %s' % spphr.source_package_name) |
140 | - try: |
141 | - run(['git', 'ubuntu', 'import', '-l', 'usd-importer-bot', spphr.source_package_name], check=True) |
142 | - imported_srcpkgs.append(spphr.source_package_name) |
143 | - except subprocess.CalledProcessError: |
144 | - print('failed to import %s (attempt %d/3)' % (spphr.source_package_name, i+1)) |
145 | - time.sleep(10) |
146 | - continue |
147 | - |
148 | - if not success: |
149 | - print('failed to import %s' % spphr.source_package_name) |
150 | - failed_srcpkgs.append(spphr.source_package_name) |
151 | - # we want to bump the timestamp for every source package |
152 | - # published, regardless of whether we have already run the |
153 | - # importer for it in this cron run, since it might be published |
154 | - # multiple times (e.g., different series for an SRU). |
155 | - update_timestamp(dist_name, spphr) |
156 | - return (imported_srcpkgs, failed_srcpkgs) |
157 | - |
158 | -def main(): |
159 | - try: |
160 | - run(['git', 'config', '--global', 'user.name'], check=True) |
161 | - except subprocess.CalledProcessError: |
162 | - run(['git', 'config', '--global', 'user.name', 'Ubuntu Git Importer']) |
163 | - try: |
164 | - run(['git', 'config', '--global', 'user.email'], check=True) |
165 | - except subprocess.CalledProcessError: |
166 | - run(['git', 'config', '--global', 'user.email', 'usd-importer-announce@list.canonical.com']) |
167 | - try: |
168 | - with open(import_cron_packages, 'r') as f: |
169 | - packages = [line.strip() for line in f if not line.startswith('#')] |
170 | - except: |
171 | - packages = list() |
172 | - read_timestamps() |
173 | - # minutes |
174 | - sleep_interval = 20 |
175 | - # hours |
176 | - mail_interval = 1 |
177 | - launchpad = launchpad_login() |
178 | - imported_srcpkgs = list() |
179 | - failed_srcpkgs = list() |
180 | - last_mail_timestamp = time.time() |
181 | - while True: |
182 | - if (time.time() - last_mail_timestamp >= (mail_interval * 60 * 60) and |
183 | - (len(imported_srcpkgs) != 0 or len(failed_srcpkgs) != 0) |
184 | - ): |
185 | - i = b'Subject: Importer report\n' |
186 | - if len(imported_srcpkgs) != 0: |
187 | - i += b'Successfully imported the following source packages\n' + b'\n'.join(map(lambda x : x.encode('utf-8'), imported_srcpkgs)) |
188 | - if len(failed_srcpkgs) != 0: |
189 | - i += b'\nFailed to import the following source packages\n' + b'\n'.join(map(lambda x : x.encode('utf-8'), failed_srcpkgs)) |
190 | - run(['sendmail', '-F', 'Ubuntu Git Importer', '-f', 'usd-importer-do-not-mail@canonical.com', 'usd-import-announce@lists.canonical.com'], input=i) |
191 | - last_mail_timestamp = time.time() |
192 | - orig_imported_srcpkgs = copy(imported_srcpkgs) |
193 | - imported_srcpkgs, failed_srcpkgs = import_new_published_sources(launchpad, packages, imported_srcpkgs, failed_srcpkgs) |
194 | - print('Imported %d source packages' % (len(imported_srcpkgs) - len(orig_imported_srcpkgs))) |
195 | - if len(imported_srcpkgs) == len(orig_imported_srcpkgs): |
196 | - write_timestamps() |
197 | - time.sleep(sleep_interval * 60) |
198 | - imported_srcpkgs = list() |
199 | - failed_srcpkgs = list() |
200 | - # I have seen transient network issues lead to timeouts and stalls |
201 | - # so reset the connection on each iteration |
202 | - launchpad = launchpad_login() |
203 | - |
204 | -if __name__ == '__main__': |
205 | - main() |
206 | diff --git a/gitubuntu/importer.py b/gitubuntu/importer.py |
207 | index fb5cd77..f7dfe92 100644 |
208 | --- a/gitubuntu/importer.py |
209 | +++ b/gitubuntu/importer.py |
210 | @@ -26,7 +26,6 @@ |
211 | |
212 | import argparse |
213 | import atexit |
214 | -import dbm |
215 | import functools |
216 | import getpass |
217 | import logging |
218 | @@ -151,7 +150,6 @@ def main( |
219 | parentfile=top_level_defaults.parentfile, |
220 | retries=top_level_defaults.retries, |
221 | retry_backoffs=top_level_defaults.retry_backoffs, |
222 | - db_cache_dir=None, |
223 | ): |
224 | """Main entry point to the importer |
225 | |
226 | @@ -180,9 +178,6 @@ def main( |
227 | @parentfile: string path to file specifying parent overrides |
228 | @retries: integer number of download retries to attempt |
229 | @retry_backoffs: list of backoff durations to use between retries |
230 | - @db_cache_dir: string fileystem directory containing 'ubuntu' and |
231 | - 'debian' dbm database files, which store the progress of prior |
232 | - importer runs |
233 | |
234 | If directory is None, a temporary directory is created and used. |
235 | |
236 | @@ -192,11 +187,6 @@ def main( |
237 | If dl_cache is None, CACHE_PATH in the local repository will be |
238 | used. |
239 | |
240 | - If db_cache_dir is None, no database lookups are performed which makes |
241 | - it possible that the import will attempt to re-import already |
242 | - imported publishes. This should be fine, although less efficient |
243 | - than possible. |
244 | - |
245 | Returns 0 on successful import (which includes non-fatal failures); |
246 | 1 otherwise. |
247 | """ |
248 | @@ -324,9 +314,6 @@ def main( |
249 | else: |
250 | workdir = dl_cache |
251 | |
252 | - if db_cache_dir: |
253 | - os.makedirs(db_cache_dir, exist_ok=True) |
254 | - |
255 | os.makedirs(workdir, exist_ok=True) |
256 | |
257 | # now sets a global _PARENT_OVERRIDES |
258 | @@ -339,7 +326,6 @@ def main( |
259 | patches_applied=False, |
260 | debian_head_versions=debian_head_versions, |
261 | ubuntu_head_versions=ubuntu_head_versions, |
262 | - db_cache_dir=db_cache_dir, |
263 | debian_sinfo=debian_sinfo, |
264 | ubuntu_sinfo=ubuntu_sinfo, |
265 | active_series_only=active_series_only, |
266 | @@ -361,7 +347,6 @@ def main( |
267 | patches_applied=True, |
268 | debian_head_versions=applied_debian_head_versions, |
269 | ubuntu_head_versions=applied_ubuntu_head_versions, |
270 | - db_cache_dir=db_cache_dir, |
271 | debian_sinfo=debian_sinfo, |
272 | ubuntu_sinfo=ubuntu_sinfo, |
273 | active_series_only=active_series_only, |
274 | @@ -1367,7 +1352,6 @@ def import_publishes( |
275 | patches_applied, |
276 | debian_head_versions, |
277 | ubuntu_head_versions, |
278 | - db_cache_dir, |
279 | debian_sinfo, |
280 | ubuntu_sinfo, |
281 | active_series_only, |
282 | @@ -1378,8 +1362,6 @@ def import_publishes( |
283 | history_found = False |
284 | only_debian = False |
285 | srcpkg_information = None |
286 | - last_debian_spphr = None |
287 | - last_ubuntu_spphr = None |
288 | if patches_applied: |
289 | _namespace = namespace |
290 | namespace = '%s/applied' % namespace |
291 | @@ -1395,28 +1377,15 @@ def import_publishes( |
292 | import_unapplied_spi, |
293 | skip_orig=skip_orig, |
294 | ) |
295 | - |
296 | - for distname, versions, dist_sinfo, last_spphr in ( |
297 | - ("debian", debian_head_versions, debian_sinfo, last_debian_spphr), |
298 | - ("ubuntu", ubuntu_head_versions, ubuntu_sinfo, last_ubuntu_spphr), |
299 | - ): |
300 | + for distname, versions, dist_sinfo in ( |
301 | + ("debian", debian_head_versions, debian_sinfo), |
302 | + ("ubuntu", ubuntu_head_versions, ubuntu_sinfo)): |
303 | if active_series_only and distname == "debian": |
304 | continue |
305 | - |
306 | - last_spphr = None |
307 | - if db_cache_dir: |
308 | - with dbm.open(os.path.join(db_cache_dir, distname), 'c') as cache: |
309 | - try: |
310 | - last_spphr = decode_binary(cache[pkgname]) |
311 | - except KeyError: |
312 | - pass |
313 | - |
314 | try: |
315 | - last_spi = None |
316 | for srcpkg_information in dist_sinfo.launchpad_versions_published_after( |
317 | versions, |
318 | namespace, |
319 | - last_spphr, |
320 | workdir=workdir, |
321 | active_series_only=active_series_only |
322 | ): |
323 | @@ -1427,11 +1396,6 @@ def import_publishes( |
324 | namespace=_namespace, |
325 | ubuntu_sinfo=ubuntu_sinfo, |
326 | ) |
327 | - last_spi = srcpkg_information |
328 | - if last_spi: |
329 | - if db_cache_dir: |
330 | - with dbm.open(os.path.join(db_cache_dir, distname), 'w') as db_cache: |
331 | - db_cache[pkgname] = str(last_spi.spphr) |
332 | except NoPublicationHistoryException: |
333 | logging.warning("No publication history found for %s in %s.", |
334 | pkgname, distname |
335 | @@ -1507,11 +1471,6 @@ def parse_args(subparsers=None, base_subparsers=None): |
336 | action='store_true', |
337 | help=argparse.SUPPRESS, |
338 | ) |
339 | - parser.add_argument( |
340 | - '--db-cache', |
341 | - type=str, |
342 | - help=argparse.SUPPRESS, |
343 | - ) |
344 | if not subparsers: |
345 | return parser.parse_args() |
346 | return 'import - %s' % kwargs['description'] |
347 | @@ -1540,11 +1499,6 @@ def cli_main(args): |
348 | except AttributeError: |
349 | dl_cache = None |
350 | |
351 | - try: |
352 | - db_cache = args.db_cache |
353 | - except AttributeError: |
354 | - db_cache = None |
355 | - |
356 | return main( |
357 | pkgname=args.package, |
358 | owner=args.lp_owner, |
359 | @@ -1564,5 +1518,4 @@ def cli_main(args): |
360 | parentfile=args.parentfile, |
361 | retries=args.retries, |
362 | retry_backoffs=args.retry_backoffs, |
363 | - db_cache_dir=args.db_cache, |
364 | ) |
365 | diff --git a/gitubuntu/source-package-blacklist.txt b/gitubuntu/source-package-blacklist.txt |
366 | new file mode 100644 |
367 | index 0000000..ac8aef9 |
368 | --- /dev/null |
369 | +++ b/gitubuntu/source-package-blacklist.txt |
370 | @@ -0,0 +1,7 @@ |
371 | +linux |
372 | +linux-base |
373 | +linux-firmware |
374 | +linux-meta |
375 | +lxc |
376 | +lxcfs |
377 | +lxd |
378 | diff --git a/gitubuntu/import-cron-packages.txt b/gitubuntu/source-package-whitelist.txt |
379 | index 74df660..61b6f35 100644 |
380 | --- a/gitubuntu/import-cron-packages.txt |
381 | +++ b/gitubuntu/source-package-whitelist.txt |
382 | @@ -435,11 +435,7 @@ libxml-security-java |
383 | libxml-xpath-perl |
384 | libxmu |
385 | libyaml |
386 | -#linux |
387 | linux-atm |
388 | -#linux-base |
389 | -#linux-firmware |
390 | -#linux-meta |
391 | lm-sensors |
392 | lockfile-progs |
393 | logcheck |
394 | @@ -454,9 +450,6 @@ ltrace |
395 | lua5.2 |
396 | lua-lpeg |
397 | lvm2 |
398 | -#lxc |
399 | -#lxcfs |
400 | -#lxd |
401 | lz4 |
402 | lzo2 |
403 | m2300w |
404 | diff --git a/gitubuntu/source_information.py b/gitubuntu/source_information.py |
405 | index 6399460..3fef4ec 100644 |
406 | --- a/gitubuntu/source_information.py |
407 | +++ b/gitubuntu/source_information.py |
408 | @@ -421,14 +421,7 @@ class GitUbuntuSourceInformation(object): |
409 | for srcpkg in spph: |
410 | yield self.get_corrected_spi(srcpkg, workdir) |
411 | |
412 | - def launchpad_versions_published_after( |
413 | - self, |
414 | - head_versions, |
415 | - namespace, |
416 | - last_spphr=None, |
417 | - workdir=None, |
418 | - active_series_only=False, |
419 | - ): |
420 | + def launchpad_versions_published_after(self, head_versions, namespace, workdir=None, active_series_only=False): |
421 | args = { |
422 | 'exact_match':True, |
423 | 'source_name':self.pkgname, |
424 | @@ -456,14 +449,7 @@ class GitUbuntuSourceInformation(object): |
425 | if len(spph) == 0: |
426 | raise NoPublicationHistoryException("Is %s published in %s?" % |
427 | (self.pkgname, self.dist_name)) |
428 | - if last_spphr: |
429 | - _spph = list() |
430 | - for spphr in spph: |
431 | - if str(spphr) == last_spphr: |
432 | - break |
433 | - _spph.append(spphr) |
434 | - spph = _spph |
435 | - elif head_versions: |
436 | + if len(head_versions) > 0: |
437 | _spph = list() |
438 | for spphr in spph: |
439 | spi = GitUbuntuSourcePackageInformation(spphr, self.dist_name, |
440 | diff --git a/man/man1/git-ubuntu-import.1 b/man/man1/git-ubuntu-import.1 |
441 | index 74bfe83..dd7fd9e 100644 |
442 | --- a/man/man1/git-ubuntu-import.1 |
443 | +++ b/man/man1/git-ubuntu-import.1 |
444 | @@ -1,4 +1,4 @@ |
445 | -.TH "GIT-UBUNTU-IMPORT" "1" "2017-11-08" "Git-Ubuntu 0.6.2" "Git-Ubuntu Manual" |
446 | +.TH "GIT-UBUNTU-IMPORT" "1" "2017-07-19" "Git-Ubuntu 0.2" "Git-Ubuntu Manual" |
447 | |
448 | .SH "NAME" |
449 | git-ubuntu import \- Import Launchpad publishing history to Git |
450 | @@ -9,8 +9,7 @@ git-ubuntu import \- Import Launchpad publishing history to Git |
451 | <user>] [\-\-dl-cache <dl_cache>] [\-\-no-fetch] [\-\-no-push] |
452 | [\-\-no-clean] [\-d | \-\-directory <directory>] |
453 | [\-\-active-series-only] [\-\-skip-applied] [\-\-skip-orig] |
454 | -[\-\-reimport] [\-\-allow-applied-failures] [\-\-db-cache <db_cache>] |
455 | -<package> |
456 | +[\-\-reimport] [\-\-allow-applied-failures] <package> |
457 | .FI |
458 | .SP |
459 | .SH "DESCRIPTION" |
460 | @@ -198,21 +197,6 @@ After investigation, this flag can be used to indicate the importer is |
461 | allowed to ignore such a failure\&. |
462 | .RE |
463 | .PP |
464 | -\-\-db-cache <db_cache> |
465 | -.RS 4 |
466 | -The path to a directory containing Python dbm database disk files for |
467 | -importer metadata\&. |
468 | -If \fB<db_cache>\fR does not exist, it will be created\&. |
469 | -Two files in \fB<db_cache>\fR are used, "ubuntu" and "debian', which are |
470 | -created if not already present\&. |
471 | -The cache files provide information to the importer about prior imports |
472 | -of \fB<package>\fR and which Launchpad publishing record was last |
473 | -imported\&. |
474 | -This is necessary because the imported Git repository does not |
475 | -necessarily maintain any metadata about Launchpad publishing |
476 | -information\&. |
477 | -.RE |
478 | -.PP |
479 | <package> |
480 | .RS 4 |
481 | The name of the source package to import\&. |
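The `--db-cache` text removed above describes per-distribution Python dbm files that map a package to the last-imported publishing record. A minimal standalone sketch of that kind of cache, using the stdlib `dbm` module (the key/value layout here is hypothetical, not the importer's actual schema):

```python
import dbm
import os
import tempfile

# one cache directory holding a dbm file per distribution
cache_dir = os.path.join(tempfile.mkdtemp(), 'db-cache')
os.makedirs(cache_dir, exist_ok=True)

# 'c' mode creates the database file if not already present
with dbm.open(os.path.join(cache_dir, 'ubuntu'), 'c') as db:
    # hypothetical mapping: package name -> last-imported publishing record
    db[b'hello'] = b'https://api.launchpad.net/devel/.../+sourcepub/12345'

# a later run reopens the file read-only to find where it left off
with dbm.open(os.path.join(cache_dir, 'ubuntu'), 'r') as db:
    last_import = db[b'hello'].decode()
```

This matters because, as the removed text notes, the imported Git repository does not necessarily carry Launchpad publishing metadata, so the importer needs external state to resume.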
482 | diff --git a/scripts/import-source-packages.py b/scripts/import-source-packages.py |
483 | new file mode 100755 |
484 | index 0000000..698ae35 |
485 | --- /dev/null |
486 | +++ b/scripts/import-source-packages.py |
487 | @@ -0,0 +1,377 @@ |
488 | +#!/usr/bin/env python3 |
489 | + |
490 | +# General design: |
491 | +# Infinite loop: |
492 | +# now = time() |
493 | +# let new_publishes be the set of unique srcpkg names in publishes between last run and now |
494 | +# for srcpkg in new_publishes: |
495 | +# If srcpkg in blacklist: skip |
496 | +# If srcpkg not in PHASING_{component}: skip |
497 | +# try to import srcpkg |
498 | +# Report on successful and failed imports |
499 | +# If no srcpkgs: sleep for some time to let publisher run |
500 | + |
501 | +import argparse |
502 | +import collections |
503 | +import datetime |
504 | +import os |
505 | +import sys |
506 | +import tempfile |
507 | +import time |
508 | + |
509 | +# We expect to be running from a git repository in master for this |
510 | +# script, because the snap's python code is not exposed except within |
511 | +# the snap |
512 | +try: |
513 | + REALPATH = os.readlink(__file__) |
514 | +except OSError: |
515 | + REALPATH = __file__ |
516 | +sys.path.insert( |
517 | + 0, |
518 | + os.path.abspath( |
519 | + os.path.join(os.path.dirname(REALPATH), os.path.pardir) |
520 | + ) |
521 | +) |
522 | + |
523 | +from gitubuntu.source_information import launchpad_login |
524 | +from gitubuntu.run import run |
525 | +import scriptutils |
526 | + |
527 | +# The 'time' attribute is the publication date, as a timestamp, of the SPPHR |
528 | +# corresponding to the URL stored in the 'link' attribute. |
529 | +Timestamp = collections.namedtuple('Timestamp', ['time', 'link']) |
530 | + |
531 | +LOG_PATH = os.path.join(tempfile.gettempdir(), 'import-source-packages-log') |
532 | + |
533 | +def import_new_published_sources( |
534 | + num_workers, |
535 | + whitelist, |
536 | + blacklist, |
537 | + phasing_main, |
538 | + phasing_universe, |
539 | + dry_run, |
540 | +): |
541 | + """import_new_published_source - Import all new publishes since a prior execution |
542 | + |
543 | + Arguments: |
544 | + num_workers - integer number of worker processes to use |
545 | + whitelist - a list of packages to always import
546 | + blacklist - a list of packages to never import
547 | + phasing_main - an integer percentage of all packages in main to import
548 | + phasing_universe - an integer percentage of all packages in universe to import
549 | + dry_run - a boolean to indicate a dry-run operation |
550 | + |
551 | + Returns: |
552 | + A tuple of two lists, the first containing the names of all |
553 | + successfully imported source packages, the second containing the |
554 | + names of all source packages that failed to import. |
555 | + """ |
556 | + timestamps = read_timestamps() |
557 | + launchpad = launchpad_login() |
558 | + |
559 | + # filtered_pkgnames is the list of source package names across all |
560 | + # distributions we will want to process, based upon |
561 | + # scriptutils.should_import_srcpkg |
562 | + filtered_pkgnames = set() |
563 | + |
564 | + for dist_name in ['debian', 'ubuntu']: |
565 | + timestamp = timestamps[dist_name] |
566 | + |
567 | + # dist_newest_spphr is the most recent publication record with a |
568 | + # valid published date in the distribution |
569 | + dist_newest_spphr = None |
570 | + |
571 | + # dist_filtered_pkgnames is the set of source package names in |
572 | + # the dist_name distribution that qualify for import according to our |
573 | + # whitelists, blacklists and phasing requirements |
574 | + dist_filtered_pkgnames = set() |
575 | + |
576 | + print( |
577 | + "Examining publishes in %s since %s" % ( |
578 | + dist_name, |
579 | + datetime.datetime.fromtimestamp( |
580 | + timestamp.time, |
581 | + ).strftime("%Y-%m-%d %H:%M:%S"), |
582 | + ) |
583 | + ) |
584 | + |
585 | + # spph is the raw publication history for a distribution from |
586 | + # Launchpad, sorted by publication date in reverse chronological order |
587 | + dist = launchpad.distributions[dist_name] |
588 | + spph = dist.main_archive.getPublishedSources(order_by_date=True) |
589 | + if not spph: |
590 | + print("No publishing data found in %s" % dist_name) |
591 | + continue |
592 | + |
593 | + for spphr in spph: |
594 | + # this publication record is the last one seen on the previous run
595 | + if str(spphr) == timestamp.link: |
596 | + break |
597 | + |
598 | + # only compare against the saved timestamp when the record
599 | + # has a published date, i.e. when the source package has
600 | + # actually been published.
601 | + if spphr.date_published: |
602 | + if not dist_newest_spphr: |
603 | + dist_newest_spphr = spphr |
604 | + # stop iterating (backwards chronologically) when we see |
605 | + # a publish timestamp before the last run |
606 | + if spphr.date_published.timestamp() < timestamp.time: |
607 | + break |
608 | + if scriptutils.should_import_srcpkg( |
609 | + spphr.source_package_name, |
610 | + spphr.component_name, |
611 | + whitelist, |
612 | + blacklist, |
613 | + phasing_main, |
614 | + phasing_universe, |
615 | + ): |
616 | + dist_filtered_pkgnames.add(spphr.source_package_name) |
617 | + |
618 | + if not dist_filtered_pkgnames: |
619 | + print( |
620 | + "No new relevant publishes found in %s relative to %s" % ( |
621 | + dist_name, |
622 | + datetime.datetime.fromtimestamp( |
623 | + timestamp.time |
624 | + ).strftime("%Y-%m-%d %H:%M:%S"), |
625 | + ) |
626 | + ) |
627 | + |
628 | + filtered_pkgnames = filtered_pkgnames | dist_filtered_pkgnames |
629 | + if dist_newest_spphr: |
630 | + # Update timestamp |
631 | + timestamps[dist_name] = timestamp = Timestamp( |
632 | + time=dist_newest_spphr.date_published.timestamp(), |
633 | + link=str(dist_newest_spphr), |
634 | + ) |
635 | + |
636 | + |
637 | + ret = scriptutils.pool_map_import_srcpkg( |
638 | + num_workers=num_workers, |
639 | + dry_run=dry_run, |
640 | + pkgnames=filtered_pkgnames, |
641 | + ) |
642 | + |
643 | + write_timestamps(timestamps) |
644 | + |
645 | + return ret |
646 | + |
647 | +def read_timestamps(): |
648 | + """read_timestamps - Read saved timestamp values from LOG_PATH |
649 | + |
650 | + If the log file is not readable (e.g., does not exist), the |
651 | + timestamps will be set to 24 hours before now. |
652 | + |
653 | + This method is symmetrical to write_timestamps. |
654 | + |
655 | + Returns: |
656 | + A dictionary with 'debian' and 'ubuntu' keys, each of which is a |
657 | + Timestamp namedtuple. |
658 | + """ |
659 | + try: |
660 | + timestamps = dict() |
661 | + with open(LOG_PATH, 'r') as log_file: |
662 | + for line in log_file: |
663 | + dist_name, log_type, value = line.split() |
664 | + if dist_name not in timestamps: |
665 | + timestamps[dist_name] = dict() |
666 | + assert log_type in ['time', 'link'] |
667 | + assert log_type not in timestamps[dist_name] |
668 | + if log_type == 'link': |
669 | + timestamps[dist_name][log_type] = value |
670 | + else: |
671 | + timestamps[dist_name][log_type] = float(value) |
672 | + return { |
673 | + dist: Timestamp(**dict_form) |
674 | + for dist, dict_form in timestamps.items()
675 | + } |
676 | + except IOError: |
677 | + _start = time.time() - (24 * 60 * 60) |
678 | + return dict( |
679 | + ubuntu=Timestamp(time=_start, link=None), |
680 | + debian=Timestamp(time=_start, link=None), |
681 | + ) |
682 | + |
683 | +def write_timestamps(timestamps): |
684 | + """write_timestamps - Write timestamp values to LOG_PATH |
685 | + |
686 | + Arguments: |
687 | + timestamps - a dictionary with 'debian' and 'ubuntu' keys, each of which is |
688 | + a Timestamp namedtuple. |
689 | + |
690 | + This method is symmetrical to read_timestamps. |
691 | + """ |
692 | + new_log = LOG_PATH + '.new'
693 | + with open(new_log, 'w+') as log_file: |
694 | + log_file.write( |
695 | + 'debian time %f\n' % |
696 | + timestamps['debian'].time |
697 | + ) |
698 | + log_file.write( |
699 | + 'debian link %s\n' % |
700 | + timestamps['debian'].link |
701 | + ) |
702 | + log_file.write( |
703 | + 'ubuntu time %f\n' % |
704 | + timestamps['ubuntu'].time |
705 | + ) |
706 | + log_file.write( |
707 | + 'ubuntu link %s\n' % |
708 | + timestamps['ubuntu'].link |
709 | + ) |
710 | + os.replace(new_log, LOG_PATH) |
711 | + |
712 | +def main( |
713 | + num_workers=scriptutils.DEFAULTS.num_workers, |
714 | + whitelist_path=scriptutils.DEFAULTS.whitelist, |
715 | + blacklist_path=scriptutils.DEFAULTS.blacklist, |
716 | + phasing_main=scriptutils.DEFAULTS.phasing_main, |
717 | + phasing_universe=scriptutils.DEFAULTS.phasing_universe, |
718 | + dry_run=scriptutils.DEFAULTS.dry_run, |
719 | +): |
720 | + """main - Main entry point to the script |
721 | + |
722 | + Arguments: |
723 | + num_workers - integer number of worker processes to use
724 | + whitelist_path - string filesystem path to a text file of packages |
725 | + to always import |
726 | + blacklist_path - string filesystem path to a text file of packages |
727 | + to never import |
728 | + phasing_main - an integer percentage of all packages in main to
729 | + import
730 | + phasing_universe - an integer percentage of all packages in universe
731 | + to import |
732 | + dry_run - a boolean to indicate a dry-run operation |
733 | + """ |
734 | + scriptutils.setup_git_config() |
735 | + |
736 | + try: |
737 | + with open(whitelist_path, 'r') as whitelist_file: |
738 | + whitelist = [ |
739 | + line.strip() for line in whitelist_file |
740 | + if not line.startswith('#') |
741 | + ] |
742 | + except (FileNotFoundError, IOError): |
743 | + whitelist = list() |
744 | + |
745 | + try: |
746 | + with open(blacklist_path, 'r') as blacklist_file: |
747 | + blacklist = [ |
748 | + line.strip() for line in blacklist_file |
749 | + if not line.startswith('#') |
750 | + ] |
751 | + except (FileNotFoundError, IOError): |
752 | + blacklist = list() |
753 | + |
754 | + sleep_interval_minutes = 20 |
755 | + mail_interval_hours = 1 |
756 | + |
757 | + # pretend we sent an e-mail recently |
758 | + last_mail_timestamp = time.time() |
759 | + mail_imported_srcpkgs = set() |
760 | + mail_failed_srcpkgs = set() |
761 | + |
762 | + while True: |
763 | + imported_srcpkgs, failed_srcpkgs = import_new_published_sources( |
764 | + num_workers, |
765 | + whitelist, |
766 | + blacklist, |
767 | + phasing_main, |
768 | + phasing_universe, |
769 | + dry_run, |
770 | + ) |
771 | + print("Imported %d source packages" % len(imported_srcpkgs)) |
772 | + mail_imported_srcpkgs |= set(imported_srcpkgs) |
773 | + mail_failed_srcpkgs |= set(failed_srcpkgs) |
774 | + secs_since_last_mail = time.time() - last_mail_timestamp |
775 | + if ( |
776 | + secs_since_last_mail >= (mail_interval_hours * 60 * 60) and |
777 | + (mail_imported_srcpkgs or mail_failed_srcpkgs) |
778 | + ): |
779 | + msg = b"Subject: Importer report\n" |
780 | + if mail_imported_srcpkgs: |
781 | + msg += b"Successfully imported the following source packages:\n" |
782 | + msg += b"\n".join( |
783 | + map(lambda x: x.encode('utf-8'), mail_imported_srcpkgs) |
784 | + ) |
785 | + if mail_failed_srcpkgs: |
786 | + msg += b"\nFailed to import the following source packages:\n" |
787 | + msg += b"\n".join( |
788 | + map(lambda x: x.encode('utf-8'), mail_failed_srcpkgs) |
789 | + ) |
790 | + if dry_run: |
791 | + print("Would send email with contents:\n%s" % msg.decode()) |
792 | + else: |
793 | + run( |
794 | + [ |
795 | + 'sendmail', |
796 | + '-F', 'Ubuntu Git Importer', |
797 | + '-f', 'usd-importer-do-not-mail@canonical.com', |
798 | + 'usd-import-announce@lists.canonical.com', |
799 | + ], |
800 | + input=msg, |
801 | + ) |
802 | + last_mail_timestamp = time.time() |
803 | + mail_imported_srcpkgs = set() |
804 | + mail_failed_srcpkgs = set() |
805 | + # if we have caught up to the publisher, go to sleep |
806 | + if not imported_srcpkgs: |
807 | + time.sleep(sleep_interval_minutes * 60) |
808 | + |
809 | +def cli_main(): |
810 | + """cli_main - CLI entry point to script |
811 | + """ |
812 | + parser = argparse.ArgumentParser( |
813 | + description='Script to import all source packages with phasing', |
814 | + ) |
815 | + parser.add_argument( |
816 | + '--num-workers', |
817 | + type=int, |
818 | + help="Number of worker threads to use", |
819 | + default=scriptutils.DEFAULTS.num_workers, |
820 | + ) |
821 | + parser.add_argument( |
822 | + '--whitelist', |
823 | + type=str, |
824 | + help="Path to whitelist file", |
825 | + default=scriptutils.DEFAULTS.whitelist, |
826 | + ) |
827 | + parser.add_argument( |
828 | + '--blacklist', |
829 | + type=str, |
830 | + help="Path to blacklist file", |
831 | + default=scriptutils.DEFAULTS.blacklist, |
832 | + ) |
833 | + parser.add_argument( |
834 | + '--phasing-universe', |
835 | + type=int, |
836 | + help="Percentage of universe packages to phase", |
837 | + default=scriptutils.DEFAULTS.phasing_universe, |
838 | + ) |
839 | + parser.add_argument( |
840 | + '--phasing-main', |
841 | + type=int, |
842 | + help="Percentage of main packages to phase", |
843 | + default=scriptutils.DEFAULTS.phasing_main, |
844 | + ) |
845 | + parser.add_argument( |
846 | + '--dry-run', |
847 | + action='store_true', |
848 | + help="Simulate operation but do not actually do anything", |
849 | + default=scriptutils.DEFAULTS.dry_run, |
850 | + ) |
851 | + |
852 | + args = parser.parse_args() |
853 | + |
854 | + main( |
855 | + num_workers=args.num_workers, |
856 | + whitelist_path=args.whitelist, |
857 | + blacklist_path=args.blacklist, |
858 | + phasing_main=args.phasing_main, |
859 | + phasing_universe=args.phasing_universe, |
860 | + dry_run=args.dry_run, |
861 | + ) |
862 | + |
863 | +if __name__ == '__main__': |
864 | + cli_main() |
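The log file that `read_timestamps` and `write_timestamps` exchange is a plain whitespace-separated `dist key value` format, written atomically via a sibling file and `os.replace`. A standalone sketch of the round trip (using a temporary path instead of the script's `LOG_PATH`):

```python
import collections
import os
import tempfile

Timestamp = collections.namedtuple('Timestamp', ['time', 'link'])

def write_timestamps(path, timestamps):
    # atomic update: write a sibling '.new' file, then rename over the original
    new_path = path + '.new'
    with open(new_path, 'w') as log_file:
        for dist, ts in sorted(timestamps.items()):
            log_file.write('%s time %f\n' % (dist, ts.time))
            log_file.write('%s link %s\n' % (dist, ts.link))
    os.replace(new_path, path)

def read_timestamps(path):
    raw = {}
    with open(path) as log_file:
        for line in log_file:
            dist, key, value = line.split()
            raw.setdefault(dist, {})[key] = float(value) if key == 'time' else value
    return {dist: Timestamp(**fields) for dist, fields in raw.items()}

path = os.path.join(tempfile.mkdtemp(), 'import-log')
write_timestamps(path, {'ubuntu': Timestamp(time=1.5, link='spphr-url')})
restored = read_timestamps(path)
```

Note that a `None` link would be written as the literal string `None` by this format; a robust reader would special-case that when restoring.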
865 | diff --git a/scripts/scriptutils.py b/scripts/scriptutils.py |
866 | new file mode 100644 |
867 | index 0000000..c89507e |
868 | --- /dev/null |
869 | +++ b/scripts/scriptutils.py |
870 | @@ -0,0 +1,191 @@ |
871 | +from collections import namedtuple |
872 | +import functools |
873 | +import hashlib |
874 | +import multiprocessing |
875 | +import os |
876 | +import sys |
877 | +import subprocess |
878 | +import time |
879 | + |
880 | +import pkg_resources |
881 | + |
882 | +# We expect to be running from a git repository in master for this |
883 | +# script, because the snap's python code is not exposed except within |
884 | +# the snap |
885 | +try: |
886 | + REALPATH = os.readlink(__file__) |
887 | +except OSError: |
888 | + REALPATH = __file__ |
889 | +sys.path.insert( |
890 | + 0, |
891 | + os.path.abspath( |
892 | + os.path.join(os.path.dirname(REALPATH), os.path.pardir) |
893 | + ) |
894 | +) |
895 | + |
896 | +from gitubuntu.run import run |
897 | + |
898 | +Defaults = namedtuple( |
899 | + 'Defaults', |
900 | + [ |
901 | + 'num_workers', |
902 | + 'whitelist', |
903 | + 'blacklist', |
904 | + 'phasing_universe', |
905 | + 'phasing_main', |
906 | + 'dry_run', |
907 | + 'use_whitelist', |
908 | + ], |
909 | +) |
910 | + |
911 | +DEFAULTS = Defaults( |
912 | + num_workers=10, |
913 | + whitelist=pkg_resources.resource_filename( |
914 | + 'gitubuntu', |
915 | + 'source-package-whitelist.txt', |
916 | + ), |
917 | + blacklist=pkg_resources.resource_filename( |
918 | + 'gitubuntu', |
919 | + 'source-package-blacklist.txt', |
920 | + ), |
921 | + phasing_universe=0, |
922 | + phasing_main=1, |
923 | + dry_run=False, |
924 | + use_whitelist=True, |
925 | +) |
926 | + |
927 | + |
928 | +def should_import_srcpkg( |
929 | + pkgname, |
930 | + component, |
931 | + whitelist, |
932 | + blacklist, |
933 | + phasing_main, |
934 | + phasing_universe, |
935 | +): |
936 | + """should_import_srcpkg - indicate if a given source package should be imported |
937 | + |
938 | + The phasing is implemented similarly to update-manager. If the |
939 | + md5sum of the source package name is less than the (appropriate |
940 | + percentage * 2^128) (the maximum representable md5sum), the source |
941 | + package name is in the appropriate phasing set. |
942 | + |
943 | + Arguments: |
944 | + pkgname - string name of a source package |
945 | + component - string archive component of @pkgname
946 | + whitelist - a list of packages to always import
947 | + blacklist - a list of packages to never import
948 | + phasing_main - an integer percentage of all packages in main to import
949 | + phasing_universe - an integer percentage of all packages in universe to import
950 | + |
951 | + Returns: |
952 | + True if @pkgname should be imported, False if not. |
953 | + """ |
954 | + if pkgname in blacklist: |
955 | + return False |
956 | + if pkgname in whitelist: |
957 | + return True |
958 | + md5sum = int( |
959 | + hashlib.md5( |
960 | + pkgname.encode('utf-8') |
961 | + ).hexdigest(), |
962 | + 16, |
963 | + ) |
964 | + if component == 'main': |
965 | + if md5sum <= (phasing_main / 100) * (2**128): |
966 | + return True |
967 | + elif component == 'universe': |
968 | + if md5sum <= (phasing_universe / 100) * (2**128): |
969 | + return True |
970 | + # skip partner and multiverse for now |
971 | + return False |
972 | + |
973 | + |
974 | +def import_srcpkg(pkgname, dry_run): |
975 | + """import_srcpkg - Invoke git ubuntu import on @pkgname |
976 | + |
977 | + Arguments: |
978 | + pkgname - string name of a source package |
979 | + dry_run - a boolean to indicate a dry-run operation |
980 | + |
981 | + Returns: |
982 | + A tuple of boolean and a string, where the boolean is the success or |
983 | + failure of the import and the string is the package name. |
984 | + """ |
985 | + ret = False |
986 | + |
987 | + # try up to 3 times before declaring failure, in case of |
988 | + # racing with the publisher finalizing files and/or |
989 | + # transient download failure |
990 | + |
991 | + for attempt in range(3): |
992 | + cmd = [ |
993 | + 'git', |
994 | + 'ubuntu', |
995 | + 'import', |
996 | + '-l', |
997 | + 'usd-importer-bot', |
998 | + pkgname, |
999 | + ] |
1000 | + try: |
1001 | + print(' '.join(cmd)) |
1002 | + if not dry_run: |
1003 | + run(cmd, check=True) |
1004 | + ret = True |
1005 | + break |
1006 | + except subprocess.CalledProcessError: |
1007 | + print( |
1008 | + "Failed to import %s (attempt %d/3)" % ( |
1009 | + pkgname, |
1010 | + attempt+1, |
1011 | + ) |
1012 | + ) |
1013 | + time.sleep(10) |
1014 | + |
1015 | + return pkgname, ret |
1016 | + |
1017 | +def setup_git_config( |
1018 | + name='Ubuntu Git Importer', |
1019 | + email='usd-importer-announce@lists.canonical.com', |
1020 | +): |
1021 | + """setup_git_config - Ensure global required Git configuration values are set |
1022 | + |
1023 | + Arguments: |
1024 | + name - string name to set as user.name in Git config |
1025 | + email - string email to set as user.email in Git config |
1026 | + """ |
1027 | + try: |
1028 | + run(['git', 'config', '--global', 'user.name'], check=True) |
1029 | + except subprocess.CalledProcessError: |
1030 | + run(['git', 'config', '--global', 'user.name', name]) |
1031 | + try: |
1032 | + run(['git', 'config', '--global', 'user.email'], check=True) |
1033 | + except subprocess.CalledProcessError: |
1034 | + run(['git', 'config', '--global', 'user.email', email]) |
1035 | + |
1036 | +def pool_map_import_srcpkg( |
1037 | + num_workers, |
1038 | + dry_run, |
1039 | + pkgnames, |
1040 | +): |
1041 | + """pool_map_import_srcpkg - Use a multiprocessing.Pool to parallel |
1042 | + import source packages |
1043 | + |
1044 | + Arguments: |
1045 | + num_workers - integer number of worker processes to use |
1046 | + dry_run - a boolean to indicate a dry-run operation |
1047 | + pkgnames - a list of string names of source packages |
1048 | + """ |
1049 | + with multiprocessing.Pool(processes=num_workers) as pool: |
1050 | + results = pool.map( |
1051 | + functools.partial( |
1052 | + import_srcpkg, |
1053 | + dry_run=dry_run, |
1054 | + ), |
1055 | + pkgnames, |
1056 | + ) |
1057 | + |
1058 | + return ( |
1059 | + [pkg for pkg, success in results if success], |
1060 | + [pkg for pkg, success in results if not success], |
1061 | + ) |
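The phasing rule in `should_import_srcpkg`'s docstring — a package is in-phase when the md5 of its name, read as a 128-bit integer, falls below the phasing percentage of 2^128 — can be sketched in isolation:

```python
import hashlib

def in_phase(pkgname, percentage):
    """Deterministically select ~percentage% of package names."""
    digest = int(hashlib.md5(pkgname.encode('utf-8')).hexdigest(), 16)
    # 2**128 is the maximum representable md5 value
    return digest <= (percentage / 100) * (2 ** 128)

# 0% admits (effectively) nothing, 100% admits everything, and
# membership is stable across runs for a given package name
assert in_phase('hello', 100)
assert not in_phase('hello', 0)
```

Because the hash depends only on the name, raising the percentage is monotonic: every package admitted at 1% stays admitted at 2%, which is what makes gradual ramp-up work (the same scheme update-manager uses, per the docstring).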
1062 | diff --git a/scripts/source-package-walker.py b/scripts/source-package-walker.py |
1063 | new file mode 100755 |
1064 | index 0000000..aaa8ef1 |
1065 | --- /dev/null |
1066 | +++ b/scripts/source-package-walker.py |
1067 | @@ -0,0 +1,272 @@ |
1068 | +#!/usr/bin/env python3 |
1069 | + |
1070 | +# General design: |
1071 | +# let publishes be the set of srcpkg names |
1072 | +# for srcpkg in publishes: |
1073 | +# If srcpkg in blacklist: skip |
1074 | +# If srcpkg not in PHASING_{component}: skip |
1075 | +# try to import srcpkg |
1076 | +# Report on successful and failed imports |
1077 | + |
1078 | +import argparse |
1079 | +import bz2 |
1080 | +import itertools |
1081 | +import gzip |
1082 | +import lzma |
1083 | +import os |
1084 | +import sys |
1085 | +import urllib.request |
1086 | + |
1087 | +import scriptutils |
1088 | + |
1089 | +from debian.deb822 import Sources |
1090 | + |
1091 | +# We expect to be running from a git repository in master for this |
1092 | +# script, because the snap's python code is not exposed except within |
1093 | +# the snap |
1094 | +try: |
1095 | + REALPATH = os.readlink(__file__) |
1096 | +except OSError: |
1097 | + REALPATH = __file__ |
1098 | +sys.path.insert( |
1099 | + 0, |
1100 | + os.path.abspath( |
1101 | + os.path.join(os.path.dirname(REALPATH), os.path.pardir) |
1102 | + ) |
1103 | +) |
1104 | + |
1105 | +from gitubuntu.source_information import GitUbuntuSourceInformation |
1106 | + |
1107 | +def import_all_published_sources( |
1108 | + num_workers, |
1109 | + whitelist, |
1110 | + blacklist, |
1111 | + phasing_main, |
1112 | + phasing_universe, |
1113 | + dry_run, |
1114 | +): |
1115 | + """import_all_published_sources - Import all publishes satisfying a |
1116 | + {white,black}list and phasing |
1117 | + |
1118 | + Arguments: |
1119 | + num_workers - integer number of worker processes to use |
1120 | + whitelist - a list of packages to always import
1121 | + blacklist - a list of packages to never import
1122 | + phasing_main - an integer percentage of all packages in main to import
1123 | + phasing_universe - an integer percentage of all packages in universe to import
1124 | + dry_run - a boolean to indicate a dry-run operation |
1125 | + |
1126 | + Returns: |
1127 | + A tuple of two lists, the first containing the names of all |
1128 | + successfully imported source packages, the second containing the |
1129 | + names of all source packages that failed to import. |
1130 | + """ |
1131 | + serieses = GitUbuntuSourceInformation('ubuntu').active_series_name_list |
1132 | + components = ['main', 'universe',] |
1133 | + pocket_suffixes = ['-proposed', '-updates', '-security', '',] |
1134 | + compressions = { |
1135 | + '.xz': lzma.open, |
1136 | + '.bz2': bz2.open, |
1137 | + '.gz': gzip.open, |
1138 | + } |
1139 | + |
1140 | + base_sources_url = ( |
1141 | + 'http://archive.ubuntu.com/ubuntu/dists/%s%s/%s/source/Sources' |
1142 | + ) |
1143 | + pkgnames = set() |
1144 | + for component in components: |
1145 | + for series, pocket_suffix in itertools.product( |
1146 | + serieses, |
1147 | + pocket_suffixes, |
1148 | + ): |
1149 | + url = base_sources_url % ( |
1150 | + series, |
1151 | + pocket_suffix, |
1152 | + component, |
1153 | + ) |
1154 | + for compression, opener in compressions.items(): |
1155 | + try: |
1156 | + with urllib.request.urlopen( |
1157 | + url + compression |
1158 | + ) as source_url_file: |
1159 | + with opener(source_url_file, mode='r') as sources: |
1160 | + print(url + compression) |
1161 | + for src in Sources.iter_paragraphs( |
1162 | + sources, |
1163 | + # fields=['Package'],
1164 | + use_apt_pkg=False, |
1165 | + ): |
1166 | + pkgname = src['Package'] |
1167 | + if scriptutils.should_import_srcpkg( |
1168 | + pkgname, |
1169 | + component, |
1170 | + whitelist, |
1171 | + blacklist, |
1172 | + phasing_main, |
1173 | + phasing_universe, |
1174 | + ): |
1175 | + pkgnames.add(pkgname) |
1176 | + break |
1177 | + except urllib.error.HTTPError: |
1178 | + pass |
1179 | + else: |
1180 | + print( |
1181 | + "Unable to find any Sources file for component=%s, " |
1182 | + "series=%s, pocket=%s" % ( |
1183 | + component, |
1184 | + series, |
1185 | + pocket_suffix, |
1186 | + ) |
1187 | + ) |
1188 | + sys.exit(1) |
1189 | + |
1190 | + if not pkgnames: |
1191 | + print("No relevant publishes found") |
1192 | + return [], [] |
1193 | + |
1194 | + return scriptutils.pool_map_import_srcpkg( |
1195 | + num_workers=num_workers, |
1196 | + dry_run=dry_run, |
1197 | + pkgnames=pkgnames, |
1198 | + ) |
1199 | + |
1200 | +def main( |
1201 | + num_workers=scriptutils.DEFAULTS.num_workers, |
1202 | + whitelist_path=scriptutils.DEFAULTS.whitelist, |
1203 | + blacklist_path=scriptutils.DEFAULTS.blacklist, |
1204 | + phasing_main=scriptutils.DEFAULTS.phasing_main, |
1205 | + phasing_universe=scriptutils.DEFAULTS.phasing_universe, |
1206 | + dry_run=scriptutils.DEFAULTS.dry_run, |
1207 | + use_whitelist=scriptutils.DEFAULTS.use_whitelist, |
1208 | +): |
1209 | + """main - Main entry point to the script |
1210 | + |
1211 | + Arguments: |
1212 | + num_workers - integer number of worker processes to use
1213 | + whitelist_path - string filesystem path to a text file of packages |
1214 | + to always import |
1215 | + blacklist_path - string filesystem path to a text file of packages |
1216 | + to never import |
1217 | + phasing_main - an integer percentage of all packages in main to
1218 | + import
1219 | + phasing_universe - an integer percentage of all packages in universe
1220 | + to import |
1221 | + dry_run - a boolean to indicate a dry-run operation |
1222 | + use_whitelist - a boolean to control whether the whitelist data is |
1223 | + used |
1224 | + |
1225 | + use_whitelist exists because during the ramp-up of imports, we want
1226 | + to import the whitelist packages and the phased packages. But after |
1227 | + that first operation to import (or possibly reimport), we do not |
1228 | + want to keep hitting the whitelist set (we only want to adjust the |
1229 | + phasing). This is more important in other scripts, but is relevant |
1230 | + here too. |
1231 | + """ |
1232 | + scriptutils.setup_git_config() |
1233 | + |
1234 | + if use_whitelist: |
1235 | + try: |
1236 | + with open(whitelist_path, 'r') as whitelist_file: |
1237 | + whitelist = [ |
1238 | + line.strip() for line in whitelist_file |
1239 | + if not line.startswith('#') |
1240 | + ] |
1241 | + except (FileNotFoundError, IOError): |
1242 | + whitelist = list() |
1243 | + else: |
1244 | + whitelist = list() |
1245 | + |
1246 | + try: |
1247 | + with open(blacklist_path, 'r') as blacklist_file: |
1248 | + blacklist = [ |
1249 | + line.strip() for line in blacklist_file |
1250 | + if not line.startswith('#') |
1251 | + ] |
1252 | + except (FileNotFoundError, IOError): |
1253 | + blacklist = list() |
1254 | + |
1255 | + imported_srcpkgs, failed_srcpkgs = import_all_published_sources( |
1256 | + num_workers, |
1257 | + whitelist, |
1258 | + blacklist, |
1259 | + phasing_main, |
1260 | + phasing_universe, |
1261 | + dry_run, |
1262 | + ) |
1263 | + print( |
1264 | + "Imported %d source packages:\n%s" % ( |
1265 | + len(imported_srcpkgs), |
1266 | + '\n'.join(imported_srcpkgs), |
1267 | + ) |
1268 | + ) |
1269 | + print( |
1270 | + "Failed to import %d source packages:\n%s" % ( |
1271 | + len(failed_srcpkgs), |
1272 | + '\n'.join(failed_srcpkgs), |
1273 | + ) |
1274 | + ) |
1275 | + |
1276 | +def cli_main(): |
1277 | + """cli_main - CLI entry point to script |
1278 | + """ |
1279 | + parser = argparse.ArgumentParser( |
1280 | + description='Script to import all source packages with phasing', |
1281 | + ) |
1282 | + parser.add_argument( |
1283 | + '--num-workers', |
1284 | + type=int, |
1285 | + help="Number of worker threads to use", |
1286 | + default=scriptutils.DEFAULTS.num_workers, |
1287 | + ) |
1288 | + parser.add_argument( |
1289 | + '--no-whitelist', |
1290 | + action='store_false', |
1291 | + dest='use_whitelist', |
1292 | + help="Do not process packages in the whitelist", |
1293 | + default=scriptutils.DEFAULTS.use_whitelist,
1294 | + ) |
1295 | + parser.add_argument( |
1296 | + '--whitelist', |
1297 | + type=str, |
1298 | + help="Path to whitelist file", |
1299 | + default=scriptutils.DEFAULTS.whitelist, |
1300 | + ) |
1301 | + parser.add_argument( |
1302 | + '--blacklist', |
1303 | + type=str, |
1304 | + help="Path to blacklist file", |
1305 | + default=scriptutils.DEFAULTS.blacklist, |
1306 | + ) |
1307 | + parser.add_argument( |
1308 | + '--phasing-universe', |
1309 | + type=int, |
1310 | + help="Percentage of universe packages to phase", |
1311 | + default=scriptutils.DEFAULTS.phasing_universe, |
1312 | + ) |
1313 | + parser.add_argument( |
1314 | + '--phasing-main', |
1315 | + type=int, |
1316 | + help="Percentage of main packages to phase", |
1317 | + default=scriptutils.DEFAULTS.phasing_main, |
1318 | + ) |
1319 | + parser.add_argument( |
1320 | + '--dry-run', |
1321 | + action='store_true', |
1322 | + help="Simulate operation but do not actually do anything", |
1323 | + default=scriptutils.DEFAULTS.dry_run, |
1324 | + ) |
1325 | + |
1326 | + args = parser.parse_args() |
1327 | + |
1328 | + main( |
1329 | + num_workers=args.num_workers, |
1330 | + whitelist_path=args.whitelist, |
1331 | + blacklist_path=args.blacklist, |
1332 | + phasing_main=args.phasing_main, |
1333 | + phasing_universe=args.phasing_universe, |
1334 | + dry_run=args.dry_run, |
1335 | + use_whitelist=args.use_whitelist, |
1336 | + ) |
1337 | + |
1338 | +if __name__ == '__main__': |
1339 | + cli_main() |
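The walker's Sources download tries each compression suffix in order (`.xz`, then `.bz2`, then `.gz`) and falls through to the next on a miss. A self-contained sketch of that fallback pattern, with a stand-in `fetch` (the real script uses `urllib.request.urlopen` against archive.ubuntu.com and catches `urllib.error.HTTPError`; the URL and in-memory "archive" here are invented for illustration):

```python
import bz2
import gzip
import io
import lzma

# preference order mirrors the walker: xz first, then bz2, then gz
COMPRESSIONS = {'.xz': lzma.open, '.bz2': bz2.open, '.gz': gzip.open}

def fetch(url):
    # stand-in for urllib.request.urlopen: this fake archive only has the .gz
    store = {'Sources.gz': gzip.compress(b'Package: hello\n')}
    name = url.rsplit('/', 1)[-1]
    if name not in store:
        raise FileNotFoundError(url)
    return io.BytesIO(store[name])

def open_sources(base_url):
    for ext, opener in COMPRESSIONS.items():
        try:
            # each opener accepts a file object; mode 'r' means binary here
            return opener(fetch(base_url + ext), mode='r')
        except FileNotFoundError:
            continue
    raise RuntimeError('no Sources file found for %s' % base_url)

with open_sources('http://archive.example/dists/devel/main/source/Sources') as f:
    first_line = f.readline()
```

The `for`/`else` in the real script serves the same purpose as the trailing `raise` here: if no compression variant yields a file, the run aborts rather than silently skipping a suite.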
1340 | diff --git a/scripts/update-repository-alias.py b/scripts/update-repository-alias.py |
1341 | new file mode 100755 |
1342 | index 0000000..5f2c8ae |
1343 | --- /dev/null |
1344 | +++ b/scripts/update-repository-alias.py |
1345 | @@ -0,0 +1,268 @@ |
1346 | +#!/usr/bin/env python3 |
1347 | + |
1348 | +import argparse |
1349 | +import functools |
1350 | +import lzma |
1351 | +import multiprocessing |
1352 | +import os |
1353 | +import sys |
1354 | +import urllib.parse
1355 | + |
1356 | +import scriptutils |
1357 | + |
1358 | +# We expect to be running from a git repository in master for this |
1359 | +# script, because the snap's python code is not exposed except within |
1360 | +# the snap |
1361 | +try: |
1362 | + REALPATH = os.readlink(__file__) |
1363 | +except OSError: |
1364 | + REALPATH = __file__ |
1365 | +sys.path.insert( |
1366 | + 0, |
1367 | + os.path.abspath( |
1368 | + os.path.join(os.path.dirname(REALPATH), os.path.pardir) |
1369 | + ) |
1370 | +) |
1371 | + |
1372 | +from gitubuntu.source_information import launchpad_login_auth |
1373 | + |
1374 | +def update_git_repository(package, dry_run, unset): |
1375 | + """update_git_repository - set the default Git Repository on Launchpad for a source package |
1376 | + |
1377 | + Arguments: |
1378 | + package - string name of a source package |
1379 | + dry_run - a boolean to indicate a dry-run operation |
1380 | + unset - a boolean to indicate the URL should be set to None instead |
1381 | + of the usd-import-team repository |
1382 | + """ |
1383 | + launchpad = launchpad_login_auth() |
1384 | + quoted_package = urllib.parse.quote(package) |
1385 | + target = launchpad.load('ubuntu/+source/%s' % quoted_package) |
1386 | + current_default = launchpad.git_repositories.getDefaultRepository( |
1387 | + target=target, |
1388 | + ) |
1389 | + logmsg = list() |
1390 | + if current_default: |
1391 | + logmsg.append( |
1392 | + "Current default Git repository for %s: %s" % ( |
1393 | + package, |
1394 | + current_default.git_https_url, |
1395 | + ) |
1396 | + ) |
1397 | + else: |
1398 | + logmsg.append("No default Git repository set for %s" % package) |
1399 | + |
1400 | + if unset: |
1401 | + logmsg.append("Unsetting default Git repository for %s" % package) |
1402 | + if not dry_run: |
1403 | + launchpad.git_repositories.setDefaultRepository( |
1404 | + repository=None, |
1405 | + target=target, |
1406 | + ) |
1407 | + else: |
1408 | + path = '~usd-import-team/ubuntu/+source/%s/+git/%s' % ( |
1409 | + quoted_package, |
1410 | + quoted_package, |
1411 | + ) |
1412 | + repository = launchpad.git_repositories.getByPath(path=path) |
1413 | + if repository: |
1414 | + logmsg.append( |
1415 | + "Setting default Git repository for %s to %s" % ( |
1416 | + package, |
1417 | + path, |
1418 | + ) |
1419 | + ) |
1420 | + if not dry_run: |
1421 | + launchpad.git_repositories.setDefaultRepository( |
1422 | + repository=repository, |
1423 | + target=target, |
1424 | + ) |
1425 | + else: |
1426 | + logmsg.append("No usd-import-team repository for %s" % package) |
1427 | + |
1428 | + print('\n'.join(logmsg)) |
1429 | + |
1430 | +# add whitelist, blacklist, phasing and loop |
1431 | +def main( |
1432 | + num_workers=scriptutils.DEFAULTS.num_workers, |
1433 | + whitelist_path=scriptutils.DEFAULTS.whitelist, |
1434 | + blacklist_path=scriptutils.DEFAULTS.blacklist, |
1435 | + phasing_main=scriptutils.DEFAULTS.phasing_main, |
1436 | + phasing_universe=scriptutils.DEFAULTS.phasing_universe, |
1437 | + dry_run=scriptutils.DEFAULTS.dry_run, |
1438 | + unset=False, |
1439 | + use_whitelist=scriptutils.DEFAULTS.use_whitelist, |
1440 | +): |
1441 | + """main - Main entry point to the script |
1442 | + |
1443 | + Set the default Git Repository target on Launchpad for all source |
1444 | + packages imported to usd-import-team to the usd-import-team |
1445 | + repository. |
1446 | + |
1447 | + Arguments: |
1448 | + num_workers - integer number of worker threads to use |
1449 | +    whitelist_path - path to a file listing packages to always import |
1450 | +    blacklist_path - path to a file listing packages to never import |
1451 | +    phasing_main - an integer percentage of all packages in main to import |
1452 | +    phasing_universe - an integer percentage of all packages in universe to import |
1453 | + dry_run - a boolean to indicate a dry-run operation |
1454 | + unset - a boolean to indicate the URL should be set to None instead |
1455 | + of the usd-import-team repository |
1456 | + use_whitelist - a boolean to control whether the whitelist data is |
1457 | + used |
1458 | + |
1459 | + use_whitelist exists because during the rampup of imports, we want |
1460 | + to import the whitelist packages and the phased packages. But after |
1461 | + that first operation to import (or possibly reimport), we do not |
1462 | + want to keep hitting the whitelist set (we only want to adjust the |
1463 | + phasing). This is more important in other scripts, but is relevant |
1464 | + here too. |
1465 | + """ |
1466 | + if use_whitelist: |
1467 | + try: |
1468 | + with open(whitelist_path, 'r') as whitelist_file: |
1469 | + whitelist = [ |
1470 | + line.strip() for line in whitelist_file |
1471 | + if not line.startswith('#') |
1472 | + ] |
1473 | + except (FileNotFoundError, IOError): |
1474 | + whitelist = list() |
1475 | + else: |
1476 | + whitelist = list() |
1477 | + |
1478 | + try: |
1479 | + with open(blacklist_path, 'r') as blacklist_file: |
1480 | + blacklist = [ |
1481 | + line.strip() for line in blacklist_file |
1482 | + if not line.startswith('#') |
1483 | + ] |
1484 | + except (FileNotFoundError, IOError): |
1485 | + blacklist = list() |
1486 | + |
1487 | + main_packages = list() |
1488 | + with urllib.request.urlopen( |
1489 | + 'http://archive.ubuntu.com/ubuntu/dists/devel/main/source/Sources.xz' |
1490 | + ) as source_url_file: |
1491 | + with lzma.open(source_url_file, mode='rt') as main_sources: |
1492 | + for line in main_sources: |
1493 | + if line.startswith('Package:'): |
1494 | + _, pkgname = line.split(':') |
1495 | + main_packages.append(pkgname.strip()) |
1496 | + |
1497 | + universe_packages = list() |
1498 | + with urllib.request.urlopen( |
1499 | + 'http://archive.ubuntu.com/ubuntu/dists/devel/universe/source/Sources.xz' |
1500 | + ) as source_url_file: |
1501 | + with lzma.open(source_url_file, mode='rt') as universe_sources: |
1502 | + for line in universe_sources: |
1503 | + if line.startswith('Package:'): |
1504 | + _, pkgname = line.split(':') |
1505 | + universe_packages.append(pkgname.strip()) |
1506 | + |
1507 | + filtered_pkgnames = set() |
1508 | + for pkgname in main_packages: |
1509 | + if scriptutils.should_import_srcpkg( |
1510 | + pkgname, |
1511 | + 'main', |
1512 | + whitelist, |
1513 | + blacklist, |
1514 | + phasing_main, |
1515 | + phasing_universe, |
1516 | + ): |
1517 | + filtered_pkgnames.add(pkgname) |
1518 | + for pkgname in universe_packages: |
1519 | + if scriptutils.should_import_srcpkg( |
1520 | + pkgname, |
1521 | + 'universe', |
1522 | + whitelist, |
1523 | + blacklist, |
1524 | + phasing_main, |
1525 | + phasing_universe, |
1526 | + ): |
1527 | + filtered_pkgnames.add(pkgname) |
1528 | + |
1529 | + if not filtered_pkgnames: |
1530 | + print("No relevant publishes found") |
1531 | + return |
1532 | + |
1533 | + with multiprocessing.Pool(processes=num_workers) as pool: |
1534 | + pool.map( |
1535 | + functools.partial( |
1536 | + update_git_repository, |
1537 | + dry_run=dry_run, |
1538 | + unset=unset, |
1539 | + ), |
1540 | + filtered_pkgnames, |
1541 | + ) |
1542 | + |
1543 | +def cli_main(): |
1544 | + """cli_main - main entry point for CLI |
1545 | + """ |
1546 | + parser = argparse.ArgumentParser( |
1547 | + description='Update the default Git repository for imported source packages', |
1548 | + ) |
1549 | + parser.add_argument( |
1550 | + '--num-workers', |
1551 | + type=int, |
1552 | + help="Number of worker threads to use", |
1553 | + default=scriptutils.DEFAULTS.num_workers, |
1554 | + ) |
1555 | + parser.add_argument( |
1556 | + '--no-whitelist', |
1557 | + action='store_false', |
1558 | + dest='use_whitelist', |
1559 | + help="Do not process packages in the whitelist", |
1560 | + default=not scriptutils.DEFAULTS.use_whitelist, |
1561 | + ) |
1562 | + parser.add_argument( |
1563 | + '--whitelist', |
1564 | + type=str, |
1565 | + help="Path to whitelist file", |
1566 | + default=scriptutils.DEFAULTS.whitelist, |
1567 | + ) |
1568 | + parser.add_argument( |
1569 | + '--blacklist', |
1570 | + type=str, |
1571 | + help="Path to blacklist file", |
1572 | + default=scriptutils.DEFAULTS.blacklist, |
1573 | + ) |
1574 | + parser.add_argument( |
1575 | + '--phasing-universe', |
1576 | + type=int, |
1577 | + help="Percentage of universe packages to phase", |
1578 | + default=scriptutils.DEFAULTS.phasing_universe, |
1579 | + ) |
1580 | + parser.add_argument( |
1581 | + '--phasing-main', |
1582 | + type=int, |
1583 | + help="Percentage of main packages to phase", |
1584 | + default=scriptutils.DEFAULTS.phasing_main, |
1585 | + ) |
1586 | + parser.add_argument( |
1587 | + '--dry-run', |
1588 | + action='store_true', |
1589 | + help="Simulate operation but do not actually do anything", |
1590 | + default=False, |
1591 | + ) |
1592 | + parser.add_argument( |
1593 | + '--unset', |
1594 | + action='store_true', |
1595 | + help="Unset default repository (for testing)", |
1596 | + default=False, |
1597 | + ) |
1598 | + |
1599 | + args = parser.parse_args() |
1600 | + |
1601 | + main( |
1602 | + num_workers=args.num_workers, |
1603 | + whitelist_path=args.whitelist, |
1604 | + blacklist_path=args.blacklist, |
1605 | + phasing_main=args.phasing_main, |
1606 | + phasing_universe=args.phasing_universe, |
1607 | + dry_run=args.dry_run, |
1608 | + unset=args.unset, |
1609 | + use_whitelist=args.use_whitelist, |
1610 | + ) |
1611 | + |
1612 | +if __name__ == '__main__': |
1613 | + cli_main() |
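The `Sources.xz` handling above (the two near-identical loops over main and universe) can be exercised standalone. A minimal sketch, using made-up sample stanzas and an in-memory xz stream in place of the archive.ubuntu.com download:

```python
import io
import lzma

# Made-up sample of Debian Sources-index stanzas (not real archive data).
SAMPLE = (
    "Package: bash\n"
    "Version: 4.4-5ubuntu1\n"
    "\n"
    "Package: coreutils\n"
    "Version: 8.28-1ubuntu1\n"
)

# Compress in memory so the read path matches the script's
# lzma.open(..., mode='rt') over the downloaded Sources.xz.
compressed = io.BytesIO(lzma.compress(SAMPLE.encode('utf-8')))

names = []
with lzma.open(compressed, mode='rt') as sources:
    for line in sources:
        if line.startswith('Package:'):
            # maxsplit=1 guards against a stray colon in the value;
            # the patch's bare split(':') is fine for package names,
            # which cannot contain one.
            _, pkgname = line.split(':', 1)
            names.append(pkgname.strip())

print(names)  # → ['bash', 'coreutils']
```
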
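`scripts/scriptutils.py` (and its `should_import_srcpkg` helper) is part of this diff but not quoted in the hunks above. Purely for illustration, a phasing filter consistent with the docstring semantics (whitelist always imports, blacklist never imports, otherwise a stable per-package percentage per component) could be sketched as follows; the priority order and the hash bucketing are assumptions, not the branch's actual logic:

```python
import hashlib

def should_import_srcpkg(pkgname, component, whitelist, blacklist,
                         phasing_main, phasing_universe):
    # Hypothetical re-implementation for illustration only; the real
    # helper lives in scripts/scriptutils.py in this branch.
    if pkgname in blacklist:
        return False
    if pkgname in whitelist:
        return True
    phasing = phasing_main if component == 'main' else phasing_universe
    # Stable bucket in [0, 100) derived from the package name, so the
    # same packages remain selected as the phasing percentage grows.
    bucket = int(hashlib.md5(pkgname.encode('utf-8')).hexdigest(), 16) % 100
    return bucket < phasing

print(should_import_srcpkg('bash', 'main', ['bash'], [], 0, 0))   # → True
print(should_import_srcpkg('bash', 'main', [], ['bash'], 100, 100))  # → False
```

A name-derived bucket (rather than `random`) keeps the selection deterministic across runs, which matters when the same filter is evaluated independently by several scripts.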
PASSED: Continuous integration, rev:a341a5c452dde92ab1d0a319c8aac0f729cbd90f
https://jenkins.ubuntu.com/server/job/git-ubuntu-ci/195/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Style Check
    SUCCESS: Unit Tests
    SUCCESS: Integration Tests
    IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/git-ubuntu-ci/195/rebuild