Merge ~nacc/git-ubuntu:lp1730734-cache-importer-progress into git-ubuntu:master
Status: Work in progress
Proposed branch: ~nacc/git-ubuntu:lp1730734-cache-importer-progress
Merge into: git-ubuntu:master
Diff against target: 475 lines (+136/-8), 6 files modified
  gitubuntu/importer.py (+50/-3)
  gitubuntu/source_information.py (+16/-2)
  man/man1/git-ubuntu-import.1 (+18/-2)
  scripts/import-source-packages.py (+18/-0)
  scripts/scriptutils.py (+16/-1)
  scripts/source-package-walker.py (+18/-0)
Related bugs:
Reviewers:
  Server Team CI bot (continuous-integration): Approve
  git-ubuntu developers: Pending
Review via email: mp+333499@code.launchpad.net
Commit message
Description of the change
Make Jenkins happy.
Server Team CI bot (server-team-bot) wrote:
Nish Aravamudan (nacc) wrote:
As a local test, I built the git-ubuntu snap with this change in place and ran the following:
# 1) prime the cache
$ git ubuntu import --no-push ipsec-tools --no-clean --db-cache /tmp/git-
11/10/2017 11:58:19 - INFO:Ubuntu Server Team importer v0.6.2
11/10/2017 11:58:19 - INFO:Using git repository at /tmp/tmpso3vrstw
11/10/2017 11:59:25 - INFO:Importing patches-unapplied 1:0.8.2+20140711-10 to ubuntu/bionic
11/10/2017 11:59:39 - WARNING:
11/10/2017 12:00:15 - INFO:Not pushing to remote as specified
11/10/2017 12:00:15 - INFO:Leaving /tmp/tmpso3vrstw as directed
# 2) setup test repositories
$ cp -R /tmp/tmpso3vrstw /tmp/cache-test
$ cp -R /tmp/tmpso3vrstw /tmp/no-cache-test
# 3) Run again using the same cache, no new uploads processed (cache worked)
$ git ubuntu import --no-push ipsec-tools --no-clean --db-cache /tmp/git-
11/10/2017 12:01:45 - INFO:Ubuntu Server Team importer v0.6.2
11/10/2017 12:02:49 - INFO:Not pushing to remote as specified
11/10/2017 12:02:49 - INFO:Leaving /tmp/cache-test as directed
# 4) Run again without the cache (also the default operation); we end up re-processing already-seen records
$ git ubuntu import --no-push ipsec-tools --no-clean --no-fetch -d /tmp/no-cache-test
11/10/2017 12:03:33 - INFO:Ubuntu Server Team importer v0.6.2
11/10/2017 12:04:17 - INFO:Importing patches-unapplied 1:0.8.2+20140711-10 to ubuntu/bionic
11/10/2017 12:04:22 - WARNING:
11/10/2017 12:04:37 - INFO:Importing patches-applied 1:0.8.2+20140711-10 to ubuntu/bionic
11/10/2017 12:04:40 - WARNING:
11/10/2017 12:05:07 - INFO:Not pushing to remote as specified
11/10/2017 12:05:07 - INFO:Leaving /tmp/no-cache-test as directed
Nish Aravamudan (nacc) wrote:
I am fairly confident in the actual changes here. What I think needs deciding is how the linear import script and the publisher-watching script should interact with a shared DBM cache. I think it is relatively safe within one process, but as far as I can tell there is no guarantee the two won't race, and the DBM implementation itself is not concurrency-safe.
Possibly we should switch to sqlite3 for that, then, even though it is more overhead to configure and query.
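For reference, a minimal sqlite3 version of the same key-value cache might look like this (hypothetical schema and helper names, not part of this branch):

```python
import sqlite3

def open_progress_db(path):
    """Open (creating if needed) a progress database keyed by
    (distribution, package)."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS progress ('
        ' dist TEXT, pkg TEXT, last_spphr TEXT,'
        ' PRIMARY KEY (dist, pkg))'
    )
    return conn

def set_last_spphr(conn, dist, pkg, spphr):
    # "with conn" wraps the statement in a transaction; sqlite's file
    # locking serializes concurrent writers, unlike the dbm modules.
    with conn:
        conn.execute(
            'INSERT OR REPLACE INTO progress VALUES (?, ?, ?)',
            (dist, pkg, spphr),
        )

def get_last_spphr(conn, dist, pkg):
    row = conn.execute(
        'SELECT last_spphr FROM progress WHERE dist = ? AND pkg = ?',
        (dist, pkg),
    ).fetchone()
    return row[0] if row else None
```

The extra overhead is mostly the schema; the query surface is still effectively a dictionary lookup.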
Finally, I'm wondering how to describe the race in question. Basically, we'd see two processes read different (possibly unset, in one case) values for the last SPPHR seen in the cache. They would then iterate different sets of Launchpad records, but the result would be the same relative to the actual contents of the imports.

The question is the branches, I think. Both processes start with the same git repository (effectively, under the assumption neither has pushed) and move the branches. The one processing more (older) records would move the branches more, but the end result *should* be the same, or an ancestor of what will occur. That ancestor case is the concern, as we are now timing-sensitive to what gets pushed (since we force-push branches).

I wonder if we should treat it more like RCU, in that we can always read or write, but the read data may be stale, so it needs to be verified before and after our loop. I imagine we could read the data, iterate our records and import them, and, before we write our last_spphr back into the database, check whether the current value is still the same. That would shrink the race to the window between re-reading the value and writing the new one (and the linear importer is a one-time operation, in theory).
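That compare-before-write step could be sketched like this on top of the dbm cache (the helper name and shape are mine, not the branch's; note dbm provides no atomicity, so this only shrinks the window rather than closing it):

```python
import dbm
import os

def publish_last_spphr(db_cache_dir, distname, pkgname, seen, new):
    """Write `new` as pkgname's last SPPHR only if the stored value is
    still `seen` (what we read before our import loop). Returns True on
    success, False if another process moved the pointer meanwhile."""
    with dbm.open(os.path.join(db_cache_dir, distname), 'c') as cache:
        try:
            current = cache[pkgname].decode()
        except KeyError:
            current = None  # no prior import recorded
        if current != seen:
            return False  # stale read; caller should re-check
        cache[pkgname] = new
        return True
```

A False return would tell the caller that another importer advanced the pointer during its loop, so its own value may be stale.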
Server Team CI bot (server-team-bot) wrote:
PASSED: Continuous integration, rev:e087180218b
https:/
Executed test runs:
SUCCESS: Checkout
SUCCESS: Style Check
SUCCESS: Unit Tests
SUCCESS: Integration Tests
IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https:/
Nish Aravamudan (nacc) wrote:
Another thought: does the cache need to be distinct for debian/ubuntu x unapplied/applied? Given that applied failures may be fixable, the applied heads can be at a different version than the unapplied ones.
Christian Ehrhardt (paelzer) wrote:
I'm not sure I can help you with the "implications" as you asked on IRC without thinking much more into it. But I have a few questions that might help to clarify at least.
To do so I want to hear more details on the actual use-case that lets us run into this race:
1) Do we even want two importers running concurrently on the same system?
If not, let's just lock the DB for single use - problem solved.
2) If we do want to allow multiple processes, we could make use of the cache exclusive per package (I beg your pardon if I don't see why this would fail). Then only one worker can be in a given [package] at any time. Processes working on different packages should never conflict, right? Another worker with a different [package] can continue, and we know there won't be a race.
Essentially locking on [package] (yes this might need a better db to support
rollback/
The tail of the processing would be updating spphr and then unlocking the given package.
In general I think you should add tests along these commits that e.g. does "import-
Nish Aravamudan (nacc) wrote:
On 08.01.2018 [20:17:52 -0000], ChristianEhrhardt wrote:
> I'm not sure I can help you with the "implications" as you asked on
> IRC without thinking much more into it. But I have a few questions
> that might help to clarify at least.
Understood and thank you!
> To do so I want to hear more details on the actual use-case that lets
> us run into this race:
The race is internal, in some ways, to the importer, but you do bring up
an interesting second case.
`git ubuntu import` is managed by a looping script,
scripts/
fashion. Roughly:
Obtain a LIST of publishing events in reverse chronological order
Walk LIST backwards until we find a PUBLISH that we have already imported (or earlier)
Walk LIST forward after PUBLISH and IMPORT each unique source package name
IMPORT itself is algorithmically:
Get current REPOSITORY
Get Launchpad publishing DATA for a given source package in reverse chronological order
Walk DATA backwards until we find a PUBLISH that we have already imported (or earlier)
Walk DATA forwards and import each publish record
The issue we are trying to resolve here is the "until we find a PUBLISH
that we have already imported" in IMPORT.
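That reverse-then-forward walk can be sketched as follows (names are illustrative, not the importer's actual helpers):

```python
def publishes_to_import(publishes_newest_first, already_imported):
    """Given publishing records in reverse chronological order and a
    predicate telling us whether a record was already imported, collect
    the new records and return them oldest-first, ready to import."""
    new_records = []
    for publish in publishes_newest_first:
        if already_imported(publish):
            break  # everything at and before this point was handled
        new_records.append(publish)
    new_records.reverse()  # import oldest-first
    return new_records
```

The whole question in this merge proposal is how `already_imported` can be answered now that the repository no longer records publishing events.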
In the prior importer code, we had a unique Git commit for every publication event, because the targeted <series>-<pocket> was part of the Git commit (via the commit message and parenting). So as we did IMPORT's reverse walk, we could look at the branch tips and compare their commit data (where we stored the publishing timestamp) to the publishing records to find the last imported publishing record.
We dropped that ability, by dropping publishing parents altogether. We
now just import all published versions once, tying them together only by
their changelogs. We then forcibly move the branch tips.
So now if we use an unmodified importer, we will end up going back in
IMPORT to the last publish that was the first time we saw a given
version (typically in Debian, therefore) and walking forward from there.
This will be unnecessary iteration of publishing data, at least, and
unnecessary moving of the branch pointers.
My branch modifies the catch-up to move the storage of the publication
event to an external cache, currently DBM.
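Concretely, the read side of that cache is just a dbm open plus a keyed lookup (a sketch mirroring the diff; the plain `.decode()` here stands in for the importer's `decode_binary` helper):

```python
import dbm
import os

def read_last_spphr(db_cache_dir, distname, pkgname):
    """Return the last-imported SPPHR string recorded for pkgname in the
    per-distribution dbm file, or None if no entry exists yet."""
    # 'c' creates the per-dist cache file on first use
    with dbm.open(os.path.join(db_cache_dir, distname), 'c') as cache:
        try:
            return cache[pkgname].decode()
        except KeyError:
            return None
```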
> 1) Do we even want that two importers run concurrently on the same system?
> If not lets just lock the DB to be of single use - problem solved.
Right, so we have a mode we know will exist at some point, where there is a linear script walking all 'to-import' source packages and getting them loaded to Launchpad, and the keep-up script ensuring that publishes happening while the linear walker runs get integrated. So we do expect to see two imports of the same source package going on (and that shouldn't just break). What we don't want is for a slower 'old' import (where the list of events to import is older) to somehow trump a faster 'new' import and thus end up setting the branch pointers to the wrong place(s).
The problem is, that if we just outright lock the DB, then the linear
script can't work if it's also using the DB. And the point of it is that
they...
Nish Aravamudan (nacc) wrote:
> Another thought: does the cache need to be distinct for debian/ubuntu x
> unapplied/applied? Given that applied failures may be fixable, they can be at
> a different version than the unapplied?
I do think we actually need 4 caches, since the unapplied / applied heads can easily be at a different state. I'll work on this now.
I'll also add some changes to the importer to clean up that code.
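One way to get the four caches without changing the dbm usage would be to split the file names, e.g. (naming scheme is illustrative only, not what the branch implements):

```python
import os

def cache_file(db_cache_dir, distname, patches_applied):
    """Map (debian|ubuntu) x (unapplied|applied) to its own dbm file, so
    the two import passes track progress independently."""
    suffix = 'applied' if patches_applied else 'unapplied'
    return os.path.join(db_cache_dir, '%s-%s' % (distname, suffix))
```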
Christian Ehrhardt (paelzer) wrote:
[...]
> So now if we use an unmodified importer, we will end up going back in
> IMPORT to the last publish that was the first time we saw a given
> version (typically in Debian, therefore) and walking forward from there.
To confirm - you go to "the first time we saw a given version" because without parenting history that is the only safe place to walk forward from?
> This will be unnecessary iteration of publishing data
because some might already have been handled.
>, at least, and
> unnecessary moving of the branch pointers.
For the walking
Thanks for the details, much clearer to me now what your case actually is.
>
> My branch modifies the catch-up to move the storage of the publication
> event to an external cache, currently DBM.
[...]
> The more I think about it, I think I am spinning myself on this problem
> for no reason :)
>
> I think we can let the linear walker not use the dbm [we can keep the
> code, just not pass anything] (or sqlite) and it will just mean if there
> is a race between when the linear walker hits a source package and when
> the keep-up script does, there will be some no-op looping (unless new
> SPRs are found).
Sounds good, and then you can actually lock the DB to be fully safe against unintentional concurrent use of it by the catch up (if one manually starts it twice or so).
[...]
> I do think we actually need 4 caches, since the unapplied / applied heads can easily be at a
> different state. I'll work on this now.
> I'll also add some changes to the importer to clean up that code.
Since you will only ever access it from code and never manually, I think you could even implement it as debian/ubuntu x unapplied/applied x per-package. If you end up locking the DBs in any way, you will still have better granularity on those locks. And for your code it doesn't matter how the file structure is split.
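That per-package granularity could be realized with advisory file locks alongside the cache files (a POSIX-only sketch with a hypothetical helper, not part of this branch):

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def package_lock(db_cache_dir, distname, pkgname):
    """Hold an exclusive flock for one (dist, package) pair while the
    caller reads and updates that package's cache entry."""
    path = os.path.join(db_cache_dir, '%s-%s.lock' % (distname, pkgname))
    fd = open(path, 'w')
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        fd.close()
```

Workers on different packages would never contend, matching Christian's observation above.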
Unmerged commits
- a9960c8... by Nish Aravamudan:
git ubuntu import: use a dbm cache to store importer progress
With the recent changes to the importer algorithm, we no longer can
identify a Launchpad publishing event by the commit information in the
repository -- a given SourcePackageRelease (source package name and
version) is only imported once (presuming all future publishes of the
same name and version match exactly). We rely on this in
launchpad_versions_published_after, which iterates the Launchpad data
backwards until we either match the commit data for a branch head, or
see Launchpad data from before one of the branch heads.

Change the importer code to take a --db-cache argument as a directory
containing a DBM cache for debian and ubuntu (this is needed because
DBM databases are single-level, string-indexed string storage). Look up
the source package name in the relevant cache before we look up
Launchpad data, to obtain the last SPPHR used. Store the latest
processed SPPHR after iterating the Launchpad data.

Also update the scripts to support passing a persistent/consistent
value to the import for the cache.

LP: #1730734
Preview Diff
diff --git a/gitubuntu/importer.py b/gitubuntu/importer.py
index 2063f83..6abad7d 100644
--- a/gitubuntu/importer.py
+++ b/gitubuntu/importer.py
@@ -26,6 +26,7 @@
 
 import argparse
 import atexit
+import dbm
 import functools
 import getpass
 import logging
@@ -150,6 +151,7 @@ def main(
     parentfile=top_level_defaults.parentfile,
     retries=top_level_defaults.retries,
     retry_backoffs=top_level_defaults.retry_backoffs,
+    db_cache_dir=None,
 ):
     """Main entry point to the importer
 
@@ -178,6 +180,9 @@ def main(
     @parentfile: string path to file specifying parent overrides
     @retries: integer number of download retries to attempt
     @retry_backoffs: list of backoff durations to use between retries
+    @db_cache_dir: string fileystem directory containing 'ubuntu' and
+        'debian' dbm database files, which store the progress of prior
+        importer runs
 
     If directory is None, a temporary directory is created and used.
 
@@ -187,6 +192,11 @@ def main(
     If dl_cache is None, CACHE_PATH in the local repository will be
     used.
 
+    If db_cache_dir is None, no database lookups are performed which makes
+    it possible that the import will attempt to re-import already
+    imported publishes. This should be fine, although less efficient
+    than possible.
+
     Returns 0 on successful import (which includes non-fatal failures);
     1 otherwise.
     """
@@ -314,6 +324,9 @@ def main(
     else:
         workdir = dl_cache
 
+    if db_cache_dir:
+        os.makedirs(db_cache_dir, exist_ok=True)
+
     os.makedirs(workdir, exist_ok=True)
 
     # now sets a global _PARENT_OVERRIDES
@@ -326,6 +339,7 @@ def main(
         patches_applied=False,
         debian_head_versions=debian_head_versions,
         ubuntu_head_versions=ubuntu_head_versions,
+        db_cache_dir=db_cache_dir,
         debian_sinfo=debian_sinfo,
         ubuntu_sinfo=ubuntu_sinfo,
         active_series_only=active_series_only,
@@ -347,6 +361,7 @@ def main(
         patches_applied=True,
         debian_head_versions=applied_debian_head_versions,
         ubuntu_head_versions=applied_ubuntu_head_versions,
+        db_cache_dir=db_cache_dir,
         debian_sinfo=debian_sinfo,
         ubuntu_sinfo=ubuntu_sinfo,
         active_series_only=active_series_only,
@@ -1401,6 +1416,7 @@ def import_publishes(
     patches_applied,
     debian_head_versions,
     ubuntu_head_versions,
+    db_cache_dir,
     debian_sinfo,
     ubuntu_sinfo,
     active_series_only,
@@ -1411,6 +1427,8 @@ def import_publishes(
     history_found = False
     only_debian = False
     srcpkg_information = None
+    last_debian_spphr = None
+    last_ubuntu_spphr = None
     if patches_applied:
         _namespace = namespace
         namespace = '%s/applied' % namespace
@@ -1426,15 +1444,28 @@ def import_publishes(
         import_unapplied_spi,
         skip_orig=skip_orig,
     )
-    for distname, versions, dist_sinfo in (
-        ("debian", debian_head_versions, debian_sinfo),
-        ("ubuntu", ubuntu_head_versions, ubuntu_sinfo)):
+
+    for distname, versions, dist_sinfo, last_spphr in (
+        ("debian", debian_head_versions, debian_sinfo, last_debian_spphr),
+        ("ubuntu", ubuntu_head_versions, ubuntu_sinfo, last_ubuntu_spphr),
+    ):
         if active_series_only and distname == "debian":
             continue
+
+        last_spphr = None
+        if db_cache_dir:
+            with dbm.open(os.path.join(db_cache_dir, distname), 'c') as cache:
+                try:
+                    last_spphr = decode_binary(cache[pkgname])
+                except KeyError:
+                    pass
+
         try:
+            last_spi = None
             for srcpkg_information in dist_sinfo.launchpad_versions_published_after(
                 versions,
                 namespace,
+                last_spphr,
                 workdir=workdir,
                 active_series_only=active_series_only
             ):
@@ -1445,6 +1476,11 @@ def import_publishes(
                     namespace=_namespace,
                     ubuntu_sinfo=ubuntu_sinfo,
                 )
+                last_spi = srcpkg_information
+            if last_spi:
+                if db_cache_dir:
+                    with dbm.open(os.path.join(db_cache_dir, distname), 'w') as db_cache:
+                        db_cache[pkgname] = str(last_spi.spphr)
         except NoPublicationHistoryException:
             logging.warning("No publication history found for %s in %s.",
                 pkgname, distname
@@ -1520,6 +1556,11 @@ def parse_args(subparsers=None, base_subparsers=None):
         action='store_true',
         help=argparse.SUPPRESS,
     )
+    parser.add_argument(
+        '--db-cache',
+        type=str,
+        help=argparse.SUPPRESS,
+    )
     if not subparsers:
         return parser.parse_args()
     return 'import - %s' % kwargs['description']
@@ -1548,6 +1589,11 @@ def cli_main(args):
     except AttributeError:
         dl_cache = None
 
+    try:
+        db_cache = args.db_cache
+    except AttributeError:
+        db_cache = None
+
     return main(
         pkgname=args.package,
         owner=args.lp_owner,
@@ -1567,4 +1613,5 @@ def cli_main(args):
         parentfile=args.parentfile,
         retries=args.retries,
         retry_backoffs=args.retry_backoffs,
+        db_cache_dir=args.db_cache,
     )
diff --git a/gitubuntu/source_information.py b/gitubuntu/source_information.py
index b97bc8a..050d60a 100644
--- a/gitubuntu/source_information.py
+++ b/gitubuntu/source_information.py
@@ -443,7 +443,14 @@ class GitUbuntuSourceInformation(object):
         for srcpkg in spph:
             yield self.get_corrected_spi(srcpkg, workdir)
 
-    def launchpad_versions_published_after(self, head_versions, namespace, workdir=None, active_series_only=False):
+    def launchpad_versions_published_after(
+        self,
+        head_versions,
+        namespace,
+        last_spphr=None,
+        workdir=None,
+        active_series_only=False,
+    ):
         args = {
             'exact_match':True,
             'source_name':self.pkgname,
@@ -471,7 +478,14 @@ class GitUbuntuSourceInformation(object):
         if len(spph) == 0:
             raise NoPublicationHistoryException("Is %s published in %s?" %
                 (self.pkgname, self.dist_name))
-        if len(head_versions) > 0:
+        if last_spphr:
+            _spph = list()
+            for spphr in spph:
+                if str(spphr) == last_spphr:
+                    break
+                _spph.append(spphr)
+            spph = _spph
+        elif head_versions:
             _spph = list()
             for spphr in spph:
                 spi = GitUbuntuSourcePackageInformation(spphr, self.dist_name,
diff --git a/man/man1/git-ubuntu-import.1 b/man/man1/git-ubuntu-import.1
index dd7fd9e..74bfe83 100644
--- a/man/man1/git-ubuntu-import.1
+++ b/man/man1/git-ubuntu-import.1
@@ -1,4 +1,4 @@
-.TH "GIT-UBUNTU-IMPORT" "1" "2017-07-19" "Git-Ubuntu 0.2" "Git-Ubuntu Manual"
+.TH "GIT-UBUNTU-IMPORT" "1" "2017-11-08" "Git-Ubuntu 0.6.2" "Git-Ubuntu Manual"
 
 .SH "NAME"
 git-ubuntu import \- Import Launchpad publishing history to Git
@@ -9,7 +9,8 @@ git-ubuntu import \- Import Launchpad publishing history to Git
 <user>] [\-\-dl-cache <dl_cache>] [\-\-no-fetch] [\-\-no-push]
 [\-\-no-clean] [\-d | \-\-directory <directory>]
 [\-\-active-series-only] [\-\-skip-applied] [\-\-skip-orig]
-[\-\-reimport] [\-\-allow-applied-failures] <package>
+[\-\-reimport] [\-\-allow-applied-failures] [\-\-db-cache <db_cache>]
+<package>
 .FI
 .SP
 .SH "DESCRIPTION"
@@ -197,6 +198,21 @@ After investigation, this flag can be used to indicate the importer is
 allowed to ignore such a failure\&.
 .RE
 .PP
+\-\-db-cache <db_cache>
+.RS 4
+The path to a directory containing Python dbm database disk files for
+importer metadata\&.
+If \fB<db_cache>\fR does not exist, it will be created\&.
+Two files in \fB<db_cache>\fR are used, "ubuntu" and "debian", which are
+created if not already present\&.
+The cache files provide information to the importer about prior imports
+of \fB<package>\fR and which Launchpad publishing record was last
+imported\&.
+This is necessary because the imported Git repository does not
+necessarily maintain any metadata about Launchpad publishing
+information\&.
+.RE
+.PP
 <package>
 .RS 4
 The name of the source package to import\&.
diff --git a/scripts/import-source-packages.py b/scripts/import-source-packages.py
index 698ae35..7dd534e 100755
--- a/scripts/import-source-packages.py
+++ b/scripts/import-source-packages.py
@@ -50,6 +50,7 @@ def import_new_published_sources(
     phasing_main,
     phasing_universe,
     dry_run,
+    db_cache,
 ):
     """import_new_published_source - Import all new publishes since a prior execution
 
@@ -60,6 +61,10 @@ def import_new_published_sources(
     phasing_main - a integer percentage of all packages in main to import
     phasing_universe - a integer percentage of all packages in universe to import
     dry_run - a boolean to indicate a dry-run operation
+    db_cache - string filesystem path containing DBM files for storing
+        importer progress, will be created if it does not exist
+
+    If db_cache is None, no cache is used.
 
     Returns:
         A tuple of two lists, the first containing the names of all
@@ -150,6 +155,7 @@ def import_new_published_sources(
     ret = scriptutils.pool_map_import_srcpkg(
         num_workers=num_workers,
         dry_run=dry_run,
+        db_cache=db_cache,
         pkgnames=filtered_pkgnames,
     )
 
@@ -229,6 +235,7 @@ def main(
     phasing_main=scriptutils.DEFAULTS.phasing_main,
     phasing_universe=scriptutils.DEFAULTS.phasing_universe,
     dry_run=scriptutils.DEFAULTS.dry_run,
+    db_cache=scriptutils.DEFAULTS.db_cache,
 ):
     """main - Main entry point to the script
 
@@ -243,6 +250,8 @@ def main(
     phasing_universe - a integer percentage of all packages in universe
         to import
     dry_run - a boolean to indicate a dry-run operation
+    db_cache - string filesystem path containing DBM files for storing
+        importer progress, will be created if it does not exist
     """
     scriptutils.setup_git_config()
 
@@ -280,6 +289,7 @@ def main(
         phasing_main,
         phasing_universe,
         dry_run,
+        db_cache,
     )
     print("Imported %d source packages" % len(imported_srcpkgs))
     mail_imported_srcpkgs |= set(imported_srcpkgs)
@@ -361,6 +371,13 @@ def cli_main():
         help="Simulate operation but do not actually do anything",
         default=scriptutils.DEFAULTS.dry_run,
     )
+    parser.add_argument(
+        '--db-cache',
+        type=str,
+        help="Directory containing Python DBM files which store importer "
+            "progress",
+        default=scriptutils.DEFAULTS.db_cache,
+    )
 
     args = parser.parse_args()
 
@@ -371,6 +388,7 @@ def cli_main():
         phasing_main=args.phasing_main,
         phasing_universe=args.phasing_universe,
         dry_run=args.dry_run,
+        db_cache=args.db_cache,
     )
 
 if __name__ == '__main__':
diff --git a/scripts/scriptutils.py b/scripts/scriptutils.py
index 912efd1..e24c394 100644
--- a/scripts/scriptutils.py
+++ b/scripts/scriptutils.py
@@ -5,6 +5,7 @@ import multiprocessing
 import os
 import sys
 import subprocess
+import tempfile
 import time
 
 import pkg_resources
@@ -35,6 +36,7 @@ Defaults = namedtuple(
         'phasing_main',
         'dry_run',
         'use_whitelist',
+        'db_cache',
     ],
 )
 
@@ -52,6 +54,7 @@ DEFAULTS = Defaults(
     phasing_main=1,
     dry_run=False,
     use_whitelist=True,
+    db_cache=os.path.join(tempfile.gettempdir(), 'git-ubuntu-db-cache'),
 )
 
 
@@ -101,12 +104,16 @@ def should_import_srcpkg(
     return False
 
 
-def import_srcpkg(pkgname, dry_run):
+def import_srcpkg(pkgname, dry_run, db_cache):
     """import_srcpkg - Invoke git ubuntu import on @pkgname
 
     Arguments:
         pkgname - string name of a source package
         dry_run - a boolean to indicate a dry-run operation
158 | 1616 | db_cache_dir=args.db_cache, | ||
159 | 1570 | ) | 1617 | ) |
160 | diff --git a/gitubuntu/source_information.py b/gitubuntu/source_information.py | |||
161 | index b97bc8a..050d60a 100644 | |||
162 | --- a/gitubuntu/source_information.py | |||
163 | +++ b/gitubuntu/source_information.py | |||
164 | @@ -443,7 +443,14 @@ class GitUbuntuSourceInformation(object): | |||
165 | 443 | for srcpkg in spph: | 443 | for srcpkg in spph: |
166 | 444 | yield self.get_corrected_spi(srcpkg, workdir) | 444 | yield self.get_corrected_spi(srcpkg, workdir) |
167 | 445 | 445 | ||
169 | 446 | def launchpad_versions_published_after(self, head_versions, namespace, workdir=None, active_series_only=False): | 446 | def launchpad_versions_published_after( |
170 | 447 | self, | ||
171 | 448 | head_versions, | ||
172 | 449 | namespace, | ||
173 | 450 | last_spphr=None, | ||
174 | 451 | workdir=None, | ||
175 | 452 | active_series_only=False, | ||
176 | 453 | ): | ||
177 | 447 | args = { | 454 | args = { |
178 | 448 | 'exact_match':True, | 455 | 'exact_match':True, |
179 | 449 | 'source_name':self.pkgname, | 456 | 'source_name':self.pkgname, |
180 | @@ -471,7 +478,14 @@ class GitUbuntuSourceInformation(object): | |||
181 | 471 | if len(spph) == 0: | 478 | if len(spph) == 0: |
182 | 472 | raise NoPublicationHistoryException("Is %s published in %s?" % | 479 | raise NoPublicationHistoryException("Is %s published in %s?" % |
183 | 473 | (self.pkgname, self.dist_name)) | 480 | (self.pkgname, self.dist_name)) |
185 | 474 | if len(head_versions) > 0: | 481 | if last_spphr: |
186 | 482 | _spph = list() | ||
187 | 483 | for spphr in spph: | ||
188 | 484 | if str(spphr) == last_spphr: | ||
189 | 485 | break | ||
190 | 486 | _spph.append(spphr) | ||
191 | 487 | spph = _spph | ||
192 | 488 | elif head_versions: | ||
193 | 475 | _spph = list() | 489 | _spph = list() |
194 | 476 | for spphr in spph: | 490 | for spphr in spph: |
195 | 477 | spi = GitUbuntuSourcePackageInformation(spphr, self.dist_name, | 491 | spi = GitUbuntuSourcePackageInformation(spphr, self.dist_name, |
196 | diff --git a/man/man1/git-ubuntu-import.1 b/man/man1/git-ubuntu-import.1 | |||
197 | index dd7fd9e..74bfe83 100644 | |||
198 | --- a/man/man1/git-ubuntu-import.1 | |||
199 | +++ b/man/man1/git-ubuntu-import.1 | |||
200 | @@ -1,4 +1,4 @@ | |||
202 | 1 | .TH "GIT-UBUNTU-IMPORT" "1" "2017-07-19" "Git-Ubuntu 0.2" "Git-Ubuntu Manual" | 1 | .TH "GIT-UBUNTU-IMPORT" "1" "2017-11-08" "Git-Ubuntu 0.6.2" "Git-Ubuntu Manual" |
203 | 2 | 2 | ||
204 | 3 | .SH "NAME" | 3 | .SH "NAME" |
205 | 4 | git-ubuntu import \- Import Launchpad publishing history to Git | 4 | git-ubuntu import \- Import Launchpad publishing history to Git |
206 | @@ -9,7 +9,8 @@ git-ubuntu import \- Import Launchpad publishing history to Git | |||
207 | 9 | <user>] [\-\-dl-cache <dl_cache>] [\-\-no-fetch] [\-\-no-push] | 9 | <user>] [\-\-dl-cache <dl_cache>] [\-\-no-fetch] [\-\-no-push] |
208 | 10 | [\-\-no-clean] [\-d | \-\-directory <directory>] | 10 | [\-\-no-clean] [\-d | \-\-directory <directory>] |
209 | 11 | [\-\-active-series-only] [\-\-skip-applied] [\-\-skip-orig] | 11 | [\-\-active-series-only] [\-\-skip-applied] [\-\-skip-orig] |
211 | 12 | [\-\-reimport] [\-\-allow-applied-failures] <package> | 12 | [\-\-reimport] [\-\-allow-applied-failures] [\-\-db-cache <db_cache>] |
212 | 13 | <package> | ||
213 | 13 | .FI | 14 | .FI |
214 | 14 | .SP | 15 | .SP |
215 | 15 | .SH "DESCRIPTION" | 16 | .SH "DESCRIPTION" |
216 | @@ -197,6 +198,21 @@ After investigation, this flag can be used to indicate the importer is | |||
217 | 197 | allowed to ignore such a failure\&. | 198 | allowed to ignore such a failure\&. |
218 | 198 | .RE | 199 | .RE |
219 | 199 | .PP | 200 | .PP |
220 | 201 | \-\-db-cache <db_cache> | ||
221 | 202 | .RS 4 | ||
222 | 203 | The path to a directory containing Python dbm database disk files for | ||
223 | 204 | importer metadata\&. | ||
224 | 205 | If \fB<db_cache>\fR does not exist, it will be created\&. | ||
225 | 206 | Two files in \fB<db_cache>\fR are used, "ubuntu" and "debian', which are | ||
226 | 207 | created if not already present\&. | ||
227 | 208 | The cache files provide information to the importer about prior imports | ||
228 | 209 | of \fB<package>\fR and which Launchpad publishing record was last | ||
229 | 210 | imported\&. | ||
230 | 211 | This is necessary because the imported Git repository does not | ||
231 | 212 | necessarily maintain any metadata about Launchpad publishing | ||
232 | 213 | information\&. | ||
233 | 214 | .RE | ||
234 | 215 | .PP | ||
235 | 200 | <package> | 216 | <package> |
236 | 201 | .RS 4 | 217 | .RS 4 |
237 | 202 | The name of the source package to import\&. | 218 | The name of the source package to import\&. |
238 | diff --git a/scripts/import-source-packages.py b/scripts/import-source-packages.py | |||
239 | index 698ae35..7dd534e 100755 | |||
240 | --- a/scripts/import-source-packages.py | |||
241 | +++ b/scripts/import-source-packages.py | |||
242 | @@ -50,6 +50,7 @@ def import_new_published_sources( | |||
243 | 50 | phasing_main, | 50 | phasing_main, |
244 | 51 | phasing_universe, | 51 | phasing_universe, |
245 | 52 | dry_run, | 52 | dry_run, |
246 | 53 | db_cache, | ||
247 | 53 | ): | 54 | ): |
248 | 54 | """import_new_published_source - Import all new publishes since a prior execution | 55 | """import_new_published_source - Import all new publishes since a prior execution |
249 | 55 | 56 | ||
250 | @@ -60,6 +61,10 @@ def import_new_published_sources( | |||
251 | 60 | phasing_main - a integer percentage of all packages in main to import | 61 | phasing_main - a integer percentage of all packages in main to import |
252 | 61 | phasing_universe - a integer percentage of all packages in universe to import | 62 | phasing_universe - a integer percentage of all packages in universe to import |
253 | 62 | dry_run - a boolean to indicate a dry-run operation | 63 | dry_run - a boolean to indicate a dry-run operation |
254 | 64 | db_cache - string filesystem path containing DBM files for storing | ||
255 | 65 | importer progress, will be created if it does not exist | ||
256 | 66 | |||
257 | 67 | If db_cache is None, no cache is used. | ||
258 | 63 | 68 | ||
259 | 64 | Returns: | 69 | Returns: |
260 | 65 | A tuple of two lists, the first containing the names of all | 70 | A tuple of two lists, the first containing the names of all |
261 | @@ -150,6 +155,7 @@ def import_new_published_sources( | |||
262 | 150 | ret = scriptutils.pool_map_import_srcpkg( | 155 | ret = scriptutils.pool_map_import_srcpkg( |
263 | 151 | num_workers=num_workers, | 156 | num_workers=num_workers, |
264 | 152 | dry_run=dry_run, | 157 | dry_run=dry_run, |
265 | 158 | db_cache=db_cache, | ||
266 | 153 | pkgnames=filtered_pkgnames, | 159 | pkgnames=filtered_pkgnames, |
267 | 154 | ) | 160 | ) |
268 | 155 | 161 | ||
269 | @@ -229,6 +235,7 @@ def main( | |||
270 | 229 | phasing_main=scriptutils.DEFAULTS.phasing_main, | 235 | phasing_main=scriptutils.DEFAULTS.phasing_main, |
271 | 230 | phasing_universe=scriptutils.DEFAULTS.phasing_universe, | 236 | phasing_universe=scriptutils.DEFAULTS.phasing_universe, |
272 | 231 | dry_run=scriptutils.DEFAULTS.dry_run, | 237 | dry_run=scriptutils.DEFAULTS.dry_run, |
273 | 238 | db_cache=scriptutils.DEFAULTS.db_cache, | ||
274 | 232 | ): | 239 | ): |
275 | 233 | """main - Main entry point to the script | 240 | """main - Main entry point to the script |
276 | 234 | 241 | ||
277 | @@ -243,6 +250,8 @@ def main( | |||
     phasing_universe - a integer percentage of all packages in universe
         to import
     dry_run - a boolean to indicate a dry-run operation
+    db_cache - string filesystem path containing DBM files for storing
+        importer progress, will be created if it does not exist
     """
     scriptutils.setup_git_config()

@@ -280,6 +289,7 @@ def main(
             phasing_main,
             phasing_universe,
             dry_run,
+            db_cache,
         )
         print("Imported %d source packages" % len(imported_srcpkgs))
         mail_imported_srcpkgs |= set(imported_srcpkgs)
@@ -361,6 +371,13 @@ def cli_main():
         help="Simulate operation but do not actually do anything",
         default=scriptutils.DEFAULTS.dry_run,
     )
+    parser.add_argument(
+        '--db-cache',
+        type=str,
+        help="Directory containing Python DBM files which store importer "
+             "progress",
+        default=scriptutils.DEFAULTS.db_cache,
+    )

     args = parser.parse_args()

@@ -371,6 +388,7 @@ def cli_main():
         phasing_main=args.phasing_main,
         phasing_universe=args.phasing_universe,
         dry_run=args.dry_run,
+        db_cache=args.db_cache,
     )

 if __name__ == '__main__':
diff --git a/scripts/scriptutils.py b/scripts/scriptutils.py
index 912efd1..e24c394 100644
--- a/scripts/scriptutils.py
+++ b/scripts/scriptutils.py
@@ -5,6 +5,7 @@ import multiprocessing
 import os
 import sys
 import subprocess
+import tempfile
 import time

 import pkg_resources
@@ -35,6 +36,7 @@ Defaults = namedtuple(
         'phasing_main',
         'dry_run',
         'use_whitelist',
+        'db_cache',
     ],
 )

@@ -52,6 +54,7 @@ DEFAULTS = Defaults(
     phasing_main=1,
     dry_run=False,
     use_whitelist=True,
+    db_cache=os.path.join(tempfile.gettempdir(), 'git-ubuntu-db-cache'),
 )


@@ -101,12 +104,16 @@ def should_import_srcpkg(
         return False


-def import_srcpkg(pkgname, dry_run):
+def import_srcpkg(pkgname, dry_run, db_cache):
     """import_srcpkg - Invoke git ubuntu import on @pkgname

     Arguments:
         pkgname - string name of a source package
         dry_run - a boolean to indicate a dry-run operation
+        db_cache - string filesystem path containing DBM files for storing
+            importer progress, will be created if it does not exist
+
+    If db_cache is None, no cache is used.

     Returns:
         A tuple of a string and a boolean, where the boolean is the success
@@ -125,6 +132,8 @@ def import_srcpkg(pkgname, dry_run):
         'usd-importer-bot',
         pkgname,
     ]
+    if db_cache:
+        cmd.extend(['--db-cache', db_cache,])
     try:
         print(' '.join(cmd))
         if not dry_run:
@@ -163,6 +172,7 @@ def setup_git_config(
 def pool_map_import_srcpkg(
     num_workers,
     dry_run,
+    db_cache,
     pkgnames,
 ):
     """pool_map_import_srcpkg - Use a multiprocessing.Pool to parallel
@@ -171,13 +181,18 @@ def pool_map_import_srcpkg(
     Arguments:
         num_workers - integer number of worker processes to use
         dry_run - a boolean to indicate a dry-run operation
+        db_cache - string filesystem path containing DBM files for storing
+            importer progress, will be created if it does not exist
         pkgnames - a list of string names of source packages
+
+    If db_cache is None, no cache is used.
     """
     with multiprocessing.Pool(processes=num_workers) as pool:
         results = pool.map(
             functools.partial(
                 import_srcpkg,
                 dry_run=dry_run,
+                db_cache=db_cache,
             ),
             pkgnames,
         )
diff --git a/scripts/source-package-walker.py b/scripts/source-package-walker.py
index 58c3d5a..f7b9629 100755
--- a/scripts/source-package-walker.py
+++ b/scripts/source-package-walker.py
@@ -44,6 +44,7 @@ def import_all_published_sources(
     phasing_main,
     phasing_universe,
     dry_run,
+    db_cache,
 ):
     """import_all_published_sources - Import all publishes satisfying a
     {white,black}list and phasing
@@ -55,6 +56,10 @@ def import_all_published_sources(
         phasing_main - a integer percentage of all packages in main to import
         phasing_universe - a integer percentage of all packages in universe to import
         dry_run - a boolean to indicate a dry-run operation
+        db_cache - string filesystem path containing DBM files for storing
+            importer progress, will be created if it does not exist
+
+    If db_cache is None, no cache is used.

     Returns:
         A tuple of two lists, the first containing the names of all
@@ -126,6 +131,7 @@ def import_all_published_sources(
     return scriptutils.pool_map_import_srcpkg(
         num_workers=num_workers,
         dry_run=dry_run,
+        db_cache=db_cache,
         pkgnames=pkgnames,
     )

@@ -137,6 +143,7 @@ def main(
     phasing_universe=scriptutils.DEFAULTS.phasing_universe,
     dry_run=scriptutils.DEFAULTS.dry_run,
     use_whitelist=scriptutils.DEFAULTS.use_whitelist,
+    db_cache=scriptutils.DEFAULTS.db_cache,
 ):
     """main - Main entry point to the script

@@ -153,6 +160,8 @@ def main(
         dry_run - a boolean to indicate a dry-run operation
         use_whitelist - a boolean to control whether the whitelist data is
             used
+        db_cache - string filesystem path containing DBM files for storing
+            importer progress, will be created if it does not exist

     use_whitelist exists because during the rampup of imports, we want
     to import the whitelist packages and the phased packages. But after
@@ -191,6 +200,7 @@ def main(
         phasing_main,
         phasing_universe,
         dry_run,
+        db_cache,
     )
     print(
         "Imported %d source packages:\n%s" % (
@@ -254,6 +264,13 @@ def cli_main():
         help="Simulate operation but do not actually do anything",
         default=scriptutils.DEFAULTS.dry_run,
     )
+    parser.add_argument(
+        '--db-cache',
+        type=str,
+        help="Directory containing Python DBM files which store importer "
+             "progress",
+        default=scriptutils.DEFAULTS.db_cache,
+    )

     args = parser.parse_args()

@@ -265,6 +282,7 @@ def cli_main():
         phasing_universe=args.phasing_universe,
         dry_run=args.dry_run,
         use_whitelist=args.use_whitelist,
+        db_cache=args.db_cache,
     )

 if __name__ == '__main__':
PASSED: Continuous integration, rev:16af9e00af3e93c4fcba5a49cd660d01d6d10f61
https://jenkins.ubuntu.com/server/job/git-ubuntu-ci/205/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Style Check
    SUCCESS: Unit Tests
    SUCCESS: Integration Tests
    IN_PROGRESS: Declarative: Post Actions
Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/git-ubuntu-ci/205/rebuild