Merge lp:~stevenk/launchpad/populate-spr-changelogs into lp:launchpad
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch
Merged at revision: 12569
Proposed branch: lp:~stevenk/launchpad/populate-spr-changelogs
Merge into: lp:launchpad
Diff against target: 438 lines (+222/-16), 4 files modified:
  database/schema/security.cfg (+3/-0)
  lib/lp/archiveuploader/dscfile.py (+3/-2)
  lib/lp/scripts/garbo.py (+137/-0)
  lib/lp/scripts/tests/test_garbo.py (+79/-14)
To merge this branch: bzr merge lp:~stevenk/launchpad/populate-spr-changelogs
Related bugs: (none)
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Robert Collins (community) | | | Approve
William Grant | code* | | Approve

Review via email: mp+52363@code.launchpad.net
Commit message
[r=lifeless,
Description of the change
Add a garbo-hourly job that looks for Debian SPRs that don't have a changelog set, unpacks each package, uploads the debian/changelog file to the Librarian, and sets SPR.changelog.
This code is expected to be only temporary and used as a one-shot; it will be removed once it has been determined that it is no longer doing anything useful.
I have also cleaned up some lint, along with actually exporting findFile in lp.archiveuploader.dscfile.
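The description above boils down to a resumable, chunked batch job: scan candidates in ID order, do expensive per-row work, and checkpoint progress externally so the next garbo run resumes where the last one stopped. A minimal standalone sketch of that pattern (the class and function names here are illustrative stand-ins, not the real Launchpad APIs):

```python
# Sketch of the resumable, chunked one-shot pattern described above.
# CheckpointStore stands in for memcache; 'process' stands in for the
# unpack-and-upload work done per SPR.

class CheckpointStore:
    """Remembers where the previous run stopped (memcache stand-in)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


def populate_changelogs(candidate_ids, process, store, chunk_size=10):
    """Process one chunk of candidates, checkpointing after each row."""
    start_at = store.get('populate-spr-changelogs') or 0
    todo = [i for i in sorted(candidate_ids) if i >= start_at]
    for spr_id in todo[:chunk_size]:
        process(spr_id)
        # Record progress so an interrupted run resumes here, not at 0.
        store.set('populate-spr-changelogs', spr_id + 1)
```

Calling it twice with `chunk_size=2` over candidates `[1, 2, 5, 9]` processes 1 and 2 on the first run and 5 and 9 on the second, mirroring how successive garbo runs walk the ID space.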
William Grant (wgrant) wrote:
Looks good, but could you elaborate on the sampledata suckage? It's not at all obvious to anyone.
I'm also tempted to ask you to rename getCandidateSPRs back to getCandidateSPRIDs, since your change to make it return SPRs failed. But I think you might injure me, so I won't force the issue.
Robert Collins (lifeless) wrote:
I agree with wgrant on the function name. Also, there is a fixture/context manager for temporary directories in the fixtures package: `from fixtures import TempDir`. That will be a little more correct than manual management of the tempdir lifetime. Both changes are up to you, but I heartily commend them to you.
Robert Collins (lifeless) wrote:
(particularly as your tempdir handling is buggy right now).
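The fixture Robert suggests turns tempdir cleanup into a context-manager guarantee. A small sketch of why that matters, using the standard library's `tempfile.TemporaryDirectory` as the stdlib analogue (`fixtures.TempDir` gives the same with-statement behaviour in the branch, exposing the path as `.path`):

```python
# Cleanup happens even when the body raises -- exactly the bug that
# manual mkdtemp()/rmtree() bookkeeping tends to have.
import os
import tempfile

created = {}

def extract_package():
    with tempfile.TemporaryDirectory() as tmp_path:
        created['path'] = tmp_path
        # Pretend we unpacked a source package here...
        open(os.path.join(tmp_path, 'changelog'), 'w').close()
        # ...and then something went wrong mid-way.
        raise RuntimeError('simulated dpkg-source failure')

try:
    extract_package()
except RuntimeError:
    pass
# The directory is gone despite the exception: no leaked tempdirs.
```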
Preview Diff
=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg	2011-03-08 16:42:43 +0000
+++ database/schema/security.cfg	2011-03-09 23:47:00 +0000
@@ -2218,6 +2218,9 @@
 public.job = SELECT, INSERT, DELETE
 public.branchjob = SELECT, DELETE
 public.bugjob = SELECT, INSERT
+public.libraryfilealias = SELECT, INSERT
+public.libraryfilecontent = SELECT, INSERT
+public.sourcepackagerelease = SELECT, UPDATE
 
 [garbo_daily]
 type=user

=== modified file 'lib/lp/archiveuploader/dscfile.py'
--- lib/lp/archiveuploader/dscfile.py	2011-02-17 17:02:54 +0000
+++ lib/lp/archiveuploader/dscfile.py	2011-03-09 23:47:00 +0000
@@ -10,11 +10,12 @@
 __metaclass__ = type
 
 __all__ = [
-    'SignableTagFile',
     'DSCFile',
     'DSCUploadedFile',
+    'findFile',
     'find_changelog',
     'find_copyright',
+    'SignableTagFile',
     ]
 
 from cStringIO import StringIO
@@ -28,7 +29,6 @@
 from debian.deb822 import Deb822Dict
 from zope.component import getUtility
 
-from lp.services.encoding import guess as guess_encoding
 from canonical.launchpad.interfaces.gpghandler import (
     GPGVerificationError,
     IGPGHandler,
@@ -64,6 +64,7 @@
     )
 from lp.registry.interfaces.sourcepackage import SourcePackageFileType
 from lp.registry.interfaces.sourcepackagename import ISourcePackageNameSet
+from lp.services.encoding import guess as guess_encoding
 from lp.soyuz.enums import (
     ArchivePurpose,
     SourcePackageFormat,

=== modified file 'lib/lp/scripts/garbo.py'
--- lib/lp/scripts/garbo.py	2011-02-23 10:28:53 +0000
+++ lib/lp/scripts/garbo.py	2011-03-09 23:47:00 +0000
@@ -13,11 +13,18 @@
     datetime,
     timedelta,
     )
+from fixtures import TempDir
+import os
+import signal
+import subprocess
 import time
 
 from psycopg2 import IntegrityError
 import pytz
+from storm.expr import LeftJoin
 from storm.locals import (
+    And,
+    Count,
     Max,
     Min,
     Select,
@@ -43,6 +50,7 @@
 from canonical.launchpad.database.oauth import OAuthNonce
 from canonical.launchpad.database.openidconsumer import OpenIDConsumerNonce
 from canonical.launchpad.interfaces.emailaddress import EmailAddressStatus
+from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
 from canonical.launchpad.interfaces.lpstorm import IMasterStore
 from canonical.launchpad.utilities.looptuner import TunableLoop
 from canonical.launchpad.webapp.interfaces import (
@@ -50,6 +58,9 @@
     MAIN_STORE,
     MASTER_FLAVOR,
     )
+from canonical.librarian.utils import copy_and_close
+from lp.archiveuploader.dscfile import findFile
+from lp.archiveuploader.nascentuploadfile import UploadError
 from lp.bugs.interfaces.bug import IBugSet
 from lp.bugs.model.bug import Bug
 from lp.bugs.model.bugattachment import BugAttachment
@@ -70,10 +81,13 @@
 from lp.hardwaredb.model.hwdb import HWSubmission
 from lp.registry.model.person import Person
 from lp.services.job.model.job import Job
+from lp.services.memcache.interfaces import IMemcacheClient
 from lp.services.scripts.base import (
     LaunchpadCronScript,
     SilentLaunchpadScriptFailure,
     )
+from lp.soyuz.model.files import SourcePackageReleaseFile
+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
 from lp.translations.interfaces.potemplate import IPOTemplateSet
 from lp.translations.model.potranslation import POTranslation
 
@@ -81,6 +95,10 @@
 ONE_DAY_IN_SECONDS = 24*60*60
 
 
+def subprocess_setup():
+    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
+
+
 class BulkPruner(TunableLoop):
     """A abstract ITunableLoop base class for simple pruners.
 
@@ -893,6 +911,124 @@
         self.done = True
 
 
+class PopulateSPRChangelogs(TunableLoop):
+    maximum_chunk_size = 1
+
+    def __init__(self, log, abort_time=None):
+        super(PopulateSPRChangelogs, self).__init__(log, abort_time)
+        value = getUtility(IMemcacheClient).get('populate-spr-changelogs')
+        if not value:
+            self.start_at = 0
+        else:
+            self.start_at = value
+        self.finish_at = self.getCandidateSPRs(0).last()
+
+    def getCandidateSPRs(self, start_at):
+        return IMasterStore(SourcePackageRelease).using(
+            SourcePackageRelease,
+            # Find any SPRFs that have expired (LFA.content IS NULL).
+            LeftJoin(
+                SourcePackageReleaseFile,
+                SourcePackageReleaseFile.sourcepackagereleaseID ==
+                    SourcePackageRelease.id),
+            LeftJoin(
+                LibraryFileAlias,
+                And(LibraryFileAlias.id ==
+                        SourcePackageReleaseFile.libraryfileID,
+                    LibraryFileAlias.content == None)),
+            # And exclude any SPRs that have any expired SPRFs.
+            ).find(
+                SourcePackageRelease.id,
+                SourcePackageRelease.id >= start_at,
+                SourcePackageRelease.changelog == None,
+            ).group_by(SourcePackageRelease.id).having(
+                Count(LibraryFileAlias) == 0
+            ).order_by(SourcePackageRelease.id)
+
+    def isDone(self):
+        return self.start_at > self.finish_at
+
+    def __call__(self, chunk_size):
+        for sprid in self.getCandidateSPRs(self.start_at)[:chunk_size]:
+            spr = SourcePackageRelease.get(sprid)
+            with TempDir() as tmp_dir:
+                dsc_file = None
+
+                # Grab the files from the librarian into a temporary
+                # directory.
+                try:
+                    for sprf in spr.files:
+                        dest = os.path.join(
+                            tmp_dir.path, sprf.libraryfile.filename)
+                        dest_file = open(dest, 'w')
+                        sprf.libraryfile.open()
+                        copy_and_close(sprf.libraryfile, dest_file)
+                        if dest.endswith('.dsc'):
+                            dsc_file = dest
+                except LookupError:
+                    self.log.warning(
+                        'SPR %d (%s %s) has missing library files.' % (
+                            spr.id, spr.name, spr.version))
+                    continue
+
+                if dsc_file is None:
+                    self.log.warning(
+                        'SPR %d (%s %s) has no DSC.' % (
+                            spr.id, spr.name, spr.version))
+                    continue
+
+                # Extract the source package. Throw away stdout/stderr
+                # -- we only really care about the return code.
+                fnull = open('/dev/null', 'w')
+                ret = subprocess.call(
+                    ['dpkg-source', '-x', dsc_file, os.path.join(
+                        tmp_dir.path, 'extracted')],
+                    stdout=fnull, stderr=fnull,
+                    preexec_fn=subprocess_setup)
+                fnull.close()
+                if ret != 0:
+                    self.log.warning(
+                        'SPR %d (%s %s) failed to unpack: returned %d' % (
+                            spr.id, spr.name, spr.version, ret))
+                    continue
+
+                # We have an extracted source package. Let's get the
+                # changelog. findFile ensures that it's not too huge, and
+                # not a symlink.
+                try:
+                    changelog_path = findFile(
+                        tmp_dir.path, 'debian/changelog')
+                except UploadError, e:
+                    changelog_path = None
+                    self.log.warning(
+                        'SPR %d (%s %s) changelog could not be '
+                        'imported: %s' % (
+                            spr.id, spr.name, spr.version, e))
+                if changelog_path:
+                    # The LFA should be restricted only if there aren't any
+                    # public publications.
+                    restricted = not any(
+                        not a.private for a in spr.published_archives)
+                    spr.changelog = getUtility(ILibraryFileAliasSet).create(
+                        'changelog',
+                        os.stat(changelog_path).st_size,
+                        open(changelog_path, "r"),
+                        "text/x-debian-source-changelog",
+                        restricted=restricted)
+                    self.log.info('SPR %d (%s %s) changelog imported.' % (
+                        spr.id, spr.name, spr.version))
+                else:
+                    self.log.warning('SPR %d (%s %s) had no changelog.' % (
+                        spr.id, spr.name, spr.version))
+
+            self.start_at = spr.id + 1
+            result = getUtility(IMemcacheClient).set(
+                'populate-spr-changelogs', self.start_at)
+            if not result:
+                self.log.warning('Failed to set start_at in memcache.')
+            transaction.commit()
+
+
 class BaseDatabaseGarbageCollector(LaunchpadCronScript):
     """Abstract base class to run a collection of TunableLoops."""
     script_name = None  # Script name for locking and database user. Override.
@@ -977,6 +1113,7 @@
         RevisionCachePruner,
         BugHeatUpdater,
         BugWatchScheduler,
+        PopulateSPRChangelogs,
         ]
     experimental_tunable_loops = []
 

=== modified file 'lib/lp/scripts/tests/test_garbo.py'
--- lib/lp/scripts/tests/test_garbo.py	2011-02-28 11:50:34 +0000
+++ lib/lp/scripts/tests/test_garbo.py	2011-03-09 23:47:00 +0000
@@ -10,7 +10,9 @@
     datetime,
     timedelta,
     )
-import operator
+from fixtures import TempDir
+import os
+import subprocess
 import time
 
 from pytz import UTC
@@ -35,6 +37,7 @@
 from canonical.launchpad.database.oauth import OAuthNonce
 from canonical.launchpad.database.openidconsumer import OpenIDConsumerNonce
 from canonical.launchpad.interfaces.emailaddress import EmailAddressStatus
+from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
 from canonical.launchpad.interfaces.lpstorm import IMasterStore
 from canonical.launchpad.scripts.tests import run_script
 from canonical.launchpad.webapp.interfaces import (
@@ -48,6 +51,7 @@
     LaunchpadZopelessLayer,
     ZopelessDatabaseLayer,
     )
+from lp.archiveuploader.dscfile import findFile
 from lp.bugs.model.bugnotification import (
     BugNotification,
     BugNotificationRecipient,
@@ -64,6 +68,7 @@
     )
 from lp.code.model.codeimportevent import CodeImportEvent
 from lp.code.model.codeimportresult import CodeImportResult
+from lp.registry.interfaces.distribution import IDistributionSet
 from lp.registry.interfaces.person import (
     IPersonSet,
     PersonCreationRationale,
@@ -76,6 +81,8 @@
     )
 from lp.services.job.model.job import Job
 from lp.services.log.logger import BufferLogger
+from lp.soyuz.enums import PackagePublishingStatus
+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
 from lp.testing import (
     TestCase,
     TestCaseWithFactory,
@@ -95,6 +102,13 @@
 
     def test_hourly_script(self):
         """Ensure garbo-hourly.py actually runs."""
+        # Our sampledata doesn't contain anything that PopulateSPRChangelogs
+        # can process without errors, so it's easier to just set all of the
+        # changelogs to a random LFA. We can't just expire every LFA, since
+        # a bunch of SPRs have no SPRFs at all.
+        IMasterStore(SourcePackageRelease).find(SourcePackageRelease).set(
+            changelogID=1)
+        transaction.commit()  # run_script() is a different process.
         rv, out, err = run_script(
             "cronscripts/garbo-hourly.py", ["-q"], expect_returncode=0)
         self.failIf(out.strip(), "Output to stdout: %s" % out)
@@ -513,7 +527,7 @@
         # them will have the present day as its date created, and so will not
         # be deleted, whereas the other will have a creation date far in the
         # past, so it will be deleted.
-        person = self.factory.makePerson(name='test-unlinked-person-new')
+        self.factory.makePerson(name='test-unlinked-person-new')
         person_old = self.factory.makePerson(name='test-unlinked-person-old')
         removeSecurityProxy(person_old).datecreated = datetime(
             2008, 01, 01, tzinfo=UTC)
@@ -541,7 +555,7 @@
             bugID=1,
             is_comment=True,
             date_emailed=None)
-        recipient = BugNotificationRecipient(
+        BugNotificationRecipient(
             bug_notification=notification,
             personID=1,
             reason_header='Whatever',
@@ -555,7 +569,7 @@
             bugID=1,
             is_comment=True,
             date_emailed=UTC_NOW + SQL("interval '%d days'" % delta))
-        recipient = BugNotificationRecipient(
+        BugNotificationRecipient(
             bug_notification=notification,
             personID=1,
             reason_header='Whatever',
@@ -609,7 +623,6 @@
         Store.of(db_branch).flush()
         branch_job = BranchUpgradeJob.create(db_branch)
         branch_job.job.date_finished = THIRTY_DAYS_AGO
-        job_id = branch_job.job.id
 
         self.assertEqual(
             store.find(
@@ -617,7 +630,7 @@
                 BranchJob.branch == db_branch.id).count(),
             1)
 
-        collector = self.runDaily()
+        self.runDaily()
 
         LaunchpadZopelessLayer.switchDbUser('testadmin')
         self.assertEqual(
@@ -638,21 +651,16 @@
 
         branch_job = BranchUpgradeJob.create(db_branch)
         branch_job.job.date_finished = THIRTY_DAYS_AGO
-        job_id = branch_job.job.id
 
         db_branch2 = self.factory.makeAnyBranch(
             branch_format=BranchFormat.BZR_BRANCH_5,
             repository_format=RepositoryFormat.BZR_KNIT_1)
-        branch_job2 = BranchUpgradeJob.create(db_branch2)
-        job_id_newer = branch_job2.job.id
+        BranchUpgradeJob.create(db_branch2)
 
-        collector = self.runDaily()
+        self.runDaily()
 
         LaunchpadZopelessLayer.switchDbUser('testadmin')
-        self.assertEqual(
-            store.find(
-                BranchJob).count(),
-            1)
+        self.assertEqual(store.find(BranchJob).count(), 1)
 
     def test_ObsoleteBugAttachmentDeleter(self):
         # Bug attachments without a LibraryFileContent record are removed.
@@ -710,3 +718,60 @@
             """ % sqlbase.quote(template.id)).get_one()
 
         self.assertEqual(1, count)
+
+    def upload_to_debian(self, restricted=False):
+        sid = getUtility(IDistributionSet)['debian']['sid']
+        spn = self.factory.makeSourcePackageName('9wm')
+        spr = self.factory.makeSourcePackageRelease(
+            sourcepackagename=spn, version='1.2-7', distroseries=sid)
+        archive = sid.main_archive
+        if restricted:
+            archive = self.factory.makeArchive(
+                distribution=sid.distribution, private=True)
+        self.factory.makeSourcePackagePublishingHistory(
+            sourcepackagerelease=spr, archive=archive,
+            status=PackagePublishingStatus.PUBLISHED)
+        for name in (
+            '9wm_1.2-7.diff.gz', '9wm_1.2.orig.tar.gz', '9wm_1.2-7.dsc'):
+            path = os.path.join(
+                'lib/lp/soyuz/scripts/tests/gina_test_archive/pool/main/9',
+                '9wm', name)
+            lfa = getUtility(ILibraryFileAliasSet).create(
+                name, os.stat(path).st_size, open(path, 'r'),
+                'application/octet-stream', restricted=restricted)
+            spr.addFile(lfa)
+        with TempDir() as tmp_dir:
+            fnull = open('/dev/null', 'w')
+            ret = subprocess.call(
+                ['dpkg-source', '-x', path, os.path.join(
+                    tmp_dir.path, 'extracted')],
+                stdout=fnull, stderr=fnull)
+            fnull.close()
+            self.assertEqual(0, ret)
+            changelog_path = findFile(tmp_dir.path, 'debian/changelog')
+            changelog = open(changelog_path, 'r').read()
+        transaction.commit()  # .runHourly() switches dbuser.
+        return (spr, changelog)
+
+    def test_populateSPRChangelogs(self):
+        # We set SPR.changelog for imported records from Debian.
+        LaunchpadZopelessLayer.switchDbUser('testadmin')
+        spr, changelog = self.upload_to_debian()
+        collector = self.runHourly()
+        log = collector.logger.getLogBuffer()
+        self.assertTrue(
+            'SPR %d (9wm 1.2-7) changelog imported.' % spr.id in log)
+        self.assertFalse(spr.changelog == None)
+        self.assertFalse(spr.changelog.restricted)
+        self.assertEqual(changelog, spr.changelog.read())
+
+    def test_populateSPRChangelogs_restricted_sprf(self):
+        LaunchpadZopelessLayer.switchDbUser('testadmin')
+        spr, changelog = self.upload_to_debian(restricted=True)
+        collector = self.runHourly()
+        log = collector.logger.getLogBuffer()
+        self.assertTrue(
+            'SPR %d (9wm 1.2-7) changelog imported.' % spr.id in log)
+        self.assertFalse(spr.changelog == None)
+        self.assertTrue(spr.changelog.restricted)
+        self.assertEqual(changelog, spr.changelog.read())
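The interesting part of the new getCandidateSPRs query is how it excludes any SPR with at least one expired file: LEFT JOIN only the expired rows, then keep the groups where nothing matched (HAVING COUNT(...) = 0). The same shape in plain SQL, against a toy in-memory schema whose table and column names are simplified stand-ins for SourcePackageRelease, SourcePackageReleaseFile and LibraryFileAlias:

```python
# Demonstrate the LEFT JOIN + HAVING COUNT(...) = 0 anti-join pattern
# used by the candidate query, on an illustrative sqlite schema.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE spr (id INTEGER PRIMARY KEY, changelog INTEGER);
    CREATE TABLE sprf (spr_id INTEGER, content INTEGER);
    -- SPR 1: one healthy file; SPR 2: one expired file (content NULL);
    -- SPR 3: no files at all; SPR 4 already has a changelog.
    INSERT INTO spr VALUES (1, NULL), (2, NULL), (3, NULL), (4, 99);
    INSERT INTO sprf VALUES (1, 10), (2, NULL);
""")
candidates = [row[0] for row in conn.execute("""
    SELECT spr.id
    FROM spr
    -- Join ONLY the expired file rows...
    LEFT JOIN sprf
        ON sprf.spr_id = spr.id AND sprf.content IS NULL
    WHERE spr.changelog IS NULL
    GROUP BY spr.id
    -- ...and keep SPRs for which none matched.
    HAVING COUNT(sprf.spr_id) = 0
    ORDER BY spr.id
""")]
```

SPR 2 drops out because its only file has expired, while SPR 3, with no files at all, still qualifies, which matches the test comment above about SPRs having no SPRFs.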
I have a major concern with this: the script was initially written as a single run, so it doesn't record failures in order to skip them next time. This means each garbo run will first grind through every earlier import failure and retry it before getting to anything new. I don't know how many failures there will be, but it's quite possible the job ends up spending the entirety of each run reattempting old failures without making any new progress.
Apart from that, some small comments:
- Why getCandidateSPRIDs instead of getCandidateSPRs (my fault, but your problem now)?
- Why use the master store? Isn't the slave store good enough for the query (ditto)?
- Why is there an archive ID hardcoded? 3 is Debian primary, but we want to do it for everything. Even if you think we don't want to do it for everything, hardcoding the ID is not the Right Way.
- I'd like additional tests, if only to confirm that the restrictedness check works.
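The retry concern in the first point could be addressed with a small failure memo: record the IDs that failed permanently (no DSC, unpack error) and skip them on later runs, so each run spends its time on fresh candidates. A sketch, assuming the memo would be persisted somewhere durable alongside the existing memcache checkpoint (all names here are illustrative):

```python
# Sketch: skip previously-failed candidates instead of retrying them
# on every run. 'process' stands in for the per-SPR import work and is
# assumed to raise ValueError on a permanent failure.

def run_once(candidate_ids, process, failed):
    """Process candidates, recording hard failures to skip next time."""
    imported = []
    for spr_id in candidate_ids:
        if spr_id in failed:
            continue  # Known-bad: don't waste the run on it again.
        try:
            process(spr_id)
        except ValueError:
            failed.add(spr_id)
        else:
            imported.append(spr_id)
    return imported
```

With this shape, a candidate that fails once is attempted exactly once across all runs, so later runs reach new IDs instead of stalling on old failures.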