Merge lp:~stevenk/launchpad/populate-spr-changelogs into lp:launchpad

Proposed by Steve Kowalik
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 12569
Proposed branch: lp:~stevenk/launchpad/populate-spr-changelogs
Merge into: lp:launchpad
Diff against target: 438 lines (+222/-16)
4 files modified
database/schema/security.cfg (+3/-0)
lib/lp/archiveuploader/dscfile.py (+3/-2)
lib/lp/scripts/garbo.py (+137/-0)
lib/lp/scripts/tests/test_garbo.py (+79/-14)
To merge this branch: bzr merge lp:~stevenk/launchpad/populate-spr-changelogs
Reviewer                      Review Type    Date Requested    Status
Robert Collins (community)                                     Approve
William Grant                 code*                            Approve
Review via email: mp+52363@code.launchpad.net

Commit message

[r=lifeless,wgrant][bug=732334] Add a garbo-hourly job that attempts to set changelogs for SPRs that don't have one set.

Description of the change

Add a garbo-hourly job that looks for Debian SPRs that don't have a changelog set, unpacks each source package, uploads its debian/changelog file to the Librarian, and sets SPR.changelog.
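In rough outline, each candidate SPR is processed like this (a condensed sketch, not the implementation in the diff below: batching, the memcache bookmark and error handling are omitted, and download_files_from_librarian is an illustrative placeholder):

    # Condensed sketch of the per-SPR flow; see the preview diff for the real
    # implementation. download_files_from_librarian is a placeholder helper.
    import os
    import subprocess

    from zope.component import getUtility

    from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
    from lp.archiveuploader.dscfile import findFile


    def populate_changelog(spr, tmp_dir):
        # Fetch the SPR's files from the Librarian and note the .dsc path.
        dsc_path = download_files_from_librarian(spr, tmp_dir)

        # Unpack the source package with dpkg-source.
        subprocess.check_call(
            ['dpkg-source', '-x', dsc_path, os.path.join(tmp_dir, 'extracted')])

        # Locate debian/changelog and store it in the Librarian, restricted
        # unless the SPR has at least one public publication.
        changelog_path = findFile(tmp_dir, 'debian/changelog')
        restricted = not any(not a.private for a in spr.published_archives)
        spr.changelog = getUtility(ILibraryFileAliasSet).create(
            'changelog', os.stat(changelog_path).st_size,
            open(changelog_path, 'r'), 'text/x-debian-source-changelog',
            restricted=restricted)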

This code is expected to be only temporary, and used as a one-shot. It will be removed once it has been determined that it is no longer doing anything useful.

I have cleaned up some lint, along with actually exporting findFile in lp.archiveuploader.dscfile.

Revision history for this message
William Grant (wgrant) wrote :

I have a major concern with this: the script was initially written as a single run, so it doesn't record failures in order to skip them next time. This means that each garbo run will first work through any old import failures and retry them before getting to anything new. I don't know how many failures there will be, but it's quite possible it will end up spending the entirety of each run reattempting old failures and making no new progress.
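One possible shape for that, purely as an illustration (none of this is in the branch, and the memcache key is hypothetical), would be to persist a skip list of failed SPR IDs alongside the existing bookmark:

    # Illustration only -- not part of the proposed branch. Failed SPR IDs
    # could be remembered (here via memcache, like the progress bookmark) so
    # later runs filter them out instead of retrying them every hour.
    from zope.component import getUtility

    from lp.services.memcache.interfaces import IMemcacheClient

    SKIP_KEY = 'populate-spr-changelogs-failures'  # hypothetical key


    def load_failed_spr_ids():
        return set(getUtility(IMemcacheClient).get(SKIP_KEY) or [])


    def record_failed_spr_id(spr_id):
        failed = load_failed_spr_ids()
        failed.add(spr_id)
        getUtility(IMemcacheClient).set(SKIP_KEY, list(failed))

    # __call__ would call record_failed_spr_id() on each failure and exclude
    # the recorded IDs from getCandidateSPRs() on subsequent runs.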

Apart from that, some small comments:
 - Why getCandidateSPRIDs instead of getCandidateSPRs (my fault, but your problem now)?
 - Why use the master store? Isn't the slave store good enough for the query (ditto)?
 - Why is there an archive ID hardcoded? 3 is Debian primary, but we want to do it for everything. Even if you think we don't want to do it for everything, hardcoding the ID is not the Right Way (see the sketch after this list).
 - I'd like additional tests, if only to confirm that the restrictedness check works.
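On the hardcoded-ID point, the usual pattern is to look the archive up rather than embed its database ID; a minimal sketch (the candidate query in the final diff drops the archive filter entirely, so this is illustrative only):

    # Sketch: referencing Debian's primary archive without hardcoding its ID.
    from zope.component import getUtility

    from lp.registry.interfaces.distribution import IDistributionSet

    debian = getUtility(IDistributionSet)['debian']
    debian_primary = debian.main_archive  # rather than Archive.id == 3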

review: Needs Information (code*)
Revision history for this message
William Grant (wgrant) wrote :

Looks good, but could you elaborate on the sampledata suckage? It's not at all obvious to anyone.

I'm also tempted to ask you to rename getCandidateSPRs back to getCandidateSPRIDs, since your change to make it return SPRs failed. But I think you might injure me, so I won't force the issue.

review: Approve (code*)
Revision history for this message
Robert Collins (lifeless) wrote :

I agree with wgrant on the function name. Also, there is a fixture/context manager for temporary directories in the fixtures package: from fixtures import TempDir. That will be a little more correct than managing the tempdir lifetime manually. Both changes are up to you, but I heartily commend them to you.
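For reference, the usage looks roughly like this (extract_package is a placeholder for the real work):

    # fixtures.TempDir ties the directory's lifetime to the with block,
    # instead of pairing tempfile.mkdtemp() with a manual shutil.rmtree().
    from fixtures import TempDir

    with TempDir() as tmp_dir:
        # tmp_dir.path is the temporary directory; it is cleaned up
        # automatically when the block exits, even on an exception.
        extract_package(tmp_dir.path)  # placeholder for the real work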

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

(particularly as your tempdir handling is buggy right now).

Preview Diff

1=== modified file 'database/schema/security.cfg'
2--- database/schema/security.cfg 2011-03-08 16:42:43 +0000
3+++ database/schema/security.cfg 2011-03-09 23:47:00 +0000
4@@ -2218,6 +2218,9 @@
5 public.job = SELECT, INSERT, DELETE
6 public.branchjob = SELECT, DELETE
7 public.bugjob = SELECT, INSERT
8+public.libraryfilealias = SELECT, INSERT
9+public.libraryfilecontent = SELECT, INSERT
10+public.sourcepackagerelease = SELECT, UPDATE
11
12 [garbo_daily]
13 type=user
14
15=== modified file 'lib/lp/archiveuploader/dscfile.py'
16--- lib/lp/archiveuploader/dscfile.py 2011-02-17 17:02:54 +0000
17+++ lib/lp/archiveuploader/dscfile.py 2011-03-09 23:47:00 +0000
18@@ -10,11 +10,12 @@
19 __metaclass__ = type
20
21 __all__ = [
22- 'SignableTagFile',
23 'DSCFile',
24 'DSCUploadedFile',
25+ 'findFile',
26 'find_changelog',
27 'find_copyright',
28+ 'SignableTagFile',
29 ]
30
31 from cStringIO import StringIO
32@@ -28,7 +29,6 @@
33 from debian.deb822 import Deb822Dict
34 from zope.component import getUtility
35
36-from lp.services.encoding import guess as guess_encoding
37 from canonical.launchpad.interfaces.gpghandler import (
38 GPGVerificationError,
39 IGPGHandler,
40@@ -64,6 +64,7 @@
41 )
42 from lp.registry.interfaces.sourcepackage import SourcePackageFileType
43 from lp.registry.interfaces.sourcepackagename import ISourcePackageNameSet
44+from lp.services.encoding import guess as guess_encoding
45 from lp.soyuz.enums import (
46 ArchivePurpose,
47 SourcePackageFormat,
48
49=== modified file 'lib/lp/scripts/garbo.py'
50--- lib/lp/scripts/garbo.py 2011-02-23 10:28:53 +0000
51+++ lib/lp/scripts/garbo.py 2011-03-09 23:47:00 +0000
52@@ -13,11 +13,18 @@
53 datetime,
54 timedelta,
55 )
56+from fixtures import TempDir
57+import os
58+import signal
59+import subprocess
60 import time
61
62 from psycopg2 import IntegrityError
63 import pytz
64+from storm.expr import LeftJoin
65 from storm.locals import (
66+ And,
67+ Count,
68 Max,
69 Min,
70 Select,
71@@ -43,6 +50,7 @@
72 from canonical.launchpad.database.oauth import OAuthNonce
73 from canonical.launchpad.database.openidconsumer import OpenIDConsumerNonce
74 from canonical.launchpad.interfaces.emailaddress import EmailAddressStatus
75+from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
76 from canonical.launchpad.interfaces.lpstorm import IMasterStore
77 from canonical.launchpad.utilities.looptuner import TunableLoop
78 from canonical.launchpad.webapp.interfaces import (
79@@ -50,6 +58,9 @@
80 MAIN_STORE,
81 MASTER_FLAVOR,
82 )
83+from canonical.librarian.utils import copy_and_close
84+from lp.archiveuploader.dscfile import findFile
85+from lp.archiveuploader.nascentuploadfile import UploadError
86 from lp.bugs.interfaces.bug import IBugSet
87 from lp.bugs.model.bug import Bug
88 from lp.bugs.model.bugattachment import BugAttachment
89@@ -70,10 +81,13 @@
90 from lp.hardwaredb.model.hwdb import HWSubmission
91 from lp.registry.model.person import Person
92 from lp.services.job.model.job import Job
93+from lp.services.memcache.interfaces import IMemcacheClient
94 from lp.services.scripts.base import (
95 LaunchpadCronScript,
96 SilentLaunchpadScriptFailure,
97 )
98+from lp.soyuz.model.files import SourcePackageReleaseFile
99+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
100 from lp.translations.interfaces.potemplate import IPOTemplateSet
101 from lp.translations.model.potranslation import POTranslation
102
103@@ -81,6 +95,10 @@
104 ONE_DAY_IN_SECONDS = 24*60*60
105
106
107+def subprocess_setup():
108+ signal.signal(signal.SIGPIPE, signal.SIG_DFL)
109+
110+
111 class BulkPruner(TunableLoop):
112 """A abstract ITunableLoop base class for simple pruners.
113
114@@ -893,6 +911,124 @@
115 self.done = True
116
117
118+class PopulateSPRChangelogs(TunableLoop):
119+ maximum_chunk_size = 1
120+
121+ def __init__(self, log, abort_time=None):
122+ super(PopulateSPRChangelogs, self).__init__(log, abort_time)
123+ value = getUtility(IMemcacheClient).get('populate-spr-changelogs')
124+ if not value:
125+ self.start_at = 0
126+ else:
127+ self.start_at = value
128+ self.finish_at = self.getCandidateSPRs(0).last()
129+
130+ def getCandidateSPRs(self, start_at):
131+ return IMasterStore(SourcePackageRelease).using(
132+ SourcePackageRelease,
133+ # Find any SPRFs that have expired (LFA.content IS NULL).
134+ LeftJoin(
135+ SourcePackageReleaseFile,
136+ SourcePackageReleaseFile.sourcepackagereleaseID ==
137+ SourcePackageRelease.id),
138+ LeftJoin(
139+ LibraryFileAlias,
140+ And(LibraryFileAlias.id ==
141+ SourcePackageReleaseFile.libraryfileID,
142+ LibraryFileAlias.content == None)),
143+ # And exclude any SPRs that have any expired SPRFs.
144+ ).find(
145+ SourcePackageRelease.id,
146+ SourcePackageRelease.id >= start_at,
147+ SourcePackageRelease.changelog == None,
148+ ).group_by(SourcePackageRelease.id).having(
149+ Count(LibraryFileAlias) == 0
150+ ).order_by(SourcePackageRelease.id)
151+
152+ def isDone(self):
153+ return self.start_at > self.finish_at
154+
155+ def __call__(self, chunk_size):
156+ for sprid in self.getCandidateSPRs(self.start_at)[:chunk_size]:
157+ spr = SourcePackageRelease.get(sprid)
158+ with TempDir() as tmp_dir:
159+ dsc_file = None
160+
161+ # Grab the files from the librarian into a temporary
162+ # directory.
163+ try:
164+ for sprf in spr.files:
165+ dest = os.path.join(
166+ tmp_dir.path, sprf.libraryfile.filename)
167+ dest_file = open(dest, 'w')
168+ sprf.libraryfile.open()
169+ copy_and_close(sprf.libraryfile, dest_file)
170+ if dest.endswith('.dsc'):
171+ dsc_file = dest
172+ except LookupError:
173+ self.log.warning(
174+ 'SPR %d (%s %s) has missing library files.' % (
175+ spr.id, spr.name, spr.version))
176+ continue
177+
178+ if dsc_file is None:
179+ self.log.warning(
180+ 'SPR %d (%s %s) has no DSC.' % (
181+ spr.id, spr.name, spr.version))
182+ continue
183+
184+ # Extract the source package. Throw away stdout/stderr
185+ # -- we only really care about the return code.
186+ fnull = open('/dev/null', 'w')
187+ ret = subprocess.call(
188+ ['dpkg-source', '-x', dsc_file, os.path.join(
189+ tmp_dir.path, 'extracted')],
190+ stdout=fnull, stderr=fnull,
191+ preexec_fn=subprocess_setup)
192+ fnull.close()
193+ if ret != 0:
194+ self.log.warning(
195+ 'SPR %d (%s %s) failed to unpack: returned %d' % (
196+ spr.id, spr.name, spr.version, ret))
197+ continue
198+
199+ # We have an extracted source package. Let's get the
200+ # changelog. findFile ensures that it's not too huge, and
201+ # not a symlink.
202+ try:
203+ changelog_path = findFile(
204+ tmp_dir.path, 'debian/changelog')
205+ except UploadError, e:
206+ changelog_path = None
207+ self.log.warning(
208+ 'SPR %d (%s %s) changelog could not be '
209+ 'imported: %s' % (
210+ spr.id, spr.name, spr.version, e))
211+ if changelog_path:
212+ # The LFA should be restricted only if there aren't any
213+ # public publications.
214+ restricted = not any(
215+ not a.private for a in spr.published_archives)
216+ spr.changelog = getUtility(ILibraryFileAliasSet).create(
217+ 'changelog',
218+ os.stat(changelog_path).st_size,
219+ open(changelog_path, "r"),
220+ "text/x-debian-source-changelog",
221+ restricted=restricted)
222+ self.log.info('SPR %d (%s %s) changelog imported.' % (
223+ spr.id, spr.name, spr.version))
224+ else:
225+ self.log.warning('SPR %d (%s %s) had no changelog.' % (
226+ spr.id, spr.name, spr.version))
227+
228+ self.start_at = spr.id + 1
229+ result = getUtility(IMemcacheClient).set(
230+ 'populate-spr-changelogs', self.start_at)
231+ if not result:
232+ self.log.warning('Failed to set start_at in memcache.')
233+ transaction.commit()
234+
235+
236 class BaseDatabaseGarbageCollector(LaunchpadCronScript):
237 """Abstract base class to run a collection of TunableLoops."""
238 script_name = None # Script name for locking and database user. Override.
239@@ -977,6 +1113,7 @@
240 RevisionCachePruner,
241 BugHeatUpdater,
242 BugWatchScheduler,
243+ PopulateSPRChangelogs,
244 ]
245 experimental_tunable_loops = []
246
247
248=== modified file 'lib/lp/scripts/tests/test_garbo.py'
249--- lib/lp/scripts/tests/test_garbo.py 2011-02-28 11:50:34 +0000
250+++ lib/lp/scripts/tests/test_garbo.py 2011-03-09 23:47:00 +0000
251@@ -10,7 +10,9 @@
252 datetime,
253 timedelta,
254 )
255-import operator
256+from fixtures import TempDir
257+import os
258+import subprocess
259 import time
260
261 from pytz import UTC
262@@ -35,6 +37,7 @@
263 from canonical.launchpad.database.oauth import OAuthNonce
264 from canonical.launchpad.database.openidconsumer import OpenIDConsumerNonce
265 from canonical.launchpad.interfaces.emailaddress import EmailAddressStatus
266+from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
267 from canonical.launchpad.interfaces.lpstorm import IMasterStore
268 from canonical.launchpad.scripts.tests import run_script
269 from canonical.launchpad.webapp.interfaces import (
270@@ -48,6 +51,7 @@
271 LaunchpadZopelessLayer,
272 ZopelessDatabaseLayer,
273 )
274+from lp.archiveuploader.dscfile import findFile
275 from lp.bugs.model.bugnotification import (
276 BugNotification,
277 BugNotificationRecipient,
278@@ -64,6 +68,7 @@
279 )
280 from lp.code.model.codeimportevent import CodeImportEvent
281 from lp.code.model.codeimportresult import CodeImportResult
282+from lp.registry.interfaces.distribution import IDistributionSet
283 from lp.registry.interfaces.person import (
284 IPersonSet,
285 PersonCreationRationale,
286@@ -76,6 +81,8 @@
287 )
288 from lp.services.job.model.job import Job
289 from lp.services.log.logger import BufferLogger
290+from lp.soyuz.enums import PackagePublishingStatus
291+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
292 from lp.testing import (
293 TestCase,
294 TestCaseWithFactory,
295@@ -95,6 +102,13 @@
296
297 def test_hourly_script(self):
298 """Ensure garbo-hourly.py actually runs."""
299+ # Our sampledata doesn't contain anything that PopulateSPRChangelogs
300+ # can process without errors, so it's easier to just set all of the
301+ # changelogs to a random LFA. We can't just expire every LFA, since
302+ # a bunch of SPRs have no SPRFs at all.
303+ IMasterStore(SourcePackageRelease).find(SourcePackageRelease).set(
304+ changelogID=1)
305+ transaction.commit() # run_script() is a different process.
306 rv, out, err = run_script(
307 "cronscripts/garbo-hourly.py", ["-q"], expect_returncode=0)
308 self.failIf(out.strip(), "Output to stdout: %s" % out)
309@@ -513,7 +527,7 @@
310 # them will have the present day as its date created, and so will not
311 # be deleted, whereas the other will have a creation date far in the
312 # past, so it will be deleted.
313- person = self.factory.makePerson(name='test-unlinked-person-new')
314+ self.factory.makePerson(name='test-unlinked-person-new')
315 person_old = self.factory.makePerson(name='test-unlinked-person-old')
316 removeSecurityProxy(person_old).datecreated = datetime(
317 2008, 01, 01, tzinfo=UTC)
318@@ -541,7 +555,7 @@
319 bugID=1,
320 is_comment=True,
321 date_emailed=None)
322- recipient = BugNotificationRecipient(
323+ BugNotificationRecipient(
324 bug_notification=notification,
325 personID=1,
326 reason_header='Whatever',
327@@ -555,7 +569,7 @@
328 bugID=1,
329 is_comment=True,
330 date_emailed=UTC_NOW + SQL("interval '%d days'" % delta))
331- recipient = BugNotificationRecipient(
332+ BugNotificationRecipient(
333 bug_notification=notification,
334 personID=1,
335 reason_header='Whatever',
336@@ -609,7 +623,6 @@
337 Store.of(db_branch).flush()
338 branch_job = BranchUpgradeJob.create(db_branch)
339 branch_job.job.date_finished = THIRTY_DAYS_AGO
340- job_id = branch_job.job.id
341
342 self.assertEqual(
343 store.find(
344@@ -617,7 +630,7 @@
345 BranchJob.branch == db_branch.id).count(),
346 1)
347
348- collector = self.runDaily()
349+ self.runDaily()
350
351 LaunchpadZopelessLayer.switchDbUser('testadmin')
352 self.assertEqual(
353@@ -638,21 +651,16 @@
354
355 branch_job = BranchUpgradeJob.create(db_branch)
356 branch_job.job.date_finished = THIRTY_DAYS_AGO
357- job_id = branch_job.job.id
358
359 db_branch2 = self.factory.makeAnyBranch(
360 branch_format=BranchFormat.BZR_BRANCH_5,
361 repository_format=RepositoryFormat.BZR_KNIT_1)
362- branch_job2 = BranchUpgradeJob.create(db_branch2)
363- job_id_newer = branch_job2.job.id
364+ BranchUpgradeJob.create(db_branch2)
365
366- collector = self.runDaily()
367+ self.runDaily()
368
369 LaunchpadZopelessLayer.switchDbUser('testadmin')
370- self.assertEqual(
371- store.find(
372- BranchJob).count(),
373- 1)
374+ self.assertEqual(store.find(BranchJob).count(), 1)
375
376 def test_ObsoleteBugAttachmentDeleter(self):
377 # Bug attachments without a LibraryFileContent record are removed.
378@@ -710,3 +718,60 @@
379 """ % sqlbase.quote(template.id)).get_one()
380
381 self.assertEqual(1, count)
382+
383+ def upload_to_debian(self, restricted=False):
384+ sid = getUtility(IDistributionSet)['debian']['sid']
385+ spn = self.factory.makeSourcePackageName('9wm')
386+ spr = self.factory.makeSourcePackageRelease(
387+ sourcepackagename=spn, version='1.2-7', distroseries=sid)
388+ archive = sid.main_archive
389+ if restricted:
390+ archive = self.factory.makeArchive(
391+ distribution=sid.distribution, private=True)
392+ self.factory.makeSourcePackagePublishingHistory(
393+ sourcepackagerelease=spr, archive=archive,
394+ status=PackagePublishingStatus.PUBLISHED)
395+ for name in (
396+ '9wm_1.2-7.diff.gz', '9wm_1.2.orig.tar.gz', '9wm_1.2-7.dsc'):
397+ path = os.path.join(
398+ 'lib/lp/soyuz/scripts/tests/gina_test_archive/pool/main/9',
399+ '9wm', name)
400+ lfa = getUtility(ILibraryFileAliasSet).create(
401+ name, os.stat(path).st_size, open(path, 'r'),
402+ 'application/octet-stream', restricted=restricted)
403+ spr.addFile(lfa)
404+ with TempDir() as tmp_dir:
405+ fnull = open('/dev/null', 'w')
406+ ret = subprocess.call(
407+ ['dpkg-source', '-x', path, os.path.join(
408+ tmp_dir.path, 'extracted')],
409+ stdout=fnull, stderr=fnull)
410+ fnull.close()
411+ self.assertEqual(0, ret)
412+ changelog_path = findFile(tmp_dir.path, 'debian/changelog')
413+ changelog = open(changelog_path, 'r').read()
414+ transaction.commit() # .runHourly() switches dbuser.
415+ return (spr, changelog)
416+
417+ def test_populateSPRChangelogs(self):
418+ # We set SPR.changelog for imported records from Debian.
419+ LaunchpadZopelessLayer.switchDbUser('testadmin')
420+ spr, changelog = self.upload_to_debian()
421+ collector = self.runHourly()
422+ log = collector.logger.getLogBuffer()
423+ self.assertTrue(
424+ 'SPR %d (9wm 1.2-7) changelog imported.' % spr.id in log)
425+ self.assertFalse(spr.changelog == None)
426+ self.assertFalse(spr.changelog.restricted)
427+ self.assertEqual(changelog, spr.changelog.read())
428+
429+ def test_populateSPRChangelogs_restricted_sprf(self):
430+ LaunchpadZopelessLayer.switchDbUser('testadmin')
431+ spr, changelog = self.upload_to_debian(restricted=True)
432+ collector = self.runHourly()
433+ log = collector.logger.getLogBuffer()
434+ self.assertTrue(
435+ 'SPR %d (9wm 1.2-7) changelog imported.' % spr.id in log)
436+ self.assertFalse(spr.changelog == None)
437+ self.assertTrue(spr.changelog.restricted)
438+ self.assertEqual(changelog, spr.changelog.read())