Merge lp:~cjwatson/launchpad/gina-dsc-binaries into lp:launchpad

Proposed by Colin Watson on 2012-01-05
Status: Merged
Approved by: Gavin Panella on 2012-01-05
Approved revision: no longer in the source branch.
Merged at revision: 14643
Proposed branch: lp:~cjwatson/launchpad/gina-dsc-binaries
Merge into: lp:launchpad
Diff against target: 153 lines (+78/-3)
5 files modified
database/schema/security.cfg (+1/-1)
lib/lp/scripts/garbo.py (+37/-0)
lib/lp/scripts/tests/test_garbo.py (+36/-0)
lib/lp/soyuz/doc/gina.txt (+2/-0)
lib/lp/soyuz/scripts/gina/handlers.py (+2/-2)
To merge this branch: bzr merge lp:~cjwatson/launchpad/gina-dsc-binaries
Reviewer Review Type Date Requested Status
Gavin Panella (community) 2012-01-05 Approve on 2012-01-05
Review via email: mp+87632@code.launchpad.net

Commit Message

[r=allenap][bug=911943] Separate dsc_binaries correctly in gina, and add a garbo-daily job to clean up existing wreckage.

Description of the Change

== Summary ==

There is some wrong data in SourcePackageRelease.dsc_binaries, which I believe is all gina's fault: that field is meant to be separated with comma-space, not space. This breaks lp:~cjwatson/launchpad/germinate-from-db, but would also be a blocker (one of several, but still) to switching to no-more-apt-ftparchive publishing for Ubuntu.

== Proposed fix ==

Fix gina to separate dsc_binaries correctly, and add a garbo-daily job to clean up the wreckage.

== Pre-implementation notes ==

Robert and Stuart educated me a bit on how to write garbo jobs.

== Implementation details ==

I found a few rows on dogfood where the Binary field had been incorrectly written as a dependency relationship field, complete with "(>= ...)". This is wildly wrong and it came from PPA uploads rather than gina; but it's in the past and s/ /, /g would not improve matters, so I just excluded anything containing '(', and included a test that such rows remain unchanged.

== Tests ==

bin/test -vvc -t gina -t garbo

== Demo and Q/A ==

I'd like to run this garbo job on dogfood and check that it really does clean everything up, which is probably most easily done by manual inspection of:

  SELECT dsc_binaries FROM SourcePackageRelease WHERE dsc_binaries ~ '[a-z0-9+.-] ';

== lint ==

./lib/lp/soyuz/doc/gina.txt
     162: want exceeds 78 characters.
     179: want exceeds 78 characters.
     223: want exceeds 78 characters.
     236: want exceeds 78 characters.
     242: want exceeds 78 characters.

Pre-existing, fiddly to fix, and should be ignoreable for the purposes of this branch.

To post a comment you must log in.
Gavin Panella (allenap) wrote :

Nice :)

[1]

+ WHERE dsc_binaries ~ '[a-z0-9+.-] '

Could the regex be simplified to '[^,] '?

[2]

Fwiw, an alternative, simpler, implementation might be something like:

    def __init__(self, log, abort_time=None):
        super(SourcePackageReleaseDscBinariesUpdater, self).__init__(
            log, abort_time)
        self.store = IMasterStore(SourcePackageRelease)
        self.ids = list(
            self.store.find(
                SourcePackageRelease.id,
                # Get all SPR IDs which have an incorrectly-separated
                # dsc_binaries value (space rather than comma-space).
                SQL("dsc_binaries ~ '[a-z0-9+.-] '"),
                # Skip rows with dsc_binaries in dependency relationship
                # format. This is a different bug.
                SQL("dsc_binaries NOT LIKE '%(%'")))

    def isDone(self):
        """See `TunableLoop`."""
        return len(self.ids) == 0

    def __call__(self, chunk_size):
        """See `TunableLoop`."""
        chunk_ids = self.ids[:chunk_size]
        del self.ids[:chunk_size]
        self.store.execute("""
            UPDATE SourcePackageRelease
            SET dsc_binaries = REGEXP_REPLACE(
                dsc_binaries, '([a-z0-9+.-]) ', E'\\\\1, ', 'g')
            WHERE id IN %s""" % sqlvalues(chunk_ids), noresult=True)
        transaction.commit()

[3]

+ three = [
+ self.factory.getUniqueString(),
+ self.factory.getUniqueString(),
+ self.factory.getUniqueString(),
+ ]
+ spr_three = self.factory.makeSourcePackageRelease(
+ dsc_binaries=" ".join(three))

Is it important to use unique strings? If not it might be clearer to
make them static:

        spr_three = self.factory.makeSourcePackageRelease("one1 two2 three3")
        ...
        self.assertEqual("one1, two2, three3", spr_three.dsc_binaries)

review: Approve
Colin Watson (cjwatson) wrote :

On Thu, Jan 05, 2012 at 04:16:30PM -0000, Gavin Panella wrote:
> + WHERE dsc_binaries ~ '[a-z0-9+.-] '
>
> Could the regex be simplified to '[^,] '?

No, because there exist dsc_binaries fields with embedded newlines.

Something like [^\n,] might be possible, but I actually felt the above
was a bit clearer. I checked on dogfood's database to make sure there
were no rows that slipped between the cracks.

> Fwiw, an alternative, simpler, implementation might be something like:

Thanks! This is indeed much easier to read. I've applied that.

> + three = [
> + self.factory.getUniqueString(),
> + self.factory.getUniqueString(),
> + self.factory.getUniqueString(),
> + ]
> + spr_three = self.factory.makeSourcePackageRelease(
> + dsc_binaries=" ".join(three))
>
> Is it important to use unique strings? If not it might be clearer to
> make them static:

Yes, you're probably right; I got a bit carried away with the factory
there. I've simplified this as you suggest.

Do you think you could land this for me, if you're satisfied with my
answer to your first point? TIA.

Gavin Panella (allenap) wrote :

> Do you think you could land this for me, if you're satisfied with my
> answer to your first point? TIA.

Sure, I'll send it off to ec2 now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'database/schema/security.cfg'
2--- database/schema/security.cfg 2012-01-05 15:19:43 +0000
3+++ database/schema/security.cfg 2012-01-05 17:21:29 +0000
4@@ -2157,7 +2157,7 @@
5 public.revisionauthor = SELECT, UPDATE
6 public.revisioncache = SELECT, DELETE
7 public.sourcepackagename = SELECT
8-public.sourcepackagerelease = SELECT
9+public.sourcepackagerelease = SELECT, UPDATE
10 public.sourcepackagepublishinghistory = SELECT, UPDATE
11 public.suggestivepotemplate = INSERT, DELETE
12 public.teamparticipation = SELECT, DELETE
13
14=== modified file 'lib/lp/scripts/garbo.py'
15--- lib/lp/scripts/garbo.py 2012-01-04 11:49:08 +0000
16+++ lib/lp/scripts/garbo.py 2012-01-05 17:21:29 +0000
17@@ -86,6 +86,7 @@
18 MAIN_STORE,
19 MASTER_FLAVOR,
20 )
21+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
22 from lp.translations.interfaces.potemplate import IPOTemplateSet
23 from lp.translations.model.potmsgset import POTMsgSet
24 from lp.translations.model.potranslation import POTranslation
25@@ -901,6 +902,41 @@
26 self._update_oldest()
27
28
29+class SourcePackageReleaseDscBinariesUpdater(TunableLoop):
30+ """Fix incorrect values for SourcePackageRelease.dsc_binaries."""
31+
32+ maximum_chunk_size = 1000
33+
34+ def __init__(self, log, abort_time=None):
35+ super(SourcePackageReleaseDscBinariesUpdater, self).__init__(
36+ log, abort_time)
37+ self.store = IMasterStore(SourcePackageRelease)
38+ self.ids = list(
39+ self.store.find(
40+ SourcePackageRelease.id,
41+ # Get all SPR IDs which have an incorrectly-separated
42+ # dsc_binaries value (space rather than comma-space).
43+ SQL("dsc_binaries ~ '[a-z0-9+.-] '"),
44+ # Skip rows with dsc_binaries in dependency relationship
45+ # format. This is a different bug.
46+ SQL("dsc_binaries NOT LIKE '%(%'")))
47+
48+ def isDone(self):
49+ """See `TunableLoop`."""
50+ return len(self.ids) == 0
51+
52+ def __call__(self, chunk_size):
53+ """See `TunableLoop`."""
54+ chunk_ids = self.ids[:chunk_size]
55+ del self.ids[:chunk_size]
56+ self.store.execute("""
57+ UPDATE SourcePackageRelease
58+ SET dsc_binaries = regexp_replace(
59+ dsc_binaries, '([a-z0-9+.-]) ', E'\\\\1, ', 'g')
60+ WHERE id IN %s""" % sqlvalues(chunk_ids), noresult=True)
61+ transaction.commit()
62+
63+
64 class SuggestiveTemplatesCacheUpdater(TunableLoop):
65 """Refresh the SuggestivePOTemplate cache.
66
67@@ -1265,6 +1301,7 @@
68 ObsoleteBugAttachmentPruner,
69 OldTimeLimitedTokenDeleter,
70 RevisionAuthorEmailLinker,
71+ SourcePackageReleaseDscBinariesUpdater,
72 SuggestiveTemplatesCacheUpdater,
73 POTranslationPruner,
74 UnusedPOTMsgSetPruner,
75
76=== modified file 'lib/lp/scripts/tests/test_garbo.py'
77--- lib/lp/scripts/tests/test_garbo.py 2012-01-04 11:49:08 +0000
78+++ lib/lp/scripts/tests/test_garbo.py 2012-01-05 17:21:29 +0000
79@@ -996,6 +996,42 @@
80 self.runDaily()
81 self.assertEqual(0, unreferenced_msgsets.count())
82
83+ def test_SourcePackageReleaseDscBinariesUpdater_updates_incorrect(self):
84+ # SourcePackageReleaseDscBinariesUpdater fixes incorrectly-separated
85+ # dsc_binaries values.
86+ LaunchpadZopelessLayer.switchDbUser('testadmin')
87+ spr = self.factory.makeSourcePackageRelease(
88+ dsc_binaries="one two three")
89+ transaction.commit()
90+ self.runDaily()
91+ self.assertEqual("one, two, three", spr.dsc_binaries)
92+
93+ def test_SourcePackageReleaseDscBinariesUpdater_skips_correct(self):
94+ # SourcePackageReleaseDscBinariesUpdater leaves correct dsc_binaries
95+ # values alone.
96+ LaunchpadZopelessLayer.switchDbUser('testadmin')
97+ spr_one = self.factory.makeSourcePackageRelease(dsc_binaries="one")
98+ spr_three = self.factory.makeSourcePackageRelease(
99+ dsc_binaries="one, two, three")
100+ transaction.commit()
101+ self.runDaily()
102+ self.assertEqual("one", spr_one.dsc_binaries)
103+ self.assertEqual("one, two, three", spr_three.dsc_binaries)
104+
105+ def test_SourcePackageReleaseDscBinariesUpdater_skips_broken(self):
106+ # There have been a few instances of Binary fields in PPA packages
107+ # that are formatted like a dependency relationship field, complete
108+ # with (>= ...). This is completely invalid (and failed to build),
109+ # but does exist historically, so we have to deal with it.
110+ # SourcePackageReleaseDscBinariesUpdater leaves such fields well
111+ # alone.
112+ LaunchpadZopelessLayer.switchDbUser('testadmin')
113+ spr = self.factory.makeSourcePackageRelease(
114+ dsc_binaries="one (>= 1), two")
115+ transaction.commit()
116+ self.runDaily()
117+ self.assertEqual("one (>= 1), two", spr.dsc_binaries)
118+
119
120 class TestGarboTasks(TestCaseWithFactory):
121 layer = LaunchpadZopelessLayer
122
123=== modified file 'lib/lp/soyuz/doc/gina.txt'
124--- lib/lp/soyuz/doc/gina.txt 2011-12-30 06:14:56 +0000
125+++ lib/lp/soyuz/doc/gina.txt 2012-01-05 17:21:29 +0000
126@@ -280,6 +280,8 @@
127 -----END PGP SIGNATURE-----
128 >>> print cap.maintainer.displayname
129 Michael Vogt
130+ >>> print cap.dsc_binaries
131+ libcap-dev, libcap-bin, libcap1
132
133 Test ubuntu-meta in breezy, which was forcefully imported.
134
135
136=== modified file 'lib/lp/soyuz/scripts/gina/handlers.py'
137--- lib/lp/soyuz/scripts/gina/handlers.py 2011-12-30 06:14:56 +0000
138+++ lib/lp/soyuz/scripts/gina/handlers.py 2012-01-05 17:21:29 +0000
139@@ -1,4 +1,4 @@
140-# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
141+# Copyright 2009-2012 Canonical Ltd. This software is licensed under the
142 # GNU Affero General Public License version 3 (see the file LICENSE).
143
144 """Gina db handlers.
145@@ -654,7 +654,7 @@
146 dsc_format=src.format,
147 dsc_maintainer_rfc822=maintainer_line,
148 dsc_standards_version=src.standards_version,
149- dsc_binaries=" ".join(src.binaries),
150+ dsc_binaries=", ".join(src.binaries),
151 upload_archive=distroseries.main_archive)
152 log.info('Source Package Release %s (%s) created' %
153 (name.name, src.version))