gina imports SourcePackageRelease.dsc_binaries incorrectly

Bug #911943 reported by Colin Watson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Colin Watson

Bug Description

launchpad_dogfood=> SELECT spr.version, spr.dsc_binaries FROM sourcepackagename spn, sourcepackagerelease spr WHERE spn.id = spr.sourcepackagename AND spn.name = 'acl';
        version | dsc_binaries
------------------------+---------------------------
 2.2.39-1ubuntu1 |
 2.2.49-4ubuntu2 | acl, libacl1-dev, libacl1
 2.2.45-1 | libacl1-dev, libacl1, acl
 2.2.39-1ubuntu2 |
 2.2.41-1ubuntu1 |
 2.2.42-1ubuntu1 | libacl1-dev, libacl1, acl
 2.2.49-4 | acl libacl1-dev libacl1
 2.2.49-3 | acl, libacl1-dev, libacl1
 2.2.49-5 | acl libacl1-dev libacl1
 2.2.49-3-1cross1 | acl, libacl1-dev, libacl1
 2.2.49-3-1cross2 | acl, libacl1-dev, libacl1
 2.2.49-4 | acl, libacl1-dev, libacl1
 2.2.49-4ubuntu1 | acl, libacl1-dev, libacl1
 2.2.23-1 |
 2.2.26-1 |
 2.2.29-1 |
 2.2.34-1 |
 2.2.34-1ubuntu1 |
 2.2.48-1 | acl, libacl1-dev, libacl1
 2.2.49-5ubuntu1 | acl, libacl1-dev, libacl1
 2.2.51-1 | acl, libacl1-dev, libacl1
 2.2.51-1 | acl libacl1-dev libacl1
 2.2.51-2 | acl libacl1-dev libacl1
 2.2.47-3 | acl libacl1-dev libacl1
 2.2.47-2 | acl, libacl1-dev, libacl1
 2.2.47-2 | acl libacl1-dev libacl1
 2.2.47-2-100~ppa1 | acl, libacl1-dev, libacl1
 2.2.48-1 | acl libacl1-dev libacl1
 2.2.47-2ubuntu1~cross1 | acl, libacl1-dev, libacl1
 2.2.49-1 | acl libacl1-dev libacl1
 2.2.49-1 | acl, libacl1-dev, libacl1
 2.2.49-3 | acl libacl1-dev libacl1
 2.2.49-2 | acl libacl1-dev libacl1
 2.2.49-2 | acl, libacl1-dev, libacl1
 2.2.51-3 | acl libacl1-dev libacl1
(35 rows)

Observe that some of these lines have commas and some don't. The ones that don't are the ones that were imported by gina, which does this:

./lib/lp/soyuz/scripts/gina/handlers.py:657: dsc_binaries=" ".join(src.binaries),

dsc_binaries is taken directly from the Binary field in the .dsc file, which is supposed to be comma-plus-optional-whitespace-separated, not space-separated (reference: http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Binary). This should be ", ".join(src.binaries) instead.

I'm running into this in lp:~cjwatson/launchpad/germinate-from-db where germinate chokes on the invalid Binary field in some source stanzas, but this would also be a blocker to switching to no-more-apt-ftparchive publishing for Ubuntu. I think, then, that as well as fixing gina to import these properly, we need a garbo job to clean up the incorrect data. (Alternatively, IndexStanzaFields.makeOutput could fix it up on the fly, but that would be less efficient.) Does that sound reasonable?

Tags: qa-ok

Related branches

Aaron Bentley (abentley)
Changed in launchpad:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Colin Watson (cjwatson) wrote :

There are currently 55065 affected rows on production (thanks, spm). Would it be better to clean this up with manual SQL or a garbo job?

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 911943] Re: gina imports SourcePackageRelease.dsc_binaries incorrectly

On Thu, Jan 5, 2012 at 11:42 AM, Colin Watson <email address hidden> wrote:
> There are currently 55065 affected rows on production (thanks, spm).
> Would it be better to clean this up with manual SQL or a garbo job?

Garbo.

Colin Watson (cjwatson)
Changed in launchpad:
assignee: nobody → Colin Watson (cjwatson)
status: Triaged → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Revision history for this message
Colin Watson (cjwatson) wrote :

2012-01-06 11:06:21 DEBUG [SourcePackageReleaseDscBinariesUpdater] Aquired lock /var/lock/launchpad-garbo-SourcePackageReleaseDscBinariesUpdater.lock.
2012-01-06 11:06:21 INFO [SourcePackageReleaseDscBinariesUpdater] Running SourcePackageReleaseDscBinariesUpdater
2012-01-06 11:09:28 ERROR [SourcePackageReleaseDscBinariesUpdater] Unhandled exception
 -> http://dogfood.launchpadlibrarian.net/80686768/oTRNIKXrEtFjFzBi1dVUrBQKgH3.txt (slice indices must be integers or None or have an __index__ method)
2012-01-06 11:09:32 DEBUG [SourcePackageReleaseDscBinariesUpdater] Released lock /var/lock/launchpad-garbo-SourcePackageReleaseDscBinariesUpdater.lock.

This is because chunk_size is a float, not an int (and this is poorly documented).

tags: added: qa-bad
removed: qa-needstesting
William Grant (wgrant)
tags: added: qa-ok
removed: qa-bad
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-ok
Revision history for this message
Colin Watson (cjwatson) wrote :

The garbo job took a little under 15 minutes on dogfood, including getting blocked on garbo-hourly for a while, so that seems reasonable. Per the SELECT in https://code.launchpad.net/~cjwatson/launchpad/gina-dsc-binaries/+merge/87632, it has cleaned everything up with the exception of the four anomalously-formatted rows that were intentionally excluded.

Furthermore, the original SELECT from this bug report now returns the following, which on manual inspection is a correctly-cleaned-up version of the original (the rows with empty dsc_binaries were empty in the original too):

launchpad_dogfood=> SELECT spr.version, spr.dsc_binaries FROM sourcepackagename spn, sourcepackagerelease spr WHERE spn.id = spr.sourcepackagename AND spn.name = 'acl';
        version | dsc_binaries
------------------------+---------------------------
 2.2.39-1ubuntu1 |
 2.2.49-4ubuntu2 | acl, libacl1-dev, libacl1
 2.2.45-1 | libacl1-dev, libacl1, acl
 2.2.39-1ubuntu2 |
 2.2.41-1ubuntu1 |
 2.2.42-1ubuntu1 | libacl1-dev, libacl1, acl
 2.2.49-3 | acl, libacl1-dev, libacl1
 2.2.49-3-1cross1 | acl, libacl1-dev, libacl1
 2.2.49-3-1cross2 | acl, libacl1-dev, libacl1
 2.2.49-4 | acl, libacl1-dev, libacl1
 2.2.49-4ubuntu1 | acl, libacl1-dev, libacl1
 2.2.23-1 |
 2.2.26-1 |
 2.2.29-1 |
 2.2.34-1 |
 2.2.34-1ubuntu1 |
 2.2.48-1 | acl, libacl1-dev, libacl1
 2.2.49-5ubuntu1 | acl, libacl1-dev, libacl1
 2.2.51-1 | acl, libacl1-dev, libacl1
 2.2.47-2 | acl, libacl1-dev, libacl1
 2.2.47-2-100~ppa1 | acl, libacl1-dev, libacl1
 2.2.47-2ubuntu1~cross1 | acl, libacl1-dev, libacl1
 2.2.49-1 | acl, libacl1-dev, libacl1
 2.2.49-2 | acl, libacl1-dev, libacl1
 2.2.47-3 | acl, libacl1-dev, libacl1
 2.2.51-1 | acl, libacl1-dev, libacl1
 2.2.51-2 | acl, libacl1-dev, libacl1
 2.2.47-2 | acl, libacl1-dev, libacl1
 2.2.48-1 | acl, libacl1-dev, libacl1
 2.2.49-1 | acl, libacl1-dev, libacl1
 2.2.49-3 | acl, libacl1-dev, libacl1
 2.2.49-2 | acl, libacl1-dev, libacl1
 2.2.51-3 | acl, libacl1-dev, libacl1
 2.2.49-4 | acl, libacl1-dev, libacl1
 2.2.49-5 | acl, libacl1-dev, libacl1
(35 rows)

So it's all good now. qa-ok.

tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.