Merge lp:~michael.nelson/launchpad/436182-newlines-in-sources-fix-without-workaround into lp:launchpad
Status: | Merged | ||||
---|---|---|---|---|---|
Approved by: | Gavin Panella on 2009-10-07 | ||||
Approved revision: | no longer in the source branch. | ||||
Merged at revision: | not available | ||||
Proposed branch: | lp:~michael.nelson/launchpad/436182-newlines-in-sources-fix-without-workaround | ||||
Merge into: | lp:launchpad | ||||
Diff against target: |
537 lines 9 files modified
lib/canonical/launchpad/emailtemplates/upload-accepted.txt (+1/-0) lib/canonical/launchpad/emailtemplates/upload-announcement.txt (+1/-0) lib/canonical/launchpad/emailtemplates/upload-new.txt (+1/-0) lib/canonical/launchpad/emailtemplates/upload-rejection.txt (+1/-0) lib/lp/archiveuploader/changesfile.py (+15/-1) lib/lp/archiveuploader/tagfiles.py (+57/-22) lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes (+23/-0) lib/lp/archiveuploader/tests/test_tagfiles.py (+101/-37) lib/lp/soyuz/model/queue.py (+15/-7) |
||||
To merge this branch: | bzr merge lp:~michael.nelson/launchpad/436182-newlines-in-sources-fix-without-workaround | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Colin Watson | 2009-10-13 | Approve on 2009-10-15 | |
Gavin Panella (community) | 2009-10-07 | Approve on 2009-10-07 | |
Review via email:
|
Michael Nelson (michael.nelson) wrote : | # |
Gavin Panella (allenap) wrote : | # |
Hi Michael,
I looked at the diff (revisions 9647 and 9648) and there's nothing to
comment on there, so the diff I've reviewed doesn't contain those
revisions.
My only comments below are almost trivial, so r=me. It was a very good
idea to compare to apt_pkg. Also, parse_tagfile_lines is a ugly pig of
a function!
Gavin.
> === modified file 'lib/lp/
> --- lib/lp/
> +++ lib/lp/
> @@ -64,8 +64,8 @@
> """This exception is raised if parse_changes encounters nastiness"""
> pass
>
> -re_single_
> -re_multi_
> +re_single_
> +re_multi_
>
>
> def parse_tagfile_
> @@ -103,7 +103,8 @@
>
> num_of_lines = len(indexed_
> index = 0
> - first = -1
> + first_value_
> + more_values_
> while index < num_of_lines:
> index += 1
> line = indexed_
> @@ -145,21 +146,47 @@
> if slf:
> field = slf.groups(
> changes[field] = slf.groups()[1]
> - first = 1
> +
> + # If there is no value on this line, we assume this is
> + # the first line of a multiline field, such as the 'files'
> + # field.
> + if changes[field] == '':
> + first_value_
> +
> + # Either way, more values for this field could follow
> + # on the next line.
> + more_values_
> continue
> if line.rstrip() == " .":
> - changes[field] += '\n'
> + changes[field] += '\n' + line
> continue
> mlf = re_multi_
> if mlf:
> - if first == -1:
> + if more_values_
> raise TagFileParseErr
> "line %d: '%s'\n [Multi-line field continuing on from nothing?]"
> % (filename, index,line))
> - if first == 1 and changes[field] != "":
> - changes[field] += '\n'
> - first = 0
> - changes[field] += mlf.groups()[0] + '\n'
> +
> + # XXX Michael Nelson 20091001 bug=440014
> + # Is there any reason why we're not simply using
> + # apt_pkg.
> + # If we can get rid of this code that is trying to mimic
> + # what ParseTagFile does out of the box, it would be a good
> + # thing.
> +
> + # The first value for a newline delimited field, such as
> + # the 'files' field, has its leading spaces stripped. Other
> + # fields (such as a 'binary' field spanning multiple lines)
> + # should *not* be l-stripped of their leading spaces otherwise
> + ...
Michael Nelson (michael.nelson) wrote : | # |
On Wed, 2009-10-07 at 14:51 +0000, Gavin Panella wrote:
> Review: Approve
> Hi Michael,
>
> I looked at the diff (revisions 9647 and 9648) and there's nothing to
> comment on there, so the diff I've reviewed doesn't contain those
> revisions.
>
> My only comments below are almost trivial, so r=me. It was a very good
> idea to compare to apt_pkg. Also, parse_tagfile_lines is a ugly pig of
> a function!
Thanks Gavin - yeah, I'm keen to get rid of parse_tagfile_lines in the
future if possible.
>
> Gavin.
>
[snip]
> > === modified file 'lib/lp/
> > --- lib/lp/
> > +++ lib/lp/
> > @@ -5,6 +5,7 @@
> >
> > # arch-tag: 52e0c871-
> >
> > +import apt_pkg
> > import unittest
> > import sys
> > import shutil
> > @@ -106,19 +107,110 @@
> > tf = parse_tagfile(
> > self.assertRais
> >
> > +
> > +class TestTagFileDebi
> > +
> > + def setUp(self):
> > + """Parse the test file using apt_pkg for comparison."""
> > +
> > + tagfile_path = datadir(
> > + tagfile = open(tagfile_path)
> > + self.apt_
> > + self.apt_
> > +
> > + # Is it just because this is a very old file that test-related
> > + # things are imported locally?
> > + from lp.archiveuploa
> > + self.parse_
> > + tagfile_path, allow_unsigned = True)
> > +
> > + def test_parse_
> > + """parse_tagfile should not leave trailing '\n' on multiline values.
> > +
> > + This is a regression test for bug 436182.
> > + Previously we,
> > + 1. Stripped leading space/tab from subsequent lines of multiline
> > + values, and
> > + 2. appended a trailing '\n' to the end of the value.
> > + """
> > +
> > + expected_text = (
> > + 'test75874, anotherbinary,\n'
> > + ' andanother, andonemore,\n'
> > + '\tlastone')
> > +
> > + self.assertEqual(
> > + expected_text,
> > + self.apt_
> > +
> > + self.assertEqual(
> > + expected_text,
> > + self.parse_
> > +
> > + def test_parse_
> > + """parse_tagfile should not leave leading or tailing '\n' when
> > + parsing newline delimited fields.
> > +
> > + Newline-delimited fields should be parsed to match
> > + apt_pkg.
> > +
> > + Note: in the past, our parse_tagfile function left the leading
> > + '\n' in the parsed value, whereas it should not have.
> > +
> > + XXX check where this value is used and ensure it won't be affected
> > + (ie. skipp...
Michael Nelson (michael.nelson) wrote : | # |
Just adding a incremental that fixes test breakages - ensuring that we
always format the changes before displaying it in emails (so it appears
exactly as it used to in emails - without the '\n .' blank line token or
initial spaces.
Julian has already reviewed the incremental and I've put it through ec2
test so it's ready for testing on dogfood.
--
Michael
Michael Nelson (michael.nelson) wrote : | # |
Hi Colin,
I'm adding you as a reviewer as apparently you've been around long enough to have a better understanding of the code in question here (our custom parse_tagfile_lines function), or at least know who else might :).
There is already a work-around that's been CP'd onto Germanium that ensures that the Source index creation of PPAs copes with the incorrectly parsed data.
This branch fixes the actual parsing error so that we won't get any more incorrectly-parsed dsc_binaries etc., ensuring that our custom parser does exactly what apt_pkg.
The problem was not restricted to Binary fields in dsc files, but all values that can span multiple lines in both dsc and changes files. From my manual searching, as well as from the current test-suite coverage, this affected only:
(1) Indexes generation for PPAs (per the bug) as we do not use apt-ftparchive,
(2) formatting of text in Queue notification emails.
but there could be places that both the test coverage and I missed.
I've created bug 440014 to completely get rid of our custom parse_tagfile_lines function (replacing it with apt_pkg.
I've also tested this branch on dogfood, details here: http://
So if you approve of this branch landing, when would you prefer to be landed? We could
1. get it cherry-pick it now and so have some time to be condident that there are no unforseen problems, or
2. land it in devel without a CP, but then it will be rolled out in the beginning of Nov - probably not ideal, or
3. wait until LP 3.1.10 has been rolled out, and Karmic is released, and commit it to devel then (perhaps CPing it). Safer - but then more corrupt data in the database.
The dsc files with Binary values spanning multiple lines have approximately doubled each week, - mainly from soures that produce many binaries (linux, mono, oo.org) being added to people's ppas:
24-09-2009 - 12
06-10-2009 - 24
13-10-2009 - 48
Thanks for your time!
Colin Watson (cjwatson) wrote : | # |
> I'm adding you as a reviewer as apparently you've been around long enough to
> have a better understanding of the code in question here (our custom
> parse_tagfile_lines function), or at least know who else might :).
Thanks for asking me to review. The algorithmic changes look OK to me. I'm not sure I know any better than you why we have our own version of this function, though. It's been around since the very first version of Soyuz was committed (see r542, although it was called parse_changes back then). Even then, apt_pkg.
Reasons I can think of why it might have needed to be a separate function:
* no other way to enforce dsc_whitespace_
* problems with case-insensitivity at the time? (http://
* apt_pkg.
> So if you approve of this branch landing, when would you prefer to be landed?
> We could
> 1. get it cherry-pick it now and so have some time to be condident that there
> are no unforseen problems, or
> 2. land it in devel without a CP, but then it will be rolled out in the
> beginning of Nov - probably not ideal, or
> 3. wait until LP 3.1.10 has been rolled out, and Karmic is released, and
> commit it to devel then (perhaps CPing it). Safer - but then more corrupt data
> in the database.
Given that it's much more likely to affect PPAs than the main archive, and as you say the amount of corrupt data is increasing, I'd be happy for this to be cherry-picked ASAP.
= Summary =
This branch fixes a soyuz parsing error when a debian source control
file is uploaded.
The error is described in bug 436182 (although, somewhat confusedly...
as I was discovering the issues as I went).
To re-cap, soyuz has a function for parsing tag files (such as debian
source control files - dsc, or changes files), which should adhere to
http:// www.debian. org/doc/ debian- policy/ ch-controlfield s.html
but it doesn't in a number of cases, in particular one case which
started occurring recently - fields containing values that span multiple
lines.
The original issue that is noted in bug 436182 is that, when a value for ParseTagFile) to fail.
a field spans multiple lines, our parser was leaving a trailing '\n' on
the parsed value - which caused the Source index to be created with a
*blank* line (as shown in the original snippet in the bug description).
This is completely incorrect and causes any reparsing of this file by
standard libraries (such as apt_pkg.
While creating a workaround for this a second issue was discovered: our
parser removes the required space/tab from subsequent lines of a value
that spans multiple lines. According to the policy:
"Many fields' values may span several lines; in this case each
continuation line must start with a space or a tab."
This issue can also be seen in the example snippet in the bug ParseTagFile which will ignore
description. The effect that a Source generated from this incorrectly
parsed data will be reparsed by apt_pgk.
the subsequent lines that are not preceded with space or tab.
== Proposed fix ==
I've already landed a work-around that will handle the corrupt data in bazaar. launchpad. net/~launchpad- pqm/launchpad/ devel/revision/ 9606 bazaar. launchpad. net/~launchpad- pqm/launchpad/ devel/revision/ 9654
the database (of which there are 27 as of 2009-10-06) when generating
the index files (see
http://
and
http://
) and will get this cherry-picked.
This branch itself ensures that our parse_tagfiles function parses the ParseTagFile( ). ParseTagFile( ) - if that's an
three different types of multi-line values available in tag files (a
new-line-delimited value, a value spanning multiple lines and a
description) with exactly the same result as apt_pkg.
Part of the reason for this is to make it easier to later remove the
function and replace it with apt_pkg.
option (I need to talk with Colin or someone who knows the history
regarding why we've created our own parsing function).
== Pre-implementation notes ==
== Implementation details ==
== Tests ==
bin/test -vv -t Testtagfiles -t TestTagFileDebi anPolicyCompat
== Demo and Q/A ==
I'll be QA'ing this on DF.
= Launchpad lint =
Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.
Wow - ok, so I guess this file hasn't been touched since our python
style guide came into effect ;).
I've fixed all the lint issues below.
Linting changed files: archiveuploader /tagfiles. py archiveuploader /tests/ test_tagfiles. py archiveuploader /tests/ data/test436182 _0.1_source. changes
lib/lp/
lib/lp/
lib/lp/
== Pyflakes notices ==
lib/lp/ archiveuploader /tests/ test_tagfiles. py
10: 'sys'...