Merge lp:~michael.nelson/launchpad/436182-newlines-in-sources-fix-without-workaround into lp:launchpad

Proposed by Michael Nelson
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~michael.nelson/launchpad/436182-newlines-in-sources-fix-without-workaround
Merge into: lp:launchpad
Diff against target: 537 lines
9 files modified
lib/canonical/launchpad/emailtemplates/upload-accepted.txt (+1/-0)
lib/canonical/launchpad/emailtemplates/upload-announcement.txt (+1/-0)
lib/canonical/launchpad/emailtemplates/upload-new.txt (+1/-0)
lib/canonical/launchpad/emailtemplates/upload-rejection.txt (+1/-0)
lib/lp/archiveuploader/changesfile.py (+15/-1)
lib/lp/archiveuploader/tagfiles.py (+57/-22)
lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes (+23/-0)
lib/lp/archiveuploader/tests/test_tagfiles.py (+101/-37)
lib/lp/soyuz/model/queue.py (+15/-7)
To merge this branch: bzr merge lp:~michael.nelson/launchpad/436182-newlines-in-sources-fix-without-workaround
Reviewer Review Type Date Requested Status
Colin Watson (community) Approve
Gavin Panella (community) Approve
Review via email: mp+12988@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (4.0 KiB)

= Summary =
This branch fixes a soyuz parsing error when a debian source control
file is uploaded.

The error is described in bug 436182 (although, somewhat confusedly...
as I was discovering the issues as I went).

To re-cap, soyuz has a function for parsing tag files (such as debian
source control files - dsc, or changes files), which should adhere to

http://www.debian.org/doc/debian-policy/ch-controlfields.html

but it doesn't in a number of cases, in particular one case which
started occurring recently - fields containing values that span multiple
lines.

The original issue that is noted in bug 436182 is that, when a value for
a field spans multiple lines, our parser was leaving a trailing '\n' on
the parsed value - which caused the Source index to be created with a
*blank* line (as shown in the original snippet in the bug description).
This is completely incorrect and causes any reparsing of this file by
standard libraries (such as apt_pkg.ParseTagFile) to fail.

While creating a workaround for this a second issue was discovered: our
parser removes the required space/tab from subsequent lines of a value
that spans multiple lines. According to the policy:

"Many fields' values may span several lines; in this case each
continuation line must start with a space or a tab."

This issue can also be seen in the example snippet in the bug
description. The effect that a Source generated from this incorrectly
parsed data will be reparsed by apt_pgk.ParseTagFile which will ignore
the subsequent lines that are not preceded with space or tab.

== Proposed fix ==

I've already landed a work-around that will handle the corrupt data in
the database (of which there are 27 as of 2009-10-06) when generating
the index files (see
http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/revision/9606
and
http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/revision/9654
) and will get this cherry-picked.

This branch itself ensures that our parse_tagfiles function parses the
three different types of multi-line values available in tag files (a
new-line-delimited value, a value spanning multiple lines and a
description) with exactly the same result as apt_pkg.ParseTagFile().
Part of the reason for this is to make it easier to later remove the
function and replace it with apt_pkg.ParseTagFile() - if that's an
option (I need to talk with Colin or someone who knows the history
regarding why we've created our own parsing function).

== Pre-implementation notes ==

== Implementation details ==

== Tests ==

bin/test -vv -t Testtagfiles -t TestTagFileDebianPolicyCompat

== Demo and Q/A ==

I'll be QA'ing this on DF.

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Wow - ok, so I guess this file hasn't been touched since our python
style guide came into effect ;).

I've fixed all the lint issues below.

Linting changed files:
  lib/lp/archiveuploader/tagfiles.py
  lib/lp/archiveuploader/tests/test_tagfiles.py
  lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes

== Pyflakes notices ==

lib/lp/archiveuploader/tests/test_tagfiles.py
    10: 'sys'...

Read more...

Revision history for this message
Gavin Panella (allenap) wrote :
Download full text (10.0 KiB)

Hi Michael,

I looked at the diff (revisions 9647 and 9648) and there's nothing to
comment on there, so the diff I've reviewed doesn't contain those
revisions.

My only comments below are almost trivial, so r=me. It was a very good
idea to compare to apt_pkg. Also, parse_tagfile_lines is a ugly pig of
a function!

Gavin.

> === modified file 'lib/lp/archiveuploader/tagfiles.py'
> --- lib/lp/archiveuploader/tagfiles.py 2009-06-24 23:33:29 +0000
> +++ lib/lp/archiveuploader/tagfiles.py 2009-10-06 12:09:42 +0000
> @@ -64,8 +64,8 @@
> """This exception is raised if parse_changes encounters nastiness"""
> pass
>
> -re_single_line_field = re.compile(r"^(\S*)\s*:\s*(.*)");
> -re_multi_line_field = re.compile(r"^\s(.*)");
> +re_single_line_field = re.compile(r"^(\S*)\s*:\s*(.*)")
> +re_multi_line_field = re.compile(r"^(\s.*)")
>
>
> def parse_tagfile_lines(lines, dsc_whitespace_rules=0, allow_unsigned=False,
> @@ -103,7 +103,8 @@
>
> num_of_lines = len(indexed_lines.keys())
> index = 0
> - first = -1
> + first_value_for_newline_delimited_field = False
> + more_values_can_follow = False
> while index < num_of_lines:
> index += 1
> line = indexed_lines[index]
> @@ -145,21 +146,47 @@
> if slf:
> field = slf.groups()[0].lower()
> changes[field] = slf.groups()[1]
> - first = 1
> +
> + # If there is no value on this line, we assume this is
> + # the first line of a multiline field, such as the 'files'
> + # field.
> + if changes[field] == '':
> + first_value_for_newline_delimited_field = True
> +
> + # Either way, more values for this field could follow
> + # on the next line.
> + more_values_can_follow = True
> continue
> if line.rstrip() == " .":
> - changes[field] += '\n'
> + changes[field] += '\n' + line
> continue
> mlf = re_multi_line_field.match(line)
> if mlf:
> - if first == -1:
> + if more_values_can_follow is False:
> raise TagFileParseError("%s: could not parse .changes file "
> "line %d: '%s'\n [Multi-line field continuing on from nothing?]"
> % (filename, index,line))
> - if first == 1 and changes[field] != "":
> - changes[field] += '\n'
> - first = 0
> - changes[field] += mlf.groups()[0] + '\n'
> +
> + # XXX Michael Nelson 20091001 bug=440014
> + # Is there any reason why we're not simply using
> + # apt_pkg.ParseTagFile instead of this looong function.
> + # If we can get rid of this code that is trying to mimic
> + # what ParseTagFile does out of the box, it would be a good
> + # thing.
> +
> + # The first value for a newline delimited field, such as
> + # the 'files' field, has its leading spaces stripped. Other
> + # fields (such as a 'binary' field spanning multiple lines)
> + # should *not* be l-stripped of their leading spaces otherwise
> + ...

review: Approve
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (5.3 KiB)

On Wed, 2009-10-07 at 14:51 +0000, Gavin Panella wrote:
> Review: Approve
> Hi Michael,
>
> I looked at the diff (revisions 9647 and 9648) and there's nothing to
> comment on there, so the diff I've reviewed doesn't contain those
> revisions.
>
> My only comments below are almost trivial, so r=me. It was a very good
> idea to compare to apt_pkg. Also, parse_tagfile_lines is a ugly pig of
> a function!

Thanks Gavin - yeah, I'm keen to get rid of parse_tagfile_lines in the
future if possible.

>
> Gavin.
>

[snip]

> > === modified file 'lib/lp/archiveuploader/tests/test_tagfiles.py'
> > --- lib/lp/archiveuploader/tests/test_tagfiles.py 2009-06-24 23:33:29 +0000
> > +++ lib/lp/archiveuploader/tests/test_tagfiles.py 2009-10-07 11:10:35 +0000
> > @@ -5,6 +5,7 @@
> >
> > # arch-tag: 52e0c871-49a3-4186-beb8-9817d02d5465
> >
> > +import apt_pkg
> > import unittest
> > import sys
> > import shutil
> > @@ -106,19 +107,110 @@
> > tf = parse_tagfile(datadir("changes-with-exploit-bottom"))
> > self.assertRaises(KeyError, tf.__getitem__, "you")
> >
> > +
> > +class TestTagFileDebianPolicyCompat(unittest.TestCase):
> > +
> > + def setUp(self):
> > + """Parse the test file using apt_pkg for comparison."""
> > +
> > + tagfile_path = datadir("test436182_0.1_source.changes")
> > + tagfile = open(tagfile_path)
> > + self.apt_pkg_parsed_version = apt_pkg.ParseTagFile(tagfile)
> > + self.apt_pkg_parsed_version.Step()
> > +
> > + # Is it just because this is a very old file that test-related
> > + # things are imported locally?
> > + from lp.archiveuploader.tagfiles import parse_tagfile
> > + self.parse_tagfile_version = parse_tagfile(
> > + tagfile_path, allow_unsigned = True)
> > +
> > + def test_parse_tagfile_with_multiline_values(self):
> > + """parse_tagfile should not leave trailing '\n' on multiline values.
> > +
> > + This is a regression test for bug 436182.
> > + Previously we,
> > + 1. Stripped leading space/tab from subsequent lines of multiline
> > + values, and
> > + 2. appended a trailing '\n' to the end of the value.
> > + """
> > +
> > + expected_text = (
> > + 'test75874, anotherbinary,\n'
> > + ' andanother, andonemore,\n'
> > + '\tlastone')
> > +
> > + self.assertEqual(
> > + expected_text,
> > + self.apt_pkg_parsed_version.Section['Binary'])
> > +
> > + self.assertEqual(
> > + expected_text,
> > + self.parse_tagfile_version['binary'])
> > +
> > + def test_parse_tagfile_with_newline_delimited_field(self):
> > + """parse_tagfile should not leave leading or tailing '\n' when
> > + parsing newline delimited fields.
> > +
> > + Newline-delimited fields should be parsed to match
> > + apt_pkg.ParseTageFile.
> > +
> > + Note: in the past, our parse_tagfile function left the leading
> > + '\n' in the parsed value, whereas it should not have.
> > +
> > + XXX check where this value is used and ensure it won't be affected
> > + (ie. skipp...

Read more...

Revision history for this message
Michael Nelson (michael.nelson) wrote :

Just adding a incremental that fixes test breakages - ensuring that we
always format the changes before displaying it in emails (so it appears
exactly as it used to in emails - without the '\n .' blank line token or
initial spaces.

Julian has already reviewed the incremental and I've put it through ec2
test so it's ready for testing on dogfood.

--
Michael

1=== modified file 'lib/canonical/launchpad/emailtemplates/upload-accepted.txt'
2--- lib/canonical/launchpad/emailtemplates/upload-accepted.txt 2008-09-18 11:08:16 +0000
3+++ lib/canonical/launchpad/emailtemplates/upload-accepted.txt 2009-10-12 09:21:07 +0000
4@@ -1,4 +1,5 @@
5 %(CHANGESFILE)s
6+
7 %(DATE)s%(CHANGEDBY)s%(MAINTAINER)s%(SIGNER)s%(ORIGIN)s
8 %(SPR_URL)s
9
10
11=== modified file 'lib/canonical/launchpad/emailtemplates/upload-announcement.txt'
12--- lib/canonical/launchpad/emailtemplates/upload-announcement.txt 2008-09-18 11:08:16 +0000
13+++ lib/canonical/launchpad/emailtemplates/upload-announcement.txt 2009-10-12 09:27:30 +0000
14@@ -1,3 +1,4 @@
15 %(CHANGESFILE)s
16+
17 %(DATE)s%(CHANGEDBY)s%(MAINTAINER)s%(SIGNER)s%(ORIGIN)s
18 %(SPR_URL)s
19
20=== modified file 'lib/canonical/launchpad/emailtemplates/upload-new.txt'
21--- lib/canonical/launchpad/emailtemplates/upload-new.txt 2007-07-03 11:15:10 +0000
22+++ lib/canonical/launchpad/emailtemplates/upload-new.txt 2009-10-12 09:17:07 +0000
23@@ -2,6 +2,7 @@
24
25 %(CHANGESFILE)s
26
27+
28 Your package contains new components which requires manual editing of
29 the override file. It is ok otherwise, so please be patient. New
30 packages are usually added to the overrides about once a week.
31
32=== modified file 'lib/canonical/launchpad/emailtemplates/upload-rejection.txt'
33--- lib/canonical/launchpad/emailtemplates/upload-rejection.txt 2008-09-26 07:34:37 +0000
34+++ lib/canonical/launchpad/emailtemplates/upload-rejection.txt 2009-10-12 09:34:50 +0000
35@@ -2,6 +2,7 @@
36 %(SUMMARY)s
37
38 %(CHANGESFILE)s
39+
40 %(DATE)s%(CHANGEDBY)s%(MAINTAINER)s%(SIGNER)s%(ORIGIN)s
41
42 ===
43
44=== modified file 'lib/lp/archiveuploader/changesfile.py'
45--- lib/lp/archiveuploader/changesfile.py 2009-06-24 23:33:29 +0000
46+++ lib/lp/archiveuploader/changesfile.py 2009-10-12 08:33:56 +0000
47@@ -307,10 +307,24 @@
48 """Return changesfile claimed version."""
49 return self._dict['version']
50
51+ @classmethod
52+ def formatChangesComment(cls, comment):
53+ """A class utility method for formatting changes for display."""
54+
55+ # Return the display version of the comment using the
56+ # debian policy rules. First replacing the blank line
57+ # indicator '\n .' and then stripping one space from each
58+ # successive line.
59+ comment = comment.replace('\n .', '\n')
60+ comment = comment.replace('\n ', '\n')
61+ return comment
62+
63 @property
64 def changes_comment(self):
65 """Return changesfile 'change' comment."""
66- return self._dict['changes']
67+ comment = self._dict['changes']
68+
69+ return self.formatChangesComment(comment)
70
71 @property
72 def date(self):
73
74=== modified file 'lib/lp/soyuz/model/queue.py'
75--- lib/lp/soyuz/model/queue.py 2009-10-01 07:05:22 +0000
76+++ lib/lp/soyuz/model/queue.py 2009-10-12 07:53:03 +0000
77@@ -33,6 +33,7 @@
78 from lp.archivepublisher.config import getPubConfig
79 from lp.archivepublisher.customupload import CustomUploadError
80 from lp.archivepublisher.utils import get_ppa_reference
81+from lp.archiveuploader.changesfile import ChangesFile
82 from lp.archiveuploader.tagfiles import parse_tagfile_lines
83 from lp.archiveuploader.utils import safe_fix_maintainer
84 from lp.buildmaster.pas import BuildDaemonPackagesArchSpecific
85@@ -737,14 +738,16 @@
86 """PPA rejected message."""
87 template = get_email_template('ppa-upload-rejection.txt')
88 SUMMARY = sanitize_string(summary_text)
89- CHANGESFILE = sanitize_string("".join(changes_lines))
90+ CHANGESFILE = sanitize_string(
91+ ChangesFile.formatChangesComment("".join(changes_lines)))
92 USERS_ADDRESS = config.launchpad.users_address
93
94 class RejectedMessage:
95 """Rejected message."""
96 template = get_email_template('upload-rejection.txt')
97 SUMMARY = sanitize_string(summary_text)
98- CHANGESFILE = sanitize_string(changes['changes'])
99+ CHANGESFILE = sanitize_string(
100+ ChangesFile.formatChangesComment(changes['changes']))
101 CHANGEDBY = ''
102 ORIGIN = ''
103 SIGNER = ''
104@@ -821,7 +824,8 @@
105
106 STATUS = "New"
107 SUMMARY = summarystring
108- CHANGESFILE = sanitize_string(changes['changes'])
109+ CHANGESFILE = sanitize_string(
110+ ChangesFile.formatChangesComment(changes['changes']))
111 DISTRO = self.distroseries.distribution.title
112 if announce_list:
113 ANNOUNCE = 'Announcing to %s' % announce_list
114@@ -835,7 +839,8 @@
115 STATUS = "Waiting for approval"
116 SUMMARY = summarystring + (
117 "\nThis upload awaits approval by a distro manager\n")
118- CHANGESFILE = sanitize_string(changes['changes'])
119+ CHANGESFILE = sanitize_string(
120+ ChangesFile.formatChangesComment(changes['changes']))
121 DISTRO = self.distroseries.distribution.title
122 if announce_list:
123 ANNOUNCE = 'Announcing to %s' % announce_list
124@@ -853,7 +858,8 @@
125
126 STATUS = "Accepted"
127 SUMMARY = summarystring
128- CHANGESFILE = sanitize_string(changes['changes'])
129+ CHANGESFILE = sanitize_string(
130+ ChangesFile.formatChangesComment(changes['changes']))
131 DISTRO = self.distroseries.distribution.title
132 if announce_list:
133 ANNOUNCE = 'Announcing to %s' % announce_list
134@@ -871,14 +877,16 @@
135
136 STATUS = "Accepted"
137 SUMMARY = summarystring
138- CHANGESFILE = guess_encoding("".join(changes_lines))
139+ CHANGESFILE = guess_encoding(
140+ ChangesFile.formatChangesComment("".join(changes_lines)))
141
142 class AnnouncementMessage:
143 template = get_email_template('upload-announcement.txt')
144
145 STATUS = "Accepted"
146 SUMMARY = summarystring
147- CHANGESFILE = sanitize_string(changes['changes'])
148+ CHANGESFILE = sanitize_string(
149+ ChangesFile.formatChangesComment(changes['changes']))
150 CHANGEDBY = ''
151 ORIGIN = ''
152 SIGNER = ''
Revision history for this message
Michael Nelson (michael.nelson) wrote :

Hi Colin,

I'm adding you as a reviewer as apparently you've been around long enough to have a better understanding of the code in question here (our custom parse_tagfile_lines function), or at least know who else might :).

There is already a work-around that's been CP'd onto Germanium that ensures that the Source index creation of PPAs copes with the incorrectly parsed data.

This branch fixes the actual parsing error so that we won't get any more incorrectly-parsed dsc_binaries etc., ensuring that our custom parser does exactly what apt_pkg.ParseTagFile() would do in the given scenarios that failed.

The problem was not restricted to Binary fields in dsc files, but all values that can span multiple lines in both dsc and changes files. From my manual searching, as well as from the current test-suite coverage, this affected only:
  (1) Indexes generation for PPAs (per the bug) as we do not use apt-ftparchive,
  (2) formatting of text in Queue notification emails.
but there could be places that both the test coverage and I missed.

I've created bug 440014 to completely get rid of our custom parse_tagfile_lines function (replacing it with apt_pkg.ParseTagFile()).

I've also tested this branch on dogfood, details here: http://pastebin.ubuntu.com/292254/

So if you approve of this branch landing, when would you prefer to be landed? We could
 1. get it cherry-pick it now and so have some time to be condident that there are no unforseen problems, or
 2. land it in devel without a CP, but then it will be rolled out in the beginning of Nov - probably not ideal, or
 3. wait until LP 3.1.10 has been rolled out, and Karmic is released, and commit it to devel then (perhaps CPing it). Safer - but then more corrupt data in the database.

The dsc files with Binary values spanning multiple lines have approximately doubled each week, - mainly from soures that produce many binaries (linux, mono, oo.org) being added to people's ppas:
24-09-2009 - 12
06-10-2009 - 24
13-10-2009 - 48

Thanks for your time!

Revision history for this message
Colin Watson (cjwatson) wrote :

> I'm adding you as a reviewer as apparently you've been around long enough to
> have a better understanding of the code in question here (our custom
> parse_tagfile_lines function), or at least know who else might :).

Thanks for asking me to review. The algorithmic changes look OK to me. I'm not sure I know any better than you why we have our own version of this function, though. It's been around since the very first version of Soyuz was committed (see r542, although it was called parse_changes back then). Even then, apt_pkg.ParseTagFile was used in the very same file, so it clearly wasn't that Daniel didn't know about it; although the TagFile class has apparently never been used outside tests.

Reasons I can think of why it might have needed to be a separate function:

  * no other way to enforce dsc_whitespace_rules?
  * problems with case-insensitivity at the time? (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=384182)
  * apt_pkg.ParseTagFile seems to really want a real file, not e.g. StringIO

> So if you approve of this branch landing, when would you prefer to be landed?
> We could
> 1. get it cherry-pick it now and so have some time to be condident that there
> are no unforseen problems, or
> 2. land it in devel without a CP, but then it will be rolled out in the
> beginning of Nov - probably not ideal, or
> 3. wait until LP 3.1.10 has been rolled out, and Karmic is released, and
> commit it to devel then (perhaps CPing it). Safer - but then more corrupt data
> in the database.

Given that it's much more likely to affect PPAs than the main archive, and as you say the amount of corrupt data is increasing, I'd be happy for this to be cherry-picked ASAP.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/launchpad/emailtemplates/upload-accepted.txt'
2--- lib/canonical/launchpad/emailtemplates/upload-accepted.txt 2008-09-18 11:08:16 +0000
3+++ lib/canonical/launchpad/emailtemplates/upload-accepted.txt 2009-10-15 13:21:16 +0000
4@@ -1,4 +1,5 @@
5 %(CHANGESFILE)s
6+
7 %(DATE)s%(CHANGEDBY)s%(MAINTAINER)s%(SIGNER)s%(ORIGIN)s
8 %(SPR_URL)s
9
10
11=== modified file 'lib/canonical/launchpad/emailtemplates/upload-announcement.txt'
12--- lib/canonical/launchpad/emailtemplates/upload-announcement.txt 2008-09-18 11:08:16 +0000
13+++ lib/canonical/launchpad/emailtemplates/upload-announcement.txt 2009-10-15 13:21:16 +0000
14@@ -1,3 +1,4 @@
15 %(CHANGESFILE)s
16+
17 %(DATE)s%(CHANGEDBY)s%(MAINTAINER)s%(SIGNER)s%(ORIGIN)s
18 %(SPR_URL)s
19
20=== modified file 'lib/canonical/launchpad/emailtemplates/upload-new.txt'
21--- lib/canonical/launchpad/emailtemplates/upload-new.txt 2007-07-03 11:15:10 +0000
22+++ lib/canonical/launchpad/emailtemplates/upload-new.txt 2009-10-15 13:21:16 +0000
23@@ -2,6 +2,7 @@
24
25 %(CHANGESFILE)s
26
27+
28 Your package contains new components which requires manual editing of
29 the override file. It is ok otherwise, so please be patient. New
30 packages are usually added to the overrides about once a week.
31
32=== modified file 'lib/canonical/launchpad/emailtemplates/upload-rejection.txt'
33--- lib/canonical/launchpad/emailtemplates/upload-rejection.txt 2008-09-26 07:34:37 +0000
34+++ lib/canonical/launchpad/emailtemplates/upload-rejection.txt 2009-10-15 13:21:16 +0000
35@@ -2,6 +2,7 @@
36 %(SUMMARY)s
37
38 %(CHANGESFILE)s
39+
40 %(DATE)s%(CHANGEDBY)s%(MAINTAINER)s%(SIGNER)s%(ORIGIN)s
41
42 ===
43
44=== modified file 'lib/lp/archiveuploader/changesfile.py'
45--- lib/lp/archiveuploader/changesfile.py 2009-06-24 23:33:29 +0000
46+++ lib/lp/archiveuploader/changesfile.py 2009-10-15 13:21:16 +0000
47@@ -307,10 +307,24 @@
48 """Return changesfile claimed version."""
49 return self._dict['version']
50
51+ @classmethod
52+ def formatChangesComment(cls, comment):
53+ """A class utility method for formatting changes for display."""
54+
55+ # Return the display version of the comment using the
56+ # debian policy rules. First replacing the blank line
57+ # indicator '\n .' and then stripping one space from each
58+ # successive line.
59+ comment = comment.replace('\n .', '\n')
60+ comment = comment.replace('\n ', '\n')
61+ return comment
62+
63 @property
64 def changes_comment(self):
65 """Return changesfile 'change' comment."""
66- return self._dict['changes']
67+ comment = self._dict['changes']
68+
69+ return self.formatChangesComment(comment)
70
71 @property
72 def date(self):
73
74=== modified file 'lib/lp/archiveuploader/tagfiles.py'
75--- lib/lp/archiveuploader/tagfiles.py 2009-06-24 23:33:29 +0000
76+++ lib/lp/archiveuploader/tagfiles.py 2009-10-15 13:21:16 +0000
77@@ -53,19 +53,19 @@
78 """Expose a dicty has_key"""
79 return key in self.stanza.keys()
80
81- """Enables (foo in bar) functionality"""
82+ # Enables (foo in bar) functionality.
83 __contains__ = has_key
84
85 def items(self):
86 """Allows for k,v in foo.items()"""
87- return [ (k,self.stanza[k]) for k in self.stanza.keys() ]
88+ return [ (k, self.stanza[k]) for k in self.stanza.keys() ]
89
90 class TagFileParseError(Exception):
91 """This exception is raised if parse_changes encounters nastiness"""
92 pass
93
94-re_single_line_field = re.compile(r"^(\S*)\s*:\s*(.*)");
95-re_multi_line_field = re.compile(r"^\s(.*)");
96+re_single_line_field = re.compile(r"^(\S*)\s*:\s*(.*)")
97+re_multi_line_field = re.compile(r"^(\s.*)")
98
99
100 def parse_tagfile_lines(lines, dsc_whitespace_rules=0, allow_unsigned=False,
101@@ -103,7 +103,8 @@
102
103 num_of_lines = len(indexed_lines.keys())
104 index = 0
105- first = -1
106+ first_value_for_newline_delimited_field = False
107+ more_values_can_follow = False
108 while index < num_of_lines:
109 index += 1
110 line = indexed_lines[index]
111@@ -116,11 +117,15 @@
112 if dsc_whitespace_rules:
113 index += 1
114 if index > num_of_lines:
115- raise TagFileParseError("%s: invalid .dsc file at line %d" % (filename, index))
116+ raise TagFileParseError(
117+ "%s: invalid .dsc file at line %d" % (
118+ filename, index))
119 line = indexed_lines[index]
120 if not line.startswith("-----BEGIN PGP SIGNATURE"):
121- raise TagFileParseError("%s: invalid .dsc file at line %d -- "
122- "expected PGP signature; got '%s'" % (filename, index,line))
123+ raise TagFileParseError(
124+ "%s: invalid .dsc file at line %d -- "
125+ "expected PGP signature; got '%s'" % (
126+ filename, index,line))
127 inside_signature = 0
128 break
129 else:
130@@ -128,8 +133,9 @@
131 if line.startswith("-----BEGIN PGP SIGNATURE"):
132 break
133
134- # If we're at the start of a signed section, then consume the signature
135- # information, and remember that we're inside the signed data.
136+ # If we're at the start of a signed section, then consume the
137+ # signature information, and remember that we're inside the signed
138+ # data.
139 if line.startswith("-----BEGIN PGP SIGNED MESSAGE"):
140 inside_signature = 1
141 if dsc_whitespace_rules:
142@@ -145,31 +151,60 @@
143 if slf:
144 field = slf.groups()[0].lower()
145 changes[field] = slf.groups()[1]
146- first = 1
147+
148+ # If there is no value on this line, we assume this is
149+ # the first line of a multiline field, such as the 'files'
150+ # field.
151+ if changes[field] == '':
152+ first_value_for_newline_delimited_field = True
153+
154+ # Either way, more values for this field could follow
155+ # on the next line.
156+ more_values_can_follow = True
157 continue
158 if line.rstrip() == " .":
159- changes[field] += '\n'
160+ changes[field] += '\n' + line
161 continue
162 mlf = re_multi_line_field.match(line)
163 if mlf:
164- if first == -1:
165- raise TagFileParseError("%s: could not parse .changes file "
166- "line %d: '%s'\n [Multi-line field continuing on from nothing?]"
167- % (filename, index,line))
168- if first == 1 and changes[field] != "":
169- changes[field] += '\n'
170- first = 0
171- changes[field] += mlf.groups()[0] + '\n'
172+ if more_values_can_follow is False:
173+ raise TagFileParseError(
174+ "%s: could not parse .changes file line %d: '%s'\n"
175+ " [Multi-line field continuing on from nothing?]" % (
176+ filename, index,line))
177+
178+ # XXX Michael Nelson 20091001 bug=440014
179+ # Is there any reason why we're not simply using
180+ # apt_pkg.ParseTagFile instead of this looong function.
181+ # If we can get rid of this code that is trying to mimic
182+ # what ParseTagFile does out of the box, it would be a good
183+ # thing.
184+
185+ # The first value for a newline delimited field, such as
186+ # the 'files' field, has its leading spaces stripped. Other
187+ # fields (such as a 'binary' field spanning multiple lines)
188+ # should *not* be l-stripped of their leading spaces otherwise
189+ # they will be re-parsed incorrectly by apt_get.ParseTagFiles()
190+ # (for example, from a Source index).
191+ value = mlf.groups()[0]
192+ if first_value_for_newline_delimited_field:
193+ changes[field] = value.lstrip()
194+ else:
195+ changes[field] += '\n' + value
196+
197+ first_value_for_newline_delimited_field = False
198 continue
199 error += line
200
201 if dsc_whitespace_rules and inside_signature:
202- raise TagFileParseError("%s: invalid .dsc format at line %d" % (filename, index))
203+ raise TagFileParseError(
204+ "%s: invalid .dsc format at line %d" % (filename, index))
205
206 changes["filecontents"] = "".join(lines)
207
208 if error:
209- raise TagFileParseError("%s: unable to parse .changes file: %s" % (filename, error))
210+ raise TagFileParseError(
211+ "%s: unable to parse .changes file: %s" % (filename, error))
212
213 return changes
214
215
216=== added file 'lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes'
217--- lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes 1970-01-01 00:00:00 +0000
218+++ lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes 2009-10-15 13:21:16 +0000
219@@ -0,0 +1,23 @@
220+Format: 1.0
221+Date: Fri, 15 Dec 2006 12:02:42 +0000
222+Source: test75874
223+Binary: test75874 anotherbinary
224+ andanother andonemore
225+ lastone
226+Architecture: all
227+Version: 0.1
228+Maintainer: Colin Watson <cjwatson@ubuntu.com>
229+Description: Here's the single-line synopsis.
230+ Then there is the extended description which can
231+ span multiple lines, and even include blank-lines like this:
232+ .
233+ There you go. If a line starts with two or more spaces,
234+ it will be displayed verbatim. Like this one:
235+ Example verbatim line.
236+ Another verbatim line.
237+ OK, back to normal.
238+Files:
239+ f26bb9b29b1108e53139da3584a4dc92 1511 test75874_0.1.tar.gz
240+ 29c955ff520cea32ab3e0316306d0ac1 393742 pmount_0.9.7.orig.tar.gz
241+ 91a8f46d372c406fadcb57c6ff7016f3 5302 pmount_0.9.7-2ubuntu2.diff.gz
242+
243
244=== modified file 'lib/lp/archiveuploader/tests/test_tagfiles.py'
245--- lib/lp/archiveuploader/tests/test_tagfiles.py 2009-06-24 23:33:29 +0000
246+++ lib/lp/archiveuploader/tests/test_tagfiles.py 2009-10-15 13:21:16 +0000
247@@ -5,23 +5,18 @@
248
249 # arch-tag: 52e0c871-49a3-4186-beb8-9817d02d5465
250
251+import apt_pkg
252 import unittest
253-import sys
254-import shutil
255 from lp.archiveuploader.tests import datadir
256
257+from lp.archiveuploader.tagfiles import (
258+ parse_tagfile, TagFile, TagFileParseError)
259+
260 class Testtagfiles(unittest.TestCase):
261
262- def testImport(self):
263- """lp.archiveuploader.tagfiles should be importable"""
264- from lp.archiveuploader.tagfiles import TagFile
265- from lp.archiveuploader.tagfiles import TagFileParseError
266- from lp.archiveuploader.tagfiles import parse_tagfile
267-
268 def testTagFileOnSingular(self):
269 """lp.archiveuploader.tagfiles.TagFile should parse a singular stanza
270 """
271- from lp.archiveuploader.tagfiles import TagFile
272 f = TagFile(file(datadir("singular-stanza"), "r"))
273 seenone = False
274 for stanza in f:
275@@ -32,8 +27,7 @@
276 self.assertEquals("FooBar" in stanza, False)
277
278 def testTagFileOnSeveral(self):
279- """lp.archiveuploader.tagfiles.TagFile should parse multiple stanzas"""
280- from lp.archiveuploader.tagfiles import TagFile
281+ """TagFile should parse multiple stanzas."""
282 f = TagFile(file(datadir("multiple-stanzas"), "r"))
283 seen = 0
284 for stanza in f:
285@@ -47,15 +41,12 @@
286 """lp.archiveuploader.tagfiles.parse_tagfile should work on a good
287 changes file
288 """
289- from lp.archiveuploader.tagfiles import parse_tagfile
290 p = parse_tagfile(datadir("good-signed-changes"))
291
292 def testCheckParseBadChangesRaises(self):
293 """lp.archiveuploader.tagfiles.parse_chantges should raise
294 TagFileParseError on failure
295 """
296- from lp.archiveuploader.tagfiles import parse_tagfile
297- from lp.archiveuploader.tagfiles import TagFileParseError
298 self.assertRaises(TagFileParseError,
299 parse_tagfile, datadir("badformat-changes"), 1)
300
301@@ -63,8 +54,6 @@
302 """lp.archiveuploader.tagfiles.parse_chantges should raise
303 TagFileParseError on empty
304 """
305- from lp.archiveuploader.tagfiles import parse_tagfile
306- from lp.archiveuploader.tagfiles import TagFileParseError
307 self.assertRaises(TagFileParseError,
308 parse_tagfile, datadir("empty-file"), 1)
309
310@@ -72,16 +61,12 @@
311 """lp.archiveuploader.tagfiles.parse_chantges should raise
312 TagFileParseError on malformed signatures
313 """
314- from lp.archiveuploader.tagfiles import parse_tagfile
315- from lp.archiveuploader.tagfiles import TagFileParseError
316 self.assertRaises(TagFileParseError,
317 parse_tagfile, datadir("malformed-sig-changes"), 1)
318
319 def testCheckParseMalformedMultilineRaises(self):
320 """lp.archiveuploader.tagfiles.parse_chantges should raise
321 TagFileParseError on malformed continuation lines"""
322- from lp.archiveuploader.tagfiles import parse_tagfile
323- from lp.archiveuploader.tagfiles import TagFileParseError
324 self.assertRaises(TagFileParseError,
325 parse_tagfile, datadir("bad-multiline-changes"), 1)
326
327@@ -89,8 +74,6 @@
328 """lp.archiveuploader.tagfiles.parse_chantges should raise
329 TagFileParseError on unterminated signatures
330 """
331- from lp.archiveuploader.tagfiles import parse_tagfile
332- from lp.archiveuploader.tagfiles import TagFileParseError
333 self.assertRaises(TagFileParseError,
334 parse_tagfile,
335 datadir("unterminated-sig-changes"),
336@@ -100,25 +83,106 @@
337 """lp.archiveuploader.tagfiles.parse_tagfile should not be vulnerable
338 to tags outside of the signed portion
339 """
340- from lp.archiveuploader.tagfiles import parse_tagfile
341 tf = parse_tagfile(datadir("changes-with-exploit-top"))
342 self.assertRaises(KeyError, tf.__getitem__, "you")
343 tf = parse_tagfile(datadir("changes-with-exploit-bottom"))
344 self.assertRaises(KeyError, tf.__getitem__, "you")
345
346+
347+class TestTagFileDebianPolicyCompat(unittest.TestCase):
348+
349+ def setUp(self):
350+ """Parse the test file using apt_pkg for comparison."""
351+
352+ tagfile_path = datadir("test436182_0.1_source.changes")
353+ tagfile = open(tagfile_path)
354+ self.apt_pkg_parsed_version = apt_pkg.ParseTagFile(tagfile)
355+ self.apt_pkg_parsed_version.Step()
356+
357+ self.parse_tagfile_version = parse_tagfile(
358+ tagfile_path, allow_unsigned = True)
359+
360+ def test_parse_tagfile_with_multiline_values(self):
361+ """parse_tagfile should not leave trailing '\n' on multiline values.
362+
363+ This is a regression test for bug 436182.
364+ Previously we,
365+ 1. Stripped leading space/tab from subsequent lines of multiline
366+ values, and
367+ 2. appended a trailing '\n' to the end of the value.
368+ """
369+
370+ expected_text = (
371+ 'test75874 anotherbinary\n'
372+ ' andanother andonemore\n'
373+ '\tlastone')
374+
375+ self.assertEqual(
376+ expected_text,
377+ self.apt_pkg_parsed_version.Section['Binary'])
378+
379+ self.assertEqual(
380+ expected_text,
381+ self.parse_tagfile_version['binary'])
382+
383+ def test_parse_tagfile_with_newline_delimited_field(self):
384+ """parse_tagfile should not leave leading or tailing '\n' when
385+ parsing newline delimited fields.
386+
387+ Newline-delimited fields should be parsed to match
388+ apt_pkg.ParseTageFile.
389+
390+ Note: in the past, our parse_tagfile function left the leading
391+ '\n' in the parsed value, whereas it should not have.
392+
393+ For an example,
394+ see http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Files
395+ """
396+
397+ expected_text = (
398+ 'f26bb9b29b1108e53139da3584a4dc92 1511 test75874_0.1.tar.gz\n '
399+ '29c955ff520cea32ab3e0316306d0ac1 393742 '
400+ 'pmount_0.9.7.orig.tar.gz\n'
401+ ' 91a8f46d372c406fadcb57c6ff7016f3 5302 '
402+ 'pmount_0.9.7-2ubuntu2.diff.gz')
403+
404+ self.assertEqual(
405+ expected_text,
406+ self.apt_pkg_parsed_version.Section['Files'])
407+
408+ self.assertEqual(
409+ expected_text,
410+ self.parse_tagfile_version['files'])
411+
412+ def test_parse_description_field(self):
413+ """Apt-pkg preserves the blank-line indicator and does not strip
414+ leading spaces.
415+
416+ See http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Description
417+ """
418+ expected_text = (
419+ "Here's the single-line synopsis.\n"
420+ " Then there is the extended description which can\n"
421+ " span multiple lines, and even include blank-lines like this:\n"
422+ " .\n"
423+ " There you go. If a line starts with two or more spaces,\n"
424+ " it will be displayed verbatim. Like this one:\n"
425+ " Example verbatim line.\n"
426+ " Another verbatim line.\n"
427+ " OK, back to normal.")
428+
429+ self.assertEqual(
430+ expected_text,
431+ self.apt_pkg_parsed_version.Section['Description'])
432+
433+ # In the past our parse_tagfile function replaced blank-line
434+ # indicators in the description (' .\n') with new lines ('\n'),
435+ # but it is now compatible with ParseTagFiles (and ready to be
436+ # replaced by ParseTagFiles).
437+ self.assertEqual(
438+ expected_text,
439+ self.parse_tagfile_version['description'])
440+
441 def test_suite():
442- suite = unittest.TestSuite()
443- loader = unittest.TestLoader()
444- suite.addTest(loader.loadTestsFromTestCase(Testtagfiles))
445- return suite
446-
447-def main(argv):
448- suite = test_suite()
449- runner = unittest.TextTestRunner(verbosity = 2)
450- if not runner.run(suite).wasSuccessful():
451- return 1
452- return 0
453-
454-if __name__ == '__main__':
455- sys.exit(main(sys.argv))
456+ return unittest.TestLoader().loadTestsFromName(__name__)
457
458
459=== modified file 'lib/lp/soyuz/model/queue.py'
460--- lib/lp/soyuz/model/queue.py 2009-10-09 11:13:30 +0000
461+++ lib/lp/soyuz/model/queue.py 2009-10-15 13:21:16 +0000
462@@ -33,6 +33,7 @@
463 from lp.archivepublisher.config import getPubConfig
464 from lp.archivepublisher.customupload import CustomUploadError
465 from lp.archivepublisher.utils import get_ppa_reference
466+from lp.archiveuploader.changesfile import ChangesFile
467 from lp.archiveuploader.tagfiles import parse_tagfile_lines
468 from lp.archiveuploader.utils import safe_fix_maintainer
469 from lp.buildmaster.pas import BuildDaemonPackagesArchSpecific
470@@ -747,14 +748,16 @@
471 """PPA rejected message."""
472 template = get_email_template('ppa-upload-rejection.txt')
473 SUMMARY = sanitize_string(summary_text)
474- CHANGESFILE = sanitize_string("".join(changes_lines))
475+ CHANGESFILE = sanitize_string(
476+ ChangesFile.formatChangesComment("".join(changes_lines)))
477 USERS_ADDRESS = config.launchpad.users_address
478
479 class RejectedMessage:
480 """Rejected message."""
481 template = get_email_template('upload-rejection.txt')
482 SUMMARY = sanitize_string(summary_text)
483- CHANGESFILE = sanitize_string(changes['changes'])
484+ CHANGESFILE = sanitize_string(
485+ ChangesFile.formatChangesComment(changes['changes']))
486 CHANGEDBY = ''
487 ORIGIN = ''
488 SIGNER = ''
489@@ -831,7 +834,8 @@
490
491 STATUS = "New"
492 SUMMARY = summarystring
493- CHANGESFILE = sanitize_string(changes['changes'])
494+ CHANGESFILE = sanitize_string(
495+ ChangesFile.formatChangesComment(changes['changes']))
496 DISTRO = self.distroseries.distribution.title
497 if announce_list:
498 ANNOUNCE = 'Announcing to %s' % announce_list
499@@ -845,7 +849,8 @@
500 STATUS = "Waiting for approval"
501 SUMMARY = summarystring + (
502 "\nThis upload awaits approval by a distro manager\n")
503- CHANGESFILE = sanitize_string(changes['changes'])
504+ CHANGESFILE = sanitize_string(
505+ ChangesFile.formatChangesComment(changes['changes']))
506 DISTRO = self.distroseries.distribution.title
507 if announce_list:
508 ANNOUNCE = 'Announcing to %s' % announce_list
509@@ -863,7 +868,8 @@
510
511 STATUS = "Accepted"
512 SUMMARY = summarystring
513- CHANGESFILE = sanitize_string(changes['changes'])
514+ CHANGESFILE = sanitize_string(
515+ ChangesFile.formatChangesComment(changes['changes']))
516 DISTRO = self.distroseries.distribution.title
517 if announce_list:
518 ANNOUNCE = 'Announcing to %s' % announce_list
519@@ -881,14 +887,16 @@
520
521 STATUS = "Accepted"
522 SUMMARY = summarystring
523- CHANGESFILE = guess_encoding("".join(changes_lines))
524+ CHANGESFILE = guess_encoding(
525+ ChangesFile.formatChangesComment("".join(changes_lines)))
526
527 class AnnouncementMessage:
528 template = get_email_template('upload-announcement.txt')
529
530 STATUS = "Accepted"
531 SUMMARY = summarystring
532- CHANGESFILE = sanitize_string(changes['changes'])
533+ CHANGESFILE = sanitize_string(
534+ ChangesFile.formatChangesComment(changes['changes']))
535 CHANGEDBY = ''
536 ORIGIN = ''
537 SIGNER = ''