= Summary = This branch fixes a soyuz parsing error when a debian source control file is uploaded. The error is described in bug 436182 (although, somewhat confusedly... as I was discovering the issues as I went). To re-cap, soyuz has a function for parsing tag files (such as debian source control files - dsc, or changes files), which should adhere to http://www.debian.org/doc/debian-policy/ch-controlfields.html but it doesn't in a number of cases, in particular one case which started occurring recently - fields containing values that span multiple lines. The original issue that is noted in bug 436182 is that, when a value for a field spans multiple lines, our parser was leaving a trailing '\n' on the parsed value - which caused the Source index to be created with a *blank* line (as shown in the original snippet in the bug description). This is completely incorrect and causes any reparsing of this file by standard libraries (such as apt_pkg.ParseTagFile) to fail. While creating a workaround for this a second issue was discovered: our parser removes the required space/tab from subsequent lines of a value that spans multiple lines. According to the policy: "Many fields' values may span several lines; in this case each continuation line must start with a space or a tab." This issue can also be seen in the example snippet in the bug description. The effect that a Source generated from this incorrectly parsed data will be reparsed by apt_pgk.ParseTagFile which will ignore the subsequent lines that are not preceded with space or tab. == Proposed fix == I've already landed a work-around that will handle the corrupt data in the database (of which there are 27 as of 2009-10-06) when generating the index files (see http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/revision/9606 and http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/revision/9654 ) and will get this cherry-picked. This branch itself ensures that our parse_tagfiles function parses the three different types of multi-line values available in tag files (a new-line-delimited value, a value spanning multiple lines and a description) with exactly the same result as apt_pkg.ParseTagFile(). Part of the reason for this is to make it easier to later remove the function and replace it with apt_pkg.ParseTagFile() - if that's an option (I need to talk with Colin or someone who knows the history regarding why we've created our own parsing function). == Pre-implementation notes == == Implementation details == == Tests == bin/test -vv -t Testtagfiles -t TestTagFileDebianPolicyCompat == Demo and Q/A == I'll be QA'ing this on DF. = Launchpad lint = Checking for conflicts. and issues in doctests and templates. Running jslint, xmllint, pyflakes, and pylint. Using normal rules. Wow - ok, so I guess this file hasn't been touched since our python style guide came into effect ;). I've fixed all the lint issues below. Linting changed files: lib/lp/archiveuploader/tagfiles.py lib/lp/archiveuploader/tests/test_tagfiles.py lib/lp/archiveuploader/tests/data/test436182_0.1_source.changes == Pyflakes notices == lib/lp/archiveuploader/tests/test_tagfiles.py 10: 'sys' imported but unused 11: 'shutil' imported but unused 18: 'TagFile' imported but unused 19: 'TagFileParseError' imported but unused 20: 'parse_tagfile' imported but unused 188: 'parse_tagfile' imported but unused == Pylint notices == lib/lp/archiveuploader/tagfiles.py 120: [C0301] Line too long (99/78) 123: [C0301] Line too long (82/78) 124: [C0301] Line too long (84/78) 132: [C0301] Line too long (79/78) 167: [C0301] Line too long (80/78) 194: [C0301] Line too long (89/78) 199: [C0301] Line too long (92/78) 56: [W0105, TagStanza] String statement has no effect 61: [C0324, TagStanza.items] Comma not followed by a space return [ (k,self.stanza[k]) for k in self.stanza.keys() ] ^^ lib/lp/archiveuploader/tests/test_tagfiles.py 36: [C0301] Line too long (79/78) 10: [W0611] Unused import sys 11: [W0611] Unused import shutil -- Michael