Merge lp:~jml/pkgme/extended-characters-in-description into lp:pkgme

Proposed by Jonathan Lange
Status: Merged
Approved by: James Westby
Approved revision: 100
Merged at revision: 100
Proposed branch: lp:~jml/pkgme/extended-characters-in-description
Merge into: lp:pkgme
Diff against target: 31 lines (+11/-0)
2 files modified
pkgme/info_elements.py (+2/-0)
pkgme/tests/test_info_elements.py (+9/-0)
To merge this branch: bzr merge lp:~jml/pkgme/extended-characters-in-description
Reviewer        Review Type    Date Requested    Status
James Westby                                     Approve
Review via email: mp+99293@code.launchpad.net

Commit message

Support extended characters in the description

Description of the change

We had a case where John submitted a package that had pretty quotes in the description field. 'json' correctly serializes and deserializes these values as unicode, so the unicode made its way over the HTTP request, into the devportal-metadata.json file and straight into the Description info element.
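A rough sketch of that round trip, for illustration only. It is written for Python 3, where `str` is the text type that Python 2 calls `unicode`; the `"description"` key and the sample string are assumptions, not the real devportal payload.

```python
import json

# Hypothetical request body; the "description" key is an assumption.
original = "\u201cPretty \u2018speech\u2019 marks\u201d"
payload = json.dumps({"description": original})

# By default json.dumps escapes non-ASCII characters, so the wire form is ASCII.
wire_is_ascii = all(ord(ch) < 128 for ch in payload)

# Decoding restores the real characters as text, not as UTF-8 bytes.
description = json.loads(payload)["description"]

# Writing a control file needs bytes, so the text must be encoded explicitly.
encoded = description.encode("utf-8")
```

The point is that nothing along the way produces UTF-8 bytes on its own; some layer has to call `encode`.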

http://www.debian.org/doc/debian-policy/ch-controlfields.html states that 'all control files must be encoded in UTF-8', which is very sensible of them. The implication for us is that every InfoElement's get_value must return a UTF-8 encoded bytestring. Well, that's if we want to draw the line there. We could instead change PackageFile to encode the values it receives before serializing them.
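For comparison, encoding once at the serialization boundary might look something like the sketch below. This is not PackageFile's actual API: `to_utf8` and `encode_fields` are hypothetical helpers, and the code is written for Python 3, where `str` plays the role of Python 2's `unicode`.

```python
def to_utf8(value):
    """Return text as UTF-8 bytes; values that are already bytes pass through."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


def encode_fields(fields):
    """Encode every field value just before it is written out (hypothetical helper)."""
    return {name: to_utf8(value) for name, value in fields.items()}


# Example: one value is already bytes, the other is text with smart quotes.
fields = encode_fields({
    "Package": b"example",
    "Description": "\u201cquoted\u201d text",
})
```

Doing it this way would cover every info element at once, at the cost of elements no longer being responsible for their own encoding.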

Anyway, I only did this for Description, since that was the presenting case. I've added a test, and have also built and installed a package that used smart quotes.

James Westby (james-w) wrote :

Hi,

Looks good.

I wonder if it should be applied more widely, but this is a fine start.

Thanks,

James

review: Approve

Preview Diff

=== modified file 'pkgme/info_elements.py'
--- pkgme/info_elements.py 2012-02-04 18:29:54 +0000
+++ pkgme/info_elements.py 2012-03-26 11:41:21 +0000
@@ -236,6 +236,8 @@
             formatted but actually is not.
         :return: A correctly-formatted version of that string.
         """
+        if isinstance(value, unicode):
+            value = value.encode('utf-8')
         if not value.strip():
             return ''
         return '\n'.join(cls._format_lines(value.strip().splitlines()))

=== modified file 'pkgme/tests/test_info_elements.py'
--- pkgme/tests/test_info_elements.py 2012-02-04 18:29:54 +0000
+++ pkgme/tests/test_info_elements.py 2012-03-26 11:41:21 +0000
@@ -264,6 +264,15 @@
 follow-up line
 more information""", cleaned)
 
+    def test_extended_characters_in_descriptions(self):
+        # Some people like putting extended characters in their descriptions.
+        # Let's see what happens.
+        description = (
+            u'\u201cPretty \u2018speech\u2019 marks\u201d\u202a \u2013'
+            u'what fun!')
+        cleaned = Description.clean(description)
+        self.assertEqual(description.encode('utf-8'), cleaned)
+
 
 class TestPackageName(TestCase):
 
