Merge lp:~jml/pkgme/extended-characters-in-description into lp:pkgme

Proposed by Jonathan Lange on 2012-03-26
Status: Merged
Approved by: James Westby on 2012-03-26
Approved revision: 100
Merged at revision: 100
Proposed branch: lp:~jml/pkgme/extended-characters-in-description
Merge into: lp:pkgme
Diff against target: 31 lines (+11/-0) 2 files modified
To merge this branch: bzr merge lp:~jml/pkgme/extended-characters-in-description
Reviewer       Review Type   Date Requested   Status
James Westby                 2012-03-26       Approve on 2012-03-26
Review via email: mp+99293@code.launchpad.net

Commit Message

Support extended characters in the description

Description of the Change

We had a case where John submitted a package that had pretty quotes in the description field. 'json' correctly serializes and deserializes these values as unicode, so the unicode made its way over the HTTP request, into the devportal-metadata.json file and straight into the Description info element.
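A minimal sketch of that round trip, written for Python 3, where `str` plays the role of Python 2's `unicode`. The `description` key and the sample text are assumptions for illustration, not the actual devportal-metadata.json layout:

```python
import json

# Hypothetical payload; the key name and text are illustrative only.
payload = json.dumps({"description": "\u201cPretty\u201d quotes"})
description = json.loads(payload)["description"]

# json hands back a text string (Python 2 `unicode`, Python 3 `str`),
# never a UTF-8 bytestring.
is_text = isinstance(description, str)

# Encoding is a separate, explicit step.
encoded = description.encode("utf-8")
```

So the value that reaches the Description element is text, and something downstream has to turn it into bytes.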

http://www.debian.org/doc/debian-policy/ch-controlfields.html states that 'all control files must be encoded in UTF-8', which is very sensible of them. The implication for us is that every InfoElement's get_value must return a UTF-8 encoded bytestring. Well, if that's where we want to draw the line. We could also change PackageFile to encode the values it receives before serializing them.

Anyway, I only did this for Description, since that was the presenting case. I've added a test, and have also built & installed a package that used smart quotes.
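For comparison, the alternative of encoding once at the serialization boundary might look roughly like the following. This is a sketch only: `to_utf8` and `render_control` are hypothetical names, not pkgme's PackageFile API, and it is written for Python 3, where `str` corresponds to Python 2's `unicode`.

```python
def to_utf8(value):
    """Return value as UTF-8 bytes; text is encoded, bytes pass through."""
    if isinstance(value, str):  # Python 2 equivalent: isinstance(value, unicode)
        return value.encode("utf-8")
    return value


def render_control(fields):
    """Hypothetical stand-in for a control-file writer.

    Every value is encoded in one place, so individual info elements
    would not each need their own encoding step.
    """
    lines = []
    for name, value in fields:
        lines.append(name.encode("ascii") + b": " + to_utf8(value))
    return b"\n".join(lines) + b"\n"


control = render_control([
    ("Package", "foo"),
    ("Description", "\u2018hi\u2019"),
])
```

The trade-off is where the line gets drawn: per-element encoding keeps the change small, while a single boundary covers every field at once.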

James Westby (james-w) wrote:

Hi,

Looks good.

I wonder if it should be applied more widely, but this is a fine start.

Thanks,

James

review: Approve

Preview Diff

=== modified file 'pkgme/info_elements.py'
--- pkgme/info_elements.py 2012-02-04 18:29:54 +0000
+++ pkgme/info_elements.py 2012-03-26 11:41:21 +0000
@@ -236,6 +236,8 @@
             formatted but actually is not.
         :return: A correctly-formatted version of that string.
         """
+        if isinstance(value, unicode):
+            value = value.encode('utf-8')
         if not value.strip():
             return ''
         return '\n'.join(cls._format_lines(value.strip().splitlines()))

=== modified file 'pkgme/tests/test_info_elements.py'
--- pkgme/tests/test_info_elements.py 2012-02-04 18:29:54 +0000
+++ pkgme/tests/test_info_elements.py 2012-03-26 11:41:21 +0000
@@ -264,6 +264,15 @@
 follow-up line
 more information""", cleaned)

+    def test_extended_characters_in_descriptions(self):
+        # Some people like putting extended characters in their descriptions.
+        # Let's see what happens.
+        description = (
+            u'\u201cPretty \u2018speech\u2019 marks\u201d\u202a \u2013'
+            u'what fun!')
+        cleaned = Description.clean(description)
+        self.assertEqual(description.encode('utf-8'), cleaned)
+

 class TestPackageName(TestCase):

