Merge lp:~cjwatson/launchpad/destroy-ascii-smash into lp:launchpad

Proposed by Colin Watson
Status: Merged
Merged at revision: 17611
Proposed branch: lp:~cjwatson/launchpad/destroy-ascii-smash
Merge into: lp:launchpad
Diff against target: 708 lines (+101/-302)
8 files modified
lib/lp/answers/doc/person.txt (+2/-3)
lib/lp/answers/doc/questionsets.txt (+14/-15)
lib/lp/app/stories/launchpad-root/site-search.txt (+1/-3)
lib/lp/archiveuploader/tests/nascentupload-announcements.txt (+4/-7)
lib/lp/archiveuploader/tests/safe_fix_maintainer.txt (+24/-22)
lib/lp/archiveuploader/utils.py (+5/-10)
lib/lp/services/encoding.py (+1/-200)
lib/lp/soyuz/adapters/notification.py (+50/-42)
To merge this branch: bzr merge lp:~cjwatson/launchpad/destroy-ascii-smash
Reviewer        Review Type   Date Requested   Status
William Grant   code                           Approve
Review via email: mp+264039@code.launchpad.net

Commit message

Perform proper RFC2047-encoding of mail notification headers, and remove ascii_smash.

Description of the change

ascii_smash made some sense for shipit, where there were third-party limitations on label printing; but now that shipit is gone, there is no valid justification for using it anywhere in Launchpad. Its rendering of Arabic text is particularly terrible. I've replaced all uses of ascii_smash with either straightforward Unicode or RFC2047-encoding.
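
To illustrate the difference, here is a minimal Python 3 sketch using only the standard library. It is not Launchpad code: strip_to_ascii is a hypothetical and much cruder stand-in for ASCII-smashing (it simply drops whatever it cannot decompose), and encode_display_name is a made-up wrapper around email.header.Header.

```python
import unicodedata
from email.header import Header, decode_header, make_header


def strip_to_ascii(text):
    """Lossy (hypothetical helper): decompose accents, drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def encode_display_name(name):
    """Lossless (hypothetical helper): RFC 2047 encoded-word form."""
    return Header(name, "utf-8").encode()


name = "Non-ascii changed-by Čihař"

# Stripping loses the carons for good.
smashed = strip_to_ascii(name)

# RFC 2047 encoding yields a pure-ASCII header value that a mail client
# can decode back to the original text.
encoded = encode_display_name(name)
decoded = str(make_header(decode_header(encoded)))
```

With stripping, the accents cannot be recovered and scripts such as Arabic disappear entirely; the encoded form round-trips to the original name.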

The Soyuz case requires delicate treatment, as we need fix_maintainer to cope with the particular variant of RFC822 used in Maintainer and Changed-By fields, but it isn't necessarily possible to run the RFC822 output of fix_maintainer back through fix_maintainer, because it may output a parenthesised form rather than an angle-bracketed form. I avoided this problem by keeping track of the e-mail address on its own in a few more places, which is enough for a Person lookup.
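
The idea of carrying the bare address alongside the formatted one can be sketched as follows. This is an illustrative Python 3 example built on email.utils.parseaddr from the standard library; extract_email and the sample dictionary are hypothetical and are not the fix_email helper added in this branch.

```python
from email.utils import parseaddr


def extract_email(field):
    """Return just the address part of a name-and-address field, or None.

    Hypothetical helper for illustration only.
    """
    if not field:
        return None
    _name, email = parseaddr(field)
    return email or None


# The same identity in angle-bracketed and parenthesised form.
angle = "Celso R. Providelo <cprov@ubuntu.com>"
paren = "cprov@ubuntu.com (Celso R. Providelo)"

# Store the bare address once, next to whichever formatted form is used
# for display, so later lookups never need to re-parse the formatted form.
info = {
    "changedby": paren,
    "changedby_email": extract_email(paren),
}
```

A person lookup can then key on info["changedby_email"] regardless of which formatted variant ended up in info["changedby"].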

William Grant (wgrant) :
review: Approve (code)
Colin Watson (cjwatson) :

Preview Diff

=== modified file 'lib/lp/answers/doc/person.txt'
--- lib/lp/answers/doc/person.txt 2013-05-01 00:23:31 +0000
+++ lib/lp/answers/doc/person.txt 2015-07-07 13:35:34 +0000
@@ -170,13 +170,12 @@
 But Carlos has one.

     # Because not everyone uses a real editor <wink>
-    >>> from lp.services.encoding import ascii_smash
     >>> carlos_raw = personset.getByName('carlos')
     >>> carlos = IQuestionsPerson(carlos_raw)
     >>> for question in carlos.searchQuestions(
     ...         language=(english, spanish)):
-    ...     print ascii_smash(question.title), question.language.code
-    Problema al recompilar kernel con soporte smp (doble-nucleo) es
+    ...     print question.title, question.language.code
+    Problema al recompilar kernel con soporte smp (doble-núcleo) es


 Questions needing attention

=== modified file 'lib/lp/answers/doc/questionsets.txt'
--- lib/lp/answers/doc/questionsets.txt 2013-05-01 00:23:31 +0000
+++ lib/lp/answers/doc/questionsets.txt 2015-07-07 13:35:34 +0000
@@ -48,16 +48,15 @@
 regular full text algorithm.

     # Because not everyone uses a real editor <wink>
-    >>> from lp.services.encoding import ascii_smash
     >>> for question in question_set.searchQuestions(search_text=u'firefox'):
-    ...     print ascii_smash(question.title), question.target.displayname
-    Problemas de Impressao no Firefox Mozilla Firefox
+    ...     print question.title, question.target.displayname
+    Problemas de Impressão no Firefox Mozilla Firefox
     Firefox loses focus and gets stuck Mozilla Firefox
     Firefox cannot render Bank Site Mozilla Firefox
     mailto: problem in webpage mozilla-firefox in Ubuntu
     Newly installed plug-in doesn't seem to be used Mozilla Firefox
     Problem showing the SVG demo on W3C site Mozilla Firefox
-    AINKAFSEEN ALEFLAMTEHGHAINYEHYEHREHALEFTEH ... Ubuntu
+    عكس التغييرات غير المحفوظة للمستن؟ Ubuntu


 Status
@@ -93,8 +92,8 @@
     >>> from lp.services.worlddata.interfaces.language import ILanguageSet
     >>> spanish = getUtility(ILanguageSet)['es']
     >>> for t in question_set.searchQuestions(language=spanish):
-    ...     print ascii_smash(t.title)
-    Problema al recompilar kernel con soporte smp (doble-nucleo)
+    ...     print t.title
+    Problema al recompilar kernel con soporte smp (doble-núcleo)


 Combinations
@@ -106,14 +105,14 @@
     >>> for question in question_set.searchQuestions(
     ...         search_text=u'firefox',
     ...         status=(QuestionStatus.OPEN, QuestionStatus.INVALID)):
-    ...     print ascii_smash(question.title), question.status.title, (
+    ...     print question.title, question.status.title, (
     ...         question.target.displayname)
-    Problemas de Impressao no Firefox Open Mozilla Firefox
+    Problemas de Impressão no Firefox Open Mozilla Firefox
     Firefox is slow and consumes too much ... mozilla-firefox in Ubuntu
     Firefox loses focus and gets stuck Open Mozilla Firefox
     Firefox cannot render Bank Site Open Mozilla Firefox
     Problem showing the SVG demo on W3C site Open Mozilla Firefox
-    AINKAFSEEN ALEFLAMTEHGHAINYEHYEHREHALEFTEH ... Ubuntu
+    عكس التغييرات غير المحفوظة للمستن؟ Open Ubuntu


 Sort order
@@ -126,24 +125,24 @@
     >>> from lp.answers.enums import QuestionSort
     >>> for question in question_set.searchQuestions(
     ...         search_text=u'firefox', sort=QuestionSort.OLDEST_FIRST):
-    ...     print question.id, ascii_smash(question.title), (
+    ...     print question.id, question.title, (
     ...         question.target.displayname)
-    14 AINKAFSEEN ALEFLAMTEHGHAINYEHYEHREHALEFTEH ... Ubuntu
+    14 عكس التغييرات غير المحفوظة للمستن؟ Ubuntu
     1 Firefox cannot render Bank Site Mozilla Firefox
     2 Problem showing the SVG demo on W3C site Mozilla Firefox
     4 Firefox loses focus and gets stuck Mozilla Firefox
     6 Newly installed plug-in doesn't seem to be used Mozilla Firefox
     9 mailto: problem in webpage mozilla-firefox in Ubuntu
-    13 Problemas de Impressao no Firefox Mozilla Firefox
+    13 Problemas de Impressão no Firefox Mozilla Firefox

 When no text search is done, the default sort order is by newest first.

     >>> for question in question_set.searchQuestions(
     ...         status=QuestionStatus.OPEN)[:5]:
-    ...     print question.id, ascii_smash(question.title), (
+    ...     print question.id, question.title, (
     ...         question.target.displayname)
-    13 Problemas de Impressao no Firefox Mozilla Firefox
-    12 Problema al recompilar kernel con soporte smp (doble-nucleo) Ubuntu
+    13 Problemas de Impressão no Firefox Mozilla Firefox
+    12 Problema al recompilar kernel con soporte smp (doble-núcleo) Ubuntu
     11 Continue playing after shutdown Ubuntu
     5 Installation failed Ubuntu
     4 Firefox loses focus and gets stuck Mozilla Firefox

=== modified file 'lib/lp/app/stories/launchpad-root/site-search.txt'
--- lib/lp/app/stories/launchpad-root/site-search.txt 2013-09-27 04:13:23 +0000
+++ lib/lp/app/stories/launchpad-root/site-search.txt 2015-07-07 13:35:34 +0000
@@ -5,8 +5,6 @@
 specific search with Launchpad's prominent objects (projects, bugs,
 teams, etc.).

-    >>> from lp.services.encoding import ascii_smash
-
     # Our very helpful function for printing all the page results.

     >>> def print_search_results(contents=None):
@@ -14,7 +12,7 @@
     ...         contents = anon_browser.contents
     ...     tag = find_tag_by_id(contents, 'search-results')
     ...     if tag:
-    ...         print ascii_smash(extract_text(tag))
+    ...         print extract_text(tag)

     # Another helper to make searching convenient.


=== modified file 'lib/lp/archiveuploader/tests/nascentupload-announcements.txt'
--- lib/lp/archiveuploader/tests/nascentupload-announcements.txt 2014-11-08 23:53:17 +0000
+++ lib/lp/archiveuploader/tests/nascentupload-announcements.txt 2015-07-07 13:35:34 +0000
@@ -512,7 +512,7 @@
     DEBUG * Changer using non-preferred email
     DEBUG
     DEBUG Date: Tue, 25 Apr 2006 10:36:14 -0300
-    DEBUG Changed-By: Celso R. Providelo <cprov@ubuntu.com>
+    DEBUG Changed-By: cprov@ubuntu.com (Celso R. Providelo)
     DEBUG Maintainer: Launchpad team <launchpad@lists.canonical.com>
     DEBUG http://launchpad.dev/ubuntu/+source/bar/1.0-4
     DEBUG
@@ -679,8 +679,7 @@
     0

 Uploads with UTF-8 characters in email addresses in the changes file are
-permitted, but converted to ASCII, which is a limitation of the mailer.
-However, UTF-8 in the mail content is preserved.
+permitted, but RFC2047-encoded. UTF-8 in the mail content is preserved.

     >>> hoary.status = SeriesStatus.DEVELOPMENT
     >>> anything_policy = getPolicy(
@@ -701,10 +700,8 @@
     >>> len(msgs)
     2

-"Cihar" should actually be "Čihař" but the mailer will convert to ASCII.
-
-    >>> [message['From'] for message in msgs]
-    ['Root <root@localhost>', 'Non-ascii changed-by Cihar
+    >>> [message['From'].replace('\n ', ' ') for message in msgs]
+    ['Root <root@localhost>', '=?utf-8?q?Non-ascii_changed-by_=C4=8Ciha=C5=99?=
     <daniel.silverstone@canonical.com>']

 UTF-8 text in the changes file that is sent on the email is preserved

=== modified file 'lib/lp/archiveuploader/tests/safe_fix_maintainer.txt'
--- lib/lp/archiveuploader/tests/safe_fix_maintainer.txt 2010-07-24 09:12:37 +0000
+++ lib/lp/archiveuploader/tests/safe_fix_maintainer.txt 2015-07-07 13:35:34 +0000
@@ -1,41 +1,43 @@
-Test some utils method inheritaded from DAK:
+Test some utils method inherited from DAK:

 safe_fix_maintainer() is a function used to sanitise the
 identification fields coming from the Debian control files (changes
 and dsc). It allows safe unicode and non-unicode inputs.

-    >>> from lp.archiveuploader.utils import (
-    ...     safe_fix_maintainer)
+    >>> from lp.archiveuploader.utils import safe_fix_maintainer

     >>> maintainer_field = 'maintainer'
     >>> changer_field = 'changed-by'

 Pure ASCII content using the two available fieldname (pretty much the same)

     >>> content = 'Hello World <hello@world.com>'
     >>> safe_fix_maintainer(content, maintainer_field)
-    ('Hello World <hello@world.com>', 'Hello World <hello@world.com>', 'Hello World', 'hello@world.com')
+    ('Hello World <hello@world.com>', 'Hello World <hello@world.com>',
+     'Hello World', 'hello@world.com')

     >>> content = 'Hello World <hello@world.com>'
     >>> safe_fix_maintainer(content, changer_field)
-    ('Hello World <hello@world.com>', 'Hello World <hello@world.com>', 'Hello World', 'hello@world.com')
+    ('Hello World <hello@world.com>', 'Hello World <hello@world.com>',
+     'Hello World', 'hello@world.com')


 Passing Unicode:

     # XXX cprov 2006-02-20 bug=32148: Not sure if it is working properly,
     # at least doesn't raise any exception like in bug #32148.

     >>> content = u'Rapha\xc3l Pinson <raphink@ubuntu.com>'
     >>> safe_fix_maintainer(content, maintainer_field)
-    ('RaphaAl Pinson <raphink@ubuntu.com>', 'RaphaAl Pinson <raphink@ubuntu.com>', 'RaphaAl Pinson', 'raphink@ubuntu.com')
+    ('Rapha\xc3\x83l Pinson <raphink@ubuntu.com>',
+     '=?utf-8?q?Rapha=C3=83l_Pinson?= <raphink@ubuntu.com>',
+     'Rapha\xc3\x83l Pinson', 'raphink@ubuntu.com')


 Passing latin encoded string:

     >>> content = 'Rapha\xebl Pinson <raphink@ubuntu.com>'
     >>> safe_fix_maintainer(content, maintainer_field)
-    ('Raphael Pinson <raphink@ubuntu.com>', 'Raphael Pinson <raphink@ubuntu.com>', 'Raphael Pinson', 'raphink@ubuntu.com')
-
-
-
+    ('Rapha\xc3\xabl Pinson <raphink@ubuntu.com>',
+     '=?utf-8?q?Rapha=C3=ABl_Pinson?= <raphink@ubuntu.com>',
+     'Rapha\xc3\xabl Pinson', 'raphink@ubuntu.com')

=== modified file 'lib/lp/archiveuploader/utils.py'
--- lib/lp/archiveuploader/utils.py 2015-03-13 19:05:50 +0000
+++ lib/lp/archiveuploader/utils.py 2015-07-07 13:35:34 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009-2012 Canonical Ltd. This software is licensed under the
+# Copyright 2009-2015 Canonical Ltd. This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).

 """Archive uploader utilities."""
@@ -37,10 +37,7 @@
 import signal
 import subprocess

-from lp.services.encoding import (
-    ascii_smash,
-    guess as guess_encoding,
-    )
+from lp.services.encoding import guess as guess_encoding
 from lp.soyuz.enums import BinaryPackageFileType


@@ -269,15 +266,13 @@
 def safe_fix_maintainer(content, fieldname):
     """Wrapper for fix_maintainer() to handle unicode and string argument.

-    It verifies the content type and transform it in a unicode with guess()
-    before call ascii_smash(). Then we can safely call fix_maintainer().
+    It verifies the content type and transforms it to a unicode with
+    guess(). Then we can safely call fix_maintainer().
     """
     if type(content) != unicode:
         content = guess_encoding(content)

-    content = ascii_smash(content)
-
-    return fix_maintainer(content, fieldname)
+    return fix_maintainer(content.encode("utf-8"), fieldname)


 def extract_dpkg_source(dsc_filepath, target, vendor=None):

=== modified file 'lib/lp/services/encoding.py'
--- lib/lp/services/encoding.py 2011-12-19 23:38:16 +0000
+++ lib/lp/services/encoding.py 2015-07-07 13:35:34 +0000
@@ -1,20 +1,17 @@
-# Copyright 2009 Canonical Ltd. This software is licensed under the
+# Copyright 2009-2015 Canonical Ltd. This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).

 """Character encoding utilities"""

 __metaclass__ = type
 __all__ = [
-    'ascii_smash',
     'escape_nonascii_uniquely',
     'guess',
     'is_ascii_only',
     ]

 import codecs
-from cStringIO import StringIO
 import re
-import unicodedata


 _boms = [
@@ -156,202 +153,6 @@
     return unicode(s, 'ISO-8859-1', 'replace')


-def ascii_smash(unicode_string):
-    """Attempt to convert the Unicode string, possibly containing accents,
-    to an ASCII string.
-
-    This is used for generating shipping labels because our shipping company
-    can only deal with ASCII despite being European :-/
-
-    ASCII goes through just fine
-
-    >>> ascii_smash(u"Hello")
-    'Hello'
-
-    Latin-1 accented characters have their accents stripped.
-
-    >>> ascii_smash(u"Ol\N{LATIN SMALL LETTER E WITH ACUTE}")
-    'Ole'
-    >>> ascii_smash(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}iste")
-    'Aiste'
-    >>> ascii_smash(
-    ...     u"\N{LATIN SMALL LETTER AE}"
-    ...     u"\N{LATIN SMALL LETTER I WITH GRAVE}"
-    ...     u"\N{LATIN SMALL LETTER O WITH STROKE}"
-    ...     u"\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}"
-    ...     )
-    'aeiou'
-    >>> ascii_smash(
-    ...     u"\N{LATIN CAPITAL LETTER AE}"
-    ...     u"\N{LATIN CAPITAL LETTER I WITH GRAVE}"
-    ...     u"\N{LATIN CAPITAL LETTER O WITH STROKE}"
-    ...     u"\N{LATIN CAPITAL LETTER U WITH TILDE}"
-    ...     )
-    'AEIOU'
-    >>> ascii_smash(u"Stra\N{LATIN SMALL LETTER SHARP S}e")
-    'Strasse'
-
-    Moving further into Eastern Europe we get more odd letters
-
-    >>> ascii_smash(
-    ...     u"\N{LATIN CAPITAL LETTER Z WITH CARON}"
-    ...     u"ivkovi\N{LATIN SMALL LETTER C WITH CARON}"
-    ...     )
-    'Zivkovic'
-
-    >>> ascii_smash(u"\N{LATIN CAPITAL LIGATURE OE}\N{LATIN SMALL LIGATURE OE}")
-    'OEoe'
-
-    """
-    out = StringIO()
-    for char in unicode_string:
-        out.write(ascii_char_smash(char))
-    return out.getvalue()
-
-
-def ascii_char_smash(char):
-    """Smash a single Unicode character into an ASCII representation.
-
-    >>> ascii_char_smash(u"\N{KATAKANA LETTER SMALL A}")
-    'a'
-    >>> ascii_char_smash(u"\N{KATAKANA LETTER A}")
-    'A'
-    >>> ascii_char_smash(u"\N{KATAKANA LETTER KA}")
-    'KA'
-    >>> ascii_char_smash(u"\N{HIRAGANA LETTER SMALL A}")
-    'a'
-    >>> ascii_char_smash(u"\N{HIRAGANA LETTER A}")
-    'A'
-    >>> ascii_char_smash(u"\N{BOPOMOFO LETTER ANG}")
-    'ANG'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER H WITH STROKE}")
-    'H'
-    >>> ascii_char_smash(u"\N{LATIN SMALL LETTER LONG S}")
-    's'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER THORN}")
-    'TH'
-    >>> ascii_char_smash(u"\N{LATIN SMALL LETTER THORN}")
-    'th'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER I WITH OGONEK}")
-    'I'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER AE}")
-    'AE'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER A WITH DIAERESIS}")
-    'Ae'
-    >>> ascii_char_smash(u"\N{LATIN SMALL LETTER A WITH DIAERESIS}")
-    'ae'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER O WITH DIAERESIS}")
-    'Oe'
-    >>> ascii_char_smash(u"\N{LATIN SMALL LETTER O WITH DIAERESIS}")
-    'oe'
-    >>> ascii_char_smash(u"\N{LATIN CAPITAL LETTER U WITH DIAERESIS}")
-    'Ue'
-    >>> ascii_char_smash(u"\N{LATIN SMALL LETTER U WITH DIAERESIS}")
-    'ue'
-    >>> ascii_char_smash(u"\N{LATIN SMALL LETTER SHARP S}")
-    'ss'
-
-    Latin-1 and other symbols are lost
-
-    >>> ascii_char_smash(u"\N{POUND SIGN}")
-    ''
-
-    Unless they also happen to be letters of some kind, such as greek
-
-    >>> ascii_char_smash(u"\N{MICRO SIGN}")
-    'mu'
-
-    Fractions
-
-    >>> ascii_char_smash(u"\N{VULGAR FRACTION ONE HALF}")
-    '1/2'
-
-    """
-    mapping = {
-        u"\N{LATIN CAPITAL LETTER AE}": "AE",
-        u"\N{LATIN SMALL LETTER AE}": "ae",
-
-        u"\N{LATIN CAPITAL LETTER A WITH DIAERESIS}": "Ae",
-        u"\N{LATIN SMALL LETTER A WITH DIAERESIS}": "ae",
-
-        u"\N{LATIN CAPITAL LETTER O WITH DIAERESIS}": "Oe",
-        u"\N{LATIN SMALL LETTER O WITH DIAERESIS}": "oe",
-
-        u"\N{LATIN CAPITAL LETTER U WITH DIAERESIS}": "Ue",
-        u"\N{LATIN SMALL LETTER U WITH DIAERESIS}": "ue",
-
-        u"\N{LATIN SMALL LETTER SHARP S}": "ss",
-
-        u"\N{LATIN CAPITAL LETTER THORN}": "TH",
-        u"\N{LATIN SMALL LETTER THORN}": "th",
-
-        u"\N{FRACTION SLASH}": "/",
-        u"\N{MULTIPLICATION SIGN}": "x",
-
-        u"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}": "=",
-        }
-
-    # Pass through ASCII
-    if ord(char) < 127:
-        return char
-
-    # Handle manual mappings
-    if mapping.has_key(char):
-        return mapping[char]
-
-    # Regress to decomposed form and recurse if necessary.
-    decomposed = unicodedata.normalize("NFKD", char)
-    if decomposed != char:
-        out = StringIO()
-        for char in decomposed:
-            out.write(ascii_char_smash(char))
-        return out.getvalue()
-
-    # Handle whitespace
-    if char.isspace():
-        return " "
-
-    # Handle digits
-    if char.isdigit():
-        return unicodedata.digit(char)
-
-    # Handle decimal (probably pointless given isdigit above)
-    if char.isdecimal():
-        return unicodedata.decimal(char)
-
-    # Handle numerics, such as 1/2
-    if char.isnumeric():
-        formatted = "%f" % unicodedata.numeric(char)
-        # Strip leading and trailing 0
-        return formatted.strip("0")
-
-    # Ignore unprintables, such as the accents we denormalized
-    if not char.isalnum():
-        return ""
-
-    # Return modified latin characters as just the latin part.
-    name = unicodedata.name(char)
-
-    match = re.search("LATIN CAPITAL LIGATURE (\w+)", name)
-    if match is not None:
-        return match.group(1)
-
-    match = re.search("LATIN SMALL LIGATURE (\w+)", name)
-    if match is not None:
-        return match.group(1).lower()
-
-    match = re.search("(?:LETTER SMALL|SMALL LETTER) (\w+)", name)
-    if match is not None:
-        return match.group(1).lower()
-
-    match = re.search("LETTER (\w+)", name)
-    if match is not None:
-        return match.group(1)
-
-    # Something we can't represent. Return empty string.
-    return ""
-
-
 def escape_nonascii_uniquely(bogus_string):
     """Replace non-ascii characters with a hex representation.


=== modified file 'lib/lp/soyuz/adapters/notification.py'
--- lib/lp/soyuz/adapters/notification.py 2015-03-13 19:05:50 +0000
+++ lib/lp/soyuz/adapters/notification.py 2015-07-07 13:35:34 +0000
@@ -1,4 +1,4 @@
-# Copyright 2011-2014 Canonical Ltd. This software is licensed under the
+# Copyright 2011-2015 Canonical Ltd. This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).

 """Notification for uploads and copies."""
@@ -27,10 +27,7 @@
 from lp.registry.interfaces.person import IPersonSet
 from lp.registry.interfaces.pocket import PackagePublishingPocket
 from lp.services.config import config
-from lp.services.encoding import (
-    ascii_smash,
-    guess as guess_encoding,
-    )
+from lp.services.encoding import guess as guess_encoding
 from lp.services.mail.helpers import get_email_template
 from lp.services.mail.sendmail import (
     format_address,
@@ -232,9 +229,11 @@

     info = fetch_information(spr, bprs, changes)
     from_addr = info['changedby']
+    from_email = info['changedby_email']
     if announce_from_person is not None:
         if announce_from_person.preferredemail is not None:
             from_addr = format_address_for_person(announce_from_person)
+            from_email = announce_from_person.preferredemail.email

     # If we're sending an acceptance notification for a non-PPA upload,
     # announce if possible. Avoid announcing backports, binary-only
@@ -243,7 +242,7 @@
         and not archive.is_ppa
         and pocket != PackagePublishingPocket.BACKPORTS
         and not (pocket == PackagePublishingPocket.SECURITY and spr is None)
-        and not is_auto_sync_upload(spr, bprs, pocket, from_addr)):
+        and not is_auto_sync_upload(spr, bprs, pocket, from_email)):
         name = None
         bcc_addr = None
         if spr:
@@ -301,17 +300,17 @@
     # Some syncs (e.g. from Debian) will involve packages whose
     # changed-by person was auto-created in LP and hence does not have a
     # preferred email address set. We'll get a None here.
-    changedby_person = email_to_person(info['changedby'])
+    changedby_person = email_to_person(info['changedby_email'])

     if blamer is not None and blamer != changedby_person:
         signer_signature = person_to_email(blamer)
         if signer_signature != info['changedby']:
             information['SIGNER'] = '\nSigned-By: %s' % signer_signature
     # Add maintainer if present and different from changed-by.
-    maintainer = info['maintainer']
-    changedby = info['changedby']
-    if maintainer and maintainer != changedby:
-        information['MAINTAINER'] = '\nMaintainer: %s' % maintainer
+    maintainer_displayname = info['maintainer_displayname']
+    if (maintainer_displayname and
+            maintainer_displayname != changedby_displayname):
+        information['MAINTAINER'] = '\nMaintainer: %s' % maintainer_displayname
     return get_template(archive, action) % information


@@ -360,24 +359,15 @@
             config.uploader.default_sender_name,
             config.uploader.default_sender_address)

-    # `sendmail`, despite handling unicode message bodies, can't
-    # cope with non-ascii sender/recipient addresses, so ascii_smash
-    # is used on all addresses.
-
     # All emails from here have a Bcc to the default recipient.
     bcc_text = format_address(
         config.uploader.default_recipient_name,
         config.uploader.default_recipient_address)
     if bcc:
         bcc_text = "%s, %s" % (bcc_text, bcc)
-    extra_headers['Bcc'] = ascii_smash(bcc_text)
+    extra_headers['Bcc'] = bcc_text

-    recipients = ascii_smash(", ".join(to_addrs))
-    if isinstance(from_addr, unicode):
-        # ascii_smash only works on unicode strings.
-        from_addr = ascii_smash(from_addr)
-    else:
-        from_addr.encode('ascii')
+    recipients = ", ".join(to_addrs)

     if dry_run and logger is not None:
         debug(logger, "Would have sent a mail:")
@@ -471,8 +461,8 @@
     candidate_recipients = [blamer]
     info = fetch_information(spr, bprs, changes)

-    changer = email_to_person(info['changedby'])
-    maintainer = email_to_person(info['maintainer'])
+    changer = email_to_person(info['changedby_email'])
+    maintainer = email_to_person(info['maintainer_email'])

     if blamer is None and not archive.is_copy:
         debug(logger, "Changes file is unsigned; adding changer as recipient.")
@@ -565,23 +555,16 @@
     return summary


-def email_to_person(fullemail):
-    """Return an `IPerson` given an RFC2047 email address.
+def email_to_person(email):
+    """Return an `IPerson` given an email address (without a name).

-    :param fullemail: Potential email address.
+    :param email: Potential email address.
     :return: `IPerson` with the given email address. None if there
-        isn't one, or if `fullemail` isn't a proper email address.
+        isn't one, or if `email` is None.
     """
-    if not fullemail:
+    if not email:
         return None
-
-    try:
-        # The 2nd arg to s_f_m() doesn't matter as it won't fail since every-
-        # thing will have already parsed at this point.
-        rfc822, rfc2047, name, email = safe_fix_maintainer(fullemail, "email")
-        return getUtility(IPersonSet).getByEmail(email)
-    except ParseMaintError:
-        return None
+    return getUtility(IPersonSet).getByEmail(email)


 def person_to_email(person):
@@ -591,6 +574,25 @@
         return format_address_for_person(person)


+def fix_email(fullemail, field_name):
+    """Turn an email address from .changes into various useful forms.
+
+    The input address may be None, or anything that `fix_maintainer`
+    understands.
+
+    :return: A tuple of (RFC2047-compatible address, Unicode
+        RFC822-compatible address, email).
+    """
+    if not fullemail:
+        return None, None, None
+
+    try:
+        rfc822, rfc2047, _, email = safe_fix_maintainer(fullemail, field_name)
+        return rfc2047, rfc822.decode('utf-8'), email
+    except ParseMaintError:
+        return None, None, None
+
+
 def is_auto_sync_upload(spr, bprs, pocket, changed_by_email):
     """Return True if this is a (Debian) auto sync upload.

@@ -609,17 +611,19 @@
 def fetch_information(spr, bprs, changes, previous_version=None):
     changedby = None
     changedby_displayname = None
+    changedby_email = None
     maintainer = None
     maintainer_displayname = None
+    maintainer_email = None

     if changes:
         changesfile = ChangesFile.formatChangesComment(
             sanitize_string(changes.get('Changes')))
         date = changes.get('Date')
-        changedby = sanitize_string(changes.get('Changed-By'))
-        maintainer = sanitize_string(changes.get('Maintainer'))
-        changedby_displayname = changedby
-        maintainer_displayname = maintainer
+        changedby, changedby_displayname, changedby_email = fix_email(
+            changes.get('Changed-By'), 'Changed-By')
+        maintainer, maintainer_displayname, maintainer_email = fix_email(
+            changes.get('Maintainer'), 'Maintainer')
     elif spr or bprs:
         if not spr and bprs:
             spr = bprs[0].build.source_package_release
@@ -631,10 +635,12 @@
             addr = formataddr((spr.creator.displayname,
                                spr.creator.preferredemail.email))
             changedby_displayname = sanitize_string(addr)
+            changedby_email = spr.creator.preferredemail.email
         if maintainer:
             addr = formataddr((spr.maintainer.displayname,
                                spr.maintainer.preferredemail.email))
             maintainer_displayname = sanitize_string(addr)
+            maintainer_email = spr.maintainer.preferredemail.email
     else:
         changesfile = date = None

@@ -643,8 +649,10 @@
         'date': date,
         'changedby': changedby,
         'changedby_displayname': changedby_displayname,
+        'changedby_email': changedby_email,
         'maintainer': maintainer,
         'maintainer_displayname': maintainer_displayname,
+        'maintainer_email': maintainer_email,
         }

