Merge lp:~cprov/launchpad/bug-404144-unicode-ppa-searches into lp:launchpad/db-devel

Proposed by Celso Providelo
Status: Merged
Approved by: Aaron Bentley
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~cprov/launchpad/bug-404144-unicode-ppa-searches
Merge into: lp:launchpad/db-devel
Diff against target: None lines
To merge this branch: bzr merge lp:~cprov/launchpad/bug-404144-unicode-ppa-searches
Reviewer Review Type Date Requested Status
Christian Reis (community) release-critical Approve
Aaron Bentley (community) Approve
Eleanor Berger (community) Approve
Michael Nelson Pending
Review via email: mp+9273@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Celso Providelo (cprov) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

reviewer michael.nelson

= Summary =

This branch fixes https://bugs.edge.launchpad.net/soyuz/+bug/404144, involving encoding/decoding issues on form variables not dealt by BrowserRequest.

== Proposed fix ==

The bug manifest itself in two ways:

1. In the PPA:+index (and others) package search form where 'name_filter' is extraced directly from the request 'QUERY_STRING', instead of the form. At that point it's not decoded (it's always in the server expected encoding, UTF-8). If it's passed directly to IArchive.getPublishedSources() (or any other storm query) it will fail badly because storm expects unicodes.

2. In the Distribution:+search package search form where the form variables (unicodes) are directly passed to urllib.urlencode() in order to build the corresponding 'source-based' search url. urllib doesn't accept unicodes.

So the *very* experimental fix are:

For case 1. we can conveniently use BrowserRequest._decode() which respect the server accepted charsets (in our case only UTF-8) to decode the QUERY_STRING variables appopriately, as they would be available in the request.form.

For case 2. we have to encode the form variable as UTF-8 (explicitly) before passing them to urllib.

I'm not entriely sure it's the right way to go. I'd prefer something embedded in get_query_string_params() that could potentially solve both aspects of the problem.

== Tests ==

./bin/test -vv -t archive-views.txt -t distribution-views.txt

== Demo and Q/A ==

Type any non-ascii text in:

 * https://launchpad.dev/ubuntu
 * https://launchpad.dev/~cprov/+archive/ppa

(I'm not entirely sure how can it be done in a US-only system, I use an US-intl keyboard and have the appropriate lang-packs for pt_BR)

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/soyuz/browser/tests/archive-views.txt
  lib/lp/registry/browser/tests/distribution-views.txt
  lib/lp/registry/browser/distribution.py
  lib/lp/soyuz/browser/archive.py
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkprGZYACgkQ7KBXuXyZSjAQzQCeMNz5QQouCHL7hI/Xm0NLfXFq
lI0An2jDHkNSoZQnzg+Le1gVzOs4zmhs
=/ddq
-----END PGP SIGNATURE-----

Revision history for this message
Eleanor Berger (intellectronica) :
review: Approve
Revision history for this message
Celso Providelo (cprov) wrote :

Tom,

I've discussed a simpler approach to fix this problem with Michael.

It consists of changing query_string_parameters() to already decode variables, and on the Distribution:+search view we can simply use the original form QUERY_STRING, instead of rebuilding it with urlencode.

Can you please re-review it ? https://pastebin.canonical.com/20373/

Revision history for this message
Aaron Bentley (abentley) wrote :

> It consists of changing query_string_parameters() to already decode variables,
> and on the Distribution:+search view we can simply use the original form
> QUERY_STRING, instead of rebuilding it with urlencode.

As we discussed on IRC,

1. Please don't use setdefault at line 23, because we already know there's no value for that key
2. Please add a line break at line 28
3. Please clarify that get_query_string_params returns lists of unicode strings in the docstring.
4. Please explain that the doctest at 188 would raise an exception if the value weren't utf-8 decoded (e.g. "The 'name_filter' is decoded as UTF-8 before further processing. If it did not, an exception would be raised.")

review: Approve
Revision history for this message
Celso Providelo (cprov) wrote :

On Mon, Jul 27, 2009 at 11:11 AM, Aaron Bentley<email address hidden> wrote:
> Review: Approve
>> It consists of changing query_string_parameters() to already decode variables,
>> and on the Distribution:+search view we can simply use the original form
>> QUERY_STRING, instead of rebuilding it with urlencode.
>
> As we discussed on IRC,
>
> 1. Please don't use setdefault at line 23, because we already know there's no value for that key
> 2. Please add a line break at line 28
> 3. Please clarify that get_query_string_params returns lists of unicode strings in the docstring.
> 4. Please explain that the doctest at 188 would raise an exception if the value weren't utf-8 decoded (e.g. "The 'name_filter' is decoded as UTF-8 before further processing.  If it did not, an exception would be raised.")

Thanks for your review, Aaron.

Full diff with your comments addressed goes attached.

--
Celso Providelo <email address hidden>
IRC: cprov, Jabber: <email address hidden>, Skype: cprovidelo
1024D/681B6469 C858 2652 1A6E F6A6 037B B3F7 9FF2 583E 681B 6469

Revision history for this message
Christian Reis (kiko) :
review: Approve (release-critical)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/registry/browser/distribution.py'
2--- lib/lp/registry/browser/distribution.py 2009-07-17 00:26:05 +0000
3+++ lib/lp/registry/browser/distribution.py 2009-07-25 14:16:22 +0000
4@@ -578,7 +578,11 @@
5 By default, we search by binary names, but also provide a link
6 to the equivalent source package search in some circumstances.
7 """
8- new_query_form = self.request.form.copy()
9+ # Encode the form variables as expected by the server, UTF-8, so it
10+ # doesn't confuse urllib encoding mechanism with unicodes.
11+ new_query_form = {}
12+ for key, value in self.request.form.iteritems():
13+ new_query_form[key] = value.encode('UTF-8')
14 new_query_form['search_type'] = 'source'
15 return "%s/+search?%s" % (
16 canonical_url(self.context),
17
18=== modified file 'lib/lp/registry/browser/tests/distribution-views.txt'
19--- lib/lp/registry/browser/tests/distribution-views.txt 2009-06-12 16:36:02 +0000
20+++ lib/lp/registry/browser/tests/distribution-views.txt 2009-07-25 14:16:22 +0000
21@@ -83,6 +83,17 @@
22 >>> print distro_pkg_search_view.source_search_url
23 http://launchpad.dev/ubuntu/+search?search_type=source&text=a
24
25+Form variables are properly encoded as UTF-8 (as expected by the
26+server) before building the 'source_search_url'.
27+
28+ >>> distro_pkg_search_view = create_initialized_view(
29+ ... ubuntu, name="+search", form={
30+ ... 'text': u'\xe7',
31+ ... })
32+
33+ >>> print distro_pkg_search_view.source_search_url
34+ http://launchpad.dev/ubuntu/+search?search_type=source&text=%C3%A7
35+
36 But users can specify that the search should be on source-package-names
37 instead:
38
39
40=== modified file 'lib/lp/soyuz/browser/archive.py'
41--- lib/lp/soyuz/browser/archive.py 2009-07-17 18:46:25 +0000
42+++ lib/lp/soyuz/browser/archive.py 2009-07-25 13:52:52 +0000
43@@ -524,7 +524,8 @@
44
45 self.specified_name_filter = None
46 if requested_name_filter is not None:
47- self.specified_name_filter = requested_name_filter[0]
48+ self.specified_name_filter = self.request._decode(
49+ requested_name_filter[0])
50
51 def setupStatusFilterWidget(self):
52 """Build a customized publishing status select widget.
53
54=== modified file 'lib/lp/soyuz/browser/tests/archive-views.txt'
55--- lib/lp/soyuz/browser/tests/archive-views.txt 2009-07-18 01:03:09 +0000
56+++ lib/lp/soyuz/browser/tests/archive-views.txt 2009-07-25 13:52:52 +0000
57@@ -235,6 +235,15 @@
58 ... print pub.displayname
59 pmount 0.1-1 in warty
60
61+The 'name_filter' is decoded as UTF-8 before futher processing.
62+
63+ >>> view = create_initialized_view(
64+ ... cprov.archive, name="+delete-packages",
65+ ... query_string="field.name_filter=%C3%A7")
66+
67+ >>> len(list(view.batched_sources))
68+ 0
69+
70 Similarly, the sources can be filtered by series:
71
72 >>> view = create_initialized_view(

Subscribers

People subscribed via source and target branches

to status/vote changes: