Merge lp:~cprov/launchpad/bug-404144-unicode-ppa-searches into lp:launchpad/db-devel

Proposed by Celso Providelo
Status: Merged
Approved by: Aaron Bentley
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~cprov/launchpad/bug-404144-unicode-ppa-searches
Merge into: lp:launchpad/db-devel
Diff against target: None lines
To merge this branch: bzr merge lp:~cprov/launchpad/bug-404144-unicode-ppa-searches
Reviewer Review Type Date Requested Status
Christian Reis (community) release-critical Approve
Aaron Bentley (community) Approve
Eleanor Berger (community) Approve
Michael Nelson Pending
Review via email: mp+9273@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Celso Providelo (cprov) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

reviewer michael.nelson

= Summary =

This branch fixes https://bugs.edge.launchpad.net/soyuz/+bug/404144, involving encoding/decoding issues on form variables not dealt by BrowserRequest.

== Proposed fix ==

The bug manifest itself in two ways:

1. In the PPA:+index (and others) package search form where 'name_filter' is extraced directly from the request 'QUERY_STRING', instead of the form. At that point it's not decoded (it's always in the server expected encoding, UTF-8). If it's passed directly to IArchive.getPublishedSources() (or any other storm query) it will fail badly because storm expects unicodes.

2. In the Distribution:+search package search form where the form variables (unicodes) are directly passed to urllib.urlencode() in order to build the corresponding 'source-based' search url. urllib doesn't accept unicodes.

So the *very* experimental fix are:

For case 1. we can conveniently use BrowserRequest._decode() which respect the server accepted charsets (in our case only UTF-8) to decode the QUERY_STRING variables appopriately, as they would be available in the request.form.

For case 2. we have to encode the form variable as UTF-8 (explicitly) before passing them to urllib.

I'm not entriely sure it's the right way to go. I'd prefer something embedded in get_query_string_params() that could potentially solve both aspects of the problem.

== Tests ==

./bin/test -vv -t archive-views.txt -t distribution-views.txt

== Demo and Q/A ==

Type any non-ascii text in:

 * https://launchpad.dev/ubuntu
 * https://launchpad.dev/~cprov/+archive/ppa

(I'm not entirely sure how can it be done in a US-only system, I use an US-intl keyboard and have the appropriate lang-packs for pt_BR)

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/soyuz/browser/tests/archive-views.txt
  lib/lp/registry/browser/tests/distribution-views.txt
  lib/lp/registry/browser/distribution.py
  lib/lp/soyuz/browser/archive.py
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkprGZYACgkQ7KBXuXyZSjAQzQCeMNz5QQouCHL7hI/Xm0NLfXFq
lI0An2jDHkNSoZQnzg+Le1gVzOs4zmhs
=/ddq
-----END PGP SIGNATURE-----

Revision history for this message
Eleanor Berger (intellectronica) :
review: Approve
Revision history for this message
Celso Providelo (cprov) wrote :

Tom,

I've discussed a simpler approach to fix this problem with Michael.

It consists of changing query_string_parameters() to already decode variables, and on the Distribution:+search view we can simply use the original form QUERY_STRING, instead of rebuilding it with urlencode.

Can you please re-review it ? https://pastebin.canonical.com/20373/

Revision history for this message
Aaron Bentley (abentley) wrote :

> It consists of changing query_string_parameters() to already decode variables,
> and on the Distribution:+search view we can simply use the original form
> QUERY_STRING, instead of rebuilding it with urlencode.

As we discussed on IRC,

1. Please don't use setdefault at line 23, because we already know there's no value for that key
2. Please add a line break at line 28
3. Please clarify that get_query_string_params returns lists of unicode strings in the docstring.
4. Please explain that the doctest at 188 would raise an exception if the value weren't utf-8 decoded (e.g. "The 'name_filter' is decoded as UTF-8 before further processing. If it did not, an exception would be raised.")

review: Approve
Revision history for this message
Celso Providelo (cprov) wrote :

On Mon, Jul 27, 2009 at 11:11 AM, Aaron Bentley<email address hidden> wrote:
> Review: Approve
>> It consists of changing query_string_parameters() to already decode variables,
>> and on the Distribution:+search view we can simply use the original form
>> QUERY_STRING, instead of rebuilding it with urlencode.
>
> As we discussed on IRC,
>
> 1. Please don't use setdefault at line 23, because we already know there's no value for that key
> 2. Please add a line break at line 28
> 3. Please clarify that get_query_string_params returns lists of unicode strings in the docstring.
> 4. Please explain that the doctest at 188 would raise an exception if the value weren't utf-8 decoded (e.g. "The 'name_filter' is decoded as UTF-8 before further processing.  If it did not, an exception would be raised.")

Thanks for your review, Aaron.

Full diff with your comments addressed goes attached.

--
Celso Providelo <email address hidden>
IRC: cprov, Jabber: <email address hidden>, Skype: cprovidelo
1024D/681B6469 C858 2652 1A6E F6A6 037B B3F7 9FF2 583E 681B 6469

Revision history for this message
Christian Reis (kiko) :
review: Approve (release-critical)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/registry/browser/distribution.py'
--- lib/lp/registry/browser/distribution.py 2009-07-17 00:26:05 +0000
+++ lib/lp/registry/browser/distribution.py 2009-07-25 14:16:22 +0000
@@ -578,7 +578,11 @@
578 By default, we search by binary names, but also provide a link578 By default, we search by binary names, but also provide a link
579 to the equivalent source package search in some circumstances.579 to the equivalent source package search in some circumstances.
580 """580 """
581 new_query_form = self.request.form.copy()581 # Encode the form variables as expected by the server, UTF-8, so it
582 # doesn't confuse urllib encoding mechanism with unicodes.
583 new_query_form = {}
584 for key, value in self.request.form.iteritems():
585 new_query_form[key] = value.encode('UTF-8')
582 new_query_form['search_type'] = 'source'586 new_query_form['search_type'] = 'source'
583 return "%s/+search?%s" % (587 return "%s/+search?%s" % (
584 canonical_url(self.context),588 canonical_url(self.context),
585589
=== modified file 'lib/lp/registry/browser/tests/distribution-views.txt'
--- lib/lp/registry/browser/tests/distribution-views.txt 2009-06-12 16:36:02 +0000
+++ lib/lp/registry/browser/tests/distribution-views.txt 2009-07-25 14:16:22 +0000
@@ -83,6 +83,17 @@
83 >>> print distro_pkg_search_view.source_search_url83 >>> print distro_pkg_search_view.source_search_url
84 http://launchpad.dev/ubuntu/+search?search_type=source&text=a84 http://launchpad.dev/ubuntu/+search?search_type=source&text=a
8585
86Form variables are properly encoded as UTF-8 (as expected by the
87server) before building the 'source_search_url'.
88
89 >>> distro_pkg_search_view = create_initialized_view(
90 ... ubuntu, name="+search", form={
91 ... 'text': u'\xe7',
92 ... })
93
94 >>> print distro_pkg_search_view.source_search_url
95 http://launchpad.dev/ubuntu/+search?search_type=source&text=%C3%A7
96
86But users can specify that the search should be on source-package-names97But users can specify that the search should be on source-package-names
87instead:98instead:
8899
89100
=== modified file 'lib/lp/soyuz/browser/archive.py'
--- lib/lp/soyuz/browser/archive.py 2009-07-17 18:46:25 +0000
+++ lib/lp/soyuz/browser/archive.py 2009-07-25 13:52:52 +0000
@@ -524,7 +524,8 @@
524524
525 self.specified_name_filter = None525 self.specified_name_filter = None
526 if requested_name_filter is not None:526 if requested_name_filter is not None:
527 self.specified_name_filter = requested_name_filter[0]527 self.specified_name_filter = self.request._decode(
528 requested_name_filter[0])
528529
529 def setupStatusFilterWidget(self):530 def setupStatusFilterWidget(self):
530 """Build a customized publishing status select widget.531 """Build a customized publishing status select widget.
531532
=== modified file 'lib/lp/soyuz/browser/tests/archive-views.txt'
--- lib/lp/soyuz/browser/tests/archive-views.txt 2009-07-18 01:03:09 +0000
+++ lib/lp/soyuz/browser/tests/archive-views.txt 2009-07-25 13:52:52 +0000
@@ -235,6 +235,15 @@
235 ... print pub.displayname235 ... print pub.displayname
236 pmount 0.1-1 in warty236 pmount 0.1-1 in warty
237237
238The 'name_filter' is decoded as UTF-8 before futher processing.
239
240 >>> view = create_initialized_view(
241 ... cprov.archive, name="+delete-packages",
242 ... query_string="field.name_filter=%C3%A7")
243
244 >>> len(list(view.batched_sources))
245 0
246
238Similarly, the sources can be filtered by series:247Similarly, the sources can be filtered by series:
239248
240 >>> view = create_initialized_view(249 >>> view = create_initialized_view(

Subscribers

People subscribed via source and target branches

to status/vote changes: