Merge into db-devel : bug-404144-unicode-ppa-searches : Code : Launchpad itself

Status:

Merged

Approved by:

Aaron Bentley on 2009-07-27

Approved revision:

no longer in the source branch.

Merged at revision:

not available

Proposed branch:

lp:~cprov/launchpad/bug-404144-unicode-ppa-searches

Merge into:

lp:launchpad/db-devel

Diff against target:

None lines

To merge this branch:

bzr merge lp:~cprov/launchpad/bug-404144-unicode-ppa-searches

Medium

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
Christian Reis (community)	release-critical	2009-07-27	Approve on 2009-07-27
Aaron Bentley (community)			Approve on 2009-07-27
Eleanor Berger (community)		2009-07-25	Approve on 2009-07-27
Michael Nelson		2009-07-25	Pending
Review via email: mp+9273@code.launchpad.net

Revision history for this message

Celso Providelo (cprov) wrote on 2009-07-25:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

reviewer michael.nelson

= Summary =

This branch fixes https://bugs.edge.launchpad.net/soyuz/+bug/404144, involving encoding/decoding issues on form variables not dealt by BrowserRequest.

== Proposed fix ==

The bug manifest itself in two ways:

1. In the PPA:+index (and others) package search form where 'name_filter' is extraced directly from the request 'QUERY_STRING', instead of the form. At that point it's not decoded (it's always in the server expected encoding, UTF-8). If it's passed directly to IArchive.getPublishedSources() (or any other storm query) it will fail badly because storm expects unicodes.

2. In the Distribution:+search package search form where the form variables (unicodes) are directly passed to urllib.urlencode() in order to build the corresponding 'source-based' search url. urllib doesn't accept unicodes.

So the *very* experimental fix are:

For case 1. we can conveniently use BrowserRequest._decode() which respect the server accepted charsets (in our case only UTF-8) to decode the QUERY_STRING variables appopriately, as they would be available in the request.form.

For case 2. we have to encode the form variable as UTF-8 (explicitly) before passing them to urllib.

I'm not entriely sure it's the right way to go. I'd prefer something embedded in get_query_string_params() that could potentially solve both aspects of the problem.

== Tests ==

./bin/test -vv -t archive-views.txt -t distribution-views.txt

== Demo and Q/A ==

Type any non-ascii text in:

* https://launchpad.dev/ubuntu
* https://launchpad.dev/~cprov/+archive/ppa

(I'm not entirely sure how can it be done in a US-only system, I use an US-intl keyboard and have the appropriate lang-packs for pt_BR)

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/soyuz/browser/tests/archive-views.txt
  lib/lp/registry/browser/tests/distribution-views.txt
  lib/lp/registry/browser/distribution.py
  lib/lp/soyuz/browser/archive.py
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkprGZYACgkQ7KBXuXyZSjAQzQCeMNz5QQouCHL7hI/Xm0NLfXFq
lI0An2jDHkNSoZQnzg+Le1gVzOs4zmhs
=/ddq
-----END PGP SIGNATURE-----

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

reviewer michael.nelson

= Summary =

This branch fixes https://bugs.edge.launchpad.net/soyuz/+bug/404144, involving encoding/decoding issues on form variables not dealt by BrowserRequest.

== Proposed fix ==

The bug manifest itself in two ways:

1. In the PPA:+index (and others) package search form where 'name_filter' is extraced directly from the request 'QUERY_STRING', instead of the form. At that point it's not decoded (it's always in the server expected encoding, UTF-8). If it's passed directly to IArchive.getPublishedSources() (or any other storm query) it will fail badly because storm expects unicodes.

2. In the Distribution:+search package search form where the form variables (unicodes) are directly passed to urllib.urlencode() in order to build the corresponding 'source-based' search url. urllib doesn't accept unicodes.

So the *very* experimental fix are:

For case 1. we can conveniently use BrowserRequest._decode() which respect the server accepted charsets (in our case only UTF-8) to decode the QUERY_STRING variables appopriately, as they would be available in the request.form.

For case 2. we have to encode the form variable as UTF-8 (explicitly) before passing them to urllib.

I'm not entriely sure it's the right way to go. I'd prefer something embedded in get_query_string_params() that could potentially solve both aspects of the problem.

== Tests ==

./bin/test -vv -t archive-views.txt -t distribution-views.txt

== Demo and Q/A ==

Type any non-ascii text in:

* https://launchpad.dev/ubuntu
 * https://launchpad.dev/~cprov/+archive/ppa

(I'm not entirely sure how can it be done in a US-only system, I use an US-intl keyboard and have the appropriate lang-packs for pt_BR)

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/soyuz/browser/tests/archive-views.txt
  lib/lp/registry/browser/tests/distribution-views.txt
  lib/lp/registry/browser/distribution.py
  lib/lp/soyuz/browser/archive.py
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkprGZYACgkQ7KBXuXyZSjAQzQCeMNz5QQouCHL7hI/Xm0NLfXFq
lI0An2jDHkNSoZQnzg+Le1gVzOs4zmhs
=/ddq
-----END PGP SIGNATURE-----

Revision history for this message

Eleanor Berger (intellectronica) on 2009-07-27:

#

review: Approve

Revision history for this message

Celso Providelo (cprov) wrote on 2009-07-27:

#

Tom,

I've discussed a simpler approach to fix this problem with Michael.

It consists of changing query_string_parameters() to already decode variables, and on the Distribution:+search view we can simply use the original form QUERY_STRING, instead of rebuilding it with urlencode.

Can you please re-review it ? https://pastebin.canonical.com/20373/

Revision history for this message

Aaron Bentley (abentley) wrote on 2009-07-27:

#

> It consists of changing query_string_parameters() to already decode variables,
> and on the Distribution:+search view we can simply use the original form
> QUERY_STRING, instead of rebuilding it with urlencode.

As we discussed on IRC,

1. Please don't use setdefault at line 23, because we already know there's no value for that key
2. Please add a line break at line 28
3. Please clarify that get_query_string_params returns lists of unicode strings in the docstring.
4. Please explain that the doctest at 188 would raise an exception if the value weren't utf-8 decoded (e.g. "The 'name_filter' is decoded as UTF-8 before further processing. If it did not, an exception would be raised.")

review: Approve

Revision history for this message

Celso Providelo (cprov) wrote on 2009-07-27:

#

On Mon, Jul 27, 2009 at 11:11 AM, Aaron Bentley<email address hidden> wrote:
> Review: Approve
>> It consists of changing query_string_parameters() to already decode variables,
>> and on the Distribution:+search view we can simply use the original form
>> QUERY_STRING, instead of rebuilding it with urlencode.
>
> As we discussed on IRC,
>
> 1. Please don't use setdefault at line 23, because we already know there's no value for that key
> 2. Please add a line break at line 28
> 3. Please clarify that get_query_string_params returns lists of unicode strings in the docstring.
> 4. Please explain that the doctest at 188 would raise an exception if the value weren't utf-8 decoded (e.g. "The 'name_filter' is decoded as UTF-8 before further processing. If it did not, an exception would be raised.")

Thanks for your review, Aaron.

Full diff with your comments addressed goes attached.

--
Celso Providelo <email address hidden>
IRC: cprov, Jabber: <email address hidden>, Skype: cprovidelo
1024D/681B6469 C858 2652 1A6E F6A6 037B B3F7 9FF2 583E 681B 6469

Revision history for this message

Christian Reis (kiko) on 2009-07-27:

#

review: Approve (release-critical)

 === modified file 'lib/lp/registry/browser/distribution.py'
 --- lib/lp/registry/browser/distribution.py	2009-07-17 00:26:05 +0000
 +++ lib/lp/registry/browser/distribution.py	2009-07-25 14:16:22 +0000
@@ -578,7 +578,11 @@
          By default, we search by binary names, but also provide a link
          to the equivalent source package search in some circumstances.
          """
--        new_query_form = self.request.form.copy()
++        # Encode the form variables as expected by the server, UTF-8, so it
++        # doesn't confuse urllib encoding mechanism with unicodes.
++        new_query_form = {}
++        for key, value in self.request.form.iteritems():
++            new_query_form[key] = value.encode('UTF-8')
          new_query_form['search_type'] = 'source'
          return "%s/+search?%s" % (
              canonical_url(self.context),
 === modified file 'lib/lp/registry/browser/tests/distribution-views.txt'
 --- lib/lp/registry/browser/tests/distribution-views.txt	2009-06-12 16:36:02 +0000
 +++ lib/lp/registry/browser/tests/distribution-views.txt	2009-07-25 14:16:22 +0000
@@ -83,6 +83,17 @@
      >>> print distro_pkg_search_view.source_search_url
      http://launchpad.dev/ubuntu/+search?search_type=source&text=a
++Form variables are properly encoded as UTF-8 (as expected by the
++server) before building the 'source_search_url'.
++
++    >>> distro_pkg_search_view = create_initialized_view(
++    ...     ubuntu, name="+search", form={
++    ...         'text': u'\xe7',
++    ...     })
++
++    >>> print distro_pkg_search_view.source_search_url
++    http://launchpad.dev/ubuntu/+search?search_type=source&text=%C3%A7
++
  But users can specify that the search should be on source-package-names
  instead:
 === modified file 'lib/lp/soyuz/browser/archive.py'
 --- lib/lp/soyuz/browser/archive.py	2009-07-17 18:46:25 +0000
 +++ lib/lp/soyuz/browser/archive.py	2009-07-25 13:52:52 +0000
@@ -524,7 +524,8 @@
          self.specified_name_filter = None
          if requested_name_filter is not None:
--            self.specified_name_filter = requested_name_filter[0]
++            self.specified_name_filter = self.request._decode(
++                requested_name_filter[0])
      def setupStatusFilterWidget(self):
          """Build a customized publishing status select widget.
 === modified file 'lib/lp/soyuz/browser/tests/archive-views.txt'
 --- lib/lp/soyuz/browser/tests/archive-views.txt	2009-07-18 01:03:09 +0000
 +++ lib/lp/soyuz/browser/tests/archive-views.txt	2009-07-25 13:52:52 +0000
@@ -235,6 +235,15 @@
      ...     print pub.displayname
      pmount 0.1-1 in warty
++The 'name_filter' is decoded as UTF-8 before futher processing.
++
++    >>> view = create_initialized_view(
++    ...     cprov.archive, name="+delete-packages",
++    ...     query_string="field.name_filter=%C3%A7")
++
++    >>> len(list(view.batched_sources))
++    0
++
  Similarly, the sources can be filtered by series:
      >>> view = create_initialized_view(

Launchpad itself

Merge lp:~cprov/launchpad/bug-404144-unicode-ppa-searches into lp:launchpad/db-devel

Commit message

Description of the change

Preview Diff

Subscribers