Merge lp:~gz/bzr/url_unquote_unreserved_842223 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: Jelmer Vernooij
Approved revision: no longer in the source branch.
Merged at revision: 6265
Proposed branch: lp:~gz/bzr/url_unquote_unreserved_842223
Merge into: lp:bzr
Diff against target: 85 lines (+29/-3)
4 files modified
bzrlib/tests/per_transport.py (+22/-0)
bzrlib/tests/test_smart_transport.py (+1/-1)
bzrlib/urlutils.py (+3/-2)
doc/en/release-notes/bzr-2.5.txt (+3/-0)
To merge this branch: bzr merge lp:~gz/bzr/url_unquote_unreserved_842223
Reviewer Review Type Date Requested Status
Martin Pool Approve
Jelmer Vernooij (community) Approve
Review via email: mp+82309@code.launchpad.net

Commit message

Unquote unreserved characters including tilde in urlutils.URL methods

Description of the change

Fixes regression in URL handling introduced by changes needed for addressing colocated branches. This branch doesn't clean up the design of urlutils, but adds more wallpaper to get the basic handling back to how it was, and actually makes transport.pathfilter a little better.

The basic concept is that to make comparison of urls simple (and fix the problems that the 'escape test isolation' checks have with a number of sftp tests), unreserved characters should not be percent encoded. The most current url related RFC recommends this, and particularly calls out tilde:

<http://tools.ietf.org/html/rfc3986#section-2.3>

There should probably be some urlutils.URL specific tests for this class of problems, but I the per_transport cases were the neatest way of comparing before urlutils rewrite and after.

I removed a test for the other side of this (that reserved characters should be escaped) because it broke things too badly, and whether reserved characters can be correctly interpreted as just being in need of escaping is context dependent.

To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) :
review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/15/2011 9:14 PM, Martin Packman wrote:
> Martin Packman has proposed merging
> lp:~gz/bzr/url_unquote_unreserved_842223 into lp:bzr.
>
> Requested reviews: bzr-core (bzr-core) Related bugs: Bug #842223 in
> Bazaar: "Change to tilde escaping causes test failures"
> https://bugs.launchpad.net/bzr/+bug/842223
>
> For more details, see:
> https://code.launchpad.net/~gz/bzr/url_unquote_unreserved_842223/+merge/82309
>
> Fixes regression in URL handling introduced by changes needed for
> addressing colocated branches. This branch doesn't clean up the
> design of urlutils, but adds more wallpaper to get the basic
> handling back to how it was, and actually makes
> transport.pathfilter a little better.
>
> The basic concept is that to make comparison of urls simple (and
> fix the problems that the 'escape test isolation' checks have with
> a number of sftp tests), unreserved characters should not be
> percent encoded. The most current url related RFC recommends this,
> and particularly calls out tilde:
>
> <http://tools.ietf.org/html/rfc3986#section-2.3>
>
> There should probably be some urlutils.URL specific tests for this
> class of problems, but I the per_transport cases were the neatest
> way of comparing before urlutils rewrite and after.
>
> I removed a test for the other side of this (that reserved
> characters should be escaped) because it broke things too badly,
> and whether reserved characters can be correctly interpreted as
> just being in need of escaping is context dependent.

I haven't really reviewed the code, I would just be really careful
changing how encoding works. It tends to break users because of the
"is the branch I'm pulling from my master branch" check.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7C7L8ACgkQJdeBCYSNAAOqgQCeMyMw7wV7NqL56wW9KXkhHsvb
QcMAn1iCLXByuH9chSOvcaIM98+95Mu+
=vk65
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

  vote approve

This seems reasonable though you might want review from jelmer or jam.

[tweak] Mention the bug in the test docstrings, for context.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

> I haven't really reviewed the code, I would just be really careful
> changing how encoding works. It tends to break users because of the
> "is the branch I'm pulling from my master branch" check.

Yes, I agree it's scary, which is why I based this branch branch on a revision preceding the urlutils changes to make sure the tests would get the previous logic back correctly.

So, to clarify, the only behavioural change should be to transport.pathfilter handling, and that's a separate revision that could be backed out - if a skip was added to the tests to allow those transports using it to fail.

I'm doing a couple of runs on babune now to see what the status is.

Revision history for this message
Martin Packman (gz) wrote :

Good and bad news from babune.

The current state of trunk:
<http://babune.ladeuil.net:24842/view/Windows/job/selftest-windows/592/>

This branch:
<http://babune.ladeuil.net:24842/view/debug/job/debug-windows/94/>

The failures targeted bug 842223 and this branch - those in per_transport and the various "Attempt to escape test isolation" ones - have been fixed. Oddly the failures from bug 842233 also seem to be gone.

Some (probably sftp/windows specific) path handling issues still remain:

  bb.test_branch.TestRemoteBranch.test_branch_local_remote
  bb.test_too_much.SFTPTestsAbsolute.test_push
  bt.per_branch.test_stacking.TestStackingConnections.test_open_stacked(RemoteBranchFormat-default)
  bt.per_branch.test_stacking.TestStackingConnections.test_open_stacked_relative(RemoteBranchFormat-v2)
  bt.commands.test_commit.TestCommitWithBoundBranch.test_commit_mine_modified
  bt.commands.test_pull.TestPull.test_pull_with_bound_branch

There are also a lot of other rather perplexing failures with no obvious cause, in:

  bt.test__btree_serializer.TestGCCKHSHA1LeafNode
  bt.per_intertree.test_compare.TestCompare
  bt.per_intertree.test_compare.TestIterChanges

My best guess there is an extension building issue on the slave.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

One (non random!) failure on PQM:

  bt.test_smart_transport.RemoteHTTPTransportTestCase.test_clone_unnormal_base

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/testtools/runtest.py", line 144, in _run_user
    return fn(*args)
  File "/usr/lib/python2.6/dist-packages/testtools/testcase.py", line 465, in _run_test_method
    testMethod()
  File "/home/pqm/pqm-workdir/srv/+trunk/bzrlib/tests/test_smart_transport.py", line 3889, in test_clone_unnormal_base
    self.assertEqual('bzr+http://host/%7Ea/b/c/', new_transport.base)
AssertionError: not equal:
a = 'bzr+http://host/%7Ea/b/c/'
b = 'bzr+http://host/~a/b/c/'

Looking into the most sensible way of resolving this.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/16/2011 2:47 PM, Martin Packman wrote:
> sent to pqm by email
>

Have you tried using this with a real branch? The test was enforcing
that we preserve the existing formatting, so that users doing "bzr up"
don't get "failed-to-lock-master-branch" failures when the URL mismatches.

Certainly, we can just rewrite the test, etc.

What would be nice if we could update the matching logic to
normalize-but-preserve. So it would know that "~a" == "%7Ea", but it
would leave the URL at whatever it was when it read it.

We've flip-flopped at least once on this, and got a bunch of bug
reports both times. I really don't want us to do that again.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7Dv3oACgkQJdeBCYSNAANJjwCdGlvTASTI4Adl+qvSQZ5XtJDc
5GMAn3CzRBAqUIOiSfwy+NNFgmpQK7Hw
=6Dyl
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

> Have you tried using this with a real branch? The test was enforcing
> that we preserve the existing formatting, so that users doing "bzr up"
> don't get "failed-to-lock-master-branch" failures when the URL mismatches.

This is what I'm worried about. I went ahead and changed the test and sent again anyway because:

* The underlying URL handling is changed, and overall this branch is closer to the old behaviour
* Backing out parts or doing further changes to get closer to the old handling is still possible
* For serious breakage, I'd hope we'd have more than one failing test.

What manual testing is needed on top of the test suite for those old issues? I've just tried a few things with a remote http branch, bind, manually editing the branch.conf and so on, and apart from no longer getting a notice about a http redirect from '...%7E' -> '...~' haven't seen any changes.

> What would be nice if we could update the matching logic to
> normalize-but-preserve. So it would know that "~a" == "%7Ea", but it
> would leave the URL at whatever it was when it read it.

There is new equivalence handling (not entirely correct perhaps, but an improvement) in URL.__eq__ but code is generally still doing string comparisons.

> We've flip-flopped at least once on this, and got a bunch of bug
> reports both times. I really don't want us to do that again.

I agree completely, but the current state of play isn't clear to me. What tests were added for the fallout previously?

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/16/2011 3:16 PM, Martin Packman wrote:
>> Have you tried using this with a real branch? The test was
>> enforcing that we preserve the existing formatting, so that users
>> doing "bzr up" don't get "failed-to-lock-master-branch" failures
>> when the URL mismatches.
>
> This is what I'm worried about. I went ahead and changed the test
> and sent again anyway because:
>
> * The underlying URL handling is changed, and overall this branch
> is closer to the old behaviour * Backing out parts or doing further
> changes to get closer to the old handling is still possible * For
> serious breakage, I'd hope we'd have more than one failing test.
>
> What manual testing is needed on top of the test suite for those
> old issues? I've just tried a few things with a remote http branch,
> bind, manually editing the branch.conf and so on, and apart from no
> longer getting a notice about a http redirect from '...%7E' ->
> '...~' haven't seen any changes.

$ cat .bzr/branch/branch.conf
bound_location = bzr+ssh://bazaar.launchpad.net/%2Bbranch/bzr/
bound = True
...

$ cd $checkout-of-lp-from-older-bzr
$ bzr up

Interestingly, when I switch to another checkout the new default
branch paths help prevent this from happening:
$ cat .bzr/branch/branch.conf
bound_location = bzr+ssh://bazaar.launchpad.net/+branch/u1db/
bound = True

Also interestingly, if I use:

$ bzr uncommit --local --force
$ bzr up
Updated to revision 6265 of branch
bzr+ssh://bazaar.launchpad.net/%2Bbranch/bzr

Note that your patch did *not* change the final value. Even weirder,
it was still +branch but one version of bzr used %2Bbranch and a newer
version of bzr used +branch.

>
>> What would be nice if we could update the matching logic to
>> normalize-but-preserve. So it would know that "~a" == "%7Ea", but
>> it would leave the URL at whatever it was when it read it.
>
> There is new equivalence handling (not entirely correct perhaps,
> but an improvement) in URL.__eq__ but code is generally still doing
> string comparisons.
>
>> We've flip-flopped at least once on this, and got a bunch of bug
>> reports both times. I really don't want us to do that again.
>
> I agree completely, but the current state of play isn't clear to
> me. What tests were added for the fallout previously?

Things are still working for me, so maybe URL.__eq__ is happy. I don't
have much more time to dig into this, but basically, we want to check
that a checkout created by bzr-2.4 of a readonly branch can be updated
by bzr.dev.

So something like:

$ bzr2.4 co -r -2 lp:~user/project/branch test
$ cd test
$ bzr.dev up

Make sure that ~user isn't yourself, so that you're sure you won't
have write permission on the source branch.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7D2PQACgkQJdeBCYSNAAMocgCfcHiAJ6tTJfOMNtRr9IDErFzp
QAkAoK/rZl6bEbsU+PFbpOEWiRCZXTmh
=cdXs
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

Thanks John, I'll chase this up tomorrow and post to the list if I run into anything.

> Note that your patch did *not* change the final value. Even weirder,
> it was still +branch but one version of bzr used %2Bbranch and a newer
> version of bzr used +branch.

That's correct, '+' isn't in the unreserved set so this mp has absolutely no effect on it, the handling is just as inconsistent as it's always been previously.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/per_transport.py'
2--- bzrlib/tests/per_transport.py 2011-08-19 22:34:02 +0000
3+++ bzrlib/tests/per_transport.py 2011-11-16 13:46:30 +0000
4@@ -1802,3 +1802,25 @@
5 t2 = t.clone('link')
6 st = t2.stat('')
7 self.assertTrue(stat.S_ISLNK(st.st_mode))
8+
9+ def test_abspath_url_unquote_unreserved(self):
10+ """URLs from abspath should have unreserved characters unquoted
11+
12+ Need consistent quoting notably for tildes, see lp:842223 for more.
13+ """
14+ t = self.get_transport()
15+ needlessly_escaped_dir = "%2D%2E%30%39%41%5A%5F%61%7A%7E/"
16+ self.assertEqual(t.base + "-.09AZ_az~",
17+ t.abspath(needlessly_escaped_dir))
18+
19+ def test_clone_url_unquote_unreserved(self):
20+ """Base URL of a cloned branch needs unreserved characters unquoted
21+
22+ Cloned transports should be prefix comparable for things like the
23+ isolation checking of tests, see lp:842223 for more.
24+ """
25+ t1 = self.get_transport()
26+ needlessly_escaped_dir = "%2D%2E%30%39%41%5A%5F%61%7A%7E/"
27+ self.build_tree([needlessly_escaped_dir], transport=t1)
28+ t2 = t1.clone(needlessly_escaped_dir)
29+ self.assertEqual(t1.base + "-.09AZ_az~/", t2.base)
30
31=== modified file 'bzrlib/tests/test_smart_transport.py'
32--- bzrlib/tests/test_smart_transport.py 2011-11-07 17:21:15 +0000
33+++ bzrlib/tests/test_smart_transport.py 2011-11-16 13:46:30 +0000
34@@ -3886,7 +3886,7 @@
35 # still work correctly.
36 base_transport = remote.RemoteHTTPTransport('bzr+http://host/%7Ea/b')
37 new_transport = base_transport.clone('c')
38- self.assertEqual('bzr+http://host/%7Ea/b/c/', new_transport.base)
39+ self.assertEqual(base_transport.base + 'c/', new_transport.base)
40 self.assertEqual(
41 'c/',
42 new_transport._client.remote_path_from_transport(new_transport))
43
44=== modified file 'bzrlib/urlutils.py'
45--- bzrlib/urlutils.py 2011-10-06 07:43:13 +0000
46+++ bzrlib/urlutils.py 2011-11-16 13:46:30 +0000
47@@ -750,7 +750,7 @@
48 else:
49 self.password = None
50 self.port = port
51- self.quoted_path = quoted_path
52+ self.quoted_path = _url_hex_escapes_re.sub(_unescape_safe_chars, quoted_path)
53 self.path = urllib.unquote(self.quoted_path)
54
55 def __eq__(self, other):
56@@ -835,6 +835,7 @@
57 """
58 if not isinstance(relpath, str):
59 raise errors.InvalidURL(relpath)
60+ relpath = _url_hex_escapes_re.sub(_unescape_safe_chars, relpath)
61 if relpath.startswith('/'):
62 base_parts = []
63 else:
64@@ -866,7 +867,7 @@
65 if offset is not None:
66 relative = unescape(offset).encode('utf-8')
67 path = self._combine_paths(self.path, relative)
68- path = urllib.quote(path)
69+ path = urllib.quote(path, safe="/~")
70 else:
71 path = self.quoted_path
72 return self.__class__(self.scheme, self.quoted_user,
73
74=== modified file 'doc/en/release-notes/bzr-2.5.txt'
75--- doc/en/release-notes/bzr-2.5.txt 2011-11-16 12:25:23 +0000
76+++ doc/en/release-notes/bzr-2.5.txt 2011-11-16 13:46:30 +0000
77@@ -32,6 +32,9 @@
78 .. Fixes for situations where bzr would previously crash or give incorrect
79 or undesirable results.
80
81+* Resolve regression from colocated branch path handling, by ensuring that
82+ unreserved characters are unquoted in URLs. (Martin Packman, #842223)
83+
84 * Support verifying signatures on remote repositories.
85 (Jelmer Vernooij, #889694)
86