Merge lp:~gz/bzr/url_unquote_unreserved_842223 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: Jelmer Vernooij
Approved revision: no longer in the source branch.
Merged at revision: 6265
Proposed branch: lp:~gz/bzr/url_unquote_unreserved_842223
Merge into: lp:bzr
Diff against target: 85 lines (+29/-3)
4 files modified
bzrlib/tests/per_transport.py (+22/-0)
bzrlib/tests/test_smart_transport.py (+1/-1)
bzrlib/urlutils.py (+3/-2)
doc/en/release-notes/bzr-2.5.txt (+3/-0)
To merge this branch: bzr merge lp:~gz/bzr/url_unquote_unreserved_842223
Reviewer Review Type Date Requested Status
Martin Pool Approve
Jelmer Vernooij (community) Approve
Review via email: mp+82309@code.launchpad.net

Commit message

Unquote unreserved characters including tilde in urlutils.URL methods

Description of the change

Fixes regression in URL handling introduced by changes needed for addressing colocated branches. This branch doesn't clean up the design of urlutils, but adds more wallpaper to get the basic handling back to how it was, and actually makes transport.pathfilter a little better.

The basic concept is that to make comparison of urls simple (and fix the problems that the 'escape test isolation' checks have with a number of sftp tests), unreserved characters should not be percent encoded. The most current url related RFC recommends this, and particularly calls out tilde:

<http://tools.ietf.org/html/rfc3986#section-2.3>

There should probably be some urlutils.URL specific tests for this class of problems, but I the per_transport cases were the neatest way of comparing before urlutils rewrite and after.

I removed a test for the other side of this (that reserved characters should be escaped) because it broke things too badly, and whether reserved characters can be correctly interpreted as just being in need of escaping is context dependent.

To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) :
review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/15/2011 9:14 PM, Martin Packman wrote:
> Martin Packman has proposed merging
> lp:~gz/bzr/url_unquote_unreserved_842223 into lp:bzr.
>
> Requested reviews: bzr-core (bzr-core) Related bugs: Bug #842223 in
> Bazaar: "Change to tilde escaping causes test failures"
> https://bugs.launchpad.net/bzr/+bug/842223
>
> For more details, see:
> https://code.launchpad.net/~gz/bzr/url_unquote_unreserved_842223/+merge/82309
>
> Fixes regression in URL handling introduced by changes needed for
> addressing colocated branches. This branch doesn't clean up the
> design of urlutils, but adds more wallpaper to get the basic
> handling back to how it was, and actually makes
> transport.pathfilter a little better.
>
> The basic concept is that to make comparison of urls simple (and
> fix the problems that the 'escape test isolation' checks have with
> a number of sftp tests), unreserved characters should not be
> percent encoded. The most current url related RFC recommends this,
> and particularly calls out tilde:
>
> <http://tools.ietf.org/html/rfc3986#section-2.3>
>
> There should probably be some urlutils.URL specific tests for this
> class of problems, but I the per_transport cases were the neatest
> way of comparing before urlutils rewrite and after.
>
> I removed a test for the other side of this (that reserved
> characters should be escaped) because it broke things too badly,
> and whether reserved characters can be correctly interpreted as
> just being in need of escaping is context dependent.

I haven't really reviewed the code, I would just be really careful
changing how encoding works. It tends to break users because of the
"is the branch I'm pulling from my master branch" check.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7C7L8ACgkQJdeBCYSNAAOqgQCeMyMw7wV7NqL56wW9KXkhHsvb
QcMAn1iCLXByuH9chSOvcaIM98+95Mu+
=vk65
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

  vote approve

This seems reasonable though you might want review from jelmer or jam.

[tweak] Mention the bug in the test docstrings, for context.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

> I haven't really reviewed the code, I would just be really careful
> changing how encoding works. It tends to break users because of the
> "is the branch I'm pulling from my master branch" check.

Yes, I agree it's scary, which is why I based this branch branch on a revision preceding the urlutils changes to make sure the tests would get the previous logic back correctly.

So, to clarify, the only behavioural change should be to transport.pathfilter handling, and that's a separate revision that could be backed out - if a skip was added to the tests to allow those transports using it to fail.

I'm doing a couple of runs on babune now to see what the status is.

Revision history for this message
Martin Packman (gz) wrote :

Good and bad news from babune.

The current state of trunk:
<http://babune.ladeuil.net:24842/view/Windows/job/selftest-windows/592/>

This branch:
<http://babune.ladeuil.net:24842/view/debug/job/debug-windows/94/>

The failures targeted bug 842223 and this branch - those in per_transport and the various "Attempt to escape test isolation" ones - have been fixed. Oddly the failures from bug 842233 also seem to be gone.

Some (probably sftp/windows specific) path handling issues still remain:

  bb.test_branch.TestRemoteBranch.test_branch_local_remote
  bb.test_too_much.SFTPTestsAbsolute.test_push
  bt.per_branch.test_stacking.TestStackingConnections.test_open_stacked(RemoteBranchFormat-default)
  bt.per_branch.test_stacking.TestStackingConnections.test_open_stacked_relative(RemoteBranchFormat-v2)
  bt.commands.test_commit.TestCommitWithBoundBranch.test_commit_mine_modified
  bt.commands.test_pull.TestPull.test_pull_with_bound_branch

There are also a lot of other rather perplexing failures with no obvious cause, in:

  bt.test__btree_serializer.TestGCCKHSHA1LeafNode
  bt.per_intertree.test_compare.TestCompare
  bt.per_intertree.test_compare.TestIterChanges

My best guess there is an extension building issue on the slave.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

One (non random!) failure on PQM:

  bt.test_smart_transport.RemoteHTTPTransportTestCase.test_clone_unnormal_base

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/testtools/runtest.py", line 144, in _run_user
    return fn(*args)
  File "/usr/lib/python2.6/dist-packages/testtools/testcase.py", line 465, in _run_test_method
    testMethod()
  File "/home/pqm/pqm-workdir/srv/+trunk/bzrlib/tests/test_smart_transport.py", line 3889, in test_clone_unnormal_base
    self.assertEqual('bzr+http://host/%7Ea/b/c/', new_transport.base)
AssertionError: not equal:
a = 'bzr+http://host/%7Ea/b/c/'
b = 'bzr+http://host/~a/b/c/'

Looking into the most sensible way of resolving this.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/16/2011 2:47 PM, Martin Packman wrote:
> sent to pqm by email
>

Have you tried using this with a real branch? The test was enforcing
that we preserve the existing formatting, so that users doing "bzr up"
don't get "failed-to-lock-master-branch" failures when the URL mismatches.

Certainly, we can just rewrite the test, etc.

What would be nice if we could update the matching logic to
normalize-but-preserve. So it would know that "~a" == "%7Ea", but it
would leave the URL at whatever it was when it read it.

We've flip-flopped at least once on this, and got a bunch of bug
reports both times. I really don't want us to do that again.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7Dv3oACgkQJdeBCYSNAANJjwCdGlvTASTI4Adl+qvSQZ5XtJDc
5GMAn3CzRBAqUIOiSfwy+NNFgmpQK7Hw
=6Dyl
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

> Have you tried using this with a real branch? The test was enforcing
> that we preserve the existing formatting, so that users doing "bzr up"
> don't get "failed-to-lock-master-branch" failures when the URL mismatches.

This is what I'm worried about. I went ahead and changed the test and sent again anyway because:

* The underlying URL handling is changed, and overall this branch is closer to the old behaviour
* Backing out parts or doing further changes to get closer to the old handling is still possible
* For serious breakage, I'd hope we'd have more than one failing test.

What manual testing is needed on top of the test suite for those old issues? I've just tried a few things with a remote http branch, bind, manually editing the branch.conf and so on, and apart from no longer getting a notice about a http redirect from '...%7E' -> '...~' haven't seen any changes.

> What would be nice if we could update the matching logic to
> normalize-but-preserve. So it would know that "~a" == "%7Ea", but it
> would leave the URL at whatever it was when it read it.

There is new equivalence handling (not entirely correct perhaps, but an improvement) in URL.__eq__ but code is generally still doing string comparisons.

> We've flip-flopped at least once on this, and got a bunch of bug
> reports both times. I really don't want us to do that again.

I agree completely, but the current state of play isn't clear to me. What tests were added for the fallout previously?

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/16/2011 3:16 PM, Martin Packman wrote:
>> Have you tried using this with a real branch? The test was
>> enforcing that we preserve the existing formatting, so that users
>> doing "bzr up" don't get "failed-to-lock-master-branch" failures
>> when the URL mismatches.
>
> This is what I'm worried about. I went ahead and changed the test
> and sent again anyway because:
>
> * The underlying URL handling is changed, and overall this branch
> is closer to the old behaviour * Backing out parts or doing further
> changes to get closer to the old handling is still possible * For
> serious breakage, I'd hope we'd have more than one failing test.
>
> What manual testing is needed on top of the test suite for those
> old issues? I've just tried a few things with a remote http branch,
> bind, manually editing the branch.conf and so on, and apart from no
> longer getting a notice about a http redirect from '...%7E' ->
> '...~' haven't seen any changes.

$ cat .bzr/branch/branch.conf
bound_location = bzr+ssh://bazaar.launchpad.net/%2Bbranch/bzr/
bound = True
...

$ cd $checkout-of-lp-from-older-bzr
$ bzr up

Interestingly, when I switch to another checkout the new default
branch paths help prevent this from happening:
$ cat .bzr/branch/branch.conf
bound_location = bzr+ssh://bazaar.launchpad.net/+branch/u1db/
bound = True

Also interestingly, if I use:

$ bzr uncommit --local --force
$ bzr up
Updated to revision 6265 of branch
bzr+ssh://bazaar.launchpad.net/%2Bbranch/bzr

Note that your patch did *not* change the final value. Even weirder,
it was still +branch but one version of bzr used %2Bbranch and a newer
version of bzr used +branch.

>
>> What would be nice if we could update the matching logic to
>> normalize-but-preserve. So it would know that "~a" == "%7Ea", but
>> it would leave the URL at whatever it was when it read it.
>
> There is new equivalence handling (not entirely correct perhaps,
> but an improvement) in URL.__eq__ but code is generally still doing
> string comparisons.
>
>> We've flip-flopped at least once on this, and got a bunch of bug
>> reports both times. I really don't want us to do that again.
>
> I agree completely, but the current state of play isn't clear to
> me. What tests were added for the fallout previously?

Things are still working for me, so maybe URL.__eq__ is happy. I don't
have much more time to dig into this, but basically, we want to check
that a checkout created by bzr-2.4 of a readonly branch can be updated
by bzr.dev.

So something like:

$ bzr2.4 co -r -2 lp:~user/project/branch test
$ cd test
$ bzr.dev up

Make sure that ~user isn't yourself, so that you're sure you won't
have write permission on the source branch.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7D2PQACgkQJdeBCYSNAAMocgCfcHiAJ6tTJfOMNtRr9IDErFzp
QAkAoK/rZl6bEbsU+PFbpOEWiRCZXTmh
=cdXs
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

Thanks John, I'll chase this up tomorrow and post to the list if I run into anything.

> Note that your patch did *not* change the final value. Even weirder,
> it was still +branch but one version of bzr used %2Bbranch and a newer
> version of bzr used +branch.

That's correct, '+' isn't in the unreserved set so this mp has absolutely no effect on it, the handling is just as inconsistent as it's always been previously.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/tests/per_transport.py'
--- bzrlib/tests/per_transport.py 2011-08-19 22:34:02 +0000
+++ bzrlib/tests/per_transport.py 2011-11-16 13:46:30 +0000
@@ -1802,3 +1802,25 @@
1802 t2 = t.clone('link')1802 t2 = t.clone('link')
1803 st = t2.stat('')1803 st = t2.stat('')
1804 self.assertTrue(stat.S_ISLNK(st.st_mode))1804 self.assertTrue(stat.S_ISLNK(st.st_mode))
1805
1806 def test_abspath_url_unquote_unreserved(self):
1807 """URLs from abspath should have unreserved characters unquoted
1808
1809 Need consistent quoting notably for tildes, see lp:842223 for more.
1810 """
1811 t = self.get_transport()
1812 needlessly_escaped_dir = "%2D%2E%30%39%41%5A%5F%61%7A%7E/"
1813 self.assertEqual(t.base + "-.09AZ_az~",
1814 t.abspath(needlessly_escaped_dir))
1815
1816 def test_clone_url_unquote_unreserved(self):
1817 """Base URL of a cloned branch needs unreserved characters unquoted
1818
1819 Cloned transports should be prefix comparable for things like the
1820 isolation checking of tests, see lp:842223 for more.
1821 """
1822 t1 = self.get_transport()
1823 needlessly_escaped_dir = "%2D%2E%30%39%41%5A%5F%61%7A%7E/"
1824 self.build_tree([needlessly_escaped_dir], transport=t1)
1825 t2 = t1.clone(needlessly_escaped_dir)
1826 self.assertEqual(t1.base + "-.09AZ_az~/", t2.base)
18051827
=== modified file 'bzrlib/tests/test_smart_transport.py'
--- bzrlib/tests/test_smart_transport.py 2011-11-07 17:21:15 +0000
+++ bzrlib/tests/test_smart_transport.py 2011-11-16 13:46:30 +0000
@@ -3886,7 +3886,7 @@
3886 # still work correctly.3886 # still work correctly.
3887 base_transport = remote.RemoteHTTPTransport('bzr+http://host/%7Ea/b')3887 base_transport = remote.RemoteHTTPTransport('bzr+http://host/%7Ea/b')
3888 new_transport = base_transport.clone('c')3888 new_transport = base_transport.clone('c')
3889 self.assertEqual('bzr+http://host/%7Ea/b/c/', new_transport.base)3889 self.assertEqual(base_transport.base + 'c/', new_transport.base)
3890 self.assertEqual(3890 self.assertEqual(
3891 'c/',3891 'c/',
3892 new_transport._client.remote_path_from_transport(new_transport))3892 new_transport._client.remote_path_from_transport(new_transport))
38933893
=== modified file 'bzrlib/urlutils.py'
--- bzrlib/urlutils.py 2011-10-06 07:43:13 +0000
+++ bzrlib/urlutils.py 2011-11-16 13:46:30 +0000
@@ -750,7 +750,7 @@
750 else:750 else:
751 self.password = None751 self.password = None
752 self.port = port752 self.port = port
753 self.quoted_path = quoted_path753 self.quoted_path = _url_hex_escapes_re.sub(_unescape_safe_chars, quoted_path)
754 self.path = urllib.unquote(self.quoted_path)754 self.path = urllib.unquote(self.quoted_path)
755755
756 def __eq__(self, other):756 def __eq__(self, other):
@@ -835,6 +835,7 @@
835 """835 """
836 if not isinstance(relpath, str):836 if not isinstance(relpath, str):
837 raise errors.InvalidURL(relpath)837 raise errors.InvalidURL(relpath)
838 relpath = _url_hex_escapes_re.sub(_unescape_safe_chars, relpath)
838 if relpath.startswith('/'):839 if relpath.startswith('/'):
839 base_parts = []840 base_parts = []
840 else:841 else:
@@ -866,7 +867,7 @@
866 if offset is not None:867 if offset is not None:
867 relative = unescape(offset).encode('utf-8')868 relative = unescape(offset).encode('utf-8')
868 path = self._combine_paths(self.path, relative)869 path = self._combine_paths(self.path, relative)
869 path = urllib.quote(path)870 path = urllib.quote(path, safe="/~")
870 else:871 else:
871 path = self.quoted_path872 path = self.quoted_path
872 return self.__class__(self.scheme, self.quoted_user,873 return self.__class__(self.scheme, self.quoted_user,
873874
=== modified file 'doc/en/release-notes/bzr-2.5.txt'
--- doc/en/release-notes/bzr-2.5.txt 2011-11-16 12:25:23 +0000
+++ doc/en/release-notes/bzr-2.5.txt 2011-11-16 13:46:30 +0000
@@ -32,6 +32,9 @@
32.. Fixes for situations where bzr would previously crash or give incorrect32.. Fixes for situations where bzr would previously crash or give incorrect
33 or undesirable results.33 or undesirable results.
3434
35* Resolve regression from colocated branch path handling, by ensuring that
36 unreserved characters are unquoted in URLs. (Martin Packman, #842223)
37
35* Support verifying signatures on remote repositories.38* Support verifying signatures on remote repositories.
36 (Jelmer Vernooij, #889694)39 (Jelmer Vernooij, #889694)
3740