Merge lp:~gz/bzr/split_segment_params_not_url_842233 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6286
Proposed branch: lp:~gz/bzr/split_segment_params_not_url_842233
Merge into: lp:bzr
Diff against target: 94 lines (+23/-7)
3 files modified
bzrlib/tests/test_urlutils.py (+9/-3)
bzrlib/urlutils.py (+8/-4)
doc/en/release-notes/bzr-2.5.txt (+6/-0)
To merge this branch: bzr merge lp:~gz/bzr/split_segment_params_not_url_842233
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+82722@code.launchpad.net

Commit message

Avoid using URL specific join and split when seperating segment parameters

Description of the change

Correctly split segment parameters from a url, without depending on url handling functions. This branch keeps the current semantics, except for a couple of assertions on relative paths which are invalid.

For background, see the merge proposal fixing the main initial fallout from the addition colo branch addressing:

<https://code.launchpad.net/~gz/bzr/root_drive_file_url_841322/+merge/74034>

The salient point is that to make a colo address, the join code roughly uses "%(url)s,branch=name" with whatever the base url is. Blindly sticking a comma and segment params on the end of any old url means the result is no longer a url - depending on the base format it may still be parsable as one, but it may not.

Such as:
    <http://example.com> is valid
    <http://example.com,branch=name> is invalid
    <file:///> is special-cased to valid in bzrlib on windows
    <file:///,branch=name> is invalid on windows

So, either we must be careful to make sure to always remove the segment parameters smuggled on before using any url handling functions or passing anywhere that needs a url, or a different approach is needed.

This is further complicated by the laxness of the various Transport implementations when it comes to url handling, which involves a lot of plain string operations without checking inputs. Particularly the adding terminal slashes in some places and removing them in others is troublesome.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

Thanks for your patience on that.

Overall, I agree that url handling in the various transports classes may receive some love especially about which encoding is used (including url-escaped) is/should/should not be used by which methods.

It's both amazing and scary that you've been able to fix this bug without touching a single transport test... I understand why but it's still amazing.

A couple of nits:

8 + # Check relative references with absolute paths
17 + # Check relative references with relative paths
29 + # TODO: Check full URLs as well as relative references

33 + # Check relative references with absolute paths
42 + # Check relative references with relative paths
51 + # TODO: Check full URLs as well as relative references

Hmm, do I see a pattern there ?

This screams: split these tests to my ears.

70 + # Segements begin at first comma after last forward slash, if one exists

s/Segements/Segments/

71 + segment_start = lurl.find(",", lurl.rfind("/")+1)

'+1'... Cunning ;) and commented !

74 - return (join(parent_url, subsegments[0]), subsegments[1:])
75 + return lurl[:segment_start], lurl[segment_start+1:].split(",")

The comma is easy to miss, the leading paren was clearer to indicate that two values are returned.

Two final notes:
- there has been discussion in the past about never removing the final path from 't.base' as it *is* a directory anyway (and then stop adding it blindly),
- I'd really like to see your mail summarising your thoughts about url handling across the code base ;-D

.. and one more thing: news entry

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

> Overall, I agree that url handling in the various transports classes may
> receive some love especially about which encoding is used (including url-
> escaped) is/should/should not be used by which methods.

Yes, Jelmer has gone some way towards sorting this out, but there's still a lot of confusion in the code.

> It's both amazing and scary that you've been able to fix this bug without
> touching a single transport test... I understand why but it's still amazing.

Well, that's due to the cheat with split_trailing_slash which lets me replicate the existing (somewhat bogus) behaviour but without tickling the main problem by trying to split/join as a url.

> Hmm, do I see a pattern there ?
>
> This screams: split these tests to my ears.

Yes, and given it should really cover the case with schemes as well, it almost wants parametrizing. I resisted that as part of this change as I'm unconvinced by the current interfaces.

> Two final notes:
> - there has been discussion in the past about never removing the final path
> from 't.base' as it *is* a directory anyway (and then stop adding it blindly),
> - I'd really like to see your mail summarising your thoughts about url
> handling across the code base ;-D

Being far more careful about what are valid operations on different kinds of url scheme would help, which perhaps the URL class can assist with though I'm a little leery of the added weight there.

> .. and one more thing: news entry

Added, and nits addressed.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/tests/test_urlutils.py'
--- bzrlib/tests/test_urlutils.py 2011-09-04 23:18:43 +0000
+++ bzrlib/tests/test_urlutils.py 2011-11-24 12:10:31 +0000
@@ -501,6 +501,7 @@
501501
502 def test_split_segment_parameters_raw(self):502 def test_split_segment_parameters_raw(self):
503 split_segment_parameters_raw = urlutils.split_segment_parameters_raw503 split_segment_parameters_raw = urlutils.split_segment_parameters_raw
504 # Check relative references with absolute paths
504 self.assertEquals(("/some/path", []),505 self.assertEquals(("/some/path", []),
505 split_segment_parameters_raw("/some/path"))506 split_segment_parameters_raw("/some/path"))
506 self.assertEquals(("/some/path", ["tip"]),507 self.assertEquals(("/some/path", ["tip"]),
@@ -511,19 +512,22 @@
511 split_segment_parameters_raw("/somedir/path,heads%2Ftip"))512 split_segment_parameters_raw("/somedir/path,heads%2Ftip"))
512 self.assertEquals(("/somedir/path", ["heads%2Ftip", "bar"]),513 self.assertEquals(("/somedir/path", ["heads%2Ftip", "bar"]),
513 split_segment_parameters_raw("/somedir/path,heads%2Ftip,bar"))514 split_segment_parameters_raw("/somedir/path,heads%2Ftip,bar"))
514 self.assertEquals(("/", ["key1=val1"]),515 # Check relative references with relative paths
516 self.assertEquals(("", ["key1=val1"]),
515 split_segment_parameters_raw(",key1=val1"))517 split_segment_parameters_raw(",key1=val1"))
516 self.assertEquals(("foo/", ["key1=val1"]),518 self.assertEquals(("foo/", ["key1=val1"]),
517 split_segment_parameters_raw("foo/,key1=val1"))519 split_segment_parameters_raw("foo/,key1=val1"))
518 self.assertEquals(("/foo", ["key1=val1"]),520 self.assertEquals(("foo", ["key1=val1"]),
519 split_segment_parameters_raw("foo,key1=val1"))521 split_segment_parameters_raw("foo,key1=val1"))
520 self.assertEquals(("foo/base,la=bla/other/elements", []),522 self.assertEquals(("foo/base,la=bla/other/elements", []),
521 split_segment_parameters_raw("foo/base,la=bla/other/elements"))523 split_segment_parameters_raw("foo/base,la=bla/other/elements"))
522 self.assertEquals(("foo/base,la=bla/other/elements", ["a=b"]),524 self.assertEquals(("foo/base,la=bla/other/elements", ["a=b"]),
523 split_segment_parameters_raw("foo/base,la=bla/other/elements,a=b"))525 split_segment_parameters_raw("foo/base,la=bla/other/elements,a=b"))
526 # TODO: Check full URLs as well as relative references
524527
525 def test_split_segment_parameters(self):528 def test_split_segment_parameters(self):
526 split_segment_parameters = urlutils.split_segment_parameters529 split_segment_parameters = urlutils.split_segment_parameters
530 # Check relative references with absolute paths
527 self.assertEquals(("/some/path", {}),531 self.assertEquals(("/some/path", {}),
528 split_segment_parameters("/some/path"))532 split_segment_parameters("/some/path"))
529 self.assertEquals(("/some/path", {"branch": "tip"}),533 self.assertEquals(("/some/path", {"branch": "tip"}),
@@ -538,7 +542,8 @@
538 "/somedir/path,ref=heads%2Ftip,key1=val1"))542 "/somedir/path,ref=heads%2Ftip,key1=val1"))
539 self.assertEquals(("/somedir/path", {"ref": "heads%2F=tip"}),543 self.assertEquals(("/somedir/path", {"ref": "heads%2F=tip"}),
540 split_segment_parameters("/somedir/path,ref=heads%2F=tip"))544 split_segment_parameters("/somedir/path,ref=heads%2F=tip"))
541 self.assertEquals(("/", {"key1": "val1"}),545 # Check relative references with relative paths
546 self.assertEquals(("", {"key1": "val1"}),
542 split_segment_parameters(",key1=val1"))547 split_segment_parameters(",key1=val1"))
543 self.assertEquals(("foo/", {"key1": "val1"}),548 self.assertEquals(("foo/", {"key1": "val1"}),
544 split_segment_parameters("foo/,key1=val1"))549 split_segment_parameters("foo/,key1=val1"))
@@ -547,6 +552,7 @@
547 self.assertEquals(("foo/base,key1=val1/other/elements",552 self.assertEquals(("foo/base,key1=val1/other/elements",
548 {"key2": "val2"}), split_segment_parameters(553 {"key2": "val2"}), split_segment_parameters(
549 "foo/base,key1=val1/other/elements,key2=val2"))554 "foo/base,key1=val1/other/elements,key2=val2"))
555 # TODO: Check full URLs as well as relative references
550556
551 def test_win32_strip_local_trailing_slash(self):557 def test_win32_strip_local_trailing_slash(self):
552 strip = urlutils._win32_strip_local_trailing_slash558 strip = urlutils._win32_strip_local_trailing_slash
553559
=== modified file 'bzrlib/urlutils.py'
--- bzrlib/urlutils.py 2011-11-16 10:36:31 +0000
+++ bzrlib/urlutils.py 2011-11-24 12:10:31 +0000
@@ -441,11 +441,15 @@
441 :param url: A relative or absolute URL441 :param url: A relative or absolute URL
442 :return: (url, subsegments)442 :return: (url, subsegments)
443 """443 """
444 (parent_url, child_dir) = split(url)444 # GZ 2011-11-18: Dodgy removing the terminal slash like this, function
445 subsegments = child_dir.split(",")445 # operates on urls not url+segments, and Transport classes
446 if len(subsegments) == 1:446 # should not be blindly adding slashes in the first place.
447 lurl = strip_trailing_slash(url)
448 # Segments begin at first comma after last forward slash, if one exists
449 segment_start = lurl.find(",", lurl.rfind("/")+1)
450 if segment_start == -1:
447 return (url, [])451 return (url, [])
448 return (join(parent_url, subsegments[0]), subsegments[1:])452 return (lurl[:segment_start], lurl[segment_start+1:].split(","))
449453
450454
451def split_segment_parameters(url):455def split_segment_parameters(url):
452456
=== modified file 'doc/en/release-notes/bzr-2.5.txt'
--- doc/en/release-notes/bzr-2.5.txt 2011-11-23 14:11:03 +0000
+++ doc/en/release-notes/bzr-2.5.txt 2011-11-24 12:10:31 +0000
@@ -50,6 +50,12 @@
50* Resolve regression from colocated branch path handling, by ensuring that50* Resolve regression from colocated branch path handling, by ensuring that
51 unreserved characters are unquoted in URLs. (Martin Packman, #842223)51 unreserved characters are unquoted in URLs. (Martin Packman, #842223)
5252
53* Split segments from URLs for colocated branches without assuming the
54 combined form is a valid. (Martin Packman, #842233)
55
56* Support looking up revision numbers by revision id in empty branches.
57 (Jelmer Vernooij, #535031)
58
53* Support verifying signatures on remote repositories.59* Support verifying signatures on remote repositories.
54 (Jelmer Vernooij, #889694)60 (Jelmer Vernooij, #889694)
5561