Bazaar

Merge lp:~gz/bzr/split_segment_params_not_url_842233 into lp:bzr

split_segment_params_not_url_842233
Merge into bzr.dev

Proposed by Martin Packman on 2011-11-18

Status:

Merged

Approved by:

Vincent Ladeuil on 2011-11-18

Approved revision:

no longer in the source branch.

Merged at revision:

6286

Proposed branch:

lp:~gz/bzr/split_segment_params_not_url_842233

Merge into:

lp:bzr

Diff against target:

94 lines (+23/-7)

3 files modified

bzrlib/tests/test_urlutils.py (+9/-3)
bzrlib/urlutils.py (+8/-4)
doc/en/release-notes/bzr-2.5.txt (+6/-0)

To merge this branch:

bzr merge lp:~gz/bzr/split_segment_params_not_url_842233

Medium

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
Vincent Ladeuil		2011-11-18	Approve on 2011-11-18
Review via email: mp+82722@code.launchpad.net

Commit message

Avoid using URL specific join and split when seperating segment parameters

Description of the change

Correctly split segment parameters from a url, without depending on url handling functions. This branch keeps the current semantics, except for a couple of assertions on relative paths which are invalid.

For background, see the merge proposal fixing the main initial fallout from the addition colo branch addressing:

<https://code.launchpad.net/~gz/bzr/root_drive_file_url_841322/+merge/74034>

The salient point is that to make a colo address, the join code roughly uses "%(url)s,branch=name" with whatever the base url is. Blindly sticking a comma and segment params on the end of any old url means the result is no longer a url - depending on the base format it may still be parsable as one, but it may not.

Such as:
    <http://example.com> is valid
    <http://example.com,branch=name> is invalid
    <file:///> is special-cased to valid in bzrlib on windows
    <file:///,branch=name> is invalid on windows

So, either we must be careful to make sure to always remove the segment parameters smuggled on before using any url handling functions or passing anywhere that needs a url, or a different approach is needed.

This is further complicated by the laxness of the various Transport implementations when it comes to url handling, which involves a lot of plain string operations without checking inputs. Particularly the adding terminal slashes in some places and removing them in others is troublesome.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-11-18:

Thanks for your patience on that.

Overall, I agree that url handling in the various transports classes may receive some love especially about which encoding is used (including url-escaped) is/should/should not be used by which methods.

It's both amazing and scary that you've been able to fix this bug without touching a single transport test... I understand why but it's still amazing.

A couple of nits:

8 + # Check relative references with absolute paths
17 + # Check relative references with relative paths
29 + # TODO: Check full URLs as well as relative references

33 + # Check relative references with absolute paths
42 + # Check relative references with relative paths
51 + # TODO: Check full URLs as well as relative references

Hmm, do I see a pattern there ?

This screams: split these tests to my ears.

70 + # Segements begin at first comma after last forward slash, if one exists

s/Segements/Segments/

71 + segment_start = lurl.find(",", lurl.rfind("/")+1)

'+1'... Cunning ;) and commented !

74 - return (join(parent_url, subsegments[0]), subsegments[1:])
75 + return lurl[:segment_start], lurl[segment_start+1:].split(",")

The comma is easy to miss, the leading paren was clearer to indicate that two values are returned.

Two final notes:
- there has been discussion in the past about never removing the final path from 't.base' as it *is* a directory anyway (and then stop adding it blindly),
- I'd really like to see your mail summarising your thoughts about url handling across the code base ;-D

.. and one more thing: news entry

review: Approve

Revision history for this message

Martin Packman (gz) wrote on 2011-11-21:

> Overall, I agree that url handling in the various transports classes may
> receive some love especially about which encoding is used (including url-
> escaped) is/should/should not be used by which methods.

Yes, Jelmer has gone some way towards sorting this out, but there's still a lot of confusion in the code.

> It's both amazing and scary that you've been able to fix this bug without
> touching a single transport test... I understand why but it's still amazing.

Well, that's due to the cheat with split_trailing_slash which lets me replicate the existing (somewhat bogus) behaviour but without tickling the main problem by trying to split/join as a url.

> Hmm, do I see a pattern there ?
>
> This screams: split these tests to my ears.

Yes, and given it should really cover the case with schemes as well, it almost wants parametrizing. I resisted that as part of this change as I'm unconvinced by the current interfaces.

> Two final notes:
> - there has been discussion in the past about never removing the final path
> from 't.base' as it *is* a directory anyway (and then stop adding it blindly),
> - I'd really like to see your mail summarising your thoughts about url
> handling across the code base ;-D

Being far more careful about what are valid operations on different kinds of url scheme would help, which perhaps the URL class can assist with though I'm a little leery of the added weight there.

> .. and one more thing: news entry

Added, and nits addressed.

Revision history for this message

Martin Packman (gz) wrote on 2011-11-24:

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Alejandro Cornejo2

Bazaar Codereview Subscribers

Benoit Pierre

Gmood

Karl Bielefeldt

Mahmoud Hassan

Martin Packman

Matt Nordhoff

Mohd Fikri Mohd Amin

MrJOHN

Václav Haisman

bzr PQM

vincenzo

to status/vote changes:

Alexander Belchenko

amandla2023

1	=== modified file 'bzrlib/tests/test_urlutils.py'
2	--- bzrlib/tests/test_urlutils.py 2011-09-04 23:18:43 +0000
3	+++ bzrlib/tests/test_urlutils.py 2011-11-24 12:10:31 +0000
4	@@ -501,6 +501,7 @@
5
6	def test_split_segment_parameters_raw(self):
7	split_segment_parameters_raw = urlutils.split_segment_parameters_raw
8	+ # Check relative references with absolute paths
9	self.assertEquals(("/some/path", []),
10	split_segment_parameters_raw("/some/path"))
11	self.assertEquals(("/some/path", ["tip"]),
12	@@ -511,19 +512,22 @@
13	split_segment_parameters_raw("/somedir/path,heads%2Ftip"))
14	self.assertEquals(("/somedir/path", ["heads%2Ftip", "bar"]),
15	split_segment_parameters_raw("/somedir/path,heads%2Ftip,bar"))
16	- self.assertEquals(("/", ["key1=val1"]),
17	+ # Check relative references with relative paths
18	+ self.assertEquals(("", ["key1=val1"]),
19	split_segment_parameters_raw(",key1=val1"))
20	self.assertEquals(("foo/", ["key1=val1"]),
21	split_segment_parameters_raw("foo/,key1=val1"))
22	- self.assertEquals(("/foo", ["key1=val1"]),
23	+ self.assertEquals(("foo", ["key1=val1"]),
24	split_segment_parameters_raw("foo,key1=val1"))
25	self.assertEquals(("foo/base,la=bla/other/elements", []),
26	split_segment_parameters_raw("foo/base,la=bla/other/elements"))
27	self.assertEquals(("foo/base,la=bla/other/elements", ["a=b"]),
28	split_segment_parameters_raw("foo/base,la=bla/other/elements,a=b"))
29	+ # TODO: Check full URLs as well as relative references
30
31	def test_split_segment_parameters(self):
32	split_segment_parameters = urlutils.split_segment_parameters
33	+ # Check relative references with absolute paths
34	self.assertEquals(("/some/path", {}),
35	split_segment_parameters("/some/path"))
36	self.assertEquals(("/some/path", {"branch": "tip"}),
37	@@ -538,7 +542,8 @@
38	"/somedir/path,ref=heads%2Ftip,key1=val1"))
39	self.assertEquals(("/somedir/path", {"ref": "heads%2F=tip"}),
40	split_segment_parameters("/somedir/path,ref=heads%2F=tip"))
41	- self.assertEquals(("/", {"key1": "val1"}),
42	+ # Check relative references with relative paths
43	+ self.assertEquals(("", {"key1": "val1"}),
44	split_segment_parameters(",key1=val1"))
45	self.assertEquals(("foo/", {"key1": "val1"}),
46	split_segment_parameters("foo/,key1=val1"))
47	@@ -547,6 +552,7 @@
48	self.assertEquals(("foo/base,key1=val1/other/elements",
49	{"key2": "val2"}), split_segment_parameters(
50	"foo/base,key1=val1/other/elements,key2=val2"))
51	+ # TODO: Check full URLs as well as relative references
52
53	def test_win32_strip_local_trailing_slash(self):
54	strip = urlutils._win32_strip_local_trailing_slash
55
56	=== modified file 'bzrlib/urlutils.py'
57	--- bzrlib/urlutils.py 2011-11-16 10:36:31 +0000
58	+++ bzrlib/urlutils.py 2011-11-24 12:10:31 +0000
59	@@ -441,11 +441,15 @@
60	:param url: A relative or absolute URL
61	:return: (url, subsegments)
62	"""
63	- (parent_url, child_dir) = split(url)
64	- subsegments = child_dir.split(",")
65	- if len(subsegments) == 1:
66	+ # GZ 2011-11-18: Dodgy removing the terminal slash like this, function
67	+ # operates on urls not url+segments, and Transport classes
68	+ # should not be blindly adding slashes in the first place.
69	+ lurl = strip_trailing_slash(url)
70	+ # Segments begin at first comma after last forward slash, if one exists
71	+ segment_start = lurl.find(",", lurl.rfind("/")+1)
72	+ if segment_start == -1:
73	return (url, [])
74	- return (join(parent_url, subsegments[0]), subsegments[1:])
75	+ return (lurl[:segment_start], lurl[segment_start+1:].split(","))
76
77
78	def split_segment_parameters(url):
79
80	=== modified file 'doc/en/release-notes/bzr-2.5.txt'
81	--- doc/en/release-notes/bzr-2.5.txt 2011-11-23 14:11:03 +0000
82	+++ doc/en/release-notes/bzr-2.5.txt 2011-11-24 12:10:31 +0000
83	@@ -50,6 +50,12 @@
84	* Resolve regression from colocated branch path handling, by ensuring that
85	unreserved characters are unquoted in URLs. (Martin Packman, #842223)
86
87	+* Split segments from URLs for colocated branches without assuming the
88	+ combined form is a valid. (Martin Packman, #842233)
89	+
90	+* Support looking up revision numbers by revision id in empty branches.
91	+ (Jelmer Vernooij, #535031)
92	+
93	* Support verifying signatures on remote repositories.
94	(Jelmer Vernooij, #889694)
95