Merge lp:~bialix/bzr/2.1-528944-backslashes-strike-back into lp:bzr/2.1

Proposed by Alexander Belchenko
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 4841
Proposed branch: lp:~bialix/bzr/2.1-528944-backslashes-strike-back
Merge into: lp:bzr/2.1
Diff against target: 142 lines (+54/-55)
2 files modified
bzrlib/tests/test_win32utils.py (+21/-1)
bzrlib/win32utils.py (+33/-54)
To merge this branch: bzr merge lp:~bialix/bzr/2.1-528944-backslashes-strike-back
Reviewer Review Type Date Requested Status
Martin Packman (community) Approve
Gordon Tyler Approve
John A Meinel Pending
bzr-core Pending
Review via email: mp+25169@code.launchpad.net

Description of the change

I have failed to create a small change to fix the backslashes problem, so I have forced to rewrite the parser from scratch using 2 substates instead of main _state. Otherwise I have to duplicate backslashes handling in two states.

To post a comment you must log in.
Revision history for this message
Gordon Tyler (doxxx) wrote :

I'm wondering why you didn't just backport the cmdline module and related changes from bzr.dev? As far as I know that should handle backslashes correctly and if it doesn't that's a bug that needs to be fixed.

review: Needs Information
Revision history for this message
Alexander Belchenko (bialix) wrote :

Gordon Tyler пишет:
> Review: Needs Information
> I'm wondering why you didn't just backport the cmdline module and related changes from bzr.dev? As far as I know that should handle backslashes correctly and if it doesn't that's a bug that needs to be fixed.

Because this patch explicitly created for 2.1.x series only. Your cmdline module is more like new
feature and new API, while I want only my rug back^W^W^W to fix it for 2.1 series with minimal
changes and no API breaks. Also my code is smaller than your and even smaller than previous
implementation from John although it's now handling more weird cases than 2.1.1 code.

And I'd be happy to get in backport of your tests in addition to ones in my patch, so I can check if
there is more weird cases we want to care about.

Revision history for this message
Gordon Tyler (doxxx) wrote :

I see your point. Conversely, it would be nice if you could add your new test cases to bzr.dev's test_cmdline if they aren't already covered.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

Adding the test to trunk as well would indeed be a good.

I'm not really happy about the size of the change here, both me and Alexander were hoping for a simple one or two line diff. The redeeming aspect is that the code is at least well tested, so should be reasonably safe despite the several rewrites it's gone through.

review: Approve
Revision history for this message
Alexander Belchenko (bialix) wrote :

John A Meinel пишет:
> The proposal to merge lp:~bialix/bzr/2.1-528944-backslashes-strike-back into lp:bzr/2.1 has been updated.
>
> Status: Needs review => Approved

Please, help me land it.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/test_win32utils.py'
2--- bzrlib/tests/test_win32utils.py 2010-02-17 17:11:16 +0000
3+++ bzrlib/tests/test_win32utils.py 2010-05-12 16:23:24 +0000
4@@ -345,9 +345,29 @@
5 self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"')
6
7 def test_double_escape(self):
8- self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"')
9+ self.assertAsTokens([(True, u'foo\\\\bar')], u'"foo\\\\bar"')
10 self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar")
11
12+ def test_n_backslashes_handling(self):
13+ # https://bugs.launchpad.net/bzr/+bug/528944
14+ # actually we care about the doubled backslashes when they're
15+ # represents UNC paths.
16+ # But in fact there is too much weird corner cases
17+ # (see https://bugs.launchpad.net/tortoisebzr/+bug/569050)
18+ # so to reproduce every bit of windows command-line handling
19+ # could be not worth of efforts?
20+ self.requireFeature(BackslashDirSeparatorFeature)
21+ self.assertAsTokens([(True, r'\\host\path')], r'"\\host\path"')
22+ self.assertAsTokens([(False, r'\\host\path')], r'\\host\path')
23+ # handling of " after the 2n and 2n+1 backslashes
24+ # inside and outside the quoted string
25+ self.assertAsTokens([(True, r'\\'), (False, r'*.py')], r'"\\\\" *.py')
26+ self.assertAsTokens([(True, r'\\" *.py')], r'"\\\\\" *.py"')
27+ self.assertAsTokens([(True, r'\\ *.py')], r'\\\\" *.py"')
28+ self.assertAsTokens([(False, r'\\"'), (False, r'*.py')],
29+ r'\\\\\" *.py')
30+ self.assertAsTokens([(True, u'\\\\')], u'"\\\\')
31+
32
33 class Test_CommandLineToArgv(tests.TestCaseInTempDir):
34
35
36=== modified file 'bzrlib/win32utils.py'
37--- bzrlib/win32utils.py 2010-02-17 17:11:16 +0000
38+++ bzrlib/win32utils.py 2010-05-12 16:23:24 +0000
39@@ -536,70 +536,49 @@
40 # self._quote_match = re.compile(u'[\'"]').match
41 self._escape_match = lambda x: None # Never matches
42 self._escape = '\\'
43- # State can be
44- # ' ' - after whitespace, starting a new token
45- # 'a' - after text, currently working on a token
46- # '"' - after ", currently in a "-delimited quoted section
47- # "\" - after '\', checking the next char
48- self._state = ' '
49 self._token = [] # Current token being parsed
50
51 def _get_token(self):
52 # Were there quote chars as part of this token?
53- quoted = False
54- quoted_state = None
55+ quoted = None # state:
56+ # None - the string is not quoted
57+ # empty string ('') - there was quoted substring
58+ # double quote (") - we're inside quoted chunk
59+ number_of_backslashes = 0
60 for nextchar in self._input_iter:
61- if self._state == ' ':
62- if self._whitespace_match(nextchar):
63- # if self._token: return token
64- continue
65- elif nextchar in self._quote_chars:
66- self._state = nextchar # quoted state
67- elif self._word_match(nextchar):
68- self._token.append(nextchar)
69- self._state = 'a'
70- else:
71- raise AssertionError('wtttf?')
72- elif self._state in self._quote_chars:
73- quoted = True
74- if nextchar == self._state: # End of quote
75- self._state = 'a' # posix allows 'foo'bar to translate to
76- # foobar
77- elif self._state == '"' and nextchar == self._escape:
78- quoted_state = self._state
79- self._state = nextchar
80- else:
81- self._token.append(nextchar)
82- elif self._state == self._escape:
83- if nextchar == '\\':
84- self._token.append('\\')
85- elif nextchar == '"':
86- self._token.append(nextchar)
87- else:
88- self._token.append('\\' + nextchar)
89- self._state = quoted_state
90- elif self._state == 'a':
91- if self._whitespace_match(nextchar):
92- if self._token:
93- break # emit this token
94+ if self._whitespace_match(nextchar):
95+ if quoted:
96+ self._token.append(nextchar)
97+ elif self._token:
98+ break
99+ elif nextchar == '\\':
100+ number_of_backslashes += 1
101+ elif nextchar in self._quote_chars:
102+ if number_of_backslashes:
103+ self._token.append('\\'*(number_of_backslashes/2))
104+ if number_of_backslashes % 2:
105+ self._token.append('"')
106 else:
107- continue # no token to emit
108- elif nextchar in self._quote_chars:
109- # Start a new quoted section
110- self._state = nextchar
111- # escape?
112- elif (self._word_match(nextchar)
113- or nextchar in self._quote_chars
114- # or whitespace_split?
115- ):
116- self._token.append(nextchar)
117+ if quoted:
118+ quoted = ''
119+ else:
120+ quoted = nextchar
121+ number_of_backslashes = 0
122+ elif nextchar == quoted:
123+ # end of quoted string
124+ quoted = ''
125 else:
126- raise AssertionError('state == "a", char: %r'
127- % (nextchar,))
128+ quoted = nextchar
129 else:
130- raise AssertionError('unknown state: %r' % (self._state,))
131+ if number_of_backslashes:
132+ self._token.append('\\'*number_of_backslashes)
133+ number_of_backslashes = 0
134+ self._token.append(nextchar)
135+ if number_of_backslashes > 0:
136+ self._token.append('\\'*number_of_backslashes)
137 result = ''.join(self._token)
138 self._token = []
139+ quoted = quoted is not None
140 if not quoted and result == '':
141 result = None
142 return quoted, result

Subscribers

People subscribed via source and target branches