Merge lp:~doxxx/bzr/392428 into lp:bzr

Proposed by Gordon Tyler
Status: Superseded
Proposed branch: lp:~doxxx/bzr/392428
Merge into: lp:bzr
Diff against target: 439 lines (+182/-129)
6 files modified
bzrlib/commands.py (+6/-2)
bzrlib/tests/test_commands.py (+4/-1)
bzrlib/tests/test_diff.py (+18/-1)
bzrlib/tests/test_rules.py (+6/-2)
bzrlib/tests/test_win32utils.py (+10/-13)
bzrlib/win32utils.py (+138/-110)
To merge this branch: bzr merge lp:~doxxx/bzr/392428
Reviewer Review Type Date Requested Status
John A Meinel Needs Fixing
Alexander Belchenko Approve
Review via email: mp+16444@code.launchpad.net

This proposal has been superseded by a proposal from 2010-01-06.

To post a comment you must log in.
Revision history for this message
Gordon Tyler (doxxx) wrote :

Fixed bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32.

I also cleaned up the win32utils.get_unicode_argv function. There was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I added error handling as well.

Not sure how to write tests for this. I tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:

bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
bzrdev diff --using C:\Tools\diff.cmd

'bzrdev selftest win32utils' still passes.

Revision history for this message
Alexander Belchenko (bialix) wrote :

For non-windows it's better to use the helper function from commands.py: shlex_split_unicode. But even so it's OK for me.

Thanks!

review: Approve
Revision history for this message
Gordon Tyler (doxxx) wrote :

> For non-windows it's better to use the helper function from commands.py:
> shlex_split_unicode. But even so it's OK for me.

That's the function I modified to use a different codepath on Windows.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> Gordon Tyler has proposed merging lp:~doxxx/bzr/392428 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #392428 `bzr diff --using C:\foo\bar` does not work
> https://bugs.launchpad.net/bugs/392428
>
>
> Fixed bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32.
>
> I also cleaned up the win32utils.get_unicode_argv function. There was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I added error handling as well.
>
> Not sure how to write tests for this. I tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:
>
> bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
> bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
> bzrdev diff --using C:\Tools\diff.cmd
>
> 'bzrdev selftest win32utils' still passes.
>
>

I think command_line_to_argv is going to glob, and you probably don't
want that. It may not matter in practice, but its a thought.

The other changes seem gratuitous, but not specifically wrong.

As for testing...

1) Make sure any existing tests of 'shlex_split_unicode' aren't being
skipped on windows

2) Probably add a win32 specific test that when splitting "C:\foo\bar"
we preserve the '\' characters.

I'd like to see at least a smoke test added, but the code is simple
enough you don't have to be super thorough.

 review: needsfixing

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksv2mcACgkQJdeBCYSNAAN+WgCgwHvIP3/4KsE5JRpStpQuwk4i
uLEAoNmmJOXEPjwsaTg5VPSjbhrzW9qq
=ml5g
-----END PGP SIGNATURE-----

review: Needs Fixing
Revision history for this message
Gordon Tyler (doxxx) wrote :

So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

UnicodeShlex explicitly does not deal with ' because I was asked not to.
Alexander Belchenko feels that single quotes are not "Windows". (As
people can do:
  touch don'tdothis
  bzr add don'tdothis
)

*I* wanted ' to be a quote char. Just to mention that adding them in is
not just a bug, it is by design that we don't handle them.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksynfkACgkQJdeBCYSNAAPtWwCgrDby/wk6EopdNiwfj6V4Mxp4
wq4An2wJzn9ifGlLklYaQUOnYxLsvN3c
=cadk
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

2009/12/24 Gordon Tyler <email address hidden>:
> So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

Which ones? I don't see anything obviously relevant.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Gordon Tyler (doxxx) wrote :

C:\dev\bzr\392428>bzrdev selftest commands
testing: C:/dev/bzr/392428/bzr
   C:\dev\bzr\392428\bzrlib
   bzr-2.1.0dev5 python-2.6.4 Windows-Vista-6.0.6001-SP1

======================================================================
FAIL: test_single_quotes (bzrlib.tests.test_commands.TestGetAlias)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 102, in test_sing
le_quotes
    commands.get_alias("diff", config=my_config))
AssertionError: not equal:
a = [u'diff', u'-r', u'-2..-1', u'--diff-options', u'--strip-trailing-cr -wp']
b = [u'diff',
 u'-r',
 u'-2..-1',
 u'--diff-options',
 u"'--strip-trailing-cr",
 u"-wp'"]

======================================================================
FAIL: test_unicode (bzrlib.tests.test_commands.TestGetAlias)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 116, in test_unic
ode
    commands.get_alias("iam", config=my_config))
AssertionError: not equal:
a = [u'whoami', u'Erik B\xe5gfors <email address hidden>']
b = [u'whoami', u"'Erik", u'B\xe5gfors', u"<email address hidden>'"]

----------------------------------------------------------------------
Ran 41 tests in 0.547s

FAILED (failures=2)
1 test skipped
Missing feature 'FTPServer' skipped 18 tests.

Revision history for this message
Martin Pool (mbp) wrote :

OK.

So I think that unless we change the approach to quoting on Windows,
we probably need to make the tests skip on Windows, and add a comment
about why.

The fact that single quotes are accepted in configuration files (as
opposed to the command line) on unix but not on Windows is a bit ugly.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Gordon Tyler (doxxx) wrote :

I've pushed an update which makes test_single_quotes skip on win32 and fixes test_unicode for win32 by making it use double-quotes instead of single-quotes (the type of quote doesn't really matter for that specific test).

So now both test_win32utils and test_commands pass. I'm running full selftest now to see if anything else related fails.

Revision history for this message
Alexander Belchenko (bialix) wrote :

These two failures are wrong by itself and should not be affected by
single quote vs double quote differences.

Gordon Tyler пишет:
> C:\dev\bzr\392428>bzrdev selftest commands
> testing: C:/dev/bzr/392428/bzr
> C:\dev\bzr\392428\bzrlib
> bzr-2.1.0dev5 python-2.6.4 Windows-Vista-6.0.6001-SP1
>
>
> ======================================================================
> FAIL: test_single_quotes (bzrlib.tests.test_commands.TestGetAlias)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 102, in test_sing
> le_quotes
> commands.get_alias("diff", config=my_config))
> AssertionError: not equal:
> a = [u'diff', u'-r', u'-2..-1', u'--diff-options', u'--strip-trailing-cr -wp']
> b = [u'diff',
> u'-r',
> u'-2..-1',
> u'--diff-options',
> u"'--strip-trailing-cr",
> u"-wp'"]
>
>
> ======================================================================
> FAIL: test_unicode (bzrlib.tests.test_commands.TestGetAlias)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 116, in test_unic
> ode
> commands.get_alias("iam", config=my_config))
> AssertionError: not equal:
> a = [u'whoami', u'Erik B\xe5gfors <email address hidden>']
> b = [u'whoami', u"'Erik", u'B\xe5gfors', u"<email address hidden>'"]
>
>
> ----------------------------------------------------------------------
> Ran 41 tests in 0.547s
>
> FAILED (failures=2)
> 1 test skipped
> Missing feature 'FTPServer' skipped 18 tests.
>

Revision history for this message
Gordon Tyler (doxxx) wrote :

> These two failures are wrong by itself and should not be affected by
> single quote vs double quote differences.

Okay, I'll see about fixing them.

Revision history for this message
Gordon Tyler (doxxx) wrote :

UnicodeShlex doesn't handling escaping a space, so test_from_string_u5 in test_diff.TestDiffFromTool is failing with the modified shlex_split_unicode codepath for win32. So another thing to be fixed.

Revision history for this message
Gordon Tyler (doxxx) wrote :

I've written a replacement for UnicodeShlex which duplicates the way the Win32 API function CommandLineToArgvW handles splitting a commandline into arguments, but with the added functionality of indicating which arguments are quoted, like UnicodeShlex did.

Revision history for this message
Gordon Tyler (doxxx) wrote :

And I've just added a test to test_diff which actually tests the scenario in bug 392428.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/commands.py'
2--- bzrlib/commands.py 2009-12-09 05:47:32 +0000
3+++ bzrlib/commands.py 2010-01-02 01:26:16 +0000
4@@ -869,8 +869,12 @@
5
6
7 def shlex_split_unicode(unsplit):
8- import shlex
9- return [u.decode('utf-8') for u in shlex.split(unsplit.encode('utf-8'))]
10+ if sys.platform == "win32":
11+ from bzrlib.win32utils import command_line_to_argv
12+ return command_line_to_argv(unsplit, wildcard_expansion=False)
13+ else:
14+ import shlex
15+ return [u.decode('utf-8') for u in shlex.split(unsplit.encode('utf-8'))]
16
17
18 def get_alias(cmd, config=None):
19
20=== modified file 'bzrlib/tests/test_commands.py'
21--- bzrlib/tests/test_commands.py 2009-05-23 21:01:51 +0000
22+++ bzrlib/tests/test_commands.py 2010-01-02 01:26:16 +0000
23@@ -94,6 +94,9 @@
24 commands.get_alias("diff", config=my_config))
25
26 def test_single_quotes(self):
27+ if sys.platform == 'win32':
28+ raise TestSkipped("commandline parsing on win32 does not "
29+ "support single quotes")
30 my_config = self._get_config("[ALIASES]\n"
31 "diff=diff -r -2..-1 --diff-options "
32 "'--strip-trailing-cr -wp'\n")
33@@ -111,7 +114,7 @@
34
35 def test_unicode(self):
36 my_config = self._get_config("[ALIASES]\n"
37- u"iam=whoami 'Erik B\u00e5gfors <erik@bagfors.nu>'\n")
38+ u'iam=whoami "Erik B\u00e5gfors <erik@bagfors.nu>"\n')
39 self.assertEqual([u'whoami', u'Erik B\u00e5gfors <erik@bagfors.nu>'],
40 commands.get_alias("iam", config=my_config))
41
42
43=== modified file 'bzrlib/tests/test_diff.py'
44--- bzrlib/tests/test_diff.py 2009-12-22 15:50:40 +0000
45+++ bzrlib/tests/test_diff.py 2010-01-02 01:26:16 +0000
46@@ -45,6 +45,8 @@
47 from bzrlib.revisiontree import RevisionTree
48 from bzrlib.revisionspec import RevisionSpec
49
50+from bzrlib.tests.test_win32utils import BackslashDirSeparatorFeature
51+
52
53 class _AttribFeature(Feature):
54
55@@ -1292,12 +1294,27 @@
56 diff_obj.command_template)
57
58 def test_from_string_u5(self):
59- diff_obj = DiffFromTool.from_string('diff -u\\ 5', None, None, None)
60+ # win32 doesn't support escaping spaces with backslashes
61+ if sys.platform == 'win32':
62+ tool = 'diff "-u 5"'
63+ else:
64+ tool = 'diff -u\\ 5'
65+ diff_obj = DiffFromTool.from_string(tool, None, None, None)
66 self.addCleanup(diff_obj.finish)
67 self.assertEqual(['diff', '-u 5', '@old_path', '@new_path'],
68 diff_obj.command_template)
69 self.assertEqual(['diff', '-u 5', 'old-path', 'new-path'],
70 diff_obj._get_command('old-path', 'new-path'))
71+
72+ def test_from_string_path_with_backslashes(self):
73+ self.requireFeature(BackslashDirSeparatorFeature)
74+ tool = 'C:\\Tools\\Diff.exe'
75+ diff_obj = DiffFromTool.from_string(tool, None, None, None)
76+ self.addCleanup(diff_obj.finish)
77+ self.assertEqual(['C:\\Tools\\Diff.exe', '@old_path', '@new_path'],
78+ diff_obj.command_template)
79+ self.assertEqual(['C:\\Tools\\Diff.exe', 'old-path', 'new-path'],
80+ diff_obj._get_command('old-path', 'new-path'))
81
82 def test_execute(self):
83 output = StringIO()
84
85=== modified file 'bzrlib/tests/test_rules.py'
86--- bzrlib/tests/test_rules.py 2009-03-23 14:59:43 +0000
87+++ bzrlib/tests/test_rules.py 2010-01-02 01:26:16 +0000
88@@ -63,8 +63,12 @@
89 rs.get_selected_items('a.txt', ['foo']))
90
91 def test_get_items_from_multiple_glob_match(self):
92- rs = self.make_searcher(
93- "[name *.txt *.py 'x x' \"y y\"]\nfoo=bar\na=True\n")
94+ # win32 doesn't support single quotes for quoting args with spaces
95+ if sys.platform == 'win32':
96+ text = '[name *.txt *.py "x x" "y y"]\nfoo=bar\na=True\n'
97+ else:
98+ text = """[name *.txt *.py 'x x' "y y"]\nfoo=bar\na=True\n"""
99+ rs = self.make_searcher(text)
100 self.assertEquals((), rs.get_items('NEWS'))
101 self.assertEquals((('foo', 'bar'), ('a', 'True')),
102 rs.get_items('a.py'))
103
104=== modified file 'bzrlib/tests/test_win32utils.py'
105--- bzrlib/tests/test_win32utils.py 2009-11-20 16:42:28 +0000
106+++ bzrlib/tests/test_win32utils.py 2010-01-02 01:26:16 +0000
107@@ -291,11 +291,10 @@
108 win32utils.set_file_attr_hidden(path)
109
110
111-
112-class TestUnicodeShlex(tests.TestCase):
113+class TestUnicodeCommandLineSplitter(tests.TestCase):
114
115 def assertAsTokens(self, expected, line):
116- s = win32utils.UnicodeShlex(line)
117+ s = win32utils.UnicodeCommandLineSplitter(line)
118 self.assertEqual(expected, list(s))
119
120 def test_simple(self):
121@@ -311,14 +310,6 @@
122 def test_ignore_trailing_space(self):
123 self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar ')
124
125- def test_posix_quotations(self):
126- self.assertAsTokens([(True, u'foo bar')], u'"foo bar"')
127- self.assertAsTokens([(False, u"'fo''o"), (False, u"b''ar'")],
128- u"'fo''o b''ar'")
129- self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"')
130- self.assertAsTokens([(True, u"fo'o"), (True, u"b'ar")],
131- u'"fo"\'o b\'"ar"')
132-
133 def test_nested_quotations(self):
134 self.assertAsTokens([(True, u'foo"" bar')], u"\"foo\\\"\\\" bar\"")
135 self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"")
136@@ -343,10 +334,16 @@
137
138 def test_escape_quote(self):
139 self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"')
140+ self.assertAsTokens([(True, u'foo\\"bar')], u'"foo\\\\\\"bar"')
141+ self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\"bar"')
142
143 def test_double_escape(self):
144- self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"')
145+ self.assertAsTokens([(True, u'foo\\\\bar')], u'"foo\\\\bar"')
146 self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar")
147+
148+ def test_multiple_quoted_args(self):
149+ self.assertAsTokens([(True, u'x x'), (True, u'y y')],
150+ u'"x x" "y y"')
151
152
153 class Test_CommandLineToArgv(tests.TestCaseInTempDir):
154@@ -355,7 +352,7 @@
155 # Strictly speaking we should respect parameter order versus glob
156 # expansions, but it's not really worth the effort here
157 self.assertEqual(expected,
158- sorted(win32utils._command_line_to_argv(line)))
159+ sorted(win32utils.command_line_to_argv(line)))
160
161 def test_glob_paths(self):
162 self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
163
164=== modified file 'bzrlib/win32utils.py'
165--- bzrlib/win32utils.py 2009-12-16 06:38:15 +0000
166+++ bzrlib/win32utils.py 2010-01-02 01:26:16 +0000
167@@ -517,117 +517,147 @@
168 trace.mutter('Unable to set hidden attribute on %r: %s', path, e)
169
170
171-
172-class UnicodeShlex(object):
173- """This is a very simplified version of shlex.shlex.
174-
175- The main change is that it supports non-ascii input streams. The internal
176- structure is quite simplified relative to shlex.shlex, since we aren't
177- trying to handle multiple input streams, etc. In fact, we don't use a
178- file-like api either.
179- """
180-
181- def __init__(self, uni_string):
182- self._input = uni_string
183- self._input_iter = iter(self._input)
184- self._whitespace_match = re.compile(u'\s').match
185- self._word_match = re.compile(u'\S').match
186- self._quote_chars = u'"'
187- # self._quote_match = re.compile(u'[\'"]').match
188- self._escape_match = lambda x: None # Never matches
189- self._escape = '\\'
190- # State can be
191- # ' ' - after whitespace, starting a new token
192- # 'a' - after text, currently working on a token
193- # '"' - after ", currently in a "-delimited quoted section
194- # "\" - after '\', checking the next char
195- self._state = ' '
196- self._token = [] # Current token being parsed
197-
198- def _get_token(self):
199- # Were there quote chars as part of this token?
200- quoted = False
201- quoted_state = None
202- for nextchar in self._input_iter:
203- if self._state == ' ':
204- if self._whitespace_match(nextchar):
205- # if self._token: return token
206- continue
207- elif nextchar in self._quote_chars:
208- self._state = nextchar # quoted state
209- elif self._word_match(nextchar):
210- self._token.append(nextchar)
211- self._state = 'a'
212- else:
213- raise AssertionError('wtttf?')
214- elif self._state in self._quote_chars:
215- quoted = True
216- if nextchar == self._state: # End of quote
217- self._state = 'a' # posix allows 'foo'bar to translate to
218- # foobar
219- elif self._state == '"' and nextchar == self._escape:
220- quoted_state = self._state
221- self._state = nextchar
222- else:
223- self._token.append(nextchar)
224- elif self._state == self._escape:
225- if nextchar == '\\':
226- self._token.append('\\')
227- elif nextchar == '"':
228- self._token.append(nextchar)
229- else:
230- self._token.append('\\' + nextchar)
231- self._state = quoted_state
232- elif self._state == 'a':
233- if self._whitespace_match(nextchar):
234- if self._token:
235- break # emit this token
236- else:
237- continue # no token to emit
238- elif nextchar in self._quote_chars:
239- # Start a new quoted section
240- self._state = nextchar
241- # escape?
242- elif (self._word_match(nextchar)
243- or nextchar in self._quote_chars
244- # or whitespace_split?
245- ):
246- self._token.append(nextchar)
247- else:
248- raise AssertionError('state == "a", char: %r'
249- % (nextchar,))
250- else:
251- raise AssertionError('unknown state: %r' % (self._state,))
252- result = ''.join(self._token)
253- self._token = []
254- if not quoted and result == '':
255- result = None
256- return quoted, result
257-
258- def __iter__(self):
259- return self
260-
261+class _PushbackSequence(object):
262+ def __init__(self, orig):
263+ self._iter = iter(orig)
264+ self._pushback_buffer = []
265+
266+ def next(self):
267+ if len(self._pushback_buffer) > 0:
268+ return self._pushback_buffer.pop()
269+ else:
270+ return self._iter.next()
271+
272+ def pushback(self, char):
273+ self._pushback_buffer.append(char)
274+
275+ def __iter__(self):
276+ return self
277+
278+class _Whitespace(object):
279+ def process(self, next_char, seq, context):
280+ if _whitespace_match(next_char):
281+ if len(context.token) > 0:
282+ return None
283+ else:
284+ return self
285+ elif next_char == u'"':
286+ context.quoted = True
287+ return _Quotes(self)
288+ elif next_char == u'\\':
289+ return _Backslash(self)
290+ else:
291+ context.token.append(next_char)
292+ return _Word()
293+
294+class _Quotes(object):
295+ def __init__(self, exit_state):
296+ self.exit_state = exit_state
297+
298+ def process(self, next_char, seq, context):
299+ if next_char == u'\\':
300+ return _Backslash(self)
301+ elif next_char == u'"':
302+ return self.exit_state
303+ else:
304+ context.token.append(next_char)
305+ return self
306+
307+class _Backslash(object):
308+ # See http://msdn.microsoft.com/en-us/library/bb776391(VS.85).aspx
309+ def __init__(self, exit_state):
310+ self.exit_state = exit_state
311+ self.count = 1
312+
313+ def process(self, next_char, seq, context):
314+ if next_char == u'\\':
315+ self.count += 1
316+ return self
317+ elif next_char == u'"':
318+ # 2N backslashes followed by '"' are N backslashes
319+ context.token.append(u'\\' * (self.count/2))
320+ # 2N+1 backslashes follwed by '"' are N backslashes followed by '"'
321+ # which should not be processed as the start or end of quoted arg
322+ if self.count % 2 == 1:
323+ context.token.append(next_char) # odd number of '\' escapes the '"'
324+ else:
325+ seq.pushback(next_char) # let exit_state handle next_char
326+ self.count = 0
327+ return self.exit_state
328+ else:
329+ # N backslashes not followed by '"' are just N backslashes
330+ if self.count > 0:
331+ context.token.append(u'\\' * self.count)
332+ self.count = 0
333+ seq.pushback(next_char) # let exit_state handle next_char
334+ return self.exit_state
335+
336+ def finish(self, context):
337+ if self.count > 0:
338+ context.token.append(u'\\' * self.count)
339+
340+
341+class _Word(object):
342+ def process(self, next_char, seq, context):
343+ if _whitespace_match(next_char):
344+ return None
345+ elif next_char == u'"':
346+ return _Quotes(self)
347+ elif next_char == u'\\':
348+ return _Backslash(self)
349+ else:
350+ context.token.append(next_char)
351+ return self
352+
353+_whitespace_match = re.compile(u'\s').match
354+class UnicodeCommandLineSplitter(object):
355+ def __init__(self, command_line):
356+ self._seq = _PushbackSequence(command_line)
357+
358+ def __iter__(self):
359+ return self
360+
361 def next(self):
362 quoted, token = self._get_token()
363 if token is None:
364 raise StopIteration
365 return quoted, token
366-
367-
368-def _command_line_to_argv(command_line):
369- """Convert a Unicode command line into a set of argv arguments.
370-
371- This does wildcard expansion, etc. It is intended to make wildcards act
372- closer to how they work in posix shells, versus how they work by default on
373- Windows.
374+
375+ def _get_token(self):
376+ self.quoted = False
377+ self.token = []
378+ state = _Whitespace()
379+ for next_char in self._seq:
380+ state = state.process(next_char, self._seq, self)
381+ if state is None:
382+ break
383+ if not state is None and not getattr(state, 'finish', None) is None:
384+ state.finish(self)
385+ result = u''.join(self.token)
386+ if not self.quoted and result == '':
387+ result = None
388+ return self.quoted, result
389+
390+
391+def command_line_to_argv(command_line, wildcard_expansion=True):
392+ """Convert a Unicode command line into a list of argv arguments.
393+
394+ This optionally does wildcard expansion, etc. It is intended to make
395+ wildcards act closer to how they work in posix shells, versus how they
396+ work by default on Windows. Quoted arguments are left untouched.
397+
398+ :param command_line: The unicode string to split into an arg list.
399+ :param wildcard_expansion: Whether wildcard expansion should be applied to
400+ each argument. True by default.
401+ :return: A list of unicode strings.
402 """
403- s = UnicodeShlex(command_line)
404- # Now that we've split the content, expand globs
405+ s = UnicodeCommandLineSplitter(command_line)
406+ # Now that we've split the content, expand globs if necessary
407 # TODO: Use 'globbing' instead of 'glob.glob', this gives us stuff like
408 # '**/' style globs
409 args = []
410 for is_quoted, arg in s:
411- if is_quoted or not glob.has_magic(arg):
412+ if is_quoted or not glob.has_magic(arg) or not wildcard_expansion:
413 args.append(arg)
414 else:
415 args.extend(glob_one(arg))
416@@ -636,16 +666,14 @@
417
418 if has_ctypes and winver != 'Windows 98':
419 def get_unicode_argv():
420- LPCWSTR = ctypes.c_wchar_p
421- INT = ctypes.c_int
422- POINTER = ctypes.POINTER
423- prototype = ctypes.WINFUNCTYPE(LPCWSTR)
424- GetCommandLine = prototype(("GetCommandLineW",
425- ctypes.windll.kernel32))
426- prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT))
427- command_line = GetCommandLine()
428+ prototype = ctypes.WINFUNCTYPE(ctypes.c_wchar_p, use_last_error=True)
429+ GetCommandLineW = prototype(("GetCommandLineW",
430+ ctypes.windll.kernel32))
431+ command_line = GetCommandLineW()
432+ if command_line is None:
433+ raise ctypes.WinError()
434 # Skip the first argument, since we only care about parameters
435- argv = _command_line_to_argv(command_line)[1:]
436+ argv = command_line_to_argv(command_line)[1:]
437 if getattr(sys, 'frozen', None) is None:
438 # Invoked via 'python.exe' which takes the form:
439 # python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS]