Merge into bzr.dev : 392428 : Code : Bazaar

Reviewer	Date Requested	Status
Martin Packman (community)		Abstain on 2010-01-19
John A Meinel	2010-01-06	Needs Fixing on 2010-01-13
Alexander Belchenko	2010-01-06	Pending
Review via email: mp+16893@code.launchpad.net

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-21: Posted in a previous version of this proposal

#

Fixed bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32.

I also cleaned up the win32utils.get_unicode_argv function. There was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I added error handling as well.

Not sure how to write tests for this. I tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:

bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
bzrdev diff --using C:\Tools\diff.cmd

'bzrdev selftest win32utils' still passes.

Revision history for this message

Alexander Belchenko (bialix) wrote on 2009-12-21: Posted in a previous version of this proposal

#

For non-windows it's better to use the helper function from commands.py: shlex_split_unicode. But even so it's OK for me.

Thanks!

review: Approve

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-21: Posted in a previous version of this proposal

#

> For non-windows it's better to use the helper function from commands.py:
> shlex_split_unicode. But even so it's OK for me.

That's the function I modified to use a different codepath on Windows.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-21: Posted in a previous version of this proposal

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> Gordon Tyler has proposed merging lp:~doxxx/bzr/392428 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #392428 `bzr diff --using C:\foo\bar` does not work
> https://bugs.launchpad.net/bugs/392428
>
>
> Fixed bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32.
>
> I also cleaned up the win32utils.get_unicode_argv function. There was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I added error handling as well.
>
> Not sure how to write tests for this. I tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:
>
> bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
> bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
> bzrdev diff --using C:\Tools\diff.cmd
>
> 'bzrdev selftest win32utils' still passes.
>
>

I think command_line_to_argv is going to glob, and you probably don't
want that. It may not matter in practice, but its a thought.

The other changes seem gratuitous, but not specifically wrong.

As for testing...

1) Make sure any existing tests of 'shlex_split_unicode' aren't being
skipped on windows

2) Probably add a win32 specific test that when splitting "C:\foo\bar"
we preserve the '\' characters.

I'd like to see at least a smoke test added, but the code is simple
enough you don't have to be super thorough.

review: needsfixing

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksv2mcACgkQJdeBCYSNAAN+WgCgwHvIP3/4KsE5JRpStpQuwk4i
uLEAoNmmJOXEPjwsaTg5VPSjbhrzW9qq
=ml5g
-----END PGP SIGNATURE-----

review: Needs Fixing

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-23: Posted in a previous version of this proposal

#

So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-23: Posted in a previous version of this proposal

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

UnicodeShlex explicitly does not deal with ' because I was asked not to.
Alexander Belchenko feels that single quotes are not "Windows". (As
people can do:
touch don'tdothis
bzr add don'tdothis
)

*I* wanted ' to be a quote char. Just to mention that adding them in is
not just a bug, it is by design that we don't handle them.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksynfkACgkQJdeBCYSNAAPtWwCgrDby/wk6EopdNiwfj6V4Mxp4
wq4An2wJzn9ifGlLklYaQUOnYxLsvN3c
=cadk
-----END PGP SIGNATURE-----

Revision history for this message

Martin Pool (mbp) wrote on 2009-12-23: Posted in a previous version of this proposal

#

2009/12/24 Gordon Tyler <email address hidden>:
> So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

Which ones? I don't see anything obviously relevant.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-24: Posted in a previous version of this proposal

#

C:\dev\bzr\392428>bzrdev selftest commands
testing: C:/dev/bzr/392428/bzr
C:\dev\bzr\392428\bzrlib
bzr-2.1.0dev5 python-2.6.4 Windows-Vista-6.0.6001-SP1

======================================================================
FAIL: test_single_quotes (bzrlib.tests.test_commands.TestGetAlias)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 102, in test_sing
le_quotes
commands.get_alias("diff", config=my_config))
AssertionError: not equal:
a = [u'diff', u'-r', u'-2..-1', u'--diff-options', u'--strip-trailing-cr -wp']
b = [u'diff',
u'-r',
u'-2..-1',
u'--diff-options',
u"'--strip-trailing-cr",
u"-wp'"]

======================================================================
FAIL: test_unicode (bzrlib.tests.test_commands.TestGetAlias)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 116, in test_unic
ode
commands.get_alias("iam", config=my_config))
AssertionError: not equal:
a = [u'whoami', u'Erik B\xe5gfors <email address hidden>']
b = [u'whoami', u"'Erik", u'B\xe5gfors', u"<email address hidden>'"]

----------------------------------------------------------------------
Ran 41 tests in 0.547s

FAILED (failures=2)
1 test skipped
Missing feature 'FTPServer' skipped 18 tests.

Revision history for this message

Martin Pool (mbp) wrote on 2009-12-24: Posted in a previous version of this proposal

#

OK.

So I think that unless we change the approach to quoting on Windows,
we probably need to make the tests skip on Windows, and add a comment
about why.

The fact that single quotes are accepted in configuration files (as
opposed to the command line) on unix but not on Windows is a bit ugly.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-24: Posted in a previous version of this proposal

#

I've pushed an update which makes test_single_quotes skip on win32 and fixes test_unicode for win32 by making it use double-quotes instead of single-quotes (the type of quote doesn't really matter for that specific test).

So now both test_win32utils and test_commands pass. I'm running full selftest now to see if anything else related fails.

Revision history for this message

Alexander Belchenko (bialix) wrote on 2009-12-24: Posted in a previous version of this proposal

#

These two failures are wrong by itself and should not be affected by
single quote vs double quote differences.

Gordon Tyler пишет:
> C:\dev\bzr\392428>bzrdev selftest commands
> testing: C:/dev/bzr/392428/bzr
> C:\dev\bzr\392428\bzrlib
> bzr-2.1.0dev5 python-2.6.4 Windows-Vista-6.0.6001-SP1
>
>
> ======================================================================
> FAIL: test_single_quotes (bzrlib.tests.test_commands.TestGetAlias)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 102, in test_sing
> le_quotes
> commands.get_alias("diff", config=my_config))
> AssertionError: not equal:
> a = [u'diff', u'-r', u'-2..-1', u'--diff-options', u'--strip-trailing-cr -wp']
> b = [u'diff',
> u'-r',
> u'-2..-1',
> u'--diff-options',
> u"'--strip-trailing-cr",
> u"-wp'"]
>
>
> ======================================================================
> FAIL: test_unicode (bzrlib.tests.test_commands.TestGetAlias)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 116, in test_unic
> ode
> commands.get_alias("iam", config=my_config))
> AssertionError: not equal:
> a = [u'whoami', u'Erik B\xe5gfors <email address hidden>']
> b = [u'whoami', u"'Erik", u'B\xe5gfors', u"<email address hidden>'"]
>
>
> ----------------------------------------------------------------------
> Ran 41 tests in 0.547s
>
> FAILED (failures=2)
> 1 test skipped
> Missing feature 'FTPServer' skipped 18 tests.
>

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-24: Posted in a previous version of this proposal

#

> These two failures are wrong by itself and should not be affected by
> single quote vs double quote differences.

Okay, I'll see about fixing them.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2009-12-24: Posted in a previous version of this proposal

#

UnicodeShlex doesn't handling escaping a space, so test_from_string_u5 in test_diff.TestDiffFromTool is failing with the modified shlex_split_unicode codepath for win32. So another thing to be fixed.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-02: Posted in a previous version of this proposal

#

I've written a replacement for UnicodeShlex which duplicates the way the Win32 API function CommandLineToArgvW handles splitting a commandline into arguments, but with the added functionality of indicating which arguments are quoted, like UnicodeShlex did.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-02: Posted in a previous version of this proposal

#

And I've just added a test to test_diff which actually tests the scenario in bug 392428.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-06:

#

This is a resubmission of the original proposal because I made significant code changes during the discussion of the previous merge proposal and I thought it could use a fresh review.

This merge proposal fixes bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32. However, this caused various tests to fail due to the command line parser in win32utils, UnicodeShlex, purposefully not handling single quotes. It also sometimes escaped spaces with a backslash and sometimes did not.

So I decided to re-implement command line parsing in win32utils to match the way the Win32 API function, CommandLineToArgvW, parses the command line, particularly in regards to how it handles backslashes and their interaction with double quotes. I couldn't just use the CommandLineToArgvW function itself because command_line_to_argv needs to know which arguments were quoted and which were not so that it can be perform the appropriate (optional) wildcard expansion.

I've updated the various tests affected by these changes to handle the differences in how win32 handles quoting (no single quotes) and escapes (can't escape a space with a backslash). I've also added a test in test_diff for the specific case in bug 392428, a diff tool specified as a win32 path with backslashes.

I've also tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:

bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
bzrdev diff --using C:\Tools\diff.cmd

While I was updating get_unicode_argv to use the new command_line_to_argv, I noticed that there was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I cleaned this up and added some error handling as well.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-01-06:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> This is a resubmission of the original proposal because I made significant code changes during the discussion of the previous merge proposal and I thought it could use a fresh review.
>
> This merge proposal fixes bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32. However, this caused various tests to fail due to the command line parser in win32utils, UnicodeShlex, purposefully not handling single quotes. It also sometimes escaped spaces with a backslash and sometimes did not.
>
> So I decided to re-implement command line parsing in win32utils to match the way the Win32 API function, CommandLineToArgvW, parses the command line, particularly in regards to how it handles backslashes and their interaction with double quotes. I couldn't just use the CommandLineToArgvW function itself because command_line_to_argv needs to know which arguments were quoted and which were not so that it can be perform the appropriate (optional) wildcard expansion.

I haven't looked at the code, but I'd really like to have 1 standard
parser for the command line. If we need/want to do it differently than
it was done so far, then we should migrate to the new method. (wrt
escaping, etc.)

I don't really want to second guess what goes into "--using" versus what
is done on the command line, versus ...

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktEliYACgkQJdeBCYSNAAPfJQCg1T3/6uPLidv2gDZsz3hpG3gJ
YuwAnj/ocg6CAiSjAfwjgt7lBacZRKSc
=17q1
-----END PGP SIGNATURE-----

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-06:

#

> I haven't looked at the code, but I'd really like to have 1 standard
> parser for the command line. If we need/want to do it differently than
> it was done so far, then we should migrate to the new method. (wrt
> escaping, etc.)

Do you mean one standard parser across all platforms?

> I don't really want to second guess what goes into "--using" versus what
> is done on the command line, versus ...

If I understand the code correctly, the bzr commandline parsing on win32 should be using this new code as well. I'll double-check though.

Revision history for this message

Martin Packman (gz) wrote on 2010-01-06:

#

I've not read over this change in detail yet, but a few comments.

> So I decided to re-implement command line parsing in win32utils to match the
> way the Win32 API function, CommandLineToArgvW, parses the command line,
> particularly in regards to how it handles backslashes and their interaction
> with double quotes.

I approve of this idea in principle.

> I've updated the various tests affected by these changes to handle the
> differences in how win32 handles quoting (no single quotes) and escapes (can't
> escape a space with a backslash). I've also added a test in test_diff for the
> specific case in bug 392428, a diff tool specified as a win32 path with
> backslashes.

This strikes me as good too, however I think to make this continue to work, rather than using platform-checks in shell-like tests, they need to be written in a portable subset. This should be pretty easy to enforce if they get run through the CommandLineToArgv-style parser on all platforms, and rejected if they pull any crazy bash-tricks.

review: Needs Fixing

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-06:

#

> This strikes me as good too, however I think to make this continue to work,
> rather than using platform-checks in shell-like tests, they need to be written
> in a portable subset. This should be pretty easy to enforce if they get run
> through the CommandLineToArgv-style parser on all platforms, and rejected if
> they pull any crazy bash-tricks.

They're not exactly crazy bash-tricks. They have to test that on non-win32 systems, bzr supports things like quoting with single quotes and backslash as the escape character, since users of those platforms will expect those platforms' conventions to work.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-01-13:

#

58 def test_from_string_u5(self):
59 - diff_obj = DiffFromTool.from_string('diff -u\\ 5', None, None, None)
60 + # win32 doesn't support escaping spaces with backslashes
61 + if sys.platform == 'win32':
62 + tool = 'diff "-u 5"'
63 + else:
64 + tool = 'diff -u\\ 5'
65 + diff_obj = DiffFromTool.from_string(tool, None, None, None)

^- I wouldn't have a problem supporting "\ " turning into an escaped space (non-breaking space). You can't put ' ' at the beginning of files in Windows anyway. (well not entirely true. Creating them via explorer will just remove the opening space, but other programs *can* create the file on disk.) It is unlikely enough, though that I would be fine with it.

Also, you *could* only support this when not currently in a quoted context. However, you don't have to implement this if it is a pain.

85 === modified file 'bzrlib/tests/test_rules.py'
86 --- bzrlib/tests/test_rules.py 2009-03-23 14:59:43 +0000
87 +++ bzrlib/tests/test_rules.py 2010-01-06 05:32:23 +0000
88 @@ -63,8 +63,12 @@
89 rs.get_selected_items('a.txt', ['foo']))
90
91 def test_get_items_from_multiple_glob_match(self):
92 - rs = self.make_searcher(
93 - "[name *.txt *.py 'x x' \"y y\"]\nfoo=bar\na=True\n")
94 + # win32 doesn't support single quotes for quoting args with spaces
95 + if sys.platform == 'win32':
96 + text = '[name *.txt *.py "x x" "y y"]\nfoo=bar\na=True\n'
97 + else:
98 + text = """[name *.txt *.py 'x x' "y y"]\nfoo=bar\na=True\n"""
99 + rs = self.make_searcher(text)
100 self.assertEquals((), rs.get_items('NEWS'))
101 self.assertEquals((('foo', 'bar'), ('a', 'True')),
102 rs.get_items('a.py'))

^- This concerns me a fair amount. Specifically, it is likely that "rules" files will end up as a versioned property (.bzrmeta/rules) or some other meta-data. As such, we *really* want them to be consistent across platforms.

How hard would it be to update UnicodeCommandlineSplitter to have optional support for ' chars? And then we would just disable it for the actual command line processing. (Again, *I* would rather support it always, but consensus seems to be against it.)

Some small formatting issues. PEP8 says that you should put 2 blank lines between class definitions (or any module-level definitions). So add some lines in places like this:
275 + def __iter__(self):
276 + return self
277 +
278 +class _Whitespace(object):

I don't think there is a specific need to change the name from UnicodeShlex to UnicodeComandlineSplitter. The functionality hasn't changed. As near as I can tell you just moved the "state" into helper classes, rather than a class var + if blocks.

I'm happy to help this along if you think you need some help. (I think it is generally good, and just needs some tweaks before landing. Some of which we can chose not to do.)

58	 def test_from_string_u5(self):
59	- diff_obj = DiffFromTool.from_string('diff -u\\ 5', None, None, None)
60	+ # win32 doesn't support escaping spaces with backslashes
61	+ if sys.platform == 'win32':
62	+ tool = 'diff "-u 5"'
63	+ else:
64	+ tool = 'diff -u\\ 5'
65	+ diff_obj = DiffFromTool.from_string(tool, None, None, None)

^- I wouldn't have a problem supporting "\ " turning into an escaped space (non-breaking space). You can't put ' ' at the beginning of files in Windows anyway. (well not entirely true. Creating them via explorer will just remove the opening space, but other programs *can* create the file on disk.) It is unlikely enough, though that I would be fine with it.

Also, you *could* only support this when not currently in a quoted context. However, you don't have to implement this if it is a pain.

85	=== modified file 'bzrlib/tests/test_rules.py'
86	--- bzrlib/tests/test_rules.py 2009-03-23 14:59:43 +0000
87	+++ bzrlib/tests/test_rules.py 2010-01-06 05:32:23 +0000
88	@@ -63,8 +63,12 @@
89	rs.get_selected_items('a.txt', ['foo']))
90	
91	def test_get_items_from_multiple_glob_match(self):
92	- rs = self.make_searcher(
93	- "[name *.txt *.py 'x x' \"y y\"]\nfoo=bar\na=True\n")
94	+ # win32 doesn't support single quotes for quoting args with spaces
95	+ if sys.platform == 'win32':
96	+ text = '[name *.txt *.py "x x" "y y"]\nfoo=bar\na=True\n'
97	+ else:
98	+ text = """[name *.txt *.py 'x x' "y y"]\nfoo=bar\na=True\n"""
99	+ rs = self.make_searcher(text)
100	self.assertEquals((), rs.get_items('NEWS'))
101	self.assertEquals((('foo', 'bar'), ('a', 'True')),
102	rs.get_items('a.py'))

^- This concerns me a fair amount. Specifically, it is likely that "rules" files will end up as a versioned property (.bzrmeta/rules) or some other meta-data. As such, we *really* want them to be consistent across platforms.

How hard would it be to update UnicodeCommandlineSplitter to have optional support for ' chars? And then we would just disable it for the actual command line processing. (Again, *I* would rather support it always, but consensus seems to be against it.)

Some small formatting issues. PEP8 says that you should put 2 blank lines between class definitions (or any module-level definitions). So add some lines in places like this:
275	+ def __iter__(self):
276	+ return self
277	+
278	+class _Whitespace(object):

I don't think there is a specific need to change the name from UnicodeShlex to UnicodeComandlineSplitter.  The functionality hasn't changed. As near as I can tell you just moved the "state" into helper classes, rather than a class var + if blocks.

I'm happy to help this along if you think you need some help. (I think it is generally good, and just needs some tweaks before landing. Some of which we can chose not to do.)

review: Needs Fixing

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-13:

#

Thanks for the feedback, John. I'm glad to have specific issues to focus on. :)

> ^- I wouldn't have a problem supporting "\ " turning into an escaped space
> (non-breaking space). You can't put ' ' at the beginning of files in Windows
> anyway. (well not entirely true. Creating them via explorer will just remove
> the opening space, but other programs *can* create the file on disk.) It is
> unlikely enough, though that I would be fine with it.
>
> Also, you *could* only support this when not currently in a quoted context.
> However, you don't have to implement this if it is a pain.

There would be problems with commandlines like:

bzr status C:\mybranch\ C:\myotherbranch

Allowing backslashes to escape spaces would cause the 3rd and 4th arguments to be treated as a single argument "C:\mybranch\ C:\myotherbranch".

The way you escape spaces on Windows is to quote the whole argument with double quotes.

> ^- This concerns me a fair amount. Specifically, it is likely that "rules"
> files will end up as a versioned property (.bzrmeta/rules) or some other meta-
> data. As such, we *really* want them to be consistent across platforms.

This concerned me as well but I was unsure what would be an acceptable way to resolve it.

> How hard would it be to update UnicodeCommandlineSplitter to have optional
> support for ' chars? And then we would just disable it for the actual command
> line processing. (Again, *I* would rather support it always, but consensus
> seems to be against it.)

Optional support for single quotes should be doable. And not allowing it when processing the bzr commandline on Windows should be doable as well, if I'm understanding the code correctly.

>
> Some small formatting issues. PEP8 says that you should put 2 blank lines
> between class definitions (or any module-level definitions).

Ok.

> I don't think there is a specific need to change the name from UnicodeShlex to
> UnicodeComandlineSplitter. The functionality hasn't changed. As near as I can
> tell you just moved the "state" into helper classes, rather than a class var +
> if blocks.

I changed it to distance it from the Unix-like shlex, but I have no strong preference on the matter. I'll change it back.

> I'm happy to help this along if you think you need some help. (I think it is
> generally good, and just needs some tweaks before landing. Some of which we
> can chose not to do.)

I think I'm good. I just needed some specific criticism. :)

Thanks for the feedback, John. I'm glad to have specific issues to focus on. :)

> ^- I wouldn't have a problem supporting "\ " turning into an escaped space
> (non-breaking space). You can't put ' ' at the beginning of files in Windows
> anyway. (well not entirely true. Creating them via explorer will just remove
> the opening space, but other programs *can* create the file on disk.) It is
> unlikely enough, though that I would be fine with it.
> 
> Also, you *could* only support this when not currently in a quoted context.
> However, you don't have to implement this if it is a pain.

There would be problems with commandlines like:

bzr status C:\mybranch\ C:\myotherbranch

Allowing backslashes to escape spaces would cause the 3rd and 4th arguments to be treated as a single argument "C:\mybranch\ C:\myotherbranch".

The way you escape spaces on Windows is to quote the whole argument with double quotes.

> ^- This concerns me a fair amount. Specifically, it is likely that "rules"
> files will end up as a versioned property (.bzrmeta/rules) or some other meta-
> data. As such, we *really* want them to be consistent across platforms.

This concerned me as well but I was unsure what would be an acceptable way to resolve it.

> How hard would it be to update UnicodeCommandlineSplitter to have optional
> support for ' chars? And then we would just disable it for the actual command
> line processing. (Again, *I* would rather support it always, but consensus
> seems to be against it.)

Optional support for single quotes should be doable. And not allowing it when processing the bzr commandline on Windows should be doable as well, if I'm understanding the code correctly.

> 
> Some small formatting issues. PEP8 says that you should put 2 blank lines
> between class definitions (or any module-level definitions).

Ok.

> I don't think there is a specific need to change the name from UnicodeShlex to
> UnicodeComandlineSplitter.  The functionality hasn't changed. As near as I can
> tell you just moved the "state" into helper classes, rather than a class var +
> if blocks.

I changed it to distance it from the Unix-like shlex, but I have no strong preference on the matter. I'll change it back.

> I'm happy to help this along if you think you need some help. (I think it is
> generally good, and just needs some tweaks before landing. Some of which we
> can chose not to do.)

I think I'm good. I just needed some specific criticism. :)

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-14:

#

The suggested changes have been made. TestDiffTool.test_from_string_u5 doesn't use a backslash to escape the space anymore. Instead, I changed it to use "diff '-u 5'" instead and remove the platform check, since the purpose of the test is not to verify the specifics of the escaping support but merely that it is able to use a non-trivial commandline as a diff tool.

Revision history for this message

Martin Packman (gz) wrote on 2010-01-14:

#

> The suggested changes have been made. TestDiffTool.test_from_string_u5 doesn't
> use a backslash to escape the space anymore. Instead, I changed it to use
> "diff '-u 5'" instead and remove the platform check, since the purpose of the
> test is not to verify the specifics of the escaping support but merely that it
> is able to use a non-trivial commandline as a diff tool.

That was the point I was trying to get at earlier, you don't want to be having to manually mark tests skipped because someone used the wrong sort of essentially interchangeable quoting mechanism. However, your fix should surely be 'diff "-u 5"' not "diff '-u 5'" if I understand the intention correctly.

I think there needs to be a mechanism for people who are writing tests on nix platforms to bash rules letting them know if their test isn't portable.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-14:

#

> That was the point I was trying to get at earlier, you don't want to be having
> to manually mark tests skipped because someone used the wrong sort of
> essentially interchangeable quoting mechanism. However, your fix should surely
> be 'diff "-u 5"' not "diff '-u 5'" if I understand the intention correctly.

In this case, I used '' specifically because in the diff tool setting it should be supported by both platforms, whereas it wouldn't be in normal bzr commandline processing. However, I see your point, that sort of testing should be done in test_commands.

> I think there needs to be a mechanism for people who are writing tests on nix
> platforms to bash rules letting them know if their test isn't portable.

I think that's outside of the scope of my changes here.

I have just noticed that I forgot to fix test_commands, so more changes incoming.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-14:

#

I've fixed various tests, restoring some to their original state since they should work now with single quotes parsing enabled.

There was also a compatibility problem with Python 2.5 in the cleanup changes I made to get_unicode_argv, which I didn't notice until now because I normally use Python 2.6.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-01-14:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> I've fixed various tests, restoring some to their original state since they should work now with single quotes parsing enabled.
>
> There was also a compatibility problem with Python 2.5 in the cleanup changes I made to get_unicode_argv, which I didn't notice until now because I normally use Python 2.6.

I don't really know about "use_last_error", but I'll also note that we
manually build up a prototype, but I can do

func = ctypes.windll.kernel32.GetCommandLineW

and get a _FuncPtr object in both python2.5 and 2.6. (In 2.4 you need to
install ctypes manually, and we don't use it for the standard Windows
installer anyway.)

I'm wondering what we are gaining by creating our own prototype.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktPXqgACgkQJdeBCYSNAAMOdgCeLpCrOLWeHSwMhYuf/sCcZUX9
/2QAoLdKmiwp12xZ24RQLra7vLFtfgge
=casy
-----END PGP SIGNATURE-----

Revision history for this message

John A Meinel (jameinel) wrote on 2010-01-14:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> I've fixed various tests, restoring some to their original state since they should work now with single quotes parsing enabled.
>
> There was also a compatibility problem with Python 2.5 in the cleanup changes I made to get_unicode_argv, which I didn't notice until now because I normally use Python 2.6.

review time...

1) I'd like to get away from using 'shlex.split' on all platforms. the
'utf-8' hack is a bit ugly. It will mean losing some support for '\' in
some circumstances, but I'd rather have configuration (espec, stuff like
'rules') work consistently everywhere. (Do you think UnicodeShlex is a
reasonable replacement?)

2)
+_whitespace_match = re.compile(u'\s').match
+

^- Do we want flags=re.UNICODE? Quick testing shows that it isn't set
for u'\s':

re.match(u'\s', u'\u2028') => None
re.match(u'\s', u'\u2028', re.UNICODE) => Match

For those that don't know u'\u2028' is the Unicode "Line Separator"
(u'\N{LINE SEPARATOR}') character.
http://unicode.org/standard/reports/tr13/tr13-5.html
(of course, it doesn't help that Unicode still defines 2 separators,
with a new \N{PARAGRAPH SEPARATOR}... :(

That said, I don't really think we care about breaking command lines on
u'\2028', so maybe it doesn't matter...

The rest looks good to me. So the big question is (1)? Can we get away
with replacing shlex entirely and make things a bit more consistent?

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktPYTcACgkQJdeBCYSNAAPboACeNXvqObazzbOnMaHSkkxuBClQ
EO0AoLHKIlKWLOKSB5pPdU0RN3qO+cUK
=EoNE
-----END PGP SIGNATURE-----

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-14:

#

> -----BEGIN PGP SIGNED MESSAGE-----
> func = ctypes.windll.kernel32.GetCommandLineW
>
> and get a _FuncPtr object in both python2.5 and 2.6. (In 2.4 you need to
> install ctypes manually, and we don't use it for the standard Windows
> installer anyway.)
>
> I'm wondering what we are gaining by creating our own prototype.

An actual string return type instead of an int (i.e. pointer value).

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-14:

#

> 1) I'd like to get away from using 'shlex.split' on all platforms. the
> 'utf-8' hack is a bit ugly. It will mean losing some support for '\' in
> some circumstances, but I'd rather have configuration (espec, stuff like
> 'rules') work consistently everywhere. (Do you think UnicodeShlex is a
> reasonable replacement?)

When I re-implemented UnicodeShlex, I rewrote it from scratch with solely Win32 commandline parsing compatibility in mind. I'd have to read through shlex.split to see what is required for posix compatibility but I can immediately point out that it does not allow backslashes as an escape character. The Win32 API function CommandLineToArgv has some odd rules on how backslashes are treated and UnicodeShlex would have to operate in a substantially different mode to achieve posix compatibility.

However, I think it should be doable but I believe it should be a separate step in another merge proposal after this one. My main aim here is to fix the bug with as little disruption to the rest of bzr.

> That said, I don't really think we care about breaking command lines on
> u'\2028', so maybe it doesn't matter...

It's a simple enough change.

> The rest looks good to me. So the big question is (1)? Can we get away
> with replacing shlex entirely and make things a bit more consistent?

I like the idea but I think it needs to be done in two phases.

Revision history for this message

Martin Packman (gz) wrote on 2010-01-19:

#

> I think that's outside of the scope of my changes here.

Okay.

> +_whitespace_match = re.compile(u'\s').match
> +
>
> ^- Do we want flags=re.UNICODE?

No, in fact it'd more correctly be [\x00-\x20] as that's all CommandLineToArgvW cares about. It's hard to imagine a scenario where it'd actually matter though.

review: Abstain

Revision history for this message

Martin Packman (gz) wrote on 2010-01-19:

#

Correction, [\x01-\x20] obviously.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-20:

#

So, I'd like to look into making UnicodeShlex work for non-win32 platforms but I don't know if I'll have it done for 2.1.0. So my question is do we want to merge the current proposal to fix bug 392428 and I make a separate proposal later which replaces that and the standard shlex with the new all-seeing, all-powerful, UnicodeShlex?

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-21:

#

I guess this answers my question:

** Changed in: bzr
Milestone: 2.1.0rc1 => None

Revision history for this message

John A Meinel (jameinel) wrote on 2010-01-22:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> I guess this answers my question:
>
> ** Changed in: bzr
> Milestone: 2.1.0rc1 => None
>
>

Sorry about that. I did want to get it in, but just ran out of time to
get 2.1.0rc1 out the door. Once I get the installers and stuff built,
this is on the top of my todo list.

John
=:->-
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktZ8hgACgkQJdeBCYSNAAOTUgCgzsn/HRP6Jc+zfGOPiMaIvTY5
5scAoJgq5XltueEVnpiiOiMhghdX126i
=qbxs
-----END PGP SIGNATURE-----

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-22:

#

On 22/01/2010 1:45 PM, John A Meinel wrote:
> Sorry about that. I did want to get it in, but just ran out of time to
> get 2.1.0rc1 out the door. Once I get the installers and stuff built,
> this is on the top of my todo list.

So this is going into 2.2 then?

Revision history for this message

John A Meinel (jameinel) wrote on 2010-01-22:

#

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> On 22/01/2010 1:45 PM, John A Meinel wrote:
>> Sorry about that. I did want to get it in, but just ran out of time to
>> get 2.1.0rc1 out the door. Once I get the installers and stuff built,
>> this is on the top of my todo list.
>
> So this is going into 2.2 then?
>

I'm thinking to land this into trunk which is now the 2.2 series, yes.
This was never rejected, I just didn't get a chance to land it in time
for 2.1.0rc1 and it seems slightly too much of a change between an rc1
and a 2.1.0 final.

I could be convinced otherwise if other developers feel it could go into
2.1.0.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktZ/b0ACgkQJdeBCYSNAANUvgCdFNtlrWkWonnVXNbzh1oxB96X
EpUAn0JPfw3kMjcGVwgC8XlRZkGIYLbe
=mJxT
-----END PGP SIGNATURE-----

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-01-22:

#

On 22/01/2010 2:36 PM, John A Meinel wrote:
> I'm thinking to land this into trunk which is now the 2.2 series, yes.
> This was never rejected, I just didn't get a chance to land it in time
> for 2.1.0rc1 and it seems slightly too much of a change between an rc1
> and a 2.1.0 final.
>
> I could be convinced otherwise if other developers feel it could go into
> 2.1.0.

I'm not too fussed so long as I know what is needed of me. :)

My only concern was whether the bug was considered important enough to
try getting fixed for 2.1.0. Although perhaps if it's landed into 2.2
and after a while is considered safe enough, it could be backported to a
2.1.x bugfix release?

Revision history for this message

Martin Pool (mbp) wrote on 2010-01-27:

#

2010/1/22 Gordon Tyler <email address hidden>:
> I'm not too fussed so long as I know what is needed of me. :)
>
> My only concern was whether the bug was considered important enough to
> try getting fixed for 2.1.0. Although perhaps if it's landed into 2.2
> and after a while is considered safe enough, it could be backported to a
> 2.1.x bugfix release?

Yes, that would be ok.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message

Vincent Ladeuil (vila) wrote on 2010-02-09:

#

Do we have agreement to land this patch ? The comments seem to imply
it's ok to land on trunk but there is still no 'Approve' vote.

The patch history is consequent which will make a review from scratch
a bit tedious, I'd be more incline to do a final one if the patch
was already approved once.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-02-09:

#

On 2/9/2010 4:12 AM, Vincent Ladeuil wrote:
> Do we have agreement to land this patch ? The comments seem to imply
> it's ok to land on trunk but there is still no 'Approve' vote.
>
> The patch history is consequent which will make a review from scratch
> a bit tedious, I'd be more incline to do a final one if the patch
> was already approved once.

I'm actually working on a new merge proposal at John's suggestion to
create a replacement for shlex which works for all platforms. So I guess
this merge proposal has become redundant.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-02-09:

#

If Gordon says he's working on an updated version, I'm going to bump it back to Work in Progress for now. Though we could probably land this in bzr.dev and let him update it from there.

Revision history for this message

Gordon Tyler (doxxx) wrote on 2010-02-12:

#

Superseded by https://code.launchpad.net/~doxxx/bzr/cmdline-splitter/+merge/19160.

Bazaar

Merge lp:~doxxx/bzr/392428 into lp:bzr

Commit message

Description of the change

Preview Diff

Subscribers

 === modified file 'bzrlib/commands.py'
 --- bzrlib/commands.py	2010-01-11 13:15:01 +0000
 +++ bzrlib/commands.py	2010-01-15 13:23:14 +0000
@@ -896,8 +896,13 @@
  def shlex_split_unicode(unsplit):
--    import shlex
--    return [u.decode('utf-8') for u in shlex.split(unsplit.encode('utf-8'))]
++    if sys.platform == "win32":
++        from bzrlib.win32utils import command_line_to_argv
++        return command_line_to_argv(unsplit, wildcard_expansion=False,
++                                    single_quotes_allowed=True)
++    else:
++        import shlex
++        return [u.decode('utf-8') for u in shlex.split(unsplit.encode('utf-8'))]
  def get_alias(cmd, config=None):
 === modified file 'bzrlib/tests/test_commands.py'
 --- bzrlib/tests/test_commands.py	2009-05-23 21:01:51 +0000
 +++ bzrlib/tests/test_commands.py	2010-01-15 13:23:14 +0000
@@ -111,7 +111,7 @@
      def test_unicode(self):
          my_config = self._get_config("[ALIASES]\n"
--            u"iam=whoami 'Erik B\u00e5gfors <erik@bagfors.nu>'\n")
++            u'iam=whoami "Erik B\u00e5gfors <erik@bagfors.nu>"\n')
          self.assertEqual([u'whoami', u'Erik B\u00e5gfors <erik@bagfors.nu>'],
                            commands.get_alias("iam", config=my_config))
 === modified file 'bzrlib/tests/test_diff.py'
 --- bzrlib/tests/test_diff.py	2009-12-22 15:50:40 +0000
 +++ bzrlib/tests/test_diff.py	2010-01-15 13:23:14 +0000
@@ -45,6 +45,8 @@
  from bzrlib.revisiontree import RevisionTree
  from bzrlib.revisionspec import RevisionSpec
++from bzrlib.tests.test_win32utils import BackslashDirSeparatorFeature
++
  class _AttribFeature(Feature):
@@ -1292,12 +1294,22 @@
              diff_obj.command_template)
      def test_from_string_u5(self):
--        diff_obj = DiffFromTool.from_string('diff -u\\ 5', None, None, None)
++        diff_obj = DiffFromTool.from_string('diff "-u 5"', None, None, None)
          self.addCleanup(diff_obj.finish)
          self.assertEqual(['diff', '-u 5', '@old_path', '@new_path'],
                           diff_obj.command_template)
          self.assertEqual(['diff', '-u 5', 'old-path', 'new-path'],
                           diff_obj._get_command('old-path', 'new-path'))
++
++    def test_from_string_path_with_backslashes(self):
++        self.requireFeature(BackslashDirSeparatorFeature)
++        tool = 'C:\\Tools\\Diff.exe'
++        diff_obj = DiffFromTool.from_string(tool, None, None, None)
++        self.addCleanup(diff_obj.finish)
++        self.assertEqual(['C:\\Tools\\Diff.exe', '@old_path', '@new_path'],
++                         diff_obj.command_template)
++        self.assertEqual(['C:\\Tools\\Diff.exe', 'old-path', 'new-path'],
++                         diff_obj._get_command('old-path', 'new-path'))
      def test_execute(self):
          output = StringIO()
 === modified file 'bzrlib/tests/test_win32utils.py'
 --- bzrlib/tests/test_win32utils.py	2009-11-20 16:42:28 +0000
 +++ bzrlib/tests/test_win32utils.py	2010-01-15 13:23:14 +0000
@@ -291,11 +291,11 @@
          win32utils.set_file_attr_hidden(path)
--
  class TestUnicodeShlex(tests.TestCase):
--    def assertAsTokens(self, expected, line):
--        s = win32utils.UnicodeShlex(line)
++    def assertAsTokens(self, expected, line, single_quotes_allowed=False):
++        s = win32utils.UnicodeShlex(line,
++                                    single_quotes_allowed=single_quotes_allowed)
          self.assertEqual(expected, list(s))
      def test_simple(self):
@@ -312,16 +312,22 @@
          self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar  ')
      def test_posix_quotations(self):
--        self.assertAsTokens([(True, u'foo bar')], u'"foo bar"')
--        self.assertAsTokens([(False, u"'fo''o"), (False, u"b''ar'")],
--            u"'fo''o b''ar'")
--        self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"')
--        self.assertAsTokens([(True, u"fo'o"), (True, u"b'ar")],
--            u'"fo"\'o b\'"ar"')
++        self.assertAsTokens([(True, u'foo bar')], u"'foo bar'",
++            single_quotes_allowed=True)
++        self.assertAsTokens([(True, u'foo bar')], u"'fo''o b''ar'",
++            single_quotes_allowed=True)
++        self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"',
++            single_quotes_allowed=True)
++        self.assertAsTokens([(True, u'foo bar')], u'"fo"\'o b\'"ar"',
++            single_quotes_allowed=True)
      def test_nested_quotations(self):
          self.assertAsTokens([(True, u'foo"" bar')], u"\"foo\\\"\\\" bar\"")
          self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"")
++        self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"",
++            single_quotes_allowed=True)
++        self.assertAsTokens([(True, u'foo"" bar')], u"'foo\"\" bar'",
++            single_quotes_allowed=True)
      def test_empty_result(self):
          self.assertAsTokens([], u'')
@@ -330,6 +336,7 @@
      def test_quoted_empty(self):
          self.assertAsTokens([(True, '')], u'""')
          self.assertAsTokens([(False, u"''")], u"''")
++        self.assertAsTokens([(True, '')], u"''", single_quotes_allowed=True)
      def test_unicode_chars(self):
          self.assertAsTokens([(False, u'f\xb5\xee'), (False, u'\u1234\u3456')],
@@ -337,25 +344,36 @@
      def test_newline_in_quoted_section(self):
          self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u'"foo\nbar\nbaz\n"')
++        self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u"'foo\nbar\nbaz\n'",
++            single_quotes_allowed=True)
      def test_escape_chars(self):
          self.assertAsTokens([(False, u'foo\\bar')], u'foo\\bar')
      def test_escape_quote(self):
          self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"')
++        self.assertAsTokens([(True, u'foo\\"bar')], u'"foo\\\\\\"bar"')
++        self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\"bar"')
      def test_double_escape(self):
--        self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"')
++        self.assertAsTokens([(True, u'foo\\\\bar')], u'"foo\\\\bar"')
          self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar")
++
++    def test_multiple_quoted_args(self):
++        self.assertAsTokens([(True, u'x x'), (True, u'y y')],
++            u'"x x" "y y"')
++        self.assertAsTokens([(True, u'x x'), (True, u'y y')],
++            u'"x x" \'y y\'', single_quotes_allowed=True)
  class Test_CommandLineToArgv(tests.TestCaseInTempDir):
--    def assertCommandLine(self, expected, line):
++    def assertCommandLine(self, expected, line, single_quotes_allowed=False):
          # Strictly speaking we should respect parameter order versus glob
          # expansions, but it's not really worth the effort here
--        self.assertEqual(expected,
--                         sorted(win32utils._command_line_to_argv(line)))
++        argv = win32utils.command_line_to_argv(line,
++                single_quotes_allowed=single_quotes_allowed)
++        self.assertEqual(expected, sorted(argv))
      def test_glob_paths(self):
          self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
@@ -371,19 +389,25 @@
          self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
          self.assertCommandLine([u'a/*.c'], '"a/*.c"')
          self.assertCommandLine([u"'a/*.c'"], "'a/*.c'")
++        self.assertCommandLine([u'a/*.c'], "'a/*.c'",
++            single_quotes_allowed=True)
      def test_slashes_changed(self):
          # Quoting doesn't change the supplied args
          self.assertCommandLine([u'a\\*.c'], '"a\\*.c"')
++        self.assertCommandLine([u'a\\*.c'], "'a\\*.c'",
++            single_quotes_allowed=True)
          # Expands the glob, but nothing matches, swaps slashes
          self.assertCommandLine([u'a/*.c'], 'a\\*.c')
          self.assertCommandLine([u'a/?.c'], 'a\\?.c')
          # No glob, doesn't touch slashes
          self.assertCommandLine([u'a\\foo.c'], 'a\\foo.c')
--    def test_no_single_quote_supported(self):
++    def test_single_quote_support(self):
          self.assertCommandLine(["add", "let's-do-it.txt"],
              "add let's-do-it.txt")
++        self.assertCommandLine(["add", "lets do it.txt"],
++            "add 'lets do it.txt'", single_quotes_allowed=True)
      def test_case_insensitive_globs(self):
          self.requireFeature(tests.CaseInsCasePresFilenameFeature)
 === modified file 'bzrlib/win32utils.py'
 --- bzrlib/win32utils.py	2009-12-16 06:38:15 +0000
 +++ bzrlib/win32utils.py	2010-01-15 13:23:14 +0000
@@ -517,117 +517,161 @@
              trace.mutter('Unable to set hidden attribute on %r: %s', path, e)
++_whitespace_match = re.compile(u'\s', re.UNICODE).match
++
++
++class _PushbackSequence(object):
++    def __init__(self, orig):
++        self._iter = iter(orig)
++        self._pushback_buffer = []
++
++    def next(self):
++        if len(self._pushback_buffer) > 0:
++            return self._pushback_buffer.pop()
++        else:
++            return self._iter.next()
++
++    def pushback(self, char):
++        self._pushback_buffer.append(char)
++
++    def __iter__(self):
++        return self
++
++
++class _Whitespace(object):
++    def process(self, next_char, seq, context):
++        if _whitespace_match(next_char):
++            if len(context.token) > 0:
++                return None
++            else:
++                return self
++        elif (next_char == u'"'
++              or (context.single_quotes_allowed and next_char == u"'")):
++            context.quoted = True
++            return _Quotes(next_char, self)
++        elif next_char == u'\\':
++            return _Backslash(self)
++        else:
++            context.token.append(next_char)
++            return _Word()
++
++
++class _Quotes(object):
++    def __init__(self, quote_char, exit_state):
++        self.quote_char = quote_char
++        self.exit_state = exit_state
++
++    def process(self, next_char, seq, context):
++        if next_char == u'\\':
++            return _Backslash(self)
++        elif next_char == self.quote_char:
++            return self.exit_state
++        else:
++            context.token.append(next_char)
++            return self
++
++
++class _Backslash(object):
++    # See http://msdn.microsoft.com/en-us/library/bb776391(VS.85).aspx
++    def __init__(self, exit_state):
++        self.exit_state = exit_state
++        self.count = 1
++
++    def process(self, next_char, seq, context):
++        if next_char == u'\\':
++            self.count += 1
++            return self
++        elif next_char == u'"':
++            # 2N backslashes followed by '"' are N backslashes
++            context.token.append(u'\\' * (self.count/2))
++            # 2N+1 backslashes follwed by '"' are N backslashes followed by '"'
++            # which should not be processed as the start or end of quoted arg
++            if self.count % 2 == 1:
++                context.token.append(next_char) # odd number of '\' escapes the '"'
++            else:
++                seq.pushback(next_char) # let exit_state handle next_char
++            self.count = 0
++            return self.exit_state
++        else:
++            # N backslashes not followed by '"' are just N backslashes
++            if self.count > 0:
++                context.token.append(u'\\' * self.count)
++                self.count = 0
++            seq.pushback(next_char) # let exit_state handle next_char
++            return self.exit_state
++
++    def finish(self, context):
++        if self.count > 0:
++            context.token.append(u'\\' * self.count)
++
++
++class _Word(object):
++    def process(self, next_char, seq, context):
++        if _whitespace_match(next_char):
++            return None
++        elif (next_char == u'"'
++              or (context.single_quotes_allowed and next_char == u"'")):
++            return _Quotes(next_char, self)
++        elif next_char == u'\\':
++            return _Backslash(self)
++        else:
++            context.token.append(next_char)
++            return self
++
  class UnicodeShlex(object):
--    """This is a very simplified version of shlex.shlex.
--
--    The main change is that it supports non-ascii input streams. The internal
--    structure is quite simplified relative to shlex.shlex, since we aren't
--    trying to handle multiple input streams, etc. In fact, we don't use a
--    file-like api either.
--    """
--
--    def __init__(self, uni_string):
--        self._input = uni_string
--        self._input_iter = iter(self._input)
--        self._whitespace_match = re.compile(u'\s').match
--        self._word_match = re.compile(u'\S').match
--        self._quote_chars = u'"'
--        # self._quote_match = re.compile(u'[\'"]').match
--        self._escape_match = lambda x: None # Never matches
--        self._escape = '\\'
--        # State can be
--        #   ' ' - after whitespace, starting a new token
--        #   'a' - after text, currently working on a token
--        #   '"' - after ", currently in a "-delimited quoted section
--        #   "\" - after '\', checking the next char
--        self._state = ' '
--        self._token = [] # Current token being parsed
--
--    def _get_token(self):
--        # Were there quote chars as part of this token?
--        quoted = False
--        quoted_state = None
--        for nextchar in self._input_iter:
--            if self._state == ' ':
--                if self._whitespace_match(nextchar):
--                    # if self._token: return token
--                    continue
--                elif nextchar in self._quote_chars:
--                    self._state = nextchar # quoted state
--                elif self._word_match(nextchar):
--                    self._token.append(nextchar)
--                    self._state = 'a'
--                else:
--                    raise AssertionError('wtttf?')
--            elif self._state in self._quote_chars:
--                quoted = True
--                if nextchar == self._state: # End of quote
--                    self._state = 'a' # posix allows 'foo'bar to translate to
--                                      # foobar
--                elif self._state == '"' and nextchar == self._escape:
--                    quoted_state = self._state
--                    self._state = nextchar
--                else:
--                    self._token.append(nextchar)
--            elif self._state == self._escape:
--                if nextchar == '\\':
--                    self._token.append('\\')
--                elif nextchar == '"':
--                    self._token.append(nextchar)
--                else:
--                    self._token.append('\\' + nextchar)
--                self._state = quoted_state
--            elif self._state == 'a':
--                if self._whitespace_match(nextchar):
--                    if self._token:
--                        break # emit this token
--                    else:
--                        continue # no token to emit
--                elif nextchar in self._quote_chars:
--                    # Start a new quoted section
--                    self._state = nextchar
--                # escape?
--                elif (self._word_match(nextchar)
--                      or nextchar in self._quote_chars
--                      # or whitespace_split?
--                      ):
--                    self._token.append(nextchar)
--                else:
--                    raise AssertionError('state == "a", char: %r'
--                                         % (nextchar,))
--            else:
--                raise AssertionError('unknown state: %r' % (self._state,))
--        result = ''.join(self._token)
--        self._token = []
--        if not quoted and result == '':
--            result = None
--        return quoted, result
--
++    def __init__(self, command_line, single_quotes_allowed=False):
++        self._seq = _PushbackSequence(command_line)
++        self.single_quotes_allowed = single_quotes_allowed
++
      def __iter__(self):
          return self
--
++
      def next(self):
          quoted, token = self._get_token()
          if token is None:
              raise StopIteration
          return quoted, token
--
--
--def _command_line_to_argv(command_line):
--    """Convert a Unicode command line into a set of argv arguments.
--
--    This does wildcard expansion, etc. It is intended to make wildcards act
--    closer to how they work in posix shells, versus how they work by default on
--    Windows.
++
++    def _get_token(self):
++        self.quoted = False
++        self.token = []
++        state = _Whitespace()
++        for next_char in self._seq:
++            state = state.process(next_char, self._seq, self)
++            if state is None:
++                break
++        if not state is None and not getattr(state, 'finish', None) is None:
++            state.finish(self)
++        result = u''.join(self.token)
++        if not self.quoted and result == '':
++            result = None
++        return self.quoted, result
++
++
++def command_line_to_argv(command_line, wildcard_expansion=True,
++                         single_quotes_allowed=False):
++    """Convert a Unicode command line into a list of argv arguments.
++
++    This optionally does wildcard expansion, etc. It is intended to make
++    wildcards act closer to how they work in posix shells, versus how they
++    work by default on Windows. Quoted arguments are left untouched.
++
++    :param command_line: The unicode string to split into an arg list.
++    :param wildcard_expansion: Whether wildcard expansion should be applied to
++                               each argument. True by default.
++    :param single_quotes_allowed: Whether single quotes are accepted as quoting
++                                  characters like double quotes. False by
++                                  default.
++    :return: A list of unicode strings.
      """
--    s = UnicodeShlex(command_line)
--    # Now that we've split the content, expand globs
++    s = UnicodeShlex(command_line, single_quotes_allowed=single_quotes_allowed)
++    # Now that we've split the content, expand globs if necessary
      # TODO: Use 'globbing' instead of 'glob.glob', this gives us stuff like
      #       '**/' style globs
      args = []
      for is_quoted, arg in s:
--        if is_quoted or not glob.has_magic(arg):
++        if is_quoted or not glob.has_magic(arg) or not wildcard_expansion:
              args.append(arg)
          else:
              args.extend(glob_one(arg))
@@ -636,16 +680,14 @@
  if has_ctypes and winver != 'Windows 98':
      def get_unicode_argv():
--        LPCWSTR = ctypes.c_wchar_p
--        INT = ctypes.c_int
--        POINTER = ctypes.POINTER
--        prototype = ctypes.WINFUNCTYPE(LPCWSTR)
--        GetCommandLine = prototype(("GetCommandLineW",
--                                    ctypes.windll.kernel32))
--        prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT))
--        command_line = GetCommandLine()
++        prototype = ctypes.WINFUNCTYPE(ctypes.c_wchar_p)
++        GetCommandLineW = prototype(("GetCommandLineW",
++                                     ctypes.windll.kernel32))
++        command_line = GetCommandLineW()
++        if command_line is None:
++            raise ctypes.WinError()
          # Skip the first argument, since we only care about parameters
--        argv = _command_line_to_argv(command_line)[1:]
++        argv = command_line_to_argv(command_line)[1:]
          if getattr(sys, 'frozen', None) is None:
              # Invoked via 'python.exe' which takes the form:
              #   python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS]