Merge lp:~doxxx/bzr/392428 into lp:bzr

Proposed by Gordon Tyler
Status: Merged
Merged at revision: not available
Proposed branch: lp:~doxxx/bzr/392428
Merge into: lp:bzr
Diff against target: 480 lines (+207/-124)
5 files modified
bzrlib/commands.py (+7/-2)
bzrlib/tests/test_commands.py (+1/-1)
bzrlib/tests/test_diff.py (+13/-1)
bzrlib/tests/test_win32utils.py (+38/-14)
bzrlib/win32utils.py (+148/-106)
To merge this branch: bzr merge lp:~doxxx/bzr/392428
Reviewer Review Type Date Requested Status
Martin Packman (community) Abstain
John A Meinel Needs Fixing
Alexander Belchenko Pending
Review via email: mp+16893@code.launchpad.net

This proposal supersedes a proposal from 2009-12-21.

To post a comment you must log in.
Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

Fixed bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32.

I also cleaned up the win32utils.get_unicode_argv function. There was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I added error handling as well.

Not sure how to write tests for this. I tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:

bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
bzrdev diff --using C:\Tools\diff.cmd

'bzrdev selftest win32utils' still passes.

Revision history for this message
Alexander Belchenko (bialix) wrote : Posted in a previous version of this proposal

For non-windows it's better to use the helper function from commands.py: shlex_split_unicode. But even so it's OK for me.

Thanks!

review: Approve
Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

> For non-windows it's better to use the helper function from commands.py:
> shlex_split_unicode. But even so it's OK for me.

That's the function I modified to use a different codepath on Windows.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> Gordon Tyler has proposed merging lp:~doxxx/bzr/392428 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #392428 `bzr diff --using C:\foo\bar` does not work
> https://bugs.launchpad.net/bugs/392428
>
>
> Fixed bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32.
>
> I also cleaned up the win32utils.get_unicode_argv function. There was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I added error handling as well.
>
> Not sure how to write tests for this. I tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:
>
> bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
> bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
> bzrdev diff --using C:\Tools\diff.cmd
>
> 'bzrdev selftest win32utils' still passes.
>
>

I think command_line_to_argv is going to glob, and you probably don't
want that. It may not matter in practice, but its a thought.

The other changes seem gratuitous, but not specifically wrong.

As for testing...

1) Make sure any existing tests of 'shlex_split_unicode' aren't being
skipped on windows

2) Probably add a win32 specific test that when splitting "C:\foo\bar"
we preserve the '\' characters.

I'd like to see at least a smoke test added, but the code is simple
enough you don't have to be super thorough.

 review: needsfixing

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksv2mcACgkQJdeBCYSNAAN+WgCgwHvIP3/4KsE5JRpStpQuwk4i
uLEAoNmmJOXEPjwsaTg5VPSjbhrzW9qq
=ml5g
-----END PGP SIGNATURE-----

review: Needs Fixing
Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

UnicodeShlex explicitly does not deal with ' because I was asked not to.
Alexander Belchenko feels that single quotes are not "Windows". (As
people can do:
  touch don'tdothis
  bzr add don'tdothis
)

*I* wanted ' to be a quote char. Just to mention that adding them in is
not just a bug, it is by design that we don't handle them.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksynfkACgkQJdeBCYSNAAPtWwCgrDby/wk6EopdNiwfj6V4Mxp4
wq4An2wJzn9ifGlLklYaQUOnYxLsvN3c
=cadk
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

2009/12/24 Gordon Tyler <email address hidden>:
> So it would seem that UnicodeShlex in win32utils.py doesn't handle single quotes properly which is making some tests in test_commands.py fail. Adding support for single quites to UnicodeShlex makes a lot of the tests in test_win32utils fail because they're expecting UnicodeShlex to not handle single quotes. Once I've fixed those, I think everything should be good.

Which ones? I don't see anything obviously relevant.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

C:\dev\bzr\392428>bzrdev selftest commands
testing: C:/dev/bzr/392428/bzr
   C:\dev\bzr\392428\bzrlib
   bzr-2.1.0dev5 python-2.6.4 Windows-Vista-6.0.6001-SP1

======================================================================
FAIL: test_single_quotes (bzrlib.tests.test_commands.TestGetAlias)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 102, in test_sing
le_quotes
    commands.get_alias("diff", config=my_config))
AssertionError: not equal:
a = [u'diff', u'-r', u'-2..-1', u'--diff-options', u'--strip-trailing-cr -wp']
b = [u'diff',
 u'-r',
 u'-2..-1',
 u'--diff-options',
 u"'--strip-trailing-cr",
 u"-wp'"]

======================================================================
FAIL: test_unicode (bzrlib.tests.test_commands.TestGetAlias)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 116, in test_unic
ode
    commands.get_alias("iam", config=my_config))
AssertionError: not equal:
a = [u'whoami', u'Erik B\xe5gfors <email address hidden>']
b = [u'whoami', u"'Erik", u'B\xe5gfors', u"<email address hidden>'"]

----------------------------------------------------------------------
Ran 41 tests in 0.547s

FAILED (failures=2)
1 test skipped
Missing feature 'FTPServer' skipped 18 tests.

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

OK.

So I think that unless we change the approach to quoting on Windows,
we probably need to make the tests skip on Windows, and add a comment
about why.

The fact that single quotes are accepted in configuration files (as
opposed to the command line) on unix but not on Windows is a bit ugly.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

I've pushed an update which makes test_single_quotes skip on win32 and fixes test_unicode for win32 by making it use double-quotes instead of single-quotes (the type of quote doesn't really matter for that specific test).

So now both test_win32utils and test_commands pass. I'm running full selftest now to see if anything else related fails.

Revision history for this message
Alexander Belchenko (bialix) wrote : Posted in a previous version of this proposal

These two failures are wrong by itself and should not be affected by
single quote vs double quote differences.

Gordon Tyler пишет:
> C:\dev\bzr\392428>bzrdev selftest commands
> testing: C:/dev/bzr/392428/bzr
> C:\dev\bzr\392428\bzrlib
> bzr-2.1.0dev5 python-2.6.4 Windows-Vista-6.0.6001-SP1
>
>
> ======================================================================
> FAIL: test_single_quotes (bzrlib.tests.test_commands.TestGetAlias)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 102, in test_sing
> le_quotes
> commands.get_alias("diff", config=my_config))
> AssertionError: not equal:
> a = [u'diff', u'-r', u'-2..-1', u'--diff-options', u'--strip-trailing-cr -wp']
> b = [u'diff',
> u'-r',
> u'-2..-1',
> u'--diff-options',
> u"'--strip-trailing-cr",
> u"-wp'"]
>
>
> ======================================================================
> FAIL: test_unicode (bzrlib.tests.test_commands.TestGetAlias)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "C:\dev\bzr\392428\bzrlib\tests\test_commands.py", line 116, in test_unic
> ode
> commands.get_alias("iam", config=my_config))
> AssertionError: not equal:
> a = [u'whoami', u'Erik B\xe5gfors <email address hidden>']
> b = [u'whoami', u"'Erik", u'B\xe5gfors', u"<email address hidden>'"]
>
>
> ----------------------------------------------------------------------
> Ran 41 tests in 0.547s
>
> FAILED (failures=2)
> 1 test skipped
> Missing feature 'FTPServer' skipped 18 tests.
>

Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

> These two failures are wrong by itself and should not be affected by
> single quote vs double quote differences.

Okay, I'll see about fixing them.

Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

UnicodeShlex doesn't handling escaping a space, so test_from_string_u5 in test_diff.TestDiffFromTool is failing with the modified shlex_split_unicode codepath for win32. So another thing to be fixed.

Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

I've written a replacement for UnicodeShlex which duplicates the way the Win32 API function CommandLineToArgvW handles splitting a commandline into arguments, but with the added functionality of indicating which arguments are quoted, like UnicodeShlex did.

Revision history for this message
Gordon Tyler (doxxx) wrote : Posted in a previous version of this proposal

And I've just added a test to test_diff which actually tests the scenario in bug 392428.

Revision history for this message
Gordon Tyler (doxxx) wrote :

This is a resubmission of the original proposal because I made significant code changes during the discussion of the previous merge proposal and I thought it could use a fresh review.

This merge proposal fixes bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32. However, this caused various tests to fail due to the command line parser in win32utils, UnicodeShlex, purposefully not handling single quotes. It also sometimes escaped spaces with a backslash and sometimes did not.

So I decided to re-implement command line parsing in win32utils to match the way the Win32 API function, CommandLineToArgvW, parses the command line, particularly in regards to how it handles backslashes and their interaction with double quotes. I couldn't just use the CommandLineToArgvW function itself because command_line_to_argv needs to know which arguments were quoted and which were not so that it can be perform the appropriate (optional) wildcard expansion.

I've updated the various tests affected by these changes to handle the differences in how win32 handles quoting (no single quotes) and escapes (can't escape a space with a backslash). I've also added a test in test_diff for the specific case in bug 392428, a diff tool specified as a win32 path with backslashes.

I've also tested by hand using KDiff3, WinMerge and a Windows command script which calls KDiff3:

bzrdev diff --using "C:\Program Files (x86)\KDiff3\kdiff3.exe"
bzrdev diff --using "C:\Program Files (x86)\WinMerge\WinMergeU.exe"
bzrdev diff --using C:\Tools\diff.cmd

While I was updating get_unicode_argv to use the new command_line_to_argv, I noticed that there was some cruft leftover from what I'm guessing was its previous implementation which used the CommandLineToArgvW Win32 API function. I cleaned this up and added some error handling as well.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> This is a resubmission of the original proposal because I made significant code changes during the discussion of the previous merge proposal and I thought it could use a fresh review.
>
> This merge proposal fixes bug 392428 by changing commands.shlex_split_unicode to use win32utils.command_line_to_argv on win32. However, this caused various tests to fail due to the command line parser in win32utils, UnicodeShlex, purposefully not handling single quotes. It also sometimes escaped spaces with a backslash and sometimes did not.
>
> So I decided to re-implement command line parsing in win32utils to match the way the Win32 API function, CommandLineToArgvW, parses the command line, particularly in regards to how it handles backslashes and their interaction with double quotes. I couldn't just use the CommandLineToArgvW function itself because command_line_to_argv needs to know which arguments were quoted and which were not so that it can be perform the appropriate (optional) wildcard expansion.

I haven't looked at the code, but I'd really like to have 1 standard
parser for the command line. If we need/want to do it differently than
it was done so far, then we should migrate to the new method. (wrt
escaping, etc.)

I don't really want to second guess what goes into "--using" versus what
is done on the command line, versus ...

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktEliYACgkQJdeBCYSNAAPfJQCg1T3/6uPLidv2gDZsz3hpG3gJ
YuwAnj/ocg6CAiSjAfwjgt7lBacZRKSc
=17q1
-----END PGP SIGNATURE-----

Revision history for this message
Gordon Tyler (doxxx) wrote :

> I haven't looked at the code, but I'd really like to have 1 standard
> parser for the command line. If we need/want to do it differently than
> it was done so far, then we should migrate to the new method. (wrt
> escaping, etc.)

Do you mean one standard parser across all platforms?

> I don't really want to second guess what goes into "--using" versus what
> is done on the command line, versus ...

If I understand the code correctly, the bzr commandline parsing on win32 should be using this new code as well. I'll double-check though.

Revision history for this message
Martin Packman (gz) wrote :

I've not read over this change in detail yet, but a few comments.

> So I decided to re-implement command line parsing in win32utils to match the
> way the Win32 API function, CommandLineToArgvW, parses the command line,
> particularly in regards to how it handles backslashes and their interaction
> with double quotes.

I approve of this idea in principle.

> I've updated the various tests affected by these changes to handle the
> differences in how win32 handles quoting (no single quotes) and escapes (can't
> escape a space with a backslash). I've also added a test in test_diff for the
> specific case in bug 392428, a diff tool specified as a win32 path with
> backslashes.

This strikes me as good too, however I think to make this continue to work, rather than using platform-checks in shell-like tests, they need to be written in a portable subset. This should be pretty easy to enforce if they get run through the CommandLineToArgv-style parser on all platforms, and rejected if they pull any crazy bash-tricks.

review: Needs Fixing
Revision history for this message
Gordon Tyler (doxxx) wrote :

> This strikes me as good too, however I think to make this continue to work,
> rather than using platform-checks in shell-like tests, they need to be written
> in a portable subset. This should be pretty easy to enforce if they get run
> through the CommandLineToArgv-style parser on all platforms, and rejected if
> they pull any crazy bash-tricks.

They're not exactly crazy bash-tricks. They have to test that on non-win32 systems, bzr supports things like quoting with single quotes and backslash as the escape character, since users of those platforms will expect those platforms' conventions to work.

Revision history for this message
John A Meinel (jameinel) wrote :

58 def test_from_string_u5(self):
59 - diff_obj = DiffFromTool.from_string('diff -u\\ 5', None, None, None)
60 + # win32 doesn't support escaping spaces with backslashes
61 + if sys.platform == 'win32':
62 + tool = 'diff "-u 5"'
63 + else:
64 + tool = 'diff -u\\ 5'
65 + diff_obj = DiffFromTool.from_string(tool, None, None, None)

^- I wouldn't have a problem supporting "\ " turning into an escaped space (non-breaking space). You can't put ' ' at the beginning of files in Windows anyway. (well not entirely true. Creating them via explorer will just remove the opening space, but other programs *can* create the file on disk.) It is unlikely enough, though that I would be fine with it.

Also, you *could* only support this when not currently in a quoted context. However, you don't have to implement this if it is a pain.

85 === modified file 'bzrlib/tests/test_rules.py'
86 --- bzrlib/tests/test_rules.py 2009-03-23 14:59:43 +0000
87 +++ bzrlib/tests/test_rules.py 2010-01-06 05:32:23 +0000
88 @@ -63,8 +63,12 @@
89 rs.get_selected_items('a.txt', ['foo']))
90
91 def test_get_items_from_multiple_glob_match(self):
92 - rs = self.make_searcher(
93 - "[name *.txt *.py 'x x' \"y y\"]\nfoo=bar\na=True\n")
94 + # win32 doesn't support single quotes for quoting args with spaces
95 + if sys.platform == 'win32':
96 + text = '[name *.txt *.py "x x" "y y"]\nfoo=bar\na=True\n'
97 + else:
98 + text = """[name *.txt *.py 'x x' "y y"]\nfoo=bar\na=True\n"""
99 + rs = self.make_searcher(text)
100 self.assertEquals((), rs.get_items('NEWS'))
101 self.assertEquals((('foo', 'bar'), ('a', 'True')),
102 rs.get_items('a.py'))

^- This concerns me a fair amount. Specifically, it is likely that "rules" files will end up as a versioned property (.bzrmeta/rules) or some other meta-data. As such, we *really* want them to be consistent across platforms.

How hard would it be to update UnicodeCommandlineSplitter to have optional support for ' chars? And then we would just disable it for the actual command line processing. (Again, *I* would rather support it always, but consensus seems to be against it.)

Some small formatting issues. PEP8 says that you should put 2 blank lines between class definitions (or any module-level definitions). So add some lines in places like this:
275 + def __iter__(self):
276 + return self
277 +
278 +class _Whitespace(object):

I don't think there is a specific need to change the name from UnicodeShlex to UnicodeComandlineSplitter. The functionality hasn't changed. As near as I can tell you just moved the "state" into helper classes, rather than a class var + if blocks.

I'm happy to help this along if you think you need some help. (I think it is generally good, and just needs some tweaks before landing. Some of which we can chose not to do.)

review: Needs Fixing
Revision history for this message
Gordon Tyler (doxxx) wrote :

Thanks for the feedback, John. I'm glad to have specific issues to focus on. :)

> ^- I wouldn't have a problem supporting "\ " turning into an escaped space
> (non-breaking space). You can't put ' ' at the beginning of files in Windows
> anyway. (well not entirely true. Creating them via explorer will just remove
> the opening space, but other programs *can* create the file on disk.) It is
> unlikely enough, though that I would be fine with it.
>
> Also, you *could* only support this when not currently in a quoted context.
> However, you don't have to implement this if it is a pain.

There would be problems with commandlines like:

bzr status C:\mybranch\ C:\myotherbranch

Allowing backslashes to escape spaces would cause the 3rd and 4th arguments to be treated as a single argument "C:\mybranch\ C:\myotherbranch".

The way you escape spaces on Windows is to quote the whole argument with double quotes.

> ^- This concerns me a fair amount. Specifically, it is likely that "rules"
> files will end up as a versioned property (.bzrmeta/rules) or some other meta-
> data. As such, we *really* want them to be consistent across platforms.

This concerned me as well but I was unsure what would be an acceptable way to resolve it.

> How hard would it be to update UnicodeCommandlineSplitter to have optional
> support for ' chars? And then we would just disable it for the actual command
> line processing. (Again, *I* would rather support it always, but consensus
> seems to be against it.)

Optional support for single quotes should be doable. And not allowing it when processing the bzr commandline on Windows should be doable as well, if I'm understanding the code correctly.

>
> Some small formatting issues. PEP8 says that you should put 2 blank lines
> between class definitions (or any module-level definitions).

Ok.

> I don't think there is a specific need to change the name from UnicodeShlex to
> UnicodeComandlineSplitter. The functionality hasn't changed. As near as I can
> tell you just moved the "state" into helper classes, rather than a class var +
> if blocks.

I changed it to distance it from the Unix-like shlex, but I have no strong preference on the matter. I'll change it back.

> I'm happy to help this along if you think you need some help. (I think it is
> generally good, and just needs some tweaks before landing. Some of which we
> can chose not to do.)

I think I'm good. I just needed some specific criticism. :)

Revision history for this message
Gordon Tyler (doxxx) wrote :

The suggested changes have been made. TestDiffTool.test_from_string_u5 doesn't use a backslash to escape the space anymore. Instead, I changed it to use "diff '-u 5'" instead and remove the platform check, since the purpose of the test is not to verify the specifics of the escaping support but merely that it is able to use a non-trivial commandline as a diff tool.

Revision history for this message
Martin Packman (gz) wrote :

> The suggested changes have been made. TestDiffTool.test_from_string_u5 doesn't
> use a backslash to escape the space anymore. Instead, I changed it to use
> "diff '-u 5'" instead and remove the platform check, since the purpose of the
> test is not to verify the specifics of the escaping support but merely that it
> is able to use a non-trivial commandline as a diff tool.

That was the point I was trying to get at earlier, you don't want to be having to manually mark tests skipped because someone used the wrong sort of essentially interchangeable quoting mechanism. However, your fix should surely be 'diff "-u 5"' not "diff '-u 5'" if I understand the intention correctly.

I think there needs to be a mechanism for people who are writing tests on nix platforms to bash rules letting them know if their test isn't portable.

Revision history for this message
Gordon Tyler (doxxx) wrote :

> That was the point I was trying to get at earlier, you don't want to be having
> to manually mark tests skipped because someone used the wrong sort of
> essentially interchangeable quoting mechanism. However, your fix should surely
> be 'diff "-u 5"' not "diff '-u 5'" if I understand the intention correctly.

In this case, I used '' specifically because in the diff tool setting it should be supported by both platforms, whereas it wouldn't be in normal bzr commandline processing. However, I see your point, that sort of testing should be done in test_commands.

> I think there needs to be a mechanism for people who are writing tests on nix
> platforms to bash rules letting them know if their test isn't portable.

I think that's outside of the scope of my changes here.

I have just noticed that I forgot to fix test_commands, so more changes incoming.

Revision history for this message
Gordon Tyler (doxxx) wrote :

I've fixed various tests, restoring some to their original state since they should work now with single quotes parsing enabled.

There was also a compatibility problem with Python 2.5 in the cleanup changes I made to get_unicode_argv, which I didn't notice until now because I normally use Python 2.6.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> I've fixed various tests, restoring some to their original state since they should work now with single quotes parsing enabled.
>
> There was also a compatibility problem with Python 2.5 in the cleanup changes I made to get_unicode_argv, which I didn't notice until now because I normally use Python 2.6.

I don't really know about "use_last_error", but I'll also note that we
manually build up a prototype, but I can do

func = ctypes.windll.kernel32.GetCommandLineW

and get a _FuncPtr object in both python2.5 and 2.6. (In 2.4 you need to
install ctypes manually, and we don't use it for the standard Windows
installer anyway.)

I'm wondering what we are gaining by creating our own prototype.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktPXqgACgkQJdeBCYSNAAMOdgCeLpCrOLWeHSwMhYuf/sCcZUX9
/2QAoLdKmiwp12xZ24RQLra7vLFtfgge
=casy
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> I've fixed various tests, restoring some to their original state since they should work now with single quotes parsing enabled.
>
> There was also a compatibility problem with Python 2.5 in the cleanup changes I made to get_unicode_argv, which I didn't notice until now because I normally use Python 2.6.

review time...

1) I'd like to get away from using 'shlex.split' on all platforms. the
'utf-8' hack is a bit ugly. It will mean losing some support for '\' in
some circumstances, but I'd rather have configuration (espec, stuff like
'rules') work consistently everywhere. (Do you think UnicodeShlex is a
reasonable replacement?)

2)
+_whitespace_match = re.compile(u'\s').match
+

^- Do we want flags=re.UNICODE? Quick testing shows that it isn't set
for u'\s':

re.match(u'\s', u'\u2028') => None
re.match(u'\s', u'\u2028', re.UNICODE) => Match

For those that don't know u'\u2028' is the Unicode "Line Separator"
(u'\N{LINE SEPARATOR}') character.
http://unicode.org/standard/reports/tr13/tr13-5.html
(of course, it doesn't help that Unicode still defines 2 separators,
with a new \N{PARAGRAPH SEPARATOR}... :(

That said, I don't really think we care about breaking command lines on
u'\2028', so maybe it doesn't matter...

The rest looks good to me. So the big question is (1)? Can we get away
with replacing shlex entirely and make things a bit more consistent?

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktPYTcACgkQJdeBCYSNAAPboACeNXvqObazzbOnMaHSkkxuBClQ
EO0AoLHKIlKWLOKSB5pPdU0RN3qO+cUK
=EoNE
-----END PGP SIGNATURE-----

Revision history for this message
Gordon Tyler (doxxx) wrote :

> -----BEGIN PGP SIGNED MESSAGE-----
> func = ctypes.windll.kernel32.GetCommandLineW
>
> and get a _FuncPtr object in both python2.5 and 2.6. (In 2.4 you need to
> install ctypes manually, and we don't use it for the standard Windows
> installer anyway.)
>
> I'm wondering what we are gaining by creating our own prototype.

An actual string return type instead of an int (i.e. pointer value).

Revision history for this message
Gordon Tyler (doxxx) wrote :

> 1) I'd like to get away from using 'shlex.split' on all platforms. the
> 'utf-8' hack is a bit ugly. It will mean losing some support for '\' in
> some circumstances, but I'd rather have configuration (espec, stuff like
> 'rules') work consistently everywhere. (Do you think UnicodeShlex is a
> reasonable replacement?)

When I re-implemented UnicodeShlex, I rewrote it from scratch with solely Win32 commandline parsing compatibility in mind. I'd have to read through shlex.split to see what is required for posix compatibility but I can immediately point out that it does not allow backslashes as an escape character. The Win32 API function CommandLineToArgv has some odd rules on how backslashes are treated and UnicodeShlex would have to operate in a substantially different mode to achieve posix compatibility.

However, I think it should be doable but I believe it should be a separate step in another merge proposal after this one. My main aim here is to fix the bug with as little disruption to the rest of bzr.

> That said, I don't really think we care about breaking command lines on
> u'\2028', so maybe it doesn't matter...

It's a simple enough change.

> The rest looks good to me. So the big question is (1)? Can we get away
> with replacing shlex entirely and make things a bit more consistent?

I like the idea but I think it needs to be done in two phases.

Revision history for this message
Martin Packman (gz) wrote :

> I think that's outside of the scope of my changes here.

Okay.

> +_whitespace_match = re.compile(u'\s').match
> +
>
> ^- Do we want flags=re.UNICODE?

No, in fact it'd more correctly be [\x00-\x20] as that's all CommandLineToArgvW cares about. It's hard to imagine a scenario where it'd actually matter though.

review: Abstain
Revision history for this message
Martin Packman (gz) wrote :

Correction, [\x01-\x20] obviously.

Revision history for this message
Gordon Tyler (doxxx) wrote :

So, I'd like to look into making UnicodeShlex work for non-win32 platforms but I don't know if I'll have it done for 2.1.0. So my question is do we want to merge the current proposal to fix bug 392428 and I make a separate proposal later which replaces that and the standard shlex with the new all-seeing, all-powerful, UnicodeShlex?

Revision history for this message
Gordon Tyler (doxxx) wrote :

I guess this answers my question:

** Changed in: bzr
    Milestone: 2.1.0rc1 => None

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> I guess this answers my question:
>
> ** Changed in: bzr
> Milestone: 2.1.0rc1 => None
>
>

Sorry about that. I did want to get it in, but just ran out of time to
get 2.1.0rc1 out the door. Once I get the installers and stuff built,
this is on the top of my todo list.

John
=:->-
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktZ8hgACgkQJdeBCYSNAAOTUgCgzsn/HRP6Jc+zfGOPiMaIvTY5
5scAoJgq5XltueEVnpiiOiMhghdX126i
=qbxs
-----END PGP SIGNATURE-----

Revision history for this message
Gordon Tyler (doxxx) wrote :

On 22/01/2010 1:45 PM, John A Meinel wrote:
> Sorry about that. I did want to get it in, but just ran out of time to
> get 2.1.0rc1 out the door. Once I get the installers and stuff built,
> this is on the top of my todo list.

So this is going into 2.2 then?

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> On 22/01/2010 1:45 PM, John A Meinel wrote:
>> Sorry about that. I did want to get it in, but just ran out of time to
>> get 2.1.0rc1 out the door. Once I get the installers and stuff built,
>> this is on the top of my todo list.
>
> So this is going into 2.2 then?
>

I'm thinking to land this into trunk which is now the 2.2 series, yes.
This was never rejected, I just didn't get a chance to land it in time
for 2.1.0rc1 and it seems slightly too much of a change between an rc1
and a 2.1.0 final.

I could be convinced otherwise if other developers feel it could go into
2.1.0.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktZ/b0ACgkQJdeBCYSNAANUvgCdFNtlrWkWonnVXNbzh1oxB96X
EpUAn0JPfw3kMjcGVwgC8XlRZkGIYLbe
=mJxT
-----END PGP SIGNATURE-----

Revision history for this message
Gordon Tyler (doxxx) wrote :

On 22/01/2010 2:36 PM, John A Meinel wrote:
> I'm thinking to land this into trunk which is now the 2.2 series, yes.
> This was never rejected, I just didn't get a chance to land it in time
> for 2.1.0rc1 and it seems slightly too much of a change between an rc1
> and a 2.1.0 final.
>
> I could be convinced otherwise if other developers feel it could go into
> 2.1.0.

I'm not too fussed so long as I know what is needed of me. :)

My only concern was whether the bug was considered important enough to
try getting fixed for 2.1.0. Although perhaps if it's landed into 2.2
and after a while is considered safe enough, it could be backported to a
2.1.x bugfix release?

Revision history for this message
Martin Pool (mbp) wrote :

2010/1/22 Gordon Tyler <email address hidden>:
> I'm not too fussed so long as I know what is needed of me. :)
>
> My only concern was whether the bug was considered important enough to
> try getting fixed for 2.1.0. Although perhaps if it's landed into 2.2
> and after a while is considered safe enough, it could be backported to a
> 2.1.x bugfix release?

Yes, that would be ok.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Vincent Ladeuil (vila) wrote :

Do we have agreement to land this patch ? The comments seem to imply
it's ok to land on trunk but there is still no 'Approve' vote.

The patch history is consequent which will make a review from scratch
a bit tedious, I'd be more incline to do a final one if the patch
was already approved once.

Revision history for this message
Gordon Tyler (doxxx) wrote :

On 2/9/2010 4:12 AM, Vincent Ladeuil wrote:
> Do we have agreement to land this patch ? The comments seem to imply
> it's ok to land on trunk but there is still no 'Approve' vote.
>
> The patch history is consequent which will make a review from scratch
> a bit tedious, I'd be more incline to do a final one if the patch
> was already approved once.

I'm actually working on a new merge proposal at John's suggestion to
create a replacement for shlex which works for all platforms. So I guess
this merge proposal has become redundant.

Revision history for this message
John A Meinel (jameinel) wrote :

If Gordon says he's working on an updated version, I'm going to bump it back to Work in Progress for now. Though we could probably land this in bzr.dev and let him update it from there.

Revision history for this message
Gordon Tyler (doxxx) wrote :

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/commands.py'
2--- bzrlib/commands.py 2010-01-11 13:15:01 +0000
3+++ bzrlib/commands.py 2010-01-15 13:23:14 +0000
4@@ -896,8 +896,13 @@
5
6
7 def shlex_split_unicode(unsplit):
8- import shlex
9- return [u.decode('utf-8') for u in shlex.split(unsplit.encode('utf-8'))]
10+ if sys.platform == "win32":
11+ from bzrlib.win32utils import command_line_to_argv
12+ return command_line_to_argv(unsplit, wildcard_expansion=False,
13+ single_quotes_allowed=True)
14+ else:
15+ import shlex
16+ return [u.decode('utf-8') for u in shlex.split(unsplit.encode('utf-8'))]
17
18
19 def get_alias(cmd, config=None):
20
21=== modified file 'bzrlib/tests/test_commands.py'
22--- bzrlib/tests/test_commands.py 2009-05-23 21:01:51 +0000
23+++ bzrlib/tests/test_commands.py 2010-01-15 13:23:14 +0000
24@@ -111,7 +111,7 @@
25
26 def test_unicode(self):
27 my_config = self._get_config("[ALIASES]\n"
28- u"iam=whoami 'Erik B\u00e5gfors <erik@bagfors.nu>'\n")
29+ u'iam=whoami "Erik B\u00e5gfors <erik@bagfors.nu>"\n')
30 self.assertEqual([u'whoami', u'Erik B\u00e5gfors <erik@bagfors.nu>'],
31 commands.get_alias("iam", config=my_config))
32
33
34=== modified file 'bzrlib/tests/test_diff.py'
35--- bzrlib/tests/test_diff.py 2009-12-22 15:50:40 +0000
36+++ bzrlib/tests/test_diff.py 2010-01-15 13:23:14 +0000
37@@ -45,6 +45,8 @@
38 from bzrlib.revisiontree import RevisionTree
39 from bzrlib.revisionspec import RevisionSpec
40
41+from bzrlib.tests.test_win32utils import BackslashDirSeparatorFeature
42+
43
44 class _AttribFeature(Feature):
45
46@@ -1292,12 +1294,22 @@
47 diff_obj.command_template)
48
49 def test_from_string_u5(self):
50- diff_obj = DiffFromTool.from_string('diff -u\\ 5', None, None, None)
51+ diff_obj = DiffFromTool.from_string('diff "-u 5"', None, None, None)
52 self.addCleanup(diff_obj.finish)
53 self.assertEqual(['diff', '-u 5', '@old_path', '@new_path'],
54 diff_obj.command_template)
55 self.assertEqual(['diff', '-u 5', 'old-path', 'new-path'],
56 diff_obj._get_command('old-path', 'new-path'))
57+
58+ def test_from_string_path_with_backslashes(self):
59+ self.requireFeature(BackslashDirSeparatorFeature)
60+ tool = 'C:\\Tools\\Diff.exe'
61+ diff_obj = DiffFromTool.from_string(tool, None, None, None)
62+ self.addCleanup(diff_obj.finish)
63+ self.assertEqual(['C:\\Tools\\Diff.exe', '@old_path', '@new_path'],
64+ diff_obj.command_template)
65+ self.assertEqual(['C:\\Tools\\Diff.exe', 'old-path', 'new-path'],
66+ diff_obj._get_command('old-path', 'new-path'))
67
68 def test_execute(self):
69 output = StringIO()
70
71=== modified file 'bzrlib/tests/test_win32utils.py'
72--- bzrlib/tests/test_win32utils.py 2009-11-20 16:42:28 +0000
73+++ bzrlib/tests/test_win32utils.py 2010-01-15 13:23:14 +0000
74@@ -291,11 +291,11 @@
75 win32utils.set_file_attr_hidden(path)
76
77
78-
79 class TestUnicodeShlex(tests.TestCase):
80
81- def assertAsTokens(self, expected, line):
82- s = win32utils.UnicodeShlex(line)
83+ def assertAsTokens(self, expected, line, single_quotes_allowed=False):
84+ s = win32utils.UnicodeShlex(line,
85+ single_quotes_allowed=single_quotes_allowed)
86 self.assertEqual(expected, list(s))
87
88 def test_simple(self):
89@@ -312,16 +312,22 @@
90 self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar ')
91
92 def test_posix_quotations(self):
93- self.assertAsTokens([(True, u'foo bar')], u'"foo bar"')
94- self.assertAsTokens([(False, u"'fo''o"), (False, u"b''ar'")],
95- u"'fo''o b''ar'")
96- self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"')
97- self.assertAsTokens([(True, u"fo'o"), (True, u"b'ar")],
98- u'"fo"\'o b\'"ar"')
99+ self.assertAsTokens([(True, u'foo bar')], u"'foo bar'",
100+ single_quotes_allowed=True)
101+ self.assertAsTokens([(True, u'foo bar')], u"'fo''o b''ar'",
102+ single_quotes_allowed=True)
103+ self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"',
104+ single_quotes_allowed=True)
105+ self.assertAsTokens([(True, u'foo bar')], u'"fo"\'o b\'"ar"',
106+ single_quotes_allowed=True)
107
108 def test_nested_quotations(self):
109 self.assertAsTokens([(True, u'foo"" bar')], u"\"foo\\\"\\\" bar\"")
110 self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"")
111+ self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"",
112+ single_quotes_allowed=True)
113+ self.assertAsTokens([(True, u'foo"" bar')], u"'foo\"\" bar'",
114+ single_quotes_allowed=True)
115
116 def test_empty_result(self):
117 self.assertAsTokens([], u'')
118@@ -330,6 +336,7 @@
119 def test_quoted_empty(self):
120 self.assertAsTokens([(True, '')], u'""')
121 self.assertAsTokens([(False, u"''")], u"''")
122+ self.assertAsTokens([(True, '')], u"''", single_quotes_allowed=True)
123
124 def test_unicode_chars(self):
125 self.assertAsTokens([(False, u'f\xb5\xee'), (False, u'\u1234\u3456')],
126@@ -337,25 +344,36 @@
127
128 def test_newline_in_quoted_section(self):
129 self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u'"foo\nbar\nbaz\n"')
130+ self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u"'foo\nbar\nbaz\n'",
131+ single_quotes_allowed=True)
132
133 def test_escape_chars(self):
134 self.assertAsTokens([(False, u'foo\\bar')], u'foo\\bar')
135
136 def test_escape_quote(self):
137 self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"')
138+ self.assertAsTokens([(True, u'foo\\"bar')], u'"foo\\\\\\"bar"')
139+ self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\"bar"')
140
141 def test_double_escape(self):
142- self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"')
143+ self.assertAsTokens([(True, u'foo\\\\bar')], u'"foo\\\\bar"')
144 self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar")
145+
146+ def test_multiple_quoted_args(self):
147+ self.assertAsTokens([(True, u'x x'), (True, u'y y')],
148+ u'"x x" "y y"')
149+ self.assertAsTokens([(True, u'x x'), (True, u'y y')],
150+ u'"x x" \'y y\'', single_quotes_allowed=True)
151
152
153 class Test_CommandLineToArgv(tests.TestCaseInTempDir):
154
155- def assertCommandLine(self, expected, line):
156+ def assertCommandLine(self, expected, line, single_quotes_allowed=False):
157 # Strictly speaking we should respect parameter order versus glob
158 # expansions, but it's not really worth the effort here
159- self.assertEqual(expected,
160- sorted(win32utils._command_line_to_argv(line)))
161+ argv = win32utils.command_line_to_argv(line,
162+ single_quotes_allowed=single_quotes_allowed)
163+ self.assertEqual(expected, sorted(argv))
164
165 def test_glob_paths(self):
166 self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
167@@ -371,19 +389,25 @@
168 self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h'])
169 self.assertCommandLine([u'a/*.c'], '"a/*.c"')
170 self.assertCommandLine([u"'a/*.c'"], "'a/*.c'")
171+ self.assertCommandLine([u'a/*.c'], "'a/*.c'",
172+ single_quotes_allowed=True)
173
174 def test_slashes_changed(self):
175 # Quoting doesn't change the supplied args
176 self.assertCommandLine([u'a\\*.c'], '"a\\*.c"')
177+ self.assertCommandLine([u'a\\*.c'], "'a\\*.c'",
178+ single_quotes_allowed=True)
179 # Expands the glob, but nothing matches, swaps slashes
180 self.assertCommandLine([u'a/*.c'], 'a\\*.c')
181 self.assertCommandLine([u'a/?.c'], 'a\\?.c')
182 # No glob, doesn't touch slashes
183 self.assertCommandLine([u'a\\foo.c'], 'a\\foo.c')
184
185- def test_no_single_quote_supported(self):
186+ def test_single_quote_support(self):
187 self.assertCommandLine(["add", "let's-do-it.txt"],
188 "add let's-do-it.txt")
189+ self.assertCommandLine(["add", "lets do it.txt"],
190+ "add 'lets do it.txt'", single_quotes_allowed=True)
191
192 def test_case_insensitive_globs(self):
193 self.requireFeature(tests.CaseInsCasePresFilenameFeature)
194
195=== modified file 'bzrlib/win32utils.py'
196--- bzrlib/win32utils.py 2009-12-16 06:38:15 +0000
197+++ bzrlib/win32utils.py 2010-01-15 13:23:14 +0000
198@@ -517,117 +517,161 @@
199 trace.mutter('Unable to set hidden attribute on %r: %s', path, e)
200
201
202+_whitespace_match = re.compile(u'\s', re.UNICODE).match
203+
204+
205+class _PushbackSequence(object):
206+ def __init__(self, orig):
207+ self._iter = iter(orig)
208+ self._pushback_buffer = []
209+
210+ def next(self):
211+ if len(self._pushback_buffer) > 0:
212+ return self._pushback_buffer.pop()
213+ else:
214+ return self._iter.next()
215+
216+ def pushback(self, char):
217+ self._pushback_buffer.append(char)
218+
219+ def __iter__(self):
220+ return self
221+
222+
223+class _Whitespace(object):
224+ def process(self, next_char, seq, context):
225+ if _whitespace_match(next_char):
226+ if len(context.token) > 0:
227+ return None
228+ else:
229+ return self
230+ elif (next_char == u'"'
231+ or (context.single_quotes_allowed and next_char == u"'")):
232+ context.quoted = True
233+ return _Quotes(next_char, self)
234+ elif next_char == u'\\':
235+ return _Backslash(self)
236+ else:
237+ context.token.append(next_char)
238+ return _Word()
239+
240+
241+class _Quotes(object):
242+ def __init__(self, quote_char, exit_state):
243+ self.quote_char = quote_char
244+ self.exit_state = exit_state
245+
246+ def process(self, next_char, seq, context):
247+ if next_char == u'\\':
248+ return _Backslash(self)
249+ elif next_char == self.quote_char:
250+ return self.exit_state
251+ else:
252+ context.token.append(next_char)
253+ return self
254+
255+
256+class _Backslash(object):
257+ # See http://msdn.microsoft.com/en-us/library/bb776391(VS.85).aspx
258+ def __init__(self, exit_state):
259+ self.exit_state = exit_state
260+ self.count = 1
261+
262+ def process(self, next_char, seq, context):
263+ if next_char == u'\\':
264+ self.count += 1
265+ return self
266+ elif next_char == u'"':
267+ # 2N backslashes followed by '"' are N backslashes
268+ context.token.append(u'\\' * (self.count/2))
269+ # 2N+1 backslashes follwed by '"' are N backslashes followed by '"'
270+ # which should not be processed as the start or end of quoted arg
271+ if self.count % 2 == 1:
272+ context.token.append(next_char) # odd number of '\' escapes the '"'
273+ else:
274+ seq.pushback(next_char) # let exit_state handle next_char
275+ self.count = 0
276+ return self.exit_state
277+ else:
278+ # N backslashes not followed by '"' are just N backslashes
279+ if self.count > 0:
280+ context.token.append(u'\\' * self.count)
281+ self.count = 0
282+ seq.pushback(next_char) # let exit_state handle next_char
283+ return self.exit_state
284+
285+ def finish(self, context):
286+ if self.count > 0:
287+ context.token.append(u'\\' * self.count)
288+
289+
290+class _Word(object):
291+ def process(self, next_char, seq, context):
292+ if _whitespace_match(next_char):
293+ return None
294+ elif (next_char == u'"'
295+ or (context.single_quotes_allowed and next_char == u"'")):
296+ return _Quotes(next_char, self)
297+ elif next_char == u'\\':
298+ return _Backslash(self)
299+ else:
300+ context.token.append(next_char)
301+ return self
302+
303
304 class UnicodeShlex(object):
305- """This is a very simplified version of shlex.shlex.
306-
307- The main change is that it supports non-ascii input streams. The internal
308- structure is quite simplified relative to shlex.shlex, since we aren't
309- trying to handle multiple input streams, etc. In fact, we don't use a
310- file-like api either.
311- """
312-
313- def __init__(self, uni_string):
314- self._input = uni_string
315- self._input_iter = iter(self._input)
316- self._whitespace_match = re.compile(u'\s').match
317- self._word_match = re.compile(u'\S').match
318- self._quote_chars = u'"'
319- # self._quote_match = re.compile(u'[\'"]').match
320- self._escape_match = lambda x: None # Never matches
321- self._escape = '\\'
322- # State can be
323- # ' ' - after whitespace, starting a new token
324- # 'a' - after text, currently working on a token
325- # '"' - after ", currently in a "-delimited quoted section
326- # "\" - after '\', checking the next char
327- self._state = ' '
328- self._token = [] # Current token being parsed
329-
330- def _get_token(self):
331- # Were there quote chars as part of this token?
332- quoted = False
333- quoted_state = None
334- for nextchar in self._input_iter:
335- if self._state == ' ':
336- if self._whitespace_match(nextchar):
337- # if self._token: return token
338- continue
339- elif nextchar in self._quote_chars:
340- self._state = nextchar # quoted state
341- elif self._word_match(nextchar):
342- self._token.append(nextchar)
343- self._state = 'a'
344- else:
345- raise AssertionError('wtttf?')
346- elif self._state in self._quote_chars:
347- quoted = True
348- if nextchar == self._state: # End of quote
349- self._state = 'a' # posix allows 'foo'bar to translate to
350- # foobar
351- elif self._state == '"' and nextchar == self._escape:
352- quoted_state = self._state
353- self._state = nextchar
354- else:
355- self._token.append(nextchar)
356- elif self._state == self._escape:
357- if nextchar == '\\':
358- self._token.append('\\')
359- elif nextchar == '"':
360- self._token.append(nextchar)
361- else:
362- self._token.append('\\' + nextchar)
363- self._state = quoted_state
364- elif self._state == 'a':
365- if self._whitespace_match(nextchar):
366- if self._token:
367- break # emit this token
368- else:
369- continue # no token to emit
370- elif nextchar in self._quote_chars:
371- # Start a new quoted section
372- self._state = nextchar
373- # escape?
374- elif (self._word_match(nextchar)
375- or nextchar in self._quote_chars
376- # or whitespace_split?
377- ):
378- self._token.append(nextchar)
379- else:
380- raise AssertionError('state == "a", char: %r'
381- % (nextchar,))
382- else:
383- raise AssertionError('unknown state: %r' % (self._state,))
384- result = ''.join(self._token)
385- self._token = []
386- if not quoted and result == '':
387- result = None
388- return quoted, result
389-
390+ def __init__(self, command_line, single_quotes_allowed=False):
391+ self._seq = _PushbackSequence(command_line)
392+ self.single_quotes_allowed = single_quotes_allowed
393+
394 def __iter__(self):
395 return self
396-
397+
398 def next(self):
399 quoted, token = self._get_token()
400 if token is None:
401 raise StopIteration
402 return quoted, token
403-
404-
405-def _command_line_to_argv(command_line):
406- """Convert a Unicode command line into a set of argv arguments.
407-
408- This does wildcard expansion, etc. It is intended to make wildcards act
409- closer to how they work in posix shells, versus how they work by default on
410- Windows.
411+
412+ def _get_token(self):
413+ self.quoted = False
414+ self.token = []
415+ state = _Whitespace()
416+ for next_char in self._seq:
417+ state = state.process(next_char, self._seq, self)
418+ if state is None:
419+ break
420+ if not state is None and not getattr(state, 'finish', None) is None:
421+ state.finish(self)
422+ result = u''.join(self.token)
423+ if not self.quoted and result == '':
424+ result = None
425+ return self.quoted, result
426+
427+
428+def command_line_to_argv(command_line, wildcard_expansion=True,
429+ single_quotes_allowed=False):
430+ """Convert a Unicode command line into a list of argv arguments.
431+
432+ This optionally does wildcard expansion, etc. It is intended to make
433+ wildcards act closer to how they work in posix shells, versus how they
434+ work by default on Windows. Quoted arguments are left untouched.
435+
436+ :param command_line: The unicode string to split into an arg list.
437+ :param wildcard_expansion: Whether wildcard expansion should be applied to
438+ each argument. True by default.
439+ :param single_quotes_allowed: Whether single quotes are accepted as quoting
440+ characters like double quotes. False by
441+ default.
442+ :return: A list of unicode strings.
443 """
444- s = UnicodeShlex(command_line)
445- # Now that we've split the content, expand globs
446+ s = UnicodeShlex(command_line, single_quotes_allowed=single_quotes_allowed)
447+ # Now that we've split the content, expand globs if necessary
448 # TODO: Use 'globbing' instead of 'glob.glob', this gives us stuff like
449 # '**/' style globs
450 args = []
451 for is_quoted, arg in s:
452- if is_quoted or not glob.has_magic(arg):
453+ if is_quoted or not glob.has_magic(arg) or not wildcard_expansion:
454 args.append(arg)
455 else:
456 args.extend(glob_one(arg))
457@@ -636,16 +680,14 @@
458
459 if has_ctypes and winver != 'Windows 98':
460 def get_unicode_argv():
461- LPCWSTR = ctypes.c_wchar_p
462- INT = ctypes.c_int
463- POINTER = ctypes.POINTER
464- prototype = ctypes.WINFUNCTYPE(LPCWSTR)
465- GetCommandLine = prototype(("GetCommandLineW",
466- ctypes.windll.kernel32))
467- prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT))
468- command_line = GetCommandLine()
469+ prototype = ctypes.WINFUNCTYPE(ctypes.c_wchar_p)
470+ GetCommandLineW = prototype(("GetCommandLineW",
471+ ctypes.windll.kernel32))
472+ command_line = GetCommandLineW()
473+ if command_line is None:
474+ raise ctypes.WinError()
475 # Skip the first argument, since we only care about parameters
476- argv = _command_line_to_argv(command_line)[1:]
477+ argv = command_line_to_argv(command_line)[1:]
478 if getattr(sys, 'frozen', None) is None:
479 # Invoked via 'python.exe' which takes the form:
480 # python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS]