Merge lp:~jameinel/bzr/2.1.0b3-win32-shell-completion into lp:bzr
- 2.1.0b3-win32-shell-completion
- Merge into bzr.dev
Status: | Merged | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Merged at revision: | not available | ||||||||||||
Proposed branch: | lp:~jameinel/bzr/2.1.0b3-win32-shell-completion | ||||||||||||
Merge into: | lp:bzr | ||||||||||||
Diff against target: |
410 lines 5 files modified
NEWS (+9/-0) bzrlib/builtins.py (+0/-1) bzrlib/commands.py (+6/-5) bzrlib/tests/test_win32utils.py (+92/-2) bzrlib/win32utils.py (+152/-27) |
||||||||||||
To merge this branch: | bzr merge lp:~jameinel/bzr/2.1.0b3-win32-shell-completion | ||||||||||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
David Roberts (community) | Needs Information | ||
Alexander Belchenko | Needs Fixing | ||
Martin Packman (community) | Abstain | ||
Robert Collins (community) | Approve | ||
Review via email: mp+14445@code.launchpad.net |
Commit message
Description of the change
John A Meinel (jameinel) wrote : | # |
Robert Collins (lifeless) wrote : | # |
review: +1
awesome!
Code looks good to me
-Rob
Martin Packman (gz) wrote : | # |
I am unconviced that it's a good idea to try and make cmd behvave more like bash. Nix refugees tend to use cygwin or something like John does, and some of these behaviours will be suprising to anyone used to windows.
This particular change isn't so bad, as programs tend to glob inconsistently anyway, but I don't like the addition of single quotes as escape characters.
C:\bzr\
C:\bzr\
bzr: ERROR: No such file: u'C:/bzr/
Alexander Belchenko (bialix) wrote : | # |
I've late here but I want to say my word.
John, I agree with Martin that using single quotes is bad idea. I hope you decided to remove them. All Windows people used to double quotes so single quotes is not Windows way.
Also I think will be nice to have a way to disable this glob expansion via environment variable, e.g. BZR_GLOB=0 or BZR_GLOB=no. Using configs there is wrong idea, but os.environ should be OK and fast.
David Roberts (smartgpx) wrote : | # |
Is it possible for an end-user who runs from a *setup.exe binary installer on a physical MS WinXP system to test this?
(Since I don't run from a branch it would seem that simply merging the changes is not an option? Would it be valid to 'bzr init' the installed directory tree and then switch to John's branch to update or merge the changes?)
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
David Roberts wrote:
> Review: Needs Information
> Is it possible for an end-user who runs from a *setup.exe binary installer on a physical MS WinXP system to test this?
>
> (Since I don't run from a branch it would seem that simply merging the changes is not an option? Would it be valid to 'bzr init' the installed directory tree and then switch to John's branch to update or merge the changes?)
>
Well, you could use the setup.exe installer to download the latest
source, and then install a few things manually (python being the big one.)
Or you could wait about 1-2 days when 2.1.0b3 will be released :)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkr
6mUAmgILGGHqEVn
=UH5U
-----END PGP SIGNATURE-----
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-11-04 09:52:44 +0000 |
3 | +++ NEWS 2009-11-04 22:45:22 +0000 |
4 | @@ -29,6 +29,15 @@ |
5 | allow those because XML store silently translate it anyway. (The parser |
6 | auto-translates \r\n => \n in ways that are hard for us to catch.) |
7 | |
8 | +* On Windows, do glob expansion at the command-line level (as is usually |
9 | + done in bash, etc.) This means that *all* commands get glob expansion |
10 | + (bzr status, bzr add, bzr mv, etc). It uses a custom command line |
11 | + parser, which allows us to know if a given section was quoted. It means |
12 | + you can now do ``bzr ignore "*.py"``. It also means that single-quotes |
13 | + are now treated as quoted ``bzr ignore '*.py'``. |
14 | + (John Arbash Meinel, #425510, #426410, #194450) |
15 | + |
16 | + |
17 | Improvements |
18 | ************ |
19 | |
20 | |
21 | === modified file 'bzrlib/builtins.py' |
22 | --- bzrlib/builtins.py 2009-11-03 20:24:25 +0000 |
23 | +++ bzrlib/builtins.py 2009-11-04 22:45:22 +0000 |
24 | @@ -655,7 +655,6 @@ |
25 | if base_tree: |
26 | base_tree.lock_read() |
27 | try: |
28 | - file_list = self._maybe_expand_globs(file_list) |
29 | tree, file_list = tree_files_for_add(file_list) |
30 | added, ignored = tree.smart_add(file_list, not |
31 | no_recurse, action=action, save=not dry_run) |
32 | |
33 | === modified file 'bzrlib/commands.py' |
34 | --- bzrlib/commands.py 2009-10-14 20:02:28 +0000 |
35 | +++ bzrlib/commands.py 2009-11-04 22:45:22 +0000 |
36 | @@ -56,6 +56,7 @@ |
37 | from bzrlib.symbol_versioning import ( |
38 | deprecated_function, |
39 | deprecated_in, |
40 | + deprecated_method, |
41 | suppress_deprecation_warnings, |
42 | ) |
43 | |
44 | @@ -383,18 +384,18 @@ |
45 | # List of standard options directly supported |
46 | self.supported_std_options = [] |
47 | |
48 | + @deprecated_method(deprecated_in((2, 1, 0))) |
49 | def _maybe_expand_globs(self, file_list): |
50 | """Glob expand file_list if the platform does not do that itself. |
51 | |
52 | + Not used anymore, now that the bzr command-line parser globs on |
53 | + Windows. |
54 | + |
55 | :return: A possibly empty list of unicode paths. |
56 | |
57 | Introduced in bzrlib 0.18. |
58 | """ |
59 | - if not file_list: |
60 | - file_list = [] |
61 | - if sys.platform == 'win32': |
62 | - file_list = win32utils.glob_expand(file_list) |
63 | - return list(file_list) |
64 | + return file_list |
65 | |
66 | def _usage(self): |
67 | """Return single-line grammar for this command. |
68 | |
69 | === modified file 'bzrlib/tests/test_win32utils.py' |
70 | --- bzrlib/tests/test_win32utils.py 2009-07-03 14:26:34 +0000 |
71 | +++ bzrlib/tests/test_win32utils.py 2009-11-04 22:45:22 +0000 |
72 | @@ -17,7 +17,11 @@ |
73 | import os |
74 | import sys |
75 | |
76 | -from bzrlib import osutils |
77 | +from bzrlib import ( |
78 | + osutils, |
79 | + tests, |
80 | + win32utils, |
81 | + ) |
82 | from bzrlib.tests import ( |
83 | Feature, |
84 | TestCase, |
85 | @@ -26,7 +30,6 @@ |
86 | UnicodeFilenameFeature, |
87 | ) |
88 | from bzrlib.win32utils import glob_expand, get_app_path |
89 | -from bzrlib import win32utils |
90 | |
91 | |
92 | # Features |
93 | @@ -261,3 +264,90 @@ |
94 | os.makedirs(u'\u1234\\.bzr') |
95 | path = osutils.abspath(u'\u1234\\.bzr') |
96 | win32utils.set_file_attr_hidden(path) |
97 | + |
98 | + |
99 | + |
100 | +class TestUnicodeShlex(tests.TestCase): |
101 | + |
102 | + def assertAsTokens(self, expected, line): |
103 | + s = win32utils.UnicodeShlex(line) |
104 | + self.assertEqual(expected, list(s)) |
105 | + |
106 | + def test_simple(self): |
107 | + self.assertAsTokens([(False, u'foo'), (False, u'bar'), (False, u'baz')], |
108 | + u'foo bar baz') |
109 | + |
110 | + def test_ignore_multiple_spaces(self): |
111 | + self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar') |
112 | + |
113 | + def test_ignore_leading_space(self): |
114 | + self.assertAsTokens([(False, u'foo'), (False, u'bar')], u' foo bar') |
115 | + |
116 | + def test_ignore_trailing_space(self): |
117 | + self.assertAsTokens([(False, u'foo'), (False, u'bar')], u'foo bar ') |
118 | + |
119 | + def test_posix_quotations(self): |
120 | + self.assertAsTokens([(True, u'foo bar')], u'"foo bar"') |
121 | + self.assertAsTokens([(True, u'foo bar')], u"'foo bar'") |
122 | + self.assertAsTokens([(True, u'foo bar')], u"'fo''o b''ar'") |
123 | + self.assertAsTokens([(True, u'foo bar')], u'"fo""o b""ar"') |
124 | + self.assertAsTokens([(True, u'foo bar')], u'"fo"\'o b\'"ar"') |
125 | + |
126 | + def test_nested_quotations(self): |
127 | + self.assertAsTokens([(True, u'foo"" bar')], u"'foo\"\" bar'") |
128 | + self.assertAsTokens([(True, u'foo\'\' bar')], u"\"foo'' bar\"") |
129 | + |
130 | + def test_empty_result(self): |
131 | + self.assertAsTokens([], u'') |
132 | + self.assertAsTokens([], u' ') |
133 | + |
134 | + def test_quoted_empty(self): |
135 | + self.assertAsTokens([(True, '')], u'""') |
136 | + self.assertAsTokens([(True, '')], u"''") |
137 | + |
138 | + def test_unicode_chars(self): |
139 | + self.assertAsTokens([(False, u'f\xb5\xee'), (False, u'\u1234\u3456')], |
140 | + u'f\xb5\xee \u1234\u3456') |
141 | + |
142 | + def test_newline_in_quoted_section(self): |
143 | + self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u'"foo\nbar\nbaz\n"') |
144 | + self.assertAsTokens([(True, u'foo\nbar\nbaz\n')], u"'foo\nbar\nbaz\n'") |
145 | + |
146 | + def test_escape_chars(self): |
147 | + self.assertAsTokens([(False, u'foo\\bar')], u'foo\\bar') |
148 | + |
149 | + def test_escape_quote(self): |
150 | + self.assertAsTokens([(True, u'foo"bar')], u'"foo\\"bar"') |
151 | + self.assertAsTokens([(True, u'foo\\"bar')], u"'foo\\\"bar'") |
152 | + |
153 | + def test_double_escape(self): |
154 | + self.assertAsTokens([(True, u'foo\\bar')], u'"foo\\\\bar"') |
155 | + self.assertAsTokens([(True, u'foo\\\\bar')], u"'foo\\\\bar'") |
156 | + self.assertAsTokens([(False, u'foo\\\\bar')], u"foo\\\\bar") |
157 | + |
158 | + |
159 | +class Test_CommandLineToArgv(tests.TestCaseInTempDir): |
160 | + |
161 | + def assertCommandLine(self, expected, line): |
162 | + self.assertEqual(expected, win32utils._command_line_to_argv(line)) |
163 | + |
164 | + def test_glob_paths(self): |
165 | + self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h']) |
166 | + self.assertCommandLine([u'a/b.c', u'a/c.c'], 'a/*.c') |
167 | + self.build_tree(['b/', 'b/b.c', 'b/d.c', 'b/d.h']) |
168 | + self.assertCommandLine([u'a/b.c', u'b/b.c'], '*/b.c') |
169 | + self.assertCommandLine([u'a/b.c', u'a/c.c', u'b/b.c', u'b/d.c'], |
170 | + '*/*.c') |
171 | + # Bash style, just pass through the argument if nothing matches |
172 | + self.assertCommandLine([u'*/*.qqq'], '*/*.qqq') |
173 | + |
174 | + def test_quoted_globs(self): |
175 | + self.build_tree(['a/', 'a/b.c', 'a/c.c', 'a/c.h']) |
176 | + self.assertCommandLine([u'a/*.c'], '"a/*.c"') |
177 | + self.assertCommandLine([u'a/*.c'], "'a/*.c'") |
178 | + |
179 | + def test_slashes_changed(self): |
180 | + self.assertCommandLine([u'a/*.c'], '"a\\*.c"') |
181 | + # Expands the glob, but nothing matches |
182 | + self.assertCommandLine([u'a/*.c'], 'a\\*.c') |
183 | + self.assertCommandLine([u'a/foo.c'], 'a\\foo.c') |
184 | |
185 | === modified file 'bzrlib/win32utils.py' |
186 | --- bzrlib/win32utils.py 2009-07-08 14:37:25 +0000 |
187 | +++ bzrlib/win32utils.py 2009-11-04 22:45:22 +0000 |
188 | @@ -19,8 +19,12 @@ |
189 | Only one dependency: ctypes should be installed. |
190 | """ |
191 | |
192 | +import glob |
193 | import os |
194 | +import re |
195 | +import shlex |
196 | import struct |
197 | +import StringIO |
198 | import sys |
199 | |
200 | |
201 | @@ -422,6 +426,26 @@ |
202 | |
203 | |
204 | |
205 | +def glob_one(possible_glob): |
206 | + """Same as glob.glob(). |
207 | + |
208 | + work around bugs in glob.glob() |
209 | + - Python bug #1001604 ("glob doesn't return unicode with ...") |
210 | + - failing expansion for */* with non-iso-8859-* chars |
211 | + """ |
212 | + corrected_glob, corrected = _ensure_with_dir(possible_glob) |
213 | + glob_files = glob.glob(corrected_glob) |
214 | + |
215 | + if not glob_files: |
216 | + # special case to let the normal code path handle |
217 | + # files that do not exist, etc. |
218 | + glob_files = [possible_glob] |
219 | + elif corrected: |
220 | + glob_files = [_undo_ensure_with_dir(elem, corrected) |
221 | + for elem in glob_files] |
222 | + return [elem.replace(u'\\', u'/') for elem in glob_files] |
223 | + |
224 | + |
225 | def glob_expand(file_list): |
226 | """Replacement for glob expansion by the shell. |
227 | |
228 | @@ -435,25 +459,10 @@ |
229 | """ |
230 | if not file_list: |
231 | return [] |
232 | - import glob |
233 | expanded_file_list = [] |
234 | for possible_glob in file_list: |
235 | - # work around bugs in glob.glob() |
236 | - # - Python bug #1001604 ("glob doesn't return unicode with ...") |
237 | - # - failing expansion for */* with non-iso-8859-* chars |
238 | - possible_glob, corrected = _ensure_with_dir(possible_glob) |
239 | - glob_files = glob.glob(possible_glob) |
240 | - |
241 | - if glob_files == []: |
242 | - # special case to let the normal code path handle |
243 | - # files that do not exists |
244 | - expanded_file_list.append( |
245 | - _undo_ensure_with_dir(possible_glob, corrected)) |
246 | - else: |
247 | - glob_files = [_undo_ensure_with_dir(elem, corrected) for elem in glob_files] |
248 | - expanded_file_list += glob_files |
249 | - |
250 | - return [elem.replace(u'\\', u'/') for elem in expanded_file_list] |
251 | + expanded_file_list.extend(glob_one(possible_glob)) |
252 | + return expanded_file_list |
253 | |
254 | |
255 | def get_app_path(appname): |
256 | @@ -511,6 +520,124 @@ |
257 | trace.mutter('Unable to set hidden attribute on %r: %s', path, e) |
258 | |
259 | |
260 | + |
261 | +class UnicodeShlex(object): |
262 | + """This is a very simplified version of shlex.shlex. |
263 | + |
264 | + The main change is that it supports non-ascii input streams. The internal |
265 | + structure is quite simplified relative to shlex.shlex, since we aren't |
266 | + trying to handle multiple input streams, etc. In fact, we don't use a |
267 | + file-like api either. |
268 | + """ |
269 | + |
270 | + def __init__(self, uni_string): |
271 | + self._input = uni_string |
272 | + self._input_iter = iter(self._input) |
273 | + self._whitespace_match = re.compile(u'\s').match |
274 | + self._word_match = re.compile(u'\S').match |
275 | + self._quote_chars = u'\'"' |
276 | + # self._quote_match = re.compile(u'[\'"]').match |
277 | + self._escape_match = lambda x: None # Never matches |
278 | + self._escape = '\\' |
279 | + # State can be |
280 | + # ' ' - after whitespace, starting a new token |
281 | + # 'a' - after text, currently working on a token |
282 | + # '"' - after ", currently in a "-delimited quoted section |
283 | + # "'" - after ', currently in a '-delimited quotod section |
284 | + # "\" - after '\', checking the next char |
285 | + self._state = ' ' |
286 | + self._token = [] # Current token being parsed |
287 | + |
288 | + def _get_token(self): |
289 | + # Were there quote chars as part of this token? |
290 | + quoted = False |
291 | + quoted_state = None |
292 | + for nextchar in self._input_iter: |
293 | + if self._state == ' ': |
294 | + if self._whitespace_match(nextchar): |
295 | + # if self._token: return token |
296 | + continue |
297 | + elif nextchar in self._quote_chars: |
298 | + self._state = nextchar # quoted state |
299 | + elif self._word_match(nextchar): |
300 | + self._token.append(nextchar) |
301 | + self._state = 'a' |
302 | + else: |
303 | + raise AssertionError('wtttf?') |
304 | + elif self._state in self._quote_chars: |
305 | + quoted = True |
306 | + if nextchar == self._state: # End of quote |
307 | + self._state = 'a' # posix allows 'foo'bar to translate to |
308 | + # foobar |
309 | + elif self._state == '"' and nextchar == self._escape: |
310 | + quoted_state = self._state |
311 | + self._state = nextchar |
312 | + else: |
313 | + self._token.append(nextchar) |
314 | + elif self._state == self._escape: |
315 | + if nextchar == '\\': |
316 | + self._token.append('\\') |
317 | + elif nextchar == '"': |
318 | + self._token.append(nextchar) |
319 | + else: |
320 | + self._token.append('\\' + nextchar) |
321 | + self._state = quoted_state |
322 | + elif self._state == 'a': |
323 | + if self._whitespace_match(nextchar): |
324 | + if self._token: |
325 | + break # emit this token |
326 | + else: |
327 | + continue # no token to emit |
328 | + elif nextchar in self._quote_chars: |
329 | + # Start a new quoted section |
330 | + self._state = nextchar |
331 | + # escape? |
332 | + elif (self._word_match(nextchar) |
333 | + or nextchar in self._quote_chars |
334 | + # or whitespace_split? |
335 | + ): |
336 | + self._token.append(nextchar) |
337 | + else: |
338 | + raise AssertionError('state == "a", char: %r' |
339 | + % (nextchar,)) |
340 | + else: |
341 | + raise AssertionError('unknown state: %r' % (self._state,)) |
342 | + result = ''.join(self._token) |
343 | + self._token = [] |
344 | + if not quoted and result == '': |
345 | + result = None |
346 | + return quoted, result |
347 | + |
348 | + def __iter__(self): |
349 | + return self |
350 | + |
351 | + def next(self): |
352 | + quoted, token = self._get_token() |
353 | + if token is None: |
354 | + raise StopIteration |
355 | + return quoted, token |
356 | + |
357 | + |
358 | +def _command_line_to_argv(command_line): |
359 | + """Convert a Unicode command line into a set of argv arguments. |
360 | + |
361 | + This does wildcard expansion, etc. It is intended to make wildcards act |
362 | + closer to how they work in posix shells, versus how they work by default on |
363 | + Windows. |
364 | + """ |
365 | + s = UnicodeShlex(command_line) |
366 | + # Now that we've split the content, expand globs |
367 | + # TODO: Use 'globbing' instead of 'glob.glob', this gives us stuff like |
368 | + # '**/' style globs |
369 | + args = [] |
370 | + for is_quoted, arg in s: |
371 | + if is_quoted or not glob.has_magic(arg): |
372 | + args.append(arg.replace(u'\\', u'/')) |
373 | + else: |
374 | + args.extend(glob_one(arg)) |
375 | + return args |
376 | + |
377 | + |
378 | if has_ctypes and winver != 'Windows 98': |
379 | def get_unicode_argv(): |
380 | LPCWSTR = ctypes.c_wchar_p |
381 | @@ -520,21 +647,19 @@ |
382 | GetCommandLine = prototype(("GetCommandLineW", |
383 | ctypes.windll.kernel32)) |
384 | prototype = ctypes.WINFUNCTYPE(POINTER(LPCWSTR), LPCWSTR, POINTER(INT)) |
385 | - CommandLineToArgv = prototype(("CommandLineToArgvW", |
386 | - ctypes.windll.shell32)) |
387 | - c = INT(0) |
388 | - pargv = CommandLineToArgv(GetCommandLine(), ctypes.byref(c)) |
389 | + command_line = GetCommandLine() |
390 | # Skip the first argument, since we only care about parameters |
391 | - argv = [pargv[i] for i in range(1, c.value)] |
392 | + argv = _command_line_to_argv(GetCommandLine())[1:] |
393 | if getattr(sys, 'frozen', None) is None: |
394 | # Invoked via 'python.exe' which takes the form: |
395 | # python.exe [PYTHON_OPTIONS] C:\Path\bzr [BZR_OPTIONS] |
396 | # we need to get only BZR_OPTIONS part, |
397 | - # so let's using sys.argv[1:] as reference to get the tail |
398 | - # of unicode argv |
399 | - tail_len = len(sys.argv[1:]) |
400 | - ix = len(argv) - tail_len |
401 | - argv = argv[ix:] |
402 | + # We already removed 'python.exe' so we remove everything up to and |
403 | + # including the first non-option ('-') argument. |
404 | + for idx in xrange(len(argv)): |
405 | + if argv[idx][:1] != '-': |
406 | + break |
407 | + argv = argv[idx+1:] |
408 | return argv |
409 | else: |
410 | get_unicode_argv = None |
We've had a lot of proposals that sort of piece-meal implement support for wildcards on Windows. However, a while back I realized we could just implement global wildcard expansion, because we have access to the raw Unicode string that was used to start this process.
At the time Alexander was skeptical because we couldn't just use 'shlex.shlex' because it didn't handle unicode, and it always removed '\' as an escape character. Which doesn't work well if you supply "bzr status C:\foo\bar\baz".
However, I'm happy to note that shlex.shlex is actually not that complex if you get rid of all of its extra flags, etc. It is a rather simple state parser. I certainly don't think we need the ability to support injecting extra command lines into the middle of parsing, or anything else crazy.
I think I have reasonable tests for the actual parsing functionality, and I've tested things manually to make sure it works.
This means that all bzr commands will now get filename globbing, akin to 'bash' style globbing. So you can do: "bzr st *.h" or "bzr commit *.c", or "bzr mv *.h foo/"
It does the bash behavior that if "*.h" doesn't match anything it uses a literal '*.h'. We *could* error on those circumstances, though this seemed fine.
It also means that we get "bzr ignore '*.h'" and "bzr commit -m 'single quoted strings are supported.'".
This is a potential point of breakage for Windows users, so I'd like to get more people testing it than me. (I still call 'win32' bzr from a cygwin shell, so I rarely will see the glob support actually activate. And it probably will break 'bzr ignore "*.h"' for me, because bash will strip the "" and then Win32 won't see the "" to skip expanding the glob. However, I rarely do that, so I'll survive.)