Merge lp:~gz/bzr-grep/preformat_output into lp:bzr-grep
| Status: | Merged |
|---|---|
| Merged at revision: | 137 |
| Proposed branch: | lp:~gz/bzr-grep/preformat_output |
| Merge into: | lp:bzr-grep |
| Diff against target: | 781 lines (+280/-298), 4 files modified: NEWS (+6/-0), __init__.py (+7/-10), grep.py (+180/-284), test_grep.py (+87/-4) |
| To merge this branch: | `bzr merge lp:~gz/bzr-grep/preformat_output` |
| Related bugs: | |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Parth Malwankar | Needs Fixing | | |
Review via email:
Commit message
Description of the change
Move a lot of code around, aimed at simplifying and reducing the duplication of pattern search loops so it becomes practical to add another optimisation there.
The main changes are moving the formatting logic out of the inner grep function and into its own object (which is kind of ugly currently, but can probably be improved), and then putting the cache there as well.
By and large, this branch aims not to change behaviour, with a few exceptions:
* Using both a fixed string and the lower case flag now uses a regular expression rather than lower casing the input and the pattern.
* Some colour formatting details.
* A couple of other minor bug fixes.
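That -Fi change can be illustrated in isolation (this is a standalone sketch, not the plugin's code): escaping the fixed string and compiling with `re.IGNORECASE` gives case-insensitive matching without lowercasing the entire text, and without regex metacharacters in the pattern being misinterpreted.

```python
import re

pattern = "a.b"   # user's fixed string; "." is a literal here, not "any char"
text = "count A.B here, not axb"

# -Fi via regexp: escape the metacharacters, then match ignoring case
rx = re.compile(re.escape(pattern), re.IGNORECASE)
assert rx.search(text).group(0) == "A.B"

# Compiling the raw string instead would wrongly treat "." as a wildcard
assert re.search(pattern, text).group(0) == "axb"
```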
It's likely this branch changes the timings of certain operations, though I tried to keep the current optimisations as far as possible. I'm particularly interested in differences in before and after profiles for:
* The -Fi case mentioned above
* Huge numbers of matching lines with coloured formatting
* Matches in a file that doesn't change across a large number of revisions
For future work, it'd be great to make these functions less tightly coupled so it's actually possible to unit test them, rather than just relying on (slow) functional tests.
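The shape of the change is roughly this (a minimal sketch with made-up names, not the plugin's actual classes): make all the formatting decisions once up front, then hand the match loop a cheap per-file writer function so the inner loop does almost no conditional work.

```python
import io

class Outputter(object):
    """Precompute the output format once per run; per-match work is
    just a dict interpolation and a write."""

    def __init__(self, line_numbers=False, eol="\n"):
        parts = ["%(path)s"]
        if line_numbers:
            parts.append(":%(lineno)d")
        parts.append(":%(line)s")
        self._fmt = "".join(parts) + eol

    def get_writer(self, outf, path):
        # Returned closure is what the search loop calls per match
        fmt = self._fmt
        def writeline(**kwargs):
            kwargs["path"] = path
            outf.write(fmt % kwargs)
        return writeline

buf = io.StringIO()
writeline = Outputter(line_numbers=True).get_writer(buf, "grep.py")
writeline(lineno=3, line="import re")
assert buf.getvalue() == "grep.py:3:import re\n"
```

The real `_Outputter` in the preview diff also folds in colour handling and the revno result cache, but the closure-returning shape is the same.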

Martin Packman (gz) wrote:
> File "/home/
> write(start + per_line % kwargs)
> File "/storage/
> write
> self.wrapped_
> File "/usr/lib/
> data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78:
> ordinal not in range(128)
Ha, okay, I can reproduce this. Hadn't realised that outf was a wrapped object from bzrlib, so the dodgy line.decode(
Could see there were still issues with str/unicode in the plugin, but was hoping to put them off till later. I might paper over this one as well for the moment, but we need tests here.
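For reference, the failure mode above is the classic bytes-versus-text mixup: raw file bytes reach an encoding-wrapped output stream and get implicitly decoded as ASCII. A minimal reconstruction (the plugin ran on Python 2, so the exact mechanics differ; the byte value just mirrors the 0xfc from the traceback):

```python
line = b"gr\xfcn\n"   # bytes as read from disk, actually latin-1

# What an ascii-wrapped stream effectively does with high-bit bytes:
try:
    line.decode("ascii")
    raised = False
except UnicodeDecodeError as err:
    raised = True
    assert "ordinal not in range(128)" in str(err)
assert raised

# Guessing a file encoding and decoding with 'replace' keeps output
# flowing even when the guess is wrong, at the cost of mangled chars:
text = line.decode("latin-1", "replace")
assert text == u"gr\xfcn\n"
```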
> Please add a NEWS entry for the change. I would add it but I think you will be
> able to better describe it :)
Will do.
> On a somewhat related note, I was thinking of using trunk as stable and having
> a 'development' line (I have added lp:bzr-grep/development) as users would
> probably be pulling from lp:bzr-grep. lp:bzr-grep/development can periodically
> be merged into trunk. I am also ok with any other approach. Feel free to
> suggest if you have any other ideas.
Hm, I'd prefer if you went with a PQM style thing, where you play the PQM. That is, all changes get done in their own development branches and merged, not pushed, to trunk only when in a complete form. That means trunk remains stable, and each step is usable (complete with NEWS and tests and so on), rather than being done in bits.
- 156. By Martin Packman: Fix and test bytes/unicode issue but there's more to do in this area
- 157. By Martin Packman: Add NEWS entry for output formatter and -Fi changes

Parth Malwankar (parthm) wrote:
> Hm, I'd prefer if you went with a PQM style thing, where you play the PQM.
> That is, and all changes get done in their own development branches and
> merged, not pushed, to trunk only when in a complete form. That means trunk
> remains stable, and each step is usable (complete with NEWS and tests and so
> on), rather than being done in bits.
Works for me. I have removed the development branch.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2010-05-23 07:59:31 +0000 |
3 | +++ NEWS 2010-06-07 20:35:32 +0000 |
4 | @@ -1,3 +1,9 @@ |
5 | +bzr-grep 0.4.0-dev |
6 | +================================== |
7 | +* Add seperate output formatter to reduce duplication of search loops, |
8 | + additionally make -Fi use regexp rather than lowercasing pattern and |
9 | + entirety of text for the same reason. (Martin [gz]) |
10 | + |
11 | bzr-grep 0.3.0-final - 23-May-2010 |
12 | ================================== |
13 | * Support for --color option (POSIX only). (Parth Malwankar, #571694) |
14 | |
15 | === modified file '__init__.py' |
16 | --- __init__.py 2010-05-23 05:10:32 +0000 |
17 | +++ __init__.py 2010-06-07 20:35:32 +0000 |
18 | @@ -188,9 +188,14 @@ |
19 | if null: |
20 | eol_marker = '\0' |
21 | |
22 | - # if the pattern isalnum, implicitly switch to fixed_string for faster grep |
23 | - if grep.is_fixed_string(pattern): |
24 | + if not ignore_case and grep.is_fixed_string(pattern): |
25 | + # if the pattern isalnum, implicitly use to -F for faster grep |
26 | fixed_string = True |
27 | + elif ignore_case and fixed_string: |
28 | + # GZ 2010-06-02: Fall back to regexp rather than lowercasing |
29 | + # pattern and text which will cause pain later |
30 | + fixed_string = False |
31 | + pattern = re.escape(pattern) |
32 | |
33 | patternc = None |
34 | re_flags = 0 |
35 | @@ -207,13 +212,6 @@ |
36 | elif color == 'auto': |
37 | show_color = allow_color() |
38 | |
39 | - sub_patternc = None |
40 | - if show_color: |
41 | - sub_pattern = '(' + pattern + ')' # make pattern capturing |
42 | - # sub_patternc is used for color display even for fixed_string |
43 | - # when ignore_case is set |
44 | - sub_patternc = grep.compile_pattern(sub_pattern, re_flags) |
45 | - |
46 | GrepOptions.verbose = verbose |
47 | GrepOptions.ignore_case = ignore_case |
48 | GrepOptions.no_recursive = no_recursive |
49 | @@ -234,7 +232,6 @@ |
50 | GrepOptions.eol_marker = eol_marker |
51 | GrepOptions.print_revno = print_revno |
52 | GrepOptions.patternc = patternc |
53 | - GrepOptions.sub_patternc = sub_patternc |
54 | GrepOptions.recursive = not no_recursive |
55 | GrepOptions.fixed_string = fixed_string |
56 | GrepOptions.outf = self.outf |
57 | |
58 | === modified file 'grep.py' |
59 | --- grep.py 2010-05-22 15:11:00 +0000 |
60 | +++ grep.py 2010-06-07 20:35:32 +0000 |
61 | @@ -16,12 +16,8 @@ |
62 | |
63 | from bzrlib.lazy_import import lazy_import |
64 | lazy_import(globals(), """ |
65 | -import codecs |
66 | -import cStringIO |
67 | from fnmatch import fnmatch |
68 | -import os |
69 | import re |
70 | -import string |
71 | |
72 | from termcolor import color_string, re_color_string, FG |
73 | |
74 | @@ -133,13 +129,6 @@ |
75 | bzrdir.BzrDir.open_containing_tree_or_branch('.') |
76 | branch.lock_read() |
77 | try: |
78 | - # res_cache is used to cache results for dir grep based on fid. |
79 | - # If the fid is does not change between results, it means that |
80 | - # the result will be the same apart from revno. In such a case |
81 | - # we avoid getting file chunks from repo and grepping. The result |
82 | - # is just printed by replacing old revno with new one. |
83 | - res_cache = {} |
84 | - |
85 | start_rev = opts.revision[0] |
86 | start_revid = start_rev.as_revision_id(branch) |
87 | if start_revid == None: |
88 | @@ -179,6 +168,9 @@ |
89 | start_rev_tuple = (start_revid, start_revno, 0) |
90 | given_revs = [start_rev_tuple] |
91 | |
92 | + # GZ 2010-06-02: Shouldn't be smuggling this on opts, but easy for now |
93 | + opts.outputter = _Outputter(opts, use_cache=True) |
94 | + |
95 | for revid, revno, merge_depth in given_revs: |
96 | if opts.levels == 1 and merge_depth != 0: |
97 | # with level=1 show only top level |
98 | @@ -195,8 +187,7 @@ |
99 | |
100 | if osutils.isdir(path): |
101 | path_prefix = path |
102 | - res_cache = dir_grep(tree, path, relpath, opts, |
103 | - revno, path_prefix, res_cache) |
104 | + dir_grep(tree, path, relpath, opts, revno, path_prefix) |
105 | else: |
106 | versioned_file_grep(tree, id, '.', path, opts, revno) |
107 | finally: |
108 | @@ -213,6 +204,9 @@ |
109 | 'To search for specific revision in history use the -r option.') |
110 | raise errors.BzrCommandError(msg) |
111 | |
112 | + # GZ 2010-06-02: Shouldn't be smuggling this on opts, but easy for now |
113 | + opts.outputter = _Outputter(opts) |
114 | + |
115 | tree.lock_read() |
116 | try: |
117 | for path in opts.path_list: |
118 | @@ -220,7 +214,7 @@ |
119 | path_prefix = path |
120 | dir_grep(tree, path, relpath, opts, revno, path_prefix) |
121 | else: |
122 | - _file_grep(open(path).read(), '.', path, opts, revno) |
123 | + _file_grep(open(path).read(), path, opts, revno) |
124 | finally: |
125 | tree.unlock() |
126 | |
127 | @@ -233,11 +227,7 @@ |
128 | return False |
129 | |
130 | |
131 | -def dir_grep(tree, path, relpath, opts, revno, path_prefix, res_cache={}): |
132 | - _revno_pattern = re.compile("\~[0-9.]+:") |
133 | - _revno_pattern_list_only = re.compile("\~[0-9.]+") |
134 | - dir_res = {} |
135 | - |
136 | +def dir_grep(tree, path, relpath, opts, revno, path_prefix): |
137 | # setup relpath to open files relative to cwd |
138 | rpath = relpath |
139 | if relpath: |
140 | @@ -251,7 +241,10 @@ |
141 | |
142 | to_grep = [] |
143 | to_grep_append = to_grep.append |
144 | - outf_write = opts.outf.write |
145 | + # GZ 2010-06-05: The cache dict used to be recycled every call to dir_grep |
146 | + # and hits manually refilled. Could do this again if it was |
147 | + # for a good reason, otherwise cache might want purging. |
148 | + outputter = opts.outputter |
149 | for fp, fc, fkind, fid, entry in tree.list_files(include_root=False, |
150 | from_dir=from_dir, recursive=opts.recursive): |
151 | |
152 | @@ -263,25 +256,12 @@ |
153 | # If old result is valid, print results immediately. |
154 | # Otherwise, add file info to to_grep so that the |
155 | # loop later will get chunks and grep them |
156 | - file_rev = tree.inventory[fid].revision |
157 | - old_res = res_cache.get(file_rev) |
158 | - if old_res != None: |
159 | - res = [] |
160 | - res_append = res.append |
161 | - |
162 | - if opts.files_with_matches or opts.files_without_match: |
163 | - new_rev = '~' + revno |
164 | - else: |
165 | - new_rev = ('~%s:' % (revno,)) |
166 | - |
167 | - for line in old_res: |
168 | - if opts.files_with_matches or opts.files_without_match: |
169 | - s = _revno_pattern_list_only.sub(new_rev, line) |
170 | - else: |
171 | - s = _revno_pattern.sub(new_rev, line) |
172 | - res_append(s) |
173 | - outf_write(s) |
174 | - dir_res[file_rev] = res |
175 | + cache_id = tree.inventory[fid].revision |
176 | + if cache_id in outputter.cache: |
177 | + # GZ 2010-06-05: Not really sure caching and re-outputting |
178 | + # the old path is really the right thing, |
179 | + # but it's what the old code seemed to do |
180 | + outputter.write_cached_lines(cache_id, revno) |
181 | else: |
182 | to_grep_append((fid, (fp, fid))) |
183 | else: |
184 | @@ -293,21 +273,17 @@ |
185 | if opts.files_with_matches or opts.files_without_match: |
186 | # Optimize for wtree list-only as we don't need to read the |
187 | # entire file |
188 | - file = codecs.open(path_for_file, 'r', buffering=4096) |
189 | - _file_grep_list_only_wtree(file, rpath, fp, opts, |
190 | - path_prefix) |
191 | + file = open(path_for_file, 'r', buffering=4096) |
192 | + _file_grep_list_only_wtree(file, fp, opts, path_prefix) |
193 | else: |
194 | - file_text = codecs.open(path_for_file, 'r').read() |
195 | - _file_grep(file_text, rpath, fp, |
196 | - opts, revno, path_prefix) |
197 | + file_text = open(path_for_file, 'r').read() |
198 | + _file_grep(file_text, fp, opts, revno, path_prefix) |
199 | |
200 | if revno != None: # grep versioned files |
201 | for (path, fid), chunks in tree.iter_files_bytes(to_grep): |
202 | path = _make_display_path(relpath, path) |
203 | - res = _file_grep(chunks[0], rpath, path, opts, revno, path_prefix) |
204 | - file_rev = tree.inventory[fid].revision |
205 | - dir_res[file_rev] = res |
206 | - return dir_res |
207 | + _file_grep(chunks[0], path, opts, revno, path_prefix, |
208 | + tree.inventory[fid].revision) |
209 | |
210 | |
211 | def _make_display_path(relpath, path): |
212 | @@ -331,43 +307,32 @@ |
213 | |
214 | path = _make_display_path(relpath, path) |
215 | file_text = tree.get_file_text(id) |
216 | - _file_grep(file_text, relpath, path, opts, revno, path_prefix) |
217 | + _file_grep(file_text, path, opts, revno, path_prefix) |
218 | |
219 | |
220 | def _path_in_glob_list(path, glob_list): |
221 | - present = False |
222 | for glob in glob_list: |
223 | if fnmatch(path, glob): |
224 | - present = True |
225 | - break |
226 | - return present |
227 | - |
228 | - |
229 | -def _file_grep_list_only_wtree(file, relpath, path, opts, path_prefix=None): |
230 | + return True |
231 | + return False |
232 | + |
233 | + |
234 | +def _file_grep_list_only_wtree(file, path, opts, path_prefix=None): |
235 | # test and skip binary files |
236 | if '\x00' in file.read(1024): |
237 | if opts.verbose: |
238 | trace.warning("Binary file '%s' skipped." % path) |
239 | - return |
240 | + return |
241 | |
242 | file.seek(0) # search from beginning |
243 | |
244 | found = False |
245 | if opts.fixed_string: |
246 | pattern = opts.pattern.encode(_user_encoding, 'replace') |
247 | - if opts.fixed_string and opts.ignore_case: |
248 | - pattern = opts.pattern.lower() |
249 | - if opts.ignore_case: |
250 | - for line in file: |
251 | - line = line.lower() |
252 | - if pattern in line: |
253 | - found = True |
254 | - break |
255 | - else: # don't ignore case |
256 | - for line in file: |
257 | - if pattern in line: |
258 | - found = True |
259 | - break |
260 | + for line in file: |
261 | + if pattern in line: |
262 | + found = True |
263 | + break |
264 | else: # not fixed_string |
265 | for line in file: |
266 | if opts.patternc.search(line): |
267 | @@ -379,239 +344,170 @@ |
268 | if path_prefix and path_prefix != '.': |
269 | # user has passed a dir arg, show that as result prefix |
270 | path = osutils.pathjoin(path_prefix, path) |
271 | - path = path.encode(_terminal_encoding, 'replace') |
272 | - s = path + opts.eol_marker |
273 | - opts.outf.write(s) |
274 | - |
275 | - |
276 | -def _file_grep(file_text, relpath, path, opts, revno, path_prefix=None): |
277 | - res = [] |
278 | - res_append = res.append |
279 | - outf_write = opts.outf.write |
280 | - |
281 | - _te = _terminal_encoding |
282 | - _ue = _user_encoding |
283 | - |
284 | - pattern = opts.pattern.encode(_ue, 'replace') |
285 | - patternc = opts.patternc |
286 | - eol_marker = opts.eol_marker |
287 | - |
288 | - if opts.fixed_string and opts.ignore_case: |
289 | - pattern = opts.pattern.lower() |
290 | - |
291 | + opts.outputter.get_writer(path, None, None)() |
292 | + |
293 | + |
294 | +class _Outputter(object): |
295 | + """Precalculate formatting based on options given |
296 | + |
297 | + The idea here is to do this work only once per run, and finally return a |
298 | + function that will do the minimum amount possible for each match. |
299 | + """ |
300 | + def __init__(self, opts, use_cache=False): |
301 | + self.outf = opts.outf |
302 | + if use_cache: |
303 | + # self.cache is used to cache results for dir grep based on fid. |
304 | + # If the fid is does not change between results, it means that |
305 | + # the result will be the same apart from revno. In such a case |
306 | + # we avoid getting file chunks from repo and grepping. The result |
307 | + # is just printed by replacing old revno with new one. |
308 | + self.cache = {} |
309 | + else: |
310 | + self.cache = None |
311 | + no_line = opts.files_with_matches or opts.files_without_match |
312 | + |
313 | + if opts.show_color: |
314 | + pat = opts.pattern.encode(_user_encoding, 'replace') |
315 | + if no_line: |
316 | + self.get_writer = self._get_writer_plain |
317 | + elif opts.fixed_string: |
318 | + self._old = pat |
319 | + self._new = color_string(pat, FG.BOLD_RED) |
320 | + self.get_writer = self._get_writer_fixed_highlighted |
321 | + else: |
322 | + flags = opts.patternc.flags |
323 | + self._sub = re.compile(pat.join(("((?:",")+)")), flags).sub |
324 | + self._highlight = color_string("\\1", FG.BOLD_RED) |
325 | + self.get_writer = self._get_writer_regexp_highlighted |
326 | + path_start = FG.MAGENTA |
327 | + path_end = FG.NONE |
328 | + sep = color_string(':', FG.BOLD_CYAN) |
329 | + rev_sep = color_string('~', FG.BOLD_YELLOW) |
330 | + else: |
331 | + self.get_writer = self._get_writer_plain |
332 | + path_start = path_end = "" |
333 | + sep = ":" |
334 | + rev_sep = "~" |
335 | + |
336 | + parts = [path_start, "%(path)s"] |
337 | + if opts.print_revno: |
338 | + parts.extend([rev_sep, "%(revno)s"]) |
339 | + self._format_initial = "".join(parts) |
340 | + parts = [] |
341 | + if no_line: |
342 | + if not opts.print_revno: |
343 | + parts.append(path_end) |
344 | + else: |
345 | + if opts.line_number: |
346 | + parts.extend([sep, "%(lineno)s"]) |
347 | + parts.extend([sep, "%(line)s"]) |
348 | + parts.append(opts.eol_marker) |
349 | + self._format_perline = "".join(parts) |
350 | + |
351 | + def _get_writer_plain(self, path, revno, cache_id): |
352 | + """Get function for writing uncoloured output""" |
353 | + per_line = self._format_perline |
354 | + start = self._format_initial % {"path":path, "revno":revno} |
355 | + write = self.outf.write |
356 | + if self.cache is not None and cache_id is not None: |
357 | + result_list = [] |
358 | + self.cache[cache_id] = path, result_list |
359 | + add_to_cache = result_list.append |
360 | + def _line_cache_and_writer(**kwargs): |
361 | + """Write formatted line and cache arguments""" |
362 | + end = per_line % kwargs |
363 | + add_to_cache(end) |
364 | + write(start + end) |
365 | + return _line_cache_and_writer |
366 | + def _line_writer(**kwargs): |
367 | + """Write formatted line from arguments given by underlying opts""" |
368 | + write(start + per_line % kwargs) |
369 | + return _line_writer |
370 | + |
371 | + def write_cached_lines(self, cache_id, revno): |
372 | + """Write cached results out again for new revision""" |
373 | + cached_path, cached_matches = self.cache[cache_id] |
374 | + start = self._format_initial % {"path":cached_path, "revno":revno} |
375 | + write = self.outf.write |
376 | + for end in cached_matches: |
377 | + write(start + end) |
378 | + |
379 | + def _get_writer_regexp_highlighted(self, path, revno, cache_id): |
380 | + """Get function for writing output with regexp match highlighted""" |
381 | + _line_writer = self._get_writer_plain(path, revno, cache_id) |
382 | + sub, highlight = self._sub, self._highlight |
383 | + def _line_writer_regexp_highlighted(line, **kwargs): |
384 | + """Write formatted line with matched pattern highlighted""" |
385 | + return _line_writer(line=sub(highlight, line), **kwargs) |
386 | + return _line_writer_regexp_highlighted |
387 | + |
388 | + def _get_writer_fixed_highlighted(self, path, revno, cache_id): |
389 | + """Get function for writing output with search string highlighted""" |
390 | + _line_writer = self._get_writer_plain(path, revno, cache_id) |
391 | + old, new = self._old, self._new |
392 | + def _line_writer_fixed_highlighted(line, **kwargs): |
393 | + """Write formatted line with string searched for highlighted""" |
394 | + return _line_writer(line=line.replace(old, new), **kwargs) |
395 | + return _line_writer_fixed_highlighted |
396 | + |
397 | + |
398 | +def _file_grep(file_text, path, opts, revno, path_prefix=None, cache_id=None): |
399 | # test and skip binary files |
400 | if '\x00' in file_text[:1024]: |
401 | if opts.verbose: |
402 | trace.warning("Binary file '%s' skipped." % path) |
403 | - return res |
404 | + return |
405 | |
406 | if path_prefix and path_prefix != '.': |
407 | # user has passed a dir arg, show that as result prefix |
408 | path = osutils.pathjoin(path_prefix, path) |
409 | |
410 | - path = path.encode(_te, 'replace') |
411 | - |
412 | - if opts.show_color: |
413 | - path = color_string(path, FG.MAGENTA) |
414 | - color_sep = color_string(':', FG.BOLD_CYAN) |
415 | - color_rev_sep = color_string('~', FG.BOLD_YELLOW) |
416 | - |
417 | - # for better performance we moved formatting conditionals out |
418 | - # of the core loop. hence, the core loop is somewhat duplicated |
419 | - # for various combinations of formatting options. |
420 | + # GZ 2010-06-07: There's no actual guarentee the file contents will be in |
421 | + # the user encoding, but we have to guess something and it |
422 | + # is a reasonable default without a better mechanism. |
423 | + file_encoding = _user_encoding |
424 | + pattern = opts.pattern.encode(_user_encoding, 'replace') |
425 | + |
426 | + writeline = opts.outputter.get_writer(path, revno, cache_id) |
427 | |
428 | if opts.files_with_matches or opts.files_without_match: |
429 | # While printing files with matches we only have two case |
430 | # print file name or print file name with revno. |
431 | found = False |
432 | - if opts.print_revno: |
433 | - if opts.fixed_string: |
434 | - for line in file_text.splitlines(): |
435 | - if opts.ignore_case: |
436 | - line = line.lower() |
437 | - if pattern in line: |
438 | - found = True |
439 | - break |
440 | - else: |
441 | - for line in file_text.splitlines(): |
442 | - if patternc.search(line): |
443 | - found = True |
444 | - break |
445 | + if opts.fixed_string: |
446 | + for line in file_text.splitlines(): |
447 | + if pattern in line: |
448 | + found = True |
449 | + break |
450 | else: |
451 | - if opts.fixed_string: |
452 | - for line in file_text.splitlines(): |
453 | - if opts.ignore_case: |
454 | - line = line.lower() |
455 | - if pattern in line: |
456 | - found = True |
457 | - break |
458 | - else: |
459 | - for line in file_text.splitlines(): |
460 | - if patternc.search(line): |
461 | - found = True |
462 | - break |
463 | + search = opts.patternc.search |
464 | + for line in file_text.splitlines(): |
465 | + if search(line): |
466 | + found = True |
467 | + break |
468 | if (opts.files_with_matches and found) or \ |
469 | (opts.files_without_match and not found): |
470 | - if opts.print_revno: |
471 | - pfmt = "~%s".encode(_te, 'replace') |
472 | - if opts.show_color: |
473 | - pfmt = color_rev_sep + "%s" |
474 | - s = path + (pfmt % (revno,)) + eol_marker |
475 | - else: |
476 | - s = path + eol_marker |
477 | - res_append(s) |
478 | - outf_write(s) |
479 | - return res # return from files_with|without_matches |
480 | - |
481 | - |
482 | - if opts.print_revno and opts.line_number: |
483 | - |
484 | - pfmt = "~%s:%d:%s".encode(_te) |
485 | - if opts.show_color: |
486 | - pfmt = color_rev_sep + "%s" + color_sep + "%d" + color_sep + "%s" |
487 | - pfmt = pfmt.encode(_te) |
488 | - |
489 | - if opts.fixed_string: |
490 | - if opts.ignore_case: |
491 | - for index, line in enumerate(file_text.splitlines()): |
492 | - if pattern in line.lower(): |
493 | - line = line.decode(_te, 'replace') |
494 | - if opts.show_color: |
495 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
496 | - s = path + (pfmt % (revno, index+1, line)) + eol_marker |
497 | - res_append(s) |
498 | - outf_write(s) |
499 | - else: # don't ignore case |
500 | - found_str = color_string(pattern, FG.BOLD_RED) |
501 | - for index, line in enumerate(file_text.splitlines()): |
502 | - if pattern in line: |
503 | - line = line.decode(_te, 'replace') |
504 | - if opts.show_color == True: |
505 | - line = line.replace(pattern, found_str) |
506 | - s = path + (pfmt % (revno, index+1, line)) + eol_marker |
507 | - res_append(s) |
508 | - outf_write(s) |
509 | - else: |
510 | + writeline() |
511 | + elif opts.fixed_string: |
512 | + if opts.line_number: |
513 | for index, line in enumerate(file_text.splitlines()): |
514 | - if patternc.search(line): |
515 | - line = line.decode(_te, 'replace') |
516 | - if opts.show_color: |
517 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
518 | - s = path + (pfmt % (revno, index+1, line)) + eol_marker |
519 | - res_append(s) |
520 | - outf_write(s) |
521 | - |
522 | - elif opts.print_revno and not opts.line_number: |
523 | - |
524 | - pfmt = "~%s:%s".encode(_te, 'replace') |
525 | - if opts.show_color: |
526 | - pfmt = color_rev_sep + "%s" + color_sep + "%s" |
527 | - pfmt = pfmt.encode(_te) |
528 | - |
529 | - if opts.fixed_string: |
530 | - if opts.ignore_case: |
531 | - for line in file_text.splitlines(): |
532 | - if pattern in line.lower(): |
533 | - line = line.decode(_te, 'replace') |
534 | - if opts.show_color: |
535 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
536 | - s = path + (pfmt % (revno, line)) + eol_marker |
537 | - res_append(s) |
538 | - outf_write(s) |
539 | - else: # don't ignore case |
540 | - found_str = color_string(pattern, FG.BOLD_RED) |
541 | - for line in file_text.splitlines(): |
542 | - if pattern in line: |
543 | - line = line.decode(_te, 'replace') |
544 | - if opts.show_color == True: |
545 | - line = line.replace(pattern, found_str) |
546 | - s = path + (pfmt % (revno, line)) + eol_marker |
547 | - res_append(s) |
548 | - outf_write(s) |
549 | - |
550 | + if pattern in line: |
551 | + line = line.decode(file_encoding) |
552 | + writeline(lineno=index+1, line=line) |
553 | else: |
554 | for line in file_text.splitlines(): |
555 | - if patternc.search(line): |
556 | - line = line.decode(_te, 'replace') |
557 | - if opts.show_color: |
558 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
559 | - s = path + (pfmt % (revno, line)) + eol_marker |
560 | - res_append(s) |
561 | - outf_write(s) |
562 | - |
563 | - elif not opts.print_revno and opts.line_number: |
564 | - |
565 | - pfmt = ":%d:%s".encode(_te) |
566 | - if opts.show_color: |
567 | - pfmt = color_sep + "%d" + color_sep + "%s" |
568 | - pfmt = pfmt.encode(_te) |
569 | - |
570 | - if opts.fixed_string: |
571 | - if opts.ignore_case: |
572 | - for index, line in enumerate(file_text.splitlines()): |
573 | - if pattern in line.lower(): |
574 | - line = line.decode(_te, 'replace') |
575 | - if opts.show_color: |
576 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
577 | - s = path + (pfmt % (index+1, line)) + eol_marker |
578 | - res_append(s) |
579 | - outf_write(s) |
580 | - else: # don't ignore case |
581 | - for index, line in enumerate(file_text.splitlines()): |
582 | - found_str = color_string(pattern, FG.BOLD_RED) |
583 | - if pattern in line: |
584 | - line = line.decode(_te, 'replace') |
585 | - if opts.show_color == True: |
586 | - line = line.replace(pattern, found_str) |
587 | - s = path + (pfmt % (index+1, line)) + eol_marker |
588 | - res_append(s) |
589 | - outf_write(s) |
590 | - else: |
591 | - for index, line in enumerate(file_text.splitlines()): |
592 | - if patternc.search(line): |
593 | - line = line.decode(_te, 'replace') |
594 | - if opts.show_color: |
595 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
596 | - s = path + (pfmt % (index+1, line)) + eol_marker |
597 | - res_append(s) |
598 | - outf_write(s) |
599 | - |
600 | + if pattern in line: |
601 | + line = line.decode(file_encoding) |
602 | + writeline(line=line) |
603 | else: |
604 | - |
605 | - pfmt = ":%s".encode(_te) |
606 | - if opts.show_color: |
607 | - pfmt = color_sep + "%s" |
608 | - pfmt = pfmt.encode(_te) |
609 | - |
610 | - if opts.fixed_string: |
611 | - if opts.ignore_case: |
612 | - for line in file_text.splitlines(): |
613 | - if pattern in line.lower(): |
614 | - line = line.decode(_te, 'replace') |
615 | - if opts.show_color: |
616 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
617 | - s = path + (pfmt % (line,)) + eol_marker |
618 | - res_append(s) |
619 | - outf_write(s) |
620 | - else: # don't ignore case |
621 | - found_str = color_string(pattern, FG.BOLD_RED) |
622 | - for line in file_text.splitlines(): |
623 | - if pattern in line: |
624 | - line = line.decode(_te, 'replace') |
625 | - if opts.show_color: |
626 | - line = line.replace(pattern, found_str) |
627 | - s = path + (pfmt % (line,)) + eol_marker |
628 | - res_append(s) |
629 | - outf_write(s) |
630 | + search = opts.patternc.search |
631 | + if opts.line_number: |
632 | + for index, line in enumerate(file_text.splitlines()): |
633 | + if search(line): |
634 | + line = line.decode(file_encoding) |
635 | + writeline(lineno=index+1, line=line) |
636 | else: |
637 | for line in file_text.splitlines(): |
638 | - if patternc.search(line): |
639 | - line = line.decode(_te, 'replace') |
640 | - if opts.show_color: |
641 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
642 | - s = path + (pfmt % (line,)) + eol_marker |
643 | - res_append(s) |
644 | - outf_write(s) |
645 | - |
646 | - return res |
647 | - |
648 | + if search(line): |
649 | + line = line.decode(file_encoding) |
650 | + writeline(line=line) |
651 | |
652 | === modified file 'test_grep.py' |
653 | --- test_grep.py 2010-05-23 05:10:32 +0000 |
654 | +++ test_grep.py 2010-06-07 20:35:32 +0000 |
655 | @@ -1936,9 +1936,48 @@ |
656 | self.assertEqual(len(out.splitlines()), 2) # finds line1 and line10 |
657 | |
658 | |
659 | +class TestNonAscii(GrepTestBase): |
660 | + """Tests for non-ascii filenames and file contents""" |
661 | + |
662 | + _test_needs_features = [tests.UnicodeFilenameFeature] |
663 | + |
664 | + def test_unicode_only_file(self): |
665 | + """Test filename and contents that requires a unicode encoding""" |
666 | + tree = self.make_branch_and_tree(".") |
667 | + contents = [u"\u1234"] |
668 | + self.build_tree(contents) |
669 | + tree.add(contents) |
670 | + tree.commit("Initial commit") |
671 | + as_utf8 = u"\u1234".encode("UTF-8") |
672 | + |
673 | + # GZ 2010-06-07: Note we can't actually grep for \u1234 as the pattern |
674 | + # is mangled according to the user encoding. |
675 | + streams = self.run_bzr(["grep", "--files-with-matches", |
676 | + u"contents"], encoding="UTF-8") |
677 | + self.assertEqual(streams, (as_utf8 + "\n", "")) |
678 | + |
+        streams = self.run_bzr(["grep", "-r", "1", "--files-with-matches",
+            u"contents"], encoding="UTF-8")
+        self.assertEqual(streams, (as_utf8 + "~1\n", ""))
+
+        fileencoding = osutils.get_user_encoding()
+        as_mangled = as_utf8.decode(fileencoding, "replace").encode("UTF-8")
+
+        streams = self.run_bzr(["grep", "-n",
+            u"contents"], encoding="UTF-8")
+        self.assertEqual(streams, ("%s:1:contents of %s\n" %
+            (as_utf8, as_mangled), ""))
+
+        streams = self.run_bzr(["grep", "-n", "-r", "1",
+            u"contents"], encoding="UTF-8")
+        self.assertEqual(streams, ("%s~1:1:contents of %s\n" %
+            (as_utf8, as_mangled), ""))
+
+
 class TestColorGrep(GrepTestBase):
     """Tests for the --color option."""

+    # GZ 2010-06-05: Does this really require the feature? Nothing prints.
     _test_needs_features = [features.color_feature]

     _rev_sep = color_string('~', fg=FG.BOLD_YELLOW)
@@ -1951,6 +1990,50 @@
         self.assertEqual(out, '')
         self.assertContainsRe(err, 'Valid values for --color are', flags=TestGrep._reflags)

+    def test_ver_matching_files(self):
+        """(versioned) Search for matches or no matches only"""
+        tree = self.make_branch_and_tree(".")
+        contents = ["d/", "d/aaa", "bbb"]
+        self.build_tree(contents)
+        tree.add(contents)
+        tree.commit("Initial commit")
+
+        # GZ 2010-06-05: Maybe modify the working tree here
+
+        streams = self.run_bzr(["grep", "--color", "always", "-r", "1",
+            "--files-with-matches", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "d/aaa", self._rev_sep, "1", "\n"
+            ]), ""))
+
+        streams = self.run_bzr(["grep", "--color", "always", "-r", "1",
+            "--files-without-match", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "bbb", self._rev_sep, "1", "\n"
+            ]), ""))
+
+    def test_wtree_matching_files(self):
+        """(wtree) Search for matches or no matches only"""
+        tree = self.make_branch_and_tree(".")
+        contents = ["d/", "d/aaa", "bbb"]
+        self.build_tree(contents)
+        tree.add(contents)
+        tree.commit("Initial commit")
+
+        # GZ 2010-06-05: Maybe modify the working tree here
+
+        streams = self.run_bzr(["grep", "--color", "always",
+            "--files-with-matches", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "d/aaa", FG.NONE, "\n"
+            ]), ""))
+
+        streams = self.run_bzr(["grep", "--color", "always",
+            "--files-without-match", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "bbb", FG.NONE, "\n"
+            ]), ""))
+
     def test_ver_basic_file(self):
         """(versioned) Search for pattern in specfic file.
         """
@@ -1962,12 +2045,12 @@

         # prepare colored result
         foo = color_string('foo', fg=FG.BOLD_RED)
-        res = (color_string('file0.txt', fg=FG.MAGENTA)
+        res = (FG.MAGENTA + 'file0.txt'
             + self._rev_sep + '1' + self._sep
             + foo + ' is ' + foo + 'bar1' + '\n')
         txt_res = 'file0.txt~1:foo is foobar1\n'

-        nres = (color_string('file0.txt', fg=FG.MAGENTA)
+        nres = (FG.MAGENTA + 'file0.txt'
             + self._rev_sep + '1' + self._sep + '1' + self._sep
             + foo + ' is ' + foo + 'bar1' + '\n')

@@ -2029,10 +2112,10 @@

         # prepare colored result
         foo = color_string('foo', fg=FG.BOLD_RED)
-        res = (color_string('file0.txt', fg=FG.MAGENTA)
+        res = (FG.MAGENTA + 'file0.txt'
             + self._sep + foo + ' is ' + foo + 'bar1' + '\n')

-        nres = (color_string('file0.txt', fg=FG.MAGENTA)
+        nres = (FG.MAGENTA + 'file0.txt'
             + self._sep + '1' + self._sep
             + foo + ' is ' + foo + 'bar1' + '\n')
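An editorial note on the `as_mangled` expectation in the new encoding test above: decoding UTF-8 bytes with a user encoding that cannot represent them, using `errors="replace"`, substitutes U+FFFD for each undecodable byte, so the filename printed in the match line is mangled but the output remains valid UTF-8. This sketch is illustration only, not code from the branch; the filename bytes and the hard-coded "ascii" user encoding (the test uses `osutils.get_user_encoding()`) are assumptions.

```python
# Illustration (not from the branch) of the lossy round-trip the new
# test expects. "ascii" stands in for a user encoding that cannot
# decode UTF-8 bytes; each undecodable byte becomes U+FFFD.
as_utf8 = u"f\u00fcr".encode("UTF-8")  # hypothetical filename, b'f\xc3\xbcr'
as_mangled = as_utf8.decode("ascii", "replace").encode("UTF-8")
assert as_mangled == u"f\ufffd\ufffdr".encode("UTF-8")
```

The two bytes of the UTF-8 sequence for the umlaut each fail the ASCII decode, which is why two replacement characters appear.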
mgz, thanks for the patch. This definitely makes the code better and -Fi sounds like a good idea.

I noticed that 'bzr grep -F ff' is crashing on the mysql-server and emacs repos with this patch. In case you need a mysql-server branch in 2a format you can find one at https://code.launchpad.net/~parthm/+junk/mysql-server
[mysqls60]% bzr grep ff -F > /dev/null
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78: ordinal not in range(128)

Traceback (most recent call last):
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 911, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1109, in run_bzr
    ret = run(*run_argv)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 689, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 704, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1124, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/parthm/.bazaar/plugins/grep/__init__.py", line 241, in run
    grep.workingtree_grep(GrepOptions)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 215, in workingtree_grep
    dir_grep(tree, path, relpath, opts, revno, path_prefix)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 280, in dir_grep
    _file_grep(file_text, fp, opts, revno, path_prefix)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 497, in _file_grep
    writeline(line=line)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 425, in _line_writer
    write(start + per_line % kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/ui/text.py", line 510, in write
    self.wrapped_stream.write(to_write)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78: ordinal not in range(128)
You can report this problem to Bazaar's developers by running
    apport-bug /var/crash/bzr.1000.2010-06-07T02:48.crash
if a bug-reporting window does not automatically appear.
[mysqls60]%
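For context on the traceback: the failure mode is Python 2's implicit ASCII decode when a byte string containing non-ASCII data meets a unicode string (here in `write(start + per_line % kwargs)`, where a matched line holds the byte 0xfc). A minimal sketch of the same exception, triggered explicitly so it is reproducible outside bzr; the sample bytes are an assumption, not the actual file content:

```python
# 0xfc is u-umlaut in Latin-1 and is not valid ASCII. Python 2 performs
# this decode implicitly when concatenating such a byte string with a
# unicode string; here we trigger it explicitly.
try:
    b"f\xfcr".decode("ascii")
except UnicodeDecodeError as e:
    # codec name and offending byte match the reported crash
    assert e.encoding == "ascii"
    assert e.object[e.start:e.end] == b"\xfc"
```

This suggests the fix is to decode (or replace-mangle) file text before it is interpolated into the unicode format string, rather than letting the codecs writer coerce it.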
Please add a NEWS entry for the change. I would add it but I think you will be able to better describe it :)
On a somewhat related note, I was thinking of using trunk as stable and having a 'development' line (I have added lp:bzr-grep/development), as users would probably be pulling from lp:bzr-grep; lp:bzr-grep/development can periodically be merged into trunk. I am also OK with any other approach, so feel free to suggest alternatives.

If lp:bzr-grep/development sounds OK, could you propose this (and the other) branch against lp:bzr-grep/development? I should probably have done something like this earlier. Sorry about this.