Merge lp:~gz/bzr-grep/preformat_output into lp:bzr-grep

Proposed by Martin Packman
Status: Merged
Merged at revision: 137
Proposed branch: lp:~gz/bzr-grep/preformat_output
Merge into: lp:bzr-grep
Diff against target: 781 lines (+280/-298)
4 files modified
NEWS (+6/-0)
__init__.py (+7/-10)
grep.py (+180/-284)
test_grep.py (+87/-4)
To merge this branch: bzr merge lp:~gz/bzr-grep/preformat_output
Reviewer Review Type Date Requested Status
Parth Malwankar Needs Fixing
Review via email: mp+26873@code.launchpad.net

Description of the change

This moves a lot of code around, aiming to simplify and reduce the duplication of the pattern search loops so that it becomes practical to add another optimisation there.

The main changes are moving the formatting logic out of the inner grep function and into its own object (which is kind of ugly currently, but can probably be improved), and then putting the cache there as well.
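The shape of that formatter object can be sketched as follows. This is a simplified, illustrative version (the names `Outputter`/`get_writer` mirror the diff below, but the real `_Outputter` also handles colour and caching): the format string is computed once per run, and the inner match loop only pays for one interpolation and one write per match.

```python
import io

class Outputter(object):
    """Precompute the output format once; return a cheap per-file writer."""

    def __init__(self, out, line_numbers=False, eol="\n"):
        parts = ["%(path)s"]
        if line_numbers:
            parts.append(":%(lineno)d")
        parts.append(":%(line)s")
        self._fmt = "".join(parts) + eol
        self._out = out

    def get_writer(self, path):
        # Bind path, format, and write up front so the per-match
        # closure does the minimum amount of work possible.
        fmt = self._fmt
        write = self._out.write
        def writeline(**kwargs):
            kwargs["path"] = path
            write(fmt % kwargs)
        return writeline

out = io.StringIO()
writeline = Outputter(out, line_numbers=True).get_writer("grep.py")
writeline(lineno=3, line="match here")
# out.getvalue() == "grep.py:3:match here\n"
```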

By and large, this branch aims not to change behaviour, with a few exceptions:
* Using both a fixed string (-F) and the ignore-case flag (-i) now falls back to a regular expression rather than lowercasing both the input and the pattern.
* Some colour formatting details.
* A couple of other minor bug fixes.

It's likely this branch changes the timings of certain operations, though I tried to keep the current optimisations as far as possible. I'm particularly interested in differences in before and after profiles for:
* The -Fi case mentioned above
* Huge numbers of matching lines with coloured formatting
* Matches in a file that doesn't change across a large number of revisions
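The -Fi fallback mentioned above can be sketched like this (an illustrative helper, simplified from the `__init__.py` hunk in the diff below): the fixed string is escaped and the regexp engine handles case folding, instead of lowercasing the pattern and every line of every file.

```python
import re

def make_matcher(pattern, fixed_string=False, ignore_case=False):
    """Return a line -> bool predicate (illustrative, not the plugin API)."""
    if fixed_string and not ignore_case:
        # Plain substring test stays the fast path for -F alone.
        return lambda line: pattern in line
    if fixed_string:
        # -Fi: escape the literal so regexp metacharacters stay literal,
        # then let re.IGNORECASE do the case folding.
        pattern = re.escape(pattern)
    flags = re.IGNORECASE if ignore_case else 0
    search = re.compile(pattern, flags).search
    return lambda line: search(line) is not None

match = make_matcher("a.b", fixed_string=True, ignore_case=True)
# "a.b" is treated literally: matches "A.B" but not "axb"
```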

For future work, it'd be great to make these functions less tightly coupled so it's actually possible to unit test them, rather than just relying on (slow) functional tests.

Revision history for this message
Parth Malwankar (parthm) wrote :

mgz, thanks for the patch. This definitely makes the code better and -Fi sounds like a good idea.
I noticed that 'bzr grep -F ff' is crashing on the mysql-server and emacs repos with this patch.

In case you need a mysql server branch in 2a format you can find one at
https://code.launchpad.net/~parthm/+junk/mysql-server

[mysqls60]% bzr grep ff -F > /dev/null
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78: ordinal not in range(128)

Traceback (most recent call last):
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 911, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1109, in run_bzr
    ret = run(*run_argv)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 689, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 704, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1124, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/parthm/.bazaar/plugins/grep/__init__.py", line 241, in run
    grep.workingtree_grep(GrepOptions)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 215, in workingtree_grep
    dir_grep(tree, path, relpath, opts, revno, path_prefix)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 280, in dir_grep
    _file_grep(file_text, fp, opts, revno, path_prefix)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 497, in _file_grep
    writeline(line=line)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 425, in _line_writer
    write(start + per_line % kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/ui/text.py", line 510, in write
    self.wrapped_stream.write(to_write)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78: ordinal not in range(128)

You can report this problem to Bazaar's developers by running
    apport-bug /var/crash/bzr.1000.2010-06-07T02:48.crash
if a bug-reporting window does not automatically appear.
[mysqls60]%
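The crash above boils down to handing a byte string with non-ASCII content (0xfc is 'ü' in latin-1) to a codec-wrapped output stream, which then attempts an implicit ASCII decode. A small Python 3 sketch of the same failure, and of the 'replace'-style decode the old code relied on before writing:

```python
raw = b"caf\xfc"  # file bytes in some unknown single-byte encoding

try:
    raw.decode("ascii")
    raised = False
except UnicodeDecodeError:
    raised = True  # the same error class as in the traceback above

# The workaround: decode with a guessed encoding and 'replace' so that
# undecodable bytes become U+FFFD before reaching the output stream.
text = raw.decode("utf-8", "replace")
# text == "caf\ufffd" (0xfc is not valid UTF-8, so it is replaced)
```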

Please add a NEWS entry for the change. I would add it but I think you will be able to better describe it :)

On a somewhat related note, I was thinking of using trunk as stable and having a 'development' line (I have added lp:bzr-grep/development) as users would probably be pulling from lp:bzr-grep. lp:bzr-grep/development can periodically be merged into trunk. I am also ok with any other approach. Feel free to suggest if you have any other ideas.
If lp:bzr-grep/development sounds ok could you propose this (and the other) branch to lp:bzr-grep/development? I should have probably done something like this earlier. Sorry about this.

review: Needs Fixing
Revision history for this message
Martin Packman (gz) wrote :

> File "/home/parthm/.bazaar/plugins/grep/grep.py", line 425, in _line_writer
> write(start + per_line % kwargs)
> File "/storage/parth/src/bzr.dev/trunk/bzrlib/ui/text.py", line 510, in
> write
> self.wrapped_stream.write(to_write)
> File "/usr/lib/python2.6/codecs.py", line 351, in write
> data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78:
> ordinal not in range(128)

Ha, okay, I can reproduce this. Hadn't realised that outf was a wrapped object from bzrlib, so the dodgy line.decode(_terminal_encoding, 'replace') I removed was actually needed for the output to work. This means there's also a bug with paths, as these are currently being *en*coded before being sent to print.

Could see there were still issues with str/unicode in the plugin, but was hoping to put them off till later. I might paper over this one as well for the moment, but we need tests here.

> Please add a NEWS entry for the change. I would add it but I think you will be
> able to better describe it :)

Will do.

> On a somewhat related note, I was thinking of using trunk as stable and having
> a 'development' line (I have added lp:bzr-grep/development) as users would
> probably be pulling from lp:bzr-grep. lp:bzr-grep/development can periodically
> be merged into trunk. I am also ok with any other approach. Feel free to
> suggest if you have any other ideas.

Hm, I'd prefer if you went with a PQM style thing, where you play the PQM. That is, all changes get done in their own development branches and merged, not pushed, to trunk only when in a complete form. That means trunk remains stable, and each step is usable (complete with NEWS and tests and so on), rather than being done in bits.

lp:~gz/bzr-grep/preformat_output updated
156. By Martin Packman

Fix and test bytes/unicode issue but there's more to do in this area

157. By Martin Packman

Add NEWS entry for output formatter and -Fi changes

Revision history for this message
Parth Malwankar (parthm) wrote :

> Hm, I'd prefer if you went with a PQM style thing, where you play the PQM.
> That is, and all changes get done in their own development branches and
> merged, not pushed, to trunk only when in a complete form. That means trunk
> remains stable, and each step is usable (complete with NEWS and tests and so
> on), rather than being done in bits.

Works for me. I have removed the development branch.

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2010-05-23 07:59:31 +0000
3+++ NEWS 2010-06-07 20:35:32 +0000
4@@ -1,3 +1,9 @@
5+bzr-grep 0.4.0-dev
6+==================================
7+* Add seperate output formatter to reduce duplication of search loops,
8+ additionally make -Fi use regexp rather than lowercasing pattern and
9+ entirety of text for the same reason. (Martin [gz])
10+
11 bzr-grep 0.3.0-final - 23-May-2010
12 ==================================
13 * Support for --color option (POSIX only). (Parth Malwankar, #571694)
14
15=== modified file '__init__.py'
16--- __init__.py 2010-05-23 05:10:32 +0000
17+++ __init__.py 2010-06-07 20:35:32 +0000
18@@ -188,9 +188,14 @@
19 if null:
20 eol_marker = '\0'
21
22- # if the pattern isalnum, implicitly switch to fixed_string for faster grep
23- if grep.is_fixed_string(pattern):
24+ if not ignore_case and grep.is_fixed_string(pattern):
25+ # if the pattern isalnum, implicitly use to -F for faster grep
26 fixed_string = True
27+ elif ignore_case and fixed_string:
28+ # GZ 2010-06-02: Fall back to regexp rather than lowercasing
29+ # pattern and text which will cause pain later
30+ fixed_string = False
31+ pattern = re.escape(pattern)
32
33 patternc = None
34 re_flags = 0
35@@ -207,13 +212,6 @@
36 elif color == 'auto':
37 show_color = allow_color()
38
39- sub_patternc = None
40- if show_color:
41- sub_pattern = '(' + pattern + ')' # make pattern capturing
42- # sub_patternc is used for color display even for fixed_string
43- # when ignore_case is set
44- sub_patternc = grep.compile_pattern(sub_pattern, re_flags)
45-
46 GrepOptions.verbose = verbose
47 GrepOptions.ignore_case = ignore_case
48 GrepOptions.no_recursive = no_recursive
49@@ -234,7 +232,6 @@
50 GrepOptions.eol_marker = eol_marker
51 GrepOptions.print_revno = print_revno
52 GrepOptions.patternc = patternc
53- GrepOptions.sub_patternc = sub_patternc
54 GrepOptions.recursive = not no_recursive
55 GrepOptions.fixed_string = fixed_string
56 GrepOptions.outf = self.outf
57
58=== modified file 'grep.py'
59--- grep.py 2010-05-22 15:11:00 +0000
60+++ grep.py 2010-06-07 20:35:32 +0000
61@@ -16,12 +16,8 @@
62
63 from bzrlib.lazy_import import lazy_import
64 lazy_import(globals(), """
65-import codecs
66-import cStringIO
67 from fnmatch import fnmatch
68-import os
69 import re
70-import string
71
72 from termcolor import color_string, re_color_string, FG
73
74@@ -133,13 +129,6 @@
75 bzrdir.BzrDir.open_containing_tree_or_branch('.')
76 branch.lock_read()
77 try:
78- # res_cache is used to cache results for dir grep based on fid.
79- # If the fid is does not change between results, it means that
80- # the result will be the same apart from revno. In such a case
81- # we avoid getting file chunks from repo and grepping. The result
82- # is just printed by replacing old revno with new one.
83- res_cache = {}
84-
85 start_rev = opts.revision[0]
86 start_revid = start_rev.as_revision_id(branch)
87 if start_revid == None:
88@@ -179,6 +168,9 @@
89 start_rev_tuple = (start_revid, start_revno, 0)
90 given_revs = [start_rev_tuple]
91
92+ # GZ 2010-06-02: Shouldn't be smuggling this on opts, but easy for now
93+ opts.outputter = _Outputter(opts, use_cache=True)
94+
95 for revid, revno, merge_depth in given_revs:
96 if opts.levels == 1 and merge_depth != 0:
97 # with level=1 show only top level
98@@ -195,8 +187,7 @@
99
100 if osutils.isdir(path):
101 path_prefix = path
102- res_cache = dir_grep(tree, path, relpath, opts,
103- revno, path_prefix, res_cache)
104+ dir_grep(tree, path, relpath, opts, revno, path_prefix)
105 else:
106 versioned_file_grep(tree, id, '.', path, opts, revno)
107 finally:
108@@ -213,6 +204,9 @@
109 'To search for specific revision in history use the -r option.')
110 raise errors.BzrCommandError(msg)
111
112+ # GZ 2010-06-02: Shouldn't be smuggling this on opts, but easy for now
113+ opts.outputter = _Outputter(opts)
114+
115 tree.lock_read()
116 try:
117 for path in opts.path_list:
118@@ -220,7 +214,7 @@
119 path_prefix = path
120 dir_grep(tree, path, relpath, opts, revno, path_prefix)
121 else:
122- _file_grep(open(path).read(), '.', path, opts, revno)
123+ _file_grep(open(path).read(), path, opts, revno)
124 finally:
125 tree.unlock()
126
127@@ -233,11 +227,7 @@
128 return False
129
130
131-def dir_grep(tree, path, relpath, opts, revno, path_prefix, res_cache={}):
132- _revno_pattern = re.compile("\~[0-9.]+:")
133- _revno_pattern_list_only = re.compile("\~[0-9.]+")
134- dir_res = {}
135-
136+def dir_grep(tree, path, relpath, opts, revno, path_prefix):
137 # setup relpath to open files relative to cwd
138 rpath = relpath
139 if relpath:
140@@ -251,7 +241,10 @@
141
142 to_grep = []
143 to_grep_append = to_grep.append
144- outf_write = opts.outf.write
145+ # GZ 2010-06-05: The cache dict used to be recycled every call to dir_grep
146+ # and hits manually refilled. Could do this again if it was
147+ # for a good reason, otherwise cache might want purging.
148+ outputter = opts.outputter
149 for fp, fc, fkind, fid, entry in tree.list_files(include_root=False,
150 from_dir=from_dir, recursive=opts.recursive):
151
152@@ -263,25 +256,12 @@
153 # If old result is valid, print results immediately.
154 # Otherwise, add file info to to_grep so that the
155 # loop later will get chunks and grep them
156- file_rev = tree.inventory[fid].revision
157- old_res = res_cache.get(file_rev)
158- if old_res != None:
159- res = []
160- res_append = res.append
161-
162- if opts.files_with_matches or opts.files_without_match:
163- new_rev = '~' + revno
164- else:
165- new_rev = ('~%s:' % (revno,))
166-
167- for line in old_res:
168- if opts.files_with_matches or opts.files_without_match:
169- s = _revno_pattern_list_only.sub(new_rev, line)
170- else:
171- s = _revno_pattern.sub(new_rev, line)
172- res_append(s)
173- outf_write(s)
174- dir_res[file_rev] = res
175+ cache_id = tree.inventory[fid].revision
176+ if cache_id in outputter.cache:
177+ # GZ 2010-06-05: Not really sure caching and re-outputting
178+ # the old path is really the right thing,
179+ # but it's what the old code seemed to do
180+ outputter.write_cached_lines(cache_id, revno)
181 else:
182 to_grep_append((fid, (fp, fid)))
183 else:
184@@ -293,21 +273,17 @@
185 if opts.files_with_matches or opts.files_without_match:
186 # Optimize for wtree list-only as we don't need to read the
187 # entire file
188- file = codecs.open(path_for_file, 'r', buffering=4096)
189- _file_grep_list_only_wtree(file, rpath, fp, opts,
190- path_prefix)
191+ file = open(path_for_file, 'r', buffering=4096)
192+ _file_grep_list_only_wtree(file, fp, opts, path_prefix)
193 else:
194- file_text = codecs.open(path_for_file, 'r').read()
195- _file_grep(file_text, rpath, fp,
196- opts, revno, path_prefix)
197+ file_text = open(path_for_file, 'r').read()
198+ _file_grep(file_text, fp, opts, revno, path_prefix)
199
200 if revno != None: # grep versioned files
201 for (path, fid), chunks in tree.iter_files_bytes(to_grep):
202 path = _make_display_path(relpath, path)
203- res = _file_grep(chunks[0], rpath, path, opts, revno, path_prefix)
204- file_rev = tree.inventory[fid].revision
205- dir_res[file_rev] = res
206- return dir_res
207+ _file_grep(chunks[0], path, opts, revno, path_prefix,
208+ tree.inventory[fid].revision)
209
210
211 def _make_display_path(relpath, path):
212@@ -331,43 +307,32 @@
213
214 path = _make_display_path(relpath, path)
215 file_text = tree.get_file_text(id)
216- _file_grep(file_text, relpath, path, opts, revno, path_prefix)
217+ _file_grep(file_text, path, opts, revno, path_prefix)
218
219
220 def _path_in_glob_list(path, glob_list):
221- present = False
222 for glob in glob_list:
223 if fnmatch(path, glob):
224- present = True
225- break
226- return present
227-
228-
229-def _file_grep_list_only_wtree(file, relpath, path, opts, path_prefix=None):
230+ return True
231+ return False
232+
233+
234+def _file_grep_list_only_wtree(file, path, opts, path_prefix=None):
235 # test and skip binary files
236 if '\x00' in file.read(1024):
237 if opts.verbose:
238 trace.warning("Binary file '%s' skipped." % path)
239- return
240+ return
241
242 file.seek(0) # search from beginning
243
244 found = False
245 if opts.fixed_string:
246 pattern = opts.pattern.encode(_user_encoding, 'replace')
247- if opts.fixed_string and opts.ignore_case:
248- pattern = opts.pattern.lower()
249- if opts.ignore_case:
250- for line in file:
251- line = line.lower()
252- if pattern in line:
253- found = True
254- break
255- else: # don't ignore case
256- for line in file:
257- if pattern in line:
258- found = True
259- break
260+ for line in file:
261+ if pattern in line:
262+ found = True
263+ break
264 else: # not fixed_string
265 for line in file:
266 if opts.patternc.search(line):
267@@ -379,239 +344,170 @@
268 if path_prefix and path_prefix != '.':
269 # user has passed a dir arg, show that as result prefix
270 path = osutils.pathjoin(path_prefix, path)
271- path = path.encode(_terminal_encoding, 'replace')
272- s = path + opts.eol_marker
273- opts.outf.write(s)
274-
275-
276-def _file_grep(file_text, relpath, path, opts, revno, path_prefix=None):
277- res = []
278- res_append = res.append
279- outf_write = opts.outf.write
280-
281- _te = _terminal_encoding
282- _ue = _user_encoding
283-
284- pattern = opts.pattern.encode(_ue, 'replace')
285- patternc = opts.patternc
286- eol_marker = opts.eol_marker
287-
288- if opts.fixed_string and opts.ignore_case:
289- pattern = opts.pattern.lower()
290-
291+ opts.outputter.get_writer(path, None, None)()
292+
293+
294+class _Outputter(object):
295+ """Precalculate formatting based on options given
296+
297+ The idea here is to do this work only once per run, and finally return a
298+ function that will do the minimum amount possible for each match.
299+ """
300+ def __init__(self, opts, use_cache=False):
301+ self.outf = opts.outf
302+ if use_cache:
303+ # self.cache is used to cache results for dir grep based on fid.
304+ # If the fid is does not change between results, it means that
305+ # the result will be the same apart from revno. In such a case
306+ # we avoid getting file chunks from repo and grepping. The result
307+ # is just printed by replacing old revno with new one.
308+ self.cache = {}
309+ else:
310+ self.cache = None
311+ no_line = opts.files_with_matches or opts.files_without_match
312+
313+ if opts.show_color:
314+ pat = opts.pattern.encode(_user_encoding, 'replace')
315+ if no_line:
316+ self.get_writer = self._get_writer_plain
317+ elif opts.fixed_string:
318+ self._old = pat
319+ self._new = color_string(pat, FG.BOLD_RED)
320+ self.get_writer = self._get_writer_fixed_highlighted
321+ else:
322+ flags = opts.patternc.flags
323+ self._sub = re.compile(pat.join(("((?:",")+)")), flags).sub
324+ self._highlight = color_string("\\1", FG.BOLD_RED)
325+ self.get_writer = self._get_writer_regexp_highlighted
326+ path_start = FG.MAGENTA
327+ path_end = FG.NONE
328+ sep = color_string(':', FG.BOLD_CYAN)
329+ rev_sep = color_string('~', FG.BOLD_YELLOW)
330+ else:
331+ self.get_writer = self._get_writer_plain
332+ path_start = path_end = ""
333+ sep = ":"
334+ rev_sep = "~"
335+
336+ parts = [path_start, "%(path)s"]
337+ if opts.print_revno:
338+ parts.extend([rev_sep, "%(revno)s"])
339+ self._format_initial = "".join(parts)
340+ parts = []
341+ if no_line:
342+ if not opts.print_revno:
343+ parts.append(path_end)
344+ else:
345+ if opts.line_number:
346+ parts.extend([sep, "%(lineno)s"])
347+ parts.extend([sep, "%(line)s"])
348+ parts.append(opts.eol_marker)
349+ self._format_perline = "".join(parts)
350+
351+ def _get_writer_plain(self, path, revno, cache_id):
352+ """Get function for writing uncoloured output"""
353+ per_line = self._format_perline
354+ start = self._format_initial % {"path":path, "revno":revno}
355+ write = self.outf.write
356+ if self.cache is not None and cache_id is not None:
357+ result_list = []
358+ self.cache[cache_id] = path, result_list
359+ add_to_cache = result_list.append
360+ def _line_cache_and_writer(**kwargs):
361+ """Write formatted line and cache arguments"""
362+ end = per_line % kwargs
363+ add_to_cache(end)
364+ write(start + end)
365+ return _line_cache_and_writer
366+ def _line_writer(**kwargs):
367+ """Write formatted line from arguments given by underlying opts"""
368+ write(start + per_line % kwargs)
369+ return _line_writer
370+
371+ def write_cached_lines(self, cache_id, revno):
372+ """Write cached results out again for new revision"""
373+ cached_path, cached_matches = self.cache[cache_id]
374+ start = self._format_initial % {"path":cached_path, "revno":revno}
375+ write = self.outf.write
376+ for end in cached_matches:
377+ write(start + end)
378+
379+ def _get_writer_regexp_highlighted(self, path, revno, cache_id):
380+ """Get function for writing output with regexp match highlighted"""
381+ _line_writer = self._get_writer_plain(path, revno, cache_id)
382+ sub, highlight = self._sub, self._highlight
383+ def _line_writer_regexp_highlighted(line, **kwargs):
384+ """Write formatted line with matched pattern highlighted"""
385+ return _line_writer(line=sub(highlight, line), **kwargs)
386+ return _line_writer_regexp_highlighted
387+
388+ def _get_writer_fixed_highlighted(self, path, revno, cache_id):
389+ """Get function for writing output with search string highlighted"""
390+ _line_writer = self._get_writer_plain(path, revno, cache_id)
391+ old, new = self._old, self._new
392+ def _line_writer_fixed_highlighted(line, **kwargs):
393+ """Write formatted line with string searched for highlighted"""
394+ return _line_writer(line=line.replace(old, new), **kwargs)
395+ return _line_writer_fixed_highlighted
396+
397+
398+def _file_grep(file_text, path, opts, revno, path_prefix=None, cache_id=None):
399 # test and skip binary files
400 if '\x00' in file_text[:1024]:
401 if opts.verbose:
402 trace.warning("Binary file '%s' skipped." % path)
403- return res
404+ return
405
406 if path_prefix and path_prefix != '.':
407 # user has passed a dir arg, show that as result prefix
408 path = osutils.pathjoin(path_prefix, path)
409
410- path = path.encode(_te, 'replace')
411-
412- if opts.show_color:
413- path = color_string(path, FG.MAGENTA)
414- color_sep = color_string(':', FG.BOLD_CYAN)
415- color_rev_sep = color_string('~', FG.BOLD_YELLOW)
416-
417- # for better performance we moved formatting conditionals out
418- # of the core loop. hence, the core loop is somewhat duplicated
419- # for various combinations of formatting options.
420+ # GZ 2010-06-07: There's no actual guarentee the file contents will be in
421+ # the user encoding, but we have to guess something and it
422+ # is a reasonable default without a better mechanism.
423+ file_encoding = _user_encoding
424+ pattern = opts.pattern.encode(_user_encoding, 'replace')
425+
426+ writeline = opts.outputter.get_writer(path, revno, cache_id)
427
428 if opts.files_with_matches or opts.files_without_match:
429 # While printing files with matches we only have two case
430 # print file name or print file name with revno.
431 found = False
432- if opts.print_revno:
433- if opts.fixed_string:
434- for line in file_text.splitlines():
435- if opts.ignore_case:
436- line = line.lower()
437- if pattern in line:
438- found = True
439- break
440- else:
441- for line in file_text.splitlines():
442- if patternc.search(line):
443- found = True
444- break
445+ if opts.fixed_string:
446+ for line in file_text.splitlines():
447+ if pattern in line:
448+ found = True
449+ break
450 else:
451- if opts.fixed_string:
452- for line in file_text.splitlines():
453- if opts.ignore_case:
454- line = line.lower()
455- if pattern in line:
456- found = True
457- break
458- else:
459- for line in file_text.splitlines():
460- if patternc.search(line):
461- found = True
462- break
463+ search = opts.patternc.search
464+ for line in file_text.splitlines():
465+ if search(line):
466+ found = True
467+ break
468 if (opts.files_with_matches and found) or \
469 (opts.files_without_match and not found):
470- if opts.print_revno:
471- pfmt = "~%s".encode(_te, 'replace')
472- if opts.show_color:
473- pfmt = color_rev_sep + "%s"
474- s = path + (pfmt % (revno,)) + eol_marker
475- else:
476- s = path + eol_marker
477- res_append(s)
478- outf_write(s)
479- return res # return from files_with|without_matches
480-
481-
482- if opts.print_revno and opts.line_number:
483-
484- pfmt = "~%s:%d:%s".encode(_te)
485- if opts.show_color:
486- pfmt = color_rev_sep + "%s" + color_sep + "%d" + color_sep + "%s"
487- pfmt = pfmt.encode(_te)
488-
489- if opts.fixed_string:
490- if opts.ignore_case:
491- for index, line in enumerate(file_text.splitlines()):
492- if pattern in line.lower():
493- line = line.decode(_te, 'replace')
494- if opts.show_color:
495- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
496- s = path + (pfmt % (revno, index+1, line)) + eol_marker
497- res_append(s)
498- outf_write(s)
499- else: # don't ignore case
500- found_str = color_string(pattern, FG.BOLD_RED)
501- for index, line in enumerate(file_text.splitlines()):
502- if pattern in line:
503- line = line.decode(_te, 'replace')
504- if opts.show_color == True:
505- line = line.replace(pattern, found_str)
506- s = path + (pfmt % (revno, index+1, line)) + eol_marker
507- res_append(s)
508- outf_write(s)
509- else:
510+ writeline()
511+ elif opts.fixed_string:
512+ if opts.line_number:
513 for index, line in enumerate(file_text.splitlines()):
514- if patternc.search(line):
515- line = line.decode(_te, 'replace')
516- if opts.show_color:
517- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
518- s = path + (pfmt % (revno, index+1, line)) + eol_marker
519- res_append(s)
520- outf_write(s)
521-
522- elif opts.print_revno and not opts.line_number:
523-
524- pfmt = "~%s:%s".encode(_te, 'replace')
525- if opts.show_color:
526- pfmt = color_rev_sep + "%s" + color_sep + "%s"
527- pfmt = pfmt.encode(_te)
528-
529- if opts.fixed_string:
530- if opts.ignore_case:
531- for line in file_text.splitlines():
532- if pattern in line.lower():
533- line = line.decode(_te, 'replace')
534- if opts.show_color:
535- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
536- s = path + (pfmt % (revno, line)) + eol_marker
537- res_append(s)
538- outf_write(s)
539- else: # don't ignore case
540- found_str = color_string(pattern, FG.BOLD_RED)
541- for line in file_text.splitlines():
542- if pattern in line:
543- line = line.decode(_te, 'replace')
544- if opts.show_color == True:
545- line = line.replace(pattern, found_str)
546- s = path + (pfmt % (revno, line)) + eol_marker
547- res_append(s)
548- outf_write(s)
549-
550+ if pattern in line:
551+ line = line.decode(file_encoding)
552+ writeline(lineno=index+1, line=line)
553 else:
554 for line in file_text.splitlines():
555- if patternc.search(line):
556- line = line.decode(_te, 'replace')
557- if opts.show_color:
558- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
559- s = path + (pfmt % (revno, line)) + eol_marker
560- res_append(s)
561- outf_write(s)
562-
563- elif not opts.print_revno and opts.line_number:
564-
565- pfmt = ":%d:%s".encode(_te)
566- if opts.show_color:
567- pfmt = color_sep + "%d" + color_sep + "%s"
568- pfmt = pfmt.encode(_te)
569-
570- if opts.fixed_string:
571- if opts.ignore_case:
572- for index, line in enumerate(file_text.splitlines()):
573- if pattern in line.lower():
574- line = line.decode(_te, 'replace')
575- if opts.show_color:
576- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
577- s = path + (pfmt % (index+1, line)) + eol_marker
578- res_append(s)
579- outf_write(s)
580- else: # don't ignore case
581- for index, line in enumerate(file_text.splitlines()):
582- found_str = color_string(pattern, FG.BOLD_RED)
583- if pattern in line:
584- line = line.decode(_te, 'replace')
585- if opts.show_color == True:
586- line = line.replace(pattern, found_str)
587- s = path + (pfmt % (index+1, line)) + eol_marker
588- res_append(s)
589- outf_write(s)
590- else:
591- for index, line in enumerate(file_text.splitlines()):
592- if patternc.search(line):
593- line = line.decode(_te, 'replace')
594- if opts.show_color:
595- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
596- s = path + (pfmt % (index+1, line)) + eol_marker
597- res_append(s)
598- outf_write(s)
599-
600+ if pattern in line:
601+ line = line.decode(file_encoding)
602+ writeline(line=line)
603 else:
604-
605- pfmt = ":%s".encode(_te)
606- if opts.show_color:
607- pfmt = color_sep + "%s"
608- pfmt = pfmt.encode(_te)
609-
610- if opts.fixed_string:
611- if opts.ignore_case:
612- for line in file_text.splitlines():
613- if pattern in line.lower():
614- line = line.decode(_te, 'replace')
615- if opts.show_color:
616- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
617- s = path + (pfmt % (line,)) + eol_marker
618- res_append(s)
619- outf_write(s)
620- else: # don't ignore case
621- found_str = color_string(pattern, FG.BOLD_RED)
622- for line in file_text.splitlines():
623- if pattern in line:
624- line = line.decode(_te, 'replace')
625- if opts.show_color:
626- line = line.replace(pattern, found_str)
627- s = path + (pfmt % (line,)) + eol_marker
628- res_append(s)
629- outf_write(s)
630+ search = opts.patternc.search
631+ if opts.line_number:
632+ for index, line in enumerate(file_text.splitlines()):
633+ if search(line):
634+ line = line.decode(file_encoding)
635+ writeline(lineno=index+1, line=line)
636 else:
637 for line in file_text.splitlines():
638- if patternc.search(line):
639- line = line.decode(_te, 'replace')
640- if opts.show_color:
641- line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED)
642- s = path + (pfmt % (line,)) + eol_marker
643- res_append(s)
644- outf_write(s)
645-
646- return res
647-
648+ if search(line):
649+ line = line.decode(file_encoding)
650+ writeline(line=line)
651
652=== modified file 'test_grep.py'
653--- test_grep.py 2010-05-23 05:10:32 +0000
654+++ test_grep.py 2010-06-07 20:35:32 +0000
655@@ -1936,9 +1936,48 @@
656 self.assertEqual(len(out.splitlines()), 2) # finds line1 and line10
657
658
659+class TestNonAscii(GrepTestBase):
660+ """Tests for non-ascii filenames and file contents"""
661+
662+ _test_needs_features = [tests.UnicodeFilenameFeature]
663+
664+ def test_unicode_only_file(self):
665+ """Test filename and contents that requires a unicode encoding"""
666+ tree = self.make_branch_and_tree(".")
667+ contents = [u"\u1234"]
668+ self.build_tree(contents)
669+ tree.add(contents)
670+ tree.commit("Initial commit")
671+ as_utf8 = u"\u1234".encode("UTF-8")
672+
673+ # GZ 2010-06-07: Note we can't actually grep for \u1234 as the pattern
674+ # is mangled according to the user encoding.
675+ streams = self.run_bzr(["grep", "--files-with-matches",
676+ u"contents"], encoding="UTF-8")
677+ self.assertEqual(streams, (as_utf8 + "\n", ""))
678+
679+ streams = self.run_bzr(["grep", "-r", "1", "--files-with-matches",
680+ u"contents"], encoding="UTF-8")
681+ self.assertEqual(streams, (as_utf8 + "~1\n", ""))
682+
683+ fileencoding = osutils.get_user_encoding()
684+ as_mangled = as_utf8.decode(fileencoding, "replace").encode("UTF-8")
685+
686+ streams = self.run_bzr(["grep", "-n",
687+ u"contents"], encoding="UTF-8")
688+ self.assertEqual(streams, ("%s:1:contents of %s\n" %
689+ (as_utf8, as_mangled), ""))
690+
691+ streams = self.run_bzr(["grep", "-n", "-r", "1",
692+ u"contents"], encoding="UTF-8")
693+ self.assertEqual(streams, ("%s~1:1:contents of %s\n" %
694+ (as_utf8, as_mangled), ""))
695+
696+
697 class TestColorGrep(GrepTestBase):
698 """Tests for the --color option."""
699
700+ # GZ 2010-06-05: Does this really require the feature? Nothing prints.
701 _test_needs_features = [features.color_feature]
702
703 _rev_sep = color_string('~', fg=FG.BOLD_YELLOW)
704@@ -1951,6 +1990,50 @@
705 self.assertEqual(out, '')
706 self.assertContainsRe(err, 'Valid values for --color are', flags=TestGrep._reflags)
707
708+ def test_ver_matching_files(self):
709+ """(versioned) Search for matches or no matches only"""
710+ tree = self.make_branch_and_tree(".")
711+ contents = ["d/", "d/aaa", "bbb"]
712+ self.build_tree(contents)
713+ tree.add(contents)
714+ tree.commit("Initial commit")
715+
716+ # GZ 2010-06-05: Maybe modify the working tree here
717+
718+ streams = self.run_bzr(["grep", "--color", "always", "-r", "1",
719+ "--files-with-matches", "aaa"])
720+ self.assertEqual(streams, ("".join([
721+ FG.MAGENTA, "d/aaa", self._rev_sep, "1", "\n"
722+ ]), ""))
723+
724+ streams = self.run_bzr(["grep", "--color", "always", "-r", "1",
725+ "--files-without-match", "aaa"])
726+ self.assertEqual(streams, ("".join([
727+ FG.MAGENTA, "bbb", self._rev_sep, "1", "\n"
728+ ]), ""))
729+
730+ def test_wtree_matching_files(self):
731+ """(wtree) Search for matches or no matches only"""
732+ tree = self.make_branch_and_tree(".")
733+ contents = ["d/", "d/aaa", "bbb"]
734+ self.build_tree(contents)
735+ tree.add(contents)
736+ tree.commit("Initial commit")
737+
738+ # GZ 2010-06-05: Maybe modify the working tree here
739+
740+ streams = self.run_bzr(["grep", "--color", "always",
741+ "--files-with-matches", "aaa"])
742+ self.assertEqual(streams, ("".join([
743+ FG.MAGENTA, "d/aaa", FG.NONE, "\n"
744+ ]), ""))
745+
746+ streams = self.run_bzr(["grep", "--color", "always",
747+ "--files-without-match", "aaa"])
748+ self.assertEqual(streams, ("".join([
749+ FG.MAGENTA, "bbb", FG.NONE, "\n"
750+ ]), ""))
751+
752 def test_ver_basic_file(self):
753 """(versioned) Search for pattern in specfic file.
754 """
755@@ -1962,12 +2045,12 @@
756
757 # prepare colored result
758 foo = color_string('foo', fg=FG.BOLD_RED)
759- res = (color_string('file0.txt', fg=FG.MAGENTA)
760+ res = (FG.MAGENTA + 'file0.txt'
761 + self._rev_sep + '1' + self._sep
762 + foo + ' is ' + foo + 'bar1' + '\n')
763 txt_res = 'file0.txt~1:foo is foobar1\n'
764
765- nres = (color_string('file0.txt', fg=FG.MAGENTA)
766+ nres = (FG.MAGENTA + 'file0.txt'
767 + self._rev_sep + '1' + self._sep + '1' + self._sep
768 + foo + ' is ' + foo + 'bar1' + '\n')
769
770@@ -2029,10 +2112,10 @@
771
772 # prepare colored result
773 foo = color_string('foo', fg=FG.BOLD_RED)
774- res = (color_string('file0.txt', fg=FG.MAGENTA)
775+ res = (FG.MAGENTA + 'file0.txt'
776 + self._sep + foo + ' is ' + foo + 'bar1' + '\n')
777
778- nres = (color_string('file0.txt', fg=FG.MAGENTA)
779+ nres = (FG.MAGENTA + 'file0.txt'
780 + self._sep + '1' + self._sep
781 + foo + ' is ' + foo + 'bar1' + '\n')
782
