Merge lp:~gz/bzr-grep/preformat_output into lp:bzr-grep
| Status: | Merged |
|---|---|
| Merged at revision: | 137 |
| Proposed branch: | lp:~gz/bzr-grep/preformat_output |
| Merge into: | lp:bzr-grep |
| Diff against target: | 781 lines (+280/-298), 4 files modified: NEWS (+6/-0), __init__.py (+7/-10), grep.py (+180/-284), test_grep.py (+87/-4) |
| To merge this branch: | `bzr merge lp:~gz/bzr-grep/preformat_output` |
| Related bugs: | |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Parth Malwankar | Needs Fixing | | |
Review via email:
Commit message
Description of the change
Move a lot of code around, aimed at simplifying and reducing the duplication of pattern search loops so it becomes practical to add another optimisation there.
The main changes are moving the formatting logic out of the inner grep function and into its own object (which is kind of ugly currently, but can probably be improved), and then putting the cache there as well.
By and large, this branch aims not to change behaviour, with a few exceptions:
* Using both a fixed string and the lower case flag now uses a regular expression rather than lower casing the input and the pattern.
* Some colour formatting details.
* A couple of other minor bug fixes.
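That -Fi change can be illustrated in isolation (this is a standalone sketch, not the plugin's code): escaping the fixed string and compiling with `re.IGNORECASE` gives case-insensitive matching without lowercasing the entire text, and without regex metacharacters in the pattern being misinterpreted.

```python
import re

pattern = "a.b"   # user's fixed string; "." is a literal here, not "any char"
text = "count A.B here, not axb"

# -Fi via regexp: escape the metacharacters, then match ignoring case
rx = re.compile(re.escape(pattern), re.IGNORECASE)
assert rx.search(text).group(0) == "A.B"

# Compiling the raw string instead would wrongly treat "." as a wildcard
assert re.search(pattern, text).group(0) == "axb"
```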
It's likely this branch changes the timings of certain operations, though I tried to keep the current optimisations as far as possible. I'm particularly interested in differences in before and after profiles for:
* The -Fi case mentioned above
* Huge numbers of matching lines with coloured formatting
* Matches in a file that doesn't change across a large number of revisions
For future work, it'd be great to make these functions less tightly coupled so it's actually possible to unit test them, rather than just relying on (slow) functional tests.
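The shape of the change is roughly this (a minimal sketch with made-up names, not the plugin's actual classes): make all the formatting decisions once up front, then hand the match loop a cheap per-file writer function so the inner loop does almost no conditional work.

```python
import io

class Outputter(object):
    """Precompute the output format once per run; per-match work is
    just a dict interpolation and a write."""

    def __init__(self, line_numbers=False, eol="\n"):
        parts = ["%(path)s"]
        if line_numbers:
            parts.append(":%(lineno)d")
        parts.append(":%(line)s")
        self._fmt = "".join(parts) + eol

    def get_writer(self, outf, path):
        # Returned closure is what the search loop calls per match
        fmt = self._fmt
        def writeline(**kwargs):
            kwargs["path"] = path
            outf.write(fmt % kwargs)
        return writeline

buf = io.StringIO()
writeline = Outputter(line_numbers=True).get_writer(buf, "grep.py")
writeline(lineno=3, line="import re")
assert buf.getvalue() == "grep.py:3:import re\n"
```

The real `_Outputter` in the preview diff also folds in colour handling and the revno result cache, but the closure-returning shape is the same.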

Martin Packman (gz) wrote:
> File "/home/
> write(start + per_line % kwargs)
> File "/storage/
> write
> self.wrapped_
> File "/usr/lib/
> data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78:
> ordinal not in range(128)
Ha, okay, I can reproduce this. Hadn't realised that outf was a wrapped object from bzrlib, so the dodgy line.decode(
Could see there were still issues with str/unicode in the plugin, but was hoping to put them off till later. I might paper over this one as well for the moment, but we need tests here.
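For reference, the failure mode above is the classic bytes-versus-text mixup: raw file bytes reach an encoding-wrapped output stream and get implicitly decoded as ASCII. A minimal reconstruction (the plugin ran on Python 2, so the exact mechanics differ; the byte value just mirrors the 0xfc from the traceback):

```python
line = b"gr\xfcn\n"   # bytes as read from disk, actually latin-1

# What an ascii-wrapped stream effectively does with high-bit bytes:
try:
    line.decode("ascii")
    raised = False
except UnicodeDecodeError as err:
    raised = True
    assert "ordinal not in range(128)" in str(err)
assert raised

# Guessing a file encoding and decoding with 'replace' keeps output
# flowing even when the guess is wrong, at the cost of mangled chars:
text = line.decode("latin-1", "replace")
assert text == u"gr\xfcn\n"
```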
> Please add a NEWS entry for the change. I would add it but I think you will be
> able to better describe it :)
Will do.
> On a somewhat related note, I was thinking of using trunk as stable and having
> a 'development' line (I have added lp:bzr-grep/development) as users would
> probably be pulling from lp:bzr-grep. lp:bzr-grep/development can periodically
> be merged into trunk. I am also ok with any other approach. Feel free to
> suggest if you have any other ideas.
Hm, I'd prefer if you went with a PQM style thing, where you play the PQM. That is, all changes get done in their own development branches and merged, not pushed, to trunk only when in a complete form. That means trunk remains stable, and each step is usable (complete with NEWS and tests and so on), rather than being done in bits.
- 156. By Martin Packman: Fix and test bytes/unicode issue but there's more to do in this area
- 157. By Martin Packman: Add NEWS entry for output formatter and -Fi changes

Parth Malwankar (parthm) wrote:
> Hm, I'd prefer if you went with a PQM style thing, where you play the PQM.
> That is, and all changes get done in their own development branches and
> merged, not pushed, to trunk only when in a complete form. That means trunk
> remains stable, and each step is usable (complete with NEWS and tests and so
> on), rather than being done in bits.
Works for me. I have removed the development branch.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2010-05-23 07:59:31 +0000 |
3 | +++ NEWS 2010-06-07 20:35:32 +0000 |
4 | @@ -1,3 +1,9 @@ |
5 | +bzr-grep 0.4.0-dev |
6 | +================================== |
7 | +* Add seperate output formatter to reduce duplication of search loops, |
8 | + additionally make -Fi use regexp rather than lowercasing pattern and |
9 | + entirety of text for the same reason. (Martin [gz]) |
10 | + |
11 | bzr-grep 0.3.0-final - 23-May-2010 |
12 | ================================== |
13 | * Support for --color option (POSIX only). (Parth Malwankar, #571694) |
14 | |
15 | === modified file '__init__.py' |
16 | --- __init__.py 2010-05-23 05:10:32 +0000 |
17 | +++ __init__.py 2010-06-07 20:35:32 +0000 |
18 | @@ -188,9 +188,14 @@ |
19 | if null: |
20 | eol_marker = '\0' |
21 | |
22 | - # if the pattern isalnum, implicitly switch to fixed_string for faster grep |
23 | - if grep.is_fixed_string(pattern): |
24 | + if not ignore_case and grep.is_fixed_string(pattern): |
25 | + # if the pattern isalnum, implicitly use to -F for faster grep |
26 | fixed_string = True |
27 | + elif ignore_case and fixed_string: |
28 | + # GZ 2010-06-02: Fall back to regexp rather than lowercasing |
29 | + # pattern and text which will cause pain later |
30 | + fixed_string = False |
31 | + pattern = re.escape(pattern) |
32 | |
33 | patternc = None |
34 | re_flags = 0 |
35 | @@ -207,13 +212,6 @@ |
36 | elif color == 'auto': |
37 | show_color = allow_color() |
38 | |
39 | - sub_patternc = None |
40 | - if show_color: |
41 | - sub_pattern = '(' + pattern + ')' # make pattern capturing |
42 | - # sub_patternc is used for color display even for fixed_string |
43 | - # when ignore_case is set |
44 | - sub_patternc = grep.compile_pattern(sub_pattern, re_flags) |
45 | - |
46 | GrepOptions.verbose = verbose |
47 | GrepOptions.ignore_case = ignore_case |
48 | GrepOptions.no_recursive = no_recursive |
49 | @@ -234,7 +232,6 @@ |
50 | GrepOptions.eol_marker = eol_marker |
51 | GrepOptions.print_revno = print_revno |
52 | GrepOptions.patternc = patternc |
53 | - GrepOptions.sub_patternc = sub_patternc |
54 | GrepOptions.recursive = not no_recursive |
55 | GrepOptions.fixed_string = fixed_string |
56 | GrepOptions.outf = self.outf |
57 | |
58 | === modified file 'grep.py' |
59 | --- grep.py 2010-05-22 15:11:00 +0000 |
60 | +++ grep.py 2010-06-07 20:35:32 +0000 |
61 | @@ -16,12 +16,8 @@ |
62 | |
63 | from bzrlib.lazy_import import lazy_import |
64 | lazy_import(globals(), """ |
65 | -import codecs |
66 | -import cStringIO |
67 | from fnmatch import fnmatch |
68 | -import os |
69 | import re |
70 | -import string |
71 | |
72 | from termcolor import color_string, re_color_string, FG |
73 | |
74 | @@ -133,13 +129,6 @@ |
75 | bzrdir.BzrDir.open_containing_tree_or_branch('.') |
76 | branch.lock_read() |
77 | try: |
78 | - # res_cache is used to cache results for dir grep based on fid. |
79 | - # If the fid is does not change between results, it means that |
80 | - # the result will be the same apart from revno. In such a case |
81 | - # we avoid getting file chunks from repo and grepping. The result |
82 | - # is just printed by replacing old revno with new one. |
83 | - res_cache = {} |
84 | - |
85 | start_rev = opts.revision[0] |
86 | start_revid = start_rev.as_revision_id(branch) |
87 | if start_revid == None: |
88 | @@ -179,6 +168,9 @@ |
89 | start_rev_tuple = (start_revid, start_revno, 0) |
90 | given_revs = [start_rev_tuple] |
91 | |
92 | + # GZ 2010-06-02: Shouldn't be smuggling this on opts, but easy for now |
93 | + opts.outputter = _Outputter(opts, use_cache=True) |
94 | + |
95 | for revid, revno, merge_depth in given_revs: |
96 | if opts.levels == 1 and merge_depth != 0: |
97 | # with level=1 show only top level |
98 | @@ -195,8 +187,7 @@ |
99 | |
100 | if osutils.isdir(path): |
101 | path_prefix = path |
102 | - res_cache = dir_grep(tree, path, relpath, opts, |
103 | - revno, path_prefix, res_cache) |
104 | + dir_grep(tree, path, relpath, opts, revno, path_prefix) |
105 | else: |
106 | versioned_file_grep(tree, id, '.', path, opts, revno) |
107 | finally: |
108 | @@ -213,6 +204,9 @@ |
109 | 'To search for specific revision in history use the -r option.') |
110 | raise errors.BzrCommandError(msg) |
111 | |
112 | + # GZ 2010-06-02: Shouldn't be smuggling this on opts, but easy for now |
113 | + opts.outputter = _Outputter(opts) |
114 | + |
115 | tree.lock_read() |
116 | try: |
117 | for path in opts.path_list: |
118 | @@ -220,7 +214,7 @@ |
119 | path_prefix = path |
120 | dir_grep(tree, path, relpath, opts, revno, path_prefix) |
121 | else: |
122 | - _file_grep(open(path).read(), '.', path, opts, revno) |
123 | + _file_grep(open(path).read(), path, opts, revno) |
124 | finally: |
125 | tree.unlock() |
126 | |
127 | @@ -233,11 +227,7 @@ |
128 | return False |
129 | |
130 | |
131 | -def dir_grep(tree, path, relpath, opts, revno, path_prefix, res_cache={}): |
132 | - _revno_pattern = re.compile("\~[0-9.]+:") |
133 | - _revno_pattern_list_only = re.compile("\~[0-9.]+") |
134 | - dir_res = {} |
135 | - |
136 | +def dir_grep(tree, path, relpath, opts, revno, path_prefix): |
137 | # setup relpath to open files relative to cwd |
138 | rpath = relpath |
139 | if relpath: |
140 | @@ -251,7 +241,10 @@ |
141 | |
142 | to_grep = [] |
143 | to_grep_append = to_grep.append |
144 | - outf_write = opts.outf.write |
145 | + # GZ 2010-06-05: The cache dict used to be recycled every call to dir_grep |
146 | + # and hits manually refilled. Could do this again if it was |
147 | + # for a good reason, otherwise cache might want purging. |
148 | + outputter = opts.outputter |
149 | for fp, fc, fkind, fid, entry in tree.list_files(include_root=False, |
150 | from_dir=from_dir, recursive=opts.recursive): |
151 | |
152 | @@ -263,25 +256,12 @@ |
153 | # If old result is valid, print results immediately. |
154 | # Otherwise, add file info to to_grep so that the |
155 | # loop later will get chunks and grep them |
156 | - file_rev = tree.inventory[fid].revision |
157 | - old_res = res_cache.get(file_rev) |
158 | - if old_res != None: |
159 | - res = [] |
160 | - res_append = res.append |
161 | - |
162 | - if opts.files_with_matches or opts.files_without_match: |
163 | - new_rev = '~' + revno |
164 | - else: |
165 | - new_rev = ('~%s:' % (revno,)) |
166 | - |
167 | - for line in old_res: |
168 | - if opts.files_with_matches or opts.files_without_match: |
169 | - s = _revno_pattern_list_only.sub(new_rev, line) |
170 | - else: |
171 | - s = _revno_pattern.sub(new_rev, line) |
172 | - res_append(s) |
173 | - outf_write(s) |
174 | - dir_res[file_rev] = res |
175 | + cache_id = tree.inventory[fid].revision |
176 | + if cache_id in outputter.cache: |
177 | + # GZ 2010-06-05: Not really sure caching and re-outputting |
178 | + # the old path is really the right thing, |
179 | + # but it's what the old code seemed to do |
180 | + outputter.write_cached_lines(cache_id, revno) |
181 | else: |
182 | to_grep_append((fid, (fp, fid))) |
183 | else: |
184 | @@ -293,21 +273,17 @@ |
185 | if opts.files_with_matches or opts.files_without_match: |
186 | # Optimize for wtree list-only as we don't need to read the |
187 | # entire file |
188 | - file = codecs.open(path_for_file, 'r', buffering=4096) |
189 | - _file_grep_list_only_wtree(file, rpath, fp, opts, |
190 | - path_prefix) |
191 | + file = open(path_for_file, 'r', buffering=4096) |
192 | + _file_grep_list_only_wtree(file, fp, opts, path_prefix) |
193 | else: |
194 | - file_text = codecs.open(path_for_file, 'r').read() |
195 | - _file_grep(file_text, rpath, fp, |
196 | - opts, revno, path_prefix) |
197 | + file_text = open(path_for_file, 'r').read() |
198 | + _file_grep(file_text, fp, opts, revno, path_prefix) |
199 | |
200 | if revno != None: # grep versioned files |
201 | for (path, fid), chunks in tree.iter_files_bytes(to_grep): |
202 | path = _make_display_path(relpath, path) |
203 | - res = _file_grep(chunks[0], rpath, path, opts, revno, path_prefix) |
204 | - file_rev = tree.inventory[fid].revision |
205 | - dir_res[file_rev] = res |
206 | - return dir_res |
207 | + _file_grep(chunks[0], path, opts, revno, path_prefix, |
208 | + tree.inventory[fid].revision) |
209 | |
210 | |
211 | def _make_display_path(relpath, path): |
212 | @@ -331,43 +307,32 @@ |
213 | |
214 | path = _make_display_path(relpath, path) |
215 | file_text = tree.get_file_text(id) |
216 | - _file_grep(file_text, relpath, path, opts, revno, path_prefix) |
217 | + _file_grep(file_text, path, opts, revno, path_prefix) |
218 | |
219 | |
220 | def _path_in_glob_list(path, glob_list): |
221 | - present = False |
222 | for glob in glob_list: |
223 | if fnmatch(path, glob): |
224 | - present = True |
225 | - break |
226 | - return present |
227 | - |
228 | - |
229 | -def _file_grep_list_only_wtree(file, relpath, path, opts, path_prefix=None): |
230 | + return True |
231 | + return False |
232 | + |
233 | + |
234 | +def _file_grep_list_only_wtree(file, path, opts, path_prefix=None): |
235 | # test and skip binary files |
236 | if '\x00' in file.read(1024): |
237 | if opts.verbose: |
238 | trace.warning("Binary file '%s' skipped." % path) |
239 | - return |
240 | + return |
241 | |
242 | file.seek(0) # search from beginning |
243 | |
244 | found = False |
245 | if opts.fixed_string: |
246 | pattern = opts.pattern.encode(_user_encoding, 'replace') |
247 | - if opts.fixed_string and opts.ignore_case: |
248 | - pattern = opts.pattern.lower() |
249 | - if opts.ignore_case: |
250 | - for line in file: |
251 | - line = line.lower() |
252 | - if pattern in line: |
253 | - found = True |
254 | - break |
255 | - else: # don't ignore case |
256 | - for line in file: |
257 | - if pattern in line: |
258 | - found = True |
259 | - break |
260 | + for line in file: |
261 | + if pattern in line: |
262 | + found = True |
263 | + break |
264 | else: # not fixed_string |
265 | for line in file: |
266 | if opts.patternc.search(line): |
267 | @@ -379,239 +344,170 @@ |
268 | if path_prefix and path_prefix != '.': |
269 | # user has passed a dir arg, show that as result prefix |
270 | path = osutils.pathjoin(path_prefix, path) |
271 | - path = path.encode(_terminal_encoding, 'replace') |
272 | - s = path + opts.eol_marker |
273 | - opts.outf.write(s) |
274 | - |
275 | - |
276 | -def _file_grep(file_text, relpath, path, opts, revno, path_prefix=None): |
277 | - res = [] |
278 | - res_append = res.append |
279 | - outf_write = opts.outf.write |
280 | - |
281 | - _te = _terminal_encoding |
282 | - _ue = _user_encoding |
283 | - |
284 | - pattern = opts.pattern.encode(_ue, 'replace') |
285 | - patternc = opts.patternc |
286 | - eol_marker = opts.eol_marker |
287 | - |
288 | - if opts.fixed_string and opts.ignore_case: |
289 | - pattern = opts.pattern.lower() |
290 | - |
291 | + opts.outputter.get_writer(path, None, None)() |
292 | + |
293 | + |
294 | +class _Outputter(object): |
295 | + """Precalculate formatting based on options given |
296 | + |
297 | + The idea here is to do this work only once per run, and finally return a |
298 | + function that will do the minimum amount possible for each match. |
299 | + """ |
300 | + def __init__(self, opts, use_cache=False): |
301 | + self.outf = opts.outf |
302 | + if use_cache: |
303 | + # self.cache is used to cache results for dir grep based on fid. |
304 | + # If the fid is does not change between results, it means that |
305 | + # the result will be the same apart from revno. In such a case |
306 | + # we avoid getting file chunks from repo and grepping. The result |
307 | + # is just printed by replacing old revno with new one. |
308 | + self.cache = {} |
309 | + else: |
310 | + self.cache = None |
311 | + no_line = opts.files_with_matches or opts.files_without_match |
312 | + |
313 | + if opts.show_color: |
314 | + pat = opts.pattern.encode(_user_encoding, 'replace') |
315 | + if no_line: |
316 | + self.get_writer = self._get_writer_plain |
317 | + elif opts.fixed_string: |
318 | + self._old = pat |
319 | + self._new = color_string(pat, FG.BOLD_RED) |
320 | + self.get_writer = self._get_writer_fixed_highlighted |
321 | + else: |
322 | + flags = opts.patternc.flags |
323 | + self._sub = re.compile(pat.join(("((?:",")+)")), flags).sub |
324 | + self._highlight = color_string("\\1", FG.BOLD_RED) |
325 | + self.get_writer = self._get_writer_regexp_highlighted |
326 | + path_start = FG.MAGENTA |
327 | + path_end = FG.NONE |
328 | + sep = color_string(':', FG.BOLD_CYAN) |
329 | + rev_sep = color_string('~', FG.BOLD_YELLOW) |
330 | + else: |
331 | + self.get_writer = self._get_writer_plain |
332 | + path_start = path_end = "" |
333 | + sep = ":" |
334 | + rev_sep = "~" |
335 | + |
336 | + parts = [path_start, "%(path)s"] |
337 | + if opts.print_revno: |
338 | + parts.extend([rev_sep, "%(revno)s"]) |
339 | + self._format_initial = "".join(parts) |
340 | + parts = [] |
341 | + if no_line: |
342 | + if not opts.print_revno: |
343 | + parts.append(path_end) |
344 | + else: |
345 | + if opts.line_number: |
346 | + parts.extend([sep, "%(lineno)s"]) |
347 | + parts.extend([sep, "%(line)s"]) |
348 | + parts.append(opts.eol_marker) |
349 | + self._format_perline = "".join(parts) |
350 | + |
351 | + def _get_writer_plain(self, path, revno, cache_id): |
352 | + """Get function for writing uncoloured output""" |
353 | + per_line = self._format_perline |
354 | + start = self._format_initial % {"path":path, "revno":revno} |
355 | + write = self.outf.write |
356 | + if self.cache is not None and cache_id is not None: |
357 | + result_list = [] |
358 | + self.cache[cache_id] = path, result_list |
359 | + add_to_cache = result_list.append |
360 | + def _line_cache_and_writer(**kwargs): |
361 | + """Write formatted line and cache arguments""" |
362 | + end = per_line % kwargs |
363 | + add_to_cache(end) |
364 | + write(start + end) |
365 | + return _line_cache_and_writer |
366 | + def _line_writer(**kwargs): |
367 | + """Write formatted line from arguments given by underlying opts""" |
368 | + write(start + per_line % kwargs) |
369 | + return _line_writer |
370 | + |
371 | + def write_cached_lines(self, cache_id, revno): |
372 | + """Write cached results out again for new revision""" |
373 | + cached_path, cached_matches = self.cache[cache_id] |
374 | + start = self._format_initial % {"path":cached_path, "revno":revno} |
375 | + write = self.outf.write |
376 | + for end in cached_matches: |
377 | + write(start + end) |
378 | + |
379 | + def _get_writer_regexp_highlighted(self, path, revno, cache_id): |
380 | + """Get function for writing output with regexp match highlighted""" |
381 | + _line_writer = self._get_writer_plain(path, revno, cache_id) |
382 | + sub, highlight = self._sub, self._highlight |
383 | + def _line_writer_regexp_highlighted(line, **kwargs): |
384 | + """Write formatted line with matched pattern highlighted""" |
385 | + return _line_writer(line=sub(highlight, line), **kwargs) |
386 | + return _line_writer_regexp_highlighted |
387 | + |
388 | + def _get_writer_fixed_highlighted(self, path, revno, cache_id): |
389 | + """Get function for writing output with search string highlighted""" |
390 | + _line_writer = self._get_writer_plain(path, revno, cache_id) |
391 | + old, new = self._old, self._new |
392 | + def _line_writer_fixed_highlighted(line, **kwargs): |
393 | + """Write formatted line with string searched for highlighted""" |
394 | + return _line_writer(line=line.replace(old, new), **kwargs) |
395 | + return _line_writer_fixed_highlighted |
396 | + |
397 | + |
398 | +def _file_grep(file_text, path, opts, revno, path_prefix=None, cache_id=None): |
399 | # test and skip binary files |
400 | if '\x00' in file_text[:1024]: |
401 | if opts.verbose: |
402 | trace.warning("Binary file '%s' skipped." % path) |
403 | - return res |
404 | + return |
405 | |
406 | if path_prefix and path_prefix != '.': |
407 | # user has passed a dir arg, show that as result prefix |
408 | path = osutils.pathjoin(path_prefix, path) |
409 | |
410 | - path = path.encode(_te, 'replace') |
411 | - |
412 | - if opts.show_color: |
413 | - path = color_string(path, FG.MAGENTA) |
414 | - color_sep = color_string(':', FG.BOLD_CYAN) |
415 | - color_rev_sep = color_string('~', FG.BOLD_YELLOW) |
416 | - |
417 | - # for better performance we moved formatting conditionals out |
418 | - # of the core loop. hence, the core loop is somewhat duplicated |
419 | - # for various combinations of formatting options. |
420 | + # GZ 2010-06-07: There's no actual guarentee the file contents will be in |
421 | + # the user encoding, but we have to guess something and it |
422 | + # is a reasonable default without a better mechanism. |
423 | + file_encoding = _user_encoding |
424 | + pattern = opts.pattern.encode(_user_encoding, 'replace') |
425 | + |
426 | + writeline = opts.outputter.get_writer(path, revno, cache_id) |
427 | |
428 | if opts.files_with_matches or opts.files_without_match: |
429 | # While printing files with matches we only have two case |
430 | # print file name or print file name with revno. |
431 | found = False |
432 | - if opts.print_revno: |
433 | - if opts.fixed_string: |
434 | - for line in file_text.splitlines(): |
435 | - if opts.ignore_case: |
436 | - line = line.lower() |
437 | - if pattern in line: |
438 | - found = True |
439 | - break |
440 | - else: |
441 | - for line in file_text.splitlines(): |
442 | - if patternc.search(line): |
443 | - found = True |
444 | - break |
445 | + if opts.fixed_string: |
446 | + for line in file_text.splitlines(): |
447 | + if pattern in line: |
448 | + found = True |
449 | + break |
450 | else: |
451 | - if opts.fixed_string: |
452 | - for line in file_text.splitlines(): |
453 | - if opts.ignore_case: |
454 | - line = line.lower() |
455 | - if pattern in line: |
456 | - found = True |
457 | - break |
458 | - else: |
459 | - for line in file_text.splitlines(): |
460 | - if patternc.search(line): |
461 | - found = True |
462 | - break |
463 | + search = opts.patternc.search |
464 | + for line in file_text.splitlines(): |
465 | + if search(line): |
466 | + found = True |
467 | + break |
468 | if (opts.files_with_matches and found) or \ |
469 | (opts.files_without_match and not found): |
470 | - if opts.print_revno: |
471 | - pfmt = "~%s".encode(_te, 'replace') |
472 | - if opts.show_color: |
473 | - pfmt = color_rev_sep + "%s" |
474 | - s = path + (pfmt % (revno,)) + eol_marker |
475 | - else: |
476 | - s = path + eol_marker |
477 | - res_append(s) |
478 | - outf_write(s) |
479 | - return res # return from files_with|without_matches |
480 | - |
481 | - |
482 | - if opts.print_revno and opts.line_number: |
483 | - |
484 | - pfmt = "~%s:%d:%s".encode(_te) |
485 | - if opts.show_color: |
486 | - pfmt = color_rev_sep + "%s" + color_sep + "%d" + color_sep + "%s" |
487 | - pfmt = pfmt.encode(_te) |
488 | - |
489 | - if opts.fixed_string: |
490 | - if opts.ignore_case: |
491 | - for index, line in enumerate(file_text.splitlines()): |
492 | - if pattern in line.lower(): |
493 | - line = line.decode(_te, 'replace') |
494 | - if opts.show_color: |
495 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
496 | - s = path + (pfmt % (revno, index+1, line)) + eol_marker |
497 | - res_append(s) |
498 | - outf_write(s) |
499 | - else: # don't ignore case |
500 | - found_str = color_string(pattern, FG.BOLD_RED) |
501 | - for index, line in enumerate(file_text.splitlines()): |
502 | - if pattern in line: |
503 | - line = line.decode(_te, 'replace') |
504 | - if opts.show_color == True: |
505 | - line = line.replace(pattern, found_str) |
506 | - s = path + (pfmt % (revno, index+1, line)) + eol_marker |
507 | - res_append(s) |
508 | - outf_write(s) |
509 | - else: |
510 | + writeline() |
511 | + elif opts.fixed_string: |
512 | + if opts.line_number: |
513 | for index, line in enumerate(file_text.splitlines()): |
514 | - if patternc.search(line): |
515 | - line = line.decode(_te, 'replace') |
516 | - if opts.show_color: |
517 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
518 | - s = path + (pfmt % (revno, index+1, line)) + eol_marker |
519 | - res_append(s) |
520 | - outf_write(s) |
521 | - |
522 | - elif opts.print_revno and not opts.line_number: |
523 | - |
524 | - pfmt = "~%s:%s".encode(_te, 'replace') |
525 | - if opts.show_color: |
526 | - pfmt = color_rev_sep + "%s" + color_sep + "%s" |
527 | - pfmt = pfmt.encode(_te) |
528 | - |
529 | - if opts.fixed_string: |
530 | - if opts.ignore_case: |
531 | - for line in file_text.splitlines(): |
532 | - if pattern in line.lower(): |
533 | - line = line.decode(_te, 'replace') |
534 | - if opts.show_color: |
535 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
536 | - s = path + (pfmt % (revno, line)) + eol_marker |
537 | - res_append(s) |
538 | - outf_write(s) |
539 | - else: # don't ignore case |
540 | - found_str = color_string(pattern, FG.BOLD_RED) |
541 | - for line in file_text.splitlines(): |
542 | - if pattern in line: |
543 | - line = line.decode(_te, 'replace') |
544 | - if opts.show_color == True: |
545 | - line = line.replace(pattern, found_str) |
546 | - s = path + (pfmt % (revno, line)) + eol_marker |
547 | - res_append(s) |
548 | - outf_write(s) |
549 | - |
550 | + if pattern in line: |
551 | + line = line.decode(file_encoding) |
552 | + writeline(lineno=index+1, line=line) |
553 | else: |
554 | for line in file_text.splitlines(): |
555 | - if patternc.search(line): |
556 | - line = line.decode(_te, 'replace') |
557 | - if opts.show_color: |
558 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
559 | - s = path + (pfmt % (revno, line)) + eol_marker |
560 | - res_append(s) |
561 | - outf_write(s) |
562 | - |
563 | - elif not opts.print_revno and opts.line_number: |
564 | - |
565 | - pfmt = ":%d:%s".encode(_te) |
566 | - if opts.show_color: |
567 | - pfmt = color_sep + "%d" + color_sep + "%s" |
568 | - pfmt = pfmt.encode(_te) |
569 | - |
570 | - if opts.fixed_string: |
571 | - if opts.ignore_case: |
572 | - for index, line in enumerate(file_text.splitlines()): |
573 | - if pattern in line.lower(): |
574 | - line = line.decode(_te, 'replace') |
575 | - if opts.show_color: |
576 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
577 | - s = path + (pfmt % (index+1, line)) + eol_marker |
578 | - res_append(s) |
579 | - outf_write(s) |
580 | - else: # don't ignore case |
581 | - for index, line in enumerate(file_text.splitlines()): |
582 | - found_str = color_string(pattern, FG.BOLD_RED) |
583 | - if pattern in line: |
584 | - line = line.decode(_te, 'replace') |
585 | - if opts.show_color == True: |
586 | - line = line.replace(pattern, found_str) |
587 | - s = path + (pfmt % (index+1, line)) + eol_marker |
588 | - res_append(s) |
589 | - outf_write(s) |
590 | - else: |
591 | - for index, line in enumerate(file_text.splitlines()): |
592 | - if patternc.search(line): |
593 | - line = line.decode(_te, 'replace') |
594 | - if opts.show_color: |
595 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
596 | - s = path + (pfmt % (index+1, line)) + eol_marker |
597 | - res_append(s) |
598 | - outf_write(s) |
599 | - |
600 | + if pattern in line: |
601 | + line = line.decode(file_encoding) |
602 | + writeline(line=line) |
603 | else: |
604 | - |
605 | - pfmt = ":%s".encode(_te) |
606 | - if opts.show_color: |
607 | - pfmt = color_sep + "%s" |
608 | - pfmt = pfmt.encode(_te) |
609 | - |
610 | - if opts.fixed_string: |
611 | - if opts.ignore_case: |
612 | - for line in file_text.splitlines(): |
613 | - if pattern in line.lower(): |
614 | - line = line.decode(_te, 'replace') |
615 | - if opts.show_color: |
616 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
617 | - s = path + (pfmt % (line,)) + eol_marker |
618 | - res_append(s) |
619 | - outf_write(s) |
620 | - else: # don't ignore case |
621 | - found_str = color_string(pattern, FG.BOLD_RED) |
622 | - for line in file_text.splitlines(): |
623 | - if pattern in line: |
624 | - line = line.decode(_te, 'replace') |
625 | - if opts.show_color: |
626 | - line = line.replace(pattern, found_str) |
627 | - s = path + (pfmt % (line,)) + eol_marker |
628 | - res_append(s) |
629 | - outf_write(s) |
630 | + search = opts.patternc.search |
631 | + if opts.line_number: |
632 | + for index, line in enumerate(file_text.splitlines()): |
633 | + if search(line): |
634 | + line = line.decode(file_encoding) |
635 | + writeline(lineno=index+1, line=line) |
636 | else: |
637 | for line in file_text.splitlines(): |
638 | - if patternc.search(line): |
639 | - line = line.decode(_te, 'replace') |
640 | - if opts.show_color: |
641 | - line = re_color_string(opts.sub_patternc, line, FG.BOLD_RED) |
642 | - s = path + (pfmt % (line,)) + eol_marker |
643 | - res_append(s) |
644 | - outf_write(s) |
645 | - |
646 | - return res |
647 | - |
648 | + if search(line): |
649 | + line = line.decode(file_encoding) |
650 | + writeline(line=line) |
651 | |
652 | === modified file 'test_grep.py' |
653 | --- test_grep.py 2010-05-23 05:10:32 +0000 |
654 | +++ test_grep.py 2010-06-07 20:35:32 +0000 |
655 | @@ -1936,9 +1936,48 @@ |
656 | self.assertEqual(len(out.splitlines()), 2) # finds line1 and line10 |
657 | |
658 | |
659 | +class TestNonAscii(GrepTestBase): |
660 | + """Tests for non-ascii filenames and file contents""" |
661 | + |
662 | + _test_needs_features = [tests.UnicodeFilenameFeature] |
663 | + |
664 | + def test_unicode_only_file(self): |
665 | + """Test filename and contents that requires a unicode encoding""" |
666 | + tree = self.make_branch_and_tree(".") |
667 | + contents = [u"\u1234"] |
668 | + self.build_tree(contents) |
669 | + tree.add(contents) |
670 | + tree.commit("Initial commit") |
671 | + as_utf8 = u"\u1234".encode("UTF-8") |
672 | + |
673 | + # GZ 2010-06-07: Note we can't actually grep for \u1234 as the pattern |
674 | + # is mangled according to the user encoding. |
675 | + streams = self.run_bzr(["grep", "--files-with-matches", |
676 | + u"contents"], encoding="UTF-8") |
677 | + self.assertEqual(streams, (as_utf8 + "\n", "")) |
678 | + |
+        streams = self.run_bzr(["grep", "-r", "1", "--files-with-matches",
+            u"contents"], encoding="UTF-8")
+        self.assertEqual(streams, (as_utf8 + "~1\n", ""))
+
+        fileencoding = osutils.get_user_encoding()
+        as_mangled = as_utf8.decode(fileencoding, "replace").encode("UTF-8")
+
+        streams = self.run_bzr(["grep", "-n",
+            u"contents"], encoding="UTF-8")
+        self.assertEqual(streams, ("%s:1:contents of %s\n" %
+            (as_utf8, as_mangled), ""))
+
+        streams = self.run_bzr(["grep", "-n", "-r", "1",
+            u"contents"], encoding="UTF-8")
+        self.assertEqual(streams, ("%s~1:1:contents of %s\n" %
+            (as_utf8, as_mangled), ""))
+
+
 class TestColorGrep(GrepTestBase):
     """Tests for the --color option."""

+    # GZ 2010-06-05: Does this really require the feature? Nothing prints.
     _test_needs_features = [features.color_feature]

     _rev_sep = color_string('~', fg=FG.BOLD_YELLOW)
@@ -1951,6 +1990,50 @@
         self.assertEqual(out, '')
         self.assertContainsRe(err, 'Valid values for --color are', flags=TestGrep._reflags)

+    def test_ver_matching_files(self):
+        """(versioned) Search for matches or no matches only"""
+        tree = self.make_branch_and_tree(".")
+        contents = ["d/", "d/aaa", "bbb"]
+        self.build_tree(contents)
+        tree.add(contents)
+        tree.commit("Initial commit")
+
+        # GZ 2010-06-05: Maybe modify the working tree here
+
+        streams = self.run_bzr(["grep", "--color", "always", "-r", "1",
+            "--files-with-matches", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "d/aaa", self._rev_sep, "1", "\n"
+            ]), ""))
+
+        streams = self.run_bzr(["grep", "--color", "always", "-r", "1",
+            "--files-without-match", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "bbb", self._rev_sep, "1", "\n"
+            ]), ""))
+
+    def test_wtree_matching_files(self):
+        """(wtree) Search for matches or no matches only"""
+        tree = self.make_branch_and_tree(".")
+        contents = ["d/", "d/aaa", "bbb"]
+        self.build_tree(contents)
+        tree.add(contents)
+        tree.commit("Initial commit")
+
+        # GZ 2010-06-05: Maybe modify the working tree here
+
+        streams = self.run_bzr(["grep", "--color", "always",
+            "--files-with-matches", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "d/aaa", FG.NONE, "\n"
+            ]), ""))
+
+        streams = self.run_bzr(["grep", "--color", "always",
+            "--files-without-match", "aaa"])
+        self.assertEqual(streams, ("".join([
+            FG.MAGENTA, "bbb", FG.NONE, "\n"
+            ]), ""))
+
     def test_ver_basic_file(self):
         """(versioned) Search for pattern in specfic file.
         """
@@ -1962,12 +2045,12 @@

         # prepare colored result
         foo = color_string('foo', fg=FG.BOLD_RED)
-        res = (color_string('file0.txt', fg=FG.MAGENTA)
+        res = (FG.MAGENTA + 'file0.txt'
             + self._rev_sep + '1' + self._sep
             + foo + ' is ' + foo + 'bar1' + '\n')
         txt_res = 'file0.txt~1:foo is foobar1\n'

-        nres = (color_string('file0.txt', fg=FG.MAGENTA)
+        nres = (FG.MAGENTA + 'file0.txt'
             + self._rev_sep + '1' + self._sep + '1' + self._sep
             + foo + ' is ' + foo + 'bar1' + '\n')

@@ -2029,10 +2112,10 @@

         # prepare colored result
         foo = color_string('foo', fg=FG.BOLD_RED)
-        res = (color_string('file0.txt', fg=FG.MAGENTA)
+        res = (FG.MAGENTA + 'file0.txt'
             + self._sep + foo + ' is ' + foo + 'bar1' + '\n')

-        nres = (color_string('file0.txt', fg=FG.MAGENTA)
+        nres = (FG.MAGENTA + 'file0.txt'
             + self._sep + '1' + self._sep
             + foo + ' is ' + foo + 'bar1' + '\n')
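An editorial note on the `as_mangled` expectation in the new encoding test above: decoding UTF-8 bytes with a user encoding that cannot represent them, using `errors="replace"`, substitutes U+FFFD for each undecodable byte, so the filename printed in the match line is mangled but the output remains valid UTF-8. This sketch is illustration only, not code from the branch; the filename bytes and the hard-coded "ascii" user encoding (the test uses `osutils.get_user_encoding()`) are assumptions.

```python
# Illustration (not from the branch) of the lossy round-trip the new
# test expects. "ascii" stands in for a user encoding that cannot
# decode UTF-8 bytes; each undecodable byte becomes U+FFFD.
as_utf8 = u"f\u00fcr".encode("UTF-8")  # hypothetical filename, b'f\xc3\xbcr'
as_mangled = as_utf8.decode("ascii", "replace").encode("UTF-8")
assert as_mangled == u"f\ufffd\ufffdr".encode("UTF-8")
```

The two bytes of the UTF-8 sequence for the umlaut each fail the ASCII decode, which is why two replacement characters appear.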
mgz, thanks for the patch. This definitely makes the code better and -Fi sounds like a good idea.

I noticed that 'bzr grep -F ff' is crashing on the mysql-server and emacs repos with this patch. In case you need a mysql-server branch in 2a format you can find one at https://code.launchpad.net/~parthm/+junk/mysql-server
[mysqls60]% bzr grep ff -F > /dev/null
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78: ordinal not in range(128)

Traceback (most recent call last):
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 911, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1109, in run_bzr
    ret = run(*run_argv)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 689, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 704, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1124, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/parthm/.bazaar/plugins/grep/__init__.py", line 241, in run
    grep.workingtree_grep(GrepOptions)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 215, in workingtree_grep
    dir_grep(tree, path, relpath, opts, revno, path_prefix)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 280, in dir_grep
    _file_grep(file_text, fp, opts, revno, path_prefix)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 497, in _file_grep
    writeline(line=line)
  File "/home/parthm/.bazaar/plugins/grep/grep.py", line 425, in _line_writer
    write(start + per_line % kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/ui/text.py", line 510, in write
    self.wrapped_stream.write(to_write)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 78: ordinal not in range(128)
You can report this problem to Bazaar's developers by running
    apport-bug /var/crash/bzr.1000.2010-06-07T02:48.crash
if a bug-reporting window does not automatically appear.
[mysqls60]%
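For context on the traceback: the failure mode is Python 2's implicit ASCII decode when a byte string containing non-ASCII data meets a unicode string (here in `write(start + per_line % kwargs)`, where a matched line holds the byte 0xfc). A minimal sketch of the same exception, triggered explicitly so it is reproducible outside bzr; the sample bytes are an assumption, not the actual file content:

```python
# 0xfc is u-umlaut in Latin-1 and is not valid ASCII. Python 2 performs
# this decode implicitly when concatenating such a byte string with a
# unicode string; here we trigger it explicitly.
try:
    b"f\xfcr".decode("ascii")
except UnicodeDecodeError as e:
    # codec name and offending byte match the reported crash
    assert e.encoding == "ascii"
    assert e.object[e.start:e.end] == b"\xfc"
```

This suggests the fix is to decode (or replace-mangle) file text before it is interpolated into the unicode format string, rather than letting the codecs writer coerce it.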
Please add a NEWS entry for the change. I would add it but I think you will be able to better describe it :)
On a somewhat related note, I was thinking of using trunk as stable and having a 'development' line (I have added lp:bzr-grep/development), as users would probably be pulling from lp:bzr-grep; lp:bzr-grep/development can periodically be merged into trunk. I am also OK with any other approach, so feel free to suggest alternatives.

If lp:bzr-grep/development sounds OK, could you propose this (and the other) branch against lp:bzr-grep/development? I should probably have done something like this earlier. Sorry about this.