Merge lp:~parthm/bzr/503670-vila-grep-as-builtin into lp:bzr

Proposed by Parth Malwankar
Status: Rejected
Rejected by: Vincent Ladeuil
Proposed branch: lp:~parthm/bzr/503670-vila-grep-as-builtin
Merge into: lp:bzr
Diff against target: 216 lines (+183/-0)
3 files modified
bzrlib/builtins.py (+127/-0)
bzrlib/tests/blackbox/__init__.py (+1/-0)
bzrlib/tests/blackbox/test_grep.py (+55/-0)
To merge this branch: bzr merge lp:~parthm/bzr/503670-vila-grep-as-builtin
Reviewer Review Type Date Requested Status
Parth Malwankar Disapprove
Martin Packman (community) Needs Fixing
Robert Collins (community) Needs Fixing
Review via email: mp+19857@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Parth Malwankar (parthm) wrote :

Bug #503670

This patch ports the Vincent's grep plugin https://code.launchpad.net/~vila/bzr/grep
to builtins. The grep command is now available as 'bzr grep PATTERN [-- options]'. All grep options are support (followed by '--').
The current implementation requires xargs and grep commands to be available.

This command uses an implementation of os.path.relpath (ported from python 2.6 as cmd_grep._relpath). I tried using osutils.relpath but I found the two to be
inconsistent with the python 2.6 version in argument ordering, having optional arg and handling of '.'. As I don't fully understand the use cases of osutils.relpath I decided to keep cmd_grep._relpath.

Note on windows usage.
I tried this on a windows system with http://unxutils.sourceforge.net/ installed.
The command silently existed for me even though xargs and grep are available. It may be a good idea to disable this command on non-POSIX OSes. We could add a new error type. Comments/suggestions?

Example usage:
[bzrlib]% ../bzr --no-plugins grep main_version -- -i
__init__.py: main_version = '%d.%d' % version_info[:2]
__init__.py: main_version = '%d.%d.%d' % version_info[:3]
__init__.py: return main_version
__init__.py: return main_version + sub_string
tests/blackbox/test_version.py: def test_main_version(self):
[bzrlib]%

Revision history for this message
Robert Collins (lifeless) wrote :

 review: needsfixing

So this seems to have very few tests and not work on windows. I don't
think its good enough to merge to core yet.

Also as this doesn't look in history at all, there is really little
benefit in putting it in the core unless its going to make windows users
lives easier (which it won't). Right now its little more than 'bzr ls |
xargs greap'.

I rather suspect you'd be better of writing one from scratch with full
tests at appropriate unit and blackbox layers, to get windows support
and test coverage.

-Rob

review: Needs Fixing
Revision history for this message
Martin Packman (gz) wrote :

As a windows user, I have no problem with the idea of a `bzr grep` that depends on an external grep tool, I'd just want you to lose the xargs dependency.

The -- thing is ugly, why not just pull out the bazaar-specific arguments from the list given and pass the rest onto the external tool?

review: Needs Fixing
Revision history for this message
Parth Malwankar (parthm) wrote :

> As a windows user, I have no problem with the idea of a `bzr grep` that
> depends on an external grep tool, I'd just want you to lose the xargs
> dependency.
>

Having grep.exe shipped with bzr is one option. But for something like history
and view support, we would still need to do use bzrlib API to filter files,
get text of older versions and pipe it to grep. And as you mentioned below, I also
think '--' looks quite ugly so it might be good to process options via bzr's
command line processing.

I feel going with 'bzr ls [opts] | grep' may be a better option than the above. For history we would probably need to use 'bzr cat'?

Considering the above, IMO going with a plugin gives us more flexibility (e.g the --from-root option in the plugin) along with the possibility of easily enhancing it in the future.

> The -- thing is ugly, why not just pull out the bazaar-specific arguments from
> the list given and pass the rest onto the external tool?

I am wondering if there is a consensus on the approach to bug #503670.
The options which seem ok to me are:
 (1) 'bzr ls [opts] | grep': bzr could [based on user selection] install grep.exe on windows. 'bzr cat' could be used for history search.
 (2) self contained grep [a]: This should probably be a plugin (at least to start with).
 (3) ???
Whatever option is chosen should have support for views and history IMO.

Once there is some agreement on the approach thats preferred, we should be
able to close bug #503670

[a] https://code.launchpad.net/~parthm/bzr/503670-grep-plugin/+merge/20067

Revision history for this message
Martin Pool (mbp) wrote :

I think it's reasonable to have a selfcontained grep, so that
 * it can efficiently search history
 * it's easier for windows users

Building on Python's re module a basic grep should be very small.

I don't strongly care if this is in a plugin or core. I don't think
core needs to be minimal and I don't think it needs to be maximal and
merge every useful feature.

It would be nice to also have the option to pipe files through an
external grep so people can get extra features from grep.
--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Parth Malwankar (parthm) wrote :

> It would be nice to also have the option to pipe files through an
> external grep so people can get extra features from grep.
> --
> Martin <http://launchpad.net/~mbp/>

Sounds like a good idea. '--cat-only'.

Revision history for this message
Martin Pool (mbp) wrote :

On 26 February 2010 16:54, Parth Malwankar <email address hidden> wrote:
>
>
>> It would be nice to also have the option to pipe files through an
>> external grep so people can get extra features from grep.
>> --
>> Martin <http://launchpad.net/~mbp/>
>
> Sounds like a good idea. '--cat-only'.

Well, actually I meant --external-grep=PROG, and then that can be run
for each file, using eg gnu grep's --label argument to indicate which
version of which file it is.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Parth Malwankar (parthm) wrote :

This proposal should probably be rejected over lp:bzr-grep.

review: Disapprove

Unmerged revisions

5054. By Parth Malwankar

ported tests

5053. By Parth Malwankar

move grep command into core (from plugin https://code.launchpad.net/~vila/bzr/grep)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/builtins.py'
2--- bzrlib/builtins.py 2010-02-15 20:23:34 +0000
3+++ bzrlib/builtins.py 2010-02-22 11:16:18 +0000
4@@ -24,6 +24,7 @@
5 import cStringIO
6 import sys
7 import time
8+import subprocess
9
10 import bzrlib
11 from bzrlib import (
12@@ -5916,6 +5917,132 @@
13 for path, location in sorted(ref_list):
14 self.outf.write('%s %s\n' % (path, location))
15
16+class cmd_grep(Command):
17+ """Grep through your working tree, but only files managed by bzr.
18+
19+ By default this is a recursive grep, unless you specify --non-recursive.
20+
21+ To pass options to grep use "--" to signal the end of options to bzr, eg:
22+
23+ # bzr grep pattern -- -i file
24+
25+ This does a case-insensitive grep on file.
26+ """
27+
28+ takes_args = ['pattern', 'file*']
29+ takes_options = [
30+ Option(
31+ 'non-recursive',
32+ help="Don't recursively grep the contents of directories."),
33+ ]
34+
35+ # TODO: Separate file selection from file processing
36+ # TODO: avoid xargs completely ?
37+ # TODO: separate file selection from file processing to get a true streamed
38+ # process (everything is is buffered so far, inputs as well as outputs and will
39+ # blow up for big trees one day or the other)
40+ @display_command
41+ def run(self, pattern, non_recursive=False, file_list=None):
42+
43+ curdir = osutils.getcwd()
44+
45+ options, file_list = self._read_options(file_list)
46+
47+ versioned = 'V'
48+
49+ file_list = tree_files(file_list)[1]
50+ tree, b, reldir = bzrdir.BzrDir.open_containing_tree_or_branch('.')
51+ if reldir:
52+ reldir += '/'
53+ tree.lock_read()
54+
55+ try:
56+ cmd = ['xargs', '-0', 'grep', '-e', pattern]
57+ cmd.extend(options)
58+
59+ xargs = subprocess.Popen(cmd, stdin=subprocess.PIPE,
60+ stdout=subprocess.PIPE,
61+ stderr=subprocess.PIPE,
62+ )
63+
64+ for fpath, fclass, fkind, fid, entry \
65+ in tree.list_files(include_root=False):
66+ # XXX: this selection part is messy, more tests are needed and
67+ # a better flow too
68+ if fkind == 'directory':
69+ continue
70+ if file_list:
71+ if non_recursive:
72+ if not fpath in file_list:
73+ continue
74+ elif not osutils.is_inside_any(file_list, fpath):
75+ continue
76+ elif reldir and not fpath.startswith(reldir):
77+ continue
78+ if (non_recursive and reldir
79+ and fpath.startswith(reldir)
80+ and '/' in fpath[len(reldir):]):
81+ continue
82+ if fclass != versioned:
83+ continue
84+ rpath = self._relpath(tree.abspath(fpath))
85+ xargs.stdin.write(rpath + '\0')
86+ xargs.stdin.flush()
87+ out, err = xargs.communicate()
88+ if out:
89+ sys.stdout.write(out)
90+ if err:
91+ sys.stderr.write(err)
92+ finally:
93+ tree.unlock()
94+
95+ def _read_options(self, list):
96+ # This is a bit of a hack, but if we scan the file list for
97+ # options we can do:
98+ # bzr grep foo -- -i file
99+ # and pass the -i through to grep.
100+
101+ if list is None:
102+ return [], []
103+
104+ files = []
105+ options = []
106+
107+ while len(list) > 0:
108+ item = list.pop(0)
109+
110+ if item.startswith('-'):
111+ options.append(item)
112+ if item in ['-A', '-B', '-C', '-m']:
113+ options.append(list.pop(0))
114+ else:
115+ files.append(item)
116+
117+ return options, files
118+
119+ # Borrowed from python2.6 until we get there
120+ def _relpath(self, path, start=os.path.curdir):
121+ """Return a relative version of a path.
122+
123+ This a slighty simplified and modified version of the 2.6
124+ version. Windows specific error handling deleted since we always calls
125+ this function for paths on the same volume.
126+ """
127+
128+ if not path:
129+ raise ValueError("no path specified")
130+
131+ start_list = os.path.abspath(start).split(os.path.sep)
132+ path_list = os.path.abspath(path).split(os.path.sep)
133+
134+ # Work out how much of the filepath is shared by start and path.
135+ i = len(os.path.commonprefix([start_list, path_list]))
136+
137+ rel_list = [os.path.pardir] * (len(start_list)-i) + path_list[i:]
138+ return os.path.join(*rel_list)
139+
140+
141+
142
143 # these get imported and then picked up by the scan for cmd_*
144 # TODO: Some more consistent way to split command definitions across files;
145
146=== modified file 'bzrlib/tests/blackbox/__init__.py'
147--- bzrlib/tests/blackbox/__init__.py 2009-04-29 20:31:34 +0000
148+++ bzrlib/tests/blackbox/__init__.py 2010-02-22 11:16:18 +0000
149@@ -60,6 +60,7 @@
150 'bzrlib.tests.blackbox.test_filesystem_cicp',
151 'bzrlib.tests.blackbox.test_filtered_view_ops',
152 'bzrlib.tests.blackbox.test_find_merge_base',
153+ 'bzrlib.tests.blackbox.test_grep',
154 'bzrlib.tests.blackbox.test_help',
155 'bzrlib.tests.blackbox.test_hooks',
156 'bzrlib.tests.blackbox.test_ignore',
157
158=== added file 'bzrlib/tests/blackbox/test_grep.py'
159--- bzrlib/tests/blackbox/test_grep.py 1970-01-01 00:00:00 +0000
160+++ bzrlib/tests/blackbox/test_grep.py 2010-02-22 11:16:18 +0000
161@@ -0,0 +1,55 @@
162+# Copyright (C) 2010 Canonical Ltd
163+#
164+# This program is free software; you can redistribute it and/or modify
165+# it under the terms of the GNU General Public License as published by
166+# the Free Software Foundation; either version 2 of the License, or
167+# (at your option) any later version.
168+#
169+# This program is distributed in the hope that it will be useful,
170+# but WITHOUT ANY WARRANTY; without even the implied warranty of
171+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
172+# GNU General Public License for more details.
173+#
174+# You should have received a copy of the GNU General Public License
175+# along with this program; if not, write to the Free Software
176+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
177+
178+import os
179+
180+from bzrlib.tests import TestCaseWithTransport
181+
182+class TestGrepInDirs(TestCaseWithTransport):
183+
184+ def setUp(self):
185+ super(TestGrepInDirs, self).setUp()
186+ tree = self.make_branch_and_tree('.')
187+ file_list = ['top', 'subdir/', 'subdir/down']
188+ self.build_tree(file_list)
189+ tree.add(file_list)
190+ tree.commit('setup')
191+
192+ def assertGrepResultIs(self, expected_output, grep_args,
193+ working_dir='.',
194+ expected_error_re='.*'):
195+ out, err = self.run_bzr(['grep'] + grep_args, working_dir=working_dir)
196+ self.assertEquals(expected_output, out)
197+ self.assertContainsRe(err, expected_error_re)
198+
199+ def test_grep_in_root_dir(self):
200+ self.assertGrepResultIs('subdir/down:contents of subdir/down\n'
201+ 'top:contents of top\n' ,
202+ ['contents'])
203+ self.assertGrepResultIs('contents of top\n', ['top', 'top'])
204+ self.assertGrepResultIs('contents of subdir/down\n', ['down',
205+ 'subdir/down'])
206+
207+ def test_grep_in_subir(self):
208+ self.assertGrepResultIs('contents of subdir/down\n', ['down', 'down'],
209+ working_dir='subdir')
210+ self.assertGrepResultIs('contents of top\n', ['top', '../top'],
211+ working_dir='subdir')
212+
213+ def test_grep_in_subdir_doesnt_go_up(self):
214+ self.assertGrepResultIs('contents of subdir/down\n', ['contents'],
215+ working_dir='subdir')
216+