Merge lp:~parthm/bzr/300062-2.2-bad-pattern-error-part-2 into lp:bzr/2.2

Proposed by Parth Malwankar on 2010-07-15
Status: Merged
Approved by: Vincent Ladeuil on 2010-07-21
Approved revision: 5061
Merged at revision: 5065
Proposed branch: lp:~parthm/bzr/300062-2.2-bad-pattern-error-part-2
Merge into: lp:bzr/2.2
Diff against target: 279 lines (+135/-39)
6 files modified
NEWS (+5/-0)
bzrlib/builtins.py (+8/-0)
bzrlib/globbing.py (+75/-23)
bzrlib/tests/blackbox/test_ignore.py (+16/-0)
bzrlib/tests/per_workingtree/test_is_ignored.py (+25/-13)
bzrlib/tests/test_globbing.py (+6/-3)
To merge this branch: bzr merge lp:~parthm/bzr/300062-2.2-bad-pattern-error-part-2
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve on 2010-07-21
John A Meinel 2010-07-15 Needs Fixing on 2010-07-19
Review via email: mp+29949@code.launchpad.net

Commit message

'bzr ignore' fails on bad pattern. bad pattern error messages now shows the faulting pattern.

Description of the change

This fix is towards bug #300062
It does the following:
1. `bzr ignore PATTERNS` now fails with an error if an invalid pattern is provided.
2. For pattern errors, the faulting pattern is displayed to the user.

E.g.
[a1234]% ~/src/bzr.dev/300062-2.2-bad-pattern-error-part-2/bzr st
bzr: ERROR: Invalid pattern(s) found. File ~/.bazaar/ignore or .bzrignore contains error(s).
  RE:[
[a1234]% ~/src/bzr.dev/300062-2.2-bad-pattern-error-part-2/bzr ignore "RE:*.cpp"
bzr: error: Invalid ignore pattern(s) found.
  RE:*.cpp
bzr: ERROR: Invalid pattern(s) found.
[a1234]%

This patch is the same as the https://code.launchpad.net/~parthm/bzr/300062-bad-pattern-error-part-2/+merge/29467 but is targeted to lp:bzr/2.2
If also contains update for e.message -> e.msg fix that came up since the previous patch was created.

To post a comment you must log in.
John A Meinel (jameinel) wrote :

79 - self._add_patterns(ext_patterns,_sub_extension,
80 - prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
81 - self._add_patterns(base_patterns,_sub_basename,
82 - prefix=r'(?:.*/)?(?!.*/)')
83 - self._add_patterns(path_patterns,_sub_fullpath)
84 + pattern_lists[Globster.identify(pat)].append(pat)
85 + for pattern_type, patterns in pattern_lists.iteritems():
86 + self._add_patterns(patterns,
87 + Globster.translators[pattern_type],
88 + Globster.prefixes[pattern_type])

^- I think factoring the patterns into a dict indirected code is interesting. I'm a bit concerned about:
for pattern_type, patterns in pattern_lists.iteritems():

I'm pretty sure we explicitly intended the order of 'check extensions' then 'check basename' then 'check fullpath'. So that we can do checks on just the little bit of the content before we have to check against the whole content. I don't have any explicit proof of this, but turning the current deterministic ordering with a random one doesn't seem ideal.

otherwise the change seems fine to me. I may be over-concerned, but I do believe that 'is_ignored()' is a fair amount of overhead in the 'initial add of 30k files', and I'd like us to at least be aware of it. I do wonder about changing the internals so that instead of using '?:.*/)?(?!.*/)) to filter out everything but the basename for some patterns, if we wouldn't be better off knowing that this is a basename-only pattern and just supply that to the match.

I'm certainly willing to discuss it, but the easiest way to move forward is to just make the order of _add_patterns deterministic by iterating over a fixed ordering.

review: Needs Fixing
Parth Malwankar (parthm) wrote :

> ^- I think factoring the patterns into a dict indirected code is interesting.
> I'm a bit concerned about:
> for pattern_type, patterns in pattern_lists.iteritems():
>
> I'm pretty sure we explicitly intended the order of 'check extensions' then
> 'check basename' then 'check fullpath'. So that we can do checks on just the
> little bit of the content before we have to check against the whole content. I
> don't have any explicit proof of this, but turning the current deterministic
> ordering with a random one doesn't seem ideal.
>

Makes sense. I have updated the patch to make _add_patterns order deterministic.

>
> otherwise the change seems fine to me. I may be over-concerned, but I do
> believe that 'is_ignored()' is a fair amount of overhead in the 'initial add
> of 30k files', and I'd like us to at least be aware of it. I do wonder about
> changing the internals so that instead of using '?:.*/)?(?!.*/)) to filter out
> everything but the basename for some patterns, if we wouldn't be better off
> knowing that this is a basename-only pattern and just supply that to the
> match.
>

Filed bug #607258 to track this.
Thanks for the review.

>
> I'm certainly willing to discuss it, but the easiest way to move forward is to
> just make the order of _add_patterns deterministic by iterating over a fixed
> ordering.

Vincent Ladeuil (vila) wrote :

43 + TYPE_FULLPATH = 1
44 + TYPE_BASENAME = 2
45 + TYPE_EXTENSION = 3

This looks a bit C-ish, why don't you just use regular keys ('fullpath', 'basename' and 'extension') ?

From a design point of view it looks like you're defining several set of objects associated with the same key which sounds like you want a dict or even a registry...

Also, while this is certainly out of scope for this proposal, keep in mind that some plugins want to define their own '.xxx-ignore' files and our current API is already a pain to use, it would be nice if we can help them by providing the right set of function/classes to process a set of paths against a given set of patterns coming from a file (or several in bzr case...).

Parth Malwankar (parthm) wrote :

> 43 + TYPE_FULLPATH = 1
> 44 + TYPE_BASENAME = 2
> 45 + TYPE_EXTENSION = 3
>
> This looks a bit C-ish, why don't you just use regular keys ('fullpath',
> 'basename' and 'extension') ?
>
> From a design point of view it looks like you're defining several set of
> objects associated with the same key which sounds like you want a dict or even
> a registry...
>

I have updated the patch to use a dict of dict with strings. Its much cleaner now.

> Also, while this is certainly out of scope for this proposal, keep in mind
> that some plugins want to define their own '.xxx-ignore' files and our current
> API is already a pain to use, it would be nice if we can help them by
> providing the right set of function/classes to process a set of paths against
> a given set of patterns coming from a file (or several in bzr case...).

Makes sense. I think it may be possible to build an API on top of the
current implementation to make it easy. I will look into bzr-upload as
suggested by you on IRC.

Thanks for the review.

Vincent Ladeuil (vila) wrote :

Yup, far cleaner.

I think you've also addressed jam's concerns so good to land for me.

review: Approve
Parth Malwankar (parthm) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-07-20 15:17:58 +0000
3+++ NEWS 2010-07-21 10:19:44 +0000
4@@ -31,6 +31,11 @@
5 Bug Fixes
6 *********
7
8+* ``bzr ignore PATTERNS`` exits with error if a bad pattern is supplied.
9+ ``InvalidPattern`` exception error message now shows faulting
10+ regular expression.
11+ (Parth Malwankar #300062)
12+
13 * Configuration files in ``${BZR_HOME}`` are now written in an atomic
14 way which should help avoid problems with concurrent writers.
15 (Vincent Ladeuil, #525571)
16
17=== modified file 'bzrlib/builtins.py'
18--- bzrlib/builtins.py 2010-07-18 14:22:34 +0000
19+++ bzrlib/builtins.py 2010-07-21 10:19:44 +0000
20@@ -2712,6 +2712,14 @@
21 "NAME_PATTERN or --default-rules.")
22 name_pattern_list = [globbing.normalize_pattern(p)
23 for p in name_pattern_list]
24+ bad_patterns = ''
25+ for p in name_pattern_list:
26+ if not globbing.Globster.is_pattern_valid(p):
27+ bad_patterns += ('\n %s' % p)
28+ if bad_patterns:
29+ msg = ('Invalid ignore pattern(s) found. %s' % bad_patterns)
30+ ui.ui_factory.show_error(msg)
31+ raise errors.InvalidPattern('')
32 for name_pattern in name_pattern_list:
33 if (name_pattern[0] == '/' or
34 (len(name_pattern) > 1 and name_pattern[1] == ':')):
35
36=== modified file 'bzrlib/globbing.py'
37--- bzrlib/globbing.py 2010-07-09 16:16:11 +0000
38+++ bzrlib/globbing.py 2010-07-21 10:19:44 +0000
39@@ -179,24 +179,41 @@
40 so are matched first, then the basename patterns, then the fullpath
41 patterns.
42 """
43+ # We want to _add_patterns in a specific order (as per type_list below)
44+ # starting with the shortest and going to the longest.
45+ # As some Python version don't support ordered dicts the list below is
46+ # used to select inputs for _add_pattern in a specific order.
47+ pattern_types = [ "extension", "basename", "fullpath" ]
48+
49+ pattern_info = {
50+ "extension" : {
51+ "translator" : _sub_extension,
52+ "prefix" : r'(?:.*/)?(?!.*/)(?:.*\.)'
53+ },
54+ "basename" : {
55+ "translator" : _sub_basename,
56+ "prefix" : r'(?:.*/)?(?!.*/)'
57+ },
58+ "fullpath" : {
59+ "translator" : _sub_fullpath,
60+ "prefix" : r''
61+ },
62+ }
63+
64 def __init__(self, patterns):
65 self._regex_patterns = []
66- path_patterns = []
67- base_patterns = []
68- ext_patterns = []
69+ pattern_lists = {
70+ "extension" : [],
71+ "basename" : [],
72+ "fullpath" : [],
73+ }
74 for pat in patterns:
75 pat = normalize_pattern(pat)
76- if pat.startswith(u'RE:') or u'/' in pat:
77- path_patterns.append(pat)
78- elif pat.startswith(u'*.'):
79- ext_patterns.append(pat)
80- else:
81- base_patterns.append(pat)
82- self._add_patterns(ext_patterns,_sub_extension,
83- prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
84- self._add_patterns(base_patterns,_sub_basename,
85- prefix=r'(?:.*/)?(?!.*/)')
86- self._add_patterns(path_patterns,_sub_fullpath)
87+ pattern_lists[Globster.identify(pat)].append(pat)
88+ pi = Globster.pattern_info
89+ for t in Globster.pattern_types:
90+ self._add_patterns(pattern_lists[t], pi[t]["translator"],
91+ pi[t]["prefix"])
92
93 def _add_patterns(self, patterns, translator, prefix=''):
94 while patterns:
95@@ -221,10 +238,50 @@
96 # the combined pattern we sent to regex. Instead we indicate to
97 # the user that an ignore file needs fixing.
98 mutter('Invalid pattern found in regex: %s.', e.msg)
99- e.msg = "File ~/.bazaar/ignore or .bzrignore contains errors."
100+ e.msg = "File ~/.bazaar/ignore or .bzrignore contains error(s)."
101+ bad_patterns = ''
102+ for _, patterns in self._regex_patterns:
103+ for p in patterns:
104+ if not Globster.is_pattern_valid(p):
105+ bad_patterns += ('\n %s' % p)
106+ e.msg += bad_patterns
107 raise e
108 return None
109
110+ @staticmethod
111+ def identify(pattern):
112+ """Returns pattern category.
113+
114+ :param pattern: normalized pattern.
115+ Identify if a pattern is fullpath, basename or extension
116+ and returns the appropriate type.
117+ """
118+ if pattern.startswith(u'RE:') or u'/' in pattern:
119+ return "fullpath"
120+ elif pattern.startswith(u'*.'):
121+ return "extension"
122+ else:
123+ return "basename"
124+
125+ @staticmethod
126+ def is_pattern_valid(pattern):
127+ """Returns True if pattern is valid.
128+
129+ :param pattern: Normalized pattern.
130+ is_pattern_valid() assumes pattern to be normalized.
131+ see: globbing.normalize_pattern
132+ """
133+ result = True
134+ translator = Globster.pattern_info[Globster.identify(pattern)]["translator"]
135+ tpattern = '(%s)' % translator(pattern)
136+ try:
137+ re_obj = re.compile(tpattern, re.UNICODE)
138+ re_obj.search("") # force compile
139+ except errors.InvalidPattern, e:
140+ result = False
141+ return result
142+
143+
144 class ExceptionGlobster(object):
145 """A Globster that supports exception patterns.
146
147@@ -272,14 +329,9 @@
148 self._regex_patterns = []
149 for pat in patterns:
150 pat = normalize_pattern(pat)
151- if pat.startswith(u'RE:') or u'/' in pat:
152- self._add_patterns([pat], _sub_fullpath)
153- elif pat.startswith(u'*.'):
154- self._add_patterns([pat], _sub_extension,
155- prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
156- else:
157- self._add_patterns([pat], _sub_basename,
158- prefix=r'(?:.*/)?(?!.*/)')
159+ t = Globster.identify(pat)
160+ self._add_patterns([pat], Globster.pattern_info[t]["translator"],
161+ Globster.pattern_info[t]["prefix"])
162
163
164 _slashes = re.compile(r'[\\/]+')
165
166=== modified file 'bzrlib/tests/blackbox/test_ignore.py'
167--- bzrlib/tests/blackbox/test_ignore.py 2010-06-11 07:32:12 +0000
168+++ bzrlib/tests/blackbox/test_ignore.py 2010-07-21 10:19:44 +0000
169@@ -162,3 +162,19 @@
170 tree = self.make_branch_and_tree('a')
171 self.run_bzr(['ignore', '--directory=a', 'README'])
172 self.check_file_contents('a/.bzrignore', 'README\n')
173+
174+ def test_ignored_invalid_pattern(self):
175+ """Ensure graceful handling for invalid ignore pattern.
176+
177+ Test case for #300062.
178+ Invalid pattern should show clear error message.
179+ Invalid pattern should not be added to .bzrignore file.
180+ """
181+ tree = self.make_branch_and_tree('.')
182+ out, err = self.run_bzr(['ignore', 'RE:*.cpp', 'foo', 'RE:['], 3)
183+ self.assertEqual(out, '')
184+ self.assertContainsRe(err,
185+ 'Invalid ignore pattern.*RE:\*\.cpp.*RE:\[', re.DOTALL)
186+ self.assertNotContainsRe(err, 'foo', re.DOTALL)
187+ self.assertFalse(os.path.isfile('.bzrignore'))
188+
189
190=== modified file 'bzrlib/tests/per_workingtree/test_is_ignored.py'
191--- bzrlib/tests/per_workingtree/test_is_ignored.py 2010-02-17 17:11:16 +0000
192+++ bzrlib/tests/per_workingtree/test_is_ignored.py 2010-07-21 10:19:44 +0000
193@@ -22,6 +22,16 @@
194
195 class TestIsIgnored(TestCaseWithWorkingTree):
196
197+ def _set_user_ignore_content(self, ignores):
198+ """Create user ignore file and set its content to ignores."""
199+ config.ensure_config_dir_exists()
200+ user_ignore_file = config.user_ignore_config_filename()
201+ f = open(user_ignore_file, 'wb')
202+ try:
203+ f.write(ignores)
204+ finally:
205+ f.close()
206+
207 def test_is_ignored(self):
208 tree = self.make_branch_and_tree('.')
209 # this will break if a tree changes the ignored format. That is fine
210@@ -46,6 +56,11 @@
211 '#comment\n'
212 ' xx \n' # whitespace
213 )])
214+ # We set user ignore file to contain '' to avoid patterns from
215+ # user ignore being used instead of bzrignore. For .e.g. If we
216+ # don't do this 'foo.~1~' will match '*~' default user ignore
217+ # pattern instead of '*.~*' from bzr ignore as we expect below.
218+ self._set_user_ignore_content('')
219 # is_ignored returns the matching ignore regex when a path is ignored.
220 # we check some expected matches for each rule, and one or more
221 # relevant not-matches that look plausible as cases for bugs.
222@@ -119,19 +134,16 @@
223
224 config.ensure_config_dir_exists()
225 user_ignore_file = config.user_ignore_config_filename()
226- f = open(user_ignore_file, 'wb')
227- try:
228- f.write('*.py[co]\n'
229- './.shelf\n'
230- '# comment line\n'
231- '\n' #Blank line
232- '\r\n' #Blank dos line
233- ' * \n' #Trailing and suffix spaces
234- 'crlf\r\n' # dos style line
235- '*\xc3\xa5*\n' # u'\xe5'.encode('utf8')
236- )
237- finally:
238- f.close()
239+ self._set_user_ignore_content(
240+ '*.py[co]\n'
241+ './.shelf\n'
242+ '# comment line\n'
243+ '\n' #Blank line
244+ '\r\n' #Blank dos line
245+ ' * \n' #Trailing and suffix spaces
246+ 'crlf\r\n' # dos style line
247+ '*\xc3\xa5*\n' # u'\xe5'.encode('utf8')
248+ )
249
250 # Rooted
251 self.assertEqual('./.shelf', tree.is_ignored('.shelf'))
252
253=== modified file 'bzrlib/tests/test_globbing.py'
254--- bzrlib/tests/test_globbing.py 2010-07-09 16:16:11 +0000
255+++ bzrlib/tests/test_globbing.py 2010-07-21 10:19:44 +0000
256@@ -15,6 +15,8 @@
257 # along with this program; if not, write to the Free Software
258 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
259
260+import re
261+
262 from bzrlib import errors
263 from bzrlib.globbing import (
264 Globster,
265@@ -311,10 +313,11 @@
266
267 def test_bad_pattern(self):
268 """Ensure that globster handles bad patterns cleanly."""
269- patterns = [u'RE:[']
270+ patterns = [u'RE:[', u'/home/foo', u'RE:*.cpp']
271 g = Globster(patterns)
272- e = self.assertRaises(errors.InvalidPattern, g.match, 'foo')
273- self.assertContainsRe(e.msg, "File.*ignore.*contains errors")
274+ e = self.assertRaises(errors.InvalidPattern, g.match, 'filename')
275+ self.assertContainsRe(e.msg,
276+ "File.*ignore.*contains error.*RE:\[.*RE:\*\.cpp", flags=re.DOTALL)
277
278
279 class TestExceptionGlobster(TestCase):

Subscribers

People subscribed via source and target branches