Merge lp:~parthm/bzr/300062-2.2-bad-pattern-error-part-2 into lp:bzr/2.2

Proposed by Parth Malwankar on 2010-07-15
Status: Merged
Approved by: Vincent Ladeuil on 2010-07-21
Approved revision: 5061
Merged at revision: 5065
Proposed branch: lp:~parthm/bzr/300062-2.2-bad-pattern-error-part-2
Merge into: lp:bzr/2.2
Diff against target: 279 lines (+135/-39)
6 files modified
NEWS (+5/-0)
bzrlib/builtins.py (+8/-0)
bzrlib/globbing.py (+75/-23)
bzrlib/tests/blackbox/test_ignore.py (+16/-0)
bzrlib/tests/per_workingtree/test_is_ignored.py (+25/-13)
bzrlib/tests/test_globbing.py (+6/-3)
To merge this branch: bzr merge lp:~parthm/bzr/300062-2.2-bad-pattern-error-part-2
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve on 2010-07-21
John A Meinel 2010-07-15 Needs Fixing on 2010-07-19
Review via email: mp+29949@code.launchpad.net

Commit message

'bzr ignore' fails on bad pattern. bad pattern error messages now shows the faulting pattern.

Description of the change

This fix is towards bug #300062
It does the following:
1. `bzr ignore PATTERNS` now fails with an error if an invalid pattern is provided.
2. For pattern errors, the faulting pattern is displayed to the user.

E.g.
[a1234]% ~/src/bzr.dev/300062-2.2-bad-pattern-error-part-2/bzr st
bzr: ERROR: Invalid pattern(s) found. File ~/.bazaar/ignore or .bzrignore contains error(s).
  RE:[
[a1234]% ~/src/bzr.dev/300062-2.2-bad-pattern-error-part-2/bzr ignore "RE:*.cpp"
bzr: error: Invalid ignore pattern(s) found.
  RE:*.cpp
bzr: ERROR: Invalid pattern(s) found.
[a1234]%

This patch is the same as the https://code.launchpad.net/~parthm/bzr/300062-bad-pattern-error-part-2/+merge/29467 but is targeted to lp:bzr/2.2
If also contains update for e.message -> e.msg fix that came up since the previous patch was created.

To post a comment you must log in.
John A Meinel (jameinel) wrote :

79 - self._add_patterns(ext_patterns,_sub_extension,
80 - prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
81 - self._add_patterns(base_patterns,_sub_basename,
82 - prefix=r'(?:.*/)?(?!.*/)')
83 - self._add_patterns(path_patterns,_sub_fullpath)
84 + pattern_lists[Globster.identify(pat)].append(pat)
85 + for pattern_type, patterns in pattern_lists.iteritems():
86 + self._add_patterns(patterns,
87 + Globster.translators[pattern_type],
88 + Globster.prefixes[pattern_type])

^- I think factoring the patterns into a dict indirected code is interesting. I'm a bit concerned about:
for pattern_type, patterns in pattern_lists.iteritems():

I'm pretty sure we explicitly intended the order of 'check extensions' then 'check basename' then 'check fullpath'. So that we can do checks on just the little bit of the content before we have to check against the whole content. I don't have any explicit proof of this, but turning the current deterministic ordering with a random one doesn't seem ideal.

otherwise the change seems fine to me. I may be over-concerned, but I do believe that 'is_ignored()' is a fair amount of overhead in the 'initial add of 30k files', and I'd like us to at least be aware of it. I do wonder about changing the internals so that instead of using '?:.*/)?(?!.*/)) to filter out everything but the basename for some patterns, if we wouldn't be better off knowing that this is a basename-only pattern and just supply that to the match.

I'm certainly willing to discuss it, but the easiest way to move forward is to just make the order of _add_patterns deterministic by iterating over a fixed ordering.

review: Needs Fixing
Parth Malwankar (parthm) wrote :

> ^- I think factoring the patterns into a dict indirected code is interesting.
> I'm a bit concerned about:
> for pattern_type, patterns in pattern_lists.iteritems():
>
> I'm pretty sure we explicitly intended the order of 'check extensions' then
> 'check basename' then 'check fullpath'. So that we can do checks on just the
> little bit of the content before we have to check against the whole content. I
> don't have any explicit proof of this, but turning the current deterministic
> ordering with a random one doesn't seem ideal.
>

Makes sense. I have updated the patch to make _add_patterns order deterministic.

>
> otherwise the change seems fine to me. I may be over-concerned, but I do
> believe that 'is_ignored()' is a fair amount of overhead in the 'initial add
> of 30k files', and I'd like us to at least be aware of it. I do wonder about
> changing the internals so that instead of using '?:.*/)?(?!.*/)) to filter out
> everything but the basename for some patterns, if we wouldn't be better off
> knowing that this is a basename-only pattern and just supply that to the
> match.
>

Filed bug #607258 to track this.
Thanks for the review.

>
> I'm certainly willing to discuss it, but the easiest way to move forward is to
> just make the order of _add_patterns deterministic by iterating over a fixed
> ordering.

Vincent Ladeuil (vila) wrote :

43 + TYPE_FULLPATH = 1
44 + TYPE_BASENAME = 2
45 + TYPE_EXTENSION = 3

This looks a bit C-ish, why don't you just use regular keys ('fullpath', 'basename' and 'extension') ?

From a design point of view it looks like you're defining several set of objects associated with the same key which sounds like you want a dict or even a registry...

Also, while this is certainly out of scope for this proposal, keep in mind that some plugins want to define their own '.xxx-ignore' files and our current API is already a pain to use, it would be nice if we can help them by providing the right set of function/classes to process a set of paths against a given set of patterns coming from a file (or several in bzr case...).

Parth Malwankar (parthm) wrote :

> 43 + TYPE_FULLPATH = 1
> 44 + TYPE_BASENAME = 2
> 45 + TYPE_EXTENSION = 3
>
> This looks a bit C-ish, why don't you just use regular keys ('fullpath',
> 'basename' and 'extension') ?
>
> From a design point of view it looks like you're defining several set of
> objects associated with the same key which sounds like you want a dict or even
> a registry...
>

I have updated the patch to use a dict of dict with strings. Its much cleaner now.

> Also, while this is certainly out of scope for this proposal, keep in mind
> that some plugins want to define their own '.xxx-ignore' files and our current
> API is already a pain to use, it would be nice if we can help them by
> providing the right set of function/classes to process a set of paths against
> a given set of patterns coming from a file (or several in bzr case...).

Makes sense. I think it may be possible to build an API on top of the
current implementation to make it easy. I will look into bzr-upload as
suggested by you on IRC.

Thanks for the review.

Vincent Ladeuil (vila) wrote :

Yup, far cleaner.

I think you've also addressed jam's concerns so good to land for me.

review: Approve
Parth Malwankar (parthm) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2010-07-20 15:17:58 +0000
+++ NEWS 2010-07-21 10:19:44 +0000
@@ -31,6 +31,11 @@
31Bug Fixes31Bug Fixes
32*********32*********
3333
34* ``bzr ignore PATTERNS`` exits with error if a bad pattern is supplied.
35 ``InvalidPattern`` exception error message now shows faulting
36 regular expression.
37 (Parth Malwankar #300062)
38
34* Configuration files in ``${BZR_HOME}`` are now written in an atomic39* Configuration files in ``${BZR_HOME}`` are now written in an atomic
35 way which should help avoid problems with concurrent writers.40 way which should help avoid problems with concurrent writers.
36 (Vincent Ladeuil, #525571)41 (Vincent Ladeuil, #525571)
3742
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2010-07-18 14:22:34 +0000
+++ bzrlib/builtins.py 2010-07-21 10:19:44 +0000
@@ -2712,6 +2712,14 @@
2712 "NAME_PATTERN or --default-rules.")2712 "NAME_PATTERN or --default-rules.")
2713 name_pattern_list = [globbing.normalize_pattern(p)2713 name_pattern_list = [globbing.normalize_pattern(p)
2714 for p in name_pattern_list]2714 for p in name_pattern_list]
2715 bad_patterns = ''
2716 for p in name_pattern_list:
2717 if not globbing.Globster.is_pattern_valid(p):
2718 bad_patterns += ('\n %s' % p)
2719 if bad_patterns:
2720 msg = ('Invalid ignore pattern(s) found. %s' % bad_patterns)
2721 ui.ui_factory.show_error(msg)
2722 raise errors.InvalidPattern('')
2715 for name_pattern in name_pattern_list:2723 for name_pattern in name_pattern_list:
2716 if (name_pattern[0] == '/' or2724 if (name_pattern[0] == '/' or
2717 (len(name_pattern) > 1 and name_pattern[1] == ':')):2725 (len(name_pattern) > 1 and name_pattern[1] == ':')):
27182726
=== modified file 'bzrlib/globbing.py'
--- bzrlib/globbing.py 2010-07-09 16:16:11 +0000
+++ bzrlib/globbing.py 2010-07-21 10:19:44 +0000
@@ -179,24 +179,41 @@
179 so are matched first, then the basename patterns, then the fullpath179 so are matched first, then the basename patterns, then the fullpath
180 patterns.180 patterns.
181 """181 """
182 # We want to _add_patterns in a specific order (as per type_list below)
183 # starting with the shortest and going to the longest.
184 # As some Python version don't support ordered dicts the list below is
185 # used to select inputs for _add_pattern in a specific order.
186 pattern_types = [ "extension", "basename", "fullpath" ]
187
188 pattern_info = {
189 "extension" : {
190 "translator" : _sub_extension,
191 "prefix" : r'(?:.*/)?(?!.*/)(?:.*\.)'
192 },
193 "basename" : {
194 "translator" : _sub_basename,
195 "prefix" : r'(?:.*/)?(?!.*/)'
196 },
197 "fullpath" : {
198 "translator" : _sub_fullpath,
199 "prefix" : r''
200 },
201 }
202
182 def __init__(self, patterns):203 def __init__(self, patterns):
183 self._regex_patterns = []204 self._regex_patterns = []
184 path_patterns = []205 pattern_lists = {
185 base_patterns = []206 "extension" : [],
186 ext_patterns = []207 "basename" : [],
208 "fullpath" : [],
209 }
187 for pat in patterns:210 for pat in patterns:
188 pat = normalize_pattern(pat)211 pat = normalize_pattern(pat)
189 if pat.startswith(u'RE:') or u'/' in pat:212 pattern_lists[Globster.identify(pat)].append(pat)
190 path_patterns.append(pat)213 pi = Globster.pattern_info
191 elif pat.startswith(u'*.'):214 for t in Globster.pattern_types:
192 ext_patterns.append(pat)215 self._add_patterns(pattern_lists[t], pi[t]["translator"],
193 else:216 pi[t]["prefix"])
194 base_patterns.append(pat)
195 self._add_patterns(ext_patterns,_sub_extension,
196 prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
197 self._add_patterns(base_patterns,_sub_basename,
198 prefix=r'(?:.*/)?(?!.*/)')
199 self._add_patterns(path_patterns,_sub_fullpath)
200217
201 def _add_patterns(self, patterns, translator, prefix=''):218 def _add_patterns(self, patterns, translator, prefix=''):
202 while patterns:219 while patterns:
@@ -221,10 +238,50 @@
221 # the combined pattern we sent to regex. Instead we indicate to238 # the combined pattern we sent to regex. Instead we indicate to
222 # the user that an ignore file needs fixing.239 # the user that an ignore file needs fixing.
223 mutter('Invalid pattern found in regex: %s.', e.msg)240 mutter('Invalid pattern found in regex: %s.', e.msg)
224 e.msg = "File ~/.bazaar/ignore or .bzrignore contains errors."241 e.msg = "File ~/.bazaar/ignore or .bzrignore contains error(s)."
242 bad_patterns = ''
243 for _, patterns in self._regex_patterns:
244 for p in patterns:
245 if not Globster.is_pattern_valid(p):
246 bad_patterns += ('\n %s' % p)
247 e.msg += bad_patterns
225 raise e248 raise e
226 return None249 return None
227250
251 @staticmethod
252 def identify(pattern):
253 """Returns pattern category.
254
255 :param pattern: normalized pattern.
256 Identify if a pattern is fullpath, basename or extension
257 and returns the appropriate type.
258 """
259 if pattern.startswith(u'RE:') or u'/' in pattern:
260 return "fullpath"
261 elif pattern.startswith(u'*.'):
262 return "extension"
263 else:
264 return "basename"
265
266 @staticmethod
267 def is_pattern_valid(pattern):
268 """Returns True if pattern is valid.
269
270 :param pattern: Normalized pattern.
271 is_pattern_valid() assumes pattern to be normalized.
272 see: globbing.normalize_pattern
273 """
274 result = True
275 translator = Globster.pattern_info[Globster.identify(pattern)]["translator"]
276 tpattern = '(%s)' % translator(pattern)
277 try:
278 re_obj = re.compile(tpattern, re.UNICODE)
279 re_obj.search("") # force compile
280 except errors.InvalidPattern, e:
281 result = False
282 return result
283
284
228class ExceptionGlobster(object):285class ExceptionGlobster(object):
229 """A Globster that supports exception patterns.286 """A Globster that supports exception patterns.
230 287
@@ -272,14 +329,9 @@
272 self._regex_patterns = []329 self._regex_patterns = []
273 for pat in patterns:330 for pat in patterns:
274 pat = normalize_pattern(pat)331 pat = normalize_pattern(pat)
275 if pat.startswith(u'RE:') or u'/' in pat:332 t = Globster.identify(pat)
276 self._add_patterns([pat], _sub_fullpath)333 self._add_patterns([pat], Globster.pattern_info[t]["translator"],
277 elif pat.startswith(u'*.'):334 Globster.pattern_info[t]["prefix"])
278 self._add_patterns([pat], _sub_extension,
279 prefix=r'(?:.*/)?(?!.*/)(?:.*\.)')
280 else:
281 self._add_patterns([pat], _sub_basename,
282 prefix=r'(?:.*/)?(?!.*/)')
283335
284336
285_slashes = re.compile(r'[\\/]+')337_slashes = re.compile(r'[\\/]+')
286338
=== modified file 'bzrlib/tests/blackbox/test_ignore.py'
--- bzrlib/tests/blackbox/test_ignore.py 2010-06-11 07:32:12 +0000
+++ bzrlib/tests/blackbox/test_ignore.py 2010-07-21 10:19:44 +0000
@@ -162,3 +162,19 @@
162 tree = self.make_branch_and_tree('a')162 tree = self.make_branch_and_tree('a')
163 self.run_bzr(['ignore', '--directory=a', 'README'])163 self.run_bzr(['ignore', '--directory=a', 'README'])
164 self.check_file_contents('a/.bzrignore', 'README\n')164 self.check_file_contents('a/.bzrignore', 'README\n')
165
166 def test_ignored_invalid_pattern(self):
167 """Ensure graceful handling for invalid ignore pattern.
168
169 Test case for #300062.
170 Invalid pattern should show clear error message.
171 Invalid pattern should not be added to .bzrignore file.
172 """
173 tree = self.make_branch_and_tree('.')
174 out, err = self.run_bzr(['ignore', 'RE:*.cpp', 'foo', 'RE:['], 3)
175 self.assertEqual(out, '')
176 self.assertContainsRe(err,
177 'Invalid ignore pattern.*RE:\*\.cpp.*RE:\[', re.DOTALL)
178 self.assertNotContainsRe(err, 'foo', re.DOTALL)
179 self.assertFalse(os.path.isfile('.bzrignore'))
180
165181
=== modified file 'bzrlib/tests/per_workingtree/test_is_ignored.py'
--- bzrlib/tests/per_workingtree/test_is_ignored.py 2010-02-17 17:11:16 +0000
+++ bzrlib/tests/per_workingtree/test_is_ignored.py 2010-07-21 10:19:44 +0000
@@ -22,6 +22,16 @@
2222
23class TestIsIgnored(TestCaseWithWorkingTree):23class TestIsIgnored(TestCaseWithWorkingTree):
2424
25 def _set_user_ignore_content(self, ignores):
26 """Create user ignore file and set its content to ignores."""
27 config.ensure_config_dir_exists()
28 user_ignore_file = config.user_ignore_config_filename()
29 f = open(user_ignore_file, 'wb')
30 try:
31 f.write(ignores)
32 finally:
33 f.close()
34
25 def test_is_ignored(self):35 def test_is_ignored(self):
26 tree = self.make_branch_and_tree('.')36 tree = self.make_branch_and_tree('.')
27 # this will break if a tree changes the ignored format. That is fine37 # this will break if a tree changes the ignored format. That is fine
@@ -46,6 +56,11 @@
46 '#comment\n'56 '#comment\n'
47 ' xx \n' # whitespace57 ' xx \n' # whitespace
48 )])58 )])
59 # We set user ignore file to contain '' to avoid patterns from
60 # user ignore being used instead of bzrignore. For .e.g. If we
61 # don't do this 'foo.~1~' will match '*~' default user ignore
62 # pattern instead of '*.~*' from bzr ignore as we expect below.
63 self._set_user_ignore_content('')
49 # is_ignored returns the matching ignore regex when a path is ignored.64 # is_ignored returns the matching ignore regex when a path is ignored.
50 # we check some expected matches for each rule, and one or more65 # we check some expected matches for each rule, and one or more
51 # relevant not-matches that look plausible as cases for bugs.66 # relevant not-matches that look plausible as cases for bugs.
@@ -119,19 +134,16 @@
119134
120 config.ensure_config_dir_exists()135 config.ensure_config_dir_exists()
121 user_ignore_file = config.user_ignore_config_filename()136 user_ignore_file = config.user_ignore_config_filename()
122 f = open(user_ignore_file, 'wb')137 self._set_user_ignore_content(
123 try:138 '*.py[co]\n'
124 f.write('*.py[co]\n'139 './.shelf\n'
125 './.shelf\n'140 '# comment line\n'
126 '# comment line\n'141 '\n' #Blank line
127 '\n' #Blank line142 '\r\n' #Blank dos line
128 '\r\n' #Blank dos line143 ' * \n' #Trailing and suffix spaces
129 ' * \n' #Trailing and suffix spaces144 'crlf\r\n' # dos style line
130 'crlf\r\n' # dos style line145 '*\xc3\xa5*\n' # u'\xe5'.encode('utf8')
131 '*\xc3\xa5*\n' # u'\xe5'.encode('utf8')146 )
132 )
133 finally:
134 f.close()
135147
136 # Rooted148 # Rooted
137 self.assertEqual('./.shelf', tree.is_ignored('.shelf'))149 self.assertEqual('./.shelf', tree.is_ignored('.shelf'))
138150
=== modified file 'bzrlib/tests/test_globbing.py'
--- bzrlib/tests/test_globbing.py 2010-07-09 16:16:11 +0000
+++ bzrlib/tests/test_globbing.py 2010-07-21 10:19:44 +0000
@@ -15,6 +15,8 @@
15# along with this program; if not, write to the Free Software15# along with this program; if not, write to the Free Software
16# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA16# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1717
18import re
19
18from bzrlib import errors20from bzrlib import errors
19from bzrlib.globbing import (21from bzrlib.globbing import (
20 Globster,22 Globster,
@@ -311,10 +313,11 @@
311313
312 def test_bad_pattern(self):314 def test_bad_pattern(self):
313 """Ensure that globster handles bad patterns cleanly."""315 """Ensure that globster handles bad patterns cleanly."""
314 patterns = [u'RE:[']316 patterns = [u'RE:[', u'/home/foo', u'RE:*.cpp']
315 g = Globster(patterns)317 g = Globster(patterns)
316 e = self.assertRaises(errors.InvalidPattern, g.match, 'foo')318 e = self.assertRaises(errors.InvalidPattern, g.match, 'filename')
317 self.assertContainsRe(e.msg, "File.*ignore.*contains errors")319 self.assertContainsRe(e.msg,
320 "File.*ignore.*contains error.*RE:\[.*RE:\*\.cpp", flags=re.DOTALL)
318321
319322
320class TestExceptionGlobster(TestCase):323class TestExceptionGlobster(TestCase):

Subscribers

People subscribed via source and target branches