Merge lp:~gagern/bzr/encodingSafeTests into lp:bzr

Proposed by Martin von Gagern
Status: Work in progress
Proposed branch: lp:~gagern/bzr/encodingSafeTests
Merge into: lp:bzr
Diff against target: 142 lines (+79/-11)
2 files modified
bzrlib/tests/__init__.py (+29/-11)
bzrlib/tests/test_selftest.py (+50/-0)
To merge this branch: bzr merge lp:~gagern/bzr/encodingSafeTests
Reviewer Review Type Date Requested Status
Martin Packman (community) Abstain
Andrew Bennetts Needs Fixing
Review via email: mp+23925@code.launchpad.net

Description of the change

Ensure tests won't cause the whole selftest to abort even if UnicodeErrors occur when formatting some test result.

I had proposed this in 2008: http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/43607
Back then, John Arbash Meinel and Martin Pool had approved it, but Martin Pool later voted resubmit, as the feature failed its own test cases.

I've updated it to current bzr.dev, and selftests work for me now. Do you want this patch?

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote :

+ return string_or_unicode.encode('string_escape').replace(r'\n', '\n')

This mangles the output:

>>> x = r'backslash n, not newline: \n'
>>> x.encode('string_escape').replace(r'\n', '\n')
'backslash n, not newline: \\\n'
>>> '\n' in x
False
>>> '\n' in x.encode('string_escape').replace(r'\n', '\n')
True

Otherwise this seems ok. When UnicodeErrors occur you _safe_str all variable that might be involved, which is perhaps too zealous. But it's undoubtedly better than allowing the UnicodeError to abort selftest.

review: Needs Fixing
lp:~gagern/bzr/encodingSafeTests updated
3530. By Martin von Gagern

Fix broken newline unescaping.

In order to show useful diffs, and readable output in general, newlines
should not be escaped. The unescaping done before was broken if r'\n'
appeared in the input, as Andrew Bennetts reviewed. In order to avoid using
regular expressions, we now split lines and join them after escaping. Added
a test case as well, and a new test class for toplevel functions in tests.

Revision history for this message
Martin Packman (gz) wrote :

Making the test result output robust against dodgy bytestrings is a worthy goal, but there are issues with the specifics here. Most generally, I think testtools is the place to start.

There are three things this patch is making 'safe':

* Test descriptions, which come from test ids
Always ascii, as noted by John last time around:
<https://lists.ubuntu.com/archives/bazaar/2008q3/045619.html>

* Exception instances from testtools
Already been run through an encoding and decoding process. There *are* issues like bug 501166 that need fixing upstream.

* Diffs where one text is a (non-ascii) str and the other is unicode
This is undecidable as the function does not know the encoding of the bytestring. Just escaping both inputs can result in a bogus pass, asserting the types are the same would catch badly written tests.

review: Abstain
Revision history for this message
Robert Collins (lifeless) wrote :

So, I think Martin[gz]'s analysis is good; I'm ok with us doing what
your patch does *as long as* we don't lose motivation to fix these
other things:
 - so, any comparisons doing implicit casting should error (that is,
we should say that u'foo' != 'foo'), which will need to be its own
patch [it will be huge].
 - we need to *really visibly* log the unicode exception
 - we need to not permit any unicode exceptions on PQM, because its
meant to only get clean things.

I think on balance, that I wouldn't do what you've done, myself - I'd
just find and fix the root causes anytime the test suite did get
aborted: there can't be many such root causes.

HTH,
Rob

Revision history for this message
John A Meinel (jameinel) wrote :

I'm a bit unsure of how to proceed on this one. It looks like it should be a resubmit/do it differently sort of vote. At least going by Martin and Robert's comments.

Revision history for this message
Martin von Gagern (gagern) wrote :

See https://code.edge.launchpad.net/~gagern/bzr/setlocale/+merge/391 for some background on this request here: I mostly submitted the branch because it was the only thing left over from that ancient setlocale merge request. I wanted to get things closed down properly. It was vila who suggested I submit it as well.

The issues that motivated these changes have long since been dealt with, so I don't urgently need this fix. If you don't feel like merging it, then don't, I won't object (for this merge here). Only thing I'd like to avoid is rejecting the thing now and redoing the work in a few years because nobody remembers this branch.

I won't have time to code a different approach to this. So if you want it done differently, fine by me, but at least now I won't be the one implementing it.

Revision history for this message
Martin Packman (gz) wrote :

Submitting this merge request was a good idea, there are real problems here still even after two years. What is different is the test suite now depends on testtools, and there's more of a push to get str/unicode issues sorted out. It's fine if you're not interested in working out what needs changing now as it's no longer an issue for you.

I've spoken to Robert already about making some changes to how tracebacks are stringified in testtools, and with Martin Pool about general exception printing in Bazaar. These changes are going to be a pain, but if you're okay with it I'd like to CC you on the review when I'm done.

Revision history for this message
Martin von Gagern (gagern) wrote :

> if you're okay with it I'd like to CC you on the review when I'm done.

Sure do.

Unmerged revisions

3530. By Martin von Gagern

Fix broken newline unescaping.

In order to show useful diffs, and readable output in general, newlines
should not be escaped. The unescaping done before was broken if r'\n'
appeared in the input, as Andrew Bennetts reviewed. In order to avoid using
regular expressions, we now split lines and join them after escaping. Added
a test case as well, and a new test class for toplevel functions in tests.

3529. By Martin von Gagern

Merge from trunk, with lots of adjustments.

This branch has been lying around for quite a while, so there has been a
number of changes to incorporate. Therefore this merge has a large edit
component to it.

3528. By Martin von Gagern

Added tests for tests._safe_string.

3527. By Martin von Gagern

More robust handling of encoding issues in tests.

Problems with encoding errors could lead to not only single tests
failing but the whole selftest process crashing. The introduction of
safe strings as a fallback in case of trouble avoids most of these
problems. The safe strings try to maintain as much information as
possible while using only ASCII characters. This is achieved by
formatting the strings using escape sequences.

This modification only changes how certain test results are displayed
to the user. There is no change to the semantics of these tests, so a
test that used to fail before will still continue to fail. Hopefully if
it does fail it is more likely that the test suite will continue with
the next test and not abort due to some exception displaying the error.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/__init__.py'
2--- bzrlib/tests/__init__.py 2010-04-22 15:44:21 +0000
3+++ bzrlib/tests/__init__.py 2010-04-23 07:05:38 +0000
4@@ -139,6 +139,18 @@
5 SUBUNIT_SEEK_SET = 0
6 SUBUNIT_SEEK_CUR = 1
7
8+def _safe_string(string_or_unicode):
9+ """Escape argument so it won't likely cause UnicodeErrors."""
10+ if isinstance(string_or_unicode, str):
11+ return '\n'.join([s.encode('string_escape')
12+ for s in string_or_unicode.split('\n')])
13+ if isinstance(string_or_unicode, unicode):
14+ return '\n'.join([s.encode('unicode_escape')
15+ for s in string_or_unicode.split(u'\n')])
16+ if string_or_unicode is None:
17+ return '<None>'
18+ return str(string_or_unicode).encode('string_escape')
19+
20
21 class ExtendedTestResult(unittest._TextTestResult):
22 """Accepts, reports and accumulates the results of running tests.
23@@ -488,17 +500,20 @@
24 def _test_description(self, test):
25 return self._shortened_test_description(test)
26
27+ def _report(self, kind, test, err):
28+ descr = self._test_description(test)
29+ msg = err[1]
30+ try:
31+ self.stream.write('%s: %s\n %s\n' % (kind, descr, msg))
32+ except UnicodeError:
33+ self.stream.write('%s: %s\n %s\n' %
34+ (kind, _safe_string(descr), _safe_string(msg)))
35+
36 def report_error(self, test, err):
37- self.stream.write('ERROR: %s\n %s\n' % (
38- self._test_description(test),
39- err[1],
40- ))
41+ self._report('ERROR', test, err)
42
43 def report_failure(self, test, err):
44- self.stream.write('FAIL: %s\n %s\n' % (
45- self._test_description(test),
46- err[1],
47- ))
48+ self._report('FAIL', test, err)
49
50 def report_known_failure(self, test, err):
51 pass
52@@ -1113,9 +1128,12 @@
53 message = 'first string is missing a final newline.\n'
54 if a == b + '\n':
55 message = 'second string is missing a final newline.\n'
56- raise AssertionError(message +
57- self._ndiff_strings(a, b))
58-
59+ try:
60+ diff = self._ndiff_strings(a, b)
61+ except UnicodeError:
62+ diff = self._ndiff_strings(_safe_string(a), _safe_string(b))
63+ raise AssertionError(message + diff)
64+
65 def assertEqualMode(self, mode, mode_test):
66 self.assertEqual(mode, mode_test,
67 'mode mismatch %o != %o' % (mode, mode_test))
68
69=== modified file 'bzrlib/tests/test_selftest.py'
70--- bzrlib/tests/test_selftest.py 2010-04-18 08:51:37 +0000
71+++ bzrlib/tests/test_selftest.py 2010-04-23 07:05:38 +0000
72@@ -510,6 +510,14 @@
73 os.lstat("foo"), os.lstat("longname"))
74
75
76+class TestTestsPackage(tests.TestCase):
77+ """Test parts of the bzrlib.tests package like top-level functions."""
78+
79+ def test_safe_string_with_newline(self):
80+ self.assertEquals('foo\\\\\nbar\\\\nbaz',
81+ tests._safe_string('foo\\\nbar\\nbaz'))
82+
83+
84 class TestTestCaseWithMemoryTransport(tests.TestCaseWithMemoryTransport):
85
86 def test_home_is_non_existant_dir_under_root(self):
87@@ -996,6 +1004,32 @@
88 test.run(result)
89 self.assertEquals(1, result.calls)
90
91+ def test_safe_string_message(self):
92+ result_stream = StringIO()
93+ result = bzrlib.tests.TextTestResult(
94+ unittest._WritelnDecorator(result_stream),
95+ descriptions=0,
96+ verbosity=1,
97+ )
98+ test = self.get_passing_test()
99+ # To be a bit like real life, the strings deal with
100+ # japanese Katakana characters. The binary string first
101+ # encodes the character U+30A4 in EUC-JP and then U+30A6
102+ # in UTF-8. Mixed encodings like this is the kind of issue
103+ # addressed by _safe_string.
104+ test.id = lambda: u'\u30a2\n\u30a8'
105+ # this seeds the state to handle reporting the test.
106+ result.startTest(test)
107+ prefix = len(result_stream.getvalue())
108+ # the err parameter has the shape:
109+ # (class, exception object, traceback)
110+ err = (AssertionError, AssertionError('\xa5\xa4\n\xe3\x82\xa6'), [])
111+ result.report_failure(test, err)
112+ output = result_stream.getvalue()[prefix:]
113+ self.assertEqual(
114+ 'FAIL: \\u30a2\n\\u30a8\n \\xa5\\xa4\\n\\xe3\\x82\\xa6\n',
115+ output)
116+
117
118 class TestUnicodeFilenameFeature(tests.TestCase):
119
120@@ -1654,6 +1688,22 @@
121 test.run(unittest.TestResult())
122 self.assertEqual('original', obj.test_attr)
123
124+ def test_safe_string_diff(self):
125+ """Problems with encodings should fall back to _safe_string."""
126+ try:
127+ # To be a bit like real life, the strings deal with
128+ # japanese Katakana characters. The binary string first
129+ # encodes the character U+30A4 in EUC-JP and then U+30A6
130+ # in UTF-8. Mixed encodings like this is the kind of issue
131+ # addressed by _safe_string.
132+ self.assertEqualDiff(u'\u30a2\n\u30a8',
133+ '\xa5\xa4\n\xe3\x82\xa6')
134+ except AssertionError, err:
135+ self.assertEqualDiff('texts not equal:\n' +
136+ '- \\u30a2\n- \\u30a8\n' +
137+ '+ \\xa5\\xa4\n+ \\xe3\\x82\\xa6\n',
138+ err.message)
139+
140
141 # NB: Don't delete this; it's not actually from 0.11!
142 @deprecated_function(deprecated_in((0, 11, 0)))