Merge lp:~gz/testtools/unprintable-assertThat-804127 into lp:~testtools-committers/testtools/trunk

Proposed by Martin Packman on 2011-08-24
Status: Merged
Approved by: Jonathan Lange on 2011-09-14
Approved revision: 245
Merged at revision: 229
Proposed branch: lp:~gz/testtools/unprintable-assertThat-804127
Merge into: lp:~testtools-committers/testtools/trunk
Prerequisite: lp:~jml/testtools/unprintable-assertThat-804127
Diff against target: 754 lines (+439/-32) 6 files modified (has conflicts)
Text conflict in testtools/tests/test_matchers.py
To merge this branch: bzr merge lp:~gz/testtools/unprintable-assertThat-804127
Reviewer          Review Type   Date Requested   Status
Jonathan Lange                  2011-08-24       Approve on 2011-09-14
Review via email: mp+72641@code.launchpad.net

Description of the Change

Resolve issues related to stringifying in the matcher code, by replacing several custom repr-like schemes using %s on strings (which may upcast the result to unicode or prevent later upcasting, or leak control codes, or...) with a big ugly method that tries to do what the callers wanted.
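
To make the control code problem concrete, here is a minimal sketch written for Python 3. It is not taken from the branch, and the sample string is invented for illustration. It shows `%s` copying a raw backspace into a message where `%r` would escape it:

```python
# Illustrative only: this sample string is made up, not from the test suite.
matchee = "foo\x08bar"  # contains a backspace control code

leaky = "'%s' does not start with 'x'." % (matchee,)
safe = "%r does not start with %r." % (matchee, "x")

# %s copies the raw control character into the message...
assert "\x08" in leaky
# ...while %r emits a printable escape sequence instead.
assert "\x08" not in safe
assert "\\x08" in safe
```

The `%r` form is the simple fix the branch recommends for most cases; `text_repr` exists for the cases where plain `repr` output is not good enough.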

This deserves a longer description of the reasoning behind the changes, but there's some discussion of the reasoning in the prerequisite branch and I need some feedback before I go crazy\bier.

Some random notes so I don't forget them later:
* Switched from """...""" to '''...''' so if I see output pasted in bug reports I'll know if I broke it.
* Being clever with the escaping of single quotes in multiline strings is probably more trouble than it's worth, just putting a backslash in front of every occurrence would simplify the logic a lot.
* Astral characters break, as per usual. I'm amused to note that upstream just fixed one more issue their end[1] (but without exposing a usable iterator to python level code still), and on Python 3 only of course.
* If the decision is to dump all these attempts at fancy formatting and just live with the normal repr, I would not mind that at all.

[1]: http://bugs.python.org/issue9200

Jonathan Lange (jml) wrote :

Heroic effort. I can only imagine what this must have been like to write.

What follows is initial feedback and thoughts. Answer & fix as much as you
like. I'm happy to fix up the minor stuff as I land.

To summarize, although the bug was originally filed against assertThat, it
turns out that it's actually an issue in the matchers themselves as well as
assertThat. Right?

If so, we have to be thinking about:
 * what are our failure modes with this code and existing broken matchers
 * how to make it easy to define matchers and mismatches correctly

Will return to those in a bit. General code stuff:

 * Can you elaborate on your comment on istext()?
 * Is there a way we can make it so matchers never have to use _isbytes?
   Perhaps by adding an "is_multiline" function to compat.
 * I don't care about astral characters
 * In compat, "Seperate" should be spelled "Separate"
 * text_repr is a bit hard to follow, maybe add some internal comments?
 * Some more documentation on the intent & API of text_repr would help,
   saying that it attempts to mimic repr() except returning unicode,
   giving some examples, explaining why you might want to use it and also what
   multiline does.
 * I don't get why multiline needs three different ways of escaping lines
   whereas the non-multiline case only has two branches
 * I think having a repr()-like function that takes any object and
   special-cases text would be good
 * Just to be clear, we are expecting Mismatch.describe() to return unicode,
   right?
 * What are we expecting Matcher.__str__ to return? Text or bytes?

On the original points:

 * We should add something to the docs explaining best practice for
   Mismatch.describe() methods. i.e., do "foo %s" % (text_repr(x),)
 * We should update the docstring for ``Mismatch.describe``
 * Our docs say that match() should return a Mismatch, so we can assume
   subclassing of that.
 * These docs need to bubble up, to AnnotatedMismatch, to assertThat etc.
   e.g. The 'message' parameter of assertThat needs to have its type
   documented.

Thanks so much for persevering.

jml

Robert Collins (lifeless) wrote :

 * What are we expecting Matcher.__str__ to return? Text or bytes?

-- text. Fugly I know. So python 2.x: use __unicode__, 3.x use
__str__, and we expect text all the time. We use it to get something
looking like what the user asked for after all.
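
A rough sketch of that convention, modelled on the pattern the diff applies to MismatchError. The class name `ExampleMatcher` is hypothetical, and the `sys.version_info` check stands in for the `str_is_unicode` flag from testtools.compat:

```python
import sys


class ExampleMatcher(object):
    """Hypothetical matcher, used only to show the text-returning convention."""

    def __init__(self, expected):
        self.expected = expected

    def __str__(self):
        # Always build text; %r keeps unprintable input escaped.
        return "ExampleMatcher(%r)" % (self.expected,)

    if sys.version_info < (3, 0):
        # On Python 2 the text form lives in __unicode__, and __str__
        # falls back to an ascii-only byte string.
        __unicode__ = __str__

        def __str__(self):
            return self.__unicode__().encode("ascii", "backslashreplace")


description = str(ExampleMatcher("bar"))
```

On Python 3 the conditional block is skipped entirely, so `__str__` is the only method involved.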
....

 * Our docs say that match() should return a Mismatch, so we can assume
  subclassing of that.

Please no, duck typing should be all we need. (And None is not a
subclass of Mismatch anyway :P).
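
For illustration, a duck-typed pair might look like the sketch below. `DuckMismatch` and `check` are hypothetical names, and `IsDivisibleBy` is re-declared here after the example in doc/for-test-authors.rst rather than imported from testtools:

```python
class DuckMismatch(object):
    """Hypothetical mismatch that does not inherit from testtools' Mismatch."""

    def __init__(self, number, divider):
        self.number = number
        self.divider = divider

    def describe(self):
        return "%r is not divisible by %r" % (self.number, self.divider)

    def get_details(self):
        return {}


class IsDivisibleBy(object):
    """Matcher that follows the protocol by convention only."""

    def __init__(self, divider):
        self.divider = divider

    def __str__(self):
        return "IsDivisibleBy(%r)" % (self.divider,)

    def match(self, actual):
        if actual % self.divider != 0:
            return DuckMismatch(actual, self.divider)
        return None  # a match is signalled by None, not by a Mismatch


def check(matchee, matcher):
    # A caller only needs describe(); no isinstance check is involved.
    mismatch = matcher.match(matchee)
    return None if mismatch is None else mismatch.describe()
```

Nothing here depends on a common base class, which is why an isinstance-based assumption would reject otherwise valid matchers.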

-Rob

Martin Packman (gz) wrote :

Thanks, this is a useful review that covers all the stuff I needed to mention but didn't get in the proposal.

> To summarize, although the bug was originally filed against assertThat, it
> turns out that it's actually an issue in the matchers themselves as well as
> assertThat. Right?

Yes. The initial thoughts were about adding more escaping to assertThat, but by then most of the formatting has been done. By moving the onus onto Matchers to escape more carefully, it's possible to preserve more information.

> If so, we have to be thinking about:
> * what are our failure modes with this code and existing broken matchers
> * how to make it easy to define matchers and mismatches correctly

These are exactly the two primary concerns here. We don't want to punish test case authors with spurious errors if they specify an assert incorrectly, but give a clear enough failure for them to fix the problem. We also don't want learning about Python string quirks to be a prerequisite for writing a Matcher.

> * Can you elaborate on your comment on istext()?

The result of the function is generally not enough information to actually use your string safely.

Two of the current callers in testtools are just trying to do special handling on a string, and otherwise assume the object is already a specific correct type. In both cases, the call is (harmlessly) subtly wrong: __import__ doesn't take non-ascii strings on Python 2, and Python 3 bytes can be used for regular expressions.

Code wanting to do actual string handling has it worse, as it will always risk getting a string that can't be safely interpolated, or written to a stream, or otherwise given as output.

In short, it's something of an attractive nuisance. In using it, I realised the text_repr interface wasn't very good.
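
As a small illustration of the regular expression point, here is a sketch for Python 3. `istext` is re-declared locally to mirror the Python 3 branch of the compat helper shown in the diff, and the pattern is an arbitrary example:

```python
import re


def istext(x):
    # Mirrors the Python 3 branch of testtools.compat.istext in this diff.
    return isinstance(x, str)


pattern = b"a|b"

# istext() says "not text", yet the pattern is perfectly usable...
assert not istext(pattern)
assert re.match(pattern, b"b") is not None

# ...provided the subject is bytes as well; mixing the types is what fails.
try:
    re.match(pattern, "b")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

So a True or False from `istext` does not by itself say whether the value can be used the way the caller intends.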

> * Is there a way we can make it so matchers never have to use _isbytes?
> Perhaps by adding an "is_multiline" function to compat.

Something like this would be much better, possibly moving the multiline logic into the repr function (so True means triple quote if there's a newline, or by adding a third state). This was me lowering my standards and keeping coding, in order to get something that would work. :)
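
One way the third state might look, as a sketch under assumptions: `resolve_multiline` is a hypothetical helper name, and treating `None` as "decide from the input" is just one possible reading of the idea:

```python
def resolve_multiline(text, multiline=None):
    """Hypothetical helper: decide whether `text` should be triple quoted.

    None  -> decide from the input (triple quote only if it has a newline)
    True  -> always triple quote
    False -> never triple quote
    """
    if multiline is None:
        # Pick a newline of the same type as the input before testing.
        newline = b"\n" if isinstance(text, bytes) else "\n"
        return newline in text
    return bool(multiline)
```

With a default like this, callers that do not care can leave the argument out and no longer need a bytes check of their own just to look for a newline.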

> * text_repr is a bit hard to follow, maybe add some internal comments?

Yup, and some general cleanup would help, I left the first version that passed the tests alone so we could see if it actually worked as an interface before buffing it.

> * Some more documentation on the intent & API of text_repr would help,
> saying that it attempts to mimic repr() except returning unicode,
> giving some examples, explaining why you might want to use it and also what
> multiline does.

Will write this. As is probably clear from the tests, it aims to use the Python 3 logic for displaying literal unicode characters, and to maintain the `eval(repr(s)) == s` expectation.
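
The behaviour being emulated can be checked against the builtin `repr` on Python 3. This is a sketch with arbitrary sample strings, and it uses `ast.literal_eval` as a safer stand-in for `eval`:

```python
import ast

samples = ["plain", "tab\there", "quote'\"mix", "section \xa7", "nbsp\xa0", "\u2028"]

# The round-trip expectation holds for every sample.
round_trips = all(ast.literal_eval(repr(s)) == s for s in samples)
assert round_trips

# Printable non-ascii characters are shown literally on Python 3...
assert repr("\xa7") == "'\xa7'"
# ...while unprintable ones, such as no-break space, are escaped.
assert repr("\xa0") == "'\\xa0'"
assert repr("\u2028") == "'\\u2028'"
```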

> * I don't get why multiline needs three different ways of escaping lines
> whereas the non-multiline case only has two branches

So, there are a cube of relevant combinations:

               Multiline?
                 F T
               +---+---+ Python 3
             / | | R | bytes
           / ...


Jonathan Lange (jml) wrote :

Wow, thanks for the reply. Sorry it's taken so long to get back to you.

In general, your replies to my review seem to be either agreeing with me or informing me of something I didn't know. There are a few cleanup things that need to be done before this lands (identified above). Where do you want to go from here?

241. By Martin Packman on 2011-09-13

Extra tests to ensure multiline quoting is correct on Python 3

242. By Martin Packman on 2011-09-13

Fix spelling error noted in review by jml

243. By Martin Packman on 2011-09-13

Add cautions about unprintable strings to Mismatch documentation

244. By Martin Packman on 2011-09-13

Add third state to multiline argument of text_repr and comment the internal logic

245. By Martin Packman on 2011-09-13

Test that not passing multiline to text_repr defaults based on input

Martin Packman (gz) wrote :

Okay, this branch should now be tidied up enough to land, can you see anything I've missed?

I tried out switching the repr-like function to take any object, and it does seem to be a better interface but complicates the logic even further, so should probably be done separately.

There's still a bunch of todo all over the place, and I haven't managed to squish _isbytes yet, though it's not strictly necessary in the matchers code bar some new tests I added that expect fancy behaviour.

There are a couple more interesting things that could do with handling later. Either Annotate or AnnotatedMismatch (or both, see the earlier complaint about string typing) should check the message they are given is printable unicode. Also the regular expression display form could do with some extra testing, I'm not totally convinced it always produces correct expressions (and certainly doesn't with some on Python 2.4 where the unicode-escape codec fails to escape backslashes).
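
A check of that kind could be sketched as follows for Python 3. `is_printable_text` is a hypothetical name, and allowing newlines is an assumption made here rather than anything the branch specifies:

```python
def is_printable_text(message):
    """Hypothetical check: True if `message` is text with no unprintable chars.

    Newlines are allowed on the assumption that an annotation may span
    several lines.
    """
    if not isinstance(message, str):
        return False
    return all(c == "\n" or c.isprintable() for c in message)
```

Whether a failing message should raise an error or be escaped is left open; this only shows the detection step.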

Jonathan Lange (jml) wrote :

Can't see anything you've missed. I'm going to merge this, and will file wishlist bugs for the things that you say you've left out.

Jonathan Lange (jml) :
review: Approve
Martin Packman (gz) wrote :

Great, thanks jml, I'll subscribe to those bugs.

Preview Diff

1=== modified file 'doc/for-test-authors.rst'
2--- doc/for-test-authors.rst 2011-08-15 16:14:42 +0000
3+++ doc/for-test-authors.rst 2011-09-13 23:54:14 +0000
4@@ -717,7 +717,7 @@
5 self.remainder = remainder
6
7 def describe(self):
8- return "%s is not divisible by %s, %s remains" % (
9+ return "%r is not divisible by %r, %r remains" % (
10 self.number, self.divider, self.remainder)
11
12 def get_details(self):
13@@ -738,11 +738,19 @@
14 remainder = actual % self.divider
15 if remainder != 0:
16 return Mismatch(
17- "%s is not divisible by %s, %s remains" % (
18+ "%r is not divisible by %r, %r remains" % (
19 actual, self.divider, remainder))
20 else:
21 return None
22
23+When writing a ``describe`` method or constructing a ``Mismatch`` object the
24+code should ensure it only emits printable unicode. As this output must be
25+combined with other text and forwarded for presentation, letting through
26+non-ascii bytes of ambiguous encoding or control characters could throw an
27+exception or mangle the display. In most cases simply avoiding the ``%s``
28+format specifier and using ``%r`` instead will be enough. For examples of
29+more complex formatting see the ``testtools.matchers`` implementatons.
30+
31
32 Details
33 =======
34
35=== modified file 'testtools/compat.py'
36--- testtools/compat.py 2011-09-13 23:54:14 +0000
37+++ testtools/compat.py 2011-09-13 23:54:14 +0000
38@@ -25,6 +25,7 @@
39 import re
40 import sys
41 import traceback
42+import unicodedata
43
44 from testtools.helpers import try_imports
45
46@@ -52,6 +53,7 @@
47 """
48
49 if sys.version_info > (3, 0):
50+ import builtins
51 def _u(s):
52 return s
53 _r = ascii
54@@ -59,12 +61,14 @@
55 """A byte literal."""
56 return s.encode("latin-1")
57 advance_iterator = next
58+ # GZ 2011-08-24: Seems istext() is easy to misuse and makes for bad code.
59 def istext(x):
60 return isinstance(x, str)
61 def classtypes():
62 return (type,)
63 str_is_unicode = True
64 else:
65+ import __builtin__ as builtins
66 def _u(s):
67 # The double replace mangling going on prepares the string for
68 # unicode-escape - \foo is preserved, \u and \U are decoded.
69@@ -112,6 +116,95 @@
70 return isinstance(exception, (KeyboardInterrupt, SystemExit))
71
72
73+# GZ 2011-08-24: Using isinstance checks like this encourages bad interfaces,
74+# there should be better ways to write code needing this.
75+if not issubclass(getattr(builtins, "bytes", str), str):
76+ def _isbytes(x):
77+ return isinstance(x, bytes)
78+else:
79+ # Never return True on Pythons that provide the name but not the real type
80+ def _isbytes(x):
81+ return False
82+
83+
84+def _slow_escape(text):
85+ """Escape unicode `text` leaving printable characters unmodified
86+
87+ The behaviour emulates the Python 3 implementation of repr, see
88+ unicode_repr in unicodeobject.c and isprintable definition.
89+
90+ Because this iterates over the input a codepoint at a time, it's slow, and
91+ does not handle astral characters correctly on Python builds with 16 bit
92+ rather than 32 bit unicode type.
93+ """
94+ output = []
95+ for c in text:
96+ o = ord(c)
97+ if o < 256:
98+ if o < 32 or 126 < o < 161:
99+ output.append(c.encode("unicode-escape"))
100+ elif o == 92:
101+ # Separate due to bug in unicode-escape codec in Python 2.4
102+ output.append("\\\\")
103+ else:
104+ output.append(c)
105+ else:
106+ # To get correct behaviour would need to pair up surrogates here
107+ if unicodedata.category(c)[0] in "CZ":
108+ output.append(c.encode("unicode-escape"))
109+ else:
110+ output.append(c)
111+ return "".join(output)
112+
113+
114+def text_repr(text, multiline=None):
115+ """Rich repr for `text` returning unicode, triple quoted if `multiline`"""
116+ is_py3k = sys.version_info > (3, 0)
117+ nl = _isbytes(text) and bytes((0xA,)) or "\n"
118+ if multiline is None:
119+ multiline = nl in text
120+ if not multiline and (is_py3k or not str_is_unicode and type(text) is str):
121+ # Use normal repr for single line of unicode on Python 3 or bytes
122+ return repr(text)
123+ prefix = repr(text[:0])[:-2]
124+ if multiline:
125+ # To escape multiline strings, split and process each line in turn,
126+ # making sure that quotes are not escaped.
127+ if is_py3k:
128+ offset = len(prefix) + 1
129+ lines = []
130+ for l in text.split(nl):
131+ r = repr(l)
132+ q = r[-1]
133+ lines.append(r[offset:-1].replace("\\" + q, q))
134+ elif not str_is_unicode and isinstance(text, str):
135+ lines = [l.encode("string-escape").replace("\\'", "'")
136+ for l in text.split("\n")]
137+ else:
138+ lines = [_slow_escape(l) for l in text.split("\n")]
139+ # Combine the escaped lines and append two of the closing quotes,
140+ # then iterate over the result to escape triple quotes correctly.
141+ _semi_done = "\n".join(lines) + "''"
142+ p = 0
143+ while True:
144+ p = _semi_done.find("'''", p)
145+ if p == -1:
146+ break
147+ _semi_done = "\\".join([_semi_done[:p], _semi_done[p:]])
148+ p += 2
149+ return "".join([prefix, "'''\\\n", _semi_done, "'"])
150+ escaped_text = _slow_escape(text)
151+ # Determine which quote character to use and if one gets prefixed with a
152+ # backslash following the same logic Python uses for repr() on strings
153+ quote = "'"
154+ if "'" in text:
155+ if '"' in text:
156+ escaped_text = escaped_text.replace("'", "\\'")
157+ else:
158+ quote = '"'
159+ return "".join([prefix, quote, escaped_text, quote])
160+
161+
162 def unicode_output_stream(stream):
163 """Get wrapper for given stream that writes any unicode without exception
164
165@@ -146,11 +239,6 @@
166 return writer(stream, "replace")
167
168
169-try:
170- to_text = unicode
171-except NameError:
172- to_text = str
173-
174 # The default source encoding is actually "iso-8859-1" until Python 2.5 but
175 # using non-ascii causes a deprecation warning in 2.4 and it's cleaner to
176 # treat all versions the same way
177
178=== modified file 'testtools/matchers.py'
179--- testtools/matchers.py 2011-09-13 23:54:14 +0000
180+++ testtools/matchers.py 2011-09-13 23:54:14 +0000
181@@ -49,7 +49,10 @@
182 classtypes,
183 _error_repr,
184 isbaseexception,
185+ _isbytes,
186 istext,
187+ str_is_unicode,
188+ text_repr
189 )
190
191
192@@ -102,6 +105,8 @@
193 """Describe the mismatch.
194
195 This should be either a human-readable string or castable to a string.
196+ In particular, is should either be plain ascii or unicode on Python 2,
197+ and care should be taken to escape control characters.
198 """
199 try:
200 return self._description
201@@ -151,12 +156,25 @@
202 def __str__(self):
203 difference = self.mismatch.describe()
204 if self.verbose:
205+ # GZ 2011-08-24: Smelly API? Better to take any object and special
206+ # case text inside?
207+ if istext(self.matchee) or _isbytes(self.matchee):
208+ matchee = text_repr(self.matchee, multiline=False)
209+ else:
210+ matchee = repr(self.matchee)
211 return (
212- 'Match failed. Matchee: "%s"\nMatcher: %s\nDifference: %s\n'
213- % (self.matchee, self.matcher, difference))
214+ 'Match failed. Matchee: %s\nMatcher: %s\nDifference: %s\n'
215+ % (matchee, self.matcher, difference))
216 else:
217 return difference
218
219+ if not str_is_unicode:
220+
221+ __unicode__ = __str__
222+
223+ def __str__(self):
224+ return self.__unicode__().encode("ascii", "backslashreplace")
225+
226
227 class MismatchDecorator(object):
228 """Decorate a ``Mismatch``.
229@@ -268,7 +286,12 @@
230 self.with_nl = with_nl
231
232 def describe(self):
233- return self.matcher._describe_difference(self.with_nl)
234+ s = self.matcher._describe_difference(self.with_nl)
235+ if str_is_unicode or isinstance(s, unicode):
236+ return s
237+ # GZ 2011-08-24: This is actually pretty bogus, most C0 codes should
238+ # be escaped, in addition to non-ascii bytes.
239+ return s.decode("latin1").encode("ascii", "backslashreplace")
240
241
242 class DoesNotContain(Mismatch):
243@@ -298,8 +321,8 @@
244 self.expected = expected
245
246 def describe(self):
247- return "'%s' does not start with '%s'." % (
248- self.matchee, self.expected)
249+ return "%s does not start with %s." % (
250+ text_repr(self.matchee), text_repr(self.expected))
251
252
253 class DoesNotEndWith(Mismatch):
254@@ -314,8 +337,8 @@
255 self.expected = expected
256
257 def describe(self):
258- return "'%s' does not end with '%s'." % (
259- self.matchee, self.expected)
260+ return "%s does not end with %s." % (
261+ text_repr(self.matchee), text_repr(self.expected))
262
263
264 class _BinaryComparison(object):
265@@ -347,8 +370,8 @@
266 def _format(self, thing):
267 # Blocks of text with newlines are formatted as triple-quote
268 # strings. Everything else is pretty-printed.
269- if istext(thing) and '\n' in thing:
270- return '"""\\\n%s"""' % (thing,)
271+ if istext(thing) or _isbytes(thing):
272+ return text_repr(thing)
273 return pformat(thing)
274
275 def describe(self):
276@@ -359,7 +382,7 @@
277 self._mismatch_string, self._format(self.expected),
278 self._format(self.other))
279 else:
280- return "%s %s %s" % (left, self._mismatch_string,right)
281+ return "%s %s %s" % (left, self._mismatch_string, right)
282
283
284 class Equals(_BinaryComparison):
285@@ -599,7 +622,7 @@
286 self.expected = expected
287
288 def __str__(self):
289- return "Starts with '%s'." % self.expected
290+ return "StartsWith(%r)" % (self.expected,)
291
292 def match(self, matchee):
293 if not matchee.startswith(self.expected):
294@@ -618,7 +641,7 @@
295 self.expected = expected
296
297 def __str__(self):
298- return "Ends with '%s'." % self.expected
299+ return "EndsWith(%r)" % (self.expected,)
300
301 def match(self, matchee):
302 if not matchee.endswith(self.expected):
303@@ -875,8 +898,12 @@
304
305 def match(self, value):
306 if not re.match(self.pattern, value, self.flags):
307+ pattern = self.pattern
308+ if not isinstance(pattern, str_is_unicode and str or unicode):
309+ pattern = pattern.decode("latin1")
310+ pattern = pattern.encode("unicode_escape").decode("ascii")
311 return Mismatch("%r does not match /%s/" % (
312- value, self.pattern))
313+ value, pattern.replace("\\\\", "\\")))
314
315
316 class MatchesSetwise(object):
317
318=== modified file 'testtools/tests/test_compat.py'
319--- testtools/tests/test_compat.py 2011-07-04 18:03:28 +0000
320+++ testtools/tests/test_compat.py 2011-09-13 23:54:14 +0000
321@@ -16,6 +16,7 @@
322 _get_source_encoding,
323 _u,
324 str_is_unicode,
325+ text_repr,
326 unicode_output_stream,
327 )
328 from testtools.matchers import (
329@@ -262,6 +263,132 @@
330 self.assertEqual("pa???n", sout.getvalue())
331
332
333+class TestTextRepr(testtools.TestCase):
334+ """Ensure in extending repr, basic behaviours are not being broken"""
335+
336+ ascii_examples = (
337+ # Single character examples
338+ # C0 control codes should be escaped except multiline \n
339+ ("\x00", "'\\x00'", "'''\\\n\\x00'''"),
340+ ("\b", "'\\x08'", "'''\\\n\\x08'''"),
341+ ("\t", "'\\t'", "'''\\\n\\t'''"),
342+ ("\n", "'\\n'", "'''\\\n\n'''"),
343+ ("\r", "'\\r'", "'''\\\n\\r'''"),
344+ # Quotes and backslash should match normal repr behaviour
345+ ('"', "'\"'", "'''\\\n\"'''"),
346+ ("'", "\"'\"", "'''\\\n\\''''"),
347+ ("\\", "'\\\\'", "'''\\\n\\\\'''"),
348+ # DEL is also unprintable and should be escaped
349+ ("\x7F", "'\\x7f'", "'''\\\n\\x7f'''"),
350+
351+ # Character combinations that need double checking
352+ ("\r\n", "'\\r\\n'", "'''\\\n\\r\n'''"),
353+ ("\"'", "'\"\\''", "'''\\\n\"\\''''"),
354+ ("'\"", "'\\'\"'", "'''\\\n'\"'''"),
355+ ("\\n", "'\\\\n'", "'''\\\n\\\\n'''"),
356+ ("\\\n", "'\\\\\\n'", "'''\\\n\\\\\n'''"),
357+ ("\\' ", "\"\\\\' \"", "'''\\\n\\\\' '''"),
358+ ("\\'\n", "\"\\\\'\\n\"", "'''\\\n\\\\'\n'''"),
359+ ("\\'\"", "'\\\\\\'\"'", "'''\\\n\\\\'\"'''"),
360+ ("\\'''", "\"\\\\'''\"", "'''\\\n\\\\\\'\\'\\''''"),
361+ )
362+
363+ # Bytes with the high bit set should always be escaped
364+ bytes_examples = (
365+ (_b("\x80"), "'\\x80'", "'''\\\n\\x80'''"),
366+ (_b("\xA0"), "'\\xa0'", "'''\\\n\\xa0'''"),
367+ (_b("\xC0"), "'\\xc0'", "'''\\\n\\xc0'''"),
368+ (_b("\xFF"), "'\\xff'", "'''\\\n\\xff'''"),
369+ (_b("\xC2\xA7"), "'\\xc2\\xa7'", "'''\\\n\\xc2\\xa7'''"),
370+ )
371+
372+ # Unicode doesn't escape printable characters as per the Python 3 model
373+ unicode_examples = (
374+ # C1 codes are unprintable
375+ (_u("\x80"), "'\\x80'", "'''\\\n\\x80'''"),
376+ (_u("\x9F"), "'\\x9f'", "'''\\\n\\x9f'''"),
377+ # No-break space is unprintable
378+ (_u("\xA0"), "'\\xa0'", "'''\\\n\\xa0'''"),
379+ # Letters latin alphabets are printable
380+ (_u("\xA1"), _u("'\xa1'"), _u("'''\\\n\xa1'''")),
381+ (_u("\xFF"), _u("'\xff'"), _u("'''\\\n\xff'''")),
382+ (_u("\u0100"), _u("'\u0100'"), _u("'''\\\n\u0100'''")),
383+ # Line and paragraph seperators are unprintable
384+ (_u("\u2028"), "'\\u2028'", "'''\\\n\\u2028'''"),
385+ (_u("\u2029"), "'\\u2029'", "'''\\\n\\u2029'''"),
386+ # Unpaired surrogates are unprintable
387+ (_u("\uD800"), "'\\ud800'", "'''\\\n\\ud800'''"),
388+ (_u("\uDFFF"), "'\\udfff'", "'''\\\n\\udfff'''"),
389+ # Unprintable general categories not fully tested: Cc, Cf, Co, Cn, Zs
390+ )
391+
392+ b_prefix = repr(_b(""))[:-2]
393+ u_prefix = repr(_u(""))[:-2]
394+
395+ def test_ascii_examples_oneline_bytes(self):
396+ for s, expected, _ in self.ascii_examples:
397+ b = _b(s)
398+ actual = text_repr(b, multiline=False)
399+ # Add self.assertIsInstance check?
400+ self.assertEqual(actual, self.b_prefix + expected)
401+ self.assertEqual(eval(actual), b)
402+
403+ def test_ascii_examples_oneline_unicode(self):
404+ for s, expected, _ in self.ascii_examples:
405+ u = _u(s)
406+ actual = text_repr(u, multiline=False)
407+ self.assertEqual(actual, self.u_prefix + expected)
408+ self.assertEqual(eval(actual), u)
409+
410+ def test_ascii_examples_multiline_bytes(self):
411+ for s, _, expected in self.ascii_examples:
412+ b = _b(s)
413+ actual = text_repr(b, multiline=True)
414+ self.assertEqual(actual, self.b_prefix + expected)
415+ self.assertEqual(eval(actual), b)
416+
417+ def test_ascii_examples_multiline_unicode(self):
418+ for s, _, expected in self.ascii_examples:
419+ u = _u(s)
420+ actual = text_repr(u, multiline=True)
421+ self.assertEqual(actual, self.u_prefix + expected)
422+ self.assertEqual(eval(actual), u)
423+
424+ def test_ascii_examples_defaultline_bytes(self):
425+ for s, one, multi in self.ascii_examples:
426+ expected = "\n" in s and multi or one
427+ self.assertEqual(text_repr(_b(s)), self.b_prefix + expected)
428+
429+ def test_ascii_examples_defaultline_unicode(self):
430+ for s, one, multi in self.ascii_examples:
431+ expected = "\n" in s and multi or one
432+ self.assertEqual(text_repr(_u(s)), self.u_prefix + expected)
433+
434+ def test_bytes_examples_oneline(self):
435+ for b, expected, _ in self.bytes_examples:
436+ actual = text_repr(b, multiline=False)
437+ self.assertEqual(actual, self.b_prefix + expected)
438+ self.assertEqual(eval(actual), b)
439+
440+ def test_bytes_examples_multiline(self):
441+ for b, _, expected in self.bytes_examples:
442+ actual = text_repr(b, multiline=True)
443+ self.assertEqual(actual, self.b_prefix + expected)
444+ self.assertEqual(eval(actual), b)
445+
446+ def test_unicode_examples_oneline(self):
447+ for u, expected, _ in self.unicode_examples:
448+ actual = text_repr(u, multiline=False)
449+ self.assertEqual(actual, self.u_prefix + expected)
450+ self.assertEqual(eval(actual), u)
451+
452+ def test_unicode_examples_multiline(self):
453+ for u, _, expected in self.unicode_examples:
454+ actual = text_repr(u, multiline=True)
455+ self.assertEqual(actual, self.u_prefix + expected)
456+ self.assertEqual(eval(actual), u)
457+
458+
459 def test_suite():
460 from unittest import TestLoader
461 return TestLoader().loadTestsFromName(__name__)
462
463=== modified file 'testtools/tests/test_matchers.py'
464--- testtools/tests/test_matchers.py 2011-09-13 23:54:14 +0000
465+++ testtools/tests/test_matchers.py 2011-09-13 23:54:14 +0000
466@@ -12,7 +12,9 @@
467 )
468 from testtools.compat import (
469 StringIO,
470- to_text,
471+ str_is_unicode,
472+ text_repr,
473+ _b,
474 _u,
475 )
476 from testtools.matchers import (
477@@ -20,7 +22,11 @@
478 AllMatch,
479 Annotate,
480 AnnotatedMismatch,
481+<<<<<<< TREE
482 Contains,
483+=======
484+ _BinaryMismatch,
485+>>>>>>> MERGE-SOURCE
486 Equals,
487 DocTestMatches,
488 DoesNotEndWith,
489@@ -96,7 +102,7 @@
490 mismatch = matcher.match(2)
491 e = MismatchError(matchee, matcher, mismatch, True)
492 expected = (
493- 'Match failed. Matchee: "%s"\n'
494+ 'Match failed. Matchee: %r\n'
495 'Matcher: %s\n'
496 'Difference: %s\n' % (
497 matchee,
498@@ -112,17 +118,80 @@
499 matcher = Equals(_u('a'))
500 mismatch = matcher.match(matchee)
501 expected = (
502- 'Match failed. Matchee: "%s"\n'
503+ 'Match failed. Matchee: %s\n'
504 'Matcher: %s\n'
505 'Difference: %s\n' % (
506- matchee,
507+ text_repr(matchee),
508 matcher,
509 mismatch.describe(),
510 ))
511 e = MismatchError(matchee, matcher, mismatch, True)
512- # XXX: Using to_text rather than str because, on Python 2, str will
513- # raise UnicodeEncodeError.
514- self.assertEqual(expected, to_text(e))
515+ if str_is_unicode:
516+ actual = str(e)
517+ else:
518+ actual = unicode(e)
519+ # Using str() should still work, and return ascii only
520+ self.assertEqual(
521+ expected.replace(matchee, matchee.encode("unicode-escape")),
522+ str(e).decode("ascii"))
523+ self.assertEqual(expected, actual)
524+
525+
526+class Test_BinaryMismatch(TestCase):
527+ """Mismatches from binary comparisons need useful describe output"""
528+
529+ _long_string = "This is a longish multiline non-ascii string\n\xa7"
530+ _long_b = _b(_long_string)
531+ _long_u = _u(_long_string)
532+
533+ def test_short_objects(self):
534+ o1, o2 = object(), object()
535+ mismatch = _BinaryMismatch(o1, "!~", o2)
536+ self.assertEqual(mismatch.describe(), "%r !~ %r" % (o1, o2))
537+
538+ def test_short_mixed_strings(self):
539+ b, u = _b("\xa7"), _u("\xa7")
540+ mismatch = _BinaryMismatch(b, "!~", u)
541+ self.assertEqual(mismatch.describe(), "%r !~ %r" % (b, u))
542+
543+ def test_long_bytes(self):
544+ one_line_b = self._long_b.replace(_b("\n"), _b(" "))
545+ mismatch = _BinaryMismatch(one_line_b, "!~", self._long_b)
546+ self.assertEqual(mismatch.describe(),
547+ "%s:\nreference = %s\nactual = %s\n" % ("!~",
548+ text_repr(one_line_b),
549+ text_repr(self._long_b, multiline=True)))
550+
551+ def test_long_unicode(self):
552+ one_line_u = self._long_u.replace("\n", " ")
553+ mismatch = _BinaryMismatch(one_line_u, "!~", self._long_u)
554+ self.assertEqual(mismatch.describe(),
555+ "%s:\nreference = %s\nactual = %s\n" % ("!~",
556+ text_repr(one_line_u),
557+ text_repr(self._long_u, multiline=True)))
558+
559+ def test_long_mixed_strings(self):
560+ mismatch = _BinaryMismatch(self._long_b, "!~", self._long_u)
561+ self.assertEqual(mismatch.describe(),
562+ "%s:\nreference = %s\nactual = %s\n" % ("!~",
563+ text_repr(self._long_b, multiline=True),
564+ text_repr(self._long_u, multiline=True)))
565+
566+ def test_long_bytes_and_object(self):
567+ obj = object()
568+ mismatch = _BinaryMismatch(self._long_b, "!~", obj)
569+ self.assertEqual(mismatch.describe(),
570+ "%s:\nreference = %s\nactual = %s\n" % ("!~",
571+ text_repr(self._long_b, multiline=True),
572+ repr(obj)))
573+
574+ def test_long_unicode_and_object(self):
575+ obj = object()
576+ mismatch = _BinaryMismatch(self._long_u, "!~", obj)
577+ self.assertEqual(mismatch.describe(),
578+ "%s:\nreference = %s\nactual = %s\n" % ("!~",
579+ text_repr(self._long_u, multiline=True),
580+ repr(obj)))
581
582
583 class TestMatchersInterface(object):
584@@ -208,6 +277,23 @@
585 self.assertEqual("bar\n", matcher.want)
586 self.assertEqual(doctest.ELLIPSIS, matcher.flags)
587
588+ def test_describe_non_ascii_bytes(self):
589+ """Even with bytestrings, the mismatch should be coercible to unicode
590+
591+ DocTestMatches is intended for text, but the Python 2 str type also
592+ permits arbitrary binary inputs. This is a slightly bogus thing to do,
593+ and under Python 3 using bytes objects will reasonably raise an error.
594+ """
595+ header = _b("\x89PNG\r\n\x1a\n...")
596+ if str_is_unicode:
597+ self.assertRaises(TypeError,
598+ DocTestMatches, header, doctest.ELLIPSIS)
599+ return
600+ matcher = DocTestMatches(header, doctest.ELLIPSIS)
601+ mismatch = matcher.match(_b("GIF89a\1\0\1\0\0\0\0;"))
602+ # Must be treatable as unicode text, the exact output matters less
603+ self.assertTrue(unicode(mismatch.describe()))
604+
605
606 class TestEqualsInterface(TestCase, TestMatchersInterface):
607
608@@ -610,6 +696,21 @@
609 mismatch = DoesNotStartWith("fo", "bo")
610 self.assertEqual("'fo' does not start with 'bo'.", mismatch.describe())
611
612+ def test_describe_non_ascii_unicode(self):
613+ string = _u("A\xA7")
614+ suffix = _u("B\xA7")
615+ mismatch = DoesNotStartWith(string, suffix)
616+ self.assertEqual("%s does not start with %s." % (
617+ text_repr(string), text_repr(suffix)),
618+ mismatch.describe())
619+
620+ def test_describe_non_ascii_bytes(self):
621+ string = _b("A\xA7")
622+ suffix = _b("B\xA7")
623+ mismatch = DoesNotStartWith(string, suffix)
624+ self.assertEqual("%r does not start with %r." % (string, suffix),
625+ mismatch.describe())
626+
627
628 class StartsWithTests(TestCase):
629
630@@ -617,7 +718,17 @@
631
632 def test_str(self):
633 matcher = StartsWith("bar")
634- self.assertEqual("Starts with 'bar'.", str(matcher))
635+ self.assertEqual("StartsWith('bar')", str(matcher))
636+
637+ def test_str_with_bytes(self):
638+ b = _b("\xA7")
639+ matcher = StartsWith(b)
640+ self.assertEqual("StartsWith(%r)" % (b,), str(matcher))
641+
642+ def test_str_with_unicode(self):
643+ u = _u("\xA7")
644+ matcher = StartsWith(u)
645+ self.assertEqual("StartsWith(%r)" % (u,), str(matcher))
646
647 def test_match(self):
648 matcher = StartsWith("bar")
649@@ -646,6 +757,21 @@
650 mismatch = DoesNotEndWith("fo", "bo")
651 self.assertEqual("'fo' does not end with 'bo'.", mismatch.describe())
652
653+ def test_describe_non_ascii_unicode(self):
654+ string = _u("A\xA7")
655+ suffix = _u("B\xA7")
656+ mismatch = DoesNotEndWith(string, suffix)
657+ self.assertEqual("%s does not end with %s." % (
658+ text_repr(string), text_repr(suffix)),
659+ mismatch.describe())
660+
661+ def test_describe_non_ascii_bytes(self):
662+ string = _b("A\xA7")
663+ suffix = _b("B\xA7")
664+ mismatch = DoesNotEndWith(string, suffix)
665+ self.assertEqual("%r does not end with %r." % (string, suffix),
666+ mismatch.describe())
667+
668
669 class EndsWithTests(TestCase):
670
671@@ -653,7 +779,17 @@
672
673 def test_str(self):
674 matcher = EndsWith("bar")
675- self.assertEqual("Ends with 'bar'.", str(matcher))
676+ self.assertEqual("EndsWith('bar')", str(matcher))
677+
678+ def test_str_with_bytes(self):
679+ b = _b("\xA7")
680+ matcher = EndsWith(b)
681+ self.assertEqual("EndsWith(%r)" % (b,), str(matcher))
682+
683+ def test_str_with_unicode(self):
684+ u = _u("\xA7")
685+ matcher = EndsWith(u)
686+ self.assertEqual("EndsWith(%r)" % (u,), str(matcher))
687
688 def test_match(self):
689 matcher = EndsWith("arf")
690@@ -770,11 +906,17 @@
691 ("MatchesRegex('a|b')", MatchesRegex('a|b')),
692 ("MatchesRegex('a|b', re.M)", MatchesRegex('a|b', re.M)),
693 ("MatchesRegex('a|b', re.I|re.M)", MatchesRegex('a|b', re.I|re.M)),
694+ ("MatchesRegex(%r)" % (_b("\xA7"),), MatchesRegex(_b("\xA7"))),
695+ ("MatchesRegex(%r)" % (_u("\xA7"),), MatchesRegex(_u("\xA7"))),
696 ]
697
698 describe_examples = [
699 ("'c' does not match /a|b/", 'c', MatchesRegex('a|b')),
700 ("'c' does not match /a\d/", 'c', MatchesRegex(r'a\d')),
701+ ("%r does not match /\\s+\\xa7/" % (_b('c'),),
702+ _b('c'), MatchesRegex(_b("\\s+\xA7"))),
703+ ("%r does not match /\\s+\\xa7/" % (_u('c'),),
704+ _u('c'), MatchesRegex(_u("\\s+\xA7"))),
705 ]
706
707
708
709=== modified file 'testtools/tests/test_testcase.py'
710--- testtools/tests/test_testcase.py 2011-09-13 23:54:14 +0000
711+++ testtools/tests/test_testcase.py 2011-09-13 23:54:14 +0000
712@@ -488,7 +488,7 @@
713 matchee = 'foo'
714 matcher = Equals('bar')
715 expected = (
716- 'Match failed. Matchee: "%s"\n'
717+ 'Match failed. Matchee: %r\n'
718 'Matcher: %s\n'
719 'Difference: %s\n' % (
720 matchee,
721@@ -528,10 +528,10 @@
722 matchee = _u('\xa7')
723 matcher = Equals(_u('a'))
724 expected = (
725- 'Match failed. Matchee: "%s"\n'
726+ 'Match failed. Matchee: %s\n'
727 'Matcher: %s\n'
728 'Difference: %s\n\n' % (
729- matchee,
730+ repr(matchee).replace("\\xa7", matchee),
731 matcher,
732 matcher.match(matchee).describe(),
733 ))
734@@ -565,6 +565,21 @@
735 self.assertFails(expected_error, self.assertEquals, a, b)
736 self.assertFails(expected_error, self.failUnlessEqual, a, b)
737
738+ def test_assertEqual_non_ascii_str_with_newlines(self):
739+ message = _u("Be careful mixing unicode and bytes")
740+ a = "a\n\xa7\n"
741+ b = "Just a longish string so the more verbose output form is used."
742+ expected_error = '\n'.join([
743+ '!=:',
744+ "reference = '''\\",
745+ 'a',
746+ repr('\xa7')[1:-1],
747+ "'''",
748+ 'actual = %r' % (b,),
749+ ': ' + message,
750+ ])
751+ self.assertFails(expected_error, self.assertEqual, a, b, message)
752+
753 def test_assertIsNone(self):
754 self.assertIsNone(None)
755
