Merge lp:~gz/testtools/unprintable-assertThat-804127 into lp:~testtools-committers/testtools/trunk

Proposed by Martin Packman
Status: Merged
Approved by: Jonathan Lange
Approved revision: 245
Merged at revision: 229
Proposed branch: lp:~gz/testtools/unprintable-assertThat-804127
Merge into: lp:~testtools-committers/testtools/trunk
Prerequisite: lp:~jml/testtools/unprintable-assertThat-804127
Diff against target: 754 lines (+439/-32) (has conflicts)
6 files modified
doc/for-test-authors.rst (+10/-2)
testtools/compat.py (+93/-5)
testtools/matchers.py (+40/-13)
testtools/tests/test_compat.py (+127/-0)
testtools/tests/test_matchers.py (+151/-9)
testtools/tests/test_testcase.py (+18/-3)
Text conflict in testtools/tests/test_matchers.py
To merge this branch: bzr merge lp:~gz/testtools/unprintable-assertThat-804127
Reviewer        Review Type  Date Requested  Status
Jonathan Lange                               Approve
Review via email: mp+72641@code.launchpad.net

Description of the change

Resolve issues related to stringifying in the matcher code, by replacing several custom repr-like schemes using %s on strings (which may upcast the result to unicode or prevent later upcasting, or leak control codes, or...) with a big ugly method that tries to do what the callers wanted.

This deserves a longer description of the reasoning behind the changes, but there's some discussion of the reasoning in the prerequisite branch and I need some feedback before I go crazy\bier.
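
A minimal sketch of the control code problem described above, assuming Python 3 and a made-up value rather than anything from the branch: interpolating a string with %s passes control codes straight through, while %r escapes them.

```python
# Illustrative value only: a string carrying a terminal escape sequence.
value = "spam\x1b[31m ham"

# %s copies the text verbatim, so the escape character reaches the output.
leaky = "Expected %s" % (value,)

# %r goes through repr(), which escapes unprintable characters.
safe = "Expected %r" % (value,)
```

The upcasting half of the problem is specific to Python 2 string types and is not shown here.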

Some random notes so I don't forget them later:
* Switched from """...""" to '''...''' so if I see output pasted in bug reports I'll know if I broke it.
* Being clever with the escaping of single quotes in multiline strings is probably more trouble than it's worth, just putting a backslash in front of every occurrence would simplify the logic a lot.
* Astral characters break, as per usual. I'm amused to note that upstream just fixed one more issue their end[1] (but without exposing a usable iterator to python level code still), and on Python 3 only of course.
* If the decision is to dump all these attempts at fancy formatting and just live with the normal repr, I would not mind that at all.

[1]: http://bugs.python.org/issue9200
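
The simpler escaping suggested in the second note could look roughly like this. It is a sketch under assumptions, not the code in the branch: `simple_multiline_repr` is a hypothetical name, it targets Python 3 text only, and it handles quotes and backslashes but not control characters.

```python
def simple_multiline_repr(text):
    """Hypothetical helper: triple-quote `text`, escaping every single quote."""
    # Escape backslashes first so the quote escapes added next stay intact.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    # The backslash-newline after the opening quotes is a line continuation,
    # so the content of the literal starts on the following line.
    return "'''\\\n" + escaped + "'''"
```

The output is noisier than what the branch produces, since quotes that did not need escaping get a backslash anyway, but no scan for runs of three quotes is required.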

Jonathan Lange (jml) wrote :

Heroic effort. I can only imagine what this must have been like to write.

What follows is initial feedback and thoughts. Answer & fix as much as you
like. I'm happy to fix up the minor stuff as I land.

To summarize, although the bug was originally filed against assertThat, it
turns out that it's actually an issue in the matchers themselves as well as
assertThat. Right?

If so, we have to be thinking about:
 * what are our failure modes with this code and existing broken matchers
 * how to make it easy to define matchers and mismatches correctly

Will return to those in a bit. General code stuff:

 * Can you elaborate on your comment on istext()?
 * Is there a way we can make it so matchers never have to use _isbytes?
   Perhaps by adding an "is_multiline" function to compat.
 * I don't care about astral characters
 * In compat, "Seperate" should be spelled "Separate"
 * text_repr is a bit hard to follow, maybe add some internal comments?
 * Some more documentation on the intent & API of text_repr would help,
   saying that it attempts to mimic repr() except returning unicode,
   giving some examples, explaining why you might want to use it and also what
   multiline does.
 * I don't get why multiline needs three different ways of escaping lines
   whereas the non-multiline case only has two branches
 * I think having a repr()-like function that takes any object and
   special-cases text would be good
 * Just to be clear, we are expecting Mismatch.describe() to return unicode,
   right?
 * What are we expecting Matcher.__str__ to return? Text or bytes?

On the original points:

 * We should add something to the docs explaining best practice for
   Mismatch.describe() methods. i.e., do "foo %s" % (text_repr(x),)
 * We should update the docstring for ``Mismatch.describe``
 * Our docs say that match() should return a Mismatch, so we can assume
   subclassing of that.
 * These docs need to bubble up, to AnnotatedMismatch, to assertThat etc.
   e.g. The 'message' parameter of assertThat needs to have its type
   documented.
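
As a hedged illustration of that advice: the class below is hypothetical and only duck-types the Mismatch interface, and it uses the builtin repr() as a stand-in for text_repr so that it runs on Python 3 without testtools installed.

```python
class NotUpperCase(object):
    """Hypothetical mismatch for a string that is not upper case."""

    def __init__(self, matchee):
        self.matchee = matchee

    def describe(self):
        # Interpolate an escaped representation, never the raw string.
        return "%s is not upper case" % (repr(self.matchee),)

    def get_details(self):
        return {}


description = NotUpperCase("abc\x00\n").describe()
```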

Thanks so much for persevering.

jml

Robert Collins (lifeless) wrote :

 * What are we expecting Matcher.__str__ to return? Text or bytes?

-- text. Fugly I know. So python 2.x: use __unicode__, 3.x use
__str__, and we expect text all the time. We use it to get something
looking like what the user asked for after all.
....
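
The pattern described here can be sketched as below. It mirrors the shape of the MismatchError change in the preview diff, but `IsDivisibleBy` is a stand-in class and `str_is_unicode` is recomputed locally rather than imported from testtools.compat.

```python
import sys

# True on Python 3, where str is already a text type.
str_is_unicode = sys.version_info[0] >= 3


class IsDivisibleBy(object):
    """Stand-in matcher whose string form is always available as text."""

    def __init__(self, divider):
        self.divider = divider

    def __str__(self):
        return "IsDivisibleBy(%r)" % (self.divider,)

    if not str_is_unicode:
        # Python 2 only: expose the description through __unicode__, and
        # make __str__ return an ascii-only byte string derived from it.
        __unicode__ = __str__

        def __str__(self):
            return self.__unicode__().encode("ascii", "backslashreplace")


text = str(IsDivisibleBy(3))
```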

 * Our docs say that match() should return a Mismatch, so we can assume
  subclassing of that.

Please no, duck typing should be all we need. (And None is not a
subclass of Mismatch anyway :P).
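
A small sketch of what duck typing means in this context, with hypothetical names throughout: the caller only needs match() to return None or some object with a describe() method, and never checks its class.

```python
class PlainMismatch(object):
    """Hypothetical mismatch that does not subclass anything from testtools."""

    def __init__(self, description):
        self._description = description

    def describe(self):
        return self._description

    def get_details(self):
        return {}


class IsEven(object):
    """Hypothetical matcher returning None on success."""

    def __str__(self):
        return "IsEven()"

    def match(self, actual):
        if actual % 2:
            return PlainMismatch("%r is not even" % (actual,))
        return None


def failure_text(matchee, matcher):
    """Return the mismatch description, or None if `matchee` matched."""
    mismatch = matcher.match(matchee)
    if mismatch is None:
        return None
    # No isinstance check: anything with describe() will do.
    return mismatch.describe()
```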

-Rob

Martin Packman (gz) wrote :

Thanks, this is a useful review that covers all the stuff I needed to mention but didn't get in the proposal.

> To summarize, although the bug was originally filed against assertThat, it
> turns out that it's actually an issue in the matchers themselves as well as
> assertThat. Right?

Yes. The initial thoughts were about adding more escaping to assertThat, but by then most of the formatting has been done. By moving the onus onto Matchers to escape more carefully, it's possible to preserve more information.

> If so, we have to be thinking about:
> * what are our failure modes with this code and existing broken matchers
> * how to make it easy to define matchers and mismatches correctly

These are exactly the two primary concerns here. We don't want to punish test case authors with spurious errors if they specify an assert incorrectly, but give a clear enough failure for them to fix the problem. We also don't want learning about Python string quirks to be a prerequisite for writing a Matcher.

> * Can you elaborate on your comment on istext()?

The result of the function is generally not enough information to actually use your string safely.

Two of the current callers in testtools are just trying to do special handling on a string, and otherwise assume the object is already a specific correct type. In both cases, the call is (harmlessly) subtly wrong: __import__ doesn't take non-ascii strings on Python 2, and Python 3 bytes can be used for regular expressions.

Code wanting to do actual string handling has it worse, as it will always risk getting a string that can't be safely interpolated, or written to a stream, or otherwise given as output.

In short, it's something of an attractive nuisance. In using it, I realised the text_repr interface wasn't very good.
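
The regular expression case can be reproduced on Python 3. This is only a sketch: `istext` is copied from the Python 3 branch of compat.py as shown in the preview diff, and `accepts_pattern` is a hypothetical caller.

```python
import re


def istext(x):
    # Python 3 definition from testtools/compat.py.
    return isinstance(x, str)


def accepts_pattern(pattern):
    """Hypothetical caller that only treats text as a usable pattern."""
    return istext(pattern)


bytes_pattern = b"ab+c"
# re is happy with a bytes pattern as long as the subject is bytes too...
matched = re.match(bytes_pattern, b"abbbc") is not None
# ...but a check based on istext() would have turned it away.
rejected = not accepts_pattern(bytes_pattern)
```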

> * Is there a way we can make it so matchers never have to use _isbytes?
> Perhaps by adding an "is_multiline" function to compat.

Something like this would be much better, possibly moving the multiline logic into the repr function (so True means triple quote if there's a newline, or by adding a third state). This was me lowering my standards and keeping coding, in order to get something that would work. :)
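
A sketch of the third-state idea, assuming Python 3. `resolve_multiline` is a hypothetical helper, although the newline test follows the logic text_repr uses in the preview diff.

```python
def resolve_multiline(text, multiline=None):
    """Return True if `text` should be shown triple quoted."""
    # Bytes and text need a newline of the matching type for the `in` test.
    newline = b"\n" if isinstance(text, bytes) else "\n"
    if multiline is None:
        # Third state: the caller expressed no preference, so decide from
        # the content.
        return newline in text
    return bool(multiline)
```

With a default like this, a matcher can pass any string along without first inspecting its type or contents.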

> * text_repr is a bit hard to follow, maybe add some internal comments?

Yup, and some general cleanup would help, I left the first version that passed the tests alone so we could see if it actually worked as an interface before buffing it.

> * Some more documentation on the intent & API of text_repr would help,
> saying that it attempts to mimic repr() except returning unicode,
> giving some examples, explaining why you might want to use it and also what
> multiline does.

Will write this. As is probably clear from the tests, it aims to use the Python 3 logic for displaying literal unicode characters, and to maintain the `eval(repr(s)) == s` expectation.
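
Both properties can be checked against the builtin repr() on Python 3, which is the behaviour text_repr is meant to match. The sample strings are arbitrary, and the snippet exercises only the builtin, not text_repr itself.

```python
samples = ["caf\xe9", "tab\there", "no-break\xa0space", "line\u2028separator"]

# Round trip: evaluating the repr gives back the original string.
round_trips = [eval(repr(s)) == s for s in samples]

# Printable non-ascii characters are shown literally...
printable = repr("\xe9")
# ...while unprintable ones, such as the no-break space, are escaped.
unprintable = repr("\xa0")
# ascii() escapes every non-ascii character, much like repr() on Python 2.
ascii_only = ascii("\xe9")
```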

> * I don't get why multiline needs three different ways of escaping lines
> whereas the non-multiline case only has two branches

So, there are a cube of relevant combinations:

               Multiline?
                 F T
               +---+---+ Python 3
             / | | R | bytes
           / ...


Jonathan Lange (jml) wrote :

Wow, thanks for the reply. Sorry it's taken so long to get back to you.

In general, your replies to my review seem to be either agreeing with me or informing me of something I didn't know. There are a few cleanup things that need to be done before this lands (identified above). Where do you want to go from here?

241. By Martin Packman

Extra tests to ensure multiline quoting is correct on Python 3

242. By Martin Packman

Fix spelling error noted in review by jml

243. By Martin Packman

Add cautions about unprintable strings to Mismatch documentation

244. By Martin Packman

Add third state to multiline argument of text_repr and comment the internal logic

245. By Martin Packman

Test that not passing multiline to text_repr defaults based on input

Martin Packman (gz) wrote :

Okay, this branch should now be tidied up enough to land, can you see anything I've missed?

I tried out switching the repr-like function to take any object, and it does seem to be a better interface but complicates the logic even further, so should probably be done separately.

There's still a bunch of todo all over the place, and I haven't managed to squish _isbytes yet, though it's not strictly necessary in the matchers code bar some new tests I added that expect fancy behaviour.

There are a couple more interesting things that could do with handling later. Either Annotate or AnnotatedMismatch (or both, see the earlier complaint about string typing) should check the message they are given is printable unicode. Also the regular expression display form could do with some extra testing, I'm not totally convinced it always produces correct expressions (and certainly doesn't with some on Python 2.4 where the unicode-escape codec fails to escape backslashes).

Jonathan Lange (jml) wrote :

Can't see anything you've missed. I'm going to merge this, and will file wishlist bugs for the things that you say you've left out.

Jonathan Lange (jml) :
review: Approve
Martin Packman (gz) wrote :

Great, thanks jml, I'll subscribe to those bugs.

Preview Diff

=== modified file 'doc/for-test-authors.rst'
--- doc/for-test-authors.rst 2011-08-15 16:14:42 +0000
+++ doc/for-test-authors.rst 2011-09-13 23:54:14 +0000
@@ -717,7 +717,7 @@
           self.remainder = remainder
 
       def describe(self):
-          return "%s is not divisible by %s, %s remains" % (
+          return "%r is not divisible by %r, %r remains" % (
               self.number, self.divider, self.remainder)
 
       def get_details(self):
@@ -738,11 +738,19 @@
           remainder = actual % self.divider
           if remainder != 0:
               return Mismatch(
-                  "%s is not divisible by %s, %s remains" % (
+                  "%r is not divisible by %r, %r remains" % (
                       actual, self.divider, remainder))
           else:
               return None
 
+When writing a ``describe`` method or constructing a ``Mismatch`` object the
+code should ensure it only emits printable unicode. As this output must be
+combined with other text and forwarded for presentation, letting through
+non-ascii bytes of ambiguous encoding or control characters could throw an
+exception or mangle the display. In most cases simply avoiding the ``%s``
+format specifier and using ``%r`` instead will be enough. For examples of
+more complex formatting see the ``testtools.matchers`` implementatons.
+
 
 Details
 =======

=== modified file 'testtools/compat.py'
--- testtools/compat.py 2011-09-13 23:54:14 +0000
+++ testtools/compat.py 2011-09-13 23:54:14 +0000
@@ -25,6 +25,7 @@
 import re
 import sys
 import traceback
+import unicodedata
 
 from testtools.helpers import try_imports
 
@@ -52,6 +53,7 @@
 """
 
 if sys.version_info > (3, 0):
+    import builtins
     def _u(s):
         return s
     _r = ascii
@@ -59,12 +61,14 @@
         """A byte literal."""
         return s.encode("latin-1")
     advance_iterator = next
+    # GZ 2011-08-24: Seems istext() is easy to misuse and makes for bad code.
     def istext(x):
         return isinstance(x, str)
     def classtypes():
         return (type,)
     str_is_unicode = True
 else:
+    import __builtin__ as builtins
     def _u(s):
         # The double replace mangling going on prepares the string for
         # unicode-escape - \foo is preserved, \u and \U are decoded.
@@ -112,6 +116,95 @@
     return isinstance(exception, (KeyboardInterrupt, SystemExit))
 
 
+# GZ 2011-08-24: Using isinstance checks like this encourages bad interfaces,
+# there should be better ways to write code needing this.
+if not issubclass(getattr(builtins, "bytes", str), str):
+    def _isbytes(x):
+        return isinstance(x, bytes)
+else:
+    # Never return True on Pythons that provide the name but not the real type
+    def _isbytes(x):
+        return False
+
+
+def _slow_escape(text):
+    """Escape unicode `text` leaving printable characters unmodified
+
+    The behaviour emulates the Python 3 implementation of repr, see
+    unicode_repr in unicodeobject.c and isprintable definition.
+
+    Because this iterates over the input a codepoint at a time, it's slow, and
+    does not handle astral characters correctly on Python builds with 16 bit
+    rather than 32 bit unicode type.
+    """
+    output = []
+    for c in text:
+        o = ord(c)
+        if o < 256:
+            if o < 32 or 126 < o < 161:
+                output.append(c.encode("unicode-escape"))
+            elif o == 92:
+                # Separate due to bug in unicode-escape codec in Python 2.4
+                output.append("\\\\")
+            else:
+                output.append(c)
+        else:
+            # To get correct behaviour would need to pair up surrogates here
+            if unicodedata.category(c)[0] in "CZ":
+                output.append(c.encode("unicode-escape"))
+            else:
+                output.append(c)
+    return "".join(output)
+
+
+def text_repr(text, multiline=None):
+    """Rich repr for `text` returning unicode, triple quoted if `multiline`"""
+    is_py3k = sys.version_info > (3, 0)
+    nl = _isbytes(text) and bytes((0xA,)) or "\n"
+    if multiline is None:
+        multiline = nl in text
+    if not multiline and (is_py3k or not str_is_unicode and type(text) is str):
+        # Use normal repr for single line of unicode on Python 3 or bytes
+        return repr(text)
+    prefix = repr(text[:0])[:-2]
+    if multiline:
+        # To escape multiline strings, split and process each line in turn,
+        # making sure that quotes are not escaped.
+        if is_py3k:
+            offset = len(prefix) + 1
+            lines = []
+            for l in text.split(nl):
+                r = repr(l)
+                q = r[-1]
+                lines.append(r[offset:-1].replace("\\" + q, q))
+        elif not str_is_unicode and isinstance(text, str):
+            lines = [l.encode("string-escape").replace("\\'", "'")
+                for l in text.split("\n")]
+        else:
+            lines = [_slow_escape(l) for l in text.split("\n")]
+        # Combine the escaped lines and append two of the closing quotes,
+        # then iterate over the result to escape triple quotes correctly.
+        _semi_done = "\n".join(lines) + "''"
+        p = 0
+        while True:
+            p = _semi_done.find("'''", p)
+            if p == -1:
+                break
+            _semi_done = "\\".join([_semi_done[:p], _semi_done[p:]])
+            p += 2
+        return "".join([prefix, "'''\\\n", _semi_done, "'"])
+    escaped_text = _slow_escape(text)
+    # Determine which quote character to use and if one gets prefixed with a
+    # backslash following the same logic Python uses for repr() on strings
+    quote = "'"
+    if "'" in text:
+        if '"' in text:
+            escaped_text = escaped_text.replace("'", "\\'")
+        else:
+            quote = '"'
+    return "".join([prefix, quote, escaped_text, quote])
+
+
 def unicode_output_stream(stream):
     """Get wrapper for given stream that writes any unicode without exception
 
@@ -146,11 +239,6 @@
     return writer(stream, "replace")
 
 
-try:
-    to_text = unicode
-except NameError:
-    to_text = str
-
 # The default source encoding is actually "iso-8859-1" until Python 2.5 but
 # using non-ascii causes a deprecation warning in 2.4 and it's cleaner to
 # treat all versions the same way

=== modified file 'testtools/matchers.py'
--- testtools/matchers.py 2011-09-13 23:54:14 +0000
+++ testtools/matchers.py 2011-09-13 23:54:14 +0000
@@ -49,7 +49,10 @@
     classtypes,
     _error_repr,
     isbaseexception,
+    _isbytes,
     istext,
+    str_is_unicode,
+    text_repr
     )
 
 
@@ -102,6 +105,8 @@
         """Describe the mismatch.
 
         This should be either a human-readable string or castable to a string.
+        In particular, is should either be plain ascii or unicode on Python 2,
+        and care should be taken to escape control characters.
         """
         try:
             return self._description
@@ -151,12 +156,25 @@
     def __str__(self):
         difference = self.mismatch.describe()
         if self.verbose:
+            # GZ 2011-08-24: Smelly API? Better to take any object and special
+            # case text inside?
+            if istext(self.matchee) or _isbytes(self.matchee):
+                matchee = text_repr(self.matchee, multiline=False)
+            else:
+                matchee = repr(self.matchee)
             return (
-                'Match failed. Matchee: "%s"\nMatcher: %s\nDifference: %s\n'
-                % (self.matchee, self.matcher, difference))
+                'Match failed. Matchee: %s\nMatcher: %s\nDifference: %s\n'
+                % (matchee, self.matcher, difference))
         else:
             return difference
 
+    if not str_is_unicode:
+
+        __unicode__ = __str__
+
+        def __str__(self):
+            return self.__unicode__().encode("ascii", "backslashreplace")
+
 
 class MismatchDecorator(object):
     """Decorate a ``Mismatch``.
@@ -268,7 +286,12 @@
         self.with_nl = with_nl
 
     def describe(self):
-        return self.matcher._describe_difference(self.with_nl)
+        s = self.matcher._describe_difference(self.with_nl)
+        if str_is_unicode or isinstance(s, unicode):
+            return s
+        # GZ 2011-08-24: This is actually pretty bogus, most C0 codes should
+        # be escaped, in addition to non-ascii bytes.
+        return s.decode("latin1").encode("ascii", "backslashreplace")
 
 
 class DoesNotContain(Mismatch):
@@ -298,8 +321,8 @@
         self.expected = expected
 
     def describe(self):
-        return "'%s' does not start with '%s'." % (
-            self.matchee, self.expected)
+        return "%s does not start with %s." % (
+            text_repr(self.matchee), text_repr(self.expected))
 
 
 class DoesNotEndWith(Mismatch):
@@ -314,8 +337,8 @@
         self.expected = expected
 
     def describe(self):
-        return "'%s' does not end with '%s'." % (
-            self.matchee, self.expected)
+        return "%s does not end with %s." % (
+            text_repr(self.matchee), text_repr(self.expected))
 
 
 class _BinaryComparison(object):
@@ -347,8 +370,8 @@
     def _format(self, thing):
         # Blocks of text with newlines are formatted as triple-quote
         # strings. Everything else is pretty-printed.
-        if istext(thing) and '\n' in thing:
-            return '"""\\\n%s"""' % (thing,)
+        if istext(thing) or _isbytes(thing):
+            return text_repr(thing)
         return pformat(thing)
 
     def describe(self):
@@ -359,7 +382,7 @@
                 self._mismatch_string, self._format(self.expected),
                 self._format(self.other))
         else:
-            return "%s %s %s" % (left, self._mismatch_string,right)
+            return "%s %s %s" % (left, self._mismatch_string, right)
 
 
 class Equals(_BinaryComparison):
@@ -599,7 +622,7 @@
         self.expected = expected
 
     def __str__(self):
-        return "Starts with '%s'." % self.expected
+        return "StartsWith(%r)" % (self.expected,)
 
     def match(self, matchee):
         if not matchee.startswith(self.expected):
@@ -618,7 +641,7 @@
         self.expected = expected
 
     def __str__(self):
-        return "Ends with '%s'." % self.expected
+        return "EndsWith(%r)" % (self.expected,)
 
     def match(self, matchee):
         if not matchee.endswith(self.expected):
@@ -875,8 +898,12 @@
 
     def match(self, value):
         if not re.match(self.pattern, value, self.flags):
+            pattern = self.pattern
+            if not isinstance(pattern, str_is_unicode and str or unicode):
+                pattern = pattern.decode("latin1")
+            pattern = pattern.encode("unicode_escape").decode("ascii")
             return Mismatch("%r does not match /%s/" % (
-                value, self.pattern))
+                value, pattern.replace("\\\\", "\\")))
 
 
 class MatchesSetwise(object):

=== modified file 'testtools/tests/test_compat.py'
--- testtools/tests/test_compat.py 2011-07-04 18:03:28 +0000
+++ testtools/tests/test_compat.py 2011-09-13 23:54:14 +0000
@@ -16,6 +16,7 @@
     _get_source_encoding,
     _u,
     str_is_unicode,
+    text_repr,
     unicode_output_stream,
     )
 from testtools.matchers import (
@@ -262,6 +263,132 @@
         self.assertEqual("pa???n", sout.getvalue())
 
 
+class TestTextRepr(testtools.TestCase):
+    """Ensure in extending repr, basic behaviours are not being broken"""
+
+    ascii_examples = (
+        # Single character examples
+        # C0 control codes should be escaped except multiline \n
+        ("\x00", "'\\x00'", "'''\\\n\\x00'''"),
+        ("\b", "'\\x08'", "'''\\\n\\x08'''"),
+        ("\t", "'\\t'", "'''\\\n\\t'''"),
+        ("\n", "'\\n'", "'''\\\n\n'''"),
+        ("\r", "'\\r'", "'''\\\n\\r'''"),
+        # Quotes and backslash should match normal repr behaviour
+        ('"', "'\"'", "'''\\\n\"'''"),
+        ("'", "\"'\"", "'''\\\n\\''''"),
+        ("\\", "'\\\\'", "'''\\\n\\\\'''"),
+        # DEL is also unprintable and should be escaped
+        ("\x7F", "'\\x7f'", "'''\\\n\\x7f'''"),
+
+        # Character combinations that need double checking
+        ("\r\n", "'\\r\\n'", "'''\\\n\\r\n'''"),
+        ("\"'", "'\"\\''", "'''\\\n\"\\''''"),
+        ("'\"", "'\\'\"'", "'''\\\n'\"'''"),
+        ("\\n", "'\\\\n'", "'''\\\n\\\\n'''"),
+        ("\\\n", "'\\\\\\n'", "'''\\\n\\\\\n'''"),
+        ("\\' ", "\"\\\\' \"", "'''\\\n\\\\' '''"),
+        ("\\'\n", "\"\\\\'\\n\"", "'''\\\n\\\\'\n'''"),
+        ("\\'\"", "'\\\\\\'\"'", "'''\\\n\\\\'\"'''"),
+        ("\\'''", "\"\\\\'''\"", "'''\\\n\\\\\\'\\'\\''''"),
+        )
+
+    # Bytes with the high bit set should always be escaped
+    bytes_examples = (
+        (_b("\x80"), "'\\x80'", "'''\\\n\\x80'''"),
+        (_b("\xA0"), "'\\xa0'", "'''\\\n\\xa0'''"),
+        (_b("\xC0"), "'\\xc0'", "'''\\\n\\xc0'''"),
+        (_b("\xFF"), "'\\xff'", "'''\\\n\\xff'''"),
+        (_b("\xC2\xA7"), "'\\xc2\\xa7'", "'''\\\n\\xc2\\xa7'''"),
+        )
+
+    # Unicode doesn't escape printable characters as per the Python 3 model
+    unicode_examples = (
+        # C1 codes are unprintable
+        (_u("\x80"), "'\\x80'", "'''\\\n\\x80'''"),
+        (_u("\x9F"), "'\\x9f'", "'''\\\n\\x9f'''"),
+        # No-break space is unprintable
+        (_u("\xA0"), "'\\xa0'", "'''\\\n\\xa0'''"),
+        # Letters latin alphabets are printable
+        (_u("\xA1"), _u("'\xa1'"), _u("'''\\\n\xa1'''")),
+        (_u("\xFF"), _u("'\xff'"), _u("'''\\\n\xff'''")),
+        (_u("\u0100"), _u("'\u0100'"), _u("'''\\\n\u0100'''")),
+        # Line and paragraph seperators are unprintable
+        (_u("\u2028"), "'\\u2028'", "'''\\\n\\u2028'''"),
+        (_u("\u2029"), "'\\u2029'", "'''\\\n\\u2029'''"),
+        # Unpaired surrogates are unprintable
+        (_u("\uD800"), "'\\ud800'", "'''\\\n\\ud800'''"),
+        (_u("\uDFFF"), "'\\udfff'", "'''\\\n\\udfff'''"),
+        # Unprintable general categories not fully tested: Cc, Cf, Co, Cn, Zs
+        )
+
+    b_prefix = repr(_b(""))[:-2]
+    u_prefix = repr(_u(""))[:-2]
+
+    def test_ascii_examples_oneline_bytes(self):
+        for s, expected, _ in self.ascii_examples:
+            b = _b(s)
+            actual = text_repr(b, multiline=False)
+            # Add self.assertIsInstance check?
+            self.assertEqual(actual, self.b_prefix + expected)
+            self.assertEqual(eval(actual), b)
+
+    def test_ascii_examples_oneline_unicode(self):
+        for s, expected, _ in self.ascii_examples:
+            u = _u(s)
+            actual = text_repr(u, multiline=False)
+            self.assertEqual(actual, self.u_prefix + expected)
+            self.assertEqual(eval(actual), u)
+
+    def test_ascii_examples_multiline_bytes(self):
+        for s, _, expected in self.ascii_examples:
+            b = _b(s)
+            actual = text_repr(b, multiline=True)
+            self.assertEqual(actual, self.b_prefix + expected)
+            self.assertEqual(eval(actual), b)
+
+    def test_ascii_examples_multiline_unicode(self):
+        for s, _, expected in self.ascii_examples:
+            u = _u(s)
+            actual = text_repr(u, multiline=True)
+            self.assertEqual(actual, self.u_prefix + expected)
+            self.assertEqual(eval(actual), u)
+
+    def test_ascii_examples_defaultline_bytes(self):
+        for s, one, multi in self.ascii_examples:
+            expected = "\n" in s and multi or one
+            self.assertEqual(text_repr(_b(s)), self.b_prefix + expected)
+
+    def test_ascii_examples_defaultline_unicode(self):
+        for s, one, multi in self.ascii_examples:
+            expected = "\n" in s and multi or one
+            self.assertEqual(text_repr(_u(s)), self.u_prefix + expected)
+
+    def test_bytes_examples_oneline(self):
+        for b, expected, _ in self.bytes_examples:
+            actual = text_repr(b, multiline=False)
+            self.assertEqual(actual, self.b_prefix + expected)
+            self.assertEqual(eval(actual), b)
+
+    def test_bytes_examples_multiline(self):
+        for b, _, expected in self.bytes_examples:
+            actual = text_repr(b, multiline=True)
+            self.assertEqual(actual, self.b_prefix + expected)
+            self.assertEqual(eval(actual), b)
+
+    def test_unicode_examples_oneline(self):
+        for u, expected, _ in self.unicode_examples:
+            actual = text_repr(u, multiline=False)
+            self.assertEqual(actual, self.u_prefix + expected)
+            self.assertEqual(eval(actual), u)
+
+    def test_unicode_examples_multiline(self):
+        for u, _, expected in self.unicode_examples:
+            actual = text_repr(u, multiline=True)
+            self.assertEqual(actual, self.u_prefix + expected)
+            self.assertEqual(eval(actual), u)
+
+
 def test_suite():
     from unittest import TestLoader
     return TestLoader().loadTestsFromName(__name__)

=== modified file 'testtools/tests/test_matchers.py'
--- testtools/tests/test_matchers.py 2011-09-13 23:54:14 +0000
+++ testtools/tests/test_matchers.py 2011-09-13 23:54:14 +0000
@@ -12,7 +12,9 @@
     )
 from testtools.compat import (
     StringIO,
-    to_text,
+    str_is_unicode,
+    text_repr,
+    _b,
     _u,
     )
 from testtools.matchers import (
@@ -20,7 +22,11 @@
     AllMatch,
     Annotate,
     AnnotatedMismatch,
+<<<<<<< TREE
     Contains,
+=======
+    _BinaryMismatch,
+>>>>>>> MERGE-SOURCE
     Equals,
     DocTestMatches,
     DoesNotEndWith,
@@ -96,7 +102,7 @@
         mismatch = matcher.match(2)
         e = MismatchError(matchee, matcher, mismatch, True)
         expected = (
-            'Match failed. Matchee: "%s"\n'
+            'Match failed. Matchee: %r\n'
             'Matcher: %s\n'
             'Difference: %s\n' % (
                 matchee,
@@ -112,17 +118,80 @@
         matcher = Equals(_u('a'))
         mismatch = matcher.match(matchee)
         expected = (
-            'Match failed. Matchee: "%s"\n'
+            'Match failed. Matchee: %s\n'
             'Matcher: %s\n'
             'Difference: %s\n' % (
-                matchee,
+                text_repr(matchee),
                 matcher,
                 mismatch.describe(),
                 ))
         e = MismatchError(matchee, matcher, mismatch, True)
-        # XXX: Using to_text rather than str because, on Python 2, str will
-        # raise UnicodeEncodeError.
-        self.assertEqual(expected, to_text(e))
+        if str_is_unicode:
+            actual = str(e)
+        else:
+            actual = unicode(e)
+            # Using str() should still work, and return ascii only
+            self.assertEqual(
+                expected.replace(matchee, matchee.encode("unicode-escape")),
+                str(e).decode("ascii"))
+        self.assertEqual(expected, actual)
+
+
+class Test_BinaryMismatch(TestCase):
+    """Mismatches from binary comparisons need useful describe output"""
+
+    _long_string = "This is a longish multiline non-ascii string\n\xa7"
+    _long_b = _b(_long_string)
+    _long_u = _u(_long_string)
+
+    def test_short_objects(self):
+        o1, o2 = object(), object()
+        mismatch = _BinaryMismatch(o1, "!~", o2)
+        self.assertEqual(mismatch.describe(), "%r !~ %r" % (o1, o2))
+
+    def test_short_mixed_strings(self):
+        b, u = _b("\xa7"), _u("\xa7")
+        mismatch = _BinaryMismatch(b, "!~", u)
+        self.assertEqual(mismatch.describe(), "%r !~ %r" % (b, u))
+
+    def test_long_bytes(self):
+        one_line_b = self._long_b.replace(_b("\n"), _b(" "))
+        mismatch = _BinaryMismatch(one_line_b, "!~", self._long_b)
+        self.assertEqual(mismatch.describe(),
+            "%s:\nreference = %s\nactual = %s\n" % ("!~",
+                text_repr(one_line_b),
+                text_repr(self._long_b, multiline=True)))
+
+    def test_long_unicode(self):
+        one_line_u = self._long_u.replace("\n", " ")
+        mismatch = _BinaryMismatch(one_line_u, "!~", self._long_u)
+        self.assertEqual(mismatch.describe(),
+            "%s:\nreference = %s\nactual = %s\n" % ("!~",
+                text_repr(one_line_u),
+                text_repr(self._long_u, multiline=True)))
+
+    def test_long_mixed_strings(self):
+        mismatch = _BinaryMismatch(self._long_b, "!~", self._long_u)
+        self.assertEqual(mismatch.describe(),
+            "%s:\nreference = %s\nactual = %s\n" % ("!~",
+                text_repr(self._long_b, multiline=True),
+                text_repr(self._long_u, multiline=True)))
+
+    def test_long_bytes_and_object(self):
+        obj = object()
+        mismatch = _BinaryMismatch(self._long_b, "!~", obj)
+        self.assertEqual(mismatch.describe(),
+            "%s:\nreference = %s\nactual = %s\n" % ("!~",
+                text_repr(self._long_b, multiline=True),
+                repr(obj)))
+
+    def test_long_unicode_and_object(self):
+        obj = object()
+        mismatch = _BinaryMismatch(self._long_u, "!~", obj)
+        self.assertEqual(mismatch.describe(),
+            "%s:\nreference = %s\nactual = %s\n" % ("!~",
+                text_repr(self._long_u, multiline=True),
+                repr(obj)))
 
 
 class TestMatchersInterface(object):
@@ -208,6 +277,23 @@
         self.assertEqual("bar\n", matcher.want)
         self.assertEqual(doctest.ELLIPSIS, matcher.flags)
 
+    def test_describe_non_ascii_bytes(self):
+        """Even with bytestrings, the mismatch should be coercible to unicode
+
+        DocTestMatches is intended for text, but the Python 2 str type also
+        permits arbitrary binary inputs. This is a slightly bogus thing to do,
+        and under Python 3 using bytes objects will reasonably raise an error.
+        """
+        header = _b("\x89PNG\r\n\x1a\n...")
+        if str_is_unicode:
+            self.assertRaises(TypeError,
+                DocTestMatches, header, doctest.ELLIPSIS)
+            return
+        matcher = DocTestMatches(header, doctest.ELLIPSIS)
+        mismatch = matcher.match(_b("GIF89a\1\0\1\0\0\0\0;"))
294 # Must be treatable as unicode text, the exact output matters less
295 self.assertTrue(unicode(mismatch.describe()))
296
211297
212class TestEqualsInterface(TestCase, TestMatchersInterface):298class TestEqualsInterface(TestCase, TestMatchersInterface):
213299
@@ -610,6 +696,21 @@
610 mismatch = DoesNotStartWith("fo", "bo")696 mismatch = DoesNotStartWith("fo", "bo")
611 self.assertEqual("'fo' does not start with 'bo'.", mismatch.describe())697 self.assertEqual("'fo' does not start with 'bo'.", mismatch.describe())
612698
699 def test_describe_non_ascii_unicode(self):
700 string = _u("A\xA7")
701 suffix = _u("B\xA7")
702 mismatch = DoesNotStartWith(string, suffix)
703 self.assertEqual("%s does not start with %s." % (
704 text_repr(string), text_repr(suffix)),
705 mismatch.describe())
706
707 def test_describe_non_ascii_bytes(self):
708 string = _b("A\xA7")
709 suffix = _b("B\xA7")
710 mismatch = DoesNotStartWith(string, suffix)
711 self.assertEqual("%r does not start with %r." % (string, suffix),
712 mismatch.describe())
713
613714
614class StartsWithTests(TestCase):715class StartsWithTests(TestCase):
615716
@@ -617,7 +718,17 @@
617718
618 def test_str(self):719 def test_str(self):
619 matcher = StartsWith("bar")720 matcher = StartsWith("bar")
620 self.assertEqual("Starts with 'bar'.", str(matcher))721 self.assertEqual("StartsWith('bar')", str(matcher))
722
723 def test_str_with_bytes(self):
724 b = _b("\xA7")
725 matcher = StartsWith(b)
726 self.assertEqual("StartsWith(%r)" % (b,), str(matcher))
727
728 def test_str_with_unicode(self):
729 u = _u("\xA7")
730 matcher = StartsWith(u)
731 self.assertEqual("StartsWith(%r)" % (u,), str(matcher))
621732
622 def test_match(self):733 def test_match(self):
623 matcher = StartsWith("bar")734 matcher = StartsWith("bar")
@@ -646,6 +757,21 @@
646 mismatch = DoesNotEndWith("fo", "bo")757 mismatch = DoesNotEndWith("fo", "bo")
647 self.assertEqual("'fo' does not end with 'bo'.", mismatch.describe())758 self.assertEqual("'fo' does not end with 'bo'.", mismatch.describe())
648759
760 def test_describe_non_ascii_unicode(self):
761 string = _u("A\xA7")
762 suffix = _u("B\xA7")
763 mismatch = DoesNotEndWith(string, suffix)
764 self.assertEqual("%s does not end with %s." % (
765 text_repr(string), text_repr(suffix)),
766 mismatch.describe())
767
768 def test_describe_non_ascii_bytes(self):
769 string = _b("A\xA7")
770 suffix = _b("B\xA7")
771 mismatch = DoesNotEndWith(string, suffix)
772 self.assertEqual("%r does not end with %r." % (string, suffix),
773 mismatch.describe())
774
649775
650class EndsWithTests(TestCase):776class EndsWithTests(TestCase):
651777
@@ -653,7 +779,17 @@
653779
654 def test_str(self):780 def test_str(self):
655 matcher = EndsWith("bar")781 matcher = EndsWith("bar")
656 self.assertEqual("Ends with 'bar'.", str(matcher))782 self.assertEqual("EndsWith('bar')", str(matcher))
783
784 def test_str_with_bytes(self):
785 b = _b("\xA7")
786 matcher = EndsWith(b)
787 self.assertEqual("EndsWith(%r)" % (b,), str(matcher))
788
789 def test_str_with_unicode(self):
790 u = _u("\xA7")
791 matcher = EndsWith(u)
792 self.assertEqual("EndsWith(%r)" % (u,), str(matcher))
657793
658 def test_match(self):794 def test_match(self):
659 matcher = EndsWith("arf")795 matcher = EndsWith("arf")
@@ -770,11 +906,17 @@
770 ("MatchesRegex('a|b')", MatchesRegex('a|b')),906 ("MatchesRegex('a|b')", MatchesRegex('a|b')),
771 ("MatchesRegex('a|b', re.M)", MatchesRegex('a|b', re.M)),907 ("MatchesRegex('a|b', re.M)", MatchesRegex('a|b', re.M)),
772 ("MatchesRegex('a|b', re.I|re.M)", MatchesRegex('a|b', re.I|re.M)),908 ("MatchesRegex('a|b', re.I|re.M)", MatchesRegex('a|b', re.I|re.M)),
909 ("MatchesRegex(%r)" % (_b("\xA7"),), MatchesRegex(_b("\xA7"))),
910 ("MatchesRegex(%r)" % (_u("\xA7"),), MatchesRegex(_u("\xA7"))),
773 ]911 ]
774912
775 describe_examples = [913 describe_examples = [
776 ("'c' does not match /a|b/", 'c', MatchesRegex('a|b')),914 ("'c' does not match /a|b/", 'c', MatchesRegex('a|b')),
777 ("'c' does not match /a\d/", 'c', MatchesRegex(r'a\d')),915 ("'c' does not match /a\d/", 'c', MatchesRegex(r'a\d')),
916 ("%r does not match /\\s+\\xa7/" % (_b('c'),),
917 _b('c'), MatchesRegex(_b("\\s+\xA7"))),
918 ("%r does not match /\\s+\\xa7/" % (_u('c'),),
919 _u('c'), MatchesRegex(_u("\\s+\xA7"))),
778 ]920 ]
779921
780922
781923
=== modified file 'testtools/tests/test_testcase.py'
--- testtools/tests/test_testcase.py 2011-09-13 23:54:14 +0000
+++ testtools/tests/test_testcase.py 2011-09-13 23:54:14 +0000
@@ -488,7 +488,7 @@
488 matchee = 'foo'488 matchee = 'foo'
489 matcher = Equals('bar')489 matcher = Equals('bar')
490 expected = (490 expected = (
491 'Match failed. Matchee: "%s"\n'491 'Match failed. Matchee: %r\n'
492 'Matcher: %s\n'492 'Matcher: %s\n'
493 'Difference: %s\n' % (493 'Difference: %s\n' % (
494 matchee,494 matchee,
@@ -528,10 +528,10 @@
528 matchee = _u('\xa7')528 matchee = _u('\xa7')
529 matcher = Equals(_u('a'))529 matcher = Equals(_u('a'))
530 expected = (530 expected = (
531 'Match failed. Matchee: "%s"\n'531 'Match failed. Matchee: %s\n'
532 'Matcher: %s\n'532 'Matcher: %s\n'
533 'Difference: %s\n\n' % (533 'Difference: %s\n\n' % (
534 matchee,534 repr(matchee).replace("\\xa7", matchee),
535 matcher,535 matcher,
536 matcher.match(matchee).describe(),536 matcher.match(matchee).describe(),
537 ))537 ))
@@ -565,6 +565,21 @@
565 self.assertFails(expected_error, self.assertEquals, a, b)565 self.assertFails(expected_error, self.assertEquals, a, b)
566 self.assertFails(expected_error, self.failUnlessEqual, a, b)566 self.assertFails(expected_error, self.failUnlessEqual, a, b)
567567
568 def test_assertEqual_non_ascii_str_with_newlines(self):
569 message = _u("Be careful mixing unicode and bytes")
570 a = "a\n\xa7\n"
571 b = "Just a longish string so the more verbose output form is used."
572 expected_error = '\n'.join([
573 '!=:',
574 "reference = '''\\",
575 'a',
576 repr('\xa7')[1:-1],
577 "'''",
578 'actual = %r' % (b,),
579 ': ' + message,
580 ])
581 self.assertFails(expected_error, self.assertEqual, a, b, message)
582
568 def test_assertIsNone(self):583 def test_assertIsNone(self):
569 self.assertIsNone(None)584 self.assertIsNone(None)
570585
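
For readers trying to picture the multiline form that test_assertEqual_non_ascii_str_with_newlines expects (an opening ''' followed by a backslash, one escaped line per source line, then a closing '''), here is a minimal Python 3 sketch. It is not the text_repr added to testtools/compat.py in this branch: sketch_text_repr is a hypothetical name, it assumes unicode input only, and it takes the simpler route mentioned in the description of putting a backslash in front of every single quote. It also always escapes non-ASCII characters, which is what the test expects on Python 2 but not necessarily on Python 3.

```python
def sketch_text_repr(text, multiline=False):
    """Hypothetical, simplified stand-in; not testtools' text_repr."""
    if not multiline:
        # Short form: defer to the builtin repr.
        return repr(text)
    escaped_lines = []
    # Escape each line separately so real newlines survive in the output.
    for line in text.split("\n"):
        # unicode_escape turns backslashes, control codes and non-ASCII
        # characters into ASCII-only escape sequences.
        escaped = line.encode("unicode_escape").decode("ascii")
        # Simple approach: backslash every single quote, so the text can
        # never terminate the surrounding triple-quoted literal early.
        escaped_lines.append(escaped.replace("'", "\\'"))
    # The backslash-newline after the opening quotes is a line continuation,
    # so the literal still evaluates back to the original text.
    return "'''\\\n" + "\n".join(escaped_lines) + "'''"


print(sketch_text_repr("a\n\xa7\n", multiline=True))
```

Under these assumptions the result stays ASCII-only and can be pasted back into a Python prompt to recover the original string, which is the property the escaping is meant to preserve.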
