testtools

doc/for-test-authors.rst (+10/-2)
testtools/compat.py (+93/-5)
testtools/matchers.py (+40/-13)
testtools/tests/test_compat.py (+127/-0)
testtools/tests/test_matchers.py (+151/-9)
testtools/tests/test_testcase.py (+18/-3)

Text conflict in testtools/tests/test_matchers.py

To merge this branch:

bzr merge lp:~gz/testtools/unprintable-assertThat-804127

Medium

Fix Released

Reviewer	Review Type	Date Requested	Status
Jonathan Lange		2011-08-24	Approve on 2011-09-14
Review via email: mp+72641@code.launchpad.net

Description of the change

Resolve issues related to stringifying in the matcher code, by replacing several custom repr-like schemes using %s on strings (which may upcast the result to unicode or prevent later upcasting, or leak control codes, or...) with a big ugly method that tries to do what the callers wanted.

This deserves a longer description of the reasoning behind the changes, but there's some discussion of the reasoning in the prerequisite branch and I need some feedback before I go crazy\bier.

Some random notes so I don't forget them later:
* Switched from """...""" to '''...''' so if I see output pasted in bug reports I'll know if I broke it.
* Being clever with the escaping of single quotes in multiline strings is probably more trouble than it's worth, just putting a backslash in front of every occurrence would simplify the logic a lot.
* Astral characters break, as per usual. I'm amused to note that upstream just fixed one more issue their end[1] (but without exposing a usable iterator to python level code still), and on Python 3 only of course.
* If the decision is to dump all these attempts at fancy formatting and just live with the normal repr, I would not mind that at all.

[1]: http://bugs.python.org/issue9200

Revision history for this message

Jonathan Lange (jml) wrote on 2011-08-24:

Heroic effort. I can only imagine what this must have been like to write.

What follows is initial feedback and thoughts. Answer & fix as much as you
like. I'm happy to fix up the minor stuff as I land.

To summarize, although the bug was originally filed against assertThat, it
turns out that it's actually an issue in the matchers themselves as well as
assertThat. Right?

If so, we have to be thinking about:
* what are our failure modes with this code and existing broken matchers
* how to make it easy to define matchers and mismatches correctly

Will return to those in a bit. General code stuff:

* Can you elaborate on your comment on istext()?
* Is there a way we can make it so matchers never have to use _isbytes?
   Perhaps by adding an "is_multiline" function to compat.
* I don't care about astral characters
* In compat, "Seperate" should be spelled "Separate"
* text_repr is a bit hard to follow, maybe add some internal comments?
* Some more documentation on the intent & API of text_repr would help,
   saying that it attempts to mimic repr() except returning unicode,
   giving some examples, explaining why you might want to use it and also what
   multiline does.
* I don't get why multiline needs three different ways of escaping lines
   whereas the non-multiline case only has two branches
* I think having a repr()-like function that takes any object and
   special-cases text would be good
* Just to be clear, we are expecting Mismatch.describe() to return unicode,
   right?
* What are we expecting Matcher.__str__ to return? Text or bytes?

On the original points:

* We should add something to the docs explaining best practice for
   Mismatch.describe() methods. i.e., do "foo %s" % (text_repr(x),)
* We should update the docstring for ``Mismatch.describe``
* Our docs say that match() should return a Mismatch, so we can assume
   subclassing of that.
* These docs need to bubble up, to AnnotatedMismatch, to assertThat etc.
   e.g. The 'message' parameter of assertThat needs to have its type
   documented.

Thanks so much for persevering.

jml

Robert Collins (lifeless) wrote on 2011-08-25:

* What are we expecting Matcher.__str__ to return? Text or bytes?

-- text. Fugly I know. So python 2.x: use __unicode__, 3.x use
__str__, and we expect text all the time. We use it to get something
looking like what the user asked for after all.
....

* Our docs say that match() should return a Mismatch, so we can assume
subclassing of that.

Please no, duck typing should be all we need. (And None is not a
subclass of Mismatch anyway :P).

-Rob
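
A minimal sketch of the convention described above, assuming a hypothetical matcher class `StartsWithSketch` that is not part of testtools: the text-returning method is exposed as `__unicode__` on Python 2 and as `__str__` on Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2


class StartsWithSketch(object):
    """Hypothetical matcher, used only to illustrate the naming rule."""

    def __init__(self, expected):
        self.expected = expected

    def _text(self):
        # Always build text: unicode on Python 2, str on Python 3.
        return u"StartsWithSketch(%r)" % (self.expected,)

    if PY2:
        # Python 2: text lives in __unicode__, __str__ gives escaped bytes.
        __unicode__ = _text

        def __str__(self):
            return self._text().encode("ascii", "backslashreplace")
    else:
        # Python 3: __str__ is already the text method.
        __str__ = _text
```

Using `%r` for the argument keeps the result ASCII-safe on Python 2, so the byte form never needs more than `backslashreplace`.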

Martin Packman (gz) wrote on 2011-08-25:

Thanks, this is a useful review that covers all the stuff I needed to mention but didn't get in the proposal.

> To summarize, although the bug was originally filed against assertThat, it
> turns out that it's actually an issue in the matchers themselves as well as
> assertThat. Right?

Yes. The initial thoughts were about adding more escaping to assertThat, but by then most of the formatting has been done. By moving the onus onto Matchers to escape more carefully, it's possible to preserve more information.

> If so, we have to be thinking about:
> * what are our failure modes with this code and existing broken matchers
> * how to make it easy to define matchers and mismatches correctly

These are exactly the two primary concerns here. We don't want to punish test case authors with spurious errors if they specify an assert incorrectly, but give a clear enough failure for them to fix the problem. We also don't want learning about Python string quirks to be a prerequisite for writing a Matcher.

> * Can you elaborate on your comment on istext()?

The result of the function is generally not enough information to actually use your string safely.

Two of the current callers in testtools are just trying to do special handling on a string, and otherwise assume the object is already a specific correct type. In both cases, the call is (harmlessly) subtly wrong: __import__ doesn't take non-ascii strings on Python 2, and Python 3 bytes can be used for regular expressions.

Code wanting to do actual string handling has it worse, as it will always risk getting a string that can't be safely interpolated, or written to a stream, or otherwise given as output.

In short, it's something of an attractive nuisance. In using it, I realised the text_repr interface wasn't very good.
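
To make the two complaints concrete, here is a small Python 3 sketch. The `istext` definition copies the Python 3 branch from the diff; the sample values are made up for illustration.

```python
import re


def istext(x):
    # Same check as the Python 3 branch of testtools.compat.istext.
    return isinstance(x, str)


# A bytes pattern fails the istext() check, yet the re module accepts it.
pattern = b"ab+c"
assert not istext(pattern)
matched = re.match(pattern, b"abbbc") is not None

# A str passes the check, but that says nothing about whether it is safe
# to interpolate: the escape character goes straight into the output.
escape_seq = "\x1b[2Jgone"
assert istext(escape_seq)
leaks_control_code = "\x1b" in "%s" % (escape_seq,)
```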

> * Is there a way we can make it so matchers never have to use _isbytes?
> Perhaps by adding an "is_multiline" function to compat.

Something like this would be much better, possibly moving the multiline logic into the repr function (so True means triple quote if there's a newline, or by adding a third state). This was me lowering my standards and keeping coding, in order to get something that would work. :)

> * text_repr is a bit hard to follow, maybe add some internal comments?

Yup, and some general cleanup would help, I left the first version that passed the tests alone so we could see if it actually worked as an interface before buffing it.

> * Some more documentation on the intent & API of text_repr would help,
> saying that it attempts to mimic repr() except returning unicode,
> giving some examples, explaining why you might want to use it and also what
> multiline does.

Will write this. As is probably clear from the tests, it aims to use the Python 3 logic for displaying literal unicode characters, and to maintain the `eval(repr(s)) == s` expectation.
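
For reference, the Python 3 behaviour being emulated can be checked directly with the built-in `repr`. This sketch only exercises the standard library, not `text_repr` itself, and the sample strings are illustrative.

```python
import ast

samples = [
    "caf\xe9",           # printable non-ASCII letter: shown literally
    "\xa0",              # no-break space: unprintable, so escaped
    "\x1b[0m",           # control code: escaped
    "it's \"quoted\"",   # both quote characters present
]

for s in samples:
    # The round trip expectation: evaluating the repr gives the string back.
    assert ast.literal_eval(repr(s)) == s

printable_kept = repr("caf\xe9") == "'caf\xe9'"
nbsp_escaped = repr("\xa0") == "'\\xa0'"
```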

> * I don't get why multiline needs three different ways of escaping lines
> whereas the non-multiline case only has two branches

So, there is a cube of relevant combinations:

               Multiline?
                 F   T
               +---+---+ Python 3
             / |   | R | bytes
           /   +---+---+
 Python 2 +---+---+| R | str
      str |   | S |+---+
          +---+---+   / 
  unicode | e | E | /   
          +---+---+

With non-multiline output Python 3 repr does what we want, as does str repr on Python 2. Everything else needs some extra handling.

[e] Uses _slow_escape on the whole string then quotes appropriately
[R] Uses repr on each line (with small complications due to differing bytes and str interfaces)
[S] Uses string-escape on each line
[E] Uses _slow_escape on each line

Ideally the [R] and [S] cases would be shared as per non-multiline equivalents. However, there is no codec in Python 3 that has the repr handling we want, so we have to call the function then post-process the string. It would be possible to use repr here on Python 2 as well, but it entails extra work over the string-escape codec and the complications with split("\n") aren't relevant.

Some rearrangements should make this less hairy.
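
As a rough illustration of the [R] case only, here is a cut-down sketch for Python 3 `str` input. The name `multiline_repr_sketch` is hypothetical; it follows the same steps as the multiline branch of `text_repr` in the diff, but drops the bytes handling and the Python 2 paths.

```python
import ast


def multiline_repr_sketch(text):
    """Triple-quoted repr of a Python 3 str, keeping newlines literal."""
    lines = []
    for line in text.split("\n"):
        r = repr(line)
        quote = r[-1]
        # Drop the surrounding quotes and undo repr's escaping of that quote.
        lines.append(r[1:-1].replace("\\" + quote, quote))
    # Append two of the closing quotes, then put a backslash in front of
    # every run of three single quotes so none of them ends the literal early.
    body = "\n".join(lines) + "''"
    pos = 0
    while True:
        pos = body.find("'''", pos)
        if pos == -1:
            break
        body = body[:pos] + "\\" + body[pos:]
        pos += 2
    # The backslash-newline after the opening quotes keeps the first line
    # of the text aligned with the rest in the printed output.
    return "'''\\\n" + body + "'"


for sample in ["a\nb", "'", "\\'''", "tab\there\n"]:
    assert ast.literal_eval(multiline_repr_sketch(sample)) == sample
```

The [S] and [E] cases differ only in how each line is escaped before the same joining and triple-quote pass.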

>  * I think having a repr()-like function that takes any object and
>    special-cases text would be good

I'm thinking about renaming `text_repr` to `rich_repr` or similar, and adding code that calls repr on non-stringy objects, then escapes unprintable characters per the existing code but without the quoting. That would then be generally useful for avoiding control sequences leaking out.

>  * Just to be clear, we are expecting Mismatch.describe() to return unicode,
>    right?

On Python 2, ascii-only str is also currently acceptable. Being fussier about the type would make it harder to write code that would appear to work correctly until it was used on a system with a different locale, and other similar problems. Another option is guarding describe in MismatchError more carefully to check for non-ascii bytestrings, which has the downside of doing an extra decode()/encode() all the time.
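
The guarding option could look something like the sketch below. `check_description` is a hypothetical helper, not something in the branch, and it is written for Python 3 where the byte string type is `bytes`.

```python
def check_description(description):
    """Return `description` as text, rejecting non-ASCII byte strings."""
    if isinstance(description, bytes):
        # Raises UnicodeDecodeError for bytes of ambiguous encoding; this is
        # the extra decode step that would run on every byte string.
        return description.decode("ascii")
    return description
```

A caller would see `check_description(b"ok")` come back as text, while a byte string such as `b"\xe9"` fails early instead of being mangled at display time.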

This is where having a robust type system would make life easier. Being able to define a string with a limited range of valid bytes would save the need for redundant checking at every level of the interface.

>  * What are we expecting Matcher.__str__ to return? Text or bytes?

The native str type, but must be ascii-only on Python 2 so upcasting to unicode is safe. This isn't a tough requirement, if the direction of movement is towards fixing bug 686807, as __repr__ tends to be written in terms that satisfy this.

> On the original points:
> 
>  * We should add something to the docs explaining best practice for
>    Mismatch.describe() methods. i.e., do "foo %s" % (text_repr(x),)

I think "use repr, see our fancy repr alternative if dealing with large or non-ascii output" might be the right general idea. Which makes me wonder, should pformat move into the fancy repr call, making _BinaryMismatch._format redundant?
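
A quick sketch of why plain repr is a reasonable default, using the old and new `DoesNotStartWith`-style format strings with a made-up value containing a terminal escape sequence:

```python
value = "\x1b[31m7"

# Old style: %s passes the escape character through to the output.
with_s = "'%s' does not start with '%s'." % (value, "8")

# Suggested style: %r escapes it and supplies the quotes itself.
with_r = "%r does not start with %r." % (value, "8")
```

The second form leaves `with_r` free of raw control characters, at the cost of showing escapes for anything repr considers unprintable.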

>  * These docs need to bubble up, to AnnotatedMismatch, to assertThat etc.
>    e.g. The 'message' parameter of assertThat needs to have its type
>    documented.

Yes, and a few extra checks around things like message would help robustness.

Jonathan Lange (jml) wrote on 2011-09-09:

Wow, thanks for the reply. Sorry it's taken so long to get back to you.

In general, your replies to my review seem to be either agreeing with me or informing me of something I didn't know. There are a few cleanup things that need to be done before this lands (identified above). Where do you want to go from here?

lp:~gz/testtools/unprintable-assertThat-804127 updated on 2011-09-13

241. By Martin Packman on 2011-09-13: Extra tests to ensure multiline quoting is correct on Python 3
242. By Martin Packman on 2011-09-13: Fix spelling error noted in review by jml
243. By Martin Packman on 2011-09-13: Add cautions about unprintable strings to Mismatch documentation
244. By Martin Packman on 2011-09-13: Add third state to multiline argument of text_repr and comment the internal logic
245. By Martin Packman on 2011-09-13: Test that not passing multiline to text_repr defaults based on input

Martin Packman (gz) wrote on 2011-09-14:

Okay, this branch should now be tidied up enough to land, can you see anything I've missed?

I tried out switching the repr-like function to take any object, and it does seem to be a better interface but complicates the logic even further, so should probably be done separately.

There's still a bunch of todo all over the place, and I haven't managed to squish _isbytes yet, though it's not strictly necessary in the matchers code bar some new tests I added that expect fancy behaviour.

There are a couple more interesting things that could do with handling later. Either Annotate or AnnotatedMismatch (or both, see the earlier complaint about string typing) should check the message they are given is printable unicode. Also the regular expression display form could do with some extra testing, I'm not totally convinced it always produces correct expressions (and certainly doesn't with some on Python 2.4 where the unicode-escape codec fails to escape backslashes).
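
For anyone wanting to poke at the regular expression display form, this Python 3 sketch isolates the transformation used in the `MatchesRegex` hunk of the diff. The function name `display_pattern` is hypothetical, and the samples are not a claim that every pattern survives the round trip.

```python
import re


def display_pattern(pattern):
    """Escape a regex for display the way the MatchesRegex hunk does."""
    if not isinstance(pattern, str):
        # Bytes patterns are mapped to text one byte per character.
        pattern = pattern.decode("latin1")
    # unicode_escape doubles every backslash, so undo that afterwards.
    escaped = pattern.encode("unicode_escape").decode("ascii")
    return escaped.replace("\\\\", "\\")


shown = display_pattern("caf\xe9\\w+")
# The displayed form is pure ASCII and, for this sample, still a pattern
# that matches the same text as the original.
still_matches = re.match(shown, "caf\xe9abc") is not None
```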

Jonathan Lange (jml) wrote on 2011-09-14:

Can't see anything you've missed. I'm going to merge this, and will file wishlist bugs for the things that you say you've left out.

Jonathan Lange (jml) on 2011-09-14:

review: Approve

Jonathan Lange (jml) wrote on 2011-09-14:

Filed bug 849846 and bug 849843.

Martin Packman (gz) wrote on 2011-09-14:

Great, thanks jml, I'll subscribe to those bugs.

Preview Diff

 === modified file 'doc/for-test-authors.rst'
 --- doc/for-test-authors.rst	2011-08-15 16:14:42 +0000
 +++ doc/for-test-authors.rst	2011-09-13 23:54:14 +0000
@@ -717,7 +717,7 @@
            self.remainder = remainder
        def describe(self):
--          return "%s is not divisible by %s, %s remains" % (
++          return "%r is not divisible by %r, %r remains" % (
                self.number, self.divider, self.remainder)
        def get_details(self):
@@ -738,11 +738,19 @@
        remainder = actual % self.divider
        if remainder != 0:
            return Mismatch(
--              "%s is not divisible by %s, %s remains" % (
++              "%r is not divisible by %r, %r remains" % (
                    actual, self.divider, remainder))
        else:
            return None
++When writing a ``describe`` method or constructing a ``Mismatch`` object the
++code should ensure it only emits printable unicode.  As this output must be
++combined with other text and forwarded for presentation, letting through
++non-ascii bytes of ambiguous encoding or control characters could throw an
++exception or mangle the display.  In most cases simply avoiding the ``%s``
++format specifier and using ``%r`` instead will be enough.  For examples of
++more complex formatting see the ``testtools.matchers`` implementatons.
++
  Details
  =======
 === modified file 'testtools/compat.py'
 --- testtools/compat.py	2011-09-13 23:54:14 +0000
 +++ testtools/compat.py	2011-09-13 23:54:14 +0000
@@ -25,6 +25,7 @@
  import re
  import sys
  import traceback
++import unicodedata
  from testtools.helpers import try_imports
@@ -52,6 +53,7 @@
  """
  if sys.version_info > (3, 0):
++    import builtins
      def _u(s):
          return s
      _r = ascii
@@ -59,12 +61,14 @@
          """A byte literal."""
          return s.encode("latin-1")
      advance_iterator = next
++    # GZ 2011-08-24: Seems istext() is easy to misuse and makes for bad code.
      def istext(x):
          return isinstance(x, str)
      def classtypes():
          return (type,)
      str_is_unicode = True
  else:
++    import __builtin__ as builtins
      def _u(s):
          # The double replace mangling going on prepares the string for
          # unicode-escape - \foo is preserved, \u and \U are decoded.
@@ -112,6 +116,95 @@
          return isinstance(exception, (KeyboardInterrupt, SystemExit))
++# GZ 2011-08-24: Using isinstance checks like this encourages bad interfaces,
++#                there should be better ways to write code needing this.
++if not issubclass(getattr(builtins, "bytes", str), str):
++    def _isbytes(x):
++        return isinstance(x, bytes)
++else:
++    # Never return True on Pythons that provide the name but not the real type
++    def _isbytes(x):
++        return False
++
++
++def _slow_escape(text):
++    """Escape unicode `text` leaving printable characters unmodified
++
++    The behaviour emulates the Python 3 implementation of repr, see
++    unicode_repr in unicodeobject.c and isprintable definition.
++
++    Because this iterates over the input a codepoint at a time, it's slow, and
++    does not handle astral characters correctly on Python builds with 16 bit
++    rather than 32 bit unicode type.
++    """
++    output = []
++    for c in text:
++        o = ord(c)
++        if o < 256:
++            if o < 32 or 126 < o < 161:
++                output.append(c.encode("unicode-escape"))
++            elif o == 92:
++                # Separate due to bug in unicode-escape codec in Python 2.4
++                output.append("\\\\")
++            else:
++                output.append(c)
++        else:
++            # To get correct behaviour would need to pair up surrogates here
++            if unicodedata.category(c)[0] in "CZ":
++                output.append(c.encode("unicode-escape"))
++            else:
++                output.append(c)
++    return "".join(output)
++
++
++def text_repr(text, multiline=None):
++    """Rich repr for `text` returning unicode, triple quoted if `multiline`"""
++    is_py3k = sys.version_info > (3, 0)
++    nl = _isbytes(text) and bytes((0xA,)) or "\n"
++    if multiline is None:
++        multiline = nl in text
++    if not multiline and (is_py3k or not str_is_unicode and type(text) is str):
++        # Use normal repr for single line of unicode on Python 3 or bytes
++        return repr(text)
++    prefix = repr(text[:0])[:-2]
++    if multiline:
++        # To escape multiline strings, split and process each line in turn,
++        # making sure that quotes are not escaped.
++        if is_py3k:
++            offset = len(prefix) + 1
++            lines = []
++            for l in text.split(nl):
++                r = repr(l)
++                q = r[-1]
++                lines.append(r[offset:-1].replace("\\" + q, q))
++        elif not str_is_unicode and isinstance(text, str):
++            lines = [l.encode("string-escape").replace("\\'", "'")
++                for l in text.split("\n")]
++        else:
++            lines = [_slow_escape(l) for l in text.split("\n")]
++        # Combine the escaped lines and append two of the closing quotes,
++        # then iterate over the result to escape triple quotes correctly.
++        _semi_done = "\n".join(lines) + "''"
++        p = 0
++        while True:
++            p = _semi_done.find("'''", p)
++            if p == -1:
++                break
++            _semi_done = "\\".join([_semi_done[:p], _semi_done[p:]])
++            p += 2
++        return "".join([prefix, "'''\\\n", _semi_done, "'"])
++    escaped_text = _slow_escape(text)
++    # Determine which quote character to use and if one gets prefixed with a
++    # backslash following the same logic Python uses for repr() on strings
++    quote = "'"
++    if "'" in text:
++        if '"' in text:
++            escaped_text = escaped_text.replace("'", "\\'")
++        else:
++            quote = '"'
++    return "".join([prefix, quote, escaped_text, quote])
++
++
  def unicode_output_stream(stream):
      """Get wrapper for given stream that writes any unicode without exception
@@ -146,11 +239,6 @@
      return writer(stream, "replace")
--try:
--    to_text = unicode
--except NameError:
--    to_text = str
--
  # The default source encoding is actually "iso-8859-1" until Python 2.5 but
  # using non-ascii causes a deprecation warning in 2.4 and it's cleaner to
  # treat all versions the same way
 === modified file 'testtools/matchers.py'
 --- testtools/matchers.py	2011-09-13 23:54:14 +0000
 +++ testtools/matchers.py	2011-09-13 23:54:14 +0000
@@ -49,7 +49,10 @@
      classtypes,
      _error_repr,
      isbaseexception,
++    _isbytes,
      istext,
++    str_is_unicode,
++    text_repr
+     )
@@ -102,6 +105,8 @@
          """Describe the mismatch.
          This should be either a human-readable string or castable to a string.
++        In particular, is should either be plain ascii or unicode on Python 2,
++        and care should be taken to escape control characters.
          """
          try:
              return self._description
@@ -151,12 +156,25 @@
      def __str__(self):
          difference = self.mismatch.describe()
          if self.verbose:
++            # GZ 2011-08-24: Smelly API? Better to take any object and special
++            #                case text inside?
++            if istext(self.matchee) or _isbytes(self.matchee):
++                matchee = text_repr(self.matchee, multiline=False)
++            else:
++                matchee = repr(self.matchee)
              return (
--                'Match failed. Matchee: "%s"\nMatcher: %s\nDifference: %s\n'
--                % (self.matchee, self.matcher, difference))
++                'Match failed. Matchee: %s\nMatcher: %s\nDifference: %s\n'
++                % (matchee, self.matcher, difference))
          else:
              return difference
++    if not str_is_unicode:
++
++        __unicode__ = __str__
++
++        def __str__(self):
++            return self.__unicode__().encode("ascii", "backslashreplace")
++
  class MismatchDecorator(object):
      """Decorate a ``Mismatch``.
@@ -268,7 +286,12 @@
          self.with_nl = with_nl
      def describe(self):
--        return self.matcher._describe_difference(self.with_nl)
++        s = self.matcher._describe_difference(self.with_nl)
++        if str_is_unicode or isinstance(s, unicode):
++            return s
++        # GZ 2011-08-24: This is actually pretty bogus, most C0 codes should
++        #                be escaped, in addition to non-ascii bytes.
++        return s.decode("latin1").encode("ascii", "backslashreplace")
  class DoesNotContain(Mismatch):
@@ -298,8 +321,8 @@
          self.expected = expected
      def describe(self):
--        return "'%s' does not start with '%s'." % (
--            self.matchee, self.expected)
++        return "%s does not start with %s." % (
++            text_repr(self.matchee), text_repr(self.expected))
  class DoesNotEndWith(Mismatch):
@@ -314,8 +337,8 @@
          self.expected = expected
      def describe(self):
--        return "'%s' does not end with '%s'." % (
--            self.matchee, self.expected)
++        return "%s does not end with %s." % (
++            text_repr(self.matchee), text_repr(self.expected))
  class _BinaryComparison(object):
@@ -347,8 +370,8 @@
      def _format(self, thing):
          # Blocks of text with newlines are formatted as triple-quote
          # strings. Everything else is pretty-printed.
--        if istext(thing) and '\n' in thing:
--            return '"""\\\n%s"""' % (thing,)
++        if istext(thing) or _isbytes(thing):
++            return text_repr(thing)
          return pformat(thing)
      def describe(self):
@@ -359,7 +382,7 @@
                  self._mismatch_string, self._format(self.expected),
                  self._format(self.other))
          else:
--            return "%s %s %s" % (left, self._mismatch_string,right)
++            return "%s %s %s" % (left, self._mismatch_string, right)
  class Equals(_BinaryComparison):
@@ -599,7 +622,7 @@
          self.expected = expected
      def __str__(self):
--        return "Starts with '%s'." % self.expected
++        return "StartsWith(%r)" % (self.expected,)
      def match(self, matchee):
          if not matchee.startswith(self.expected):
@@ -618,7 +641,7 @@
          self.expected = expected
      def __str__(self):
--        return "Ends with '%s'." % self.expected
++        return "EndsWith(%r)" % (self.expected,)
      def match(self, matchee):
          if not matchee.endswith(self.expected):
@@ -875,8 +898,12 @@
      def match(self, value):
          if not re.match(self.pattern, value, self.flags):
++            pattern = self.pattern
++            if not isinstance(pattern, str_is_unicode and str or unicode):
++                pattern = pattern.decode("latin1")
++            pattern = pattern.encode("unicode_escape").decode("ascii")
              return Mismatch("%r does not match /%s/" % (
--                    value, self.pattern))
++                    value, pattern.replace("\\\\", "\\")))
  class MatchesSetwise(object):
 === modified file 'testtools/tests/test_compat.py'
 --- testtools/tests/test_compat.py	2011-07-04 18:03:28 +0000
 +++ testtools/tests/test_compat.py	2011-09-13 23:54:14 +0000
@@ -16,6 +16,7 @@
      _get_source_encoding,
      _u,
      str_is_unicode,
++    text_repr,
      unicode_output_stream,
+     )
  from testtools.matchers import (
@@ -262,6 +263,132 @@
          self.assertEqual("pa???n", sout.getvalue())
++class TestTextRepr(testtools.TestCase):
++    """Ensure in extending repr, basic behaviours are not being broken"""
++
++    ascii_examples = (
++        # Single character examples
++        #  C0 control codes should be escaped except multiline \n
++        ("\x00", "'\\x00'", "'''\\\n\\x00'''"),
++        ("\b", "'\\x08'", "'''\\\n\\x08'''"),
++        ("\t", "'\\t'", "'''\\\n\\t'''"),
++        ("\n", "'\\n'", "'''\\\n\n'''"),
++        ("\r", "'\\r'", "'''\\\n\\r'''"),
++        #  Quotes and backslash should match normal repr behaviour
++        ('"', "'\"'", "'''\\\n\"'''"),
++        ("'", "\"'\"", "'''\\\n\\''''"),
++        ("\\", "'\\\\'", "'''\\\n\\\\'''"),
++        #  DEL is also unprintable and should be escaped
++        ("\x7F", "'\\x7f'", "'''\\\n\\x7f'''"),
++
++        # Character combinations that need double checking
++        ("\r\n", "'\\r\\n'", "'''\\\n\\r\n'''"),
++        ("\"'", "'\"\\''", "'''\\\n\"\\''''"),
++        ("'\"", "'\\'\"'", "'''\\\n'\"'''"),
++        ("\\n", "'\\\\n'", "'''\\\n\\\\n'''"),
++        ("\\\n", "'\\\\\\n'", "'''\\\n\\\\\n'''"),
++        ("\\' ", "\"\\\\' \"", "'''\\\n\\\\' '''"),
++        ("\\'\n", "\"\\\\'\\n\"", "'''\\\n\\\\'\n'''"),
++        ("\\'\"", "'\\\\\\'\"'", "'''\\\n\\\\'\"'''"),
++        ("\\'''", "\"\\\\'''\"", "'''\\\n\\\\\\'\\'\\''''"),
++        )
++
++    # Bytes with the high bit set should always be escaped
++    bytes_examples = (
++        (_b("\x80"), "'\\x80'", "'''\\\n\\x80'''"),
++        (_b("\xA0"), "'\\xa0'", "'''\\\n\\xa0'''"),
++        (_b("\xC0"), "'\\xc0'", "'''\\\n\\xc0'''"),
++        (_b("\xFF"), "'\\xff'", "'''\\\n\\xff'''"),
++        (_b("\xC2\xA7"), "'\\xc2\\xa7'", "'''\\\n\\xc2\\xa7'''"),
++        )
++
++    # Unicode doesn't escape printable characters as per the Python 3 model
++    unicode_examples = (
++        # C1 codes are unprintable
++        (_u("\x80"), "'\\x80'", "'''\\\n\\x80'''"),
++        (_u("\x9F"), "'\\x9f'", "'''\\\n\\x9f'''"),
++        # No-break space is unprintable
++        (_u("\xA0"), "'\\xa0'", "'''\\\n\\xa0'''"),
++        # Letters latin alphabets are printable
++        (_u("\xA1"), _u("'\xa1'"), _u("'''\\\n\xa1'''")),
++        (_u("\xFF"), _u("'\xff'"), _u("'''\\\n\xff'''")),
++        (_u("\u0100"), _u("'\u0100'"), _u("'''\\\n\u0100'''")),
++        # Line and paragraph seperators are unprintable
++        (_u("\u2028"), "'\\u2028'", "'''\\\n\\u2028'''"),
++        (_u("\u2029"), "'\\u2029'", "'''\\\n\\u2029'''"),
++        # Unpaired surrogates are unprintable
++        (_u("\uD800"), "'\\ud800'", "'''\\\n\\ud800'''"),
++        (_u("\uDFFF"), "'\\udfff'", "'''\\\n\\udfff'''"),
++        # Unprintable general categories not fully tested: Cc, Cf, Co, Cn, Zs
++        )
++
++    b_prefix = repr(_b(""))[:-2]
++    u_prefix = repr(_u(""))[:-2]
++
++    def test_ascii_examples_oneline_bytes(self):
++        for s, expected, _ in self.ascii_examples:
++            b = _b(s)
++            actual = text_repr(b, multiline=False)
++            # Add self.assertIsInstance check?
++            self.assertEqual(actual, self.b_prefix + expected)
++            self.assertEqual(eval(actual), b)
++
++    def test_ascii_examples_oneline_unicode(self):
++        for s, expected, _ in self.ascii_examples:
++            u = _u(s)
++            actual = text_repr(u, multiline=False)
++            self.assertEqual(actual, self.u_prefix + expected)
++            self.assertEqual(eval(actual), u)
++
++    def test_ascii_examples_multiline_bytes(self):
++        for s, _, expected in self.ascii_examples:
++            b = _b(s)
++            actual = text_repr(b, multiline=True)
++            self.assertEqual(actual, self.b_prefix + expected)
++            self.assertEqual(eval(actual), b)
++
++    def test_ascii_examples_multiline_unicode(self):
++        for s, _, expected in self.ascii_examples:
++            u = _u(s)
++            actual = text_repr(u, multiline=True)
++            self.assertEqual(actual, self.u_prefix + expected)
++            self.assertEqual(eval(actual), u)
++
++    def test_ascii_examples_defaultline_bytes(self):
++        for s, one, multi in self.ascii_examples:
++            expected = "\n" in s and multi or one
++            self.assertEqual(text_repr(_b(s)), self.b_prefix + expected)
++
++    def test_ascii_examples_defaultline_unicode(self):
++        for s, one, multi in self.ascii_examples:
++            expected = "\n" in s and multi or one
++            self.assertEqual(text_repr(_u(s)), self.u_prefix + expected)
++
++    def test_bytes_examples_oneline(self):
++        for b, expected, _ in self.bytes_examples:
++            actual = text_repr(b, multiline=False)
++            self.assertEqual(actual, self.b_prefix + expected)
++            self.assertEqual(eval(actual), b)
++
++    def test_bytes_examples_multiline(self):
++        for b, _, expected in self.bytes_examples:
++            actual = text_repr(b, multiline=True)
++            self.assertEqual(actual, self.b_prefix + expected)
++            self.assertEqual(eval(actual), b)
++
++    def test_unicode_examples_oneline(self):
++        for u, expected, _ in self.unicode_examples:
++            actual = text_repr(u, multiline=False)
++            self.assertEqual(actual, self.u_prefix + expected)
++            self.assertEqual(eval(actual), u)
++
++    def test_unicode_examples_multiline(self):
++        for u, _, expected in self.unicode_examples:
++            actual = text_repr(u, multiline=True)
++            self.assertEqual(actual, self.u_prefix + expected)
++            self.assertEqual(eval(actual), u)
++
++
  def test_suite():
      from unittest import TestLoader
      return TestLoader().loadTestsFromName(__name__)
 === modified file 'testtools/tests/test_matchers.py'
 --- testtools/tests/test_matchers.py	2011-09-13 23:54:14 +0000
 +++ testtools/tests/test_matchers.py	2011-09-13 23:54:14 +0000
@@ -12,7 +12,9 @@
+     )
  from testtools.compat import (
      StringIO,
--    to_text,
++    str_is_unicode,
++    text_repr,
++    _b,
      _u,
+     )
  from testtools.matchers import (
@@ -20,7 +22,11 @@
      AllMatch,
      Annotate,
      AnnotatedMismatch,
++<<<<<<< TREE
      Contains,
++=======
++    _BinaryMismatch,
++>>>>>>> MERGE-SOURCE
      Equals,
      DocTestMatches,
      DoesNotEndWith,
@@ -96,7 +102,7 @@
          mismatch = matcher.match(2)
          e = MismatchError(matchee, matcher, mismatch, True)
          expected = (
--            'Match failed. Matchee: "%s"\n'
++            'Match failed. Matchee: %r\n'
              'Matcher: %s\n'
              'Difference: %s\n' % (
                  matchee,
@@ -112,17 +118,80 @@
          matcher = Equals(_u('a'))
          mismatch = matcher.match(matchee)
          expected = (
--            'Match failed. Matchee: "%s"\n'
++            'Match failed. Matchee: %s\n'
              'Matcher: %s\n'
              'Difference: %s\n' % (
--                matchee,
++                text_repr(matchee),
                  matcher,
                  mismatch.describe(),
                  ))
          e = MismatchError(matchee, matcher, mismatch, True)
--        # XXX: Using to_text rather than str because, on Python 2, str will
--        # raise UnicodeEncodeError.
--        self.assertEqual(expected, to_text(e))
++        if str_is_unicode:
++            actual = str(e)
++        else:
++            actual = unicode(e)
++            # Using str() should still work, and return ascii only
++            self.assertEqual(
++                expected.replace(matchee, matchee.encode("unicode-escape")),
++                str(e).decode("ascii"))
++        self.assertEqual(expected, actual)
++
++
++class Test_BinaryMismatch(TestCase):
++    """Mismatches from binary comparisons need useful describe output"""
++
++    _long_string = "This is a longish multiline non-ascii string\n\xa7"
++    _long_b = _b(_long_string)
++    _long_u = _u(_long_string)
++
++    def test_short_objects(self):
++        o1, o2 = object(), object()
++        mismatch = _BinaryMismatch(o1, "!~", o2)
++        self.assertEqual(mismatch.describe(), "%r !~ %r" % (o1, o2))
++
++    def test_short_mixed_strings(self):
++        b, u = _b("\xa7"), _u("\xa7")
++        mismatch = _BinaryMismatch(b, "!~", u)
++        self.assertEqual(mismatch.describe(), "%r !~ %r" % (b, u))
++
++    def test_long_bytes(self):
++        one_line_b = self._long_b.replace(_b("\n"), _b(" "))
++        mismatch = _BinaryMismatch(one_line_b, "!~", self._long_b)
++        self.assertEqual(mismatch.describe(),
++            "%s:\nreference = %s\nactual = %s\n" % ("!~",
++                text_repr(one_line_b),
++                text_repr(self._long_b, multiline=True)))
++
++    def test_long_unicode(self):
++        one_line_u = self._long_u.replace("\n", " ")
++        mismatch = _BinaryMismatch(one_line_u, "!~", self._long_u)
++        self.assertEqual(mismatch.describe(),
++            "%s:\nreference = %s\nactual = %s\n" % ("!~",
++                text_repr(one_line_u),
++                text_repr(self._long_u, multiline=True)))
++
++    def test_long_mixed_strings(self):
++        mismatch = _BinaryMismatch(self._long_b, "!~", self._long_u)
++        self.assertEqual(mismatch.describe(),
++            "%s:\nreference = %s\nactual = %s\n" % ("!~",
++                text_repr(self._long_b, multiline=True),
++                text_repr(self._long_u, multiline=True)))
++
++    def test_long_bytes_and_object(self):
++        obj = object()
++        mismatch = _BinaryMismatch(self._long_b, "!~", obj)
++        self.assertEqual(mismatch.describe(),
++            "%s:\nreference = %s\nactual = %s\n" % ("!~",
++                text_repr(self._long_b, multiline=True),
++                repr(obj)))
++
++    def test_long_unicode_and_object(self):
++        obj = object()
++        mismatch = _BinaryMismatch(self._long_u, "!~", obj)
++        self.assertEqual(mismatch.describe(),
++            "%s:\nreference = %s\nactual = %s\n" % ("!~",
++                text_repr(self._long_u, multiline=True),
++                repr(obj)))
  class TestMatchersInterface(object):
@@ -208,6 +277,23 @@
          self.assertEqual("bar\n", matcher.want)
          self.assertEqual(doctest.ELLIPSIS, matcher.flags)
++    def test_describe_non_ascii_bytes(self):
++        """Even with bytestrings, the mismatch should be coercible to unicode
++
++        DocTestMatches is intended for text, but the Python 2 str type also
++        permits arbitrary binary inputs. This is a slightly bogus thing to do,
++        and under Python 3 using bytes objects will reasonably raise an error.
++        """
++        header = _b("\x89PNG\r\n\x1a\n...")
++        if str_is_unicode:
++            self.assertRaises(TypeError,
++                DocTestMatches, header, doctest.ELLIPSIS)
++            return
++        matcher = DocTestMatches(header, doctest.ELLIPSIS)
++        mismatch = matcher.match(_b("GIF89a\1\0\1\0\0\0\0;"))
++        # Must be treatable as unicode text; the exact output matters less
++        self.assertTrue(unicode(mismatch.describe()))
++
  class TestEqualsInterface(TestCase, TestMatchersInterface):
@@ -610,6 +696,21 @@
          mismatch = DoesNotStartWith("fo", "bo")
          self.assertEqual("'fo' does not start with 'bo'.", mismatch.describe())
++    def test_describe_non_ascii_unicode(self):
++        string = _u("A\xA7")
++        suffix = _u("B\xA7")
++        mismatch = DoesNotStartWith(string, suffix)
++        self.assertEqual("%s does not start with %s." % (
++            text_repr(string), text_repr(suffix)),
++            mismatch.describe())
++
++    def test_describe_non_ascii_bytes(self):
++        string = _b("A\xA7")
++        suffix = _b("B\xA7")
++        mismatch = DoesNotStartWith(string, suffix)
++        self.assertEqual("%r does not start with %r." % (string, suffix),
++            mismatch.describe())
++
  class StartsWithTests(TestCase):
@@ -617,7 +718,17 @@
      def test_str(self):
          matcher = StartsWith("bar")
--        self.assertEqual("Starts with 'bar'.", str(matcher))
++        self.assertEqual("StartsWith('bar')", str(matcher))
++
++    def test_str_with_bytes(self):
++        b = _b("\xA7")
++        matcher = StartsWith(b)
++        self.assertEqual("StartsWith(%r)" % (b,), str(matcher))
++
++    def test_str_with_unicode(self):
++        u = _u("\xA7")
++        matcher = StartsWith(u)
++        self.assertEqual("StartsWith(%r)" % (u,), str(matcher))
      def test_match(self):
          matcher = StartsWith("bar")
@@ -646,6 +757,21 @@
          mismatch = DoesNotEndWith("fo", "bo")
          self.assertEqual("'fo' does not end with 'bo'.", mismatch.describe())
++    def test_describe_non_ascii_unicode(self):
++        string = _u("A\xA7")
++        suffix = _u("B\xA7")
++        mismatch = DoesNotEndWith(string, suffix)
++        self.assertEqual("%s does not end with %s." % (
++            text_repr(string), text_repr(suffix)),
++            mismatch.describe())
++
++    def test_describe_non_ascii_bytes(self):
++        string = _b("A\xA7")
++        suffix = _b("B\xA7")
++        mismatch = DoesNotEndWith(string, suffix)
++        self.assertEqual("%r does not end with %r." % (string, suffix),
++            mismatch.describe())
++
  class EndsWithTests(TestCase):
@@ -653,7 +779,17 @@
      def test_str(self):
          matcher = EndsWith("bar")
--        self.assertEqual("Ends with 'bar'.", str(matcher))
++        self.assertEqual("EndsWith('bar')", str(matcher))
++
++    def test_str_with_bytes(self):
++        b = _b("\xA7")
++        matcher = EndsWith(b)
++        self.assertEqual("EndsWith(%r)" % (b,), str(matcher))
++
++    def test_str_with_unicode(self):
++        u = _u("\xA7")
++        matcher = EndsWith(u)
++        self.assertEqual("EndsWith(%r)" % (u,), str(matcher))
      def test_match(self):
          matcher = EndsWith("arf")
@@ -770,11 +906,17 @@
          ("MatchesRegex('a|b')", MatchesRegex('a|b')),
          ("MatchesRegex('a|b', re.M)", MatchesRegex('a|b', re.M)),
          ("MatchesRegex('a|b', re.I|re.M)", MatchesRegex('a|b', re.I|re.M)),
++        ("MatchesRegex(%r)" % (_b("\xA7"),), MatchesRegex(_b("\xA7"))),
++        ("MatchesRegex(%r)" % (_u("\xA7"),), MatchesRegex(_u("\xA7"))),
+         ]
      describe_examples = [
          ("'c' does not match /a|b/", 'c', MatchesRegex('a|b')),
          ("'c' does not match /a\d/", 'c', MatchesRegex(r'a\d')),
++        ("%r does not match /\\s+\\xa7/" % (_b('c'),),
++            _b('c'), MatchesRegex(_b("\\s+\xA7"))),
++        ("%r does not match /\\s+\\xa7/" % (_u('c'),),
++            _u('c'), MatchesRegex(_u("\\s+\xA7"))),
+         ]
 === modified file 'testtools/tests/test_testcase.py'
 --- testtools/tests/test_testcase.py	2011-09-13 23:54:14 +0000
 +++ testtools/tests/test_testcase.py	2011-09-13 23:54:14 +0000
@@ -488,7 +488,7 @@
          matchee = 'foo'
          matcher = Equals('bar')
          expected = (
--            'Match failed. Matchee: "%s"\n'
++            'Match failed. Matchee: %r\n'
              'Matcher: %s\n'
              'Difference: %s\n' % (
                  matchee,
@@ -528,10 +528,10 @@
          matchee = _u('\xa7')
          matcher = Equals(_u('a'))
          expected = (
--            'Match failed. Matchee: "%s"\n'
++            'Match failed. Matchee: %s\n'
              'Matcher: %s\n'
              'Difference: %s\n\n' % (
--                matchee,
++                repr(matchee).replace("\\xa7", matchee),
                  matcher,
                  matcher.match(matchee).describe(),
                  ))
@@ -565,6 +565,21 @@
          self.assertFails(expected_error, self.assertEquals, a, b)
          self.assertFails(expected_error, self.failUnlessEqual, a, b)
++    def test_assertEqual_non_ascii_str_with_newlines(self):
++        message = _u("Be careful mixing unicode and bytes")
++        a = "a\n\xa7\n"
++        b = "Just a longish string so the more verbose output form is used."
++        expected_error = '\n'.join([
++            '!=:',
++            "reference = '''\\",
++            'a',
++            repr('\xa7')[1:-1],
++            "'''",
++            'actual = %r' % (b,),
++            ': ' + message,
++            ])
++        self.assertFails(expected_error, self.assertEqual, a, b, message)
++
      def test_assertIsNone(self):
          self.assertIsNone(None)
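
A note on the conventions the text_repr tests above rely on, as a minimal sketch rather than the branch's implementation. It assumes Python 3 only and does not call text_repr at all: it takes rows in the same shape as ascii_examples (input, one-line form, multiline form) and checks that each expected form, once the type prefix is added, evaluates back to the original value. The helper name round_trips is hypothetical and exists only for this illustration.

```python
# Prefix trick from the tests: drop the two quote characters from the repr
# of an empty string, leaving only the literal's type prefix.
b_prefix = repr(b"")[:-2]   # "b" on Python 3
u_prefix = repr("")[:-2]    # "" on Python 3


def round_trips(row):
    """Check one (input, one-line, multiline) row in both bytes and text form.

    Hypothetical helper: it evaluates the *expected* literals from the table,
    it does not exercise testtools.compat.text_repr itself.
    """
    s, oneline, multiline = row
    for prefix, value in ((b_prefix, s.encode("ascii")), (u_prefix, s)):
        for form in (oneline, multiline):
            # The multiline form opens with '''\ followed by a newline; the
            # backslash-newline is a line continuation and adds no character.
            if eval(prefix + form) != value:
                return False
    return True


# Two rows copied from ascii_examples in the diff above.
rows = (
    ("\r\n", "'\\r\\n'", "'''\\\n\\r\n'''"),
    ("\\'''", "\"\\\\'''\"", "'''\\\n\\\\\\'\\'\\''''"),
)
print(all(round_trips(row) for row in rows))
```

The second row shows why the multiline form escapes every quote in a run of three: an unescaped ''' inside the body would otherwise close the literal early.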

testtools

Merge lp:~gz/testtools/unprintable-assertThat-804127 into lp:~testtools-committers/testtools/trunk
