pyjunitxml

Merge lp:~gz/pyjunitxml/ensure_well_formed_xml_625589 into lp:pyjunitxml

ensure_well_formed_xml_625589
Merge into trunk

Proposed by Martin Packman on 2010-09-03

Status:	Merged
Merged at revision:	21
Proposed branch:	lp:~gz/pyjunitxml/ensure_well_formed_xml_625589
Merge into:	lp:pyjunitxml
Diff against target:	275 lines (+181/-9) (has conflicts)
	2 files modified
	junitxml/__init__.py (+57/-9)
	junitxml/tests/test_junitxml.py (+124/-0)
	Text conflict in junitxml/__init__.py
	Text conflict in junitxml/tests/test_junitxml.py
To merge this branch:	bzr merge lp:~gz/pyjunitxml/ensure_well_formed_xml_625589

Reviewer	Date Requested	Status
Robert Collins	2010-09-03	Needs Fixing on 2010-09-11
Vincent Ladeuil		Approve on 2010-09-05
Review via email: mp+34573@code.launchpad.net

Description of the change

Fixes the escaping function so that it is robust against arbitrary input, and so that it outputs UTF-8 bytestrings on Python 2.

There are a couple of judgement calls in how this escaping is written. Firstly, invalid cdata is stripped rather than included in the output in some escaped form. Short of clever tricks that would require changes to the JUnit XML format, there is always the potential for confusing output like "AssertionError: '' != ''", but that's not much worse than "AssertionError: '\x00' != '\x00'" or similar. Secondly, \r is stripped even though it may appear in XML content; this is largely because of how parsers handle it anyway (line endings are normalised to \n on input).
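For readers who want to try the approach outside the diff, here is a minimal Python 3 sketch of the strip-then-escape idea described above. It is not the code from this branch: the names `strip_invalid_chars`, `escape_content` and `escape_attr` are illustrative, and the Python 2 bytestring handling is left out. Only the character class is taken from the `_non_cdata` pattern in the diff.

```python
import re
import xml.dom.minidom

# Same character class as _non_cdata in the diff: C0 controls other than
# tab and newline (so \r is included), surrogates, U+FFFE and U+FFFF.
_NON_CDATA = re.compile(r"[\x00-\x08\x0b-\x1f\ud800-\udfff\ufffe\uffff]+")


def strip_invalid_chars(text):
    """Drop characters that must not appear in XML 1.0 character data."""
    return _NON_CDATA.sub("", text)


def escape_content(text):
    """Escape text for use as element content."""
    return (strip_invalid_chars(text)
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace("]]>", "]]&gt;"))


def escape_attr(text):
    """Escape text for use inside a double-quoted attribute value."""
    return (escape_content(text)
            .replace('"', "&quot;")
            .replace("\t", "&#x9;")
            .replace("\n", "&#xA;"))


# Hypothetical failure message containing characters a parser would reject.
message = "\x00'a & b' < \x1f'c'\r\n"
element = '<failure type="%s">%s</failure>' % (
    escape_attr('Assertion"Error'), escape_content(message))

# The escaped output parses, and the parser hands back the stripped text.
doc = xml.dom.minidom.parseString(element)
parsed_text = doc.documentElement.firstChild.nodeValue
```

Tab and newline are written as character references in attributes because a parser would otherwise normalise them to spaces; in element content they can be left as they are.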

This branch builds on lp:~gz/pyjunitxml/unexpected_expectations_python_2.7 and lp:~gz/pyjunitxml/source_compat_python_3, but Launchpad doesn't like multiple prerequisite branches.

Vincent Ladeuil (vila) wrote on 2010-09-05:

Woohoo ! Makes babune happy :-)

First time we get an actual Test Result, compare:
http://babune.ladeuil.net:24842/job/selftest-windows/153/
with
http://babune.ladeuil.net:24842/job/selftest-windows/154/

I will be running with this branch for all other platforms to check
against regressions.

review: Approve

Martin Packman (gz) wrote on 2010-09-05:

> Woohoo ! Makes babune happy :-)

Ah, great, I was going to ask you to give this a go tomorrow.

> First time we get an actual Test Result, compare:
> http://babune.ladeuil.net:24842/job/selftest-windows/153/
> with
> http://babune.ladeuil.net:24842/job/selftest-windows/154/
>
> I will be running with this branch for all other platforms to check
> against regressions.

Went to check if maverick was now working as well, and apparently it was already made happy by fixing the elementtree thing. The testing is better for this when there are lots of different things going wrong with the test suite. :)

Vincent Ladeuil (vila) wrote on 2010-09-10:

@Robert: ping, this has been working well on babune for the last few days. Is anything else needed for it to land?

Robert Collins (lifeless) wrote on 2010-09-11:

Fails selftest:

:!python -m testtools.run junitxml.test_suite
Tests running...
======================================================================
ERROR: junitxml.tests.test_junitxml.TestWellFormedXml.test_error_with_surrogates
----------------------------------------------------------------------
Traceback (most recent call last):
  File "junitxml/tests/test_junitxml.py", line 313, in test_error_with_surrogates
    self.assertTrue(unichr(0x201A2) in traceback)
UnboundLocalError: local variable 'unichr' referenced before assignment
Ran 18 tests in 0.007s

Also has conflicts in trunk, but they are shallow.

review: Needs Fixing

Robert Collins (lifeless) wrote on 2010-09-11:

And needs NEWS

lp:~gz/pyjunitxml/ensure_well_formed_xml_625589 updated on 2010-09-11

21. By Martin Packman on 2010-09-11: Don't create a local unichr for test in Python 3 instead use expected string
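
The test code from before this revision is not shown on this page, so the following is only a guess at the kind of pattern that produces the UnboundLocalError reported above. Both function names and the `on_python_3` flag are hypothetical. The sketch shows that assigning to a name anywhere in a function makes it local for the whole body, so a conditional `unichr = chr` leaves the name unbound whenever the branch is not taken.

```python
def char_from_code_point_broken(code_point, on_python_3):
    """Hypothetical reconstruction of the failure mode, not the original test."""
    if on_python_3:
        # This assignment makes `unichr` a local name for the whole function.
        unichr = chr
    # When the branch above is skipped the local is unbound, and Python does
    # not fall back to any global or builtin called `unichr`.
    return unichr(code_point)


# The approach named in the revision message: spell out the expected string
# directly instead of computing it through a rebound helper.
ASTRAL_CHAR = "\U000201a2"


def describe_failure():
    """Return the exception type name raised when the branch is skipped."""
    try:
        char_from_code_point_broken(0x41, False)
    except UnboundLocalError as exc:
        return type(exc).__name__
    return None
```

With the expected string written as a literal, the test no longer depends on which of `unichr` or `chr` exists in the running interpreter.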

Preview Diff

 === modified file 'junitxml/__init__.py'
 --- junitxml/__init__.py	2010-09-11 22:25:07 +0000
 +++ junitxml/__init__.py	2010-09-11 22:28:41 +0000
@@ -8,9 +8,9 @@
  import datetime
++import re
  import time
  import unittest
--from xml.sax.saxutils import escape
  # same format as sys.version_info: "A tuple containing the five components of
  # the version number: major, minor, micro, releaselevel, and serial. All
@@ -52,12 +52,49 @@
          return None
--def _error_name(eclass):
--    module = eclass.__module__
--    if module not in ("__main__", "builtins", "exceptions"):
--        return ".".join([module, eclass.__name__])
--    return eclass.__name__
--
++<<<<<<< TREE
++def _error_name(eclass):
++    module = eclass.__module__
++    if module not in ("__main__", "builtins", "exceptions"):
++        return ".".join([module, eclass.__name__])
++    return eclass.__name__
++
++=======
++def _error_name(eclass):
++    module = eclass.__module__
++    if module not in ("__main__", "builtins", "exceptions"):
++        return ".".join([module, eclass.__name__])
++    return eclass.__name__
++
++
++_non_cdata = "[\0-\b\x0B-\x1F\uD800-\uDFFF\uFFFE\uFFFF]+"
++if "\\u" in _non_cdata:
++    _non_cdata = _non_cdata.decode("unicode-escape")
++    def _strip_invalid_chars(s, _sub=re.compile(_non_cdata, re.UNICODE).sub):
++        if not isinstance(s, unicode):
++            try:
++                s = s.decode("utf-8")
++            except UnicodeDecodeError:
++                s = s.decode("ascii", "replace")
++        return _sub("", s).encode("utf-8")
++else:
++    def _strip_invalid_chars(s, _sub=re.compile(_non_cdata, re.UNICODE).sub):
++        return _sub("", s)
++def _escape_content(s):
++    return (_strip_invalid_chars(s)
++        .replace("&", "&amp;")
++        .replace("<", "&lt;")
++        .replace("]]>", "]]&gt;"))
++def _escape_attr(s):
++    return (_strip_invalid_chars(s)
++        .replace("&", "&amp;")
++        .replace("<", "&lt;")
++        .replace("]]>", "]]&gt;")
++        .replace('"', "&quot;")
++        .replace("\t", "&#x9;")
++        .replace("\n", "&#xA;"))
++
++>>>>>>> MERGE-SOURCE
  class JUnitXmlResult(unittest.TestResult):
      """A TestResult which outputs JUnit compatible XML."""
@@ -71,6 +108,9 @@
          """
          self.__super = super(JUnitXmlResult, self)
          self.__super.__init__()
++        # GZ 2010-09-03: We have a problem if passed a text stream in Python 3
++        #                as really we want to write raw UTF-8 to ensure that
++        #                the encoding is not mangled later
          self._stream = stream
          self._results = []
          self._set_time = None
@@ -122,7 +162,7 @@
          else:
              classname, name = test_id[:class_end], test_id[class_end+1:]
          self._results.append('<testcase classname="%s" name="%s" '
--            'time="%0.3f"' % (escape(classname), escape(name), duration))
++            'time="%0.3f"' % (_escape_attr(classname), _escape_attr(name), duration))
      def stopTestRun(self):
          """Stop a test run.
@@ -142,13 +182,21 @@
          self.__super.addError(test, error)
          self._test_case_string(test)
          self._results.append('>\n')
++<<<<<<< TREE
          self._results.append('<error type="%s">%s</error>\n</testcase>\n'% (escape(_error_name(error[0])), escape(self._exc_info_to_string(error, test))))
++=======
++        self._results.append('<error type="%s">%s</error>\n</testcase>\n'% (_escape_attr(_error_name(error[0])), _escape_content(self._exc_info_to_string(error, test))))
++>>>>>>> MERGE-SOURCE
      def addFailure(self, test, error):
          self.__super.addFailure(test, error)
          self._test_case_string(test)
          self._results.append('>\n')
++<<<<<<< TREE
          self._results.append('<failure type="%s">%s</failure>\n</testcase>\n'% (escape(_error_name(error[0])), escape(self._exc_info_to_string(error, test))))
++=======
++        self._results.append('<failure type="%s">%s</failure>\n</testcase>\n'% (_escape_attr(_error_name(error[0])), _escape_content(self._exc_info_to_string(error, test))))
++>>>>>>> MERGE-SOURCE
      def addSuccess(self, test):
          self.__super.addSuccess(test)
@@ -163,7 +211,7 @@
              pass
          self._test_case_string(test)
          self._results.append('>\n')
--        self._results.append('<skip>%s</skip>\n</testcase>\n'% escape(reason))
++        self._results.append('<skip>%s</skip>\n</testcase>\n'% _escape_attr(reason))
      def addUnexpectedSuccess(self, test):
          try:
 === modified file 'junitxml/tests/test_junitxml.py'
 --- junitxml/tests/test_junitxml.py	2010-09-11 22:25:07 +0000
 +++ junitxml/tests/test_junitxml.py	2010-09-11 22:28:41 +0000
@@ -11,7 +11,9 @@
      from io import StringIO
  import datetime
  import re
++import sys
  import unittest
++import xml.dom.minidom
  import junitxml
@@ -188,12 +190,20 @@
          expected_failure_support = [True]
          class ExpectedFail(unittest.TestCase):
              def test_me(self):
++<<<<<<< TREE
                  self.fail("fail")
              try:
                  test_me = unittest.expectedFailure(test_me)
              except AttributeError:
                  # Older python - just let the test fail
                  expected_failure_support[0] = False
++=======
++                self.fail("fail")
++            try:
++                test_me = unittest.expectedFailure(test_me)
++            except AttributeError:
++                pass # Older python - just let the test fail
++>>>>>>> MERGE-SOURCE
          self.result.startTestRun()
          ExpectedFail("test_me").run(self.result)
          self.result.stopTestRun()
@@ -210,5 +220,119 @@
  """
          if expected_failure_support[0]:
              self.assertEqual(expected, output)
++<<<<<<< TREE
          else:
              self.assertEqual(expected_old, output)
++=======
++
++
++class TestWellFormedXml(unittest.TestCase):
++    """XML created should always be well formed even with odd test cases"""
++
++    def _run_and_parse_test(self, case):
++        output = StringIO()
++        result = junitxml.JUnitXmlResult(output)
++        result.startTestRun()
++        case.run(result)
++        result.stopTestRun()
++        return xml.dom.minidom.parseString(output.getvalue())
++
++    def test_failure_with_amp(self):
++        """Check the failure element content is escaped"""
++        class FailWithAmp(unittest.TestCase):
++            def runTest(self):
++                self.fail("& should be escaped as &amp;")
++        doc = self._run_and_parse_test(FailWithAmp())
++        self.assertTrue(
++            doc.getElementsByTagName("failure")[0].firstChild.nodeValue
++                .endswith("AssertionError: & should be escaped as &amp;\n"))
++
++    def test_quotes_in_test_case_id(self):
++        """Check that quotes in an attribute are escaped"""
++        class QuoteId(unittest.TestCase):
++            def id(self):
++                return unittest.TestCase.id(self) + '("quotes")'
++            def runTest(self):
++                pass
++        doc = self._run_and_parse_test(QuoteId())
++        self.assertEqual('runTest("quotes")',
++            doc.getElementsByTagName("testcase")[0].getAttribute("name"))
++
++    def test_skip_reason(self):
++        """Check the skip element content is escaped"""
++        class SkipWithLt(unittest.TestCase):
++            def runTest(self):
++                self.fail("version < 2.7")
++            try:
++                runTest = unittest.skip("2.7 <= version")(runTest)
++            except AttributeError:
++                self.has_skip = False
++            else:
++                self.has_skip = True
++        doc = self._run_and_parse_test(SkipWithLt())
++        if self.has_skip:
++            self.assertEqual('2.7 <= version',
++                doc.getElementsByTagName("skip")[0].firstChild.nodeValue)
++        else:
++            self.assertTrue(
++                doc.getElementsByTagName("failure")[0].firstChild.nodeValue
++                    .endswith("AssertionError: version < 2.7\n"))
++
++    def test_error_with_control_characters(self):
++        """Check C0 control characters are stripped rather than output"""
++        class ErrorWithC0(unittest.TestCase):
++            def runTest(self):
++                raise ValueError("\x1F\x0E\x0C\x0B\x08\x01\x00lost control")
++        doc = self._run_and_parse_test(ErrorWithC0())
++        self.assertTrue(
++            doc.getElementsByTagName("error")[0].firstChild.nodeValue
++                .endswith("ValueError: lost control\n"))
++
++    def test_error_with_invalid_cdata(self):
++        """Check unicode outside the valid cdata range is stripped"""
++        if len("\uffff") == 1:
++            # Basic str type supports unicode
++            exception = ValueError("\ufffe\uffffEOF")
++        else:
++            class UTF8_Error(Exception):
++                def __unicode__(self):
++                    return str(self).decode("UTF-8")
++            exception = UTF8_Error("\xef\xbf\xbe\xef\xbf\xbfEOF")
++        class ErrorWithBadUnicode(unittest.TestCase):
++            def runTest(self):
++                raise exception
++        doc = self._run_and_parse_test(ErrorWithBadUnicode())
++        self.assertTrue(
++            doc.getElementsByTagName("error")[0].firstChild.nodeValue
++                .endswith("Error: EOF\n"))
++
++    def test_error_with_surrogates(self):
++        """Check unicode surrogates are handled properly, paired or otherwise
++
++        This is a pain due to suboptimal unicode support in Python and the
++        various changes in Python 3. On UCS-2 builds there is no easy way of
++        getting rid of unpaired surrogates while leaving valid pairs alone, so
++        this test doesn't require astral characters are kept there.
++        """
++        if len("\uffff") == 1:
++            exception = ValueError("paired: \U000201a2"
++                " unpaired: "+chr(0xD800)+"-"+chr(0xDFFF))
++            astral_char = "\U000201a2"
++        else:
++            class UTF8_Error(Exception):
++                def __unicode__(self):
++                    return str(self).decode("UTF-8")
++            exception = UTF8_Error("paired: \xf0\xa0\x86\xa2"
++                " unpaired: \xed\xa0\x80-\xed\xbf\xbf")
++            astral_char = "\U000201a2".decode("unicode-escape")
++        class ErrorWithSurrogates(unittest.TestCase):
++            def runTest(self):
++                raise exception
++        doc = self._run_and_parse_test(ErrorWithSurrogates())
++        traceback = doc.getElementsByTagName("error")[0].firstChild.nodeValue
++        if sys.maxunicode == 0xFFFF:
++            pass # would be nice to handle astral characters properly even so
++        else:
++            self.assertTrue(astral_char in traceback)
++        self.assertTrue(traceback.endswith(" unpaired: -\n"))
++>>>>>>> MERGE-SOURCE
