Merge into trunk : saml-unicode : Code : Canonical SSO provider

Status:	Merged
Approved by:	Daniel Manrique on 2019-04-10
Approved revision:	no longer in the source branch.
Merge reported by:	Otto Co-Pilot
Merged at revision:	not available
Proposed branch:	lp:~roadmr/canonical-identity-provider/saml-unicode
Merge into:	lp:canonical-identity-provider/release
Diff against target:	143 lines (+83/-1) 4 files modified src/ubuntu_sso_saml/processors.py (+10/-1) src/ubuntu_sso_saml/tests/test_processors.py (+19/-0) src/ubuntu_sso_saml/tests/test_utils.py (+39/-0) src/ubuntu_sso_saml/utils.py (+15/-0)
To merge this branch:	bzr merge lp:~roadmr/canonical-identity-provider/saml-unicode

Reviewer	Review Type	Date Requested	Status
Celso Providelo (community)		2019-04-05	Approve on 2019-04-10
Review via email: mp+365615@code.launchpad.net

Commit message

SAML: Ensure all dicts used to build assertions contain only utf-8-encoded data.

The SAML library we use assumes ascii-only inputs and parameters, and relies on the intricacies of Python 2's implicit str <-> unicode conversion. SSO's use of the library breaks some of those assumptions, particularly in the CanonicalProcessor, which gets many of its assertion parameters and attributes from the database: SP config parameters and custom attributes come from the Django ORM and are thus unicode. Since the SAML library does no explicit encoding of unicode data, it barfs when an implicit conversion is attempted on unicode data that contains non-ascii characters. This happens particularly when using the Python string.Template classes and when base64-encoding the final assertion; both behave predictably badly when given a str template and fed unicode substitution values.

Since the output is expected to be utf8-encoded XML, this MP just ensures all the pieces used by the library to assemble, sign and encode the assertion are sent as utf8-encoded strs rather than unicodes.

This problem was only exposed when we added a new "full name" substitution for SAML attributes. Up until now, by some fluke, all the data we fed to the SAML library was ascii-only, so even though we were sending mixed strs and unicodes back and forth, implicit conversion of unicode to str worked fine and the problem went undetected. However, full names, unlike URLs, usernames, OpenIDs and other identifiers, understandably can contain non-ascii characters.
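
As a rough illustration of the failure mode described above (a sketch, not code from the SAML library): Python 2's implicit unicode-to-str conversion behaves like an explicit encode with the ascii codec. The hypothetical helper `implicit_py2_str` below mimics that so the example runs on both Python 2 and Python 3; the sample values are assumptions chosen for illustration.

```python
def implicit_py2_str(value):
    """Mimic Python 2's implicit unicode -> str conversion (hypothetical).

    Python 2 uses the ascii codec for implicit conversions, so this
    raises UnicodeEncodeError for any non-ascii character.
    """
    return value.encode('ascii')


ascii_name = u'roadmr'
full_name = u'\u4f60\u597d l\xe0 to\xf1'

# ascii-only text converts without trouble, which hid the problem.
ascii_ok = implicit_py2_str(ascii_name)

# Text with non-ascii characters fails under the implicit (ascii) conversion.
try:
    implicit_py2_str(full_name)
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding explicitly as utf-8 works for any text.
explicit = full_name.encode('utf-8')
```

ascii-only text survives the implicit conversion, which is why mixing strs and unicodes went unnoticed; encoding explicitly to utf-8 before handing data to the library avoids relying on that conversion at all.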

Celso Providelo (cprov) wrote on 2019-04-10:

I guess the SAML library we use will require lots of love in order to support py3, when the time comes for SSO.

My feeling is that the conversion (unicode -> bytes) should only be necessary at the time the xml is generated, but that's mainly based on speculation.

Thanks for structuring the conversion; it's much easier to understand than if it were scattered.

review: Approve

Daniel Manrique (roadmr) wrote on 2019-04-10:

"My feeling is that the conversion (unicode -> bytes) should only be necessary at the time the xml is generated, but that's mainly based on speculation."

In general yes; but we also need to do it in 2 other places:

1- when generating the AttributeStatement, it's a sub-template that needs the bytes conversion on data we feed to it
2- when calculating the signature to be included in the final xml render; some of the elements that are part of the signature come from the assertion parameters and thus also need to be encoded if they are unicode, as the signature is calculated using hashlib which is a byte-a-tarian.

Thanks!
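
To illustrate the second point above: hashlib works on bytes, so anything that goes into a digest has to be encoded first. This is only a sketch; the helper name `digest_b64`, the choice of SHA-1 and the sample value are assumptions, not necessarily what the SAML library does.

```python
import base64
import hashlib


def digest_b64(value):
    """Return the base64-encoded SHA-1 digest of value (hypothetical helper)."""
    # Text (unicode in Python 2, str in Python 3) is encoded explicitly;
    # byte strings are hashed as they are.
    if not isinstance(value, bytes):
        value = value.encode('utf-8')
    return base64.b64encode(hashlib.sha1(value).digest())


name = u'l\xe0 to\xf1'
from_text = digest_b64(name)
from_bytes = digest_b64(name.encode('utf-8'))
```

Both calls hash the same utf-8 bytes, so the digest does not depend on whether the caller passed text or already-encoded data.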

 === modified file 'src/ubuntu_sso_saml/processors.py'
 --- src/ubuntu_sso_saml/processors.py	2019-01-25 20:58:02 +0000
 +++ src/ubuntu_sso_saml/processors.py	2019-04-08 19:57:02 +0000
@@ -14,6 +14,7 @@
  from ubuntu_sso_saml.utils import (
      certificate_from_sp_config,
      get_config_from_processor,
++    utf8ize_dict,
  )
@@ -383,6 +384,10 @@
                  attrs = {}
              certificate = certificate_from_sp_config(sp_config)
              self._assertion_params['ATTRIBUTES'] = self._eval_attributes(attrs)
++            # Everything passed to the format_assertion methods needs to be
++            # utf8-encoded str; no unicodes! (in Python3 parlance: everything
++            # needs to be utf8-encoded bytes, no strings/strs)
++            self._assertion_params = utf8ize_dict(self._assertion_params)
              if sp_config.audience:
                  self._format_assertion_restricted(certificate=certificate)
              else:
@@ -446,6 +451,9 @@
          sp_config = get_config_from_processor(self)
          # Helper properly handles sp_config being None.
          certificate = certificate_from_sp_config(sp_config)
++        # Convert _request_params and _response_params to utf-8 string
++        self._request_params = utf8ize_dict(self._request_params)
++        self._response_params = utf8ize_dict(self._response_params)
          super(CanonicalProcessor, self)._format_response(
              certificate=certificate)
@@ -469,4 +477,5 @@
                  email = self._subject
                  if email:
                      attributes[key] = email
--        return attributes
++        # Convert keys and values to utf8-encoded str and return that
++        return utf8ize_dict(attributes)
 === modified file 'src/ubuntu_sso_saml/tests/test_processors.py'
 --- src/ubuntu_sso_saml/tests/test_processors.py	2019-01-25 20:58:02 +0000
 +++ src/ubuntu_sso_saml/tests/test_processors.py	2019-04-08 19:57:02 +0000
@@ -1825,3 +1825,22 @@
                      '{}</saml:AttributeValue></saml:Attribute>'.format(
                          displayname))
          self.assertIn(expected, samlresponse)
++
++    def test_displayname_as_attribute_unicode(self):
++        # https://pad.lv/1821825
++        self.setup_saml_sp(attributes=json.dumps({
++            'fullname': '{{displayname}}',
++        }))
++        # The below contains chinese characters, a grave-accented "a" and a
++        # tilded n
++        weirdstring = u'\u4f60\u597d l\xe0 to\xf1'
++        self.account.displayname = weirdstring
++        self.account.save()
++        displayname = self.account.displayname
++
++        samlresponse = self.do_saml_request()
++        expected = ('<saml:Attribute Name="fullname">'
++                    '<saml:AttributeValue>'
++                    '{}</saml:AttributeValue></saml:Attribute>'.format(
++                        displayname.encode('utf-8')))
++        self.assertIn(expected, samlresponse)
 === modified file 'src/ubuntu_sso_saml/tests/test_utils.py'
 --- src/ubuntu_sso_saml/tests/test_utils.py	2018-04-06 19:41:49 +0000
 +++ src/ubuntu_sso_saml/tests/test_utils.py	2019-04-08 19:57:02 +0000
@@ -13,6 +13,7 @@
      get_config_for_acs,
      get_config_from_processor,
      get_deeplink_url,
++    utf8ize_dict,
  )
@@ -144,3 +145,41 @@
          # Ensure it's identical to what we read from the file, and not
          # a djangoized unicode.
          self.assertEqual(cert_string, certificate_from_sp_config(sp_config))
++
++
++class UnicodeToUtfConversionTestCase(TestCase):
++
++    def test_ascii_dict_untouched(self):
++        in_dict = {b'foo': b'bar'}
++        self.assertEqual(in_dict, utf8ize_dict(in_dict))
++
++    def test_unicode_dict_stringized(self):
++        in_dict = {u'foo': u'bar'}
++        out_dict = {b'foo': b'bar'}
++        self.assertEqual(out_dict, utf8ize_dict(in_dict))
++
++    def test_unicode_dict_key_utf8ized(self):
++        in_dict = {u'fo\xe1o': u'bar'}
++        out_dict = {b'fo\xc3\xa1o': b'bar'}
++        self.assertEqual(out_dict, utf8ize_dict(in_dict))
++
++    def test_unicode_dict_both_utf8ized(self):
++        in_dict = {u'fo\xe1o': u'ba\xe1r'}
++        out_dict = {b'fo\xc3\xa1o': b'ba\xc3\xa1r'}
++        self.assertEqual(out_dict, utf8ize_dict(in_dict))
++
++    def test_unicode_dict_combination(self):
++        in_dict = {u'fo\xe1o': u'bar',
++                   b'baz': u'qu\xe1x',
++                   b'grub': b'lilo',
++                   b'pr\xc3\xa1nk': u'j\xe1ck'}
++        out_dict = {b'fo\xc3\xa1o': b'bar',
++                    b'baz': b'qu\xc3\xa1x',
++                    b'grub': b'lilo',
++                    b'pr\xc3\xa1nk': b'j\xc3\xa1ck'}
++        self.assertEqual(out_dict, utf8ize_dict(in_dict))
++
++    def test_unicode_non_strings_untouched(self):
++        in_dict = {u'fo\xe1o': {1: u'ba\xe1r'}, 2: None}
++        out_dict = {b'fo\xc3\xa1o': {1: u'ba\xe1r'}, 2: None}
++        self.assertEqual(out_dict, utf8ize_dict(in_dict))
 === modified file 'src/ubuntu_sso_saml/utils.py'
 --- src/ubuntu_sso_saml/utils.py	2018-04-09 18:22:00 +0000
 +++ src/ubuntu_sso_saml/utils.py	2019-04-08 19:57:02 +0000
@@ -80,3 +80,18 @@
      else:
          certificate = None
      return certificate
++
++
++def utf8ize_dict(a_dict):
++    """
++    Return a new dict just like the old, but where all unicode keys
++    and values have been converted to utf8-encoded string values.
++    """
++    new_dict = {}
++    for k, v in a_dict.items():
++        if isinstance(k, unicode):
++            k = k.encode('utf-8')
++        if isinstance(v, unicode):
++            v = v.encode('utf-8')
++        new_dict[k] = v
++    return new_dict
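
For readers who want to try the helper outside the SSO codebase, here is a minimal standalone sketch of the same logic. The `text_type` shim is an assumption added so the example also runs on Python 3 (the merged code targets Python 2 and uses `unicode` directly); the sample dict is made up for illustration.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3: str is the text type


def utf8ize_dict(a_dict):
    """Return a new dict whose text keys and values are utf8-encoded."""
    new_dict = {}
    for k, v in a_dict.items():
        if isinstance(k, text_type):
            k = k.encode('utf-8')
        if isinstance(v, text_type):
            v = v.encode('utf-8')
        new_dict[k] = v
    return new_dict


params = {
    u'fo\xe1o': u'bar',
    b'baz': u'qu\xe1x',
    2: None,
    u'nested': {1: u'ba\xe1r'},
}
result = utf8ize_dict(params)
```

The conversion is shallow: only top-level text keys and values are encoded, while non-string values and the contents of nested dicts are left untouched, as test_unicode_non_strings_untouched also shows.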
