Merge lp:~roadmr/canonical-identity-provider/saml-unicode into lp:canonical-identity-provider/release

Proposed by Daniel Manrique
Status: Merged
Approved by: Daniel Manrique
Approved revision: no longer in the source branch.
Merge reported by: Otto Co-Pilot
Merged at revision: not available
Proposed branch: lp:~roadmr/canonical-identity-provider/saml-unicode
Merge into: lp:canonical-identity-provider/release
Diff against target: 143 lines (+83/-1)
4 files modified
src/ubuntu_sso_saml/processors.py (+10/-1)
src/ubuntu_sso_saml/tests/test_processors.py (+19/-0)
src/ubuntu_sso_saml/tests/test_utils.py (+39/-0)
src/ubuntu_sso_saml/utils.py (+15/-0)
To merge this branch: bzr merge lp:~roadmr/canonical-identity-provider/saml-unicode
Reviewer Review Type Date Requested Status
Celso Providelo (community) Approve
Review via email: mp+365615@code.launchpad.net

Commit message

SAML: Ensure all dicts used to build assertions contain only utf-8-encoded data.

The SAML library we use assumes use only of ascii inputs and parameters and Python 2 str <-> unicode implicit conversion intricacies. Some of those assumptions are broken by SSO's use of the library and particularly the CanonicalProcessor, which gets a lot of those assertion parameters and attributes from the database; SP config parameters, custom attributes come from Django ORM and are thus unicode, and since the SAML library doesn't do explicit unicode data encoding, it barfs when implicit conversions for unicode data that contains non-ascii characters are attempted, particularly when using the python string.Template classes and base64-encoding the final assertion;these behave predictably badly when given a str template and fed unicode substitution values.

Since the output is expected to be utf8-encoded XML, this MP just ensures all the pieces used by the library to assemble, sign and encode the assertion are sent as utf8-encoded strs rather than unicodes.

This problem was only exposed when we added a new "full name" substitution for SAML attributes: up until now, by some fluke, all the date we fed to the SAML library was ascii-only and so even when we were sending mixed strs and unicodes back and forth, implicit conversion of unicode to str worked fine and the problem went undetected. However, fullnames, unlike URLs and other identifiers, usernames and OpenIDs, understandably can contain non-ascii characters.

Description of the change

SAML: Ensure all dicts used to build assertions contain only utf-8-encoded data.

The SAML library we use assumes use only of ascii inputs and parameters and Python 2 str <-> unicode implicit conversion intricacies. Some of those assumptions are broken by SSO's use of the library and particularly the CanonicalProcessor, which gets a lot of those assertion parameters and attributes from the database; SP config parameters, custom attributes come from Django ORM and are thus unicode, and since the SAML library doesn't do explicit unicode data encoding, it barfs when implicit conversions for unicode data that contains non-ascii characters are attempted, particularly when using the python string.Template classes and base64-encoding the final assertion;these behave predictably badly when given a str template and fed unicode substitution values.

Since the output is expected to be utf8-encoded XML, this MP just ensures all the pieces used by the library to assemble, sign and encode the assertion are sent as utf8-encoded strs rather than unicodes.

This problem was only exposed when we added a new "full name" substitution for SAML attributes: up until now, by some fluke, all the date we fed to the SAML library was ascii-only and so even when we were sending mixed strs and unicodes back and forth, implicit conversion of unicode to str worked fine and the problem went undetected. However, fullnames, unlike URLs and other identifiers, usernames and OpenIDs, understandably can contain non-ascii characters.

To post a comment you must log in.
Revision history for this message
Celso Providelo (cprov) wrote :

I guess the SAML library we use will require lots of love in order to support py3, when the time comes for SSO.

My feeling is that the conversion (unicode -> bytes) should only be necessary at the time the xml is generated, but that's mainly based in speculation.

Thanks to structuring the conversion, it's much easier to understand than if it was scattered.

review: Approve
Revision history for this message
Daniel Manrique (roadmr) wrote :

"My feeling is that the conversion (unicode -> bytes) should only be necessary at the time the xml is generated, but that's mainly based in speculation."

In general yes; but we also need to do it in 2 other places:

1- when generating the AttributeStatement, it's a sub-template that needs the bytes conversion on data we feed to it
2- when calculating the signature to be included in the final xml render; some of the elements that are part of the signature come from the assertion parameters and thus also need to be encoded if they are unicode, as the signature is calculated using hashlib which is a byte-a-tarian.

Thanks!

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/ubuntu_sso_saml/processors.py'
2--- src/ubuntu_sso_saml/processors.py 2019-01-25 20:58:02 +0000
3+++ src/ubuntu_sso_saml/processors.py 2019-04-08 19:57:02 +0000
4@@ -14,6 +14,7 @@
5 from ubuntu_sso_saml.utils import (
6 certificate_from_sp_config,
7 get_config_from_processor,
8+ utf8ize_dict,
9 )
10
11
12@@ -383,6 +384,10 @@
13 attrs = {}
14 certificate = certificate_from_sp_config(sp_config)
15 self._assertion_params['ATTRIBUTES'] = self._eval_attributes(attrs)
16+ # Everything passed to the format_assertion methods needs to be
17+ # utf8-encoded str; no unicodes! (in Python3 parlance: everything
18+ # needs to be ut8-encoded bytes, no strings/strs)
19+ self._assertion_params = utf8ize_dict(self._assertion_params)
20 if sp_config.audience:
21 self._format_assertion_restricted(certificate=certificate)
22 else:
23@@ -446,6 +451,9 @@
24 sp_config = get_config_from_processor(self)
25 # Helper properly handles sp_config being None.
26 certificate = certificate_from_sp_config(sp_config)
27+ # Convert _request_params and _response_params to utf-8 string
28+ self._request_params = utf8ize_dict(self._request_params)
29+ self._response_params = utf8ize_dict(self._response_params)
30 super(CanonicalProcessor, self)._format_response(
31 certificate=certificate)
32
33@@ -469,4 +477,5 @@
34 email = self._subject
35 if email:
36 attributes[key] = email
37- return attributes
38+ # Convert keys and values to utf8-encoded str and return that
39+ return utf8ize_dict(attributes)
40
41=== modified file 'src/ubuntu_sso_saml/tests/test_processors.py'
42--- src/ubuntu_sso_saml/tests/test_processors.py 2019-01-25 20:58:02 +0000
43+++ src/ubuntu_sso_saml/tests/test_processors.py 2019-04-08 19:57:02 +0000
44@@ -1825,3 +1825,22 @@
45 '{}</saml:AttributeValue></saml:Attribute>'.format(
46 displayname))
47 self.assertIn(expected, samlresponse)
48+
49+ def test_displayname_as_attribute_unicode(self):
50+ # https://pad.lv/1821825
51+ self.setup_saml_sp(attributes=json.dumps({
52+ 'fullname': '{{displayname}}',
53+ }))
54+ # The below contains chinese characters, a grave-accented "a" and a
55+ # tilded n
56+ weirdstring = u'\u4f60\u597d l\xe0 to\xf1'
57+ self.account.displayname = weirdstring
58+ self.account.save()
59+ displayname = self.account.displayname
60+
61+ samlresponse = self.do_saml_request()
62+ expected = ('<saml:Attribute Name="fullname">'
63+ '<saml:AttributeValue>'
64+ '{}</saml:AttributeValue></saml:Attribute>'.format(
65+ displayname.encode('utf-8')))
66+ self.assertIn(expected, samlresponse)
67
68=== modified file 'src/ubuntu_sso_saml/tests/test_utils.py'
69--- src/ubuntu_sso_saml/tests/test_utils.py 2018-04-06 19:41:49 +0000
70+++ src/ubuntu_sso_saml/tests/test_utils.py 2019-04-08 19:57:02 +0000
71@@ -13,6 +13,7 @@
72 get_config_for_acs,
73 get_config_from_processor,
74 get_deeplink_url,
75+ utf8ize_dict,
76 )
77
78
79@@ -144,3 +145,41 @@
80 # Ensure it's identical to a what we read from the file, and not
81 # a djangoized unicode.
82 self.assertEqual(cert_string, certificate_from_sp_config(sp_config))
83+
84+
85+class UnicodeToUtfConversionTestCase(TestCase):
86+
87+ def test_ascii_dict_untouched(self):
88+ in_dict = {b'foo': b'bar'}
89+ self.assertEqual(in_dict, utf8ize_dict(in_dict))
90+
91+ def test_unicode_dict_stringized(self):
92+ in_dict = {u'foo': u'bar'}
93+ out_dict = {b'foo': b'bar'}
94+ self.assertEqual(out_dict, utf8ize_dict(in_dict))
95+
96+ def test_unicode_dict_key_utf8ized(self):
97+ in_dict = {u'fo\xe1o': u'bar'}
98+ out_dict = {b'fo\xc3\xa1o': b'bar'}
99+ self.assertEqual(out_dict, utf8ize_dict(in_dict))
100+
101+ def test_unicode_dict_both_utf8ized(self):
102+ in_dict = {u'fo\xe1o': u'ba\xe1r'}
103+ out_dict = {b'fo\xc3\xa1o': b'ba\xc3\xa1r'}
104+ self.assertEqual(out_dict, utf8ize_dict(in_dict))
105+
106+ def test_unicode_dict_combination(self):
107+ in_dict = {u'fo\xe1o': u'bar',
108+ b'baz': u'qu\xe1x',
109+ b'grub': b'lilo',
110+ b'pr\xc3\xa1nk': u'j\xe1ck'}
111+ out_dict = {b'fo\xc3\xa1o': b'bar',
112+ b'baz': b'qu\xc3\xa1x',
113+ b'grub': b'lilo',
114+ b'pr\xc3\xa1nk': b'j\xc3\xa1ck'}
115+ self.assertEqual(out_dict, utf8ize_dict(in_dict))
116+
117+ def test_unicode_non_strings_untouched(self):
118+ in_dict = {u'fo\xe1o': {1: u'ba\xe1r'}, 2: None}
119+ out_dict = {b'fo\xc3\xa1o': {1: u'ba\xe1r'}, 2: None}
120+ self.assertEqual(out_dict, utf8ize_dict(in_dict))
121
122=== modified file 'src/ubuntu_sso_saml/utils.py'
123--- src/ubuntu_sso_saml/utils.py 2018-04-09 18:22:00 +0000
124+++ src/ubuntu_sso_saml/utils.py 2019-04-08 19:57:02 +0000
125@@ -80,3 +80,18 @@
126 else:
127 certificate = None
128 return certificate
129+
130+
131+def utf8ize_dict(a_dict):
132+ """
133+ Return a new dict just like the old, but where all unicode keys
134+ and values have been converted to utf8-encoded string values.
135+ """
136+ new_dict = {}
137+ for k, v in a_dict.items():
138+ if isinstance(k, unicode):
139+ k = k.encode('utf-8')
140+ if isinstance(v, unicode):
141+ v = v.encode('utf-8')
142+ new_dict[k] = v
143+ return new_dict