Merge lp:~roadmr/canonical-identity-provider/saml-unicode into lp:canonical-identity-provider/release
Status: | Merged |
---|---|
Approved by: | Daniel Manrique |
Approved revision: | no longer in the source branch. |
Merge reported by: | Otto Co-Pilot |
Merged at revision: | not available |
Proposed branch: | lp:~roadmr/canonical-identity-provider/saml-unicode |
Merge into: | lp:canonical-identity-provider/release |
Diff against target: |
143 lines (+83/-1) 4 files modified
src/ubuntu_sso_saml/processors.py (+10/-1) src/ubuntu_sso_saml/tests/test_processors.py (+19/-0) src/ubuntu_sso_saml/tests/test_utils.py (+39/-0) src/ubuntu_sso_saml/utils.py (+15/-0) |
To merge this branch: | bzr merge lp:~roadmr/canonical-identity-provider/saml-unicode |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Celso Providelo (community) | Approve | ||
Review via email: mp+365615@code.launchpad.net |
Commit message
SAML: Ensure all dicts used to build assertions contain only utf-8-encoded data.
The SAML library we use assumes use only of ascii inputs and parameters and Python 2 str <-> unicode implicit conversion intricacies. Some of those assumptions are broken by SSO's use of the library and particularly the CanonicalProcessor, which gets a lot of those assertion parameters and attributes from the database; SP config parameters, custom attributes come from Django ORM and are thus unicode, and since the SAML library doesn't do explicit unicode data encoding, it barfs when implicit conversions for unicode data that contains non-ascii characters are attempted, particularly when using the python string.Template classes and base64-encoding the final assertion;these behave predictably badly when given a str template and fed unicode substitution values.
Since the output is expected to be utf8-encoded XML, this MP just ensures all the pieces used by the library to assemble, sign and encode the assertion are sent as utf8-encoded strs rather than unicodes.
This problem was only exposed when we added a new "full name" substitution for SAML attributes: up until now, by some fluke, all the date we fed to the SAML library was ascii-only and so even when we were sending mixed strs and unicodes back and forth, implicit conversion of unicode to str worked fine and the problem went undetected. However, fullnames, unlike URLs and other identifiers, usernames and OpenIDs, understandably can contain non-ascii characters.
Description of the change
SAML: Ensure all dicts used to build assertions contain only utf-8-encoded data.
The SAML library we use assumes use only of ascii inputs and parameters and Python 2 str <-> unicode implicit conversion intricacies. Some of those assumptions are broken by SSO's use of the library and particularly the CanonicalProcessor, which gets a lot of those assertion parameters and attributes from the database; SP config parameters, custom attributes come from Django ORM and are thus unicode, and since the SAML library doesn't do explicit unicode data encoding, it barfs when implicit conversions for unicode data that contains non-ascii characters are attempted, particularly when using the python string.Template classes and base64-encoding the final assertion;these behave predictably badly when given a str template and fed unicode substitution values.
Since the output is expected to be utf8-encoded XML, this MP just ensures all the pieces used by the library to assemble, sign and encode the assertion are sent as utf8-encoded strs rather than unicodes.
This problem was only exposed when we added a new "full name" substitution for SAML attributes: up until now, by some fluke, all the date we fed to the SAML library was ascii-only and so even when we were sending mixed strs and unicodes back and forth, implicit conversion of unicode to str worked fine and the problem went undetected. However, fullnames, unlike URLs and other identifiers, usernames and OpenIDs, understandably can contain non-ascii characters.
I guess the SAML library we use will require lots of love in order to support py3, when the time comes for SSO.
My feeling is that the conversion (unicode -> bytes) should only be necessary at the time the xml is generated, but that's mainly based in speculation.
Thanks to structuring the conversion, it's much easier to understand than if it was scattered.