Merge lp:~jelmer/bzr/urllib-verifies-ssl-certs into lp:bzr

Proposed by Jelmer Vernooij
Status: Superseded
Proposed branch: lp:~jelmer/bzr/urllib-verifies-ssl-certs
Merge into: lp:bzr
Diff against target: 247 lines (+133/-18)
6 files modified
bzrlib/config.py (+7/-1)
bzrlib/errors.py (+8/-0)
bzrlib/tests/test_http.py (+0/-1)
bzrlib/transport/__init__.py (+3/-3)
bzrlib/transport/http/_urllib2_wrappers.py (+109/-13)
doc/en/release-notes/bzr-2.5.txt (+6/-0)
To merge this branch: bzr merge lp:~jelmer/bzr/urllib-verifies-ssl-certs
Reviewer Review Type Date Requested Status
Martin Packman (community) Needs Fixing
Review via email: mp+81227@code.launchpad.net

This proposal has been superseded by a proposal from 2011-12-20.

Description of the change

Add support for verifying SSL certificates when using urllib.

This branch does actually work, but it's got some open ends:

* The location with the CAs to trust differs per system. Rather than hardcoding
  a list, I think we should have a configuration option that allows the user
  to specify what CA list they want to verify against.

  This option could then also be pointed at a user-specific list
  (rather than a system wide list). And packagers can put in the default
  for their distribution.

  Where do I obtain this option from, which config stack ?

* We probably want some way to specify that self-signed certificates are ok.
  Probably also a new option? We should probably still warn the user
  when a self-signed certificate is being trusted.

* There are no additional tests. There probably should be some, at least for
  e.g. the match_hostname helper.

* I have the feeling that I'm missing something. This was way too easy.
  What is it?

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

On 4 November 2011 14:03, Jelmer Vernooij <email address hidden> wrote:
> Jelmer Vernooij has proposed merging lp:~jelmer/bzr/urllib-verifies-ssl-certs into lp:bzr.
>
> Requested reviews:
>  bzr-core (bzr-core)
> Related bugs:
>  Bug #125055 in Bazaar: "defaulting to pycurl doesn't make sense anymore"
>  https://bugs.launchpad.net/bzr/+bug/125055
>  Bug #651161 in Bazaar: "bzr fails to verify ssl validity in https connections - by default --> as pycurl isn't a dep only a suggestion "
>  https://bugs.launchpad.net/bzr/+bug/651161
>
> For more details, see:
> https://code.launchpad.net/~jelmer/bzr/urllib-verifies-ssl-certs/+merge/81227
>
> Add support for verifying SSL certificates when using urllib.

That's great, thanks.

> This branch does actually work, but it's got some open ends:
>
> * The location with the CAs to trust differs per system. Rather than hardcoding
>  a list, I think we should have a configuration option that allows the user
>  to specify what CA list they want to verify against.
>
>  This option could then also be pointed at a user-specific list
>  (rather than a system wide list). And packagers can put in the default
>  for their distribution.
>
>  Where do I obtain this option from, which config stack ?

[needsfixing] I think the global stack would be a good place to start
- it would seem like a kind of contrived case to want different values
on a smaller scope. We can always change it later.

> * We probably want some way to specify that self-signed certificates are ok.
>  Probably also a new option? We should probably still warn the user
>  when a self-signed certificate is being trusted.

+1 to a new option and to always warning.

Though perhaps it would be equivalent and easier to just have an
option saying "don't do validation."

> * There are no additional tests. There probably should be some, at least for
>  e.g. the match_hostname helper.

It would be nice, if it's not too hard, to actually test against an
untrusted cert, or at least to do monkeypatching to make it look like
we got an invalid cert.

[needsfixing] I'd like a unit test for match_hostname covering the
various cases.

> * I have the feeling that I'm missing something. This was way too easy.
>  What is it?

I haven't read the rfc but it looks plausible. You could wait for a
review from vila, who has probably touched this most, or robert who is
an http nerd.

I wonder if this needs a change to Windows or Mac packaging to make
sure the certs are there. Or, possibly we can get them from the
system's keyring in some way.

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (4.0 KiB)

> Add support for verifying SSL certificates when using urllib.

\o/

>
> This branch does actually work, but it's got some open ends:
>
> * The location with the CAs to trust differs per system. Rather than hardcoding
> a list, I think we should have a configuration option that allows the user
> to specify what CA list they want to verify against.

pycurl already handles that for windows but getting it right in the general
case... sounds like a packaging issue :-/

>
> This option could then also be pointed at a user-specific list
> (rather than a system wide list). And packagers can put in the default
> for their distribution.
>
> Where do I obtain this option from, which config stack ?

I'd go with global to start with specifying the default value at
registration time if possible.

Arguably it's a system-wide config file that should be used, making packager
life easier.

>
> * We probably want some way to specify that self-signed certificates are ok.
> Probably also a new option? We should probably still warn the user
> when a self-signed certificate is being trusted.

authentication.conf is the right place for that and the option is even
documented but not implemented ;)

>
> * There are no additional tests. There probably should be some, at least for
> e.g. the match_hostname helper.

Yup. There are already https tests for pycurl that should be activated now
(<broken record> hey, did I mention the 'scenarios' attribute is cool for
that ? </broken record> ;)

>
> * I have the feeling that I'm missing something. This was way too easy.
> What is it?

Hehe.

The last time I worked on this area we were still supporting python-2.4/2.5
where https wasn't supported. Now, you also found the relevant code in
python-3.2 so... that certainly simplify things *a lot* :)

This wasn't a full review but as I understand it you weren't requesting one
either ;)

As for tests, if you look at the pycurl ones you'll certainly find that most
of the needed infrastructure is there to build certificates for special
purposes (self-signed, obsolete, etc) and easily setup an https server.

99 + SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 rules
100 + are mostly followed, but IP addresses are not accepted for *hostname*.

Hmm, IP addresses not accepted ? I bet our tests won't like that much.

pycurl itself was not open enough to test for these weird certificates AFAIR
but it's also a robust implementation so I wasn't concerned, but if we make
urllib the default one, we have to make sure we're making the right checks.

I'll need to dig a bit more to find back the relevant discussions about this
topic.

Also, defaulting to urllib is fine but the documentation probably needs to
be updated (created even ?) to make it clear that we're not defaulting to
pycurl. We had issues in the past (including failures to check lp
certificate) which had led to the actual defaults (urllib for http, pycurl
for https), let's make the switch crystal clear.

141 def connect_to_origin(self):
142 - ssl_sock = _ssl_wrap_socket(self.sock, self.key_file, self.cert_file)
143 + ssl_sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file,
144 + ...

Read more...

Revision history for this message
Martin Packman (gz) wrote :

I think there's general agreement this is great, and with the reqirement of Python 2.6 something that's now acheivable and should probably get into the next bzr release.

It absolutely needs some kind of automated testing though, and likely an option for disabling checks, I've used `curl -k` and `wget --no-check-certificate` enough to know I'd be annoyed if that option wasn't there.

review: Needs Fixing

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/config.py'
--- bzrlib/config.py 2011-12-19 16:41:49 +0000
+++ bzrlib/config.py 2011-12-20 09:40:32 +0000
@@ -1662,7 +1662,6 @@
1662 return value.decode(osutils.get_user_encoding())1662 return value.decode(osutils.get_user_encoding())
1663 return unicode_str1663 return unicode_str
16641664
1665
1666def _auto_user_id():1665def _auto_user_id():
1667 """Calculate automatic user identification.1666 """Calculate automatic user identification.
16681667
@@ -2761,6 +2760,13 @@
2761 help="If we wait for a new request from a client for more than"2760 help="If we wait for a new request from a client for more than"
2762 " X seconds, consider the client idle, and hangup."))2761 " X seconds, consider the client idle, and hangup."))
27632762
2763option_registry.register_lazy('ssl.ca_certs',
2764 'bzrlib.transport.http._urllib2_wrappers', 'opt_ssl_ca_certs')
2765
2766option_registry.register_lazy('ssl.cert_reqs',
2767 'bzrlib.transport.http._urllib2_wrappers', 'opt_ssl_cert_reqs')
2768
2769
27642770
2765class Section(object):2771class Section(object):
2766 """A section defines a dict of option name => value.2772 """A section defines a dict of option name => value.
27672773
=== modified file 'bzrlib/errors.py'
--- bzrlib/errors.py 2011-12-19 13:23:58 +0000
+++ bzrlib/errors.py 2011-12-20 09:40:32 +0000
@@ -1655,6 +1655,14 @@
1655 TransportError.__init__(self, msg, orig_error=orig_error)1655 TransportError.__init__(self, msg, orig_error=orig_error)
16561656
16571657
1658class CertificateError(TransportError):
1659
1660 _fmt = "Certificate error: %(error)s"
1661
1662 def __init__(self, error):
1663 self.error = error
1664
1665
1658class InvalidHttpRange(InvalidHttpResponse):1666class InvalidHttpRange(InvalidHttpResponse):
16591667
1660 _fmt = "Invalid http range %(range)r for %(path)s: %(msg)s"1668 _fmt = "Invalid http range %(range)r for %(path)s: %(msg)s"
16611669
=== modified file 'bzrlib/tests/test_http.py'
--- bzrlib/tests/test_http.py 2011-10-12 16:00:13 +0000
+++ bzrlib/tests/test_http.py 2011-12-20 09:40:32 +0000
@@ -32,7 +32,6 @@
32import bzrlib32import bzrlib
33from bzrlib import (33from bzrlib import (
34 bzrdir,34 bzrdir,
35 cethread,
36 config,35 config,
37 debug,36 debug,
38 errors,37 errors,
3938
=== modified file 'bzrlib/transport/__init__.py'
--- bzrlib/transport/__init__.py 2011-12-19 10:58:39 +0000
+++ bzrlib/transport/__init__.py 2011-12-20 09:40:32 +0000
@@ -1765,15 +1765,15 @@
1765 help="Read-only access of branches exported on the web.")1765 help="Read-only access of branches exported on the web.")
1766register_transport_proto('https://',1766register_transport_proto('https://',
1767 help="Read-only access of branches exported on the web using SSL.")1767 help="Read-only access of branches exported on the web using SSL.")
1768# The default http implementation is urllib, but https is pycurl if available1768# The default http implementation is urllib
1769register_lazy_transport('http://', 'bzrlib.transport.http._pycurl',1769register_lazy_transport('http://', 'bzrlib.transport.http._pycurl',
1770 'PyCurlTransport')1770 'PyCurlTransport')
1771register_lazy_transport('https://', 'bzrlib.transport.http._pycurl',
1772 'PyCurlTransport')
1771register_lazy_transport('http://', 'bzrlib.transport.http._urllib',1773register_lazy_transport('http://', 'bzrlib.transport.http._urllib',
1772 'HttpTransport_urllib')1774 'HttpTransport_urllib')
1773register_lazy_transport('https://', 'bzrlib.transport.http._urllib',1775register_lazy_transport('https://', 'bzrlib.transport.http._urllib',
1774 'HttpTransport_urllib')1776 'HttpTransport_urllib')
1775register_lazy_transport('https://', 'bzrlib.transport.http._pycurl',
1776 'PyCurlTransport')
17771777
1778register_transport_proto('ftp://', help="Access using passive FTP.")1778register_transport_proto('ftp://', help="Access using passive FTP.")
1779register_lazy_transport('ftp://', 'bzrlib.transport.ftp', 'FtpTransport')1779register_lazy_transport('ftp://', 'bzrlib.transport.ftp', 'FtpTransport')
17801780
=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
--- bzrlib/transport/http/_urllib2_wrappers.py 2011-12-19 13:23:58 +0000
+++ bzrlib/transport/http/_urllib2_wrappers.py 2011-12-20 09:40:32 +0000
@@ -63,12 +63,53 @@
63 config,63 config,
64 debug,64 debug,
65 errors,65 errors,
66 lazy_import,
66 osutils,67 osutils,
67 trace,68 trace,
68 transport,69 transport,
69 urlutils,70 urlutils,
70 )71 )
7172lazy_import.lazy_import(globals(), """
73import ssl
74""")
75
76
77def default_ca_certs():
78 """Find the default file with ca certificates."""
79 return u"/etc/ssl/certs/ca-certificates.crt"
80
81def default_cert_reqs():
82 return u"required"
83
84def cert_reqs_from_store(unicode_str):
85 import ssl
86 try:
87 return {
88 "required": ssl.CERT_REQUIRED,
89 "optional": ssl.CERT_OPTIONAL,
90 "none": ssl.CERT_NONE
91 }[unicode_str]
92 except KeyError:
93 raise ValueError("invalid value %s" % unicode_str)
94
95
96opt_ssl_ca_certs = config.Option('ssl.ca_certs',
97 default=default_ca_certs,
98 help="""\
99Path to certification authority certificates to trust.
100""")
101
102opt_ssl_cert_reqs = config.Option('ssl.cert_reqs',
103 default=default_cert_reqs,
104 from_unicode=cert_reqs_from_store,
105 help="""\
106Whether to require a certificate from the remote side. (default:required)
107
108Possible values:
109 * none: certificates ignored
110 * optional: Certificates not required, but validated if provided
111 * required: Certificates required, and validated
112""")
72113
73checked_kerberos = False114checked_kerberos = False
74kerberos = None115kerberos = None
@@ -312,17 +353,60 @@
312 self._wrap_socket_for_reporting(self.sock)353 self._wrap_socket_for_reporting(self.sock)
313354
314355
315# Build the appropriate socket wrapper for ssl356# These two methods were imported from Python 3.2's ssl module
316try:357
317 # python 2.6 introduced a better ssl package358def _dnsname_to_pat(dn):
318 import ssl359 pats = []
319 _ssl_wrap_socket = ssl.wrap_socket360 for frag in dn.split(r'.'):
320except ImportError:361 if frag == '*':
321 # python versions prior to 2.6 don't have ssl and ssl.wrap_socket instead362 # When '*' is a fragment by itself, it matches a non-empty dotless
322 # they use httplib.FakeSocket363 # fragment.
323 def _ssl_wrap_socket(sock, key_file, cert_file):364 pats.append('[^.]+')
324 ssl_sock = socket.ssl(sock, key_file, cert_file)365 else:
325 return httplib.FakeSocket(sock, ssl_sock)366 # Otherwise, '*' matches any dotless fragment.
367 frag = re.escape(frag)
368 pats.append(frag.replace(r'\*', '[^.]*'))
369 return re.compile(r'\A' + r'\.'.join(pats) + r'\Z', re.IGNORECASE)
370
371
372def match_hostname(cert, hostname):
373 """Verify that *cert* (in decoded format as returned by
374 SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 rules
375 are mostly followed, but IP addresses are not accepted for *hostname*.
376
377 CertificateError is raised on failure. On success, the function
378 returns nothing.
379 """
380 if not cert:
381 raise ValueError("empty or no certificate")
382 dnsnames = []
383 san = cert.get('subjectAltName', ())
384 for key, value in san:
385 if key == 'DNS':
386 if _dnsname_to_pat(value).match(hostname):
387 return
388 dnsnames.append(value)
389 if not san:
390 # The subject is only checked when subjectAltName is empty
391 for sub in cert.get('subject', ()):
392 for key, value in sub:
393 # XXX according to RFC 2818, the most specific Common Name
394 # must be used.
395 if key == 'commonName':
396 if _dnsname_to_pat(value).match(hostname):
397 return
398 dnsnames.append(value)
399 if len(dnsnames) > 1:
400 raise errors.CertificateError("hostname %r "
401 "doesn't match either of %s"
402 % (hostname, ', '.join(map(repr, dnsnames))))
403 elif len(dnsnames) == 1:
404 raise errors.CertificateError("hostname %r "
405 "doesn't match %r"
406 % (hostname, dnsnames[0]))
407 else:
408 raise errors.CertificateError("no appropriate commonName or "
409 "subjectAltName fields were found")
326410
327411
328class HTTPSConnection(AbstractHTTPConnection, httplib.HTTPSConnection):412class HTTPSConnection(AbstractHTTPConnection, httplib.HTTPSConnection):
@@ -345,7 +429,19 @@
345 self.connect_to_origin()429 self.connect_to_origin()
346430
347 def connect_to_origin(self):431 def connect_to_origin(self):
348 ssl_sock = _ssl_wrap_socket(self.sock, self.key_file, self.cert_file)432 # FIXME JRV 2011-12-18: Use location config here?
433 config_stack = config.GlobalStack()
434 ca_certs = config_stack.get('ssl.ca_certs')
435 cert_reqs = config_stack.get('ssl.cert_reqs')
436 if cert_reqs == "none":
437 trace.warning("not checking SSL certificates for %s: %d",
438 self.host, self.port)
439 ssl_sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file,
440 cert_reqs=cert_reqs, ca_certs=ca_certs)
441 peer_cert = ssl_sock.getpeercert()
442 if cert_reqs == "required" or peer_cert:
443 match_hostname(peer_cert, self.host)
444
349 # Wrap the ssl socket before anybody use it445 # Wrap the ssl socket before anybody use it
350 self._wrap_socket_for_reporting(ssl_sock)446 self._wrap_socket_for_reporting(ssl_sock)
351447
352448
=== modified file 'doc/en/release-notes/bzr-2.5.txt'
--- doc/en/release-notes/bzr-2.5.txt 2011-12-19 16:41:49 +0000
+++ doc/en/release-notes/bzr-2.5.txt 2011-12-20 09:40:32 +0000
@@ -408,6 +408,12 @@
408 to 'line-based'. This replace the previous shelf UI only patch using408 to 'line-based'. This replace the previous shelf UI only patch using
409 ``INSIDE_EMACS``. (Benoît Pierre)409 ``INSIDE_EMACS``. (Benoît Pierre)
410410
411* urllib-based HTTPS client connections now verify the server certificate
412 as well as the hostname. (Jelmer Vernooij)
413
414* urllib-based HTTPS connections are now the default.
415 (Jelmer Vernooij, #125055, #651161)
416
411Bug Fixes417Bug Fixes
412*********418*********
413419