Merge lp:~jelmer/bzr/urllib-verifies-ssl-certs into lp:bzr

Proposed by Jelmer Vernooij
Status: Superseded
Proposed branch: lp:~jelmer/bzr/urllib-verifies-ssl-certs
Merge into: lp:bzr
Diff against target: 247 lines (+133/-18)
6 files modified
bzrlib/config.py (+7/-1)
bzrlib/errors.py (+8/-0)
bzrlib/tests/test_http.py (+0/-1)
bzrlib/transport/__init__.py (+3/-3)
bzrlib/transport/http/_urllib2_wrappers.py (+109/-13)
doc/en/release-notes/bzr-2.5.txt (+6/-0)
To merge this branch: bzr merge lp:~jelmer/bzr/urllib-verifies-ssl-certs
Reviewer Review Type Date Requested Status
Martin Packman (community) Needs Fixing
Review via email: mp+81227@code.launchpad.net

This proposal has been superseded by a proposal from 2011-12-20.

Description of the change

Add support for verifying SSL certificates when using urllib.

This branch does actually work, but it's got some open ends:

* The location with the CAs to trust differs per system. Rather than hardcoding
  a list, I think we should have a configuration option that allows the user
  to specify what CA list they want to verify against.

  This option could then also be pointed at a user-specific list
  (rather than a system wide list). And packagers can put in the default
  for their distribution.

  Where do I obtain this option from, which config stack ?

* We probably want some way to specify that self-signed certificates are ok.
  Probably also a new option? We should probably still warn the user
  when a self-signed certificate is being trusted.

* There are no additional tests. There probably should be some, at least for
  e.g. the match_hostname helper.

* I have the feeling that I'm missing something. This was way too easy.
  What is it?

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

On 4 November 2011 14:03, Jelmer Vernooij <email address hidden> wrote:
> Jelmer Vernooij has proposed merging lp:~jelmer/bzr/urllib-verifies-ssl-certs into lp:bzr.
>
> Requested reviews:
>  bzr-core (bzr-core)
> Related bugs:
>  Bug #125055 in Bazaar: "defaulting to pycurl doesn't make sense anymore"
>  https://bugs.launchpad.net/bzr/+bug/125055
>  Bug #651161 in Bazaar: "bzr fails to verify ssl validity in https connections - by default --> as pycurl isn't a dep only a suggestion "
>  https://bugs.launchpad.net/bzr/+bug/651161
>
> For more details, see:
> https://code.launchpad.net/~jelmer/bzr/urllib-verifies-ssl-certs/+merge/81227
>
> Add support for verifying SSL certificates when using urllib.

That's great, thanks.

> This branch does actually work, but it's got some open ends:
>
> * The location with the CAs to trust differs per system. Rather than hardcoding
>  a list, I think we should have a configuration option that allows the user
>  to specify what CA list they want to verify against.
>
>  This option could then also be pointed at a user-specific list
>  (rather than a system wide list). And packagers can put in the default
>  for their distribution.
>
>  Where do I obtain this option from, which config stack ?

[needsfixing] I think the global stack would be a good place to start
- it would seem like a kind of contrived case to want different values
on a smaller scope. We can always change it later.

> * We probably want some way to specify that self-signed certificates are ok.
>  Probably also a new option? We should probably still warn the user
>  when a self-signed certificate is being trusted.

+1 to a new option and to always warning.

Though perhaps it would be equivalent and easier to just have an
option saying "don't do validation."

> * There are no additional tests. There probably should be some, at least for
>  e.g. the match_hostname helper.

It would be nice, if it's not too hard, to actually test against an
untrusted cert, or at least to do monkeypatching to make it look like
we got an invalid cert.

[needsfixing] I'd like a unit test for match_hostname covering the
various cases.

> * I have the feeling that I'm missing something. This was way too easy.
>  What is it?

I haven't read the rfc but it looks plausible. You could wait for a
review from vila, who has probably touched this most, or robert who is
an http nerd.

I wonder if this needs a change to Windows or Mac packaging to make
sure the certs are there. Or, possibly we can get them from the
system's keyring in some way.

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (4.0 KiB)

> Add support for verifying SSL certificates when using urllib.

\o/

>
> This branch does actually work, but it's got some open ends:
>
> * The location with the CAs to trust differs per system. Rather than hardcoding
> a list, I think we should have a configuration option that allows the user
> to specify what CA list they want to verify against.

pycurl already handles that for windows but getting it right in the general
case... sounds like a packaging issue :-/

>
> This option could then also be pointed at a user-specific list
> (rather than a system wide list). And packagers can put in the default
> for their distribution.
>
> Where do I obtain this option from, which config stack ?

I'd go with global to start with specifying the default value at
registration time if possible.

Arguably it's a system-wide config file that should be used, making packager
life easier.

>
> * We probably want some way to specify that self-signed certificates are ok.
> Probably also a new option? We should probably still warn the user
> when a self-signed certificate is being trusted.

authentication.conf is the right place for that and the option is even
documented but not implemented ;)

>
> * There are no additional tests. There probably should be some, at least for
> e.g. the match_hostname helper.

Yup. There are already https tests for pycurl that should be activated now
(<broken record> hey, did I mention the 'scenarios' attribute is cool for
that ? </broken record> ;)

>
> * I have the feeling that I'm missing something. This was way too easy.
> What is it?

Hehe.

The last time I worked on this area we were still supporting python-2.4/2.5
where https wasn't supported. Now, you also found the relevant code in
python-3.2 so... that certainly simplify things *a lot* :)

This wasn't a full review but as I understand it you weren't requesting one
either ;)

As for tests, if you look at the pycurl ones you'll certainly find that most
of the needed infrastructure is there to build certificates for special
purposes (self-signed, obsolete, etc) and easily setup an https server.

99 + SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 rules
100 + are mostly followed, but IP addresses are not accepted for *hostname*.

Hmm, IP addresses not accepted ? I bet our tests won't like that much.

pycurl itself was not open enough to test for these weird certificates AFAIR
but it's also a robust implementation so I wasn't concerned, but if we make
urllib the default one, we have to make sure we're making the right checks.

I'll need to dig a bit more to find back the relevant discussions about this
topic.

Also, defaulting to urllib is fine but the documentation probably needs to
be updated (created even ?) to make it clear that we're not defaulting to
pycurl. We had issues in the past (including failures to check lp
certificate) which had led to the actual defaults (urllib for http, pycurl
for https), let's make the switch crystal clear.

141 def connect_to_origin(self):
142 - ssl_sock = _ssl_wrap_socket(self.sock, self.key_file, self.cert_file)
143 + ssl_sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file,
144 + ...

Read more...

Revision history for this message
Martin Packman (gz) wrote :

I think there's general agreement this is great, and with the reqirement of Python 2.6 something that's now acheivable and should probably get into the next bzr release.

It absolutely needs some kind of automated testing though, and likely an option for disabling checks, I've used `curl -k` and `wget --no-check-certificate` enough to know I'd be annoyed if that option wasn't there.

review: Needs Fixing
6243. By Jelmer Vernooij

Merge bzr.dev.

6244. By Jelmer Vernooij

Add ssl.ca_certificates setting.

6245. By Jelmer Vernooij

Add ssl.cert_reqs.

6246. By Jelmer Vernooij

Move options to bzrlib/transport/http/_urllib2_wrappers.py

6247. By Jelmer Vernooij

Add some tests for ssl.ca_certs config option.

6248. By Jelmer Vernooij

Add more tests for ssl.ca_certs option.

6249. By Jelmer Vernooij

add basic tests for match_hostname.

6250. By Jelmer Vernooij

Note new SSL certificate verification in whats-new.

6251. By Jelmer Vernooij

add notes on using -Ossl.cert_reqs=none.

6252. By Jelmer Vernooij

Urgh - only warn if there actually isn't a ca certs file set.

6253. By Jelmer Vernooij

merge bzr.dev.

6254. By Jelmer Vernooij

Merge bzr.dev.

6255. By Jelmer Vernooij

merge bzr.dev.

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/config.py'
2--- bzrlib/config.py 2011-12-19 16:41:49 +0000
3+++ bzrlib/config.py 2011-12-20 09:40:32 +0000
4@@ -1662,7 +1662,6 @@
5 return value.decode(osutils.get_user_encoding())
6 return unicode_str
7
8-
9 def _auto_user_id():
10 """Calculate automatic user identification.
11
12@@ -2761,6 +2760,13 @@
13 help="If we wait for a new request from a client for more than"
14 " X seconds, consider the client idle, and hangup."))
15
16+option_registry.register_lazy('ssl.ca_certs',
17+ 'bzrlib.transport.http._urllib2_wrappers', 'opt_ssl_ca_certs')
18+
19+option_registry.register_lazy('ssl.cert_reqs',
20+ 'bzrlib.transport.http._urllib2_wrappers', 'opt_ssl_cert_reqs')
21+
22+
23
24 class Section(object):
25 """A section defines a dict of option name => value.
26
27=== modified file 'bzrlib/errors.py'
28--- bzrlib/errors.py 2011-12-19 13:23:58 +0000
29+++ bzrlib/errors.py 2011-12-20 09:40:32 +0000
30@@ -1655,6 +1655,14 @@
31 TransportError.__init__(self, msg, orig_error=orig_error)
32
33
34+class CertificateError(TransportError):
35+
36+ _fmt = "Certificate error: %(error)s"
37+
38+ def __init__(self, error):
39+ self.error = error
40+
41+
42 class InvalidHttpRange(InvalidHttpResponse):
43
44 _fmt = "Invalid http range %(range)r for %(path)s: %(msg)s"
45
46=== modified file 'bzrlib/tests/test_http.py'
47--- bzrlib/tests/test_http.py 2011-10-12 16:00:13 +0000
48+++ bzrlib/tests/test_http.py 2011-12-20 09:40:32 +0000
49@@ -32,7 +32,6 @@
50 import bzrlib
51 from bzrlib import (
52 bzrdir,
53- cethread,
54 config,
55 debug,
56 errors,
57
58=== modified file 'bzrlib/transport/__init__.py'
59--- bzrlib/transport/__init__.py 2011-12-19 10:58:39 +0000
60+++ bzrlib/transport/__init__.py 2011-12-20 09:40:32 +0000
61@@ -1765,15 +1765,15 @@
62 help="Read-only access of branches exported on the web.")
63 register_transport_proto('https://',
64 help="Read-only access of branches exported on the web using SSL.")
65-# The default http implementation is urllib, but https is pycurl if available
66+# The default http implementation is urllib
67 register_lazy_transport('http://', 'bzrlib.transport.http._pycurl',
68 'PyCurlTransport')
69+register_lazy_transport('https://', 'bzrlib.transport.http._pycurl',
70+ 'PyCurlTransport')
71 register_lazy_transport('http://', 'bzrlib.transport.http._urllib',
72 'HttpTransport_urllib')
73 register_lazy_transport('https://', 'bzrlib.transport.http._urllib',
74 'HttpTransport_urllib')
75-register_lazy_transport('https://', 'bzrlib.transport.http._pycurl',
76- 'PyCurlTransport')
77
78 register_transport_proto('ftp://', help="Access using passive FTP.")
79 register_lazy_transport('ftp://', 'bzrlib.transport.ftp', 'FtpTransport')
80
81=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
82--- bzrlib/transport/http/_urllib2_wrappers.py 2011-12-19 13:23:58 +0000
83+++ bzrlib/transport/http/_urllib2_wrappers.py 2011-12-20 09:40:32 +0000
84@@ -63,12 +63,53 @@
85 config,
86 debug,
87 errors,
88+ lazy_import,
89 osutils,
90 trace,
91 transport,
92 urlutils,
93 )
94-
95+lazy_import.lazy_import(globals(), """
96+import ssl
97+""")
98+
99+
100+def default_ca_certs():
101+ """Find the default file with ca certificates."""
102+ return u"/etc/ssl/certs/ca-certificates.crt"
103+
104+def default_cert_reqs():
105+ return u"required"
106+
107+def cert_reqs_from_store(unicode_str):
108+ import ssl
109+ try:
110+ return {
111+ "required": ssl.CERT_REQUIRED,
112+ "optional": ssl.CERT_OPTIONAL,
113+ "none": ssl.CERT_NONE
114+ }[unicode_str]
115+ except KeyError:
116+ raise ValueError("invalid value %s" % unicode_str)
117+
118+
119+opt_ssl_ca_certs = config.Option('ssl.ca_certs',
120+ default=default_ca_certs,
121+ help="""\
122+Path to certification authority certificates to trust.
123+""")
124+
125+opt_ssl_cert_reqs = config.Option('ssl.cert_reqs',
126+ default=default_cert_reqs,
127+ from_unicode=cert_reqs_from_store,
128+ help="""\
129+Whether to require a certificate from the remote side. (default:required)
130+
131+Possible values:
132+ * none: certificates ignored
133+ * optional: Certificates not required, but validated if provided
134+ * required: Certificates required, and validated
135+""")
136
137 checked_kerberos = False
138 kerberos = None
139@@ -312,17 +353,60 @@
140 self._wrap_socket_for_reporting(self.sock)
141
142
143-# Build the appropriate socket wrapper for ssl
144-try:
145- # python 2.6 introduced a better ssl package
146- import ssl
147- _ssl_wrap_socket = ssl.wrap_socket
148-except ImportError:
149- # python versions prior to 2.6 don't have ssl and ssl.wrap_socket instead
150- # they use httplib.FakeSocket
151- def _ssl_wrap_socket(sock, key_file, cert_file):
152- ssl_sock = socket.ssl(sock, key_file, cert_file)
153- return httplib.FakeSocket(sock, ssl_sock)
154+# These two methods were imported from Python 3.2's ssl module
155+
156+def _dnsname_to_pat(dn):
157+ pats = []
158+ for frag in dn.split(r'.'):
159+ if frag == '*':
160+ # When '*' is a fragment by itself, it matches a non-empty dotless
161+ # fragment.
162+ pats.append('[^.]+')
163+ else:
164+ # Otherwise, '*' matches any dotless fragment.
165+ frag = re.escape(frag)
166+ pats.append(frag.replace(r'\*', '[^.]*'))
167+ return re.compile(r'\A' + r'\.'.join(pats) + r'\Z', re.IGNORECASE)
168+
169+
170+def match_hostname(cert, hostname):
171+ """Verify that *cert* (in decoded format as returned by
172+ SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 rules
173+ are mostly followed, but IP addresses are not accepted for *hostname*.
174+
175+ CertificateError is raised on failure. On success, the function
176+ returns nothing.
177+ """
178+ if not cert:
179+ raise ValueError("empty or no certificate")
180+ dnsnames = []
181+ san = cert.get('subjectAltName', ())
182+ for key, value in san:
183+ if key == 'DNS':
184+ if _dnsname_to_pat(value).match(hostname):
185+ return
186+ dnsnames.append(value)
187+ if not san:
188+ # The subject is only checked when subjectAltName is empty
189+ for sub in cert.get('subject', ()):
190+ for key, value in sub:
191+ # XXX according to RFC 2818, the most specific Common Name
192+ # must be used.
193+ if key == 'commonName':
194+ if _dnsname_to_pat(value).match(hostname):
195+ return
196+ dnsnames.append(value)
197+ if len(dnsnames) > 1:
198+ raise errors.CertificateError("hostname %r "
199+ "doesn't match either of %s"
200+ % (hostname, ', '.join(map(repr, dnsnames))))
201+ elif len(dnsnames) == 1:
202+ raise errors.CertificateError("hostname %r "
203+ "doesn't match %r"
204+ % (hostname, dnsnames[0]))
205+ else:
206+ raise errors.CertificateError("no appropriate commonName or "
207+ "subjectAltName fields were found")
208
209
210 class HTTPSConnection(AbstractHTTPConnection, httplib.HTTPSConnection):
211@@ -345,7 +429,19 @@
212 self.connect_to_origin()
213
214 def connect_to_origin(self):
215- ssl_sock = _ssl_wrap_socket(self.sock, self.key_file, self.cert_file)
216+ # FIXME JRV 2011-12-18: Use location config here?
217+ config_stack = config.GlobalStack()
218+ ca_certs = config_stack.get('ssl.ca_certs')
219+ cert_reqs = config_stack.get('ssl.cert_reqs')
220+ if cert_reqs == "none":
221+ trace.warning("not checking SSL certificates for %s: %d",
222+ self.host, self.port)
223+ ssl_sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file,
224+ cert_reqs=cert_reqs, ca_certs=ca_certs)
225+ peer_cert = ssl_sock.getpeercert()
226+ if cert_reqs == "required" or peer_cert:
227+ match_hostname(peer_cert, self.host)
228+
229 # Wrap the ssl socket before anybody use it
230 self._wrap_socket_for_reporting(ssl_sock)
231
232
233=== modified file 'doc/en/release-notes/bzr-2.5.txt'
234--- doc/en/release-notes/bzr-2.5.txt 2011-12-19 16:41:49 +0000
235+++ doc/en/release-notes/bzr-2.5.txt 2011-12-20 09:40:32 +0000
236@@ -408,6 +408,12 @@
237 to 'line-based'. This replace the previous shelf UI only patch using
238 ``INSIDE_EMACS``. (Benoît Pierre)
239
240+* urllib-based HTTPS client connections now verify the server certificate
241+ as well as the hostname. (Jelmer Vernooij)
242+
243+* urllib-based HTTPS connections are now the default.
244+ (Jelmer Vernooij, #125055, #651161)
245+
246 Bug Fixes
247 *********
248