Merge lp:~lifeless/bzr/bug-558343-wrong-host-with-proxy into lp:bzr

Proposed by Robert Collins
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 5220
Proposed branch: lp:~lifeless/bzr/bug-558343-wrong-host-with-proxy
Merge into: lp:bzr
Diff against target: 70 lines (+19/-13)
2 files modified
NEWS (+3/-0)
bzrlib/transport/http/_urllib2_wrappers.py (+16/-13)
To merge this branch: bzr merge lp:~lifeless/bzr/bug-558343-wrong-host-with-proxy
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
bzr-core Pending
Review via email: mp+24938@code.launchpad.net

Commit message

Fix lp: urls behind an https proxy.

Description of the change

This should fix using lp: urls from behind an https proxy; no tests - it appears remarkably opaque :(.

Would love vincent's opinion on it, as I think he knows urllib2 guts much better than I.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

Oh, and as I'm travelling to UDS, please land if its ok (or fix and land) - this is fairly high importance IMO.

Revision history for this message
Vincent Ladeuil (vila) wrote :

At first glance, the following line:

26 + self.add_unredirected_header('Host', self.proxied_host)

looks obvious.

From memory, I'm pretty sure the Host header is added just before sending the request
if it's not there already. Yet, I also have some vague feeling that playing with *this* header
led to trouble in the past (this fix could have avoid that may be).

I'd blame the lack of test for this omission. Am I right in thinking that
the bug requires a redirection behind the proxy to trigger ?

If yes, that could explain why we didn't get more reports (and the NEWS entry may
be more precise then).

Since proxy and redirection tests are pretty hackish so far, there may be a way to test
this with pre-canned responses (I think there is some test http
server than can be used for that).

I won't have time to look at it until UDS so let's talk about it there.

Otherwise, thanks for the cleanup, we may also want to use 'http' in debug_flags instead of
debug_level, that will help diagnosing issues in the future.

Revision history for this message
Robert Collins (lifeless) wrote :

Yes, when I stepped through the code, urllib2 adds a host header if
there isn't one, and its grabbing the wrong host value because
'has_proxy' is returning False.

And yes, you have to have an https proxy defined to trigger this - see
my comments in the bug about reproducing it.

Sounds like you think this is ok? Could you land it(I'm hopping on a
plane in a minute for the next leg).

Thanks,
Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

I found time to look at the code. There is already a huge list of missing tests under a FIXME there (including the one that is missing here).

And yes, the fix is right and urllib2 didn't have the bug because it didn't implement CONNECT anyway.

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

submitted to PQM by hand.

Revision history for this message
Robert Collins (lifeless) wrote :

submitted to PQM by hand.

Revision history for this message
Robert Collins (lifeless) wrote :

submitted to PQM by hand.

Revision history for this message
Robert Collins (lifeless) wrote :

submitted to PQM by hand.

Revision history for this message
Robert Collins (lifeless) wrote :

submitted to PQM by hand.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2010-05-06 06:51:11 +0000
+++ NEWS 2010-05-08 06:00:46 +0000
@@ -101,6 +101,9 @@
101 messages.101 messages.
102 (Parth Malwankar, #563646)102 (Parth Malwankar, #563646)
103103
104* Using bzr with `lp:` urls behind an http proxy should work.
105 (Robert Collins, #558343)
106
104Improvements107Improvements
105************108************
106109
107110
=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
--- bzrlib/transport/http/_urllib2_wrappers.py 2010-02-17 17:11:16 +0000
+++ bzrlib/transport/http/_urllib2_wrappers.py 2010-05-08 06:00:46 +0000
@@ -378,6 +378,11 @@
378 port = conn_class.default_port378 port = conn_class.default_port
379 self.proxied_host = '%s:%s' % (host, port)379 self.proxied_host = '%s:%s' % (host, port)
380 urllib2.Request.set_proxy(self, proxy, type)380 urllib2.Request.set_proxy(self, proxy, type)
381 # When urllib2 makes a https request with our wrapper code and a proxy,
382 # it sets Host to the https proxy, not the host we want to talk to.
383 # I'm fairly sure this is our fault, but what is the cause is an open
384 # question. -- Robert Collins May 8 2010.
385 self.add_unredirected_header('Host', self.proxied_host)
381386
382387
383class _ConnectRequest(Request):388class _ConnectRequest(Request):
@@ -711,7 +716,7 @@
711 connect = _ConnectRequest(request)716 connect = _ConnectRequest(request)
712 response = self.parent.open(connect)717 response = self.parent.open(connect)
713 if response.code != 200:718 if response.code != 200:
714 raise ConnectionError("Can't connect to %s via proxy %s" % (719 raise errors.ConnectionError("Can't connect to %s via proxy %s" % (
715 connect.proxied_host, self.host))720 connect.proxied_host, self.host))
716 # Housekeeping721 # Housekeeping
717 connection.cleanup_pipe()722 connection.cleanup_pipe()
@@ -868,21 +873,19 @@
868 print 'Will unbind %s_open for %r' % (type, proxy)873 print 'Will unbind %s_open for %r' % (type, proxy)
869 delattr(self, '%s_open' % type)874 delattr(self, '%s_open' % type)
870875
876 def bind_scheme_request(proxy, scheme):
877 if proxy is None:
878 return
879 scheme_request = scheme + '_request'
880 if self._debuglevel >= 3:
881 print 'Will bind %s for %r' % (scheme_request, proxy)
882 setattr(self, scheme_request,
883 lambda request: self.set_proxy(request, scheme))
871 # We are interested only by the http[s] proxies884 # We are interested only by the http[s] proxies
872 http_proxy = self.get_proxy_env_var('http')885 http_proxy = self.get_proxy_env_var('http')
886 bind_scheme_request(http_proxy, 'http')
873 https_proxy = self.get_proxy_env_var('https')887 https_proxy = self.get_proxy_env_var('https')
874888 bind_scheme_request(https_proxy, 'https')
875 if http_proxy is not None:
876 if self._debuglevel >= 3:
877 print 'Will bind http_request for %r' % http_proxy
878 setattr(self, 'http_request',
879 lambda request: self.set_proxy(request, 'http'))
880
881 if https_proxy is not None:
882 if self._debuglevel >= 3:
883 print 'Will bind http_request for %r' % https_proxy
884 setattr(self, 'https_request',
885 lambda request: self.set_proxy(request, 'https'))
886889
887 def get_proxy_env_var(self, name, default_to='all'):890 def get_proxy_env_var(self, name, default_to='all'):
888 """Get a proxy env var.891 """Get a proxy env var.