Merge lp:~vila/bzr/383920-pycurl-hangs into lp:~bzr/bzr/trunk-old

Proposed by Vincent Ladeuil
Status: Rejected
Rejected by: Vincent Ladeuil
Proposed branch: lp:~vila/bzr/383920-pycurl-hangs
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 196 lines
To merge this branch: bzr merge lp:~vila/bzr/383920-pycurl-hangs
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+8295@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

Nasty bug here: pycurl was hanging when connecting in https mode to our test server
due to a missing Content-Length header in the python's BaseHttpServer implementation.

We didn't realize it sooner because it's triggered only when we test pycurl against
our https test server which requires both python2.6 and pycurl, which in turn seems
to be rather unsual.

A big thank you to Pavel, Russel and Martitza for bringing that up !

Revision history for this message
Robert Collins (lifeless) wrote :

I think your analysis of the problem is slightly off.

HTTP says (broadly):
- 1.1: connections are persistent unless otherwise noted/prevented
- 1.0: connections are non-persistent unless otherwise requested and not
       prevented
- all messages are delimited
 - by Content-Length or
 - TE or
 - dropping the connection

So, if:
 - we're sending 1.1 as the version
 - we were not sending Content-Length, nor doing TE
 - and not dropping the connection

then the server was fundamentally broken, not that the client was being
overly strict.

Is the diff accurate? it seems to have bundle stuff in it...

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "robert" == Robert Collins <email address hidden> writes:

    robert> I think your analysis of the problem is slightly off.
    robert> HTTP says (broadly):
    robert> - 1.1: connections are persistent unless otherwise noted/prevented
    robert> - 1.0: connections are non-persistent unless otherwise requested and not
    robert> prevented
    robert> - all messages are delimited
    robert> - by Content-Length or
    robert> - TE or
    robert> - dropping the connection

    robert> So, if:
    robert> - we're sending 1.1 as the version
    robert> - we were not sending Content-Length, nor doing TE
    robert> - and not dropping the connection

    robert> then the server was fundamentally broken, not that
    robert> the client was being overly strict.

Right. So the test server was:
- presenting itself as 1.1,
- not sending the Content-Length,
- sending 'Connection: close',
- closing the connection

and the client hanged, so following your arguments, it means the
*client* is broken, right ? Or is there still some grey area
because the server close on error while announcing 1.1 ?

Would you prefer KnownFailures for pycurl+https instead ?

    robert> Is the diff accurate? it seems to have bundle stuff in it...

Yes to both points, the commit messages says... d-a-m-n--i-t,
commit messages don't appear in mails, <suffering/> right, I
should have think about that and explain in the cover letter :-/

A test was added there (long ago, which breaks under the same conditions
than the others, to avoid further regressions there, I refactor
the three existing tests, since there was a bunch of duplication,
that makes some noise.

     Vincent

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, 2009-07-07 at 13:15 +0000, Vincent Ladeuil wrote:

> Right. So the test server was:
> - presenting itself as 1.1,
> - not sending the Content-Length,
> - sending 'Connection: close',
> - closing the connection

This is valid. How were we closing it? shutdown(both), or halfshutdown,
then close, or ...? Perhaps this question doesn't matter, if doing
exactly the same thing but adding the header was enough to stop the
symptoms.

> and the client hanged, so following your arguments, it means the
> *client* is broken, right ? Or is there still some grey area
> because the server close on error while announcing 1.1 ?

Is it using https, or just http? I can imagine that there is some https
teardown stuff that perhaps we aren't doing right.

> Would you prefer KnownFailures for pycurl+https instead ?

I'd prefer to know the real cause. If valid http behaviour makes pycurl
hang, then real servers on the internet will be able to make bzr hang,
and thats bad. As it stands, I feel that landing this is a bit of an
anti-fix, as it is a workaround we can't reasonably ask live internet
servers to make.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

31 + def send_error(self, code, message=None):
32 + """Send and log an error reply.
33 +
34 + Replace the bogus BaseHTTPServer.py with one that add the
35 + Content-Length header to address strict http clients.
36 + """

Please add a comment pointing to the bug so we know what particular problem this causes.

+ self.send_header("Content-Length", "%d" % len(content))

I'd be a bit concerned about this being wrong if the message happens to be unicode - this will get the length in characters and then it'll be written as bytes. But maybe that never happens.

57 + if self.command != 'HEAD' and code >= 200 and code not in (204, 304):
58 + self.wfile.write(content)

This should possibly handle OPTIONS too, though I doubt we ever send it and it may not even reach this function, so it's academic.

Revision history for this message
Martin Pool (mbp) wrote :

> On Tue, 2009-07-07 at 13:15 +0000, Vincent Ladeuil wrote:
>
> > Right. So the test server was:
> > - presenting itself as 1.1,
> > - not sending the Content-Length,
> > - sending 'Connection: close',
> > - closing the connection
>
> This is valid. How were we closing it? shutdown(both), or halfshutdown,
> then close, or ...? Perhaps this question doesn't matter, if doing
> exactly the same thing but adding the header was enough to stop the
> symptoms.

I can imagine some deadlocks occurring because they're both in the same process - perhaps the socket is relying on gc collect and it's not running before the deadlock occurs.

> > Would you prefer KnownFailures for pycurl+https instead ?
>
> I'd prefer to know the real cause. If valid http behaviour makes pycurl
> hang, then real servers on the internet will be able to make bzr hang,
> and thats bad. As it stands, I feel that landing this is a bit of an
> anti-fix, as it is a workaround we can't reasonably ask live internet
> servers to make.

I don't think we should hold this patch hostage to that. It's making the test server more correct and it's fixing a real problem that the test suite hangs. If we get reports that bzr hangs against real http/1.1 servers, which I don't remember seeing so far, then we can look at how to test that.

Revision history for this message
Martin Pool (mbp) wrote :

ie approve

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-07-08 at 00:09 +0000, Martin Pool wrote:

> I can imagine some deadlocks occurring because they're both in the
> same process - perhaps the socket is relying on gc collect and it's
> not running before the deadlock occurs.

Thats another possibility yes. Good catch.

> > > Would you prefer KnownFailures for pycurl+https instead ?
> >
> > I'd prefer to know the real cause. If valid http behaviour makes
> pycurl
> > hang, then real servers on the internet will be able to make bzr
> hang,
> > and thats bad. As it stands, I feel that landing this is a bit of an
> > anti-fix, as it is a workaround we can't reasonably ask live
> internet
> > servers to make.
>
> I don't think we should hold this patch hostage to that. It's making
> the test server more correct and it's fixing a real problem that the
> test suite hangs. If we get reports that bzr hangs against real
> http/1.1 servers, which I don't remember seeing so far, then we can
> look at how to test that.

The patch *does not* make the server more correct. It does prevent
hangs, but still, I think we should not stop looking at this till we
know the actual cause. Otherwise we've only half completed the root
cause analysis.

One reason we probably don't see reports of hangs, is that error cases
are relatively rare for bzr - we generally request resources that exist.

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "robert" == Robert Collins <email address hidden> writes:

    robert> On Tue, 2009-07-07 at 13:15 +0000, Vincent Ladeuil wrote:
    >> Right. So the test server was:
    >> - presenting itself as 1.1,
    >> - not sending the Content-Length,
    >> - sending 'Connection: close',
    >> - closing the connection

    robert> This is valid. How were we closing it? shutdown(both),

No

    robert> or halfshutdown,

Not sure what you mean.

    robert> then close, or ...?

Close the read and write parts of the socket. Simple closes.

    robert> Perhaps this question doesn't matter, if doing
    robert> exactly the same thing but adding the header was
    robert> enough to stop the symptoms.

I was about to either:
- look at the C code,
- test against real servers (via my local_test_server) plugins,
- keep thinking...

And then I thought about the send_error method that I didn't
fully trust in the past and thought, well, let's just try
that.. and it worked.

    >> and the client hanged, so following your arguments, it
    >> means the *client* is broken, right ? Or is there still
    >> some grey area because the server close on error while
    >> announcing 1.1 ?

    robert> Is it using https, or just http?

https *only*, http is fine...

    robert> I can imagine that there is some https teardown stuff
    robert> that perhaps we aren't doing right.

Whatever we do, a network connection loss will not, the client is
not supposed to hang either in that case no ?

    >> Would you prefer KnownFailures for pycurl+https instead ?

    robert> I'd prefer to know the real cause.

Cost/benefit analysis stopped me there, whatever the bug is, it
came from a change in either python or pycurl as I can guarantee
that at one point I had a python2.6/https/pycurl combo passing
the full test suite. It's a pity I didn't take note of the
precise versions involved.

    robert> If valid http behaviour makes pycurl hang, then real
    robert> servers on the internet will be able to make bzr
    robert> hang, and thats bad.

I totally share that view.

    robert> As it stands, I feel that landing this is a bit of an
    robert> anti-fix, as it is a workaround we can't reasonably
    robert> ask live internet servers to make.

Iff there are servers in the wild acting as described above (1.1,
close on error without Content-Length)...

My plan for that is to keep adding real servers to the
local_test_server plugin.

Overall, I think my time is better spent that way than fixing
bugs in pycurl instead of working on certificate validation for
urllib...

        Vincent

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-07-06 06:46:30 +0000
3+++ NEWS 2009-07-08 06:35:20 +0000
4@@ -75,6 +75,10 @@
5 of a remote repository.
6 (Jelmer Vernooij, #332194)
7
8+* Fix bogus ``send_error`` in python's BaseHTTPServer by inserting the missing
9+ 'Content-Length header. https pycurl clients can hang otherwise.
10+ (Vincent Ladeuil, #383920).
11+
12 * Force deletion of readonly files during merge, update and other tree
13 transforms.
14 (Craig Hewetson, Martin Pool, #218206)
15
16=== modified file 'bzrlib/tests/http_server.py'
17--- bzrlib/tests/http_server.py 2009-03-23 14:59:43 +0000
18+++ bzrlib/tests/http_server.py 2009-07-08 06:35:20 +0000
19@@ -14,6 +14,7 @@
20 # along with this program; if not, write to the Free Software
21 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
22
23+import BaseHTTPServer
24 import errno
25 import httplib
26 import os
27@@ -144,6 +145,35 @@
28
29 return SimpleHTTPServer.SimpleHTTPRequestHandler.send_head(self)
30
31+ def send_error(self, code, message=None):
32+ """Send and log an error reply.
33+
34+ Replace the bogus BaseHTTPServer.py with one that add the
35+ Content-Length header to address strict http clients.
36+ """
37+
38+ try:
39+ short, long = self.responses[code]
40+ except KeyError:
41+ short, long = '???', '???'
42+ if message is None:
43+ message = short
44+ explain = long
45+ self.log_error("code %d, message %s", code, message)
46+ # using _quote_html to prevent Cross Site Scripting attacks (see bug
47+ # #1100201)
48+ content = (self.error_message_format %
49+ {'code': code,
50+ 'message': BaseHTTPServer._quote_html(message),
51+ 'explain': explain})
52+ self.send_response(code, message)
53+ self.send_header("Content-Length", "%d" % len(content))
54+ self.send_header("Content-Type", self.error_content_type)
55+ self.send_header('Connection', 'close')
56+ self.end_headers()
57+ if self.command != 'HEAD' and code >= 200 and code not in (204, 304):
58+ self.wfile.write(content)
59+
60 def send_range_content(self, file, start, length):
61 file.seek(start)
62 self.wfile.write(file.read(length))
63
64=== modified file 'bzrlib/tests/test_read_bundle.py'
65--- bzrlib/tests/test_read_bundle.py 2009-03-23 14:59:43 +0000
66+++ bzrlib/tests/test_read_bundle.py 2009-07-08 06:35:20 +0000
67@@ -84,70 +84,54 @@
68 class TestReadBundleFromURL(TestTransportImplementation):
69 """Test that read_bundle works properly across multiple transports"""
70
71+ def setUp(self):
72+ super(TestReadBundleFromURL, self).setUp()
73+ self.bundle_name = 'test_bundle'
74+ # read_mergeable_from_url will invoke get_transport which may *not*
75+ # respect self._transport (i.e. returns a transport that is different
76+ # from the one we want to test, so we must inject a correct transport
77+ # into possible_transports first.
78+ self.possible_transports = [self.get_transport(self.bundle_name)]
79+ self._captureVar('BZR_NO_SMART_VFS', None)
80+ wt = self.create_test_bundle()
81+
82+ def read_mergeable_from_url(self, url):
83+ return bzrlib.bundle.read_mergeable_from_url(
84+ url, possible_transports=self.possible_transports)
85+
86 def get_url(self, relpath=''):
87 return bzrlib.urlutils.join(self._server.get_url(), relpath)
88
89 def create_test_bundle(self):
90 out, wt = create_bundle_file(self)
91 if self.get_transport().is_readonly():
92- f = open('test_bundle', 'wb')
93- try:
94- f.write(out.getvalue())
95- finally:
96- f.close()
97+ self.build_tree_contents([(self.bundle_name, out.getvalue())])
98 else:
99- self.get_transport().put_file('test_bundle', out)
100- self.log('Put to: %s', self.get_url('test_bundle'))
101+ self.get_transport().put_file(self.bundle_name, out)
102+ self.log('Put to: %s', self.get_url(self.bundle_name))
103 return wt
104
105 def test_read_mergeable_from_url(self):
106- self._captureVar('BZR_NO_SMART_VFS', None)
107- wt = self.create_test_bundle()
108- if wt is None:
109- return
110- # read_mergeable_from_url will invoke get_transport which may *not*
111- # respect self._transport (i.e. returns a transport that is different
112- # from the one we want to test, so we must inject a correct transport
113- # into possible_transports first.
114- t = self.get_transport('test_bundle')
115- possible_transports = [t]
116- info = bzrlib.bundle.read_mergeable_from_url(
117- unicode(self.get_url('test_bundle')),
118- possible_transports=possible_transports)
119+ info = self.read_mergeable_from_url(
120+ unicode(self.get_url(self.bundle_name)))
121 revision = info.real_revisions[-1]
122 self.assertEqual('commit-1', revision.revision_id)
123
124 def test_read_fail(self):
125 # Trying to read from a directory, or non-bundle file
126 # should fail with NotABundle
127- self._captureVar('BZR_NO_SMART_VFS', None)
128- wt = self.create_test_bundle()
129- if wt is None:
130- return
131-
132- self.assertRaises(errors.NotABundle,
133- bzrlib.bundle.read_mergeable_from_url,
134- self.get_url('tree'))
135- self.assertRaises(errors.NotABundle,
136- bzrlib.bundle.read_mergeable_from_url,
137- self.get_url('tree/a'))
138+ self.assertRaises(errors.NotABundle,
139+ self.read_mergeable_from_url, self.get_url('tree'))
140+ self.assertRaises(errors.NotABundle,
141+ self.read_mergeable_from_url, self.get_url('tree/a'))
142
143 def test_read_mergeable_respects_possible_transports(self):
144- t = self.get_transport('test_bundle')
145- if not isinstance(t, bzrlib.transport.ConnectedTransport):
146+ if not isinstance(self.get_transport(self.bundle_name),
147+ bzrlib.transport.ConnectedTransport):
148 # There is no point testing transport reuse for not connected
149 # transports (the test will fail even).
150- return
151- self._captureVar('BZR_NO_SMART_VFS', None)
152- wt = self.create_test_bundle()
153- if wt is None:
154- return
155- # read_mergeable_from_url will invoke get_transport which may *not*
156- # respect self._transport (i.e. returns a transport that is different
157- # from the one we want to test, so we must inject a correct transport
158- # into possible_transports first.
159- possible_transports = [t]
160- url = unicode(self.get_url('test_bundle'))
161- info = bzrlib.bundle.read_mergeable_from_url(url,
162- possible_transports=possible_transports)
163- self.assertEqual(1, len(possible_transports))
164+ raise tests.TestSkipped(
165+ 'Need a ConnectedTransport to test transport reuse')
166+ url = unicode(self.get_url(self.bundle_name))
167+ info = self.read_mergeable_from_url(url)
168+ self.assertEqual(1, len(self.possible_transports))
169
170=== modified file 'bzrlib/tests/test_transport_implementations.py'
171--- bzrlib/tests/test_transport_implementations.py 2009-04-20 04:19:45 +0000
172+++ bzrlib/tests/test_transport_implementations.py 2009-07-08 06:35:21 +0000
173@@ -203,6 +203,13 @@
174 for content, f in itertools.izip(contents, content_f):
175 self.assertEqual(content, f.read())
176
177+ def test_get_unknown_file(self):
178+ t = self.get_transport()
179+ files = ['a', 'b']
180+ contents = ['contents of a\n',
181+ 'contents of b\n',
182+ ]
183+ self.build_tree(files, transport=t, line_endings='binary')
184 self.assertRaises(NoSuchFile, t.get, 'c')
185 self.assertListRaises(NoSuchFile, t.get_multi, ['a', 'b', 'c'])
186 self.assertListRaises(NoSuchFile, t.get_multi, iter(['a', 'b', 'c']))
187@@ -242,6 +249,9 @@
188 for content, fname in zip(contents, files):
189 self.assertEqual(content, t.get_bytes(fname))
190
191+ def test_get_bytes_unknown_file(self):
192+ t = self.get_transport()
193+
194 self.assertRaises(NoSuchFile, t.get_bytes, 'c')
195
196 def test_get_with_open_write_stream_sees_all_content(self):