Merge lp:~gz/bzr/test_ssh_client_medium_eintr__read_bytes into lp:bzr
- test_ssh_client_medium_eintr__read_bytes
- Merge into bzr.dev
Status: | Work in progress |
---|---|
Proposed branch: | lp:~gz/bzr/test_ssh_client_medium_eintr__read_bytes |
Merge into: | lp:bzr |
Diff against target: |
74 lines (+64/-0) 1 file modified
bzrlib/tests/test_smart_transport.py (+64/-0) |
To merge this branch: | bzr merge lp:~gz/bzr/test_ssh_client_medium_eintr__read_bytes |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Andrew Bennetts | Needs Fixing | ||
Review via email: mp+19439@code.launchpad.net |
Commit message
Description of the change
- 5038. By Martin Packman
-
Turn the expectFailure the right way round, remove sleep trailing whitespace
Martin Pool (mbp) wrote : | # |
This does make the bug more clear, thanks for posting it. Clearly we
should fix _read_bytes.
I'm not super keen to actually merge this because history shows that
any test that depends on timing interactions between threads will
intermittently fail in the future. Do you really want us to?
--
Martin <http://
Martin Packman (gz) wrote : | # |
> I'm not super keen to actually merge this because history shows that
> any test that depends on timing interactions between threads will
> intermittently fail in the future. Do you really want us to?
Your call, at the moment the code risks a random UnexpectedSuccess - which as mentioned by John AM on the list is currently harmless, but shouldn't be. When the underlying issues are fixed it'll pass, and risk wrongly passing if there's a regression, but won't cause random orange.
This kind of timing thing *is* hard to test correctly though, and it may be worth not merging just for the sanity of people reading through the module.
Andrew Bennetts (spiv) wrote : | # |
First, I've got a branch that builds on this patch at <lp:~spiv/bzr/test_ssh_client_medium_eintr__read_bytes>.
Ok, I see the issue. It appears it's only a problem with the SSH medium, not the simple pipes (--inet) or TCP media. I haven't finished digging, but it appears that the root cause is that socket_
I've rewritten your test so that it doesn't rely on timeouts or time.sleep, and so that it is an implementation test that runs against all of SmartTCPClientM
I also changed the test to check for the existence of the EINTR symbol rather than just checking the platform name. So now test should run on POSIX-like platforms that don't say os.name == 'posix', yet might be vulnerable to this bug.
I still need to do a bit more refactoring in my branch to remove duplication with existing test code, but this has finally pushed me into making per-medium implementation tests, so thank you! I'll probably put this new test in a new per_smart_medium test module.
Finally, your test is a bit overly precise about the interface. It's ok for _read_bytes to return too few (but not 0 unless the connection has ended), or even too many bytes. Of course, of the bytes it returns, it has to return them in the right order, so I've fixed the test to assert just that.
I'm not totally satisfied with the shape of the new test, but I think it's getting closer, and it does clearly pinpoint the bug. Thanks for your persistence on this issue!
We may to backport the eventual fix to 2.0, if it isn't too hairy. Thinking of which, is there a bug number for this?
Martin Packman (gz) wrote : | # |
> First, I've got a branch that builds on this patch at
> <lp:~spiv/bzr/test_ssh_client_medium_eintr__read_bytes>.
Great, thanks for looking at this, the test needed to be less specific and monolithic if it was going to stick around.
> Ok, I see the issue. It appears it's only a problem with the SSH medium, not
> the simple pipes (--inet) or TCP media. I haven't finished digging, but it
> appears that the root cause is that socket_
> file object that is buggy when EINTR occurs? If so, I think we can workaround
> that without losing the ability to handle EINTR.
I urge you to read the mailing list thread if you've not already, it included a complete list of which interfaces are affected and how in the first post. The only tricky bit is indeed SmartSSHClientM
> I also changed the test to check for the existence of the EINTR symbol rather
> than just checking the platform name. So now test should run on POSIX-like
> platforms that don't say os.name == 'posix', yet might be vulnerable to this
> bug.
That is incorrect. EINTR *exists* on other platforms but this PC loser-ing issue is specific to unix.
> Finally, your test is a bit overly precise about the interface. It's ok for
> _read_bytes to return too few (but not 0 unless the connection has ended), or
> even too many bytes. Of course, of the bytes it returns, it has to return
> them in the right order, so I've fixed the test to assert just that.
It's demonstrating a specific problem with one implementation, poking it to work with others too is fine.
> I'm not totally satisfied with the shape of the new test, but I think it's
> getting closer, and it does clearly pinpoint the bug. Thanks for your
> persistence on this issue!
Your changes keeps around an interface I'm trying to junk and is a little harder to reason about, but losing the sleeps is good. Please CC me on the review when you post it, as I've some specific comments.
> We may to backport the eventual fix to 2.0, if it isn't too hairy. Thinking
> of which, is there a bug number for this?
Nope, given that there are about five different bugs with the same underlying cause, it seemed easier to take it to the mailing list so all the discussion stayed in one place.
Andrew Bennetts (spiv) wrote : | # |
Martin [gz] wrote:
[...]
> I urge you to read the mailing list thread if you've not already, it included
> a complete list of which interfaces are affected and how in the first post.
> The only tricky bit is indeed SmartSSHClientM
> upstream.
I have read it, but will re-read. It's been a long, slightly convoluted
discussion!
> > I also changed the test to check for the existence of the EINTR symbol rather
> > than just checking the platform name. So now test should run on POSIX-like
> > platforms that don't say os.name == 'posix', yet might be vulnerable to this
> > bug.
>
> That is incorrect. EINTR *exists* on other platforms but this PC loser-ing
> issue is specific to unix.
But isn't it valid to test for correct behaviour in the face of EINTR on all
platforms that have EINTR?
[...]
> > I'm not totally satisfied with the shape of the new test, but I think it's
> > getting closer, and it does clearly pinpoint the bug. Thanks for your
> > persistence on this issue!
>
> Your changes keeps around an interface I'm trying to junk and is a little
> harder to reason about, but losing the sleeps is good. Please CC me on the
> review when you post it, as I've some specific comments.
Which interface is it that you're trying to junk?
I'll definitely CC you on the merge proposal.
> > We may to backport the eventual fix to 2.0, if it isn't too hairy. Thinking
> > of which, is there a bug number for this?
>
> Nope, given that there are about five different bugs with the same underlying
> cause, it seemed easier to take it to the mailing list so all the discussion
> stayed in one place.
I see what you mean. On the other hand, I have to search for and then read
through a multi-message thread with no clear summary. Neither a cluster of
bugs or a mailing list thread is entirely satisfying. Hmm.
Thanks for your comments!
Martin Packman (gz) wrote : | # |
> But isn't it valid to test for correct behaviour in the face of EINTR on all
> platforms that have EINTR?
Depends on the test. For mocks, I think it's useful, when dealing with real platform code where the outcome will either be a spurious pass or a hang depending on the details, I think it's clearer to skip. Note that os.name == "posix" covers linux, bsd (which actually has a different signal interrupt paradigm but python switches into the posix style), mac os x, solaris... Name me a platform bzr supports that you're worried about skipping the test on?
> Which interface is it that you're trying to junk?
I see you've now found:
<https:/
Andrew Bennetts (spiv) wrote : | # |
Martin [gz] wrote:
> > But isn't it valid to test for correct behaviour in the face of EINTR on all
> > platforms that have EINTR?
>
> Depends on the test. For mocks, I think it's useful, when dealing with real
> platform code where the outcome will either be a spurious pass or a hang
> depending on the details, I think it's clearer to skip. Note that os.name ==
> "posix" covers linux, bsd (which actually has a different signal interrupt
> paradigm but python switches into the posix style), mac os x, solaris... Name
> me a platform bzr supports that you're worried about skipping the test on?
It's more that I'd rather *not* worry about making sure all relevant platforms
are covered, and instead aim for the broadest possible coverage. If that turns
out to be impractical then os.name == 'posix' will do.
Martin Packman (gz) wrote : | # |
Changing the status to WIP here as Andrew has a branch building on this.
Unmerged revisions
- 5038. By Martin Packman
-
Turn the expectFailure the right way round, remove sleep trailing whitespace
- 5037. By Martin Packman
-
Test demonstrating a few of the issues with the current code attempting to handle EINTR being raised
Preview Diff
1 | === modified file 'bzrlib/tests/test_smart_transport.py' |
2 | --- bzrlib/tests/test_smart_transport.py 2010-02-11 09:27:55 +0000 |
3 | +++ bzrlib/tests/test_smart_transport.py 2010-02-16 23:34:14 +0000 |
4 | @@ -367,6 +367,70 @@ |
5 | client_medium.disconnect() |
6 | self.assertEqual(['flush'], flush_calls) |
7 | |
8 | + def test_ssh_client_medium_eintr__read_bytes(self): |
9 | + """ |
10 | + Verify that _read_bytes is robust against EINTR being raised |
11 | + |
12 | + It's probably not possible to test this in such a way as it won't |
13 | + sometimes randomly pass, but this fails in the expected manners |
14 | + pretty reliably. |
15 | + """ |
16 | + if os.name != "posix": |
17 | + raise tests.TestNotApplicable("Only nix needs to handle EINTR") |
18 | + import errno, signal, sys, time |
19 | + # Use timeouts we can join the server thread without risking hanging |
20 | + timeout = 3 |
21 | + sock_server = socket.socket() |
22 | + sock_server.settimeout(timeout) |
23 | + sock_server.bind(("127.0.0.1", 0)) |
24 | + sock_server.listen(1) |
25 | + about_to_read = threading.Event() |
26 | + def _send_with_interrupt(): |
27 | + """Send a message split by a signal to a connecting client""" |
28 | + sock_response, addr = sock_server.accept() |
29 | + about_to_read.wait(timeout) |
30 | + sock_response.sendall("head") |
31 | + # Let the main thread recv before raising signal |
32 | + time.sleep(0.1) |
33 | + os.kill(os.getpid(), signal.SIGUSR1) |
34 | + # Let the signal be handled before sending more |
35 | + time.sleep(0.1) |
36 | + sock_response.sendall("tail") |
37 | + server_thread = threading.Thread(target=_send_with_interrupt) |
38 | + siglog = [] |
39 | + def _record_signal(signum, frame): |
40 | + """Record the signal received and the functions being called""" |
41 | + stack = [] |
42 | + while frame: |
43 | + co_name = frame.f_code.co_name |
44 | + if co_name == "test_ssh_client_medium_eintr__read_bytes": |
45 | + break |
46 | + stack.append(co_name) |
47 | + frame = frame.f_back |
48 | + siglog.append((signum, stack)) |
49 | + self.addCleanup(signal.signal, signal.SIGUSR1, |
50 | + signal.signal(signal.SIGUSR1, _record_signal)) |
51 | + server_thread.start() |
52 | + self.addCleanup(server_thread.join) |
53 | + sock_request = socket.socket() |
54 | + sock_request.connect(sock_server.getsockname()) |
55 | + addr, port = sock_request.getsockname() |
56 | + client_medium = medium.SmartSSHClientMedium(addr, port, |
57 | + vendor=StringIOSSHVendor(sock_request.makefile(), None)) |
58 | + client_medium._ensure_connection() |
59 | + about_to_read.set() |
60 | + try: |
61 | + bytes = client_medium._read_bytes(8) |
62 | + except socket.error, e: |
63 | + if e.args[0] == errno.EINTR and sys.version_info < (2, 6): |
64 | + self.knownFailure("Interruptions not handled on Python < 2.6") |
65 | + raise |
66 | + self.assertEqual(siglog, |
67 | + [(signal.SIGUSR1, ["read", "until_no_eintr", "_read_bytes"])]) |
68 | + self.expectFailure("An interrupted read loses bytes already read", |
69 | + self.assertNotEqual, bytes, "tail") |
70 | + self.assertEqual(bytes, "headtail") |
71 | + |
72 | def test_construct_smart_tcp_client_medium(self): |
73 | # the TCP client medium takes a host and a port. Constructing it won't |
74 | # connect to anything. |
For background, see this thread (and particularly the three changesets linked ffrom the initial post): /lists. ubuntu. com/archives/ bazaar/ 2010q1/ 066869. html>
<https:/
There seemed to be some doubt about exactly how harmful the current code attempting to handle EINTR is, despite my best attempts to explain it, so here's a test that demonstrates a couple of the failure states. It uses a reading method, but these problems apply equally to those that write, it's just harder to get them to block where you want reliably enough to make an automated test.
As an aside, testtools doesn't seem to do anything sensible with TestCase. knownFailure (and the TestCase. expectFailure isn't much better).