Merge lp:~jameinel/launchpad/twisted-close-on-failure into lp:launchpad
Status: | Merged |
---|---|
Approved by: | Robert Collins |
Approved revision: | no longer in the source branch. |
Merged at revision: | 12415 |
Proposed branch: | lp:~jameinel/launchpad/twisted-close-on-failure |
Merge into: | lp:launchpad |
Diff against target: |
137 lines (+62/-31) 1 file modified
lib/lp/codehosting/sshserver/session.py (+62/-31) |
To merge this branch: | bzr merge lp:~jameinel/launchpad/twisted-close-on-failure |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Brad Crittenden (community) | code | Approve | |
Benji York (community) | code* | Approve | |
Review via email: mp+50067@code.launchpad.net |
Commit message
[r=bac,
Description of the change
This is the final step I've put together for bug #717345.
This changes the spawning code so that it more directly handles when we fail to actually connect to a newly spawned child. This should decrease the chance of leaking file handles. And with https:/
I tested this using 'ulimit -n 100', and then 'hammer_ssh.py --load=20'. So it starts running out of handles, but serves what it can on the remaining connections. Then once the load is gone, I can see all the remaining handles get closed, and the process is back to a clean state.
This branch looks like it will greatly help prevent resource exhaustion. observations for your consideration:
Approved* with a couple of questions/
(* I'm doing mentored reviews, so Brad will be reviewing my review and
giving final approval.)
I'm curious why the check (on line 26 of the diff) that the content of
"response" not start with "FAILURE" was removed? Was it because the
assert will also fire in that case? If so, note two things: 1) that's
an abuse of assert in my book: assert is for asserting the programmer
beliefs about preconditions, postconditions, and invariants, not
checking for erroneous inputs, 2) asserts are compiled out if run with
-O or -OO, so it sets up a failure mode that won't be easy for someone
to track down.
The original "assert ok == 'ok'" was fine because there was a previous
check for ok == 'FAILURE' and the intent was that the only two possible
values for ok were "ok" and "FAILURE" so at that point in the code any
value other than "ok" signaled a mismatch between the programmer's
internal model and reality.
This bit of code confuses me (it's the body of openHandleFailures sans
docstring):
fd = os.open(path, flags) on_failure. append( (os.close, fd)) Writer now handles this fileno, don't close on_failure[ -1] = (p.connectionLost, (None,)) pipes[child_ fd] = p
call_
p = proc_class(reactor, self, child_fd, fd)
# The ProcessReader/
# directly anymore
call_
self.
In particular, it appends a tuple to call_on_failure, makes a function
call that appears to be unrelated to call_on_failure and then replaces
the just-added item at the end of call_on_failure with a new tuple.
Oh wait! Is this because the responsibility for closing fd moves to p
after it is successfully created? Is that what the comment is trying to
tell me? If so, maybe a different phrasing would help. Something like:
# Now that p has been successfully created it takes on the
# responsibility of closing fd, so we'll replace the explicit close
# of fp with the connectionLost handler of p.
In the following code (lines 88 to 98 of the diff), is it possible that
any of the functions in call_on_failure might raise an exception? If
so, is another layer of try/excepts prudent?
try:
func( *args)
...
except:
for func, args in call_on_failure:
raise