Merge lp:~jameinel/launchpad/lp-forking-serve-cleaner-childre into lp:launchpad

Proposed by John A Meinel
Status: Merged
Approved by: Curtis Hovey
Approved revision: no longer in the source branch.
Merged at revision: 12414
Proposed branch: lp:~jameinel/launchpad/lp-forking-serve-cleaner-childre
Merge into: lp:launchpad
Diff against target: 57 lines (+21/-17)
1 file modified
bzrplugins/lpserve/__init__.py (+21/-17)
To merge this branch: bzr merge lp:~jameinel/launchpad/lp-forking-serve-cleaner-childre
Reviewer Review Type Date Requested Status
Curtis Hovey (community) code Approve
j.c.sackett (community) code* Approve
Review via email: mp+50031@code.launchpad.net

Commit message

[r=jcsackett,sinzui][no-qa] Force lp-forking-service children to exit on any error, rather than falling back to master code.

Description of the change

This is one step along the way to squashing bug #717345.

I finally was able to trigger the bug manually, and the first thing I noticed was if the children get interrupted at the right place, they revert to being a master process.

So this squashes all of those with a try/finally.

I think what was happening is that they were blocked trying to open their new stdin, and then SIGINT triggers them to wakeup and start exiting, and then they think they can run as a master process.

I did not add a test for it, because it requires a fair amount of trickery to stop the process at just the right time, and it is hard to detect. If we feel it is required, I might be able to put something together. (Request a fork, but don't bind to it, send it SIGINT, and make sure it exits with code 127. But I don't know how to check the return code of a process that isn't your direct child...)

To post a comment you must log in.
Revision history for this message
j.c.sackett (jcsackett) wrote :

John--

I can't say I'm 100% comfortable with approving this, given there's no tests. fork configurations and processes have already caused on incident report, so I think we need more confidence that this works as intended before approving it.

I'm not claiming this as a review, b/c you might find someone who knows more about the problem area and can approve this with some confidence. I do urge you to look into testing options for it; that might not be test coverage, but we should see how this code works in some fashion.

Revision history for this message
j.c.sackett (jcsackett) wrote :

After a conversation in IRC, I realize I have misunderstood the situation. Given that, I think is fine.

6:22 PM jam
jcsacket: ... I have run the specific code quite thoroughly, and there are some tests that already cover it.
6:22 PM
It is a little bit hard to inject specific failures into that exact point. If you feel it is critical, we can try to figure something out
6:22 PM
but I did do a lot of manual testing.

6:22 PM jcsackett
jam: ah, in your MP you said there were no tests; i took that to mean no testing. :-)

6:22 PM jam
no new tests covering what changed

6:23 PM jcsackett
dig.

review: Approve (code*)
Revision history for this message
Curtis Hovey (sinzui) wrote :

This looks good to land.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrplugins/lpserve/__init__.py'
2--- bzrplugins/lpserve/__init__.py 2010-11-08 22:43:58 +0000
3+++ bzrplugins/lpserve/__init__.py 2011-02-17 21:50:50 +0000
4@@ -424,14 +424,23 @@
5
6 def become_child(self, command_argv, path):
7 """We are in the spawned child code, do our magic voodoo."""
8- # Stop tracking new signals
9- self._unregister_signals()
10- # Reset the start time
11- trace._bzr_log_start_time = time.time()
12- trace.mutter('%d starting %r'
13- % (os.getpid(), command_argv))
14- self._bind_child_file_descriptors(path)
15- self._run_child_command(command_argv)
16+ retcode = 127 # Failed in a bad way, poor cleanup, etc.
17+ try:
18+ # Stop tracking new signals
19+ self._unregister_signals()
20+ # Reset the start time
21+ trace._bzr_log_start_time = time.time()
22+ trace.mutter('%d starting %r'
23+ % (os.getpid(), command_argv))
24+ self._bind_child_file_descriptors(path)
25+ retcode = self._run_child_command(command_argv)
26+ finally:
27+ # We force os._exit() here, because we don't want to unwind the
28+ # stack, which has complex results. (We can get it to unwind back
29+ # to the cmd_launchpad_forking_service code, and even back to
30+ # main() reporting thereturn code, but after that, suddenly the
31+ # return code changes from a '0' to a '1', with no logging of info.
32+ os._exit(retcode)
33
34 def _run_child_command(self, command_argv):
35 # This is the point where we would actually want to do something with
36@@ -447,17 +456,12 @@
37 self._close_child_file_descriptors()
38 trace.mutter('%d finished %r'
39 % (os.getpid(), command_argv))
40- # We force os._exit() here, because we don't want to unwind the stack,
41- # which has complex results. (We can get it to unwind back to the
42- # cmd_launchpad_forking_service code, and even back to main() reporting
43- # thereturn code, but after that, suddenly the return code changes from
44- # a '0' to a '1', with no logging of info.
45- # TODO: Should we call sys.exitfunc() here? it allows atexit functions
46- # to fire, however, some of those may be still around from the
47- # parent process, which we don't really want.
48+ # TODO: Should we call sys.exitfunc() here? it allows atexit
49+ # functions to fire, however, some of those may be still
50+ # around from the parent process, which we don't really want.
51 sys.exitfunc()
52 # See [Decision #6]
53- os._exit(retcode)
54+ return retcode
55
56 @staticmethod
57 def command_to_argv(command_str):