Merge lp:~jameinel/launchpad/lp-forking-serve-cleaner-childre into lp:launchpad

Proposed by John A Meinel
Status: Merged
Approved by: Curtis Hovey
Approved revision: no longer in the source branch.
Merged at revision: 12414
Proposed branch: lp:~jameinel/launchpad/lp-forking-serve-cleaner-childre
Merge into: lp:launchpad
Diff against target: 57 lines (+21/-17)
1 file modified
bzrplugins/lpserve/__init__.py (+21/-17)
To merge this branch: bzr merge lp:~jameinel/launchpad/lp-forking-serve-cleaner-childre
Reviewer Review Type Date Requested Status
Curtis Hovey (community) code Approve
j.c.sackett (community) code* Approve
Review via email: mp+50031@code.launchpad.net

Commit message

[r=jcsackett,sinzui][no-qa] Force lp-forking-service children to exit on any error, rather than falling back to master code.

Description of the change

This is one step along the way to squashing bug #717345.

I finally was able to trigger the bug manually, and the first thing I noticed was if the children get interrupted at the right place, they revert to being a master process.

So this squashes all of those with a try/finally.

I think what was happening is that they were blocked trying to open their new stdin, and then SIGINT triggers them to wakeup and start exiting, and then they think they can run as a master process.

I did not add a test for it, because it requires a fair amount of trickery to stop the process at just the right time, and it is hard to detect. If we feel it is required, I might be able to put something together. (Request a fork, but don't bind to it, send it SIGINT, and make sure it exits with code 127. But I don't know how to check the return code of a process that isn't your direct child...)

To post a comment you must log in.
Revision history for this message
j.c.sackett (jcsackett) wrote :

John--

I can't say I'm 100% comfortable with approving this, given there's no tests. fork configurations and processes have already caused on incident report, so I think we need more confidence that this works as intended before approving it.

I'm not claiming this as a review, b/c you might find someone who knows more about the problem area and can approve this with some confidence. I do urge you to look into testing options for it; that might not be test coverage, but we should see how this code works in some fashion.

Revision history for this message
j.c.sackett (jcsackett) wrote :

After a conversation in IRC, I realize I have misunderstood the situation. Given that, I think is fine.

6:22 PM jam
jcsacket: ... I have run the specific code quite thoroughly, and there are some tests that already cover it.
6:22 PM
It is a little bit hard to inject specific failures into that exact point. If you feel it is critical, we can try to figure something out
6:22 PM
but I did do a lot of manual testing.

6:22 PM jcsackett
jam: ah, in your MP you said there were no tests; i took that to mean no testing. :-)

6:22 PM jam
no new tests covering what changed

6:23 PM jcsackett
dig.

review: Approve (code*)
Revision history for this message
Curtis Hovey (sinzui) wrote :

This looks good to land.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrplugins/lpserve/__init__.py'
--- bzrplugins/lpserve/__init__.py 2010-11-08 22:43:58 +0000
+++ bzrplugins/lpserve/__init__.py 2011-02-17 21:50:50 +0000
@@ -424,14 +424,23 @@
424424
425 def become_child(self, command_argv, path):425 def become_child(self, command_argv, path):
426 """We are in the spawned child code, do our magic voodoo."""426 """We are in the spawned child code, do our magic voodoo."""
427 # Stop tracking new signals427 retcode = 127 # Failed in a bad way, poor cleanup, etc.
428 self._unregister_signals()428 try:
429 # Reset the start time429 # Stop tracking new signals
430 trace._bzr_log_start_time = time.time()430 self._unregister_signals()
431 trace.mutter('%d starting %r'431 # Reset the start time
432 % (os.getpid(), command_argv))432 trace._bzr_log_start_time = time.time()
433 self._bind_child_file_descriptors(path)433 trace.mutter('%d starting %r'
434 self._run_child_command(command_argv)434 % (os.getpid(), command_argv))
435 self._bind_child_file_descriptors(path)
436 retcode = self._run_child_command(command_argv)
437 finally:
438 # We force os._exit() here, because we don't want to unwind the
439 # stack, which has complex results. (We can get it to unwind back
440 # to the cmd_launchpad_forking_service code, and even back to
441 # main() reporting thereturn code, but after that, suddenly the
442 # return code changes from a '0' to a '1', with no logging of info.
443 os._exit(retcode)
435444
436 def _run_child_command(self, command_argv):445 def _run_child_command(self, command_argv):
437 # This is the point where we would actually want to do something with446 # This is the point where we would actually want to do something with
@@ -447,17 +456,12 @@
447 self._close_child_file_descriptors()456 self._close_child_file_descriptors()
448 trace.mutter('%d finished %r'457 trace.mutter('%d finished %r'
449 % (os.getpid(), command_argv))458 % (os.getpid(), command_argv))
450 # We force os._exit() here, because we don't want to unwind the stack,459 # TODO: Should we call sys.exitfunc() here? it allows atexit
451 # which has complex results. (We can get it to unwind back to the460 # functions to fire, however, some of those may be still
452 # cmd_launchpad_forking_service code, and even back to main() reporting461 # around from the parent process, which we don't really want.
453 # thereturn code, but after that, suddenly the return code changes from
454 # a '0' to a '1', with no logging of info.
455 # TODO: Should we call sys.exitfunc() here? it allows atexit functions
456 # to fire, however, some of those may be still around from the
457 # parent process, which we don't really want.
458 sys.exitfunc()462 sys.exitfunc()
459 # See [Decision #6]463 # See [Decision #6]
460 os._exit(retcode)464 return retcode
461465
462 @staticmethod466 @staticmethod
463 def command_to_argv(command_str):467 def command_to_argv(command_str):