Comment 8 for bug 505913

Revision history for this message
Jonathan Lange (jml) wrote :

OK, more information. It's a bug in ampoule, sort of.

What's happening:
 * the job "lease" mechanism works by specifying a timeout for the job using the "deadline" mechanism in ampoule
 * if the deadline is too early, the deadline can be triggered before AMPChild.__enter__ can be processed (in our case, JobRunnerProcess.__enter__)
 * when the deadline is triggered, ampoule kills the child process with SIGHUP
 * we're using __enter__ to set a SIGHUP handler that raises an error, our test expects that error to be raised
 * when the deadline occurs before our __enter__ is processed, AMP just kills the process and then starts another worker

What I don't understand is why the test considers the job to have completed successfully. Probably something to do with the many layers of indirection between ampoule and the actual job.

Not sure what the next action is. Changing the timeout to 5s instead of 1s would fix the intermittency, at a cost of test performance. Alternatively, we might want to reconsider the testing approach here (e.g. delete this test and add some tests that are more unit-y). Probably better to fix ampoule though.

# With deadline set too early
jml@truth:~/src/launchpad/test-timeout-505913$ PYTHONPATH=.:$PYTHONPATH ./bin/py examplerunner.py
/home/jml/src/launchpad/devel/eggs/ampoule-0.2.0-py2.6.egg/ampoule/main.py:4: DeprecationWarning: the sets module is deprecated
  import sets
2011-03-18 20:42:05+0000 [-] Log opened.
2011-03-18 20:42:05+0000 [-] Subprocess 0 started.
2011-03-18 20:42:05+0000 [-] AMP connection established (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:05+0000 [-] Process: 0 ended
2011-03-18 20:42:05+0000 [-] AMP connection lost (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:05+0000 [-] ************* JOB RAISED
2011-03-18 20:42:05+0000 [-] Traceback (most recent call last):
2011-03-18 20:42:05+0000 [-] Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 1.
2011-03-18 20:42:05+0000 [-]
2011-03-18 20:42:05+0000 [-] FATAL: Restarting after [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessTerminated'>: A process has ended with a probable error condition: process ended by signal 1.
 ]
2011-03-18 20:42:05+0000 [-] Subprocess 1 started.
2011-03-18 20:42:05+0000 [-] AMP connection established (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:05+0000 [-] Main loop terminated.

# With deadline set "late enough"
jml@truth:~/src/launchpad/test-timeout-505913$ PYTHONPATH=.:$PYTHONPATH ./bin/py examplerunner.py
/home/jml/src/launchpad/devel/eggs/ampoule-0.2.0-py2.6.egg/ampoule/main.py:4: DeprecationWarning: the sets module is deprecated
  import sets
2011-03-18 20:42:21+0000 [-] Log opened.
2011-03-18 20:42:21+0000 [-] Subprocess 0 started.
2011-03-18 20:42:21+0000 [-] AMP connection established (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Log opened.
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] using set_wakeup_fd
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] /home/jml/src/launchpad/devel/eggs/ampoule-0.2.0-py2.6.egg/ampoule/main.py:4: exceptions.DeprecationWarning: the sets module is deprecated
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] JobRunnerProcess connection established (HOST:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a990> PEER:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a950>)
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Hello
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Got command
2011-03-18 20:42:24+0000 [-] FROM 0: 2011-03-18 20:42:24+0000 [-] Unhandled Error
2011-03-18 20:42:24+0000 [-] FROM 0: Traceback (most recent call last):
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/protocols/amp.py", line 884, in _commandReceived
2011-03-18 20:42:24+0000 [-] FROM 0: deferred = self.dispatchCommand(box)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/protocols/amp.py", line 941, in dispatchCommand
2011-03-18 20:42:24+0000 [-] FROM 0: return maybeDeferred(responder, box)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 133, in maybeDeferred
2011-03-18 20:42:24+0000 [-] FROM 0: result = f(*args, **kw)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/protocols/amp.py", line 1030, in doit
2011-03-18 20:42:24+0000 [-] FROM 0: return maybeDeferred(aCallable, **kw).addCallback(
2011-03-18 20:42:24+0000 [-] FROM 0: --- <exception caught here> ---
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 133, in maybeDeferred
2011-03-18 20:42:24+0000 [-] FROM 0: result = f(*args, **kw)
2011-03-18 20:42:24+0000 [-] FROM 0: File "examplerace.py", line 57, in runJobCommand
2011-03-18 20:42:24+0000 [-] FROM 0: time.sleep(5)
2011-03-18 20:42:24+0000 [-] FROM 0: File "examplerace.py", line 46, in handler
2011-03-18 20:42:24+0000 [-] FROM 0: raise TimeoutError
2011-03-18 20:42:24+0000 [-] FROM 0: examplerace.TimeoutError: Wooeuohuateh
2011-03-18 20:42:24+0000 [-] FROM 0: 2011-03-18 20:42:24+0000 [-] JobRunnerProcess connection lost (HOST:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a990> PEER:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a950>)
2011-03-18 20:42:24+0000 [-] ************* JOB RAISED
2011-03-18 20:42:24+0000 [-] Traceback (most recent call last):
2011-03-18 20:42:24+0000 [-] Failure: twisted.protocols.amp.UnknownRemoteError: Code<UNKNOWN>: Unknown Error
2011-03-18 20:42:24+0000 [-]
2011-03-18 20:42:24+0000 [-] Main loop terminated.