OK, more information. It's a bug in ampoule, sort of.
What's happening:
* the job "lease" mechanism works by specifying a timeout for the job using the "deadline" mechanism in ampoule
* if the deadline is too early, the deadline can be triggered before AMPChild.__enter__ can be processed (in our case, JobRunnerProcess.__enter__)
* when the deadline is triggered, ampoule kills the child process with SIGHUP
* we're using __enter__ to set a SIGHUP handler that raises an error, our test expects that error to be raised
* when the deadline occurs before our __enter__ is processed, AMP just kills the process and then starts another worker
What I don't understand is why the test considers the job to have completed successfully. Probably something to do with the many layers of indirection between ampoule and the actual job.
Not sure what the next action is. Changing the timeout to 5s instead of 1s would fix the intermittency, at a cost of test performance. Alternatively, we might want to reconsider the testing approach here (e.g. delete this test and add some tests that are more unit-y). Probably better to fix ampoule though.
# With deadline set too early
jml@truth:~/src/launchpad/test-timeout-505913$ PYTHONPATH=.:$PYTHONPATH ./bin/py examplerunner.py
/home/jml/src/launchpad/devel/eggs/ampoule-0.2.0-py2.6.egg/ampoule/main.py:4: DeprecationWarning: the sets module is deprecated
import sets
2011-03-18 20:42:05+0000 [-] Log opened.
2011-03-18 20:42:05+0000 [-] Subprocess 0 started.
2011-03-18 20:42:05+0000 [-] AMP connection established (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:05+0000 [-] Process: 0 ended
2011-03-18 20:42:05+0000 [-] AMP connection lost (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:05+0000 [-] ************* JOB RAISED
2011-03-18 20:42:05+0000 [-] Traceback (most recent call last):
2011-03-18 20:42:05+0000 [-] Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 1.
2011-03-18 20:42:05+0000 [-]
2011-03-18 20:42:05+0000 [-] FATAL: Restarting after [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessTerminated'>: A process has ended with a probable error condition: process ended by signal 1.
]
2011-03-18 20:42:05+0000 [-] Subprocess 1 started.
2011-03-18 20:42:05+0000 [-] AMP connection established (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:05+0000 [-] Main loop terminated.
# With deadline set "late enough"
jml@truth:~/src/launchpad/test-timeout-505913$ PYTHONPATH=.:$PYTHONPATH ./bin/py examplerunner.py
/home/jml/src/launchpad/devel/eggs/ampoule-0.2.0-py2.6.egg/ampoule/main.py:4: DeprecationWarning: the sets module is deprecated
import sets
2011-03-18 20:42:21+0000 [-] Log opened.
2011-03-18 20:42:21+0000 [-] Subprocess 0 started.
2011-03-18 20:42:21+0000 [-] AMP connection established (HOST:('no host',) PEER:('subprocess',))
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Log opened.
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] using set_wakeup_fd
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] /home/jml/src/launchpad/devel/eggs/ampoule-0.2.0-py2.6.egg/ampoule/main.py:4: exceptions.DeprecationWarning: the sets module is deprecated
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] JobRunnerProcess connection established (HOST:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a990> PEER:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a950>)
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Hello
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Got command
2011-03-18 20:42:24+0000 [-] FROM 0: 2011-03-18 20:42:24+0000 [-] Unhandled Error
2011-03-18 20:42:24+0000 [-] FROM 0: Traceback (most recent call last):
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/protocols/amp.py", line 884, in _commandReceived
2011-03-18 20:42:24+0000 [-] FROM 0: deferred = self.dispatchCommand(box)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/protocols/amp.py", line 941, in dispatchCommand
2011-03-18 20:42:24+0000 [-] FROM 0: return maybeDeferred(responder, box)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 133, in maybeDeferred
2011-03-18 20:42:24+0000 [-] FROM 0: result = f(*args, **kw)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/protocols/amp.py", line 1030, in doit
2011-03-18 20:42:24+0000 [-] FROM 0: return maybeDeferred(aCallable, **kw).addCallback(
2011-03-18 20:42:24+0000 [-] FROM 0: --- <exception caught here> ---
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/jml/src/launchpad/devel/eggs/Twisted-10.2.0_4395fix_1_4909-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 133, in maybeDeferred
2011-03-18 20:42:24+0000 [-] FROM 0: result = f(*args, **kw)
2011-03-18 20:42:24+0000 [-] FROM 0: File "examplerace.py", line 57, in runJobCommand
2011-03-18 20:42:24+0000 [-] FROM 0: time.sleep(5)
2011-03-18 20:42:24+0000 [-] FROM 0: File "examplerace.py", line 46, in handler
2011-03-18 20:42:24+0000 [-] FROM 0: raise TimeoutError
2011-03-18 20:42:24+0000 [-] FROM 0: examplerace.TimeoutError: Wooeuohuateh
2011-03-18 20:42:24+0000 [-] FROM 0: 2011-03-18 20:42:24+0000 [-] JobRunnerProcess connection lost (HOST:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a990> PEER:<twisted.internet._posixstdio.PipeAddress object at 0x1a6a950>)
2011-03-18 20:42:24+0000 [-] ************* JOB RAISED
2011-03-18 20:42:24+0000 [-] Traceback (most recent call last):
2011-03-18 20:42:24+0000 [-] Failure: twisted.protocols.amp.UnknownRemoteError: Code<UNKNOWN>: Unknown Error
2011-03-18 20:42:24+0000 [-]
2011-03-18 20:42:24+0000 [-] Main loop terminated.
OK, more information. It's a bug in ampoule, sort of.
What's happening: s.__enter_ _)
* the job "lease" mechanism works by specifying a timeout for the job using the "deadline" mechanism in ampoule
* if the deadline is too early, the deadline can be triggered before AMPChild.__enter__ can be processed (in our case, JobRunnerProces
* when the deadline is triggered, ampoule kills the child process with SIGHUP
* we're using __enter__ to set a SIGHUP handler that raises an error, our test expects that error to be raised
* when the deadline occurs before our __enter__ is processed, AMP just kills the process and then starts another worker
What I don't understand is why the test considers the job to have completed successfully. Probably something to do with the many layers of indirection between ampoule and the actual job.
Not sure what the next action is. Changing the timeout to 5s instead of 1s would fix the intermittency, at a cost of test performance. Alternatively, we might want to reconsider the testing approach here (e.g. delete this test and add some tests that are more unit-y). Probably better to fix ampoule though.
# With deadline set too early ~/src/launchpad /test-timeout- 505913$ PYTHONPATH= .:$PYTHONPATH ./bin/py examplerunner.py src/launchpad/ devel/eggs/ ampoule- 0.2.0-py2. 6.egg/ampoule/ main.py: 4: DeprecationWarning: the sets module is deprecated subprocess' ,)) subprocess' ,)) internet. error.ProcessTe rminated: A process has ended with a probable error condition: process ended by signal 1. internet. error.ProcessTe rminated' >: A process has ended with a probable error condition: process ended by signal 1. subprocess' ,))
jml@truth:
/home/jml/
import sets
2011-03-18 20:42:05+0000 [-] Log opened.
2011-03-18 20:42:05+0000 [-] Subprocess 0 started.
2011-03-18 20:42:05+0000 [-] AMP connection established (HOST:('no host',) PEER:('
2011-03-18 20:42:05+0000 [-] Process: 0 ended
2011-03-18 20:42:05+0000 [-] AMP connection lost (HOST:('no host',) PEER:('
2011-03-18 20:42:05+0000 [-] ************* JOB RAISED
2011-03-18 20:42:05+0000 [-] Traceback (most recent call last):
2011-03-18 20:42:05+0000 [-] Failure: twisted.
2011-03-18 20:42:05+0000 [-]
2011-03-18 20:42:05+0000 [-] FATAL: Restarting after [Failure instance: Traceback (failure with no frames): <class 'twisted.
]
2011-03-18 20:42:05+0000 [-] Subprocess 1 started.
2011-03-18 20:42:05+0000 [-] AMP connection established (HOST:('no host',) PEER:('
2011-03-18 20:42:05+0000 [-] Main loop terminated.
# With deadline set "late enough" ~/src/launchpad /test-timeout- 505913$ PYTHONPATH= .:$PYTHONPATH ./bin/py examplerunner.py src/launchpad/ devel/eggs/ ampoule- 0.2.0-py2. 6.egg/ampoule/ main.py: 4: DeprecationWarning: the sets module is deprecated subprocess' ,)) src/launchpad/ devel/eggs/ ampoule- 0.2.0-py2. 6.egg/ampoule/ main.py: 4: exceptions. DeprecationWarn ing: the sets module is deprecated twisted. internet. _posixstdio. PipeAddress object at 0x1a6a990> PEER:<twisted. internet. _posixstdio. PipeAddress object at 0x1a6a950>) jml/src/ launchpad/ devel/eggs/ Twisted- 10.2.0_ 4395fix_ 1_4909- py2.6-linux- x86_64. egg/twisted/ protocols/ amp.py" , line 884, in _commandReceived mmand(box) jml/src/ launchpad/ devel/eggs/ Twisted- 10.2.0_ 4395fix_ 1_4909- py2.6-linux- x86_64. egg/twisted/ protocols/ amp.py" , line 941, in dispatchCommand responder, box) jml/src/ launchpad/ devel/eggs/ Twisted- 10.2.0_ 4395fix_ 1_4909- py2.6-linux- x86_64. egg/twisted/ internet/ defer.py" , line 133, in maybeDeferred jml/src/ launchpad/ devel/eggs/ Twisted- 10.2.0_ 4395fix_ 1_4909- py2.6-linux- x86_64. egg/twisted/ protocols/ amp.py" , line 1030, in doit aCallable, **kw).addCallback( jml/src/ launchpad/ devel/eggs/ Twisted- 10.2.0_ 4395fix_ 1_4909- py2.6-linux- x86_64. egg/twisted/ internet/ defer.py" , line 133, in maybeDeferred TimeoutError: Wooeuohuateh twisted. internet. _posixstdio. PipeAddress object at 0x1a6a990> PEER:<twisted. internet. _posixstdio. PipeAddress object at 0x1a6a950>) protocols. amp.UnknownRemo teError: Code<UNKNOWN>: Unknown Error
jml@truth:
/home/jml/
import sets
2011-03-18 20:42:21+0000 [-] Log opened.
2011-03-18 20:42:21+0000 [-] Subprocess 0 started.
2011-03-18 20:42:21+0000 [-] AMP connection established (HOST:('no host',) PEER:('
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Log opened.
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] using set_wakeup_fd
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] /home/jml/
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] JobRunnerProcess connection established (HOST:<
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Hello
2011-03-18 20:42:22+0000 [-] FROM 0: 2011-03-18 20:42:22+0000 [-] Got command
2011-03-18 20:42:24+0000 [-] FROM 0: 2011-03-18 20:42:24+0000 [-] Unhandled Error
2011-03-18 20:42:24+0000 [-] FROM 0: Traceback (most recent call last):
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/
2011-03-18 20:42:24+0000 [-] FROM 0: deferred = self.dispatchCo
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/
2011-03-18 20:42:24+0000 [-] FROM 0: return maybeDeferred(
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/
2011-03-18 20:42:24+0000 [-] FROM 0: result = f(*args, **kw)
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/
2011-03-18 20:42:24+0000 [-] FROM 0: return maybeDeferred(
2011-03-18 20:42:24+0000 [-] FROM 0: --- <exception caught here> ---
2011-03-18 20:42:24+0000 [-] FROM 0: File "/home/
2011-03-18 20:42:24+0000 [-] FROM 0: result = f(*args, **kw)
2011-03-18 20:42:24+0000 [-] FROM 0: File "examplerace.py", line 57, in runJobCommand
2011-03-18 20:42:24+0000 [-] FROM 0: time.sleep(5)
2011-03-18 20:42:24+0000 [-] FROM 0: File "examplerace.py", line 46, in handler
2011-03-18 20:42:24+0000 [-] FROM 0: raise TimeoutError
2011-03-18 20:42:24+0000 [-] FROM 0: examplerace.
2011-03-18 20:42:24+0000 [-] FROM 0: 2011-03-18 20:42:24+0000 [-] JobRunnerProcess connection lost (HOST:<
2011-03-18 20:42:24+0000 [-] ************* JOB RAISED
2011-03-18 20:42:24+0000 [-] Traceback (most recent call last):
2011-03-18 20:42:24+0000 [-] Failure: twisted.
2011-03-18 20:42:24+0000 [-]
2011-03-18 20:42:24+0000 [-] Main loop terminated.