Merge lp:~mbp/rabbitfixture/rabbit-startup into lp:rabbitfixture

Proposed by Martin Pool
Status: Merged
Merged at revision: 26
Proposed branch: lp:~mbp/rabbitfixture/rabbit-startup
Merge into: lp:rabbitfixture
Diff against target: 34 lines (+7/-4)
1 file modified
rabbitfixture/server.py (+7/-4)
To merge this branch: bzr merge lp:~mbp/rabbitfixture/rabbit-startup
Reviewer Review Type Date Requested Status
Gavin Panella Approve
Review via email: mp+80285@code.launchpad.net

This proposal supersedes a proposal from 2011-10-24.

Description of the change

On the weekend, Rabbit failed to start for me, and Launchpad's tests fail with "RabbitMQ server is not running" from inside the code that's meant to be starting it. Making a change similar to the attached one helped me find the rabbit log and understand the problem.

There are two changes:
 * exceptions point to the log file which contains the actual failure message
 * rather than calling 'check_running' which raises a cryptic message if it's not running, use is_running to decide whether to continue waiting

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

The change to the exception may confuse folk, because the config
workarea gets deleted entirely [when used with transient rabbits in
tests]

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

On 24 October 2011 17:02, Robert Collins <email address hidden> wrote:
> The change to the exception may confuse folk, because the config
> workarea gets deleted entirely [when used with transient rabbits in
> tests]

I found that:

1- the log file is directly in /tmp, eg /tmp/tmpj8LKTv
2- when the tests failed, the log file was still present

and so for me it was useful.

There may be other cases where the log file is already deleted by the
time you see the exception. I thought about trying to read it into
memory and put it in the exception but it's fairly long.

Revision history for this message
Gavin Panella (allenap) wrote : Posted in a previous version of this proposal

check_running() just checks that the process is still running; if it's
no longer running then Rabbit is never going to start.

One way to get a better insight into this problem is to change the
stuff in runlaunchpad.py (in Launchpad) to print fixture details when
one of them fails. At the moment those details are ignored.

review: Disapprove
Revision history for this message
Gavin Panella (allenap) wrote :

This looks like it'll work, but it's ugly:

- It relies upon the fact that fixture clean-ups are not run when
  setUp fails.

- When these errors appear as part of a test run they're not
  particularly useful (useless in ec2 if the instance has already
  terminated) and could be misleading (for example, someone might
  believe that they have to capture the log when its contents are
  actually already in a test detail).

I still think that it would be much better to fix runlaunchpad.py
rather than muddy things in rabbitfixture. So, approved, but it's a
+0.1 rather than a +1, and I'd want to revert these changes when
runlaunchpad is fixed.

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

Fixture cleanup when setUp fails should really be invoked - the code
is careful to make sure that that is reasonable, and if we're not
cleaning up, we'll be leaving crud behind.

(e.g. the runlaunchpad code should have a try: finally:). If this
branch is dependent on runlaunchpad *having* that defect, I think we
should reject it.

Revision history for this message
Martin Pool (mbp) wrote :

On 25 October 2011 21:21, Gavin Panella <email address hidden> wrote:
> Review: Approve
>
> This looks like it'll work, but it's ugly:
>
> - It relies upon the fact that fixture clean-ups are not run when
>  setUp fails.
>
> - When these errors appear as part of a test run they're not
>  particularly useful (useless in ec2 if the instance has already
>  terminated) and could be misleading (for example, someone might
>  believe that they have to capture the log when its contents are
>  actually already in a test detail).
>
> I still think that it would be much better to fix runlaunchpad.py
> rather than muddy things in rabbitfixture. So, approved, but it's a
> +0.1 rather than a +1, and I'd want to revert these changes when
> runlaunchpad is fixed.

I don't understand. With this patch, it raises an exception in just
the same cases as before, but now the exception has a more useful
message. How does that change the interaction with tearDown, and why
would you ever want to revert it?

I take your point about ec2 and the whole disk disappearing, and
perhaps there printing the log would be better.

Revision history for this message
Gavin Panella (allenap) wrote :

The log is put into a temporary directory that is ordinarily erased
during fixture clean-up. At the moment fixture clean-ups are not run
if setUp() raises an exception, but, as Rob says, clean-ups should
really be run. When python-fixtures adopts that behaviour, telling the
user the location of the log file will be pointless because it will
have been erased by the time they see the message.

Revision history for this message
Martin Pool (mbp) wrote :

I see, thanks.

Revision history for this message
Robert Collins (lifeless) wrote :

The change isn't in python-fixtures so much as in callers of setUp -
e.g. TestCase.useFixture() should handle exceptions by calling
cleanUp(). fixtures will need to do this too, for its implementation
of useFixture. runlaunchpad or whatever is calling setUp is the
immediate place to do this though.

Revision history for this message
Martin Pool (mbp) wrote :

Can you give me a clue how this ought to be handled in runlaunchpad?

Revision history for this message
Gavin Panella (allenap) wrote :

> Can you give me a clue how this ought to be handled in runlaunchpad?

In runlaunchpad.py there's a start_launchpad() function. It starts up
services here:

    with nested(*services):

Each member of services is a Service, which is subclass of Fixture,
which means it responds to getDetails().

Something like the following might work:

    # This is for inspiration; it's not necessarily right.
    from testtools.testresult.real import _details_to_str

    try:
        with nested(*services):
            ...
    except Exception:
        for service in services:
            print _details_to_str(service.getDetails())
        raise

Fixtures are also used by some layers that just call setUp() and
cleanUp() directly, and keep no track of details. I don't really know
how to address that yet, but it can wait.

Revision history for this message
Martin Pool (mbp) wrote :

based on Gavin's +0.1 I'm going to merge this, even though it's not a total fix.

> > Can you give me a clue how this ought to be handled in runlaunchpad?
>
> In runlaunchpad.py there's a start_launchpad() function. It starts up
> services here:
>
> with nested(*services):
>
> Each member of services is a Service, which is subclass of Fixture,
> which means it responds to getDetails().
>
> Something like the following might work:
>
> # This is for inspiration; it's not necessarily right.
> from testtools.testresult.real import _details_to_str
>
> try:
> with nested(*services):
> ...
> except Exception:
> for service in services:
> print _details_to_str(service.getDetails())
> raise
>
> Fixtures are also used by some layers that just call setUp() and
> cleanUp() directly, and keep no track of details. I don't really know
> how to address that yet, but it can wait.

Ok, and so then you'd also want me to attach the contents of the log file in to the service? Why would that be better than putting them in to the exception?

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, Oct 31, 2011 at 8:46 PM, Martin Pool <email address hidden> wrote:
>
> Ok, and so then you'd also want me to attach the contents of the log file in to the service?  Why would that be better than putting them in to the exception?

Because when something else fails, the exception raised won't have the
log files. But they may still be useful.

Revision history for this message
Martin Pool (mbp) wrote :

On 31 October 2011 18:52, Robert Collins <email address hidden> wrote:
> On Mon, Oct 31, 2011 at 8:46 PM, Martin Pool <email address hidden> wrote:
>>
>> Ok, and so then you'd also want me to attach the contents of the log file in to the service?  Why would that be better than putting them in to the exception?
>
> Because when something else fails, the exception raised won't have the
> log files. But they may still be useful.

That makes sense. I see they are even read in at the moment.

Revision history for this message
Martin Pool (mbp) wrote :

I'm not planning to pursue this any more at the moment, but I'll just mention that I think the thing of putting it into runlaunchpad won't work, as I hit this while running tests. It is more like we need to capture the details from failures during useFixture. But python-fixtures trunk at least does seem to do that. So, I'm not sure.

Revision history for this message
Martin Pool (mbp) wrote :

... for the record, I did actually hit this again and ended up doing https://code.launchpad.net/~mbp/launchpad/892427-service-failure/+merge/82766 to put Gavin's suggestion in to lp.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'rabbitfixture/server.py'
2--- rabbitfixture/server.py 2011-09-29 11:20:28 +0000
3+++ rabbitfixture/server.py 2011-10-25 03:46:24 +0000
4@@ -283,15 +283,17 @@
5 "RabbitMQ OTP already running even though it "
6 "hasn't been started it yet!")
7 self._spawn()
8- # Wait for the server to come up...
9+ # Wait for the server to come up: stop when the process is dead, or
10+ # the timeout expires, or the server responds.
11 timeout = time.time() + 15
12- while time.time() < timeout and self.check_running():
13+ while time.time() < timeout and self.is_running():
14 if self.environment.is_node_running():
15 break
16 time.sleep(0.3)
17 else:
18 raise Exception(
19- "Timeout waiting for RabbitMQ server to start.")
20+ "Timeout waiting for RabbitMQ server to start: log in %r." %
21+ (self.config.logfile,))
22 # The Erlang OTP is up, but RabbitMQ may not be usable. We need to
23 # cleanup up the process from here on in even if the full service
24 # fails to get together.
25@@ -308,7 +310,8 @@
26 break
27 else:
28 raise Exception(
29- "Timeout waiting for RabbitMQ to node to come up.")
30+ "Timeout waiting for RabbitMQ to node to come up: log in %r." %
31+ (self.config.logfile,))
32
33 def _request_stop(self):
34 outstr, errstr = self.environment.rabbitctl("stop", strip=True)

Subscribers

People subscribed via source and target branches

to all changes: