OpenStack Compute (nova)

Merge lp:~jk0/nova/lp661472 into lp:~hudson-openstack/nova/trunk

Proposed by Josh Kearney on 2010-11-19

Status: Merged
Merge reported by:
Merged at revision: not available
Proposed branch: lp:~jk0/nova/lp661472
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 43 lines (+16/-3), 1 file modified
nova/rpc.py (+16/-3)

To merge this branch: bzr merge lp:~jk0/nova/lp661472

Related bugs:

Bug #661472: Fails if rabbitmq isn't around or drops connection (Importance: High, Status: Fix Released)

Reviewer	Review Type	Date Requested	Status
Vish Ishaya (community)		2010-11-19	Approve on 2010-11-20
Joshua McKenty (community)			Approve on 2010-11-20
Review via email: mp+41373@code.launchpad.net

Commit message

Check for running AMQP instances.

Description of the change

Fixes bug #661472.

Vish Ishaya (vishvananda) wrote on 2010-11-19:

A couple of issues:

1. This won't pass pep8

2. We may need something better than time.sleep(30) on fetch. This will block the reactor while we're waiting to reconnect, stopping us from doing any other work like checking the current state of the instances and doing recovery or updating the db. Is there a reason we can't keep the failed connection flag and just add the self.declare() to the existing code?
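
The alternative raised in point 2, keeping a failed-connection flag instead of sleeping, might look roughly like the sketch below. It is not taken from nova/rpc.py: the class name ReconnectingConsumer, the poll() method, and the injected connect and clock callables are all hypothetical, and only the 30-second interval comes from the discussion. A failed attempt records when the next retry is due and returns immediately, so the caller's event loop stays free for other work.

```python
import time


class ReconnectingConsumer:
    """Hypothetical consumer that never sleeps inside its polling call."""

    def __init__(self, connect, retry_interval=30, clock=time.monotonic):
        self._connect = connect              # callable that opens a connection
        self._retry_interval = retry_interval
        self._clock = clock                  # injectable so tests need not wait
        self._conn = None
        self._next_retry = 0.0
        self.failed_connection = False       # the "failed connection flag"

    def poll(self):
        """Return True when connected; return right away in every case."""
        if self._conn is not None:
            return True
        now = self._clock()
        if self.failed_connection and now < self._next_retry:
            # Too early to retry: hand control back to the caller.
            return False
        try:
            self._conn = self._connect()
        except Exception:
            # Remember the failure and schedule the next attempt.
            self.failed_connection = True
            self._next_retry = now + self._retry_interval
            return False
        self.failed_connection = False
        return True
```

A periodic task could call poll() on each tick and carry on with recovery or database updates whenever it returns False.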

Matt Dietz (cerberus) wrote on 2010-11-19:

Is there some reason we don't want to actually log the exceptions on #22 and #58?

lp:~jk0/nova/lp661472 updated on 2010-11-19

409. By Josh Kearney on 2010-11-19: Reverted some changes

Vish Ishaya (vishvananda) wrote on 2010-11-19:

LGTM. Consider changing logging.warning to logging.exception in the init code. Might be good to have traceback in case it is a configuration issue instead of rabbit actually being down.

review: Approve
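
The difference between logging.warning and logging.exception can be seen in a small stand-alone sketch. The logger names, the message text, and the simulated ConnectionError are assumptions for illustration and do not come from the branch. Inside an except block, logging.exception logs at ERROR level and appends the current traceback, which is what helps tell a configuration mistake from a broker that is simply down.

```python
import io
import logging


def log_connect_failure(use_exception):
    """Simulate a failed connection attempt and return the logged text."""
    stream = io.StringIO()
    logger = logging.getLogger("amqp_sketch.%s" % use_exception)
    logger.setLevel(logging.WARNING)
    logger.propagate = False                 # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        # Stand-in for whatever error the AMQP client would raise.
        raise ConnectionError("connection refused")
    except ConnectionError:
        if use_exception:
            logger.exception("AMQP server unreachable")   # message + traceback
        else:
            logger.warning("AMQP server unreachable")     # message only
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

Calling log_connect_failure(True) returns text that includes a "Traceback (most recent call last):" section, while log_connect_failure(False) returns only the one-line message.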

lp:~jk0/nova/lp661472 updated on 2010-11-20

410. By Josh Kearney on 2010-11-20: Use logging.exception instead

Joshua McKenty (joshua-mckenty) wrote on 2010-11-20:

LGTM. Would consider adding a maximum number of retries, or a decay on the 30 second limit. The 30 second should be a FLAG or a CONST as well, IMO.

review: Approve
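
One possible reading of "a decay on the 30 second limit" is a delay that grows after each failed attempt, capped at some ceiling. The sketch below only computes such a schedule. The function name retry_delays and the factor, ceiling, and retry count are illustrative assumptions; only the 30-second starting value comes from the thread.

```python
def retry_delays(base=30, factor=2, max_delay=300, max_retries=5):
    """Return the wait, in seconds, before each retry attempt."""
    delays = []
    delay = base
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))  # never wait longer than the cap
        delay *= factor                       # back off further each time
    return delays
```

With the defaults above this yields [30, 60, 120, 240, 300]. Setting factor=1 gives a fixed interval with a retry limit instead.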

Josh Kearney (jk0) wrote on 2010-11-20:

I like the idea of making it a constant with a maximum number of retries. Would the best course of action be to sys.exit() once that limit is hit (giving a friendly error message), or should we let it continue to run?

Also -- is 30 seconds ideal and what should the retry max be?

Vish Ishaya (vishvananda) wrote on 2010-11-20:

lgtm.

review: Approve

Joshua McKenty (joshua-mckenty) wrote on 2010-11-20:

I would make it 10 seconds (I think a lot of Rabbit errors are transient network issues), with a maximum 12 tries for a total of 2 minutes. (OR, a bunch more config FLAGS.)

And I'd definitely sys.exit(<nonzero>), so that a monitoring system like monit, etc. can show the shutdown as a failure.
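
The suggestion above (a 10-second interval, at most 12 tries, then a nonzero exit) could be sketched as follows. This is not the code that was merged: the names connect_with_retries, RETRY_INTERVAL, and MAX_RETRIES, the injected connect and sleep callables, and the exit status 1 are assumptions made for illustration.

```python
import sys
import time

RETRY_INTERVAL = 10   # seconds between attempts, as proposed in the thread
MAX_RETRIES = 12      # roughly two minutes of trying in total


def connect_with_retries(connect, interval=RETRY_INTERVAL,
                         max_retries=MAX_RETRIES, sleep=time.sleep):
    """Call connect() until it succeeds or the retry limit is reached."""
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except Exception as exc:
            print("AMQP server unreachable (attempt %d of %d): %s"
                  % (attempt, max_retries, exc), file=sys.stderr)
            if attempt < max_retries:
                sleep(interval)   # no point waiting after the final attempt
    print("Giving up after %d attempts." % max_retries, file=sys.stderr)
    # Nonzero status so a supervisor such as monit reports a failure.
    sys.exit(1)
```

Passing sleep as a parameter keeps the function testable without real delays. Because time.sleep blocks, a long-running service may prefer a scheduled retry over this loop.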
