rpc.server does not consume messages after message acknowledgement failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.messaging | Fix Released | Medium | Mehdi Abaakouk |
oslo.messaging (Ubuntu) | Fix Released | High | Unassigned |
Trusty | Fix Released | High | Unassigned |
Utopic | Won't Fix | High | Unassigned |
Vivid | Fix Released | High | Unassigned |
Bug Description
    def start(self):
        @excutils.forever_retry_uncaught_exceptions
        def _executor_thread():
            try:
                while self._running:
                    # Poll the driver listener for the next incoming message.
                    incoming = self.listener.poll()
                    if incoming is not None:
                        self._dispatch(incoming)
            except greenlet.GreenletExit:
                return
The Connection class does a lot of work to ensure that operations on a connection can be recovered after a reconnection. But after we get the incoming message, a connection error raised during message acknowledgement is caught by the excutils.forever_retry_uncaught_exceptions decorator; kombu then re-establishes the transport transparently, but the consumers are never redeclared on the new channel, so no further messages are consumed. The related kombu code is listed below, followed by a sketch of the failure.
    def drain_events(self, **kwargs):
        return self.transport.drain_events(self.connection, **kwargs)

    @property
    def connection(self):
        if not self._closed:
            if not self.connected:
                # Transparent reconnect: everything declared on the old
                # connection (queues, exchanges, consumers) is forgotten.
                self.declared_entities.clear()
                self._default_channel = None
                self._connection = self._establish_connection()
                self._closed = False
            return self._connection
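This is the crux of the problem: when kombu transparently re-establishes the transport through the connection property, it clears declared_entities, and the broker forgets every consumer registered on the old channel. Below is a minimal standalone sketch of the failure and the required recovery, assuming a local RabbitMQ broker and a hypothetical 'demo' queue; it is illustrative, not oslo.messaging code:

    import socket

    from kombu import Connection, Exchange, Queue

    def on_message(body, message):
        message.ack()  # a connection error raised here triggers this bug

    conn = Connection('amqp://guest:guest@localhost//')
    queue = Queue('demo', Exchange('demo', type='direct'), routing_key='demo')
    consumer = conn.Consumer(queue, callbacks=[on_message])
    consumer.consume()  # declares the queue and registers the consumer

    while True:
        try:
            conn.drain_events(timeout=1)
        except socket.timeout:
            continue  # no traffic yet; keep polling
        except conn.connection_errors:
            # kombu rebuilds the socket on the next use of conn.connection,
            # but the new channel has no consumers registered on it. Without
            # the three lines below, drain_events() never delivers another
            # message: the zombie state this bug describes.
            conn.ensure_connection()
            consumer.revive(conn.channel())
            consumer.consume()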
-------
[Impact]
This patch addresses an issue where the underlying kombu library disconnects from the RabbitMQ servers, which prevents oslo.messaging from properly going through the reconnect sequence, including the recreation of the expected queues. This causes messages to be lost and leaves a generally dysfunctional cloud until the services are restarted.
[Test Case]
Note: these steps are for trusty-icehouse with the latest oslo.messaging package (1.3.0-0ubuntu1.1 at the time of this writing).
Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly kill one of the rabbit nodes (e.g. force a panic). Observe that the nova services do detect that the node went down and report that they have reconnected, but messages still time out, nova service-list still reports compute nodes as down, etc. The sketch below can help automate that observation.
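A hedged watcher, assuming an icehouse-era python-novaclient and admin credentials exported in the environment; the helper itself is illustrative and not part of the SRU:

    import os
    import time

    from novaclient.v1_1 import client  # icehouse-era novaclient API

    nova = client.Client(os.environ['OS_USERNAME'],
                         os.environ['OS_PASSWORD'],
                         os.environ['OS_TENANT_NAME'],
                         os.environ['OS_AUTH_URL'])

    # On an affected cloud this keeps printing the same "down" services long
    # after the surviving rabbit nodes are reachable again and the clients
    # claim to have reconnected.
    while True:
        down = ['%s@%s' % (s.binary, s.host)
                for s in nova.services.list() if s.state == 'down']
        print('down services: %s' % (', '.join(down) or 'none'))
        time.sleep(10)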
[Regression Potential]
There is the possibility of more reconnect attempts from the oslo.messaging library if there is a false positive where the underlying kombu connection is reported as disconnected. This should be unlikely, since this change brings the oslo.messaging code into sync with the underlying library, but it is a possibility.
[Other Info]
The reconnect-driving logic was fixed in a recent SRU of oslo.messaging (version 1.3.0-0ubuntu1.1). This is an additional fix that is required to keep the oslo.messaging library from going into a zombified connection state.
Changed in oslo.messaging:
assignee: nobody → QingchuanHao (haoqingchuan-28)
Changed in oslo.messaging:
importance: Undecided → Medium
status: New → Confirmed
Changed in oslo.messaging:
assignee: QingchuanHao (haoqingchuan-28) → Mehdi Abaakouk (sileht)
status: Confirmed → In Progress
Changed in oslo.messaging:
milestone: none → 1.11.0
status: Fix Committed → Fix Released
description: updated
no longer affects: python-oslo.messaging (Ubuntu)
Changed in oslo.messaging (Ubuntu Wily):
status: New → Fix Released
Changed in oslo.messaging (Ubuntu Vivid):
importance: Undecided → High
Changed in oslo.messaging (Ubuntu Trusty):
importance: Undecided → High
Changed in oslo.messaging (Ubuntu Wily):
importance: Undecided → High
tags: removed: verification-needed
Changed in oslo.messaging (Ubuntu Utopic):
status: Fix Committed → Won't Fix
no longer affects: oslo.messaging (Ubuntu Wily)
Reviewed: https://review.openstack.org/180059
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=415db68b67368d7c8aa550e7108122200816e665
Submitter: Jenkins
Branch: master

commit 415db68b67368d7c8aa550e7108122200816e665
Author: Mehdi Abaakouk <email address hidden>
Date: Tue May 5 10:29:22 2015 +0200
rabbit: redeclare consumers when ack/requeue fail

In case the acknowledgement or requeue of a message fails,
the kombu transport can be disconnected.
In this case, we must redeclare our consumers.
This change fixes that.

This change has no tests because the kombu memory transport we use in our
tests cannot be put into a disconnected state.
Closes-bug: #1448650
Change-Id: I5991a4cf827411bc27c857561d97461212a17f40
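For reference, a minimal self-contained sketch of the idea behind this commit; the class and method names below are illustrative stand-ins, not the verbatim patch (which lives in oslo.messaging's rabbit driver):

    class RabbitConnection(object):
        """Illustrative stand-in for the driver's connection wrapper."""

        def __init__(self, consumers):
            self.consumers = consumers       # kombu Consumer-like objects
            self._must_redeclare = False

        def _reconnect(self):
            # ... re-establish the kombu connection and channel here ...
            # The broker forgot our consumers; flag them for redeclaration.
            self._must_redeclare = True

        def ensure(self, method, *args, **kwargs):
            """Run `method`, reconnecting and retrying on connection errors."""
            while True:
                try:
                    return method(*args, **kwargs)
                except IOError:              # stand-in for kombu connection errors
                    self._reconnect()

        def consume(self):
            # Redeclare the consumers before polling if a reconnect happened.
            if self._must_redeclare:
                for consumer in self.consumers:
                    consumer.consume()
                self._must_redeclare = False
            # ... drain_events() ...

        def ack(self, message):
            # Before the fix, a failed ack left the connection rebuilt but the
            # consumers undeclared; routing ack/requeue through ensure() closes
            # that gap.
            self.ensure(message.ack)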