Merge lp:~corey.bryant/ubuntu/trusty/oslo.messaging/lp1318721 into lp:ubuntu/trusty-updates/oslo.messaging

Proposed by Corey Bryant
Status: Merged
Merge reported by: Chuck Short
Merged at revision: not available
Proposed branch: lp:~corey.bryant/ubuntu/trusty/oslo.messaging/lp1318721
Merge into: lp:ubuntu/trusty-updates/oslo.messaging
Diff against target: 104 lines (+84/-0)
3 files modified
debian/changelog (+8/-0)
debian/patches/fix-reconnect-race-condition-with-rabbitmq-cluster.patch (+75/-0)
debian/patches/series (+1/-0)
To merge this branch: bzr merge lp:~corey.bryant/ubuntu/trusty/oslo.messaging/lp1318721
Reviewer: James Page
Status: Approve
Review via email: mp+283304@code.launchpad.net
Revision history for this message
James Page (james-page) wrote:

Uploaded to trusty proposed for SRU team review.

review: Approve
10. By Corey Bryant

Bump to 1.4

Preview Diff

=== modified file 'debian/changelog'
--- debian/changelog 2015-06-25 09:59:42 +0000
+++ debian/changelog 2016-01-21 15:09:11 +0000
@@ -1,3 +1,11 @@
+oslo.messaging (1.3.0-0ubuntu1.4) trusty; urgency=medium
+
+  * Backport upstream fix (LP: #1318721):
+    - d/p/fix-reconnect-race-condition-with-rabbitmq-cluster.patch:
+      Redeclare if an exception is caught after self.queue.declare() fails.
+
+ -- Hui Xiang <hui.xiang@canonical.com>  Thu, 17 Dec 2015 16:22:35 +0800
+
 oslo.messaging (1.3.0-0ubuntu1.2) trusty; urgency=medium
 
   * Detect when underlying kombu connection to rabbitmq server has been
=== added file 'debian/patches/fix-reconnect-race-condition-with-rabbitmq-cluster.patch'
--- debian/patches/fix-reconnect-race-condition-with-rabbitmq-cluster.patch 1970-01-01 00:00:00 +0000
+++ debian/patches/fix-reconnect-race-condition-with-rabbitmq-cluster.patch 2016-01-21 15:09:11 +0000
@@ -0,0 +1,75 @@
+Description: Fix reconnect race condition with RabbitMQ cluster
+
+ commit 7ad0d7eaf9cb095a14b07a08c814d9f1f9c8ff12
+ Author: Jens Rosenboom <j.rosenboom@x-ion.de>
+ Date: Fri Jun 27 16:46:47 2014 +0200
+
+ Retry Queue creation to work around a race condition
+ that may happen when the client and broker race over
+ exchange creation and deletion respectively, which happens only
+ when the Queue/Exchange were created with the auto-delete flag.
+
+ Queues/Exchanges declared with auto-delete instruct the Broker to
+ delete the Queue when the last Consumer disconnects from it, and
+ the Exchange when the last Queue is deleted from this Exchange.
+
+ Now in a RabbitMQ cluster setup, if the cluster node that we are
+ connected to goes down, two things will happen:
+
+ 1. On the RabbitMQ side, the Queues with auto-delete will be deleted
+ from the other cluster nodes, and then the Exchanges that the
+ Queues are bound to if they were also created with auto-delete.
+ 2. On the client side, the client will reconnect to another cluster
+ node and call queue.declare(), which creates Exchanges, then
+ Queues, then Bindings, in that order.
+
+ In the happy path the queues/exchanges will be deleted from the
+ broker before the client starts re-creating them, but it is also
+ possible that the client first starts creating queues/exchanges
+ as part of the queue.declare() call, which are no-op operations
+ because they already existed, but before it can bind the Queue to
+ the Exchange, the RabbitMQ nodes receive the 'signal' that the
+ queue doesn't have any consumers so it should be deleted, and the
+ same with exchanges, which leads to the binding failing with a
+ NotFound error.
+
+ Illustration of the timeline from the Client and the RabbitMQ cluster
+ respectively when the race condition happens:
+
+
+ e-declare(E) q-declare(Q) q-bind(Q, E)
+ -----+------------------+----------------+----------->
+ e-delete(E)
+ ------------------------------+---------------------->
+
+ Change-Id: Ideb73af6f246a8282780cdb204d675d5d4555bf0
+ Closes-Bug: #1318721
+
+Author: Jens Rosenboom <j.rosenboom@x-ion.de>
+Origin: backport, https://review.openstack.org/#/c/103157/
+Bug: https://bugs.launchpad.net/neutron/+bug/1318721
+
+--- a/oslo/messaging/_drivers/impl_rabbit.py
++++ b/oslo/messaging/_drivers/impl_rabbit.py
+@@ -159,7 +159,20 @@
+         self.channel = channel
+         self.kwargs['channel'] = channel
+         self.queue = kombu.entity.Queue(**self.kwargs)
+-        self.queue.declare()
++        try:
++            self.queue.declare()
++        except Exception as e:
++            # NOTE: This exception may be triggered by a race condition.
++            # Simply retrying will solve the error most of the time and
++            # should work well enough as a workaround until the race condition
++            # itself can be fixed.
++            # TODO(jrosenboom): In order to be able to match the Exception
++            # more specifically, we have to refactor ConsumerBase to use
++            # 'channel_errors' of the kombu connection object that
++            # has created the channel.
++            # See https://bugs.launchpad.net/neutron/+bug/1318721 for details.
++            LOG.exception(_("Declaring queue failed with (%s), retrying"), e)
++            self.queue.declare()
+
+     def _callback_handler(self, message, callback):
+         """Call callback with deserialized message.

=== modified file 'debian/patches/series'
--- debian/patches/series 2015-06-25 09:59:42 +0000
+++ debian/patches/series 2016-01-21 15:09:11 +0000
@@ -3,3 +3,4 @@
 0002-rabbit-fix-timeout-timer-when-duration-is-None.patch
 0003-Declare-DirectPublisher-exchanges-with-passive-True.patch
 redeclare-consumers-when-ack-requeue-fails.patch
+fix-reconnect-race-condition-with-rabbitmq-cluster.patch
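The backported fix boils down to a declare-retry-once pattern: attempt the declaration, and if it loses the race against the broker's auto-delete cleanup, log and try exactly once more. As a standalone sketch of that pattern (the `declare_with_retry` name and the injected `declare` callable are illustrative stand-ins, not the actual oslo.messaging API, which retries inline inside the consumer's reconnect path):

```python
import logging

LOG = logging.getLogger(__name__)


def declare_with_retry(declare):
    """Attempt a broker-side declaration, retrying once on failure.

    `declare` is any zero-argument callable performing the declaration
    (a stand-in for kombu's Queue.declare on a re-created queue object).
    A single retry is usually sufficient: by the time the second attempt
    runs, the broker has finished deleting the stale auto-delete
    queue/exchange, so the re-declaration succeeds.
    """
    try:
        return declare()
    except Exception as e:
        # The first attempt can lose the race against the broker's
        # auto-delete cleanup; log the failure and retry exactly once.
        LOG.exception("Declaring queue failed with (%s), retrying", e)
        return declare()
```

In the actual patch the retried call is `self.queue.declare()` on the freshly constructed `kombu.entity.Queue`, and as the upstream TODO notes, catching bare `Exception` is deliberately broad pending a refactor to use the kombu connection's `channel_errors`.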
