neutron

Bug #1318721
Comment #8

Comment 8 for bug 1318721

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-04: Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/103157
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=7ad0d7eaf9cb095a14b07a08c814d9f1f9c8ff12
Submitter: Jenkins
Branch: master

commit 7ad0d7eaf9cb095a14b07a08c814d9f1f9c8ff12
Author: Jens Rosenboom <email address hidden>
Date: Fri Jun 27 16:46:47 2014 +0200

Fix reconnect race condition with RabbitMQ cluster

    Retry Queue creation to workaround race condition
    that may happen when both the client and broker race over
    exchange creation and deletion respectively which happen only
    when the Queue/Exchange were created with auto-delete flag.

    Queues/Exchange declared with auto-delete instruct the Broker to
    delete the Queue when the last Consumer disconnect from it, and
    the Exchange when the last Queue is deleted from this Exchange.

Now in a RabbitMQ cluster setup, if the cluster node that we are
connected to go down, 2 things will happen:

     1. From RabbitMQ side, the Queues w/ auto-delete will be deleted
        from the other cluster nodes and then the Exchanges that the
        Queues are bind to if they were also created w/ auto-delete.
     2. From client side, client will reconnect to another cluster
        node and call queue.declare() which create Exchanges then
        Queues then Binding in that order.

    Now in a happy path the queues/exchanges will be deleted from the
    broker before client start re-creating them again, but it also
    possible that the client first start by creating queues/exchange
    as part of the queue.declare() call, which are no-op operations
    b/c they alreay existed, but before it could bind Queue to
    Exchange, RabbitMQ nodes just received the 'signal' that the
    queue doesn't have any consumer so it should be delete, and the
    same with exchanges, which will lead to binding fail with
    NotFound error.

Illustration of the time line from Client and RabbitMQ cluster
respectively when the race condition happen:

           e-declare(E) q-declare(Q) q-bind(Q, E)
         -----+------------------+----------------+----------->
                                   e-delete(E)
         ------------------------------+---------------------->

Change-Id: Ideb73af6f246a8282780cdb204d675d5d4555bf0
Closes-Bug: #1318721

Reviewed:  https://review.openstack.org/103157
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=7ad0d7eaf9cb095a14b07a08c814d9f1f9c8ff12
Submitter: Jenkins
Branch:    master

commit 7ad0d7eaf9cb095a14b07a08c814d9f1f9c8ff12
Author: Jens Rosenboom <j.rosenboom@x-ion.de>
Date:   Fri Jun 27 16:46:47 2014 +0200

Fix reconnect race condition with RabbitMQ cluster
    
    Retry Queue creation to workaround race condition
    that may happen when both the client and broker race over
    exchange creation and deletion respectively which happen only
    when the Queue/Exchange were created with auto-delete flag.
    
    Queues/Exchange declared with auto-delete instruct the Broker to
    delete the Queue when the last Consumer disconnect from it, and
    the Exchange when the last Queue is deleted from this Exchange.
    
    Now in a RabbitMQ cluster setup, if the cluster node that we are
    connected to go down, 2 things will happen:
    
     1. From RabbitMQ side, the Queues w/ auto-delete will be deleted
        from the other cluster nodes and then the Exchanges that the
        Queues are bind to if they were also created w/ auto-delete.
     2. From client side, client will reconnect to another cluster
        node and call queue.declare() which  create Exchanges then
        Queues then Binding in that order.
    
    Now in a happy path the queues/exchanges will be deleted from the
    broker before client start re-creating them again, but it also
    possible that the client first start by creating queues/exchange
    as part of the queue.declare() call, which are no-op operations
    b/c they alreay existed, but before it could bind Queue to
    Exchange, RabbitMQ nodes just received the 'signal' that the
    queue doesn't have any consumer so it should be delete, and the
    same with exchanges, which will lead to binding fail with
    NotFound error.
    
    Illustration of the time line from Client and RabbitMQ cluster
    respectively when the race condition happen:
    
           e-declare(E)      q-declare(Q)       q-bind(Q, E)
         -----+------------------+----------------+----------->
                                   e-delete(E)
         ------------------------------+---------------------->
    
    Change-Id: Ideb73af6f246a8282780cdb204d675d5d4555bf0
    Closes-Bug: #1318721