Fix magnum's gate, oslo.messaging broke it

Bug #1694728 reported by Spyros Trigazis
This bug affects 2 people
Affects         Status        Importance  Assigned to      Milestone
Magnum          Fix Released  Critical    Spyros Trigazis
Zun             Invalid       Undecided   Unassigned
oslo.messaging  Fix Released  High        Mehdi Abaakouk

Bug Description

Yesterday (30-05-2017), oslo.messaging===5.25.0 was released with a RabbitMQ-related fix [1].
With oslo.messaging===5.25.0 messages from magnum-api never reach magnum-conductor.

[1] https://review.openstack.org/#/c/463673/

Revision history for this message
Spyros Trigazis (strigazi) wrote :

Per IRC discussion, oslo.messaging===5.24.2 is buggy. We can avoid 5.24.2 and 5.25.0 until the problem is fixed.
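
In pip requirement syntax, skipping specific releases is written with `!=` clauses. The sketch below shows what such an exclusion means, using only the standard library. The lower bound `5.19.0` and the helper names are placeholders for illustration, not values taken from the actual requirements change.

```python
import re

# Hypothetical requirement line; the lower bound is illustrative only.
REQUIREMENT = "oslo.messaging>=5.19.0,!=5.24.2,!=5.25.0"


def parse_version(text):
    """Turn '5.24.2' into (5, 24, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_allowed(version, requirement=REQUIREMENT):
    """Return True if `version` satisfies every clause of `requirement`.

    Only the '>=' and '!=' operators are handled; this is not a full
    PEP 440 implementation.
    """
    candidate = parse_version(version)
    for op, bound in re.findall(r"(>=|!=)\s*([0-9.]+)", requirement):
        bound_version = parse_version(bound)
        if op == ">=" and candidate < bound_version:
            return False
        if op == "!=" and candidate == bound_version:
            return False
    return True
```

With this line, an installer would still pick up a later 5.25.1 automatically, which is the point of excluding releases rather than capping the version.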

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/469518

Changed in magnum:
assignee: nobody → Spyros Trigazis (strigazi)
status: New → In Progress
Revision history for this message
Matthew Thode (prometheanfire) wrote :

Please submit this to global-requirements.txt (in the requirements repo) instead, so it can propagate to all projects.

Revision history for this message
Matt Riedemann (mriedem) wrote :

There is no indication in here about what went into 5.24.2 that broke Magnum - has anyone sorted that out?

Revision history for this message
Matt Riedemann (mriedem) wrote :

The description says https://review.openstack.org/#/c/463673/ but that went into 5.25.0, not 5.24.2, so why is 5.24.2 also bad - or isn't it?

Revision history for this message
Matt Riedemann (mriedem) wrote :

Also note that oslo.messaging change was backported and merged to stable/ocata and proposed to stable/newton, but I put a -2 on the stable/newton change until this is sorted out:

https://review.openstack.org/#/q/I62b9e09513e3ebfebc64a941d4b21b6c053b511d,n,z

Revision history for this message
Spyros Trigazis (strigazi) wrote :

5.24.2 doesn't break magnum. Per IRC discussion (with sileht), 5.24.2 is considered buggy, but I cannot say why. Magnum works fine with 5.24.2.

Revision history for this message
Matt Riedemann (mriedem) wrote :

OK thanks for clarifying.

Changed in zun:
assignee: nobody → feng.shengqin@zte.com.cn (feng-shengqin)
Revision history for this message
hongbin (hongbin034) wrote :

I did some troubleshooting on this. It looks like the issue occurs if you are using oslo.messaging 5.25.0 with the 'blocking' executor. Changing the executor to 'eventlet' resolves the problem. I commented on the patch with details: https://review.openstack.org/#/c/463673/

Changed in oslo.messaging:
status: New → Confirmed
Changed in zun:
status: New → Invalid
Changed in zun:
assignee: feng.shengqin@zte.com.cn (feng-shengqin) → nobody
Changed in magnum:
importance: Undecided → Critical
Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
assignee: nobody → Mehdi Abaakouk (sileht)
importance: Undecided → High
Revision history for this message
Spyros Trigazis (strigazi) wrote :

The default executor of o.m is 'blocking', but after change [1],
o.m with blocking is broken. 'eventlet' and 'non-blocking' work
fine and most services use 'eventlet', so this is why it was
noticed by magnum which used the default one.

Magnum uses eventlet in one place [2], but wrongly uses blocking in another [3].

We must change the executor in magnum to eventlet regardless of this bug, since it is recommended by the oslo team and the 'blocking' executor has 0% test coverage.

On the oslo side, the default behavior of the library *must* be functional. So I suggest blacklisting 5.25.0 [4] until 5.25.1 is out.

[1] https://review.openstack.org/#/c/463673/
[2] http://git.openstack.org/cgit/openstack/magnum/tree/magnum/common/rpc.py#n166
[3] http://git.openstack.org/cgit/openstack/magnum/tree/magnum/common/rpc_service.py#n52
[4] https://review.openstack.org/#/c/469539/
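
For readers unfamiliar with the executor option, here is a minimal sketch of passing it explicitly instead of relying on the library default. The wrapper name `make_rpc_server` and its parameters are hypothetical and are not Magnum's actual code. `get_transport`, `Target` and `get_rpc_server` are oslo.messaging calls, and the 'eventlet' executor assumes the service has already applied eventlet monkey patching at startup.

```python
def make_rpc_server(conf, topic, server, endpoints, executor="eventlet"):
    """Build an oslo.messaging RPC server with an explicit executor.

    Hypothetical helper for illustration: `conf` is an oslo.config
    ConfigOpts object, `endpoints` is a list of objects whose methods
    are exposed over RPC.
    """
    # Imported lazily so this sketch can be loaded without the library.
    import oslo_messaging

    transport = oslo_messaging.get_transport(conf)
    target = oslo_messaging.Target(topic=topic, server=server)
    # Naming the executor avoids depending on the library default,
    # which was 'blocking' at the time of this bug.
    return oslo_messaging.get_rpc_server(
        transport, target, endpoints, executor=executor)
```

The returned server still has to be started with `start()` by the caller; that part is unchanged by the choice of executor.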

Revision history for this message
Spyros Trigazis (strigazi) wrote :

I assume that a 5.25.1 will be released to fix the blocking executor.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/469852

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on magnum (master)

Change abandoned by Spyros Trigazis (strigazi) (<email address hidden>) on branch: master
Review: https://review.openstack.org/469518
Reason: In favor of https://review.openstack.org/#/c/469852/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/469852
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=bd69b3fff6b50048813360627fb02044cf1bcb58
Submitter: Jenkins
Branch: master

commit bd69b3fff6b50048813360627fb02044cf1bcb58
Author: Spyros Trigazis (strigazi) <email address hidden>
Date: Thu Jun 1 12:20:26 2017 +0000

    Use eventlet executor in rpc_service

    The default executor of o.m is 'blocking', but after change [1],
    o.m with blocking is broken. 'eventlet' and 'non-blocking' work
    fine and most services use 'eventlet', so this is why it was
    noticed by magnum which used the default one.

    Use eventlet to not be affected by bug #1694728 , but more
    importantly, the oslo team suggests to not use blocking which
    has 0% test coverage and it is going to be deprecated and unset
    from the default configuration.

    [1] https://review.openstack.org/#/c/463673/

    Change-Id: I47da73787456c97f7d84fd4440404b551ff62528
    Closes-Bug: #1694728

Changed in magnum:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/470283

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (stable/ocata)

Reviewed: https://review.openstack.org/470283
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=a4203b79d77ceee3d884fbb376b1e32dfc54530f
Submitter: Jenkins
Branch: stable/ocata

commit a4203b79d77ceee3d884fbb376b1e32dfc54530f
Author: Matt Riedemann <email address hidden>
Date: Fri Jun 2 13:19:28 2017 +0000

    Revert "rabbit: restore synchronous ack/requeue"

    This reverts commit b3316669263ad5f76e03bb7b54f1704f64c8c17f.

    It was reported on master (pike) that this change broke
    the default "blocking" executor so we should revert it on
    stable/ocata.

    Change-Id: Ia4bd74aa3df059e00b209d66afa8e327b76fe6ca
    Related-Bug: #1694728

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/470759

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/newton)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/470759

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/470762

Changed in oslo.messaging:
status: Confirmed → In Progress
Changed in oslo.messaging:
assignee: Mehdi Abaakouk (sileht) → ChangBo Guo(gcb) (glongwave)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/471275

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/471276

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/471277

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (stable/mitaka)

Reviewed: https://review.openstack.org/471277
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=a9c642af0ce75c75a8d1439d63e2c803eaedcc2c
Submitter: Jenkins
Branch: stable/mitaka

commit a9c642af0ce75c75a8d1439d63e2c803eaedcc2c
Author: Spyros Trigazis (strigazi) <email address hidden>
Date: Thu Jun 1 12:20:26 2017 +0000

    Use eventlet executor in rpc_service

    The default executor of o.m is 'blocking', but after change [1],
    o.m with blocking is broken. 'eventlet' and 'non-blocking' work
    fine and most services use 'eventlet', so this is why it was
    noticed by magnum which used the default one.

    Use eventlet to not be affected by bug #1694728 , but more
    importantly, the oslo team suggests to not use blocking which
    has 0% test coverage and it is going to be deprecated and unset
    from the default configuration.

    [1] https://review.openstack.org/#/c/463673/

    Change-Id: I47da73787456c97f7d84fd4440404b551ff62528
    Closes-Bug: #1694728
    (cherry picked from commit bd69b3fff6b50048813360627fb02044cf1bcb58)

tags: added: in-stable-mitaka
Changed in oslo.messaging:
assignee: ChangBo Guo(gcb) (glongwave) → Mehdi Abaakouk (sileht)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/ocata)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/470762
Reason: I will restore it when master is merged

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/472233

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: master
Review: https://review.openstack.org/472233

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/469806
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=8ee5ae135a6ecb918f40619982e3dc7e38ed0bbf
Submitter: Jenkins
Branch: master

commit 8ee5ae135a6ecb918f40619982e3dc7e38ed0bbf
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Jun 1 10:28:23 2017 +0200

    Fix rabbitmq driver with blocking executor

    We recently move ack/requeue of messages in main/polling thread
    of rabbitmq drivers. And break the blocking executor.

    This one is not tested by any tests and now deprecated.

    This change workaround the issue until we completely remove the
    blocking executor.

    Change-Id: Id479100f6ff364cf67a199e9b70f9f0c7bf7e1a9
    Closes-bug: #1694728
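
The following is a toy model, not oslo.messaging code. It assumes the regression came from a message handler waiting for an ack that only the polling thread may perform, which the commit message suggests but does not spell out. All names (`AckQueue`, `dispatch_in_worker_thread`, `dispatch_in_polling_thread`) are invented for illustration, and a timeout stands in for what would otherwise be an indefinite hang.

```python
import queue
import threading


class AckQueue:
    """Acks are queued here and executed only by the polling thread."""

    def __init__(self):
        self._tasks = queue.Queue()

    def request_ack(self, timeout):
        """Queue an ack and wait until the polling thread has run it.

        Returns True if the ack ran before `timeout` seconds elapsed.
        """
        done = threading.Event()
        self._tasks.put(done.set)
        return done.wait(timeout)

    def process(self):
        """Run all pending acks; meant to be called by the polling thread."""
        while True:
            try:
                task = self._tasks.get_nowait()
            except queue.Empty:
                return
            task()


def dispatch_in_worker_thread(acks, timeout=1.0):
    """Handler runs in its own thread, so the poller stays free to ack."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(acks.request_ack(timeout)))
    worker.start()
    # The calling thread plays the poller: it keeps draining the ack queue.
    while worker.is_alive():
        acks.process()
        worker.join(0.01)
    return result[0]


def dispatch_in_polling_thread(acks, timeout=0.2):
    """Handler runs in the polling thread itself (the 'blocking' case).

    The thread waits for an ack that only it could perform, so the wait
    can only end by timing out.
    """
    acked = acks.request_ack(timeout)
    acks.process()  # the poller gets control back only after the wait
    return acked
```

In this model the worker-thread variant gets its ack, while the single-thread variant times out, which mirrors the symptom reported above where messages from magnum-api never reached magnum-conductor.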

Changed in oslo.messaging:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 5.27.0

This issue was fixed in the openstack/oslo.messaging 5.27.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/newton)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/466787

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/474202

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/ocata)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/470762
Reason: Sorry I have missed that on I remove it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/474455

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/ocata)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/474202
Reason: The one with the new Change-ID: https://review.openstack.org/#/c/474455/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/474456

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/newton)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/470759
Reason: This change has a new Change-ID: https://review.openstack.org/474456

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/474456
Reason: Thanks a lot for catching this. A good reason for me to stop providing backports in advance...

Also, changing all the Change-IDs has created some mess: git-review refuses to update this patch and forces me to update the old one: https://review.openstack.org/466787

So I have restored that one and am abandoning this one.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/ocata)

Reviewed: https://review.openstack.org/474455
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=2775568e01b8429dc90fade4e38673085098da34
Submitter: Jenkins
Branch: stable/ocata

commit 2775568e01b8429dc90fade4e38673085098da34
Author: Mehdi Abaakouk <email address hidden>
Date: Wed May 10 09:19:38 2017 +0200

    rabbit: restore synchronous ack/requeue

    Note this change also contains the fix for the regression it
    introduced.

    In https://review.openstack.org/#/c/436958, we fix a thread safety
    issue. But we make the ack/requeue of message asynchronous. In nominal
    case, it works, but if network/rabbit connection issue occurs this
    can result to rpc call handle twice. By chance we double check already
    processed message ids, and drop duplicates, but that if the message
    goes to another node, the mitigation won't work.

    This restore the previous behavior, to ensure we run application
    callback of rpc.call/rpc.cast only when the message have been
    successfully ack.

    (cherry picked from commit da02bc2169b09959d857c605961ead1bbba1019d)

    Fix rabbitmq driver with blocking executor

    We recently move ack/requeue of messages in main/polling thread
    of rabbitmq drivers. And break the blocking executor.

    This one is not tested by any tests and now deprecated.

    This change workaround the issue until we completely remove the
    blocking executor.

    Closes-bug: #1694728
    (cherry picked from commit 8ee5ae135a6ecb918f40619982e3dc7e38ed0bbf)

    Change-Id: I62b9e09513e3ebfebc64a941d4b21b6c053b511d

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on magnum (stable/ocata)

Change abandoned by Spyros Trigazis (strigazi) (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/471275
Reason: Abandon, since o.m blocking is fixed

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on magnum (stable/newton)

Change abandoned by Spyros Trigazis (strigazi) (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/471276
Reason: Abandon, since o.m blocking is fixed

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 5.17.2

This issue was fixed in the openstack/oslo.messaging 5.17.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/newton)

Reviewed: https://review.openstack.org/466787
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=c41f0ce709ea39717eb75748a6c5b7e9a69628cf
Submitter: Jenkins
Branch: stable/newton

commit c41f0ce709ea39717eb75748a6c5b7e9a69628cf
Author: Mehdi Abaakouk <email address hidden>
Date: Wed May 10 09:19:38 2017 +0200

    rabbit: restore synchronous ack/requeue

    Note this change also contains the fix for the regression it
    introduced.

    In https://review.openstack.org/#/c/436958, we fix a thread safety
    issue. But we make the ack/requeue of message asynchronous. In nominal
    case, it works, but if network/rabbit connection issue occurs this
    can result to rpc call handle twice. By chance we double check already
    processed message ids, and drop duplicates, but that if the message
    goes to another node, the mitigation won't work.

    This restore the previous behavior, to ensure we run application
    callback of rpc.call/rpc.cast only when the message have been
    successfully ack.

    (cherry picked from commit da02bc2169b09959d857c605961ead1bbba1019d)

    Fix rabbitmq driver with blocking executor

    We recently move ack/requeue of messages in main/polling thread
    of rabbitmq drivers. And break the blocking executor.

    This one is not tested by any tests and now deprecated.

    This change workaround the issue until we completely remove the
    blocking executor.

    Closes-bug: #1694728
    (cherry picked from commit 8ee5ae135a6ecb918f40619982e3dc7e38ed0bbf)

    Change-Id: I62b9e09513e3ebfebc64a941d4b21b6c053b511d

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 5.10.2

This issue was fixed in the openstack/oslo.messaging 5.10.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 5.0.0

This issue was fixed in the openstack/magnum 5.0.0 release.
