Nova Affinity filters no longer work

Bug #1708171 reported by Samuel Matzek
This bug affects 1 person
Affects: OpenStack DBaaS (Trove)
Status: Fix Released
Importance: Undecided
Assigned to: Matt Riedemann

Bug Description

A recent change to devstack [1] has broken RPC between Nova compute and Nova scheduler in devstack and this in turn is affecting the affinity and anti-affinity filters.

The Trove project has a negative gate test which tests anti-affinity as part of a DB cluster replica deploy and this test is failing because Nova is allowing 2 instances on the same host.
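As a rough illustration of why the test depends on the scheduler having current information, here is a deliberately simplified model of an anti-affinity check. It is not Nova's filter code; the function names and host names are hypothetical. It only shows how a filter of this kind behaves when its view of group membership is out of date, and the report does not establish that this is the exact failure path.

```python
def anti_affinity_host_passes(candidate_host, group_hosts):
    """Hypothetical check: a host passes only if no group member is on it."""
    return candidate_host not in group_hosts


def pick_host(candidates, group_hosts):
    """Return the first candidate that passes the check, or None."""
    for host in candidates:
        if anti_affinity_host_passes(host, group_hosts):
            return host
    return None


candidates = ["host-a", "host-b"]

# Up-to-date view: the first group member is known to be on host-a,
# so the second member is placed elsewhere.
placed_with_fresh_view = pick_host(candidates, {"host-a"})

# Stale view: the scheduler has not learned about the first member,
# so host-a passes again and both members share a host.
placed_with_stale_view = pick_host(candidates, set())
```

With the stale view, the second placement lands on the same host as the first, which is the symptom the negative test detects.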

The lack of RPC from compute to the scheduler can be seen in the Nova gate logs from before and after the change. The message "Successfully synced instances from host" appears in the scheduler log before the change [2].

Based on the commit message in this change and related changes, breaking the RPC communication between compute and scheduler may have been intentional, in order to test cells. If that is the case, then we need another way to allow the affinity filters to work in this setup. I have tried setting 'track_instance_changes=False' based on its description [3], but this did not fix the anti-affinity filter [4], [5].

After the change, the transport_url settings in the Nova configuration differ between the scheduler and the compute service, and even differ between sections of the compute configuration itself:

nova scheduler:
[DEFAULT]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/
[oslo_messaging_notifications]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/

nova compute:
[DEFAULT]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/nova_cell1
[oslo_messaging_notifications]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/
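The difference is easier to see when only the virtual host part of each [DEFAULT] transport_url is compared. The following is a minimal sketch using only the Python standard library; the helper name rpc_vhost is hypothetical, and the configuration text is copied from the two snippets above.

```python
import configparser
from urllib.parse import urlsplit

SCHEDULER_CONF = """\
[DEFAULT]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/
[oslo_messaging_notifications]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/
"""

COMPUTE_CONF = """\
[DEFAULT]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/nova_cell1
[oslo_messaging_notifications]
transport_url = rabbit://stackrabbit:secretrabbit@192.168.0.74:5672/
"""


def rpc_vhost(conf_text):
    """Return the path (virtual host) of the [DEFAULT] transport_url."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return urlsplit(parser.defaults()["transport_url"]).path


scheduler_vhost = rpc_vhost(SCHEDULER_CONF)
compute_vhost = rpc_vhost(COMPUTE_CONF)

# Differing paths mean the two services use different RPC virtual hosts.
same_rpc_vhost = scheduler_vhost == compute_vhost
```

Here the scheduler's RPC URL has the path "/" while the compute's has "/nova_cell1", so same_rpc_vhost is False.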

[1] https://github.com/openstack-dev/devstack/commit/5f0a963cb31222c08deb4a3c219f9cdd1674b218#diff-665a24457e945ac31372cf63d00a4080
[2] http://intel-openstack-ci-logs.ovh/portland/2017-08-02/481479/4/check/tempest-dsvm-full-nfv-xenial/f7f398c/logs/screen-n-sch.txt.gz
[3] https://github.com/openstack/nova/blame/master/nova/conf/scheduler.py#L195-L213
[4] http://logs.openstack.org/73/489773/1/check/gate-trove-scenario-dsvm-mysql-multi-ubuntu-xenial/2cf8e82/
[5] https://review.openstack.org/#/c/489773/1

Dan Smith (danms) wrote :

Yeah, this was intentional and will be like this for a bit. Here's your workaround:

https://review.openstack.org/#/c/487478/

Matt Riedemann (mriedem) wrote :

FWIW Tempest has tests for the ServerGroupAffinityFilter which pass in superconductor mode:

http://logs.openstack.org/32/489632/1/check/gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv/ae3cb6f/console.html#_2017-08-01_15_57_46_709238

And I'm working on a Tempest test for anti-affinity using the ServerGroupAntiAffinityFilter:

https://review.openstack.org/#/c/489754/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/490042

Changed in trove:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem)
no longer affects: devstack
OpenStack Infra (hudson-openstack) wrote : Change abandoned on trove (master)

Change abandoned by Matt Riedemann on branch: master
Review: https://review.openstack.org/490042
Reason: Fixing it here since there are 2 gate blocking issues for Trove apparently:

https://review.openstack.org/#/c/490061/

OpenStack Infra (hudson-openstack) wrote : Fix merged to trove (master)

Reviewed: https://review.openstack.org/490061
Committed: https://git.openstack.org/cgit/openstack/trove/commit/?id=0fb67d459baa0fb22391b1b1da3dba5349666bb7
Submitter: Jenkins
Branch: master

commit 0fb67d459baa0fb22391b1b1da3dba5349666bb7
Author: Matt Riedemann <email address hidden>
Date: Wed Aug 2 11:46:16 2017 -0400

    Fix AttributeError in api example snippets tests

    Change I3f020b6bcb1b9bf6d18a3b4f738c13cccd1bbff8 in
    python-troveclient 2.11.0 changed the
    troveclient.compat.client._logger variable to be a LOG
    variable.

    I have no idea how this hasn't been breaking the Trove
    API examples CI job since python-troveclient 2.11.0 was
    released. Maybe it has and no one has noticed or cared to
    fix it.

    Anyway, this adds hasattr checking in the test code to
    set the log level on the correct variable based on which
    version of troveclient is being used.

    Also - no idea why setting the log level in the client
    for these API tests is even necessary, but this dates back
    to a change in 2014 so who knows. Not me.

    --

    This also fixes bug 1708171 by making nova run in
    singleconductor mode so the affinity/anti-affinity
    scheduling tests work. Trove CI is blocked by both
    changes so they have to go together.

    Change-Id: Iaf00fc55336a8049c8303b8fa2849df2366115e6
    Closes-Bug: #1708190
    Closes-Bug: #1708171
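
The hasattr check described in the commit message could look roughly like the sketch below. This is not the merged test code: set_client_log_level is a hypothetical helper, and SimpleNamespace objects stand in for the troveclient.compat.client module so the example runs without python-troveclient installed. Only the attribute names LOG and _logger come from the commit message.

```python
import logging
from types import SimpleNamespace


def set_client_log_level(client_module, level):
    """Set the log level on whichever logger attribute the module exposes."""
    if hasattr(client_module, "LOG"):
        # Newer client: the logger variable is named LOG.
        logger = client_module.LOG
    else:
        # Older client: the logger variable is named _logger.
        logger = client_module._logger
    logger.setLevel(level)
    return logger


# Stand-ins for the old and new shapes of the client module (assumptions).
old_client = SimpleNamespace(_logger=logging.getLogger("old_troveclient"))
new_client = SimpleNamespace(LOG=logging.getLogger("new_troveclient"))

old_logger = set_client_log_level(old_client, logging.DEBUG)
new_logger = set_client_log_level(new_client, logging.DEBUG)
```

Checking for the attribute rather than the package version keeps the test code working with either release of the client.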

Changed in trove:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/trove 8.0.0.0rc1

This issue was fixed in the openstack/trove 8.0.0.0rc1 release candidate.
