Ram filter is broken since Mitaka

Bug #1635367 reported by Erik Olof Gunnar Andersson
28
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Dan Smith
Mitaka
Won't Fix
High
Matt Riedemann
Newton
Fix Released
High
Matt Riedemann
Ocata
Fix Released
High
Matt Riedemann

Bug Description

This patch that was backported to Mitaka made the assumption that free_ram_mb can never be less than 0.
https://github.com/openstack/nova/commit/016b810f675b20e8ce78f4c82dc9c679c0162b7a

free_ram_mb can be negative when the ram_allocation_ratio is set higher than one. This is also supported by the following comment in the code for the resource tracker.
> # free ram and disk may be negative, depending on policy:

This breaks any Scheduler filter that supports oversubcribe (e.g. memory or disk), as an example see this unit-test for the ram filter.
https://github.com/openstack/nova/blob/master/nova/tests/unit/scheduler/filters/test_ram_filters.py#L43

Tags: compute
Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Jay Faulkner (jason-oldos) wrote :

This is going to be rough to fix without leaving this broken or breaking the ironic virt driver. Adding Jesse Cook to the bug as he wrote the initial patch.

Sorry for the breakage :(

Matt Riedemann (mriedem)
tags: added: compute
Revision history for this message
Jesse J. Cook (jesse-j-cook) wrote :

I put it on my weekly commitments to dig in and have a looksee.

Revision history for this message
Jesse J. Cook (jesse-j-cook) wrote :

PiN == provisioned in Nova, PiI == provisioned in Ironic.

How it used to work:
Broke PiN PiI UsedMem AvailMem Reported
    Y Y Y 100 100 0
    Y Y N 100 0 -100
    Y N Y 0 100 100
    Y N N 0 0 0
    N Y Y 100 100 0
    N Y N 100 100 0
    N N Y 0 100 100
    N N N 0 100 100

These changes were made:
https://review.openstack.org/#/c/306670/2
https://review.openstack.org/#/c/321907/1

How it works now:
Broke PiN PiI UsedMem AvailMem Reported
    Y Y Y 100 0 -100 -> 0
    Y Y N 100 0 -100 -> 0
    Y N Y 0 0 0
    Y N N 0 0 0
    N Y Y 100 0 -100 -> 0
    N Y N 100 100 0
    N N Y 0 0 0
    N N N 0 100 100

Basically, only schedule to an Ironic node if it's not broken and neither Nova
or Ironic have it marked as provisioned.

If negative values are allowed, these cases would be considered
over-subscriptions, but they are not. Allowing negative RAM causes confusion,
and misrepresents available memory (controlled by the hypervisor / node). This
can lead to subtle bugs when the memory is not available and oversubscribing is
allowed. One example is the possibility of scheduling to a node that has no
available memory. Another example, negative memory is subtracted from
available memory in cells reporting causing builds to fail even though capacity
is available.

At the time of this writing, I'm not certain to what level over-subscriptions
are broken since the available memory limit is stored at a higher value using
the ram allocation factor. It will take a bit more digging before I fully grok
the impact of disallowing negative values here.

One potential way to solve for both cases is to rework the filters to use the
ram_allocation_factor to reduce the allocation instead of increasing the
available memory limit (https://github.com/openstack/nova/blob/master/nova/scheduler/filters/ram_filter.py#L51).
I believe this would eliminate the need for negative values and still allow for
over subscriptions. I can go down this path if it makes sense, but probably good
to have a conversation first.

Revision history for this message
Jesse J. Cook (jesse-j-cook) wrote :

Reposting comment because it's formatted horribly (if this doesn't work, sorry):

PiN == provisioned in Nova, PiI == provisioned in Ironic.

How it used to work:
Broke PiN PiI UsedMem AvailMem Reported
    Y Y Y 100 100 0
    Y Y N 100 0 -100
    Y N Y 0 100 100
    Y N N 0 0 0
    N Y Y 100 100 0
    N Y N 100 100 0
    N N Y 0 100 100
    N N N 0 100 100

These changes were made:
https://review.openstack.org/#/c/306670/2
https://review.openstack.org/#/c/321907/1

How it works now:
Broke PiN PiI UsedMem AvailMem Reported
    Y Y Y 100 0 -100 -> 0
    Y Y N 100 0 -100 -> 0
    Y N Y 0 0 0
    Y N N 0 0 0
    N Y Y 100 0 -100 -> 0
    N Y N 100 100 0
    N N Y 0 0 0
    N N N 0 100 100

Basically, only schedule to an Ironic node if it's not broken and
neither Nova or Ironic have it marked as provisioned.

If negative values are allowed, these cases would be considered
oversubscriptions, but they are not. Allowing negative RAM causes
confusion, and misrepresents available memory (controlled by the
hypervisor / node). This can lead to subtle bugs when the memory
is not available and oversubscribing is allowed. One example is
the possiblity of scheduling to a node that has no available
memory. Another example, negative memory is substractred from
available memory in cells reporting causing builds to fail even
though capacity is available.

At the time of this writing, I'm not certain to what level
oversubscriptions are broken since the available memory limit is
stored at a higher value using the ram allocation factor. It will
take a bit more digging before I fully grok the impact of
disallowing negative values here.

One potential way to solve for both cases is to rework the filters
to use the ram_allocation_factor to reduce the allocation instead
of increasing the available memory limit
(https://github.com/openstack/nova/blob/master/nova/scheduler/filters/ram_filter.py#L51).
I believe this would eliminate the need for negative values and
still allow for oversubscriptions. I can go down this path if it
makes sense, but probably good to have a conversation first.

Revision history for this message
Jesse J. Cook (jesse-j-cook) wrote :

Some IRC discussion here (~1731 - 1744): http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2016-10-21.log.html

I think there's some consensus that we can solve both cases without using negative values by leveraging used instead of free and making the appropriate checks in the filters.

Revision history for this message
Erik Olof Gunnar Andersson (eandersson) wrote :
Revision history for this message
Erik Olof Gunnar Andersson (eandersson) wrote :

Live Migration is likely to be affected by the same code. I haven't had time to dig into that one deeper though.
https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L112

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/390984

Changed in nova:
assignee: nobody → Erik Olof Gunnar Andersson (eandersson)
status: Confirmed → In Progress
Changed in nova:
assignee: Erik Olof Gunnar Andersson (eandersson) → Eli Qiao (taget-9)
Changed in nova:
assignee: Eli Qiao (taget-9) → Erik Olof Gunnar Andersson (eandersson)
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
melanie witt (melwitt) wrote :
Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
assignee: Erik Olof Gunnar Andersson (eandersson) → Dan Smith (danms)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Erik Olof Gunnar Andersson (<email address hidden>) on branch: master
Review: https://review.openstack.org/390984
Reason: Abandoning as this was fixed in a different manner.
https://review.openstack.org/#/c/474994/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.