DBDeadlock when syncing traits in Placement during list_allocation_candidates

Bug #1738083 reported by Matt Riedemann
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Matt Riedemann
Pike                      Fix Committed  High        Matt Riedemann

Bug Description

This killed a scheduling request, so we ended up with a NoValidHost:

http://logs.openstack.org/64/527564/1/gate/legacy-tempest-dsvm-py35/7db2d64/logs/screen-placement-api.txt.gz#_Dec_13_17_07_40_968321

It looks like it blows up here:

Dec 13 17:07:40.973678 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation_candidate.py", line 217, in list_allocation_candidates
Dec 13 17:07:40.973796 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler cands = rp_obj.AllocationCandidates.get_by_requests(context, requests)
Dec 13 17:07:40.973893 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 3182, in get_by_requests
Dec 13 17:07:40.973969 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler _ensure_trait_sync(context)
Dec 13 17:07:40.974045 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 135, in _ensure_trait_sync
Dec 13 17:07:40.974140 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler _trait_sync(ctx)
Dec 13 17:07:40.974218 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 984, in wrapper
Dec 13 17:07:40.974294 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler return fn(*args, **kwargs)
Dec 13 17:07:40.974366 ubuntu-xenial-citycloud-sto2-0001423712 <email address hidden>[14690]: ERROR nova.api.openstack.placement.handler File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 108, in _trait_sync

Due to this deadlock:

oslo_db.exception.DBDeadlock: (pymysql.err.InternalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') [SQL: 'INSERT INTO traits (created_at, name) VALUES (%(created_at)s, %(name)s)'] [parameters: ({'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954357), 'name': 'HW_GPU_API_CUDA_V2_1'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954363), 'name': 'HW_CPU_X86_TSX'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954365), 'name': 'HW_CPU_X86_AVX512ER'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954367), 'name': 'HW_NIC_OFFLOAD_GRO'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954369), 'name': 'HW_GPU_API_DIRECT3D_V11_2'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954371), 'name': 'HW_GPU_API_OPENGL_V4_4'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954373), 'name': 'HW_GPU_API_CUDA_V1_2'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954375), 'name': 'HW_CPU_X86_AVX512VL'} ... displaying 10 of 163 total bound parameter sets ... {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954663), 'name': 'HW_NIC_OFFLOAD_FDF'}, {'created_at': datetime.datetime(2017, 12, 13, 17, 7, 40, 954665), 'name': 'HW_GPU_API_OPENGL_V4_0'})]
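
For illustration, here is a minimal sketch of the kind of sync step that would issue a batched insert like the one in the error above. It is not the nova code: the function name sync_traits, the table layout, and the short STANDARD_TRAITS list are assumptions, and it uses an in-memory SQLite database so it runs without MySQL.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical stand-in for the standard trait names; this report only shows
# a handful of the 163 names that were being inserted.
STANDARD_TRAITS = ["HW_CPU_X86_TSX", "HW_NIC_OFFLOAD_GRO", "HW_GPU_API_CUDA_V2_1"]


def sync_traits(conn, standard_traits):
    """Insert every standard trait that is not yet in the traits table."""
    existing = {row[0] for row in conn.execute("SELECT name FROM traits")}
    missing = sorted(set(standard_traits) - existing)
    now = datetime.now(timezone.utc).isoformat()
    # Batched INSERT of all missing rows, one parameter set per trait.
    conn.executemany(
        "INSERT INTO traits (created_at, name) VALUES (?, ?)",
        [(now, name) for name in missing],
    )
    conn.commit()
    return missing


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traits (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT UNIQUE)"
)
first = sync_traits(conn, STANDARD_TRAITS)   # inserts everything
second = sync_traits(conn, STANDARD_TRAITS)  # nothing left to insert
```

With a single connection the second call is a no-op. The trouble starts when two separate processes both read the table before either has committed: each computes the same set of missing names and tries to insert the same rows, and on MySQL that contention can surface as the deadlock shown above.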

Tags: db placement
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/527836

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/527836
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c66ae65775bb9d885fac059847063fee70617bc5
Submitter: Zuul
Branch: master

commit c66ae65775bb9d885fac059847063fee70617bc5
Author: Matt Riedemann <email address hidden>
Date: Wed Dec 13 21:22:32 2017 -0500

    Retry _trait_sync on deadlock

    We're seeing DBDeadlock failures during scheduling in CI jobs
    when syncing traits when getting allocation candidates.

    We have a lock around this code but that's not going to carry across
    multiple processes, so we need to be able to retry on deadlock if
    one occurs.

    Change-Id: I6cf1793c1cbed18d850ec7e32b5b195e78cb4e68
    Closes-Bug: #1738083
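
The retry described in the commit message can be sketched roughly as follows. This is a self-contained illustration, not the merged change: the DBDeadlock class here is a local stand-in for oslo_db.exception.DBDeadlock, and retry_on_deadlock and trait_sync are hypothetical names. oslo.db ships a wrap_db_retry decorator with a retry_on_deadlock option that serves this purpose; whether the merged change uses it is not shown in this report.

```python
import functools
import time


class DBDeadlock(Exception):
    """Local stand-in for oslo_db.exception.DBDeadlock so the sketch runs without oslo.db."""


def retry_on_deadlock(max_retries=5, interval=0.0):
    """Re-run the wrapped function when it raises DBDeadlock (hypothetical helper)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # give up after the last allowed retry
                    time.sleep(interval)
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=5)
def trait_sync():
    """Simulates two deadlocks followed by a successful sync."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return "synced"


result = trait_sync()
```

A process-local lock only serializes threads inside one API worker, so a retry at the database layer is what lets a second worker recover once the first worker's transaction has committed.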

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/528094

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/528094
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=66a7508728fc7cc9f221702b0fe511ddf7f378ab
Submitter: Zuul
Branch: stable/pike

commit 66a7508728fc7cc9f221702b0fe511ddf7f378ab
Author: Matt Riedemann <email address hidden>
Date: Wed Dec 13 21:22:32 2017 -0500

    Retry _trait_sync on deadlock

    We're seeing DBDeadlock failures during scheduling in CI jobs
    when syncing traits when getting allocation candidates.

    We have a lock around this code but that's not going to carry across
    multiple processes, so we need to be able to retry on deadlock if
    one occurs.

    Change-Id: I6cf1793c1cbed18d850ec7e32b5b195e78cb4e68
    Closes-Bug: #1738083
    (cherry picked from commit c66ae65775bb9d885fac059847063fee70617bc5)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b3

This issue was fixed in the openstack/nova 17.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.0

This issue was fixed in the openstack/nova 16.1.0 release.
