'test_server_signal_userdata_format_software_config' test not completing and jobs timing out

Bug #1651768 reported by Rabi Mishra
This bug affects 1 person
Affects: OpenStack Heat
Status: Fix Released
Importance: Undecided
Assigned to: Crag Wolfe
Milestone: (none listed)

Bug Description

We've had a number of job failures in the last week where the job timed out and was killed. It looks like the 'test_server_signal_userdata_format_software_config' test, which landed last week, is timing out most of the time.

logstash query:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22timeout%20-s%209%20%24%7BREMAINING_TIME%7Dm%20bash%5C%22%20AND%20(build_name%3A%5C%22gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial%5C%22%20OR%20build_name%3A%5C%22gate-heat-dsvm-functional-convg-mysql-lbaasv2-ubuntu-xenial%5C%22)
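
The percent-encoded query in the URL above is hard to read as is. A minimal sketch for decoding it with Python's standard library follows; the helper name `decode_logstash_query` is hypothetical and only illustrates the idea.

```python
from urllib.parse import unquote, urlsplit

# The logstash URL from this report, split across lines for readability.
LOGSTASH_URL = (
    "http://logstash.openstack.org/#dashboard/file/logstash.json"
    "?query=message%3A%5C%22timeout%20-s%209%20%24%7BREMAINING_TIME%7Dm%20bash%5C%22"
    "%20AND%20(build_name%3A%5C%22gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial%5C%22"
    "%20OR%20build_name%3A%5C%22gate-heat-dsvm-functional-convg-mysql-lbaasv2-ubuntu-xenial%5C%22)"
)


def decode_logstash_query(url):
    """Return the human-readable query embedded in the URL fragment."""
    # Everything after '#' is the fragment; the query lives inside it.
    fragment = urlsplit(url).fragment
    _, _, encoded = fragment.partition("?query=")
    return unquote(encoded)


decoded_query = decode_logstash_query(LOGSTASH_URL)
print(decoded_query)
```

The decoded text shows that the query matches the `timeout -s 9 ${REMAINING_TIME}m bash` log message in either of the two functional gate jobs.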

Rabi Mishra (rabi) wrote :

Looks like we've added the test in https://review.openstack.org/#/c/400464/.

summary: - test_server_signal_userdata_format_software_config test not completing
- and job killed
+ 'test_server_signal_userdata_format_software_config' test not completing
+ and jobs timing out
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/413921

OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/413921
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=922f51ed06a3829f0e0fcfa9b0fdc46a6843f05b
Submitter: Jenkins
Branch: master

commit 922f51ed06a3829f0e0fcfa9b0fdc46a6843f05b
Author: rabi <email address hidden>
Date: Thu Dec 22 11:40:22 2016 +0530

    Skip test_server_signal_userdata_format_software_config

    This test is failing very often with timeout, let's
    skip it until the issue is resolved.

    Change-Id: I116a96b20082d1c47068ddc64873200e9ae33a27
    Related-Bug: #1651768
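
As an illustration of temporarily skipping a flaky test, here is a minimal sketch using the standard `unittest.skip` decorator. The class name `ServerSignalTest` is hypothetical, and the actual change in review 413921 may have used a different skip mechanism.

```python
import unittest


class ServerSignalTest(unittest.TestCase):  # hypothetical class name
    @unittest.skip("Skipped until bug #1651768 is resolved")
    def test_server_signal_userdata_format_software_config(self):
        # The body is never executed while the skip decorator is in place.
        self.fail("not reached while skipped")


# Run the test case programmatically and collect the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ServerSignalTest)
result = unittest.TestResult()
suite.run(result)
print("skipped:", len(result.skipped))
```

A skipped test is reported separately from failures, so the job stays green while the skip reason keeps the bug number visible.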

Changed in heat:
assignee: nobody → Crag Wolfe (cwolfe)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Crag Wolfe (<email address hidden>) on branch: master
Review: https://review.openstack.org/459893
Reason: Zane already has a patch for this, mentioned in his last reply.

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/321783
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=e37d9fab8fe2e779ae8c0e2311de2601b66c66b6
Submitter: Jenkins
Branch: master

commit e37d9fab8fe2e779ae8c0e2311de2601b66c66b6
Author: Zane Bitter <email address hidden>
Date: Thu May 26 13:40:53 2016 -0400

    Corrected max secs for concurrent trans retries

    This was most likely meant as a max 2s delay here, not a max 2ms
    delay.

    Also includes a related change: when retries for metadata updates are
    attempted, make sure we do not have a stale value of the atomic_key
    (otherwise we'll just inevitably hit the ConcurrentTransaction issue).

    Co-Authored-By: Crag Wolfe <email address hidden>
    Partial-Bug: #1651768
    Change-Id: Ie56e0e4ff93633db1f4752859d2b2a9506922911
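
The two ideas in this commit message can be sketched roughly as follows: wait a random delay of up to 2 seconds (not 2 milliseconds) between retries, and re-read the atomic_key before every attempt so a retry never reuses a stale value. This is not Heat's actual code; `FakeResourceStore`, `update_metadata_with_retry`, and all parameters are hypothetical stand-ins for illustration.

```python
import random
import time


class ConcurrentTransaction(Exception):
    """Raised when the stored atomic_key differs from the expected one."""


class FakeResourceStore:
    """Hypothetical in-memory stand-in for a resource row in the database."""

    def __init__(self, conflicts_to_inject=0):
        self.atomic_key = 0
        self.metadata = {}
        self._pending_conflicts = conflicts_to_inject

    def read_atomic_key(self):
        return self.atomic_key

    def update_metadata(self, metadata, expected_key):
        # Simulate another writer bumping the key just before our update.
        if self._pending_conflicts > 0:
            self._pending_conflicts -= 1
            self.atomic_key += 1
        if expected_key != self.atomic_key:
            raise ConcurrentTransaction()
        self.metadata = metadata
        self.atomic_key += 1


def update_metadata_with_retry(store, metadata, max_attempts=10,
                               max_wait_secs=2.0, sleep=time.sleep):
    """Retry the update, refreshing the atomic_key on every attempt."""
    for attempt in range(1, max_attempts + 1):
        # Re-read the key each time; a stale key would fail on every retry.
        expected_key = store.read_atomic_key()
        try:
            store.update_metadata(metadata, expected_key)
            return attempt
        except ConcurrentTransaction:
            if attempt == max_attempts:
                raise
            # Random delay in seconds, up to max_wait_secs.
            sleep(random.uniform(0, max_wait_secs))


# Usage: record the delays instead of sleeping so the example runs instantly.
store = FakeResourceStore(conflicts_to_inject=2)
waits = []
attempts = update_metadata_with_retry(store, {"deploy": "done"},
                                      sleep=waits.append)
print(attempts, len(waits))
```

If `expected_key` were read once outside the loop, every retry after the first conflict would fail in the same way, which is the stale-value problem the commit message describes.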

Zane Bitter (zaneb)
tags: added: newton-backport-potential ocata-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/464264

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/466008

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/ocata)

Reviewed: https://review.openstack.org/464264
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=416bad4325ac8311e7b9b891fb6d70c67a97ab95
Submitter: Jenkins
Branch: stable/ocata

commit 416bad4325ac8311e7b9b891fb6d70c67a97ab95
Author: Zane Bitter <email address hidden>
Date: Thu May 26 13:40:53 2016 -0400

    Corrected max secs for concurrent trans retries

    This was most likely meant as a max 2s delay here, not a max 2ms
    delay.

    Also includes a related change: when retries for metadata updates are
    attempted, make sure we do not have a stale value of the atomic_key
    (otherwise we'll just inevitably hit the ConcurrentTransaction issue).

     Conflicts:
     heat/engine/service_software_config.py

    Co-Authored-By: Crag Wolfe <email address hidden>
    Partial-Bug: #1651768
    Change-Id: Ie56e0e4ff93633db1f4752859d2b2a9506922911
    (cherry picked from commit e37d9fab8fe2e779ae8c0e2311de2601b66c66b6)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/newton)

Reviewed: https://review.openstack.org/466008
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=c52fdbb7d68bb0814843f3fec9b2d966239bab40
Submitter: Jenkins
Branch: stable/newton

commit c52fdbb7d68bb0814843f3fec9b2d966239bab40
Author: Zane Bitter <email address hidden>
Date: Thu May 26 13:40:53 2016 -0400

    Corrected max secs for concurrent trans retries

    This was most likely meant as a max 2s delay here, not a max 2ms
    delay.

    Also includes a related change: when retries for metadata updates are
    attempted, make sure we do not have a stale value of the atomic_key
    (otherwise we'll just inevitably hit the ConcurrentTransaction issue).

     Conflicts:
     heat/engine/service_software_config.py
     heat/objects/resource.py

    Co-Authored-By: Crag Wolfe <email address hidden>
    Partial-Bug: #1651768
    Change-Id: Ie56e0e4ff93633db1f4752859d2b2a9506922911
    (cherry picked from commit e37d9fab8fe2e779ae8c0e2311de2601b66c66b6)

tags: added: in-stable-newton
Zane Bitter (zaneb) wrote :

Tests were restored by https://review.openstack.org/#/c/462749/

There have been a few changes that should enable these tests to succeed on a regular basis, all related to edge cases around updating a resource's atomic_key and metadata:

    Corrected max secs for concurrent trans retries
      e37d9fab8fe2e779ae8c0e2311de2601b66c66b6
    Don't set metadata for deleted resources
      8d7e3e41e8f02726dca33b5ec2f6d5b6b6b07a31
    Allow retries when resource acquires lock
      2ec2d5a973927f9a2cc2a62f70712afc5cb30f4c

In addition to the patch above, the fixes were:
https://review.openstack.org/#/c/449286/
https://review.openstack.org/#/c/449351/

Changed in heat:
status: In Progress → Fix Released
Zane Bitter (zaneb)
tags: removed: newton-backport-potential ocata-backport-potential