Merge lp:~ltrager/maas/lp1603563 into lp:~maas-committers/maas/trunk

Proposed by Lee Trager
Status: Merged
Approved by: Lee Trager
Approved revision: no longer in the source branch.
Merged at revision: 5515
Proposed branch: lp:~ltrager/maas/lp1603563
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 47 lines (+18/-0)
2 files modified
src/maasserver/models/node.py (+9/-0)
src/maasserver/models/tests/test_node.py (+9/-0)
To merge this branch: bzr merge lp:~ltrager/maas/lp1603563
Reviewer Review Type Date Requested Status
Mike Pontillo (community) Approve
Review via email: mp+309305@code.launchpad.net

Commit message

Reset status_expires when a non-monitored status is set.

Description of the change

In LP1554636 we have multiple users who are reporting that when releasing a deployed node the node goes to failed releasing state. When you look through the attached logs you can see the node goes into a failed releasing state 1 second after the release request comes in. I haven't been able to reproduce this myself but I *think* I found a race condition which is causing this to happen. Node._release was setting the status_expires time in a post_commit hook before the status was set.

What I believe is happening is the status gets set, then the status_monitor service runs using an old expired_status, sees its expired and puts the node into a FAILED_RELEASING state. This happens before the post_commit hook to update the node status is run.

What this MP does is always set the status_expires field to None on save if the status is not monitored.

To post a comment you must log in.
Revision history for this message
Mike Pontillo (mpontillo) wrote :

Looks good to me! (Needs a commit message though.)

review: Approve
Revision history for this message
MAAS Lander (maas-lander) wrote :
Download full text (32.4 KiB)

The attempt to merge lp:~ltrager/maas/lp1603563 into lp:maas failed. Below is the output from the failed tests.

Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Hit:2 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial InRelease
Get:3 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates InRelease [95.7 kB]
Get:4 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-backports InRelease [92.2 kB]
Get:5 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates/main Sources [201 kB]
Get:6 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [413 kB]
Fetched 896 kB in 0s (1,626 kB/s)
Reading package lists...
sudo DEBIAN_FRONTEND=noninteractive apt-get -y \
    --no-install-recommends install apache2 archdetect-deb authbind avahi-utils bash bind9 bind9utils build-essential bzr bzr-builddeb chromium-browser chromium-chromedriver curl daemontools debhelper dh-apport dh-systemd distro-info dnsutils firefox freeipmi-tools git gjs ipython isc-dhcp-common isc-dhcp-server libjs-angularjs libjs-jquery libjs-jquery-hotkeys libjs-yui3-full libjs-yui3-min libpq-dev make nodejs-legacy npm postgresql pxelinux python3-all python3-apt python3-attr python3-bson python3-convoy python3-crochet python3-cssselect python3-curtin python3-dev python3-distro-info python3-django python3-django-nose python3-django-piston3 python3-dnspython python3-docutils python3-formencode python3-hivex python3-httplib2 python3-jinja2 python3-jsonschema python3-lxml python3-netaddr python3-netifaces python3-novaclient python3-oauth python3-oauthlib python3-openssl python3-paramiko python3-petname python3-pexpect python3-psycopg2 python3-pyinotify python3-pyparsing python3-pyvmomi python3-requests python3-seamicroclient python3-setuptools python3-simplestreams python3-sphinx python3-tempita python3-twisted python3-txtftp python3-tz python3-yaml python3-zope.interface python-bson python-crochet python-django python-django-piston python-djorm-ext-pgarray python-formencode python-lxml python-netaddr python-netifaces python-pocket-lint python-psycopg2 python-simplejson python-tempita python-twisted python-yaml socat syslinux-common tgt ubuntu-cloudimage-keyring wget xvfb
Reading package lists...
Building dependency tree...
Reading state information...
authbind is already the newest version (2.1.1+nmu1).
avahi-utils is already the newest version (0.6.32~rc+dfsg-1ubuntu2).
build-essential is already the newest version (12.1ubuntu2).
debhelper is already the newest version (9.20160115ubuntu3).
distro-info is already the newest version (0.14build1).
freeipmi-tools is already the newest version (1.4.11-1ubuntu1).
git is already the newest version (1:2.7.4-0ubuntu1).
libjs-angularjs is already the newest version (1.2.28-1ubuntu2).
libjs-jquery is already the newest version (1.11.3+dfsg-4).
libjs-yui3-full is already the newest version (3.5.1-1ubuntu3).
libjs-yui3-min is already the newest version (3.5.1-1ubuntu3).
make is already the newest version (4.1-6).
postgresql is already the newest version (9.5+173).
pxelinux is already the newest version (3:6.03+dfsg-11ubuntu1).
python-f...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/maasserver/models/node.py'
2--- src/maasserver/models/node.py 2016-10-22 05:40:08 +0000
3+++ src/maasserver/models/node.py 2016-10-26 18:28:44 +0000
4@@ -123,6 +123,7 @@
5 COMMISSIONING_LIKE_STATUSES,
6 get_failed_status,
7 is_failed_status,
8+ MONITORED_STATUSES,
9 NODE_FAILURE_MONITORED_STATUS_TIMEOUTS,
10 NODE_TRANSITIONS,
11 )
12@@ -1462,7 +1463,15 @@
13 self.hostname, error)
14
15 def save(self, *args, **kwargs):
16+ # Reset the status_expires if not a monitored status. This prevents
17+ # a race condition seen in LP1603563 where an old status_expires caused
18+ # the node to do in a FAILED_RELEASING state due to an old
19+ # status_expire being set.
20+ if self.status not in MONITORED_STATUSES:
21+ self.status_expires = None
22+
23 super(Node, self).save(*args, **kwargs)
24+
25 # We let hostname be blank for the initial save, but fix it before the
26 # save completes. This is because set_random_hostname() operates by
27 # trying to re-save the node with a random hostname, and retrying until
28
29=== modified file 'src/maasserver/models/tests/test_node.py'
30--- src/maasserver/models/tests/test_node.py 2016-10-22 05:50:06 +0000
31+++ src/maasserver/models/tests/test_node.py 2016-10-26 18:28:44 +0000
32@@ -2624,6 +2624,15 @@
33 "Invalid transition: Retired -> Allocated.",
34 node.save)
35
36+ def test_save_resets_status_expires_on_non_monitored_status(self):
37+ # Regression test for LP:1603563
38+ node = factory.make_Node(status=NODE_STATUS.RELEASING)
39+ Node._set_status_expires(node.system_id, 60)
40+ node.status = NODE_STATUS.READY
41+ node.save()
42+ node = reload_object(node)
43+ self.assertIsNone(node.status_expires)
44+
45 def test_full_clean_checks_architecture_for_installable_nodes(self):
46 device = factory.make_Device(architecture='')
47 # Set type here so we don't cause exception while creating object