Merge lp:~andreserl/maas/lp1598149_lp1605252 into lp:~maas-committers/maas/trunk

Proposed by Andres Rodriguez
Status: Merged
Approved by: Andres Rodriguez
Approved revision: no longer in the source branch.
Merged at revision: 5200
Proposed branch: lp:~andreserl/maas/lp1598149_lp1605252
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 199 lines (+54/-13)
5 files modified
src/maasserver/models/node.py (+10/-3)
src/maasserver/node_status.py (+16/-0)
src/maasserver/status_monitor.py (+18/-3)
src/maasserver/tests/test_status_monitor.py (+6/-5)
src/metadataserver/api.py (+4/-2)
To merge this branch: bzr merge lp:~andreserl/maas/lp1598149_lp1605252
Reviewer Review Type Date Requested Status
Blake Rouse (community) Approve
Lee Trager (community) Approve
Review via email: mp+300818@code.launchpad.net

Commit message

Ensure that messages when a timeout for deploying, releasing, and commissioning happens is logged (This was supported in older releases but was dropped in 2.0).

Also, drive-by fix the commissioning monitor services. This ensures that the status_expires is not cleared when it shouldn't and allows the monitor to work while commissioning.

Another drive-by improvement, is to ensure that the monitor is only checking MACHINES and not anything else.

To post a comment you must log in.
Revision history for this message
Lee Trager (ltrager) wrote :

Looks good. Did you test that rack and region refresh still works even when they're not on the same machine?

review: Approve
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Looks good.

review: Approve
Revision history for this message
MAAS Lander (maas-lander) wrote :
Download full text (1.4 MiB)

The attempt to merge lp:~andreserl/maas/lp1598149_lp1605252 into lp:maas failed. Below is the output from the failed tests.

Hit:1 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates InRelease [95.7 kB]
Get:3 http://security.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Hit:4 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-backports InRelease
Get:5 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [328 kB]
Get:6 http://prodstack-zone-2.clouds.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [300 kB]
Fetched 818 kB in 0s (1,581 kB/s)
Reading package lists...
sudo DEBIAN_FRONTEND=noninteractive apt-get -y \
    --no-install-recommends install apache2 archdetect-deb authbind bash bind9 bind9utils build-essential bzr bzr-builddeb chromium-browser chromium-chromedriver curl daemontools debhelper dh-apport dh-systemd distro-info dnsutils firefox freeipmi-tools git gjs ipython isc-dhcp-common libjs-angularjs libjs-jquery libjs-jquery-hotkeys libjs-yui3-full libjs-yui3-min libpq-dev make nodejs-legacy npm postgresql pxelinux python3-all python3-apt python3-bson python3-convoy python3-crochet python3-cssselect python3-curtin python3-dev python3-distro-info python3-django python3-django-nose python3-django-piston3 python3-dnspython python3-docutils python3-formencode python3-hivex python3-httplib2 python3-jinja2 python3-jsonschema python3-lxml python3-netaddr python3-netifaces python3-novaclient python3-oauth python3-oauthlib python3-openssl python3-paramiko python3-petname python3-pexpect python3-psycopg2 python3-pyinotify python3-pyparsing python3-pyvmomi python3-requests python3-seamicroclient python3-setuptools python3-simplestreams python3-sphinx python3-tempita python3-twisted python3-txtftp python3-tz python3-yaml python3-zope.interface python-bson python-crochet python-django python-django-piston python-djorm-ext-pgarray python-formencode python-lxml python-netaddr python-netifaces python-pocket-lint python-psycopg2 python-simplejson python-tempita python-twisted python-yaml socat syslinux-common tgt ubuntu-cloudimage-keyring wget xvfb
Reading package lists...
Building dependency tree...
Reading state information...
archdetect-deb is already the newest version (1.117ubuntu2).
authbind is already the newest version (2.1.1+nmu1).
build-essential is already the newest version (12.1ubuntu2).
curl is already the newest version (7.47.0-1ubuntu2).
debhelper is already the newest version (9.20160115ubuntu3).
distro-info is already the newest version (0.14build1).
freeipmi-tools is already the newest version (1.4.11-1ubuntu1).
git is already the newest version (1:2.7.4-0ubuntu1).
libjs-angularjs is already the newest version (1.2.28-1ubuntu2).
libjs-jquery is already the newest version (1.11.3+dfsg-4).
libjs-yui3-full is already the newest version (3.5.1-1ubuntu3).
libjs-yui3-min is already the newest version (3.5.1-1ubuntu3).
make is already the newest version (4.1-6).
postgresql is already the newest version (9.5+173).
pxelinux is already the newest version (3:6.03+dfsg-11...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/maasserver/models/node.py'
2--- src/maasserver/models/node.py 2016-07-18 20:06:21 +0000
3+++ src/maasserver/models/node.py 2016-07-22 14:42:17 +0000
4@@ -123,6 +123,7 @@
5 COMMISSIONING_LIKE_STATUSES,
6 get_failed_status,
7 is_failed_status,
8+ NODE_FAILURE_MONITORED_STATUS_TIMEOUTS,
9 NODE_TRANSITIONS,
10 )
11 from maasserver.rpc import (
12@@ -1106,7 +1107,9 @@
13 """
14 # Return a *very* conservative estimate for now.
15 # Something that shouldn't conflict with any deployment.
16- return timedelta(minutes=40).total_seconds()
17+ return timedelta(
18+ minutes=NODE_FAILURE_MONITORED_STATUS_TIMEOUTS[
19+ NODE_STATUS.DEPLOYING]).total_seconds()
20
21 def get_commissioning_time(self):
22 """Return the commissioning time of this node (in seconds).
23@@ -1114,14 +1117,18 @@
24 This is the maximum time the commissioning is allowed to take.
25 """
26 # Return a *very* conservative estimate for now.
27- return timedelta(minutes=20).total_seconds()
28+ return timedelta(
29+ minutes=NODE_FAILURE_MONITORED_STATUS_TIMEOUTS[
30+ NODE_STATUS.COMMISSIONING]).total_seconds()
31
32 def get_releasing_time(self):
33 """Return the releasing time of this node (in seconds).
34
35 This is the maximum time that releasing is allowed to take.
36 """
37- return timedelta(minutes=5).total_seconds()
38+ return timedelta(
39+ minutes=NODE_FAILURE_MONITORED_STATUS_TIMEOUTS[
40+ NODE_STATUS.RELEASING]).total_seconds()
41
42 def _register_request_event(
43 self, user, type_name, action='', comment=None):
44
45=== modified file 'src/maasserver/node_status.py'
46--- src/maasserver/node_status.py 2015-12-01 18:12:59 +0000
47+++ src/maasserver/node_status.py 2016-07-22 14:42:17 +0000
48@@ -162,6 +162,22 @@
49 NODE_STATUS.DISK_ERASING: NODE_STATUS.FAILED_DISK_ERASING,
50 }
51
52+# State transitions that are monitored for timeouts for when a node
53+# fails:
54+# Mapping between in-progress statuses and the corresponding failed
55+# statuses.
56+NODE_FAILURE_MONITORED_STATUS_TRANSITIONS = {
57+ NODE_STATUS.COMMISSIONING: NODE_STATUS.FAILED_COMMISSIONING,
58+ NODE_STATUS.DEPLOYING: NODE_STATUS.FAILED_DEPLOYMENT,
59+ NODE_STATUS.RELEASING: NODE_STATUS.FAILED_RELEASING,
60+}
61+
62+NODE_FAILURE_MONITORED_STATUS_TIMEOUTS = {
63+ NODE_STATUS.COMMISSIONING: 20,
64+ NODE_STATUS.DEPLOYING: 40,
65+ NODE_STATUS.RELEASING: 5,
66+}
67+
68 # Statuses that correspond to managed steps for which MAAS actively
69 # monitors that the status changes after a fixed period of time.
70 MONITORED_STATUSES = list(NODE_FAILURE_STATUS_TRANSITIONS.keys())
71
72=== modified file 'src/maasserver/status_monitor.py'
73--- src/maasserver/status_monitor.py 2016-02-01 10:28:01 +0000
74+++ src/maasserver/status_monitor.py 2016-07-22 14:42:17 +0000
75@@ -8,9 +8,16 @@
76 'StatusMonitorService',
77 ]
78
79+from maasserver.enum import (
80+ NODE_STATUS_CHOICES_DICT,
81+ NODE_TYPE,
82+)
83 from maasserver.models.node import Node
84 from maasserver.models.timestampedmodel import now
85-from maasserver.node_status import NODE_FAILURE_STATUS_TRANSITIONS
86+from maasserver.node_status import (
87+ NODE_FAILURE_MONITORED_STATUS_TIMEOUTS,
88+ NODE_FAILURE_MONITORED_STATUS_TRANSITIONS,
89+)
90 from maasserver.utils.orm import transactional
91 from maasserver.utils.threads import deferToDatabase
92 from provisioningserver.utils.twisted import synchronous
93@@ -21,14 +28,22 @@
94 """Mark all nodes in that database as failed where the status did not
95 transition in time. `status_expires` is checked on the node to see if the
96 current time is newer than the expired time.
97+
98+ Status monitors are only available for Machines that are Commissioning,
99+ Deploying or Releasing.
100 """
101 current_db_time = now()
102 expired_nodes = Node.objects.filter(
103- status__in=NODE_FAILURE_STATUS_TRANSITIONS.keys(),
104+ node_type=NODE_TYPE.MACHINE,
105+ status__in=NODE_FAILURE_MONITORED_STATUS_TRANSITIONS.keys(),
106 status_expires__isnull=False,
107 status_expires__lte=current_db_time)
108 for node in expired_nodes:
109- node._mark_failed(None, commit=False)
110+ comment = "Machine operation '%s' timed out after %s minutes." % (
111+ NODE_STATUS_CHOICES_DICT[node.status],
112+ NODE_FAILURE_MONITORED_STATUS_TIMEOUTS[node.status],
113+ )
114+ node._mark_failed(None, commit=False, comment=comment)
115 node.status_expires = None
116 node.save()
117
118
119=== modified file 'src/maasserver/tests/test_status_monitor.py'
120--- src/maasserver/tests/test_status_monitor.py 2016-05-16 09:21:53 +0000
121+++ src/maasserver/tests/test_status_monitor.py 2016-07-22 14:42:17 +0000
122@@ -14,7 +14,7 @@
123
124 from maasserver import status_monitor
125 from maasserver.models.signals.testing import SignalsDisabled
126-from maasserver.node_status import NODE_FAILURE_STATUS_TRANSITIONS
127+from maasserver.node_status import NODE_FAILURE_MONITORED_STATUS_TRANSITIONS
128 from maasserver.status_monitor import (
129 mark_nodes_failed_after_expiring,
130 StatusMonitorService,
131@@ -40,7 +40,7 @@
132 expired_time = current_time - timedelta(minutes=1)
133 nodes = [
134 factory.make_Node(status=status, status_expires=expired_time)
135- for status in NODE_FAILURE_STATUS_TRANSITIONS.keys()
136+ for status in NODE_FAILURE_MONITORED_STATUS_TRANSITIONS.keys()
137 ]
138 mark_nodes_failed_after_expiring()
139 failed_statuses = [
140@@ -48,7 +48,8 @@
141 for node in nodes
142 ]
143 self.assertItemsEqual(
144- NODE_FAILURE_STATUS_TRANSITIONS.values(), failed_statuses)
145+ NODE_FAILURE_MONITORED_STATUS_TRANSITIONS.values(),
146+ failed_statuses)
147
148 def test__skips_those_that_have_not_expired(self):
149 self.useFixture(SignalsDisabled("power"))
150@@ -57,7 +58,7 @@
151 expired_time = current_time + timedelta(minutes=1)
152 nodes = [
153 factory.make_Node(status=status, status_expires=expired_time)
154- for status in NODE_FAILURE_STATUS_TRANSITIONS.keys()
155+ for status in NODE_FAILURE_MONITORED_STATUS_TRANSITIONS.keys()
156 ]
157 mark_nodes_failed_after_expiring()
158 failed_statuses = [
159@@ -65,7 +66,7 @@
160 for node in nodes
161 ]
162 self.assertItemsEqual(
163- NODE_FAILURE_STATUS_TRANSITIONS.keys(), failed_statuses)
164+ NODE_FAILURE_MONITORED_STATUS_TRANSITIONS.keys(), failed_statuses)
165
166
167 class TestStatusMonitorService(MAASServerTestCase):
168
169=== modified file 'src/metadataserver/api.py'
170--- src/metadataserver/api.py 2016-06-09 08:10:02 +0000
171+++ src/metadataserver/api.py 2016-07-22 14:42:17 +0000
172@@ -352,7 +352,6 @@
173 # At the end of a top-level event, we change the node status.
174 if _is_top_level(activity_name) and event_type == 'finish':
175 if node.status == NODE_STATUS.COMMISSIONING:
176- node.status_expires = None
177 if result in ['FAIL', 'FAILURE']:
178 node.status = NODE_STATUS.FAILED_COMMISSIONING
179
180@@ -370,6 +369,7 @@
181 if node.node_type == NODE_TYPE.MACHINE and node.status in [
182 NODE_STATUS.READY,
183 NODE_STATUS.FAILED_COMMISSIONING]:
184+ node.status_expires = None
185 node.owner = None
186 node.error = 'failed: %s' % description
187
188@@ -518,8 +518,10 @@
189 if node.power_type != "mscm":
190 store_node_power_parameters(node, request)
191
192- node.status_expires = None
193 target_status = self.signaling_statuses.get(status)
194+ if target_status in [NODE_STATUS.FAILED_COMMISSIONING,
195+ NODE_STATUS.READY]:
196+ node.status_expires = None
197 # Recalculate tags when commissioning ends.
198 if target_status == NODE_STATUS.READY:
199 populate_tags_for_single_node(Tag.objects.all(), node)