Merge lp:~gmb/maas/add-alerts-for-disconnected-clusters-bug-1341121 into lp:~maas-committers/maas/trunk

Proposed by Graham Binns
Status: Merged
Approved by: Graham Binns
Approved revision: no longer in the source branch.
Merged at revision: 3195
Proposed branch: lp:~gmb/maas/add-alerts-for-disconnected-clusters-bug-1341121
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 186 lines (+115/-2)
4 files modified
src/maasserver/api/tests/test_nodegroup.py (+3/-0)
src/maasserver/enum.py (+1/-0)
src/maasserver/middleware.py (+21/-2)
src/maasserver/tests/test_middleware.py (+90/-0)
To merge this branch: bzr merge lp:~gmb/maas/add-alerts-for-disconnected-clusters-bug-1341121
Reviewer Review Type Date Requested Status
Raphaël Badin (community) Approve
Jeroen T. Vermeulen (community) Approve
Review via email: mp+237039@code.launchpad.net

Commit message

Add a persistent error when there are disconnected clusters, and remove it when all the clusters are connected again.

Previously, the only way to know about clusters being disconnected was to go to the cluster listing and look.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

I'm sure this will be appreciated, even if it may be a little annoying to some people with distributed clusters.

Review notes inline.

review: Approve
Revision history for this message
Raphaël Badin (rvb) :
Revision history for this message
Graham Binns (gmb) wrote :
Download full text (3.9 KiB)

On 3 October 2014 12:09, Jeroen T. Vermeulen <email address hidden> wrote:
> I'm sure this will be appreciated, even if it may be a little annoying to some people with distributed clusters.

Definitely! Thank you! Don't worry about being annoying.

Oh, wait, you mean the branch. Nevermind ;P

> Review notes inline.
>
> Diff comments:
>
> Tiny suggestion: instead of...
>
> any(
> not cluster.is_connected() for cluster in accepted_clusters)
>
> you could probably write...
>
> not all(
> cluster.is_connected() for cluster in accepted_clusters)

Yep, done.

>> +class ExternalComponentsMiddlewareTest(MAASServerTestCase):
>> + """Tests for the ExternalComponentsMiddleware."""
>> +
>> + def test__checks_connectivity_of_accepted_clusters(self):
>> + get_client_for = self.patch(nodegroup_module, 'getClientFor')
>> + middleware = ExternalComponentsMiddleware()
>> + clusters = {
>> + factory.make_NodeGroup(status=NODEGROUP_STATUS.ACCEPTED)
>> + for _ in range(3)}
>
> Why are we creating 3 of everything? For luck? If the code is correct for 2, it's probably correct for 3. (And in Python, if it's correct for 1 chances are it's correct for 2. It's not like C where you might do *ptr instead of ptr[i] for every i.)
>
> Also, we've taken to putting the closing brace/bracket of a comprehension on a line of its own.

You know, I honestly don't know why. Habit, I guess. Three is a magic
number. It's not really relevant to have > 1 cluster in this test
anyway, I'll change it.

>> +
>> + request = factory.make_fake_request(factory.make_string(), 'GET')
>> + middleware.process_request(request)
>> +
>> + self.assertThat(
>> + get_client_for, MockCallsMatch(
>> + *[call(cluster.uuid, timeout=0) for cluster in clusters]))
>> +
>> + def test__ignores_non_accepted_clusters(self):
>> + get_client_for = self.patch(nodegroup_module, 'getClientFor')
>> + middleware = ExternalComponentsMiddleware()
>> + statuses = (
>> + NODEGROUP_STATUS.PENDING,
>> + NODEGROUP_STATUS.ACCEPTED,
>> + NODEGROUP_STATUS.REJECTED)
>> + clusters = {
>> + factory.make_NodeGroup(status=random.choice(statuses))
>> + for _ in range(3)}
>
> Again with the three! But why bother testing three random items from a list of 3 values, when it's actually easier to test all values?
>
> Although in this case I personally would do it differently: write one simple test just for ACCEPTED, and one simple test for a random other status. In the event that we get it wrong for pending clusters but not rejected ones, or vice versa, that leaves a fifty-fifty chance of the test passing by accident on any given run, so up to you to judge how likely that is.

Right.

> Also, there's no need to enumerate the statuses manually; just use factory.pick_enum(NODEGROUP_STATUS, but_not=[NODEGROUP_STATUS.ACCEPTED]).

Ah, didn't know about that, thanks.

>> + accepted_cluster_uuids = (
>> + cluster.uuid for cluster in clusters
>> + if cluster.status == NODEGROUP_STATUS....

Read more...

Revision history for this message
Graham Binns (gmb) wrote :

On 3 October 2014 12:11, Raphaël Badin <email address hidden> wrote:
>
>
> Diff comments:
>
>> === modified file 'src/maasserver/enum.py'
>> --- src/maasserver/enum.py 2014-09-27 02:22:32 +0000
>> +++ src/maasserver/enum.py 2014-10-03 10:53:21 +0000
>> @@ -41,6 +41,7 @@
>> """Major moving parts of the application that may have failure states."""
>> PSERV = 'provisioning server'
>> IMPORT_PXE_FILES = 'maas-import-pxe-files script'
>> + CLUSTERS = 'clusters'
>
> We might want to have other component errors related to clusters. Maybe this should be renamed "DISCONNECTED_CLUSTERS"

Agreed.

>> class NODE_STATUS:
>>
>> + accepted_clusters = NodeGroup.objects.filter(
>> + status=NODEGROUP_STATUS.ACCEPTED)
>> + disconnected_clusters_found = any(
>> + not cluster.is_connected() for cluster in accepted_clusters)
>> + if disconnected_clusters_found:
>> + register_persistent_error(
>> + COMPONENT.CLUSTERS,
>> + "One or more clusters are currently disconnected. Visit "
>> + "the <a href=\"%s\">clusters</a> page for more "
>
> I think the link should apply to "clusters page" instead of just "clusters" but it's a detail.
>

Fixed.

Revision history for this message
Raphaël Badin (rvb) :
review: Approve
Revision history for this message
MAAS Lander (maas-lander) wrote :
Download full text (23.0 KiB)

The attempt to merge lp:~gmb/maas/add-alerts-for-disconnected-clusters-bug-1341121 into lp:maas failed. Below is the output from the failed tests.

Ign http://security.ubuntu.com trusty-security InRelease
Get:1 http://security.ubuntu.com trusty-security Release.gpg [933 B]
Ign http://nova.clouds.archive.ubuntu.com trusty InRelease
Get:2 http://security.ubuntu.com trusty-security Release [59.7 kB]
Ign http://nova.clouds.archive.ubuntu.com trusty-updates InRelease
Hit http://nova.clouds.archive.ubuntu.com trusty Release.gpg
Get:3 http://nova.clouds.archive.ubuntu.com trusty-updates Release.gpg [933 B]
Hit http://nova.clouds.archive.ubuntu.com trusty Release
Get:4 http://nova.clouds.archive.ubuntu.com trusty-updates Release [59.7 kB]
Get:5 http://security.ubuntu.com trusty-security/main Sources [46.3 kB]
Hit http://nova.clouds.archive.ubuntu.com trusty/main Sources
Get:6 http://security.ubuntu.com trusty-security/universe Sources [10.8 kB]
Get:7 http://security.ubuntu.com trusty-security/main amd64 Packages [146 kB]
Hit http://nova.clouds.archive.ubuntu.com trusty/universe Sources
Hit http://nova.clouds.archive.ubuntu.com trusty/main amd64 Packages
Hit http://nova.clouds.archive.ubuntu.com trusty/universe amd64 Packages
Hit http://nova.clouds.archive.ubuntu.com trusty/main Translation-en
Hit http://nova.clouds.archive.ubuntu.com trusty/universe Translation-en
Get:8 http://security.ubuntu.com trusty-security/universe amd64 Packages [49.1 kB]
Hit http://security.ubuntu.com trusty-security/main Translation-en
Hit http://security.ubuntu.com trusty-security/universe Translation-en
Ign http://nova.clouds.archive.ubuntu.com trusty/main Translation-en_US
Ign http://nova.clouds.archive.ubuntu.com trusty/universe Translation-en_US
Get:9 http://nova.clouds.archive.ubuntu.com trusty-updates/main Sources [125 kB]
Get:10 http://nova.clouds.archive.ubuntu.com trusty-updates/universe Sources [86.2 kB]
Get:11 http://nova.clouds.archive.ubuntu.com trusty-updates/main amd64 Packages [336 kB]
Get:12 http://nova.clouds.archive.ubuntu.com trusty-updates/universe amd64 Packages [209 kB]
Hit http://nova.clouds.archive.ubuntu.com trusty-updates/main Translation-en
Hit http://nova.clouds.archive.ubuntu.com trusty-updates/universe Translation-en
Fetched 1,129 kB in 0s (1,852 kB/s)
Reading package lists...
sudo DEBIAN_FRONTEND=noninteractive apt-get -y \
     --no-install-recommends install apache2 authbind bind9 bind9utils build-essential bzr-builddeb curl daemontools debhelper dh-apport distro-info dnsutils firefox freeipmi-tools gjs ipython isc-dhcp-common libjs-raphael libjs-yui3-full libjs-yui3-min libpq-dev make pep8 postgresql pyflakes python-amqplib python-bzrlib python-celery python-convoy python-crochet python-cssselect python-curtin python-dev python-distro-info python-django python-django-piston python-django-south python-djorm-ext-pgarray python-docutils python-extras python-fixtures python-flake8 python-formencode python-hivex python-httplib2 python-jinja2 python-jsonschema python-lockfile python-lxml python-mimeparse python-mock python-netaddr python-netifaces python-nose python-oauth python-oops python-oops-amqp python-oops-datedir-repo python-oops-twisted ...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'src/maasserver/api/tests/test_nodegroup.py'
--- src/maasserver/api/tests/test_nodegroup.py 2014-09-24 14:20:51 +0000
+++ src/maasserver/api/tests/test_nodegroup.py 2014-10-03 14:52:27 +0000
@@ -218,6 +218,9 @@
218 self.assertEqual(httplib.FORBIDDEN, response.status_code)218 self.assertEqual(httplib.FORBIDDEN, response.status_code)
219219
220 def test_import_boot_images_for_all_accepted_clusters(self):220 def test_import_boot_images_for_all_accepted_clusters(self):
221 # Patch out the is_connected method of NodeGroup so as to avoid
222 # extra calls to getClient()
223 self.patch(nodegroup_module.NodeGroup, 'is_connected')
221 self.become_admin()224 self.become_admin()
222 mock_getClientFor = self.patch(nodegroup_module, 'getClientFor')225 mock_getClientFor = self.patch(nodegroup_module, 'getClientFor')
223 accepted_nodegroups = [226 accepted_nodegroups = [
224227
=== modified file 'src/maasserver/enum.py'
--- src/maasserver/enum.py 2014-09-27 02:22:32 +0000
+++ src/maasserver/enum.py 2014-10-03 14:52:27 +0000
@@ -41,6 +41,7 @@
41 """Major moving parts of the application that may have failure states."""41 """Major moving parts of the application that may have failure states."""
42 PSERV = 'provisioning server'42 PSERV = 'provisioning server'
43 IMPORT_PXE_FILES = 'maas-import-pxe-files script'43 IMPORT_PXE_FILES = 'maas-import-pxe-files script'
44 CLUSTERS = 'clusters'
4445
4546
46class NODE_STATUS:47class NODE_STATUS:
4748
=== modified file 'src/maasserver/middleware.py'
--- src/maasserver/middleware.py 2014-10-01 16:50:53 +0000
+++ src/maasserver/middleware.py 2014-10-03 14:52:27 +0000
@@ -44,6 +44,14 @@
44 HttpResponseForbidden,44 HttpResponseForbidden,
45 HttpResponseRedirect,45 HttpResponseRedirect,
46 )46 )
47from maasserver.components import (
48 discard_persistent_error,
49 register_persistent_error,
50 )
51from maasserver.enum import (
52 COMPONENT,
53 NODEGROUP_STATUS,
54 )
47from maasserver.models.nodegroup import NodeGroup55from maasserver.models.nodegroup import NodeGroup
48from provisioningserver.rpc.exceptions import (56from provisioningserver.rpc.exceptions import (
49 MultipleFailures,57 MultipleFailures,
@@ -139,8 +147,19 @@
139 # disturbing the handling of the request. Proper error reporting147 # disturbing the handling of the request. Proper error reporting
140 # should be handled in the check method itself.148 # should be handled in the check method itself.
141 try:149 try:
142 # TODO: Components checks here.150 # Check cluster connectivity.
143 pass151 accepted_clusters = NodeGroup.objects.filter(
152 status=NODEGROUP_STATUS.ACCEPTED)
153 disconnected_clusters_found = not all(
154 cluster.is_connected() for cluster in accepted_clusters)
155 if disconnected_clusters_found:
156 register_persistent_error(
157 COMPONENT.CLUSTERS,
158 "One or more clusters are currently disconnected. Visit "
159 "the <a href=\"%s\">clusters page</a> for more "
160 "information." % reverse('cluster-list'))
161 else:
162 discard_persistent_error(COMPONENT.CLUSTERS)
144 except Exception:163 except Exception:
145 pass164 pass
146 return None165 return None
147166
=== modified file 'src/maasserver/tests/test_middleware.py'
--- src/maasserver/tests/test_middleware.py 2014-10-01 16:50:53 +0000
+++ src/maasserver/tests/test_middleware.py 2014-10-03 14:52:27 +0000
@@ -25,9 +25,18 @@
25 PermissionDenied,25 PermissionDenied,
26 ValidationError,26 ValidationError,
27 )27 )
28from django.core.urlresolvers import reverse
28from django.http import HttpResponse29from django.http import HttpResponse
29from django.http.request import build_request_repr30from django.http.request import build_request_repr
30from fixtures import FakeLogger31from fixtures import FakeLogger
32from maasserver.components import (
33 get_persistent_error,
34 register_persistent_error,
35 )
36from maasserver.enum import (
37 COMPONENT,
38 NODEGROUP_STATUS,
39 )
31from maasserver.exceptions import (40from maasserver.exceptions import (
32 ExternalComponentException,41 ExternalComponentException,
33 MAASAPIException,42 MAASAPIException,
@@ -40,11 +49,17 @@
40 DebuggingLoggerMiddleware,49 DebuggingLoggerMiddleware,
41 ErrorsMiddleware,50 ErrorsMiddleware,
42 ExceptionMiddleware,51 ExceptionMiddleware,
52 ExternalComponentsMiddleware,
43 RPCErrorsMiddleware,53 RPCErrorsMiddleware,
44 )54 )
55from maasserver.models import nodegroup as nodegroup_module
45from maasserver.testing import extract_redirect56from maasserver.testing import extract_redirect
46from maasserver.testing.factory import factory57from maasserver.testing.factory import factory
47from maasserver.testing.testcase import MAASServerTestCase58from maasserver.testing.testcase import MAASServerTestCase
59from maastesting.matchers import (
60 MockCalledOnceWith,
61 MockNotCalled,
62 )
48from maastesting.utils import sample_binary_data63from maastesting.utils import sample_binary_data
49from provisioningserver.rpc.exceptions import (64from provisioningserver.rpc.exceptions import (
50 MultipleFailures,65 MultipleFailures,
@@ -506,3 +521,78 @@
506 "but that's just peanuts to space!")521 "but that's just peanuts to space!")
507 response = middleware.process_exception(request, exception)522 response = middleware.process_exception(request, exception)
508 self.assertIsNone(response)523 self.assertIsNone(response)
524
525
526class ExternalComponentsMiddlewareTest(MAASServerTestCase):
527 """Tests for the ExternalComponentsMiddleware."""
528
529 def test__checks_connectivity_of_accepted_clusters(self):
530 get_client_for = self.patch(nodegroup_module, 'getClientFor')
531 middleware = ExternalComponentsMiddleware()
532 cluster = factory.make_NodeGroup(status=NODEGROUP_STATUS.ACCEPTED)
533
534 request = factory.make_fake_request(factory.make_string(), 'GET')
535 middleware.process_request(request)
536
537 self.assertThat(
538 get_client_for, MockCalledOnceWith(cluster.uuid, timeout=0))
539
540 def test__ignores_non_accepted_clusters(self):
541 get_client_for = self.patch(nodegroup_module, 'getClientFor')
542 factory.make_NodeGroup(
543 status=factory.pick_enum(
544 NODEGROUP_STATUS, but_not=[NODEGROUP_STATUS.ACCEPTED]))
545 middleware = ExternalComponentsMiddleware()
546 request = factory.make_fake_request(factory.make_string(), 'GET')
547 middleware.process_request(request)
548
549 self.assertThat(get_client_for, MockNotCalled())
550
551 def test__registers_error_if_all_clusters_are_disconnected(self):
552 get_client_for = self.patch(nodegroup_module, 'getClientFor')
553 get_client_for.side_effect = NoConnectionsAvailable(
554 "Why, it's a jet-propelled, guided NAAFI!")
555 middleware = ExternalComponentsMiddleware()
556 factory.make_NodeGroup(status=NODEGROUP_STATUS.ACCEPTED)
557
558 request = factory.make_fake_request(factory.make_string(), 'GET')
559 middleware.process_request(request)
560
561 error = get_persistent_error(COMPONENT.CLUSTERS)
562 self.assertEqual(
563 "One or more clusters are currently disconnected. Visit the "
564 "<a href=\"%s\">clusters page</a> for more information." %
565 reverse('cluster-list'),
566 error)
567
568 def test__registers_error_if_any_clusters_are_disconnected(self):
569 get_client_for = self.patch(nodegroup_module, 'getClientFor')
570 get_client_for.side_effect = [
571 NoConnectionsAvailable("Why, it's a jet-propelled, guided NAAFI!"),
572 None,
573 ]
574 middleware = ExternalComponentsMiddleware()
575 factory.make_NodeGroup(status=NODEGROUP_STATUS.ACCEPTED)
576
577 request = factory.make_fake_request(factory.make_string(), 'GET')
578 middleware.process_request(request)
579
580 error = get_persistent_error(COMPONENT.CLUSTERS)
581 self.assertEqual(
582 "One or more clusters are currently disconnected. Visit the "
583 "<a href=\"%s\">clusters page</a> for more information." %
584 reverse('cluster-list'),
585 error)
586
587 def test__removes_error_once_all_clusters_are_connected(self):
588 # Patch getClientFor() to ensure that we don't actually try to
589 # connect to the cluster.
590 self.patch(nodegroup_module, 'getClientFor')
591 middleware = ExternalComponentsMiddleware()
592 factory.make_NodeGroup(status=NODEGROUP_STATUS.ACCEPTED)
593
594 register_persistent_error(
595 COMPONENT.CLUSTERS, "Who flung that batter pudding?")
596 request = factory.make_fake_request(factory.make_string(), 'GET')
597 middleware.process_request(request)
598 self.assertIsNone(get_persistent_error(COMPONENT.CLUSTERS))