Merge lp:~blake-rouse/maas/fix-1559398 into lp:~maas-committers/maas/trunk

Proposed by Blake Rouse
Status: Merged
Approved by: Blake Rouse
Approved revision: no longer in the source branch.
Merged at revision: 4815
Proposed branch: lp:~blake-rouse/maas/fix-1559398
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 407 lines (+240/-82)
7 files modified
etc/maas/templates/dhcp/dhcpd.conf.template (+3/-3)
scripts/maas-dhcp-helper (+128/-0)
setup.py (+2/-0)
src/provisioningserver/__main__.py (+0/-2)
src/provisioningserver/dhcp/notifier.py (+0/-73)
src/provisioningserver/dhcp/tests/test_helper_clean.py (+103/-0)
src/provisioningserver/dhcp/tests/test_helper_notify.py (+4/-4)
To merge this branch: bzr merge lp:~blake-rouse/maas/fix-1559398
Reviewer Review Type Date Requested Status
Mike Pontillo (community) Approve
Review via email: mp+289712@code.launchpad.net

Commit message

Add maas-dhcp-notify wrapper that forks maas-rack dhcp-notify and returns. This fixes the issue where if a execute action in isc-dhcp-server takes to long it affects the performance of isc-dhcp-server.

Description of the change

I have QA'd this and previously commissioning with just 40 nodes would result only half getting an actual lease. This change allowed all to get a lease and commission successfully.

Packaging change is also required: https://code.launchpad.net/~blake-rouse/maas/fix-1559398-packaging/+merge/289713

To post a comment you must log in.
Revision history for this message
Mike Pontillo (mpontillo) wrote :

See one minor nit below.

My only concern with this branch is that instead of queuing up on a blocking write, we're creating an explosion of processes whenever a large number of DHCP notifications happen.

While this might scale to ~40 nodes, it may not scale to hundreds or thousands; we may fail to create more processes and lose notifications.

Long-term, we should queue up DHCP notifications in memory as fast as we can read them, and a worker should dequeue and process them. (if we still have problems scaling, we can scan the queue for redundant notifications, too.)

review: Approve
Revision history for this message
Mike Pontillo (mpontillo) wrote :

One other data point: I wrote a simple script to accept connections on a UNIX datagram socket:

https://paste.ubuntu.com/15468191/

... with a "processing delay" of one second after reading each datagram.

Using the Python client, I can easily get ~10 packets ahead of the server while it takes its time dequeuing the packets, using a command similar to:

bin/maas-rack dhcp-notify --action commit --mac 74:d4:35:89:bd:a9 --ip-family ipv4 --ip 192.168.10.60 --lease-time 43200 --hostname test-$SEQ --socket /tmp/test-socket

Using netcat, I can send in 500+ packets before it starts blocking on write, using a command similar to:

echo -n "{\"ip\": \"192.168.10.60\", \"mac\": \"74:d4:35:89:bd:a9\", \"ip_family\": \"ipv4\", \"hostname\": \"test-$SEQ\", \"lease_time\": 43200, \"action\": \"commit\", \"timestamp\": 1458597246}" | nc -w0 -uU /tmp/test-socket

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Upon further testing, I don't think the fork approach is viable. I was able to cripple my system after ~150-200 processes forked, which could easily happen in MAAS if someone brings a rack online.

review: Disapprove
Revision history for this message
Mike Pontillo (mpontillo) wrote :

It looks like something we're doing to set up the notification process is very slow (on the order of half a second, or ~100 notifications in a little less than a minute). Since we don't want to pull in a dependency on netcat, I looked at other ways to send the lease notification message faster.

If I use the following test code to write to the socket, it's plenty fast (~100 notifications in ~1.2 seconds on my system):

python -c "import socket; conn = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM); conn.connect('/tmp/test-socket'); conn.send('<lease-json>'); conn.close()"

Something we're doing to set up the command framework and parse the arguments may be what's making it slow. (I already tried eliminating the dependency on the JSON module, and that didn't make any difference.)

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Looks much better; thanks for the fixes.

I see you've included the host-cleaning functionality in this utility as well. (I didn't look at it in too much detail, but it seems a little scary to do this with a regular expression; you might want to verify that it can't ever match the word "host" inside, for example, a lease block.) So a couple of comments on that:

 - It might be safer to use the ISC parser than a regular expression.
 - I hope we're going to ensure that dhcpd is disabled while we clean the file, since it could be large, and in the time it takes to rewrite the file another lease could easily have been allocated.
 - Could we possibly read the file to determine which host entries we want to remove, and use OMAPI to remove them?

review: Approve
Revision history for this message
Mike Pontillo (mpontillo) wrote :

One other comment. Since you've added the 'clean' functionality to this code, you could probably make the socket notifier portion faster if you move "import re" to the clean function, that way it isn't imported if we aren't cleaning the leases.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Good idea. Moved the imports to each sub command so that they run faster.

Also the clean is called specifically from systemd when the the service before it is started. So its 100% not running when that is called. This ensures that its not being modified at the same time, as its not even running.

The test includes hostname in the lease entry that my previous regex was matching. This regex is specific to a host block not any host in the lease section.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'etc/maas/templates/dhcp/dhcpd.conf.template'
--- etc/maas/templates/dhcp/dhcpd.conf.template 2016-03-17 19:09:25 +0000
+++ etc/maas/templates/dhcp/dhcpd.conf.template 2016-03-22 17:04:41 +0000
@@ -89,7 +89,7 @@
89 set cllt = binary-to-ascii(10, 32, "", encode-int(lease-time, 32));89 set cllt = binary-to-ascii(10, 32, "", encode-int(lease-time, 32));
90 set clht = pick-first-value(option host-name, "(none)");90 set clht = pick-first-value(option host-name, "(none)");
91 execute(91 execute(
92 "/usr/sbin/maas-rack", "dhcp-notify",92 "/usr/sbin/maas-dhcp-helper", "notify",
93 "--action", "commit", "--mac", clhw,93 "--action", "commit", "--mac", clhw,
94 "--ip-family", "ipv4", "--ip", clip,94 "--ip-family", "ipv4", "--ip", clip,
95 "--lease-time", cllt, "--hostname", clht);95 "--lease-time", cllt, "--hostname", clht);
@@ -98,7 +98,7 @@
98 set clhw = binary-to-ascii(16, 8, ":", substring(hardware, 1, 6));98 set clhw = binary-to-ascii(16, 8, ":", substring(hardware, 1, 6));
99 set clip = binary-to-ascii(10, 8, ".", leased-address);99 set clip = binary-to-ascii(10, 8, ".", leased-address);
100 execute(100 execute(
101 "/usr/sbin/maas-rack", "dhcp-notify",101 "/usr/sbin/maas-dhcp-helper", "notify",
102 "--action", "expiry", "--mac", clhw,102 "--action", "expiry", "--mac", clhw,
103 "--ip-family", "ipv4", "--ip", clip);103 "--ip-family", "ipv4", "--ip", clip);
104}104}
@@ -106,7 +106,7 @@
106 set clhw = binary-to-ascii(16, 8, ":", substring(hardware, 1, 6));106 set clhw = binary-to-ascii(16, 8, ":", substring(hardware, 1, 6));
107 set clip = binary-to-ascii(10, 8, ".", leased-address);107 set clip = binary-to-ascii(10, 8, ".", leased-address);
108 execute(108 execute(
109 "/usr/sbin/maas-rack", "dhcp-notify",109 "/usr/sbin/maas-dhcp-helper", "notify",
110 "--action", "release", "--mac", clhw,110 "--action", "release", "--mac", clhw,
111 "--ip-family", "ipv4", "--ip", clip);111 "--ip-family", "ipv4", "--ip", clip);
112}112}
113113
=== added file 'scripts/maas-dhcp-helper'
--- scripts/maas-dhcp-helper 1970-01-01 00:00:00 +0000
+++ scripts/maas-dhcp-helper 2016-03-22 17:04:41 +0000
@@ -0,0 +1,128 @@
1#!/usr/bin/env python3
2# -*- mode: python -*-
3# Copyright 2016 Canonical Ltd. This software is licensed under the
4# GNU Affero General Public License version 3 (see the file LICENSE).
5
6"""Utility script used by maas-dhcpd and maas-dhcpd6.
7
8`notify` is called by maas-dhcpd and maas-dhcpd6 with the on execute command
9from isc-dhcp-server when a lease is committed, released, or expired. Sends
10a JSON encoded message over the `dhcpd.sock` to the running rackd process.
11
12`clean` is called by maas-dhcpd and maas-dhcpd6 when either service starts
13to remove any host entries from the passed leases file. This is used to make
14sure that when maas-dhcpd and maas-dhcpd6 start that only the host entries
15from the dhcpd.conf or dhcpd6.conf are used.
16"""
17
18import argparse
19from contextlib import closing
20import json
21import re
22import socket
23import time
24
25
26def notify_add_arguments(parser):
27 """Initialise options for sending DHCP notification over the dhcpd.sock.
28
29 :param parser: An instance of :class:`ArgumentParser`.
30 """
31 parser.add_argument(
32 "--action", action="store", required=True,
33 choices=['commit', 'expiry', 'release'], help=(
34 "Action taken by DHCP server for the lease."))
35 parser.add_argument(
36 "--mac", action="store", required=True, help=(
37 "MAC address for lease."))
38 parser.add_argument(
39 "--ip-family", action="store", required=True, choices=['ipv4', 'ipv6'],
40 help="IP address family for lease.")
41 parser.add_argument(
42 "--ip", action="store", required=True, help=(
43 "IP address for lease."))
44 parser.add_argument(
45 "--lease-time", action="store", type=int, required=False, help=(
46 "Length of time before the lease expires."))
47 parser.add_argument(
48 "--hostname", action="store", required=False, help=(
49 "Hostname of the machine for the lease."))
50 parser.add_argument(
51 "--socket", action="store", required=False,
52 default="/var/lib/maas/dhcpd.sock", help=argparse.SUPPRESS)
53
54
55def notify(args):
56 """Write DHCP action notification to the `dhcpd.sock`."""
57 notify_packet = {
58 "action": args.action,
59 "mac": args.mac,
60 "ip_family": args.ip_family,
61 "ip": args.ip,
62 "timestamp": int(time.time()),
63 }
64
65 # Lease time is required by the commit action and hostname is optional.
66 if args.action == "commit":
67 notify_packet["lease_time"] = args.lease_time
68 hostname = args.hostname
69 has_hostname = (
70 hostname is not None and
71 len(hostname) > 0 and
72 not hostname.isspace())
73 if has_hostname:
74 notify_packet["hostname"] = hostname
75
76 # Connect to the socket and send the packet over as JSON.
77 payload = json.dumps(notify_packet)
78 conn = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
79 conn.connect(args.socket)
80 with closing(conn):
81 conn.send(payload.encode("utf-8"))
82
83
84def clean_add_arguments(parser):
85 """Initialise options for cleaning the dhcpd.leases file.
86
87 :param parser: An instance of :class:`ArgumentParser`.
88 """
89 parser.add_argument(
90 "leases_file", action="store",
91 help=("Leases file to remove host entries from."))
92
93
94def clean(args):
95 """Remove all host entries from leases file."""
96 with open(args.leases_file, "r", encoding="utf-8") as fp:
97 content = fp.read()
98 cleaned_content = re.sub("host [^{]+{[^}]+}\n", "", content)
99 with open(args.leases_file, "w", encoding="utf-8") as fp:
100 fp.write(cleaned_content)
101
102
103def main():
104 """Main entry point for the script."""
105 parser = argparse.ArgumentParser()
106 subparsers = parser.add_subparsers(title="command")
107 subparsers.required = True
108 subparsers.dest = "command"
109
110 # Notify.
111 notify_parser = subparsers.add_parser(
112 "notify", help="Write DHCP action notification to the `dhcpd.sock`.")
113 notify_parser.set_defaults(handler=notify)
114 notify_add_arguments(notify_parser)
115
116 # Clean.
117 clean_parser = subparsers.add_parser(
118 "clean", help="Remove all host entries from leases file.")
119 clean_parser.set_defaults(handler=clean)
120 clean_add_arguments(clean_parser)
121
122 # Run the parser and the command.
123 args = parser.parse_args()
124 args.handler(args)
125
126
127if __name__ == "__main__":
128 main()
0129
=== modified file 'setup.py'
--- setup.py 2016-01-04 16:24:36 +0000
+++ setup.py 2016-03-22 17:04:41 +0000
@@ -81,6 +81,8 @@
81 ('/usr/bin',81 ('/usr/bin',
82 ['scripts/maas-generate-winrm-cert',82 ['scripts/maas-generate-winrm-cert',
83 'scripts/uec2roottar']),83 'scripts/uec2roottar']),
84 ('/usr/sbin',
85 ['scripts/maas-dhcp-notify']),
84 ],86 ],
8587
86 classifiers=[88 classifiers=[
8789
=== modified file 'src/provisioningserver/__main__.py'
--- src/provisioningserver/__main__.py 2016-03-17 04:26:14 +0000
+++ src/provisioningserver/__main__.py 2016-03-22 17:04:41 +0000
@@ -8,7 +8,6 @@
8import provisioningserver.boot.install_bootloader8import provisioningserver.boot.install_bootloader
9import provisioningserver.boot.install_grub9import provisioningserver.boot.install_grub
10import provisioningserver.cluster_config_command10import provisioningserver.cluster_config_command
11import provisioningserver.dhcp.notifier
12import provisioningserver.support_dump11import provisioningserver.support_dump
13import provisioningserver.upgrade_cluster12import provisioningserver.upgrade_cluster
14from provisioningserver.utils.script import (13from provisioningserver.utils.script import (
@@ -23,7 +22,6 @@
23 'atomic-delete': AtomicDeleteScript,22 'atomic-delete': AtomicDeleteScript,
24 'check-for-shared-secret': security.CheckForSharedSecretScript,23 'check-for-shared-secret': security.CheckForSharedSecretScript,
25 'config': provisioningserver.cluster_config_command,24 'config': provisioningserver.cluster_config_command,
26 'dhcp-notify': provisioningserver.dhcp.notifier,
27 'install-shared-secret': security.InstallSharedSecretScript,25 'install-shared-secret': security.InstallSharedSecretScript,
28 'install-uefi-config': provisioningserver.boot.install_grub,26 'install-uefi-config': provisioningserver.boot.install_grub,
29 'support-dump': provisioningserver.support_dump,27 'support-dump': provisioningserver.support_dump,
3028
=== removed file 'src/provisioningserver/dhcp/notifier.py'
--- src/provisioningserver/dhcp/notifier.py 2015-12-23 15:59:21 +0000
+++ src/provisioningserver/dhcp/notifier.py 1970-01-01 00:00:00 +0000
@@ -1,73 +0,0 @@
1# Copyright 2012-2014 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Write DHCP action notification to the `dhcpd.sock`."""
5
6__all__ = [
7 "add_arguments",
8 "run",
9 ]
10
11import argparse
12from contextlib import closing
13import json
14import socket
15import time
16
17
18def add_arguments(parser):
19 """Initialise options for sending DHCP notification over the dhcpd.sock.
20
21 :param parser: An instance of :class:`ArgumentParser`.
22 """
23 parser.add_argument(
24 "--action", action="store", required=True,
25 choices=['commit', 'expiry', 'release'], help=(
26 "Action taken by DHCP server for the lease."))
27 parser.add_argument(
28 "--mac", action="store", required=True, help=(
29 "MAC address for lease."))
30 parser.add_argument(
31 "--ip-family", action="store", required=True, choices=['ipv4', 'ipv6'],
32 help="IP address family for lease.")
33 parser.add_argument(
34 "--ip", action="store", required=True, help=(
35 "IP address for lease."))
36 parser.add_argument(
37 "--lease-time", action="store", type=int, required=False, help=(
38 "Length of time before the lease expires."))
39 parser.add_argument(
40 "--hostname", action="store", required=False, help=(
41 "Hostname of the machine for the lease."))
42 parser.add_argument(
43 "--socket", action="store", required=False,
44 default="/var/lib/maas/dhcpd.sock", help=argparse.SUPPRESS)
45
46
47def run(args):
48 """Write DHCP action notification to the `dhcpd.sock`."""
49 notify_packet = {
50 "action": args.action,
51 "mac": args.mac,
52 "ip_family": args.ip_family,
53 "ip": args.ip,
54 "timestamp": int(time.time()),
55 }
56
57 # Lease time is required by the commit action and hostname is optional.
58 if args.action == "commit":
59 notify_packet["lease_time"] = args.lease_time
60 hostname = args.hostname
61 has_hostname = (
62 hostname is not None and
63 len(hostname) > 0 and
64 not hostname.isspace())
65 if has_hostname:
66 notify_packet["hostname"] = hostname
67
68 # Connect to the socket and send the packet over as JSON.
69 payload = json.dumps(notify_packet)
70 conn = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
71 conn.connect(args.socket)
72 with closing(conn):
73 conn.send(payload.encode("utf-8"))
740
=== added file 'src/provisioningserver/dhcp/tests/test_helper_clean.py'
--- src/provisioningserver/dhcp/tests/test_helper_clean.py 1970-01-01 00:00:00 +0000
+++ src/provisioningserver/dhcp/tests/test_helper_clean.py 2016-03-22 17:04:41 +0000
@@ -0,0 +1,103 @@
1# Copyright 2016 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Tests for maas-dhcp-support clean command."""
5
6__all__ = []
7
8from maastesting import root
9from maastesting.testcase import MAASTestCase
10from provisioningserver.utils.fs import read_text_file
11from provisioningserver.utils.shell import call_and_check
12
13
14LEASES_FILE_WITH_HOSTS = """\
15lease 192.168.10.67 {
16 starts 2 2016/03/22 13:44:15;
17 ends 3 2016/03/23 01:44:15;
18 cltt 2 2016/03/22 13:44:15;
19 binding state active;
20 next binding state free;
21 rewind binding state free;
22 hardware ethernet 74:d4:35:89:bc:f2;
23 set clht = "zoochemical-carlton";
24 set cllt = "43200";
25 set clip = "192.168.10.67";
26 set clhw = "74:d4:35:89:bc:f2";
27 set vendor-class-identifier = "Linux ipconfig";
28 client-hostname "zoochemical-carlton";
29}
30host 74-d4-35-89-b9-e8 {
31 dynamic;
32 hardware ethernet 74:d4:35:89:b9:e8;
33 fixed-address 192.168.10.5;
34}
35lease 192.168.10.69 {
36 starts 2 2016/03/22 13:44:15;
37 ends 3 2016/03/23 01:44:15;
38 cltt 2 2016/03/22 13:44:15;
39 binding state active;
40 next binding state free;
41 rewind binding state free;
42 hardware ethernet 74:d4:35:89:bd:25;
43 set clht = "undetesting-johnetta";
44 set cllt = "43200";
45 set clip = "192.168.10.69";
46 set clhw = "74:d4:35:89:bd:25";
47 set vendor-class-identifier = "Linux ipconfig";
48}
49host 74-d4-35-89-bc-26 {
50 dynamic;
51 hardware ethernet 74:d4:35:89:bc:26;
52 fixed-address 192.168.10.7;
53}
54host 74-d4-35-89-bc-23 {
55 dynamic;
56 deleted;
57}
58"""
59
60
61LEASES_FILE_WITHOUT_HOSTS = """\
62lease 192.168.10.67 {
63 starts 2 2016/03/22 13:44:15;
64 ends 3 2016/03/23 01:44:15;
65 cltt 2 2016/03/22 13:44:15;
66 binding state active;
67 next binding state free;
68 rewind binding state free;
69 hardware ethernet 74:d4:35:89:bc:f2;
70 set clht = "zoochemical-carlton";
71 set cllt = "43200";
72 set clip = "192.168.10.67";
73 set clhw = "74:d4:35:89:bc:f2";
74 set vendor-class-identifier = "Linux ipconfig";
75 client-hostname "zoochemical-carlton";
76}
77lease 192.168.10.69 {
78 starts 2 2016/03/22 13:44:15;
79 ends 3 2016/03/23 01:44:15;
80 cltt 2 2016/03/22 13:44:15;
81 binding state active;
82 next binding state free;
83 rewind binding state free;
84 hardware ethernet 74:d4:35:89:bd:25;
85 set clht = "undetesting-johnetta";
86 set cllt = "43200";
87 set clip = "192.168.10.69";
88 set clhw = "74:d4:35:89:bd:25";
89 set vendor-class-identifier = "Linux ipconfig";
90}
91"""
92
93
94class TestDHCPClean(MAASTestCase):
95
96 def test_removes_hosts_from_leases_file(self):
97 path = self.make_file(contents=LEASES_FILE_WITH_HOSTS)
98 call_and_check([
99 "%s/scripts/maas-dhcp-helper" % root,
100 "clean",
101 path,
102 ])
103 self.assertEquals(LEASES_FILE_WITHOUT_HOSTS, read_text_file(path))
0104
=== renamed file 'src/provisioningserver/dhcp/tests/test_notifier.py' => 'src/provisioningserver/dhcp/tests/test_helper_notify.py'
--- src/provisioningserver/dhcp/tests/test_notifier.py 2016-03-07 23:20:52 +0000
+++ src/provisioningserver/dhcp/tests/test_helper_notify.py 2016-03-22 17:04:41 +0000
@@ -1,7 +1,7 @@
1# Copyright 2014-2015 Canonical Ltd. This software is licensed under the1# Copyright 2015-2016 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).2# GNU Affero General Public License version 3 (see the file LICENSE).
33
4"""Tests for maas-rack dhcp-notify command."""4"""Tests for maas-dhcp-support notify command."""
55
6__all__ = []6__all__ = []
77
@@ -67,8 +67,8 @@
67 self.addCleanup(service.stopService)67 self.addCleanup(service.stopService)
6868
69 call_and_check([69 call_and_check([
70 "%s/bin/maas-rack" % root,70 "%s/scripts/maas-dhcp-helper" % root,
71 "dhcp-notify",71 "notify",
72 '--action', action,72 '--action', action,
73 '--mac', mac,73 '--mac', mac,
74 '--ip-family', ip_family,74 '--ip-family', ip_family,