Merge lp:~blake-rouse/maas/fix-1559398 into lp:maas/trunk

Proposed by Blake Rouse on 2016-03-21
Status: Merged
Approved by: Blake Rouse on 2016-03-22
Approved revision: 4817
Merged at revision: 4815
Proposed branch: lp:~blake-rouse/maas/fix-1559398
Merge into: lp:maas/trunk
Diff against target: 407 lines (+240/-82)
7 files modified
etc/maas/templates/dhcp/dhcpd.conf.template (+3/-3)
scripts/maas-dhcp-helper (+128/-0)
setup.py (+2/-0)
src/provisioningserver/__main__.py (+0/-2)
src/provisioningserver/dhcp/notifier.py (+0/-73)
src/provisioningserver/dhcp/tests/test_helper_clean.py (+103/-0)
src/provisioningserver/dhcp/tests/test_helper_notify.py (+4/-4)
To merge this branch: bzr merge lp:~blake-rouse/maas/fix-1559398
Reviewer Review Type Date Requested Status
Mike Pontillo (community) 2016-03-21 Approve on 2016-03-22
Review via email: mp+289712@code.launchpad.net

Commit message

Add maas-dhcp-notify wrapper that forks maas-rack dhcp-notify and returns. This fixes the issue where if a execute action in isc-dhcp-server takes to long it affects the performance of isc-dhcp-server.

Description of the change

I have QA'd this and previously commissioning with just 40 nodes would result only half getting an actual lease. This change allowed all to get a lease and commission successfully.

Packaging change is also required: https://code.launchpad.net/~blake-rouse/maas/fix-1559398-packaging/+merge/289713

To post a comment you must log in.
lp:~blake-rouse/maas/fix-1559398 updated on 2016-03-21
4816. By Blake Rouse on 2016-03-21

Fix comment.

Mike Pontillo (mpontillo) wrote :

See one minor nit below.

My only concern with this branch is that instead of queuing up on a blocking write, we're creating an explosion of processes whenever a large number of DHCP notifications happen.

While this might scale to ~40 nodes, it may not scale to hundreds or thousands; we may fail to create more processes and lose notifications.

Long-term, we should queue up DHCP notifications in memory as fast as we can read them, and a worker should dequeue and process them. (if we still have problems scaling, we can scan the queue for redundant notifications, too.)

review: Approve
Mike Pontillo (mpontillo) wrote :

One other data point: I wrote a simple script to accept connections on a UNIX datagram socket:

https://paste.ubuntu.com/15468191/

... with a "processing delay" of one second after reading each datagram.

Using the Python client, I can easily get ~10 packets ahead of the server while it takes its time dequeuing the packets, using a command similar to:

bin/maas-rack dhcp-notify --action commit --mac 74:d4:35:89:bd:a9 --ip-family ipv4 --ip 192.168.10.60 --lease-time 43200 --hostname test-$SEQ --socket /tmp/test-socket

Using netcat, I can send in 500+ packets before it starts blocking on write, using a command similar to:

echo -n "{\"ip\": \"192.168.10.60\", \"mac\": \"74:d4:35:89:bd:a9\", \"ip_family\": \"ipv4\", \"hostname\": \"test-$SEQ\", \"lease_time\": 43200, \"action\": \"commit\", \"timestamp\": 1458597246}" | nc -w0 -uU /tmp/test-socket

Mike Pontillo (mpontillo) wrote :

Upon further testing, I don't think the fork approach is viable. I was able to cripple my system after ~150-200 processes forked, which could easily happen in MAAS if someone brings a rack online.

review: Disapprove
Mike Pontillo (mpontillo) wrote :

It looks like something we're doing to set up the notification process is very slow (on the order of half a second, or ~100 notifications in a little less than a minute). Since we don't want to pull in a dependency on netcat, I looked at other ways to send the lease notification message faster.

If I use the following test code to write to the socket, it's plenty fast (~100 notifications in ~1.2 seconds on my system):

python -c "import socket; conn = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM); conn.connect('/tmp/test-socket'); conn.send('<lease-json>'); conn.close()"

Something we're doing to set up the command framework and parse the arguments may be what's making it slow. (I already tried eliminating the dependency on the JSON module, and that didn't make any difference.)

lp:~blake-rouse/maas/fix-1559398 updated on 2016-03-22
4817. By Blake Rouse on 2016-03-22

Add maas-dhcp-helper script. Remove the dhcp-notifier from maas-rack. Update dhcp template to call the maas-dhcp-helper.

Mike Pontillo (mpontillo) wrote :

Looks much better; thanks for the fixes.

I see you've included the host-cleaning functionality in this utility as well. (I didn't look at it in too much detail, but it seems a little scary to do this with a regular expression; you might want to verify that it can't ever match the word "host" inside, for example, a lease block.) So a couple of comments on that:

 - It might be safer to use the ISC parser than a regular expression.
 - I hope we're going to ensure that dhcpd is disabled while we clean the file, since it could be large, and in the time it takes to rewrite the file another lease could easily have been allocated.
 - Could we possibly read the file to determine which host entries we want to remove, and use OMAPI to remove them?

review: Approve
Mike Pontillo (mpontillo) wrote :

One other comment. Since you've added the 'clean' functionality to this code, you could probably make the socket notifier portion faster if you move "import re" to the clean function, that way it isn't imported if we aren't cleaning the leases.

Blake Rouse (blake-rouse) wrote :

Good idea. Moved the imports to each sub command so that they run faster.

Also the clean is called specifically from systemd when the the service before it is started. So its 100% not running when that is called. This ensures that its not being modified at the same time, as its not even running.

The test includes hostname in the lease entry that my previous regex was matching. This regex is specific to a host block not any host in the lease section.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'etc/maas/templates/dhcp/dhcpd.conf.template'
2--- etc/maas/templates/dhcp/dhcpd.conf.template 2016-03-17 19:09:25 +0000
3+++ etc/maas/templates/dhcp/dhcpd.conf.template 2016-03-22 17:04:41 +0000
4@@ -89,7 +89,7 @@
5 set cllt = binary-to-ascii(10, 32, "", encode-int(lease-time, 32));
6 set clht = pick-first-value(option host-name, "(none)");
7 execute(
8- "/usr/sbin/maas-rack", "dhcp-notify",
9+ "/usr/sbin/maas-dhcp-helper", "notify",
10 "--action", "commit", "--mac", clhw,
11 "--ip-family", "ipv4", "--ip", clip,
12 "--lease-time", cllt, "--hostname", clht);
13@@ -98,7 +98,7 @@
14 set clhw = binary-to-ascii(16, 8, ":", substring(hardware, 1, 6));
15 set clip = binary-to-ascii(10, 8, ".", leased-address);
16 execute(
17- "/usr/sbin/maas-rack", "dhcp-notify",
18+ "/usr/sbin/maas-dhcp-helper", "notify",
19 "--action", "expiry", "--mac", clhw,
20 "--ip-family", "ipv4", "--ip", clip);
21 }
22@@ -106,7 +106,7 @@
23 set clhw = binary-to-ascii(16, 8, ":", substring(hardware, 1, 6));
24 set clip = binary-to-ascii(10, 8, ".", leased-address);
25 execute(
26- "/usr/sbin/maas-rack", "dhcp-notify",
27+ "/usr/sbin/maas-dhcp-helper", "notify",
28 "--action", "release", "--mac", clhw,
29 "--ip-family", "ipv4", "--ip", clip);
30 }
31
32=== added file 'scripts/maas-dhcp-helper'
33--- scripts/maas-dhcp-helper 1970-01-01 00:00:00 +0000
34+++ scripts/maas-dhcp-helper 2016-03-22 17:04:41 +0000
35@@ -0,0 +1,128 @@
36+#!/usr/bin/env python3
37+# -*- mode: python -*-
38+# Copyright 2016 Canonical Ltd. This software is licensed under the
39+# GNU Affero General Public License version 3 (see the file LICENSE).
40+
41+"""Utility script used by maas-dhcpd and maas-dhcpd6.
42+
43+`notify` is called by maas-dhcpd and maas-dhcpd6 with the on execute command
44+from isc-dhcp-server when a lease is committed, released, or expired. Sends
45+a JSON encoded message over the `dhcpd.sock` to the running rackd process.
46+
47+`clean` is called by maas-dhcpd and maas-dhcpd6 when either service starts
48+to remove any host entries from the passed leases file. This is used to make
49+sure that when maas-dhcpd and maas-dhcpd6 start that only the host entries
50+from the dhcpd.conf or dhcpd6.conf are used.
51+"""
52+
53+import argparse
54+from contextlib import closing
55+import json
56+import re
57+import socket
58+import time
59+
60+
61+def notify_add_arguments(parser):
62+ """Initialise options for sending DHCP notification over the dhcpd.sock.
63+
64+ :param parser: An instance of :class:`ArgumentParser`.
65+ """
66+ parser.add_argument(
67+ "--action", action="store", required=True,
68+ choices=['commit', 'expiry', 'release'], help=(
69+ "Action taken by DHCP server for the lease."))
70+ parser.add_argument(
71+ "--mac", action="store", required=True, help=(
72+ "MAC address for lease."))
73+ parser.add_argument(
74+ "--ip-family", action="store", required=True, choices=['ipv4', 'ipv6'],
75+ help="IP address family for lease.")
76+ parser.add_argument(
77+ "--ip", action="store", required=True, help=(
78+ "IP address for lease."))
79+ parser.add_argument(
80+ "--lease-time", action="store", type=int, required=False, help=(
81+ "Length of time before the lease expires."))
82+ parser.add_argument(
83+ "--hostname", action="store", required=False, help=(
84+ "Hostname of the machine for the lease."))
85+ parser.add_argument(
86+ "--socket", action="store", required=False,
87+ default="/var/lib/maas/dhcpd.sock", help=argparse.SUPPRESS)
88+
89+
90+def notify(args):
91+ """Write DHCP action notification to the `dhcpd.sock`."""
92+ notify_packet = {
93+ "action": args.action,
94+ "mac": args.mac,
95+ "ip_family": args.ip_family,
96+ "ip": args.ip,
97+ "timestamp": int(time.time()),
98+ }
99+
100+ # Lease time is required by the commit action and hostname is optional.
101+ if args.action == "commit":
102+ notify_packet["lease_time"] = args.lease_time
103+ hostname = args.hostname
104+ has_hostname = (
105+ hostname is not None and
106+ len(hostname) > 0 and
107+ not hostname.isspace())
108+ if has_hostname:
109+ notify_packet["hostname"] = hostname
110+
111+ # Connect to the socket and send the packet over as JSON.
112+ payload = json.dumps(notify_packet)
113+ conn = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
114+ conn.connect(args.socket)
115+ with closing(conn):
116+ conn.send(payload.encode("utf-8"))
117+
118+
119+def clean_add_arguments(parser):
120+ """Initialise options for cleaning the dhcpd.leases file.
121+
122+ :param parser: An instance of :class:`ArgumentParser`.
123+ """
124+ parser.add_argument(
125+ "leases_file", action="store",
126+ help=("Leases file to remove host entries from."))
127+
128+
129+def clean(args):
130+ """Remove all host entries from leases file."""
131+ with open(args.leases_file, "r", encoding="utf-8") as fp:
132+ content = fp.read()
133+ cleaned_content = re.sub("host [^{]+{[^}]+}\n", "", content)
134+ with open(args.leases_file, "w", encoding="utf-8") as fp:
135+ fp.write(cleaned_content)
136+
137+
138+def main():
139+ """Main entry point for the script."""
140+ parser = argparse.ArgumentParser()
141+ subparsers = parser.add_subparsers(title="command")
142+ subparsers.required = True
143+ subparsers.dest = "command"
144+
145+ # Notify.
146+ notify_parser = subparsers.add_parser(
147+ "notify", help="Write DHCP action notification to the `dhcpd.sock`.")
148+ notify_parser.set_defaults(handler=notify)
149+ notify_add_arguments(notify_parser)
150+
151+ # Clean.
152+ clean_parser = subparsers.add_parser(
153+ "clean", help="Remove all host entries from leases file.")
154+ clean_parser.set_defaults(handler=clean)
155+ clean_add_arguments(clean_parser)
156+
157+ # Run the parser and the command.
158+ args = parser.parse_args()
159+ args.handler(args)
160+
161+
162+if __name__ == "__main__":
163+ main()
164
165=== modified file 'setup.py'
166--- setup.py 2016-01-04 16:24:36 +0000
167+++ setup.py 2016-03-22 17:04:41 +0000
168@@ -81,6 +81,8 @@
169 ('/usr/bin',
170 ['scripts/maas-generate-winrm-cert',
171 'scripts/uec2roottar']),
172+ ('/usr/sbin',
173+ ['scripts/maas-dhcp-notify']),
174 ],
175
176 classifiers=[
177
178=== modified file 'src/provisioningserver/__main__.py'
179--- src/provisioningserver/__main__.py 2016-03-17 04:26:14 +0000
180+++ src/provisioningserver/__main__.py 2016-03-22 17:04:41 +0000
181@@ -8,7 +8,6 @@
182 import provisioningserver.boot.install_bootloader
183 import provisioningserver.boot.install_grub
184 import provisioningserver.cluster_config_command
185-import provisioningserver.dhcp.notifier
186 import provisioningserver.support_dump
187 import provisioningserver.upgrade_cluster
188 from provisioningserver.utils.script import (
189@@ -23,7 +22,6 @@
190 'atomic-delete': AtomicDeleteScript,
191 'check-for-shared-secret': security.CheckForSharedSecretScript,
192 'config': provisioningserver.cluster_config_command,
193- 'dhcp-notify': provisioningserver.dhcp.notifier,
194 'install-shared-secret': security.InstallSharedSecretScript,
195 'install-uefi-config': provisioningserver.boot.install_grub,
196 'support-dump': provisioningserver.support_dump,
197
198=== removed file 'src/provisioningserver/dhcp/notifier.py'
199--- src/provisioningserver/dhcp/notifier.py 2015-12-23 15:59:21 +0000
200+++ src/provisioningserver/dhcp/notifier.py 1970-01-01 00:00:00 +0000
201@@ -1,73 +0,0 @@
202-# Copyright 2012-2014 Canonical Ltd. This software is licensed under the
203-# GNU Affero General Public License version 3 (see the file LICENSE).
204-
205-"""Write DHCP action notification to the `dhcpd.sock`."""
206-
207-__all__ = [
208- "add_arguments",
209- "run",
210- ]
211-
212-import argparse
213-from contextlib import closing
214-import json
215-import socket
216-import time
217-
218-
219-def add_arguments(parser):
220- """Initialise options for sending DHCP notification over the dhcpd.sock.
221-
222- :param parser: An instance of :class:`ArgumentParser`.
223- """
224- parser.add_argument(
225- "--action", action="store", required=True,
226- choices=['commit', 'expiry', 'release'], help=(
227- "Action taken by DHCP server for the lease."))
228- parser.add_argument(
229- "--mac", action="store", required=True, help=(
230- "MAC address for lease."))
231- parser.add_argument(
232- "--ip-family", action="store", required=True, choices=['ipv4', 'ipv6'],
233- help="IP address family for lease.")
234- parser.add_argument(
235- "--ip", action="store", required=True, help=(
236- "IP address for lease."))
237- parser.add_argument(
238- "--lease-time", action="store", type=int, required=False, help=(
239- "Length of time before the lease expires."))
240- parser.add_argument(
241- "--hostname", action="store", required=False, help=(
242- "Hostname of the machine for the lease."))
243- parser.add_argument(
244- "--socket", action="store", required=False,
245- default="/var/lib/maas/dhcpd.sock", help=argparse.SUPPRESS)
246-
247-
248-def run(args):
249- """Write DHCP action notification to the `dhcpd.sock`."""
250- notify_packet = {
251- "action": args.action,
252- "mac": args.mac,
253- "ip_family": args.ip_family,
254- "ip": args.ip,
255- "timestamp": int(time.time()),
256- }
257-
258- # Lease time is required by the commit action and hostname is optional.
259- if args.action == "commit":
260- notify_packet["lease_time"] = args.lease_time
261- hostname = args.hostname
262- has_hostname = (
263- hostname is not None and
264- len(hostname) > 0 and
265- not hostname.isspace())
266- if has_hostname:
267- notify_packet["hostname"] = hostname
268-
269- # Connect to the socket and send the packet over as JSON.
270- payload = json.dumps(notify_packet)
271- conn = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
272- conn.connect(args.socket)
273- with closing(conn):
274- conn.send(payload.encode("utf-8"))
275
276=== added file 'src/provisioningserver/dhcp/tests/test_helper_clean.py'
277--- src/provisioningserver/dhcp/tests/test_helper_clean.py 1970-01-01 00:00:00 +0000
278+++ src/provisioningserver/dhcp/tests/test_helper_clean.py 2016-03-22 17:04:41 +0000
279@@ -0,0 +1,103 @@
280+# Copyright 2016 Canonical Ltd. This software is licensed under the
281+# GNU Affero General Public License version 3 (see the file LICENSE).
282+
283+"""Tests for maas-dhcp-support clean command."""
284+
285+__all__ = []
286+
287+from maastesting import root
288+from maastesting.testcase import MAASTestCase
289+from provisioningserver.utils.fs import read_text_file
290+from provisioningserver.utils.shell import call_and_check
291+
292+
293+LEASES_FILE_WITH_HOSTS = """\
294+lease 192.168.10.67 {
295+ starts 2 2016/03/22 13:44:15;
296+ ends 3 2016/03/23 01:44:15;
297+ cltt 2 2016/03/22 13:44:15;
298+ binding state active;
299+ next binding state free;
300+ rewind binding state free;
301+ hardware ethernet 74:d4:35:89:bc:f2;
302+ set clht = "zoochemical-carlton";
303+ set cllt = "43200";
304+ set clip = "192.168.10.67";
305+ set clhw = "74:d4:35:89:bc:f2";
306+ set vendor-class-identifier = "Linux ipconfig";
307+ client-hostname "zoochemical-carlton";
308+}
309+host 74-d4-35-89-b9-e8 {
310+ dynamic;
311+ hardware ethernet 74:d4:35:89:b9:e8;
312+ fixed-address 192.168.10.5;
313+}
314+lease 192.168.10.69 {
315+ starts 2 2016/03/22 13:44:15;
316+ ends 3 2016/03/23 01:44:15;
317+ cltt 2 2016/03/22 13:44:15;
318+ binding state active;
319+ next binding state free;
320+ rewind binding state free;
321+ hardware ethernet 74:d4:35:89:bd:25;
322+ set clht = "undetesting-johnetta";
323+ set cllt = "43200";
324+ set clip = "192.168.10.69";
325+ set clhw = "74:d4:35:89:bd:25";
326+ set vendor-class-identifier = "Linux ipconfig";
327+}
328+host 74-d4-35-89-bc-26 {
329+ dynamic;
330+ hardware ethernet 74:d4:35:89:bc:26;
331+ fixed-address 192.168.10.7;
332+}
333+host 74-d4-35-89-bc-23 {
334+ dynamic;
335+ deleted;
336+}
337+"""
338+
339+
340+LEASES_FILE_WITHOUT_HOSTS = """\
341+lease 192.168.10.67 {
342+ starts 2 2016/03/22 13:44:15;
343+ ends 3 2016/03/23 01:44:15;
344+ cltt 2 2016/03/22 13:44:15;
345+ binding state active;
346+ next binding state free;
347+ rewind binding state free;
348+ hardware ethernet 74:d4:35:89:bc:f2;
349+ set clht = "zoochemical-carlton";
350+ set cllt = "43200";
351+ set clip = "192.168.10.67";
352+ set clhw = "74:d4:35:89:bc:f2";
353+ set vendor-class-identifier = "Linux ipconfig";
354+ client-hostname "zoochemical-carlton";
355+}
356+lease 192.168.10.69 {
357+ starts 2 2016/03/22 13:44:15;
358+ ends 3 2016/03/23 01:44:15;
359+ cltt 2 2016/03/22 13:44:15;
360+ binding state active;
361+ next binding state free;
362+ rewind binding state free;
363+ hardware ethernet 74:d4:35:89:bd:25;
364+ set clht = "undetesting-johnetta";
365+ set cllt = "43200";
366+ set clip = "192.168.10.69";
367+ set clhw = "74:d4:35:89:bd:25";
368+ set vendor-class-identifier = "Linux ipconfig";
369+}
370+"""
371+
372+
373+class TestDHCPClean(MAASTestCase):
374+
375+ def test_removes_hosts_from_leases_file(self):
376+ path = self.make_file(contents=LEASES_FILE_WITH_HOSTS)
377+ call_and_check([
378+ "%s/scripts/maas-dhcp-helper" % root,
379+ "clean",
380+ path,
381+ ])
382+ self.assertEquals(LEASES_FILE_WITHOUT_HOSTS, read_text_file(path))
383
384=== renamed file 'src/provisioningserver/dhcp/tests/test_notifier.py' => 'src/provisioningserver/dhcp/tests/test_helper_notify.py'
385--- src/provisioningserver/dhcp/tests/test_notifier.py 2016-03-07 23:20:52 +0000
386+++ src/provisioningserver/dhcp/tests/test_helper_notify.py 2016-03-22 17:04:41 +0000
387@@ -1,7 +1,7 @@
388-# Copyright 2014-2015 Canonical Ltd. This software is licensed under the
389+# Copyright 2015-2016 Canonical Ltd. This software is licensed under the
390 # GNU Affero General Public License version 3 (see the file LICENSE).
391
392-"""Tests for maas-rack dhcp-notify command."""
393+"""Tests for maas-dhcp-support notify command."""
394
395 __all__ = []
396
397@@ -67,8 +67,8 @@
398 self.addCleanup(service.stopService)
399
400 call_and_check([
401- "%s/bin/maas-rack" % root,
402- "dhcp-notify",
403+ "%s/scripts/maas-dhcp-helper" % root,
404+ "notify",
405 '--action', action,
406 '--mac', mac,
407 '--ip-family', ip_family,