Merge lp:~jtv/maas/bug-1059569 into lp:~maas-committers/maas/trunk

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 1128
Proposed branch: lp:~jtv/maas/bug-1059569
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 230 lines (+74/-24)
2 files modified
src/provisioningserver/start_cluster_controller.py (+21/-6)
src/provisioningserver/tests/test_start_cluster_controller.py (+53/-18)
To merge this branch: bzr merge lp:~jtv/maas/bug-1059569
Reviewer Review Type Date Requested Status
Raphaël Badin (community) Approve
Review via email: mp+127442@code.launchpad.net

Commit message

Run the cluster controller under a customizable user/group identity (by default, maas/maas).

Description of the change

Discussed with... everyone, probably. We had a vague hope that this would also fix Upstart's problems tracking the celeryd fork(), but actually that required a separate branch. The problem there was that we did a different fork first, and so upstart got to track the wrong one. All that _really_ changes in the branch you see here is that a fork/exec sequence becomes a fork/setuid/exec sequence.

I do as much as possible before the fork, because the error feedback channel from the child process is always going to be a bit narrower and harder to manage. If a failure is coming, we'd best give up early and report it while we still can. It would have been worth checking privileges for the setuid/setgid beforehand as well, except maas-provision won't run unless you're root. For our purposes here it'd be fine if you ran it as maas (or yourself, in dev mode) and made it setuid to that same user; setuid would allow that but maas-provision, as it stands, will not.

Interesting fact: this code has undergone so many changes in how we kick off the child process that the test abstractions for faking the related system calls keep growing.

Jeroen

To post a comment you must log in.
Revision history for this message
Raphaël Badin (rvb) wrote :

Looks good.

[0]

17 + parser.add_argument(
18 + '--user', '-u', metavar='USER', default='maas',
19 + help="System user identity that should run the cluster controller.")
20 + parser.add_argument(
21 + '--group', '-g', metavar='GROUP', default='maas',
22 + help="System group that should run the cluster controller.")

and

58 +def start_up(server_url, connection_details, user='maas', group='maas'):

That's two places where you set the default ('maas'/'maas'). The first one is probably enough don't you think?

review: Approve
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good. I haven't looked at everything because I saw that rvba is
doing this, but I had a couple of comments.

[1]

-    Popen(command, env=env)
+
+    pid = os.fork()
+    if pid == 0:
+        # Child process.  Become the right user, and start celery.
+        os.setuid(uid)
+        os.setgid(gid)
+        os.execvpe(command[0], command, env)

Doesn't matter much, but you could use Popen's preexec_fn argument
too:

    def switch_user():
        os.setuid(uid)
        os.setgid(gid)
    Popen(command, env=env, preexec_fn=switch_user)

[2]

+    parser.add_argument(
+        '--user', '-u', metavar='USER', default='maas',
+        help="System user identity that should run the cluster controller.")
+    parser.add_argument(
+        '--group', '-g', metavar='GROUP', default='maas',
+        help="System group that should run the cluster controller.")

Perhaps the default should be the current user, unless invoked by
root, in which case there is no default and it becomes a mandatory
argument (argparse supports required=True)? Might be handy for
development.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Raphael: you're absolutely right and I fixed it. Complicates the tests a tiny bit but it's really not so bad.

Gavin: nice idea but not now please! Doing it the way I did here solves the bug immediately. Doing it your way requires a packaging change first.

Revision history for this message
Gavin Panella (allenap) wrote :

On 2 October 2012 11:22, Jeroen T. Vermeulen <...> wrote:
> Gavin: nice idea but not now please! Doing it the way I did here
> solves the bug immediately. Doing it your way requires a packaging
> change first.

Good point.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/provisioningserver/start_cluster_controller.py'
2--- src/provisioningserver/start_cluster_controller.py 2012-09-28 10:16:59 +0000
3+++ src/provisioningserver/start_cluster_controller.py 2012-10-02 10:24:23 +0000
4@@ -18,7 +18,7 @@
5 import httplib
6 import json
7 import os
8-from subprocess import Popen
9+from pwd import getpwnam
10 from time import sleep
11 from urllib2 import (
12 HTTPError,
13@@ -43,6 +43,12 @@
14 """For use by :class:`MainScript`."""
15 parser.add_argument(
16 'server_url', metavar='URL', help="URL to the MAAS region controller.")
17+ parser.add_argument(
18+ '--user', '-u', metavar='USER', default='maas',
19+ help="System user identity that should run the cluster controller.")
20+ parser.add_argument(
21+ '--group', '-g', metavar='GROUP', default='maas',
22+ help="System group that should run the cluster controller.")
23
24
25 def log_error(exception):
26@@ -115,8 +121,10 @@
27 raise AssertionError("Unexpected return code: %r" % status_code)
28
29
30-def start_celery(connection_details):
31+def start_celery(connection_details, user, group):
32 broker_url = connection_details['BROKER_URL']
33+ uid = getpwnam(user).pw_uid
34+ gid = getpwnam(group).pw_gid
35
36 # Copy environment, but also tell celeryd what broker to listen to.
37 env = dict(os.environ, CELERY_BROKER_URL=broker_url)
38@@ -128,7 +136,13 @@
39 '--beat',
40 '-Q', get_cluster_uuid(),
41 ]
42- Popen(command, env=env)
43+
44+ pid = os.fork()
45+ if pid == 0:
46+ # Child process. Become the right user, and start celery.
47+ os.setuid(uid)
48+ os.setgid(gid)
49+ os.execvpe(command[0], command, env)
50
51
52 def request_refresh(server_url):
53@@ -141,13 +155,13 @@
54 % e.reason)
55
56
57-def start_up(server_url, connection_details):
58+def start_up(server_url, connection_details, user, group):
59 """We've been accepted as a cluster controller; start doing the job.
60
61 This starts up celeryd, listening to the broker that the region
62 controller pointed us to, and on the appropriate queue.
63 """
64- start_celery(connection_details)
65+ start_celery(connection_details, user=user, group=group)
66 sleep(10)
67 request_refresh(server_url)
68
69@@ -162,4 +176,5 @@
70 while connection_details is None:
71 sleep(60)
72 connection_details = register(args.server_url)
73- start_up(args.server_url, connection_details)
74+ start_up(
75+ args.server_url, connection_details, user=args.user, group=args.group)
76
77=== modified file 'src/provisioningserver/tests/test_start_cluster_controller.py'
78--- src/provisioningserver/tests/test_start_cluster_controller.py 2012-09-28 10:16:59 +0000
79+++ src/provisioningserver/tests/test_start_cluster_controller.py 2012-10-02 10:24:23 +0000
80@@ -17,6 +17,8 @@
81 import httplib
82 from io import BytesIO
83 import json
84+import os
85+from random import randint
86 from urllib2 import (
87 HTTPError,
88 URLError,
89@@ -26,7 +28,6 @@
90 from apiclient.testing.django import parse_headers_and_body_with_django
91 from fixtures import EnvironmentVariableFixture
92 from maastesting.factory import factory
93-from mock import Mock
94 from provisioningserver import start_cluster_controller
95 from provisioningserver.testing.testcase import PservTestCase
96
97@@ -50,13 +51,15 @@
98 )
99
100
101-FakeArgs = namedtuple('FakeArgs', ['server_url'])
102+FakeArgs = namedtuple('FakeArgs', ['server_url', 'user', 'group'])
103
104
105 def make_args(server_url=None):
106 if server_url is None:
107 server_url = make_url('region')
108- return FakeArgs(server_url)
109+ user = factory.make_name('user')
110+ group = factory.make_name('group')
111+ return FakeArgs(server_url, user, group)
112
113
114 class FakeURLOpenResponse:
115@@ -77,12 +80,19 @@
116
117 def setUp(self):
118 super(TestStartClusterController, self).setUp()
119+ # Patch out anything that could be remotely harmful if we did it
120+ # accidentally in the test. Make the really outrageous ones
121+ # raise exceptions.
122 self.patch(start_cluster_controller, 'sleep').side_effect = Sleeping()
123- self.patch(start_cluster_controller, 'Popen').side_effect = (
124- Executing())
125- self.cluster_uuid = factory.getRandomUUID()
126- self.patch(start_cluster_controller, 'get_cluster_uuid', Mock(
127- return_value=self.cluster_uuid))
128+ self.patch(start_cluster_controller, 'getpwnam')
129+ start_cluster_controller.getpwnam.pw_uid = randint(3000, 4000)
130+ start_cluster_controller.getpwnam.pw_gid = randint(3000, 4000)
131+ self.patch(os, 'fork').side_effect = Executing()
132+ self.patch(os, 'execvpe').side_effect = Executing()
133+ self.patch(os, 'setuid')
134+ self.patch(os, 'setgid')
135+ get_uuid = self.patch(start_cluster_controller, 'get_cluster_uuid')
136+ get_uuid.return_value = factory.getRandomUUID()
137
138 def make_connection_details(self):
139 return {
140@@ -123,17 +133,36 @@
141 """Prepare to return "request pending" from API request."""
142 self.prepare_response(httplib.ACCEPTED)
143
144+ def pretend_to_fork_into_child(self):
145+ """Make `fork` act as if it's returning into the child process.
146+
147+ The start_cluster_controller child process then executes celeryd,
148+ so this call also patches up the call that does that so it pretends
149+ to be successful.
150+ """
151+ self.patch(os, 'fork').return_value = 0
152+ self.patch(os, 'execvpe')
153+
154+ def pretend_to_fork_into_parent(self):
155+ """Make `fork` act as if it's returning into the parent process."""
156+ self.patch(os, 'fork').return_value = randint(2, 65535)
157+
158 def test_run_command(self):
159 # We can't really run the script, but we can verify that (with
160 # the right system functions patched out) we can run it
161 # directly.
162- self.patch(start_cluster_controller, 'Popen')
163+ self.pretend_to_fork_into_child()
164 self.patch(start_cluster_controller, 'sleep')
165 self.prepare_success_response()
166 parser = ArgumentParser()
167 start_cluster_controller.add_arguments(parser)
168 start_cluster_controller.run(parser.parse_args((make_url(),)))
169- self.assertNotEqual(0, start_cluster_controller.Popen.call_count)
170+ self.assertEqual(1, os.fork.call_count)
171+ self.assertEqual(1, os.execvpe.call_count)
172+ os.setuid.assert_called_once_with(
173+ start_cluster_controller.getpwnam.return_value.pw_uid)
174+ os.setgid.assert_called_once_with(
175+ start_cluster_controller.getpwnam.return_value.pw_gid)
176
177 def test_uses_given_url(self):
178 url = make_url('region')
179@@ -184,24 +213,29 @@
180 headers, body = kwargs["headers"], kwargs["data"]
181 post, files = self.parse_headers_and_body(headers, body)
182 self.assertEqual([interface], json.loads(post['interfaces']))
183- self.assertEqual(self.cluster_uuid, post['uuid'])
184+ self.assertEqual(
185+ start_cluster_controller.get_cluster_uuid.return_value,
186+ post['uuid'])
187
188 def test_starts_up_once_accepted(self):
189 self.patch(start_cluster_controller, 'start_up')
190 connection_details = self.prepare_success_response()
191 server_url = make_url()
192 start_cluster_controller.run(make_args(server_url=server_url))
193- start_cluster_controller.start_up.assert_called_once_with(
194- server_url, connection_details)
195+ self.assertItemsEqual(
196+ start_cluster_controller.start_up.call_args[0],
197+ (server_url, connection_details))
198
199 def test_start_up_calls_refresh_secrets(self):
200 url = make_url('region')
201 connection_details = self.make_connection_details()
202- self.patch(start_cluster_controller, 'Popen')
203+ self.pretend_to_fork_into_parent()
204 self.patch(start_cluster_controller, 'sleep')
205 self.prepare_success_response()
206
207- start_cluster_controller.start_up(url, connection_details)
208+ start_cluster_controller.start_up(
209+ url, connection_details,
210+ factory.make_name('user'), factory.make_name('group'))
211
212 (args, kwargs) = MAASDispatcher.dispatch_query.call_args
213 self.assertEqual(url + 'api/1.0/nodegroups/', args[0])
214@@ -212,12 +246,13 @@
215 self.assertEqual("refresh_workers", post["op"])
216
217 def test_start_up_ignores_failure_on_refresh_secrets(self):
218- self.patch(start_cluster_controller, 'Popen')
219+ self.pretend_to_fork_into_parent()
220 self.patch(start_cluster_controller, 'sleep')
221 self.patch(MAASDispatcher, 'dispatch_query').side_effect = URLError(
222 "Simulated HTTP failure.")
223
224 start_cluster_controller.start_up(
225- make_url(), self.make_connection_details())
226+ make_url(), self.make_connection_details(),
227+ factory.make_name('user'), factory.make_name('group'))
228
229- self.assertNotEqual(0, start_cluster_controller.Popen.call_count)
230+ self.assertEqual(1, os.fork.call_count)