Merge lp:~jamesodhunt/upstart/test-quiesce-cleanup into lp:upstart

Proposed by James Hunt on 2013-08-28
Status: Merged
Merged at revision: 1528
Proposed branch: lp:~jamesodhunt/upstart/test-quiesce-cleanup
Merge into: lp:upstart
Diff against target: 528 lines (+245/-54)
2 files modified
ChangeLog (+14/-0)
util/tests/test_initctl.c (+231/-54)
To merge this branch: bzr merge lp:~jamesodhunt/upstart/test-quiesce-cleanup
Reviewer Review Type Date Requested Status
Steve Langasek 2013-08-28 Approve on 2013-09-11
Review via email: mp+182698@code.launchpad.net

Description of the change

* util/tests/test_initctl.c: test_quiesce(): Clean up any processes that
  the Session Init couldn't before it shut down.

Note that such rogues can cause issues for pbuilder.

To post a comment you must log in.
Steve Langasek (vorlon) wrote :

Hi James,

Some of these changes appear to be for cases where we launch a session init with a single job, and then shut down the session init. Why should the session init ever fail to kill the job's process in this case? Isn't this a bug in the session init itself, not in the test case?

If a session init can fail to kill -9 its subprocesses under test, it can also fail to do so in the real world; that sounds like a bug to me that we should fix, not work around in the test.

review: Needs Information
1528. By James Hunt on 2013-09-05

* util/tests/test_initctl.c: test_quiesce():
  - Improve kill checks on job processes.
  - Assert precisely which job processes are expected to be running
    after the Session Init has exited (particularly important for jobs
    that 'start on session-end' since they may be running in a System
    Shutdown scenario).

James Hunt (jamesodhunt) wrote :

Hi Steve,

The tests are simulating both System Shutdown and Session Shutdown.

For the Session Shutdown scenario, the Session Init will wait for the jobs, hence we expect all job processes (even those that block SIGTERM) to be dead by the time the Session Init exits since it will have sent such jobs SIGKILL.

However, in the System Shutdown scenario, when the Display Manager receives SIGTERM to denote system shutdown, it will send the same signal to all its clients, one of which is the Session Init. The Session Init will then send all job processes the job->class->kill_signal signal. However, since the Session Init should not hold up the system shutdown, although it attempts to wait for 5 seconds after sending SIGTERM before sending the final SIGKILL to each job process, it may already have been SIGKILL'd by the Display Manager: on an Ubuntu system, the DM waits 5 seconds between sending SIGTERM and SIGKILL... and the Session Init waits 5 seconds between the time it sent SIGTERM to each job process and the time it sends SIGKILL. The tests were attempting to simulate this exact behaviour but that does admittedly look "odd". I've now updated the branch to avoid such absolute mimicry but at the same time be more careful wrt checking which pids we expect to have stopped and which could still be running.

What this branch does highlight (and which the tests now comment on explicitly) is that any job that specifies "start on session-end" could still be running after the Session Init has exited but only in a System Shutdown context (since the "session-end" event is only emitted in the next iteration of the main loop - after Upstart has sent SIGTERM to all running job processes). As above, on an actual system those processes would get killed by the System Init.

Steve Langasek (vorlon) wrote :

The updated branch looks good to me - that precisely addresses my concerns about inconsistent/unreliable tests, thanks.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'ChangeLog'
2--- ChangeLog 2013-08-23 14:49:09 +0000
3+++ ChangeLog 2013-09-05 16:19:52 +0000
4@@ -1,3 +1,17 @@
5+2013-09-05 James Hunt <james.hunt@ubuntu.com>
6+
7+ * util/tests/test_initctl.c: test_quiesce():
8+ - Improve kill checks on job processes.
9+ - Assert precisely which job processes are expected to be running
10+ after the Session Init has exited (particularly important for jobs
11+ that 'start on session-end' since they may be running in a System
12+ Shutdown scenario).
13+
14+2013-08-28 James Hunt <james.hunt@ubuntu.com>
15+
16+ * util/tests/test_initctl.c: test_quiesce(): Clean up any
17+ processes that the Session Init couldn't before it shut down.
18+
19 2013-08-23 James Hunt <james.hunt@ubuntu.com>
20
21 * NEWS: Release 1.10
22
23=== modified file 'util/tests/test_initctl.c'
24--- util/tests/test_initctl.c 2013-08-23 11:23:19 +0000
25+++ util/tests/test_initctl.c 2013-09-05 16:19:52 +0000
26@@ -11125,6 +11125,7 @@
27 {
28 char confdir[PATH_MAX];
29 char logdir[PATH_MAX];
30+ char pid_file[PATH_MAX];
31 char sessiondir[PATH_MAX];
32 nih_local char *cmd = NULL;
33 pid_t upstart_pid = 0;
34@@ -11135,6 +11136,8 @@
35 nih_local NihDBusProxy *upstart = NULL;
36 nih_local char *orig_xdg_runtime_dir = NULL;
37 nih_local char *session_file = NULL;
38+ nih_local char *job = NULL;
39+ pid_t job_pid;
40
41 TEST_GROUP ("Session Init quiesce");
42
43@@ -11204,21 +11207,64 @@
44 TEST_EQ (lines, 1);
45 nih_free (output);
46
47+ job_pid = job_to_pid ("long-running");
48+ TEST_NE (job_pid, -1);
49+
50 /* Trigger shutdown */
51 assert0 (kill (upstart_pid, SIGTERM));
52
53 /* Force reset */
54 test_user_mode = FALSE;
55
56- TEST_EQ (timed_waitpid (upstart_pid, TEST_QUIESCE_KILL_PHASE), upstart_pid);
57+ /* Wait for longer than we expect the Session Init to take to
58+ * shutdown to give it time to send SIGKILL to all job
59+ * processes. This is unrealistic, but safer for the tests since
60+ * the exact behaviour can be checked.
61+ *
62+ * In reality, the following steps either side of the markers *will*
63+ * occur and those within the markers *may* occur:
64+ *
65+ * 1) A System Shutdown is triggered.
66+ * 2) The Display Manager receives SIGTERM.
67+ * 3) The Display Manager sends SIGTERM to all its clients.
68+ * (including the Session Init).
69+ * 4) The Session Init sends SIGTERM to all running job
70+ * processes.
71+ *
72+ * --- :XXX: START MARKER :XXX: ---
73+ *
74+ * 5) The Session Init will attempt to wait for
75+ * MAX(kill_timeout) seconds.
76+ * 6) The Session Init will send all running job processes
77+ * SIGKILL.
78+ * 7) The Session Init will wait for all remaining job processes
79+ * to end.
80+ * 8) The Session Init will exit.
81+ *
82+ * --- :XXX: END MARKER :XXX: ---
83+ *
84+ * 9) The Display Manager sends SIGKILL to all its clients.
85+ * 10) If still running, the Session Init is killed and exits.
86+ *
87+ * The problem is that the Session Init cannot know when the
88+ * Display Manager will kill *it* so it may be that the Session
89+ * Init cannot send SIGKILL to each job process instead relying
90+ * on the System Init to clean up.
91+ */
92+ TEST_EQ (timed_waitpid (upstart_pid, 1+TEST_QUIESCE_KILL_PHASE), upstart_pid);
93
94 /* Should not now be running */
95 TEST_EQ (kill (upstart_pid, 0), -1);
96+ TEST_EQ (errno, ESRCH);
97
98 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
99 sessiondir, (int)upstart_pid));
100 unlink (session_file);
101
102+ /* pid should no longer exist */
103+ TEST_EQ (kill (job_pid, SIGKILL), -1);
104+ TEST_EQ (errno, ESRCH);
105+
106 DELETE_FILE (confdir, "long-running.conf");
107
108 /*******************************************************************/
109@@ -11243,33 +11289,45 @@
110 TEST_EQ (lines, 1);
111 nih_free (output);
112
113+ job_pid = job_to_pid ("long-running-term");
114+ TEST_NE (job_pid, -1);
115+
116 /* Trigger shutdown */
117 assert0 (kill (upstart_pid, SIGTERM));
118
119 /* Force reset */
120 test_user_mode = FALSE;
121
122- TEST_EQ (timed_waitpid (upstart_pid, TEST_QUIESCE_KILL_PHASE), upstart_pid);
123+ TEST_EQ (timed_waitpid (upstart_pid, 1+TEST_QUIESCE_KILL_PHASE), upstart_pid);
124
125 /* Should not now be running */
126 TEST_EQ (kill (upstart_pid, 0), -1);
127+ TEST_EQ (errno, ESRCH);
128
129 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
130 sessiondir, (int)upstart_pid));
131 unlink (session_file);
132
133+ /* pid should no longer exist */
134+ TEST_EQ (kill (job_pid, SIGKILL), -1);
135+ TEST_EQ (errno, ESRCH);
136+
137 DELETE_FILE (confdir, "long-running-term.conf");
138
139 /*******************************************************************/
140 TEST_FEATURE ("system shutdown: one job which starts on session-end");
141
142- CREATE_FILE (confdir, "session-end.conf",
143- "start on session-end\n"
144- "\n"
145- "script\n"
146- " echo hello\n"
147- " sleep 999\n"
148- "end script");
149+ TEST_FILENAME (pid_file);
150+
151+ job = NIH_MUST (nih_sprintf (NULL, "start on session-end\n"
152+ "\n"
153+ "script\n"
154+ " echo hello\n"
155+ " echo $$ >\"%s\"\n"
156+ " exec sleep 999\n"
157+ "end script", pid_file));
158+
159+ CREATE_FILE (confdir, "session-end.conf", job);
160
161 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
162
163@@ -11282,10 +11340,11 @@
164 /* Force reset */
165 test_user_mode = FALSE;
166
167- TEST_EQ (timed_waitpid (upstart_pid, TEST_QUIESCE_KILL_PHASE), upstart_pid);
168+ TEST_EQ (timed_waitpid (upstart_pid, 1+TEST_QUIESCE_KILL_PHASE), upstart_pid);
169
170 /* Should not now be running */
171 TEST_EQ (kill (upstart_pid, 0), -1);
172+ TEST_EQ (errno, ESRCH);
173
174 logfile = NIH_MUST (nih_sprintf (NULL, "%s/%s",
175 logdir,
176@@ -11302,19 +11361,36 @@
177 sessiondir, (int)upstart_pid));
178 unlink (session_file);
179
180+ file = fopen (pid_file, "r");
181+ TEST_NE_P (file, NULL);
182+ TEST_EQ (fscanf (file, "%d", &job_pid), 1);
183+ fclose (file);
184+
185+ /* pid should be running since Upstart won't have signalled it
186+ * to stop (since it started as a result of session-end being
187+ * emitted _after_ the job pids were sent SIGTERM).
188+ */
189+ TEST_EQ (kill (job_pid, SIGKILL), 0);
190+
191+ assert0 (unlink (pid_file));
192+
193 DELETE_FILE (confdir, "session-end.conf");
194
195 /*******************************************************************/
196 TEST_FEATURE ("system shutdown: one job which starts on session-end and ignores SIGTERM");
197
198- CREATE_FILE (confdir, "session-end-term.conf",
199- "start on session-end\n"
200- "\n"
201- "script\n"
202- " trap '' TERM\n"
203- " echo hello\n"
204- " sleep 999\n"
205- "end script");
206+ TEST_FILENAME (pid_file);
207+
208+ job = NIH_MUST (nih_sprintf (NULL, "start on session-end\n"
209+ "\n"
210+ "script\n"
211+ " trap '' TERM\n"
212+ " echo hello\n"
213+ " echo $$ >\"%s\"\n"
214+ " exec sleep 999\n"
215+ "end script", pid_file));
216+
217+ CREATE_FILE (confdir, "session-end-term.conf", job);
218
219 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
220
221@@ -11327,10 +11403,11 @@
222 /* Force reset */
223 test_user_mode = FALSE;
224
225- TEST_EQ (timed_waitpid (upstart_pid, TEST_QUIESCE_KILL_PHASE), upstart_pid);
226+ TEST_EQ (timed_waitpid (upstart_pid, 1+TEST_QUIESCE_KILL_PHASE), upstart_pid);
227
228 /* Should not now be running */
229 TEST_EQ (kill (upstart_pid, 0), -1);
230+ TEST_EQ (errno, ESRCH);
231
232 logfile = NIH_MUST (nih_sprintf (NULL, "%s/%s",
233 logdir,
234@@ -11347,6 +11424,17 @@
235 sessiondir, (int)upstart_pid));
236 unlink (session_file);
237
238+ /* kill job pid if not already dead */
239+ file = fopen (pid_file, "r");
240+ TEST_NE_P (file, NULL);
241+ TEST_EQ (fscanf (file, "%d", &job_pid), 1);
242+ fclose (file);
243+
244+ /* pid should still be running */
245+ TEST_EQ (kill (job_pid, SIGKILL), 0);
246+
247+ assert0 (unlink (pid_file));
248+
249 DELETE_FILE (confdir, "session-end-term.conf");
250
251 /*******************************************************************/
252@@ -11357,17 +11445,20 @@
253 CREATE_FILE (confdir, "long-running-term.conf",
254 "script\n"
255 " trap '' TERM\n"
256- " sleep 999\n"
257- "end script");
258-
259- CREATE_FILE (confdir, "session-end-term.conf",
260- "start on session-end\n"
261- "\n"
262- "script\n"
263- " trap '' TERM\n"
264- " sleep 999\n"
265- "end script");
266-
267+ " exec sleep 999\n"
268+ "end script");
269+
270+ TEST_FILENAME (pid_file);
271+
272+ job = NIH_MUST (nih_sprintf (NULL, "start on session-end\n"
273+ "\n"
274+ "script\n"
275+ " trap '' TERM\n"
276+ " echo $$ >\"%s\"\n"
277+ " exec sleep 999\n"
278+ "end script", pid_file));
279+
280+ CREATE_FILE (confdir, "session-end-term.conf", job);
281
282 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
283
284@@ -11382,21 +11473,39 @@
285 TEST_EQ (lines, 1);
286 nih_free (output);
287
288+ job_pid = job_to_pid ("long-running-term");
289+ TEST_NE (job_pid, -1);
290+
291 /* Trigger shutdown */
292 assert0 (kill (upstart_pid, SIGTERM));
293
294 /* Force reset */
295 test_user_mode = FALSE;
296
297- TEST_EQ (timed_waitpid (upstart_pid, TEST_QUIESCE_KILL_PHASE), upstart_pid);
298+ TEST_EQ (timed_waitpid (upstart_pid, 1+TEST_QUIESCE_KILL_PHASE), upstart_pid);
299
300 /* Should not now be running */
301 TEST_EQ (kill (upstart_pid, 0), -1);
302+ TEST_EQ (errno, ESRCH);
303
304 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
305 sessiondir, (int)upstart_pid));
306 unlink (session_file);
307
308+ /* the long-running job pid should no longer exist */
309+ kill (job_pid, SIGKILL);
310+ TEST_EQ (errno, ESRCH);
311+
312+ file = fopen (pid_file, "r");
313+ TEST_NE_P (file, NULL);
314+ TEST_EQ (fscanf (file, "%d", &job_pid), 1);
315+ fclose (file);
316+
317+ /* .... but the session-end job pid should still be running */
318+ TEST_EQ (kill (job_pid, SIGKILL), 0);
319+
320+ assert0 (unlink (pid_file));
321+
322 DELETE_FILE (confdir, "long-running-term.conf");
323 DELETE_FILE (confdir, "session-end-term.conf");
324
325@@ -11449,6 +11558,9 @@
326 TEST_EQ (lines, 1);
327 nih_free (output);
328
329+ job_pid = job_to_pid ("long-running");
330+ TEST_NE (job_pid, -1);
331+
332 upstart = upstart_open (NULL);
333 TEST_NE_P (upstart, NULL);
334
335@@ -11465,11 +11577,16 @@
336
337 /* Should not now be running */
338 TEST_EQ (kill (upstart_pid, 0), -1);
339+ TEST_EQ (errno, ESRCH);
340
341 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
342 sessiondir, (int)upstart_pid));
343 unlink (session_file);
344
345+ /* pid should no longer exist */
346+ TEST_EQ (kill (job_pid, SIGKILL), -1);
347+ TEST_EQ (errno, ESRCH);
348+
349 DELETE_FILE (confdir, "long-running.conf");
350
351 /*******************************************************************/
352@@ -11484,13 +11601,16 @@
353 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
354
355 cmd = nih_sprintf (NULL, "%s start %s 2>&1",
356- get_initctl (), "long-running");
357+ get_initctl (), "long-running-term");
358 TEST_NE_P (cmd, NULL);
359
360 RUN_COMMAND (NULL, cmd, &output, &lines);
361 TEST_EQ (lines, 1);
362 nih_free (output);
363
364+ job_pid = job_to_pid ("long-running-term");
365+ TEST_NE (job_pid, -1);
366+
367 upstart = upstart_open (NULL);
368 TEST_NE_P (upstart, NULL);
369
370@@ -11507,23 +11627,32 @@
371
372 /* Should not now be running */
373 TEST_EQ (kill (upstart_pid, 0), -1);
374+ TEST_EQ (errno, ESRCH);
375
376 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
377 sessiondir, (int)upstart_pid));
378 unlink (session_file);
379
380+ /* pid should no longer exist */
381+ TEST_EQ (kill (job_pid, SIGKILL), -1);
382+ TEST_EQ (errno, ESRCH);
383+
384 DELETE_FILE (confdir, "long-running-term.conf");
385
386 /*******************************************************************/
387 TEST_FEATURE ("session shutdown: one job which starts on session-end");
388
389- CREATE_FILE (confdir, "session-end.conf",
390- "start on session-end\n"
391- "\n"
392- "script\n"
393- " echo hello\n"
394- " sleep 999\n"
395- "end script");
396+ TEST_FILENAME (pid_file);
397+
398+ job = NIH_MUST (nih_sprintf (NULL, "start on session-end\n"
399+ "\n"
400+ "script\n"
401+ " echo hello\n"
402+ " echo $$ >\"%s\"\n"
403+ " exec sleep 999\n"
404+ "end script", pid_file));
405+
406+ CREATE_FILE (confdir, "session-end.conf", job);
407
408 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
409
410@@ -11555,6 +11684,17 @@
411 TEST_EQ (fclose (file), 0);
412 assert0 (unlink (logfile));
413
414+ file = fopen (pid_file, "r");
415+ TEST_NE_P (file, NULL);
416+ TEST_EQ (fscanf (file, "%d", &job_pid), 1);
417+ fclose (file);
418+
419+ /* pid should no longer exist */
420+ TEST_EQ (kill (job_pid, SIGKILL), -1);
421+ TEST_EQ (errno, ESRCH);
422+
423+ assert0 (unlink (pid_file));
424+
425 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
426 sessiondir, (int)upstart_pid));
427 unlink (session_file);
428@@ -11564,14 +11704,18 @@
429 /*******************************************************************/
430 TEST_FEATURE ("session shutdown: one job which starts on session-end");
431
432- CREATE_FILE (confdir, "session-end-term.conf",
433- "start on session-end\n"
434- "\n"
435- "script\n"
436- " trap '' TERM\n"
437- " echo hello\n"
438- " sleep 999\n"
439- "end script");
440+ TEST_FILENAME (pid_file);
441+
442+ job = NIH_MUST (nih_sprintf (NULL, "start on session-end\n"
443+ "\n"
444+ "script\n"
445+ " trap '' TERM\n"
446+ " echo hello\n"
447+ " echo $$ >\"%s\"\n"
448+ " exec sleep 999\n"
449+ "end script", pid_file));
450+
451+ CREATE_FILE (confdir, "session-end-term.conf", job);
452
453 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
454
455@@ -11603,6 +11747,17 @@
456 TEST_EQ (fclose (file), 0);
457 assert0 (unlink (logfile));
458
459+ file = fopen (pid_file, "r");
460+ TEST_NE_P (file, NULL);
461+ TEST_EQ (fscanf (file, "%d", &job_pid), 1);
462+ fclose (file);
463+
464+ /* pid should no longer exist */
465+ TEST_EQ (kill (job_pid, SIGKILL), -1);
466+ TEST_EQ (errno, ESRCH);
467+
468+ assert0 (unlink (pid_file));
469+
470 session_file = NIH_MUST (nih_sprintf (NULL, "%s/upstart/sessions/%d.session",
471 sessiondir, (int)upstart_pid));
472 unlink (session_file);
473@@ -11620,13 +11775,17 @@
474 " sleep 999\n"
475 "end script");
476
477- CREATE_FILE (confdir, "session-end-term.conf",
478- "start on session-end\n"
479- "\n"
480- "script\n"
481- " trap '' TERM\n"
482- " sleep 999\n"
483- "end script");
484+ TEST_FILENAME (pid_file);
485+
486+ job = NIH_MUST (nih_sprintf (NULL, "start on session-end\n"
487+ "\n"
488+ "script\n"
489+ " trap '' TERM\n"
490+ " echo $$ >\"%s\"\n"
491+ " sleep 999\n"
492+ "end script", pid_file));
493+
494+ CREATE_FILE (confdir, "session-end-term.conf", job);
495
496 start_upstart_common (&upstart_pid, TRUE, confdir, logdir, NULL);
497
498@@ -11638,6 +11797,9 @@
499 TEST_EQ (lines, 1);
500 nih_free (output);
501
502+ job_pid = job_to_pid ("long-running-term");
503+ TEST_NE (job_pid, -1);
504+
505 upstart = upstart_open (NULL);
506 TEST_NE_P (upstart, NULL);
507
508@@ -11659,6 +11821,21 @@
509 sessiondir, (int)upstart_pid));
510 unlink (session_file);
511
512+ /* pid should no longer exist */
513+ TEST_EQ (kill (job_pid, SIGKILL), -1);
514+ TEST_EQ (errno, ESRCH);
515+
516+ file = fopen (pid_file, "r");
517+ TEST_NE_P (file, NULL);
518+ TEST_EQ (fscanf (file, "%d", &job_pid), 1);
519+ fclose (file);
520+
521+ /* pid should no longer exist */
522+ TEST_EQ (kill (job_pid, SIGKILL), -1);
523+ TEST_EQ (errno, ESRCH);
524+
525+ assert0 (unlink (pid_file));
526+
527 DELETE_FILE (confdir, "long-running-term.conf");
528 DELETE_FILE (confdir, "session-end-term.conf");
529

Subscribers

People subscribed via source and target branches