Merge lp:~jamesodhunt/upstart/bug-901038 into lp:upstart

Proposed by James Hunt
Status: Merged
Merged at revision: 1664
Proposed branch: lp:~jamesodhunt/upstart/bug-901038
Merge into: lp:upstart
Diff against target: 123 lines (+77/-16) (has conflicts)
2 files modified
ChangeLog (+15/-0)
util/telinit.c (+62/-16)
Text conflict in ChangeLog
To merge this branch: bzr merge lp:~jamesodhunt/upstart/bug-901038
Reviewer Review Type Date Requested Status
Steve Langasek Approve
Review via email: mp+231705@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Stéphane Graber (stgraber) wrote :

Should there be some kind of timeout, say a minute or so to catch the unlikely case where upstart is messed up an never starts listening again on dbus?

Also, you mention that a rejected reexec request is unlikely, but what happens if a non-root user does it, wouldn't that fail and then have the rest of the code return success anyway?

Revision history for this message
James Hunt (jamesodhunt) wrote :

>Should there be some kind of timeout, say a minute or so to catch the unlikely case where upstart
> is messed up an never starts listening again on dbus?
I can understand the concern, but I don't think we can realistically provide a timeout as we don't know what a reasonable re-exec time is in all circumstances: infinity's >1 minute re-exec time was clearly the result of a bug (bug 1338637), but the re-exec _did_ complete correctly - it just took a very long time.

Also, telinit is connecting to the private D-Bus socket and if that isn't available, Upstarts capabilities are then severly restricted.

> Also, you mention that a rejected reexec request is unlikely, but what happens if a non-root user
> does it, wouldn't that fail and then have the rest of the code return success anyway?
If a non-root user runs 'telinit u', they get rejected immediately as telinit ensures the user running the command is root. I think the only scenarios where the re-exec request would be rejected are:

1) Where telinit is attempting to talk to an old version of upstart that doesn't support re-exec.
   We can ignore this scenario as we don't support down-grade scenarios.

2) Where telinit is attempting to talk to a non-Upstart init daemon.

   I guess this could be an issue in Ubuntu, depending on exactly how we manage the init
   transition. However, we should be able to arrange for maintainer scripts to set
   UPSTART_TELINIT_U_NO_WAIT to avoid that issue.

Revision history for this message
Steve Langasek (vorlon) wrote :

> 1) Where telinit is attempting to talk to an old version of upstart that doesn't support
> re-exec. We can ignore this scenario as we don't support down-grade scenarios.

> 2) Where telinit is attempting to talk to a non-Upstart init daemon.

> I guess this could be an issue in Ubuntu, depending on exactly how we manage the init
> transition. However, we should be able to arrange for maintainer scripts to set
> UPSTART_TELINIT_U_NO_WAIT to avoid that issue.

I don't think any solution should require maintainer scripts to set an extra variable to get sane behavior out of telinit.

If telinit is trying to talk to a non-upstart init, shouldn't it know it? In that case, it surely should not wait for a reconnect on an upstart-specific private socket, but should automatically fall back gracefully to a non-blocking restart.

However, the current upstart 'telinit u' code simply exits non-zero when pid 1 is not upstart - as does the proposed new code. So this is something we need to deal with if/when adding support for telinit u on top of non-upstart pid 1, but not something that should block this change now.

Given this, what is the expected use case for "UPSTART_TELINIT_U_NO_WAIT"? If you don't have a concrete user for this, please omit. Unnecessary interfaces always carry an incremental cost, and I can't see any case where it would ever be correct to use this interface, so this looks like overengineering to me.

Otherwise, this looks fine to me for merging.

review: Needs Fixing
lp:~jamesodhunt/upstart/bug-901038 updated
1662. By James Hunt

* util/telinit.c: Remove UPSTART_TELINIT_U_NO_WAIT check as it
  shouldn't realistically be needed.

Revision history for this message
James Hunt (jamesodhunt) wrote :

OK - removed the UPSTART_TELINIT_U_NO_WAIT logic.

Revision history for this message
Steve Langasek (vorlon) wrote :

Ok, merging.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'ChangeLog'
--- ChangeLog 2014-09-04 11:07:22 +0000
+++ ChangeLog 2014-09-08 10:12:37 +0000
@@ -1,3 +1,4 @@
1<<<<<<< TREE
12014-09-04 James Hunt <james.hunt@ubuntu.com>22014-09-04 James Hunt <james.hunt@ubuntu.com>
23
3 * NEWS: Release 1.13.24 * NEWS: Release 1.13.2
@@ -23,6 +24,20 @@
23 * init/tests/test_job.c: test_job_last_process(): New test for24 * init/tests/test_job.c: test_job_last_process(): New test for
24 job_last_process().25 job_last_process().
2526
27=======
282014-09-08 James Hunt <james.hunt@ubuntu.com>
29
30 * util/telinit.c: Remove UPSTART_TELINIT_U_NO_WAIT check as it
31 shouldn't realistically be needed.
32
332014-08-21 James Hunt <james.hunt@ubuntu.com>
34
35 * util/telinit.c: restart_upstart():
36 - Revert to synchronous behaviour coupled with unavoidable
37 poll to ensure telinit only returns once a re-exec has
38 completed (LP: #901038).
39
40>>>>>>> MERGE-SOURCE
262014-08-14 James Hunt <james.hunt@ubuntu.com>412014-08-14 James Hunt <james.hunt@ubuntu.com>
2742
28 * init/control.c: Disallow modifying system jobs via SetEnv,43 * init/control.c: Disallow modifying system jobs via SetEnv,
2944
=== modified file 'util/telinit.c'
--- util/telinit.c 2014-03-10 13:43:50 +0000
+++ util/telinit.c 2014-09-08 10:12:37 +0000
@@ -160,27 +160,73 @@
160int160int
161restart_upstart (void)161restart_upstart (void)
162{162{
163 nih_local NihDBusProxy *upstart = NULL;163 nih_local NihDBusProxy *upstart = NULL;
164 DBusPendingCall *ret;164 NihError *err;
165 int ret;
165166
166 upstart = upstart_open (NULL);167 upstart = upstart_open (NULL);
167 if (! upstart)168 if (! upstart)
168 return -1;169 return -1;
169170
170 /* Fire and forget:171 /* Ask Upstart to restart itself.
171 *172 *
172 * Ask Upstart to restart itself using the async interface to173 * Since it is not possible to serialise a D-Bus connection,
173 * avoid the client-side complaining if and when it detects that174 * Upstart is forced to sever all D-Bus client connections,
174 * Upstart has severed all connections to perform the re-exec.175 * including this one.
175 */176 *
176 ret = upstart_restart (upstart, NULL, NULL, NULL, 0);177 * Further, since the user expects telinit to block _until the
177 dbus_connection_flush(upstart->connection);178 * re-exec has finished and Upstart is accepting connections
178179 * once again_, the only solution is to wait for the forced
179 /* We don't care about the return code, but we have to keep180 * disconnect, then poll until it is possible to create a new
180 * the compiler happy.181 * connection.
181 */182 *
182 if (ret != (DBusPendingCall *)TRUE)183 * Note that we don't (can't) care about the return code since
183 return 0;184 * it's not reliable:
185 *
186 * - either the re-exec request completed and D-Bus returned zero
187 * before Upstart started the re-exec.
188 *
189 * - or the re-exec request completed but upstart started the
190 * re-exec (severing all D-Bus connections) before D-Bus got a
191 * chance to finish cleanly meaning we receive a return of -1.
192 *
193 * We cannot know exactly what happened so have to allow for
194 * both scenarios. Note the implicit assumption that the re-exec
195 * request itself was accepted. If this assumption is incorrect
196 * (should not be possible), the worst case scenario is that
197 * upstart does not re-exec and then we quickly drop out of the
198 * reconnect block since it never went offline.
199 */
200 ret = upstart_restart_sync (NULL, upstart);
201
202 if (ret < 0) {
203 err = nih_error_get ();
204 nih_free (err);
205 }
206
207 nih_free (upstart);
208
209 nih_debug ("Waiting for upstart to finish re-exec");
210
211 /* We believe Upstart is now in the process of
212 * re-exec'ing so attempt forever to reconnect.
213 *
214 * This sounds dangerous but there is no other option,
215 * and a connection must be possible unless the system
216 * is completely broken.
217 */
218 while (TRUE) {
219
220 upstart = upstart_open (NULL);
221 if (upstart)
222 break;
223
224 err = nih_error_get ();
225 nih_free (err);
226
227 /* Avoid DoS'ing the system whilst we wait */
228 usleep (100000);
229 }
184230
185 return 0;231 return 0;
186}232}

Subscribers

People subscribed via source and target branches