Merge lp:~allenap/maas/dev-services-shutdown into lp:maas/trunk

Proposed by Gavin Panella
Status: Merged
Approved by: Gavin Panella
Approved revision: 1418
Merged at revision: 1419
Proposed branch: lp:~allenap/maas/dev-services-shutdown
Merge into: lp:maas/trunk
Diff against target: 30 lines (+5/-4)
2 files modified
services/cluster-worker/run (+1/-1)
services/region-worker/run (+4/-3)
To merge this branch: bzr merge lp:~allenap/maas/dev-services-shutdown
Reviewer: Raphaël Badin (community)
Review status: Approve

Commit message

Use pgrphack to ensure that the celeryd development services shut down correctly.

Previously the region-worker service would leave processes behind, and the cluster-worker service would hang if not started via fghack.

Gavin Panella (allenap) wrote :


> <rvba> ... I'm still having an error when I try to start the services
> again (after they have been stopped):
> {{{
> ==> logs/webapp/current <==
> [02/Jan/2013 18:28:36] "GET /accounts/login/?next=%2F HTTP/1.1" 200 53437
> ^C--> Stop `web`
> --> Stop `region-worker`
> --> Stop `database`
> --> Stop `txlongpoll`
> --> Stop `pserv`
> --> Stop `dns`
> --> Stop `webapp`
> --> Stop `reloader`
> --> Stop `cluster-worker`
> rvb@leaf:~/canonical/dev-services-shutdown$ make run
> --> Start `web`
> setlock: fatal: unable to lock /run/lock/ temporary failure
> }}}
> …Am I missing something?

This means that either a supervise process (started by `make
services/<name>/@supervise` or something that depends on that) is
still running, or the service is running somewhere else, even in
another branch (possibly invoked by `make services/<name>/@run`).

Try `fuser -v /run/lock/` to see which process is still
holding that lock.
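
The failure mode described above can be reproduced in isolation. The following is only a minimal sketch under assumptions, not the MAAS setup itself: `flock` from util-linux stands in for `setlock`, the lock is a throwaway temporary file, and a background `sleep` plays the role of a service that has not exited yet.

```shell
#!/bin/sh
# Illustration only: flock(1) stands in for setlock, and the lock file is a
# temporary file, not one of the real service locks.
lockfile="$(mktemp)"

# First holder: keep the lock for a couple of seconds, like a service that
# has been told to stop but is still running.
flock -n "$lockfile" sleep 2 &
holder=$!
sleep 1   # crude: give the holder time to take the lock

# Second attempt: give up immediately instead of waiting for the lock.
if flock -n "$lockfile" true; then
    second_attempt="acquired"
else
    second_attempt="refused"
fi

# Once the holder has exited, the lock is free again.
wait "$holder"
if flock -n "$lockfile" true; then
    after_exit="acquired"
else
    after_exit="refused"
fi

echo "while held: $second_attempt; after exit: $after_exit"
rm -f "$lockfile"
```

While the first process is alive the second attempt is refused; as soon as it exits, the same attempt succeeds. That matches the two explanations given above: something is still holding the lock.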

Raphaël Badin (rvb) wrote :

rvb@leaf:~/canonical/dev-services-shutdown$ fuser -v /run/lock/
                     USER PID ACCESS COMMAND
                     rvb 27125 F.... fghack
                     rvb 27132 F.... apache2

Sometimes I also get:

--> Start `web`
--> Start `region-worker`
--> Start `database`
setlock: fatal: unable to lock /run/lock/ temporary failure

rvb@leaf:~/canonical/dev-services-shutdown$ fuser -v /run/lock/
                     USER PID ACCESS COMMAND
                     rvb 28243 F.... postgres
                     rvb 28272 F.... postgres
                     rvb 28273 F.... postgres
                     rvb 28274 F.... postgres
                     rvb 28275 F.... postgres

Note that it does not happen every time I run 'make run'…

Raphaël Badin (rvb) wrote :

As I said this morning, it looks like the services (celery, apache2, postgres) take a few seconds to actually stop, and you get into trouble if you try to start them up right after they've been told to shut down by svok. If you wait a couple of seconds after hitting Ctrl-C, the services can be restarted without problems.

This is not ideal but already a massive improvement over what we have now.

This is probably worth backporting to 1.2 btw.
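
The wait-a-few-seconds workaround described above could be scripted along these lines. This is only a sketch: `retry` is a hypothetical helper, not something in the MAAS tree, and whether re-running a start command after a partial start is safe is an assumption that would need checking.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts run
# out, sleeping between tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1   # still failing after the last allowed attempt
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example (not run here): retry 5 2 make run
```

A fixed number of attempts keeps the loop from spinning forever when the lock is held by a service that is genuinely still running elsewhere, for example in another branch.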

review: Approve

Preview Diff

=== modified file 'services/cluster-worker/run'
--- services/cluster-worker/run 2012-12-11 13:30:57 +0000
+++ services/cluster-worker/run 2013-01-02 17:01:23 +0000
@@ -22,5 +22,5 @@
 export CLUSTER_UUID="adfd3977-f251-4f2c-8d61-745dbd690bfc"

 script="$(readlink -f bin/maas-provision)"
-exec fghack "${script}" start-cluster-controller \
+exec pgrphack "${script}" start-cluster-controller \
 -u "$(id -un)" -g "$(id -gn)"

=== modified file 'services/region-worker/run'
--- services/region-worker/run 2012-10-05 12:08:51 +0000
+++ services/region-worker/run 2013-01-02 17:01:23 +0000
@@ -14,11 +14,12 @@
 # because there are race issues when restarting.
 [ -z "${logdir:-}" ] || exec &>> "${logdir}/current"

-# XXX JeroenVermeulen 2012-08-23, bug=1040529: Use fghack to kludge around
-# hanging celery shutdown.
 export PYTHONPATH=etc/:src/
 script="$(readlink -f bin/celeryd)"
-exec fghack "${script}" \
+# XXX GavinPanella 2013-01-02, bug=1040529: celeryd does not shutdown
+# correctly when signalled: processes are often left behind. However,
+# pgrphack works around this, ensuring a complete shutdown.
+exec pgrphack "${script}" \
 --loglevel INFO --beat --queues celery,master \
 --schedule=run/celerybeat-region-schedule \
 --config=democeleryconfig