Merge lp:~allenap/maas/dev-services-shutdown into lp:~maas-committers/maas/trunk

Proposed by Gavin Panella
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: 1419
Proposed branch: lp:~allenap/maas/dev-services-shutdown
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 30 lines (+5/-4)
2 files modified
services/cluster-worker/run (+1/-1)
services/region-worker/run (+4/-3)
To merge this branch: bzr merge lp:~allenap/maas/dev-services-shutdown
Reviewer                    Review Type  Date Requested  Status
Raphaël Badin (community)                                Approve
Review via email: mp+141645@code.launchpad.net

Commit message

Use pgrphack to ensure that the celeryd development services shut down correctly.

Previously the region-worker service would leave processes behind, and the cluster-worker service would hang if not started via fghack.
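
For context, daemontools' `pgrphack` runs the program it is given in a separate process group. One thing a separate group makes possible is signalling a service and all of its children together. The sketch below illustrates that idea only: `stop_group` is a hypothetical helper, it assumes the service's PID is also its process-group ID, and nothing in this proposal says the MAAS development Makefile stops services this way.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAAS or daemontools): stop every process
# in a group, escalating to KILL if members remain after a grace period
# given in seconds.
stop_group() {
    pgid="$1"
    grace="${2:-5}"
    # A negative PID makes kill target the whole process group. kill fails
    # if the group has no members (or cannot be signalled); stop there.
    kill -s TERM -- "-${pgid}" 2>/dev/null || return 0
    while [ "${grace}" -gt 0 ]; do
        # Signal 0 delivers nothing; it only tests whether members remain.
        kill -s 0 -- "-${pgid}" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -s KILL -- "-${pgid}" 2>/dev/null
}

# Example (the PID here is a placeholder): stop_group 27125 10
```

Signalling only the leader PID would leave any forked children running, which matches the "processes left behind" symptom described above, though the proposal itself does not confirm the cause.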

Gavin Panella (allenap) wrote :

On IRC:

> <rvba> ... I'm still having an error when I try to start the services
> again (after they have been stopped):
>
> {{{
> ==> logs/webapp/current <==
> [02/Jan/2013 18:28:36] "GET /accounts/login/?next=%2F HTTP/1.1" 200 53437
> ^C--> Stop `web`
> --> Stop `region-worker`
> --> Stop `database`
> --> Stop `txlongpoll`
> --> Stop `pserv`
> --> Stop `dns`
> --> Stop `webapp`
> --> Stop `reloader`
> --> Stop `cluster-worker`
>
> rvb@leaf:~/canonical/dev-services-shutdown$ make run
> --> Start `web`
> setlock: fatal: unable to lock /run/lock/maas.dev.web: temporary failure
> }}}
>
> …Am I missing something?

This means that either a supervise process (started by `make
services/<name>/@supervise` or something that depends on that) is
still running, or the service is running somewhere else, even in
another branch (possibly invoked by `make services/<name>/@run`).

Try `fuser -v /run/lock/maas.dev.web` to see which process is still
holding that lock.
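
The same check can be run across several services at once. This is a minimal sketch, assuming the lock files follow the `/run/lock/maas.dev.<name>` pattern seen in the error above and that `fuser` is installed. The function name `held_locks` and the `LOCK_DIR` override are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the dev services whose lock file is still open
# in some process. LOCK_DIR is an illustrative override, not a MAAS setting.
held_locks() {
    for name in "$@"; do
        lock="${LOCK_DIR:-/run/lock}/maas.dev.${name}"
        # fuser -s prints nothing and exits 0 if any process has the file open.
        if fuser -s "${lock}" 2>/dev/null; then
            echo "${name}"
        fi
    done
}

# Service names as they appear in the `make run` output above:
# held_locks web region-worker database txlongpoll pserv dns webapp \
#     reloader cluster-worker
```

Any name it prints can then be inspected with `fuser -v` on the corresponding lock file to see the processes involved.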

Raphaël Badin (rvb) wrote :

rvb@leaf:~/canonical/dev-services-shutdown$ fuser -v /run/lock/maas.dev.web
                     USER        PID ACCESS COMMAND
/run/lock/maas.dev.web:
                     rvb       27125 F.... fghack
                     rvb       27132 F.... apache2

Sometimes I also get:

--> Start `web`
--> Start `region-worker`
--> Start `database`
setlock: fatal: unable to lock /run/lock/maas.dev.database: temporary failure

rvb@leaf:~/canonical/dev-services-shutdown$ fuser -v /run/lock/maas.dev.database
                     USER        PID ACCESS COMMAND
/run/lock/maas.dev.database:
                     rvb       28243 F.... postgres
                     rvb       28272 F.... postgres
                     rvb       28273 F.... postgres
                     rvb       28274 F.... postgres
                     rvb       28275 F.... postgres

Note that it does not happen every time I run 'make run'…

Raphaël Badin (rvb) wrote :

As I said this morning, it looks like the services (celery, apache2, postgres) take a few seconds to actually stop, and you run into trouble if you try to start them up right after they've been told to shut down by svok. If you wait a couple of seconds after hitting Ctrl-C, the services restart fine.

This is not ideal but already a massive improvement over what we have now.

This is probably worth backporting to 1.2 btw.

review: Approve
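
Until the shutdown delay is addressed, the "wait a couple of seconds" workaround can be scripted. This is a rough sketch, assuming `fuser` is available; `wait_for_lock_release` is a hypothetical name, and the default 10-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process has the lock file open,
# giving up after a timeout given in seconds (default 10).
wait_for_lock_release() {
    lock="$1"
    remaining="${2:-10}"
    # fuser -s exits 0 for as long as some process has the file open.
    while fuser -s "${lock}" 2>/dev/null; do
        if [ "${remaining}" -le 0 ]; then
            echo "still locked: ${lock}" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Example: wait_for_lock_release /run/lock/maas.dev.database 15 && make run
```

Polling the lock file rather than sleeping for a fixed time means the restart proceeds as soon as postgres or apache2 has really exited.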

Preview Diff

=== modified file 'services/cluster-worker/run'
--- services/cluster-worker/run 2012-12-11 13:30:57 +0000
+++ services/cluster-worker/run 2013-01-02 17:01:23 +0000
@@ -22,5 +22,5 @@
 export CLUSTER_UUID="adfd3977-f251-4f2c-8d61-745dbd690bfc"
 
 script="$(readlink -f bin/maas-provision)"
-exec fghack "${script}" start-cluster-controller \
+exec pgrphack "${script}" start-cluster-controller \
     http://0.0.0.0:5240/ -u "$(id -un)" -g "$(id -gn)"

=== modified file 'services/region-worker/run'
--- services/region-worker/run 2012-10-05 12:08:51 +0000
+++ services/region-worker/run 2013-01-02 17:01:23 +0000
@@ -14,11 +14,12 @@
 # because there are race issues when restarting.
 [ -z "${logdir:-}" ] || exec &>> "${logdir}/current"
 
-# XXX JeroenVermeulen 2012-08-23, bug=1040529: Use fghack to kludge around
-# hanging celery shutdown.
 export PYTHONPATH=etc/:src/
 script="$(readlink -f bin/celeryd)"
-exec fghack "${script}" \
+# XXX GavinPanella 2013-01-02, bug=1040529: celeryd does not shutdown
+# correctly when signalled: processes are often left behind. However,
+# pgrphack works around this, ensuring a complete shutdown.
+exec pgrphack "${script}" \
     --loglevel INFO --beat --queues celery,master \
     --schedule=run/celerybeat-region-schedule \
     --config=democeleryconfig