Merge lp:~rvb/maas/bug-lock-atexit into lp:~maas-committers/maas/trunk

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 1596
Proposed branch: lp:~rvb/maas/bug-lock-atexit
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 56 lines (+23/-1)
2 files modified
src/maasserver/start_up.py (+4/-0)
src/maasserver/tests/test_start_up.py (+19/-1)
To merge this branch: bzr merge lp:~rvb/maas/bug-lock-atexit
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) Approve
Gavin Panella (community) Approve
Review via email: mp+185430@code.launchpad.net

Commit message

Remove the lock if the process gets killed in start_up().

Description of the change

Testing shows this fixes bug 1221059: what we suspect happens here is that, when the call to the avahi bindings errors, apache kills (with SIGTERM) the WSGI process. This causes the lock to be left on disk, which in turn prevents any other WSGI to start because start_up() blocks, waiting for the lock to be removed.

Note that this will only make MAAS cope with transient avahi errors, if an error is permanent (i.e. if the call to the avahi bindings always errors) then this won't mask the problem.

Testing show that this fix works: I've build a package with this fix and tested it.
http://10.189.74.2:8080/view/MAAS/job/saucy-adt-maas-manual/buildTimeTrend
Runs [27-30] are done using the package with this fix.
Runs [31-34] are done with the daily package (i.e. without the fix)
(The tests used here are a simplified version of the integration tests, which only run up to the point where one can log into the API)

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good. I'm still baffled that atexit works but finally doesn't,
but data is king and I'm happy that you've figured out a way to
address this issue.

[1]

+        atexit_recorder = self.patch(start_up, 'register_atexit')

You could patch the atexit module directly. That would remove the need
for the import-as. Or you could mock the atexit name in the start_up
module.

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Glad to hear that this seems to work. A few notes:

 * Why rename register() on import? I'd just "import atexit" and call "atexit.register" — and patch(atexit, 'register') in the test. Simpler all round, I think.

 * Do we know that lock.break_lock won't blow up even when the lock is not held? It could be hard to detect if it did, but might prevent further cleanup.

 * The noun and the command are called "shutdown." The verb is "shut down."

review: Approve
Revision history for this message
Raphaël Badin (rvb) wrote :

> Glad to hear that this seems to work. A few notes:
>
> * Why rename register() on import? I'd just "import atexit" and call
> "atexit.register" — and patch(atexit, 'register') in the test. Simpler all
> round, I think.

All right, same remark from Gavin. I thought it was a bit clearer with the rename but it looks I'm the minority here.

>
> * Do we know that lock.break_lock won't blow up even when the lock is not
> held? It could be hard to detect if it did, but might prevent further
> cleanup.

Yes, I tested it.

> * The noun and the command are called "shutdown." The verb is "shut down."

Okay :).

Revision history for this message
Raphaël Badin (rvb) wrote :

> Looks good. I'm still baffled that atexit works but finally doesn't,
> but data is king and I'm happy that you've figured out a way to
> address this issue.

To be fair, Jeroen suggested this fix.

> [1]
>
> +        atexit_recorder = self.patch(start_up, 'register_atexit')
>
> You could patch the atexit module directly. That would remove the need
> for the import-as. Or you could mock the atexit name in the start_up
> module.

All right, Jeroen feels the same so I'm outnumbered :)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/maasserver/start_up.py'
2--- src/maasserver/start_up.py 2013-08-23 06:59:22 +0000
3+++ src/maasserver/start_up.py 2013-09-13 08:46:53 +0000
4@@ -15,6 +15,7 @@
5 ]
6
7
8+import atexit
9 from textwrap import dedent
10
11 from django.db import connection
12@@ -56,6 +57,9 @@
13 internally are not ran concurrently.
14 """
15 lock = FileLock(LOCK_FILE_NAME)
16+ # In case this process gets shut down, clean up the lock.
17+ atexit.register(lock.break_lock)
18+
19 lock.acquire(timeout=LOCK_TIMEOUT)
20 try:
21 inner_start_up()
22
23=== modified file 'src/maasserver/tests/test_start_up.py'
24--- src/maasserver/tests/test_start_up.py 2013-09-11 08:13:20 +0000
25+++ src/maasserver/tests/test_start_up.py 2013-09-13 08:46:53 +0000
26@@ -30,7 +30,11 @@
27 from maasserver.testing.testcase import MAASServerTestCase
28 from maastesting.celery import CeleryFixture
29 from maastesting.fakemethod import FakeMethod
30-from mock import Mock
31+from mock import (
32+ call,
33+ MagicMock,
34+ Mock,
35+ )
36 from provisioningserver import tasks
37 from testresources import FixtureResource
38
39@@ -153,3 +157,17 @@
40 self.assertNotIn(
41 COMPONENT.IMPORT_PXE_FILES,
42 [args[0][0] for args in recorder.call_args_list])
43+
44+ def test_start_up_registers_atexit_lock_cleanup(self):
45+ filelock_mock = MagicMock()
46+ self.patch(start_up, 'FileLock', Mock(side_effect=filelock_mock))
47+ # Patch atexit.register to assert it's called with the right
48+ # argument.
49+ atexit_mock = self.patch(start_up, 'atexit')
50+
51+ start_up.start_up()
52+
53+ self.assertEqual(
54+ [call.register(
55+ filelock_mock(start_up.LOCK_FILE_NAME).break_lock)],
56+ atexit_mock.mock_calls)