Merge lp:~rvb/maas/bug-lock-atexit into lp:~maas-committers/maas/trunk

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 1596
Proposed branch: lp:~rvb/maas/bug-lock-atexit
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 56 lines (+23/-1)
2 files modified
src/maasserver/start_up.py (+4/-0)
src/maasserver/tests/test_start_up.py (+19/-1)
To merge this branch: bzr merge lp:~rvb/maas/bug-lock-atexit
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) Approve
Gavin Panella (community) Approve
Review via email: mp+185430@code.launchpad.net

Commit message

Remove the lock if the process gets killed in start_up().

Description of the change

Testing shows this fixes bug 1221059: what we suspect happens here is that, when the call to the avahi bindings errors, apache kills (with SIGTERM) the WSGI process. This causes the lock to be left on disk, which in turn prevents any other WSGI to start because start_up() blocks, waiting for the lock to be removed.

Note that this will only make MAAS cope with transient avahi errors, if an error is permanent (i.e. if the call to the avahi bindings always errors) then this won't mask the problem.

Testing show that this fix works: I've build a package with this fix and tested it.
http://10.189.74.2:8080/view/MAAS/job/saucy-adt-maas-manual/buildTimeTrend
Runs [27-30] are done using the package with this fix.
Runs [31-34] are done with the daily package (i.e. without the fix)
(The tests used here are a simplified version of the integration tests, which only run up to the point where one can log into the API)

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good. I'm still baffled that atexit works but finally doesn't,
but data is king and I'm happy that you've figured out a way to
address this issue.

[1]

+        atexit_recorder = self.patch(start_up, 'register_atexit')

You could patch the atexit module directly. That would remove the need
for the import-as. Or you could mock the atexit name in the start_up
module.

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Glad to hear that this seems to work. A few notes:

 * Why rename register() on import? I'd just "import atexit" and call "atexit.register" — and patch(atexit, 'register') in the test. Simpler all round, I think.

 * Do we know that lock.break_lock won't blow up even when the lock is not held? It could be hard to detect if it did, but might prevent further cleanup.

 * The noun and the command are called "shutdown." The verb is "shut down."

review: Approve
Revision history for this message
Raphaël Badin (rvb) wrote :

> Glad to hear that this seems to work. A few notes:
>
> * Why rename register() on import? I'd just "import atexit" and call
> "atexit.register" — and patch(atexit, 'register') in the test. Simpler all
> round, I think.

All right, same remark from Gavin. I thought it was a bit clearer with the rename but it looks I'm the minority here.

>
> * Do we know that lock.break_lock won't blow up even when the lock is not
> held? It could be hard to detect if it did, but might prevent further
> cleanup.

Yes, I tested it.

> * The noun and the command are called "shutdown." The verb is "shut down."

Okay :).

Revision history for this message
Raphaël Badin (rvb) wrote :

> Looks good. I'm still baffled that atexit works but finally doesn't,
> but data is king and I'm happy that you've figured out a way to
> address this issue.

To be fair, Jeroen suggested this fix.

> [1]
>
> +        atexit_recorder = self.patch(start_up, 'register_atexit')
>
> You could patch the atexit module directly. That would remove the need
> for the import-as. Or you could mock the atexit name in the start_up
> module.

All right, Jeroen feels the same so I'm outnumbered :)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'src/maasserver/start_up.py'
--- src/maasserver/start_up.py 2013-08-23 06:59:22 +0000
+++ src/maasserver/start_up.py 2013-09-13 08:46:53 +0000
@@ -15,6 +15,7 @@
15 ]15 ]
1616
1717
18import atexit
18from textwrap import dedent19from textwrap import dedent
1920
20from django.db import connection21from django.db import connection
@@ -56,6 +57,9 @@
56 internally are not ran concurrently.57 internally are not ran concurrently.
57 """58 """
58 lock = FileLock(LOCK_FILE_NAME)59 lock = FileLock(LOCK_FILE_NAME)
60 # In case this process gets shut down, clean up the lock.
61 atexit.register(lock.break_lock)
62
59 lock.acquire(timeout=LOCK_TIMEOUT)63 lock.acquire(timeout=LOCK_TIMEOUT)
60 try:64 try:
61 inner_start_up()65 inner_start_up()
6266
=== modified file 'src/maasserver/tests/test_start_up.py'
--- src/maasserver/tests/test_start_up.py 2013-09-11 08:13:20 +0000
+++ src/maasserver/tests/test_start_up.py 2013-09-13 08:46:53 +0000
@@ -30,7 +30,11 @@
30from maasserver.testing.testcase import MAASServerTestCase30from maasserver.testing.testcase import MAASServerTestCase
31from maastesting.celery import CeleryFixture31from maastesting.celery import CeleryFixture
32from maastesting.fakemethod import FakeMethod32from maastesting.fakemethod import FakeMethod
33from mock import Mock33from mock import (
34 call,
35 MagicMock,
36 Mock,
37 )
34from provisioningserver import tasks38from provisioningserver import tasks
35from testresources import FixtureResource39from testresources import FixtureResource
3640
@@ -153,3 +157,17 @@
153 self.assertNotIn(157 self.assertNotIn(
154 COMPONENT.IMPORT_PXE_FILES,158 COMPONENT.IMPORT_PXE_FILES,
155 [args[0][0] for args in recorder.call_args_list])159 [args[0][0] for args in recorder.call_args_list])
160
161 def test_start_up_registers_atexit_lock_cleanup(self):
162 filelock_mock = MagicMock()
163 self.patch(start_up, 'FileLock', Mock(side_effect=filelock_mock))
164 # Patch atexit.register to assert it's called with the right
165 # argument.
166 atexit_mock = self.patch(start_up, 'atexit')
167
168 start_up.start_up()
169
170 self.assertEqual(
171 [call.register(
172 filelock_mock(start_up.LOCK_FILE_NAME).break_lock)],
173 atexit_mock.mock_calls)