Merge lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad

Proposed by Curtis Hovey
Status: Merged
Approved by: Curtis Hovey
Approved revision: no longer in the source branch.
Merged at revision: 16412
Proposed branch: lp:~sinzui/launchpad/shutdown-mailman
Merge into: lp:launchpad
Diff against target: 135 lines (+50/-27)
3 files modified
lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py (+11/-3)
lib/lp/services/mailman/tests/test_xmlrpcrunner.py (+37/-0)
lib/lp/testing/smtpd.py (+2/-24)
To merge this branch: bzr merge lp:~sinzui/launchpad/shutdown-mailman
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Review via email: mp+141967@code.launchpad.net

Commit message

Update the mailman xmlrpc runner to exit early when the service is stopped.

Description of the change

During the last few deployments (possibly longer) mailman hasn't shut
down cleaning. We run the initscript stop -
https://pastebin.canonical.com/45791/ - and the mailmanctl process is
still running and needs to manually killed. This makes mailman a tough
target for a "nodowntime" deployment as it would block on failing to
shut down.

RULES

    Pre-implementation: gnuoy
    * We see that the xmlrpc queue runner continues to run after the rest
      of mailman has shutdown.
      * Other queues process 1 file/message per oneloop() and the longest
        we have seen a process run is about 3 minutes.
      * The xmlrpc runner does not process files, it make a series of
        network calls that take an average of 15 minutes to complete
        per oneloop().
    * The base runner class provides shortcircuit() which can be checked
      by oneloop() to exit early. The get_subscriptions() method takes
      up most of the time oneloop() uses; the method is actually doing
      batching work in a subloop. The subloop can also check to exit early.

QA

    * Stop staging's mailman to verify that shutdown completes quickly,
      and that it can be restarted without a paused process. We have
      verified that staging behaves exactly like production, taking
      between 5 and 15 minutes to shutdown the xmlrpc queue, and the main
      mailman process is left behind.

LINT

    lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
    lib/lp/services/mailman/tests/test_xmlrpcrunner.py

LoC

    I have more than 10,000 lines of credit this week.

TEST

    ./bin/test -vc -t OneLoop lp.services.mailman

IMPLEMENTATION

I added a call to runner._shortcircuit() to the _get_subscriptions() method.
I did the same to the _oneloop() method, but I changed the method to
loop over the all the methods to avoid repeated guards around each
method.
    lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
    lib/lp/services/mailman/tests/test_xmlrpcrunner.py

To post a comment you must log in.
Revision history for this message
Curtis Hovey (sinzui) wrote :

I Removed some unused/brittle smptd -> mailman test helper code. The code was not used, and we don't want to use it since is was a source of timeouts in tests.

Revision history for this message
Benji York (benji) wrote :

This looks good other than the conflict jetsam starting on line 108 of the diff.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py'
--- lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py 2013-01-07 03:29:28 +0000
+++ lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py 2013-01-08 14:30:32 +0000
@@ -137,10 +137,15 @@
137 This method always returns 0 to indicate to the base class's main loop137 This method always returns 0 to indicate to the base class's main loop
138 that it should sleep for a while after calling this method.138 that it should sleep for a while after calling this method.
139 """139 """
140 methods = (
141 self._check_list_actions, self._get_subscriptions,
142 self._check_held_messages)
140 try:143 try:
141 self._check_list_actions()144 for method in methods:
142 self._get_subscriptions()145 if self._shortcircuit():
143 self._check_held_messages()146 # The runner was shutdown during the long network call.
147 break
148 method()
144 except:149 except:
145 # Log the exception and report an OOPS.150 # Log the exception and report an OOPS.
146 log_exception('Unexpected XMLRPCRunner exception')151 log_exception('Unexpected XMLRPCRunner exception')
@@ -344,6 +349,9 @@
344 # different.349 # different.
345 shuffle(lists)350 shuffle(lists)
346 while lists:351 while lists:
352 if self._shortcircuit():
353 # Stop was called during a long network call.
354 return
347 batch = lists[:mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE]355 batch = lists[:mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE]
348 lists = lists[mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE:]356 lists = lists[mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE:]
349 ## syslog('xmlrpc', 'batch: %s', batch)357 ## syslog('xmlrpc', 'batch: %s', batch)
350358
=== modified file 'lib/lp/services/mailman/tests/test_xmlrpcrunner.py'
--- lib/lp/services/mailman/tests/test_xmlrpcrunner.py 2013-01-04 05:42:01 +0000
+++ lib/lp/services/mailman/tests/test_xmlrpcrunner.py 2013-01-08 14:30:32 +0000
@@ -40,6 +40,7 @@
40from lp.services.xmlrpc import Transport40from lp.services.xmlrpc import Transport
41from lp.testing import (41from lp.testing import (
42 person_logged_in,42 person_logged_in,
43 monkey_patch,
43 TestCase,44 TestCase,
44 )45 )
45from lp.testing.fixture import CaptureOops46from lp.testing.fixture import CaptureOops
@@ -379,6 +380,20 @@
379 with locked_list(mm_list_2):380 with locked_list(mm_list_2):
380 self.assertEqual(1, mm_list_2.isMember(lp_user_email))381 self.assertEqual(1, mm_list_2.isMember(lp_user_email))
381382
383 def test_get_subscriptions_shortcircut(self):
384 # The method exist earlty without completing the update when
385 # the runner is stopping.
386 team, mailing_list = self.makeTeamList('team-1', 'owner-1')
387 lp_user_email = 'albatros@eg.dom'
388 lp_user = self.factory.makePerson(name='albatros', email=lp_user_email)
389 with person_logged_in(lp_user):
390 # The factory person has auto join mailing list enabled.
391 lp_user.join(team)
392 self.runner.stop()
393 self.runner._get_subscriptions()
394 with locked_list(self.mm_list):
395 self.assertEqual(0, self.mm_list.isMember(lp_user_email))
396
382 def test_constructing_to_active_recovery(self):397 def test_constructing_to_active_recovery(self):
383 # Lp is informed of the active list if it wrongly believes it is398 # Lp is informed of the active list if it wrongly believes it is
384 # being constructed.399 # being constructed.
@@ -418,3 +433,25 @@
418 MailingListStatus.UPDATING)433 MailingListStatus.UPDATING)
419 self.runner._oneloop()434 self.runner._oneloop()
420 self.assertEqual(MailingListStatus.ACTIVE, mailing_list.status)435 self.assertEqual(MailingListStatus.ACTIVE, mailing_list.status)
436
437 def test_shortcircuit(self):
438 # Oneloop will exit early if the runner is stopping.
439 class State:
440
441 def __init__(self):
442 self.checked = None
443
444 shortcircut = State()
445
446 def fake_called():
447 shortcircut.checked = False
448
449 def fake_stop():
450 shortcircut.checked = True
451 self.runner.stop()
452
453 with monkey_patch(self.runner,
454 _check_list_actions=fake_stop, _get_subscriptions=fake_called):
455 self.runner._oneloop()
456 self.assertTrue(self.runner._shortcircuit())
457 self.assertTrue(shortcircut.checked)
421458
=== modified file 'lib/lp/testing/smtpd.py'
--- lib/lp/testing/smtpd.py 2013-01-07 03:29:28 +0000
+++ lib/lp/testing/smtpd.py 2013-01-08 14:30:32 +0000
@@ -31,29 +31,7 @@
31 message_id, message['to'],31 message_id, message['to'],
32 message['x-beenthere'],32 message['x-beenthere'],
33 message['x-mailfrom'], message['x-rcptto'])33 message['x-mailfrom'], message['x-rcptto'])
34 from Mailman.Utils import list_names34 self.queue.put(message)
35 listnames = list_names()
36 try:
37 local, hostname = message['to'].split('@', 1)
38 log.debug('local: %s, hostname: %s, listnames: %s',
39 local, hostname, listnames)
40 except ValueError:
41 # There was no '@' sign in the email message, so ignore it.
42 log.debug('Bad To header: %s', message.get('to', 'n/a'))
43 return
44 # If the message came from Mailman, place it onto the queue. If the
45 # local part indicates that the message is destined for a Mailman
46 # mailing list, deliver it to Mailman's incoming queue.
47 if local in listnames and 'x-beenthere' not in message:
48 # It's destined for a mailing list.
49 log.debug('delivered to Mailman: %s', message_id)
50 from Mailman.Post import inject
51 inject(local, message)
52 else:
53 # It came from Mailman and goes in the queue, or it's destined for
54 # a 'normal' user. Either way, it goes in the queue.
55 log.debug('delivered to upstream: %s', message_id)
56 self.queue.put(message)
5735
58 def reset(self):36 def reset(self):
59 # Base class is old-style.37 # Base class is old-style.
@@ -68,7 +46,7 @@
6846
69class SMTPController(QueueController):47class SMTPController(QueueController):
70 """A controller for the `SMTPServer`."""48 """A controller for the `SMTPServer`."""
71 49
72 def _make_server(self, host, port):50 def _make_server(self, host, port):
73 """See `QueueController`."""51 """See `QueueController`."""
74 self.server = SMTPServer(host, port, self.queue)52 self.server = SMTPServer(host, port, self.queue)