Merge lp:~adeuring/launchpad/bug-1015667-3 into lp:launchpad

Proposed by Abel Deuring
Status: Merged
Approved by: Abel Deuring
Approved revision: no longer in the source branch.
Merged at revision: 15656
Proposed branch: lp:~adeuring/launchpad/bug-1015667-3
Merge into: lp:launchpad
Diff against target: 265 lines (+117/-23)
7 files modified
lib/lp/services/config/schema-lazr.conf (+7/-1)
lib/lp/services/job/celeryconfig.py (+5/-1)
lib/lp/services/job/celeryjob.py (+32/-14)
lib/lp/services/job/runner.py (+1/-1)
lib/lp/services/job/tests/test_celery_configuration.py (+2/-1)
lib/lp/services/job/tests/test_celeryjob.py (+69/-4)
lib/lp/services/job/tests/test_runner.py (+1/-1)
To merge this branch: bzr merge lp:~adeuring/launchpad/bug-1015667-3
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Review via email: mp+115110@code.launchpad.net

Commit message

run the Celery task RunMissingReady with ignore_result=True

Description of the change

This branch fixes bug 1015667: Celery is leaving an ever-increasing number
of queues behind. Well, at least I hope that the bug will be fixed --
the result queues created by Celery tasks generally do not give any hint
as to which task created them: the queue names are the task IDs, which are
by default strings generated by uuid4(), and the content of the queued
messages is nothing more than a status message and the data returned by
Task.run(). (OK, this is not completely true: messages for failed tasks
also contain a traceback -- but that does not help much when we have
tasks that tend to run without failures.)

The culprit was the task run_missing_ready, which simply did not have
the flag ignore_result set.

I changed the task from a decorated function into a class derived
from celery.task.Task and defined the class attribute ignore_result.

This also makes it possible to override the default task ID. If no ID is
specified as a parameter to celery.task.Task.apply_async(), a default
value, generated via uuid4() as mentioned above, is created in
Task.apply() or celery.app.amqp.TaskPublisher.delay_task().
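As a sketch of the resulting ID scheme (make_task_id is a hypothetical
helper for illustration, not code from this branch):

```python
from uuid import uuid4


def make_task_id(cls_name):
    """Build a task ID the way RunMissingReady.apply_async() does.

    Celery's default is a bare str(uuid4()); prefixing the class name
    means that a result queue (which is named after the task ID)
    reveals which task created it.
    """
    return '%s_%s' % (cls_name, uuid4())
```

With this scheme a leftover queue named `RunMissingReady_...` is
immediately attributable, where a bare UUID is not.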

The test that no result message queue is created is somewhat paranoid:
a test that an unwanted side effect of some other functionality does
not occur must ensure that the main event actually happens. Otherwise,
the test might simply check: "If nothing happens, there is no result
queue."

test_run_missing_ready_does_not_return_results() does this by calling
list_queued() -- but since this function uses another function,
drain_queues(), which can also consume queued messages, I wanted to be
sure that this does not happen; hence list_queued() is called twice
before celeryd is started.

    Side note, for extra fun: have a look at the implementation of
    list_queued() and drain_queues() in lazr.jobrunner. drain_queues()
    has the parameter "retain"; if it is True, messages are kept in the
    queue; list_queued() calls drain_queues(retain=True, ...), so that
    looks sane. But the usage of this parameter in drain_queues() scares
    me:

            consumer = Consumer(
                connection, bindings, callbacks=callbacks, no_ack=not retain,
                auto_declare=not passive_queues)

    So no_ack must be False in order to keep the message in the queue.
    And if you want to remove the message, you have to set the "don't
    acknowledge" flag to True...
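The inversion boils down to a two-line sketch (consumer_flags is a
hypothetical helper, not part of lazr.jobrunner): in AMQP terms,
no_ack=True asks the broker to auto-acknowledge, i.e. to drop the
message as soon as it is delivered.

```python
def consumer_flags(retain):
    # AMQP semantics: no_ack=True means "auto-acknowledge on delivery",
    # so the broker removes the message immediately. To *keep* a
    # message (retain=True) you therefore need no_ack=False -- hence
    # the double negation that the side note above complains about.
    return {'no_ack': not retain}
```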

This test (and several others for the lazr.jobrunner package and for
lp.services.job) could be simpler and more robust if there were a way
to easily check which queues exist on a rabbitmq instance and how many
messages they contain. There is "rabbitmqctl list_queues", but this
requires root privileges. Also, rabbitmq can provide a webservice API,
but last time I checked it was not available on precise.
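For reference, the two inspection routes look roughly like this (the
credentials and port are assumptions about a stock install, and the
management plugin's port moved from 55672 to 15672 in RabbitMQ 3.0):

```shell
# Requires root (or the rabbitmq user):
sudo rabbitmqctl list_queues name messages

# With the management plugin enabled -- not available on precise at
# the time of writing -- the same data is exposed over HTTP:
curl -s http://guest:guest@localhost:15672/api/queues
```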

And a final rant: after quickly glancing through the AMQP specs, I
believe that the protocol does not provide a mechanism to answer the
question "which queues exist on a given exchange". Scary...

Anyway, I also noticed a minor flaw in BaseRunnableJob.taskId(). This
method creates task IDs for regular job runner tasks, similar to
RunMissingReady.apply_async(). BaseRunnableJob.taskId() also adds the
DB ID of a job and separates the different parts (class name, DB ID,
UUID) with '-'. But the '-' characters are removed from the queue name,
so it is a bit hard to separate the DB ID from the UUID there. So I
replaced the separator '-' with '_'.

tests:

./bin/test services.job -vvt lp.services.job.tests.test_celeryjob
./bin/test services.job -vvt lp.services.job.tests.test_runner

no lint

Revision history for this message
Benji York (benji) wrote :

This branch looks great.

review: Approve (code)

Preview Diff

=== modified file 'lib/lp/services/config/schema-lazr.conf'
--- lib/lp/services/config/schema-lazr.conf 2012-07-03 17:09:00 +0000
+++ lib/lp/services/config/schema-lazr.conf 2012-07-19 14:00:56 +0000
@@ -1805,7 +1805,7 @@

 [job_runner_queues]
 # The names of all queues.
-queues: job job_slow branch_write_job branch_write_job_slow
+queues: job job_slow branch_write_job branch_write_job_slow celerybeat

 # The main job queue.
 [job]
@@ -1837,3 +1837,9 @@
 timeout: 86400
 fallback_queue:
 concurrency: 1
+
+# The queue used for the celerybeat task RunMissingReady
+[celerybeat]
+timeout: 86400
+fallback_queue:
+concurrency: 1

=== modified file 'lib/lp/services/job/celeryconfig.py'
--- lib/lp/services/job/celeryconfig.py 2012-06-29 08:40:05 +0000
+++ lib/lp/services/job/celeryconfig.py 2012-07-19 14:00:56 +0000
@@ -35,6 +35,7 @@
 Doing this in a function is convenient for testing.
 """
 result = {}
+ CELERY_BEAT_QUEUE = 'celerybeat'
 celery_queues = {}
 queue_names = config.job_runner_queues.queues
 queue_names = queue_names.split(' ')
@@ -85,7 +86,10 @@
 result['CELERYBEAT_SCHEDULE'] = {
 'schedule-missing': {
 'task': 'lp.services.job.celeryjob.run_missing_ready',
- 'schedule': timedelta(seconds=600)
+ 'schedule': timedelta(seconds=600),
+ 'options': {
+ 'routing_key': CELERY_BEAT_QUEUE,
+ },
 }
 }
 # See http://ask.github.com/celery/userguide/optimizing.html:

=== modified file 'lib/lp/services/job/celeryjob.py'
--- lib/lp/services/job/celeryjob.py 2012-06-14 05:18:22 +0000
+++ lib/lp/services/job/celeryjob.py 2012-07-19 14:00:56 +0000
@@ -16,10 +16,11 @@

 from logging import info
 import os
+from uuid import uuid4


 os.environ.setdefault('CELERY_CONFIG_MODULE', 'lp.services.job.celeryconfig')
-from celery.task import task
+from celery.task import Task
 from lazr.jobrunner.celerytask import RunJob
 from storm.zope.interfaces import IZStorm
 import transaction
@@ -80,24 +81,41 @@
 queued_job_ids]


-@task
-def run_missing_ready(_no_init=False):
+class RunMissingReady(Task):
 """Task to run any jobs that are ready but not scheduled.

 Currently supports only BranchScanJob.
 :param _no_init: For tests. If True, do not perform the initialization.
 """
- if not _no_init:
- task_init('run_missing_ready')
- with TransactionFreeOperation():
- count = 0
- for job in find_missing_ready(BranchScanJob):
- if not celery_enabled(job.__class__.__name__):
- continue
- job.celeryCommitHook(True)
- count += 1
- info('Scheduled %d missing jobs.', count)
- transaction.commit()
+ ignore_result = True
+
+ def run(self, _no_init=False):
+ if not _no_init:
+ task_init('run_missing_ready')
+ with TransactionFreeOperation():
+ count = 0
+ for job in find_missing_ready(BranchScanJob):
+ if not celery_enabled(job.__class__.__name__):
+ continue
+ job.celeryCommitHook(True)
+ count += 1
+ info('Scheduled %d missing jobs.', count)
+ transaction.commit()
+
+ def apply_async(self, args=None, kwargs=None, task_id=None, publisher=None,
+ connection=None, router=None, queues=None, **options):
+ """Create a task_id if none is specified.
+
+ Override the quite generic default task_id with one containing
+ the class name.
+
+ See also `celery.task.Task.apply_async()`.
+ """
+ if task_id is None:
+ task_id = '%s_%s' % (self.__class__.__name__, uuid4())
+ return super(RunMissingReady, self).apply_async(
+ args, kwargs, task_id, publisher, connection, router, queues,
+ **options)


 needs_zcml = True

=== modified file 'lib/lp/services/job/runner.py'
--- lib/lp/services/job/runner.py 2012-07-03 15:36:00 +0000
+++ lib/lp/services/job/runner.py 2012-07-19 14:00:56 +0000
@@ -217,7 +217,7 @@
 'result': SoftTimeLimitExceeded(1,),
 'task_id': 'cba7d07b-37fe-4f1d-a5f6-79ad7c30222f'}
 """
- return '%s-%s-%s' % (
+ return '%s_%s_%s' % (
 self.__class__.__name__, self.job_id, uuid4())

 def runViaCelery(self, ignore_result=False):

=== modified file 'lib/lp/services/job/tests/test_celery_configuration.py'
--- lib/lp/services/job/tests/test_celery_configuration.py 2012-04-18 17:20:14 +0000
+++ lib/lp/services/job/tests/test_celery_configuration.py 2012-07-19 14:00:56 +0000
@@ -25,7 +25,8 @@
 # Four queues are defined; the binding key for each queue is
 # just the queue name.
 queue_names = [
- 'branch_write_job', 'branch_write_job_slow', 'job', 'job_slow']
+ 'branch_write_job', 'branch_write_job_slow', 'celerybeat', 'job',
+ 'job_slow']
 queues = config['CELERY_QUEUES']
 self.assertEqual(queue_names, sorted(queues))
 for name in queue_names:

=== modified file 'lib/lp/services/job/tests/test_celeryjob.py'
--- lib/lp/services/job/tests/test_celeryjob.py 2012-06-27 03:25:41 +0000
+++ lib/lp/services/job/tests/test_celeryjob.py 2012-07-19 14:00:56 +0000
@@ -1,10 +1,18 @@
 # Copyright 2012 Canonical Ltd. This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).

+from cStringIO import StringIO
+import sys
+from time import (
+ sleep,
+ time,
+ )
+from lazr.jobrunner.bin.clear_queues import clear_queues
 from lp.code.model.branchjob import BranchScanJob
 from lp.scripts.helpers import TransactionFreeOperation
 from lp.services.features.testing import FeatureFixture
 from lp.services.job.tests import (
+ celeryd,
 drain_celery_queues,
 monitor_celery,
 )
@@ -21,10 +29,10 @@
 super(TestRunMissingJobs, self).setUp()
 from lp.services.job.celeryjob import (
 find_missing_ready,
- run_missing_ready,
+ RunMissingReady,
 )
 self.find_missing_ready = find_missing_ready
- self.run_missing_ready = run_missing_ready
+ self.RunMissingReady = RunMissingReady

 def createMissingJob(self):
 job = BranchScanJob.create(self.factory.makeBranch())
@@ -48,7 +56,7 @@
 with monitor_celery() as responses:
 with dbuser('run_missing_ready'):
 with TransactionFreeOperation.require():
- self.run_missing_ready(_no_init=True)
+ self.RunMissingReady().run(_no_init=True)
 self.assertEqual([], responses)

 def test_run_missing_ready(self):
@@ -59,5 +67,62 @@
 with monitor_celery() as responses:
 with dbuser('run_missing_ready'):
 with TransactionFreeOperation.require():
- self.run_missing_ready(_no_init=True)
+ self.RunMissingReady().run(_no_init=True)
 self.assertEqual(1, len(responses))
+
+ def test_run_missing_ready_does_not_return_results(self):
+ """The celerybeat task run_missing_ready does not create a
+ result queue."""
+ from lazr.jobrunner.celerytask import list_queued
+ job_queue_name = 'celerybeat'
+ request = self.RunMissingReady().apply_async(
+ kwargs={'_no_init': True}, queue=job_queue_name)
+ self.assertTrue(request.task_id.startswith('RunMissingReady_'))
+ result_queue_name = request.task_id.replace('-', '')
+ # Paranoia check: This test intends to prove that a Celery
+ # result queue fot the task created above will _not_ be created.
+ # This would also happen when "with celeryd()" would do nothing.
+ # So let's be sure that right now a task is queued...
+ self.assertEqual(
+ 1, len(list_queued(self.RunMissingReady.app, [job_queue_name])))
+ # ...and that list_queued() calls do not consume messages.
+ self.assertEqual(
+ 1, len(list_queued(self.RunMissingReady.app, [job_queue_name])))
+ # Wait at most 60 seconds for celeryd to start and process
+ # the task.
+ with celeryd(job_queue_name):
+ wait_until = time() + 60
+ while (time() < wait_until):
+ queued_tasks = list_queued(
+ self.RunMissingReady.app, [job_queue_name])
+ if len(queued_tasks) == 0:
+ break
+ sleep(.2)
+ # But now the message has been consumed by celeryd.
+ self.assertEqual(
+ 0, len(list_queued(self.RunMissingReady.app, [job_queue_name])))
+ # No result queue was created for the task.
+ try:
+ real_stdout = sys.stdout
+ real_stderr = sys.stderr
+ sys.stdout = fake_stdout = StringIO()
+ sys.stderr = fake_stderr = StringIO()
+ clear_queues(
+ ['script_name', '-c', 'lp.services.job.celeryconfig',
+ result_queue_name])
+ finally:
+ sys.stdout = real_stdout
+ sys.stderr = real_stderr
+ fake_stdout = fake_stdout.getvalue()
+ fake_stderr = fake_stderr.getvalue()
+ self.assertEqual(
+ '', fake_stdout,
+ "Unexpected output from clear_queues:\n"
+ "stdout: %r\n"
+ "stderr: %r" % (fake_stdout, fake_stderr))
+ self.assertEqual(
+ "NOT_FOUND - no queue '%s' in vhost '/'\n" % result_queue_name,
+ fake_stderr,
+ "Unexpected output from clear_queues:\n"
+ "stdout: %r\n"
+ "stderr: %r" % (fake_stdout, fake_stderr))

=== modified file 'lib/lp/services/job/tests/test_runner.py'
--- lib/lp/services/job/tests/test_runner.py 2012-07-03 15:36:00 +0000
+++ lib/lp/services/job/tests/test_runner.py 2012-07-19 14:00:56 +0000
@@ -384,7 +384,7 @@
 task_id = job.taskId()
 uuid_expr = (
 '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}')
- mo = re.search('^NullJob-%s-%s$' % (job.job_id, uuid_expr), task_id)
+ mo = re.search('^NullJob_%s_%s$' % (job.job_id, uuid_expr), task_id)
 self.assertIsNot(None, mo)