Merge lp:~stub/launchpad/garbo into lp:launchpad

Proposed by Stuart Bishop
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 12533
Proposed branch: lp:~stub/launchpad/garbo
Merge into: lp:launchpad
Diff against target: 477 lines (+304/-50)
7 files modified
database/schema/security.cfg (+2/-0)
database/schema/trusted.sql (+22/-0)
lib/canonical/launchpad/doc/looptuner.txt (+26/-0)
lib/canonical/launchpad/interfaces/looptuner.py (+9/-0)
lib/canonical/launchpad/utilities/looptuner.py (+54/-49)
lib/lp/scripts/garbo.py (+108/-1)
lib/lp/scripts/tests/test_garbo.py (+83/-0)
To merge this branch: bzr merge lp:~stub/launchpad/garbo
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) code Approve
Review via email: mp+50595@code.launchpad.net

Commit message

[r=jtv][bug=721195] Generic bulk deletion helper for garbo, and garbo job to remove garbage POTranslation rows

Description of the change

A generic bulk removal helper that calculates the set of doomed rows once, holds it in a cursor that survives across transactions, and deletes it in batches.

A garbage POTranslation record pruner, built on the new bulk removal helper.

Note that there are no tests for the POTranslationPruner - the behavior of the base class is already tested, and the only thing remaining is the SQL defining the rows to remove. Testing that SQL is pointless, because the test would need to use the same chunk of SQL to calculate which rows should be matched. The pruner itself is invoked each time runDaily() is invoked in the existing tests.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks for implementing this cleanup. The disk space will be welcome! Daily runs may be a bit much, since it's so unusual for a POTranslation to become unreferenced. Any idea how long the query that searches for orphans will take?

Don't let the Needs Fixing vote put you off; most of the branch looks very good indeed. It's just that I feel the test needs some more work. I do have a few other notes, mostly in the form of questions, because you probably know better than I do.

It seems a bit of a shame to have a procedure just to wrap a FETCH. I think we're all on 8.4 now… have you tried putting the FETCH in a WITH instead? Also, is there no way to fetch n tuples at once?

The CLOSE ALL doesn't belong in BulkPruner.__init__ IMHO. Can't we have a cleanup method called in a "finally:"? If the client dies without passing through there, we wouldn't reuse the connection anyway and the cursor would quietly slip into oblivion.

As for the actual DELETE, doesn't it work with USING?

The reference to POComment is a gentle reminder that that table isn't actually in use. It's empty as well.

And then there's the test. It's a bit unwieldy and, worse, relies on sample data. That's the part that needs fixing. It would be easier AFAICS to have two simple end-to-end tests: "create object plus BulkPruner that cleans it up; run; abort; show that the object is gone" and "create object plus BulkPruner that doesn't clean it up; run; show that the object is still there." For much of the rest you can trust LoopTuner test coverage. Batching behaviour would be best verified by a separate test method.

By the way, I suspect ZopelessDatabaseLayer would suffice for this test. Less setup.

Jeroen

review: Needs Fixing (code)
Revision history for this message
Stuart Bishop (stub) wrote :

The query to search for orphans takes 10 minutes cold. I've opened Bug #723596, cited in an XXX comment stating that this should be run less frequently.

Yes, it sucks needing the procedure to use FETCH. It seems impossible to use FETCH inside a SELECT statement without it, and WITH clauses don't help as they contain SELECT statements. Interestingly, the only way of declaring a CURSOR WITH HOLD is the SQL DECLARE statement, as cursors declared using the plpgsql DECLARE statement implicitly close at the end of the transaction. So as far as I can tell, this is the only possible structure to do what we are doing. Thankfully, this is all in the base class, so if we work out a better approach there is just one piece of code to change :)
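For reference, the pattern the branch ends up with looks roughly like this (chunk size illustrative; the full list of ids to prune is elided):

    -- At pruner construction time. WITH HOLD keeps the cursor alive
    -- across the commits issued between batches.
    DECLARE bulkprunerid NO SCROLL CURSOR WITH HOLD FOR
        SELECT POTranslation.id AS id FROM POTranslation EXCEPT (...);

    -- Once per iteration, each followed by a commit, until no more
    -- rows are returned.
    DELETE FROM POTranslation WHERE id IN (
        SELECT id FROM cursor_fetch('bulkprunerid', 10000)
            AS f(id integer));

    -- From cleanUp(), once the loop completes or aborts.
    CLOSE bulkprunerid;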

I agree CLOSE ALL doesn't belong in the constructor. I didn't want to add a cleanup method to ILoopTuner (and all the implementations) though, especially as the cleanup method will rarely be needed. I considered adapting the looptuner instance to ICleanup (which does not yet exist) and invoking that, but thought that was overkill for this case. Following IRC discussion, I've added an optional cleanUp method to ITunableLoop and used it.

The DELETE could be rewritten to use USING, but it is less readable (a table alias is needed, and 4 substitutions vs. the current 3) and the query plans are nearly identical: IN has a 'Hash Semi Join' as the top node, USING has a 'Hash Join', and USING actually has an infinitesimally larger cost. Go figure :)
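For the record, the USING variant compared against the IN form above would look roughly like this (chunk size illustrative):

    -- Alternative DELETE using USING; note the extra table alias.
    DELETE FROM POTranslation AS target
    USING cursor_fetch('bulkprunerid', 10000) AS f(id integer)
    WHERE target.id = f.id;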

The BulkPruner tests no longer use the Launchpad database. They now use the ZopelessDatabaseLayer.

I don't think your suggested test structure is an improvement. It seems redundant, as the existing test would still be needed to ensure things like commit-per-iteration and removing only chunk_size items per iteration. We are testing that the object implements the loop tuner interface correctly, and the existing loop tuner tests confirm that this interface is invoked correctly (something the tests for the other garbo tasks lack - bogus implementations that don't deal with the chunk size correctly can still pass).

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks for clarifying these things for me, and making the changes as well. As discussed on IRC, I agree that it'd be nice to enforce correctness on the cleanup API but it's not worth over-engineering.

It just occurred to me that it would have been enough to find and delete an arbitrary 1,000 orphaned POTranslations (or pick your favourite number) and leave the rest for the next garbo run. We do that with some other cleanups of this kind. It would have been much simpler, though on the other hand, the pruner may be useful for other tables that have different dynamics.
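A sketch of that simpler alternative, reusing the same orphan query with a LIMIT (batch size arbitrary; this is not what was merged):

    -- Delete up to 1000 orphans per garbo run; later runs pick up
    -- the rest.
    DELETE FROM POTranslation WHERE id IN (
        SELECT POTranslation.id FROM POTranslation
        EXCEPT (...)  -- the same referencing-column SELECTs
        LIMIT 1000);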

As for the test, yes, you'd still want to test those things — but it's much more manageable to separate them from the integration test. Testing the intermediate states in a longer test scenario has caused us vast amounts of maintenance drag in the past; it's essentially what we've been doing in the doctests. Relying on sample data makes it worse. Setting up your data for each individual test can be more expensive, but it gets better as you lighten the setup needed for the detailed tests and it's easier to cover all the edge cases. Often it's even worth restructuring code for.

Jeroen

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

It turns out that splitting the test up into smaller chunks only made the complexity worse. Thanks for trying though, and for removing the dependency on test data as well.

review: Approve (code)

Preview Diff

=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg 2011-02-22 20:43:35 +0000
+++ database/schema/security.cfg 2011-02-28 13:01:31 +0000
@@ -16,6 +16,7 @@
 public.activity() = EXECUTE
 public.person_sort_key(text, text) = EXECUTE
 public.calculate_bug_heat(integer) = EXECUTE
+public.cursor_fetch(refcursor, integer) = EXECUTE
 public.debversion_sort_key(text) = EXECUTE
 public.milestone_sort_key(timestamp without time zone, text) = EXECUTE
 public.version_sort_key(text) = EXECUTE
@@ -2061,6 +2062,7 @@
 public.openidassociation = SELECT, DELETE
 public.openidconsumerassociation = SELECT, DELETE
 public.openidconsumernonce = SELECT, DELETE
+public.potranslation = SELECT, DELETE
 public.revisioncache = SELECT, DELETE
 public.person = SELECT, DELETE
 public.revisionauthor = SELECT, UPDATE

=== modified file 'database/schema/trusted.sql'
--- database/schema/trusted.sql 2011-02-02 17:52:54 +0000
+++ database/schema/trusted.sql 2011-02-28 13:01:31 +0000
@@ -51,6 +51,28 @@
 'Return the number of NULLs in the first row of the given array.';


+CREATE OR REPLACE FUNCTION cursor_fetch(cur refcursor, n integer)
+RETURNS SETOF record LANGUAGE plpgsql AS
+$$
+DECLARE
+    r record;
+    count integer;
+BEGIN
+    FOR count IN 1..n LOOP
+        FETCH FORWARD FROM cur INTO r;
+        IF NOT FOUND THEN
+            RETURN;
+        END IF;
+        RETURN NEXT r;
+    END LOOP;
+END;
+$$;
+
+COMMENT ON FUNCTION cursor_fetch(refcursor, integer) IS
+'Fetch the next n items from a cursor. Workaround for not being able to use FETCH inside a SELECT statement.';
+
+
+
 CREATE OR REPLACE FUNCTION replication_lag() RETURNS interval
 LANGUAGE plpgsql STABLE SECURITY DEFINER AS
 $$

=== modified file 'lib/canonical/launchpad/doc/looptuner.txt'
--- lib/canonical/launchpad/doc/looptuner.txt 2011-01-14 16:06:46 +0000
+++ lib/canonical/launchpad/doc/looptuner.txt 2011-02-28 13:01:31 +0000
@@ -457,3 +457,29 @@
     WARNING:root:Task aborted after 20 seconds.


+== Cleanup ==
+
+    Loops can define a cleanup hook to release opened resources.
+    We need this because loops can be aborted mid-run, so we cannot
+    rely on cleanup code in the isDone() method, and __del__ is
+    fragile and can never be relied on.
+
+    >>> class PlannedLoopWithCleanup(PlannedLoop):
+    ...     def cleanUp(self):
+    ...         print 'clean up'
+
+    >>> body = PlannedLoopWithCleanup([])
+    >>> loop = TestTuner(body, goal_seconds, 100)
+    >>> loop.run()
+    done
+    clean up
+
+    >>> body = PlannedLoopWithCleanup([5, 7, 8, 9, 10, 11, 9, 20, 7, 10, 10])
+    >>> loop = TestTuner(body, goal_seconds, 100, abort_time=20)
+    >>> loop.run()
+    start
+    same (0.0)
+    same (0.0)
+    WARNING:root:Task aborted after 20 seconds.
+    clean up
+

=== modified file 'lib/canonical/launchpad/interfaces/looptuner.py'
--- lib/canonical/launchpad/interfaces/looptuner.py 2009-06-25 05:30:52 +0000
+++ lib/canonical/launchpad/interfaces/looptuner.py 2011-02-28 13:01:31 +0000
@@ -32,3 +32,12 @@
         to get as close as possible to your time goal.
         """

+    def cleanUp(self):
+        """Clean up any open resources.
+
+        Optional.
+
+        This method is needed because loops may be aborted before
+        completion, so cleanup code in the isDone() method may
+        never be invoked.
+        """

=== modified file 'lib/canonical/launchpad/utilities/looptuner.py'
--- lib/canonical/launchpad/utilities/looptuner.py 2010-09-03 04:14:41 +0000
+++ lib/canonical/launchpad/utilities/looptuner.py 2011-02-28 13:01:31 +0000
@@ -24,6 +24,7 @@
     MAIN_STORE,
     MASTER_FLAVOR,
     )
+from canonical.lazr.utils import safe_hasattr


 class LoopTuner:
@@ -117,55 +118,59 @@

     def run(self):
         """Run the loop to completion."""
-        chunk_size = self.minimum_chunk_size
-        iteration = 0
-        total_size = 0
-        self.start_time = self._time()
-        last_clock = self.start_time
-        while not self.operation.isDone():
-
-            if self._isTimedOut():
-                self.log.warn(
-                    "Task aborted after %d seconds." % self.abort_time)
-                break
-
-            self.operation(chunk_size)
-
-            new_clock = self._time()
-            time_taken = new_clock - last_clock
-            last_clock = new_clock
-            self.log.debug("Iteration %d (size %.1f): %.3f seconds" %
-                           (iteration, chunk_size, time_taken))
-
-            last_clock = self._coolDown(last_clock)
-
-            total_size += chunk_size
-
-            # Adjust parameter value to approximate goal_seconds. The new
-            # value is the average of two numbers: the previous value, and an
-            # estimate of how many rows would take us to exactly goal_seconds
-            # seconds.
-            # The weight in this estimate of any given historic measurement
-            # decays exponentially with an exponent of 1/2. This softens the
-            # blows from spikes and dips in processing time.
-            # Set a reasonable minimum for time_taken, just in case we get
-            # weird values for whatever reason and destabilize the
-            # algorithm.
-            time_taken = max(self.goal_seconds/10, time_taken)
-            chunk_size *= (1 + self.goal_seconds/time_taken)/2
-            chunk_size = max(chunk_size, self.minimum_chunk_size)
-            chunk_size = min(chunk_size, self.maximum_chunk_size)
-            iteration += 1
-
-        total_time = last_clock - self.start_time
-        average_size = total_size/max(1, iteration)
-        average_speed = total_size/max(1, total_time)
-        self.log.debug(
-            "Done. %d items in %d iterations, "
-            "%.3f seconds, "
-            "average size %f (%s/s)" %
-            (total_size, iteration, total_time, average_size,
-             average_speed))
+        try:
+            chunk_size = self.minimum_chunk_size
+            iteration = 0
+            total_size = 0
+            self.start_time = self._time()
+            last_clock = self.start_time
+            while not self.operation.isDone():
+
+                if self._isTimedOut():
+                    self.log.warn(
+                        "Task aborted after %d seconds." % self.abort_time)
+                    break
+
+                self.operation(chunk_size)
+
+                new_clock = self._time()
+                time_taken = new_clock - last_clock
+                last_clock = new_clock
+                self.log.debug("Iteration %d (size %.1f): %.3f seconds" %
+                               (iteration, chunk_size, time_taken))
+
+                last_clock = self._coolDown(last_clock)
+
+                total_size += chunk_size
+
+                # Adjust parameter value to approximate goal_seconds.
+                # The new value is the average of two numbers: the
+                # previous value, and an estimate of how many rows would
+                # take us to exactly goal_seconds seconds. The weight in
+                # this estimate of any given historic measurement decays
+                # exponentially with an exponent of 1/2. This softens
+                # the blows from spikes and dips in processing time. Set
+                # a reasonable minimum for time_taken, just in case we
+                # get weird values for whatever reason and destabilize
+                # the algorithm.
+                time_taken = max(self.goal_seconds/10, time_taken)
+                chunk_size *= (1 + self.goal_seconds/time_taken)/2
+                chunk_size = max(chunk_size, self.minimum_chunk_size)
+                chunk_size = min(chunk_size, self.maximum_chunk_size)
+                iteration += 1
+
+            total_time = last_clock - self.start_time
+            average_size = total_size/max(1, iteration)
+            average_speed = total_size/max(1, total_time)
+            self.log.debug(
+                "Done. %d items in %d iterations, "
+                "%.3f seconds, "
+                "average size %f (%s/s)" %
+                (total_size, iteration, total_time, average_size,
+                 average_speed))
+        finally:
+            if safe_hasattr(self.operation, 'cleanUp'):
+                self.operation.cleanUp()

     def _coolDown(self, bedtime):
         """Sleep for `self.cooldown_time` seconds, if set.

=== modified file 'lib/lp/scripts/garbo.py'
--- lib/lp/scripts/garbo.py 2011-02-14 00:15:22 +0000
+++ lib/lp/scripts/garbo.py 2011-02-28 13:01:31 +0000
@@ -75,11 +75,117 @@
     SilentLaunchpadScriptFailure,
     )
 from lp.translations.interfaces.potemplate import IPOTemplateSet
+from lp.translations.model.potranslation import POTranslation


 ONE_DAY_IN_SECONDS = 24*60*60


+class BulkPruner(TunableLoop):
+    """An abstract ITunableLoop base class for simple pruners.
+
+    This is designed for the case where calculating the list of items
+    is expensive, and this list may be huge. For this use case, it
+    is impractical to calculate a batch of ids to remove each
+    iteration.
+
+    One approach is using a temporary table, populating it
+    with the set of items to remove at the start. However, this
+    approach can perform badly as you either need to prune the
+    temporary table as you go, or use OFFSET to skip to the next
+    batch to remove, which gets slower as we progress further through
+    the list.
+
+    Instead, this implementation declares a CURSOR that can be used
+    across multiple transactions, allowing us to calculate the set
+    of items to remove just once and iterate over it, avoiding the
+    seek-to-batch issues with a temporary table and OFFSET yet
+    deleting batches of rows in separate transactions.
+    """
+
+    # The Storm database class for the table we are removing records
+    # from. Must be overridden.
+    target_table_class = None
+
+    # The column name in target_table we use as the integer key. May be
+    # overridden.
+    target_table_key = 'id'
+
+    # An SQL query returning a list of ids to remove from target_table.
+    # The query must return a single column named 'id' and should not
+    # contain duplicates. Must be overridden.
+    ids_to_prune_query = None
+
+    # See `TunableLoop`. May be overridden.
+    maximum_chunk_size = 10000
+
+    def __init__(self, log, abort_time=None):
+        super(BulkPruner, self).__init__(log, abort_time)
+
+        self.store = IMasterStore(self.target_table_class)
+        self.target_table_name = self.target_table_class.__storm_table__
+
+        # Open the cursor.
+        self.store.execute(
+            "DECLARE bulkprunerid NO SCROLL CURSOR WITH HOLD FOR %s"
+            % self.ids_to_prune_query)
+
+    _num_removed = None
+
+    def isDone(self):
+        """See `ITunableLoop`."""
+        return self._num_removed == 0
+
+    def __call__(self, chunk_size):
+        """See `ITunableLoop`."""
+        result = self.store.execute("""
+            DELETE FROM %s WHERE %s IN (
+                SELECT id FROM
+                cursor_fetch('bulkprunerid', %d) AS f(id integer))
+            """
+            % (self.target_table_name, self.target_table_key, chunk_size))
+        self._num_removed = result.rowcount
+        transaction.commit()
+
+    def cleanUp(self):
+        """See `ITunableLoop`."""
+        self.store.execute("CLOSE bulkprunerid")
+
+
+class POTranslationPruner(BulkPruner):
+    """Remove unlinked POTranslation entries.
+
+    XXX bug=723596 StuartBishop: This job only needs to run once per month.
+    """
+
+    target_table_class = POTranslation
+
+    ids_to_prune_query = """
+        SELECT POTranslation.id AS id FROM POTranslation
+        EXCEPT (
+            SELECT potranslation FROM POComment
+
+            UNION ALL SELECT msgstr0 FROM TranslationMessage
+                WHERE msgstr0 IS NOT NULL
+
+            UNION ALL SELECT msgstr1 FROM TranslationMessage
+                WHERE msgstr1 IS NOT NULL
+
+            UNION ALL SELECT msgstr2 FROM TranslationMessage
+                WHERE msgstr2 IS NOT NULL
+
+            UNION ALL SELECT msgstr3 FROM TranslationMessage
+                WHERE msgstr3 IS NOT NULL
+
+            UNION ALL SELECT msgstr4 FROM TranslationMessage
+                WHERE msgstr4 IS NOT NULL
+
+            UNION ALL SELECT msgstr5 FROM TranslationMessage
+                WHERE msgstr5 IS NOT NULL
+            )
+        """
+
+
 class OAuthNoncePruner(TunableLoop):
     """An ITunableLoop to prune old OAuthNonce records.

@@ -173,7 +279,7 @@
                 LIMIT %d
                 )
             """ % (self.table_name, self.table_name, int(chunksize)))
-        self._num_removed = result._raw_cursor.rowcount
+        self._num_removed = result.rowcount
         transaction.commit()

     def isDone(self):
@@ -895,6 +1001,7 @@
         OldTimeLimitedTokenDeleter,
         RevisionAuthorEmailLinker,
         SuggestiveTemplatesCacheUpdater,
+        POTranslationPruner,
         ]
     experimental_tunable_loops = [
         PersonPruner,

=== modified file 'lib/lp/scripts/tests/test_garbo.py'
--- lib/lp/scripts/tests/test_garbo.py 2011-02-16 08:08:20 +0000
+++ lib/lp/scripts/tests/test_garbo.py 2011-02-28 13:01:31 +0000
@@ -18,6 +18,7 @@
     Min,
     SQL,
     )
+from storm.locals import Storm, Int
 from storm.store import Store
 import transaction
 from zope.component import getUtility
@@ -45,6 +46,7 @@
     DatabaseLayer,
     LaunchpadScriptLayer,
     LaunchpadZopelessLayer,
+    ZopelessDatabaseLayer,
     )
 from lp.bugs.model.bugnotification import (
     BugNotification,
@@ -67,6 +69,7 @@
     PersonCreationRationale,
     )
 from lp.scripts.garbo import (
+    BulkPruner,
     DailyDatabaseGarbageCollector,
     HourlyDatabaseGarbageCollector,
     OpenIDConsumerAssociationPruner,
@@ -98,6 +101,86 @@
         self.failIf(err.strip(), "Output to stderr: %s" % err)


+class BulkFoo(Storm):
+    __storm_table__ = 'bulkfoo'
+    id = Int(primary=True)
+
+
+class BulkFooPruner(BulkPruner):
+    target_table_class = BulkFoo
+    ids_to_prune_query = "SELECT id FROM BulkFoo WHERE id < 5"
+    maximum_chunk_size = 2
+
+
+class TestBulkPruner(TestCase):
+    layer = ZopelessDatabaseLayer
+
+    def setUp(self):
+        super(TestBulkPruner, self).setUp()
+
+        self.store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
+        self.store.execute("CREATE TABLE BulkFoo (id serial PRIMARY KEY)")
+
+        for i in range(10):
+            self.store.add(BulkFoo())
+
+    def test_bulkpruner(self):
+        log = BufferLogger()
+        pruner = BulkFooPruner(log)
+
+        # The loop thinks there is stuff to do. Confirm the initial
+        # state is sane.
+        self.assertFalse(pruner.isDone())
+
+        # An arbitrary chunk size.
+        chunk_size = 2
+
+        # Determine how many items to prune and to leave rather than
+        # hardcode these numbers.
+        num_to_prune = self.store.find(
+            BulkFoo, BulkFoo.id < 5).count()
+        num_to_leave = self.store.find(
+            BulkFoo, BulkFoo.id >= 5).count()
+        self.assertTrue(num_to_prune > chunk_size)
+        self.assertTrue(num_to_leave > 0)
+
+        # Run one loop. Make sure it committed by throwing away
+        # uncommitted changes.
+        pruner(chunk_size)
+        transaction.abort()
+
+        # Confirm 'chunk_size' items were removed; no more, no less.
+        num_remaining = self.store.find(BulkFoo).count()
+        expected_num_remaining = num_to_leave + num_to_prune - chunk_size
+        self.assertEqual(num_remaining, expected_num_remaining)
+
+        # The loop thinks there is more stuff to do.
+        self.assertFalse(pruner.isDone())
+
+        # Run the loop to completion, removing the remaining targeted
+        # rows.
+        while not pruner.isDone():
+            pruner(1000000)
+        transaction.abort()
+
+        # Confirm we have removed all targeted rows.
+        self.assertEqual(self.store.find(BulkFoo, BulkFoo.id < 5).count(), 0)
+
+        # Confirm we have the expected number of remaining rows.
+        # Together with the previous check, this means no untargeted
+        # rows were removed.
+        self.assertEqual(
+            self.store.find(BulkFoo, BulkFoo.id >= 5).count(), num_to_leave)
+
+        # Cleanup clears up our resources.
+        pruner.cleanUp()
+
+        # We can run it again - temporary objects cleaned up.
+        pruner = BulkFooPruner(log)
+        while not pruner.isDone():
+            pruner(chunk_size)
+
+
 class TestGarbo(TestCaseWithFactory):
     layer = LaunchpadZopelessLayer
