Merge lp:~stub/launchpad/garbo into lp:launchpad

Proposed by Stuart Bishop
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 12533
Proposed branch: lp:~stub/launchpad/garbo
Merge into: lp:launchpad
Diff against target: 477 lines (+304/-50)
7 files modified
database/schema/security.cfg (+2/-0)
database/schema/trusted.sql (+22/-0)
lib/canonical/launchpad/doc/looptuner.txt (+26/-0)
lib/canonical/launchpad/interfaces/looptuner.py (+9/-0)
lib/canonical/launchpad/utilities/looptuner.py (+54/-49)
lib/lp/scripts/garbo.py (+108/-1)
lib/lp/scripts/tests/test_garbo.py (+83/-0)
To merge this branch: bzr merge lp:~stub/launchpad/garbo
Reviewer Review Type Date Requested Status
Jeroen T. Vermeulen (community) code Approve
Review via email: mp+50595@code.launchpad.net

Commit message

[r=jtv][bug=721195] Generic bulk deletion helper for garbo, and garbo job to remove garbage POTranslation rows

Description of the change

A generic bulk removal helper that calculates the set of doomed rows once, holds it in a cursor that survives across transactions, and deletes it in batches.

A garbage POTranslation record pruner, built on the new bulk removal helper.

Note that there are no tests for the POTranslationPruner - the behavior of the base class is already tested, and the only thing remaining is the SQL defining the rows to remove. Testing that SQL is pointless, because the test would need to use the same chunk of SQL to calculate which rows should be matched. The pruner itself is invoked each time runDaily() is invoked in the existing tests.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks for implementing this cleanup. The disk space will be welcome! Daily runs may be a bit much, since it's so unusual for a POTranslation to become unreferenced. Any idea how long the query that searches for orphans will take?

Don't let the Needs Fixing vote put you off; most of the branch looks very good indeed. It's just that I feel the test needs some more work. I do have a few other notes, mostly in the form of questions, because you probably know better than I do.

It seems a bit of a shame to have a procedure just to wrap a FETCH. I think we're all on 8.4 now… have you tried putting the FETCH in a WITH instead? Also, is there no way to fetch n tuples at once?

The CLOSE ALL doesn't belong in BulkPruner.__init__ IMHO. Can't we have a cleanup method called in a "finally:"? If the client dies without passing through there, we wouldn't reuse the connection anyway and the cursor would quietly slip into oblivion.

As for the actual DELETE, doesn't it work with USING?

The reference to POComment is a gentle reminder that that table isn't actually in use. It's empty as well.

And then there's the test. It's a bit unwieldy and, worse, relies on sample data. That's the part that needs fixing. It would be easier AFAICS to have two simple end-to-end tests: "create object plus BulkPruner that cleans it up; run; abort; show that the object is gone" and "create object plus BulkPruner that doesn't clean it up; run; show that the object is still there." For much of the rest you can trust LoopTuner test coverage. Batching behaviour would be best verified by a separate test method.

By the way, I suspect ZopelessDatabaseLayer would suffice for this test. Less setup.

Jeroen

review: Needs Fixing (code)
Revision history for this message
Stuart Bishop (stub) wrote :

The query to search for orphans takes 10 minutes cold. I've opened Bug #723596, cited in an XXX comment stating that this should be run less frequently.

Yes, it sucks needing the procedure to use FETCH. It seems impossible to use FETCH inside a SELECT statement without it, and WITH clauses don't help as they contain SELECT statements. Interestingly, the only way of declaring a CURSOR WITH HOLD is the SQL DECLARE statement, as cursors declared using the plpgsql DECLARE statement implicitly close at the end of the transaction. So as far as I can tell, this is the only possible structure to do what we are doing. Thankfully, this is all in the base class, so if we work out a better approach there is just one piece of code to change :)
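For reference, the pattern the branch ends up with looks roughly like this (chunk size illustrative; the full list of ids to prune is elided):

    -- At pruner construction time. WITH HOLD keeps the cursor alive
    -- across the commits issued between batches.
    DECLARE bulkprunerid NO SCROLL CURSOR WITH HOLD FOR
        SELECT POTranslation.id AS id FROM POTranslation EXCEPT (...);

    -- Once per iteration, each followed by a commit, until no more
    -- rows are returned.
    DELETE FROM POTranslation WHERE id IN (
        SELECT id FROM cursor_fetch('bulkprunerid', 10000)
            AS f(id integer));

    -- From cleanUp(), once the loop completes or aborts.
    CLOSE bulkprunerid;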

I agree CLOSE ALL doesn't belong in the constructor. I didn't want to add a cleanup method to ILoopTuner (and all the implementations) though, especially as the cleanup method will rarely be needed. I considered adapting the looptuner instance to ICleanup (which does not yet exist) and invoking that, but thought that was overkill for this case. Following IRC discussion, I've added an optional cleanUp method to ITunableLoop and used it.

The DELETE could be rewritten to use USING, but it is less readable (a table alias is needed, and 4 substitutions vs. the current 3) and the query plans are nearly identical: IN has a 'Hash Semi Join' as the top node, USING has a 'Hash Join', and USING actually has an infinitesimally larger cost. Go figure :)
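For the record, the USING variant compared against the IN form above would look roughly like this (chunk size illustrative):

    -- Alternative DELETE using USING; note the extra table alias.
    DELETE FROM POTranslation AS target
    USING cursor_fetch('bulkprunerid', 10000) AS f(id integer)
    WHERE target.id = f.id;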

The BulkPruner tests no longer use the Launchpad database. They now use the ZopelessDatabaseLayer.

I don't think your suggested test structure is an improvement. It seems redundant, as the existing test would still be needed to ensure things like commit-per-iteration and removing only chunk_size items per iteration. We are testing that the object implements the loop tuner interface correctly, and the existing loop tuner tests confirm that this interface is invoked correctly (something the tests for the other garbo tasks lack - bogus implementations that don't deal with the chunk size correctly can still pass).

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks for clarifying these things for me, and making the changes as well. As discussed on IRC, I agree that it'd be nice to enforce correctness on the cleanup API but it's not worth over-engineering.

It just occurred to me that it would have been enough to find and delete an arbitrary 1,000 orphaned POTranslations (or pick your favourite number) and leave the rest for the next garbo run. We do that with some other cleanups of this kind. It would have been much simpler, though on the other hand, the pruner may be useful for other tables that have different dynamics.
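A sketch of that simpler alternative, reusing the same orphan query with a LIMIT (batch size arbitrary; this is not what was merged):

    -- Delete up to 1000 orphans per garbo run; later runs pick up
    -- the rest.
    DELETE FROM POTranslation WHERE id IN (
        SELECT POTranslation.id FROM POTranslation
        EXCEPT (...)  -- the same referencing-column SELECTs
        LIMIT 1000);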

As for the test, yes, you'd still want to test those things — but it's much more manageable to separate them from the integration test. Testing the intermediate states in a longer test scenario has caused us vast amounts of maintenance drag in the past; it's essentially what we've been doing in the doctests. Relying on sample data makes it worse. Setting up your data for each individual test can be more expensive, but it gets better as you lighten the setup needed for the detailed tests and it's easier to cover all the edge cases. Often it's even worth restructuring code for.

Jeroen

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

It turns out that splitting the test up into smaller chunks only made the complexity worse. Thanks for trying though, and for removing the dependency on test data as well.

review: Approve (code)

Preview Diff

=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg 2011-02-22 20:43:35 +0000
+++ database/schema/security.cfg 2011-02-28 13:01:31 +0000
@@ -16,6 +16,7 @@
 public.activity() = EXECUTE
 public.person_sort_key(text, text) = EXECUTE
 public.calculate_bug_heat(integer) = EXECUTE
+public.cursor_fetch(refcursor, integer) = EXECUTE
 public.debversion_sort_key(text) = EXECUTE
 public.milestone_sort_key(timestamp without time zone, text) = EXECUTE
 public.version_sort_key(text) = EXECUTE
@@ -2061,6 +2062,7 @@
 public.openidassociation = SELECT, DELETE
 public.openidconsumerassociation = SELECT, DELETE
 public.openidconsumernonce = SELECT, DELETE
+public.potranslation = SELECT, DELETE
 public.revisioncache = SELECT, DELETE
 public.person = SELECT, DELETE
 public.revisionauthor = SELECT, UPDATE

=== modified file 'database/schema/trusted.sql'
--- database/schema/trusted.sql 2011-02-02 17:52:54 +0000
+++ database/schema/trusted.sql 2011-02-28 13:01:31 +0000
@@ -51,6 +51,28 @@
 'Return the number of NULLs in the first row of the given array.';


+CREATE OR REPLACE FUNCTION cursor_fetch(cur refcursor, n integer)
+RETURNS SETOF record LANGUAGE plpgsql AS
+$$
+DECLARE
+    r record;
+    count integer;
+BEGIN
+    FOR count IN 1..n LOOP
+        FETCH FORWARD FROM cur INTO r;
+        IF NOT FOUND THEN
+            RETURN;
+        END IF;
+        RETURN NEXT r;
+    END LOOP;
+END;
+$$;
+
+COMMENT ON FUNCTION cursor_fetch(refcursor, integer) IS
+'Fetch the next n items from a cursor. Workaround for not being able to use FETCH inside a SELECT statement.';
+
+
+
 CREATE OR REPLACE FUNCTION replication_lag() RETURNS interval
 LANGUAGE plpgsql STABLE SECURITY DEFINER AS
 $$

=== modified file 'lib/canonical/launchpad/doc/looptuner.txt'
--- lib/canonical/launchpad/doc/looptuner.txt 2011-01-14 16:06:46 +0000
+++ lib/canonical/launchpad/doc/looptuner.txt 2011-02-28 13:01:31 +0000
@@ -457,3 +457,29 @@
     WARNING:root:Task aborted after 20 seconds.


+== Cleanup ==
+
+    Loops can define a cleanup hook to release opened resources.
+    We need this because loops can be aborted mid-run, so we cannot
+    rely on cleanup code in the isDone() method, and __del__ is
+    fragile and can never be relied on.
+
+    >>> class PlannedLoopWithCleanup(PlannedLoop):
+    ...     def cleanUp(self):
+    ...         print 'clean up'
+
+    >>> body = PlannedLoopWithCleanup([])
+    >>> loop = TestTuner(body, goal_seconds, 100)
+    >>> loop.run()
+    done
+    clean up
+
+    >>> body = PlannedLoopWithCleanup([5, 7, 8, 9, 10, 11, 9, 20, 7, 10, 10])
+    >>> loop = TestTuner(body, goal_seconds, 100, abort_time=20)
+    >>> loop.run()
+    start
+    same (0.0)
+    same (0.0)
+    WARNING:root:Task aborted after 20 seconds.
+    clean up
+

=== modified file 'lib/canonical/launchpad/interfaces/looptuner.py'
--- lib/canonical/launchpad/interfaces/looptuner.py 2009-06-25 05:30:52 +0000
+++ lib/canonical/launchpad/interfaces/looptuner.py 2011-02-28 13:01:31 +0000
@@ -32,3 +32,12 @@
         to get as close as possible to your time goal.
         """

+    def cleanUp(self):
+        """Clean up any open resources.
+
+        Optional.
+
+        This method is needed because loops may be aborted before
+        completion, so cleanup code in the isDone() method may
+        never be invoked.
+        """

=== modified file 'lib/canonical/launchpad/utilities/looptuner.py'
--- lib/canonical/launchpad/utilities/looptuner.py 2010-09-03 04:14:41 +0000
+++ lib/canonical/launchpad/utilities/looptuner.py 2011-02-28 13:01:31 +0000
@@ -24,6 +24,7 @@
     MAIN_STORE,
     MASTER_FLAVOR,
     )
+from canonical.lazr.utils import safe_hasattr


 class LoopTuner:
@@ -117,55 +118,59 @@

     def run(self):
         """Run the loop to completion."""
-        chunk_size = self.minimum_chunk_size
-        iteration = 0
-        total_size = 0
-        self.start_time = self._time()
-        last_clock = self.start_time
-        while not self.operation.isDone():
-
-            if self._isTimedOut():
-                self.log.warn(
-                    "Task aborted after %d seconds." % self.abort_time)
-                break
-
-            self.operation(chunk_size)
-
-            new_clock = self._time()
-            time_taken = new_clock - last_clock
-            last_clock = new_clock
-            self.log.debug("Iteration %d (size %.1f): %.3f seconds" %
-                           (iteration, chunk_size, time_taken))
-
-            last_clock = self._coolDown(last_clock)
-
-            total_size += chunk_size
-
-            # Adjust parameter value to approximate goal_seconds. The new
-            # value is the average of two numbers: the previous value, and an
-            # estimate of how many rows would take us to exactly goal_seconds
-            # seconds.
-            # The weight in this estimate of any given historic measurement
-            # decays exponentially with an exponent of 1/2. This softens the
-            # blows from spikes and dips in processing time.
-            # Set a reasonable minimum for time_taken, just in case we get
-            # weird values for whatever reason and destabilize the
-            # algorithm.
-            time_taken = max(self.goal_seconds/10, time_taken)
-            chunk_size *= (1 + self.goal_seconds/time_taken)/2
-            chunk_size = max(chunk_size, self.minimum_chunk_size)
-            chunk_size = min(chunk_size, self.maximum_chunk_size)
-            iteration += 1
-
-        total_time = last_clock - self.start_time
-        average_size = total_size/max(1, iteration)
-        average_speed = total_size/max(1, total_time)
-        self.log.debug(
-            "Done. %d items in %d iterations, "
-            "%.3f seconds, "
-            "average size %f (%s/s)" %
-            (total_size, iteration, total_time, average_size,
-             average_speed))
+        try:
+            chunk_size = self.minimum_chunk_size
+            iteration = 0
+            total_size = 0
+            self.start_time = self._time()
+            last_clock = self.start_time
+            while not self.operation.isDone():
+
+                if self._isTimedOut():
+                    self.log.warn(
+                        "Task aborted after %d seconds." % self.abort_time)
+                    break
+
+                self.operation(chunk_size)
+
+                new_clock = self._time()
+                time_taken = new_clock - last_clock
+                last_clock = new_clock
+                self.log.debug("Iteration %d (size %.1f): %.3f seconds" %
+                               (iteration, chunk_size, time_taken))
+
+                last_clock = self._coolDown(last_clock)
+
+                total_size += chunk_size
+
+                # Adjust parameter value to approximate goal_seconds.
+                # The new value is the average of two numbers: the
+                # previous value, and an estimate of how many rows would
+                # take us to exactly goal_seconds seconds. The weight in
+                # this estimate of any given historic measurement decays
+                # exponentially with an exponent of 1/2. This softens
+                # the blows from spikes and dips in processing time. Set
+                # a reasonable minimum for time_taken, just in case we
+                # get weird values for whatever reason and destabilize
+                # the algorithm.
+                time_taken = max(self.goal_seconds/10, time_taken)
+                chunk_size *= (1 + self.goal_seconds/time_taken)/2
+                chunk_size = max(chunk_size, self.minimum_chunk_size)
+                chunk_size = min(chunk_size, self.maximum_chunk_size)
+                iteration += 1
+
+            total_time = last_clock - self.start_time
+            average_size = total_size/max(1, iteration)
+            average_speed = total_size/max(1, total_time)
+            self.log.debug(
+                "Done. %d items in %d iterations, "
+                "%.3f seconds, "
+                "average size %f (%s/s)" %
+                (total_size, iteration, total_time, average_size,
+                 average_speed))
+        finally:
+            if safe_hasattr(self.operation, 'cleanUp'):
+                self.operation.cleanUp()

     def _coolDown(self, bedtime):
         """Sleep for `self.cooldown_time` seconds, if set.

=== modified file 'lib/lp/scripts/garbo.py'
--- lib/lp/scripts/garbo.py 2011-02-14 00:15:22 +0000
+++ lib/lp/scripts/garbo.py 2011-02-28 13:01:31 +0000
@@ -75,11 +75,117 @@
     SilentLaunchpadScriptFailure,
     )
 from lp.translations.interfaces.potemplate import IPOTemplateSet
+from lp.translations.model.potranslation import POTranslation


 ONE_DAY_IN_SECONDS = 24*60*60


+class BulkPruner(TunableLoop):
+    """An abstract ITunableLoop base class for simple pruners.
+
+    This is designed for the case where calculating the list of items
+    is expensive, and this list may be huge. For this use case, it
+    is impractical to calculate a batch of ids to remove each
+    iteration.
+
+    One approach is using a temporary table, populating it
+    with the set of items to remove at the start. However, this
+    approach can perform badly as you either need to prune the
+    temporary table as you go, or use OFFSET to skip to the next
+    batch to remove, which gets slower as we progress further through
+    the list.
+
+    Instead, this implementation declares a CURSOR that can be used
+    across multiple transactions, allowing us to calculate the set
+    of items to remove just once and iterate over it, avoiding the
+    seek-to-batch issues with a temporary table and OFFSET yet
+    deleting batches of rows in separate transactions.
+    """
+
+    # The Storm database class for the table we are removing records
+    # from. Must be overridden.
+    target_table_class = None
+
+    # The column name in target_table we use as the integer key. May be
+    # overridden.
+    target_table_key = 'id'
+
+    # An SQL query returning a list of ids to remove from target_table.
+    # The query must return a single column named 'id' and should not
+    # contain duplicates. Must be overridden.
+    ids_to_prune_query = None
+
+    # See `TunableLoop`. May be overridden.
+    maximum_chunk_size = 10000
+
+    def __init__(self, log, abort_time=None):
+        super(BulkPruner, self).__init__(log, abort_time)
+
+        self.store = IMasterStore(self.target_table_class)
+        self.target_table_name = self.target_table_class.__storm_table__
+
+        # Open the cursor.
+        self.store.execute(
+            "DECLARE bulkprunerid NO SCROLL CURSOR WITH HOLD FOR %s"
+            % self.ids_to_prune_query)
+
+    _num_removed = None
+
+    def isDone(self):
+        """See `ITunableLoop`."""
+        return self._num_removed == 0
+
+    def __call__(self, chunk_size):
+        """See `ITunableLoop`."""
+        result = self.store.execute("""
+            DELETE FROM %s WHERE %s IN (
+                SELECT id FROM
+                cursor_fetch('bulkprunerid', %d) AS f(id integer))
+            """
+            % (self.target_table_name, self.target_table_key, chunk_size))
+        self._num_removed = result.rowcount
+        transaction.commit()
+
+    def cleanUp(self):
+        """See `ITunableLoop`."""
+        self.store.execute("CLOSE bulkprunerid")
+
+
+class POTranslationPruner(BulkPruner):
+    """Remove unlinked POTranslation entries.
+
+    XXX bug=723596 StuartBishop: This job only needs to run once per month.
+    """
+
+    target_table_class = POTranslation
+
+    ids_to_prune_query = """
+        SELECT POTranslation.id AS id FROM POTranslation
+        EXCEPT (
+            SELECT potranslation FROM POComment
+
+            UNION ALL SELECT msgstr0 FROM TranslationMessage
+                WHERE msgstr0 IS NOT NULL
+
+            UNION ALL SELECT msgstr1 FROM TranslationMessage
+                WHERE msgstr1 IS NOT NULL
+
+            UNION ALL SELECT msgstr2 FROM TranslationMessage
+                WHERE msgstr2 IS NOT NULL
+
+            UNION ALL SELECT msgstr3 FROM TranslationMessage
+                WHERE msgstr3 IS NOT NULL
+
+            UNION ALL SELECT msgstr4 FROM TranslationMessage
+                WHERE msgstr4 IS NOT NULL
+
+            UNION ALL SELECT msgstr5 FROM TranslationMessage
+                WHERE msgstr5 IS NOT NULL
+            )
+        """
+
+
 class OAuthNoncePruner(TunableLoop):
     """An ITunableLoop to prune old OAuthNonce records.

@@ -173,7 +279,7 @@
                 LIMIT %d
                 )
             """ % (self.table_name, self.table_name, int(chunksize)))
-        self._num_removed = result._raw_cursor.rowcount
+        self._num_removed = result.rowcount
         transaction.commit()

     def isDone(self):
@@ -895,6 +1001,7 @@
         OldTimeLimitedTokenDeleter,
         RevisionAuthorEmailLinker,
         SuggestiveTemplatesCacheUpdater,
+        POTranslationPruner,
         ]
     experimental_tunable_loops = [
         PersonPruner,

=== modified file 'lib/lp/scripts/tests/test_garbo.py'
--- lib/lp/scripts/tests/test_garbo.py 2011-02-16 08:08:20 +0000
+++ lib/lp/scripts/tests/test_garbo.py 2011-02-28 13:01:31 +0000
@@ -18,6 +18,7 @@
     Min,
     SQL,
     )
+from storm.locals import Storm, Int
 from storm.store import Store
 import transaction
 from zope.component import getUtility
@@ -45,6 +46,7 @@
     DatabaseLayer,
     LaunchpadScriptLayer,
     LaunchpadZopelessLayer,
+    ZopelessDatabaseLayer,
     )
 from lp.bugs.model.bugnotification import (
     BugNotification,
@@ -67,6 +69,7 @@
     PersonCreationRationale,
     )
 from lp.scripts.garbo import (
+    BulkPruner,
     DailyDatabaseGarbageCollector,
     HourlyDatabaseGarbageCollector,
     OpenIDConsumerAssociationPruner,
@@ -98,6 +101,86 @@
         self.failIf(err.strip(), "Output to stderr: %s" % err)


+class BulkFoo(Storm):
+    __storm_table__ = 'bulkfoo'
+    id = Int(primary=True)
+
+
+class BulkFooPruner(BulkPruner):
+    target_table_class = BulkFoo
+    ids_to_prune_query = "SELECT id FROM BulkFoo WHERE id < 5"
+    maximum_chunk_size = 2
+
+
+class TestBulkPruner(TestCase):
+    layer = ZopelessDatabaseLayer
+
+    def setUp(self):
+        super(TestBulkPruner, self).setUp()
+
+        self.store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
+        self.store.execute("CREATE TABLE BulkFoo (id serial PRIMARY KEY)")
+
+        for i in range(10):
+            self.store.add(BulkFoo())
+
+    def test_bulkpruner(self):
+        log = BufferLogger()
+        pruner = BulkFooPruner(log)
+
+        # The loop thinks there is stuff to do. Confirm the initial
+        # state is sane.
+        self.assertFalse(pruner.isDone())
+
+        # An arbitrary chunk size.
+        chunk_size = 2
+
+        # Determine how many items to prune and to leave rather than
+        # hardcode these numbers.
+        num_to_prune = self.store.find(
+            BulkFoo, BulkFoo.id < 5).count()
+        num_to_leave = self.store.find(
+            BulkFoo, BulkFoo.id >= 5).count()
+        self.assertTrue(num_to_prune > chunk_size)
+        self.assertTrue(num_to_leave > 0)
+
+        # Run one loop. Make sure it committed by throwing away
+        # uncommitted changes.
+        pruner(chunk_size)
+        transaction.abort()
+
+        # Confirm 'chunk_size' items were removed; no more, no less.
+        num_remaining = self.store.find(BulkFoo).count()
+        expected_num_remaining = num_to_leave + num_to_prune - chunk_size
+        self.assertEqual(num_remaining, expected_num_remaining)
+
+        # The loop thinks there is more stuff to do.
+        self.assertFalse(pruner.isDone())
+
+        # Run the loop to completion, removing the remaining targeted
+        # rows.
+        while not pruner.isDone():
+            pruner(1000000)
+        transaction.abort()
+
+        # Confirm we have removed all targeted rows.
+        self.assertEqual(self.store.find(BulkFoo, BulkFoo.id < 5).count(), 0)
+
+        # Confirm we have the expected number of remaining rows.
+        # Together with the previous check, this means no untargeted
+        # rows were removed.
+        self.assertEqual(
+            self.store.find(BulkFoo, BulkFoo.id >= 5).count(), num_to_leave)
+
+        # Cleanup clears up our resources.
+        pruner.cleanUp()
+
+        # We can run it again - temporary objects cleaned up.
+        pruner = BulkFooPruner(log)
+        while not pruner.isDone():
+            pruner(chunk_size)
+
+
 class TestGarbo(TestCaseWithFactory):
     layer = LaunchpadZopelessLayer
