Merge lp:~lifeless/python-oops-datedir-repo/oops-prune into lp:python-oops-datedir-repo

Proposed by Robert Collins
Status: Merged
Merged at revision: 25
Proposed branch: lp:~lifeless/python-oops-datedir-repo/oops-prune
Merge into: lp:python-oops-datedir-repo
Diff against target: 550 lines (+431/-9)
6 files modified
NEWS (+18/-0)
oops_datedir_repo/prune.py (+155/-0)
oops_datedir_repo/repository.py (+140/-9)
oops_datedir_repo/tests/test_repository.py (+101/-0)
setup.py (+5/-0)
versions.cfg (+12/-0)
To merge this branch: bzr merge lp:~lifeless/python-oops-datedir-repo/oops-prune
Reviewer                  | Review Type | Date Requested | Status
Steve Kowalik (community) | code        |                | Approve
Review via email: mp+82324@code.launchpad.net

Commit message

Add GC facility using Launchpad to obtain references to OOPS reports.

Description of the change

Implement incremental OOPS pruning, finishing the decoupling of Launchpad from OOPS storage.

I'm pretty happy with this branch. The only untested code is the CLI glue, which is traditionally a PITA to test, and that is exacerbated here by the need to talk to Launchpad itself.

For now, I've deliberately left it untested: all the complex logic lives in the repository object, which was developed test-first and is fully unit tested.
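
To illustrate the incremental part, here is a minimal sketch of how the date window for each run could be chosen. It is not the branch's code: `prune_window` and the plain-dict `config` are hypothetical stand-ins for the CLI glue and the repository config, and `datetime.timezone.utc` replaces `pytz.utc` so that the example needs only the standard library.

```python
import datetime

UTC = datetime.timezone.utc  # assumption: stand-in for pytz.utc


def prune_window(config, oldest_date, now):
    """Return (prune_from, prune_until), or None if there is nothing to do.

    config: dict standing in for the repository config (hypothetical).
    oldest_date: datetime.date of the oldest datedir, or None if empty.
    now: timezone-aware datetime for the current run.
    """
    # Never prune reports newer than one week before now.
    prune_until = now - datetime.timedelta(weeks=1)
    prune_from = config.get('pruned-until')
    if prune_from is None:
        if oldest_date is None:
            return None
        # First run: start at midnight UTC of the oldest datedir.
        prune_from = datetime.datetime.combine(
            oldest_date, datetime.time(tzinfo=UTC))
    return prune_from, prune_until


now = datetime.datetime(2011, 11, 15, 12, 0, tzinfo=UTC)
# First run: no 'pruned-until' recorded yet.
first_run = prune_window({}, datetime.date(2011, 10, 1), now)
# A run one day later resumes from the previous upper bound.
later_run = prune_window(
    {'pruned-until': first_run[1]}, None,
    now + datetime.timedelta(days=1))
```

Because each run records its upper bound and the next run starts from it, no date range has to be queried twice.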

Steve Kowalik (stevenk) wrote :

The only comment I have is that some of your comments are not full sentences. Other than that, this looks good.

review: Approve (code)
Robert Collins (lifeless) wrote :

I will do a quick audit. Thanks!

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2011-11-13 21:03:43 +0000
3+++ NEWS 2011-11-15 21:50:28 +0000
4@@ -6,6 +6,24 @@
5 NEXT
6 ----
7
8+* Repository has a simple generic config API. See the set_config and get_config
9+ methods. (Robert Collins)
10+
11+* Repository can now answer 'what is the oldest date in the repository' which
12+ is useful for incremental report pruning. See the oldest_date method.
13+ (Robert Collins)
14+
15+* Repository can perform garbage collection of a date range if a list of
16+ references to keep is supplied. See the prune_unreferenced method.
17+ (Robert Collins)
18+
19+* There is a new script bin/prune which will prune reports from a repository
20+ keeping only those referenced in a given Launchpad project or project group.
21+ This adds a dependency on launchpadlib, which should be pypi installable
22+ and is included in the Ubuntu default install - so should be a low barrier
23+ for use. If this is an issue a future release can split this out into a
24+ new addon package. (Robert Collins, #890875)
25+
26 0.0.11
27 ------
28
29
30=== added file 'oops_datedir_repo/prune.py'
31--- oops_datedir_repo/prune.py 1970-01-01 00:00:00 +0000
32+++ oops_datedir_repo/prune.py 2011-11-15 21:50:28 +0000
33@@ -0,0 +1,155 @@
34+#
35+# Copyright (c) 2011, Canonical Ltd
36+#
37+# This program is free software: you can redistribute it and/or modify
38+# it under the terms of the GNU Lesser General Public License as published by
39+# the Free Software Foundation, version 3 only.
40+#
41+# This program is distributed in the hope that it will be useful,
42+# but WITHOUT ANY WARRANTY; without even the implied warranty of
43+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
44+# GNU Lesser General Public License for more details.
45+#
46+# You should have received a copy of the GNU Lesser General Public License
47+# along with this program. If not, see <http://www.gnu.org/licenses/>.
48+# GNU Lesser General Public License version 3 (see the file LICENSE).
49+
50+"""Delete OOPSes that are not referenced in the bugtracker.
51+
52+Currently only has support for the Launchpad bug tracker.
53+"""
54+
55+__metaclass__ = type
56+
57+import datetime
58+import logging
59+import optparse
60+from textwrap import dedent
61+import sys
62+
63+from launchpadlib.launchpad import Launchpad
64+from launchpadlib.uris import lookup_service_root
65+from pytz import utc
66+
67+import oops_datedir_repo
68+
69+__all__ = [
70+ 'main',
71+ ]
72+
73+
74+class LaunchpadTracker:
75+ """Abstracted bug tracker/forums etc - permits testing of main()."""
76+
77+ def __init__(self, options):
78+ self.lp = Launchpad.login_anonymously(
79+ 'oops-prune', options.lpinstance, version='devel')
80+
81+ def find_oops_references(self, start_time, end_time, project=None,
82+ projectgroup=None):
83+ projects = set()
84+ if project is not None:
85+ projects.add(project)
86+ if projectgroup is not None:
87+ projects.update(lp_proj.name
88+ for lp_proj in self.lp.project_groups[projectgroup].projects)
89+ result = set()
90+ lp_projects = self.lp.projects
91+ one_week = datetime.timedelta(weeks=1)
92+ for project in projects:
93+ lp_project = lp_projects[project]
94+ current_start = start_time
95+ while current_start < end_time:
96+ current_end = current_start + one_week
97+ if current_end > end_time:
98+ current_end = end_time
99+ logging.info(
100+ "Querying OOPS references on %s from %s to %s",
101+ project, current_start, current_end)
102+ result.update(lp_project.findReferencedOOPS(
103+ start_date=current_start, end_date=current_end))
104+ current_start = current_end
105+ return result
106+
107+
108+def main(argv=None, tracker=LaunchpadTracker, logging=logging):
109+ """Console script entry point."""
110+ if argv is None:
111+ argv = sys.argv
112+ usage = dedent("""\
113+ %prog [options]
114+
115+ The following options must be supplied:
116+ --repo
117+
118+ And either
119+ --project
120+ or
121+ --projectgroup
122+
123+ e.g.
124+ %prog --repo . --projectgroup launchpad-project
125+
126+ Will process every member project of launchpad-project.
127+
128+ When run this program will ask Launchpad for OOPS references made since
129+ the last date it pruned up to, with an upper limit of one week before
130+ today. It then looks in the repository for all oopses created during
131+ that date range, and if they are not in the set returned by Launchpad,
132+ deletes them. If the repository has never been pruned before, it will
133+ pick the earliest datedir present in the repository as the start date.
134+ """)
135+ description = \
136+ "Delete OOPS reports that are not referenced in a bug tracker."
137+ parser = optparse.OptionParser(
138+ description=description, usage=usage)
139+ parser.add_option('--project',
140+ help="Launchpad project to find references in.")
141+ parser.add_option('--projectgroup',
142+ help="Launchpad project group to find references in.")
143+ parser.add_option('--repo', help="Path to the repository to read from.")
144+ parser.add_option(
145+ '--lpinstance', help="Launchpad instance to use", default="production")
146+ options, args = parser.parse_args(argv[1:])
147+ def needed(*optnames):
148+ present = set()
149+ for optname in optnames:
150+ if getattr(options, optname, None) is not None:
151+ present.add(optname)
152+ if not present:
153+ if len(optnames) == 1:
154+ raise ValueError('Option "%s" must be supplied' % optnames[0])
155+ else:
156+ raise ValueError(
157+ 'One of options %s must be supplied' % (optnames,))
158+ elif len(present) != 1:
159+ raise ValueError(
160+ 'Only one of options %s can be supplied' % (optnames,))
161+ needed('repo')
162+ needed('project', 'projectgroup')
163+ logging.basicConfig(
164+ filename='prune.log', filemode='w', level=logging.DEBUG)
165+ repo = oops_datedir_repo.DateDirRepo(options.repo)
166+ one_week = datetime.timedelta(weeks=1)
167+ one_day = datetime.timedelta(days=1)
168+ # max date to scan for
169+ prune_until = datetime.datetime.now(utc) - one_week
170+ # Get min date to scan for
171+ try:
172+ prune_from = repo.get_config('pruned-until')
173+ except KeyError:
174+ try:
175+ oldest_oops = repo.oldest_date()
176+ except ValueError:
177+ logging.info("No OOPSes in repo, nothing to do.")
178+ return 0
179+ prune_from = datetime.datetime.fromordinal(oldest_oops.toordinal()).replace(tzinfo=utc)
180+ # get references from date range
181+ finder = tracker(options)
182+ references = finder.find_oops_references(
183+ prune_from, prune_until, options.project, options.projectgroup)
184+ # delete oops files on disk
185+ repo.prune_unreferenced(prune_from, prune_until, references)
186+ # stash most recent date
187+ repo.set_config('pruned-until', prune_until)
188+ return 0
189
190=== modified file 'oops_datedir_repo/repository.py'
191--- oops_datedir_repo/repository.py 2011-11-11 04:21:05 +0000
192+++ oops_datedir_repo/repository.py 2011-11-15 21:50:28 +0000
193@@ -23,11 +23,13 @@
194 ]
195
196 import datetime
197+import errno
198 from functools import partial
199 from hashlib import md5
200 import os.path
201 import stat
202
203+import bson
204 from pytz import utc
205
206 import serializer
207@@ -37,7 +39,19 @@
208
209
210 class DateDirRepo:
211- """Publish oopses to a date-dir repository."""
212+ """Publish oopses to a date-dir repository.
213+
214+ A date-dir repository is a directory containing:
215+
216+ * Zero or one directories called 'metadata'. If it exists this directory
217+ contains any housekeeping material needed (such as the config.bson
218+ document).
219+
220+ * Zero or more directories named like YYYY-MM-DD, which contain zero or
221+ more OOPS reports. OOPS file names can take various forms, but must not
222+ end in .tmp - those are considered to be OOPS reports that are currently
223+ being written.
224+ """
225
226 def __init__(self, error_dir, instance_id=None, serializer=None,
227 inherit_id=False, stash_path=False):
228@@ -70,12 +84,14 @@
229 )
230 else:
231 self.log_namer = None
232- self.root = error_dir
233+ self.root = error_dir
234 if serializer is None:
235 serializer = serializer_bson
236 self.serializer = serializer
237 self.inherit_id = inherit_id
238 self.stash_path = stash_path
239+ self.metadatadir = os.path.join(self.root, 'metadata')
240+ self.config_path = os.path.join(self.metadatadir, 'config.bson')
241
242 def publish(self, report, now=None):
243 """Write the report to disk.
244@@ -148,13 +164,8 @@
245 two_days = datetime.timedelta(2)
246 now = datetime.date.today()
247 old = now - two_days
248- for dirname in os.listdir(self.root):
249- try:
250- y, m, d = dirname.split('-')
251- except ValueError:
252- # Not a datedir
253- continue
254- date = datetime.date(int(y),int(m),int(d))
255+ for dirname, (y,m,d) in self._datedirs():
256+ date = datetime.date(y, m, d)
257 prune = date < old
258 dirpath = os.path.join(self.root, dirname)
259 files = os.listdir(dirpath)
260@@ -171,3 +182,123 @@
261 oopsid = publisher(report)
262 if oopsid:
263 os.unlink(candidate)
264+
265+ def _datedirs(self):
266+ """Yield each subdir which looks like a datedir."""
267+ for dirname in os.listdir(self.root):
268+ try:
269+ y, m, d = dirname.split('-')
270+ y = int(y)
271+ m = int(m)
272+ d = int(d)
273+ except ValueError:
274+ # Not a datedir
275+ continue
276+ yield dirname, (y, m, d)
277+
278+ def _read_config(self):
279+ """Return the current config document from disk."""
280+ try:
281+ with open(self.config_path, 'rb') as config_file:
282+ return bson.loads(config_file.read())
283+ except IOError, e:
284+ if e.errno != errno.ENOENT:
285+ raise
286+ return {}
287+
288+ def get_config(self, key):
289+ """Return a key from the repository config.
290+
291+ :param key: A key to read from the config.
292+ """
293+ return self._read_config()[key]
294+
295+ def set_config(self, key, value):
296+ """Set config option key to value.
297+
298+ This is written to the bson document root/metadata/config.bson
299+
300+ :param key: The key to set - anything that can be a key in a bson
301+ document.
302+ :param value: The value to set - anything that can be a value in a
303+ bson document.
304+ """
305+ config = self._read_config()
306+ config[key] = value
307+ try:
308+ with open(self.config_path + '.tmp', 'wb') as config_file:
309+ config_file.write(bson.dumps(config))
310+ except IOError, e:
311+ if e.errno != errno.ENOENT:
312+ raise
313+ os.mkdir(self.metadatadir)
314+ with open(self.config_path + '.tmp', 'wb') as config_file:
315+ config_file.write(bson.dumps(config))
316+ os.rename(self.config_path + '.tmp', self.config_path)
317+
318+ def oldest_date(self):
319+ """Return the date of the oldest datedir in the repository.
320+
321+ If pruning / resubmission is working this should also be the date of
322+ the oldest oops in the repository.
323+ """
324+ dirs = list(self._datedirs())
325+ if not dirs:
326+ raise ValueError("No OOPSes in repository.")
327+ return datetime.date(*sorted(dirs)[0][1])
328+
329+ def prune_unreferenced(self, start_time, stop_time, references):
330+ """Delete OOPS reports filed between start_time and stop_time.
331+
332+ A report is deleted if all of the following are true:
333+
334+ * it is in a datedir covered by [start_time, stop_time] inclusive of
335+ the end points.
336+
337+ * It is not in the set references.
338+
339+ * Its timestamp falls between start_time and stop_time inclusively or
340+ its timestamp is outside the datedir it is in or there is no
341+ timestamp on the report.
342+
343+ :param start_time: The lower bound to prune within.
344+ :param stop_time: The upper bound to prune within.
345+ :param references: An iterable of OOPS ids to keep.
346+ """
347+ start_date = start_time.date()
348+ stop_date = stop_time.date()
349+ midnight = datetime.time(tzinfo=utc)
350+ for dirname, (y,m,d) in self._datedirs():
351+ dirdate = datetime.date(y, m, d)
352+ if dirdate < start_date or dirdate > stop_date:
353+ continue
354+ dirpath = os.path.join(self.root, dirname)
355+ files = os.listdir(dirpath)
356+ deleted = 0
357+ for candidate in map(partial(os.path.join, dirpath), files):
358+ if candidate.endswith('.tmp'):
359+ # Old half-written oops: just remove.
360+ os.unlink(candidate)
361+ deleted += 1
362+ continue
363+ with open(candidate, 'rb') as report_file:
364+ report = serializer.read(report_file)
365+ report_time = report.get('time', None)
366+ if (report_time is None or
367+ report_time.date() < dirdate or
368+ report_time.date() > dirdate):
369+ # The report is oddly filed or missing a precise
370+ # datestamp. Treat it like midnight on the day of the
371+ # directory it was placed in - this is a lower bound on
372+ # when it was actually created.
373+ report_time = datetime.datetime.combine(
374+ dirdate, midnight)
375+ if (report_time >= start_time and
376+ report_time <= stop_time and
377+ report['id'] not in references):
378+ # Unreferenced and prunable
379+ os.unlink(candidate)
380+ deleted += 1
381+ if deleted == len(files):
382+ # Everything in the directory was deleted.
383+ os.rmdir(dirpath)
384
385=== modified file 'oops_datedir_repo/tests/test_repository.py'
386--- oops_datedir_repo/tests/test_repository.py 2011-11-11 04:21:05 +0000
387+++ oops_datedir_repo/tests/test_repository.py 2011-11-15 21:50:28 +0000
388@@ -26,6 +26,7 @@
389 import bson
390 from pytz import utc
391 import testtools
392+from testtools.matchers import raises
393
394 from oops_datedir_repo import (
395 DateDirRepo,
396@@ -234,3 +235,103 @@
397 repo = DateDirRepo(self.useFixture(TempDir()).path)
398 os.mkdir(repo.root + '/foo')
399 repo.republish([].append)
400+
401+ def test_republish_ignores_metadata_dir(self):
402+ # The metadata directory is never pruned
403+ repo = DateDirRepo(self.useFixture(TempDir()).path)
404+ os.mkdir(repo.root + '/metadata')
405+ repo.republish([].append)
406+ self.assertTrue(os.path.exists(repo.root + '/metadata'))
407+
408+ def test_get_config_value(self):
409+ # Config values can be asked for from the repository.
410+ repo = DateDirRepo(self.useFixture(TempDir()).path)
411+ pruned = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
412+ repo.set_config('pruned-until', pruned)
413+ # Fresh instance, no memory tricks.
414+ repo = DateDirRepo(repo.root)
415+ self.assertEqual(pruned, repo.get_config('pruned-until'))
416+
417+ def test_set_config_value(self):
418+ # Config values are just keys in a bson document.
419+ repo = DateDirRepo(self.useFixture(TempDir()).path)
420+ pruned = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
421+ repo.set_config('pruned-until', pruned)
422+ with open(repo.root + '/metadata/config.bson', 'rb') as config_file:
423+ from_bson = bson.loads(config_file.read())
424+ self.assertEqual({'pruned-until': pruned}, from_bson)
425+
426+ def test_set_config_preserves_other_values(self):
427+ # E.g. setting 'a' does not affect 'b'
428+ repo = DateDirRepo(self.useFixture(TempDir()).path)
429+ repo.set_config('b', 'b-value')
430+ repo = DateDirRepo(repo.root)
431+ repo.set_config('a', 'a-value')
432+ with open(repo.root + '/metadata/config.bson', 'rb') as config_file:
433+ from_bson = bson.loads(config_file.read())
434+ self.assertEqual({'a': 'a-value', 'b': 'b-value'}, from_bson)
435+
436+ def test_oldest_date_no_contents(self):
437+ repo = DateDirRepo(self.useFixture(TempDir()).path)
438+ self.assertThat(lambda: repo.oldest_date(),
439+ raises(ValueError("No OOPSes in repository.")))
440+
441+ def test_oldest_date_is_oldest(self):
442+ repo = DateDirRepo(self.useFixture(TempDir()).path)
443+ os.mkdir(repo.root + '/2006-04-12')
444+ os.mkdir(repo.root + '/2006-04-13')
445+ self.assertEqual(datetime.date(2006, 4, 12), repo.oldest_date())
446+
447+ def test_prune_unreferenced_no_oopses(self):
448+ # This shouldn't crash.
449+ repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
450+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
451+ old = now - datetime.timedelta(weeks=1)
452+ repo.prune_unreferenced(old, now, [])
453+
454+ def test_prune_unreferenced_no_references(self):
455+ # When there are no references, everything specified is zerged.
456+ repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
457+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
458+ old = now - datetime.timedelta(weeks=1)
459+ report = {'time': now - datetime.timedelta(hours=5)}
460+ repo.publish(report, report['time'])
461+ repo.prune_unreferenced(old, now, [])
462+ self.assertThat(lambda: repo.oldest_date(), raises(ValueError))
463+
464+ def test_prune_unreferenced_outside_dates_kept(self):
465+ # Pruning only affects stuff in the datedirs selected by the dates.
466+ repo = DateDirRepo(
467+ self.useFixture(TempDir()).path, inherit_id=True, stash_path=True)
468+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
469+ old = now - datetime.timedelta(weeks=1)
470+ before = {'time': old - datetime.timedelta(minutes=1)}
471+ after = {'time': now + datetime.timedelta(minutes=1)}
472+ repo.publish(before, before['time'])
473+ repo.publish(after, after['time'])
474+ repo.prune_unreferenced(old, now, [])
475+ self.assertTrue(os.path.isfile(before['datedir_repo_filepath']))
476+ self.assertTrue(os.path.isfile(after['datedir_repo_filepath']))
477+
478+ def test_prune_referenced_inside_dates_kept(self):
479+ repo = DateDirRepo(
480+ self.useFixture(TempDir()).path, inherit_id=True, stash_path=True)
481+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
482+ old = now - datetime.timedelta(weeks=1)
483+ report = {'id': 'foo', 'time': now - datetime.timedelta(minutes=1)}
484+ repo.publish(report, report['time'])
485+ repo.prune_unreferenced(old, now, ['foo'])
486+ self.assertTrue(os.path.isfile(report['datedir_repo_filepath']))
487+
488+ def test_prune_report_midnight_gets_invalid_timed_reports(self):
489+ # If a report has a wonky or missing time, pruning treats it as being
490+ # timed on midnight of the datedir day it is on.
491+ repo = DateDirRepo(self.useFixture(TempDir()).path, stash_path=True)
492+ now = datetime.datetime(2006, 04, 01, 00, 01, 00, tzinfo=utc)
493+ old = now - datetime.timedelta(minutes=2)
494+ badtime = {'time': now - datetime.timedelta(weeks=2)}
495+ missingtime = {}
496+ repo.publish(badtime, now)
497+ repo.publish(missingtime, now)
498+ repo.prune_unreferenced(old, now, [])
499+ self.assertThat(lambda: repo.oldest_date(), raises(ValueError))
500
501=== modified file 'setup.py'
502--- setup.py 2011-11-13 21:07:43 +0000
503+++ setup.py 2011-11-15 21:50:28 +0000
504@@ -40,6 +40,7 @@
505 install_requires = [
506 'bson',
507 'iso8601',
508+ 'launchpadlib', # Needed for pruning - perhaps should be optional.
509 'oops',
510 'pytz',
511 ],
512@@ -49,4 +50,8 @@
513 'testtools',
514 ]
515 ),
516+ entry_points=dict(
517+ console_scripts=[ # `console_scripts` is a magic name to setuptools
518+ 'prune = oops_datedir_repo.prune:main',
519+ ]),
520 )
521
522=== modified file 'versions.cfg'
523--- versions.cfg 2011-10-07 05:52:09 +0000
524+++ versions.cfg 2011-11-15 21:50:28 +0000
525@@ -3,13 +3,25 @@
526
527 [versions]
528 bson = 0.3.2
529+elementtree = 1.2.6-20050316
530 fixtures = 0.3.6
531+httplib2 = 0.6.0
532 iso8601 = 0.1.4
533+keyring = 0.6.2
534+launchpadlib = 1.9.9
535+lazr.authentication = 0.1.1
536+lazr.restfulclient = 0.12.0
537+lazr.uri = 1.0.2
538+oauth = 1.0.0
539 oops = 0.0.8
540 pytz = 2010o
541 setuptools = 0.6c11
542+simplejson = 2.1.3
543 testtools = 0.9.11
544+wadllib = 1.2.0
545+wsgi-intercept = 0.4
546 zc.recipe.egg = 1.3.2
547 z3c.recipe.filetemplate = 2.1.0
548 z3c.recipe.scripts = 1.0.1
549 zc.buildout = 1.5.1
550+zope.interface = 3.8.0
