Merge lp:~lifeless/python-oops-datedir-repo/oops-prune into lp:python-oops-datedir-repo

Proposed by Robert Collins
Status: Merged
Merged at revision: 25
Proposed branch: lp:~lifeless/python-oops-datedir-repo/oops-prune
Merge into: lp:python-oops-datedir-repo
Diff against target: 550 lines (+431/-9)
6 files modified
NEWS (+18/-0)
oops_datedir_repo/prune.py (+155/-0)
oops_datedir_repo/repository.py (+140/-9)
oops_datedir_repo/tests/test_repository.py (+101/-0)
setup.py (+5/-0)
versions.cfg (+12/-0)
To merge this branch: bzr merge lp:~lifeless/python-oops-datedir-repo/oops-prune
Reviewer                    Review Type  Date Requested  Status
Steve Kowalik (community)   code                         Approve
Review via email: mp+82324@code.launchpad.net

Commit message

Add GC facility using Launchpad to obtain references to OOPS reports.

Description of the change

Implement incremental OOPS pruning, finishing the decoupling of Launchpad from OOPS storage.

I'm pretty happy with this branch. The only untested code is the CLI glue, which is traditionally a PITA to test, and that is exacerbated here by the need to talk to Launchpad itself.

For now, I've deliberately left it untested: all the complex logic lives in the repository object, which was written test-first and is fully unit tested.
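
The incremental flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this branch: `prune_window` and `select_prunable` are hypothetical helper names, reports are modelled as plain dicts with `id` and `time` keys, and the one-week cutoff mirrors what the prune script does.

```python
import datetime

UTC = datetime.timezone.utc


def prune_window(pruned_until, oldest_date, now):
    """Return the (start, stop) datetimes for the next prune pass.

    pruned_until: where the previous pass stopped, or None on a first run.
    oldest_date: date of the oldest datedir, used only on a first run.
    """
    # Reports from the most recent week are never considered for pruning.
    stop = now - datetime.timedelta(weeks=1)
    if pruned_until is None:
        # First run: start at midnight UTC on the oldest datedir.
        midnight = datetime.time(tzinfo=UTC)
        start = datetime.datetime.combine(oldest_date, midnight)
    else:
        # Later runs resume where the previous pass stopped.
        start = pruned_until
    return start, stop


def select_prunable(reports, start, stop, references):
    """Return ids of reports inside [start, stop] that nothing references."""
    keep = set(references)
    return [
        report['id'] for report in reports
        if start <= report['time'] <= stop and report['id'] not in keep
    ]
```

After a pass, the caller would store `stop` as the new pruned-until value so that the next run only looks at the dates it has not yet covered.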

Steve Kowalik (stevenk) wrote :

The only comment I have is that some of your comments are not full sentences. Other than that, this looks good.

review: Approve (code)
Robert Collins (lifeless) wrote :

I will do a quick audit. Thanks!

Preview Diff

=== modified file 'NEWS'
--- NEWS 2011-11-13 21:03:43 +0000
+++ NEWS 2011-11-15 21:50:28 +0000
@@ -6,6 +6,24 @@
 NEXT
 ----
 
+* Repository has a simple generic config API. See the set_config and get_config
+  methods. (Robert Collins)
+
+* Repository can now answer 'what is the oldest date in the repository' which
+  is useful for incremental report pruning. See the oldest_date method.
+  (Robert Collins)
+
+* Repository can perform garbage collection of a date range if a list of
+  references to keep is supplied. See the prune_unreferenced method.
+  (Robert Collins)
+
+* There is a new script bin/prune which will prune reports from a repository
+  keeping only those referenced in a given Launchpad project or project group.
+  This adds a dependency on launchpadlib, which should be pypi installable
+  and is included in the Ubuntu default install - so should be a low barrier
+  for use. If this is an issue a future release can split this out into a
+  new addon package. (Robert Collins, #890875)
+
 0.0.11
 ------
 
=== added file 'oops_datedir_repo/prune.py'
--- oops_datedir_repo/prune.py 1970-01-01 00:00:00 +0000
+++ oops_datedir_repo/prune.py 2011-11-15 21:50:28 +0000
@@ -0,0 +1,155 @@
+#
+# Copyright (c) 2011, Canonical Ltd
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation, version 3 only.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+# GNU Lesser General Public License version 3 (see the file LICENSE).
+
+"""Delete OOPSes that are not referenced in the bugtracker.
+
+Currently only has support for the Launchpad bug tracker.
+"""
+
+__metaclass__ = type
+
+import datetime
+import logging
+import optparse
+from textwrap import dedent
+import sys
+
+from launchpadlib.launchpad import Launchpad
+from launchpadlib.uris import lookup_service_root
+from pytz import utc
+
+import oops_datedir_repo
+
+__all__ = [
+    'main',
+    ]
+
+
+class LaunchpadTracker:
+    """Abstracted bug tracker/forums etc - permits testing of main()."""
+
+    def __init__(self, options):
+        self.lp = Launchpad.login_anonymously(
+            'oops-prune', options.lpinstance, version='devel')
+
+    def find_oops_references(self, start_time, end_time, project=None,
+        projectgroup=None):
+        projects = set([])
+        if project is not None:
+            projects.add(project)
+        if projectgroup is not None:
+            [projects.add(lp_proj.name)
+                for lp_proj in self.lp.project_groups[projectgroup].projects]
+        result = set()
+        lp_projects = self.lp.projects
+        one_week = datetime.timedelta(weeks=1)
+        for project in projects:
+            lp_project = lp_projects[project]
+            current_start = start_time
+            while current_start < end_time:
+                current_end = current_start + one_week
+                if current_end > end_time:
+                    current_end = end_time
+                logging.info(
+                    "Querying OOPS references on %s from %s to %s",
+                    project, current_start, current_end)
+                result.update(lp_project.findReferencedOOPS(
+                    start_date=current_start, end_date=current_end))
+                current_start = current_end
+        return result
+
+
+def main(argv=None, tracker=LaunchpadTracker, logging=logging):
+    """Console script entry point."""
+    if argv is None:
+        argv = sys.argv
+    usage = dedent("""\
+        %prog [options]
+
+        The following options must be supplied:
+          --repo
+
+        And either
+          --project
+        or
+          --projectgroup
+
+        e.g.
+        %prog --repo . --projectgroup launchpad-project
+
+        Will process every member project of launchpad-project.
+
+        When run this program will ask Launchpad for OOPS references made since
+        the last date it pruned up to, with an upper limit of one week from
+        today. It then looks in the repository for all oopses created during
+        that date range, and if they are not in the set returned by Launchpad,
+        deletes them. If the repository has never been pruned before, it will
+        pick the earliest datedir present in the repository as the start date.
+        """)
+    description = \
+        "Delete OOPS reports that are not referenced in a bug tracker."
+    parser = optparse.OptionParser(
+        description=description, usage=usage)
+    parser.add_option('--project',
+        help="Launchpad project to find references in.")
+    parser.add_option('--projectgroup',
+        help="Launchpad project group to find references in.")
+    parser.add_option('--repo', help="Path to the repository to read from.")
+    parser.add_option(
+        '--lpinstance', help="Launchpad instance to use", default="production")
+    options, args = parser.parse_args(argv[1:])
+    def needed(*optnames):
+        present = set()
+        for optname in optnames:
+            if getattr(options, optname, None) is not None:
+                present.add(optname)
+        if not present:
+            if len(optnames) == 1:
+                raise ValueError('Option "%s" must be supplied' % optname)
+            else:
+                raise ValueError(
+                    'One of options %s must be supplied' % (optnames,))
+        elif len(present) != 1:
+            raise ValueError(
+                'Only one of options %s can be supplied' % (optnames,))
+    needed('repo')
+    needed('project', 'projectgroup')
+    logging.basicConfig(
+        filename='prune.log', filemode='w', level=logging.DEBUG)
+    repo = oops_datedir_repo.DateDirRepo(options.repo)
+    one_week = datetime.timedelta(weeks=1)
+    one_day = datetime.timedelta(days=1)
+    # max date to scan for
+    prune_until = datetime.datetime.now(utc) - one_week
+    # Get min date to scan for
+    try:
+        prune_from = repo.get_config('pruned-until')
+    except KeyError:
+        try:
+            oldest_oops = repo.oldest_date()
+        except ValueError:
+            logging.info("No OOPSes in repo, nothing to do.")
+            return 0
+        prune_from = datetime.datetime.fromordinal(oldest_oops.toordinal())
+    # get references from date range
+    finder = tracker(options)
+    references = finder.find_oops_references(
+        prune_from, prune_until, options.project, options.projectgroup)
+    # delete oops files on disk
+    repo.prune_unreferenced(prune_from, prune_until, references)
+    # stash most recent date
+    repo.set_config('pruned-until', prune_until)
+    return 0
=== modified file 'oops_datedir_repo/repository.py'
--- oops_datedir_repo/repository.py 2011-11-11 04:21:05 +0000
+++ oops_datedir_repo/repository.py 2011-11-15 21:50:28 +0000
@@ -23,11 +23,13 @@
     ]
 
 import datetime
+import errno
 from functools import partial
 from hashlib import md5
 import os.path
 import stat
 
+import bson
 from pytz import utc
 
 import serializer
@@ -37,7 +39,19 @@
 
 
 class DateDirRepo:
-    """Publish oopses to a date-dir repository."""
+    """Publish oopses to a date-dir repository.
+
+    A date-dir repository is a directory containing:
+
+    * Zero or one directories called 'metadata'. If it exists this directory
+      contains any housekeeping material needed (such as a metadata.conf ini
+      file).
+
+    * Zero or more directories named like YYYY-MM-DD, which contain zero or
+      more OOPS reports. OOPS file names can take various forms, but must not
+      end in .tmp - those are considered to be OOPS reports that are currently
+      being written.
+    """
 
     def __init__(self, error_dir, instance_id=None, serializer=None,
             inherit_id=False, stash_path=False):
@@ -70,12 +84,14 @@
                 )
         else:
             self.log_namer = None
         self.root = error_dir
         if serializer is None:
             serializer = serializer_bson
         self.serializer = serializer
         self.inherit_id = inherit_id
         self.stash_path = stash_path
+        self.metadatadir = os.path.join(self.root, 'metadata')
+        self.config_path = os.path.join(self.metadatadir, 'config.bson')
 
     def publish(self, report, now=None):
         """Write the report to disk.
@@ -148,13 +164,8 @@
         two_days = datetime.timedelta(2)
         now = datetime.date.today()
         old = now - two_days
-        for dirname in os.listdir(self.root):
-            try:
-                y, m, d = dirname.split('-')
-            except ValueError:
-                # Not a datedir
-                continue
-            date = datetime.date(int(y),int(m),int(d))
+        for dirname, (y,m,d) in self._datedirs():
+            date = datetime.date(y, m, d)
             prune = date < old
             dirpath = os.path.join(self.root, dirname)
             files = os.listdir(dirpath)
@@ -171,3 +182,123 @@
                     oopsid = publisher(report)
                     if oopsid:
                         os.unlink(candidate)
+
+    def _datedirs(self):
+        """Yield each subdir which looks like a datedir."""
+        for dirname in os.listdir(self.root):
+            try:
+                y, m, d = dirname.split('-')
+                y = int(y)
+                m = int(m)
+                d = int(d)
+            except ValueError:
+                # Not a datedir
+                continue
+            yield dirname, (y, m, d)
+
+    def _read_config(self):
+        """Return the current config document from disk."""
+        try:
+            with open(self.config_path, 'rb') as config_file:
+                return bson.loads(config_file.read())
+        except IOError, e:
+            if e.errno != errno.ENOENT:
+                raise
+            return {}
+
+    def get_config(self, key):
+        """Return a key from the repository config.
+
+        :param key: A key to read from the config.
+        """
+        return self._read_config()[key]
+
+    def set_config(self, key, value):
+        """Set config option key to value.
+
+        This is written to the bson document root/metadata/config.bson
+
+        :param key: The key to set - anything that can be a key in a bson
+            document.
+        :param value: The value to set - anything that can be a value in a
+            bson document.
+        """
+        config = self._read_config()
+        config[key] = value
+        try:
+            with open(self.config_path + '.tmp', 'wb') as config_file:
+                config_file.write(bson.dumps(config))
+        except IOError, e:
+            if e.errno != errno.ENOENT:
+                raise
+            os.mkdir(self.metadatadir)
+            with open(self.config_path + '.tmp', 'wb') as config_file:
+                config_file.write(bson.dumps(config))
+        os.rename(self.config_path + '.tmp', self.config_path)
+
+    def oldest_date(self):
+        """Return the date of the oldest datedir in the repository.
+
+        If pruning / resubmission is working this should also be the date of
+        the oldest oops in the repository.
+        """
+        dirs = list(self._datedirs())
+        if not dirs:
+            raise ValueError("No OOPSes in repository.")
+        return datetime.date(*sorted(dirs)[0][1])
+
+    def prune_unreferenced(self, start_time, stop_time, references):
+        """Delete OOPS reports filed between start_time and stop_time.
+
+        A report is deleted if all of the following are true:
+
+        * it is in a datedir covered by [start_time, stop_time] inclusive of
+          the end points.
+
+        * It is not in the set references.
+
+        * Its timestamp falls between start_time and stop_time inclusively or
+          it's timestamp is outside the datedir it is in or there is no
+          timestamp on the report.
+
+        :param start_time: The lower bound to prune within.
+        :param stop_time: The upper bound to prune within.
+        :param references: An iterable of OOPS ids to keep.
+        """
+        start_date = start_time.date()
+        stop_date = stop_time.date()
+        midnight = datetime.time(tzinfo=utc)
+        for dirname, (y,m,d) in self._datedirs():
+            dirdate = datetime.date(y, m, d)
+            if dirdate < start_date or dirdate > stop_date:
+                continue
+            dirpath = os.path.join(self.root, dirname)
+            files = os.listdir(dirpath)
+            deleted = 0
+            for candidate in map(partial(os.path.join, dirpath), files):
+                if candidate.endswith('.tmp'):
+                    # Old half-written oops: just remove.
+                    os.unlink(candidate)
+                    deleted += 1
+                    continue
+                with file(candidate, 'rb') as report_file:
+                    report = serializer.read(report_file)
+                    report_time = report.get('time', None)
+                    if (report_time is None or
+                        report_time.date() < dirdate or
+                        report_time.date() > dirdate):
+                        # The report is oddly filed or missing a precise
+                        # datestamp. Treat it like midnight on the day of the
+                        # directory it was placed in - this is a lower bound on
+                        # when it was actually created.
+                        report_time = datetime.datetime.combine(
+                            dirdate, midnight)
+                    if (report_time >= start_time and
+                        report_time <= stop_time and
+                        report['id'] not in references):
+                        # Unreferenced and prunable
+                        os.unlink(candidate)
+                        deleted += 1
+            if deleted == len(files):
+                # Everything in the directory was deleted.
+                os.rmdir(dirpath)
=== modified file 'oops_datedir_repo/tests/test_repository.py'
--- oops_datedir_repo/tests/test_repository.py 2011-11-11 04:21:05 +0000
+++ oops_datedir_repo/tests/test_repository.py 2011-11-15 21:50:28 +0000
@@ -26,6 +26,7 @@
 import bson
 from pytz import utc
 import testtools
+from testtools.matchers import raises
 
 from oops_datedir_repo import (
     DateDirRepo,
@@ -234,3 +235,103 @@
         repo = DateDirRepo(self.useFixture(TempDir()).path)
         os.mkdir(repo.root + '/foo')
         repo.republish([].append)
+
+    def test_republish_ignores_metadata_dir(self):
+        # The metadata directory is never pruned
+        repo = DateDirRepo(self.useFixture(TempDir()).path)
+        os.mkdir(repo.root + '/metadata')
+        repo.republish([].append)
+        self.assertTrue(os.path.exists(repo.root + '/metadata'))
+
+    def test_get_config_value(self):
+        # Config values can be asked for from the repository.
+        repo = DateDirRepo(self.useFixture(TempDir()).path)
+        pruned = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
+        repo.set_config('pruned-until', pruned)
+        # Fresh instance, no memory tricks.
+        repo = DateDirRepo(repo.root)
+        self.assertEqual(pruned, repo.get_config('pruned-until'))
+
+    def test_set_config_value(self):
+        # Config values are just keys in a bson document.
+        repo = DateDirRepo(self.useFixture(TempDir()).path)
+        pruned = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
+        repo.set_config('pruned-until', pruned)
+        with open(repo.root + '/metadata/config.bson', 'rb') as config_file:
+            from_bson = bson.loads(config_file.read())
+        self.assertEqual({'pruned-until': pruned}, from_bson)
+
+    def test_set_config_preserves_other_values(self):
+        # E.g. setting 'a' does not affect 'b'
+        repo = DateDirRepo(self.useFixture(TempDir()).path)
+        repo.set_config('b', 'b-value')
+        repo = DateDirRepo(repo.root)
+        repo.set_config('a', 'a-value')
+        with open(repo.root + '/metadata/config.bson', 'rb') as config_file:
+            from_bson = bson.loads(config_file.read())
+        self.assertEqual({'a': 'a-value', 'b': 'b-value'}, from_bson)
+
+    def test_oldest_date_no_contents(self):
+        repo = DateDirRepo(self.useFixture(TempDir()).path)
+        self.assertThat(lambda: repo.oldest_date(),
+            raises(ValueError("No OOPSes in repository.")))
+
+    def test_oldest_date_is_oldest(self):
+        repo = DateDirRepo(self.useFixture(TempDir()).path)
+        os.mkdir(repo.root + '/2006-04-12')
+        os.mkdir(repo.root + '/2006-04-13')
+        self.assertEqual(datetime.date(2006, 4, 12), repo.oldest_date())
+
+    def test_prune_unreferenced_no_oopses(self):
+        # This shouldn't crash.
+        repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
+        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
+        old = now - datetime.timedelta(weeks=1)
+        repo.prune_unreferenced(old, now, [])
+
+    def test_prune_unreferenced_no_references(self):
+        # When there are no references, everything specified is zerged.
+        repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
+        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
+        old = now - datetime.timedelta(weeks=1)
+        report = {'time': now - datetime.timedelta(hours=5)}
+        repo.publish(report, report['time'])
+        repo.prune_unreferenced(old, now, [])
+        self.assertThat(lambda: repo.oldest_date(), raises(ValueError))
+
+    def test_prune_unreferenced_outside_dates_kept(self):
+        # Pruning only affects stuff in the datedirs selected by the dates.
+        repo = DateDirRepo(
+            self.useFixture(TempDir()).path, inherit_id=True, stash_path=True)
+        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
+        old = now - datetime.timedelta(weeks=1)
+        before = {'time': old - datetime.timedelta(minutes=1)}
+        after = {'time': now + datetime.timedelta(minutes=1)}
+        repo.publish(before, before['time'])
+        repo.publish(after, after['time'])
+        repo.prune_unreferenced(old, now, [])
+        self.assertTrue(os.path.isfile(before['datedir_repo_filepath']))
+        self.assertTrue(os.path.isfile(after['datedir_repo_filepath']))
+
+    def test_prune_referenced_inside_dates_kept(self):
+        repo = DateDirRepo(
+            self.useFixture(TempDir()).path, inherit_id=True, stash_path=True)
+        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
+        old = now - datetime.timedelta(weeks=1)
+        report = {'id': 'foo', 'time': now - datetime.timedelta(minutes=1)}
+        repo.publish(report, report['time'])
+        repo.prune_unreferenced(old, now, ['foo'])
+        self.assertTrue(os.path.isfile(report['datedir_repo_filepath']))
+
+    def test_prune_report_midnight_gets_invalid_timed_reports(self):
+        # If a report has a wonky or missing time, pruning treats it as being
+        # timed on midnight of the datedir day it is on.
+        repo = DateDirRepo(self.useFixture(TempDir()).path, stash_path=True)
+        now = datetime.datetime(2006, 04, 01, 00, 01, 00, tzinfo=utc)
+        old = now - datetime.timedelta(minutes=2)
+        badtime = {'time': now - datetime.timedelta(weeks=2)}
+        missingtime = {}
+        repo.publish(badtime, now)
+        repo.publish(missingtime, now)
+        repo.prune_unreferenced(old, now, [])
+        self.assertThat(lambda: repo.oldest_date(), raises(ValueError))
=== modified file 'setup.py'
--- setup.py 2011-11-13 21:07:43 +0000
+++ setup.py 2011-11-15 21:50:28 +0000
@@ -40,6 +40,7 @@
       install_requires = [
           'bson',
           'iso8601',
+          'launchpadlib', # Needed for pruning - perhaps should be optional.
           'oops',
           'pytz',
           ],
@@ -49,4 +50,8 @@
               'testtools',
               ]
           ),
+      entry_points=dict(
+          console_scripts=[ # `console_scripts` is a magic name to setuptools
+              'prune = oops_datedir_repo.prune:main',
+              ]),
       )
=== modified file 'versions.cfg'
--- versions.cfg 2011-10-07 05:52:09 +0000
+++ versions.cfg 2011-11-15 21:50:28 +0000
@@ -3,13 +3,25 @@
 
 [versions]
 bson = 0.3.2
+elementtree = 1.2.6-20050316
 fixtures = 0.3.6
+httplib2 = 0.6.0
 iso8601 = 0.1.4
+keyring = 0.6.2
+launchpadlib = 1.9.9
+lazr.authentication = 0.1.1
+lazr.restfulclient = 0.12.0
+lazr.uri = 1.0.2
+oauth = 1.0.0
 oops = 0.0.8
 pytz = 2010o
 setuptools = 0.6c11
+simplejson = 2.1.3
 testtools = 0.9.11
+wadllib = 1.2.0
+wsgi-intercept = 0.4
 zc.recipe.egg = 1.3.2
 z3c.recipe.filetemplate = 2.1.0
 z3c.recipe.scripts = 1.0.1
 zc.buildout = 1.5.1
+zope.interface = 3.8.0
