Merge lp:~lifeless/python-oops-datedir-repo/0.0.9 into lp:python-oops-datedir-repo

Proposed by Robert Collins
Status: Merged
Merged at revision: 18
Proposed branch: lp:~lifeless/python-oops-datedir-repo/0.0.9
Merge into: lp:python-oops-datedir-repo
Diff against target: 192 lines (+83/-7)
6 files modified
NEWS (+11/-0)
README (+1/-1)
oops_datedir_repo/__init__.py (+1/-1)
oops_datedir_repo/repository.py (+16/-3)
oops_datedir_repo/tests/test_repository.py (+53/-1)
setup.py (+1/-1)
To merge this branch: bzr merge lp:~lifeless/python-oops-datedir-repo/0.0.9
Reviewer Review Type Date Requested Status
j.c.sackett (community) Approve
Review via email: mp+78793@code.launchpad.net

Description of the change

Prep for amqp in oops-tools: Support for using ids supplied to the publisher - useful for oops-amqp which shows an id to users before the disk storage code sees the oops, and permit passing the filename chosen for storage forward, which lets the oops-tools publisher put the filename in to the django model.

To post a comment you must log in.
Revision history for this message
Aaron Bentley (abentley) wrote :

I think it would be clearer if self.inherit_id left the value of report[id] alone::

  tmp_id = 'OOPS-%s' % md5hash
  if not self.inherit or report.get(id) is None:
    report[id] = tmp_id

"return original_id or report['id']" looks like it's meant to return the canonical oops-id, but when original_id is non-None, won't it be the same as report['id']?

Revision history for this message
Robert Collins (lifeless) wrote :

So thats a really good question.

We have to replace report['id'] in the copy we write, or else its not visible to someone reading the report later.

I've noticed a defect in the branch - we don't inherit the id with the lognamer based id allocator - I'll fix that (we won't use the combination, but someone may) and push that up.

I had the thing in my head that I needed to eject any existing id before hashing, but you're quite right, we can hash with an existing id in there and its no less unique - though if we are not inheriting the id:
 - someone publishes the same oops twice with differing ids, we'd end up writing it twice to disk - big deal
 - publishing the same oops with no id in it would behave as we do today
 - publishing different oopses with the same id will behave as we do today

I'll have a fiddle when I fix the defect and see how it looks.

Thanks,
Rob

Revision history for this message
Robert Collins (lifeless) wrote :

I've pushed up revisions fixing the issue and incorporated aarons feedback into a simpler implementation.

Revision history for this message
j.c.sackett (jcsackett) wrote :

With the change that Aaron suggested, I see nothing to block this.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2011-10-07 06:02:44 +0000
3+++ NEWS 2011-10-13 02:19:24 +0000
4@@ -6,6 +6,17 @@
5 NEXT
6 ----
7
8+0.0.9
9+-----
10+
11+* Permit only generating an OOPS id if the report being published does not
12+ already have one. As this can lead to garbage in the system, the default
13+ is to always replace it. (Robert Collins)
14+
15+* Permit storing the filename that the OOPS was written to in the OOPS after
16+ publication. This is useful for code building on DateDirRepo as a storage
17+ facility for OOPSes. (Robert Collins)
18+
19 0.0.8
20 -----
21
22
23=== modified file 'README'
24--- README 2011-08-16 02:21:20 +0000
25+++ README 2011-10-13 02:19:24 +0000
26@@ -82,7 +82,7 @@
27 bin/py to get a python interpreter with the dependencies available.
28
29 To run the tests use the runner of your choice, the test suite is
30-oops.tests.test_suite.
31+oops_datedir_repo.tests.test_suite.
32
33 For instance::
34
35
36=== modified file 'oops_datedir_repo/__init__.py'
37--- oops_datedir_repo/__init__.py 2011-10-07 05:52:09 +0000
38+++ oops_datedir_repo/__init__.py 2011-10-13 02:19:24 +0000
39@@ -25,7 +25,7 @@
40 # established at this point, and setup.py will use a version of next-$(revno).
41 # If the releaselevel is 'final', then the tarball will be major.minor.micro.
42 # Otherwise it is major.minor.micro~$(revno).
43-__version__ = (0, 0, 8, 'beta', 0)
44+__version__ = (0, 0, 9, 'beta', 0)
45
46 __all__ = [
47 'DateDirRepo',
48
49=== modified file 'oops_datedir_repo/repository.py'
50--- oops_datedir_repo/repository.py 2011-10-07 06:02:44 +0000
51+++ oops_datedir_repo/repository.py 2011-10-13 02:19:24 +0000
52@@ -37,7 +37,8 @@
53 class DateDirRepo:
54 """Publish oopses to a date-dir repository."""
55
56- def __init__(self, error_dir, instance_id=None, serializer=None):
57+ def __init__(self, error_dir, instance_id=None, serializer=None,
58+ inherit_id=False, stash_path=False):
59 """Create a DateDirRepo.
60
61 :param error_dir: The base directory to write OOPSes into. OOPSes are
62@@ -52,6 +53,12 @@
63 :param serializer: If supplied should be the module (e.g.
64 oops_datedir_repo.serializer_rfc822) to use to serialize OOPSes.
65 Defaults to using serializer_bson.
66+ :param inherit_id: If True, use the oops ID (if present) supplied in
67+ the report, rather than always assigning a new one.
68+ :param stash_path: If True, the filename that the OOPS was written to
69+ is stored in the OOPS report under the key 'datedir_repo_filepath'.
70+ It is not stored in the OOPS written to disk, only the in-memory
71+ model.
72 """
73 if instance_id is not None:
74 self.log_namer = UniqueFileAllocator(
75@@ -65,6 +72,8 @@
76 if serializer is None:
77 serializer = serializer_bson
78 self.serializer = serializer
79+ self.inherit_id = inherit_id
80+ self.stash_path = stash_path
81
82 def publish(self, report, now=None):
83 """Write the report to disk.
84@@ -76,20 +85,24 @@
85 now = now.astimezone(utc)
86 else:
87 now = datetime.datetime.now(utc)
88- # Don't mess with the original report:
89+ # Don't mess with the original report when changing ids etc.
90+ original_report = report
91 report = dict(report)
92 if self.log_namer is not None:
93 oopsid, filename = self.log_namer.newId(now)
94 else:
95- report.pop('id', None)
96 md5hash = md5(serializer_bson.dumps(report)).hexdigest()
97 oopsid = 'OOPS-%s' % md5hash
98 prefix = os.path.join(self.root, now.strftime('%Y-%m-%d'))
99 if not os.path.isdir(prefix):
100 os.makedirs(prefix)
101 filename = os.path.join(prefix, oopsid)
102+ if self.inherit_id:
103+ oopsid = report.get('id') or oopsid
104 report['id'] = oopsid
105 self.serializer.write(report, open(filename, 'wb'))
106+ if self.stash_path:
107+ original_report['datedir_repo_filepath'] = filename
108 # Set file permission to: rw-r--r-- (so that reports from
109 # umask-restricted services can be gathered by a tool running as
110 # another user).
111
112=== modified file 'oops_datedir_repo/tests/test_repository.py'
113--- oops_datedir_repo/tests/test_repository.py 2011-10-07 06:02:44 +0000
114+++ oops_datedir_repo/tests/test_repository.py 2011-10-13 02:19:24 +0000
115@@ -81,7 +81,13 @@
116 def test_publish_via_hash(self):
117 repo = DateDirRepo(self.useFixture(TempDir()).path)
118 now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
119- report = {'time': now}
120+ # Note the presence of 'id' in the report: this is included in the hash
121+ # calculation (because there is no reason not to - we don't promise
122+ # that reports only differing by id will be assigned the same id even
123+ # when publishing by hash: the hashing is a way to get meaningfully
124+ # unique ids, not a way to optimise similar reports [as all reports
125+ # should have unique timestamps anyway...]
126+ report = {'id': 'ignored', 'time': now}
127 # NB: bson output depends on dict order, so the resulting hash can be
128 # machine specific. This is fine because its merely a strategy to get
129 # unique ids, and after creating the id it is preserved in what is
130@@ -105,3 +111,49 @@
131 repo.publish(report, now)
132 report2 = {'time': now, 'foo': 'bar'}
133 repo.publish(report2, now)
134+
135+ def test_publish_existing_id(self):
136+ # oops_amqp wants to publish to a DateDirRepo but already has an id
137+ # that the user has been told about.
138+ repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
139+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
140+ report = {'time': now, 'id': '45'}
141+ self.assertEqual('45', repo.publish(dict(report), now))
142+ # And to be sure, check the file on disk.
143+ dir = repo.root + '/2006-04-01/'
144+ files = os.listdir(dir)
145+ with open(dir + files[0], 'rb') as fp:
146+ self.assertEqual(report, bson.loads(fp.read()))
147+
148+ def test_publish_existing_id_lognamer(self):
149+ # The id reuse and file allocation strategies should be separate.
150+ repo = DateDirRepo(self.useFixture(TempDir()).path, 'X',
151+ inherit_id=True, stash_path=True)
152+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
153+ report = {'time': now, 'id': '45'}
154+ published_report = dict(report)
155+ self.assertEqual('45', repo.publish(published_report, now))
156+ # The file on disk should have the supplied id.
157+ with open(published_report['datedir_repo_filepath'], 'rb') as fp:
158+ self.assertEqual(report, bson.loads(fp.read()))
159+
160+ def test_publish_add_path(self):
161+ # oops_tools wants to publish to both disk (via DateDirRepo) and a
162+ # database; the database wants the path of the OOPS report on disk.
163+ # Putting it in the (original) report allows the database publisher to
164+ # read that out. This is a little ugly, but there are many worse ways
165+ # to do it, and no known better ones that don't make the core protocol
166+ # too tightly bound to disk publishing.
167+ repo = DateDirRepo(self.useFixture(TempDir()).path, stash_path=True,
168+ inherit_id=True)
169+ now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
170+ report = {'time': now, 'id': '45'}
171+ expected_disk_report = dict(report)
172+ self.assertEqual('45', repo.publish(report, now))
173+ dir = repo.root + '/2006-04-01/'
174+ files = os.listdir(dir)
175+ expected_path = dir + files[0]
176+ self.assertEqual(expected_path, report['datedir_repo_filepath'])
177+ with open(expected_path, 'rb') as fp:
178+ self.assertEqual(expected_disk_report, bson.loads(fp.read()))
179+
180
181=== modified file 'setup.py'
182--- setup.py 2011-10-07 05:52:09 +0000
183+++ setup.py 2011-10-13 02:19:24 +0000
184@@ -22,7 +22,7 @@
185 description = file(os.path.join(os.path.dirname(__file__), 'README'), 'rb').read()
186
187 setup(name="oops_datedir_repo",
188- version="0.0.8",
189+ version="0.0.9",
190 description="OOPS disk serialisation and repository management.",
191 long_description=description,
192 maintainer="Launchpad Developers",

Subscribers

People subscribed via source and target branches

to all changes: