Merge lp:~lifeless/python-oops-datedir-repo/0.0.9 into lp:python-oops-datedir-repo

Proposed by Robert Collins
Status: Merged
Merged at revision: 18
Proposed branch: lp:~lifeless/python-oops-datedir-repo/0.0.9
Merge into: lp:python-oops-datedir-repo
Diff against target: 192 lines (+83/-7)
6 files modified
NEWS (+11/-0)
README (+1/-1)
oops_datedir_repo/__init__.py (+1/-1)
oops_datedir_repo/repository.py (+16/-3)
oops_datedir_repo/tests/test_repository.py (+53/-1)
setup.py (+1/-1)
To merge this branch: bzr merge lp:~lifeless/python-oops-datedir-repo/0.0.9
Reviewer Review Type Date Requested Status
j.c.sackett (community) Approve
Review via email: mp+78793@code.launchpad.net

Description of the change

Prep for amqp in oops-tools: Support for using ids supplied to the publisher - useful for oops-amqp which shows an id to users before the disk storage code sees the oops, and permit passing the filename chosen for storage forward, which lets the oops-tools publisher put the filename in to the django model.

To post a comment you must log in.
Revision history for this message
Aaron Bentley (abentley) wrote :

I think it would be clearer if self.inherit_id left the value of report[id] alone::

  tmp_id = 'OOPS-%s' % md5hash
  if not self.inherit or report.get(id) is None:
    report[id] = tmp_id

"return original_id or report['id']" looks like it's meant to return the canonical oops-id, but when original_id is non-None, won't it be the same as report['id']?

Revision history for this message
Robert Collins (lifeless) wrote :

So thats a really good question.

We have to replace report['id'] in the copy we write, or else its not visible to someone reading the report later.

I've noticed a defect in the branch - we don't inherit the id with the lognamer based id allocator - I'll fix that (we won't use the combination, but someone may) and push that up.

I had the thing in my head that I needed to eject any existing id before hashing, but you're quite right, we can hash with an existing id in there and its no less unique - though if we are not inheriting the id:
 - someone publishes the same oops twice with differing ids, we'd end up writing it twice to disk - big deal
 - publishing the same oops with no id in it would behave as we do today
 - publishing different oopses with the same id will behave as we do today

I'll have a fiddle when I fix the defect and see how it looks.

Thanks,
Rob

Revision history for this message
Robert Collins (lifeless) wrote :

I've pushed up revisions fixing the issue and incorporated aarons feedback into a simpler implementation.

Revision history for this message
j.c.sackett (jcsackett) wrote :

With the change that Aaron suggested, I see nothing to block this.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2011-10-07 06:02:44 +0000
+++ NEWS 2011-10-13 02:19:24 +0000
@@ -6,6 +6,17 @@
6NEXT6NEXT
7----7----
88
90.0.9
10-----
11
12* Permit only generating an OOPS id if the report being published does not
13 already have one. As this can lead to garbage in the system, the default
14 is to always replace it. (Robert Collins)
15
16* Permit storing the filename that the OOPS was written to in the OOPS after
17 publication. This is useful for code building on DateDirRepo as a storage
18 facility for OOPSes. (Robert Collins)
19
90.0.8200.0.8
10-----21-----
1122
1223
=== modified file 'README'
--- README 2011-08-16 02:21:20 +0000
+++ README 2011-10-13 02:19:24 +0000
@@ -82,7 +82,7 @@
82bin/py to get a python interpreter with the dependencies available.82bin/py to get a python interpreter with the dependencies available.
8383
84To run the tests use the runner of your choice, the test suite is84To run the tests use the runner of your choice, the test suite is
85oops.tests.test_suite.85oops_datedir_repo.tests.test_suite.
8686
87For instance::87For instance::
8888
8989
=== modified file 'oops_datedir_repo/__init__.py'
--- oops_datedir_repo/__init__.py 2011-10-07 05:52:09 +0000
+++ oops_datedir_repo/__init__.py 2011-10-13 02:19:24 +0000
@@ -25,7 +25,7 @@
25# established at this point, and setup.py will use a version of next-$(revno).25# established at this point, and setup.py will use a version of next-$(revno).
26# If the releaselevel is 'final', then the tarball will be major.minor.micro.26# If the releaselevel is 'final', then the tarball will be major.minor.micro.
27# Otherwise it is major.minor.micro~$(revno).27# Otherwise it is major.minor.micro~$(revno).
28__version__ = (0, 0, 8, 'beta', 0)28__version__ = (0, 0, 9, 'beta', 0)
2929
30__all__ = [30__all__ = [
31 'DateDirRepo',31 'DateDirRepo',
3232
=== modified file 'oops_datedir_repo/repository.py'
--- oops_datedir_repo/repository.py 2011-10-07 06:02:44 +0000
+++ oops_datedir_repo/repository.py 2011-10-13 02:19:24 +0000
@@ -37,7 +37,8 @@
37class DateDirRepo:37class DateDirRepo:
38 """Publish oopses to a date-dir repository."""38 """Publish oopses to a date-dir repository."""
3939
40 def __init__(self, error_dir, instance_id=None, serializer=None):40 def __init__(self, error_dir, instance_id=None, serializer=None,
41 inherit_id=False, stash_path=False):
41 """Create a DateDirRepo.42 """Create a DateDirRepo.
4243
43 :param error_dir: The base directory to write OOPSes into. OOPSes are44 :param error_dir: The base directory to write OOPSes into. OOPSes are
@@ -52,6 +53,12 @@
52 :param serializer: If supplied should be the module (e.g.53 :param serializer: If supplied should be the module (e.g.
53 oops_datedir_repo.serializer_rfc822) to use to serialize OOPSes.54 oops_datedir_repo.serializer_rfc822) to use to serialize OOPSes.
54 Defaults to using serializer_bson.55 Defaults to using serializer_bson.
56 :param inherit_id: If True, use the oops ID (if present) supplied in
57 the report, rather than always assigning a new one.
58 :param stash_path: If True, the filename that the OOPS was written to
59 is stored in the OOPS report under the key 'datedir_repo_filepath'.
60 It is not stored in the OOPS written to disk, only the in-memory
61 model.
55 """62 """
56 if instance_id is not None:63 if instance_id is not None:
57 self.log_namer = UniqueFileAllocator(64 self.log_namer = UniqueFileAllocator(
@@ -65,6 +72,8 @@
65 if serializer is None:72 if serializer is None:
66 serializer = serializer_bson73 serializer = serializer_bson
67 self.serializer = serializer74 self.serializer = serializer
75 self.inherit_id = inherit_id
76 self.stash_path = stash_path
6877
69 def publish(self, report, now=None):78 def publish(self, report, now=None):
70 """Write the report to disk.79 """Write the report to disk.
@@ -76,20 +85,24 @@
76 now = now.astimezone(utc)85 now = now.astimezone(utc)
77 else:86 else:
78 now = datetime.datetime.now(utc)87 now = datetime.datetime.now(utc)
79 # Don't mess with the original report:88 # Don't mess with the original report when changing ids etc.
89 original_report = report
80 report = dict(report)90 report = dict(report)
81 if self.log_namer is not None:91 if self.log_namer is not None:
82 oopsid, filename = self.log_namer.newId(now)92 oopsid, filename = self.log_namer.newId(now)
83 else:93 else:
84 report.pop('id', None)
85 md5hash = md5(serializer_bson.dumps(report)).hexdigest()94 md5hash = md5(serializer_bson.dumps(report)).hexdigest()
86 oopsid = 'OOPS-%s' % md5hash95 oopsid = 'OOPS-%s' % md5hash
87 prefix = os.path.join(self.root, now.strftime('%Y-%m-%d'))96 prefix = os.path.join(self.root, now.strftime('%Y-%m-%d'))
88 if not os.path.isdir(prefix):97 if not os.path.isdir(prefix):
89 os.makedirs(prefix)98 os.makedirs(prefix)
90 filename = os.path.join(prefix, oopsid)99 filename = os.path.join(prefix, oopsid)
100 if self.inherit_id:
101 oopsid = report.get('id') or oopsid
91 report['id'] = oopsid102 report['id'] = oopsid
92 self.serializer.write(report, open(filename, 'wb'))103 self.serializer.write(report, open(filename, 'wb'))
104 if self.stash_path:
105 original_report['datedir_repo_filepath'] = filename
93 # Set file permission to: rw-r--r-- (so that reports from106 # Set file permission to: rw-r--r-- (so that reports from
94 # umask-restricted services can be gathered by a tool running as107 # umask-restricted services can be gathered by a tool running as
95 # another user).108 # another user).
96109
=== modified file 'oops_datedir_repo/tests/test_repository.py'
--- oops_datedir_repo/tests/test_repository.py 2011-10-07 06:02:44 +0000
+++ oops_datedir_repo/tests/test_repository.py 2011-10-13 02:19:24 +0000
@@ -81,7 +81,13 @@
81 def test_publish_via_hash(self):81 def test_publish_via_hash(self):
82 repo = DateDirRepo(self.useFixture(TempDir()).path)82 repo = DateDirRepo(self.useFixture(TempDir()).path)
83 now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)83 now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
84 report = {'time': now}84 # Note the presence of 'id' in the report: this is included in the hash
85 # calculation (because there is no reason not to - we don't promise
86 # that reports only differing by id will be assigned the same id even
87 # when publishing by hash: the hashing is a way to get meaningfully
88 # unique ids, not a way to optimise similar reports [as all reports
89 # should have unique timestamps anyway...]
90 report = {'id': 'ignored', 'time': now}
85 # NB: bson output depends on dict order, so the resulting hash can be91 # NB: bson output depends on dict order, so the resulting hash can be
86 # machine specific. This is fine because its merely a strategy to get92 # machine specific. This is fine because its merely a strategy to get
87 # unique ids, and after creating the id it is preserved in what is93 # unique ids, and after creating the id it is preserved in what is
@@ -105,3 +111,49 @@
105 repo.publish(report, now)111 repo.publish(report, now)
106 report2 = {'time': now, 'foo': 'bar'}112 report2 = {'time': now, 'foo': 'bar'}
107 repo.publish(report2, now)113 repo.publish(report2, now)
114
115 def test_publish_existing_id(self):
116 # oops_amqp wants to publish to a DateDirRepo but already has an id
117 # that the user has been told about.
118 repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
119 now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
120 report = {'time': now, 'id': '45'}
121 self.assertEqual('45', repo.publish(dict(report), now))
122 # And to be sure, check the file on disk.
123 dir = repo.root + '/2006-04-01/'
124 files = os.listdir(dir)
125 with open(dir + files[0], 'rb') as fp:
126 self.assertEqual(report, bson.loads(fp.read()))
127
128 def test_publish_existing_id_lognamer(self):
129 # The id reuse and file allocation strategies should be separate.
130 repo = DateDirRepo(self.useFixture(TempDir()).path, 'X',
131 inherit_id=True, stash_path=True)
132 now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
133 report = {'time': now, 'id': '45'}
134 published_report = dict(report)
135 self.assertEqual('45', repo.publish(published_report, now))
136 # The file on disk should have the supplied id.
137 with open(published_report['datedir_repo_filepath'], 'rb') as fp:
138 self.assertEqual(report, bson.loads(fp.read()))
139
140 def test_publish_add_path(self):
141 # oops_tools wants to publish to both disk (via DateDirRepo) and a
142 # database; the database wants the path of the OOPS report on disk.
143 # Putting it in the (original) report allows the database publisher to
144 # read that out. This is a little ugly, but there are many worse ways
145 # to do it, and no known better ones that don't make the core protocol
146 # too tightly bound to disk publishing.
147 repo = DateDirRepo(self.useFixture(TempDir()).path, stash_path=True,
148 inherit_id=True)
149 now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
150 report = {'time': now, 'id': '45'}
151 expected_disk_report = dict(report)
152 self.assertEqual('45', repo.publish(report, now))
153 dir = repo.root + '/2006-04-01/'
154 files = os.listdir(dir)
155 expected_path = dir + files[0]
156 self.assertEqual(expected_path, report['datedir_repo_filepath'])
157 with open(expected_path, 'rb') as fp:
158 self.assertEqual(expected_disk_report, bson.loads(fp.read()))
159
108160
=== modified file 'setup.py'
--- setup.py 2011-10-07 05:52:09 +0000
+++ setup.py 2011-10-13 02:19:24 +0000
@@ -22,7 +22,7 @@
22description = file(os.path.join(os.path.dirname(__file__), 'README'), 'rb').read()22description = file(os.path.join(os.path.dirname(__file__), 'README'), 'rb').read()
2323
24setup(name="oops_datedir_repo",24setup(name="oops_datedir_repo",
25 version="0.0.8",25 version="0.0.9",
26 description="OOPS disk serialisation and repository management.",26 description="OOPS disk serialisation and repository management.",
27 long_description=description,27 long_description=description,
28 maintainer="Launchpad Developers",28 maintainer="Launchpad Developers",

Subscribers

People subscribed via source and target branches

to all changes: