Python OOPS Date-dir repository

Merge lp:~lifeless/python-oops-datedir-repo/0.0.9 into lp:python-oops-datedir-repo

0.0.9
Merge into trunk

Proposed by Robert Collins on 2011-10-10

Status:	Merged
Merged at revision:	18
Proposed branch:	lp:~lifeless/python-oops-datedir-repo/0.0.9
Merge into:	lp:python-oops-datedir-repo
Diff against target:	192 lines (+83/-7) 6 files modified NEWS (+11/-0) README (+1/-1) oops_datedir_repo/__init__.py (+1/-1) oops_datedir_repo/repository.py (+16/-3) oops_datedir_repo/tests/test_repository.py (+53/-1) setup.py (+1/-1)
To merge this branch:	bzr merge lp:~lifeless/python-oops-datedir-repo/0.0.9
Related bugs:	Link a bug report

Reviewer	Review Type	Date Requested	Status
j.c.sackett (community)		2011-10-10	Approve on 2011-10-13
Review via email: mp+78793@code.launchpad.net

Description of the change

Prep for amqp in oops-tools: Support for using ids supplied to the publisher - useful for oops-amqp which shows an id to users before the disk storage code sees the oops, and permit passing the filename chosen for storage forward, which lets the oops-tools publisher put the filename in to the django model.

Revision history for this message

Aaron Bentley (abentley) wrote on 2011-10-12:

I think it would be clearer if self.inherit_id left the value of report[id] alone::

  tmp_id = 'OOPS-%s' % md5hash
  if not self.inherit or report.get(id) is None:
    report[id] = tmp_id

"return original_id or report['id']" looks like it's meant to return the canonical oops-id, but when original_id is non-None, won't it be the same as report['id']?

Revision history for this message

Robert Collins (lifeless) wrote on 2011-10-12:

So thats a really good question.

We have to replace report['id'] in the copy we write, or else its not visible to someone reading the report later.

I've noticed a defect in the branch - we don't inherit the id with the lognamer based id allocator - I'll fix that (we won't use the combination, but someone may) and push that up.

I had the thing in my head that I needed to eject any existing id before hashing, but you're quite right, we can hash with an existing id in there and its no less unique - though if we are not inheriting the id:
- someone publishes the same oops twice with differing ids, we'd end up writing it twice to disk - big deal
- publishing the same oops with no id in it would behave as we do today
- publishing different oopses with the same id will behave as we do today

I'll have a fiddle when I fix the defect and see how it looks.

Thanks,
Rob

Revision history for this message

Robert Collins (lifeless) wrote on 2011-10-13:

I've pushed up revisions fixing the issue and incorporated aarons feedback into a simpler implementation.

Revision history for this message

j.c.sackett (jcsackett) wrote on 2011-10-13:

With the change that Aaron suggested, I see nothing to block this.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Robert Collins

 === modified file 'NEWS'
 --- NEWS	2011-10-07 06:02:44 +0000
 +++ NEWS	2011-10-13 02:19:24 +0000
@@ -6,6 +6,17 @@
  NEXT
  ----
++0.0.9
++-----
++
++* Permit only generating an OOPS id if the report being published does not
++  already have one. As this can lead to garbage in the system, the default
++  is to always replace it. (Robert Collins)
++
++* Permit storing the filename that the OOPS was written to in the OOPS after
++  publication. This is useful for code building on DateDirRepo as a storage
++  facility for OOPSes. (Robert Collins)
++
 .0.8
  -----
 === modified file 'README'
 --- README	2011-08-16 02:21:20 +0000
 +++ README	2011-10-13 02:19:24 +0000
@@ -82,7 +82,7 @@
  bin/py to get a python interpreter with the dependencies available.
  To run the tests use the runner of your choice, the test suite is
--oops.tests.test_suite.
++oops_datedir_repo.tests.test_suite.
  For instance::
 === modified file 'oops_datedir_repo/__init__.py'
 --- oops_datedir_repo/__init__.py	2011-10-07 05:52:09 +0000
 +++ oops_datedir_repo/__init__.py	2011-10-13 02:19:24 +0000
@@ -25,7 +25,7 @@
  # established at this point, and setup.py will use a version of next-$(revno).
  # If the releaselevel is 'final', then the tarball will be major.minor.micro.
  # Otherwise it is major.minor.micro~$(revno).
--__version__ = (0, 0, 8, 'beta', 0)
++__version__ = (0, 0, 9, 'beta', 0)
  __all__ = [
      'DateDirRepo',
 === modified file 'oops_datedir_repo/repository.py'
 --- oops_datedir_repo/repository.py	2011-10-07 06:02:44 +0000
 +++ oops_datedir_repo/repository.py	2011-10-13 02:19:24 +0000
@@ -37,7 +37,8 @@
  class DateDirRepo:
      """Publish oopses to a date-dir repository."""
--    def __init__(self, error_dir, instance_id=None, serializer=None):
++    def __init__(self, error_dir, instance_id=None, serializer=None,
++            inherit_id=False, stash_path=False):
          """Create a DateDirRepo.
          :param error_dir: The base directory to write OOPSes into. OOPSes are
@@ -52,6 +53,12 @@
          :param serializer: If supplied should be the module (e.g.
              oops_datedir_repo.serializer_rfc822) to use to serialize OOPSes.
              Defaults to using serializer_bson.
++        :param inherit_id: If True, use the oops ID (if present) supplied in
++            the report, rather than always assigning a new one.
++        :param stash_path: If True, the filename that the OOPS was written to
++            is stored in the OOPS report under the key 'datedir_repo_filepath'.
++            It is not stored in the OOPS written to disk, only the in-memory
++            model.
          """
          if instance_id is not None:
              self.log_namer = UniqueFileAllocator(
@@ -65,6 +72,8 @@
          if serializer is None:
              serializer = serializer_bson
          self.serializer = serializer
++        self.inherit_id = inherit_id
++        self.stash_path = stash_path
      def publish(self, report, now=None):
          """Write the report to disk.
@@ -76,20 +85,24 @@
              now = now.astimezone(utc)
          else:
              now = datetime.datetime.now(utc)
--        # Don't mess with the original report:
++        # Don't mess with the original report when changing ids etc.
++        original_report = report
          report = dict(report)
          if self.log_namer is not None:
              oopsid, filename = self.log_namer.newId(now)
          else:
--            report.pop('id', None)
              md5hash = md5(serializer_bson.dumps(report)).hexdigest()
              oopsid = 'OOPS-%s' % md5hash
              prefix = os.path.join(self.root, now.strftime('%Y-%m-%d'))
              if not os.path.isdir(prefix):
                  os.makedirs(prefix)
              filename = os.path.join(prefix, oopsid)
++        if self.inherit_id:
++            oopsid = report.get('id') or oopsid
          report['id'] = oopsid
          self.serializer.write(report, open(filename, 'wb'))
++        if self.stash_path:
++            original_report['datedir_repo_filepath'] = filename
          # Set file permission to: rw-r--r-- (so that reports from
          # umask-restricted services can be gathered by a tool running as
          # another user).
 === modified file 'oops_datedir_repo/tests/test_repository.py'
 --- oops_datedir_repo/tests/test_repository.py	2011-10-07 06:02:44 +0000
 +++ oops_datedir_repo/tests/test_repository.py	2011-10-13 02:19:24 +0000
@@ -81,7 +81,13 @@
      def test_publish_via_hash(self):
          repo = DateDirRepo(self.useFixture(TempDir()).path)
          now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
--        report = {'time': now}
++        # Note the presence of 'id' in the report: this is included in the hash
++        # calculation (because there is no reason not to - we don't promise
++        # that reports only differing by id will be assigned the same id even
++        # when publishing by hash: the hashing is a way to get meaningfully
++        # unique ids, not a way to optimise similar reports [as all reports
++        # should have unique timestamps anyway...]
++        report = {'id': 'ignored', 'time': now}
          # NB: bson output depends on dict order, so the resulting hash can be
          # machine specific. This is fine because its merely a strategy to get
          # unique ids, and after creating the id it is preserved in what is
@@ -105,3 +111,49 @@
          repo.publish(report, now)
          report2 = {'time': now, 'foo': 'bar'}
          repo.publish(report2, now)
++
++    def test_publish_existing_id(self):
++        # oops_amqp wants to publish to a DateDirRepo but already has an id
++        # that the user has been told about.
++        repo = DateDirRepo(self.useFixture(TempDir()).path, inherit_id=True)
++        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
++        report = {'time': now, 'id': '45'}
++        self.assertEqual('45', repo.publish(dict(report), now))
++        # And to be sure, check the file on disk.
++        dir = repo.root + '/2006-04-01/'
++        files = os.listdir(dir)
++        with open(dir + files[0], 'rb') as fp:
++            self.assertEqual(report, bson.loads(fp.read()))
++
++    def test_publish_existing_id_lognamer(self):
++        # The id reuse and file allocation strategies should be separate.
++        repo = DateDirRepo(self.useFixture(TempDir()).path, 'X',
++            inherit_id=True, stash_path=True)
++        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
++        report = {'time': now, 'id': '45'}
++        published_report = dict(report)
++        self.assertEqual('45', repo.publish(published_report, now))
++        #  The file on disk should have the supplied id.
++        with open(published_report['datedir_repo_filepath'], 'rb') as fp:
++            self.assertEqual(report, bson.loads(fp.read()))
++
++    def test_publish_add_path(self):
++        # oops_tools wants to publish to both disk (via DateDirRepo) and a
++        # database; the database wants the path of the OOPS report on disk.
++        # Putting it in the (original) report allows the database publisher to
++        # read that out. This is a little ugly, but there are many worse ways
++        # to do it, and no known better ones that don't make the core protocol
++        # too tightly bound to disk publishing.
++        repo = DateDirRepo(self.useFixture(TempDir()).path, stash_path=True,
++            inherit_id=True)
++        now = datetime.datetime(2006, 04, 01, 00, 30, 00, tzinfo=utc)
++        report = {'time': now, 'id': '45'}
++        expected_disk_report = dict(report)
++        self.assertEqual('45', repo.publish(report, now))
++        dir = repo.root + '/2006-04-01/'
++        files = os.listdir(dir)
++        expected_path = dir + files[0]
++        self.assertEqual(expected_path, report['datedir_repo_filepath'])
++        with open(expected_path, 'rb') as fp:
++            self.assertEqual(expected_disk_report, bson.loads(fp.read()))
++
 === modified file 'setup.py'
 --- setup.py	2011-10-07 05:52:09 +0000
 +++ setup.py	2011-10-13 02:19:24 +0000
@@ -22,7 +22,7 @@
  description = file(os.path.join(os.path.dirname(__file__), 'README'), 'rb').read()
  setup(name="oops_datedir_repo",
--      version="0.0.8",
++      version="0.0.9",
        description="OOPS disk serialisation and repository management.",
        long_description=description,
        maintainer="Launchpad Developers",

Python OOPS Date-dir repository

Merge lp:~lifeless/python-oops-datedir-repo/0.0.9 into lp:python-oops-datedir-repo

Commit message

Description of the change

Preview Diff

Subscribers