Merge lp:~jameinel/bzr/repack-missing-inventories-437003 into lp:bzr/2.0

Proposed by John A Meinel
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 4769
Proposed branch: lp:~jameinel/bzr/repack-missing-inventories-437003
Merge into: lp:bzr/2.0
Diff against target: 163 lines (+118/-5)
3 files modified
NEWS (+4/-0)
bzrlib/repofmt/groupcompress_repo.py (+13/-4)
bzrlib/tests/test_repository.py (+101/-1)
To merge this branch: bzr merge lp:~jameinel/bzr/repack-missing-inventories-437003
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+52544@code.launchpad.net

Commit message

Fix bug #437003. Don't abort an auto-pack if an inventory is in the repository, just not in the subset of packs we are handling.

Description of the change

This change was done against the 2.0 branch, so I'm proposing it first there, and we can 'merge up'. Though I'll be happy to skip and just propose the merge against newer branches if we prefer. (I'm not 100% sure how we decided the NEWS entries should go.)

Anyway, the proposed fix is pretty simple, and probably applicable all the way through the stack. When GCCHKPacker sees a missing inventory, it falls back to self._pack_collection.repo.inventories._index, to see if the missing inventory isn't in the repository at all. Note that I don't use self._pack_collection.repo.inventories because that would include fallbacks.

The test is pretty ugly. It involves a lot of poking at the exact state of the repository to set up the exact condition. I do a little more poking to then trigger a real failure (remove the pack file which contains the A inventory.)

The test also isn't a permutation test. It might apply vs most Pack based repositories, but I can't guarantee it.

I've also proven that the fix applies to bzr 2.4 without a problem (see lp:///~jameinel/bzr/2.4-repack-missing-inventories-437003). So we should be able to land it on any release we want. The only thing that has problems is the NEWS entries.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

77 + def make_branch_with_disjoint_inventory_and_revision(self):

This test is quite hard to understand (because of the bug it's addressing, not the way you wrote it), I wouldn't mind more comments on the individual steps and the associated assertions. (On a second read, I got it, so you may ignore this remark).

113 + else:
114 + self.fail('Could not find pack containing A\'s inventory')

a comment or a clearer message that THIS-SHOULD-NEVER-HAPPEN will do ;)

136 + (repo, rev_a_pack_name, inv_a_pack_name, rev_c_pack_name
137 + ) = self.make_branch_with_disjoint_inventory_and_revision()

Nice indentation !

Overall: nice TDD there ;)

I'm ok to land this as is.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2010-12-16 09:01:41 +0000
+++ NEWS 2011-03-08 12:02:22 +0000
@@ -28,6 +28,10 @@
28* Avoid UnicodeDecodeError in ``bzr add`` with multiple files under a non-ascii28* Avoid UnicodeDecodeError in ``bzr add`` with multiple files under a non-ascii
29 path on windows from symlink support addition. (Martin [gz], #686611)29 path on windows from symlink support addition. (Martin [gz], #686611)
3030
31* Avoid spurious ValueErrors when autopacking a subset of the repository,
32 and seeing a revision without its related inventory
33 (John Arbash Meinel, #437003)
34
31Improvements35Improvements
32************36************
3337
3438
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2010-01-21 16:59:21 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2011-03-08 12:02:22 +0000
@@ -1,4 +1,4 @@
1# Copyright (C) 2008, 2009, 2010 Canonical Ltd1# Copyright (C) 2008-2011 Canonical Ltd
2#2#
3# This program is free software; you can redistribute it and/or modify3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by4# it under the terms of the GNU General Public License as published by
@@ -421,9 +421,18 @@
421 inventory_keys = source_vf.keys()421 inventory_keys = source_vf.keys()
422 missing_inventories = set(self.revision_keys).difference(inventory_keys)422 missing_inventories = set(self.revision_keys).difference(inventory_keys)
423 if missing_inventories:423 if missing_inventories:
424 missing_inventories = sorted(missing_inventories)424 # Go back to the original repo, to see if these are really missing
425 raise ValueError('We are missing inventories for revisions: %s'425 # https://bugs.launchpad.net/bzr/+bug/437003
426 % (missing_inventories,))426 # If we are packing a subset of the repo, it is fine to just have
427 # the data in another Pack file, which is not included in this pack
428 # operation.
429 inv_index = self._pack_collection.repo.inventories._index
430 pmap = inv_index.get_parent_map(missing_inventories)
431 really_missing = missing_inventories.difference(pmap)
432 if really_missing:
433 missing_inventories = sorted(really_missing)
434 raise ValueError('We are missing inventories for revisions: %s'
435 % (missing_inventories,))
427 self._copy_stream(source_vf, target_vf, inventory_keys,436 self._copy_stream(source_vf, target_vf, inventory_keys,
428 'inventories', self._get_filtered_inv_stream, 2)437 'inventories', self._get_filtered_inv_stream, 2)
429438
430439
=== modified file 'bzrlib/tests/test_repository.py'
--- bzrlib/tests/test_repository.py 2010-01-21 19:24:26 +0000
+++ bzrlib/tests/test_repository.py 2011-03-08 12:02:22 +0000
@@ -1,4 +1,4 @@
1# Copyright (C) 2006-2010 Canonical Ltd1# Copyright (C) 2006-2011 Canonical Ltd
2#2#
3# This program is free software; you can redistribute it and/or modify3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by4# it under the terms of the GNU General Public License as published by
@@ -1564,6 +1564,106 @@
1564 self.assertTrue(new_pack.signature_index._optimize_for_size)1564 self.assertTrue(new_pack.signature_index._optimize_for_size)
15651565
15661566
1567class TestGCCHKPacker(TestCaseWithTransport):
1568
1569 def make_abc_branch(self):
1570 builder = self.make_branch_builder('source')
1571 builder.start_series()
1572 builder.build_snapshot('A', None, [
1573 ('add', ('', 'root-id', 'directory', None)),
1574 ('add', ('file', 'file-id', 'file', 'content\n')),
1575 ])
1576 builder.build_snapshot('B', ['A'], [
1577 ('add', ('dir', 'dir-id', 'directory', None))])
1578 builder.build_snapshot('C', ['B'], [
1579 ('modify', ('file-id', 'new content\n'))])
1580 builder.finish_series()
1581 return builder.get_branch()
1582
1583 def make_branch_with_disjoint_inventory_and_revision(self):
1584 """a repo with separate packs for a revisions Revision and Inventory.
1585
1586 There will be one pack file that holds the Revision content, and one
1587 for the Inventory content.
1588
1589 :return: (repository,
1590 pack_name_with_rev_A_Revision,
1591 pack_name_with_rev_A_Inventory,
1592 pack_name_with_rev_C_content)
1593 """
1594 b_source = self.make_abc_branch()
1595 b_base = b_source.bzrdir.sprout('base', revision_id='A').open_branch()
1596 b_stacked = b_base.bzrdir.sprout('stacked', stacked=True).open_branch()
1597 b_stacked.lock_write()
1598 self.addCleanup(b_stacked.unlock)
1599 b_stacked.fetch(b_source, 'B')
1600 # Now re-open the stacked repo directly (no fallbacks) so that we can
1601 # fill in the A rev.
1602 repo_not_stacked = b_stacked.bzrdir.open_repository()
1603 repo_not_stacked.lock_write()
1604 self.addCleanup(repo_not_stacked.unlock)
1605 # Now we should have a pack file with A's inventory, but not its
1606 # Revision
1607 self.assertEqual([('A',), ('B',)],
1608 sorted(repo_not_stacked.inventories.keys()))
1609 self.assertEqual([('B',)],
1610 sorted(repo_not_stacked.revisions.keys()))
1611 stacked_pack_names = repo_not_stacked._pack_collection.names()
1612 # We have a couple names here, figure out which has A's inventory
1613 for name in stacked_pack_names:
1614 pack = repo_not_stacked._pack_collection.get_pack_by_name(name)
1615 keys = [n[1] for n in pack.inventory_index.iter_all_entries()]
1616 if ('A',) in keys:
1617 inv_a_pack_name = name
1618 break
1619 else:
1620 self.fail('Could not find pack containing A\'s inventory')
1621 repo_not_stacked.fetch(b_source.repository, 'A')
1622 self.assertEqual([('A',), ('B',)],
1623 sorted(repo_not_stacked.revisions.keys()))
1624 new_pack_names = set(repo_not_stacked._pack_collection.names())
1625 rev_a_pack_names = new_pack_names.difference(stacked_pack_names)
1626 self.assertEqual(1, len(rev_a_pack_names))
1627 rev_a_pack_name = list(rev_a_pack_names)[0]
1628 # Now fetch 'C', so we have a couple pack files to join
1629 repo_not_stacked.fetch(b_source.repository, 'C')
1630 rev_c_pack_names = set(repo_not_stacked._pack_collection.names())
1631 rev_c_pack_names = rev_c_pack_names.difference(new_pack_names)
1632 self.assertEqual(1, len(rev_c_pack_names))
1633 rev_c_pack_name = list(rev_c_pack_names)[0]
1634 return (repo_not_stacked, rev_a_pack_name, inv_a_pack_name,
1635 rev_c_pack_name)
1636
1637 def test_pack_with_distant_inventories(self):
1638 # See https://bugs.launchpad.net/bzr/+bug/437003
1639 # When repacking, it is possible to have an inventory in a different
1640 # pack file than the associated revision. An autopack can then come
1641 # along, and miss that inventory, and complain.
1642 (repo, rev_a_pack_name, inv_a_pack_name, rev_c_pack_name
1643 ) = self.make_branch_with_disjoint_inventory_and_revision()
1644 a_pack = repo._pack_collection.get_pack_by_name(rev_a_pack_name)
1645 c_pack = repo._pack_collection.get_pack_by_name(rev_c_pack_name)
1646 packer = groupcompress_repo.GCCHKPacker(repo._pack_collection,
1647 [a_pack, c_pack], '.test-pack')
1648 # This would raise ValueError in bug #437003, but should not raise an
1649 # error once fixed.
1650 packer.pack()
1651
1652 def test_pack_with_missing_inventory(self):
1653 # Similar to test_pack_with_missing_inventory, but this time, we force
1654 # the A inventory to actually be gone from the repository.
1655 (repo, rev_a_pack_name, inv_a_pack_name, rev_c_pack_name
1656 ) = self.make_branch_with_disjoint_inventory_and_revision()
1657 inv_a_pack = repo._pack_collection.get_pack_by_name(inv_a_pack_name)
1658 repo._pack_collection._remove_pack_from_memory(inv_a_pack)
1659 packer = groupcompress_repo.GCCHKPacker(repo._pack_collection,
1660 repo._pack_collection.all_packs(), '.test-pack')
1661 e = self.assertRaises(ValueError, packer.pack)
1662 packer.new_pack.abort()
1663 self.assertContainsRe(str(e),
1664 r"We are missing inventories for revisions: .*'A'")
1665
1666
1567class TestCrossFormatPacks(TestCaseWithTransport):1667class TestCrossFormatPacks(TestCaseWithTransport):
15681668
1569 def log_pack(self, hint=None):1669 def log_pack(self, hint=None):

Subscribers

People subscribed via source and target branches