Merge lp:~jameinel/bzr/repack-missing-inventories-437003 into lp:bzr/2.0

Proposed by John A Meinel
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 4769
Proposed branch: lp:~jameinel/bzr/repack-missing-inventories-437003
Merge into: lp:bzr/2.0
Diff against target: 163 lines (+118/-5)
3 files modified
NEWS (+4/-0)
bzrlib/repofmt/groupcompress_repo.py (+13/-4)
bzrlib/tests/test_repository.py (+101/-1)
To merge this branch: bzr merge lp:~jameinel/bzr/repack-missing-inventories-437003
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+52544@code.launchpad.net

Commit message

Fix bug #437003. Don't abort an auto-pack if an inventory is in the repository, just not in the subset of packs we are handling.

Description of the change

This change was done against the 2.0 branch, so I'm proposing it first there, and we can 'merge up'. Though I'll be happy to skip and just propose the merge against newer branches if we prefer. (I'm not 100% sure how we decided the NEWS entries should go.)

Anyway, the proposed fix is pretty simple, and probably applicable all the way through the stack. When GCCHKPacker sees a missing inventory, it falls back to self._pack_collection.repo.inventories._index, to see if the missing inventory isn't in the repository at all. Note that I don't use self._pack_collection.repo.inventories because that would include fallbacks.

The test is pretty ugly. It involves a lot of poking at the exact state of the repository to set up the exact condition. I do a little more poking to then trigger a real failure (remove the pack file which contains the A inventory.)

The test also isn't a permutation test. It might apply vs most Pack based repositories, but I can't guarantee it.

I've also proven that the fix applies to bzr 2.4 without a problem (see lp:///~jameinel/bzr/2.4-repack-missing-inventories-437003). So we should be able to land it on any release we want. The only thing that has problems is the NEWS entries.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

77 + def make_branch_with_disjoint_inventory_and_revision(self):

This test is quite hard to understand (because of the bug it's addressing, not the way you wrote it), I wouldn't mind more comments on the individual steps and the associated assertions. (On a second read, I got it, so you may ignore this remark).

113 + else:
114 + self.fail('Could not find pack containing A\'s inventory')

a comment or a clearer message that THIS-SHOULD-NEVER-HAPPEN will do ;)

136 + (repo, rev_a_pack_name, inv_a_pack_name, rev_c_pack_name
137 + ) = self.make_branch_with_disjoint_inventory_and_revision()

Nice indentation !

Overall: nice TDD there ;)

I'm ok to land this as is.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-12-16 09:01:41 +0000
3+++ NEWS 2011-03-08 12:02:22 +0000
4@@ -28,6 +28,10 @@
5 * Avoid UnicodeDecodeError in ``bzr add`` with multiple files under a non-ascii
6 path on windows from symlink support addition. (Martin [gz], #686611)
7
8+* Avoid spurious ValueErrors when autopacking a subset of the repository,
9+ and seeing a revision without its related inventory
10+ (John Arbash Meinel, #437003)
11+
12 Improvements
13 ************
14
15
16=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
17--- bzrlib/repofmt/groupcompress_repo.py 2010-01-21 16:59:21 +0000
18+++ bzrlib/repofmt/groupcompress_repo.py 2011-03-08 12:02:22 +0000
19@@ -1,4 +1,4 @@
20-# Copyright (C) 2008, 2009, 2010 Canonical Ltd
21+# Copyright (C) 2008-2011 Canonical Ltd
22 #
23 # This program is free software; you can redistribute it and/or modify
24 # it under the terms of the GNU General Public License as published by
25@@ -421,9 +421,18 @@
26 inventory_keys = source_vf.keys()
27 missing_inventories = set(self.revision_keys).difference(inventory_keys)
28 if missing_inventories:
29- missing_inventories = sorted(missing_inventories)
30- raise ValueError('We are missing inventories for revisions: %s'
31- % (missing_inventories,))
32+ # Go back to the original repo, to see if these are really missing
33+ # https://bugs.launchpad.net/bzr/+bug/437003
34+ # If we are packing a subset of the repo, it is fine to just have
35+ # the data in another Pack file, which is not included in this pack
36+ # operation.
37+ inv_index = self._pack_collection.repo.inventories._index
38+ pmap = inv_index.get_parent_map(missing_inventories)
39+ really_missing = missing_inventories.difference(pmap)
40+ if really_missing:
41+ missing_inventories = sorted(really_missing)
42+ raise ValueError('We are missing inventories for revisions: %s'
43+ % (missing_inventories,))
44 self._copy_stream(source_vf, target_vf, inventory_keys,
45 'inventories', self._get_filtered_inv_stream, 2)
46
47
48=== modified file 'bzrlib/tests/test_repository.py'
49--- bzrlib/tests/test_repository.py 2010-01-21 19:24:26 +0000
50+++ bzrlib/tests/test_repository.py 2011-03-08 12:02:22 +0000
51@@ -1,4 +1,4 @@
52-# Copyright (C) 2006-2010 Canonical Ltd
53+# Copyright (C) 2006-2011 Canonical Ltd
54 #
55 # This program is free software; you can redistribute it and/or modify
56 # it under the terms of the GNU General Public License as published by
57@@ -1564,6 +1564,106 @@
58 self.assertTrue(new_pack.signature_index._optimize_for_size)
59
60
61+class TestGCCHKPacker(TestCaseWithTransport):
62+
63+ def make_abc_branch(self):
64+ builder = self.make_branch_builder('source')
65+ builder.start_series()
66+ builder.build_snapshot('A', None, [
67+ ('add', ('', 'root-id', 'directory', None)),
68+ ('add', ('file', 'file-id', 'file', 'content\n')),
69+ ])
70+ builder.build_snapshot('B', ['A'], [
71+ ('add', ('dir', 'dir-id', 'directory', None))])
72+ builder.build_snapshot('C', ['B'], [
73+ ('modify', ('file-id', 'new content\n'))])
74+ builder.finish_series()
75+ return builder.get_branch()
76+
77+ def make_branch_with_disjoint_inventory_and_revision(self):
78+ """a repo with separate packs for a revisions Revision and Inventory.
79+
80+ There will be one pack file that holds the Revision content, and one
81+ for the Inventory content.
82+
83+ :return: (repository,
84+ pack_name_with_rev_A_Revision,
85+ pack_name_with_rev_A_Inventory,
86+ pack_name_with_rev_C_content)
87+ """
88+ b_source = self.make_abc_branch()
89+ b_base = b_source.bzrdir.sprout('base', revision_id='A').open_branch()
90+ b_stacked = b_base.bzrdir.sprout('stacked', stacked=True).open_branch()
91+ b_stacked.lock_write()
92+ self.addCleanup(b_stacked.unlock)
93+ b_stacked.fetch(b_source, 'B')
94+ # Now re-open the stacked repo directly (no fallbacks) so that we can
95+ # fill in the A rev.
96+ repo_not_stacked = b_stacked.bzrdir.open_repository()
97+ repo_not_stacked.lock_write()
98+ self.addCleanup(repo_not_stacked.unlock)
99+ # Now we should have a pack file with A's inventory, but not its
100+ # Revision
101+ self.assertEqual([('A',), ('B',)],
102+ sorted(repo_not_stacked.inventories.keys()))
103+ self.assertEqual([('B',)],
104+ sorted(repo_not_stacked.revisions.keys()))
105+ stacked_pack_names = repo_not_stacked._pack_collection.names()
106+ # We have a couple names here, figure out which has A's inventory
107+ for name in stacked_pack_names:
108+ pack = repo_not_stacked._pack_collection.get_pack_by_name(name)
109+ keys = [n[1] for n in pack.inventory_index.iter_all_entries()]
110+ if ('A',) in keys:
111+ inv_a_pack_name = name
112+ break
113+ else:
114+ self.fail('Could not find pack containing A\'s inventory')
115+ repo_not_stacked.fetch(b_source.repository, 'A')
116+ self.assertEqual([('A',), ('B',)],
117+ sorted(repo_not_stacked.revisions.keys()))
118+ new_pack_names = set(repo_not_stacked._pack_collection.names())
119+ rev_a_pack_names = new_pack_names.difference(stacked_pack_names)
120+ self.assertEqual(1, len(rev_a_pack_names))
121+ rev_a_pack_name = list(rev_a_pack_names)[0]
122+ # Now fetch 'C', so we have a couple pack files to join
123+ repo_not_stacked.fetch(b_source.repository, 'C')
124+ rev_c_pack_names = set(repo_not_stacked._pack_collection.names())
125+ rev_c_pack_names = rev_c_pack_names.difference(new_pack_names)
126+ self.assertEqual(1, len(rev_c_pack_names))
127+ rev_c_pack_name = list(rev_c_pack_names)[0]
128+ return (repo_not_stacked, rev_a_pack_name, inv_a_pack_name,
129+ rev_c_pack_name)
130+
131+ def test_pack_with_distant_inventories(self):
132+ # See https://bugs.launchpad.net/bzr/+bug/437003
133+ # When repacking, it is possible to have an inventory in a different
134+ # pack file than the associated revision. An autopack can then come
135+ # along, and miss that inventory, and complain.
136+ (repo, rev_a_pack_name, inv_a_pack_name, rev_c_pack_name
137+ ) = self.make_branch_with_disjoint_inventory_and_revision()
138+ a_pack = repo._pack_collection.get_pack_by_name(rev_a_pack_name)
139+ c_pack = repo._pack_collection.get_pack_by_name(rev_c_pack_name)
140+ packer = groupcompress_repo.GCCHKPacker(repo._pack_collection,
141+ [a_pack, c_pack], '.test-pack')
142+ # This would raise ValueError in bug #437003, but should not raise an
143+ # error once fixed.
144+ packer.pack()
145+
146+ def test_pack_with_missing_inventory(self):
147+ # Similar to test_pack_with_missing_inventory, but this time, we force
148+ # the A inventory to actually be gone from the repository.
149+ (repo, rev_a_pack_name, inv_a_pack_name, rev_c_pack_name
150+ ) = self.make_branch_with_disjoint_inventory_and_revision()
151+ inv_a_pack = repo._pack_collection.get_pack_by_name(inv_a_pack_name)
152+ repo._pack_collection._remove_pack_from_memory(inv_a_pack)
153+ packer = groupcompress_repo.GCCHKPacker(repo._pack_collection,
154+ repo._pack_collection.all_packs(), '.test-pack')
155+ e = self.assertRaises(ValueError, packer.pack)
156+ packer.new_pack.abort()
157+ self.assertContainsRe(str(e),
158+ r"We are missing inventories for revisions: .*'A'")
159+
160+
161 class TestCrossFormatPacks(TestCaseWithTransport):
162
163 def log_pack(self, hint=None):

Subscribers

People subscribed via source and target branches