Merge lp:~jameinel/bzr/1.19-bug-402778 into lp:~bzr/bzr/trunk-old
- 1.19-bug-402778
- Merge into trunk-old
Status: | Merged | ||||||||
---|---|---|---|---|---|---|---|---|---|
Merged at revision: | not available | ||||||||
Proposed branch: | lp:~jameinel/bzr/1.19-bug-402778 | ||||||||
Merge into: | lp:~bzr/bzr/trunk-old | ||||||||
Diff against target: | 351 lines | ||||||||
To merge this branch: | bzr merge lp:~jameinel/bzr/1.19-bug-402778 | ||||||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Robert Collins (community) | Approve | ||
Review via email: mp+10053@code.launchpad.net |
This proposal supersedes a proposal from 2009-08-11.
Commit message
Description of the change
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal | # |
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
On Tue, 2009-08-11 at 17:35 +0000, John A Meinel wrote:
> @@ -3816,10 +3818,13 @@
> parent_
> parent_
> parent_map = self.source.
> - for parent_tree in self.source.
> - basis_id, delta = self._get_
> + # we iterate over parent_map and not parent_ids because we don't
> + # want to try copying any revision which is a ghost
> + for parent_tree in self.source.
This might be slightly cheaper as
+ for parent_tree in self.source.
I'd prefer to see separate tests for ghosts and inventory validity.
so tweak:, however that is expressed...
review +1
John A Meinel (jameinel) wrote : | # |
This is a follow up from Robert's review.
(original)
1) This fixes InterDifferingS
2) Also fixes IDS to ignore ghost parents when filling in parent inventories.
(updated)
1) Changes test_pack_
2) Adds a conversion to 2a format to the inter-repository tests. Which shows that even with the original fix, things were still broken for 2a formats.
3) Adds a direct test that "Repository.pack()" preserves all inventory texts, even when we don't have a revision associated with it.
I'm pretty sure this closes 2+ bugs marked as critical for 2.0.
Robert Collins (lifeless) wrote : | # |
review: +1
The use of vf.keys() in the gc pack code - might want to check that
reconcile() doesn't use revision selectors - as in that case you will
copy too many inventories [if all tests are passing this probably isn't
the case]
-Rob
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> Review: Approve
> review: +1
>
> The use of vf.keys() in the gc pack code - might want to check that
> reconcile() doesn't use revision selectors - as in that case you will
> copy too many inventories [if all tests are passing this probably isn't
> the case]
>
> -Rob
>
Reconcile has its own implementation of _copy_inventory
def _copy_inventory
source_vf, target_vf = self._build_
if source_vf.keys() != self.revision_keys:
Which is saying "I'm going to prune down to the minimum, and if the
source's number of keys is different than the minimum, then things have
changed."
This means that reconcile on stacked branches is probably slightly wrong.
But anyway, this change obviously doesn't effect reconcile :).
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkq
6woAoI/
=T66k
-----END PGP SIGNATURE-----
Robert Collins (lifeless) wrote : | # |
On Wed, 2009-08-12 at 20:48 +0000, John A Meinel wrote:
> Which is saying "I'm going to prune down to the minimum, and if the
> source's number of keys is different than the minimum, then things have
> changed."
>
> This means that reconcile on stacked branches is probably slightly wrong.
>
> But anyway, this change obviously doesn't effect reconcile :).
Ok. So reconcile needs to keep the parent inventories too. I smell
another branch coming up :).
-Rob
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> On Wed, 2009-08-12 at 20:48 +0000, John A Meinel wrote:
>
>> Which is saying "I'm going to prune down to the minimum, and if the
>> source's number of keys is different than the minimum, then things have
>> changed."
>>
>> This means that reconcile on stacked branches is probably slightly wrong.
>>
>> But anyway, this change obviously doesn't effect reconcile :).
>
> Ok. So reconcile needs to keep the parent inventories too. I smell
> another branch coming up :).
>
> -Rob
>
Note that in that case --1.9 format repos are probably also vulnerable,
as I'm pretty sure Packer uses "self._
doesn't try to also include parent inventories.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkq
jMcAnAvAjV202Vi
=ExZL
-----END PGP SIGNATURE-----
Robert Collins (lifeless) wrote : | # |
On Wed, 2009-08-12 at 21:15 +0000, John A Meinel wrote:
>
> > Ok. So reconcile needs to keep the parent inventories too. I smell
> > another branch coming up :).
> >
> > -Rob
> >
>
> Note that in that case --1.9 format repos are probably also
> vulnerable,
> as I'm pretty sure Packer uses "self._
> doesn't try to also include parent inventories.
exactly :(. Want me to file the bug?
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-08-12 02:59:14 +0000 |
3 | +++ NEWS 2009-08-12 21:35:09 +0000 |
4 | @@ -27,6 +27,11 @@ |
5 | * Further tweaks to handling of ``bzr add`` messages about ignored files. |
6 | (Jason Spashett, #76616) |
7 | |
8 | +* Properly handle fetching into a stacked branch while converting the |
9 | + data, especially when there are also ghosts. The code was filling in |
10 | + parent inventories incorrectly, and also not handling when one of the |
11 | + parents was a ghost. (John Arbash Meinel, #402778, #412198) |
12 | + |
13 | Improvements |
14 | ************ |
15 | |
16 | |
17 | === modified file 'bzrlib/repofmt/groupcompress_repo.py' |
18 | --- bzrlib/repofmt/groupcompress_repo.py 2009-08-04 04:36:34 +0000 |
19 | +++ bzrlib/repofmt/groupcompress_repo.py 2009-08-12 21:35:09 +0000 |
20 | @@ -410,7 +410,18 @@ |
21 | |
22 | def _copy_inventory_texts(self): |
23 | source_vf, target_vf = self._build_vfs('inventory', True, True) |
24 | - self._copy_stream(source_vf, target_vf, self.revision_keys, |
25 | + # It is not sufficient to just use self.revision_keys, as stacked |
26 | + # repositories can have more inventories than they have revisions. |
27 | + # One alternative would be to do something with |
28 | + # get_parent_map(self.revision_keys), but that shouldn't be any faster |
29 | + # than this. |
30 | + inventory_keys = source_vf.keys() |
31 | + missing_inventories = set(self.revision_keys).difference(inventory_keys) |
32 | + if missing_inventories: |
33 | + missing_inventories = sorted(missing_inventories) |
34 | + raise ValueError('We are missing inventories for revisions: %s' |
35 | + % (missing_inventories,)) |
36 | + self._copy_stream(source_vf, target_vf, inventory_keys, |
37 | 'inventories', self._get_filtered_inv_stream, 2) |
38 | |
39 | def _copy_chk_texts(self): |
40 | @@ -1110,7 +1121,7 @@ |
41 | |
42 | class RepositoryFormat2a(RepositoryFormatCHK2): |
43 | """A CHK repository that uses the bencode revision serializer. |
44 | - |
45 | + |
46 | This is the same as RepositoryFormatCHK2 but with a public name. |
47 | """ |
48 | |
49 | |
50 | === modified file 'bzrlib/repository.py' |
51 | --- bzrlib/repository.py 2009-08-11 02:45:36 +0000 |
52 | +++ bzrlib/repository.py 2009-08-12 21:35:09 +0000 |
53 | @@ -3812,6 +3812,8 @@ |
54 | # for the new revisions that we are about to insert. We do this |
55 | # before adding the revisions so that no revision is added until |
56 | # all the inventories it may depend on are added. |
57 | + # Note that this is overzealous, as we may have fetched these in an |
58 | + # earlier batch. |
59 | parent_ids = set() |
60 | revision_ids = set() |
61 | for revision in pending_revisions: |
62 | @@ -3820,10 +3822,13 @@ |
63 | parent_ids.difference_update(revision_ids) |
64 | parent_ids.discard(_mod_revision.NULL_REVISION) |
65 | parent_map = self.source.get_parent_map(parent_ids) |
66 | - for parent_tree in self.source.revision_trees(parent_ids): |
67 | - basis_id, delta = self._get_delta_for_revision(tree, parent_ids, basis_id, cache) |
68 | + # we iterate over parent_map and not parent_ids because we don't |
69 | + # want to try copying any revision which is a ghost |
70 | + for parent_tree in self.source.revision_trees(parent_map): |
71 | current_revision_id = parent_tree.get_revision_id() |
72 | parents_parents = parent_map[current_revision_id] |
73 | + basis_id, delta = self._get_delta_for_revision(parent_tree, |
74 | + parents_parents, basis_id, cache) |
75 | self.target.add_inventory_by_delta( |
76 | basis_id, delta, current_revision_id, parents_parents) |
77 | # insert signatures and revisions |
78 | |
79 | === modified file 'bzrlib/tests/__init__.py' |
80 | --- bzrlib/tests/__init__.py 2009-08-04 11:40:59 +0000 |
81 | +++ bzrlib/tests/__init__.py 2009-08-12 21:35:09 +0000 |
82 | @@ -3386,6 +3386,7 @@ |
83 | 'bzrlib.tests.per_lock', |
84 | 'bzrlib.tests.per_transport', |
85 | 'bzrlib.tests.per_tree', |
86 | + 'bzrlib.tests.per_pack_repository', |
87 | 'bzrlib.tests.per_repository', |
88 | 'bzrlib.tests.per_repository_chk', |
89 | 'bzrlib.tests.per_repository_reference', |
90 | @@ -3480,7 +3481,6 @@ |
91 | 'bzrlib.tests.test_osutils', |
92 | 'bzrlib.tests.test_osutils_encodings', |
93 | 'bzrlib.tests.test_pack', |
94 | - 'bzrlib.tests.test_pack_repository', |
95 | 'bzrlib.tests.test_patch', |
96 | 'bzrlib.tests.test_patches', |
97 | 'bzrlib.tests.test_permissions', |
98 | |
99 | === modified file 'bzrlib/tests/per_interrepository/__init__.py' |
100 | --- bzrlib/tests/per_interrepository/__init__.py 2009-07-10 06:46:10 +0000 |
101 | +++ bzrlib/tests/per_interrepository/__init__.py 2009-08-12 21:35:09 +0000 |
102 | @@ -68,7 +68,12 @@ |
103 | |
104 | def default_test_list(): |
105 | """Generate the default list of interrepo permutations to test.""" |
106 | - from bzrlib.repofmt import knitrepo, pack_repo, weaverepo |
107 | + from bzrlib.repofmt import ( |
108 | + groupcompress_repo, |
109 | + knitrepo, |
110 | + pack_repo, |
111 | + weaverepo, |
112 | + ) |
113 | result = [] |
114 | # test the default InterRepository between format 6 and the current |
115 | # default format. |
116 | @@ -111,6 +116,9 @@ |
117 | result.append((InterDifferingSerializer, |
118 | pack_repo.RepositoryFormatKnitPack1(), |
119 | pack_repo.RepositoryFormatKnitPack6RichRoot())) |
120 | + result.append((InterDifferingSerializer, |
121 | + pack_repo.RepositoryFormatKnitPack6RichRoot(), |
122 | + groupcompress_repo.RepositoryFormat2a())) |
123 | return result |
124 | |
125 | |
126 | |
127 | === modified file 'bzrlib/tests/per_interrepository/test_fetch.py' |
128 | --- bzrlib/tests/per_interrepository/test_fetch.py 2009-07-10 06:46:10 +0000 |
129 | +++ bzrlib/tests/per_interrepository/test_fetch.py 2009-08-12 21:35:09 +0000 |
130 | @@ -132,17 +132,23 @@ |
131 | altered by all revisions it contains, which means that it needs both |
132 | the inventory for any revision it has, and the inventories of all that |
133 | revision's parents. |
134 | + |
135 | + However, we should also skip any revisions which are ghosts in the |
136 | + parents. |
137 | """ |
138 | - to_repo = self.make_to_repository('to') |
139 | - if not to_repo._format.supports_external_lookups: |
140 | + if not self.repository_format_to.supports_external_lookups: |
141 | raise TestNotApplicable("Need stacking support in the target.") |
142 | builder = self.make_branch_builder('branch') |
143 | builder.start_series() |
144 | builder.build_snapshot('base', None, [ |
145 | - ('add', ('', 'root-id', 'directory', ''))]) |
146 | - builder.build_snapshot('left', ['base'], []) |
147 | - builder.build_snapshot('right', ['base'], []) |
148 | - builder.build_snapshot('merge', ['left', 'right'], []) |
149 | + ('add', ('', 'root-id', 'directory', '')), |
150 | + ('add', ('file', 'file-id', 'file', 'content\n'))]) |
151 | + builder.build_snapshot('left', ['base'], [ |
152 | + ('modify', ('file-id', 'left content\n'))]) |
153 | + builder.build_snapshot('right', ['base'], [ |
154 | + ('modify', ('file-id', 'right content\n'))]) |
155 | + builder.build_snapshot('merge', ['left', 'right'], [ |
156 | + ('modify', ('file-id', 'left and right content\n'))]) |
157 | builder.finish_series() |
158 | branch = builder.get_branch() |
159 | repo = self.make_to_repository('trunk') |
160 | @@ -161,6 +167,57 @@ |
161 | self.assertEqual( |
162 | set([('left',), ('right',), ('merge',)]), |
163 | unstacked_repo.inventories.keys()) |
164 | + # And the basis inventories have been copied correctly |
165 | + trunk.lock_read() |
166 | + self.addCleanup(trunk.unlock) |
167 | + left_tree, right_tree = trunk.repository.revision_trees( |
168 | + ['left', 'right']) |
169 | + stacked_branch.lock_read() |
170 | + self.addCleanup(stacked_branch.unlock) |
171 | + (stacked_left_tree, |
172 | + stacked_right_tree) = stacked_branch.repository.revision_trees( |
173 | + ['left', 'right']) |
174 | + self.assertEqual(left_tree.inventory, stacked_left_tree.inventory) |
175 | + self.assertEqual(right_tree.inventory, stacked_right_tree.inventory) |
176 | + |
177 | + def test_fetch_across_stacking_boundary_ignores_ghost(self): |
178 | + if not self.repository_format_to.supports_external_lookups: |
179 | + raise TestNotApplicable("Need stacking support in the target.") |
180 | + to_repo = self.make_to_repository('to') |
181 | + builder = self.make_branch_builder('branch') |
182 | + builder.start_series() |
183 | + builder.build_snapshot('base', None, [ |
184 | + ('add', ('', 'root-id', 'directory', '')), |
185 | + ('add', ('file', 'file-id', 'file', 'content\n'))]) |
186 | + builder.build_snapshot('second', ['base'], [ |
187 | + ('modify', ('file-id', 'second content\n'))]) |
188 | + builder.build_snapshot('third', ['second', 'ghost'], [ |
189 | + ('modify', ('file-id', 'third content\n'))]) |
190 | + builder.finish_series() |
191 | + branch = builder.get_branch() |
192 | + repo = self.make_to_repository('trunk') |
193 | + trunk = repo.bzrdir.create_branch() |
194 | + trunk.repository.fetch(branch.repository, 'second') |
195 | + repo = self.make_to_repository('stacked') |
196 | + stacked_branch = repo.bzrdir.create_branch() |
197 | + stacked_branch.set_stacked_on_url(trunk.base) |
198 | + stacked_branch.repository.fetch(branch.repository, 'third') |
199 | + unstacked_repo = stacked_branch.bzrdir.open_repository() |
200 | + unstacked_repo.lock_read() |
201 | + self.addCleanup(unstacked_repo.unlock) |
202 | + self.assertFalse(unstacked_repo.has_revision('second')) |
203 | + self.assertFalse(unstacked_repo.has_revision('ghost')) |
204 | + self.assertEqual( |
205 | + set([('second',), ('third',)]), |
206 | + unstacked_repo.inventories.keys()) |
207 | + # And the basis inventories have been copied correctly |
208 | + trunk.lock_read() |
209 | + self.addCleanup(trunk.unlock) |
210 | + second_tree = trunk.repository.revision_tree('second') |
211 | + stacked_branch.lock_read() |
212 | + self.addCleanup(stacked_branch.unlock) |
213 | + stacked_second_tree = stacked_branch.repository.revision_tree('second') |
214 | + self.assertEqual(second_tree.inventory, stacked_second_tree.inventory) |
215 | |
216 | def test_fetch_missing_basis_text(self): |
217 | """If fetching a delta, we should die if a basis is not present.""" |
218 | @@ -276,8 +333,12 @@ |
219 | to_repo = self.make_to_repository('to') |
220 | to_repo.fetch(from_tree.branch.repository) |
221 | recorded_inv_sha1 = to_repo.get_inventory_sha1('foo-id') |
222 | - xml = to_repo.get_inventory_xml('foo-id') |
223 | - computed_inv_sha1 = osutils.sha_string(xml) |
224 | + to_repo.lock_read() |
225 | + self.addCleanup(to_repo.unlock) |
226 | + stream = to_repo.inventories.get_record_stream([('foo-id',)], |
227 | + 'unordered', True) |
228 | + bytes = stream.next().get_bytes_as('fulltext') |
229 | + computed_inv_sha1 = osutils.sha_string(bytes) |
230 | self.assertEqual(computed_inv_sha1, recorded_inv_sha1) |
231 | |
232 | |
233 | |
234 | === renamed file 'bzrlib/tests/test_pack_repository.py' => 'bzrlib/tests/per_pack_repository.py' |
235 | --- bzrlib/tests/test_pack_repository.py 2009-08-11 05:26:57 +0000 |
236 | +++ bzrlib/tests/per_pack_repository.py 2009-08-12 21:35:09 +0000 |
237 | @@ -1,4 +1,4 @@ |
238 | -# Copyright (C) 2008 Canonical Ltd |
239 | +# Copyright (C) 2008, 2009 Canonical Ltd |
240 | # |
241 | # This program is free software; you can redistribute it and/or modify |
242 | # it under the terms of the GNU General Public License as published by |
243 | @@ -42,7 +42,7 @@ |
244 | pack_repo, |
245 | groupcompress_repo, |
246 | ) |
247 | -from bzrlib.repofmt.groupcompress_repo import RepositoryFormatCHK1 |
248 | +from bzrlib.repofmt.groupcompress_repo import RepositoryFormat2a |
249 | from bzrlib.smart import ( |
250 | client, |
251 | server, |
252 | @@ -84,7 +84,7 @@ |
253 | """Packs reuse deltas.""" |
254 | format = self.get_format() |
255 | repo = self.make_repository('.', format=format) |
256 | - if isinstance(format.repository_format, RepositoryFormatCHK1): |
257 | + if isinstance(format.repository_format, RepositoryFormat2a): |
258 | # TODO: This is currently a workaround. CHK format repositories |
259 | # ignore the 'deltas' flag, but during conversions, we can't |
260 | # do unordered delta fetches. Remove this clause once we |
261 | @@ -295,6 +295,41 @@ |
262 | self.assertEqual(1, len(list(index.iter_all_entries()))) |
263 | self.assertEqual(2, len(tree.branch.repository.all_revision_ids())) |
264 | |
265 | + def test_pack_preserves_all_inventories(self): |
266 | + # This is related to bug: |
267 | + # https://bugs.launchpad.net/bzr/+bug/412198 |
268 | + # Stacked repositories need to keep the inventory for parents, even |
269 | + # after a pack operation. However, it is harder to test that, then just |
270 | + # test that all inventory texts are preserved. |
271 | + format = self.get_format() |
272 | + builder = self.make_branch_builder('source', format=format) |
273 | + builder.start_series() |
274 | + builder.build_snapshot('A-id', None, [ |
275 | + ('add', ('', 'root-id', 'directory', None))]) |
276 | + builder.build_snapshot('B-id', None, [ |
277 | + ('add', ('file', 'file-id', 'file', 'B content\n'))]) |
278 | + builder.build_snapshot('C-id', None, [ |
279 | + ('modify', ('file-id', 'C content\n'))]) |
280 | + builder.finish_series() |
281 | + b = builder.get_branch() |
282 | + b.lock_read() |
283 | + self.addCleanup(b.unlock) |
284 | + repo = self.make_repository('repo', shared=True, format=format) |
285 | + repo.lock_write() |
286 | + self.addCleanup(repo.unlock) |
287 | + repo.fetch(b.repository, revision_id='B-id') |
288 | + inv = b.repository.iter_inventories(['C-id']).next() |
289 | + repo.start_write_group() |
290 | + repo.add_inventory('C-id', inv, ['B-id']) |
291 | + repo.commit_write_group() |
292 | + self.assertEqual([('A-id',), ('B-id',), ('C-id',)], |
293 | + sorted(repo.inventories.keys())) |
294 | + repo.pack() |
295 | + self.assertEqual([('A-id',), ('B-id',), ('C-id',)], |
296 | + sorted(repo.inventories.keys())) |
297 | + # Content should be preserved as well |
298 | + self.assertEqual(inv, repo.iter_inventories(['C-id']).next()) |
299 | + |
300 | def test_pack_layout(self): |
301 | # Test that the ordering of revisions in pack repositories is |
302 | # tip->ancestor |
303 | @@ -311,7 +346,7 @@ |
304 | # revision access tends to be tip->ancestor, so ordering that way on |
305 | # disk is a good idea. |
306 | for _1, key, val, refs in pack.revision_index.iter_all_entries(): |
307 | - if type(format.repository_format) is RepositoryFormatCHK1: |
308 | + if type(format.repository_format) is RepositoryFormat2a: |
309 | # group_start, group_len, internal_start, internal_len |
310 | pos = map(int, val.split()) |
311 | else: |
312 | @@ -589,7 +624,7 @@ |
313 | |
314 | def make_write_ready_repo(self): |
315 | format = self.get_format() |
316 | - if isinstance(format.repository_format, RepositoryFormatCHK1): |
317 | + if isinstance(format.repository_format, RepositoryFormat2a): |
318 | raise TestNotApplicable("No missing compression parents") |
319 | repo = self.make_repository('.', format=format) |
320 | repo.lock_write() |
321 | @@ -808,7 +843,7 @@ |
322 | matching_format_name = 'pack-0.92-subtree' |
323 | else: |
324 | if repo._format.supports_chks: |
325 | - matching_format_name = 'development6-rich-root' |
326 | + matching_format_name = '2a' |
327 | else: |
328 | matching_format_name = 'rich-root-pack' |
329 | mismatching_format_name = 'pack-0.92' |
330 | @@ -841,7 +876,7 @@ |
331 | else: |
332 | if repo.supports_rich_root(): |
333 | if repo._format.supports_chks: |
334 | - matching_format_name = 'development6-rich-root' |
335 | + matching_format_name = '2a' |
336 | else: |
337 | matching_format_name = 'rich-root-pack' |
338 | mismatching_format_name = 'pack-0.92-subtree' |
339 | @@ -1062,9 +1097,9 @@ |
340 | "(bzr 1.9)\n", |
341 | format_supports_external_lookups=True, |
342 | index_class=BTreeGraphIndex), |
343 | - dict(format_name='development6-rich-root', |
344 | - format_string='Bazaar development format - group compression ' |
345 | - 'and chk inventory (needs bzr.dev from 1.14)\n', |
346 | + dict(format_name='2a', |
347 | + format_string="Bazaar repository format 2a " |
348 | + "(needs bzr 1.16 or later)\n", |
349 | format_supports_external_lookups=True, |
350 | index_class=BTreeGraphIndex), |
351 | ] |
This fixes 2 bugs found as part of bug #402778
InterDifferingS erializer was failing when the target was a stacked branch. It had 2 primary bugs
1) The inner loop was using the wrong variable, which meant that it was generating a delta for one revision, and then inserting it as the delta for a different revision.
2) It was trying to fill in parent inventories for all parents, even if some parents where ghosts in the source.
I updated the test that Andrew added. I had to add content to the revisions, because otherwise you can't tell whether the added inventories are valid. And I added a ghost parent, to make sure that we don't try to fill in ghosts that we don't have.