Merge lp:~jameinel/bzr/1.19-bug-402778 into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.19-bug-402778
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 351 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.19-bug-402778
Reviewer: Robert Collins (community)
Review status: Approve
Review via email: mp+10053@code.launchpad.net

This proposal supersedes a proposal from 2009-08-11.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

This fixes 2 bugs found as part of bug #402778

InterDifferingSerializer was failing when the target was a stacked branch. It had two primary bugs:

1) The inner loop was using the wrong variable, which meant that it was generating a delta for one revision, and then inserting it as the delta for a different revision.

2) It was trying to fill in parent inventories for all parents, even if some parents were ghosts in the source.

I updated the test that Andrew added. I had to add content to the revisions, because otherwise you can't tell whether the added inventories are valid. And I added a ghost parent, to make sure that we don't try to fill in ghosts that we don't have.
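The two fixes can be sketched with a toy version of the parent-selection step (the function and revision ids here are illustrative, not bzrlib API):

```python
def split_parents(parent_ids, parent_map):
    """Separate copyable parents from ghosts.

    parent_map is what source.get_parent_map(parent_ids) returns: it only
    contains parents the source actually has, so iterating it instead of
    parent_ids silently drops ghosts (fix 2).  Fix 1 is then to build each
    delta from that parent's own tree rather than a stale loop variable.
    """
    copyable = sorted(parent_map)                       # inventories get filled
    ghosts = sorted(set(parent_ids) - set(parent_map))  # skipped entirely
    return copyable, ghosts

print(split_parents({'left', 'right', 'ghost'},
                    {'left': ('base',), 'right': ('base',)}))
# -> (['left', 'right'], ['ghost'])
```

The real loop (see the repository.py hunk in the preview diff) iterates `parent_map` and computes each delta from `parent_tree` itself, addressing both bugs at once.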

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Tue, 2009-08-11 at 17:35 +0000, John A Meinel wrote:
> @@ -3816,10 +3818,13 @@
> parent_ids.difference_update(revision_ids)
> parent_ids.discard(_mod_revision.NULL_REVISION)
> parent_map = self.source.get_parent_map(parent_ids)
> - for parent_tree in self.source.revision_trees(parent_ids):
> - basis_id, delta = self._get_delta_for_revision(tree, parent_ids, basis_id, cache)
> + # we iterate over parent_map and not parent_ids because we don't
> + # want to try copying any revision which is a ghost
> + for parent_tree in self.source.revision_trees(parent_map.keys()):

This might be slightly cheaper as
+ for parent_tree in self.source.revision_trees(parent_map):
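The two spellings are equivalent because iterating a dict yields its keys; on Python 2 (which bzrlib targets) `.keys()` also materializes an intermediate list, which is where the small saving comes from:

```python
parent_map = {'left': ('base',), 'right': ('base',)}

# Iterating the dict directly yields the same keys that .keys() returns,
# without building an intermediate list on Python 2.
from_dict = sorted(parent_map)
from_keys = sorted(parent_map.keys())
print(from_dict == from_keys)  # -> True
```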

I'd prefer to see separate tests for ghosts and inventory validity.

So: tweak, however that is expressed...
 review +1

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

This is a follow up from Robert's review.

(original)
1) This fixes InterDifferingSerializer to properly transmit parent inventories for stacked branches (rather than passing the last-tree inventory and claiming it was the parent inventory).
2) Also fixes IDS to ignore ghost parents when filling in parent inventories.

(updated)
1) Changes test_pack_repository => per_pack_repository. Andrew mentioned this, I liked the change.

2) Adds a conversion to 2a format to the inter-repository tests, which shows that even with the original fix, things were still broken for 2a formats.

3) Adds a direct test that "Repository.pack()" preserves all inventory texts, even when we don't have a revision associated with them.

I'm pretty sure this closes 2+ bugs marked as critical for 2.0.

Revision history for this message
Robert Collins (lifeless) wrote :

 review: +1

The use of vf.keys() in the gc pack code - might want to check that
reconcile() doesn't use revision selectors - as in that case you will
copy too many inventories [if all tests are passing this probably isn't
the case]

-Rob

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

Robert Collins wrote:
> Review: Approve
> review: +1
>
> The use of vf.keys() in the gc pack code - might want to check that
> reconcile() doesn't use revision selectors - as in that case you will
> copy too many inventories [if all tests are passing this probably isn't
> the case]
>
> -Rob
>

Reconcile has its own implementation of _copy_inventory_texts:

    def _copy_inventory_texts(self):
        source_vf, target_vf = self._build_vfs('inventory', True, True)
        self._copy_stream(source_vf, target_vf, self.revision_keys,
                          'inventories', self._get_filtered_inv_stream, 2)
        if source_vf.keys() != self.revision_keys:
            self._data_changed = True

Which is saying "I'm going to prune down to the minimum, and if the
source's number of keys is different than the minimum, then things have
changed."

This means that reconcile on stacked branches is probably slightly wrong.
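With plain sets (hypothetical revision ids), the stacked case is easy to see: the inventory store also holds the parents' inventories, so the inequality fires even when nothing is actually wrong:

```python
# A stacked repository stores inventories for the parents of its local
# revisions, even though those parents' revisions live in the stacked-on
# repository.
revision_keys = {('merge',)}
inventory_keys = {('merge',), ('left',), ('right',)}  # parents included

# reconcile's check treats any extra inventory keys as "data changed",
# so on a stacked repository it always triggers.
data_changed = inventory_keys != revision_keys
print(data_changed)  # -> True
```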

But anyway, this change obviously doesn't affect reconcile :).

John
=:->

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-08-12 at 20:48 +0000, John A Meinel wrote:

> Which is saying "I'm going to prune down to the minimum, and if the
> source's number of keys is different than the minimum, then things have
> changed."
>
> This means that reconcile on stacked branches is probably slightly wrong.
>
> But anyway, this change obviously doesn't affect reconcile :).

Ok. So reconcile needs to keep the parent inventories too. I smell
another branch coming up :).

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

Robert Collins wrote:
> On Wed, 2009-08-12 at 20:48 +0000, John A Meinel wrote:
>
>> Which is saying "I'm going to prune down to the minimum, and if the
>> source's number of keys is different than the minimum, then things have
>> changed."
>>
>> This means that reconcile on stacked branches is probably slightly wrong.
>>
>> But anyway, this change obviously doesn't affect reconcile :).
>
> Ok. So reconcile needs to keep the parent inventories too. I smell
> another branch coming up :).
>
> -Rob
>

Note that in that case --1.9 format repos are probably also vulnerable,
as I'm pretty sure Packer uses "self._revision_keys" directly, and
doesn't try to also include parent inventories.
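A hedged sketch of what keeping the parent inventories might look like (the function and ids are illustrative, not the actual Packer API): include each revision's parents, then intersect with what is actually stored so ghosts drop out, mirroring the groupcompress fix in this proposal.

```python
def wanted_inventory_keys(revision_keys, parent_map, stored_inventories):
    """Revisions plus any of their parents whose inventories are stored.

    Illustrative only.  parent_map maps each revision key to its parent
    keys; intersecting with the stored set drops ghosts and parents we
    never had.
    """
    wanted = set(revision_keys)
    for key in revision_keys:
        wanted.update(parent_map.get(key, ()))
    return wanted & set(stored_inventories)

print(sorted(wanted_inventory_keys(
    {('C',)},
    {('C',): [('B',), ('ghost',)]},
    {('A',), ('B',), ('C',)})))
# -> [('B',), ('C',)]
```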

John
=:->


Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-08-12 at 21:15 +0000, John A Meinel wrote:
>
> > Ok. So reconcile needs to keep the parent inventories too. I smell
> > another branch coming up :).
> >
> > -Rob
> >
>
> Note that in that case --1.9 format repos are probably also
> vulnerable,
> as I'm pretty sure Packer uses "self._revision_keys" directly, and
> doesn't try to also include parent inventories.

exactly :(. Want me to file the bug?

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2009-08-12 02:59:14 +0000
3+++ NEWS 2009-08-12 21:35:09 +0000
4@@ -27,6 +27,11 @@
5 * Further tweaks to handling of ``bzr add`` messages about ignored files.
6 (Jason Spashett, #76616)
7
8+* Properly handle fetching into a stacked branch while converting the
9+ data, especially when there are also ghosts. The code was filling in
10+ parent inventories incorrectly, and also not handling when one of the
11+ parents was a ghost. (John Arbash Meinel, #402778, #412198)
12+
13 Improvements
14 ************
15
16
17=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
18--- bzrlib/repofmt/groupcompress_repo.py 2009-08-04 04:36:34 +0000
19+++ bzrlib/repofmt/groupcompress_repo.py 2009-08-12 21:35:09 +0000
20@@ -410,7 +410,18 @@
21
22 def _copy_inventory_texts(self):
23 source_vf, target_vf = self._build_vfs('inventory', True, True)
24- self._copy_stream(source_vf, target_vf, self.revision_keys,
25+ # It is not sufficient to just use self.revision_keys, as stacked
26+ # repositories can have more inventories than they have revisions.
27+ # One alternative would be to do something with
28+ # get_parent_map(self.revision_keys), but that shouldn't be any faster
29+ # than this.
30+ inventory_keys = source_vf.keys()
31+ missing_inventories = set(self.revision_keys).difference(inventory_keys)
32+ if missing_inventories:
33+ missing_inventories = sorted(missing_inventories)
34+ raise ValueError('We are missing inventories for revisions: %s'
35+ % (missing_inventories,))
36+ self._copy_stream(source_vf, target_vf, inventory_keys,
37 'inventories', self._get_filtered_inv_stream, 2)
38
39 def _copy_chk_texts(self):
40@@ -1110,7 +1121,7 @@
41
42 class RepositoryFormat2a(RepositoryFormatCHK2):
43 """A CHK repository that uses the bencode revision serializer.
44-
45+
46 This is the same as RepositoryFormatCHK2 but with a public name.
47 """
48
49
50=== modified file 'bzrlib/repository.py'
51--- bzrlib/repository.py 2009-08-11 02:45:36 +0000
52+++ bzrlib/repository.py 2009-08-12 21:35:09 +0000
53@@ -3812,6 +3812,8 @@
54 # for the new revisions that we are about to insert. We do this
55 # before adding the revisions so that no revision is added until
56 # all the inventories it may depend on are added.
57+ # Note that this is overzealous, as we may have fetched these in an
58+ # earlier batch.
59 parent_ids = set()
60 revision_ids = set()
61 for revision in pending_revisions:
62@@ -3820,10 +3822,13 @@
63 parent_ids.difference_update(revision_ids)
64 parent_ids.discard(_mod_revision.NULL_REVISION)
65 parent_map = self.source.get_parent_map(parent_ids)
66- for parent_tree in self.source.revision_trees(parent_ids):
67- basis_id, delta = self._get_delta_for_revision(tree, parent_ids, basis_id, cache)
68+ # we iterate over parent_map and not parent_ids because we don't
69+ # want to try copying any revision which is a ghost
70+ for parent_tree in self.source.revision_trees(parent_map):
71 current_revision_id = parent_tree.get_revision_id()
72 parents_parents = parent_map[current_revision_id]
73+ basis_id, delta = self._get_delta_for_revision(parent_tree,
74+ parents_parents, basis_id, cache)
75 self.target.add_inventory_by_delta(
76 basis_id, delta, current_revision_id, parents_parents)
77 # insert signatures and revisions
78
79=== modified file 'bzrlib/tests/__init__.py'
80--- bzrlib/tests/__init__.py 2009-08-04 11:40:59 +0000
81+++ bzrlib/tests/__init__.py 2009-08-12 21:35:09 +0000
82@@ -3386,6 +3386,7 @@
83 'bzrlib.tests.per_lock',
84 'bzrlib.tests.per_transport',
85 'bzrlib.tests.per_tree',
86+ 'bzrlib.tests.per_pack_repository',
87 'bzrlib.tests.per_repository',
88 'bzrlib.tests.per_repository_chk',
89 'bzrlib.tests.per_repository_reference',
90@@ -3480,7 +3481,6 @@
91 'bzrlib.tests.test_osutils',
92 'bzrlib.tests.test_osutils_encodings',
93 'bzrlib.tests.test_pack',
94- 'bzrlib.tests.test_pack_repository',
95 'bzrlib.tests.test_patch',
96 'bzrlib.tests.test_patches',
97 'bzrlib.tests.test_permissions',
98
99=== modified file 'bzrlib/tests/per_interrepository/__init__.py'
100--- bzrlib/tests/per_interrepository/__init__.py 2009-07-10 06:46:10 +0000
101+++ bzrlib/tests/per_interrepository/__init__.py 2009-08-12 21:35:09 +0000
102@@ -68,7 +68,12 @@
103
104 def default_test_list():
105 """Generate the default list of interrepo permutations to test."""
106- from bzrlib.repofmt import knitrepo, pack_repo, weaverepo
107+ from bzrlib.repofmt import (
108+ groupcompress_repo,
109+ knitrepo,
110+ pack_repo,
111+ weaverepo,
112+ )
113 result = []
114 # test the default InterRepository between format 6 and the current
115 # default format.
116@@ -111,6 +116,9 @@
117 result.append((InterDifferingSerializer,
118 pack_repo.RepositoryFormatKnitPack1(),
119 pack_repo.RepositoryFormatKnitPack6RichRoot()))
120+ result.append((InterDifferingSerializer,
121+ pack_repo.RepositoryFormatKnitPack6RichRoot(),
122+ groupcompress_repo.RepositoryFormat2a()))
123 return result
124
125
126
127=== modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
128--- bzrlib/tests/per_interrepository/test_fetch.py 2009-07-10 06:46:10 +0000
129+++ bzrlib/tests/per_interrepository/test_fetch.py 2009-08-12 21:35:09 +0000
130@@ -132,17 +132,23 @@
131 altered by all revisions it contains, which means that it needs both
132 the inventory for any revision it has, and the inventories of all that
133 revision's parents.
134+
135+ However, we should also skip any revisions which are ghosts in the
136+ parents.
137 """
138- to_repo = self.make_to_repository('to')
139- if not to_repo._format.supports_external_lookups:
140+ if not self.repository_format_to.supports_external_lookups:
141 raise TestNotApplicable("Need stacking support in the target.")
142 builder = self.make_branch_builder('branch')
143 builder.start_series()
144 builder.build_snapshot('base', None, [
145- ('add', ('', 'root-id', 'directory', ''))])
146- builder.build_snapshot('left', ['base'], [])
147- builder.build_snapshot('right', ['base'], [])
148- builder.build_snapshot('merge', ['left', 'right'], [])
149+ ('add', ('', 'root-id', 'directory', '')),
150+ ('add', ('file', 'file-id', 'file', 'content\n'))])
151+ builder.build_snapshot('left', ['base'], [
152+ ('modify', ('file-id', 'left content\n'))])
153+ builder.build_snapshot('right', ['base'], [
154+ ('modify', ('file-id', 'right content\n'))])
155+ builder.build_snapshot('merge', ['left', 'right'], [
156+ ('modify', ('file-id', 'left and right content\n'))])
157 builder.finish_series()
158 branch = builder.get_branch()
159 repo = self.make_to_repository('trunk')
160@@ -161,6 +167,57 @@
161 self.assertEqual(
162 set([('left',), ('right',), ('merge',)]),
163 unstacked_repo.inventories.keys())
164+ # And the basis inventories have been copied correctly
165+ trunk.lock_read()
166+ self.addCleanup(trunk.unlock)
167+ left_tree, right_tree = trunk.repository.revision_trees(
168+ ['left', 'right'])
169+ stacked_branch.lock_read()
170+ self.addCleanup(stacked_branch.unlock)
171+ (stacked_left_tree,
172+ stacked_right_tree) = stacked_branch.repository.revision_trees(
173+ ['left', 'right'])
174+ self.assertEqual(left_tree.inventory, stacked_left_tree.inventory)
175+ self.assertEqual(right_tree.inventory, stacked_right_tree.inventory)
176+
177+ def test_fetch_across_stacking_boundary_ignores_ghost(self):
178+ if not self.repository_format_to.supports_external_lookups:
179+ raise TestNotApplicable("Need stacking support in the target.")
180+ to_repo = self.make_to_repository('to')
181+ builder = self.make_branch_builder('branch')
182+ builder.start_series()
183+ builder.build_snapshot('base', None, [
184+ ('add', ('', 'root-id', 'directory', '')),
185+ ('add', ('file', 'file-id', 'file', 'content\n'))])
186+ builder.build_snapshot('second', ['base'], [
187+ ('modify', ('file-id', 'second content\n'))])
188+ builder.build_snapshot('third', ['second', 'ghost'], [
189+ ('modify', ('file-id', 'third content\n'))])
190+ builder.finish_series()
191+ branch = builder.get_branch()
192+ repo = self.make_to_repository('trunk')
193+ trunk = repo.bzrdir.create_branch()
194+ trunk.repository.fetch(branch.repository, 'second')
195+ repo = self.make_to_repository('stacked')
196+ stacked_branch = repo.bzrdir.create_branch()
197+ stacked_branch.set_stacked_on_url(trunk.base)
198+ stacked_branch.repository.fetch(branch.repository, 'third')
199+ unstacked_repo = stacked_branch.bzrdir.open_repository()
200+ unstacked_repo.lock_read()
201+ self.addCleanup(unstacked_repo.unlock)
202+ self.assertFalse(unstacked_repo.has_revision('second'))
203+ self.assertFalse(unstacked_repo.has_revision('ghost'))
204+ self.assertEqual(
205+ set([('second',), ('third',)]),
206+ unstacked_repo.inventories.keys())
207+ # And the basis inventories have been copied correctly
208+ trunk.lock_read()
209+ self.addCleanup(trunk.unlock)
210+ second_tree = trunk.repository.revision_tree('second')
211+ stacked_branch.lock_read()
212+ self.addCleanup(stacked_branch.unlock)
213+ stacked_second_tree = stacked_branch.repository.revision_tree('second')
214+ self.assertEqual(second_tree.inventory, stacked_second_tree.inventory)
215
216 def test_fetch_missing_basis_text(self):
217 """If fetching a delta, we should die if a basis is not present."""
218@@ -276,8 +333,12 @@
219 to_repo = self.make_to_repository('to')
220 to_repo.fetch(from_tree.branch.repository)
221 recorded_inv_sha1 = to_repo.get_inventory_sha1('foo-id')
222- xml = to_repo.get_inventory_xml('foo-id')
223- computed_inv_sha1 = osutils.sha_string(xml)
224+ to_repo.lock_read()
225+ self.addCleanup(to_repo.unlock)
226+ stream = to_repo.inventories.get_record_stream([('foo-id',)],
227+ 'unordered', True)
228+ bytes = stream.next().get_bytes_as('fulltext')
229+ computed_inv_sha1 = osutils.sha_string(bytes)
230 self.assertEqual(computed_inv_sha1, recorded_inv_sha1)
231
232
233
234=== renamed file 'bzrlib/tests/test_pack_repository.py' => 'bzrlib/tests/per_pack_repository.py'
235--- bzrlib/tests/test_pack_repository.py 2009-08-11 05:26:57 +0000
236+++ bzrlib/tests/per_pack_repository.py 2009-08-12 21:35:09 +0000
237@@ -1,4 +1,4 @@
238-# Copyright (C) 2008 Canonical Ltd
239+# Copyright (C) 2008, 2009 Canonical Ltd
240 #
241 # This program is free software; you can redistribute it and/or modify
242 # it under the terms of the GNU General Public License as published by
243@@ -42,7 +42,7 @@
244 pack_repo,
245 groupcompress_repo,
246 )
247-from bzrlib.repofmt.groupcompress_repo import RepositoryFormatCHK1
248+from bzrlib.repofmt.groupcompress_repo import RepositoryFormat2a
249 from bzrlib.smart import (
250 client,
251 server,
252@@ -84,7 +84,7 @@
253 """Packs reuse deltas."""
254 format = self.get_format()
255 repo = self.make_repository('.', format=format)
256- if isinstance(format.repository_format, RepositoryFormatCHK1):
257+ if isinstance(format.repository_format, RepositoryFormat2a):
258 # TODO: This is currently a workaround. CHK format repositories
259 # ignore the 'deltas' flag, but during conversions, we can't
260 # do unordered delta fetches. Remove this clause once we
261@@ -295,6 +295,41 @@
262 self.assertEqual(1, len(list(index.iter_all_entries())))
263 self.assertEqual(2, len(tree.branch.repository.all_revision_ids()))
264
265+ def test_pack_preserves_all_inventories(self):
266+ # This is related to bug:
267+ # https://bugs.launchpad.net/bzr/+bug/412198
268+ # Stacked repositories need to keep the inventory for parents, even
269+ # after a pack operation. However, it is harder to test that, then just
270+ # test that all inventory texts are preserved.
271+ format = self.get_format()
272+ builder = self.make_branch_builder('source', format=format)
273+ builder.start_series()
274+ builder.build_snapshot('A-id', None, [
275+ ('add', ('', 'root-id', 'directory', None))])
276+ builder.build_snapshot('B-id', None, [
277+ ('add', ('file', 'file-id', 'file', 'B content\n'))])
278+ builder.build_snapshot('C-id', None, [
279+ ('modify', ('file-id', 'C content\n'))])
280+ builder.finish_series()
281+ b = builder.get_branch()
282+ b.lock_read()
283+ self.addCleanup(b.unlock)
284+ repo = self.make_repository('repo', shared=True, format=format)
285+ repo.lock_write()
286+ self.addCleanup(repo.unlock)
287+ repo.fetch(b.repository, revision_id='B-id')
288+ inv = b.repository.iter_inventories(['C-id']).next()
289+ repo.start_write_group()
290+ repo.add_inventory('C-id', inv, ['B-id'])
291+ repo.commit_write_group()
292+ self.assertEqual([('A-id',), ('B-id',), ('C-id',)],
293+ sorted(repo.inventories.keys()))
294+ repo.pack()
295+ self.assertEqual([('A-id',), ('B-id',), ('C-id',)],
296+ sorted(repo.inventories.keys()))
297+ # Content should be preserved as well
298+ self.assertEqual(inv, repo.iter_inventories(['C-id']).next())
299+
300 def test_pack_layout(self):
301 # Test that the ordering of revisions in pack repositories is
302 # tip->ancestor
303@@ -311,7 +346,7 @@
304 # revision access tends to be tip->ancestor, so ordering that way on
305 # disk is a good idea.
306 for _1, key, val, refs in pack.revision_index.iter_all_entries():
307- if type(format.repository_format) is RepositoryFormatCHK1:
308+ if type(format.repository_format) is RepositoryFormat2a:
309 # group_start, group_len, internal_start, internal_len
310 pos = map(int, val.split())
311 else:
312@@ -589,7 +624,7 @@
313
314 def make_write_ready_repo(self):
315 format = self.get_format()
316- if isinstance(format.repository_format, RepositoryFormatCHK1):
317+ if isinstance(format.repository_format, RepositoryFormat2a):
318 raise TestNotApplicable("No missing compression parents")
319 repo = self.make_repository('.', format=format)
320 repo.lock_write()
321@@ -808,7 +843,7 @@
322 matching_format_name = 'pack-0.92-subtree'
323 else:
324 if repo._format.supports_chks:
325- matching_format_name = 'development6-rich-root'
326+ matching_format_name = '2a'
327 else:
328 matching_format_name = 'rich-root-pack'
329 mismatching_format_name = 'pack-0.92'
330@@ -841,7 +876,7 @@
331 else:
332 if repo.supports_rich_root():
333 if repo._format.supports_chks:
334- matching_format_name = 'development6-rich-root'
335+ matching_format_name = '2a'
336 else:
337 matching_format_name = 'rich-root-pack'
338 mismatching_format_name = 'pack-0.92-subtree'
339@@ -1062,9 +1097,9 @@
340 "(bzr 1.9)\n",
341 format_supports_external_lookups=True,
342 index_class=BTreeGraphIndex),
343- dict(format_name='development6-rich-root',
344- format_string='Bazaar development format - group compression '
345- 'and chk inventory (needs bzr.dev from 1.14)\n',
346+ dict(format_name='2a',
347+ format_string="Bazaar repository format 2a "
348+ "(needs bzr 1.16 or later)\n",
349 format_supports_external_lookups=True,
350 index_class=BTreeGraphIndex),
351 ]