Merge lp:~jameinel/bzr/1.18-bundle-and-stack-393349 into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.18-bundle-and-stack-393349
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 0 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.18-bundle-and-stack-393349
Reviewer Review Type Date Requested Status
Martin Pool Approve
Review via email: mp+9334@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

This is a feature fix for bug #393349. It changes the bundle create and insert code to allow bundles to work with --2a format repositories.

The basic change is to make the bundle writer look at repository._serializer.supports_altered_by_hack. If it does support it, then we just read mpdiffs directly from the VF (same as before). If it *doesn't* then it uses serializer.write_inventory_to_string().

In doing so, I discovered that CHKSerializer actually inherits from a non-rich-root XML serializer. However we obviously aren't using those parts in production today (since otherwise it would have broken as observed). Andrew ran into this as well when he was working on the inventory delta code.

Trying to apply one of these bundles with an older bzr will result in a sha1sum mismatch (as it tries to apply the mpdiff directly to the chk format text, which shouldn't apply at all).

There are a couple of serious performance issues (it takes ~1m+ to generate a 500 revision bundle, and 3m+ to insert it, while 'bzr pull' on those revisions takes ~12s.) I'll probably work on some of the obvious ones, but for now I wanted to get the basic functionality reviewed and merged.

Revision history for this message
Martin Pool (mbp) wrote :
Download full text (7.1 KiB)

=== modified file 'NEWS'
--- NEWS 2009-07-16 08:00:03 +0000
+++ NEWS 2009-07-27 18:03:19 +0000
@@ -33,6 +33,9 @@
 * ``bzr mv`` no longer takes out branch locks, which allows it to work
   when the branch is readonly. (Robert Collins, #216541)

+* ``bzr send`` now generates valid bundles with ``--2a`` formats.
+ (John Arbash Meinel, #393349)
+
 * Fixed spurious "Source branch does not support stacking" warning when
   pushing. (Andrew Bennetts, #388908)

It'd be useful to mention in the news file the constraints on interoperation
with other formats and clients.

=== modified file 'bzrlib/bundle/serializer/v4.py'
--- bzrlib/bundle/serializer/v4.py 2009-06-10 03:56:49 +0000
+++ bzrlib/bundle/serializer/v4.py 2009-07-27 17:51:10 +0000
@@ -315,12 +315,69 @@
     def write_revisions(self):
         """Write bundle records for all revisions and signatures"""
         inv_vf = self.repository.inventories
- revision_order = [key[-1] for key in multiparent.topo_iter_keys(inv_vf,
- self.revision_keys)]
+ topological_order = [key[-1] for key in multiparent.topo_iter_keys(
+ inv_vf, self.revision_keys)]
+ revision_order = topological_order
         if self.target is not None and self.target in self.revision_ids:
+ # Make sure the target is always the last entry
+ revision_order = list(topological_order)
             revision_order.remove(self.target)
             revision_order.append(self.target)
- self._add_mp_records_keys('inventory', inv_vf, [(revid,) for revid in revision_order])
+ if self.repository._serializer.support_altered_by_hack:
+ self._add_mp_records_keys('inventory', inv_vf,
+ [(revid,) for revid in topological_order])
+ else:
+ self._add_inventory_mpdiffs_from_serializer(topological_order)
+ self._add_revision_texts(revision_order)

I think you should have, here, at least some description of why you're
going to use support_altered_by_hack here, because it's (at least to me)
not very clear that it's the same thing as being able to get the
inventory mp records from the versionedfile.

Is there a reason the orders should be different?

+
+ def _add_inventory_mpdiffs_from_serializer(self, revision_order):

Maybe a comment or docstring about the purpose of this?

+ inventory_key_order = [(r,) for r in revision_order]
+ parent_map = self.repository.inventories.get_parent_map(
+ inventory_key_order)
+ missing_keys = set(inventory_key_order).difference(parent_map)
+ if missing_keys:
+ raise errors.RevisionNotPresent(list(missing_keys)[0],
+ self.repository.inventories)
+ inv_to_str = self.repository._serializer.write_inventory_to_string
+ # Make sure that we grab the parent texts first
+ just_parents = set()
+ map(just_parents.update, parent_map.itervalues())
+ just_parents.difference_update(parent_map)
+ # Ignore ghost parents
+ present_parents = self.repository.inventories.get_parent_map(
+ ...

Read more...

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (4.9 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Review: Approve
> === modified file 'NEWS'
> --- NEWS 2009-07-16 08:00:03 +0000
> +++ NEWS 2009-07-27 18:03:19 +0000
> @@ -33,6 +33,9 @@
> * ``bzr mv`` no longer takes out branch locks, which allows it to work
> when the branch is readonly. (Robert Collins, #216541)
>
> +* ``bzr send`` now generates valid bundles with ``--2a`` formats.
> + (John Arbash Meinel, #393349)
> +
> * Fixed spurious "Source branch does not support stacking" warning when
> pushing. (Andrew Bennetts, #388908)
>
> It'd be useful to mention in the news file the constraints on interoperation
> with other formats and clients.
>

Will do. I think generally it will only be compatible with newer bzr
versions. I had hoped generating a --2a could be compatible with
1.9-rich-root, but it turns out the old XML-serializer version was wrong
since it was using xml5 which was a non-rich-root format.

>
> === modified file 'bzrlib/bundle/serializer/v4.py'
> --- bzrlib/bundle/serializer/v4.py 2009-06-10 03:56:49 +0000
> +++ bzrlib/bundle/serializer/v4.py 2009-07-27 17:51:10 +0000
> @@ -315,12 +315,69 @@
> def write_revisions(self):
> """Write bundle records for all revisions and signatures"""
> inv_vf = self.repository.inventories
> - revision_order = [key[-1] for key in multiparent.topo_iter_keys(inv_vf,
> - self.revision_keys)]
> + topological_order = [key[-1] for key in multiparent.topo_iter_keys(
> + inv_vf, self.revision_keys)]
> + revision_order = topological_order
> if self.target is not None and self.target in self.revision_ids:
> + # Make sure the target is always the last entry
> + revision_order = list(topological_order)
> revision_order.remove(self.target)
> revision_order.append(self.target)
> - self._add_mp_records_keys('inventory', inv_vf, [(revid,) for revid in revision_order])
> + if self.repository._serializer.support_altered_by_hack:
> + self._add_mp_records_keys('inventory', inv_vf,
> + [(revid,) for revid in topological_order])
> + else:
> + self._add_inventory_mpdiffs_from_serializer(topological_order)
> + self._add_revision_texts(revision_order)
>
> I think you should have, here, at least some description of why you're
> going to use support_altered_by_hack here, because it's (at least to me)
> not very clear that it's the same thing as being able to get the
> inventory mp records from the versionedfile.

Sure. I'll at least put comments in.

>
> Is there a reason the orders should be different?
>

For whatever reason the *last* entry in the revisions is supposed to be
the target. However, the inventory stream *must* be in topological
order, or you don't have the parent text in order to apply the mpdiff.

We had a test that was doing (handwaving):

  create_bundle(include_revs=['rev1', 'rev2'], target_rev='rev1')

It seemed to be doing it by accident, but for whatever reason it was
including a child revision in the bundle, but making the parent revis...

Read more...

Revision history for this message
Martin Pool (mbp) wrote :

2009/8/4 John A Meinel <email address hidden>:
>> Is there a reason the orders should be different?
>>
>
> For whatever reason the *last* entry in the revisions is supposed to be
> the target. However, the inventory stream *must* be in topological
> order, or you don't have the parent text in order to apply the mpdiff.
>
> We had a test that was doing (handwaving):
>
>  create_bundle(include_revs=['rev1', 'rev2'], target_rev='rev1')
>
> It seemed to be doing it by accident, but for whatever reason it was
> including a child revision in the bundle, but making the parent revision
> the "target" revision.
>
> The remove() + append() would then order them in *reverse* topological
> order, which was causing the inventory building to fail.

OK, so again maybe a comment would be good.

>> -class CHKSerializer(xml5.Serializer_v5):
>> +class CHKSerializer(xml6.Serializer_v6):
>>      """A CHKInventory based serializer with 'plain' behaviour."""
>>
>>      format_num = '9'
>>
>> It would be nice (not necessarily for this patch) if we didn't have
>> concrete final classes inheriting from each other, but rather just made
>> them inherit from some kind of trait class.
>
> I think that would be ok, though I think you would end up with:
>
> class Serializer_v6(V6TraitClass):
>  pass
>
> serializer_v6 = Serializer_v6()
>
> Maybe that fits what you want, though. As then v6 is not an instance of
> v5, even if the traits base classes are the same.

I was thinking that the behaviour like having rich roots would be in
the trait class, and then the specific version number would be in a
subclass, so that at the class graph level you could see what they had
in common but not get the incorrect conclusion that a CHK serializer
_was_ an XML serializer. Maybe I should send a hacking guide patch or
rfc.

--
Martin <http://launchpad.net/~mbp/>

Preview Diff

Empty