Code review comment for lp:~spiv/bzr/inventory-delta

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
> Review: Needs Fixing
> So, to further my discussion. While investigating this, I found some odd bits
> in _stream_invs_as_deltas. For example:
[...]
>
> ^- Notice that once you've prepared a delta for "inv.revision_id" you then set
> "basis_id" as part of "invs_sent_so_far". Which AFAICT means you will sending
> most of the inventories as complete texts, because you won't actually think
> you've sent what you actually have done.

Oops, right. I've merged your fix for that.

> + yield versionedfile.InventoryDeltaContentFactory(
> + key, parents, None, delta, basis_id, flags, from_repo)
>
> The way you've handled the "no parents are available use NULL", means that you
> actually create a delta to NULL for *every* revision, and find out if it is the
> smallest one. Which seems inefficient versus just using NULL when nothing else
> is available. (Note that IDS goes as far as to not use parents that aren't
> cached even if they have been sent, and to fall back to using the last-sent
> revision otherwise.)
>
> I've changed this loop around a bit, to avoid some duplication and make it a
> little bit clearer (at least to me) what is going on. I also added a quick
> LRUCache for the parent inventories that we've just sent, as re-extracting
> them from the repository is not going to be efficient. (Especially extracting
> them one at a time.)

Thanks, I've merged your fix.

> Going for that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.
>
> It seems like we should be consistent at that level. What do you think?

It would be nice, but not vital.

> As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:
[...]
> serializer.require_flags(**self._format_flags)
>
> It isn't vastly better, but at least they are named arguments, rather than
> fairly arbitrary positional ones.

Ok, I'll do that.

> In the end, I don't think that it is ideal that an incremental push of 1
> revision will transmit a complete inventory (in delta form). I understand the
> limitation is that we currently are unable to 'buffer' these records anywhere
> in a persistent state between RPC calls. (Even though we have a bytes-on-the
> wire representation, we don't have an index/etc to look them up in.)

(As said elsewhere on this thread...)

This is only an issue for a cross-format push, which doesn't have to be
maximally efficient. It just has to be reasonable. We can look at doing better
later if we want, but this is already a massive improvement in terms of data
transferred compared to IDS.

[...]
> For *me* the ssh handshake is probably signficantly more than the time to push
> up 2 inventories of bzr.dev, but that tradeoff is very different for people
> working over a tiny bandwidth from their GPRS phone, trying to push a critical
> bugfix for a Launchpad branch.

Then they probably should go to the effort of having their local branch in a
matching format :)

We're not intending to use inventory-deltas for anything but cross-format
fetches AFAIK.

> Again, the code as written fails to actually transmit data over the smart protocol, I get:
> $ time awbzr branch bzrtools_1.9/ -r -100 bzr://localhost/test-2a/b
> bzr: ERROR: Version present for / in TREE_ROOT
>
> even with the fixes I've mentioned so far.

I've found and fixed this. It turned out to be a bug in inventory_delta.py (it
couldn't round-trip deltas from non-rich-root inventories).

[...]
> ^- However, if I traced it correctly, old_name == 'basis_id' which is a string, but new_name == key, aka a tuple.
>
> So we're clearly doing something wrong here.

Yes. I found and fixed this too. I've also made delta_to_lines raise TypeError
if it gets non-strs for old_name or new_name, because that's always going to be
a mistake. The current behaviour (which will silently str a tuple) isn't very
helpful.

> Digging further, the bytes on the wire don't include things like:
> InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
> InventoryDeltaContentFactory.sha1

Inventories don't have parent lists, IIRC? Revisions do, and text versions do,
but inventories don't. (They may have a compression parent, but not a semantic
parent.)

The sha1 is never set, although perhaps it should be.

These two attributes are really just there to satisfy the ContentFactory
interface. After a separate discussion with Robert I'm goint get rid of
InventoryDeltaContentFactory and instead make an explicit inventory-delta
substream that will simply carry FulltextContentFactories of serialised deltas.
(I'm not thrilled at an unnecessary serialise/deseralise pass this will incur in
process, but it doesn't seem to be the bottleneck at the moment.)

[...]
> So I've made some updates into lp:~jameinel/bzr/1.18-inventory-delta

Thanks, merged. (Including, for now, the disabling of
InterDifferingSerializer... FWIW, I was doing that by commenting out the
registration call rather than hacking is_compatible :P)

I'll send an updated merge proposal once I've done the change to use a new
substream rather than add a content factory that isn't really a factory for
bytes in a versionedfile.

-Andrew.

« Back to merge proposal