John A Meinel wrote: > Review: Needs Fixing > So, to further my discussion. While investigating this, I found some odd bits > in _stream_invs_as_deltas. For example: [...] > > ^- Notice that once you've prepared a delta for "inv.revision_id" you then set > "basis_id" as part of "invs_sent_so_far". Which AFAICT means you will sending > most of the inventories as complete texts, because you won't actually think > you've sent what you actually have done. Oops, right. I've merged your fix for that. > + yield versionedfile.InventoryDeltaContentFactory( > + key, parents, None, delta, basis_id, flags, from_repo) > > The way you've handled the "no parents are available use NULL", means that you > actually create a delta to NULL for *every* revision, and find out if it is the > smallest one. Which seems inefficient versus just using NULL when nothing else > is available. (Note that IDS goes as far as to not use parents that aren't > cached even if they have been sent, and to fall back to using the last-sent > revision otherwise.) > > I've changed this loop around a bit, to avoid some duplication and make it a > little bit clearer (at least to me) what is going on. I also added a quick > LRUCache for the parent inventories that we've just sent, as re-extracting > them from the repository is not going to be efficient. (Especially extracting > them one at a time.) Thanks, I've merged your fix. > Going for that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string. > > It seems like we should be consistent at that level. What do you think? It would be nice, but not vital. > As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to: [...] > serializer.require_flags(**self._format_flags) > > It isn't vastly better, but at least they are named arguments, rather than > fairly arbitrary positional ones. Ok, I'll do that. > In the end, I don't think that it is ideal that an incremental push of 1 > revision will transmit a complete inventory (in delta form). I understand the > limitation is that we currently are unable to 'buffer' these records anywhere > in a persistent state between RPC calls. (Even though we have a bytes-on-the > wire representation, we don't have an index/etc to look them up in.) (As said elsewhere on this thread...) This is only an issue for a cross-format push, which doesn't have to be maximally efficient. It just has to be reasonable. We can look at doing better later if we want, but this is already a massive improvement in terms of data transferred compared to IDS. [...] > For *me* the ssh handshake is probably signficantly more than the time to push > up 2 inventories of bzr.dev, but that tradeoff is very different for people > working over a tiny bandwidth from their GPRS phone, trying to push a critical > bugfix for a Launchpad branch. Then they probably should go to the effort of having their local branch in a matching format :) We're not intending to use inventory-deltas for anything but cross-format fetches AFAIK. > Again, the code as written fails to actually transmit data over the smart protocol, I get: > $ time awbzr branch bzrtools_1.9/ -r -100 bzr://localhost/test-2a/b > bzr: ERROR: Version present for / in TREE_ROOT > > even with the fixes I've mentioned so far. I've found and fixed this. It turned out to be a bug in inventory_delta.py (it couldn't round-trip deltas from non-rich-root inventories). [...] > ^- However, if I traced it correctly, old_name == 'basis_id' which is a string, but new_name == key, aka a tuple. > > So we're clearly doing something wrong here. Yes. I found and fixed this too. I've also made delta_to_lines raise TypeError if it gets non-strs for old_name or new_name, because that's always going to be a mistake. The current behaviour (which will silently str a tuple) isn't very helpful. > Digging further, the bytes on the wire don't include things like: > InventoryDeltaContentFactory.parents (list/tuple of tuple keys) > InventoryDeltaContentFactory.sha1 Inventories don't have parent lists, IIRC? Revisions do, and text versions do, but inventories don't. (They may have a compression parent, but not a semantic parent.) The sha1 is never set, although perhaps it should be. These two attributes are really just there to satisfy the ContentFactory interface. After a separate discussion with Robert I'm goint get rid of InventoryDeltaContentFactory and instead make an explicit inventory-delta substream that will simply carry FulltextContentFactories of serialised deltas. (I'm not thrilled at an unnecessary serialise/deseralise pass this will incur in process, but it doesn't seem to be the bottleneck at the moment.) [...] > So I've made some updates into lp:~jameinel/bzr/1.18-inventory-delta Thanks, merged. (Including, for now, the disabling of InterDifferingSerializer... FWIW, I was doing that by commenting out the registration call rather than hacking is_compatible :P) I'll send an updated merge proposal once I've done the change to use a new substream rather than add a content factory that isn't really a factory for bytes in a versionedfile. -Andrew.