Code review comment for lp:~spiv/bzr/inventory-delta

Revision history for this message
Andrew Bennetts (spiv) wrote :

John Arbash Meinel wrote:
[...]
> So do I understand correctly that:
>
> IDS => bzr:// ~30min
> InventoryDelta => bzr:// 9.5min
> IDS => file:// < 1min

Yes, that's right.

> So when I was looking at InventoryDelta (before we fixed it to actually
> send deltas :) the #1 overhead was actually in "_generate_root_texts()"
> because that was iterating over revision_trees and having to extract all
> of the inventories yet again.

Right, and that's probably still the bottleneck. But even with that bottleneck
it's much faster over the network (even a very fast "network" like the loopback
interface), so I think it's worth merging.

> Anyway I'll give the new code a look over. Unfortunately there are still
> a lot of conflated factors, like code that wants to transmit all of the
> "texts" before we transmit *any* "inventories" content, which means
> somewhere you need to do buffering.
>
> IDS "works" by batching 100 at a time, so it only buffers the 100 or so
> inventory deltas before it writes the root texts to the target repo.

Yeah. It might be nice to somehow arrange similar batching when sending streams
over the network. If we can arrange to make these streams self-contained it
would make it easier to do incremental packing too.

Actually... all we need for incremental packing (which would fix the
"inventory-delta push to 2a is very bloated on disk until the stream is done")
is a way to be able to force a repack of an uncommitted pack (i.e. in upload/,
not inserted). That's probably not too hard to add, and then the StreamSink can
trigger that every N records or when the pack reaches N bytes or something.
I'll have a play with this.

-Andrew.

« Back to merge proposal