Code review comment for lp:~jameinel/bzr/1.15-pack-source

Revision history for this message
John A Meinel (jameinel) wrote :

This proposal changes how pack <=> pack fetching triggers.

It removes the InterPackRepo optimizer (which uses Packer internally) in favor of a new KnitPackStreamSource.

The new source is a very streamlined version of StreamSource, which doesn't attempt to handle all the different cross-format issues. It only supports exact format fetching, and does so in a nice streamlined fashion.

Specifically, it sends data as (signatures, revisions, inventories, texts) since it knows we have atomic insertion.

It walks the inventory pages a single time, and extracts the text keys as the fetch is going, rather than doing so in a pre-read fetch. This is a moderate win for dump transport fetching (versus StreamSource, but not InterPackRepo) because it avoids reading the Inventory pages twice.

It also fixes a bug with the current InterPackRepo code. Namely, the Packer code was recently changed to make sure that all file_keys that are referenced are fetched, rather than only the ones mentioned in the specific revisions being fetched. This was done at ~ the same time as the updates to file_ids_altered_by... However, in updating that, it was not updated to read the parent inventories and remove their text keys.

This meant that if you got a fulltext inventory, you would end up copying the data for all texts in that revision, whether they were modified or not. For bzr.dev, this meant that it often downloaded ~3MB of extra data for a small change. I considered fixing Packer to handle this, but I figured we wanted to move to StreamSource as the one-and-only method for fetching anyway.

I also did a little bit of changes to make it clearer when a set of something was *keys* (tuples) and when it was *ids* (strings).

I also moved some of the helpers that were added as part of the gc-stacking patch, into the base Repository class, so that I could simply re-use them.

« Back to merge proposal