Comment 12 for bug 402652

Robert Collins (lifeless) wrote: Re: [Bug 402652] Re: smart fetch for --2a does not opportunistically combine groups

On Mon, 2009-08-31 at 22:07 +0000, John A Meinel wrote:

A little stale now, but I wanted to address a couple of points:

> So you and I seem to have very different thresholds for what is
> reasonable.
>
> $ time bzr branch bzr-2a/bzr.dev copy
> real 0m59.077s
>
> $ time bzr pack bzr-2a
> real 2m1.454s
>
> So it takes 2x longer to repack all the content than it did to just
> copy it. By that estimate, it would be 3m+ to do a 'bzr branch'
> outside of a shared repo. (Or 3x longer than it currently does.)

I tested with Launchpad and saw 8m and 11m respectively.

I'm not sure why bzr itself shows greater overhead. Perhaps it's the
specific machine, or some measurement issue.

> So it is taking roughly 2min to compress everything, but we can
> *extract* all of that content in about 4s. (Not exactly fair, because
> we also have inventory/chk/etc content involved in a 'bzr pack'.)
>
>
> If I change the above loop to be:
> osutils.sha_string(record.get_bytes_as('fulltext'))
>
> The time changes to 12s/loop, or 224MB/s. (About right when you
> consider that sha1 is roughly 1/2 the speed of decompression, so you
> do 1 decompression and 2 sha1 ticks, or 3x the original time.)
>
> Still, extracting and sha1summing everything is an order of magnitude
> faster than the 120s to compress it all.
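
For context, the loop John refers to is presumably something like this
sketch (illustrative only: it assumes the benchmark walks every text
via the repository's record stream, and reuses the path from the
timings above):

    import time
    from bzrlib import osutils
    from bzrlib.branch import Branch

    # Illustrative sketch, not the exact benchmark script.
    branch = Branch.open('bzr-2a/bzr.dev')
    branch.lock_read()
    try:
        texts = branch.repository.texts
        keys = texts.keys()  # every (file_id, revision_id) text key
        start = time.time()
        for record in texts.get_record_stream(keys, 'unordered', True):
            if record.storage_kind == 'absent':
                continue
            # Decompress to a fulltext and sha1 it, as in the quoted loop.
            osutils.sha_string(record.get_bytes_as('fulltext'))
        print '%.1fs to extract and sha1 every text' % (time.time() - start)
    finally:
        branch.unlock()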

That's true, but it doesn't correspond to the moderate difference we
see at the command line (30% by my measurement, 100% by yours): there
is more to it than this.

> I would really *like* it if we would extract all the content and make
> sure the (file_id, revision) => sha1sum from the inventory matches the
> content before it is put into the repository.....

me too :(
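
A minimal sketch of what such a check could look like, assuming the
fetch code had the inventory's recorded text_sha1 in hand when a text
arrives (the helper and its wiring are hypothetical, not anything bzr
does today):

    from bzrlib import errors, osutils

    def verify_incoming_text(record, expected_sha1):
        # Hypothetical helper: expected_sha1 would be the text_sha1 from
        # the InventoryEntry for this record's (file_id, revision) key.
        actual = osutils.sha_string(record.get_bytes_as('fulltext'))
        if actual != expected_sha1:
            raise errors.BzrCheckError(
                'sha1 mismatch for %r: inventory says %s, content is %s'
                % (record.key, expected_sha1, actual))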

-Rob