On Mon, 2009-08-31 at 22:07 +0000, John A Meinel wrote:
A little stale now, but I wanted to address a couple of points:
> So you and I seem to have very different thresholds for what is
> reasonable.
>
> $ time bzr branch bzr-2a/bzr.dev copy
> real 0m59.077s
>
> $ time bzr pack bzr-2a
> real 2m1.454s
>
> So it takes 2x longer to repack all the content than it did to just
> copy it. By that estimate, it would be 3m+ to do a 'bzr branch'
> outside of a shared repo. (Or 3x longer than it currently does.)
I tested with Launchpad, and saw 8m and 11m respectively. I'm not sure
why bzr itself shows greater overhead. Perhaps it's the specific
machine, or some measurement issue.
> So it is taking roughly 2min to compress everything, but we can
> *extract* all of that content in about 4s. (Not exactly fair, because
> we also have inventory/chk/etc content involved in a 'bzr pack'.)
>
>
> If I change the above loop to be:
> osutils.sha_string(record.get_bytes_as('fulltext'))
>
> The time changes to 12s/loop, or 224MB/s. (About right when you
> consider that sha1 is roughly 1/2 the speed of decompression, so you
> do 1 decompression and 2 sha1 ticks, or 3x original speed.)
>
> Still, extracting and sha1summing everything is an order of magnitude
> faster than the 120s to compress it all.
That's true, but it doesn't correspond to the moderate (30% by my
measurement, 100% by yours) difference we see at the command line:
there is more to it than this.
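For context, the relative cost John cites (sha1 at roughly half the speed of decompression) can be reproduced with a self-contained timing loop; the zlib payload and repeat count below are made up for illustration and are not the bzr pack code:

```python
import hashlib
import time
import zlib

# Synthetic stand-in for pack content: compress some repetitive text.
raw = b"some moderately repetitive file content\n" * 4096
compressed = zlib.compress(raw)

def time_it(fn, repeats=50):
    """Return wall-clock seconds for `repeats` calls of fn()."""
    start = time.time()
    for _ in range(repeats):
        fn()
    return time.time() - start

decompress_only = time_it(lambda: zlib.decompress(compressed))
decompress_and_sha1 = time_it(
    lambda: hashlib.sha1(zlib.decompress(compressed)).hexdigest())

print("decompress:      %.3fs" % decompress_only)
print("decompress+sha1: %.3fs" % decompress_and_sha1)
```

On John's numbers the second loop should land at roughly 3x the first, though the exact ratio depends on the machine and the data.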
> I would really *like* it if we would extract all the content and make
> sure the (file_id, revision) => sha1sum from the inventory matches the
> content before it is put into the repository.....
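The check John is wishing for could be sketched as below; a plain dict stands in for the inventory's (file_id, revision) => sha1sum map, literal byte strings stand in for the extracted texts, and all names here are hypothetical rather than bzrlib API:

```python
import hashlib

# Hypothetical stand-ins: in bzr the expected sha1sums would come from
# the inventory, and the texts from the record stream being inserted.
expected_sha1s = {
    ('file-id-1', 'rev-1'): hashlib.sha1(b'hello world\n').hexdigest(),
    ('file-id-2', 'rev-1'): hashlib.sha1(b'goodbye\n').hexdigest(),
}
incoming_texts = {
    ('file-id-1', 'rev-1'): b'hello world\n',
    ('file-id-2', 'rev-1'): b'corrupted\n',  # deliberately wrong
}

def validate(expected, texts):
    """Return the (file_id, revision) keys whose extracted text does
    not match the sha1sum recorded for them in the inventory."""
    bad = []
    for key, text in texts.items():
        if hashlib.sha1(text).hexdigest() != expected[key]:
            bad.append(key)
    return bad

mismatches = validate(expected_sha1s, incoming_texts)
print(mismatches)  # the corrupted text is caught before insertion
```

Given the extraction numbers above, a pass like this would add sha1 cost but stay far cheaper than recompression.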
me too :(
-Rob