Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
>>From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:
So for starters, let me mention what I found wrt performance:
time bzr.dev branch mysql-1k myqsl-2a/1k
real 3m18.490s
time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
real 2m29.953s
+xml8 is just this patch:
=== modified file 'bzrlib/xml8.py'
- --- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@ pass
else:
# Only copying directory entries drops us 2.85s => 2.35s
- - # if cached_ie.kind == 'directory':
- - # return cached_ie.copy()
- - # return cached_ie
+ if cached_ie.kind == 'directory':
+ return cached_ie.copy()
+ return cached_ie return cached_ie.copy()
kind = elt.tag
It has 2 basic effects:
1) Avoid copying all inventory entries all the time (so reduce the time
spent in InventoryEntry.copy())
2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
x.attribute1 == y.attribute1
and x.attribute2 == y.attribute2
etc.
As you can see it is a big win for this test case (about 4:3 or 33% faster)
So what about Andrew's work:
time bzr.inv.delta branch mysql-1k myqsl-2a/1k
real 10m14.267s
time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
real 9m49.372s
It also was stuck at:
[##################- ] Fetching revisions:Inserting stream:Walking
content 912/1043
For most of that time, making it really look like it was stalled.
Anyway, this isn't something where it is, say, 10% slower which is
acceptable because we get rid of some extra code paths. This ends up
being 3-4x slower and no longer giving any progress information.
If that scales to launchpad sized projects, you are talking 4-days
becoming 16-days (aka > 2 weeks).
So honestly, I don't think we can land this as is. I won't stick on the
performance side if people feel it is acceptable. But I did spend a lot
of time optimizing IDS that clearly hasn't been done with StreamSource.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andrew Bennetts wrote: erializer.
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingS
>
>>From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:
So for starters, let me mention what I found wrt performance:
time bzr.dev branch mysql-1k myqsl-2a/1k
real 3m18.490s
time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
real 2m29.953s
+xml8 is just this patch:
pass
return cached_ie.copy()
=== modified file 'bzrlib/xml8.py'
- --- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@
else:
# Only copying directory entries drops us 2.85s => 2.35s
- - # if cached_ie.kind == 'directory':
- - # return cached_ie.copy()
- - # return cached_ie
+ if cached_ie.kind == 'directory':
+ return cached_ie.copy()
+ return cached_ie
kind = elt.tag
It has 2 basic effects:
1) Avoid copying all inventory entries all the time (so reduce the time copy())
spent in InventoryEntry.
2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
x.attribute1 == y.attribute1
and x.attribute2 == y.attribute2
etc.
As you can see it is a big win for this test case (about 4:3 or 33% faster)
So what about Andrew's work:
time bzr.inv.delta branch mysql-1k myqsl-2a/1k
real 10m14.267s
time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
real 9m49.372s
It also was stuck at: ####### #####- ] Fetching revisions:Inserting stream:Walking
[######
content 912/1043
For most of that time, making it really look like it was stalled.
Anyway, this isn't something where it is, say, 10% slower which is
acceptable because we get rid of some extra code paths. This ends up
being 3-4x slower and no longer giving any progress information.
If that scales to launchpad sized projects, you are talking 4-days
becoming 16-days (aka > 2 weeks).
So honestly, I don't think we can land this as is. I won't stick on the
performance side if people feel it is acceptable. But I did spend a lot
of time optimizing IDS that clearly hasn't been done with StreamSource.
John enigmail. mozdev. org
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkp fWm0ACgkQJdeBCY SNAAM8mgCgru3K3 SpP8BcMZdLJLHkq SSTQ HNl3Kxbu0clMKLn R7
TlAAoLVd7ndCbPe
=BB8Q
-----END PGP SIGNATURE-----