Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
>     bzr-core (bzr-core)
>
> This is a pretty big patch.  It does lots of things:
>
>  * adds new insert_stream and get_stream verbs
>  * adds de/serialization of inventory-delta records on the network
>  * fixes rich-root generation in StreamSource
>  * adds a bunch of new scenarios to per_interrepository tests
>  * fixes some 'pack already exists' bugs for packing a single GC pack
>    (i.e. when the new pack is already optimal)
>  * improves the inventory_delta module a little
>  * various miscellaneous fixes and new tests that are hopefully self-evident
>  * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS.
> I think the outstanding ones are:
>
>> 1) Incremental updates.  IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs.  Streaming fetch is currently
>> all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Considering that a conversion can take *days*, it is important to have
>> something that can be stopped and resumed.

IDS also picks the 'optimal' delta by computing several candidates and keeping
whichever one is smallest.  For local conversions, the time to compute 2-3
deltas was much smaller than the time to apply one inefficient delta.

>> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
>> bloats to 2.4GB before packing back down to 25MB.  We know the new format
>> is even more sensitive to packing efficiency.  Not to mention that a
>> single big stream generates a single large pack, so it isn't directly
>> obvious that we are being so inefficient.
>
> i.e. performance concerns.

Generally, yes.  There is also:

3) Being able to resume because you snapshotted periodically as you went.

This seems even more important for a network transfer.
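To make point (1) concrete, the batching scheme works roughly like the sketch
below.  This is purely illustrative, not bzrlib's actual IDS code; names such
as convert_batch() and autopack() are hypothetical stand-ins for the real
callables.

```python
# Hypothetical sketch of IDS-style batched conversion -- not bzrlib code.
# Each batch is committed as its own write group, so an interrupted
# conversion loses at most BATCH_SIZE revisions of work, and the repository
# is periodically repacked instead of bloating to a single huge pack.

BATCH_SIZE = 100       # revisions converted per write group
AUTOPACK_EVERY = 1000  # revisions between autopacks

def batched_convert(revision_ids, convert_batch, commit_write_group, autopack):
    """Convert revisions in small batches so the job can be stopped and resumed."""
    done = 0
    for start in range(0, len(revision_ids), BATCH_SIZE):
        batch = revision_ids[start:start + BATCH_SIZE]
        convert_batch(batch)
        commit_write_group()   # snapshot: a safe resume point
        done += len(batch)
        if done % AUTOPACK_EVERY == 0:
            autopack()         # keep pack count (and disk usage) down
```

The point of the sketch is just the shape of the loop: small, committed units
of work with periodic repacking, versus one all-or-nothing stream.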
> The streaming code is now pretty similar to the way IDS did the conversion,
> but probably still different enough that we will want to measure the impact
> of this.  I'm definitely concerned about case 2, the lack of packing as you
> go, although perhaps the degree of bloat is reduced by using semantic
> inventory-delta records?

I don't think bzr bloating from 100MB => 2.4GB (and then back down to 25MB
post-pack) was because of inventory records.  However, if it was purely
because of a bad streaming order, we could probably fix that by changing how
we stream texts.

> The reason why I eventually deleted IDS was that it was just too burdensome
> to keep two code paths alive, thoroughly tested, and correct.  For instance,
> if we simply reinstated IDS for local-only fetches, then most of the test
> suite, including the relevant interrepo tests, would only exercise IDS.
> Also, IDS turned out to have a bug when used on a stacked repository, which
> the extended test suite in this branch revealed (I've forgotten the details,
> but can dig them up if you like).  It didn't seem worth the hassle of fixing
> IDS when I already had a working implementation.
>
> I'm certainly open to reinstating IDS if it's the most expedient way to get
> reasonable local performance for upgrades, but I thought I'd try to be bold
> and see if we could just live without the extra complexity.  Maybe we can
> improve the performance of streaming rather than resurrect IDS?
>
> -Andrew.

I'm certainly open to the suggestion of getting rid of IDS.  I don't like
having multiple code paths.  It just happens that there are *big* wins here,
and it is often easier to write optimized code in a different framework.

(3) is an issue I'd like to see addressed, but one which Robert seems
particularly unhappy to have us try to do.  (See other bug comments, etc.,
about how other systems don't do it and he feels it isn't worth doing.)
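For reference, the delta-selection trick I mentioned earlier (compute several
candidate deltas, keep the smallest) boils down to something like this.
Again, a hypothetical sketch with made-up helper names, not the real bzrlib
API:

```python
# Illustrative only: pick the cheapest delta among a few candidate bases.
# Computing 2-3 deltas locally is cheap compared to applying one
# inefficient delta, so trying a few bases is a net win.

def pick_smallest_delta(inventory, candidate_bases, make_delta):
    """Return the smallest delta produced against any candidate base.

    make_delta(base, inventory) is a stand-in for whatever produces a
    delta; we compare candidates purely by size here.
    """
    best = None
    for base in candidate_bases:
        delta = make_delta(base, inventory)
        if best is None or len(delta) < len(best):
            best = delta
    return best
```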
It was fairly straightforward to do with IDS.  The argument from Robert, I
think, is that the client would need to compute whether it has a 'complete'
set and thus can commit the current write group.  (The *source* knows these
sorts of things and can just say "and now you have it", but the client has to
re-do all that work to figure it out from a stream.)

I'll send a follow-up email later to do more of a review of the code changes.

John =:->
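P.S. The completeness check I'm describing would look roughly like the
following.  This is a sketch of the idea only, with hypothetical callables
standing in for the real write-group machinery:

```python
# Illustrative client-side completeness check -- not bzrlib code.
# The client only knows what it *asked* for; it has to re-derive "am I
# done?" from the records it has actually seen in the stream.

def consume_stream(needed_keys, stream, commit_write_group, suspend_write_group):
    """Apply a record stream, committing only once every needed key arrived.

    needed_keys: the set of record keys the client requested.
    stream: an iterable of (key, bytes) records from the source.
    If the stream ends early, the write group is suspended so a later
    fetch can resume instead of starting over.
    """
    seen = set()
    for key, _bytes in stream:
        seen.add(key)          # in real code: insert the record, then note it
    if needed_keys <= seen:
        commit_write_group()   # complete set: safe to make visible
        return True
    suspend_write_group()      # partial set: keep it around for resume
    return False
```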