Bazaar

Merge lp:~spiv/bzr/inventory-delta into lp:~bzr/bzr/trunk-old

inventory-delta
Merge into trunk-old

Proposed by Andrew Bennetts on 2009-07-16

Status:

Superseded

Proposed branch:

lp:~spiv/bzr/inventory-delta

Merge into:

lp:~bzr/bzr/trunk-old

Diff against target:

2656 lines

To merge this branch:

bzr merge lp:~spiv/bzr/inventory-delta

Related bugs:

Bug #385826: push and pull with different formats extremely slow on network	Critical	Fix Released
Bug #410917: inter-format push to remote 2a branch does packing on local end	Medium	Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
bzr-core		2009-07-16	Pending
Review via email: mp+8860@code.launchpad.net

This proposal has been superseded by a proposal from 2009-07-22.

Revision history for this message

Andrew Bennetts (spiv) wrote on 2009-07-16:

This is a pretty big patch. It does lots of things:

* adds new insert_stream and get_stream verbs
* adds de/serialization of inventory-delta records on the network
* fixes rich-root generation in StreamSource
* adds a bunch of new scenarios to per_interrepository tests
* fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
the new pack is already optimal).
* improves the inventory_delta module a little
* various miscellaneous fixes and new tests that are hopefully self-evident
* and, most controversially, removes InterDifferingSerializer.

From John's mail a while back there were a bunch of issues with removing IDS. I
think the outstanding ones are:

> 1) Incremental updates. IDS converts batches of 100 revs at a time,
> which also triggers autopacks at 1k revs. Streaming fetch is currently
> an all-or-nothing, which isn't appropriate (IMO) for conversions.
> Consider that conversion can take *days*, it is important to have
> something that can be stopped and resumed.
>
> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
> bloats to 2.4GB before packing back to 25MB. We know the new format is
> even more sensitive to packing efficiency. Not to mention that a single
> big-stream generates a single large pack, it isn't directly obvious that
> we are being so inefficient.

i.e. performance concerns.

The streaming code is pretty similar in how it does the conversion now to the
way IDS did it, but probably still different enough that we will want to measure
the impact of this. I'm definitely concerned about case 2, the lack of packing
as you go, although perhaps the degree of bloat is reduced by using
semantic inventory-delta records?

The reason why I eventually deleted IDS was that it was just too burdensome to
keep two code paths alive, thoroughly tested, and correct. For instance, if we
simply reinstated IDS for local-only fetches then most of the test suite,
including the relevant interrepo tests, will only exercise IDS. Also, IDS
turned out to have a bug when used on a stacked repository that the extending
test suite in this branch revealed (I've forgotten the details, but can dig them
up if you like). It didn't seem worth the hassle of fixing IDS when I already
had a working implementation.

I'm certainly open to reinstating IDS if it's the most expedient way to have
reasonable local performance for upgrades, but I thought I'd try to be bold and
see if we could just live without the extra complexity. Maybe we can improve
performance of streaming rather than resurrect IDS?

-Andrew.

This is a pretty big patch.  It does lots of things:

* adds new insert_stream and get_stream verbs
 * adds de/serialization of inventory-delta records on the network
 * fixes rich-root generation in StreamSource
 * adds a bunch of new scenarios to per_interrepository tests
 * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
   the new pack is already optimal).
 * improves the inventory_delta module a little
 * various miscellaneous fixes and new tests that are hopefully self-evident
 * and, most controversially, removes InterDifferingSerializer.

From John's mail a while back there were a bunch of issues with removing IDS.  I
think the outstanding ones are:

> 1) Incremental updates. IDS converts batches of 100 revs at a time,
> which also triggers autopacks at 1k revs. Streaming fetch is currently
> an all-or-nothing, which isn't appropriate (IMO) for conversions.
> Consider that conversion can take *days*, it is important to have
> something that can be stopped and resumed.
> 
> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
> bloats to 2.4GB before packing back to 25MB. We know the new format is
> even more sensitive to packing efficiency. Not to mention that a single
> big-stream generates a single large pack, it isn't directly obvious that
> we are being so inefficient.

i.e. performance concerns.

The streaming code is pretty similar in how it does the conversion now to the
way IDS did it, but probably still different enough that we will want to measure
the impact of this.  I'm definitely concerned about case 2, the lack of packing
as you go, although perhaps the degree of bloat is reduced by using
semantic inventory-delta records?

The reason why I eventually deleted IDS was that it was just too burdensome to
keep two code paths alive, thoroughly tested, and correct.  For instance, if we
simply reinstated IDS for local-only fetches then most of the test suite,
including the relevant interrepo tests, will only exercise IDS.  Also, IDS
turned out to have a bug when used on a stacked repository that the extending
test suite in this branch revealed (I've forgotten the details, but can dig them
up if you like).  It didn't seem worth the hassle of fixing IDS when I already
had a working implementation.

I'm certainly open to reinstating IDS if it's the most expedient way to have
reasonable local performance for upgrades, but I thought I'd try to be bold and
see if we could just live without the extra complexity.  Maybe we can improve
performance of streaming rather than resurrect IDS?

-Andrew.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-07-16:

Download full text (4.5 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
>>From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:
>
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Consider that conversion can take *days*, it is important to have
>> something that can be stopped and resumed.

It also picks out the 'optimal' deltas by computing many different ones
and finding whichever one was the 'smallest'. For local conversions, the
time to compute 2-3 deltas was much smaller than to apply an inefficient
delta.

>>
>> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
>> bloats to 2.4GB before packing back to 25MB. We know the new format is
>> even more sensitive to packing efficiency. Not to mention that a single
>> big-stream generates a single large pack, it isn't directly obvious that
>> we are being so inefficient.
>
> i.e. performance concerns.
>

Generally, yes.

There is also:

3) Being able to resume because you snapshotted periodically as you
went. This seems even more important for a network transfer.

> The streaming code is pretty similar in how it does the conversion now to the
> way IDS did it, but probably still different enough that we will want to measure
> the impact of this. I'm definitely concerned about case 2, the lack of packing
> as you go, although perhaps the degree of bloat is reduced by using
> semantic inventory-delta records?
>

I don't think bzr bloating from 100MB => 2.4GB (and then back down to
25MB post pack) was because of inventory records. However, if it was
purely because of a bad streaming order, we could probably fix that by
changing how we stream texts.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
> 
> Requested reviews:
>     bzr-core (bzr-core)
> 
> This is a pretty big patch.  It does lots of things:
> 
>  * adds new insert_stream and get_stream verbs
>  * adds de/serialization of inventory-delta records on the network
>  * fixes rich-root generation in StreamSource
>  * adds a bunch of new scenarios to per_interrepository tests
>  * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
>    the new pack is already optimal).
>  * improves the inventory_delta module a little
>  * various miscellaneous fixes and new tests that are hopefully self-evident
>  * and, most controversially, removes InterDifferingSerializer.
> 
>>From John's mail a while back there were a bunch of issues with removing IDS.  I
> think the outstanding ones are:
> 
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Consider that conversion can take *days*, it is important to have
>> something that can be stopped and resumed.

Generally, yes.

There is also:

3) Being able to resume because you snapshotted periodically as you
went. This seems even more important for a network transfer.

> The streaming code is pretty similar in how it does the conversion now to the
> way IDS did it, but probably still different enough that we will want to measure
> the impact of this.  I'm definitely concerned about case 2, the lack of packing
> as you go, although perhaps the degree of bloat is reduced by using
> semantic inventory-delta records?
>

> The reason why I eventually deleted IDS was that it was just too burdensome to
> keep two code paths alive, thoroughly tested, and correct.  For instance, if we
> simply reinstated IDS for local-only fetches then most of the test suite,
> including the relevant interrepo tests, will only exercise IDS.  Also, IDS
> turned out to have a bug when used on a stacked repository that the extending
> test suite in this branch revealed (I've forgotten the details, but can dig them
> up if you like).  It didn't seem worth the hassle of fixing IDS when I already
> had a working implementation.
> 
> I'm certainly open to reinstating IDS if it's the most expedient way to have
> reasonable local performance for upgrades, but I thought I'd try to be bold and
> see if we could just live without the extra complexity.  Maybe we can improve
> performance of streaming rather than resurrect IDS?
> 
> -Andrew.

I'm certainly open to the suggestion of getting rid of IDS. I don't like
having multiple code paths. It just happens that there are *big* wins
and it is often easier to write optimized code in a different framework.

(3) is an issue I'd like to see addressed, but which Robert seems
particularly unhappy having us try to do. (See other bug comments, etc
about how other systems don't do it and he feels it isn't worth doing.)

It was fairly straightforward to do with IDS, the argument I think from
Robert is that the client would need to be computing whether it has a
'complete' set and thus can commit the current write group. (the
*source* knows these sort of things, and can just say "and now you have
it", but the client has to re-do all that work to figure it out from a
stream.)

I'll send a follow up email later to do more of a review of the code
changes.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpfT4UACgkQJdeBCYSNAAM0kQCfTO0rlN9Zl1LDues6IpHdp7ju
FHEAoKgP+f81hxKB6o3iyVt5mYPoFyUD
=QSKC
-----END PGP SIGNATURE-----

Revision history for this message

John A Meinel (jameinel) wrote on 2009-07-16:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

and

4) Progress indication

This is really quite useful for a process that can take *days* to
complete. The Stream code is often quite nice, but the fact that it
gives you 2 states:
'getting stream'
'inserting stream'

and nothing more than that is pretty crummy.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpfUK8ACgkQJdeBCYSNAAMa+wCgybpPdd4Yie/Craew/zxX9eF7
cWMAoNcxPftDDdLssboDW7rezk4d2L2d
=WA26
-----END PGP SIGNATURE-----

Revision history for this message

John A Meinel (jameinel) wrote on 2009-07-16:

Download full text (3.2 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So for starters, let me mention what I found wrt performance:

time bzr.dev branch mysql-1k myqsl-2a/1k
real 3m18.490s

time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
real 2m29.953s

+xml8 is just this patch:
=== modified file 'bzrlib/xml8.py'
- --- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
- - # if cached_ie.kind == 'directory':
- - # return cached_ie.copy()
- - # return cached_ie
+ if cached_ie.kind == 'directory':
+ return cached_ie.copy()
+ return cached_ie
                 return cached_ie.copy()

kind = elt.tag

It has 2 basic effects:

1) Avoid copying all inventory entries all the time (so reduce the time
spent in InventoryEntry.copy())

2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
x.attribute1 == y.attribute1
and x.attribute2 == y.attribute2
etc.

As you can see it is a big win for this test case (about 4:3 or 33% faster)

So what about Andrew's work:

time bzr.inv.delta branch mysql-1k myqsl-2a/1k
real 10m14.267s

time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
real 9m49.372s

It also was stuck at:
[##################- ] Fetching revisions:Inserting stream:Walking
content 912/1043

For most of that time, making it really look like it was stalled.

Anyway, this isn't something where it is, say, 10% slower which is
acceptable because we get rid of some extra code paths. This ends up
being 3-4x slower and no longer giving any progress information.

If that scales to launchpad sized projects, you are talking 4-days
becoming 16-days (aka > 2 weeks).

So honestly, I don't think we can land this as is. I won't stick on the
performance side if people feel it is acceptable. But I did spend a lot
of time optimizing IDS that clearly hasn't been done with StreamSource.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpfWm0ACgkQJdeBCYSNAAM8mgCgru3K3SpP8BcMZdLJLH...

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
> 
> Requested reviews:
>     bzr-core (bzr-core)
> 
> This is a pretty big patch.  It does lots of things:
> 
>  * adds new insert_stream and get_stream verbs
>  * adds de/serialization of inventory-delta records on the network
>  * fixes rich-root generation in StreamSource
>  * adds a bunch of new scenarios to per_interrepository tests
>  * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
>    the new pack is already optimal).
>  * improves the inventory_delta module a little
>  * various miscellaneous fixes and new tests that are hopefully self-evident
>  * and, most controversially, removes InterDifferingSerializer.
> 
>>From John's mail a while back there were a bunch of issues with removing IDS.  I
> think the outstanding ones are:

So for starters, let me mention what I found wrt performance:

time bzr.dev branch mysql-1k myqsl-2a/1k
  real    3m18.490s

time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
  real    2m29.953s

+xml8 is just this patch:
=== modified file 'bzrlib/xml8.py'
- --- bzrlib/xml8.py      2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py      2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
- -                # if cached_ie.kind == 'directory':
- -                #     return cached_ie.copy()
- -                # return cached_ie
+                if cached_ie.kind == 'directory':
+                    return cached_ie.copy()
+                return cached_ie
                 return cached_ie.copy()

kind = elt.tag

It has 2 basic effects:

1) Avoid copying all inventory entries all the time (so reduce the time
spent in InventoryEntry.copy())

2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
  x.attribute1 == y.attribute1
  and x.attribute2 == y.attribute2
etc.

As you can see it is a big win for this test case (about 4:3 or 33% faster)

So what about Andrew's work:

time bzr.inv.delta branch mysql-1k myqsl-2a/1k
  real    10m14.267s

time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
  real    9m49.372s

It also was stuck at:
[##################- ] Fetching revisions:Inserting stream:Walking
content 912/1043

For most of that time, making it really look like it was stalled.

If that scales to launchpad sized projects, you are talking 4-days
becoming 16-days (aka > 2 weeks).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpfWm0ACgkQJdeBCYSNAAM8mgCgru3K3SpP8BcMZdLJLHkqSSTQ
TlAAoLVd7ndCbPeHNl3Kxbu0clMKLnR7
=BB8Q
-----END PGP SIGNATURE-----

Revision history for this message

John A Meinel (jameinel) wrote on 2009-07-16:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
> Andrew Bennetts wrote:

...

> So for starters, let me mention what I found wrt performance:
>
> time bzr.dev branch mysql-1k myqsl-2a/1k
> real 3m18.490s
>
> time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
> real 2m29.953s

...

> time bzr.inv.delta branch mysql-1k myqsl-2a/1k
> real 10m14.267s
>
> time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
> real 9m49.372s

Also, for real-world space issues:
$ du -ksh mysql-2a*/.bzr/repository/obsolete*
1.9M mysql-2a-bzr.dev/.bzr/repository/obsolete_packs
467M mysql-2a-inv-delta/.bzr/repository/obsolete_packs

The peak size (watch du -ksh mysql-2a-bzr.dev) during conversion using
IDS was 49MB.

$ du -ksh mysql-2a*/.bzr/repository/packs*
11M mysql-2a-bzr.dev/.bzr/repository/packs
9.1M mysql-2a-inv-delta/.bzr/repository/packs

So the new code wins slightly in the final size on disk, because it
packed at the end, rather than at 1k revs (and then there were another
40+ revs inserted.)

However, it bloated from 15MB => 467MB while it was doing the transfer
before the final size. Versus a peak of 50MB (almost 10x larger).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpfdCkACgkQJdeBCYSNAANABACgl4l4L1AjaiXRJgrn5iwLrVe1
tVEAnRRJ0QbWzd8lXFXQXhWdhvqFjnw8
=pXZe
-----END PGP SIGNATURE-----

Revision history for this message

Andrew Bennetts (spiv) wrote on 2009-07-17:

John A Meinel wrote:
[...]
> It also picks out the 'optimal' deltas by computing many different ones
> and finding whichever one was the 'smallest'. For local conversions, the
> time to compute 2-3 deltas was much smaller than to apply an inefficient
> delta.

FWIW, the streaming code also does this. My guess (not yet measured) is that
sending less bytes over the network is also a win, especially when one parent
might be a one-liner and the other might be large merge from trunk.

[...]
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

Yes, although we already don't have this for the network. It would be great to
have...

[...]
> I'm certainly open to the suggestion of getting rid of IDS. I don't like
> having multiple code paths. It just happens that there are *big* wins
> and it is often easier to write optimized code in a different framework.

Sure. Like I said for me it was just getting to be a large hassle to maintain
both paths in my branch, even though they were increasingly sharing a lot of
code for e.g. rich root generation before I deleted IDS.

I'd like to try see if we can cheaply fix the performance issues you report in
other mails without needing IDS. If we do need IDS for a while longer then
fine, although I think we'll want to restrict it to local source, local target,
non-stacked cases only.

Thanks for the measurements and quick feedback.

-Andrew.

Revision history for this message

Robert Collins (lifeless) wrote on 2009-07-17:

On Thu, 2009-07-16 at 16:06 +0000, John A Meinel wrote:
>
> (3) is an issue I'd like to see addressed, but which Robert seems
> particularly unhappy having us try to do. (See other bug comments, etc
> about how other systems don't do it and he feels it isn't worth
> doing.)

I'd like to be clear about this. I'd be ecstatic *if* we can do it well
and robustly. However I don't think it is *at all* easy to that. If I'm
wrong - great.

I'm fine with keeping IDS for local fetches. But when networking is
involved IDS is massively slower than the streaming codepath.

> It was fairly straightforward to do with IDS, the argument I think
> from
> Robert is that the client would need to be computing whether it has a
> 'complete' set and thus can commit the current write group. (the
> *source* knows these sort of things, and can just say "and now you
> have
> it", but the client has to re-do all that work to figure it out from a
> stream.)

I think that aspect is simple - we have a stream subtype that says
'checkpoint'. Its the requirement to do all that work that is, I think
problematic - and thats *without* considering stacking, which makes it
hugely harder.

-Rob

Revision history for this message

Robert Collins (lifeless) wrote on 2009-07-17:

On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>
>
> 4) Progress indication
>
> This is really quite useful for a process that can take *days* to
> complete. The Stream code is often quite nice, but the fact that it
> gives you 2 states:
> 'getting stream'
> 'inserting stream'
>
> and nothing more than that is pretty crummy.

That is a separate bug however, and one that affects normal fetches too.
So I don't think tying it to the IDS discussion is necessary or
particularly helpful.

-Rob

Revision history for this message

John A Meinel (jameinel) wrote on 2009-07-17:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>>
>> 4) Progress indication
>>
>> This is really quite useful for a process that can take *days* to
>> complete. The Stream code is often quite nice, but the fact that it
>> gives you 2 states:
>> 'getting stream'
>> 'inserting stream'
>>
>> and nothing more than that is pretty crummy.
>
> That is a separate bug however, and one that affects normal fetches too.
> So I don't think tying it to the IDS discussion is necessary or
> particularly helpful.
>
> -Rob
>

It is explicitly relevant that doing "bzr upgrade --2a" which will take
longer-than-normal would now not even show a progress bar.

For local fetches, you don't even get the "transport activity"
indicator, so it *really* looks hung. It doesn't even write things into
.bzr.log so that you know it is doing anything other than spinning in a
while True loop. I guess you can tell because your disk consumption is
going way up...

I don't honestly know the performance difference for streaming a lot of
content over the network. Given a 4x performance slowdown, for large
fetches IDS could still be faster. I certainly agree that IDS is
probably significantly more inefficient when doing something like "give
me the last 2 revs".

It honestly wasn't something I was optimizing for (cross format
fetching). I *was* trying to make 'bzr upgrade' be measured in hours
rather than days/weeks/etc.

Also, given that you have to upgrade all of your stacked locations at
the same time, and --2a is a trap door, aren't 95% of upgrades going to
be all at once anyway?

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpf46YACgkQJdeBCYSNAANwAwCfYQj7gws3O4KDPxqrcMLu4nfB
554AoIyuns4b5Fsa3wf4uFhf4Uex00oQ
=qjX9
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Aaron Bentley

Andrew Bennetts

Denys Duchier

Eric Siegerman

Gary van der Merwe

Jelmer Vernooij

John Szakmeister

Jonathan Lange

Marius Kruger

Martin Albisetti

Matt Nordhoff

Paul Hummer

SuperMMX

Talden

Yoshinori Sano

to status/vote changes:

Alexander Belchenko

Martin Eisenhardt

Tim Penhey

Vincent Ladeuil

Bazaar

Merge lp:~spiv/bzr/inventory-delta into lp:~bzr/bzr/trunk-old

Commit message

Description of the change

Preview Diff

Subscribers

 === modified file 'NEWS'
 --- NEWS	2009-07-20 21:21:10 +0000
 +++ NEWS	2009-07-22 00:35:22 +0000
@@ -56,6 +56,9 @@
    lots of backtraces about ``UnknownSmartMethod``, ``do_chunk`` or
    ``do_end``.  (Andrew Bennetts, #338561)
++* StreamSource generates rich roots from non-rich root sources correctly
++  now.  (Andrew Bennetts, #368921)
++
  * ``WorkingTree4.unversion`` will no longer fail to unversion ids which
    were present in a parent tree but renamed in the working tree.
    (Robert Collins, #187207)
@@ -63,6 +66,12 @@
  Improvements
  ************
++* Cross-format fetches (such as between 1.9-rich-root and 2a) via the
++  smart server are more efficient now.  They send inventory deltas rather
++  than full inventories.  The smart server has two new requests,
++  ``Repository.get_stream_1.18`` and ``Repository.insert_stream_1.18`` to
++  support this.  (Andrew Bennetts, #374738, #385826)
++
  Documentation
  *************
@@ -84,6 +93,9 @@
  * ``CHKMap.apply_delta`` now raises ``InconsistentDelta`` if a delta adds
    as new a key which was already mapped. (Robert Collins)
++* InterDifferingSerializer has been removed.  The transformations it
++  provided are now done automatically by StreamSource.  (Andrew Bennetts)
++
  * Inventory delta application catches more cases of corruption and can
    prevent corrupt deltas from affecting consistency of data structures on
    disk. (Robert Collins)
 === modified file 'bzrlib/fetch.py'
 --- bzrlib/fetch.py	2009-06-17 17:57:15 +0000
 +++ bzrlib/fetch.py	2009-07-22 00:35:22 +0000
@@ -25,16 +25,21 @@
  import operator
++from bzrlib.lazy_import import lazy_import
++lazy_import(globals(), """
++from bzrlib import (
++    tsort,
++    versionedfile,
++    )
++""")
  import bzrlib
  from bzrlib import (
      errors,
      symbol_versioning,
+     )
  from bzrlib.revision import NULL_REVISION
--from bzrlib.tsort import topo_sort
  from bzrlib.trace import mutter
  import bzrlib.ui
--from bzrlib.versionedfile import FulltextContentFactory
  class RepoFetcher(object):
@@ -216,11 +221,9 @@
      def _find_root_ids(self, revs, parent_map, graph):
          revision_root = {}
--        planned_versions = {}
          for tree in self.iter_rev_trees(revs):
              revision_id = tree.inventory.root.revision
              root_id = tree.get_root_id()
--            planned_versions.setdefault(root_id, []).append(revision_id)
              revision_root[revision_id] = root_id
          # Find out which parents we don't already know root ids for
          parents = set()
@@ -232,7 +235,7 @@
          for tree in self.iter_rev_trees(parents):
              root_id = tree.get_root_id()
              revision_root[tree.get_revision_id()] = root_id
--        return revision_root, planned_versions
++        return revision_root
      def generate_root_texts(self, revs):
          """Generate VersionedFiles for all root ids.
@@ -241,9 +244,8 @@
          """
          graph = self.source.get_graph()
          parent_map = graph.get_parent_map(revs)
--        rev_order = topo_sort(parent_map)
--        rev_id_to_root_id, root_id_to_rev_ids = self._find_root_ids(
--            revs, parent_map, graph)
++        rev_order = tsort.topo_sort(parent_map)
++        rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)
          root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in
              rev_order]
          # Guaranteed stable, this groups all the file id operations together
@@ -252,20 +254,77 @@
          # yet, and are unlikely to in non-rich-root environments anyway.
          root_id_order.sort(key=operator.itemgetter(0))
          # Create a record stream containing the roots to create.
--        def yield_roots():
--            for key in root_id_order:
--                root_id, rev_id = key
--                rev_parents = parent_map[rev_id]
--                # We drop revision parents with different file-ids, because
--                # that represents a rename of the root to a different location
--                # - its not actually a parent for us. (We could look for that
--                # file id in the revision tree at considerably more expense,
--                # but for now this is sufficient (and reconcile will catch and
--                # correct this anyway).
--                # When a parent revision is a ghost, we guess that its root id
--                # was unchanged (rather than trimming it from the parent list).
--                parent_keys = tuple((root_id, parent) for parent in rev_parents
--                    if parent != NULL_REVISION and
--                        rev_id_to_root_id.get(parent, root_id) == root_id)
--                yield FulltextContentFactory(key, parent_keys, None, '')
--        return [('texts', yield_roots())]
++        from bzrlib.graph import FrozenHeadsCache
++        graph = FrozenHeadsCache(graph)
++        new_roots_stream = _new_root_data_stream(
++            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
++        return [('texts', new_roots_stream)]
++
++
++def _new_root_data_stream(
++    root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
++    for root_key in root_keys_to_create:
++        root_id, rev_id = root_key
++        parent_keys = _parent_keys_for_root_version(
++            root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph)
++        yield versionedfile.FulltextContentFactory(
++            root_key, parent_keys, None, '')
++
++
++def _parent_keys_for_root_version(
++    root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):
++    """Get the parent keys for a given root id."""
++    # Include direct parents of the revision, but only if they used the same
++    # root_id and are heads.
++    rev_parents = parent_map[rev_id]
++    parent_ids = []
++    for parent_id in rev_parents:
++        if parent_id == NULL_REVISION:
++            continue
++        if parent_id not in rev_id_to_root_id_map:
++            # We probably didn't read this revision, go spend the extra effort
++            # to actually check
++            try:
++                tree = repo.revision_tree(parent_id)
++            except errors.NoSuchRevision:
++                # Ghost, fill out rev_id_to_root_id in case we encounter this
++                # again.
++                # But set parent_root_id to None since we don't really know
++                parent_root_id = None
++            else:
++                parent_root_id = tree.get_root_id()
++            rev_id_to_root_id_map[parent_id] = None
++            # XXX: why not:
++            #   rev_id_to_root_id_map[parent_id] = parent_root_id
++            # memory consumption maybe?
++        else:
++            parent_root_id = rev_id_to_root_id_map[parent_id]
++        if root_id == parent_root_id:
++            # With stacking we _might_ want to refer to a non-local revision,
++            # but this code path only applies when we have the full content
++            # available, so ghosts really are ghosts, not just the edge of
++            # local data.
++            parent_ids.append(parent_id)
++        else:
++            # root_id may be in the parent anyway.
++            try:
++                tree = repo.revision_tree(parent_id)
++            except errors.NoSuchRevision:
++                # ghost, can't refer to it.
++                pass
++            else:
++                try:
++                    parent_ids.append(tree.inventory[root_id].revision)
++                except errors.NoSuchId:
++                    # not in the tree
++                    pass
++    # Drop non-head parents
++    if graph is None:
++        graph = repo.get_graph()
++    heads = graph.heads(parent_ids)
++    selected_ids = []
++    for parent_id in parent_ids:
++        if parent_id in heads and parent_id not in selected_ids:
++            selected_ids.append(parent_id)
++    parent_keys = [(root_id, parent_id) for parent_id in selected_ids]
++    return parent_keys
 === modified file 'bzrlib/inventory_delta.py'
 --- bzrlib/inventory_delta.py	2009-04-02 05:53:12 +0000
 +++ bzrlib/inventory_delta.py	2009-07-22 00:35:22 +0000
@@ -119,28 +119,46 @@
  class InventoryDeltaSerializer(object):
      """Serialize and deserialize inventory deltas."""
++    # XXX: really, the serializer and deserializer should be two separate
++    # classes.
++
      FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
--    def __init__(self, versioned_root, tree_references):
--        """Create an InventoryDeltaSerializer.
++    def __init__(self):
++        """Create an InventoryDeltaSerializer."""
++        self._versioned_root = None
++        self._tree_references = None
++        self._entry_to_content = {
++            'directory': _directory_content,
++            'file': _file_content,
++            'symlink': _link_content,
++        }
++
++    def require_flags(self, versioned_root=None, tree_references=None):
++        """Set the versioned_root and/or tree_references flags for this
++        (de)serializer.
          :param versioned_root: If True, any root entry that is seen is expected
              to be versioned, and root entries can have any fileid.
          :param tree_references: If True support tree-reference entries.
          """
++        if versioned_root is not None and self._versioned_root is not None:
++            raise AssertionError(
++                "require_flags(versioned_root=...) already called.")
++        if tree_references is not None and self._tree_references is not None:
++            raise AssertionError(
++                "require_flags(tree_references=...) already called.")
          self._versioned_root = versioned_root
          self._tree_references = tree_references
--        self._entry_to_content = {
--            'directory': _directory_content,
--            'file': _file_content,
--            'symlink': _link_content,
--        }
          if tree_references:
              self._entry_to_content['tree-reference'] = _reference_content
      def delta_to_lines(self, old_name, new_name, delta_to_new):
          """Return a line sequence for delta_to_new.
++        Both the versioned_root and tree_references flags must be set via
++        require_flags before calling this.
++
          :param old_name: A UTF8 revision id for the old inventory.  May be
              NULL_REVISION if there is no older inventory and delta_to_new
              includes the entire inventory contents.
@@ -150,6 +168,10 @@
              takes.
          :return: The serialized delta as lines.
          """
++        if self._versioned_root is None or self._tree_references is None:
++            raise AssertionError(
++                "Cannot serialise unless versioned_root/tree_references flags "
++                "are both set.")
          lines = ['', '', '', '', '']
          to_line = self._delta_item_to_line
          for delta_item in delta_to_new:
@@ -188,6 +210,10 @@
                  oldpath_utf8 = 'None'
              else:
                  oldpath_utf8 = '/' + oldpath.encode('utf8')
++            if newpath == '/':
++                raise AssertionError(
++                    "Bad inventory delta: '/' is not a valid newpath "
++                    "(should be '') in delta item %r" % (delta_item,))
              # TODO: Test real-world utf8 cache hit rate. It may be a win.
              newpath_utf8 = '/' + newpath.encode('utf8')
              # Serialize None as ''
@@ -221,10 +247,18 @@
      def parse_text_bytes(self, bytes):
          """Parse the text bytes of a serialized inventory delta.
++        If versioned_root and/or tree_references flags were set via
++        require_flags, then the parsed flags must match or a BzrError will be
++        raised.
++
          :param bytes: The bytes to parse. This can be obtained by calling
              delta_to_lines and then doing ''.join(delta_lines).
--        :return: (parent_id, new_id, inventory_delta)
++        :return: (parent_id, new_id, versioned_root, tree_references,
++            inventory_delta)
          """
++        if bytes[-1:] != '\n':
++            last_line = bytes.rsplit('\n', 1)[-1]
++            raise errors.BzrError('last line not empty: %r' % (last_line,))
          lines = bytes.split('\n')[:-1] # discard the last empty line
          if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1:
              raise errors.BzrError('unknown format %r' % lines[0:1])
@@ -240,11 +274,13 @@
          if len(lines) < 5 or not lines[4].startswith('tree_references: '):
              raise errors.BzrError('missing tree_references: marker')
          delta_tree_references = self._deserialize_bool(lines[4][17:])
--        if delta_versioned_root != self._versioned_root:
++        if (self._versioned_root is not None and
++            delta_versioned_root != self._versioned_root):
              raise errors.BzrError(
                  "serialized versioned_root flag is wrong: %s" %
                  (delta_versioned_root,))
--        if delta_tree_references != self._tree_references:
++        if (self._tree_references is not None
++            and delta_tree_references != self._tree_references):
              raise errors.BzrError(
                  "serialized tree_references flag is wrong: %s" %
                  (delta_tree_references,))
@@ -266,22 +302,34 @@
                      raise errors.BzrError("Versioned root found: %r" % line)
              elif last_modified[-1] == ':':
                      raise errors.BzrError('special revisionid found: %r' % line)
--            if not delta_tree_references and content.startswith('tree\x00'):
++            if delta_tree_references is False and content.startswith('tree\x00'):
                  raise errors.BzrError("Tree reference found: %r" % line)
--            content_tuple = tuple(content.split('\x00'))
--            entry = _parse_entry(
--                newpath_utf8, file_id, parent_id, last_modified, content_tuple)
              if oldpath_utf8 == 'None':
                  oldpath = None
++            elif oldpath_utf8[:1] != '/':
++                raise errors.BzrError(
++                    "oldpath invalid (does not start with /): %r"
++                    % (oldpath_utf8,))
              else:
++                oldpath_utf8 = oldpath_utf8[1:]
                  oldpath = oldpath_utf8.decode('utf8')
              if newpath_utf8 == 'None':
                  newpath = None
++            elif newpath_utf8[:1] != '/':
++                raise errors.BzrError(
++                    "newpath invalid (does not start with /): %r"
++                    % (newpath_utf8,))
              else:
++                # Trim leading slash
++                newpath_utf8 = newpath_utf8[1:]
                  newpath = newpath_utf8.decode('utf8')
++            content_tuple = tuple(content.split('\x00'))
++            entry = _parse_entry(
++                newpath_utf8, file_id, parent_id, last_modified, content_tuple)
              delta_item = (oldpath, newpath, file_id, entry)
              result.append(delta_item)
--        return delta_parent_id, delta_version_id, result
++        return (delta_parent_id, delta_version_id, delta_versioned_root,
++                delta_tree_references, result)
  def _parse_entry(utf8_path, file_id, parent_id, last_modified, content):
 === modified file 'bzrlib/remote.py'
 --- bzrlib/remote.py	2009-07-06 09:47:35 +0000
 +++ bzrlib/remote.py	2009-07-22 00:35:22 +0000
@@ -422,6 +422,7 @@
          self._custom_format = None
          self._network_name = None
          self._creating_bzrdir = None
++        self._supports_chks = None
          self._supports_external_lookups = None
          self._supports_tree_reference = None
          self._rich_root_data = None
@@ -439,6 +440,13 @@
          return self._rich_root_data
      @property
++    def supports_chks(self):
++        if self._supports_chks is None:
++            self._ensure_real()
++            self._supports_chks = self._custom_format.supports_chks
++        return self._supports_chks
++
++    @property
      def supports_external_lookups(self):
          if self._supports_external_lookups is None:
              self._ensure_real()
@@ -575,6 +583,11 @@
          self._ensure_real()
          return self._custom_format._serializer
++    @property
++    def repository_class(self):
++        self._ensure_real()
++        return self._custom_format.repository_class
++
  class RemoteRepository(_RpcHelper):
      """Repository accessed over rpc.
@@ -1158,9 +1171,9 @@
          self._ensure_real()
          return self._real_repository.get_inventory(revision_id)
--    def iter_inventories(self, revision_ids):
++    def iter_inventories(self, revision_ids, ordering='unordered'):
          self._ensure_real()
--        return self._real_repository.iter_inventories(revision_ids)
++        return self._real_repository.iter_inventories(revision_ids, ordering)
      @needs_read_lock
      def get_revision(self, revision_id):
@@ -1647,6 +1660,9 @@
  class RemoteStreamSink(repository.StreamSink):
++    def __init__(self, target_repo):
++        repository.StreamSink.__init__(self, target_repo)
++
      def _insert_real(self, stream, src_format, resume_tokens):
          self.target_repo._ensure_real()
          sink = self.target_repo._real_repository._get_sink()
@@ -1658,43 +1674,57 @@
      def insert_stream(self, stream, src_format, resume_tokens):
          target = self.target_repo
          target._unstacked_provider.missing_keys.clear()
++        candidate_calls = [('Repository.insert_stream_1.18', (1, 18))]
          if target._lock_token:
--            verb = 'Repository.insert_stream_locked'
--            extra_args = (target._lock_token or '',)
--            required_version = (1, 14)
++            candidate_calls.append(('Repository.insert_stream_locked', (1, 14)))
++            lock_args = (target._lock_token or '',)
          else:
--            verb = 'Repository.insert_stream'
--            extra_args = ()
--            required_version = (1, 13)
++            candidate_calls.append(('Repository.insert_stream', (1, 13)))
++            lock_args = ()
          client = target._client
          medium = client._medium
--        if medium._is_remote_before(required_version):
--            # No possible way this can work.
--            return self._insert_real(stream, src_format, resume_tokens)
          path = target.bzrdir._path_for_remote_call(client)
--        if not resume_tokens:
--            # XXX: Ugly but important for correctness, *will* be fixed during
--            # 1.13 cycle. Pushing a stream that is interrupted results in a
--            # fallback to the _real_repositories sink *with a partial stream*.
--            # Thats bad because we insert less data than bzr expected. To avoid
--            # this we do a trial push to make sure the verb is accessible, and
--            # do not fallback when actually pushing the stream. A cleanup patch
--            # is going to look at rewinding/restarting the stream/partial
--            # buffering etc.
++        found_verb = False
++        for verb, required_version in candidate_calls:
++            if medium._is_remote_before(required_version):
++                continue
++            if resume_tokens:
++                # We've already done the probing (and set _is_remote_before) on
++                # a previous insert.
++                found_verb = True
++                break
              byte_stream = smart_repo._stream_to_byte_stream([], src_format)
              try:
                  response = client.call_with_body_stream(
--                    (verb, path, '') + extra_args, byte_stream)
++                    (verb, path, '') + lock_args, byte_stream)
              except errors.UnknownSmartMethod:
                  medium._remember_remote_is_before(required_version)
--                return self._insert_real(stream, src_format, resume_tokens)
++            else:
++                found_verb = True
++                break
++        if not found_verb:
++            # Have to use VFS.
++            return self._insert_real(stream, src_format, resume_tokens)
++        self._last_inv_record = None
++        self._last_substream = None
++        if required_version < (1, 18):
++            # Remote side doesn't support inventory deltas.  Wrap the stream to
++            # make sure we don't send any.  If the stream contains inventory
++            # deltas we'll interrupt the smart insert_stream request and
++            # fallback to VFS.
++            stream = self._stop_stream_if_inventory_delta(stream)
          byte_stream = smart_repo._stream_to_byte_stream(
              stream, src_format)
          resume_tokens = ' '.join(resume_tokens)
          response = client.call_with_body_stream(
--            (verb, path, resume_tokens) + extra_args, byte_stream)
++            (verb, path, resume_tokens) + lock_args, byte_stream)
          if response[0][0] not in ('ok', 'missing-basis'):
              raise errors.UnexpectedSmartServerResponse(response)
++        if self._last_inv_record is not None:
++            # The stream included an inventory-delta record, but the remote
++            # side isn't new enough to support them.  So we need to send the
++            # rest of the stream via VFS.
++            return self._resume_stream_with_vfs(response, src_format)
          if response[0][0] == 'missing-basis':
              tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
              resume_tokens = tokens
@@ -1703,6 +1733,60 @@
              self.target_repo.refresh_data()
              return [], set()
++    def _resume_stream_with_vfs(self, response, src_format):
++        """Resume sending a stream via VFS, first resending the record and
++        substream that couldn't be sent via an insert_stream verb.
++        """
++        if response[0][0] == 'missing-basis':
++            tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
++            # Ignore missing_keys, we haven't finished inserting yet
++        else:
++            tokens = []
++        def resume_substream():
++            # First yield the record we stopped at.
++            yield self._last_inv_record
++            self._last_inv_record = None
++            # Then yield the rest of the substream that was interrupted.
++            for record in self._last_substream:
++                yield record
++            self._last_substream = None
++        def resume_stream():
++            # Finish sending the interrupted substream
++            yield ('inventories', resume_substream())
++            # Then simply continue sending the rest of the stream.
++            for substream_kind, substream in self._last_stream:
++                yield substream_kind, substream
++        return self._insert_real(resume_stream(), src_format, tokens)
++
++    def _stop_stream_if_inventory_delta(self, stream):
++        """Normally this just lets the original stream pass-through unchanged.
++
++        However if any 'inventories' substream includes an inventory-delta
++        record it will stop streaming, and store the interrupted record,
++        substream and stream in self._last_inv_record, self._last_substream and
++        self._last_stream so that the stream can be resumed by
++        _resume_stream_with_vfs.
++        """
++        def filter_inv_substream(inv_substream):
++            substream_iter = iter(inv_substream)
++            for record in substream_iter:
++                if record.storage_kind == 'inventory-delta':
++                    self._last_inv_record = record
++                    self._last_substream = substream_iter
++                    return
++                else:
++                    yield record
++
++        stream_iter = iter(stream)
++        for substream_kind, substream in stream_iter:
++            if substream_kind == 'inventories':
++                yield substream_kind, filter_inv_substream(substream)
++                if self._last_inv_record is not None:
++                    self._last_stream = stream_iter
++                    return
++            else:
++                yield substream_kind, substream
++
  class RemoteStreamSource(repository.StreamSource):
      """Stream data from a remote server."""
@@ -1714,6 +1798,12 @@
          return self.missing_parents_chain(search, [self.from_repository] +
              self.from_repository._fallback_repositories)
++    def get_stream_for_missing_keys(self, missing_keys):
++        self.from_repository._ensure_real()
++        real_repo = self.from_repository._real_repository
++        real_source = real_repo._get_source(self.to_format)
++        return real_source.get_stream_for_missing_keys(missing_keys)
++
      def _real_stream(self, repo, search):
          """Get a stream for search from repo.
@@ -1748,18 +1838,26 @@
              return self._real_stream(repo, search)
          client = repo._client
          medium = client._medium
--        if medium._is_remote_before((1, 13)):
--            # streaming was added in 1.13
--            return self._real_stream(repo, search)
          path = repo.bzrdir._path_for_remote_call(client)
--        try:
--            search_bytes = repo._serialise_search_result(search)
--            response = repo._call_with_body_bytes_expecting_body(
--                'Repository.get_stream',
--                (path, self.to_format.network_name()), search_bytes)
--            response_tuple, response_handler = response
--        except errors.UnknownSmartMethod:
--            medium._remember_remote_is_before((1,13))
++        search_bytes = repo._serialise_search_result(search)
++        args = (path, self.to_format.network_name())
++        candidate_verbs = [
++            ('Repository.get_stream_1.18', (1, 18)),
++            ('Repository.get_stream', (1, 13))]
++        found_verb = False
++        for verb, version in candidate_verbs:
++            if medium._is_remote_before(version):
++                continue
++            try:
++                response = repo._call_with_body_bytes_expecting_body(
++                    verb, args, search_bytes)
++            except errors.UnknownSmartMethod:
++                medium._remember_remote_is_before(version)
++            else:
++                response_tuple, response_handler = response
++                found_verb = True
++                break
++        if not found_verb:
              return self._real_stream(repo, search)
          if response_tuple[0] != 'ok':
              raise errors.UnexpectedSmartServerResponse(response_tuple)
 === modified file 'bzrlib/repofmt/groupcompress_repo.py'
 --- bzrlib/repofmt/groupcompress_repo.py	2009-07-14 17:33:13 +0000
 +++ bzrlib/repofmt/groupcompress_repo.py	2009-07-22 00:35:23 +0000
@@ -154,6 +154,8 @@
          self._writer.begin()
          # what state is the pack in? (open, finished, aborted)
          self._state = 'open'
++        # no name until we finish writing the content
++        self.name = None
      def _check_references(self):
          """Make sure our external references are present.
@@ -466,6 +468,13 @@
          if not self._use_pack(self.new_pack):
              self.new_pack.abort()
              return None
++        self.new_pack.finish_content()
++        if len(self.packs) == 1:
++            old_pack = self.packs[0]
++            if old_pack.name == self.new_pack._hash.hexdigest():
++                # The single old pack was already optimally packed.
++                self.new_pack.abort()
++                return None
          self.pb.update('finishing repack', 6, 7)
          self.new_pack.finish()
          self._pack_collection.allocate(self.new_pack)
@@ -766,10 +775,10 @@
              if basis_tree is not None:
                  basis_tree.unlock()
--    def _iter_inventories(self, revision_ids):
++    def _iter_inventories(self, revision_ids, ordering):
          """Iterate over many inventory objects."""
          keys = [(revision_id,) for revision_id in revision_ids]
--        stream = self.inventories.get_record_stream(keys, 'unordered', True)
++        stream = self.inventories.get_record_stream(keys, ordering, True)
          texts = {}
          for record in stream:
              if record.storage_kind != 'absent':
@@ -779,7 +788,7 @@
          for key in keys:
              yield inventory.CHKInventory.deserialise(self.chk_bytes, texts[key], key)
--    def _iter_inventory_xmls(self, revision_ids):
++    def _iter_inventory_xmls(self, revision_ids, ordering):
          # Without a native 'xml' inventory, this method doesn't make sense, so
          # make it raise to trap naughty direct users.
          raise NotImplementedError(self._iter_inventory_xmls)
@@ -879,14 +888,13 @@
      def _get_source(self, to_format):
          """Return a source for streaming from this repository."""
--        if isinstance(to_format, remote.RemoteRepositoryFormat):
--            # Can't just check attributes on to_format with the current code,
--            # work around this:
--            to_format._ensure_real()
--            to_format = to_format._custom_format
--        if to_format.__class__ is self._format.__class__:
++        if (to_format.supports_chks and
++            self._format.repository_class is to_format.repository_class and
++            self._format._serializer == to_format._serializer):
              # We must be exactly the same format, otherwise stuff like the chk
--            # page layout might be different
++            # page layout might be different.
++            # Actually, this test is just slightly looser than exact so that
++            # CHK2 <-> 2a transfers will work.
              return GroupCHKStreamSource(self, to_format)
          return super(CHKInventoryRepository, self)._get_source(to_format)
@@ -1036,8 +1044,6 @@
      repository_class = CHKInventoryRepository
      supports_external_lookups = True
      supports_chks = True
--    # For right now, setting this to True gives us InterModel1And2 rather
--    # than InterDifferingSerializer
      _commit_builder_class = PackRootCommitBuilder
      rich_root_data = True
      _serializer = chk_serializer.chk_serializer_255_bigpage
 === modified file 'bzrlib/repofmt/pack_repo.py'
 --- bzrlib/repofmt/pack_repo.py	2009-07-01 10:42:14 +0000
 +++ bzrlib/repofmt/pack_repo.py	2009-07-22 00:35:23 +0000
@@ -422,6 +422,8 @@
          self._writer.begin()
          # what state is the pack in? (open, finished, aborted)
          self._state = 'open'
++        # no name until we finish writing the content
++        self.name = None
      def abort(self):
          """Cancel creating this pack."""
@@ -448,6 +450,14 @@
              self.signature_index.key_count() or
              (self.chk_index is not None and self.chk_index.key_count()))
++    def finish_content(self):
++        if self.name is not None:
++            return
++        self._writer.end()
++        if self._buffer[1]:
++            self._write_data('', flush=True)
++        self.name = self._hash.hexdigest()
++
      def finish(self, suspend=False):
          """Finish the new pack.
@@ -459,10 +469,7 @@
           - stores the index size tuple for the pack in the index_sizes
             attribute.
          """
--        self._writer.end()
--        if self._buffer[1]:
--            self._write_data('', flush=True)
--        self.name = self._hash.hexdigest()
++        self.finish_content()
          if not suspend:
              self._check_references()
          # write indices
@@ -1567,7 +1574,7 @@
          # determine which packs need changing
          pack_operations = [[0, []]]
          for pack in self.all_packs():
--            if not hint or pack.name in hint:
++            if hint is None or pack.name in hint:
                  pack_operations[-1][0] += pack.get_revision_count()
                  pack_operations[-1][1].append(pack)
          self._execute_pack_operations(pack_operations, OptimisingPacker)
@@ -2093,6 +2100,7 @@
                  # when autopack takes no steps, the names list is still
                  # unsaved.
                  return self._save_pack_names()
++        return []
      def _suspend_write_group(self):
          tokens = [pack.name for pack in self._resumed_packs]
 === modified file 'bzrlib/repository.py'
 --- bzrlib/repository.py	2009-07-02 23:10:53 +0000
 +++ bzrlib/repository.py	2009-07-22 00:35:22 +0000
@@ -31,6 +31,7 @@
      gpg,
      graph,
      inventory,
++    inventory_delta,
      lazy_regex,
      lockable_files,
      lockdir,
@@ -923,6 +924,11 @@
          """
          if self._write_group is not self.get_transaction():
              # has an unlock or relock occured ?
++            if suppress_errors:
++                mutter(
++                '(suppressed) mismatched lock context and write group. %r, %r',
++                self._write_group, self.get_transaction())
++                return
              raise errors.BzrError(
                  'mismatched lock context and write group. %r, %r' %
                  (self._write_group, self.get_transaction()))
@@ -2178,7 +2184,7 @@
          """Get Inventory object by revision id."""
          return self.iter_inventories([revision_id]).next()
--    def iter_inventories(self, revision_ids):
++    def iter_inventories(self, revision_ids, ordering='unordered'):
          """Get many inventories by revision_ids.
          This will buffer some or all of the texts used in constructing the
@@ -2186,21 +2192,23 @@
          time.
          :param revision_ids: The expected revision ids of the inventories.
++        :param ordering: optional ordering, e.g. 'topological'.
          :return: An iterator of inventories.
          """
          if ((None in revision_ids)
              or (_mod_revision.NULL_REVISION in revision_ids)):
              raise ValueError('cannot get null revision inventory')
--        return self._iter_inventories(revision_ids)
++        return self._iter_inventories(revision_ids, ordering)
--    def _iter_inventories(self, revision_ids):
++    def _iter_inventories(self, revision_ids, ordering):
          """single-document based inventory iteration."""
--        for text, revision_id in self._iter_inventory_xmls(revision_ids):
++        inv_xmls = self._iter_inventory_xmls(revision_ids, ordering)
++        for text, revision_id in inv_xmls:
              yield self.deserialise_inventory(revision_id, text)
--    def _iter_inventory_xmls(self, revision_ids):
++    def _iter_inventory_xmls(self, revision_ids, ordering='unordered'):
          keys = [(revision_id,) for revision_id in revision_ids]
--        stream = self.inventories.get_record_stream(keys, 'unordered', True)
++        stream = self.inventories.get_record_stream(keys, ordering, True)
          text_chunks = {}
          for record in stream:
              if record.storage_kind != 'absent':
@@ -2236,7 +2244,7 @@
      @needs_read_lock
      def get_inventory_xml(self, revision_id):
          """Get inventory XML as a file object."""
--        texts = self._iter_inventory_xmls([revision_id])
++        texts = self._iter_inventory_xmls([revision_id], 'unordered')
          try:
              text, revision_id = texts.next()
          except StopIteration:
@@ -3480,288 +3488,6 @@
          return self.source.revision_ids_to_search_result(result_set)
--class InterDifferingSerializer(InterRepository):
--
--    @classmethod
--    def _get_repo_format_to_test(self):
--        return None
--
--    @staticmethod
--    def is_compatible(source, target):
--        """Be compatible with Knit2 source and Knit3 target"""
--        # This is redundant with format.check_conversion_target(), however that
--        # raises an exception, and we just want to say "False" as in we won't
--        # support converting between these formats.
--        if source.supports_rich_root() and not target.supports_rich_root():
--            return False
--        if (source._format.supports_tree_reference
--            and not target._format.supports_tree_reference):
--            return False
--        return True
--
--    def _get_delta_for_revision(self, tree, parent_ids, basis_id, cache):
--        """Get the best delta and base for this revision.
--
--        :return: (basis_id, delta)
--        """
--        possible_trees = [(parent_id, cache[parent_id])
--                          for parent_id in parent_ids
--                           if parent_id in cache]
--        if len(possible_trees) == 0:
--            # There either aren't any parents, or the parents aren't in the
--            # cache, so just use the last converted tree
--            possible_trees.append((basis_id, cache[basis_id]))
--        deltas = []
--        for basis_id, basis_tree in possible_trees:
--            delta = tree.inventory._make_delta(basis_tree.inventory)
--            deltas.append((len(delta), basis_id, delta))
--        deltas.sort()
--        return deltas[0][1:]
--
--    def _get_parent_keys(self, root_key, parent_map):
--        """Get the parent keys for a given root id."""
--        root_id, rev_id = root_key
--        # Include direct parents of the revision, but only if they used
--        # the same root_id and are heads.
--        parent_keys = []
--        for parent_id in parent_map[rev_id]:
--            if parent_id == _mod_revision.NULL_REVISION:
--                continue
--            if parent_id not in self._revision_id_to_root_id:
--                # We probably didn't read this revision, go spend the
--                # extra effort to actually check
--                try:
--                    tree = self.source.revision_tree(parent_id)
--                except errors.NoSuchRevision:
--                    # Ghost, fill out _revision_id_to_root_id in case we
--                    # encounter this again.
--                    # But set parent_root_id to None since we don't really know
--                    parent_root_id = None
--                else:
--                    parent_root_id = tree.get_root_id()
--                self._revision_id_to_root_id[parent_id] = None
--            else:
--                parent_root_id = self._revision_id_to_root_id[parent_id]
--            if root_id == parent_root_id:
--                # With stacking we _might_ want to refer to a non-local
--                # revision, but this code path only applies when we have the
--                # full content available, so ghosts really are ghosts, not just
--                # the edge of local data.
--                parent_keys.append((parent_id,))
--            else:
--                # root_id may be in the parent anyway.
--                try:
--                    tree = self.source.revision_tree(parent_id)
--                except errors.NoSuchRevision:
--                    # ghost, can't refer to it.
--                    pass
--                else:
--                    try:
--                        parent_keys.append((tree.inventory[root_id].revision,))
--                    except errors.NoSuchId:
--                        # not in the tree
--                        pass
--        g = graph.Graph(self.source.revisions)
--        heads = g.heads(parent_keys)
--        selected_keys = []
--        for key in parent_keys:
--            if key in heads and key not in selected_keys:
--                selected_keys.append(key)
--        return tuple([(root_id,)+ key for key in selected_keys])
--
--    def _new_root_data_stream(self, root_keys_to_create, parent_map):
--        for root_key in root_keys_to_create:
--            parent_keys = self._get_parent_keys(root_key, parent_map)
--            yield versionedfile.FulltextContentFactory(root_key,
--                parent_keys, None, '')
--
--    def _fetch_batch(self, revision_ids, basis_id, cache):
--        """Fetch across a few revisions.
--
--        :param revision_ids: The revisions to copy
--        :param basis_id: The revision_id of a tree that must be in cache, used
--            as a basis for delta when no other base is available
--        :param cache: A cache of RevisionTrees that we can use.
--        :return: The revision_id of the last converted tree. The RevisionTree
--            for it will be in cache
--        """
--        # Walk though all revisions; get inventory deltas, copy referenced
--        # texts that delta references, insert the delta, revision and
--        # signature.
--        root_keys_to_create = set()
--        text_keys = set()
--        pending_deltas = []
--        pending_revisions = []
--        parent_map = self.source.get_parent_map(revision_ids)
--        for tree in self.source.revision_trees(revision_ids):
--            current_revision_id = tree.get_revision_id()
--            parent_ids = parent_map.get(current_revision_id, ())
--            basis_id, delta = self._get_delta_for_revision(tree, parent_ids,
--                                                           basis_id, cache)
--            if self._converting_to_rich_root:
--                self._revision_id_to_root_id[current_revision_id] = \
--                    tree.get_root_id()
--            # Find text entries that need to be copied
--            for old_path, new_path, file_id, entry in delta:
--                if new_path is not None:
--                    if not new_path:
--                        # This is the root
--                        if not self.target.supports_rich_root():
--                            # The target doesn't support rich root, so we don't
--                            # copy
--                            continue
--                        if self._converting_to_rich_root:
--                            # This can't be copied normally, we have to insert
--                            # it specially
--                            root_keys_to_create.add((file_id, entry.revision))
--                            continue
--                    text_keys.add((file_id, entry.revision))
--            revision = self.source.get_revision(current_revision_id)
--            pending_deltas.append((basis_id, delta,
--                current_revision_id, revision.parent_ids))
--            pending_revisions.append(revision)
--            cache[current_revision_id] = tree
--            basis_id = current_revision_id
--        # Copy file texts
--        from_texts = self.source.texts
--        to_texts = self.target.texts
--        if root_keys_to_create:
--            root_stream = self._new_root_data_stream(root_keys_to_create,
--                                                     parent_map)
--            to_texts.insert_record_stream(root_stream)
--        to_texts.insert_record_stream(from_texts.get_record_stream(
--            text_keys, self.target._format._fetch_order,
--            not self.target._format._fetch_uses_deltas))
--        # insert inventory deltas
--        for delta in pending_deltas:
--            self.target.add_inventory_by_delta(*delta)
--        if self.target._fallback_repositories:
--            # Make sure this stacked repository has all the parent inventories
--            # for the new revisions that we are about to insert.  We do this
--            # before adding the revisions so that no revision is added until
--            # all the inventories it may depend on are added.
--            parent_ids = set()
--            revision_ids = set()
--            for revision in pending_revisions:
--                revision_ids.add(revision.revision_id)
--                parent_ids.update(revision.parent_ids)
--            parent_ids.difference_update(revision_ids)
--            parent_ids.discard(_mod_revision.NULL_REVISION)
--            parent_map = self.source.get_parent_map(parent_ids)
--            for parent_tree in self.source.revision_trees(parent_ids):
--                basis_id, delta = self._get_delta_for_revision(tree, parent_ids, basis_id, cache)
--                current_revision_id = parent_tree.get_revision_id()
--                parents_parents = parent_map[current_revision_id]
--                self.target.add_inventory_by_delta(
--                    basis_id, delta, current_revision_id, parents_parents)
--        # insert signatures and revisions
--        for revision in pending_revisions:
--            try:
--                signature = self.source.get_signature_text(
--                    revision.revision_id)
--                self.target.add_signature_text(revision.revision_id,
--                    signature)
--            except errors.NoSuchRevision:
--                pass
--            self.target.add_revision(revision.revision_id, revision)
--        return basis_id
--
--    def _fetch_all_revisions(self, revision_ids, pb):
--        """Fetch everything for the list of revisions.
--
--        :param revision_ids: The list of revisions to fetch. Must be in
--            topological order.
--        :param pb: A ProgressBar
--        :return: None
--        """
--        basis_id, basis_tree = self._get_basis(revision_ids[0])
--        batch_size = 100
--        cache = lru_cache.LRUCache(100)
--        cache[basis_id] = basis_tree
--        del basis_tree # We don't want to hang on to it here
--        hints = []
--        for offset in range(0, len(revision_ids), batch_size):
--            self.target.start_write_group()
--            try:
--                pb.update('Transferring revisions', offset,
--                          len(revision_ids))
--                batch = revision_ids[offset:offset+batch_size]
--                basis_id = self._fetch_batch(batch, basis_id, cache)
--            except:
--                self.target.abort_write_group()
--                raise
--            else:
--                hint = self.target.commit_write_group()
--                if hint:
--                    hints.extend(hint)
--        if hints and self.target._format.pack_compresses:
--            self.target.pack(hint=hints)
--        pb.update('Transferring revisions', len(revision_ids),
--                  len(revision_ids))
--
--    @needs_write_lock
--    def fetch(self, revision_id=None, pb=None, find_ghosts=False,
--            fetch_spec=None):
--        """See InterRepository.fetch()."""
--        if fetch_spec is not None:
--            raise AssertionError("Not implemented yet...")
--        if (not self.source.supports_rich_root()
--            and self.target.supports_rich_root()):
--            self._converting_to_rich_root = True
--            self._revision_id_to_root_id = {}
--        else:
--            self._converting_to_rich_root = False
--        revision_ids = self.target.search_missing_revision_ids(self.source,
--            revision_id, find_ghosts=find_ghosts).get_keys()
--        if not revision_ids:
--            return 0, 0
--        revision_ids = tsort.topo_sort(
--            self.source.get_graph().get_parent_map(revision_ids))
--        if not revision_ids:
--            return 0, 0
--        # Walk though all revisions; get inventory deltas, copy referenced
--        # texts that delta references, insert the delta, revision and
--        # signature.
--        first_rev = self.source.get_revision(revision_ids[0])
--        if pb is None:
--            my_pb = ui.ui_factory.nested_progress_bar()
--            pb = my_pb
--        else:
--            symbol_versioning.warn(
--                symbol_versioning.deprecated_in((1, 14, 0))
--                % "pb parameter to fetch()")
--            my_pb = None
--        try:
--            self._fetch_all_revisions(revision_ids, pb)
--        finally:
--            if my_pb is not None:
--                my_pb.finished()
--        return len(revision_ids), 0
--
--    def _get_basis(self, first_revision_id):
--        """Get a revision and tree which exists in the target.
--
--        This assumes that first_revision_id is selected for transmission
--        because all other ancestors are already present. If we can't find an
--        ancestor we fall back to NULL_REVISION since we know that is safe.
--
--        :return: (basis_id, basis_tree)
--        """
--        first_rev = self.source.get_revision(first_revision_id)
--        try:
--            basis_id = first_rev.parent_ids[0]
--            # only valid as a basis if the target has it
--            self.target.get_revision(basis_id)
--            # Try to get a basis tree - if its a ghost it will hit the
--            # NoSuchRevision case.
--            basis_tree = self.source.revision_tree(basis_id)
--        except (IndexError, errors.NoSuchRevision):
--            basis_id = _mod_revision.NULL_REVISION
--            basis_tree = self.source.revision_tree(basis_id)
--        return basis_id, basis_tree
--
--
--InterRepository.register_optimiser(InterDifferingSerializer)
  InterRepository.register_optimiser(InterSameDataRepository)
  InterRepository.register_optimiser(InterWeaveRepo)
  InterRepository.register_optimiser(InterKnitRepo)
@@ -3882,9 +3608,6 @@
          self.file_ids = set([file_id for file_id, _ in
              self.text_index.iterkeys()])
          # text keys is now grouped by file_id
--        n_weaves = len(self.file_ids)
--        files_in_revisions = {}
--        revisions_of_files = {}
          n_versions = len(self.text_index)
          progress_bar.update('loading text store', 0, n_versions)
          parent_map = self.repository.texts.get_parent_map(self.text_index)
@@ -3983,6 +3706,7 @@
                  pass
              else:
                  new_pack.set_write_cache_size(1024*1024)
++        delta_deserializer = inventory_delta.InventoryDeltaSerializer()
          for substream_type, substream in stream:
              if substream_type == 'texts':
                  self.target_repo.texts.insert_record_stream(substream)
@@ -3992,7 +3716,8 @@
                          substream)
                  else:
                      self._extract_and_insert_inventories(
--                        substream, src_serializer)
++                        substream, src_serializer,
++                        delta_deserializer.parse_text_bytes)
              elif substream_type == 'chk_bytes':
                  # XXX: This doesn't support conversions, as it assumes the
                  #      conversion was done in the fetch code.
@@ -4049,18 +3774,40 @@
              self.target_repo.pack(hint=hint)
          return [], set()
--    def _extract_and_insert_inventories(self, substream, serializer):
++    def _extract_and_insert_inventories(self, substream, serializer,
++            parse_delta=None):
          """Generate a new inventory versionedfile in target, converting data.
          The inventory is retrieved from the source, (deserializing it), and
          stored in the target (reserializing it in a different format).
          """
++        target_rich_root = self.target_repo._format.rich_root_data
++        target_tree_refs = self.target_repo._format.supports_tree_reference
          for record in substream:
++            if record.storage_kind == 'inventory-delta':
++                # Insert the delta directly
++                delta_tuple = record.get_bytes_as('inventory-delta')
++                basis_id, new_id, inv_delta, format_flags = delta_tuple
++                # Make sure the delta is compatible with the target
++                if format_flags[0] and not target_rich_root:
++                    raise errors.IncompatibleRevision(self.target_repo._format)
++                if format_flags[1] and not target_tree_refs:
++                    raise errors.IncompatibleRevision(self.target_repo._format)
++                revision_id = new_id[0]
++                parents = [key[0] for key in record.parents]
++                self.target_repo.add_inventory_by_delta(
++                    basis_id, inv_delta, revision_id, parents)
++                continue
++            # It's not a delta, so it must be a fulltext in the source
++            # serializer's format.
              bytes = record.get_bytes_as('fulltext')
              revision_id = record.key[0]
              inv = serializer.read_inventory_from_string(bytes, revision_id)
              parents = [key[0] for key in record.parents]
              self.target_repo.add_inventory(revision_id, inv, parents)
++            # No need to keep holding this full inv in memory when the rest of
++            # the substream is likely to be all deltas.
++            del inv
      def _extract_and_insert_revisions(self, substream, serializer):
          for record in substream:
@@ -4115,11 +3862,8 @@
          return [('signatures', signatures), ('revisions', revisions)]
      def _generate_root_texts(self, revs):
--        """This will be called by __fetch between fetching weave texts and
++        """This will be called by get_stream between fetching weave texts and
          fetching the inventory weave.
--
--        Subclasses should override this if they need to generate root texts
--        after fetching weave texts.
          """
          if self._rich_root_upgrade():
              import bzrlib.fetch
@@ -4157,9 +3901,6 @@
                  # will be valid.
                  for _ in self._generate_root_texts(revs):
                      yield _
--                # NB: This currently reopens the inventory weave in source;
--                # using a single stream interface instead would avoid this.
--                from_weave = self.from_repository.inventories
                  # we fetch only the referenced inventories because we do not
                  # know for unselected inventories whether all their required
                  # texts are present in the other repository - it could be
@@ -4204,6 +3945,22 @@
              if not keys:
                  # No need to stream something we don't have
                  continue
++            if substream_kind == 'inventories':
++                # Some missing keys are genuinely ghosts, filter those out.
++                present = self.from_repository.inventories.get_parent_map(keys)
++                revs = [key[0] for key in present]
++                # As with the original stream, we may need to generate root
++                # texts for the inventories we're about to stream.
++                for _ in self._generate_root_texts(revs):
++                    yield _
++                # Get the inventory stream more-or-less as we do for the
++                # original stream; there's no reason to assume that records
++                # direct from the source will be suitable for the sink.  (Think
++                # e.g. 2a -> 1.9-rich-root).
++                for info in self._get_inventory_stream(revs, missing=True):
++                    yield info
++                continue
++
              # Ask for full texts always so that we don't need more round trips
              # after this stream.
              # Some of the missing keys are genuinely ghosts, so filter absent
@@ -4224,129 +3981,95 @@
          return (not self.from_repository._format.rich_root_data and
              self.to_format.rich_root_data)
--    def _get_inventory_stream(self, revision_ids):
++    def _get_inventory_stream(self, revision_ids, missing=False):
          from_format = self.from_repository._format
--        if (from_format.supports_chks and self.to_format.supports_chks
--            and (from_format._serializer == self.to_format._serializer)):
--            # Both sides support chks, and they use the same serializer, so it
--            # is safe to transmit the chk pages and inventory pages across
--            # as-is.
--            return self._get_chk_inventory_stream(revision_ids)
++        if (from_format.supports_chks and self.to_format.supports_chks and
++            from_format.network_name() == self.to_format.network_name()):
++            raise AssertionError(
++                "this case should be handled by GroupCHKStreamSource")
          elif (not from_format.supports_chks):
              # Source repository doesn't support chks. So we can transmit the
              # inventories 'as-is' and either they are just accepted on the
              # target, or the Sink will properly convert it.
--            return self._get_simple_inventory_stream(revision_ids)
++            # (XXX: this assumes that all non-chk formats are understood as-is
++            # by any Sink, but that presumably isn't true for foreign repo
++            # formats added by bzr-svn etc?)
++            return self._get_simple_inventory_stream(revision_ids,
++                    missing=missing)
          else:
--            # XXX: Hack to make not-chk->chk fetch: copy the inventories as
--            #      inventories. Note that this should probably be done somehow
--            #      as part of bzrlib.repository.StreamSink. Except JAM couldn't
--            #      figure out how a non-chk repository could possibly handle
--            #      deserializing an inventory stream from a chk repo, as it
--            #      doesn't have a way to understand individual pages.
--            return self._get_convertable_inventory_stream(revision_ids)
++            # Make chk->non-chk (and chk with different serializers) fetch:
++            # copy the inventories as (format-neutral) inventory deltas.
++            return self._get_convertable_inventory_stream(revision_ids,
++                    fulltexts=missing)
--    def _get_simple_inventory_stream(self, revision_ids):
++    def _get_simple_inventory_stream(self, revision_ids, missing=False):
++        # NB: This currently reopens the inventory weave in source;
++        # using a single stream interface instead would avoid this.
          from_weave = self.from_repository.inventories
++        if missing:
++            delta_closure = True
++        else:
++            delta_closure = not self.delta_on_metadata()
          yield ('inventories', from_weave.get_record_stream(
              [(rev_id,) for rev_id in revision_ids],
--            self.inventory_fetch_order(),
--            not self.delta_on_metadata()))
--
--    def _get_chk_inventory_stream(self, revision_ids):
--        """Fetch the inventory texts, along with the associated chk maps."""
--        # We want an inventory outside of the search set, so that we can filter
--        # out uninteresting chk pages. For now we use
--        # _find_revision_outside_set, but if we had a Search with cut_revs, we
--        # could use that instead.
--        start_rev_id = self.from_repository._find_revision_outside_set(
--                            revision_ids)
--        start_rev_key = (start_rev_id,)
--        inv_keys_to_fetch = [(rev_id,) for rev_id in revision_ids]
--        if start_rev_id != _mod_revision.NULL_REVISION:
--            inv_keys_to_fetch.append((start_rev_id,))
--        # Any repo that supports chk_bytes must also support out-of-order
--        # insertion. At least, that is how we expect it to work
--        # We use get_record_stream instead of iter_inventories because we want
--        # to be able to insert the stream as well. We could instead fetch
--        # allowing deltas, and then iter_inventories, but we don't know whether
--        # source or target is more 'local' anway.
--        inv_stream = self.from_repository.inventories.get_record_stream(
--            inv_keys_to_fetch, 'unordered',
--            True) # We need them as full-texts so we can find their references
--        uninteresting_chk_roots = set()
--        interesting_chk_roots = set()
--        def filter_inv_stream(inv_stream):
--            for idx, record in enumerate(inv_stream):
--                ### child_pb.update('fetch inv', idx, len(inv_keys_to_fetch))
--                bytes = record.get_bytes_as('fulltext')
--                chk_inv = inventory.CHKInventory.deserialise(
--                    self.from_repository.chk_bytes, bytes, record.key)
--                if record.key == start_rev_key:
--                    uninteresting_chk_roots.add(chk_inv.id_to_entry.key())
--                    p_id_map = chk_inv.parent_id_basename_to_file_id
--                    if p_id_map is not None:
--                        uninteresting_chk_roots.add(p_id_map.key())
--                else:
--                    yield record
--                    interesting_chk_roots.add(chk_inv.id_to_entry.key())
--                    p_id_map = chk_inv.parent_id_basename_to_file_id
--                    if p_id_map is not None:
--                        interesting_chk_roots.add(p_id_map.key())
--        ### pb.update('fetch inventory', 0, 2)
--        yield ('inventories', filter_inv_stream(inv_stream))
--        # Now that we have worked out all of the interesting root nodes, grab
--        # all of the interesting pages and insert them
--        ### pb.update('fetch inventory', 1, 2)
--        interesting = chk_map.iter_interesting_nodes(
--            self.from_repository.chk_bytes, interesting_chk_roots,
--            uninteresting_chk_roots)
--        def to_stream_adapter():
--            """Adapt the iter_interesting_nodes result to a single stream.
--
--            iter_interesting_nodes returns records as it processes them, along
--            with keys. However, we only want to return the records themselves.
--            """
--            for record, items in interesting:
--                if record is not None:
--                    yield record
--        # XXX: We could instead call get_record_stream(records.keys())
--        #      ATM, this will always insert the records as fulltexts, and
--        #      requires that you can hang on to records once you have gone
--        #      on to the next one. Further, it causes the target to
--        #      recompress the data. Testing shows it to be faster than
--        #      requesting the records again, though.
--        yield ('chk_bytes', to_stream_adapter())
--        ### pb.update('fetch inventory', 2, 2)
--
--    def _get_convertable_inventory_stream(self, revision_ids):
--        # XXX: One of source or target is using chks, and they don't have
--        #      compatible serializations. The StreamSink code expects to be
--        #      able to convert on the target, so we need to put
--        #      bytes-on-the-wire that can be converted
--        yield ('inventories', self._stream_invs_as_fulltexts(revision_ids))
--
--    def _stream_invs_as_fulltexts(self, revision_ids):
++            self.inventory_fetch_order(), delta_closure))
++
++    def _get_convertable_inventory_stream(self, revision_ids, fulltexts=False):
++        # The source is using CHKs, but the target either doesn't or is has a
++        # different serializer.  The StreamSink code expects to be able to
++        # convert on the target, so we need to put bytes-on-the-wire that can
++        # be converted.  That means inventory deltas (if the remote is <1.18,
++        # RemoteStreamSink will fallback to VFS to insert the deltas).
++        yield ('inventories',
++           self._stream_invs_as_deltas(revision_ids, fulltexts=fulltexts))
++
++    def _stream_invs_as_deltas(self, revision_ids, fulltexts=False):
          from_repo = self.from_repository
--        from_serializer = from_repo._format._serializer
          revision_keys = [(rev_id,) for rev_id in revision_ids]
          parent_map = from_repo.inventories.get_parent_map(revision_keys)
--        for inv in self.from_repository.iter_inventories(revision_ids):
--            # XXX: This is a bit hackish, but it works. Basically,
--            #      CHKSerializer 'accidentally' supports
--            #      read/write_inventory_to_string, even though that is never
--            #      the format that is stored on disk. It *does* give us a
--            #      single string representation for an inventory, so live with
--            #      it for now.
--            #      This would be far better if we had a 'serialized inventory
--            #      delta' form. Then we could use 'inventory._make_delta', and
--            #      transmit that. This would both be faster to generate, and
--            #      result in fewer bytes-on-the-wire.
--            as_bytes = from_serializer.write_inventory_to_string(inv)
++        # XXX: possibly repos could implement a more efficient iter_inv_deltas
++        # method...
++        inventories = self.from_repository.iter_inventories(
++            revision_ids, 'topological')
++        # XXX: ideally these flags would be per-revision, not per-repo (e.g.
++        # streaming a non-rich-root revision out of a rich-root repo back into
++        # a non-rich-root repo ought to be allowed)
++        format = from_repo._format
++        flags = (format.rich_root_data, format.supports_tree_reference)
++        invs_sent_so_far = set([_mod_revision.NULL_REVISION])
++        for inv in inventories:
              key = (inv.revision_id,)
--            parent_keys = parent_map.get(key, ())
--            yield versionedfile.FulltextContentFactory(
--                key, parent_keys, None, as_bytes)
++            parents = parent_map.get(key, ())
++            if fulltexts or parents == ():
++                # Either the caller asked for fulltexts, or there is no parent,
++                # so, stream as a delta from null:.
++                basis_id = _mod_revision.NULL_REVISION
++                parent_inv = Inventory(None)
++                delta = inv._make_delta(parent_inv)
++            else:
++                # Make a delta against each parent so that we can find the
++                # smallest.
++                best_delta = None
++                parent_ids = [parent_key[0] for parent_key in parents]
++                parent_ids.append(_mod_revision.NULL_REVISION)
++                for parent_id in parent_ids:
++                    if parent_id not in invs_sent_so_far:
++                        # We don't know that the remote side has this basis, so
++                        # we can't use it.
++                        continue
++                    if parent_id == _mod_revision.NULL_REVISION:
++                        parent_inv = Inventory(None)
++                    else:
++                        parent_inv = from_repo.get_inventory(parent_id)
++                    candidate_delta = inv._make_delta(parent_inv)
++                    if (best_delta is None or
++                        len(best_delta) > len(candidate_delta)):
++                        best_delta = candidate_delta
++                        basis_id = parent_id
++                delta = best_delta
++            invs_sent_so_far.add(basis_id)
++            yield versionedfile.InventoryDeltaContentFactory(
++                key, parents, None, delta, basis_id, flags, from_repo)
  def _iter_for_revno(repo, partial_history_cache, stop_index=None,
 === modified file 'bzrlib/smart/protocol.py'
 --- bzrlib/smart/protocol.py	2009-07-17 01:48:56 +0000
 +++ bzrlib/smart/protocol.py	2009-07-22 00:35:23 +0000
@@ -1209,6 +1209,8 @@
          except (KeyboardInterrupt, SystemExit):
              raise
          except Exception:
++            mutter('_iter_with_errors caught error')
++            log_exception_quietly()
              yield sys.exc_info(), None
              return
 === modified file 'bzrlib/smart/repository.py'
 --- bzrlib/smart/repository.py	2009-06-16 06:46:32 +0000
 +++ bzrlib/smart/repository.py	2009-07-22 00:35:23 +0000
@@ -30,6 +30,7 @@
      graph,
      osutils,
      pack,
++    versionedfile,
+     )
  from bzrlib.bzrdir import BzrDir
  from bzrlib.smart.request import (
@@ -39,7 +40,11 @@
+     )
  from bzrlib.repository import _strip_NULL_ghosts, network_format_registry
  from bzrlib import revision as _mod_revision
--from bzrlib.versionedfile import NetworkRecordStream, record_to_fulltext_bytes
++from bzrlib.versionedfile import (
++    NetworkRecordStream,
++    record_to_fulltext_bytes,
++    record_to_inventory_delta_bytes,
++    )
  class SmartServerRepositoryRequest(SmartServerRequest):
@@ -414,8 +419,39 @@
              repository.
          """
          self._to_format = network_format_registry.get(to_network_name)
++        if self._should_fake_unknown():
++            return FailedSmartServerResponse(
++                ('UnknownMethod', 'Repository.get_stream'))
          return None # Signal that we want a body.
++    def _should_fake_unknown(self):
++        # This is a workaround for bugs in pre-1.18 clients that claim to
++        # support receiving streams of CHK repositories.  The pre-1.18 client
++        # expects inventory records to be serialized in the format defined by
++        # to_network_name, but in pre-1.18 (at least) that format definition
++        # tries to use the xml5 serializer, which does not correctly handle
++        # rich-roots.  After 1.18 the client can also accept inventory-deltas
++        # (which avoids this issue), and those clients will use the
++        # Repository.get_stream_1.18 verb instead of this one.
++        # So: if this repository is CHK, and the to_format doesn't match,
++        # we should just fake an UnknownSmartMethod error so that the client
++        # will fallback to VFS, rather than sending it a stream we know it
++        # cannot handle.
++        from_format = self._repository._format
++        to_format = self._to_format
++        if not from_format.supports_chks:
++            # Source not CHK: that's ok
++            return False
++        if (to_format.supports_chks and
++            from_format.repository_class is to_format.repository_class and
++            from_format._serializer == to_format._serializer):
++            # Source is CHK, but target matches: that's ok
++            # (e.g. 2a->2a, or CHK2->2a)
++            return False
++        # Source is CHK, and target is not CHK or incompatible CHK.  We can't
++        # generate a compatible stream.
++        return True
++
      def do_body(self, body_bytes):
          repository = self._repository
          repository.lock_read()
@@ -451,6 +487,14 @@
              repository.unlock()
++class SmartServerRepositoryGetStream_1_18(SmartServerRepositoryGetStream):
++
++    def _should_fake_unknown(self):
++        # The client is at least 1.18, so we don't need to work around any
++        # bugs.
++        return False
++
++
  def _stream_to_byte_stream(stream, src_format):
      """Convert a record stream to a self delimited byte stream."""
      pack_writer = pack.ContainerSerialiser()
@@ -460,6 +504,8 @@
          for record in substream:
              if record.storage_kind in ('chunked', 'fulltext'):
                  serialised = record_to_fulltext_bytes(record)
++            elif record.storage_kind == 'inventory-delta':
++                serialised = record_to_inventory_delta_bytes(record)
              elif record.storage_kind == 'absent':
                  raise ValueError("Absent factory for %s" % (record.key,))
              else:
@@ -650,6 +696,23 @@
              return SuccessfulSmartServerResponse(('ok', ))
++class SmartServerRepositoryInsertStream_1_18(SmartServerRepositoryInsertStreamLocked):
++    """Insert a record stream from a RemoteSink into a repository.
++
++    Same as SmartServerRepositoryInsertStreamLocked, except:
++     - the lock token argument is optional
++     - servers that implement this verb accept 'inventory-delta' records in the
++       stream.
++
++    New in 1.18.
++    """
++
++    def do_repository_request(self, repository, resume_tokens, lock_token=None):
++        """StreamSink.insert_stream for a remote repository."""
++        SmartServerRepositoryInsertStreamLocked.do_repository_request(
++            self, repository, resume_tokens, lock_token)
++
++
  class SmartServerRepositoryInsertStream(SmartServerRepositoryInsertStreamLocked):
      """Insert a record stream from a RemoteSink into an unlocked repository.
 === modified file 'bzrlib/smart/request.py'
 --- bzrlib/smart/request.py	2009-07-17 01:47:01 +0000
 +++ bzrlib/smart/request.py	2009-07-22 00:35:23 +0000
@@ -550,6 +550,8 @@
  request_handlers.register_lazy(
      'Repository.insert_stream', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream')
  request_handlers.register_lazy(
++    'Repository.insert_stream_1.18', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream_1_18')
++request_handlers.register_lazy(
      'Repository.insert_stream_locked', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStreamLocked')
  request_handlers.register_lazy(
      'Repository.is_shared', 'bzrlib.smart.repository', 'SmartServerRepositoryIsShared')
@@ -567,6 +569,9 @@
      'Repository.get_stream', 'bzrlib.smart.repository',
      'SmartServerRepositoryGetStream')
  request_handlers.register_lazy(
++    'Repository.get_stream_1.18', 'bzrlib.smart.repository',
++    'SmartServerRepositoryGetStream_1_18')
++request_handlers.register_lazy(
      'Repository.tarball', 'bzrlib.smart.repository',
      'SmartServerRepositoryTarball')
  request_handlers.register_lazy(
 === modified file 'bzrlib/tests/__init__.py'
 --- bzrlib/tests/__init__.py	2009-07-20 04:22:47 +0000
 +++ bzrlib/tests/__init__.py	2009-07-22 00:35:23 +0000
@@ -1912,6 +1912,16 @@
          sio.encoding = output_encoding
          return sio
++    def disable_verb(self, verb):
++        """Disable a smart server verb for one test."""
++        from bzrlib.smart import request
++        request_handlers = request.request_handlers
++        orig_method = request_handlers.get(verb)
++        request_handlers.remove(verb)
++        def restoreVerb():
++            request_handlers.register(verb, orig_method)
++        self.addCleanup(restoreVerb)
++
  class CapturedCall(object):
      """A helper for capturing smart server calls for easy debug analysis."""
 === modified file 'bzrlib/tests/per_branch/test_push.py'
 --- bzrlib/tests/per_branch/test_push.py	2009-07-10 05:49:34 +0000
 +++ bzrlib/tests/per_branch/test_push.py	2009-07-22 00:35:23 +0000
@@ -260,14 +260,15 @@
          self.assertFalse(local.is_locked())
          local.push(remote)
          hpss_call_names = [item.call.method for item in self.hpss_calls]
--        self.assertTrue('Repository.insert_stream' in hpss_call_names)
--        insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
++        self.assertTrue('Repository.insert_stream_1.18' in hpss_call_names)
++        insert_stream_idx = hpss_call_names.index(
++            'Repository.insert_stream_1.18')
          calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
          # After inserting the stream the client has no reason to query the
          # remote graph any further.
          self.assertEqual(
--            ['Repository.insert_stream', 'Repository.insert_stream', 'get',
--             'Branch.set_last_revision_info', 'Branch.unlock'],
++            ['Repository.insert_stream_1.18', 'Repository.insert_stream_1.18',
++             'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
              calls_after_insert_stream)
      def disableOptimisticGetParentMap(self):
 === modified file 'bzrlib/tests/per_interbranch/test_push.py'
 --- bzrlib/tests/per_interbranch/test_push.py	2009-07-10 05:49:34 +0000
 +++ bzrlib/tests/per_interbranch/test_push.py	2009-07-22 00:35:23 +0000
@@ -266,14 +266,15 @@
          self.assertFalse(local.is_locked())
          local.push(remote)
          hpss_call_names = [item.call.method for item in self.hpss_calls]
--        self.assertTrue('Repository.insert_stream' in hpss_call_names)
--        insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
++        self.assertTrue('Repository.insert_stream_1.18' in hpss_call_names)
++        insert_stream_idx = hpss_call_names.index(
++            'Repository.insert_stream_1.18')
          calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
          # After inserting the stream the client has no reason to query the
          # remote graph any further.
          self.assertEqual(
--            ['Repository.insert_stream', 'Repository.insert_stream', 'get',
--             'Branch.set_last_revision_info', 'Branch.unlock'],
++            ['Repository.insert_stream_1.18', 'Repository.insert_stream_1.18',
++             'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
              calls_after_insert_stream)
      def disableOptimisticGetParentMap(self):
 === modified file 'bzrlib/tests/per_interrepository/__init__.py'
 --- bzrlib/tests/per_interrepository/__init__.py	2009-07-10 06:46:10 +0000
 +++ bzrlib/tests/per_interrepository/__init__.py	2009-07-22 00:35:23 +0000
@@ -32,8 +32,6 @@
+     )
  from bzrlib.repository import (
--    InterDifferingSerializer,
--    InterKnitRepo,
      InterRepository,
+     )
  from bzrlib.tests import (
@@ -51,15 +49,13 @@
          (interrepo_class, repository_format, repository_format_to).
      """
      result = []
--    for interrepo_class, repository_format, repository_format_to in formats:
--        id = '%s,%s,%s' % (interrepo_class.__name__,
--                            repository_format.__class__.__name__,
--                            repository_format_to.__class__.__name__)
++    for repository_format, repository_format_to in formats:
++        id = '%s,%s' % (repository_format.__class__.__name__,
++                        repository_format_to.__class__.__name__)
          scenario = (id,
              {"transport_server": transport_server,
               "transport_readonly_server": transport_readonly_server,
               "repository_format": repository_format,
--             "interrepo_class": interrepo_class,
               "repository_format_to": repository_format_to,
               })
          result.append(scenario)
@@ -68,8 +64,12 @@
  def default_test_list():
      """Generate the default list of interrepo permutations to test."""
--    from bzrlib.repofmt import knitrepo, pack_repo, weaverepo
++    from bzrlib.repofmt import (
++        knitrepo, pack_repo, weaverepo, groupcompress_repo,
++        )
      result = []
++    def add_combo(from_format, to_format):
++        result.append((from_format, to_format))
      # test the default InterRepository between format 6 and the current
      # default format.
      # XXX: robertc 20060220 reinstate this when there are two supported
@@ -80,37 +80,33 @@
      for optimiser_class in InterRepository._optimisers:
          format_to_test = optimiser_class._get_repo_format_to_test()
          if format_to_test is not None:
--            result.append((optimiser_class,
--                           format_to_test, format_to_test))
++            add_combo(format_to_test, format_to_test)
      # if there are specific combinations we want to use, we can add them
      # here. We want to test rich root upgrading.
--    result.append((InterRepository,
--                   weaverepo.RepositoryFormat5(),
--                   knitrepo.RepositoryFormatKnit3()))
--    result.append((InterRepository,
--                   knitrepo.RepositoryFormatKnit1(),
--                   knitrepo.RepositoryFormatKnit3()))
--    result.append((InterRepository,
--                   knitrepo.RepositoryFormatKnit1(),
--                   knitrepo.RepositoryFormatKnit3()))
--    result.append((InterKnitRepo,
--                   knitrepo.RepositoryFormatKnit1(),
--                   pack_repo.RepositoryFormatKnitPack1()))
--    result.append((InterKnitRepo,
--                   pack_repo.RepositoryFormatKnitPack1(),
--                   knitrepo.RepositoryFormatKnit1()))
--    result.append((InterKnitRepo,
--                   knitrepo.RepositoryFormatKnit3(),
--                   pack_repo.RepositoryFormatKnitPack3()))
--    result.append((InterKnitRepo,
--                   pack_repo.RepositoryFormatKnitPack3(),
--                   knitrepo.RepositoryFormatKnit3()))
--    result.append((InterKnitRepo,
--                   pack_repo.RepositoryFormatKnitPack3(),
--                   pack_repo.RepositoryFormatKnitPack4()))
--    result.append((InterDifferingSerializer,
--                   pack_repo.RepositoryFormatKnitPack1(),
--                   pack_repo.RepositoryFormatKnitPack6RichRoot()))
++    add_combo(weaverepo.RepositoryFormat5(),
++              knitrepo.RepositoryFormatKnit3())
++    add_combo(knitrepo.RepositoryFormatKnit1(),
++              knitrepo.RepositoryFormatKnit3())
++    add_combo(knitrepo.RepositoryFormatKnit1(),
++              pack_repo.RepositoryFormatKnitPack1())
++    add_combo(pack_repo.RepositoryFormatKnitPack1(),
++              knitrepo.RepositoryFormatKnit1())
++    add_combo(knitrepo.RepositoryFormatKnit3(),
++              pack_repo.RepositoryFormatKnitPack3())
++    add_combo(pack_repo.RepositoryFormatKnitPack3(),
++              knitrepo.RepositoryFormatKnit3())
++    add_combo(pack_repo.RepositoryFormatKnitPack3(),
++              pack_repo.RepositoryFormatKnitPack4())
++    add_combo(pack_repo.RepositoryFormatKnitPack1(),
++              pack_repo.RepositoryFormatKnitPack6RichRoot())
++    add_combo(pack_repo.RepositoryFormatKnitPack6RichRoot(),
++              groupcompress_repo.RepositoryFormat2a())
++    add_combo(groupcompress_repo.RepositoryFormat2a(),
++              pack_repo.RepositoryFormatKnitPack6RichRoot())
++    add_combo(groupcompress_repo.RepositoryFormatCHK2(),
++              groupcompress_repo.RepositoryFormat2a())
++    add_combo(groupcompress_repo.RepositoryFormatCHK1(),
++              groupcompress_repo.RepositoryFormat2a())
      return result
 === modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
 --- bzrlib/tests/per_interrepository/test_fetch.py	2009-07-10 06:46:10 +0000
 +++ bzrlib/tests/per_interrepository/test_fetch.py	2009-07-22 00:35:23 +0000
@@ -28,6 +28,9 @@
  from bzrlib.errors import (
      NoSuchRevision,
+     )
++from bzrlib.graph import (
++    SearchResult,
++    )
  from bzrlib.revision import (
      NULL_REVISION,
      Revision,
@@ -124,6 +127,15 @@
              to_repo.texts.get_record_stream([('foo', revid)],
              'unordered', True).next().get_bytes_as('fulltext'))
++    def test_fetch_parent_inventories_at_stacking_boundary_smart(self):
++        self.setup_smart_server_with_call_log()
++        self.test_fetch_parent_inventories_at_stacking_boundary()
++
++    def test_fetch_parent_inventories_at_stacking_boundary_smart_old(self):
++        self.setup_smart_server_with_call_log()
++        self.disable_verb('Repository.insert_stream_1.18')
++        self.test_fetch_parent_inventories_at_stacking_boundary()
++
      def test_fetch_parent_inventories_at_stacking_boundary(self):
          """Fetch to a stacked branch copies inventories for parents of
          revisions at the stacking boundary.
@@ -156,11 +168,25 @@
          unstacked_repo = stacked_branch.bzrdir.open_repository()
          unstacked_repo.lock_read()
          self.addCleanup(unstacked_repo.unlock)
--        self.assertFalse(unstacked_repo.has_revision('left'))
--        self.assertFalse(unstacked_repo.has_revision('right'))
--        self.assertEqual(
--            set([('left',), ('right',), ('merge',)]),
--            unstacked_repo.inventories.keys())
++        if not unstacked_repo._format.supports_chks:
++            # these assertions aren't valid for groupcompress repos, which may
++            # transfer data than strictly necessary to avoid breaking up an
++            # already-compressed block of data.
++            self.assertFalse(unstacked_repo.has_revision('left'))
++            self.assertFalse(unstacked_repo.has_revision('right'))
++        self.assertTrue(unstacked_repo.has_revision('merge'))
++        # We used to check for the presence of parent invs here, but what
++        # really matters is that the repo can stream the new revision without
++        # the help of any fallback repos.
++        self.assertCanStreamRevision(unstacked_repo, 'merge')
++
++    def assertCanStreamRevision(self, repo, revision_id):
++        exclude_keys = set(repo.all_revision_ids()) - set([revision_id])
++        search = SearchResult([revision_id], exclude_keys, 1, [revision_id])
++        source = repo._get_source(repo._format)
++        for substream_kind, substream in source.get_stream(search):
++            # Consume the substream
++            list(substream)
      def test_fetch_missing_basis_text(self):
          """If fetching a delta, we should die if a basis is not present."""
@@ -276,7 +302,10 @@
          to_repo = self.make_to_repository('to')
          to_repo.fetch(from_tree.branch.repository)
          recorded_inv_sha1 = to_repo.get_inventory_sha1('foo-id')
--        xml = to_repo.get_inventory_xml('foo-id')
++        try:
++            xml = to_repo.get_inventory_xml('foo-id')
++        except NotImplementedError:
++            raise TestNotApplicable('repo does not provide get_inventory_xml')
          computed_inv_sha1 = osutils.sha_string(xml)
          self.assertEqual(computed_inv_sha1, recorded_inv_sha1)
 === modified file 'bzrlib/tests/test_inventory_delta.py'
 --- bzrlib/tests/test_inventory_delta.py	2009-04-02 05:53:12 +0000
 +++ bzrlib/tests/test_inventory_delta.py	2009-07-22 00:35:23 +0000
@@ -93,30 +93,26 @@
      """Test InventoryDeltaSerializer.parse_text_bytes."""
      def test_parse_no_bytes(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
          err = self.assertRaises(
              errors.BzrError, serializer.parse_text_bytes, '')
--        self.assertContainsRe(str(err), 'unknown format')
++        self.assertContainsRe(str(err), 'last line not empty')
      def test_parse_bad_format(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
          err = self.assertRaises(errors.BzrError,
              serializer.parse_text_bytes, 'format: foo\n')
          self.assertContainsRe(str(err), 'unknown format')
      def test_parse_no_parent(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
          err = self.assertRaises(errors.BzrError,
              serializer.parse_text_bytes,
              'format: bzr inventory delta v1 (bzr 1.14)\n')
          self.assertContainsRe(str(err), 'missing parent: marker')
      def test_parse_no_version(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
          err = self.assertRaises(errors.BzrError,
              serializer.parse_text_bytes,
              'format: bzr inventory delta v1 (bzr 1.14)\n'
@@ -124,8 +120,7 @@
          self.assertContainsRe(str(err), 'missing version: marker')
      def test_parse_duplicate_key_errors(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
          double_root_lines = \
  """format: bzr inventory delta v1 (bzr 1.14)
  parent: null:
@@ -140,19 +135,20 @@
          self.assertContainsRe(str(err), 'duplicate file id')
      def test_parse_versioned_root_only(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          parse_result = serializer.parse_text_bytes(root_only_lines)
          expected_entry = inventory.make_entry(
              'directory', u'', None, 'an-id')
          expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'
          self.assertEqual(
--            ('null:', 'entry-version', [(None, '/', 'an-id', expected_entry)]),
++            ('null:', 'entry-version', True, True,
++             [(None, '', 'an-id', expected_entry)]),
              parse_result)
      def test_parse_special_revid_not_valid_last_mod(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=False, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=False, tree_references=True)
          root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
  parent: null:
  version: null:
@@ -165,8 +161,8 @@
          self.assertContainsRe(str(err), 'special revisionid found')
      def test_parse_versioned_root_versioned_disabled(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=False, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=False, tree_references=True)
          root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
  parent: null:
  version: null:
@@ -179,8 +175,8 @@
          self.assertContainsRe(str(err), 'Versioned root found')
      def test_parse_unique_root_id_root_versioned_disabled(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=False, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=False, tree_references=True)
          root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
  parent: null:
  version: null:
@@ -193,21 +189,80 @@
          self.assertContainsRe(str(err), 'Versioned root found')
      def test_parse_unversioned_root_versioning_enabled(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          err = self.assertRaises(errors.BzrError,
              serializer.parse_text_bytes, root_only_unversioned)
          self.assertContainsRe(
              str(err), 'serialized versioned_root flag is wrong: False')
      def test_parse_tree_when_disabled(self):
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=False)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=False)
          err = self.assertRaises(errors.BzrError,
              serializer.parse_text_bytes, reference_lines)
          self.assertContainsRe(
              str(err), 'serialized tree_references flag is wrong: True')
++    def test_parse_tree_when_header_disallows(self):
++        # A deserializer that allows tree_references to be set or unset.
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        # A serialised inventory delta with a header saying no tree refs, but
++        # that has a tree ref in its content.
++        lines = """format: bzr inventory delta v1 (bzr 1.14)
++parent: null:
++version: entry-version
++versioned_root: false
++tree_references: false
++None\x00/foo\x00id\x00TREE_ROOT\x00changed\x00tree\x00subtree-version
++"""
++        err = self.assertRaises(errors.BzrError,
++            serializer.parse_text_bytes, lines)
++        self.assertContainsRe(str(err), 'Tree reference found')
++
++    def test_parse_versioned_root_when_header_disallows(self):
++        # A deserializer that allows tree_references to be set or unset.
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        # A serialised inventory delta with a header saying no tree refs, but
++        # that has a tree ref in its content.
++        lines = """format: bzr inventory delta v1 (bzr 1.14)
++parent: null:
++version: entry-version
++versioned_root: false
++tree_references: false
++None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir
++"""
++        err = self.assertRaises(errors.BzrError,
++            serializer.parse_text_bytes, lines)
++        self.assertContainsRe(str(err), 'Versioned root found')
++
++    def test_parse_last_line_not_empty(self):
++        """newpath must start with / if it is not None."""
++        # Trim the trailing newline from a valid serialization
++        lines = root_only_lines[:-1]
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        err = self.assertRaises(errors.BzrError,
++            serializer.parse_text_bytes, lines)
++        self.assertContainsRe(str(err), 'last line not empty')
++
++    def test_parse_invalid_newpath(self):
++        """newpath must start with / if it is not None."""
++        lines = empty_lines
++        lines += "None\x00bad\x00TREE_ROOT\x00\x00version\x00dir\n"
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        err = self.assertRaises(errors.BzrError,
++            serializer.parse_text_bytes, lines)
++        self.assertContainsRe(str(err), 'newpath invalid')
++
++    def test_parse_invalid_oldpath(self):
++        """oldpath must start with / if it is not None."""
++        lines = root_only_lines
++        lines += "bad\x00/new\x00file-id\x00\x00version\x00dir\n"
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        err = self.assertRaises(errors.BzrError,
++            serializer.parse_text_bytes, lines)
++        self.assertContainsRe(str(err), 'oldpath invalid')
++
  class TestSerialization(TestCase):
      """Tests for InventoryDeltaSerializer.delta_to_lines."""
@@ -216,8 +271,8 @@
          old_inv = Inventory(None)
          new_inv = Inventory(None)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(True, True)
          self.assertEqual(StringIO(empty_lines).readlines(),
              serializer.delta_to_lines(NULL_REVISION, NULL_REVISION, delta))
@@ -228,8 +283,8 @@
          root.revision = 'a@e\xc3\xa5ample.com--2004'
          new_inv.add(root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          self.assertEqual(StringIO(root_only_lines).readlines(),
              serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
@@ -239,8 +294,8 @@
          root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
          new_inv.add(root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=False, tree_references=False)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(False, False)
          self.assertEqual(StringIO(root_only_unversioned).readlines(),
              serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
@@ -253,8 +308,8 @@
          non_root = new_inv.make_entry('directory', 'foo', root.file_id, 'id')
          new_inv.add(non_root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          err = self.assertRaises(errors.BzrError,
              serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
          self.assertEqual(str(err), 'no version for fileid id')
@@ -265,8 +320,8 @@
          root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
          new_inv.add(root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          err = self.assertRaises(errors.BzrError,
              serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
          self.assertEqual(str(err), 'no version for fileid TREE_ROOT')
@@ -278,8 +333,8 @@
          root.revision = 'a@e\xc3\xa5ample.com--2004'
          new_inv.add(root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=False, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=False, tree_references=True)
          err = self.assertRaises(errors.BzrError,
              serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
          self.assertEqual(str(err), 'Version present for / in TREE_ROOT')
@@ -290,8 +345,8 @@
          root = new_inv.make_entry('directory', '', None, 'my-rich-root-id')
          new_inv.add(root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=False, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=False, tree_references=True)
          err = self.assertRaises(errors.BzrError,
              serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
          self.assertEqual(
@@ -308,8 +363,8 @@
          non_root.kind = 'strangelove'
          new_inv.add(non_root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(True, True)
          # we expect keyerror because there is little value wrapping this.
          # This test aims to prove that it errors more than how it errors.
          err = self.assertRaises(KeyError,
@@ -328,8 +383,8 @@
          non_root.reference_revision = 'subtree-version'
          new_inv.add(non_root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=False)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=False)
          # we expect keyerror because there is little value wrapping this.
          # This test aims to prove that it errors more than how it errors.
          err = self.assertRaises(KeyError,
@@ -348,23 +403,26 @@
          non_root.reference_revision = 'subtree-version'
          new_inv.add(non_root)
          delta = new_inv._make_delta(old_inv)
--        serializer = inventory_delta.InventoryDeltaSerializer(
--            versioned_root=True, tree_references=True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          self.assertEqual(StringIO(reference_lines).readlines(),
              serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
      def test_to_inventory_root_id_versioned_not_permitted(self):
--        delta = [(None, '/', 'TREE_ROOT', inventory.make_entry(
--            'directory', '', None, 'TREE_ROOT'))]
--        serializer = inventory_delta.InventoryDeltaSerializer(False, True)
++        root_entry = inventory.make_entry('directory', '', None, 'TREE_ROOT')
++        root_entry.revision = 'some-version'
++        delta = [(None, '', 'TREE_ROOT', root_entry)]
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=False, tree_references=True)
          self.assertRaises(
              errors.BzrError, serializer.delta_to_lines, 'old-version',
              'new-version', delta)
      def test_to_inventory_root_id_not_versioned(self):
--        delta = [(None, '/', 'an-id', inventory.make_entry(
++        delta = [(None, '', 'an-id', inventory.make_entry(
              'directory', '', None, 'an-id'))]
--        serializer = inventory_delta.InventoryDeltaSerializer(True, True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          self.assertRaises(
              errors.BzrError, serializer.delta_to_lines, 'old-version',
              'new-version', delta)
@@ -374,12 +432,13 @@
          tree_ref = make_entry('tree-reference', 'foo', 'changed-in', 'ref-id')
          tree_ref.reference_revision = 'ref-revision'
          delta = [
--            (None, '/', 'an-id',
++            (None, '', 'an-id',
               make_entry('directory', '', 'changed-in', 'an-id')),
--            (None, '/foo', 'ref-id', tree_ref)
++            (None, 'foo', 'ref-id', tree_ref)
              # a file that followed the root move
+             ]
--        serializer = inventory_delta.InventoryDeltaSerializer(True, True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          self.assertRaises(errors.BzrError, serializer.delta_to_lines,
              'old-version', 'new-version', delta)
@@ -430,7 +489,8 @@
                     executable=True, text_size=30, text_sha1='some-sha',
                     revision='old-rev')),
+             ]
--        serializer = inventory_delta.InventoryDeltaSerializer(True, True)
++        serializer = inventory_delta.InventoryDeltaSerializer()
++        serializer.require_flags(versioned_root=True, tree_references=True)
          lines = serializer.delta_to_lines(NULL_REVISION, 'something', delta)
          expected = """format: bzr inventory delta v1 (bzr 1.14)
  parent: null:
 === modified file 'bzrlib/tests/test_remote.py'
 --- bzrlib/tests/test_remote.py	2009-07-16 05:22:50 +0000
 +++ bzrlib/tests/test_remote.py	2009-07-22 00:35:23 +0000
@@ -31,6 +31,7 @@
      config,
      errors,
      graph,
++    inventory,
      pack,
      remote,
      repository,
@@ -38,6 +39,7 @@
      tests,
      treebuilder,
      urlutils,
++    versionedfile,
+     )
  from bzrlib.branch import Branch
  from bzrlib.bzrdir import BzrDir, BzrDirFormat
@@ -332,15 +334,6 @@
          reference_bzrdir_format = bzrdir.format_registry.get('default')()
          return reference_bzrdir_format.repository_format
--    def disable_verb(self, verb):
--        """Disable a verb for one test."""
--        request_handlers = smart.request.request_handlers
--        orig_method = request_handlers.get(verb)
--        request_handlers.remove(verb)
--        def restoreVerb():
--            request_handlers.register(verb, orig_method)
--        self.addCleanup(restoreVerb)
--
      def assertFinished(self, fake_client):
          """Assert that all of a FakeClient's expected calls have occurred."""
          fake_client.finished_test()
@@ -2220,54 +2213,214 @@
  class TestRepositoryInsertStream(TestRemoteRepository):
--
--    def test_unlocked_repo(self):
--        transport_path = 'quack'
--        repo, client = self.setup_fake_client_and_repository(transport_path)
--        client.add_expected_call(
--            'Repository.insert_stream', ('quack/', ''),
--            'success', ('ok',))
--        client.add_expected_call(
--            'Repository.insert_stream', ('quack/', ''),
--            'success', ('ok',))
--        sink = repo._get_sink()
--        fmt = repository.RepositoryFormat.get_default_format()
--        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
--        self.assertEqual([], resume_tokens)
--        self.assertEqual(set(), missing_keys)
--        self.assertFinished(client)
--
--    def test_locked_repo_with_no_lock_token(self):
--        transport_path = 'quack'
--        repo, client = self.setup_fake_client_and_repository(transport_path)
--        client.add_expected_call(
--            'Repository.lock_write', ('quack/', ''),
--            'success', ('ok', ''))
--        client.add_expected_call(
--            'Repository.insert_stream', ('quack/', ''),
--            'success', ('ok',))
--        client.add_expected_call(
--            'Repository.insert_stream', ('quack/', ''),
--            'success', ('ok',))
--        repo.lock_write()
--        sink = repo._get_sink()
--        fmt = repository.RepositoryFormat.get_default_format()
--        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
--        self.assertEqual([], resume_tokens)
--        self.assertEqual(set(), missing_keys)
--        self.assertFinished(client)
--
--    def test_locked_repo_with_lock_token(self):
--        transport_path = 'quack'
--        repo, client = self.setup_fake_client_and_repository(transport_path)
--        client.add_expected_call(
--            'Repository.lock_write', ('quack/', ''),
--            'success', ('ok', 'a token'))
--        client.add_expected_call(
--            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
--            'success', ('ok',))
--        client.add_expected_call(
--            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
++    """Tests for using Repository.insert_stream verb when the _1.18 variant is
++    not available.
++
++    This test case is very similar to TestRepositoryInsertStream_1_18.
++    """
++
++    def setUp(self):
++        TestRemoteRepository.setUp(self)
++        self.disable_verb('Repository.insert_stream_1.18')
++
++    def test_unlocked_repo(self):
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'unknown', ('Repository.insert_stream_1.18',))
++        client.add_expected_call(
++            'Repository.insert_stream', ('quack/', ''),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream', ('quack/', ''),
++            'success', ('ok',))
++        sink = repo._get_sink()
++        fmt = repository.RepositoryFormat.get_default_format()
++        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
++        self.assertEqual([], resume_tokens)
++        self.assertEqual(set(), missing_keys)
++        self.assertFinished(client)
++
++    def test_locked_repo_with_no_lock_token(self):
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.lock_write', ('quack/', ''),
++            'success', ('ok', ''))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'unknown', ('Repository.insert_stream_1.18',))
++        client.add_expected_call(
++            'Repository.insert_stream', ('quack/', ''),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream', ('quack/', ''),
++            'success', ('ok',))
++        repo.lock_write()
++        sink = repo._get_sink()
++        fmt = repository.RepositoryFormat.get_default_format()
++        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
++        self.assertEqual([], resume_tokens)
++        self.assertEqual(set(), missing_keys)
++        self.assertFinished(client)
++
++    def test_locked_repo_with_lock_token(self):
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.lock_write', ('quack/', ''),
++            'success', ('ok', 'a token'))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
++            'unknown', ('Repository.insert_stream_1.18',))
++        client.add_expected_call(
++            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
++            'success', ('ok',))
++        repo.lock_write()
++        sink = repo._get_sink()
++        fmt = repository.RepositoryFormat.get_default_format()
++        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
++        self.assertEqual([], resume_tokens)
++        self.assertEqual(set(), missing_keys)
++        self.assertFinished(client)
++
++    def test_stream_with_inventory_delta(self):
++        """inventory-delta records can't be sent to the
++        Repository.insert_stream verb.  So when one is encountered the
++        RemoteSink immediately stops using that verb and falls back to VFS
++        insert_stream.
++        """
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'unknown', ('Repository.insert_stream_1.18',))
++        client.add_expected_call(
++            'Repository.insert_stream', ('quack/', ''),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream', ('quack/', ''),
++            'success', ('ok',))
++        # Create a fake real repository for insert_stream to fall back on, so
++        # that we can directly see the records the RemoteSink passes to the
++        # real sink.
++        class FakeRealSink:
++            def __init__(self):
++                self.records = []
++            def insert_stream(self, stream, src_format, resume_tokens):
++                for substream_kind, substream in stream:
++                    self.records.append(
++                        (substream_kind, [record.key for record in substream]))
++                return ['fake tokens'], ['fake missing keys']
++        fake_real_sink = FakeRealSink()
++        class FakeRealRepository:
++            def _get_sink(self):
++                return fake_real_sink
++        repo._real_repository = FakeRealRepository()
++        sink = repo._get_sink()
++        fmt = repository.RepositoryFormat.get_default_format()
++        stream = self.make_stream_with_inv_deltas(fmt)
++        resume_tokens, missing_keys = sink.insert_stream(stream, fmt, [])
++        # Every record from the first inventory delta should have been sent to
++        # the VFS sink.
++        expected_records = [
++            ('inventories', [('rev2',), ('rev3',)]),
++            ('texts', [('some-rev', 'some-file')])]
++        self.assertEqual(expected_records, fake_real_sink.records)
++        # The return values from the real sink's insert_stream are propagated
++        # back to the original caller.
++        self.assertEqual(['fake tokens'], resume_tokens)
++        self.assertEqual(['fake missing keys'], missing_keys)
++        self.assertFinished(client)
++
++    def make_stream_with_inv_deltas(self, fmt):
++        """Make a simple stream with an inventory delta followed by more
++        records and more substreams to test that all records and substreams
++        from that point on are used.
++
++        This sends, in order:
++           * inventories substream: rev1, rev2, rev3.  rev2 and rev3 are
++             inventory-deltas.
++           * texts substream: (some-rev, some-file)
++        """
++        # Define a stream using generators so that it isn't rewindable.
++        def stream_with_inv_delta():
++            yield ('inventories', inventory_substream_with_delta())
++            yield ('texts', [
++                versionedfile.FulltextContentFactory(
++                    ('some-rev', 'some-file'), (), None, 'content')])
++        def inventory_substream_with_delta():
++            # An empty inventory fulltext.  This will be streamed normally.
++            inv = inventory.Inventory(revision_id='rev1')
++            text = fmt._serializer.write_inventory_to_string(inv)
++            yield versionedfile.FulltextContentFactory(
++                ('rev1',), (), None, text)
++            # An inventory delta.  This can't be streamed via this verb, so it
++            # will trigger a fallback to VFS insert_stream.
++            entry = inv.make_entry(
++                'directory', 'newdir', inv.root.file_id, 'newdir-id')
++            delta = [(None, 'newdir', 'newdir-id', entry)]
++            yield versionedfile.InventoryDeltaContentFactory(
++                ('rev2',), (('rev1',)), None, ('rev1',), (True, False), None)
++            # Another delta.
++            yield versionedfile.InventoryDeltaContentFactory(
++                ('rev3',), (('rev1',)), None, ('rev1',), (True, False), None)
++        return stream_with_inv_delta()
++
++
++class TestRepositoryInsertStream_1_18(TestRemoteRepository):
++
++    def test_unlocked_repo(self):
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'success', ('ok',))
++        sink = repo._get_sink()
++        fmt = repository.RepositoryFormat.get_default_format()
++        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
++        self.assertEqual([], resume_tokens)
++        self.assertEqual(set(), missing_keys)
++        self.assertFinished(client)
++
++    def test_locked_repo_with_no_lock_token(self):
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.lock_write', ('quack/', ''),
++            'success', ('ok', ''))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', ''),
++            'success', ('ok',))
++        repo.lock_write()
++        sink = repo._get_sink()
++        fmt = repository.RepositoryFormat.get_default_format()
++        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
++        self.assertEqual([], resume_tokens)
++        self.assertEqual(set(), missing_keys)
++        self.assertFinished(client)
++
++    def test_locked_repo_with_lock_token(self):
++        transport_path = 'quack'
++        repo, client = self.setup_fake_client_and_repository(transport_path)
++        client.add_expected_call(
++            'Repository.lock_write', ('quack/', ''),
++            'success', ('ok', 'a token'))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
++            'success', ('ok',))
++        client.add_expected_call(
++            'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
              'success', ('ok',))
          repo.lock_write()
          sink = repo._get_sink()
 === modified file 'bzrlib/tests/test_selftest.py'
 --- bzrlib/tests/test_selftest.py	2009-07-20 04:22:47 +0000
 +++ bzrlib/tests/test_selftest.py	2009-07-22 00:35:23 +0000
@@ -124,7 +124,7 @@
          self.assertEqual(sample_permutation,
                           get_transport_test_permutations(MockModule()))
--    def test_scenarios_invlude_all_modules(self):
++    def test_scenarios_include_all_modules(self):
          # this checks that the scenario generator returns as many permutations
          # as there are in all the registered transport modules - we assume if
          # this matches its probably doing the right thing especially in
@@ -297,14 +297,12 @@
          scenarios = make_scenarios(server1, server2, formats)
          self.assertEqual([
              ('str,str,str',
--             {'interrepo_class': str,
--              'repository_format': 'C1',
++             {'repository_format': 'C1',
                'repository_format_to': 'C2',
                'transport_readonly_server': 'b',
                'transport_server': 'a'}),
              ('int,str,str',
--             {'interrepo_class': int,
--              'repository_format': 'D1',
++             {'repository_format': 'D1',
                'repository_format_to': 'D2',
                'transport_readonly_server': 'b',
                'transport_server': 'a'})],
 === modified file 'bzrlib/tests/test_xml.py'
 --- bzrlib/tests/test_xml.py	2009-04-03 21:50:40 +0000
 +++ bzrlib/tests/test_xml.py	2009-07-22 00:35:23 +0000
@@ -19,6 +19,7 @@
  from bzrlib import (
      errors,
      inventory,
++    xml6,
      xml7,
      xml8,
      serializer,
@@ -139,6 +140,14 @@
  </inventory>
  """
++_expected_inv_v6 = """<inventory format="6" revision_id="rev_outer">
++<directory file_id="tree-root-321" name="" revision="rev_outer" />
++<directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
++<file file_id="file-id" name="file" parent_id="tree-root-321" revision="rev_outer" text_sha1="A" text_size="1" />
++<symlink file_id="link-id" name="link" parent_id="tree-root-321" revision="rev_outer" symlink_target="a" />
++</inventory>
++"""
++
  _expected_inv_v7 = """<inventory format="7" revision_id="rev_outer">
  <directory file_id="tree-root-321" name="" revision="rev_outer" />
  <directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
@@ -377,6 +386,17 @@
          for path, ie in inv.iter_entries():
              self.assertEqual(ie, inv2[ie.file_id])
++    def test_roundtrip_inventory_v6(self):
++        inv = self.get_sample_inventory()
++        txt = xml6.serializer_v6.write_inventory_to_string(inv)
++        lines = xml6.serializer_v6.write_inventory_to_lines(inv)
++        self.assertEqual(bzrlib.osutils.split_lines(txt), lines)
++        self.assertEqualDiff(_expected_inv_v6, txt)
++        inv2 = xml6.serializer_v6.read_inventory_from_string(txt)
++        self.assertEqual(4, len(inv2))
++        for path, ie in inv.iter_entries():
++            self.assertEqual(ie, inv2[ie.file_id])
++
      def test_wrong_format_v7(self):
          """Can't accidentally open a file with wrong serializer"""
          s_v6 = bzrlib.xml6.serializer_v6
 === modified file 'bzrlib/versionedfile.py'
 --- bzrlib/versionedfile.py	2009-07-06 20:21:34 +0000
 +++ bzrlib/versionedfile.py	2009-07-22 00:35:23 +0000
@@ -34,6 +34,8 @@
      errors,
      groupcompress,
      index,
++    inventory,
++    inventory_delta,
      knit,
      osutils,
      multiparent,
@@ -158,6 +160,31 @@
              self.storage_kind)
++class InventoryDeltaContentFactory(ContentFactory):
++
++    def __init__(self, key, parents, sha1, delta, basis_id, format_flags,
++            repo=None):
++        self.sha1 = sha1
++        self.storage_kind = 'inventory-delta'
++        self.key = key
++        self.parents = parents
++        self._delta = delta
++        self._basis_id = basis_id
++        self._format_flags = format_flags
++        self._repo = repo
++
++    def get_bytes_as(self, storage_kind):
++        if storage_kind == self.storage_kind:
++            return self._basis_id, self.key, self._delta, self._format_flags
++        elif storage_kind == 'inventory-delta-bytes':
++            serializer = inventory_delta.InventoryDeltaSerializer()
++            serializer.require_flags(*self._format_flags)
++            return ''.join(serializer.delta_to_lines(
++                self._basis_id, self.key, self._delta))
++        raise errors.UnavailableRepresentation(self.key, storage_kind,
++            self.storage_kind)
++
++
  class AbsentContentFactory(ContentFactory):
      """A placeholder content factory for unavailable texts.
@@ -1551,13 +1578,15 @@
              record.get_bytes_as(record.storage_kind) call.
          """
          self._bytes_iterator = bytes_iterator
--        self._kind_factory = {'knit-ft-gz':knit.knit_network_to_record,
--            'knit-delta-gz':knit.knit_network_to_record,
--            'knit-annotated-ft-gz':knit.knit_network_to_record,
--            'knit-annotated-delta-gz':knit.knit_network_to_record,
--            'knit-delta-closure':knit.knit_delta_closure_to_records,
--            'fulltext':fulltext_network_to_record,
--            'groupcompress-block':groupcompress.network_block_to_records,
++        self._kind_factory = {
++            'fulltext': fulltext_network_to_record,
++            'groupcompress-block': groupcompress.network_block_to_records,
++            'inventory-delta': inventory_delta_network_to_record,
++            'knit-ft-gz': knit.knit_network_to_record,
++            'knit-delta-gz': knit.knit_network_to_record,
++            'knit-annotated-ft-gz': knit.knit_network_to_record,
++            'knit-annotated-delta-gz': knit.knit_network_to_record,
++            'knit-delta-closure': knit.knit_delta_closure_to_records,
+             }
      def read(self):
@@ -1583,6 +1612,21 @@
      return [FulltextContentFactory(key, parents, None, fulltext)]
++def inventory_delta_network_to_record(kind, bytes, line_end):
++    """Convert a network inventory-delta record to record."""
++    meta_len, = struct.unpack('!L', bytes[line_end:line_end+4])
++    record_meta = bytes[line_end+4:line_end+4+meta_len]
++    key, parents = bencode.bdecode_as_tuple(record_meta)
++    if parents == 'nil':
++        parents = None
++    inventory_delta_bytes = bytes[line_end+4+meta_len:]
++    deserialiser = inventory_delta.InventoryDeltaSerializer()
++    parse_result = deserialiser.parse_text_bytes(inventory_delta_bytes)
++    basis_id, new_id, rich_root, tree_refs, delta = parse_result
++    return [InventoryDeltaContentFactory(
++        key, parents, None, delta, basis_id, (rich_root, tree_refs))]
++
++
  def _length_prefix(bytes):
      return struct.pack('!L', len(bytes))
@@ -1598,6 +1642,17 @@
          _length_prefix(record_meta), record_meta, record_content)
++def record_to_inventory_delta_bytes(record):
++    record_content = record.get_bytes_as('inventory-delta-bytes')
++    if record.parents is None:
++        parents = 'nil'
++    else:
++        parents = record.parents
++    record_meta = bencode.bencode((record.key, parents))
++    return "inventory-delta\n%s%s%s" % (
++        _length_prefix(record_meta), record_meta, record_content)
++
++
  def sort_groupcompress(parent_map):
      """Sort and group the keys in parent_map into groupcompress order.
 === modified file 'bzrlib/xml5.py'
 --- bzrlib/xml5.py	2009-04-04 02:50:01 +0000
 +++ bzrlib/xml5.py	2009-07-22 00:35:23 +0000
@@ -39,8 +39,8 @@
          format = elt.get('format')
          if format is not None:
              if format != '5':
--                raise BzrError("invalid format version %r on inventory"
--                                % format)
++                raise errors.BzrError("invalid format version %r on inventory"
++                                      % format)
          data_revision_id = elt.get('revision_id')
          if data_revision_id is not None:
              revision_id = cache_utf8.encode(data_revision_id)