Merge lp:~spiv/bzr/inventory-delta into lp:~bzr/bzr/trunk-old

Proposed by Andrew Bennetts
Status: Superseded
Proposed branch: lp:~spiv/bzr/inventory-delta
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 2476 lines (has conflicts)
Text conflict in NEWS
To merge this branch: bzr merge lp:~spiv/bzr/inventory-delta
Reviewer: John A Meinel (Status: Needs Fixing)
Review via email: mp+9125@code.launchpad.net

This proposal supersedes a proposal from 2009-07-16.

This proposal has been superseded by a proposal from 2009-08-05.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This is a pretty big patch. It does lots of things:

 * adds new insert_stream and get_stream verbs
 * adds de/serialization of inventory-delta records on the network
 * fixes rich-root generation in StreamSource
 * adds a bunch of new scenarios to per_interrepository tests
 * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
   the new pack is already optimal).
 * improves the inventory_delta module a little
 * various miscellaneous fixes and new tests that are hopefully self-evident
 * and, most controversially, removes InterDifferingSerializer.

From John's mail a while back there were a bunch of issues with removing IDS. I
think the outstanding ones are:

> 1) Incremental updates. IDS converts batches of 100 revs at a time,
> which also triggers autopacks at 1k revs. Streaming fetch is currently
> an all-or-nothing, which isn't appropriate (IMO) for conversions.
> Consider that conversion can take *days*, it is important to have
> something that can be stopped and resumed.
>
> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
> bloats to 2.4GB before packing back to 25MB. We know the new format is
> even more sensitive to packing efficiency. Not to mention that a single
> big-stream generates a single large pack, it isn't directly obvious that
> we are being so inefficient.

i.e. performance concerns.

The streaming code now does the conversion in much the same way IDS did, but
probably still differently enough that we will want to measure the impact of
this. I'm definitely concerned about case 2, the lack of packing as you go,
although perhaps the degree of bloat is reduced by using semantic
inventory-delta records?

The reason why I eventually deleted IDS was that it was just too burdensome to
keep two code paths alive, thoroughly tested, and correct. For instance, if we
simply reinstated IDS for local-only fetches then most of the test suite,
including the relevant interrepo tests, would only exercise IDS. Also, IDS
turned out to have a bug when used on a stacked repository that the extended
test suite in this branch revealed (I've forgotten the details, but can dig them
up if you like). It didn't seem worth the hassle of fixing IDS when I already
had a working implementation.

I'm certainly open to reinstating IDS if it's the most expedient way to have
reasonable local performance for upgrades, but I thought I'd try to be bold and
see if we could just live without the extra complexity. Maybe we can improve
performance of streaming rather than resurrect IDS?

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:
>
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Consider that conversion can take *days*, it is important to have
>> something that can be stopped and resumed.

It also picks out the 'optimal' deltas by computing many different ones
and finding whichever one was the 'smallest'. For local conversions, the
time to compute 2-3 deltas was much smaller than the time to apply an
inefficient delta.

>>
>> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
>> bloats to 2.4GB before packing back to 25MB. We know the new format is
>> even more sensitive to packing efficiency. Not to mention that a single
>> big-stream generates a single large pack, it isn't directly obvious that
>> we are being so inefficient.
>
> i.e. performance concerns.
>

Generally, yes.

There is also:

3) Being able to resume because you snapshotted periodically as you
went. This seems even more important for a network transfer.

> The streaming code is pretty similar in how it does the conversion now to the
> way IDS did it, but probably still different enough that we will want to measure
> the impact of this. I'm definitely concerned about case 2, the lack of packing
> as you go, although perhaps the degree of bloat is reduced by using
> semantic inventory-delta records?
>

I don't think bzr bloating from 100MB => 2.4GB (and then back down to
25MB post pack) was because of inventory records. However, if it was
purely because of a bad streaming order, we could probably fix that by
changing how we stream texts.

> The reason why I eventually deleted IDS was that it was just too burdensome to
> keep two code paths alive, thoroughly tested, and correct. For instance, if we
> simply reinstated IDS for local-only fetches then most of the test suite,
> including the relevant interrepo tests, will only exercise IDS. Also, IDS
> turned out to have a bug when used on a stacked repository that the extending
> test suite in this branch revealed (I've forgotten the details, but can dig them
> up if you like). It didn't seem worth the hassle of fixing IDS when I already
> had a working imple...


Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

...
>
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

and

4) Progress indication

This is really quite useful for a process that can take *days* to
complete. The Stream code is often quite nice, but the fact that it
gives you 2 states:
 'getting stream'
 'inserting stream'

and nothing more than that is pretty crummy.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:

So for starters, let me mention what I found wrt performance:

time bzr.dev branch mysql-1k myqsl-2a/1k
  real 3m18.490s

time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
  real 2m29.953s

+xml8 is just this patch (un-mangling the PGP dash-escaping):

=== modified file 'bzrlib/xml8.py'
--- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
-                # if cached_ie.kind == 'directory':
-                #     return cached_ie.copy()
-                # return cached_ie
+                if cached_ie.kind == 'directory':
+                    return cached_ie.copy()
+                return cached_ie
                 return cached_ie.copy()
 
         kind = elt.tag

It has 2 basic effects:

1) Avoid copying all inventory entries all the time (so reduce the time
spent in InventoryEntry.copy())

2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
  x.attribute1 == y.attribute1
  and x.attribute2 == y.attribute2
etc.
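
To illustrate the second effect concretely, a toy sketch (not the actual
bzrlib code) of the comparison that benefits:

    def entries_differ(x, y):
        # Identity hit: a re-used cached object needs only one pointer check.
        if x is y:
            return False
        # Otherwise every attribute has to be compared.
        return (x.file_id != y.file_id
                or x.name != y.name
                or x.parent_id != y.parent_id
                or x.revision != y.revision)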

As you can see it is a big win for this test case (about 4:3 or 33% faster)

So what about Andrew's work:

time bzr.inv.delta branch mysql-1k myqsl-2a/1k
  real 10m14.267s

time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
  real 9m49.372s

It also was stuck at:
[##################- ] Fetching revisions:Inserting stream:Walking
content 912/1043

for most of that time, making it really look like it was stalled.

Anyway, this isn't a case where it is, say, 10% slower, which would be
acceptable because we get rid of some extra code paths. This ends up
being 3-4x slower and no longer giving any progress information.

If that scales to Launchpad-sized projects, you are talking 4 days
becoming 16 days (aka > 2 weeks).

So honestly, I don't think we can land this as is. I won't stick on the
performance side if people feel it is acceptable. But I did spend a lot
of time optimizing IDS, and that work clearly hasn't been done for
StreamSource.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

John A Meinel wrote:
> Andrew Bennetts wrote:

...

> So for starters, let me mention what I found wrt performance:
>
> time bzr.dev branch mysql-1k myqsl-2a/1k
> real 3m18.490s
>
> time bzr.dev+xml8 branch mysql-1k myqsl-2a/1k
> real 2m29.953s

...

> time bzr.inv.delta branch mysql-1k myqsl-2a/1k
> real 10m14.267s
>
> time bzr.inv.delta+xml8 branch mysql-1k myqsl-2a/1k
> real 9m49.372s

Also, for real-world space issues:
$ du -ksh mysql-2a*/.bzr/repository/obsolete*
1.9M mysql-2a-bzr.dev/.bzr/repository/obsolete_packs
467M mysql-2a-inv-delta/.bzr/repository/obsolete_packs

The peak size (watch du -ksh mysql-2a-bzr.dev) during conversion using
IDS was 49MB.

$ du -ksh mysql-2a*/.bzr/repository/packs*
11M mysql-2a-bzr.dev/.bzr/repository/packs
9.1M mysql-2a-inv-delta/.bzr/repository/packs

So the new code wins slightly in the final size on disk, because it
packed at the end, rather than at 1k revs (and then there were another
40+ revs inserted.)

However, it bloated from 15MB => 467MB while doing the transfer, before
reaching the final size, versus a peak of 50MB with IDS (almost 10x larger).

John
=:->

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

John A Meinel wrote:
[...]
> It also picks out the 'optimal' deltas by computing many different ones
> and finding whichever one was the 'smallest'. For local conversions, the
> time to compute 2-3 deltas was much smaller than to apply an inefficient
> delta.

FWIW, the streaming code also does this. My guess (not yet measured) is that
sending fewer bytes over the network is also a win, especially when one parent
might be a one-liner and the other might be a large merge from trunk.
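
For illustration, the selection looks roughly like this (a sketch assuming
an Inventory._make_delta(old) style API; the helper and variable names here
are made up):

    NULL_REVISION = 'null:'

    def best_delta_for(inv, parent_invs, empty_inv):
        # Compute a candidate delta against each available parent and keep
        # the shortest; fall back to a delta against the empty inventory.
        best_delta = None
        basis_id = NULL_REVISION
        for parent_inv in parent_invs:
            candidate = inv._make_delta(parent_inv)
            if best_delta is None or len(candidate) < len(best_delta):
                best_delta = candidate
                basis_id = parent_inv.revision_id
        if best_delta is None:
            best_delta = inv._make_delta(empty_inv)
        return basis_id, best_delta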

[...]
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

Yes, although we already don't have this for the network. It would be great to
have...

[...]
> I'm certainly open to the suggestion of getting rid of IDS. I don't like
> having multiple code paths. It just happens that there are *big* wins
> and it is often easier to write optimized code in a different framework.

Sure. Like I said, for me it was just getting to be a large hassle to maintain
both paths in my branch, even though they were increasingly sharing a lot of
code for e.g. rich-root generation before I deleted IDS.

I'd like to try to see if we can cheaply fix the performance issues you report in
other mails without needing IDS. If we do need IDS for a while longer then
fine, although I think we'll want to restrict it to local source, local target,
non-stacked cases only.

Thanks for the measurements and quick feedback.

-Andrew.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Thu, 2009-07-16 at 16:06 +0000, John A Meinel wrote:
>
> (3) is an issue I'd like to see addressed, but which Robert seems
> particularly unhappy having us try to do. (See other bug comments, etc
> about how other systems don't do it and he feels it isn't worth
> doing.)

I'd like to be clear about this. I'd be ecstatic *if* we can do it well
and robustly. However I don't think it is *at all* easy to do that. If I'm
wrong - great.

I'm fine with keeping IDS for local fetches. But when networking is
involved IDS is massively slower than the streaming codepath.

> It was fairly straightforward to do with IDS, the argument I think
> from
> Robert is that the client would need to be computing whether it has a
> 'complete' set and thus can commit the current write group. (the
> *source* knows these sort of things, and can just say "and now you
> have
> it", but the client has to re-do all that work to figure it out from a
> stream.)

I think that aspect is simple - we have a stream subtype that says
'checkpoint'. It's the requirement to do all that work that is, I think,
problematic - and that's *without* considering stacking, which makes it
hugely harder.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>
>
> 4) Progress indication
>
> This is really quite useful for a process that can take *days* to
> complete. The Stream code is often quite nice, but the fact that it
> gives you 2 states:
> 'getting stream'
> 'inserting stream'
>
> and nothing more than that is pretty crummy.

That is a separate bug however, and one that affects normal fetches too.
So I don't think tying it to the IDS discussion is necessary or
particularly helpful.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
> On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>>
>> 4) Progress indication
>>
>> This is really quite useful for a process that can take *days* to
>> complete. The Stream code is often quite nice, but the fact that it
>> gives you 2 states:
>> 'getting stream'
>> 'inserting stream'
>>
>> and nothing more than that is pretty crummy.
>
> That is a separate bug however, and one that affects normal fetches too.
> So I don't think tying it to the IDS discussion is necessary or
> particularly helpful.
>
> -Rob
>

It is explicitly relevant that "bzr upgrade --2a", which will take
longer than normal, would now not even show a progress bar.

For local fetches, you don't even get the "transport activity"
indicator, so it *really* looks hung. It doesn't even write things into
.bzr.log so that you know it is doing anything other than spinning in a
while True loop. I guess you can tell because your disk consumption is
going way up...

I don't honestly know the performance difference for streaming a lot of
content over the network. Given a 4x performance slowdown, for large
fetches IDS could still be faster. I certainly agree that IDS is
probably significantly more inefficient when doing something like "give
me the last 2 revs".

It honestly wasn't something I was optimizing for (cross format
fetching). I *was* trying to make 'bzr upgrade' be measured in hours
rather than days/weeks/etc.

Also, given that you have to upgrade all of your stacked locations at
the same time, and --2a is a trap door, aren't 95% of upgrades going to
be all at once anyway?

John
=:->


Revision history for this message
Andrew Bennetts (spiv) wrote :

This is the same patch updated for bzr.dev, and with InterDifferingSerializer restored for local fetches only. For non-local fetches I'm pretty sure the streaming code path is massively faster based on the timings I've done (even via TCP HPSS to localhost).

I'm sure we do want to get rid of IDS eventually (and fix the shortcomings in streaming that John has pointed out) but doing that shouldn't block the rest of this work, even if it is a small maintenance headache.

[FWIW, if I don't restrict IDS to local-only branches I get test failures like:

Traceback (most recent call last):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 137, in test_fetch_parent_inventories_at_stacking_boundary_smart_old
    self.test_fetch_parent_inventories_at_stacking_boundary()
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 181, in test_fetch_parent_inventories_at_stacking_boundary
    self.assertCanStreamRevision(unstacked_repo, 'merge')
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 187, in assertCanStreamRevision
    for substream_kind, substream in source.get_stream(search):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/remote.py", line 1895, in missing_parents_chain
    for kind, stream in self._get_stream(sources[0], search):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/repository.py", line 537, in record_stream
    for bytes in byte_stream:
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/message.py", line 338, in read_streamed_body
    _translate_error(self._body_error_args)
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/message.py", line 361, in _translate_error
    raise errors.ErrorFromSmartServer(error_tuple)
ErrorFromSmartServer: Error received from smart server: ('error', "<bzrlib.groupcompress.GroupCompressVersionedFiles object at 0xb06348c> has no revision ('sha1:98fd3a13366960dc27dcb4b6ddb2b55aca3aae7b',)")

(for scenarios like Pack6RichRoot->2a).]

Anyway, please review.

Revision history for this message
John A Meinel (jameinel) wrote :

I'm not really sure what testing you've done on this, but I'm getting some really strange results.

Specifically, when I push, it turns out to be sending xml fragments over the wire. It seems to be this code:
        elif (not from_format.supports_chks):
            # Source repository doesn't support chks. So we can transmit the
            # inventories 'as-is' and either they are just accepted on the
            # target, or the Sink will properly convert it.
            # (XXX: this assumes that all non-chk formats are understood as-is
            # by any Sink, but that presumably isn't true for foreign repo
            # formats added by bzr-svn etc?)
            return self._get_simple_inventory_stream(revision_ids,
                    missing=missing)

Which means that the raw xml bytes are being transmitted, and then the target side is extracting the xml, upcasting and downcasting.

I see that there are code paths in place to do otherwise, but as near as I can tell, "_stream_invs_as_deltas" is only getting called if the *source* format is CHK.

From the profiling I've done, the _generate_root_texts() code is the bulk of the overhead with the new format. But *that* is because all of the data is sent to the server as XML texts, and the server has to do all the work to convert it.

When I just disable the code path that sends 'simple_inventory_stream', then I get:

$ time wbzr push bzr://localhost/test-2a/x
bzr: ERROR: Version present for / in TREE_ROOT

So I have a *strong* feeling the code you've introduced is actually broken, and you just didn't realize it.

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

So, to further my discussion. While investigating this, I found some odd bits in _stream_invs_as_deltas. For example:

+    def _stream_invs_as_deltas(self, revision_ids, fulltexts=False):
         from_repo = self.from_repository

...

+        inventories = self.from_repository.iter_inventories(
+            revision_ids, 'topological')
+        # XXX: ideally these flags would be per-revision, not per-repo (e.g.
+        # streaming a non-rich-root revision out of a rich-root repo back into
+        # a non-rich-root repo ought to be allowed)
+        format = from_repo._format
+        flags = (format.rich_root_data, format.supports_tree_reference)
+        invs_sent_so_far = set([_mod_revision.NULL_REVISION])
+        for inv in inventories:
             key = (inv.revision_id,)

...

+            for parent_id in parent_ids:
...
+                if (best_delta is None or
+                        len(best_delta) > len(candidate_delta)):
+                    best_delta = candidate_delta
+                    basis_id = parent_id
+            delta = best_delta
+            invs_sent_so_far.add(basis_id)

^- Notice that once you've prepared a delta for "inv.revision_id" you then add "basis_id" to "invs_sent_so_far". Which AFAICT means you will send most of the inventories as complete texts, because you won't actually think you've sent what you actually have.

+            yield versionedfile.InventoryDeltaContentFactory(
+                key, parents, None, delta, basis_id, flags, from_repo)

The way you've handled the "no parents are available, use NULL" case means that
you actually create a delta to NULL for *every* revision, and find out if it is
the smallest one. Which seems inefficient versus just using NULL when nothing
else is available. (Note that IDS goes as far as to not use parents that aren't
cached even if they have been sent, and to fall back to using the last-sent
revision otherwise.)

I've changed this loop around a bit, to avoid some duplication and make it a little bit clearer (at least to me) what is going on. I also added a quick LRUCache for the parent inventories that we've just sent, as re-extracting them from the repository is not going to be efficient. (Especially extracting them one at a time.)
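
In isolation, the reworked loop is roughly this (make_delta and
get_parent_ids are hypothetical stand-ins, and the LRUCache is omitted for
brevity):

    NULL_REVISION = 'null:'

    def stream_deltas(inventories, make_delta, get_parent_ids):
        invs_sent_so_far = set([NULL_REVISION])
        for inv in inventories:
            best_delta = None
            basis_id = None
            for parent_id in get_parent_ids(inv):
                if parent_id not in invs_sent_so_far:
                    # The receiver can't have this basis yet; don't use it.
                    continue
                candidate = make_delta(inv, parent_id)
                if best_delta is None or len(candidate) < len(best_delta):
                    best_delta = candidate
                    basis_id = parent_id
            if best_delta is None:
                # Only fall back to NULL when no sent parent was usable.
                basis_id = NULL_REVISION
                best_delta = make_delta(inv, basis_id)
            # Record the revision actually sent, not its basis.
            invs_sent_so_far.add(inv.revision_id)
            yield inv.revision_id, basis_id, best_delta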

Going for that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.

It seems like we should be consistent at that level. What do you think?

As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:

            serializer.require_flags(*self._format_flags)

And that, IMO, is asking for trouble. Yes it is most likely correct as written, but it means that you have to line up the tuple created at the start of _stream_invs_as_deltas:
        flags = (format.rich_root_data, format.supports_tree_reference)

with the *arguments* to a function nested 3 calls away.
So how about:

  flags = {'rich_root_data': format.rich_root_data,
           'supports_tree_reference': format.supports_tree_reference
          }
...

and
  serializer.require_flags(**self._format_flags)

It isn't vastly better, but ...


review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, 2009-07-28 at 21:51 +0000, John A Meinel wrote:
>
> I suppose one possibility would be to have the client reject the
> stream entirely and request it be re-sent with the basis available.

If we do this then... The server just asks for that inventory again. The
problem though, is that *all* inventories will need to be asked for
again, which could be pathologically bad. It is possible to write a file
and restore it later without network API changes, just pass in write
group save tokens. I don't think we need to block on that for this patch
though - sending a full inventory is no worse than sending the full text
of a file, which 2a does as well - and ChangeLog files etc are very big
- as large as or larger than an inventory, I would expect.

> Of course, the get_stream_for_missing_keys code is wired up such that
> clients
> ignore the response about missing inventories if the target is a
> stacked repo,
> and just always force up an extra fulltext copy of all parent
> inventories
> anyway. (Because of the bugs in 1.13, IIRC.)

The condition is wrong... it probably should say 'if we're not told
anything is missing..' - and it should be guarded such that the new push
verb (introduced recently?) doesn't trigger this paranoia.

> Anyway, it means that what you get out the other side of translating
> into and then back out of the serialized form is not exactly equal to
> what went in, which seems very bad (and likely to *not* break when
> using local actions, and then have it break when using remote ones.)

This is a reason to serialise/deserialise locally ;) - at least to start
with.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

Robert Collins wrote:
> On Tue, 2009-07-28 at 21:51 +0000, John A Meinel wrote:
> >
> > I suppose one possibility would be to have the client reject the
> > stream entirely and request it be re-sent with the basis available.
>
> If we do this then... The server just asks for that inventory again. The
> problem though, is that *all* inventories will need to be asked for
> again, which could be pathologically bad. It is possible to write a file
> and restore it later without network API changes, just pass in write
> group save tokens. I don't think we need to block on that for this patch
> though - sending a full inventory is no worse than sending the full text
> of a file, which 2a does as well - and ChangeLog files etc are very big
> - as large or larger than an inventory, I would expect.

Also, we currently only allow one “get missing keys” stream after the original
stream. So if we optimistically send just a delta and the client rejects it,
then the second time around we have to send not only a full delta closure but
also all the inventory parents, because there's no further opportunity for the
client to ask for more keys. This is potentially even worse than just sending a
single fulltext.

I think sending a fulltext cross-format is fine. We aren't trying to maximally
optimise cross-format fetches, just make them acceptable. We should file a bug
for improving this, but I don't think it's particularly urgent.

FWIW, I think the correct way to fix this is to allow the receiving repository
to somehow store inventory-deltas that it can't add yet, so that it can work as
part of the regular suspend_write_group and get missing keys logic. Then the
sender can optimistically send a delta, and the receiver can either insert it or
store it in a temporary upload pack and ask for the parent keys, just as we
already do for all other types of deltas. This is optimal in the case where the
recipient already has the basis, and only requires one extra roundtrip in the
case that it doesn't. We can perhaps do this by creating an upload pack just to
store the fulltexts of inventory-deltas, including that pack in the resume
tokens, but making sure never to insert in the final commit.
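
Sketched in code, the receiver-side decision would look something like this
(has_inventory and the stash list are hypothetical; add_inventory_by_delta is
the existing repository API):

    def handle_inventory_delta(repo, record, stashed, missing_keys):
        # Apply the delta if its basis inventory is already present.
        if repo.has_inventory(record.basis_id):  # hypothetical check
            repo.add_inventory_by_delta(
                record.basis_id, record.delta, record.revision_id,
                record.parents)
        else:
            # Otherwise keep the serialized delta (e.g. in a suspended
            # upload pack) and report the basis as missing so the sender's
            # next stream can supply it.
            stashed.append(record)
            missing_keys.add(('inventories', record.basis_id))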

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
> Review: Needs Fixing
> So, to further my discussion. While investigating this, I found some odd bits
> in _stream_invs_as_deltas. For example:
[...]
>
> ^- Notice that once you've prepared a delta for "inv.revision_id" you then add
> "basis_id" to "invs_sent_so_far". Which AFAICT means you will send most of the
> inventories as complete texts, because you won't actually think you've sent
> what you actually have.

Oops, right. I've merged your fix for that.

> + yield versionedfile.InventoryDeltaContentFactory(
> + key, parents, None, delta, basis_id, flags, from_repo)
>
> The way you've handled the "no parents are available use NULL", means that you
> actually create a delta to NULL for *every* revision, and find out if it is the
> smallest one. Which seems inefficient versus just using NULL when nothing else
> is available. (Note that IDS goes as far as to not use parents that aren't
> cached even if they have been sent, and to fall back to using the last-sent
> revision otherwise.)
>
> I've changed this loop around a bit, to avoid some duplication and make it a
> little bit clearer (at least to me) what is going on. I also added a quick
> LRUCache for the parent inventories that we've just sent, as re-extracting
> them from the repository is not going to be efficient. (Especially extracting
> them one at a time.)

Thanks, I've merged your fix.

> Going for that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.
>
> It seems like we should be consistent at that level. What do you think?

It would be nice, but not vital.

> As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:
[...]
> serializer.require_flags(**self._format_flags)
>
> It isn't vastly better, but at least they are named arguments, rather than
> fairly arbitrary positional ones.

Ok, I'll do that.

> In the end, I don't think that it is ideal that an incremental push of 1
> revision will transmit a complete inventory (in delta form). I understand the
> limitation is that we currently are unable to 'buffer' these records anywhere
> in a persistent state between RPC calls. (Even though we have a bytes-on-the
> wire representation, we don't have an index/etc to look them up in.)

(As said elsewhere on this thread...)

This is only an issue for a cross-format push, which doesn't have to be
maximally efficient. It just has to be reasonable. We can look at doing better
later if we want, but this is already a massive improvement in terms of data
transferred compared to IDS.

[...]
> For *me* the ssh handshake is probably signficantly more than the time to push
> up 2 inventories of bzr.dev, but that tradeoff is very different for people
> working over a tiny bandwidth from their GPRS phone, trying to push a critical
> bugfix for a Launchpad branch.

Then they probably should go to the effort of having their local branch in a
matching format :)

We're not intending to use inventory-deltas for anything but cross-format
fetches AFAIK.

> Again, the code as written fails to actually transmit data over the...


Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, 2009-08-03 at 06:33 +0000, Andrew Bennetts wrote:
>
> > Digging further, the bytes on the wire don't include things like:
> > InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
> > InventoryDeltaContentFactory.sha1
>
> Inventories don't have parent lists, IIRC? Revisions do, and text
> versions do,
> but inventories don't. (They may have a compression parent, but not a
> semantic
> parent.)

Indeed. I wouldn't give them one in the deltas; let repositories that
care calculate one.
parents=None is valid for the interface.

> The sha1 is never set, although perhaps it should be.

No it shouldn't. the sha1 is format specific, and we don't convert back
to the original format to check it, so it would, at best, be discarded.
sha1 = None is valid for the interface as well

_Rob

Revision history for this message
John A Meinel (jameinel) wrote :

Robert Collins wrote:
> On Mon, 2009-08-03 at 06:33 +0000, Andrew Bennetts wrote:
>>> Digging further, the bytes on the wire don't include things like:
>>> InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
>>> InventoryDeltaContentFactory.sha1
>> Inventories don't have parent lists, IIRC? Revisions do, and text
>> versions do,
>> but inventories don't. (They may have a compression parent, but not a
>> semantic
>> parent.)
>
> Indeed. I wouldn't give them one in the deltas; let repositories that
> care calculate one.
> parents=None is valid for the interface..

So... all of our inventory texts have parents in all repository formats
that exist *today*.

It isn't feasible to have them query the revisions. Because in "knit"
streams, they haven't received the revision data until after the
inventory data.

At first I thought we had fixed inventories in CHK, but I checked, and
it still has parents. And there is other code that assumes this fact.

Case in point, my new bundle sending/receiving code. It intentionally
queries the inventories.get_parent_map because
revisions.get_parent_map() has *not* been filled in yet.
(Bundle.insert_revisions always inserts objects as texts, inventories,
revisions.)

John
=:->

>
>> The sha1 is never set, although perhaps it should be.
>
> No it shouldn't. the sha1 is format specific, and we don't convert back
> to the original format to check it, so it would, at best, be discarded.
> sha1 = None is valid for the interface as well
>
> _Rob


Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, 2009-08-03 at 13:30 +0000, John A Meinel wrote:
>
> > Indeed. I wouldn't give them one in the deltas; let repositories
> that
> > care calculate one.
> > parents=None is valid for the interface..
>
> So... all of our inventory texts have parents in all repository
> formats
> that exist *today*.
>
> It isn't feasible to have them query the revisions, because in "knit"
> streams, they haven't received the revision data until after the
> inventory data.

ugh. ugh ugh ugh swear swear swear.

> At first I thought we had fixed inventories in CHK, but I checked, and
> it still has parents. And there is other code that assumes this fact.
>
> Case in point, my new bundle sending/receiving code. It intentionally
> queries the inventories.get_parent_map because
> revisions.get_parent_map() has *not* been filled in yet.
> (Bundle.insert_revisions always inserts objects as texts, inventories,
> revisions.)

Why does it need to do that though?

We're going to have to break this chain sometime.

If we can't at this stage, we'll need to either supply parents on the
serialised deltas, or add a parents field to the inventory serialisation
form. I prefer the former, myself.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

Robert Collins wrote:
[...]
> If we can't at this stage, we'll need to either supply parents on the
> serialised deltas, or add a parents field to the inventory serialisation
> form. I prefer the former, myself.

FWIW, my current code sends the parents for the inventory in the .parents of the
FulltextContentFactory holding the serialised inventory-delta (and uses that
when calling add_inventory_by_delta).

So my patch maintains the status quo here. (I'm actually fairly sure that the
version of the code John reviewed which inspired this discussion was doing this
too, but that's irrelevant at this point.)
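
Concretely, that arrangement is roughly the following (placeholder revision
ids and delta bytes; FulltextContentFactory is the real bzrlib class):

    from bzrlib.versionedfile import FulltextContentFactory

    # The serialized inventory-delta travels as the record's text, and the
    # inventory's parents ride along in the record's .parents field.
    key = ('rev-2',)                        # placeholder revision id
    parents = (('rev-1',),)                 # placeholder parent keys
    delta_bytes = '...serialized delta...'  # placeholder bytes
    record = FulltextContentFactory(key, parents, None, delta_bytes)
    # The receiver later passes record.parents to add_inventory_by_delta.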

-Andrew.

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2009-07-28 08:14:15 +0000
3+++ NEWS 2009-07-28 12:36:07 +0000
4@@ -78,6 +78,7 @@
5 lots of backtraces about ``UnknownSmartMethod``, ``do_chunk`` or
6 ``do_end``. (Andrew Bennetts, #338561)
7
8+<<<<<<< TREE
9 * The optional ``_knit_load_data_pyx`` C extension was never being
10 imported. This caused significant slowdowns when reading data from
11 repositories. (Andrew Bennetts, #405653)
12@@ -88,6 +89,11 @@
13 infinite loop while trying to find a 'relative path'.
14 (John Arbash Meinel, #394227)
15
16+=======
17+* StreamSource generates rich roots from non-rich root sources correctly
18+ now. (Andrew Bennetts, #368921)
19+
20+>>>>>>> MERGE-SOURCE
21 * ``WorkingTree4.unversion`` will no longer fail to unversion ids which
22 were present in a parent tree but renamed in the working tree.
23 (Robert Collins, #187207)
24@@ -95,6 +101,7 @@
25 Improvements
26 ************
27
28+<<<<<<< TREE
29 * Can now rename/move files even if they have been removed from the inventory.
30 (Marius Kruger)
31
32@@ -103,6 +110,14 @@
33 than VFS methods. For example, pushes of small branches with tags take
34 11 rather than 18 smart server requests. (Andrew Bennetts, #398608)
35
36+=======
37+* Cross-format fetches (such as between 1.9-rich-root and 2a) via the
38+ smart server are more efficient now. They send inventory deltas rather
39+ than full inventories. The smart server has two new requests,
40+ ``Repository.get_stream_1.18`` and ``Repository.insert_stream_1.18`` to
41+ support this. (Andrew Bennetts, #374738, #385826)
42+
43+>>>>>>> MERGE-SOURCE
44 Documentation
45 *************
46
47@@ -135,6 +150,9 @@
48 * ``CHKMap.apply_delta`` now raises ``InconsistentDelta`` if a delta adds
49 as new a key which was already mapped. (Robert Collins)
50
51+* InterDifferingSerializer has been removed. The transformations it
52+ provided are now done automatically by StreamSource. (Andrew Bennetts)
53+
54 * Inventory delta application catches more cases of corruption and can
55 prevent corrupt deltas from affecting consistency of data structures on
56 disk. (Robert Collins)
57
58=== modified file 'bzrlib/fetch.py'
59--- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
60+++ bzrlib/fetch.py 2009-07-28 12:36:07 +0000
61@@ -25,16 +25,21 @@
62
63 import operator
64
65+from bzrlib.lazy_import import lazy_import
66+lazy_import(globals(), """
67+from bzrlib import (
68+ tsort,
69+ versionedfile,
70+ )
71+""")
72 import bzrlib
73 from bzrlib import (
74 errors,
75 symbol_versioning,
76 )
77 from bzrlib.revision import NULL_REVISION
78-from bzrlib.tsort import topo_sort
79 from bzrlib.trace import mutter
80 import bzrlib.ui
81-from bzrlib.versionedfile import FulltextContentFactory
82
83
84 class RepoFetcher(object):
85@@ -213,11 +218,9 @@
86
87 def _find_root_ids(self, revs, parent_map, graph):
88 revision_root = {}
89- planned_versions = {}
90 for tree in self.iter_rev_trees(revs):
91 revision_id = tree.inventory.root.revision
92 root_id = tree.get_root_id()
93- planned_versions.setdefault(root_id, []).append(revision_id)
94 revision_root[revision_id] = root_id
95 # Find out which parents we don't already know root ids for
96 parents = set()
97@@ -229,7 +232,7 @@
98 for tree in self.iter_rev_trees(parents):
99 root_id = tree.get_root_id()
100 revision_root[tree.get_revision_id()] = root_id
101- return revision_root, planned_versions
102+ return revision_root
103
104 def generate_root_texts(self, revs):
105 """Generate VersionedFiles for all root ids.
106@@ -238,9 +241,8 @@
107 """
108 graph = self.source.get_graph()
109 parent_map = graph.get_parent_map(revs)
110- rev_order = topo_sort(parent_map)
111- rev_id_to_root_id, root_id_to_rev_ids = self._find_root_ids(
112- revs, parent_map, graph)
113+ rev_order = tsort.topo_sort(parent_map)
114+ rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)
115 root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in
116 rev_order]
117 # Guaranteed stable, this groups all the file id operations together
118@@ -249,20 +251,77 @@
119 # yet, and are unlikely to in non-rich-root environments anyway.
120 root_id_order.sort(key=operator.itemgetter(0))
121 # Create a record stream containing the roots to create.
122- def yield_roots():
123- for key in root_id_order:
124- root_id, rev_id = key
125- rev_parents = parent_map[rev_id]
126- # We drop revision parents with different file-ids, because
127- # that represents a rename of the root to a different location
128- # - its not actually a parent for us. (We could look for that
129- # file id in the revision tree at considerably more expense,
130- # but for now this is sufficient (and reconcile will catch and
131- # correct this anyway).
132- # When a parent revision is a ghost, we guess that its root id
133- # was unchanged (rather than trimming it from the parent list).
134- parent_keys = tuple((root_id, parent) for parent in rev_parents
135- if parent != NULL_REVISION and
136- rev_id_to_root_id.get(parent, root_id) == root_id)
137- yield FulltextContentFactory(key, parent_keys, None, '')
138- return [('texts', yield_roots())]
139+ from bzrlib.graph import FrozenHeadsCache
140+ graph = FrozenHeadsCache(graph)
141+ new_roots_stream = _new_root_data_stream(
142+ root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
143+ return [('texts', new_roots_stream)]
144+
145+
146+def _new_root_data_stream(
147+ root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
148+ for root_key in root_keys_to_create:
149+ root_id, rev_id = root_key
150+ parent_keys = _parent_keys_for_root_version(
151+ root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph)
152+ yield versionedfile.FulltextContentFactory(
153+ root_key, parent_keys, None, '')
154+
155+
156+def _parent_keys_for_root_version(
157+ root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):
158+ """Get the parent keys for a given root id."""
159+ # Include direct parents of the revision, but only if they used the same
160+ # root_id and are heads.
161+ rev_parents = parent_map[rev_id]
162+ parent_ids = []
163+ for parent_id in rev_parents:
164+ if parent_id == NULL_REVISION:
165+ continue
166+ if parent_id not in rev_id_to_root_id_map:
167+ # We probably didn't read this revision, go spend the extra effort
168+ # to actually check
169+ try:
170+ tree = repo.revision_tree(parent_id)
171+ except errors.NoSuchRevision:
172+ # Ghost, fill out rev_id_to_root_id in case we encounter this
173+ # again.
174+ # But set parent_root_id to None since we don't really know
175+ parent_root_id = None
176+ else:
177+ parent_root_id = tree.get_root_id()
178+ rev_id_to_root_id_map[parent_id] = None
179+ # XXX: why not:
180+ # rev_id_to_root_id_map[parent_id] = parent_root_id
181+ # memory consumption maybe?
182+ else:
183+ parent_root_id = rev_id_to_root_id_map[parent_id]
184+ if root_id == parent_root_id:
185+ # With stacking we _might_ want to refer to a non-local revision,
186+ # but this code path only applies when we have the full content
187+ # available, so ghosts really are ghosts, not just the edge of
188+ # local data.
189+ parent_ids.append(parent_id)
190+ else:
191+ # root_id may be in the parent anyway.
192+ try:
193+ tree = repo.revision_tree(parent_id)
194+ except errors.NoSuchRevision:
195+ # ghost, can't refer to it.
196+ pass
197+ else:
198+ try:
199+ parent_ids.append(tree.inventory[root_id].revision)
200+ except errors.NoSuchId:
201+ # not in the tree
202+ pass
203+ # Drop non-head parents
204+ if graph is None:
205+ graph = repo.get_graph()
206+ heads = graph.heads(parent_ids)
207+ selected_ids = []
208+ for parent_id in parent_ids:
209+ if parent_id in heads and parent_id not in selected_ids:
210+ selected_ids.append(parent_id)
211+ parent_keys = [(root_id, parent_id) for parent_id in selected_ids]
212+ return parent_keys
213
214=== modified file 'bzrlib/inventory_delta.py'
215--- bzrlib/inventory_delta.py 2009-04-02 05:53:12 +0000
216+++ bzrlib/inventory_delta.py 2009-07-28 12:36:07 +0000
217@@ -119,28 +119,46 @@
218 class InventoryDeltaSerializer(object):
219 """Serialize and deserialize inventory deltas."""
220
221+ # XXX: really, the serializer and deserializer should be two separate
222+ # classes.
223+
224 FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
225
226- def __init__(self, versioned_root, tree_references):
227- """Create an InventoryDeltaSerializer.
228+ def __init__(self):
229+ """Create an InventoryDeltaSerializer."""
230+ self._versioned_root = None
231+ self._tree_references = None
232+ self._entry_to_content = {
233+ 'directory': _directory_content,
234+ 'file': _file_content,
235+ 'symlink': _link_content,
236+ }
237+
238+ def require_flags(self, versioned_root=None, tree_references=None):
239+ """Set the versioned_root and/or tree_references flags for this
240+ (de)serializer.
241
242 :param versioned_root: If True, any root entry that is seen is expected
243 to be versioned, and root entries can have any fileid.
244 :param tree_references: If True support tree-reference entries.
245 """
246+ if versioned_root is not None and self._versioned_root is not None:
247+ raise AssertionError(
248+ "require_flags(versioned_root=...) already called.")
249+ if tree_references is not None and self._tree_references is not None:
250+ raise AssertionError(
251+ "require_flags(tree_references=...) already called.")
252 self._versioned_root = versioned_root
253 self._tree_references = tree_references
254- self._entry_to_content = {
255- 'directory': _directory_content,
256- 'file': _file_content,
257- 'symlink': _link_content,
258- }
259 if tree_references:
260 self._entry_to_content['tree-reference'] = _reference_content
261
262 def delta_to_lines(self, old_name, new_name, delta_to_new):
263 """Return a line sequence for delta_to_new.
264
265+ Both the versioned_root and tree_references flags must be set via
266+ require_flags before calling this.
267+
268 :param old_name: A UTF8 revision id for the old inventory. May be
269 NULL_REVISION if there is no older inventory and delta_to_new
270 includes the entire inventory contents.
271@@ -150,6 +168,10 @@
272 takes.
273 :return: The serialized delta as lines.
274 """
275+ if self._versioned_root is None or self._tree_references is None:
276+ raise AssertionError(
277+ "Cannot serialise unless versioned_root/tree_references flags "
278+ "are both set.")
279 lines = ['', '', '', '', '']
280 to_line = self._delta_item_to_line
281 for delta_item in delta_to_new:
282@@ -188,6 +210,10 @@
283 oldpath_utf8 = 'None'
284 else:
285 oldpath_utf8 = '/' + oldpath.encode('utf8')
286+ if newpath == '/':
287+ raise AssertionError(
288+ "Bad inventory delta: '/' is not a valid newpath "
289+ "(should be '') in delta item %r" % (delta_item,))
290 # TODO: Test real-world utf8 cache hit rate. It may be a win.
291 newpath_utf8 = '/' + newpath.encode('utf8')
292 # Serialize None as ''
293@@ -221,10 +247,18 @@
294 def parse_text_bytes(self, bytes):
295 """Parse the text bytes of a serialized inventory delta.
296
297+ If versioned_root and/or tree_references flags were set via
298+ require_flags, then the parsed flags must match or a BzrError will be
299+ raised.
300+
301 :param bytes: The bytes to parse. This can be obtained by calling
302 delta_to_lines and then doing ''.join(delta_lines).
303- :return: (parent_id, new_id, inventory_delta)
304+ :return: (parent_id, new_id, versioned_root, tree_references,
305+ inventory_delta)
306 """
307+ if bytes[-1:] != '\n':
308+ last_line = bytes.rsplit('\n', 1)[-1]
309+ raise errors.BzrError('last line not empty: %r' % (last_line,))
310 lines = bytes.split('\n')[:-1] # discard the last empty line
311 if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1:
312 raise errors.BzrError('unknown format %r' % lines[0:1])
313@@ -240,11 +274,13 @@
314 if len(lines) < 5 or not lines[4].startswith('tree_references: '):
315 raise errors.BzrError('missing tree_references: marker')
316 delta_tree_references = self._deserialize_bool(lines[4][17:])
317- if delta_versioned_root != self._versioned_root:
318+ if (self._versioned_root is not None and
319+ delta_versioned_root != self._versioned_root):
320 raise errors.BzrError(
321 "serialized versioned_root flag is wrong: %s" %
322 (delta_versioned_root,))
323- if delta_tree_references != self._tree_references:
324+ if (self._tree_references is not None
325+ and delta_tree_references != self._tree_references):
326 raise errors.BzrError(
327 "serialized tree_references flag is wrong: %s" %
328 (delta_tree_references,))
329@@ -266,22 +302,34 @@
330 raise errors.BzrError("Versioned root found: %r" % line)
331 elif last_modified[-1] == ':':
332 raise errors.BzrError('special revisionid found: %r' % line)
333- if not delta_tree_references and content.startswith('tree\x00'):
334+ if delta_tree_references is False and content.startswith('tree\x00'):
335 raise errors.BzrError("Tree reference found: %r" % line)
336- content_tuple = tuple(content.split('\x00'))
337- entry = _parse_entry(
338- newpath_utf8, file_id, parent_id, last_modified, content_tuple)
339 if oldpath_utf8 == 'None':
340 oldpath = None
341+ elif oldpath_utf8[:1] != '/':
342+ raise errors.BzrError(
343+ "oldpath invalid (does not start with /): %r"
344+ % (oldpath_utf8,))
345 else:
346+ oldpath_utf8 = oldpath_utf8[1:]
347 oldpath = oldpath_utf8.decode('utf8')
348 if newpath_utf8 == 'None':
349 newpath = None
350+ elif newpath_utf8[:1] != '/':
351+ raise errors.BzrError(
352+ "newpath invalid (does not start with /): %r"
353+ % (newpath_utf8,))
354 else:
355+ # Trim leading slash
356+ newpath_utf8 = newpath_utf8[1:]
357 newpath = newpath_utf8.decode('utf8')
358+ content_tuple = tuple(content.split('\x00'))
359+ entry = _parse_entry(
360+ newpath_utf8, file_id, parent_id, last_modified, content_tuple)
361 delta_item = (oldpath, newpath, file_id, entry)
362 result.append(delta_item)
363- return delta_parent_id, delta_version_id, result
364+ return (delta_parent_id, delta_version_id, delta_versioned_root,
365+ delta_tree_references, result)
366
367
368 def _parse_entry(utf8_path, file_id, parent_id, last_modified, content):
369
370=== modified file 'bzrlib/remote.py'
371--- bzrlib/remote.py 2009-07-27 08:02:52 +0000
372+++ bzrlib/remote.py 2009-07-28 12:36:07 +0000
373@@ -426,6 +426,7 @@
374 self._custom_format = None
375 self._network_name = None
376 self._creating_bzrdir = None
377+ self._supports_chks = None
378 self._supports_external_lookups = None
379 self._supports_tree_reference = None
380 self._rich_root_data = None
381@@ -443,6 +444,13 @@
382 return self._rich_root_data
383
384 @property
385+ def supports_chks(self):
386+ if self._supports_chks is None:
387+ self._ensure_real()
388+ self._supports_chks = self._custom_format.supports_chks
389+ return self._supports_chks
390+
391+ @property
392 def supports_external_lookups(self):
393 if self._supports_external_lookups is None:
394 self._ensure_real()
395@@ -579,6 +587,11 @@
396 self._ensure_real()
397 return self._custom_format._serializer
398
399+ @property
400+ def repository_class(self):
401+ self._ensure_real()
402+ return self._custom_format.repository_class
403+
404
405 class RemoteRepository(_RpcHelper):
406 """Repository accessed over rpc.
407@@ -1178,9 +1191,9 @@
408 self._ensure_real()
409 return self._real_repository.get_inventory(revision_id)
410
411- def iter_inventories(self, revision_ids):
412+ def iter_inventories(self, revision_ids, ordering='unordered'):
413 self._ensure_real()
414- return self._real_repository.iter_inventories(revision_ids)
415+ return self._real_repository.iter_inventories(revision_ids, ordering)
416
417 @needs_read_lock
418 def get_revision(self, revision_id):
419@@ -1669,6 +1682,9 @@
420
421 class RemoteStreamSink(repository.StreamSink):
422
423+ def __init__(self, target_repo):
424+ repository.StreamSink.__init__(self, target_repo)
425+
426 def _insert_real(self, stream, src_format, resume_tokens):
427 self.target_repo._ensure_real()
428 sink = self.target_repo._real_repository._get_sink()
429@@ -1680,43 +1696,57 @@
430 def insert_stream(self, stream, src_format, resume_tokens):
431 target = self.target_repo
432 target._unstacked_provider.missing_keys.clear()
433+ candidate_calls = [('Repository.insert_stream_1.18', (1, 18))]
434 if target._lock_token:
435- verb = 'Repository.insert_stream_locked'
436- extra_args = (target._lock_token or '',)
437- required_version = (1, 14)
438+ candidate_calls.append(('Repository.insert_stream_locked', (1, 14)))
439+ lock_args = (target._lock_token or '',)
440 else:
441- verb = 'Repository.insert_stream'
442- extra_args = ()
443- required_version = (1, 13)
444+ candidate_calls.append(('Repository.insert_stream', (1, 13)))
445+ lock_args = ()
446 client = target._client
447 medium = client._medium
448- if medium._is_remote_before(required_version):
449- # No possible way this can work.
450- return self._insert_real(stream, src_format, resume_tokens)
451 path = target.bzrdir._path_for_remote_call(client)
452- if not resume_tokens:
453- # XXX: Ugly but important for correctness, *will* be fixed during
454- # 1.13 cycle. Pushing a stream that is interrupted results in a
455- # fallback to the _real_repositories sink *with a partial stream*.
456- # Thats bad because we insert less data than bzr expected. To avoid
457- # this we do a trial push to make sure the verb is accessible, and
458- # do not fallback when actually pushing the stream. A cleanup patch
459- # is going to look at rewinding/restarting the stream/partial
460- # buffering etc.
461+ found_verb = False
462+ for verb, required_version in candidate_calls:
463+ if medium._is_remote_before(required_version):
464+ continue
465+ if resume_tokens:
466+ # We've already done the probing (and set _is_remote_before) on
467+ # a previous insert.
468+ found_verb = True
469+ break
470 byte_stream = smart_repo._stream_to_byte_stream([], src_format)
471 try:
472 response = client.call_with_body_stream(
473- (verb, path, '') + extra_args, byte_stream)
474+ (verb, path, '') + lock_args, byte_stream)
475 except errors.UnknownSmartMethod:
476 medium._remember_remote_is_before(required_version)
477- return self._insert_real(stream, src_format, resume_tokens)
478+ else:
479+ found_verb = True
480+ break
481+ if not found_verb:
482+ # Have to use VFS.
483+ return self._insert_real(stream, src_format, resume_tokens)
484+ self._last_inv_record = None
485+ self._last_substream = None
486+ if required_version < (1, 18):
487+ # Remote side doesn't support inventory deltas. Wrap the stream to
488+ # make sure we don't send any. If the stream contains inventory
489+ # deltas we'll interrupt the smart insert_stream request and
490+ # fallback to VFS.
491+ stream = self._stop_stream_if_inventory_delta(stream)
492 byte_stream = smart_repo._stream_to_byte_stream(
493 stream, src_format)
494 resume_tokens = ' '.join(resume_tokens)
495 response = client.call_with_body_stream(
496- (verb, path, resume_tokens) + extra_args, byte_stream)
497+ (verb, path, resume_tokens) + lock_args, byte_stream)
498 if response[0][0] not in ('ok', 'missing-basis'):
499 raise errors.UnexpectedSmartServerResponse(response)
500+ if self._last_inv_record is not None:
501+ # The stream included an inventory-delta record, but the remote
502+ # side isn't new enough to support them. So we need to send the
503+ # rest of the stream via VFS.
504+ return self._resume_stream_with_vfs(response, src_format)
505 if response[0][0] == 'missing-basis':
506 tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
507 resume_tokens = tokens
508@@ -1725,6 +1755,60 @@
509 self.target_repo.refresh_data()
510 return [], set()
511
512+ def _resume_stream_with_vfs(self, response, src_format):
513+ """Resume sending a stream via VFS, first resending the record and
514+ substream that couldn't be sent via an insert_stream verb.
515+ """
516+ if response[0][0] == 'missing-basis':
517+ tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
518+ # Ignore missing_keys, we haven't finished inserting yet
519+ else:
520+ tokens = []
521+ def resume_substream():
522+ # First yield the record we stopped at.
523+ yield self._last_inv_record
524+ self._last_inv_record = None
525+ # Then yield the rest of the substream that was interrupted.
526+ for record in self._last_substream:
527+ yield record
528+ self._last_substream = None
529+ def resume_stream():
530+ # Finish sending the interrupted substream
531+ yield ('inventories', resume_substream())
532+ # Then simply continue sending the rest of the stream.
533+ for substream_kind, substream in self._last_stream:
534+ yield substream_kind, substream
535+ return self._insert_real(resume_stream(), src_format, tokens)
536+
537+ def _stop_stream_if_inventory_delta(self, stream):
538+ """Normally this just lets the original stream pass-through unchanged.
539+
540+ However if any 'inventories' substream includes an inventory-delta
541+ record it will stop streaming, and store the interrupted record,
542+ substream and stream in self._last_inv_record, self._last_substream and
543+ self._last_stream so that the stream can be resumed by
544+ _resume_stream_with_vfs.
545+ """
546+ def filter_inv_substream(inv_substream):
547+ substream_iter = iter(inv_substream)
548+ for record in substream_iter:
549+ if record.storage_kind == 'inventory-delta':
550+ self._last_inv_record = record
551+ self._last_substream = substream_iter
552+ return
553+ else:
554+ yield record
555+
556+ stream_iter = iter(stream)
557+ for substream_kind, substream in stream_iter:
558+ if substream_kind == 'inventories':
559+ yield substream_kind, filter_inv_substream(substream)
560+ if self._last_inv_record is not None:
561+ self._last_stream = stream_iter
562+ return
563+ else:
564+ yield substream_kind, substream
565+
566
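
The trick the sink relies on above, in isolation: a Python generator can be abandoned partway through while the underlying iterator stays live, so the unsent remainder can be replayed later. A minimal standalone sketch of the pattern (toy records and illustrative names, not the bzrlib API):

    def filter_until(records, stop_kind, state):
        # Yield records until one of stop_kind appears, then stash the
        # offending record and the half-consumed iterator for later.
        it = iter(records)
        for record in it:
            if record[0] == stop_kind:
                state['record'], state['rest'] = record, it
                return
            yield record

    state = {}
    records = [('fulltext', 1), ('inventory-delta', 2), ('fulltext', 3)]
    sent = list(filter_until(records, 'inventory-delta', state))
    assert sent == [('fulltext', 1)]
    # Resume: replay the stashed record, then drain the live iterator.
    resumed = [state['record']] + list(state['rest'])
    assert resumed == [('inventory-delta', 2), ('fulltext', 3)]
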
567 class RemoteStreamSource(repository.StreamSource):
568 """Stream data from a remote server."""
569@@ -1736,6 +1820,12 @@
570 return self.missing_parents_chain(search, [self.from_repository] +
571 self.from_repository._fallback_repositories)
572
573+ def get_stream_for_missing_keys(self, missing_keys):
574+ self.from_repository._ensure_real()
575+ real_repo = self.from_repository._real_repository
576+ real_source = real_repo._get_source(self.to_format)
577+ return real_source.get_stream_for_missing_keys(missing_keys)
578+
579 def _real_stream(self, repo, search):
580 """Get a stream for search from repo.
581
582@@ -1770,18 +1860,26 @@
583 return self._real_stream(repo, search)
584 client = repo._client
585 medium = client._medium
586- if medium._is_remote_before((1, 13)):
587- # streaming was added in 1.13
588- return self._real_stream(repo, search)
589 path = repo.bzrdir._path_for_remote_call(client)
590- try:
591- search_bytes = repo._serialise_search_result(search)
592- response = repo._call_with_body_bytes_expecting_body(
593- 'Repository.get_stream',
594- (path, self.to_format.network_name()), search_bytes)
595- response_tuple, response_handler = response
596- except errors.UnknownSmartMethod:
597- medium._remember_remote_is_before((1,13))
598+ search_bytes = repo._serialise_search_result(search)
599+ args = (path, self.to_format.network_name())
600+ candidate_verbs = [
601+ ('Repository.get_stream_1.18', (1, 18)),
602+ ('Repository.get_stream', (1, 13))]
603+ found_verb = False
604+ for verb, version in candidate_verbs:
605+ if medium._is_remote_before(version):
606+ continue
607+ try:
608+ response = repo._call_with_body_bytes_expecting_body(
609+ verb, args, search_bytes)
610+ except errors.UnknownSmartMethod:
611+ medium._remember_remote_is_before(version)
612+ else:
613+ response_tuple, response_handler = response
614+ found_verb = True
615+ break
616+ if not found_verb:
617 return self._real_stream(repo, search)
618 if response_tuple[0] != 'ok':
619 raise errors.UnexpectedSmartServerResponse(response_tuple)
620
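
Both the sink and this source now share the same negotiation shape: try candidate verbs newest first, skip versions the medium already knows are too old, and remember a failed probe so later requests go straight to a working verb (or to VFS). Stripped of the bzrlib types, the loop is roughly:

    class UnknownMethod(Exception):
        pass

    def call(verb):
        # Pretend the server only implements the older verb.
        if verb == 'Repository.get_stream_1.18':
            raise UnknownMethod(verb)
        return 'ok'

    too_old = set()  # stands in for medium._remember_remote_is_before
    candidates = [('Repository.get_stream_1.18', (1, 18)),
                  ('Repository.get_stream', (1, 13))]
    response = None
    for verb, version in candidates:
        if version in too_old:
            continue  # known unsupported; don't waste a round trip
        try:
            response = call(verb)
        except UnknownMethod:
            too_old.add(version)
        else:
            break
    assert response == 'ok' and verb == 'Repository.get_stream'
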
621=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
622--- bzrlib/repofmt/groupcompress_repo.py 2009-07-14 17:33:13 +0000
623+++ bzrlib/repofmt/groupcompress_repo.py 2009-07-28 12:36:07 +0000
624@@ -154,6 +154,8 @@
625 self._writer.begin()
626 # what state is the pack in? (open, finished, aborted)
627 self._state = 'open'
628+ # no name until we finish writing the content
629+ self.name = None
630
631 def _check_references(self):
632 """Make sure our external references are present.
633@@ -466,6 +468,13 @@
634 if not self._use_pack(self.new_pack):
635 self.new_pack.abort()
636 return None
637+ self.new_pack.finish_content()
638+ if len(self.packs) == 1:
639+ old_pack = self.packs[0]
640+ if old_pack.name == self.new_pack._hash.hexdigest():
641+ # The single old pack was already optimally packed.
642+ self.new_pack.abort()
643+ return None
644 self.pb.update('finishing repack', 6, 7)
645 self.new_pack.finish()
646 self._pack_collection.allocate(self.new_pack)
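
The 'already optimal' check works because a pack's name is the hex digest of its content: repacking a single pack that is already in optimal order reproduces the same bytes, and therefore the same name. Illustratively (md5 here is only a stand-in for whatever digest self._hash uses):

    from hashlib import md5

    content = 'already-optimally-packed bytes'
    old_name = md5(content).hexdigest()
    new_name = md5(content).hexdigest()  # the repack emitted identical bytes
    # Same name: the single source pack was already optimal, so the new
    # pack is aborted rather than allocated.
    assert new_name == old_name
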
647@@ -766,10 +775,10 @@
648 if basis_tree is not None:
649 basis_tree.unlock()
650
651- def _iter_inventories(self, revision_ids):
652+ def _iter_inventories(self, revision_ids, ordering):
653 """Iterate over many inventory objects."""
654 keys = [(revision_id,) for revision_id in revision_ids]
655- stream = self.inventories.get_record_stream(keys, 'unordered', True)
656+ stream = self.inventories.get_record_stream(keys, ordering, True)
657 texts = {}
658 for record in stream:
659 if record.storage_kind != 'absent':
660@@ -779,7 +788,7 @@
661 for key in keys:
662 yield inventory.CHKInventory.deserialise(self.chk_bytes, texts[key], key)
663
664- def _iter_inventory_xmls(self, revision_ids):
665+ def _iter_inventory_xmls(self, revision_ids, ordering):
666 # Without a native 'xml' inventory, this method doesn't make sense, so
667 # make it raise to trap naughty direct users.
668 raise NotImplementedError(self._iter_inventory_xmls)
669@@ -879,14 +888,13 @@
670
671 def _get_source(self, to_format):
672 """Return a source for streaming from this repository."""
673- if isinstance(to_format, remote.RemoteRepositoryFormat):
674- # Can't just check attributes on to_format with the current code,
675- # work around this:
676- to_format._ensure_real()
677- to_format = to_format._custom_format
678- if to_format.__class__ is self._format.__class__:
679+ if (to_format.supports_chks and
680+ self._format.repository_class is to_format.repository_class and
681+ self._format._serializer == to_format._serializer):
682 # We must be exactly the same format, otherwise stuff like the chk
683- # page layout might be different
684+ # page layout might be different.
685+ # Actually, this test is just slightly looser than exact so that
686+ # CHK2 <-> 2a transfers will work.
687 return GroupCHKStreamSource(self, to_format)
688 return super(CHKInventoryRepository, self)._get_source(to_format)
689
690
691=== modified file 'bzrlib/repofmt/pack_repo.py'
692--- bzrlib/repofmt/pack_repo.py 2009-07-01 10:42:14 +0000
693+++ bzrlib/repofmt/pack_repo.py 2009-07-28 12:36:08 +0000
694@@ -422,6 +422,8 @@
695 self._writer.begin()
696 # what state is the pack in? (open, finished, aborted)
697 self._state = 'open'
698+ # no name until we finish writing the content
699+ self.name = None
700
701 def abort(self):
702 """Cancel creating this pack."""
703@@ -448,6 +450,14 @@
704 self.signature_index.key_count() or
705 (self.chk_index is not None and self.chk_index.key_count()))
706
707+ def finish_content(self):
708+ if self.name is not None:
709+ return
710+ self._writer.end()
711+ if self._buffer[1]:
712+ self._write_data('', flush=True)
713+ self.name = self._hash.hexdigest()
714+
715 def finish(self, suspend=False):
716 """Finish the new pack.
717
718@@ -459,10 +469,7 @@
719 - stores the index size tuple for the pack in the index_sizes
720 attribute.
721 """
722- self._writer.end()
723- if self._buffer[1]:
724- self._write_data('', flush=True)
725- self.name = self._hash.hexdigest()
726+ self.finish_content()
727 if not suspend:
728 self._check_references()
729 # write indices
730@@ -1567,7 +1574,7 @@
731 # determine which packs need changing
732 pack_operations = [[0, []]]
733 for pack in self.all_packs():
734- if not hint or pack.name in hint:
735+ if hint is None or pack.name in hint:
736 pack_operations[-1][0] += pack.get_revision_count()
737 pack_operations[-1][1].append(pack)
738 self._execute_pack_operations(pack_operations, OptimisingPacker)
739@@ -2093,6 +2100,7 @@
740 # when autopack takes no steps, the names list is still
741 # unsaved.
742 return self._save_pack_names()
743+ return []
744
745 def _suspend_write_group(self):
746 tokens = [pack.name for pack in self._resumed_packs]
747
748=== modified file 'bzrlib/repository.py'
749--- bzrlib/repository.py 2009-07-20 10:14:42 +0000
750+++ bzrlib/repository.py 2009-07-28 12:36:07 +0000
751@@ -31,6 +31,7 @@
752 gpg,
753 graph,
754 inventory,
755+ inventory_delta,
756 lazy_regex,
757 lockable_files,
758 lockdir,
759@@ -924,6 +925,11 @@
760 """
761 if self._write_group is not self.get_transaction():
762 # has an unlock or relock occurred?
763+ if suppress_errors:
764+ mutter(
765+ '(suppressed) mismatched lock context and write group. %r, %r',
766+ self._write_group, self.get_transaction())
767+ return
768 raise errors.BzrError(
769 'mismatched lock context and write group. %r, %r' %
770 (self._write_group, self.get_transaction()))
771@@ -2200,7 +2206,7 @@
772 """Get Inventory object by revision id."""
773 return self.iter_inventories([revision_id]).next()
774
775- def iter_inventories(self, revision_ids):
776+ def iter_inventories(self, revision_ids, ordering='unordered'):
777 """Get many inventories by revision_ids.
778
779 This will buffer some or all of the texts used in constructing the
780@@ -2208,21 +2214,23 @@
781 time.
782
783 :param revision_ids: The expected revision ids of the inventories.
784+ :param ordering: optional ordering, e.g. 'topological'.
785 :return: An iterator of inventories.
786 """
787 if ((None in revision_ids)
788 or (_mod_revision.NULL_REVISION in revision_ids)):
789 raise ValueError('cannot get null revision inventory')
790- return self._iter_inventories(revision_ids)
791+ return self._iter_inventories(revision_ids, ordering)
792
793- def _iter_inventories(self, revision_ids):
794+ def _iter_inventories(self, revision_ids, ordering):
795 """single-document based inventory iteration."""
796- for text, revision_id in self._iter_inventory_xmls(revision_ids):
797+ inv_xmls = self._iter_inventory_xmls(revision_ids, ordering)
798+ for text, revision_id in inv_xmls:
799 yield self.deserialise_inventory(revision_id, text)
800
801- def _iter_inventory_xmls(self, revision_ids):
802+ def _iter_inventory_xmls(self, revision_ids, ordering='unordered'):
803 keys = [(revision_id,) for revision_id in revision_ids]
804- stream = self.inventories.get_record_stream(keys, 'unordered', True)
805+ stream = self.inventories.get_record_stream(keys, ordering, True)
806 text_chunks = {}
807 for record in stream:
808 if record.storage_kind != 'absent':
809@@ -2258,7 +2266,7 @@
810 @needs_read_lock
811 def get_inventory_xml(self, revision_id):
812 """Get inventory XML as a file object."""
813- texts = self._iter_inventory_xmls([revision_id])
814+ texts = self._iter_inventory_xmls([revision_id], 'unordered')
815 try:
816 text, revision_id = texts.next()
817 except StopIteration:
818@@ -3519,6 +3527,12 @@
819 if (source._format.supports_tree_reference
820 and not target._format.supports_tree_reference):
821 return False
822+ # Only use this code path for local source and target. IDS does far
823+ # too much IO (both bandwidth and roundtrips) over a network.
824+ if not source.bzrdir.transport.base.startswith('file:///'):
825+ return False
826+ if not target.bzrdir.transport.base.startswith('file:///'):
827+ return False
828 return True
829
830 def _get_delta_for_revision(self, tree, parent_ids, basis_id, cache):
831@@ -3540,63 +3554,6 @@
832 deltas.sort()
833 return deltas[0][1:]
834
835- def _get_parent_keys(self, root_key, parent_map):
836- """Get the parent keys for a given root id."""
837- root_id, rev_id = root_key
838- # Include direct parents of the revision, but only if they used
839- # the same root_id and are heads.
840- parent_keys = []
841- for parent_id in parent_map[rev_id]:
842- if parent_id == _mod_revision.NULL_REVISION:
843- continue
844- if parent_id not in self._revision_id_to_root_id:
845- # We probably didn't read this revision, go spend the
846- # extra effort to actually check
847- try:
848- tree = self.source.revision_tree(parent_id)
849- except errors.NoSuchRevision:
850- # Ghost, fill out _revision_id_to_root_id in case we
851- # encounter this again.
852- # But set parent_root_id to None since we don't really know
853- parent_root_id = None
854- else:
855- parent_root_id = tree.get_root_id()
856- self._revision_id_to_root_id[parent_id] = None
857- else:
858- parent_root_id = self._revision_id_to_root_id[parent_id]
859- if root_id == parent_root_id:
860- # With stacking we _might_ want to refer to a non-local
861- # revision, but this code path only applies when we have the
862- # full content available, so ghosts really are ghosts, not just
863- # the edge of local data.
864- parent_keys.append((parent_id,))
865- else:
866- # root_id may be in the parent anyway.
867- try:
868- tree = self.source.revision_tree(parent_id)
869- except errors.NoSuchRevision:
870- # ghost, can't refer to it.
871- pass
872- else:
873- try:
874- parent_keys.append((tree.inventory[root_id].revision,))
875- except errors.NoSuchId:
876- # not in the tree
877- pass
878- g = graph.Graph(self.source.revisions)
879- heads = g.heads(parent_keys)
880- selected_keys = []
881- for key in parent_keys:
882- if key in heads and key not in selected_keys:
883- selected_keys.append(key)
884- return tuple([(root_id,)+ key for key in selected_keys])
885-
886- def _new_root_data_stream(self, root_keys_to_create, parent_map):
887- for root_key in root_keys_to_create:
888- parent_keys = self._get_parent_keys(root_key, parent_map)
889- yield versionedfile.FulltextContentFactory(root_key,
890- parent_keys, None, '')
891-
892 def _fetch_batch(self, revision_ids, basis_id, cache):
893 """Fetch across a few revisions.
894
895@@ -3648,8 +3605,10 @@
896 from_texts = self.source.texts
897 to_texts = self.target.texts
898 if root_keys_to_create:
899- root_stream = self._new_root_data_stream(root_keys_to_create,
900- parent_map)
901+ from bzrlib.fetch import _new_root_data_stream
902+ root_stream = _new_root_data_stream(
903+ root_keys_to_create, self._revision_id_to_root_id, parent_map,
904+ self.source)
905 to_texts.insert_record_stream(root_stream)
906 to_texts.insert_record_stream(from_texts.get_record_stream(
907 text_keys, self.target._format._fetch_order,
908@@ -3744,7 +3703,6 @@
909 # Walk though all revisions; get inventory deltas, copy referenced
910 # texts that delta references, insert the delta, revision and
911 # signature.
912- first_rev = self.source.get_revision(revision_ids[0])
913 if pb is None:
914 my_pb = ui.ui_factory.nested_progress_bar()
915 pb = my_pb
916@@ -3904,9 +3862,6 @@
917 self.file_ids = set([file_id for file_id, _ in
918 self.text_index.iterkeys()])
919 # text keys is now grouped by file_id
920- n_weaves = len(self.file_ids)
921- files_in_revisions = {}
922- revisions_of_files = {}
923 n_versions = len(self.text_index)
924 progress_bar.update('loading text store', 0, n_versions)
925 parent_map = self.repository.texts.get_parent_map(self.text_index)
926@@ -4005,6 +3960,7 @@
927 pass
928 else:
929 new_pack.set_write_cache_size(1024*1024)
930+ delta_deserializer = inventory_delta.InventoryDeltaSerializer()
931 for substream_type, substream in stream:
932 if substream_type == 'texts':
933 self.target_repo.texts.insert_record_stream(substream)
934@@ -4014,7 +3970,8 @@
935 substream)
936 else:
937 self._extract_and_insert_inventories(
938- substream, src_serializer)
939+ substream, src_serializer,
940+ delta_deserializer.parse_text_bytes)
941 elif substream_type == 'chk_bytes':
942 # XXX: This doesn't support conversions, as it assumes the
943 # conversion was done in the fetch code.
944@@ -4071,18 +4028,40 @@
945 self.target_repo.pack(hint=hint)
946 return [], set()
947
948- def _extract_and_insert_inventories(self, substream, serializer):
949+ def _extract_and_insert_inventories(self, substream, serializer,
950+ parse_delta=None):
951 """Generate a new inventory versionedfile in target, converting data.
952
953 The inventory is retrieved from the source, (deserializing it), and
954 stored in the target (reserializing it in a different format).
955 """
956+ target_rich_root = self.target_repo._format.rich_root_data
957+ target_tree_refs = self.target_repo._format.supports_tree_reference
958 for record in substream:
959+ if record.storage_kind == 'inventory-delta':
960+ # Insert the delta directly
961+ delta_tuple = record.get_bytes_as('inventory-delta')
962+ basis_id, new_id, inv_delta, format_flags = delta_tuple
963+ # Make sure the delta is compatible with the target
964+ if format_flags[0] and not target_rich_root:
965+ raise errors.IncompatibleRevision(self.target_repo._format)
966+ if format_flags[1] and not target_tree_refs:
967+ raise errors.IncompatibleRevision(self.target_repo._format)
968+ revision_id = new_id[0]
969+ parents = [key[0] for key in record.parents]
970+ self.target_repo.add_inventory_by_delta(
971+ basis_id, inv_delta, revision_id, parents)
972+ continue
973+ # It's not a delta, so it must be a fulltext in the source
974+ # serializer's format.
975 bytes = record.get_bytes_as('fulltext')
976 revision_id = record.key[0]
977 inv = serializer.read_inventory_from_string(bytes, revision_id)
978 parents = [key[0] for key in record.parents]
979 self.target_repo.add_inventory(revision_id, inv, parents)
980+ # No need to keep holding this full inv in memory when the rest of
981+ # the substream is likely to be all deltas.
982+ del inv
983
984 def _extract_and_insert_revisions(self, substream, serializer):
985 for record in substream:
986@@ -4137,11 +4116,8 @@
987 return [('signatures', signatures), ('revisions', revisions)]
988
989 def _generate_root_texts(self, revs):
990- """This will be called by __fetch between fetching weave texts and
991+ """This will be called by get_stream between fetching weave texts and
992 fetching the inventory weave.
993-
994- Subclasses should override this if they need to generate root texts
995- after fetching weave texts.
996 """
997 if self._rich_root_upgrade():
998 import bzrlib.fetch
999@@ -4179,9 +4155,6 @@
1000 # will be valid.
1001 for _ in self._generate_root_texts(revs):
1002 yield _
1003- # NB: This currently reopens the inventory weave in source;
1004- # using a single stream interface instead would avoid this.
1005- from_weave = self.from_repository.inventories
1006 # we fetch only the referenced inventories because we do not
1007 # know for unselected inventories whether all their required
1008 # texts are present in the other repository - it could be
1009@@ -4226,6 +4199,22 @@
1010 if not keys:
1011 # No need to stream something we don't have
1012 continue
1013+ if substream_kind == 'inventories':
1014+ # Some missing keys are genuinely ghosts, filter those out.
1015+ present = self.from_repository.inventories.get_parent_map(keys)
1016+ revs = [key[0] for key in present]
1017+ # As with the original stream, we may need to generate root
1018+ # texts for the inventories we're about to stream.
1019+ for _ in self._generate_root_texts(revs):
1020+ yield _
1021+ # Get the inventory stream more-or-less as we do for the
1022+ # original stream; there's no reason to assume that records
1023+ # direct from the source will be suitable for the sink. (Think
1024+ # e.g. 2a -> 1.9-rich-root).
1025+ for info in self._get_inventory_stream(revs, missing=True):
1026+ yield info
1027+ continue
1028+
1029 # Ask for full texts always so that we don't need more round trips
1030 # after this stream.
1031 # Some of the missing keys are genuinely ghosts, so filter absent
1032@@ -4246,129 +4235,95 @@
1033 return (not self.from_repository._format.rich_root_data and
1034 self.to_format.rich_root_data)
1035
1036- def _get_inventory_stream(self, revision_ids):
1037+ def _get_inventory_stream(self, revision_ids, missing=False):
1038 from_format = self.from_repository._format
1039- if (from_format.supports_chks and self.to_format.supports_chks
1040- and (from_format._serializer == self.to_format._serializer)):
1041- # Both sides support chks, and they use the same serializer, so it
1042- # is safe to transmit the chk pages and inventory pages across
1043- # as-is.
1044- return self._get_chk_inventory_stream(revision_ids)
1045+ if (from_format.supports_chks and self.to_format.supports_chks and
1046+ from_format.network_name() == self.to_format.network_name()):
1047+ raise AssertionError(
1048+ "this case should be handled by GroupCHKStreamSource")
1049 elif (not from_format.supports_chks):
1050 # Source repository doesn't support chks. So we can transmit the
1051 # inventories 'as-is' and either they are just accepted on the
1052 # target, or the Sink will properly convert it.
1053- return self._get_simple_inventory_stream(revision_ids)
1054+ # (XXX: this assumes that all non-chk formats are understood as-is
1055+ # by any Sink, but that presumably isn't true for foreign repo
1056+ # formats added by bzr-svn etc?)
1057+ return self._get_simple_inventory_stream(revision_ids,
1058+ missing=missing)
1059 else:
1060- # XXX: Hack to make not-chk->chk fetch: copy the inventories as
1061- # inventories. Note that this should probably be done somehow
1062- # as part of bzrlib.repository.StreamSink. Except JAM couldn't
1063- # figure out how a non-chk repository could possibly handle
1064- # deserializing an inventory stream from a chk repo, as it
1065- # doesn't have a way to understand individual pages.
1066- return self._get_convertable_inventory_stream(revision_ids)
1067+ # Make chk->non-chk (and chk-with-a-different-serializer) fetches
1068+ # work: copy the inventories as (format-neutral) inventory deltas.
1069+ return self._get_convertable_inventory_stream(revision_ids,
1070+ fulltexts=missing)
1071
1072- def _get_simple_inventory_stream(self, revision_ids):
1073+ def _get_simple_inventory_stream(self, revision_ids, missing=False):
1074+ # NB: This currently reopens the inventory weave in source;
1075+ # using a single stream interface instead would avoid this.
1076 from_weave = self.from_repository.inventories
1077+ if missing:
1078+ delta_closure = True
1079+ else:
1080+ delta_closure = not self.delta_on_metadata()
1081 yield ('inventories', from_weave.get_record_stream(
1082 [(rev_id,) for rev_id in revision_ids],
1083- self.inventory_fetch_order(),
1084- not self.delta_on_metadata()))
1085-
1086- def _get_chk_inventory_stream(self, revision_ids):
1087- """Fetch the inventory texts, along with the associated chk maps."""
1088- # We want an inventory outside of the search set, so that we can filter
1089- # out uninteresting chk pages. For now we use
1090- # _find_revision_outside_set, but if we had a Search with cut_revs, we
1091- # could use that instead.
1092- start_rev_id = self.from_repository._find_revision_outside_set(
1093- revision_ids)
1094- start_rev_key = (start_rev_id,)
1095- inv_keys_to_fetch = [(rev_id,) for rev_id in revision_ids]
1096- if start_rev_id != _mod_revision.NULL_REVISION:
1097- inv_keys_to_fetch.append((start_rev_id,))
1098- # Any repo that supports chk_bytes must also support out-of-order
1099- # insertion. At least, that is how we expect it to work
1100- # We use get_record_stream instead of iter_inventories because we want
1101- # to be able to insert the stream as well. We could instead fetch
1102- # allowing deltas, and then iter_inventories, but we don't know whether
1103- # source or target is more 'local' anway.
1104- inv_stream = self.from_repository.inventories.get_record_stream(
1105- inv_keys_to_fetch, 'unordered',
1106- True) # We need them as full-texts so we can find their references
1107- uninteresting_chk_roots = set()
1108- interesting_chk_roots = set()
1109- def filter_inv_stream(inv_stream):
1110- for idx, record in enumerate(inv_stream):
1111- ### child_pb.update('fetch inv', idx, len(inv_keys_to_fetch))
1112- bytes = record.get_bytes_as('fulltext')
1113- chk_inv = inventory.CHKInventory.deserialise(
1114- self.from_repository.chk_bytes, bytes, record.key)
1115- if record.key == start_rev_key:
1116- uninteresting_chk_roots.add(chk_inv.id_to_entry.key())
1117- p_id_map = chk_inv.parent_id_basename_to_file_id
1118- if p_id_map is not None:
1119- uninteresting_chk_roots.add(p_id_map.key())
1120- else:
1121- yield record
1122- interesting_chk_roots.add(chk_inv.id_to_entry.key())
1123- p_id_map = chk_inv.parent_id_basename_to_file_id
1124- if p_id_map is not None:
1125- interesting_chk_roots.add(p_id_map.key())
1126- ### pb.update('fetch inventory', 0, 2)
1127- yield ('inventories', filter_inv_stream(inv_stream))
1128- # Now that we have worked out all of the interesting root nodes, grab
1129- # all of the interesting pages and insert them
1130- ### pb.update('fetch inventory', 1, 2)
1131- interesting = chk_map.iter_interesting_nodes(
1132- self.from_repository.chk_bytes, interesting_chk_roots,
1133- uninteresting_chk_roots)
1134- def to_stream_adapter():
1135- """Adapt the iter_interesting_nodes result to a single stream.
1136-
1137- iter_interesting_nodes returns records as it processes them, along
1138- with keys. However, we only want to return the records themselves.
1139- """
1140- for record, items in interesting:
1141- if record is not None:
1142- yield record
1143- # XXX: We could instead call get_record_stream(records.keys())
1144- # ATM, this will always insert the records as fulltexts, and
1145- # requires that you can hang on to records once you have gone
1146- # on to the next one. Further, it causes the target to
1147- # recompress the data. Testing shows it to be faster than
1148- # requesting the records again, though.
1149- yield ('chk_bytes', to_stream_adapter())
1150- ### pb.update('fetch inventory', 2, 2)
1151-
1152- def _get_convertable_inventory_stream(self, revision_ids):
1153- # XXX: One of source or target is using chks, and they don't have
1154- # compatible serializations. The StreamSink code expects to be
1155- # able to convert on the target, so we need to put
1156- # bytes-on-the-wire that can be converted
1157- yield ('inventories', self._stream_invs_as_fulltexts(revision_ids))
1158-
1159- def _stream_invs_as_fulltexts(self, revision_ids):
1160+ self.inventory_fetch_order(), delta_closure))
1161+
1162+ def _get_convertable_inventory_stream(self, revision_ids, fulltexts=False):
1163+ # The source is using CHKs, but the target either doesn't or has a
1164+ # different serializer. The StreamSink code expects to be able to
1165+ # convert on the target, so we need to put bytes-on-the-wire that can
1166+ # be converted. That means inventory deltas (if the remote is <1.18,
1167+ # RemoteStreamSink will fall back to VFS to insert the deltas).
1168+ yield ('inventories',
1169+ self._stream_invs_as_deltas(revision_ids, fulltexts=fulltexts))
1170+
1171+ def _stream_invs_as_deltas(self, revision_ids, fulltexts=False):
1172 from_repo = self.from_repository
1173- from_serializer = from_repo._format._serializer
1174 revision_keys = [(rev_id,) for rev_id in revision_ids]
1175 parent_map = from_repo.inventories.get_parent_map(revision_keys)
1176- for inv in self.from_repository.iter_inventories(revision_ids):
1177- # XXX: This is a bit hackish, but it works. Basically,
1178- # CHKSerializer 'accidentally' supports
1179- # read/write_inventory_to_string, even though that is never
1180- # the format that is stored on disk. It *does* give us a
1181- # single string representation for an inventory, so live with
1182- # it for now.
1183- # This would be far better if we had a 'serialized inventory
1184- # delta' form. Then we could use 'inventory._make_delta', and
1185- # transmit that. This would both be faster to generate, and
1186- # result in fewer bytes-on-the-wire.
1187- as_bytes = from_serializer.write_inventory_to_string(inv)
1188+ # XXX: possibly repos could implement a more efficient iter_inv_deltas
1189+ # method...
1190+ inventories = self.from_repository.iter_inventories(
1191+ revision_ids, 'topological')
1192+ # XXX: ideally these flags would be per-revision, not per-repo (e.g.
1193+ # streaming a non-rich-root revision out of a rich-root repo back into
1194+ # a non-rich-root repo ought to be allowed)
1195+ format = from_repo._format
1196+ flags = (format.rich_root_data, format.supports_tree_reference)
1197+ invs_sent_so_far = set([_mod_revision.NULL_REVISION])
1198+ for inv in inventories:
1199 key = (inv.revision_id,)
1200- parent_keys = parent_map.get(key, ())
1201- yield versionedfile.FulltextContentFactory(
1202- key, parent_keys, None, as_bytes)
1203+ parents = parent_map.get(key, ())
1204+ if fulltexts or parents == ():
1205+ # Either the caller asked for fulltexts, or there is no parent,
1206+ # so stream as a delta from null:.
1207+ basis_id = _mod_revision.NULL_REVISION
1208+ parent_inv = Inventory(None)
1209+ delta = inv._make_delta(parent_inv)
1210+ else:
1211+ # Make a delta against each parent so that we can find the
1212+ # smallest.
1213+ best_delta = None
1214+ parent_ids = [parent_key[0] for parent_key in parents]
1215+ parent_ids.append(_mod_revision.NULL_REVISION)
1216+ for parent_id in parent_ids:
1217+ if parent_id not in invs_sent_so_far:
1218+ # We don't know that the remote side has this basis, so
1219+ # we can't use it.
1220+ continue
1221+ if parent_id == _mod_revision.NULL_REVISION:
1222+ parent_inv = Inventory(None)
1223+ else:
1224+ parent_inv = from_repo.get_inventory(parent_id)
1225+ candidate_delta = inv._make_delta(parent_inv)
1226+ if (best_delta is None or
1227+ len(best_delta) > len(candidate_delta)):
1228+ best_delta = candidate_delta
1229+ basis_id = parent_id
1230+ delta = best_delta
1231+ invs_sent_so_far.add(inv.revision_id)
1232+ yield versionedfile.InventoryDeltaContentFactory(
1233+ key, parents, None, delta, basis_id, flags, from_repo)
1234
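
The interesting part of _stream_invs_as_deltas is the basis selection: only inventories the receiver has already been sent are usable bases, and among those the delta with the fewest changed entries wins, with null: always available as a last resort. A toy version of the same choice, using plain dicts in place of inventories:

    def choose_basis(parent_ids, sent, deltas):
        # deltas maps a candidate basis id to its delta (a list of changes).
        best = None
        for parent_id in parent_ids + ['null:']:
            if parent_id not in sent:
                continue  # receiver may not have this basis; unusable
            delta = deltas[parent_id]
            if best is None or len(delta) < len(best[1]):
                best = (parent_id, delta)
        return best

    sent = set(['null:', 'rev-1'])
    deltas = {'rev-1': ['one change'], 'rev-2': [], 'null:': ['a', 'b', 'c']}
    basis, delta = choose_basis(['rev-1', 'rev-2'], sent, deltas)
    # rev-2 would give the smallest delta, but it was never sent.
    assert (basis, delta) == ('rev-1', ['one change'])
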
1235
1236 def _iter_for_revno(repo, partial_history_cache, stop_index=None,
1237
1238=== modified file 'bzrlib/smart/protocol.py'
1239--- bzrlib/smart/protocol.py 2009-07-17 01:48:56 +0000
1240+++ bzrlib/smart/protocol.py 2009-07-28 12:36:08 +0000
1241@@ -1209,6 +1209,8 @@
1242 except (KeyboardInterrupt, SystemExit):
1243 raise
1244 except Exception:
1245+ mutter('_iter_with_errors caught error')
1246+ log_exception_quietly()
1247 yield sys.exc_info(), None
1248 return
1249
1250
1251=== modified file 'bzrlib/smart/repository.py'
1252--- bzrlib/smart/repository.py 2009-06-16 06:46:32 +0000
1253+++ bzrlib/smart/repository.py 2009-07-28 12:36:08 +0000
1254@@ -30,6 +30,7 @@
1255 graph,
1256 osutils,
1257 pack,
1258+ versionedfile,
1259 )
1260 from bzrlib.bzrdir import BzrDir
1261 from bzrlib.smart.request import (
1262@@ -39,7 +40,11 @@
1263 )
1264 from bzrlib.repository import _strip_NULL_ghosts, network_format_registry
1265 from bzrlib import revision as _mod_revision
1266-from bzrlib.versionedfile import NetworkRecordStream, record_to_fulltext_bytes
1267+from bzrlib.versionedfile import (
1268+ NetworkRecordStream,
1269+ record_to_fulltext_bytes,
1270+ record_to_inventory_delta_bytes,
1271+ )
1272
1273
1274 class SmartServerRepositoryRequest(SmartServerRequest):
1275@@ -414,8 +419,39 @@
1276 repository.
1277 """
1278 self._to_format = network_format_registry.get(to_network_name)
1279+ if self._should_fake_unknown():
1280+ return FailedSmartServerResponse(
1281+ ('UnknownMethod', 'Repository.get_stream'))
1282 return None # Signal that we want a body.
1283
1284+ def _should_fake_unknown(self):
1285+ # This is a workaround for bugs in pre-1.18 clients that claim to
1286+ # support receiving streams of CHK repositories. The pre-1.18 client
1287+ # expects inventory records to be serialized in the format defined by
1288+ # to_network_name, but in pre-1.18 (at least) that format definition
1289+ # tries to use the xml5 serializer, which does not correctly handle
1290+ # rich-roots. After 1.18 the client can also accept inventory-deltas
1291+ # (which avoids this issue), and those clients will use the
1292+ # Repository.get_stream_1.18 verb instead of this one.
1293+ # So: if this repository is CHK, and the to_format doesn't match,
1294+ # we should just fake an UnknownSmartMethod error so that the client
1295+ # will fall back to VFS, rather than sending it a stream we know it
1296+ # cannot handle.
1297+ from_format = self._repository._format
1298+ to_format = self._to_format
1299+ if not from_format.supports_chks:
1300+ # Source not CHK: that's ok
1301+ return False
1302+ if (to_format.supports_chks and
1303+ from_format.repository_class is to_format.repository_class and
1304+ from_format._serializer == to_format._serializer):
1305+ # Source is CHK, but target matches: that's ok
1306+ # (e.g. 2a->2a, or CHK2->2a)
1307+ return False
1308+ # Source is CHK, and target is not CHK or incompatible CHK. We can't
1309+ # generate a compatible stream.
1310+ return True
1311+
1312 def do_body(self, body_bytes):
1313 repository = self._repository
1314 repository.lock_read()
1315@@ -451,6 +487,14 @@
1316 repository.unlock()
1317
1318
1319+class SmartServerRepositoryGetStream_1_18(SmartServerRepositoryGetStream):
1320+
1321+ def _should_fake_unknown(self):
1322+ # The client is at least 1.18, so we don't need to work around any
1323+ # bugs.
1324+ return False
1325+
1326+
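
Distilled, the server-side workaround reduces to a small predicate (a hedged sketch, with booleans standing in for the format objects):

    def should_fake_unknown(from_chk, to_chk, compatible_layout):
        if not from_chk:
            return False  # non-CHK source: any client can take the stream
        if to_chk and compatible_layout:
            return False  # CHK to matching CHK: stream the pages as-is
        return True  # CHK to anything else: pre-1.18 clients would choke

    assert not should_fake_unknown(False, False, False)  # non-CHK source
    assert not should_fake_unknown(True, True, True)     # e.g. 2a -> 2a
    assert should_fake_unknown(True, False, False)       # e.g. 2a -> 1.9-rich-root
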
1327 def _stream_to_byte_stream(stream, src_format):
1328 """Convert a record stream to a self delimited byte stream."""
1329 pack_writer = pack.ContainerSerialiser()
1330@@ -460,6 +504,8 @@
1331 for record in substream:
1332 if record.storage_kind in ('chunked', 'fulltext'):
1333 serialised = record_to_fulltext_bytes(record)
1334+ elif record.storage_kind == 'inventory-delta':
1335+ serialised = record_to_inventory_delta_bytes(record)
1336 elif record.storage_kind == 'absent':
1337 raise ValueError("Absent factory for %s" % (record.key,))
1338 else:
1339@@ -650,6 +696,23 @@
1340 return SuccessfulSmartServerResponse(('ok', ))
1341
1342
1343+class SmartServerRepositoryInsertStream_1_18(SmartServerRepositoryInsertStreamLocked):
1344+ """Insert a record stream from a RemoteSink into a repository.
1345+
1346+ Same as SmartServerRepositoryInsertStreamLocked, except:
1347+ - the lock token argument is optional
1348+ - servers that implement this verb accept 'inventory-delta' records in the
1349+ stream.
1350+
1351+ New in 1.18.
1352+ """
1353+
1354+ def do_repository_request(self, repository, resume_tokens, lock_token=None):
1355+ """StreamSink.insert_stream for a remote repository."""
1356+ SmartServerRepositoryInsertStreamLocked.do_repository_request(
1357+ self, repository, resume_tokens, lock_token)
1358+
1359+
1360 class SmartServerRepositoryInsertStream(SmartServerRepositoryInsertStreamLocked):
1361 """Insert a record stream from a RemoteSink into an unlocked repository.
1362
1363
1364=== modified file 'bzrlib/smart/request.py'
1365--- bzrlib/smart/request.py 2009-07-27 02:06:05 +0000
1366+++ bzrlib/smart/request.py 2009-07-28 12:36:08 +0000
1367@@ -553,6 +553,8 @@
1368 request_handlers.register_lazy(
1369 'Repository.insert_stream', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream')
1370 request_handlers.register_lazy(
1371+ 'Repository.insert_stream_1.18', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream_1_18')
1372+request_handlers.register_lazy(
1373 'Repository.insert_stream_locked', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStreamLocked')
1374 request_handlers.register_lazy(
1375 'Repository.is_shared', 'bzrlib.smart.repository', 'SmartServerRepositoryIsShared')
1376@@ -570,6 +572,9 @@
1377 'Repository.get_stream', 'bzrlib.smart.repository',
1378 'SmartServerRepositoryGetStream')
1379 request_handlers.register_lazy(
1380+ 'Repository.get_stream_1.18', 'bzrlib.smart.repository',
1381+ 'SmartServerRepositoryGetStream_1_18')
1382+request_handlers.register_lazy(
1383 'Repository.tarball', 'bzrlib.smart.repository',
1384 'SmartServerRepositoryTarball')
1385 request_handlers.register_lazy(
1386
1387=== modified file 'bzrlib/tests/__init__.py'
1388--- bzrlib/tests/__init__.py 2009-07-23 04:46:26 +0000
1389+++ bzrlib/tests/__init__.py 2009-07-28 12:36:08 +0000
1390@@ -1926,6 +1926,16 @@
1391 sio.encoding = output_encoding
1392 return sio
1393
1394+ def disable_verb(self, verb):
1395+ """Disable a smart server verb for one test."""
1396+ from bzrlib.smart import request
1397+ request_handlers = request.request_handlers
1398+ orig_method = request_handlers.get(verb)
1399+ request_handlers.remove(verb)
1400+ def restoreVerb():
1401+ request_handlers.register(verb, orig_method)
1402+ self.addCleanup(restoreVerb)
1403+
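
The helper just pops the verb out of the global registry and re-registers it on cleanup; the same pattern with a plain dict, as a self-contained sketch:

    registry = {'Repository.insert_stream_1.18': object()}
    cleanups = []

    def disable_verb(verb):
        orig = registry.pop(verb)
        # Defer restoration, mirroring addCleanup in the real helper.
        cleanups.append(lambda: registry.update({verb: orig}))

    disable_verb('Repository.insert_stream_1.18')
    assert 'Repository.insert_stream_1.18' not in registry
    while cleanups:
        cleanups.pop()()
    assert 'Repository.insert_stream_1.18' in registry
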
1404
1405 class CapturedCall(object):
1406 """A helper for capturing smart server calls for easy debug analysis."""
1407
1408=== modified file 'bzrlib/tests/per_branch/test_push.py'
1409--- bzrlib/tests/per_branch/test_push.py 2009-07-10 05:49:34 +0000
1410+++ bzrlib/tests/per_branch/test_push.py 2009-07-28 12:36:08 +0000
1411@@ -260,14 +260,15 @@
1412 self.assertFalse(local.is_locked())
1413 local.push(remote)
1414 hpss_call_names = [item.call.method for item in self.hpss_calls]
1415- self.assertTrue('Repository.insert_stream' in hpss_call_names)
1416- insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
1417+ self.assertTrue('Repository.insert_stream_1.18' in hpss_call_names)
1418+ insert_stream_idx = hpss_call_names.index(
1419+ 'Repository.insert_stream_1.18')
1420 calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
1421 # After inserting the stream the client has no reason to query the
1422 # remote graph any further.
1423 self.assertEqual(
1424- ['Repository.insert_stream', 'Repository.insert_stream', 'get',
1425- 'Branch.set_last_revision_info', 'Branch.unlock'],
1426+ ['Repository.insert_stream_1.18', 'Repository.insert_stream_1.18',
1427+ 'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
1428 calls_after_insert_stream)
1429
1430 def disableOptimisticGetParentMap(self):
1431
1432=== modified file 'bzrlib/tests/per_interbranch/test_push.py'
1433--- bzrlib/tests/per_interbranch/test_push.py 2009-07-10 05:49:34 +0000
1434+++ bzrlib/tests/per_interbranch/test_push.py 2009-07-28 12:36:09 +0000
1435@@ -266,14 +266,15 @@
1436 self.assertFalse(local.is_locked())
1437 local.push(remote)
1438 hpss_call_names = [item.call.method for item in self.hpss_calls]
1439- self.assertTrue('Repository.insert_stream' in hpss_call_names)
1440- insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
1441+ self.assertTrue('Repository.insert_stream_1.18' in hpss_call_names)
1442+ insert_stream_idx = hpss_call_names.index(
1443+ 'Repository.insert_stream_1.18')
1444 calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
1445 # After inserting the stream the client has no reason to query the
1446 # remote graph any further.
1447 self.assertEqual(
1448- ['Repository.insert_stream', 'Repository.insert_stream', 'get',
1449- 'Branch.set_last_revision_info', 'Branch.unlock'],
1450+ ['Repository.insert_stream_1.18', 'Repository.insert_stream_1.18',
1451+ 'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
1452 calls_after_insert_stream)
1453
1454 def disableOptimisticGetParentMap(self):
1455
1456=== modified file 'bzrlib/tests/per_interrepository/__init__.py'
1457--- bzrlib/tests/per_interrepository/__init__.py 2009-07-10 06:46:10 +0000
1458+++ bzrlib/tests/per_interrepository/__init__.py 2009-07-28 12:36:09 +0000
1459@@ -32,8 +32,6 @@
1460 )
1461
1462 from bzrlib.repository import (
1463- InterDifferingSerializer,
1464- InterKnitRepo,
1465 InterRepository,
1466 )
1467 from bzrlib.tests import (
1468@@ -51,15 +49,13 @@
1469 (interrepo_class, repository_format, repository_format_to).
1470 """
1471 result = []
1472- for interrepo_class, repository_format, repository_format_to in formats:
1473- id = '%s,%s,%s' % (interrepo_class.__name__,
1474- repository_format.__class__.__name__,
1475- repository_format_to.__class__.__name__)
1476+ for repository_format, repository_format_to in formats:
1477+ id = '%s,%s' % (repository_format.__class__.__name__,
1478+ repository_format_to.__class__.__name__)
1479 scenario = (id,
1480 {"transport_server": transport_server,
1481 "transport_readonly_server": transport_readonly_server,
1482 "repository_format": repository_format,
1483- "interrepo_class": interrepo_class,
1484 "repository_format_to": repository_format_to,
1485 })
1486 result.append(scenario)
1487@@ -68,8 +64,12 @@
1488
1489 def default_test_list():
1490 """Generate the default list of interrepo permutations to test."""
1491- from bzrlib.repofmt import knitrepo, pack_repo, weaverepo
1492+ from bzrlib.repofmt import (
1493+ knitrepo, pack_repo, weaverepo, groupcompress_repo,
1494+ )
1495 result = []
1496+ def add_combo(from_format, to_format):
1497+ result.append((from_format, to_format))
1498 # test the default InterRepository between format 6 and the current
1499 # default format.
1500 # XXX: robertc 20060220 reinstate this when there are two supported
1501@@ -80,37 +80,33 @@
1502 for optimiser_class in InterRepository._optimisers:
1503 format_to_test = optimiser_class._get_repo_format_to_test()
1504 if format_to_test is not None:
1505- result.append((optimiser_class,
1506- format_to_test, format_to_test))
1507+ add_combo(format_to_test, format_to_test)
1508 # if there are specific combinations we want to use, we can add them
1509 # here. We want to test rich root upgrading.
1510- result.append((InterRepository,
1511- weaverepo.RepositoryFormat5(),
1512- knitrepo.RepositoryFormatKnit3()))
1513- result.append((InterRepository,
1514- knitrepo.RepositoryFormatKnit1(),
1515- knitrepo.RepositoryFormatKnit3()))
1516- result.append((InterRepository,
1517- knitrepo.RepositoryFormatKnit1(),
1518- knitrepo.RepositoryFormatKnit3()))
1519- result.append((InterKnitRepo,
1520- knitrepo.RepositoryFormatKnit1(),
1521- pack_repo.RepositoryFormatKnitPack1()))
1522- result.append((InterKnitRepo,
1523- pack_repo.RepositoryFormatKnitPack1(),
1524- knitrepo.RepositoryFormatKnit1()))
1525- result.append((InterKnitRepo,
1526- knitrepo.RepositoryFormatKnit3(),
1527- pack_repo.RepositoryFormatKnitPack3()))
1528- result.append((InterKnitRepo,
1529- pack_repo.RepositoryFormatKnitPack3(),
1530- knitrepo.RepositoryFormatKnit3()))
1531- result.append((InterKnitRepo,
1532- pack_repo.RepositoryFormatKnitPack3(),
1533- pack_repo.RepositoryFormatKnitPack4()))
1534- result.append((InterDifferingSerializer,
1535- pack_repo.RepositoryFormatKnitPack1(),
1536- pack_repo.RepositoryFormatKnitPack6RichRoot()))
1537+ add_combo(weaverepo.RepositoryFormat5(),
1538+ knitrepo.RepositoryFormatKnit3())
1539+ add_combo(knitrepo.RepositoryFormatKnit1(),
1540+ knitrepo.RepositoryFormatKnit3())
1541+ add_combo(knitrepo.RepositoryFormatKnit1(),
1542+ pack_repo.RepositoryFormatKnitPack1())
1543+ add_combo(pack_repo.RepositoryFormatKnitPack1(),
1544+ knitrepo.RepositoryFormatKnit1())
1545+ add_combo(knitrepo.RepositoryFormatKnit3(),
1546+ pack_repo.RepositoryFormatKnitPack3())
1547+ add_combo(pack_repo.RepositoryFormatKnitPack3(),
1548+ knitrepo.RepositoryFormatKnit3())
1549+ add_combo(pack_repo.RepositoryFormatKnitPack3(),
1550+ pack_repo.RepositoryFormatKnitPack4())
1551+ add_combo(pack_repo.RepositoryFormatKnitPack1(),
1552+ pack_repo.RepositoryFormatKnitPack6RichRoot())
1553+ add_combo(pack_repo.RepositoryFormatKnitPack6RichRoot(),
1554+ groupcompress_repo.RepositoryFormat2a())
1555+ add_combo(groupcompress_repo.RepositoryFormat2a(),
1556+ pack_repo.RepositoryFormatKnitPack6RichRoot())
1557+ add_combo(groupcompress_repo.RepositoryFormatCHK2(),
1558+ groupcompress_repo.RepositoryFormat2a())
1559+ add_combo(groupcompress_repo.RepositoryFormatCHK1(),
1560+ groupcompress_repo.RepositoryFormat2a())
1561 return result
1562
1563
1564
1565=== modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
1566--- bzrlib/tests/per_interrepository/test_fetch.py 2009-07-10 06:46:10 +0000
1567+++ bzrlib/tests/per_interrepository/test_fetch.py 2009-07-28 12:36:09 +0000
1568@@ -28,6 +28,9 @@
1569 from bzrlib.errors import (
1570 NoSuchRevision,
1571 )
1572+from bzrlib.graph import (
1573+ SearchResult,
1574+ )
1575 from bzrlib.revision import (
1576 NULL_REVISION,
1577 Revision,
1578@@ -124,6 +127,15 @@
1579 to_repo.texts.get_record_stream([('foo', revid)],
1580 'unordered', True).next().get_bytes_as('fulltext'))
1581
1582+ def test_fetch_parent_inventories_at_stacking_boundary_smart(self):
1583+ self.setup_smart_server_with_call_log()
1584+ self.test_fetch_parent_inventories_at_stacking_boundary()
1585+
1586+ def test_fetch_parent_inventories_at_stacking_boundary_smart_old(self):
1587+ self.setup_smart_server_with_call_log()
1588+ self.disable_verb('Repository.insert_stream_1.18')
1589+ self.test_fetch_parent_inventories_at_stacking_boundary()
1590+
1591 def test_fetch_parent_inventories_at_stacking_boundary(self):
1592 """Fetch to a stacked branch copies inventories for parents of
1593 revisions at the stacking boundary.
1594@@ -156,11 +168,25 @@
1595 unstacked_repo = stacked_branch.bzrdir.open_repository()
1596 unstacked_repo.lock_read()
1597 self.addCleanup(unstacked_repo.unlock)
1598- self.assertFalse(unstacked_repo.has_revision('left'))
1599- self.assertFalse(unstacked_repo.has_revision('right'))
1600- self.assertEqual(
1601- set([('left',), ('right',), ('merge',)]),
1602- unstacked_repo.inventories.keys())
1603+ if not unstacked_repo._format.supports_chks:
1604+ # these assertions aren't valid for groupcompress repos, which may
1605+ # transfer more data than strictly necessary to avoid breaking up an
1606+ # already-compressed block of data.
1607+ self.assertFalse(unstacked_repo.has_revision('left'))
1608+ self.assertFalse(unstacked_repo.has_revision('right'))
1609+ self.assertTrue(unstacked_repo.has_revision('merge'))
1610+ # We used to check for the presence of parent invs here, but what
1611+ # really matters is that the repo can stream the new revision without
1612+ # the help of any fallback repos.
1613+ self.assertCanStreamRevision(unstacked_repo, 'merge')
1614+
1615+ def assertCanStreamRevision(self, repo, revision_id):
1616+ exclude_keys = set(repo.all_revision_ids()) - set([revision_id])
1617+ search = SearchResult([revision_id], exclude_keys, 1, [revision_id])
1618+ source = repo._get_source(repo._format)
1619+ for substream_kind, substream in source.get_stream(search):
1620+ # Consume the substream
1621+ list(substream)
1622
1623 def test_fetch_missing_basis_text(self):
1624 """If fetching a delta, we should die if a basis is not present."""
1625@@ -276,7 +302,10 @@
1626 to_repo = self.make_to_repository('to')
1627 to_repo.fetch(from_tree.branch.repository)
1628 recorded_inv_sha1 = to_repo.get_inventory_sha1('foo-id')
1629- xml = to_repo.get_inventory_xml('foo-id')
1630+ try:
1631+ xml = to_repo.get_inventory_xml('foo-id')
1632+ except NotImplementedError:
1633+ raise TestNotApplicable('repo does not provide get_inventory_xml')
1634 computed_inv_sha1 = osutils.sha_string(xml)
1635 self.assertEqual(computed_inv_sha1, recorded_inv_sha1)
1636
1637
1638=== modified file 'bzrlib/tests/test_inventory_delta.py'
1639--- bzrlib/tests/test_inventory_delta.py 2009-04-02 05:53:12 +0000
1640+++ bzrlib/tests/test_inventory_delta.py 2009-07-28 12:36:08 +0000
1641@@ -93,30 +93,26 @@
1642 """Test InventoryDeltaSerializer.parse_text_bytes."""
1643
1644 def test_parse_no_bytes(self):
1645- serializer = inventory_delta.InventoryDeltaSerializer(
1646- versioned_root=True, tree_references=True)
1647+ serializer = inventory_delta.InventoryDeltaSerializer()
1648 err = self.assertRaises(
1649 errors.BzrError, serializer.parse_text_bytes, '')
1650- self.assertContainsRe(str(err), 'unknown format')
1651+ self.assertContainsRe(str(err), 'last line not empty')
1652
1653 def test_parse_bad_format(self):
1654- serializer = inventory_delta.InventoryDeltaSerializer(
1655- versioned_root=True, tree_references=True)
1656+ serializer = inventory_delta.InventoryDeltaSerializer()
1657 err = self.assertRaises(errors.BzrError,
1658 serializer.parse_text_bytes, 'format: foo\n')
1659 self.assertContainsRe(str(err), 'unknown format')
1660
1661 def test_parse_no_parent(self):
1662- serializer = inventory_delta.InventoryDeltaSerializer(
1663- versioned_root=True, tree_references=True)
1664+ serializer = inventory_delta.InventoryDeltaSerializer()
1665 err = self.assertRaises(errors.BzrError,
1666 serializer.parse_text_bytes,
1667 'format: bzr inventory delta v1 (bzr 1.14)\n')
1668 self.assertContainsRe(str(err), 'missing parent: marker')
1669
1670 def test_parse_no_version(self):
1671- serializer = inventory_delta.InventoryDeltaSerializer(
1672- versioned_root=True, tree_references=True)
1673+ serializer = inventory_delta.InventoryDeltaSerializer()
1674 err = self.assertRaises(errors.BzrError,
1675 serializer.parse_text_bytes,
1676 'format: bzr inventory delta v1 (bzr 1.14)\n'
1677@@ -124,8 +120,7 @@
1678 self.assertContainsRe(str(err), 'missing version: marker')
1679
1680 def test_parse_duplicate_key_errors(self):
1681- serializer = inventory_delta.InventoryDeltaSerializer(
1682- versioned_root=True, tree_references=True)
1683+ serializer = inventory_delta.InventoryDeltaSerializer()
1684 double_root_lines = \
1685 """format: bzr inventory delta v1 (bzr 1.14)
1686 parent: null:
1687@@ -140,19 +135,20 @@
1688 self.assertContainsRe(str(err), 'duplicate file id')
1689
1690 def test_parse_versioned_root_only(self):
1691- serializer = inventory_delta.InventoryDeltaSerializer(
1692- versioned_root=True, tree_references=True)
1693+ serializer = inventory_delta.InventoryDeltaSerializer()
1694+ serializer.require_flags(versioned_root=True, tree_references=True)
1695 parse_result = serializer.parse_text_bytes(root_only_lines)
1696 expected_entry = inventory.make_entry(
1697 'directory', u'', None, 'an-id')
1698 expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'
1699 self.assertEqual(
1700- ('null:', 'entry-version', [(None, '/', 'an-id', expected_entry)]),
1701+ ('null:', 'entry-version', True, True,
1702+ [(None, '', 'an-id', expected_entry)]),
1703 parse_result)
1704
1705 def test_parse_special_revid_not_valid_last_mod(self):
1706- serializer = inventory_delta.InventoryDeltaSerializer(
1707- versioned_root=False, tree_references=True)
1708+ serializer = inventory_delta.InventoryDeltaSerializer()
1709+ serializer.require_flags(versioned_root=False, tree_references=True)
1710 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
1711 parent: null:
1712 version: null:
1713@@ -165,8 +161,8 @@
1714 self.assertContainsRe(str(err), 'special revisionid found')
1715
1716 def test_parse_versioned_root_versioned_disabled(self):
1717- serializer = inventory_delta.InventoryDeltaSerializer(
1718- versioned_root=False, tree_references=True)
1719+ serializer = inventory_delta.InventoryDeltaSerializer()
1720+ serializer.require_flags(versioned_root=False, tree_references=True)
1721 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
1722 parent: null:
1723 version: null:
1724@@ -179,8 +175,8 @@
1725 self.assertContainsRe(str(err), 'Versioned root found')
1726
1727 def test_parse_unique_root_id_root_versioned_disabled(self):
1728- serializer = inventory_delta.InventoryDeltaSerializer(
1729- versioned_root=False, tree_references=True)
1730+ serializer = inventory_delta.InventoryDeltaSerializer()
1731+ serializer.require_flags(versioned_root=False, tree_references=True)
1732 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
1733 parent: null:
1734 version: null:
1735@@ -193,21 +189,80 @@
1736 self.assertContainsRe(str(err), 'Versioned root found')
1737
1738 def test_parse_unversioned_root_versioning_enabled(self):
1739- serializer = inventory_delta.InventoryDeltaSerializer(
1740- versioned_root=True, tree_references=True)
1741+ serializer = inventory_delta.InventoryDeltaSerializer()
1742+ serializer.require_flags(versioned_root=True, tree_references=True)
1743 err = self.assertRaises(errors.BzrError,
1744 serializer.parse_text_bytes, root_only_unversioned)
1745 self.assertContainsRe(
1746 str(err), 'serialized versioned_root flag is wrong: False')
1747
1748 def test_parse_tree_when_disabled(self):
1749- serializer = inventory_delta.InventoryDeltaSerializer(
1750- versioned_root=True, tree_references=False)
1751+ serializer = inventory_delta.InventoryDeltaSerializer()
1752+ serializer.require_flags(versioned_root=True, tree_references=False)
1753 err = self.assertRaises(errors.BzrError,
1754 serializer.parse_text_bytes, reference_lines)
1755 self.assertContainsRe(
1756 str(err), 'serialized tree_references flag is wrong: True')
1757
1758+ def test_parse_tree_when_header_disallows(self):
1759+ # A deserializer that allows tree_references to be set or unset.
1760+ serializer = inventory_delta.InventoryDeltaSerializer()
1761+ # A serialised inventory delta with a header saying no tree refs, but
1762+ # that has a tree ref in its content.
1763+ lines = """format: bzr inventory delta v1 (bzr 1.14)
1764+parent: null:
1765+version: entry-version
1766+versioned_root: false
1767+tree_references: false
1768+None\x00/foo\x00id\x00TREE_ROOT\x00changed\x00tree\x00subtree-version
1769+"""
1770+ err = self.assertRaises(errors.BzrError,
1771+ serializer.parse_text_bytes, lines)
1772+ self.assertContainsRe(str(err), 'Tree reference found')
1773+
1774+ def test_parse_versioned_root_when_header_disallows(self):
1775+ # A deserializer that allows versioned_root to be set or unset.
1776+ serializer = inventory_delta.InventoryDeltaSerializer()
1777+ # A serialised inventory delta with a header saying no versioned
1778+ # root, but that has a versioned root in its content.
1779+ lines = """format: bzr inventory delta v1 (bzr 1.14)
1780+parent: null:
1781+version: entry-version
1782+versioned_root: false
1783+tree_references: false
1784+None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir
1785+"""
1786+ err = self.assertRaises(errors.BzrError,
1787+ serializer.parse_text_bytes, lines)
1788+ self.assertContainsRe(str(err), 'Versioned root found')
1789+
1790+ def test_parse_last_line_not_empty(self):
1791+ """newpath must start with / if it is not None."""
1792+ # Trim the trailing newline from a valid serialization
1793+ lines = root_only_lines[:-1]
1794+ serializer = inventory_delta.InventoryDeltaSerializer()
1795+ err = self.assertRaises(errors.BzrError,
1796+ serializer.parse_text_bytes, lines)
1797+ self.assertContainsRe(str(err), 'last line not empty')
1798+
1799+ def test_parse_invalid_newpath(self):
1800+ """newpath must start with / if it is not None."""
1801+ lines = empty_lines
1802+ lines += "None\x00bad\x00TREE_ROOT\x00\x00version\x00dir\n"
1803+ serializer = inventory_delta.InventoryDeltaSerializer()
1804+ err = self.assertRaises(errors.BzrError,
1805+ serializer.parse_text_bytes, lines)
1806+ self.assertContainsRe(str(err), 'newpath invalid')
1807+
1808+ def test_parse_invalid_oldpath(self):
1809+ """oldpath must start with / if it is not None."""
1810+ lines = root_only_lines
1811+ lines += "bad\x00/new\x00file-id\x00\x00version\x00dir\n"
1812+ serializer = inventory_delta.InventoryDeltaSerializer()
1813+ err = self.assertRaises(errors.BzrError,
1814+ serializer.parse_text_bytes, lines)
1815+ self.assertContainsRe(str(err), 'oldpath invalid')
1816+
1817
1818 class TestSerialization(TestCase):
1819 """Tests for InventoryDeltaSerializer.delta_to_lines."""
1820@@ -216,8 +271,8 @@
1821 old_inv = Inventory(None)
1822 new_inv = Inventory(None)
1823 delta = new_inv._make_delta(old_inv)
1824- serializer = inventory_delta.InventoryDeltaSerializer(
1825- versioned_root=True, tree_references=True)
1826+ serializer = inventory_delta.InventoryDeltaSerializer()
1827+ serializer.require_flags(True, True)
1828 self.assertEqual(StringIO(empty_lines).readlines(),
1829 serializer.delta_to_lines(NULL_REVISION, NULL_REVISION, delta))
1830
1831@@ -228,8 +283,8 @@
1832 root.revision = 'a@e\xc3\xa5ample.com--2004'
1833 new_inv.add(root)
1834 delta = new_inv._make_delta(old_inv)
1835- serializer = inventory_delta.InventoryDeltaSerializer(
1836- versioned_root=True, tree_references=True)
1837+ serializer = inventory_delta.InventoryDeltaSerializer()
1838+ serializer.require_flags(versioned_root=True, tree_references=True)
1839 self.assertEqual(StringIO(root_only_lines).readlines(),
1840 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
1841
1842@@ -239,8 +294,8 @@
1843 root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
1844 new_inv.add(root)
1845 delta = new_inv._make_delta(old_inv)
1846- serializer = inventory_delta.InventoryDeltaSerializer(
1847- versioned_root=False, tree_references=False)
1848+ serializer = inventory_delta.InventoryDeltaSerializer()
1849+ serializer.require_flags(False, False)
1850 self.assertEqual(StringIO(root_only_unversioned).readlines(),
1851 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
1852
1853@@ -253,8 +308,8 @@
1854 non_root = new_inv.make_entry('directory', 'foo', root.file_id, 'id')
1855 new_inv.add(non_root)
1856 delta = new_inv._make_delta(old_inv)
1857- serializer = inventory_delta.InventoryDeltaSerializer(
1858- versioned_root=True, tree_references=True)
1859+ serializer = inventory_delta.InventoryDeltaSerializer()
1860+ serializer.require_flags(versioned_root=True, tree_references=True)
1861 err = self.assertRaises(errors.BzrError,
1862 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1863 self.assertEqual(str(err), 'no version for fileid id')
1864@@ -265,8 +320,8 @@
1865 root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
1866 new_inv.add(root)
1867 delta = new_inv._make_delta(old_inv)
1868- serializer = inventory_delta.InventoryDeltaSerializer(
1869- versioned_root=True, tree_references=True)
1870+ serializer = inventory_delta.InventoryDeltaSerializer()
1871+ serializer.require_flags(versioned_root=True, tree_references=True)
1872 err = self.assertRaises(errors.BzrError,
1873 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1874 self.assertEqual(str(err), 'no version for fileid TREE_ROOT')
1875@@ -278,8 +333,8 @@
1876 root.revision = 'a@e\xc3\xa5ample.com--2004'
1877 new_inv.add(root)
1878 delta = new_inv._make_delta(old_inv)
1879- serializer = inventory_delta.InventoryDeltaSerializer(
1880- versioned_root=False, tree_references=True)
1881+ serializer = inventory_delta.InventoryDeltaSerializer()
1882+ serializer.require_flags(versioned_root=False, tree_references=True)
1883 err = self.assertRaises(errors.BzrError,
1884 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1885 self.assertEqual(str(err), 'Version present for / in TREE_ROOT')
1886@@ -290,8 +345,8 @@
1887 root = new_inv.make_entry('directory', '', None, 'my-rich-root-id')
1888 new_inv.add(root)
1889 delta = new_inv._make_delta(old_inv)
1890- serializer = inventory_delta.InventoryDeltaSerializer(
1891- versioned_root=False, tree_references=True)
1892+ serializer = inventory_delta.InventoryDeltaSerializer()
1893+ serializer.require_flags(versioned_root=False, tree_references=True)
1894 err = self.assertRaises(errors.BzrError,
1895 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1896 self.assertEqual(
1897@@ -308,8 +363,8 @@
1898 non_root.kind = 'strangelove'
1899 new_inv.add(non_root)
1900 delta = new_inv._make_delta(old_inv)
1901- serializer = inventory_delta.InventoryDeltaSerializer(
1902- versioned_root=True, tree_references=True)
1903+ serializer = inventory_delta.InventoryDeltaSerializer()
1904+ serializer.require_flags(True, True)
1905         # We expect KeyError because there is little value in wrapping this.
1906         # This test cares that it errors, not about how it errors.
1907 err = self.assertRaises(KeyError,
1908@@ -328,8 +383,8 @@
1909 non_root.reference_revision = 'subtree-version'
1910 new_inv.add(non_root)
1911 delta = new_inv._make_delta(old_inv)
1912- serializer = inventory_delta.InventoryDeltaSerializer(
1913- versioned_root=True, tree_references=False)
1914+ serializer = inventory_delta.InventoryDeltaSerializer()
1915+ serializer.require_flags(versioned_root=True, tree_references=False)
1916         # We expect KeyError because there is little value in wrapping this.
1917         # This test cares that it errors, not about how it errors.
1918 err = self.assertRaises(KeyError,
1919@@ -348,23 +403,26 @@
1920 non_root.reference_revision = 'subtree-version'
1921 new_inv.add(non_root)
1922 delta = new_inv._make_delta(old_inv)
1923- serializer = inventory_delta.InventoryDeltaSerializer(
1924- versioned_root=True, tree_references=True)
1925+ serializer = inventory_delta.InventoryDeltaSerializer()
1926+ serializer.require_flags(versioned_root=True, tree_references=True)
1927 self.assertEqual(StringIO(reference_lines).readlines(),
1928 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
1929
1930 def test_to_inventory_root_id_versioned_not_permitted(self):
1931- delta = [(None, '/', 'TREE_ROOT', inventory.make_entry(
1932- 'directory', '', None, 'TREE_ROOT'))]
1933- serializer = inventory_delta.InventoryDeltaSerializer(False, True)
1934+ root_entry = inventory.make_entry('directory', '', None, 'TREE_ROOT')
1935+ root_entry.revision = 'some-version'
1936+ delta = [(None, '', 'TREE_ROOT', root_entry)]
1937+ serializer = inventory_delta.InventoryDeltaSerializer()
1938+ serializer.require_flags(versioned_root=False, tree_references=True)
1939 self.assertRaises(
1940 errors.BzrError, serializer.delta_to_lines, 'old-version',
1941 'new-version', delta)
1942
1943 def test_to_inventory_root_id_not_versioned(self):
1944- delta = [(None, '/', 'an-id', inventory.make_entry(
1945+ delta = [(None, '', 'an-id', inventory.make_entry(
1946 'directory', '', None, 'an-id'))]
1947- serializer = inventory_delta.InventoryDeltaSerializer(True, True)
1948+ serializer = inventory_delta.InventoryDeltaSerializer()
1949+ serializer.require_flags(versioned_root=True, tree_references=True)
1950 self.assertRaises(
1951 errors.BzrError, serializer.delta_to_lines, 'old-version',
1952 'new-version', delta)
1953@@ -374,12 +432,13 @@
1954 tree_ref = make_entry('tree-reference', 'foo', 'changed-in', 'ref-id')
1955 tree_ref.reference_revision = 'ref-revision'
1956 delta = [
1957- (None, '/', 'an-id',
1958+ (None, '', 'an-id',
1959 make_entry('directory', '', 'changed-in', 'an-id')),
1960- (None, '/foo', 'ref-id', tree_ref)
1961+ (None, 'foo', 'ref-id', tree_ref)
1962 # a file that followed the root move
1963 ]
1964- serializer = inventory_delta.InventoryDeltaSerializer(True, True)
1965+ serializer = inventory_delta.InventoryDeltaSerializer()
1966+ serializer.require_flags(versioned_root=True, tree_references=True)
1967 self.assertRaises(errors.BzrError, serializer.delta_to_lines,
1968 'old-version', 'new-version', delta)
1969
1970@@ -430,7 +489,8 @@
1971 executable=True, text_size=30, text_sha1='some-sha',
1972 revision='old-rev')),
1973 ]
1974- serializer = inventory_delta.InventoryDeltaSerializer(True, True)
1975+ serializer = inventory_delta.InventoryDeltaSerializer()
1976+ serializer.require_flags(versioned_root=True, tree_references=True)
1977 lines = serializer.delta_to_lines(NULL_REVISION, 'something', delta)
1978 expected = """format: bzr inventory delta v1 (bzr 1.14)
1979 parent: null:
1980
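The tests above exercise the new two-step construction of
InventoryDeltaSerializer (construct, then require_flags). A minimal
round-trip sketch of that API, assuming only the require_flags,
delta_to_lines and parse_text_bytes signatures visible in this diff:

    from bzrlib import inventory_delta
    from bzrlib.inventory import Inventory
    from bzrlib.revision import NULL_REVISION

    # Build a trivial delta: an empty inventory against itself.
    delta = Inventory(None)._make_delta(Inventory(None))

    # Serialise: the flags are now set after construction, not in __init__.
    serializer = inventory_delta.InventoryDeltaSerializer()
    serializer.require_flags(versioned_root=True, tree_references=True)
    lines = serializer.delta_to_lines(NULL_REVISION, NULL_REVISION, delta)

    # Deserialise: parse_text_bytes returns the flags read from the header
    # alongside the delta itself.
    deserializer = inventory_delta.InventoryDeltaSerializer()
    basis_id, new_id, rich_root, tree_refs, parsed = (
        deserializer.parse_text_bytes(''.join(lines)))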
1981=== modified file 'bzrlib/tests/test_remote.py'
1982--- bzrlib/tests/test_remote.py 2009-07-16 05:22:50 +0000
1983+++ bzrlib/tests/test_remote.py 2009-07-28 12:36:08 +0000
1984@@ -31,6 +31,7 @@
1985 config,
1986 errors,
1987 graph,
1988+ inventory,
1989 pack,
1990 remote,
1991 repository,
1992@@ -38,6 +39,7 @@
1993 tests,
1994 treebuilder,
1995 urlutils,
1996+ versionedfile,
1997 )
1998 from bzrlib.branch import Branch
1999 from bzrlib.bzrdir import BzrDir, BzrDirFormat
2000@@ -332,15 +334,6 @@
2001 reference_bzrdir_format = bzrdir.format_registry.get('default')()
2002 return reference_bzrdir_format.repository_format
2003
2004- def disable_verb(self, verb):
2005- """Disable a verb for one test."""
2006- request_handlers = smart.request.request_handlers
2007- orig_method = request_handlers.get(verb)
2008- request_handlers.remove(verb)
2009- def restoreVerb():
2010- request_handlers.register(verb, orig_method)
2011- self.addCleanup(restoreVerb)
2012-
2013 def assertFinished(self, fake_client):
2014 """Assert that all of a FakeClient's expected calls have occurred."""
2015 fake_client.finished_test()
2016@@ -2220,54 +2213,214 @@
2017
2018
2019 class TestRepositoryInsertStream(TestRemoteRepository):
2020-
2021- def test_unlocked_repo(self):
2022- transport_path = 'quack'
2023- repo, client = self.setup_fake_client_and_repository(transport_path)
2024- client.add_expected_call(
2025- 'Repository.insert_stream', ('quack/', ''),
2026- 'success', ('ok',))
2027- client.add_expected_call(
2028- 'Repository.insert_stream', ('quack/', ''),
2029- 'success', ('ok',))
2030- sink = repo._get_sink()
2031- fmt = repository.RepositoryFormat.get_default_format()
2032- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2033- self.assertEqual([], resume_tokens)
2034- self.assertEqual(set(), missing_keys)
2035- self.assertFinished(client)
2036-
2037- def test_locked_repo_with_no_lock_token(self):
2038- transport_path = 'quack'
2039- repo, client = self.setup_fake_client_and_repository(transport_path)
2040- client.add_expected_call(
2041- 'Repository.lock_write', ('quack/', ''),
2042- 'success', ('ok', ''))
2043- client.add_expected_call(
2044- 'Repository.insert_stream', ('quack/', ''),
2045- 'success', ('ok',))
2046- client.add_expected_call(
2047- 'Repository.insert_stream', ('quack/', ''),
2048- 'success', ('ok',))
2049- repo.lock_write()
2050- sink = repo._get_sink()
2051- fmt = repository.RepositoryFormat.get_default_format()
2052- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2053- self.assertEqual([], resume_tokens)
2054- self.assertEqual(set(), missing_keys)
2055- self.assertFinished(client)
2056-
2057- def test_locked_repo_with_lock_token(self):
2058- transport_path = 'quack'
2059- repo, client = self.setup_fake_client_and_repository(transport_path)
2060- client.add_expected_call(
2061- 'Repository.lock_write', ('quack/', ''),
2062- 'success', ('ok', 'a token'))
2063- client.add_expected_call(
2064- 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2065- 'success', ('ok',))
2066- client.add_expected_call(
2067- 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2068+    """Tests for using the Repository.insert_stream verb when the _1.18
2069+    variant is not available.
2070+
2071+ This test case is very similar to TestRepositoryInsertStream_1_18.
2072+ """
2073+
2074+ def setUp(self):
2075+ TestRemoteRepository.setUp(self)
2076+ self.disable_verb('Repository.insert_stream_1.18')
2077+
2078+ def test_unlocked_repo(self):
2079+ transport_path = 'quack'
2080+ repo, client = self.setup_fake_client_and_repository(transport_path)
2081+ client.add_expected_call(
2082+ 'Repository.insert_stream_1.18', ('quack/', ''),
2083+ 'unknown', ('Repository.insert_stream_1.18',))
2084+ client.add_expected_call(
2085+ 'Repository.insert_stream', ('quack/', ''),
2086+ 'success', ('ok',))
2087+ client.add_expected_call(
2088+ 'Repository.insert_stream', ('quack/', ''),
2089+ 'success', ('ok',))
2090+ sink = repo._get_sink()
2091+ fmt = repository.RepositoryFormat.get_default_format()
2092+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2093+ self.assertEqual([], resume_tokens)
2094+ self.assertEqual(set(), missing_keys)
2095+ self.assertFinished(client)
2096+
2097+ def test_locked_repo_with_no_lock_token(self):
2098+ transport_path = 'quack'
2099+ repo, client = self.setup_fake_client_and_repository(transport_path)
2100+ client.add_expected_call(
2101+ 'Repository.lock_write', ('quack/', ''),
2102+ 'success', ('ok', ''))
2103+ client.add_expected_call(
2104+ 'Repository.insert_stream_1.18', ('quack/', ''),
2105+ 'unknown', ('Repository.insert_stream_1.18',))
2106+ client.add_expected_call(
2107+ 'Repository.insert_stream', ('quack/', ''),
2108+ 'success', ('ok',))
2109+ client.add_expected_call(
2110+ 'Repository.insert_stream', ('quack/', ''),
2111+ 'success', ('ok',))
2112+ repo.lock_write()
2113+ sink = repo._get_sink()
2114+ fmt = repository.RepositoryFormat.get_default_format()
2115+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2116+ self.assertEqual([], resume_tokens)
2117+ self.assertEqual(set(), missing_keys)
2118+ self.assertFinished(client)
2119+
2120+ def test_locked_repo_with_lock_token(self):
2121+ transport_path = 'quack'
2122+ repo, client = self.setup_fake_client_and_repository(transport_path)
2123+ client.add_expected_call(
2124+ 'Repository.lock_write', ('quack/', ''),
2125+ 'success', ('ok', 'a token'))
2126+ client.add_expected_call(
2127+ 'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
2128+ 'unknown', ('Repository.insert_stream_1.18',))
2129+ client.add_expected_call(
2130+ 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2131+ 'success', ('ok',))
2132+ client.add_expected_call(
2133+ 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2134+ 'success', ('ok',))
2135+ repo.lock_write()
2136+ sink = repo._get_sink()
2137+ fmt = repository.RepositoryFormat.get_default_format()
2138+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2139+ self.assertEqual([], resume_tokens)
2140+ self.assertEqual(set(), missing_keys)
2141+ self.assertFinished(client)
2142+
2143+ def test_stream_with_inventory_delta(self):
2144+        """inventory-delta records can't be sent to the
2145+        Repository.insert_stream verb, so when one is encountered the
2146+        RemoteSink immediately stops using that verb and falls back to VFS
2147+        insert_stream.
2148+ """
2149+ transport_path = 'quack'
2150+ repo, client = self.setup_fake_client_and_repository(transport_path)
2151+ client.add_expected_call(
2152+ 'Repository.insert_stream_1.18', ('quack/', ''),
2153+ 'unknown', ('Repository.insert_stream_1.18',))
2154+ client.add_expected_call(
2155+ 'Repository.insert_stream', ('quack/', ''),
2156+ 'success', ('ok',))
2157+ client.add_expected_call(
2158+ 'Repository.insert_stream', ('quack/', ''),
2159+ 'success', ('ok',))
2160+ # Create a fake real repository for insert_stream to fall back on, so
2161+ # that we can directly see the records the RemoteSink passes to the
2162+ # real sink.
2163+ class FakeRealSink:
2164+ def __init__(self):
2165+ self.records = []
2166+ def insert_stream(self, stream, src_format, resume_tokens):
2167+ for substream_kind, substream in stream:
2168+ self.records.append(
2169+ (substream_kind, [record.key for record in substream]))
2170+ return ['fake tokens'], ['fake missing keys']
2171+ fake_real_sink = FakeRealSink()
2172+ class FakeRealRepository:
2173+ def _get_sink(self):
2174+ return fake_real_sink
2175+ repo._real_repository = FakeRealRepository()
2176+ sink = repo._get_sink()
2177+ fmt = repository.RepositoryFormat.get_default_format()
2178+ stream = self.make_stream_with_inv_deltas(fmt)
2179+ resume_tokens, missing_keys = sink.insert_stream(stream, fmt, [])
2180+        # Every record from the first inventory delta onwards should have
2181+        # been sent to the VFS sink.
2182+ expected_records = [
2183+ ('inventories', [('rev2',), ('rev3',)]),
2184+ ('texts', [('some-rev', 'some-file')])]
2185+ self.assertEqual(expected_records, fake_real_sink.records)
2186+ # The return values from the real sink's insert_stream are propagated
2187+ # back to the original caller.
2188+ self.assertEqual(['fake tokens'], resume_tokens)
2189+ self.assertEqual(['fake missing keys'], missing_keys)
2190+ self.assertFinished(client)
2191+
2192+ def make_stream_with_inv_deltas(self, fmt):
2193+        """Make a simple stream with an inventory delta followed by more
2194+        records and more substreams, to check that every record and substream
2195+        from that point on reaches the fallback sink.
2196+
2197+ This sends, in order:
2198+ * inventories substream: rev1, rev2, rev3. rev2 and rev3 are
2199+ inventory-deltas.
2200+ * texts substream: (some-rev, some-file)
2201+ """
2202+ # Define a stream using generators so that it isn't rewindable.
2203+ def stream_with_inv_delta():
2204+ yield ('inventories', inventory_substream_with_delta())
2205+ yield ('texts', [
2206+ versionedfile.FulltextContentFactory(
2207+ ('some-rev', 'some-file'), (), None, 'content')])
2208+ def inventory_substream_with_delta():
2209+ # An empty inventory fulltext. This will be streamed normally.
2210+ inv = inventory.Inventory(revision_id='rev1')
2211+ text = fmt._serializer.write_inventory_to_string(inv)
2212+ yield versionedfile.FulltextContentFactory(
2213+ ('rev1',), (), None, text)
2214+ # An inventory delta. This can't be streamed via this verb, so it
2215+ # will trigger a fallback to VFS insert_stream.
2216+ entry = inv.make_entry(
2217+ 'directory', 'newdir', inv.root.file_id, 'newdir-id')
2218+ delta = [(None, 'newdir', 'newdir-id', entry)]
2219+ yield versionedfile.InventoryDeltaContentFactory(
2220+                ('rev2',), (('rev1',),), None, delta, ('rev1',), (True, False))
2221+ # Another delta.
2222+ yield versionedfile.InventoryDeltaContentFactory(
2223+                ('rev3',), (('rev1',),), None, delta, ('rev1',), (True, False))
2224+ return stream_with_inv_delta()
2225+
2226+
2227+class TestRepositoryInsertStream_1_18(TestRemoteRepository):
2228+
2229+ def test_unlocked_repo(self):
2230+ transport_path = 'quack'
2231+ repo, client = self.setup_fake_client_and_repository(transport_path)
2232+ client.add_expected_call(
2233+ 'Repository.insert_stream_1.18', ('quack/', ''),
2234+ 'success', ('ok',))
2235+ client.add_expected_call(
2236+ 'Repository.insert_stream_1.18', ('quack/', ''),
2237+ 'success', ('ok',))
2238+ sink = repo._get_sink()
2239+ fmt = repository.RepositoryFormat.get_default_format()
2240+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2241+ self.assertEqual([], resume_tokens)
2242+ self.assertEqual(set(), missing_keys)
2243+ self.assertFinished(client)
2244+
2245+ def test_locked_repo_with_no_lock_token(self):
2246+ transport_path = 'quack'
2247+ repo, client = self.setup_fake_client_and_repository(transport_path)
2248+ client.add_expected_call(
2249+ 'Repository.lock_write', ('quack/', ''),
2250+ 'success', ('ok', ''))
2251+ client.add_expected_call(
2252+ 'Repository.insert_stream_1.18', ('quack/', ''),
2253+ 'success', ('ok',))
2254+ client.add_expected_call(
2255+ 'Repository.insert_stream_1.18', ('quack/', ''),
2256+ 'success', ('ok',))
2257+ repo.lock_write()
2258+ sink = repo._get_sink()
2259+ fmt = repository.RepositoryFormat.get_default_format()
2260+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2261+ self.assertEqual([], resume_tokens)
2262+ self.assertEqual(set(), missing_keys)
2263+ self.assertFinished(client)
2264+
2265+ def test_locked_repo_with_lock_token(self):
2266+ transport_path = 'quack'
2267+ repo, client = self.setup_fake_client_and_repository(transport_path)
2268+ client.add_expected_call(
2269+ 'Repository.lock_write', ('quack/', ''),
2270+ 'success', ('ok', 'a token'))
2271+ client.add_expected_call(
2272+ 'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
2273+ 'success', ('ok',))
2274+ client.add_expected_call(
2275+ 'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
2276 'success', ('ok',))
2277 repo.lock_write()
2278 sink = repo._get_sink()
2279
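These tests pin down the client-side verb negotiation without showing it. A
rough sketch of that logic, with _call_with_stream and InsertStreamFallback
as illustrative names only (the real RemoteStreamSink code lives elsewhere
in this branch):

    def insert_stream(self, stream, src_format, resume_tokens):
        # Prefer the 1.18 verb, which understands inventory-delta records.
        try:
            return self._call_with_stream(
                'Repository.insert_stream_1.18', stream, resume_tokens)
        except errors.UnknownSmartMethod:
            pass
        # Older servers only offer the pre-1.18 verb (or its _locked
        # variant when a lock token is held).  That verb cannot carry
        # inventory-delta records, so on meeting one mid-stream the sink
        # abandons the verb and pushes the rest of the stream through the
        # VFS sink of the real repository, as tested above.
        try:
            return self._call_with_stream(
                'Repository.insert_stream', stream, resume_tokens)
        except InsertStreamFallback, e:
            real_sink = self._repo._real_repository._get_sink()
            return real_sink.insert_stream(
                e.remaining_stream, src_format, resume_tokens)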
2280=== modified file 'bzrlib/tests/test_selftest.py'
2281--- bzrlib/tests/test_selftest.py 2009-07-22 06:00:45 +0000
2282+++ bzrlib/tests/test_selftest.py 2009-07-28 12:36:08 +0000
2283@@ -124,7 +124,7 @@
2284 self.assertEqual(sample_permutation,
2285 get_transport_test_permutations(MockModule()))
2286
2287- def test_scenarios_invlude_all_modules(self):
2288+ def test_scenarios_include_all_modules(self):
2289 # this checks that the scenario generator returns as many permutations
2290 # as there are in all the registered transport modules - we assume if
2291 # this matches its probably doing the right thing especially in
2292@@ -297,14 +297,12 @@
2293 scenarios = make_scenarios(server1, server2, formats)
2294 self.assertEqual([
2295 ('str,str,str',
2296- {'interrepo_class': str,
2297- 'repository_format': 'C1',
2298+ {'repository_format': 'C1',
2299 'repository_format_to': 'C2',
2300 'transport_readonly_server': 'b',
2301 'transport_server': 'a'}),
2302 ('int,str,str',
2303- {'interrepo_class': int,
2304- 'repository_format': 'D1',
2305+ {'repository_format': 'D1',
2306 'repository_format_to': 'D2',
2307 'transport_readonly_server': 'b',
2308 'transport_server': 'a'})],
2309
2310=== modified file 'bzrlib/tests/test_xml.py'
2311--- bzrlib/tests/test_xml.py 2009-04-03 21:50:40 +0000
2312+++ bzrlib/tests/test_xml.py 2009-07-28 12:36:08 +0000
2313@@ -19,6 +19,7 @@
2314 from bzrlib import (
2315 errors,
2316 inventory,
2317+ xml6,
2318 xml7,
2319 xml8,
2320 serializer,
2321@@ -139,6 +140,14 @@
2322 </inventory>
2323 """
2324
2325+_expected_inv_v6 = """<inventory format="6" revision_id="rev_outer">
2326+<directory file_id="tree-root-321" name="" revision="rev_outer" />
2327+<directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
2328+<file file_id="file-id" name="file" parent_id="tree-root-321" revision="rev_outer" text_sha1="A" text_size="1" />
2329+<symlink file_id="link-id" name="link" parent_id="tree-root-321" revision="rev_outer" symlink_target="a" />
2330+</inventory>
2331+"""
2332+
2333 _expected_inv_v7 = """<inventory format="7" revision_id="rev_outer">
2334 <directory file_id="tree-root-321" name="" revision="rev_outer" />
2335 <directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
2336@@ -377,6 +386,17 @@
2337 for path, ie in inv.iter_entries():
2338 self.assertEqual(ie, inv2[ie.file_id])
2339
2340+ def test_roundtrip_inventory_v6(self):
2341+ inv = self.get_sample_inventory()
2342+ txt = xml6.serializer_v6.write_inventory_to_string(inv)
2343+ lines = xml6.serializer_v6.write_inventory_to_lines(inv)
2344+ self.assertEqual(bzrlib.osutils.split_lines(txt), lines)
2345+ self.assertEqualDiff(_expected_inv_v6, txt)
2346+ inv2 = xml6.serializer_v6.read_inventory_from_string(txt)
2347+ self.assertEqual(4, len(inv2))
2348+ for path, ie in inv.iter_entries():
2349+ self.assertEqual(ie, inv2[ie.file_id])
2350+
2351 def test_wrong_format_v7(self):
2352 """Can't accidentally open a file with wrong serializer"""
2353 s_v6 = bzrlib.xml6.serializer_v6
2354
2355=== modified file 'bzrlib/versionedfile.py'
2356--- bzrlib/versionedfile.py 2009-07-15 16:52:26 +0000
2357+++ bzrlib/versionedfile.py 2009-07-28 12:36:07 +0000
2358@@ -34,6 +34,8 @@
2359 errors,
2360 groupcompress,
2361 index,
2362+ inventory,
2363+ inventory_delta,
2364 knit,
2365 osutils,
2366 multiparent,
2367@@ -158,6 +160,31 @@
2368 self.storage_kind)
2369
2370
2371+class InventoryDeltaContentFactory(ContentFactory):
2372+
2373+ def __init__(self, key, parents, sha1, delta, basis_id, format_flags,
2374+ repo=None):
2375+ self.sha1 = sha1
2376+ self.storage_kind = 'inventory-delta'
2377+ self.key = key
2378+ self.parents = parents
2379+ self._delta = delta
2380+ self._basis_id = basis_id
2381+ self._format_flags = format_flags
2382+ self._repo = repo
2383+
2384+ def get_bytes_as(self, storage_kind):
2385+ if storage_kind == self.storage_kind:
2386+ return self._basis_id, self.key, self._delta, self._format_flags
2387+ elif storage_kind == 'inventory-delta-bytes':
2388+ serializer = inventory_delta.InventoryDeltaSerializer()
2389+ serializer.require_flags(*self._format_flags)
2390+ return ''.join(serializer.delta_to_lines(
2391+ self._basis_id, self.key, self._delta))
2392+ raise errors.UnavailableRepresentation(self.key, storage_kind,
2393+ self.storage_kind)
2394+
2395+
2396 class AbsentContentFactory(ContentFactory):
2397 """A placeholder content factory for unavailable texts.
2398
2399@@ -1557,13 +1584,15 @@
2400 record.get_bytes_as(record.storage_kind) call.
2401 """
2402 self._bytes_iterator = bytes_iterator
2403- self._kind_factory = {'knit-ft-gz':knit.knit_network_to_record,
2404- 'knit-delta-gz':knit.knit_network_to_record,
2405- 'knit-annotated-ft-gz':knit.knit_network_to_record,
2406- 'knit-annotated-delta-gz':knit.knit_network_to_record,
2407- 'knit-delta-closure':knit.knit_delta_closure_to_records,
2408- 'fulltext':fulltext_network_to_record,
2409- 'groupcompress-block':groupcompress.network_block_to_records,
2410+ self._kind_factory = {
2411+ 'fulltext': fulltext_network_to_record,
2412+ 'groupcompress-block': groupcompress.network_block_to_records,
2413+ 'inventory-delta': inventory_delta_network_to_record,
2414+ 'knit-ft-gz': knit.knit_network_to_record,
2415+ 'knit-delta-gz': knit.knit_network_to_record,
2416+ 'knit-annotated-ft-gz': knit.knit_network_to_record,
2417+ 'knit-annotated-delta-gz': knit.knit_network_to_record,
2418+ 'knit-delta-closure': knit.knit_delta_closure_to_records,
2419 }
2420
2421 def read(self):
2422@@ -1589,6 +1618,21 @@
2423 return [FulltextContentFactory(key, parents, None, fulltext)]
2424
2425
2426+def inventory_delta_network_to_record(kind, bytes, line_end):
2427+    """Convert a network inventory-delta record to a record object."""
2428+ meta_len, = struct.unpack('!L', bytes[line_end:line_end+4])
2429+ record_meta = bytes[line_end+4:line_end+4+meta_len]
2430+ key, parents = bencode.bdecode_as_tuple(record_meta)
2431+ if parents == 'nil':
2432+ parents = None
2433+ inventory_delta_bytes = bytes[line_end+4+meta_len:]
2434+ deserialiser = inventory_delta.InventoryDeltaSerializer()
2435+ parse_result = deserialiser.parse_text_bytes(inventory_delta_bytes)
2436+ basis_id, new_id, rich_root, tree_refs, delta = parse_result
2437+ return [InventoryDeltaContentFactory(
2438+ key, parents, None, delta, basis_id, (rich_root, tree_refs))]
2439+
2440+
2441 def _length_prefix(bytes):
2442 return struct.pack('!L', len(bytes))
2443
2444@@ -1604,6 +1648,17 @@
2445 _length_prefix(record_meta), record_meta, record_content)
2446
2447
2448+def record_to_inventory_delta_bytes(record):
2449+ record_content = record.get_bytes_as('inventory-delta-bytes')
2450+ if record.parents is None:
2451+ parents = 'nil'
2452+ else:
2453+ parents = record.parents
2454+ record_meta = bencode.bencode((record.key, parents))
2455+ return "inventory-delta\n%s%s%s" % (
2456+ _length_prefix(record_meta), record_meta, record_content)
2457+
2458+
2459 def sort_groupcompress(parent_map):
2460 """Sort and group the keys in parent_map into groupcompress order.
2461
2462
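The framing in record_to_inventory_delta_bytes above is symmetric with
inventory_delta_network_to_record. A worked sketch of the byte layout,
using only the struct and bencode helpers the code above relies on (the
delta text here is abbreviated, not a complete serialisation):

    import struct
    from bzrlib import bencode

    key, parents = ('rev2',), (('rev1',),)
    # The serialised delta text, as produced by
    # InventoryDeltaSerializer.delta_to_lines.
    delta_body = 'format: bzr inventory delta v1 (bzr 1.14)\n'

    meta = bencode.bencode((key, parents))
    # Kind line, 4-byte big-endian length prefix, bencoded (key, parents)
    # metadata, then the delta text itself.
    wire = 'inventory-delta\n%s%s%s' % (
        struct.pack('!L', len(meta)), meta, delta_body)

    # Decoding reverses this: find the end of the kind line, unpack the
    # length, bdecode_as_tuple the metadata, and hand the remainder to
    # InventoryDeltaSerializer.parse_text_bytes.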
2463=== modified file 'bzrlib/xml5.py'
2464--- bzrlib/xml5.py 2009-04-04 02:50:01 +0000
2465+++ bzrlib/xml5.py 2009-07-28 12:36:07 +0000
2466@@ -39,8 +39,8 @@
2467 format = elt.get('format')
2468 if format is not None:
2469 if format != '5':
2470- raise BzrError("invalid format version %r on inventory"
2471- % format)
2472+ raise errors.BzrError("invalid format version %r on inventory"
2473+ % format)
2474 data_revision_id = elt.get('revision_id')
2475 if data_revision_id is not None:
2476 revision_id = cache_utf8.encode(data_revision_id)