Merge lp:~spiv/bzr/inventory-delta into lp:~bzr/bzr/trunk-old

Proposed by Andrew Bennetts
Status: Merged
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/inventory-delta
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 2845 lines
To merge this branch: bzr merge lp:~spiv/bzr/inventory-delta
Reviewer: John A Meinel (status: Needs Fixing)
Review via email: mp+9676@code.launchpad.net

This proposal supersedes a proposal from 2009-07-22.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This is a pretty big patch. It does lots of things:

 * adds new insert_stream and get_stream verbs
 * adds de/serialization of inventory-delta records on the network
 * fixes rich-root generation in StreamSource
 * adds a bunch of new scenarios to per_interrepository tests
 * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
   the new pack is already optimal).
 * improves the inventory_delta module a little
 * various miscellaneous fixes and new tests that are hopefully self-evident
 * and, most controversially, removes InterDifferingSerializer.

From John's mail a while back there were a bunch of issues with removing IDS. I
think the outstanding ones are:

> 1) Incremental updates. IDS converts batches of 100 revs at a time,
> which also triggers autopacks at 1k revs. Streaming fetch is currently
> an all-or-nothing, which isn't appropriate (IMO) for conversions.
> Consider that conversion can take *days*, it is important to have
> something that can be stopped and resumed.
>
> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
> bloats to 2.4GB before packing back to 25MB. We know the new format is
> even more sensitive to packing efficiency. Not to mention that a single
> big-stream generates a single large pack, it isn't directly obvious that
> we are being so inefficient.

i.e. performance concerns.

The streaming code now does the conversion in a way pretty similar to how IDS
did it, but probably still different enough that we will want to measure
the impact of this. I'm definitely concerned about case 2, the lack of packing
as you go, although perhaps the degree of bloat is reduced by using
semantic inventory-delta records?

The reason why I eventually deleted IDS was that it was just too burdensome to
keep two code paths alive, thoroughly tested, and correct. For instance, if we
simply reinstated IDS for local-only fetches then most of the test suite,
including the relevant interrepo tests, would only exercise IDS. Also, IDS
turned out to have a bug when used on a stacked repository, which the extended
test suite in this branch revealed (I've forgotten the details, but can dig them
up if you like). It didn't seem worth the hassle of fixing IDS when I already
had a working implementation.

I'm certainly open to reinstating IDS if it's the most expedient way to have
reasonable local performance for upgrades, but I thought I'd try to be bold and
see if we could just live without the extra complexity. Maybe we can improve
performance of streaming rather than resurrect IDS?

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:
>
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Consider that conversion can take *days*, it is important to have
>> something that can be stopped and resumed.

It also picks out the 'optimal' deltas by computing many different ones
and finding whichever one was the 'smallest'. For local conversions, the
time to compute 2-3 deltas was much smaller than the time to apply an
inefficient delta.
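
In outline, that selection looks something like this (a sketch with
illustrative names only; the real logic lives inside InterDifferingSerializer):

    def pick_smallest_delta(target_inv, candidate_basis_invs):
        """Delta target_inv against each candidate basis; keep the shortest."""
        best_basis = None
        best_delta = None
        for basis_inv in candidate_basis_invs:
            # _make_delta returns a list of per-path changes, so len() is a
            # cheap proxy for how expensive the delta is to send and apply.
            candidate = target_inv._make_delta(basis_inv)
            if best_delta is None or len(candidate) < len(best_delta):
                best_basis = basis_inv
                best_delta = candidate
        return best_basis, best_delta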

>>
>> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
>> bloats to 2.4GB before packing back to 25MB. We know the new format is
>> even more sensitive to packing efficiency. Not to mention that a single
>> big-stream generates a single large pack, it isn't directly obvious that
>> we are being so inefficient.
>
> i.e. performance concerns.
>

Generally, yes.

There is also:

3) Being able to resume because you snapshotted periodically as you
went. This seems even more important for a network transfer.

> The streaming code is pretty similar in how it does the conversion now to the
> way IDS did it, but probably still different enough that we will want to measure
> the impact of this. I'm definitely concerned about case 2, the lack of packing
> as you go, although perhaps the degree of bloat is reduced by using
> semantic inventory-delta records?
>

I don't think bzr bloating from 100MB => 2.4GB (and then back down to
25MB post pack) was because of inventory records. However, if it was
purely because of a bad streaming order, we could probably fix that by
changing how we stream texts.

> The reason why I eventually deleted IDS was that it was just too burdensome to
> keep two code paths alive, thoroughly tested, and correct. For instance, if we
> simply reinstated IDS for local-only fetches then most of the test suite,
> including the relevant interrepo tests, will only exercise IDS. Also, IDS
> turned out to have a bug when used on a stacked repository that the extending
> test suite in this branch revealed (I've forgotten the details, but can dig them
> up if you like). It didn't seem worth the hassle of fixing IDS when I already
> had a working imple...


Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


...
>
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

and

4) Progress indication

This is really quite useful for a process that can take *days* to
complete. The Stream code is often quite nice, but the fact that it
gives you 2 states:
 'getting stream'
 'inserting stream'

and nothing more than that is pretty crummy.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:

So for starters, let me mention what I found wrt performance:

time bzr.dev branch mysql-1k mysql-2a/1k
  real 3m18.490s

time bzr.dev+xml8 branch mysql-1k mysql-2a/1k
  real 2m29.953s

+xml8 is just this patch:
=== modified file 'bzrlib/xml8.py'
--- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
-                # if cached_ie.kind == 'directory':
-                #     return cached_ie.copy()
-                # return cached_ie
+                if cached_ie.kind == 'directory':
+                    return cached_ie.copy()
+                return cached_ie
                 return cached_ie.copy()

         kind = elt.tag

It has 2 basic effects:

1) Avoid copying all inventory entries all the time (so reduce the time
spent in InventoryEntry.copy())

2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
  x.attribute1 == y.attribute1
  and x.attribute2 == y.attribute2
etc.
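
i.e., roughly (an illustrative comparison function, using the attribute names
from the snippet above):

    def entries_differ(x, y):
        if x is y:
            # Reused cached object: a single pointer comparison replaces
            # the attribute-by-attribute checks below.
            return False
        return (x.attribute1 != y.attribute1
                or x.attribute2 != y.attribute2)  # ... and so on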

As you can see it is a big win for this test case (about 4:3 or 33% faster)

So what about Andrew's work:

time bzr.inv.delta branch mysql-1k mysql-2a/1k
  real 10m14.267s

time bzr.inv.delta+xml8 branch mysql-1k mysql-2a/1k
  real 9m49.372s

It also was stuck at:
[##################- ] Fetching revisions:Inserting stream:Walking
content 912/1043

For most of that time, making it really look like it was stalled.

Anyway, this isn't something where it is, say, 10% slower, which would be
acceptable because we get rid of some extra code paths. This ends up
being 3-4x slower and no longer giving any progress information.

If that scales to launchpad sized projects, you are talking 4-days
becoming 16-days (aka > 2 weeks).

So honestly, I don't think we can land this as is. I won't stick on the
performance side if people feel it is acceptable. But I did spend a lot
of time optimizing IDS that clearly hasn't been done with StreamSource.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


John A Meinel wrote:
> Andrew Bennetts wrote:

...

> So for starters, let me mention what I found wrt performance:
>
> time bzr.dev branch mysql-1k mysql-2a/1k
> real 3m18.490s
>
> time bzr.dev+xml8 branch mysql-1k mysql-2a/1k
> real 2m29.953s

...

> time bzr.inv.delta branch mysql-1k mysql-2a/1k
> real 10m14.267s
>
> time bzr.inv.delta+xml8 branch mysql-1k mysql-2a/1k
> real 9m49.372s

Also, for real-world space issues:
$ du -ksh mysql-2a*/.bzr/repository/obsolete*
1.9M mysql-2a-bzr.dev/.bzr/repository/obsolete_packs
467M mysql-2a-inv-delta/.bzr/repository/obsolete_packs

The peak size (watch du -ksh mysql-2a-bzr.dev) during conversion using
IDS was 49MB.

$ du -ksh mysql-2a*/.bzr/repository/packs*
11M mysql-2a-bzr.dev/.bzr/repository/packs
9.1M mysql-2a-inv-delta/.bzr/repository/packs

So the new code wins slightly in the final size on disk, because it
packed at the end, rather than at 1k revs (and then there were another
40+ revs inserted.)

However, it bloated from 15MB => 467MB during the transfer, before reaching
the final size, versus a peak of around 50MB with IDS (almost 10x larger).

John
=:->

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

John A Meinel wrote:
[...]
> It also picks out the 'optimal' deltas by computing many different ones
> and finding whichever one was the 'smallest'. For local conversions, the
> time to compute 2-3 deltas was much smaller than to apply an inefficient
> delta.

FWIW, the streaming code also does this. My guess (not yet measured) is that
sending fewer bytes over the network is also a win, especially when one parent
might be a one-liner and the other might be a large merge from trunk.

[...]
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

Yes, although we already don't have this for the network. It would be great to
have...

[...]
> I'm certainly open to the suggestion of getting rid of IDS. I don't like
> having multiple code paths. It just happens that there are *big* wins
> and it is often easier to write optimized code in a different framework.

Sure. Like I said, for me it was just getting to be a large hassle to maintain
both paths in my branch, even though they were increasingly sharing a lot of
code for e.g. rich root generation before I deleted IDS.

I'd like to try to see if we can cheaply fix the performance issues you report in
other mails without needing IDS. If we do need IDS for a while longer then
fine, although I think we'll want to restrict it to local source, local target,
non-stacked cases only.

Thanks for the measurements and quick feedback.

-Andrew.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Thu, 2009-07-16 at 16:06 +0000, John A Meinel wrote:
>
> (3) is an issue I'd like to see addressed, but which Robert seems
> particularly unhappy having us try to do. (See other bug comments, etc
> about how other systems don't do it and he feels it isn't worth
> doing.)

I'd like to be clear about this. I'd be ecstatic *if* we can do it well
and robustly. However I don't think it is *at all* easy to do that. If I'm
wrong - great.

I'm fine with keeping IDS for local fetches. But when networking is
involved IDS is massively slower than the streaming codepath.

> It was fairly straightforward to do with IDS; the argument I think from
> Robert is that the client would need to be computing whether it has a
> 'complete' set and thus can commit the current write group. (The
> *source* knows these sorts of things, and can just say "and now you have
> it", but the client has to re-do all that work to figure it out from a
> stream.)

I think that aspect is simple - we have a stream subtype that says
'checkpoint'. It's the requirement to do all that work that is, I think,
problematic - and that's *without* considering stacking, which makes it
hugely harder.
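
To make the 'checkpoint' idea concrete, a rough sketch (entirely hypothetical;
no such record kind exists in this patch):

    def insert_stream_with_checkpoints(stream, target_repo, insert_substream):
        # A source could interleave ('checkpoint', None) markers at points
        # where everything sent so far forms a self-contained group.
        target_repo.start_write_group()
        for kind, substream in stream:
            if kind == 'checkpoint':
                # Commit what we have and open a fresh group, so an
                # interrupted fetch can later resume from this point.
                target_repo.commit_write_group()
                target_repo.start_write_group()
            else:
                insert_substream(kind, substream)
        target_repo.commit_write_group()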

-Rob

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>
>
> 4) Progress indication
>
> This is really quite useful for a process that can take *days* to
> complete. The Stream code is often quite nice, but the fact that it
> gives you 2 states:
> 'getting stream'
> 'inserting stream'
>
> and nothing more than that is pretty crummy.

That is a separate bug however, and one that affects normal fetches too.
So I don't think tying it to the IDS discussion is necessary or
particularly helpful.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Robert Collins wrote:
> On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>>
>> 4) Progress indication
>>
>> This is really quite useful for a process that can take *days* to
>> complete. The Stream code is often quite nice, but the fact that it
>> gives you 2 states:
>> 'getting stream'
>> 'inserting stream'
>>
>> and nothing more than that is pretty crummy.
>
> That is a separate bug however, and one that affects normal fetches too.
> So I don't think tying it to the IDS discussion is necessary or
> particularly helpful.
>
> -Rob
>

It is explicitly relevant here that "bzr upgrade --2a", which will take
longer than normal, would now not even show a progress bar.

For local fetches, you don't even get the "transport activity"
indicator, so it *really* looks hung. It doesn't even write things into
.bzr.log so that you know it is doing anything other than spinning in a
while True loop. I guess you can tell because your disk consumption is
going way up...

I don't honestly know the performance difference for streaming a lot of
content over the network. Given a 4x performance slowdown, for large
fetches IDS could still be faster. I certainly agree that IDS is
probably significantly more inefficient when doing something like "give
me the last 2 revs".

It honestly wasn't something I was optimizing for (cross format
fetching). I *was* trying to make 'bzr upgrade' be measured in hours
rather than days/weeks/etc.

Also, given that you have to upgrade all of your stacked locations at
the same time, and --2a is a trap door, aren't 95% of upgrades going to
be all at once anyway?

John
=:->


Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This is the same patch updated for bzr.dev, and with InterDifferingSerializer restored for local fetches only. For non-local fetches I'm pretty sure the streaming code path is massively faster based on the timings I've done (even via TCP HPSS to localhost).

I'm sure we do want to get rid of IDS eventually (and fix the shortcomings in streaming that John has pointed out) but doing that shouldn't block the rest of this work, even if it is a small maintenance headache.

[FWIW, if I don't restrict IDS to local-only branches I get test failures like:

Traceback (most recent call last):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 137, in test_fetch_parent_inventories_at_stacking_boundary_smart_old
    self.test_fetch_parent_inventories_at_stacking_boundary()
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 181, in test_fetch_parent_inventories_at_stacking_boundary
    self.assertCanStreamRevision(unstacked_repo, 'merge')
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 187, in assertCanStreamRevision
    for substream_kind, substream in source.get_stream(search):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/remote.py", line 1895, in missing_parents_chain
    for kind, stream in self._get_stream(sources[0], search):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/repository.py", line 537, in record_stream
    for bytes in byte_stream:
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/message.py", line 338, in read_streamed_body
    _translate_error(self._body_error_args)
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/message.py", line 361, in _translate_error
    raise errors.ErrorFromSmartServer(error_tuple)
ErrorFromSmartServer: Error received from smart server: ('error', "<bzrlib.groupcompress.GroupCompressVersionedFiles object at 0xb06348c> has no revision ('sha1:98fd3a13366960dc27dcb4b6ddb2b55aca3aae7b',)")

(for scenarios like Pack6RichRoot->2a).]

Anyway, please review.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

I'm not really sure what testing you've done on this, but I'm getting some really strange results.

Specifically, when I push it turns out to be sending xml fragments over the wire. It seems to be this code:
        elif (not from_format.supports_chks):
            # Source repository doesn't support chks. So we can transmit the
            # inventories 'as-is' and either they are just accepted on the
            # target, or the Sink will properly convert it.
            # (XXX: this assumes that all non-chk formats are understood as-is
            # by any Sink, but that presumably isn't true for foreign repo
            # formats added by bzr-svn etc?)
            return self._get_simple_inventory_stream(revision_ids,
                    missing=missing)

Which means that the raw xml bytes are being transmitted, and then the target side is extracting the xml and upcasting and downcasting.

I see that there are code paths in place to do otherwise, but as near as I can tell, "_stream_invs_as_deltas" is only getting called if the *source* format is CHK.

From the profiling I've done, the _generate_root_texts() code is the bulk of the overhead with the new format. But *that* is because all of the data is sent to the server as XML texts, and the server has to do all the work to convert it.

When I just disable the code path that sends 'simple_inventory_stream', then I get:

$ time wbzr push bzr://localhost/test-2a/x
bzr: ERROR: Version present for / in TREE_ROOT

So I have a *strong* feeling the code you've introduced is actually broken, and you just didn't realize it.

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

So, to further my discussion. While investigating this, I found some odd bits in _stream_invs_as_deltas. For example:

+    def _stream_invs_as_deltas(self, revision_ids, fulltexts=False):
         from_repo = self.from_repository

...

+        inventories = self.from_repository.iter_inventories(
+            revision_ids, 'topological')
+        # XXX: ideally these flags would be per-revision, not per-repo (e.g.
+        # streaming a non-rich-root revision out of a rich-root repo back into
+        # a non-rich-root repo ought to be allowed)
+        format = from_repo._format
+        flags = (format.rich_root_data, format.supports_tree_reference)
+        invs_sent_so_far = set([_mod_revision.NULL_REVISION])
+        for inv in inventories:
             key = (inv.revision_id,)

...

+            for parent_id in parent_ids:
...
+                if (best_delta is None or
+                        len(best_delta) > len(candidate_delta)):
+                    best_delta = candidate_delta
+                    basis_id = parent_id
+            delta = best_delta
+            invs_sent_so_far.add(basis_id)

^- Notice that once you've prepared a delta for "inv.revision_id" you then add "basis_id" to "invs_sent_so_far". Which AFAICT means you will send most of the inventories as complete texts, because the set won't reflect what you have actually sent.

+            yield versionedfile.InventoryDeltaContentFactory(
+                key, parents, None, delta, basis_id, flags, from_repo)

The way you've handled "no parents are available, use NULL" means that you
actually create a delta against NULL for *every* revision, and check whether it is the
smallest one. Which seems inefficient versus just using NULL when nothing else
is available. (Note that IDS goes as far as to not use parents that aren't
cached even if they have been sent, and to fall back to using the last-sent
revision otherwise.)

I've changed this loop around a bit, to avoid some duplication and make it a little bit clearer (at least to me) what is going on. I also added a quick LRUCache for the parent inventories that we've just sent, as re-extracting them from the repository is not going to be efficient. (Especially extracting them one at a time.)
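
In outline, the reworked bookkeeping looks something like this (a condensed
sketch of that fix, not the code as merged; bzrlib.lru_cache.LRUCache is real,
the surrounding helper names are illustrative):

    from bzrlib import lru_cache

    def choose_deltas(invs_with_parents, make_delta, null_rev_id):
        invs_sent_so_far = set([null_rev_id])
        recently_sent = lru_cache.LRUCache(max_cache=50)
        for inv, parent_ids in invs_with_parents:
            best_delta = None
            basis_id = null_rev_id
            for parent_id in parent_ids:
                # Only delta against parents the target is known to have,
                # and only if we still hold the object (no re-extraction).
                if parent_id not in invs_sent_so_far:
                    continue
                basis_inv = recently_sent.get(parent_id)
                if basis_inv is None:
                    continue
                candidate = make_delta(basis_inv, inv)
                if best_delta is None or len(candidate) < len(best_delta):
                    best_delta, basis_id = candidate, parent_id
            if best_delta is None:
                best_delta = make_delta(None, inv)  # fall back to NULL basis
            # The fix: record the revision just sent, not its basis.
            invs_sent_so_far.add(inv.revision_id)
            recently_sent.add(inv.revision_id, inv)
            yield basis_id, best_delta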

Going for that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.

It seems like we should be consistent at that level. What do you think?

As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:

            serializer.require_flags(*self._format_flags)

And that, IMO, is asking for trouble. Yes it is most likely correct as written, but it means that you have to line up the tuple created at the start of _stream_invs_as_deltas:
        flags = (format.rich_root_data, format.supports_tree_reference)

with the *arguments* to a function nested 3 calls away.
So how about:

  flags = {'rich_root_data': format.rich_root_data,
           'supports_tree_reference': format.supports_tree_reference
          }
...

and
  serializer.require_flags(**self._format_flags)

It isn't vastly better, but at least they are named arguments, rather than fairly arbitrary positional ones.


review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Tue, 2009-07-28 at 21:51 +0000, John A Meinel wrote:
>
> I suppose one possibility would be to have the client reject the
> stream entirely and request it be re-sent with the basis available.

If we do this then... The server just asks for that inventory again. The
problem though, is that *all* inventories will need to be asked for
again, which could be pathologically bad. It is possible to write a file
and restore it later without network API changes, just pass in write
group save tokens. I don't think we need to block on that for this patch
though - sending a full inventory is no worse than sending the full text
of a file, which 2a does as well - and ChangeLog files etc are very big
- as large or larger than an inventory, I would expect.

> Of course, the get_stream_for_missing_keys code is wired up such that
> clients
> ignore the response about missing inventories if the target is a
> stacked repo,
> and just always force up an extra fulltext copy of all parent
> inventories
> anyway. (Because of the bugs in 1.13, IIRC.)

The condition is wrong... it probably should say 'if we're not told
anything is missing..' - and it should be guarded such that the new push
verb (introduced recently?) doesn't trigger this paranoia.

> Anyway, it means that what you get out the other side of translating
> into and then back out of the serialized form is not exactly equal to
> what went in, which seems very bad (and likely to *not* break when
> using local actions, and then have it break when using remote ones.)

This is a reason to serialise/deserialise locally ;) - at least to start
with.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
> On Tue, 2009-07-28 at 21:51 +0000, John A Meinel wrote:
> >
> > I suppose one possibility would be to have the client reject the
> > stream entirely and request it be re-sent with the basis available.
>
> If we do this then... The server just asks for that inventory again. The
> problem though, is that *all* inventories will need to be asked for
> again, which could be pathologically bad. It is possible to write a file
> and restore it later without network API changes, just pass in write
> group save tokens. I don't think we need to block on that for this patch
> though - sending a full inventory is no worse than sending the full text
> of a file, which 2a does as well - and ChangeLog files etc are very big
> - as large or larger than an inventory, I would expect.

Also, we currently only allow one “get missing keys” stream after the original
stream. So if we optimistically send just a delta then having the client
reject it means that the next time we have to send not only a full delta closure
the second time, we also need to send all the inventory parents, because there's
no further opportunity for the client to ask for more keys. This is potentially
even worse than just sending a single fulltext.

I think sending a fulltext cross-format is fine. We aren't trying to maximally
optimise cross-format fetches, just make them acceptable. We should file a bug
for improving this, but I don't think it's particularly urgent.

FWIW, I think the correct way to fix this is to allow the receiving repository
to somehow store inventory-deltas that it can't add yet, so that it can work as
part of the regular suspend_write_group and get missing keys logic. Then the
sender can optimistically send a delta, and the receiver can either insert it or
store it in a temporary upload pack and ask for the parent keys, just as we
already do for all other types of deltas. This is optimal in the case where the
recipient already has the basis, and only requires one extra roundtrip in the
case that it doesn't. We can perhaps do this by creating an upload pack just to
store the fulltexts of inventory-deltas, including that pack in the resume
tokens, but making sure never to insert in the final commit.
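
As a very rough sketch of that receiver-side idea (every name below is
hypothetical; none of this exists in the patch):

    def insert_inventory_delta(sink, basis_id, delta_record):
        if sink.target_repo.has_revision(basis_id):
            # Basis present: apply immediately, as for any other delta.
            sink.apply_inventory_delta(basis_id, delta_record)
        else:
            # Park the serialised delta in an upload-only pack, include that
            # pack in the resume tokens, and report the basis as a missing
            # key so the sender's one extra round trip can supply it.
            sink.upload_pack.park(basis_id, delta_record)
            sink.missing_keys.add(('inventories', basis_id))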

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

John A Meinel wrote:
> Review: Needs Fixing
> So, to further my discussion. While investigating this, I found some odd bits
> in _stream_invs_as_deltas. For example:
[...]
>
> ^- Notice that once you've prepared a delta for "inv.revision_id" you then add
> "basis_id" to "invs_sent_so_far". Which AFAICT means you will send most of the
> inventories as complete texts, because the set won't reflect what you have
> actually sent.

Oops, right. I've merged your fix for that.

> +            yield versionedfile.InventoryDeltaContentFactory(
> +                key, parents, None, delta, basis_id, flags, from_repo)
>
> The way you've handled "no parents are available, use NULL" means that you
> actually create a delta against NULL for *every* revision, and check whether it is the
> smallest one. Which seems inefficient versus just using NULL when nothing else
> is available. (Note that IDS goes as far as to not use parents that aren't
> cached even if they have been sent, and to fall back to using the last-sent
> revision otherwise.)
>
> I've changed this loop around a bit, to avoid some duplication and make it a
> little bit clearer (at least to me) what is going on. I also added a quick
> LRUCache for the parent inventories that we've just sent, as re-extracting
> them from the repository is not going to be efficient. (Especially extracting
> them one at a time.)

Thanks, I've merged your fix.

> Going for that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.
>
> It seems like we should be consistent at that level. What do you think?

It would be nice, but not vital.

> As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:
[...]
> serializer.require_flags(**self._format_flags)
>
> It isn't vastly better, but at least they are named arguments, rather than
> fairly arbitrary positional ones.

Ok, I'll do that.

> In the end, I don't think that it is ideal that an incremental push of 1
> revision will transmit a complete inventory (in delta form). I understand the
> limitation is that we currently are unable to 'buffer' these records anywhere
> in a persistent state between RPC calls. (Even though we have a bytes-on-the
> wire representation, we don't have an index/etc to look them up in.)

(As said elsewhere on this thread...)

This is only an issue for a cross-format push, which doesn't have to be
maximally efficient. It just has to be reasonable. We can look at doing better
later if we want, but this is already a massive improvement in terms of data
transferred compared to IDS.

[...]
> For *me* the ssh handshake is probably signficantly more than the time to push
> up 2 inventories of bzr.dev, but that tradeoff is very different for people
> working over a tiny bandwidth from their GPRS phone, trying to push a critical
> bugfix for a Launchpad branch.

Then they probably should go to the effort of having their local branch in a
matching format :)

We're not intending to use inventory-deltas for anything but cross-format
fetches AFAIK.

> Again, the code as written fails to actually transmit data over the...


Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Mon, 2009-08-03 at 06:33 +0000, Andrew Bennetts wrote:
>
> > Digging further, the bytes on the wire don't include things like:
> > InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
> > InventoryDeltaContentFactory.sha1
>
> Inventories don't have parent lists, IIRC? Revisions do, and text
> versions do,
> but inventories don't. (They may have a compression parent, but not a
> semantic
> parent.)

Indeed. I wouldn't give them one in the deltas; let repositories that
care calculate one.
parents=None is valid for the interface..

> The sha1 is never set, although perhaps it should be.

No it shouldn't. the sha1 is format specific, and we don't convert back
to the original format to check it, so it would, at best, be discarded.
sha1 = None is valid for the interface as well

_Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Robert Collins wrote:
> On Mon, 2009-08-03 at 06:33 +0000, Andrew Bennetts wrote:
>>> Digging further, the bytes on the wire don't include things like:
>>> InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
>>> InventoryDeltaContentFactory.sha1
>> Inventories don't have parent lists, IIRC? Revisions do, and text
>> versions do,
>> but inventories don't. (They may have a compression parent, but not a
>> semantic
>> parent.)
>
> Indeed. I wouldn't give them one in the deltas; let repositories that
> care calculate one.
> parents=None is valid for the interface..

So... all of our inventory texts have parents in all repository formats
that exist *today*.

It isn't feasible to have them query the revisions, because in "knit"
streams they haven't received the revision data until after the
inventory data.

At first I thought we had fixed inventories in CHK, but I checked, and
it still has parents. And there is other code that assumes this fact.

Case in point, my new bundle sending/receiving code. It intentionally
queries the inventories.get_parent_map because
revisions.get_parent_map() has *not* been filled in yet.
(Bundle.insert_revisions always inserts objects as texts, inventories,
revisions.)

John
=:->

>
>> The sha1 is never set, although perhaps it should be.
>
> No it shouldn't. the sha1 is format specific, and we don't convert back
> to the original format to check it, so it would, at best, be discarded.
> sha1 = None is valid for the interface as well
>
> _Rob


Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Mon, 2009-08-03 at 13:30 +0000, John A Meinel wrote:
>
> > Indeed. I wouldn't give them one in the deltas; let repositories
> that
> > care calculate one.
> > parents=None is valid for the interface..
>
> So... all of our inventory texts have parents in all repository
> formats
> that exist *today*.
>
> It isn't feasible to have them query the revisions, because in "knit"
> streams they haven't received the revision data until after the
> inventory data.

ugh. ugh ugh ugh swear swear swear.

> At first I thought we had fixed inventories in CHK, but I checked, and
> it still has parents. And there is other code that assumes this fact.
>
> Case in point, my new bundle sending/receiving code. It intentionally
> queries the inventories.get_parent_map because
> revisions.get_parent_map() has *not* been filled in yet.
> (Bundle.insert_revisions always inserts objects as texts, inventories,
> revisions.)

Why does it need to do that though?

We're going to have to break this chain sometime.

If we can't at this stage, we'll need to either supply parents on the
serialised deltas, or add a parents field to the inventory serialisation
form. I prefer the former, myself.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

Ok, third time lucky! :)

Some notable changes since the last review:

  - Uses an inventory-deltas substream rather than inventing a new content factory;
  - Adds a few debug flags to control whether this code path is used or not;
  - Fixes some bugs relating to rich roots and to deletes in inventory_delta.py!

I was surprised to realise that, despite the expectations of inventory_delta.py, non-rich-root repos can have roots with IDs other than 'TREE_ROOT'. What they can't have is a root entry with a revision that doesn't match the inventory's revision.

This beats InterDifferingSerializer for a push of bzr.dev -r2000 from 1.9->2a over localhost HPSS by about 3x (9.5min vs. ~30min, although not on a totally quiescent laptop). LocalTransport push with IDS is about 10x faster than that, though.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
[...]
> If we can't at this stage, we'll need to either supply parents on the
> serialised deltas, or add a parents field to the inventory serialisation
> form. I prefer the former, myself.

FWIW, my current code sends the parents for the inventory in the .parents of the
FulltextContentFactory holding the serialised inventory-delta (and uses that
when calling add_inventory_by_delta).

So my patch maintains the status quo here. (I'm actually fairly sure that the
version of the code John reviewed which inspired this discussion was doing this
too, but that's irrelevant at this point.)

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
> Ok, third time lucky! :)
>
> Some notable changes since the last review:
>
> - Uses an inventory-deltas substream rather than inventing a new content factory;
> - Adds a few debug flags to control whether this code path is used or not;
> - Fixes some bugs relating to rich roots and to deletes in inventory_delta.py!
>
> I was surprised to realise that, despite the expectations of inventory_delta.py, non-rich-root repos can have roots with IDs other than 'TREE_ROOT'. What they can't have is a root entry with a revision that doesn't match the inventory's revision.
>
> This beats InterDifferingSerializer for a push of bzr.dev -r2000 from 1.9->2a over localhost HPSS by about 3x (9.5min vs. ~30min, although not on a totally quiescent laptop). LocalTransport push with IDS is about 10x faster than that, though.

So do I understand correctly that:

IDS => bzr:// ~30min
InventoryDelta => bzr:// 9.5min
IDS => file:// < 1min

So when I was looking at InventoryDelta (before we fixed it to actually
send deltas :) the #1 overhead was actually in "_generate_root_texts()"
because that was iterating over revision_trees and having to extract all
of the inventories yet again.

Anyway I'll give the new code a look over. Unfortunately there are still
a lot of conflated factors, like code that wants to transmit all of the
"texts" before we transmit *any* "inventories" content, which means
somewhere you need to do buffering.

IDS "works" by batching 100 at a time, so it only buffers the 100 or so
inventory deltas before it writes the root texts to the target repo.

John
=:->

Revision history for this message
Andrew Bennetts (spiv) wrote :

John Arbash Meinel wrote:
[...]
> So do I understand correctly that:
>
> IDS => bzr:// ~30min
> InventoryDelta => bzr:// 9.5min
> IDS => file:// < 1min

Yes, that's right.

> So when I was looking at InventoryDelta (before we fixed it to actually
> send deltas :) the #1 overhead was actually in "_generate_root_texts()"
> because that was iterating over revision_trees and having to extract all
> of the inventories yet again.

Right, and that's probably still the bottleneck. But even with that bottleneck
it's much faster over the network (even a very fast "network" like the loopback
interface), so I think it's worth merging.

> Anyway I'll give the new code a look over. Unfortunately there are still
> a lot of conflated factors, like code that wants to transmit all of the
> "texts" before we transmit *any* "inventories" content, which means
> somewhere you need to do buffering.
>
> IDS "works" by batching 100 at a time, so it only buffers the 100 or so
> inventory deltas before it writes the root texts to the target repo.

Yeah. It might be nice to somehow arrange similar batching when sending streams
over the network. If we can arrange to make these streams self-contained it
would make it easier to do incremental packing too.

Actually... all we need for incremental packing (which would fix the
"inventory-delta push to 2a is very bloated on disk until the stream is done")
is a way to be able to force a repack of an uncommitted pack (i.e. in upload/,
not inserted). That's probably not too hard to add, and then the StreamSink can
trigger that every N records or when the pack reaches N bytes or something.
I'll have a play with this.
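
i.e. something along these lines in the sink (a sketch only; the repack hook
is hypothetical and doesn't exist yet):

    REPACK_THRESHOLD = 32 * 1024 * 1024  # repack the upload pack every 32MB

    def insert_with_periodic_repack(sink, records):
        bytes_since_repack = 0
        for record in records:
            bytes_since_repack += sink.insert_record(record)  # hypothetical
            if bytes_since_repack >= REPACK_THRESHOLD:
                # Recompress the still-uncommitted pack in upload/ so
                # on-disk bloat stays bounded for the whole fetch.
                sink.repack_uncommitted_pack()  # hypothetical
                bytes_since_repack = 0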

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> So do I understand correctly that:
>
> IDS => bzr:// ~30min
> InventoryDelta => bzr:// 9.5min
> IDS => file:// < 1min

Also:

InventoryDelta => file:// ~9.5min

(Timings are intentionally a bit approximate, I haven't kept my laptop perfectly
idle while running these tests, etc.)

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :

> John A Meinel wrote:
> [...]
> > So do I understand correctly that:
> >
> > IDS => bzr:// ~30min
> > InventoryDelta => bzr:// 9.5min
> > IDS => file:// < 1min
>
> Also:
>
> InventoryDelta => file:// ~9.5min
>
> (Timings are intentionally a bit approximate, I haven't kept my laptop
> perfectly
> idle while running these tests, etc.)
>
> -Andrew.

So something isn't quite right with my timings as:

wbzr init-repo --2a test-2a
time wbzr push -d ../bzr/bzr.dev -r 2000 test-2a/bzr -DIDS:always
11m12.889s

I wonder if you didn't make a mistake in your timing of IDS.

In my timing of IDS versus InventoryDelta for bzrtools, it was more:

15.8s time wbzr push -d bzrtools bzr://localhost/test-2a/bzrt
19.1s time wbzr push -d bzrtools test-2a/bzrt

Which shows that IDS was actually *slower* than pushing using InventoryDelta over the local loopback.

Given the numbers you quote, 1m is *much* closer to just the simple:
  bzr init-repo --1.9 test-19
  bzr branch ../bzr/bzr.dev test-19/bzr

Which would be the simple non-converting time.

I'll be running a couple more tests to see if the new refactoring of IDS that you've done has made anything slower, but at least at a first glance the only thing I could find that would be better with IDS is that it doesn't have a second pass over all inventories in order to generate the root texts keys. And that certainly wouldn't explain 9.5m => 1.0m.

I suggest you run your timing test again, and make sure you've set everything up correctly.

I at least thought my laptop was faster than yours, though I'm on Windows and you may have upgraded your laptop since then.

$ time wbzr push -d ../bzr/bzr.dev -r 2000 bzr://localhost/test-2a/bzr -DIDS:never
real 4m32.578s

This is 4m32s down from 11m12s for IDS (file:// to file://). Maybe something did get broken. I'll be running some more tests.

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> So something isn't quite right with my timings as:
>
> wbzr init-repo --2a test-2a
> time wbzr push -d ../bzr/bzr.dev -r 2000 test-2a/bzr -DIDS:always
> 11m12.889s
>
> I wonder if you didn't make a mistake in your timing of IDS.

Yeah, that's likely, I think I probably forgot to init --2a on that run. (I
think I reran it with the correct setup then copy-&-pasted the number from the
wrong one). The other numbers should be fine.

My corrected time for that is 23m 1s! bzr.dev's IDS is much slower for me (29.5
minutes), so I certainly haven't regressed the performance. And the test suite
says I haven't regressed the correctness.

So my patch is looking better and better...

> In my timing of IDS versus InventoryDelta for bzrtools, it was more:
>
> 15.8s time wbzr push -d bzrtools bzr://localhost/test-2a/bzrt
> 19.1s time wbzr push -d bzrtools test-2a/bzrt
>
> Which shows that IDS was actually *slower* than pushing using InventoryDelta
> over the local loopback.

Right, I'm seeing that too with bzr -r2000. (And glancing at the log, I think
it's still slower even if you don't count the time IDS spends autopacking.)

[...]
> I'll be running a couple more tests to see if the new refactoring of IDS that
> you've done has made anything slower, but at least at a first glance the only
> thing I could find that would be better with IDS is that it doesn't have a
> second pass over all inventories in order to generate the root texts keys. And
> that certainly wouldn't explain 9.5m => 1.0m.

I've taken another look at my refactoring of IDS, and I don't see any obvious
problems with it.

> I suggest you run your timing test again, and make sure you've set everything
> up correctly.
>
> I at least thought my laptop was faster than yours, though I'm on Windows and
> you may have upgraded your laptop since then.

I haven't upgraded my laptop for, well, years :)

> $ time wbzr push -d ../bzr/bzr.dev -r 2000 bzr://localhost/test-2a/bzr -DIDS:never
> real 4m32.578s
>
> This is 4m32s down from 11m12s for IDS (file:// to file://). Maybe something
> did get broken. I'll be running some more tests.

Where did you get to with your tests?

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :

+* InterDifferingSerializer has been removed. The transformations it
+ provided are now done automatically by StreamSource. (Andrew Bennetts)
+

^- So for starters this isn't true, for now at least.

You added the debug flags "-DIDS:never" and "-DIDS:always". I wonder if they wouldn't be better as IDS_never and IDS_always, just because that would seem to fit better with how we have generally named them. Not a big deal.

+
+
+def _new_root_data_stream(
+    root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):

^- I find the wrapping to be a bit strange, since this and the next function don't have any parameters on the first line. More importantly, though, these should probably at least have a minimal docstring to help understand what is going on.
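
Something as small as this would do (a suggested docstring, not text from the
branch):

    def _new_root_data_stream(
        root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
        """Generate a texts substream of synthesised root entries.

        Used in fetches that do rich-root upgrades.

        :param root_keys_to_create: iterable of (root_id, rev_id) pairs
            describing the new root entries to create.
        :param rev_id_to_root_id_map: dict of known rev_id -> root_id mappings.
        :param parent_map: a parent map for all the revisions in
            root_keys_to_create.
        :param repo: the source repository.
        :param graph: an optional graph of repo, used when choosing parent
            keys for the new roots.
        """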

It is nice that you were able to factor out these helpers so that we can consistently generate the root keys and their parents. It is a shame that _new_root_data_stream is being called on:
        rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)

Which is implemented by iterating all revision trees, which requires parsing all of the revisions. Though it then immediately goes on to use those trees to generate the inventory deltas.

    def _find_root_ids(self, revs, parent_map, graph):
        revision_root = {}
        for tree in self.iter_rev_trees(revs):
            revision_id = tree.inventory.root.revision
            root_id = tree.get_root_id()
            revision_root[revision_id] = root_id
        # Find out which parents we don't already know root ids for
        parents = set()
        for revision_parents in parent_map.itervalues():
            parents.update(revision_parents)
        parents.difference_update(revision_root.keys() + [NULL_REVISION])

^- We've certainly had this pattern enough that we really should consider factoring it out into a helper. But it seems the preferred form is:

parents = set()
map(parents.update, parent_map.itervalues())
parents.difference_update(revision_root)
parents.discard(NULL_REVISION)

That should perform better than what we have. (No need to generate a list from keys() nor append to it NULL_REVISION, creating potentially a 3rd all-keys list.)

        # Limit to revisions present in the versionedfile
        parents = graph.get_parent_map(parents).keys()
        for tree in self.iter_rev_trees(parents):
            root_id = tree.get_root_id()
            revision_root[tree.get_revision_id()] = root_id
        return revision_root

I've been doing some more performance testing, and I've basically seen stuff
like:

bzr branch mysql-525 local -DIDS:always
3m15s

bzr branch mysql-525 local -DIDS:always --XML8-nocopy
2m30s

bzr branch mysql-525 local -DIDS:never
4m01s

bzr branch mysql-525 bzr://localhost/remote -DIDS:never
3m25s

So it is slower, but only about 25% slower (except for the extra-optimized
XML8-nocopy).

I think the new code converts almost as fast, and the extra repacking from IDS
actually costs it a bit of time (other than saving it disk space). The only
major deficit at this point is that there is no progress indication. So I'd
probably stick with IDS for local operations just because of that.
...


Revision history for this message
John A Meinel (jameinel) wrote :

Overall, this is looking pretty good. A few small tweaks here and there, and possible concerns. But only one major concern.

You changed the default return value of "iter_inventories()" to return 'unordered' results, which means that "revision_trees()" also now returns 'unordered' results. Which is fairly serious, and I was able to find more than one code path that was relying on the ordering of revision_trees(). So I think the idea is sound, but it needs to be exposed in a backwards compatible manner by making it a flag that defaults to "as-requested" ordering, and then we fix up code paths as we can.
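
Concretely, the backwards-compatible shape would be something like this (a
sketch of the suggested API, not the final code):

    def iter_inventories(self, revision_ids, ordering=None):
        """Iterate over the inventories for revision_ids.

        :param ordering: None (the default) keeps the old as-requested
            ordering; callers that can cope with any order pass 'unordered'
            explicitly and get whatever order is cheapest to produce.
        """
        if ordering is None:
            ordering = 'as-requested'
        return self._iter_inventories(revision_ids, ordering)  # illustrative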

I don't specifically need to review that change, though. And I don't really want to review this patch yet again :). So I'm voting tweak, and you can submit if you agree with my findings.

119 +        from bzrlib.graph import FrozenHeadsCache
120 +        graph = FrozenHeadsCache(graph)
121 +        new_roots_stream = _new_root_data_stream(
122 +            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
123 +        return [('texts', new_roots_stream)]

^- I'm pretty sure if we are using FrozenHeadsCache that we really want to use KnownGraph instead. We have a dict 'parent_map' which means we can do much more efficient heads() checks since the whole graph is already loaded. This is a minor thing we can do later, but it would probably be good not to forget.

...

392 -        elif last_modified[-1] == ':':
393 -            raise errors.BzrError('special revisionid found: %r' % line)
394 -        if not delta_tree_references and content.startswith('tree\x00'):
395 +        elif newpath_utf8 != 'None' and last_modified[-1] == ':':
396 +            # Deletes have a last_modified of null:, but otherwise special
397 +            # revision ids should not occur.
398 +            raise errors.BzrError('special revisionid found: %r' % line)
399 +        if delta_tree_references is False and content.startswith('tree\x00'):

^- What does "newpath_utf8 != 'None'" mean if someone does:

touch None
bzr add None
bzr commit -m "adding None"

Is this a serialization bug waiting to be exposed?

I guess not as it seems the paths are always prefixed with "/", right?

(So a path of None would actually be "newpath_utf8 == '/None'")

However, further down:

413             if newpath_utf8 == 'None':
414                 newpath = None

^- here you set "newpath=None" but you *don't* set "newpath_utf8" to None.

415 +           elif newpath_utf8[:1] != '/':
416 +               raise errors.BzrError(
417 +                   "newpath invalid (does not start with /): %r"
418 +                   % (newpath_utf8,))
419             else:
420 +               # Trim leading slash
421 +               newpath_utf8 = newpath_utf8[1:]
422                 newpath = newpath_utf8.decode('utf8')
423 +           content_tuple = tuple(content.split('\x00'))
424 +           if content_tuple[0] == 'deleted':
425 +               entry = None
426 +           else:
427 +               entry = _parse_entry(
428 +                   newpath_utf8, file_id, parent_id, last_modified,
429 +                   content_tuple)

And then here "newpath_utf8" is passed to _parse_entry.

Now I realize this is probably caught by "content_tuple[0] == 'deleted'". Though it feels a bit icky to rely on "newpath_utf8" in one portion and "content_tuple[0]" in another (since they can potentially be out of sync.)

I think if we just force an early error with:
if newpath_utf8 == 'None':
  newpath = None
  ...

review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-08-06 at 16:15 +0000, John A Meinel wrote:
>
> if utf8_path.startswith('/'):
>
> ^- If this is a core routine (something called for every path) then:
> if utf8_path[:1] == '/':
>
> *is* faster than .startswith() because you
> 1) Don't have a function call
> 2) Don't have an attribute lookup
>
> I'm assuming this is a function that gets called a lot. If not, don't
> worry about it.

utf8_path[:1] == '/' requires a string copy though, for all that it's
heavily tuned in the VM.
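
For what it's worth, the difference is easy to measure with plain timeit
(numbers will vary by interpreter):

    import timeit

    setup = "utf8_path = '/a/fairly/typical/path'"
    for stmt in ("utf8_path[:1] == '/'", "utf8_path.startswith('/')"):
        print stmt, timeit.Timer(stmt, setup).timeit(number=1000000)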

> ...
>
> 566 +        if required_version < (1, 18):
> 567 +            # Remote side doesn't support inventory deltas. Wrap the stream to
> 568 +            # make sure we don't send any. If the stream contains inventory
> 569 +            # deltas we'll interrupt the smart insert_stream request and
> 570 +            # fallback to VFS.
> 571 +            stream = self._stop_stream_if_inventory_delta(stream)
>
> ^- it seems a bit of a shame that if we don't support deltas we fall back to
> VFS completely, rather than trying something intermediate (like falling back
> to the original code path of sending full inventory texts, or IDS, or...)
>
> I think we are probably okay, but this code at least raises a flag. I expect
> a bug report along the lines of "fetching between 1.18 and older server is
> very slow". I haven't looked at all the code paths to determine if 1.18 will
> have regressed against a 1.17 server. Especially when *not* converting
> formats. Have you at least manually tested this?

We don't want to require rewindable streams; falling back to VFS is by
far the cleanest way to fallback without restarting the stream or
requiring rewinding. I agree that there is a risk of performance issues,
OTOH launchpad, our largest deployment, will upgrade quickly :).
...
> ^- I think this should probably be ".network_name() == other.network_name()"
> and we just customize the names to be the same. Is that possible to do?

It would be a little difficult with clients deployed already that don't
know the names are the same, and further, it would make it impossible to
request initialisation of one of them. In my review I'm suggesting just
serializer equality is all that's needed.

Anyhow, my review is about half done; getting back to it once the mail
surge is complete.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-08-05 at 02:45 +0000, Andrew Bennetts wrote:
> === modified file 'NEWS'

>
> +* InterDifferingSerializer has been removed. The transformations it
> + provided are now done automatically by StreamSource. (Andrew Bennetts)

^ has it?

> === modified file 'bzrlib/fetch.py'
> --- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
> +++ bzrlib/fetch.py 2009-07-29 07:08:54 +0000

> @@ -249,20 +251,77 @@
>          # yet, and are unlikely to in non-rich-root environments anyway.
>          root_id_order.sort(key=operator.itemgetter(0))
>          # Create a record stream containing the roots to create.
> -        def yield_roots():
> -            for key in root_id_order:
> -                root_id, rev_id = key
> -                rev_parents = parent_map[rev_id]
> -                # We drop revision parents with different file-ids, because
> -                # that represents a rename of the root to a different location
> -                # - its not actually a parent for us. (We could look for that
> -                # file id in the revision tree at considerably more expense,
> -                # but for now this is sufficient (and reconcile will catch and
> -                # correct this anyway).
> -                # When a parent revision is a ghost, we guess that its root id
> -                # was unchanged (rather than trimming it from the parent list).
> -                parent_keys = tuple((root_id, parent) for parent in rev_parents
> -                    if parent != NULL_REVISION and
> -                        rev_id_to_root_id.get(parent, root_id) == root_id)
> -                yield FulltextContentFactory(key, parent_keys, None, '')
> -        return [('texts', yield_roots())]
> +        from bzrlib.graph import FrozenHeadsCache
> +        graph = FrozenHeadsCache(graph)
> +        new_roots_stream = _new_root_data_stream(
> +            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
> +        return [('texts', new_roots_stream)]
> +

These functions have a lot of parameters. Perhaps that would work better
as state on the Source?

They need docs/more docs respectively.

> +def _new_root_data_stream(
> + root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
> ...
> +def _parent_keys_for_root_version(
> + root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):

> === modified file 'bzrlib/help_topics/en/debug-flags.txt'
> --- bzrlib/help_topics/en/debug-flags.txt 2009-07-24 03:15:56 +0000
> +++ bzrlib/help_topics/en/debug-flags.txt 2009-08-05 02:05:43 +0000
> @@ -12,6 +12,7 @@
> operations.
> -Dfetch Trace history copying between repositories.
> -Dfilters Emit information for debugging content filtering.
> +-Dforceinvdeltas Force use of inventory deltas during generic streaming fetch.
> -Dgraph Trace graph traversal.
> -Dhashcache Log every time a working file is read to determine its hash.
> -Dhooks Trace hook execution.
> @@ -26,3 +27,7 @@
> -Dunlock Some errors during unlock are treated as warnings.
> -Dpack Emit information about pack operations.
> -Ds...

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (12.6 KiB)

John A Meinel wrote:
> Review: Needs Fixing
> Overall, this is looking pretty good. A few small tweaks here and there, and
> possible concerns. But only one major concern.

That's great news. Thanks very much for the thorough review.

> You changed the default return value of "iter_inventories()" to return
> 'unordered' results, which means that "revision_trees()" also now returns
> 'unordered' results. Which is fairly serious, and I was able to find more than
> one code path that was relying on the ordering of revision_trees(). So I think
> the idea is sound, but it needs to be exposed in a backwards compatible manner
> by making it a flag that defaults to "as-requested" ordering, and then we fix
> up code paths as we can.
>
> I don't specifically need to review that change, though. And I don't really
> want to review this patch yet again :). So I'm voting tweak, and you can
> submit if you agree with my findings.

Ouch, yes, that does sound serious. I'll fix that.

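The fix, as it appears in the repository.py hunk of the diff below,
keeps ordering=None meaning "as requested" and buffers out-of-order
records. The same pattern, reduced to a sketch independent of bzrlib's
record types:

    def iter_in_requested_order(keys, fetch_unordered):
        # fetch_unordered(keys) yields (key, value) pairs in arbitrary
        # order; buffer values until the next requested key arrives.
        buffered = {}
        key_iter = iter(keys)
        try:
            next_key = key_iter.next()
        except StopIteration:
            return
        for key, value in fetch_unordered(keys):
            buffered[key] = value
            while next_key in buffered:
                yield buffered.pop(next_key)
                try:
                    next_key = key_iter.next()
                except StopIteration:
                    return

    # e.g. list(iter_in_requested_order(['a', 'b'],
    #     lambda keys: iter([('b', 2), ('a', 1)]))) == [1, 2]
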
> 119 + from bzrlib.graph import FrozenHeadsCache
> 120 + graph = FrozenHeadsCache(graph)
> 121 + new_roots_stream = _new_root_data_stream(
> 122 + root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
> 123 + return [('texts', new_roots_stream)]
>
> ^- I'm pretty sure if we are using FrozenHeadsCache that we really want to use
> KnownGraph instead. We have a dict 'parent_map' which means we can do much
> more efficient heads() checks since the whole graph is already loaded. This is
> a minor thing we can do later, but it would probably be good not to forget.

That's a good idea, I'll try that. Perhaps we should just grep the entire code
base for FrozenHeadsCache and replace all uses with KnownGraph...

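For reference, the suggested substitution looks like this (a sketch;
KnownGraph is assumed importable from bzrlib.graph, with the
implementations living in _known_graph_py/_known_graph_pyx):

    from bzrlib import graph

    # parent_map: key -> tuple of parent keys, already in memory, so
    # KnownGraph can answer heads() without any further I/O.
    parent_map = {'rev-1': (), 'rev-2': ('rev-1',), 'rev-3': ('rev-1',)}
    known_graph = graph.KnownGraph(parent_map)
    # Both siblings are heads: neither is an ancestor of the other.
    heads = known_graph.heads(['rev-2', 'rev-3'])
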
> ...
>
> 392 - elif last_modified[-1] == ':':
> 393 - raise errors.BzrError('special revisionid found: %r' % line)
> 394 - if not delta_tree_references and content.startswith('tree\x00'):
> 395 + elif newpath_utf8 != 'None' and last_modified[-1] == ':':
> 396 + # Deletes have a last_modified of null:, but otherwise special
> 397 + # revision ids should not occur.
> 398 + raise errors.BzrError('special revisionid found: %r' % line)
> 399 + if delta_tree_references is False and content.startswith('tree\x00'):
>
> ^- What does "newpath_utf8 != 'None'" mean if someone does:
>
> touch None
> bzr add None
> bzr commit -m "adding None"
>
> Is this a serialization bug waiting to be exposed?
>
> I guess not as it seems the paths are always prefixed with "/", right?
>
> (So a path of None would actually be "newpath_utf8 == '/None'")

That's right. No bug here.

> However, further down (sorry about the bad indenting):
>
> 413 if newpath_utf8 == 'None':
> 414 newpath = None
>
> ^- here you set "newpath=None" but you *don't* set "newpath_utf8" to None.
[...]
> And then here "newpath_utf8" is passed to _parse_entry.
>
> Now I realize this is probably caught by "content_tuple[0] == 'deleted'".
> Though it feels a bit icky to rely on "newpath_utf8" in one portion and
> "content_tuple[0]" in another (since they can potentially be out of sync.)

Yes. There's a bit of ickiness here too, in that _parse_entry redoes the utf8
decoding. I'll clean this up.

> I think if we jus...

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (24.0 KiB)

Robert Collins wrote:
> On Wed, 2009-08-05 at 02:45 +0000, Andrew Bennetts wrote:
> > === modified file 'NEWS'
>
> >
> > +* InterDifferingSerializer has been removed. The transformations it
> > + provided are now done automatically by StreamSource. (Andrew Bennetts)
>
> ^ has it?

Oops. Updated:

* InterDifferingSerializer is now only used locally. Other fetches that
  would have used InterDifferingSerializer now use the more network
  friendly StreamSource, which now automatically does the same
  transformations as InterDifferingSerializer. (Andrew Bennetts)

> > === modified file 'bzrlib/fetch.py'
> > --- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
> > +++ bzrlib/fetch.py 2009-07-29 07:08:54 +0000
>
> > @@ -249,20 +251,77 @@
> > # yet, and are unlikely to in non-rich-root environments anyway.
> > root_id_order.sort(key=operator.itemgetter(0))
> > # Create a record stream containing the roots to create.
> > - def yield_roots():
> > - for key in root_id_order:
> > - root_id, rev_id = key
> > - rev_parents = parent_map[rev_id]
> > - # We drop revision parents with different file-ids, because
> > - # that represents a rename of the root to a different location
> > - # - its not actually a parent for us. (We could look for that
> > - # file id in the revision tree at considerably more expense,
> > - # but for now this is sufficient (and reconcile will catch and
> > - # correct this anyway).
> > - # When a parent revision is a ghost, we guess that its root id
> > - # was unchanged (rather than trimming it from the parent list).
> > - parent_keys = tuple((root_id, parent) for parent in rev_parents
> > - if parent != NULL_REVISION and
> > - rev_id_to_root_id.get(parent, root_id) == root_id)
> > - yield FulltextContentFactory(key, parent_keys, None, '')
> > - return [('texts', yield_roots())]
> > + from bzrlib.graph import FrozenHeadsCache
> > + graph = FrozenHeadsCache(graph)
> > + new_roots_stream = _new_root_data_stream(
> > + root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
> > + return [('texts', new_roots_stream)]
> > +
>
> These functions have a lot of parameters. Perhaps that would work better
> as state on the Source?

Perhaps, although InterDifferingSerializer uses it too (that's where I
originally extracted this code from). Something to look at again when we remove
IDS, I think.

> They need docs/more docs respectively.

The methods they were refactored from didn't have docs either :P

Docs extended.

> > === modified file 'bzrlib/help_topics/en/debug-flags.txt'
> > --- bzrlib/help_topics/en/debug-flags.txt 2009-07-24 03:15:56 +0000
> > +++ bzrlib/help_topics/en/debug-flags.txt 2009-08-05 02:05:43 +0000
> > @@ -12,6 +12,7 @@
> > operations.
> > -Dfetch Trace history copying between repositories.
> > -Dfilters Emit information for debugging conte...

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (15.2 KiB)

On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:

I've trimmed things that are fine.

[debug flags]
> I hope we don't need to ask people to use them, but they are a cheap insurance
> policy.
>
> More usefully, they are helpful for benchmarking and testing. I'm ok with
> hiding them if you like.

I don't have a strong opinion. The flags are in our docs though as it
stands, so they should be clear to people reading them rather than
perhaps causing confusion.

> > > === modified file 'bzrlib/inventory_delta.py'
> > > --- bzrlib/inventory_delta.py 2009-04-02 05:53:12 +0000
> > > +++ bzrlib/inventory_delta.py 2009-08-05 02:30:11 +0000
> >
> > >
> > > - def __init__(self, versioned_root, tree_references):
> > > - """Create an InventoryDeltaSerializer.
> > > + def __init__(self):
> > > + """Create an InventoryDeltaSerializer."""
> > > + self._versioned_root = None
> > > + self._tree_references = None
> > > + self._entry_to_content = {
> > > + 'directory': _directory_content,
> > > + 'file': _file_content,
> > > + 'symlink': _link_content,
> > > + }
> > > +
> > > + def require_flags(self, versioned_root=None, tree_references=None):
> > > + """Set the versioned_root and/or tree_references flags for this
> > > + (de)serializer.
> >
> > ^ why is this not in the constructor? You make the fields settable only
> > once, which seems identical to being set from __init__, but harder to
> > use. As its required to be called, it should be documented in the class
> > or __init__ docstring or something like that.
>
> This is a small step towards a larger change that I don't want to tackle right
> now.
>
> We really ought to have separate classes for the serializer and for the
> deserializer. The serializer could indeed have these set at construction time,
> I think.
>
> For deserializing, we don't necessary know, or care, what the flags are; e.g. a
> repo that supports rich-root and tree-refs can deserialise any delta. It was
> pretty ugly trying to cope with declaring this upfront in the code; hence this
> method which defers it.

In principle yes, but our code always knows the source serializer,
doesn't it?

Anyhow, at the moment it seems unclear and confusing, rather than
clearer. I would find it clearer as either being on the __init__, I
think, or perhaps with two separate constructors? No biggie, not enough
to block the patch on, but it is a very awkward halfway house at the
moment - and I wouldn't want to see it left that way - so I'm concerned
that you might switch focus away from this straight after landing it.

> > > === modified file 'bzrlib/remote.py'
> [...]
> > > + @property
> > > + def repository_class(self):
> > > + self._ensure_real()
> > > + return self._custom_format.repository_class
> > > +
> >
> > ^ this property sets off alarm bells for me. What's it for?
>
> Hmm, good question... ah, I think it's for the to_format.repository_class in
> CHKRepository._get_source:
>
> def _get_source(self, to_format):
> """Return a source for streaming from this repository."""
> if (to_format.support...

Revision history for this message
Andrew Bennetts (spiv) wrote :

[not a complete reply, just for some points that have been dealt with]

Robert Collins wrote:
[...]
> > Ok, I've deleted the repository_class part of the if. Should I delete the
> > “to_format.supports_chk” part too?
>
> yes. self is chk_supporting, so self._serializer ==
> to_format.serializer, or whatever, is fine.

Done.

> > > > + if not unstacked_repo._format.supports_chks:
> > > > + # these assertions aren't valid for groupcompress repos, which may
> > > > + # transfer data than strictly necessary to avoid breaking up an
> > > > + # already-compressed block of data.
> > > > + self.assertFalse(unstacked_repo.has_revision('left'))
> > > > + self.assertFalse(unstacked_repo.has_revision('right'))
> > >
> > > ^ please check the comment, its not quite clear.
> >
> > s/transfer data/transfer more data/
> >
> > I'm not 100% certain that's actually true? What do you think with the adjusted
> > comment?
>
> Big alarm bell. gc repos aren't permitted to do that for the revision
> objects, because of our invariants about revisions. Something else is
> wrong - that if _must not_ be needed.

That if guard turns out to be unnecessary; the test passes for all scenarios
with those assertions run unconditionally. Whatever bug I had that prompted me
to add it is clearly gone.

-Andrew.

Revision history for this message
Martin Pool (mbp) wrote :

2009/8/7 Robert Collins <email address hidden>:
> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
>
> I've trimmed things that are fine.
>
> [debug flags]
>> I hope we don't need to ask people to use them, but they are a cheap insurance
>> policy.
>>
>> More usefully, they are helpful for benchmarking and testing.  I'm ok with
>> hiding them if you like.
>
> I don't have a strong opinion. The flags are in our docs though as it
> stands, so they should be clear to people reading them rather than
> perhaps causing confusion.

I don't want the merge to get hung up on them, but I think they're
worth putting in.

It might get confusing if there are lots of flags mentioned in the
user documentation; also we may want to distinguish the ones that
change behaviour from those that are safe to leave on all the time to
gather data.

Andrew and I talked about this the other day and the reasoning was
this: we observed he's testing and comparing this by commenting out
some code. If someone's doing that (and not just for a quick
ten-minute comparison), it _may_ be worth leaving a debug option to do
it in future, so that

1- they don't accidentally leave it disabled (as has sometimes happened before)
2- other people can quickly repeat the same test with just the same change
3- if it turns out that the change does have performance or
functionality issues, users can try it with the other behaviour

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
John A Meinel (jameinel) wrote :

Robert Collins wrote:
> On Thu, 2009-08-06 at 16:15 +0000, John A Meinel wrote:
>> if utf8_path.startswith('/'):
>>
>> ^- If this is a core routine (something called for every path) then:
>> if utf8_path[:1] == '/':
>>
>> *is* faster than .startswith() because you
>> 1) Don't have a function call
>> 2) Don't have an attribute lookup
>>
>> I'm assuming this is a function that gets called a lot. If not, don't
>> worry about it.
>
> utf8_path[:1] == '/' requires a string copy though, for all that its
> heavily tuned in the VM.

Well, a single-character string is a singleton, so it doesn't actually
require a malloc. Certainly that's a VM implementation detail, but
still, time it yourself.

A function call plus attribute lookup is *much* slower than mallocing a
string (by my testing).

John
=:->


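For anyone wanting to repeat the measurement, a quick timeit sketch
(the path literal is arbitrary and absolute numbers vary by Python
build; only the comparison between the two totals matters):

    import timeit

    setup = "utf8_path = '/some/path/in/a/tree'"
    for stmt in ("utf8_path[:1] == '/'", "utf8_path.startswith('/')"):
        timer = timeit.Timer(stmt, setup)
        # One million runs of each expression; compare the totals.
        print "%-28s %.3fs" % (stmt, timer.timeit(number=1000000))
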
Revision history for this message
John A Meinel (jameinel) wrote :

...

>>>> + if not unstacked_repo._format.supports_chks:
>>>> + # these assertions aren't valid for groupcompress repos, which may
>>>> + # transfer data than strictly necessary to avoid breaking up an
>>>> + # already-compressed block of data.
>>>> + self.assertFalse(unstacked_repo.has_revision('left'))
>>>> + self.assertFalse(unstacked_repo.has_revision('right'))
>>> ^ please check the comment, its not quite clear.
>> s/transfer data/transfer more data/
>>
>> I'm not 100% certain that's actually true? What do you think with the adjusted
>> comment?
>
> Big alarm bell. gc repos aren't permitted to do that for the revision
> objects, because of our invariants about revisions. Something else is
> wrong - that if _must not_ be needed.

Actually gc repos may send more data in a group, but they don't
*reference* the data. So the comment is actually incorrect. They don't
reference any more data than the strictly necessary set, and something
like "has_revision" should *definitely* be failing.

John
=:->

Revision history for this message
Andrew Bennetts (spiv) wrote :
Download full text (8.7 KiB)

Robert Collins wrote:
> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
[...]
> [debug flags]
> > I hope we don't need to ask people to use them, but they are a cheap insurance
> > policy.
> >
> > More usefully, they are helpful for benchmarking and testing. I'm ok with
> > hiding them if you like.
>
> I don't have a strong opinion. The flags are in our docs though as it
> stands, so they should be clear to people reading them rather than
> perhaps causing confusion.

I'll leave them in, assuming I can find a way to make the ReST markup cope with
them...

[inventory_delta.py API]
> Anyhow, at the moment it seems unclear and confusing, rather than
> clearer. I would find it clearer as either being on the __init__, I
> think, or perhaps with two seperate constructors? No biggy, not enough
> to block the patch on, but it is a very awkward halfway house at the
> moment - and I wouldn't want to see it left that way - so I'm concerned
> that you might switch focus away from this straight after landing it.

I've now split the class into separate serializer and deserializer, which gives
“two separate constructors”. I think it's an improvement; I hope you'll
think so too!

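Sketching the resulting API (the delta_to_lines method name is taken
from the bzrlib source; the entry values mirror the test fixtures):

    from bzrlib import inventory, inventory_delta

    # A one-item delta adding a versioned root, as in the tests.
    entry = inventory.make_entry('directory', u'', None, 'an-id')
    entry.revision = 'rev-1'
    delta = [(None, u'', 'an-id', entry)]

    # Serializer: the flags are fixed at construction time now.
    serializer = inventory_delta.InventoryDeltaSerializer(
        versioned_root=True, tree_references=True)
    lines = serializer.delta_to_lines('null:', 'rev-1', delta)

    # Deserializer: permissive by default; a target that lacks a
    # feature passes allow_*=False and gets IncompatibleInventoryDelta
    # instead of a generic error.
    deserializer = inventory_delta.InventoryDeltaDeserializer()
    parse_result = deserializer.parse_text_bytes(''.join(lines))
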
[repacking single pack bug]
> > I don't understand why that case isn't covered. Perhaps you should file the bug
> > instead of me?
>
> I'd like to make sure I explain it well enough first; once you
> understand either of us can file it - and I'll be happy enough to be
> that person.
[...]
> What you need to do is change the test from len(self.packs) == 1, to
> len(packs being combined) == 1 and that_pack has the same hash.

But AIUI “self.packs” on a Packer *is* “packs being combined”. If it's not then
your explanation makes sense, but my reading of the code says otherwise.

[...]
> > > > - if not hint or pack.name in hint:
> > > > + if hint is None or pack.name in hint:
> > >
> > > ^- the original form is actually faster, AFAIK. because it skips the in
> > > test for a hint of []. I'd rather we didn't change it, for all that its
> > > not in a common code path.
> >
> > But if hint is None, then “pack.name in hint” will fail.
>
> It would, but it won't execute. As we discussed, if you could change it
> back, but add a comment - it's clearly worthy of expanding on (in either
> form the necessity for testing hint isn't obvious, apparently).

Ok. (I still find it cleaner to have the hint is None case handled distinctly
from the hint is a list case, it seems to express the intent more clearly to me,
rather than making it appear to simply be a micro-optimisation. But I
understand that tastes vary.)

Oh, actually, it does matter. When hint == [], no packs should be repacked.
When hint is None, all packs should be repacked.

I've added this comment:

                # Either no hint was provided (so we are packing everything),
                # or this pack was included in the hint.

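To spell out the three cases that comment is guarding (a sketch, not
the pack_repo code):

    def packs_to_repack(all_packs, hint):
        # hint is None -> pack everything; hint == [] -> pack nothing;
        # otherwise pack only the packs named in the hint.
        if hint is None:
            return list(all_packs)
        return [pack for pack in all_packs if pack.name in hint]
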
[...]
> > > > + def _should_fake_unknown(self):
[...]
>
> As we discussed, I think the method should say what it does; the fact
> that the base class encodes more complex policy than a child is a smell
> (not one that you need to fix), but it would be very nice...

Read more...

1=== modified file 'NEWS'
2--- NEWS 2009-08-13 12:51:59 +0000
3+++ NEWS 2009-08-13 03:29:52 +0000
4@@ -158,6 +158,10 @@
5 lots of backtraces about ``UnknownSmartMethod``, ``do_chunk`` or
6 ``do_end``. (Andrew Bennetts, #338561)
7
8+* ``RemoteStreamSource.get_stream_for_missing_keys`` will fetch CHK
9+ inventory pages when appropriate (by falling back to the vfs stream
10+ source). (Andrew Bennetts, #406686)
11+
12 * Streaming from bzr servers where there is a chain of stacked branches
13 (A stacked on B stacked on C) will now work. (Robert Collins, #406597)
14
15@@ -249,8 +253,10 @@
16 * ``CHKMap.apply_delta`` now raises ``InconsistentDelta`` if a delta adds
17 as new a key which was already mapped. (Robert Collins)
18
19-* InterDifferingSerializer has been removed. The transformations it
20- provided are now done automatically by StreamSource. (Andrew Bennetts)
21+* InterDifferingSerializer is now only used locally. Other fetches that
22+ would have used InterDifferingSerializer now use the more network
23+ friendly StreamSource, which now automatically does the same
24+ transformations as InterDifferingSerializer. (Andrew Bennetts)
25
26 * Inventory delta application catches more cases of corruption and can
27 prevent corrupt deltas from affecting consistency of data structures on
28
29=== modified file 'bzrlib/fetch.py'
30--- bzrlib/fetch.py 2009-07-29 07:08:54 +0000
31+++ bzrlib/fetch.py 2009-08-07 04:27:02 +0000
32@@ -260,6 +260,19 @@
33
34 def _new_root_data_stream(
35 root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
36+ """Generate a texts substream of synthesised root entries.
37+
38+ Used in fetches that do rich-root upgrades.
39+
40+ :param root_keys_to_create: iterable of (root_id, rev_id) pairs describing
41+ the root entries to create.
42+ :param rev_id_to_root_id_map: dict of known rev_id -> root_id mappings for
43+ calculating the parents. If a parent rev_id is not found here then it
44+ will be recalculated.
45+ :param parent_map: a parent map for all the revisions in
46+ root_keys_to_create.
47+ :param graph: a graph to use instead of repo.get_graph().
48+ """
49 for root_key in root_keys_to_create:
50 root_id, rev_id = root_key
51 parent_keys = _parent_keys_for_root_version(
52@@ -270,7 +283,10 @@
53
54 def _parent_keys_for_root_version(
55 root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):
56- """Get the parent keys for a given root id."""
57+ """Get the parent keys for a given root id.
58+
59+ A helper function for _new_root_data_stream.
60+ """
61 # Include direct parents of the revision, but only if they used the same
62 # root_id and are heads.
63 rev_parents = parent_map[rev_id]
64
65=== modified file 'bzrlib/inventory_delta.py'
66--- bzrlib/inventory_delta.py 2009-08-05 02:30:11 +0000
67+++ bzrlib/inventory_delta.py 2009-08-11 08:40:32 +0000
68@@ -29,6 +29,25 @@
69 from bzrlib import inventory
70 from bzrlib.revision import NULL_REVISION
71
72+FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
73+
74+
75+class InventoryDeltaError(errors.BzrError):
76+ """An error when serializing or deserializing an inventory delta."""
77+
78+ # Most errors when serializing and deserializing are due to bugs, although
79+ # damaged input (i.e. a bug in a different process) could cause
80+ # deserialization errors too.
81+ internal_error = True
82+
83+
84+class IncompatibleInventoryDelta(errors.BzrError):
85+ """The delta could not be deserialised because its contents conflict with
86+ the allow_versioned_root or allow_tree_references flags of the
87+ deserializer.
88+ """
89+ internal_error = False
90+
91
92 def _directory_content(entry):
93 """Serialize the content component of entry which is a directory.
94@@ -49,7 +68,7 @@
95 exec_bytes = ''
96 size_exec_sha = (entry.text_size, exec_bytes, entry.text_sha1)
97 if None in size_exec_sha:
98- raise errors.BzrError('Missing size or sha for %s' % entry.file_id)
99+ raise InventoryDeltaError('Missing size or sha for %s' % entry.file_id)
100 return "file\x00%d\x00%s\x00%s" % size_exec_sha
101
102
103@@ -60,7 +79,7 @@
104 """
105 target = entry.symlink_target
106 if target is None:
107- raise errors.BzrError('Missing target for %s' % entry.file_id)
108+ raise InventoryDeltaError('Missing target for %s' % entry.file_id)
109 return "link\x00%s" % target.encode('utf8')
110
111
112@@ -71,7 +90,8 @@
113 """
114 tree_revision = entry.reference_revision
115 if tree_revision is None:
116- raise errors.BzrError('Missing reference revision for %s' % entry.file_id)
117+ raise InventoryDeltaError(
118+ 'Missing reference revision for %s' % entry.file_id)
119 return "tree\x00%s" % tree_revision
120
121
122@@ -116,39 +136,22 @@
123
124
125 class InventoryDeltaSerializer(object):
126- """Serialize and deserialize inventory deltas."""
127-
128- # XXX: really, the serializer and deserializer should be two separate
129- # classes.
130-
131- FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
132-
133- def __init__(self):
134- """Create an InventoryDeltaSerializer."""
135- self._versioned_root = None
136- self._tree_references = None
137+ """Serialize inventory deltas."""
138+
139+ def __init__(self, versioned_root, tree_references):
140+ """Create an InventoryDeltaSerializer.
141+
142+ :param versioned_root: If True, any root entry that is seen is expected
143+ to be versioned, and root entries can have any fileid.
144+ :param tree_references: If True support tree-reference entries.
145+ """
146+ self._versioned_root = versioned_root
147+ self._tree_references = tree_references
148 self._entry_to_content = {
149 'directory': _directory_content,
150 'file': _file_content,
151 'symlink': _link_content,
152 }
153-
154- def require_flags(self, versioned_root=None, tree_references=None):
155- """Set the versioned_root and/or tree_references flags for this
156- (de)serializer.
157-
158- :param versioned_root: If True, any root entry that is seen is expected
159- to be versioned, and root entries can have any fileid.
160- :param tree_references: If True support tree-reference entries.
161- """
162- if versioned_root is not None and self._versioned_root is not None:
163- raise AssertionError(
164- "require_flags(versioned_root=...) already called.")
165- if tree_references is not None and self._tree_references is not None:
166- raise AssertionError(
167- "require_flags(tree_references=...) already called.")
168- self._versioned_root = versioned_root
169- self._tree_references = tree_references
170 if tree_references:
171 self._entry_to_content['tree-reference'] = _reference_content
172
173@@ -167,10 +170,6 @@
174 takes.
175 :return: The serialized delta as lines.
176 """
177- if self._versioned_root is None or self._tree_references is None:
178- raise AssertionError(
179- "Cannot serialise unless versioned_root/tree_references flags "
180- "are both set.")
181 if type(old_name) is not str:
182 raise TypeError('old_name should be str, got %r' % (old_name,))
183 if type(new_name) is not str:
184@@ -180,11 +179,11 @@
185 for delta_item in delta_to_new:
186 line = to_line(delta_item, new_name)
187 if line.__class__ != str:
188- raise errors.BzrError(
189+ raise InventoryDeltaError(
190 'to_line generated non-str output %r' % lines[-1])
191 lines.append(line)
192 lines.sort()
193- lines[0] = "format: %s\n" % InventoryDeltaSerializer.FORMAT_1
194+ lines[0] = "format: %s\n" % FORMAT_1
195 lines[1] = "parent: %s\n" % old_name
196 lines[2] = "version: %s\n" % new_name
197 lines[3] = "versioned_root: %s\n" % self._serialize_bool(
198@@ -234,23 +233,37 @@
199 # file-ids other than TREE_ROOT, e.g. repo formats that use the
200 # xml5 serializer.
201 if last_modified != new_version:
202- raise errors.BzrError(
203+ raise InventoryDeltaError(
204 'Version present for / in %s (%s != %s)'
205 % (file_id, last_modified, new_version))
206 if last_modified is None:
207- raise errors.BzrError("no version for fileid %s" % file_id)
208+ raise InventoryDeltaError("no version for fileid %s" % file_id)
209 content = self._entry_to_content[entry.kind](entry)
210 return ("%s\x00%s\x00%s\x00%s\x00%s\x00%s\n" %
211 (oldpath_utf8, newpath_utf8, file_id, parent_id, last_modified,
212 content))
213
214+
215+class InventoryDeltaDeserializer(object):
216+ """Deserialize inventory deltas."""
217+
218+ def __init__(self, allow_versioned_root=True, allow_tree_references=True):
219+ """Create an InventoryDeltaDeserializer.
220+
221+ :param allow_versioned_root: If True, deltas containing versioned
222+ root entries (rich roots) can be deserialized.
223+ :param allow_tree_references: If True, deltas containing tree-reference entries can be deserialized.
224+ """
225+ self._allow_versioned_root = allow_versioned_root
226+ self._allow_tree_references = allow_tree_references
227+
228 def _deserialize_bool(self, value):
229 if value == "true":
230 return True
231 elif value == "false":
232 return False
233 else:
234- raise errors.BzrError("value %r is not a bool" % (value,))
235+ raise InventoryDeltaError("value %r is not a bool" % (value,))
236
237 def parse_text_bytes(self, bytes):
238 """Parse the text bytes of a serialized inventory delta.
239@@ -266,32 +279,24 @@
240 """
241 if bytes[-1:] != '\n':
242 last_line = bytes.rsplit('\n', 1)[-1]
243- raise errors.BzrError('last line not empty: %r' % (last_line,))
244+ raise InventoryDeltaError('last line not empty: %r' % (last_line,))
245 lines = bytes.split('\n')[:-1] # discard the last empty line
246- if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1:
247- raise errors.BzrError('unknown format %r' % lines[0:1])
248+ if not lines or lines[0] != 'format: %s' % FORMAT_1:
249+ raise InventoryDeltaError('unknown format %r' % lines[0:1])
250 if len(lines) < 2 or not lines[1].startswith('parent: '):
251- raise errors.BzrError('missing parent: marker')
252+ raise InventoryDeltaError('missing parent: marker')
253 delta_parent_id = lines[1][8:]
254 if len(lines) < 3 or not lines[2].startswith('version: '):
255- raise errors.BzrError('missing version: marker')
256+ raise InventoryDeltaError('missing version: marker')
257 delta_version_id = lines[2][9:]
258 if len(lines) < 4 or not lines[3].startswith('versioned_root: '):
259- raise errors.BzrError('missing versioned_root: marker')
260+ raise InventoryDeltaError('missing versioned_root: marker')
261 delta_versioned_root = self._deserialize_bool(lines[3][16:])
262 if len(lines) < 5 or not lines[4].startswith('tree_references: '):
263- raise errors.BzrError('missing tree_references: marker')
264+ raise InventoryDeltaError('missing tree_references: marker')
265 delta_tree_references = self._deserialize_bool(lines[4][17:])
266- if (self._versioned_root is not None and
267- delta_versioned_root != self._versioned_root):
268- raise errors.BzrError(
269- "serialized versioned_root flag is wrong: %s" %
270- (delta_versioned_root,))
271- if (self._tree_references is not None
272- and delta_tree_references != self._tree_references):
273- raise errors.BzrError(
274- "serialized tree_references flag is wrong: %s" %
275- (delta_tree_references,))
276+ if (not self._allow_versioned_root and delta_versioned_root):
277+ raise IncompatibleInventoryDelta("versioned_root not allowed")
278 result = []
279 seen_ids = set()
280 line_iter = iter(lines)
281@@ -302,24 +307,30 @@
282 content) = line.split('\x00', 5)
283 parent_id = parent_id or None
284 if file_id in seen_ids:
285- raise errors.BzrError(
286+ raise InventoryDeltaError(
287 "duplicate file id in inventory delta %r" % lines)
288 seen_ids.add(file_id)
289 if (newpath_utf8 == '/' and not delta_versioned_root and
290 last_modified != delta_version_id):
291- # Delta claims to be not rich root, yet here's a root entry
292- # with either non-default version, i.e. it's rich...
293- raise errors.BzrError("Versioned root found: %r" % line)
294+ # Delta claims to not have a versioned root, yet here's
295+ # a root entry with a non-default version.
296+ raise InventoryDeltaError("Versioned root found: %r" % line)
297 elif newpath_utf8 != 'None' and last_modified[-1] == ':':
298 # Deletes have a last_modified of null:, but otherwise special
299 # revision ids should not occur.
300- raise errors.BzrError('special revisionid found: %r' % line)
301- if delta_tree_references is False and content.startswith('tree\x00'):
302- raise errors.BzrError("Tree reference found: %r" % line)
303+ raise InventoryDeltaError('special revisionid found: %r' % line)
304+ if content.startswith('tree\x00'):
305+ if delta_tree_references is False:
306+ raise InventoryDeltaError(
307+ "Tree reference found (but header said "
308+ "tree_references: false): %r" % line)
309+ elif not self._allow_tree_references:
310+ raise IncompatibleInventoryDelta(
311+ "Tree reference not allowed")
312 if oldpath_utf8 == 'None':
313 oldpath = None
314 elif oldpath_utf8[:1] != '/':
315- raise errors.BzrError(
316+ raise InventoryDeltaError(
317 "oldpath invalid (does not start with /): %r"
318 % (oldpath_utf8,))
319 else:
320@@ -328,7 +339,7 @@
321 if newpath_utf8 == 'None':
322 newpath = None
323 elif newpath_utf8[:1] != '/':
324- raise errors.BzrError(
325+ raise InventoryDeltaError(
326 "newpath invalid (does not start with /): %r"
327 % (newpath_utf8,))
328 else:
329@@ -340,15 +351,14 @@
330 entry = None
331 else:
332 entry = _parse_entry(
333- newpath_utf8, file_id, parent_id, last_modified,
334- content_tuple)
335+ newpath, file_id, parent_id, last_modified, content_tuple)
336 delta_item = (oldpath, newpath, file_id, entry)
337 result.append(delta_item)
338 return (delta_parent_id, delta_version_id, delta_versioned_root,
339 delta_tree_references, result)
340
341
342-def _parse_entry(utf8_path, file_id, parent_id, last_modified, content):
343+def _parse_entry(path, file_id, parent_id, last_modified, content):
344 entry_factory = {
345 'dir': _dir_to_entry,
346 'file': _file_to_entry,
347@@ -356,10 +366,10 @@
348 'tree': _tree_to_entry,
349 }
350 kind = content[0]
351- if utf8_path.startswith('/'):
352+ if path.startswith('/'):
353 raise AssertionError
354- path = utf8_path.decode('utf8')
355 name = basename(path)
356 return entry_factory[content[0]](
357 content, name, parent_id, file_id, last_modified)
358
359+
360
361=== modified file 'bzrlib/remote.py'
362--- bzrlib/remote.py 2009-08-13 12:51:59 +0000
363+++ bzrlib/remote.py 2009-08-13 08:21:02 +0000
364@@ -587,11 +587,6 @@
365 self._ensure_real()
366 return self._custom_format._serializer
367
368- @property
369- def repository_class(self):
370- self._ensure_real()
371- return self._custom_format.repository_class
372-
373
374 class RemoteRepository(_RpcHelper):
375 """Repository accessed over rpc.
376@@ -1684,9 +1679,6 @@
377
378 class RemoteStreamSink(repository.StreamSink):
379
380- def __init__(self, target_repo):
381- repository.StreamSink.__init__(self, target_repo)
382-
383 def _insert_real(self, stream, src_format, resume_tokens):
384 self.target_repo._ensure_real()
385 sink = self.target_repo._real_repository._get_sink()
386@@ -1708,6 +1700,10 @@
387 client = target._client
388 medium = client._medium
389 path = target.bzrdir._path_for_remote_call(client)
390+ # Probe for the verb to use with an empty stream before sending the
391+ # real stream to it. We do this both to avoid the risk of sending a
392+ # large request that is then rejected, and because we don't want to
393+ # implement a way to buffer, rewind, or restart the stream.
394 found_verb = False
395 for verb, required_version in candidate_calls:
396 if medium._is_remote_before(required_version):
397
398=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
399--- bzrlib/repofmt/groupcompress_repo.py 2009-08-13 12:51:59 +0000
400+++ bzrlib/repofmt/groupcompress_repo.py 2009-08-13 08:20:28 +0000
401@@ -484,6 +484,8 @@
402 old_pack = self.packs[0]
403 if old_pack.name == self.new_pack._hash.hexdigest():
404 # The single old pack was already optimally packed.
405+ trace.mutter('single pack %s was already optimally packed',
406+ old_pack.name)
407 self.new_pack.abort()
408 return None
409 self.pb.update('finishing repack', 6, 7)
410@@ -600,7 +602,7 @@
411 packer = GCCHKPacker(self, packs, '.autopack',
412 reload_func=reload_func)
413 try:
414- packer.pack()
415+ result = packer.pack()
416 except errors.RetryWithNewPacks:
417 # An exception is propagating out of this context, make sure
418 # this packer has cleaned up. Packer() doesn't set its new_pack
419@@ -609,6 +611,8 @@
420 if packer.new_pack is not None:
421 packer.new_pack.abort()
422 raise
423+ if result is None:
424+ return
425 for pack in packs:
426 self._remove_pack_from_memory(pack)
427 # record the newly available packs and stop advertising the old
428@@ -792,6 +796,8 @@
429
430 def _iter_inventories(self, revision_ids, ordering):
431 """Iterate over many inventory objects."""
432+ if ordering is None:
433+ ordering = 'unordered'
434 keys = [(revision_id,) for revision_id in revision_ids]
435 stream = self.inventories.get_record_stream(keys, ordering, True)
436 texts = {}
437@@ -903,9 +909,7 @@
438
439 def _get_source(self, to_format):
440 """Return a source for streaming from this repository."""
441- if (to_format.supports_chks and
442- self._format.repository_class is to_format.repository_class and
443- self._format._serializer == to_format._serializer):
444+ if self._format._serializer == to_format._serializer:
445 # We must be exactly the same format, otherwise stuff like the chk
446 # page layout might be different.
447 # Actually, this test is just slightly looser than exact so that
448
449=== modified file 'bzrlib/repofmt/pack_repo.py'
450--- bzrlib/repofmt/pack_repo.py 2009-08-13 12:51:59 +0000
451+++ bzrlib/repofmt/pack_repo.py 2009-08-13 07:26:29 +0000
452@@ -1575,6 +1575,8 @@
453 pack_operations = [[0, []]]
454 for pack in self.all_packs():
455 if hint is None or pack.name in hint:
456+ # Either no hint was provided (so we are packing everything),
457+ # or this pack was included in the hint.
458 pack_operations[-1][0] += pack.get_revision_count()
459 pack_operations[-1][1].append(pack)
460 self._execute_pack_operations(pack_operations, OptimisingPacker)
461
462=== modified file 'bzrlib/repository.py'
463--- bzrlib/repository.py 2009-08-13 12:51:59 +0000
464+++ bzrlib/repository.py 2009-08-13 08:56:51 +0000
465@@ -1537,6 +1537,8 @@
466 """Commit the contents accrued within the current write group.
467
468 :seealso: start_write_group.
469+
470+ :return: it may return an opaque hint that can be passed to 'pack'.
471 """
472 if self._write_group is not self.get_transaction():
473 # has an unlock or relock occured ?
474@@ -2348,7 +2350,7 @@
475 """Get Inventory object by revision id."""
476 return self.iter_inventories([revision_id]).next()
477
478- def iter_inventories(self, revision_ids, ordering='unordered'):
479+ def iter_inventories(self, revision_ids, ordering=None):
480 """Get many inventories by revision_ids.
481
482 This will buffer some or all of the texts used in constructing the
483@@ -2356,7 +2358,9 @@
484 time.
485
486 :param revision_ids: The expected revision ids of the inventories.
487- :param ordering: optional ordering, e.g. 'topological'.
488+ :param ordering: optional ordering, e.g. 'topological'. If not
489+ specified, the order of revision_ids will be preserved (by
490+ buffering if necessary).
491 :return: An iterator of inventories.
492 """
493 if ((None in revision_ids)
494@@ -2370,29 +2374,41 @@
495 for text, revision_id in inv_xmls:
496 yield self.deserialise_inventory(revision_id, text)
497
498- def _iter_inventory_xmls(self, revision_ids, ordering='unordered'):
499+ def _iter_inventory_xmls(self, revision_ids, ordering):
500+ if ordering is None:
501+ order_as_requested = True
502+ ordering = 'unordered'
503+ else:
504+ order_as_requested = False
505 keys = [(revision_id,) for revision_id in revision_ids]
506 if not keys:
507 return
508- key_iter = iter(keys)
509- next_key = key_iter.next()
510+ if order_as_requested:
511+ key_iter = iter(keys)
512+ next_key = key_iter.next()
513 stream = self.inventories.get_record_stream(keys, ordering, True)
514 text_chunks = {}
515 for record in stream:
516 if record.storage_kind != 'absent':
517- text_chunks[record.key] = record.get_bytes_as('chunked')
518+ chunks = record.get_bytes_as('chunked')
519+ if order_as_requested:
520+ text_chunks[record.key] = chunks
521+ else:
522+ yield ''.join(chunks), record.key[-1]
523 else:
524 raise errors.NoSuchRevision(self, record.key)
525- while next_key in text_chunks:
526- chunks = text_chunks.pop(next_key)
527- yield ''.join(chunks), next_key[-1]
528- try:
529- next_key = key_iter.next()
530- except StopIteration:
531- # We still want to fully consume the get_record_stream,
532- # just in case it is not actually finished at this point
533- next_key = None
534- break
535+ if order_as_requested:
536+ # Yield as many results as we can while preserving order.
537+ while next_key in text_chunks:
538+ chunks = text_chunks.pop(next_key)
539+ yield ''.join(chunks), next_key[-1]
540+ try:
541+ next_key = key_iter.next()
542+ except StopIteration:
543+ # We still want to fully consume the get_record_stream,
544+ # just in case it is not actually finished at this point
545+ next_key = None
546+ break
547
548 def deserialise_inventory(self, revision_id, xml):
549 """Transform the xml into an inventory object.
550@@ -4224,20 +4240,14 @@
551 for record in substream:
552 # Insert the delta directly
553 inventory_delta_bytes = record.get_bytes_as('fulltext')
554- deserialiser = inventory_delta.InventoryDeltaSerializer()
555- parse_result = deserialiser.parse_text_bytes(inventory_delta_bytes)
556+ deserialiser = inventory_delta.InventoryDeltaDeserializer()
557+ try:
558+ parse_result = deserialiser.parse_text_bytes(
559+ inventory_delta_bytes)
560+ except inventory_delta.IncompatibleInventoryDelta, err:
561+ trace.mutter("Incompatible delta: %s", err.msg)
562+ raise errors.IncompatibleRevision(self.target_repo._format)
563 basis_id, new_id, rich_root, tree_refs, inv_delta = parse_result
564- # Make sure the delta is compatible with the target
565- if rich_root and not target_rich_root:
566- raise errors.IncompatibleRevision(self.target_repo._format)
567- if tree_refs and not target_tree_refs:
568- # The source supports tree refs and the target doesn't. Check
569- # the delta for tree refs; if it has any we can't insert it.
570- for delta_item in inv_delta:
571- entry = delta_item[3]
572- if entry.kind == 'tree-reference':
573- raise errors.IncompatibleRevision(
574- self.target_repo._format)
575 revision_id = new_id
576 parents = [key[0] for key in record.parents]
577 self.target_repo.add_inventory_by_delta(
578@@ -4404,10 +4414,6 @@
579 # Some missing keys are genuinely ghosts, filter those out.
580 present = self.from_repository.inventories.get_parent_map(keys)
581 revs = [key[0] for key in present]
582- # As with the original stream, we may need to generate root
583- # texts for the inventories we're about to stream.
584- for _ in self._generate_root_texts(revs):
585- yield _
586 # Get the inventory stream more-or-less as we do for the
587 # original stream; there's no reason to assume that records
588 # direct from the source will be suitable for the sink. (Think
589@@ -4474,7 +4480,7 @@
590
591 def _get_convertable_inventory_stream(self, revision_ids,
592 delta_versus_null=False):
593- # The source is using CHKs, but the target either doesn't or is has a
594+ # The source is using CHKs, but the target either doesn't or it has a
595 # different serializer. The StreamSink code expects to be able to
596 # convert on the target, so we need to put bytes-on-the-wire that can
597 # be converted. That means inventory deltas (if the remote is <1.18,
598@@ -4499,17 +4505,17 @@
599 # method...
600 inventories = self.from_repository.iter_inventories(
601 revision_ids, 'topological')
602- # XXX: ideally these flags would be per-revision, not per-repo (e.g.
603- # streaming a non-rich-root revision out of a rich-root repo back into
604- # a non-rich-root repo ought to be allowed)
605 format = from_repo._format
606- flags = (format.rich_root_data, format.supports_tree_reference)
607 invs_sent_so_far = set([_mod_revision.NULL_REVISION])
608 inventory_cache = lru_cache.LRUCache(50)
609 null_inventory = from_repo.revision_tree(
610 _mod_revision.NULL_REVISION).inventory
611- serializer = inventory_delta.InventoryDeltaSerializer()
612- serializer.require_flags(*flags)
613+ # XXX: ideally the rich-root/tree-refs flags would be per-revision, not
614+ # per-repo (e.g. streaming a non-rich-root revision out of a rich-root
615+ # repo back into a non-rich-root repo ought to be allowed)
616+ serializer = inventory_delta.InventoryDeltaSerializer(
617+ versioned_root=format.rich_root_data,
618+ tree_references=format.supports_tree_reference)
619 for inv in inventories:
620 key = (inv.revision_id,)
621 parent_keys = parent_map.get(key, ())
622
623=== modified file 'bzrlib/smart/repository.py'
624--- bzrlib/smart/repository.py 2009-08-04 00:51:24 +0000
625+++ bzrlib/smart/repository.py 2009-08-13 08:20:53 +0000
626@@ -424,18 +424,21 @@
627 return None # Signal that we want a body.
628
629 def _should_fake_unknown(self):
630- # This is a workaround for bugs in pre-1.18 clients that claim to
631- # support receiving streams of CHK repositories. The pre-1.18 client
632- # expects inventory records to be serialized in the format defined by
633- # to_network_name, but in pre-1.18 (at least) that format definition
634- # tries to use the xml5 serializer, which does not correctly handle
635- # rich-roots. After 1.18 the client can also accept inventory-deltas
636- # (which avoids this issue), and those clients will use the
637- # Repository.get_stream_1.18 verb instead of this one.
638- # So: if this repository is CHK, and the to_format doesn't match,
639- # we should just fake an UnknownSmartMethod error so that the client
640- # will fallback to VFS, rather than sending it a stream we know it
641- # cannot handle.
642+ """Return True if we should return UnknownMethod to the client.
643+
644+ This is a workaround for bugs in pre-1.18 clients that claim to
645+ support receiving streams of CHK repositories. The pre-1.18 client
646+ expects inventory records to be serialized in the format defined by
647+ to_network_name, but in pre-1.18 (at least) that format definition
648+ tries to use the xml5 serializer, which does not correctly handle
649+ rich-roots. After 1.18 the client can also accept inventory-deltas
650+ (which avoids this issue), and those clients will use the
651+ Repository.get_stream_1.18 verb instead of this one.
652+ So: if this repository is CHK, and the to_format doesn't match,
653+ we should just fake an UnknownSmartMethod error so that the client
654+ will fallback to VFS, rather than sending it a stream we know it
655+ cannot handle.
656+ """
657 from_format = self._repository._format
658 to_format = self._to_format
659 if not from_format.supports_chks:
660@@ -489,8 +492,7 @@
661 class SmartServerRepositoryGetStream_1_18(SmartServerRepositoryGetStream):
662
663 def _should_fake_unknown(self):
664- # The client is at least 1.18, so we don't need to work around any
665- # bugs.
666+ """Returns False; we don't need to workaround bugs in 1.18+ clients."""
667 return False
668
669
670
671=== modified file 'bzrlib/tests/per_interrepository/__init__.py'
672--- bzrlib/tests/per_interrepository/__init__.py 2009-08-13 12:51:59 +0000
673+++ bzrlib/tests/per_interrepository/__init__.py 2009-08-13 03:30:41 +0000
674@@ -46,12 +46,12 @@
675 """Transform the input formats to a list of scenarios.
676
677 :param formats: A list of tuples:
678- (interrepo_class, repository_format, repository_format_to).
679+ (label, repository_format, repository_format_to).
680 """
681 result = []
682- for repository_format, repository_format_to in formats:
683- id = '%s,%s' % (repository_format.__class__.__name__,
684- repository_format_to.__class__.__name__)
685+ for label, repository_format, repository_format_to in formats:
686+ id = '%s,%s,%s' % (label, repository_format.__class__.__name__,
687+ repository_format_to.__class__.__name__)
688 scenario = (id,
689 {"transport_server": transport_server,
690 "transport_readonly_server": transport_readonly_server,
691@@ -71,8 +71,8 @@
692 weaverepo,
693 )
694 result = []
695- def add_combo(from_format, to_format):
696- result.append((from_format, to_format))
697+ def add_combo(label, from_format, to_format):
698+ result.append((label, from_format, to_format))
699 # test the default InterRepository between format 6 and the current
700 # default format.
701 # XXX: robertc 20060220 reinstate this when there are two supported
702@@ -83,32 +83,47 @@
703 for optimiser_class in InterRepository._optimisers:
704 format_to_test = optimiser_class._get_repo_format_to_test()
705 if format_to_test is not None:
706- add_combo(format_to_test, format_to_test)
707+ add_combo(optimiser_class.__name__, format_to_test, format_to_test)
708 # if there are specific combinations we want to use, we can add them
709 # here. We want to test rich root upgrading.
710- add_combo(weaverepo.RepositoryFormat5(),
711- knitrepo.RepositoryFormatKnit3())
712- add_combo(knitrepo.RepositoryFormatKnit1(),
713- knitrepo.RepositoryFormatKnit3())
714- add_combo(knitrepo.RepositoryFormatKnit1(),
715+ # XXX: although we attach InterRepository class names to these scenarios,
716+ # there's nothing asserting that these labels correspond to what is
717+ # actually used.
718+ add_combo('InterRepository',
719+ weaverepo.RepositoryFormat5(),
720+ knitrepo.RepositoryFormatKnit3())
721+ add_combo('InterRepository',
722+ knitrepo.RepositoryFormatKnit1(),
723+ knitrepo.RepositoryFormatKnit3())
724+ add_combo('InterKnitRepo',
725+ knitrepo.RepositoryFormatKnit1(),
726 pack_repo.RepositoryFormatKnitPack1())
727- add_combo(pack_repo.RepositoryFormatKnitPack1(),
728+ add_combo('InterKnitRepo',
729+ pack_repo.RepositoryFormatKnitPack1(),
730 knitrepo.RepositoryFormatKnit1())
731- add_combo(knitrepo.RepositoryFormatKnit3(),
732+ add_combo('InterKnitRepo',
733+ knitrepo.RepositoryFormatKnit3(),
734 pack_repo.RepositoryFormatKnitPack3())
735- add_combo(pack_repo.RepositoryFormatKnitPack3(),
736+ add_combo('InterKnitRepo',
737+ pack_repo.RepositoryFormatKnitPack3(),
738 knitrepo.RepositoryFormatKnit3())
739- add_combo(pack_repo.RepositoryFormatKnitPack3(),
740+ add_combo('InterKnitRepo',
741+ pack_repo.RepositoryFormatKnitPack3(),
742 pack_repo.RepositoryFormatKnitPack4())
743- add_combo(pack_repo.RepositoryFormatKnitPack1(),
744- pack_repo.RepositoryFormatKnitPack6RichRoot())
745- add_combo(pack_repo.RepositoryFormatKnitPack6RichRoot(),
746- groupcompress_repo.RepositoryFormat2a())
747- add_combo(groupcompress_repo.RepositoryFormat2a(),
748- pack_repo.RepositoryFormatKnitPack6RichRoot())
749- add_combo(groupcompress_repo.RepositoryFormatCHK2(),
750- groupcompress_repo.RepositoryFormat2a())
751- add_combo(groupcompress_repo.RepositoryFormatCHK1(),
752+ add_combo('InterDifferingSerializer',
753+ pack_repo.RepositoryFormatKnitPack1(),
754+ pack_repo.RepositoryFormatKnitPack6RichRoot())
755+ add_combo('InterDifferingSerializer',
756+ pack_repo.RepositoryFormatKnitPack6RichRoot(),
757+ groupcompress_repo.RepositoryFormat2a())
758+ add_combo('InterDifferingSerializer',
759+ groupcompress_repo.RepositoryFormat2a(),
760+ pack_repo.RepositoryFormatKnitPack6RichRoot())
761+ add_combo('InterRepository',
762+ groupcompress_repo.RepositoryFormatCHK2(),
763+ groupcompress_repo.RepositoryFormat2a())
764+ add_combo('InterDifferingSerializer',
765+ groupcompress_repo.RepositoryFormatCHK1(),
766 groupcompress_repo.RepositoryFormat2a())
767 return result
768
769
770=== modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
771--- bzrlib/tests/per_interrepository/test_fetch.py 2009-08-13 12:51:59 +0000
772+++ bzrlib/tests/per_interrepository/test_fetch.py 2009-08-13 08:55:59 +0000
773@@ -191,8 +191,19 @@
774 ['left', 'right'])
775 self.assertEqual(left_tree.inventory, stacked_left_tree.inventory)
776 self.assertEqual(right_tree.inventory, stacked_right_tree.inventory)
777-
778+
779+ # Finally, it's not enough to see that the basis inventories are
780+ # present. The texts introduced in merge (and only those) should be
781+ # present, and also generating a stream should succeed without blowing
782+ # up.
783 self.assertTrue(unstacked_repo.has_revision('merge'))
784+ expected_texts = set([('file-id', 'merge')])
785+ if stacked_branch.repository.texts.get_parent_map([('root-id',
786+ 'merge')]):
787+ # If a (root-id,merge) text exists, it should be in the stacked
788+ # repo.
789+ expected_texts.add(('root-id', 'merge'))
790+ self.assertEqual(expected_texts, unstacked_repo.texts.keys())
791 self.assertCanStreamRevision(unstacked_repo, 'merge')
792
793 def assertCanStreamRevision(self, repo, revision_id):
794@@ -241,6 +252,19 @@
795 self.addCleanup(stacked_branch.unlock)
796 stacked_second_tree = stacked_branch.repository.revision_tree('second')
797 self.assertEqual(second_tree.inventory, stacked_second_tree.inventory)
798+ # Finally, it's not enough to see that the basis inventories are
799+ # present. The texts introduced in merge (and only those) should be
800+ # present, and also generating a stream should succeed without blowing
801+ # up.
802+ self.assertTrue(unstacked_repo.has_revision('third'))
803+ expected_texts = set([('file-id', 'third')])
804+ if stacked_branch.repository.texts.get_parent_map([('root-id',
805+ 'third')]):
806+ # If a (root-id,third) text exists, it should be in the stacked
807+ # repo.
808+ expected_texts.add(('root-id', 'third'))
809+ self.assertEqual(expected_texts, unstacked_repo.texts.keys())
810+ self.assertCanStreamRevision(unstacked_repo, 'third')
811
812 def test_fetch_missing_basis_text(self):
813 """If fetching a delta, we should die if a basis is not present."""
814
815=== modified file 'bzrlib/tests/per_pack_repository.py'
816--- bzrlib/tests/per_pack_repository.py 2009-08-12 22:28:28 +0000
817+++ bzrlib/tests/per_pack_repository.py 2009-08-13 03:29:52 +0000
818@@ -1051,8 +1051,8 @@
819 tree.branch.push(remote_branch)
820 autopack_calls = len([call for call in self.hpss_calls if call ==
821 'PackRepository.autopack'])
822- streaming_calls = len([call for call in self.hpss_calls if call ==
823- 'Repository.insert_stream'])
824+ streaming_calls = len([call for call in self.hpss_calls if call in
825+ ('Repository.insert_stream', 'Repository.insert_stream_1.18')])
826 if autopack_calls:
827 # Non streaming server
828 self.assertEqual(1, autopack_calls)
829
830=== modified file 'bzrlib/tests/test_inventory_delta.py'
831--- bzrlib/tests/test_inventory_delta.py 2009-08-05 02:30:11 +0000
832+++ bzrlib/tests/test_inventory_delta.py 2009-08-11 06:52:07 +0000
833@@ -26,6 +26,7 @@
834 inventory,
835 inventory_delta,
836 )
837+from bzrlib.inventory_delta import InventoryDeltaError
838 from bzrlib.inventory import Inventory
839 from bzrlib.revision import NULL_REVISION
840 from bzrlib.tests import TestCase
841@@ -93,34 +94,34 @@
842 """Test InventoryDeltaSerializer.parse_text_bytes."""
843
844 def test_parse_no_bytes(self):
845- serializer = inventory_delta.InventoryDeltaSerializer()
846+ deserializer = inventory_delta.InventoryDeltaDeserializer()
847 err = self.assertRaises(
848- errors.BzrError, serializer.parse_text_bytes, '')
849+ InventoryDeltaError, deserializer.parse_text_bytes, '')
850 self.assertContainsRe(str(err), 'last line not empty')
851
852 def test_parse_bad_format(self):
853- serializer = inventory_delta.InventoryDeltaSerializer()
854- err = self.assertRaises(errors.BzrError,
855- serializer.parse_text_bytes, 'format: foo\n')
856+ deserializer = inventory_delta.InventoryDeltaDeserializer()
857+ err = self.assertRaises(InventoryDeltaError,
858+ deserializer.parse_text_bytes, 'format: foo\n')
859 self.assertContainsRe(str(err), 'unknown format')
860
861 def test_parse_no_parent(self):
862- serializer = inventory_delta.InventoryDeltaSerializer()
863- err = self.assertRaises(errors.BzrError,
864- serializer.parse_text_bytes,
865+ deserializer = inventory_delta.InventoryDeltaDeserializer()
866+ err = self.assertRaises(InventoryDeltaError,
867+ deserializer.parse_text_bytes,
868 'format: bzr inventory delta v1 (bzr 1.14)\n')
869 self.assertContainsRe(str(err), 'missing parent: marker')
870
871 def test_parse_no_version(self):
872- serializer = inventory_delta.InventoryDeltaSerializer()
873- err = self.assertRaises(errors.BzrError,
874- serializer.parse_text_bytes,
875+ deserializer = inventory_delta.InventoryDeltaDeserializer()
876+ err = self.assertRaises(InventoryDeltaError,
877+ deserializer.parse_text_bytes,
878 'format: bzr inventory delta v1 (bzr 1.14)\n'
879 'parent: null:\n')
880 self.assertContainsRe(str(err), 'missing version: marker')
881
882 def test_parse_duplicate_key_errors(self):
883- serializer = inventory_delta.InventoryDeltaSerializer()
884+ deserializer = inventory_delta.InventoryDeltaDeserializer()
885 double_root_lines = \
886 """format: bzr inventory delta v1 (bzr 1.14)
887 parent: null:
888@@ -130,14 +131,13 @@
889 None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
890 None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
891 """
892- err = self.assertRaises(errors.BzrError,
893- serializer.parse_text_bytes, double_root_lines)
894+ err = self.assertRaises(InventoryDeltaError,
895+ deserializer.parse_text_bytes, double_root_lines)
896 self.assertContainsRe(str(err), 'duplicate file id')
897
898 def test_parse_versioned_root_only(self):
899- serializer = inventory_delta.InventoryDeltaSerializer()
900- serializer.require_flags(versioned_root=True, tree_references=True)
901- parse_result = serializer.parse_text_bytes(root_only_lines)
902+ deserializer = inventory_delta.InventoryDeltaDeserializer()
903+ parse_result = deserializer.parse_text_bytes(root_only_lines)
904 expected_entry = inventory.make_entry(
905 'directory', u'', None, 'an-id')
906 expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'
907@@ -147,8 +147,7 @@
908 parse_result)
909
910 def test_parse_special_revid_not_valid_last_mod(self):
911- serializer = inventory_delta.InventoryDeltaSerializer()
912- serializer.require_flags(versioned_root=False, tree_references=True)
913+ deserializer = inventory_delta.InventoryDeltaDeserializer()
914 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
915 parent: null:
916 version: null:
917@@ -156,13 +155,12 @@
918 tree_references: true
919 None\x00/\x00TREE_ROOT\x00\x00null:\x00dir\x00\x00
920 """
921- err = self.assertRaises(errors.BzrError,
922- serializer.parse_text_bytes, root_only_lines)
923+ err = self.assertRaises(InventoryDeltaError,
924+ deserializer.parse_text_bytes, root_only_lines)
925 self.assertContainsRe(str(err), 'special revisionid found')
926
927 def test_parse_versioned_root_versioned_disabled(self):
928- serializer = inventory_delta.InventoryDeltaSerializer()
929- serializer.require_flags(versioned_root=False, tree_references=True)
930+ deserializer = inventory_delta.InventoryDeltaDeserializer()
931 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
932 parent: null:
933 version: null:
934@@ -170,13 +168,12 @@
935 tree_references: true
936 None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
937 """
938- err = self.assertRaises(errors.BzrError,
939- serializer.parse_text_bytes, root_only_lines)
940+ err = self.assertRaises(InventoryDeltaError,
941+ deserializer.parse_text_bytes, root_only_lines)
942 self.assertContainsRe(str(err), 'Versioned root found')
943
944 def test_parse_unique_root_id_root_versioned_disabled(self):
945- serializer = inventory_delta.InventoryDeltaSerializer()
946- serializer.require_flags(versioned_root=False, tree_references=True)
947+ deserializer = inventory_delta.InventoryDeltaDeserializer()
948 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
949 parent: parent-id
950 version: a@e\xc3\xa5ample.com--2004
951@@ -184,29 +181,38 @@
952 tree_references: true
953 None\x00/\x00an-id\x00\x00parent-id\x00dir\x00\x00
954 """
955- err = self.assertRaises(errors.BzrError,
956- serializer.parse_text_bytes, root_only_lines)
957+ err = self.assertRaises(InventoryDeltaError,
958+ deserializer.parse_text_bytes, root_only_lines)
959 self.assertContainsRe(str(err), 'Versioned root found')
960
961 def test_parse_unversioned_root_versioning_enabled(self):
962- serializer = inventory_delta.InventoryDeltaSerializer()
963- serializer.require_flags(versioned_root=True, tree_references=True)
964- err = self.assertRaises(errors.BzrError,
965- serializer.parse_text_bytes, root_only_unversioned)
966- self.assertContainsRe(
967- str(err), 'serialized versioned_root flag is wrong: False')
968+ deserializer = inventory_delta.InventoryDeltaDeserializer()
969+ parse_result = deserializer.parse_text_bytes(root_only_unversioned)
970+ expected_entry = inventory.make_entry(
971+ 'directory', u'', None, 'TREE_ROOT')
972+ expected_entry.revision = 'entry-version'
973+ self.assertEqual(
974+ ('null:', 'entry-version', False, False,
975+ [(None, u'', 'TREE_ROOT', expected_entry)]),
976+ parse_result)
977+
978+ def test_parse_versioned_root_when_disabled(self):
979+ deserializer = inventory_delta.InventoryDeltaDeserializer(
980+ allow_versioned_root=False)
981+ err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
982+ deserializer.parse_text_bytes, root_only_lines)
983+ self.assertEquals("versioned_root not allowed", str(err))
984
985 def test_parse_tree_when_disabled(self):
986- serializer = inventory_delta.InventoryDeltaSerializer()
987- serializer.require_flags(versioned_root=True, tree_references=False)
988- err = self.assertRaises(errors.BzrError,
989- serializer.parse_text_bytes, reference_lines)
990- self.assertContainsRe(
991- str(err), 'serialized tree_references flag is wrong: True')
992+ deserializer = inventory_delta.InventoryDeltaDeserializer(
993+ allow_tree_references=False)
994+ err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
995+ deserializer.parse_text_bytes, reference_lines)
996+ self.assertEquals("Tree reference not allowed", str(err))
997
998 def test_parse_tree_when_header_disallows(self):
999 # A deserializer that allows tree_references to be set or unset.
1000- serializer = inventory_delta.InventoryDeltaSerializer()
1001+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1002 # A serialised inventory delta with a header saying no tree refs, but
1003 # that has a tree ref in its content.
1004 lines = """format: bzr inventory delta v1 (bzr 1.14)
1005@@ -216,13 +222,13 @@
1006 tree_references: false
1007 None\x00/foo\x00id\x00TREE_ROOT\x00changed\x00tree\x00subtree-version
1008 """
1009- err = self.assertRaises(errors.BzrError,
1010- serializer.parse_text_bytes, lines)
1011+ err = self.assertRaises(InventoryDeltaError,
1012+ deserializer.parse_text_bytes, lines)
1013 self.assertContainsRe(str(err), 'Tree reference found')
1014
1015 def test_parse_versioned_root_when_header_disallows(self):
1016 # A deserializer that allows tree_references to be set or unset.
1017- serializer = inventory_delta.InventoryDeltaSerializer()
1018+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1019 # A serialised inventory delta with a header saying no tree refs, but
1020 # that has a tree ref in its content.
1021 lines = """format: bzr inventory delta v1 (bzr 1.14)
1022@@ -232,35 +238,35 @@
1023 tree_references: false
1024 None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir
1025 """
1026- err = self.assertRaises(errors.BzrError,
1027- serializer.parse_text_bytes, lines)
1028+ err = self.assertRaises(InventoryDeltaError,
1029+ deserializer.parse_text_bytes, lines)
1030 self.assertContainsRe(str(err), 'Versioned root found')
1031
1032 def test_parse_last_line_not_empty(self):
1033 """newpath must start with / if it is not None."""
1034 # Trim the trailing newline from a valid serialization
1035 lines = root_only_lines[:-1]
1036- serializer = inventory_delta.InventoryDeltaSerializer()
1037- err = self.assertRaises(errors.BzrError,
1038- serializer.parse_text_bytes, lines)
1039+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1040+ err = self.assertRaises(InventoryDeltaError,
1041+ deserializer.parse_text_bytes, lines)
1042 self.assertContainsRe(str(err), 'last line not empty')
1043
1044 def test_parse_invalid_newpath(self):
1045 """newpath must start with / if it is not None."""
1046 lines = empty_lines
1047 lines += "None\x00bad\x00TREE_ROOT\x00\x00version\x00dir\n"
1048- serializer = inventory_delta.InventoryDeltaSerializer()
1049- err = self.assertRaises(errors.BzrError,
1050- serializer.parse_text_bytes, lines)
1051+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1052+ err = self.assertRaises(InventoryDeltaError,
1053+ deserializer.parse_text_bytes, lines)
1054 self.assertContainsRe(str(err), 'newpath invalid')
1055
1056 def test_parse_invalid_oldpath(self):
1057 """oldpath must start with / if it is not None."""
1058 lines = root_only_lines
1059 lines += "bad\x00/new\x00file-id\x00\x00version\x00dir\n"
1060- serializer = inventory_delta.InventoryDeltaSerializer()
1061- err = self.assertRaises(errors.BzrError,
1062- serializer.parse_text_bytes, lines)
1063+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1064+ err = self.assertRaises(InventoryDeltaError,
1065+ deserializer.parse_text_bytes, lines)
1066 self.assertContainsRe(str(err), 'oldpath invalid')
1067
1068 def test_parse_new_file(self):
1069@@ -270,8 +276,8 @@
1070 lines += (
1071 "None\x00/new\x00file-id\x00an-id\x00version\x00file\x00123\x00" +
1072 "\x00" + fake_sha + "\n")
1073- serializer = inventory_delta.InventoryDeltaSerializer()
1074- parse_result = serializer.parse_text_bytes(lines)
1075+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1076+ parse_result = deserializer.parse_text_bytes(lines)
1077 expected_entry = inventory.make_entry(
1078 'file', u'new', 'an-id', 'file-id')
1079 expected_entry.revision = 'version'
1080@@ -285,8 +291,8 @@
1081 lines = root_only_lines
1082 lines += (
1083 "/old-file\x00None\x00deleted-id\x00\x00null:\x00deleted\x00\x00\n")
1084- serializer = inventory_delta.InventoryDeltaSerializer()
1085- parse_result = serializer.parse_text_bytes(lines)
1086+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1087+ parse_result = deserializer.parse_text_bytes(lines)
1088 delta = parse_result[4]
1089 self.assertEqual(
1090 (u'old-file', None, 'deleted-id', None), delta[-1])
1091@@ -299,8 +305,8 @@
1092 old_inv = Inventory(None)
1093 new_inv = Inventory(None)
1094 delta = new_inv._make_delta(old_inv)
1095- serializer = inventory_delta.InventoryDeltaSerializer()
1096- serializer.require_flags(True, True)
1097+ serializer = inventory_delta.InventoryDeltaSerializer(
1098+ versioned_root=True, tree_references=True)
1099 self.assertEqual(StringIO(empty_lines).readlines(),
1100 serializer.delta_to_lines(NULL_REVISION, NULL_REVISION, delta))
1101
1102@@ -311,8 +317,8 @@
1103 root.revision = 'a@e\xc3\xa5ample.com--2004'
1104 new_inv.add(root)
1105 delta = new_inv._make_delta(old_inv)
1106- serializer = inventory_delta.InventoryDeltaSerializer()
1107- serializer.require_flags(versioned_root=True, tree_references=True)
1108+ serializer = inventory_delta.InventoryDeltaSerializer(
1109+ versioned_root=True, tree_references=True)
1110 self.assertEqual(StringIO(root_only_lines).readlines(),
1111 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
1112
1113@@ -324,15 +330,16 @@
1114 root.revision = 'entry-version'
1115 new_inv.add(root)
1116 delta = new_inv._make_delta(old_inv)
1117- serializer = inventory_delta.InventoryDeltaSerializer()
1118- serializer.require_flags(False, False)
1119+ serializer = inventory_delta.InventoryDeltaSerializer(
1120+ versioned_root=False, tree_references=False)
1121 serialized_lines = serializer.delta_to_lines(
1122 NULL_REVISION, 'entry-version', delta)
1123 self.assertEqual(StringIO(root_only_unversioned).readlines(),
1124 serialized_lines)
1125+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1126 self.assertEqual(
1127 (NULL_REVISION, 'entry-version', False, False, delta),
1128- serializer.parse_text_bytes(''.join(serialized_lines)))
1129+ deserializer.parse_text_bytes(''.join(serialized_lines)))
1130
1131 def test_unversioned_non_root_errors(self):
1132 old_inv = Inventory(None)
1133@@ -343,9 +350,9 @@
1134 non_root = new_inv.make_entry('directory', 'foo', root.file_id, 'id')
1135 new_inv.add(non_root)
1136 delta = new_inv._make_delta(old_inv)
1137- serializer = inventory_delta.InventoryDeltaSerializer()
1138- serializer.require_flags(versioned_root=True, tree_references=True)
1139- err = self.assertRaises(errors.BzrError,
1140+ serializer = inventory_delta.InventoryDeltaSerializer(
1141+ versioned_root=True, tree_references=True)
1142+ err = self.assertRaises(InventoryDeltaError,
1143 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1144 self.assertEqual(str(err), 'no version for fileid id')
1145
1146@@ -355,9 +362,9 @@
1147 root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
1148 new_inv.add(root)
1149 delta = new_inv._make_delta(old_inv)
1150- serializer = inventory_delta.InventoryDeltaSerializer()
1151- serializer.require_flags(versioned_root=True, tree_references=True)
1152- err = self.assertRaises(errors.BzrError,
1153+ serializer = inventory_delta.InventoryDeltaSerializer(
1154+ versioned_root=True, tree_references=True)
1155+ err = self.assertRaises(InventoryDeltaError,
1156 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1157 self.assertEqual(str(err), 'no version for fileid TREE_ROOT')
1158
1159@@ -368,9 +375,9 @@
1160 root.revision = 'a@e\xc3\xa5ample.com--2004'
1161 new_inv.add(root)
1162 delta = new_inv._make_delta(old_inv)
1163- serializer = inventory_delta.InventoryDeltaSerializer()
1164- serializer.require_flags(versioned_root=False, tree_references=True)
1165- err = self.assertRaises(errors.BzrError,
1166+ serializer = inventory_delta.InventoryDeltaSerializer(
1167+ versioned_root=False, tree_references=True)
1168+ err = self.assertRaises(InventoryDeltaError,
1169 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
1170 self.assertStartsWith(str(err), 'Version present for / in TREE_ROOT')
1171
1172@@ -385,8 +392,8 @@
1173 non_root.kind = 'strangelove'
1174 new_inv.add(non_root)
1175 delta = new_inv._make_delta(old_inv)
1176- serializer = inventory_delta.InventoryDeltaSerializer()
1177- serializer.require_flags(True, True)
1178+ serializer = inventory_delta.InventoryDeltaSerializer(
1179+ versioned_root=True, tree_references=True)
1180 # we expect keyerror because there is little value wrapping this.
1181 # This test aims to prove that it errors more than how it errors.
1182 err = self.assertRaises(KeyError,
1183@@ -405,8 +412,8 @@
1184 non_root.reference_revision = 'subtree-version'
1185 new_inv.add(non_root)
1186 delta = new_inv._make_delta(old_inv)
1187- serializer = inventory_delta.InventoryDeltaSerializer()
1188- serializer.require_flags(versioned_root=True, tree_references=False)
1189+ serializer = inventory_delta.InventoryDeltaSerializer(
1190+ versioned_root=True, tree_references=False)
1191 # we expect keyerror because there is little value wrapping this.
1192 # This test aims to prove that it errors more than how it errors.
1193 err = self.assertRaises(KeyError,
1194@@ -425,8 +432,8 @@
1195 non_root.reference_revision = 'subtree-version'
1196 new_inv.add(non_root)
1197 delta = new_inv._make_delta(old_inv)
1198- serializer = inventory_delta.InventoryDeltaSerializer()
1199- serializer.require_flags(versioned_root=True, tree_references=True)
1200+ serializer = inventory_delta.InventoryDeltaSerializer(
1201+ versioned_root=True, tree_references=True)
1202 self.assertEqual(StringIO(reference_lines).readlines(),
1203 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
1204
1205@@ -434,19 +441,19 @@
1206 root_entry = inventory.make_entry('directory', '', None, 'TREE_ROOT')
1207 root_entry.revision = 'some-version'
1208 delta = [(None, '', 'TREE_ROOT', root_entry)]
1209- serializer = inventory_delta.InventoryDeltaSerializer()
1210- serializer.require_flags(versioned_root=False, tree_references=True)
1211+ serializer = inventory_delta.InventoryDeltaSerializer(
1212+ versioned_root=False, tree_references=True)
1213 self.assertRaises(
1214- errors.BzrError, serializer.delta_to_lines, 'old-version',
1215+ InventoryDeltaError, serializer.delta_to_lines, 'old-version',
1216 'new-version', delta)
1217
1218 def test_to_inventory_root_id_not_versioned(self):
1219 delta = [(None, '', 'an-id', inventory.make_entry(
1220 'directory', '', None, 'an-id'))]
1221- serializer = inventory_delta.InventoryDeltaSerializer()
1222- serializer.require_flags(versioned_root=True, tree_references=True)
1223+ serializer = inventory_delta.InventoryDeltaSerializer(
1224+ versioned_root=True, tree_references=True)
1225 self.assertRaises(
1226- errors.BzrError, serializer.delta_to_lines, 'old-version',
1227+ InventoryDeltaError, serializer.delta_to_lines, 'old-version',
1228 'new-version', delta)
1229
1230 def test_to_inventory_has_tree_not_meant_to(self):
1231@@ -459,9 +466,9 @@
1232 (None, 'foo', 'ref-id', tree_ref)
1233 # a file that followed the root move
1234 ]
1235- serializer = inventory_delta.InventoryDeltaSerializer()
1236- serializer.require_flags(versioned_root=True, tree_references=True)
1237- self.assertRaises(errors.BzrError, serializer.delta_to_lines,
1238+ serializer = inventory_delta.InventoryDeltaSerializer(
1239+ versioned_root=True, tree_references=True)
1240+ self.assertRaises(InventoryDeltaError, serializer.delta_to_lines,
1241 'old-version', 'new-version', delta)
1242
1243 def test_to_inventory_torture(self):
1244@@ -511,8 +518,8 @@
1245 executable=True, text_size=30, text_sha1='some-sha',
1246 revision='old-rev')),
1247 ]
1248- serializer = inventory_delta.InventoryDeltaSerializer()
1249- serializer.require_flags(versioned_root=True, tree_references=True)
1250+ serializer = inventory_delta.InventoryDeltaSerializer(
1251+ versioned_root=True, tree_references=True)
1252 lines = serializer.delta_to_lines(NULL_REVISION, 'something', delta)
1253 expected = """format: bzr inventory delta v1 (bzr 1.14)
1254 parent: null:
1255@@ -565,13 +572,13 @@
1256 def test_file_without_size(self):
1257 file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
1258 file_entry.text_sha1 = 'foo'
1259- self.assertRaises(errors.BzrError,
1260+ self.assertRaises(InventoryDeltaError,
1261 inventory_delta._file_content, file_entry)
1262
1263 def test_file_without_sha1(self):
1264 file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
1265 file_entry.text_size = 10
1266- self.assertRaises(errors.BzrError,
1267+ self.assertRaises(InventoryDeltaError,
1268 inventory_delta._file_content, file_entry)
1269
1270 def test_link_empty_target(self):
1271@@ -594,7 +601,7 @@
1272
1273 def test_link_no_target(self):
1274 entry = inventory.make_entry('symlink', 'a link', None)
1275- self.assertRaises(errors.BzrError,
1276+ self.assertRaises(InventoryDeltaError,
1277 inventory_delta._link_content, entry)
1278
1279 def test_reference_null(self):
1280@@ -611,5 +618,5 @@
1281
1282 def test_reference_no_reference(self):
1283 entry = inventory.make_entry('tree-reference', 'a tree', None)
1284- self.assertRaises(errors.BzrError,
1285+ self.assertRaises(InventoryDeltaError,
1286 inventory_delta._reference_content, entry)
1287
1288=== modified file 'bzrlib/tests/test_remote.py'
1289--- bzrlib/tests/test_remote.py 2009-08-13 12:51:59 +0000
1290+++ bzrlib/tests/test_remote.py 2009-08-13 03:29:52 +0000
1291@@ -2213,7 +2213,26 @@
1292 self.assertEqual([], client._calls)
1293
1294
1295-class TestRepositoryInsertStream(TestRemoteRepository):
1296+class TestRepositoryInsertStreamBase(TestRemoteRepository):
1297+ """Base class for Repository.insert_stream and .insert_stream_1.18
1298+ tests.
1299+ """
1300+
1301+ def checkInsertEmptyStream(self, repo, client):
1302+ """Insert an empty stream, checking the result.
1303+
1304+ This checks that there are no resume_tokens or missing_keys, and that
1305+ the client is finished.
1306+ """
1307+ sink = repo._get_sink()
1308+ fmt = repository.RepositoryFormat.get_default_format()
1309+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1310+ self.assertEqual([], resume_tokens)
1311+ self.assertEqual(set(), missing_keys)
1312+ self.assertFinished(client)
1313+
1314+
1315+class TestRepositoryInsertStream(TestRepositoryInsertStreamBase):
1316 """Tests for using Repository.insert_stream verb when the _1.18 variant is
1317 not available.
1318
1319@@ -2236,12 +2255,7 @@
1320 client.add_expected_call(
1321 'Repository.insert_stream', ('quack/', ''),
1322 'success', ('ok',))
1323- sink = repo._get_sink()
1324- fmt = repository.RepositoryFormat.get_default_format()
1325- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1326- self.assertEqual([], resume_tokens)
1327- self.assertEqual(set(), missing_keys)
1328- self.assertFinished(client)
1329+ self.checkInsertEmptyStream(repo, client)
1330
1331 def test_locked_repo_with_no_lock_token(self):
1332 transport_path = 'quack'
1333@@ -2259,12 +2273,7 @@
1334 'Repository.insert_stream', ('quack/', ''),
1335 'success', ('ok',))
1336 repo.lock_write()
1337- sink = repo._get_sink()
1338- fmt = repository.RepositoryFormat.get_default_format()
1339- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1340- self.assertEqual([], resume_tokens)
1341- self.assertEqual(set(), missing_keys)
1342- self.assertFinished(client)
1343+ self.checkInsertEmptyStream(repo, client)
1344
1345 def test_locked_repo_with_lock_token(self):
1346 transport_path = 'quack'
1347@@ -2282,18 +2291,13 @@
1348 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
1349 'success', ('ok',))
1350 repo.lock_write()
1351- sink = repo._get_sink()
1352- fmt = repository.RepositoryFormat.get_default_format()
1353- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1354- self.assertEqual([], resume_tokens)
1355- self.assertEqual(set(), missing_keys)
1356- self.assertFinished(client)
1357+ self.checkInsertEmptyStream(repo, client)
1358
1359 def test_stream_with_inventory_deltas(self):
1360- """'inventory-deltas' substreams can't be sent to the
1361- Repository.insert_stream verb. So when one is encountered the
1362- RemoteSink immediately stops using that verb and falls back to VFS
1363- insert_stream.
1364+ """'inventory-deltas' substreams cannot be sent to the
1365+ Repository.insert_stream verb, because not all servers that implement
1366+ that verb will accept them. So when one is encountered the RemoteSink
1367+ immediately stops using that verb and falls back to VFS insert_stream.
1368 """
1369 transport_path = 'quack'
1370 repo, client = self.setup_fake_client_and_repository(transport_path)
1371@@ -2368,8 +2372,8 @@
1372 'directory', 'newdir', inv.root.file_id, 'newdir-id')
1373 entry.revision = 'ghost'
1374 delta = [(None, 'newdir', 'newdir-id', entry)]
1375- serializer = inventory_delta.InventoryDeltaSerializer()
1376- serializer.require_flags(True, False)
1377+ serializer = inventory_delta.InventoryDeltaSerializer(
1378+ versioned_root=True, tree_references=False)
1379 lines = serializer.delta_to_lines('rev1', 'rev2', delta)
1380 yield versionedfile.ChunkedContentFactory(
1381 ('rev2',), (('rev1',)), None, lines)
1382@@ -2380,7 +2384,7 @@
1383 return stream_with_inv_delta()
1384
1385
1386-class TestRepositoryInsertStream_1_18(TestRemoteRepository):
1387+class TestRepositoryInsertStream_1_18(TestRepositoryInsertStreamBase):
1388
1389 def test_unlocked_repo(self):
1390 transport_path = 'quack'
1391@@ -2391,12 +2395,7 @@
1392 client.add_expected_call(
1393 'Repository.insert_stream_1.18', ('quack/', ''),
1394 'success', ('ok',))
1395- sink = repo._get_sink()
1396- fmt = repository.RepositoryFormat.get_default_format()
1397- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1398- self.assertEqual([], resume_tokens)
1399- self.assertEqual(set(), missing_keys)
1400- self.assertFinished(client)
1401+ self.checkInsertEmptyStream(repo, client)
1402
1403 def test_locked_repo_with_no_lock_token(self):
1404 transport_path = 'quack'
1405@@ -2411,12 +2410,7 @@
1406 'Repository.insert_stream_1.18', ('quack/', ''),
1407 'success', ('ok',))
1408 repo.lock_write()
1409- sink = repo._get_sink()
1410- fmt = repository.RepositoryFormat.get_default_format()
1411- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1412- self.assertEqual([], resume_tokens)
1413- self.assertEqual(set(), missing_keys)
1414- self.assertFinished(client)
1415+ self.checkInsertEmptyStream(repo, client)
1416
1417 def test_locked_repo_with_lock_token(self):
1418 transport_path = 'quack'
1419@@ -2431,12 +2425,7 @@
1420 'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
1421 'success', ('ok',))
1422 repo.lock_write()
1423- sink = repo._get_sink()
1424- fmt = repository.RepositoryFormat.get_default_format()
1425- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
1426- self.assertEqual([], resume_tokens)
1427- self.assertEqual(set(), missing_keys)
1428- self.assertFinished(client)
1429+ self.checkInsertEmptyStream(repo, client)
1430
1431
1432 class TestRepositoryTarball(TestRemoteRepository):
1433
1434=== modified file 'bzrlib/tests/test_smart.py'
1435--- bzrlib/tests/test_smart.py 2009-07-23 07:37:05 +0000
1436+++ bzrlib/tests/test_smart.py 2009-08-07 02:26:45 +0000
1437@@ -1242,6 +1242,7 @@
1438 SmartServerResponse(('history-incomplete', 2, r2)),
1439 request.execute('stacked', 1, (3, r3)))
1440
1441+
1442 class TestSmartServerRepositoryGetStream(tests.TestCaseWithMemoryTransport):
1443
1444 def make_two_commit_repo(self):
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Robert Collins wrote:
>> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
> [...]
>> [debug flags]
>>> I hope we don't need to ask people to use them, but they are a cheap insurance
>>> policy.
>>>
>>> More usefully, they are helpful for benchmarking and testing. I'm ok with
>>> hiding them if you like.
>> I don't have a strong opinion. The flags are in our docs though as it
>> stands, so they should be clear to people reading them rather than
>> perhaps causing confusion.
>
> I'll leave them in, assuming I can find a way to make the ReST markup cope with
> them...
>

Well, you could always change them to IDS_always to make it consistent
with our other debug flags.

>>> I don't understand why that case isn't covered. Perhaps you should file the bug
>>> instead of me?
>> I'd like to make sure I explain it well enough first; once you
>> understand, either of us can file it - and I'll be happy enough to be
>> that person.
> [...]
>> What you need to do is change the test from len(self.packs) == 1, to
>> len(packs being combined) == 1 and that_pack has the same hash.
>
> But AIUI “self.packs” on a Packer *is* “packs being combined”. If it's not,
> then your explanation makes sense, but my reading of the code says otherwise.

So he might have been thinking of PackCollection.packs, but you are
right. Packer.packs is certainly the "packs being combined".

...

> Oh, actually, it does matter. When hint == [], no packs should be repacked.
> When hint is None, all packs should be repacked.

Agreed.
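
A minimal sketch of those semantics (packs_to_repack is a hypothetical
helper, not the real PackCollection code, and assumes packs expose a
.name):

    def packs_to_repack(all_packs, hint):
        if hint is None:
            # No hint at all: repack every pack in the repository.
            return list(all_packs)
        # hint == [] selects nothing to repack; a non-empty hint names
        # (by hash/name) exactly the packs that should be combined.
        return [pack for pack in all_packs if pack.name in hint]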

...

> A tangent: when we transfer parent inventories to a stacked repo, should we
> transfer the texts introduced by those inventories or not?

No. The point of transferring those parent inventories is so that we can
clearly determine what texts are in the children that are not in the
parents.
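
To make that concrete, the subtraction those parent inventories enable
looks roughly like the following sketch (texts_introduced_by is a
hypothetical helper, not bzrlib code; iter_entries, file_id and revision
are the usual Inventory API):

    def texts_introduced_by(child_inv, parent_invs):
        # Text versions reachable from any parent inventory.
        parent_keys = set()
        for path, entry in [pair for pinv in parent_invs
                            for pair in pinv.iter_entries()]:
            parent_keys.add((entry.file_id, entry.revision))
        # Whatever the child references and no parent does is new with
        # this revision, and must be transferred alongside it.
        return set(
            (entry.file_id, entry.revision)
            for path, entry in child_inv.iter_entries()
            if (entry.file_id, entry.revision) not in parent_keys)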

> I'm seeing a
> worrisome half-and-half behaviour from the streaming code, where the synthesised
> rich root, but no other texts, is transferred for inventories requested via
> get_stream_for_missing_keys. I think the intended answer is no (if those parent
> revisions are ever filled in later, then at that time the corresponding texts
> would be backfilled too). I've now fixed this too.

Right. I'm personally not very fond of what we've had to do with
stacking. We had an abstraction in place that made stacked branches
auto-fallback to their parents; then we broke that abstraction, but only
when accessing via Remote, which means we need to fill in some data,
which starts breaking the other assumptions, like having an inventory
implying having all the texts, etc.

Anyway, I think you have it correct, that the *new* assertion is
something along the lines of:

    If we have a revision, then we have its inventory, and all the new
    texts introduced by that revision, relative to all of its parents.

>
> The net result of this is that the stacking tests in
> per_interrepository/test_fetch.py are starting to get pretty large. I suspect
> we can do better here... (but after this branch lands, I hope!)
>

Well, ideally most of that would be in "per_repository_reference" which
is the...


Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-08-13 at 17:36 +0000, John A Meinel wrote:
>
> Anyway, I think you have it correct, that the *new* assertion is
> something along the lines of:
>
> If we have a revision, then we have its inventory, and all the new
> texts introduced by that revision, relative to all of its parents.

I think that's insufficient because of the need to be able to make a
delta. See my reply to Andrew.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-08-13 at 13:03 +0000, Andrew Bennetts wrote:

{Only replying to things that need actions}

Re: Packer.packs - Garh, context and failing memory. Sounds like you're
correct. John agrees that I was misremembering - so cool, your code has
it correct.

> A tangent: when we transfer parent inventories to a stacked repo, should
> we transfer the texts introduced by those inventories or not? I'm seeing
> a worrisome half-and-half behaviour from the streaming code, where the
> synthesised rich root, but no other texts, is transferred for inventories
> requested via get_stream_for_missing_keys. I think the intended answer is
> no (if those parent revisions are ever filled in later, then at that time
> the corresponding texts would be backfilled too). I've now fixed this too.

There's no requirement that parent inventory entry text references be
filled; the invariant is:
"A repository can always output a delta for any revision object it has
against all the revision's parents; a delta consists of:
-revision
-signature
-inventory delta against an arbitrary parent
-all text versions not present in any parent
"

We should write this down somewhere.
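
Written down as code, a check for that invariant might look roughly like
this sketch (delta_parts is hypothetical and skips ghost parents;
get_revision, get_inventory, iter_entries and _make_delta are the real
repository/inventory calls it leans on):

    from bzrlib.inventory import Inventory

    def delta_parts(repo, rev_id):
        # The four parts the invariant promises for any revision we hold.
        rev = repo.get_revision(rev_id)
        sig = (repo.get_signature_text(rev_id)
               if repo.has_signature_for_revision_id(rev_id) else None)
        inv = repo.get_inventory(rev_id)
        parent_invs = [repo.get_inventory(p) for p in rev.parent_ids
                       if repo.has_revision(p)]  # ghosts have no inventory
        # Inventory delta against an arbitrary parent (empty basis if none).
        basis = parent_invs[0] if parent_invs else Inventory(None)
        inv_delta = inv._make_delta(basis)
        # All text versions not present in any parent.
        in_parents = set((entry.file_id, entry.revision)
                         for pinv in parent_invs
                         for path, entry in pinv.iter_entries())
        new_texts = [(entry.file_id, entry.revision)
                     for path, entry in inv.iter_entries()
                     if (entry.file_id, entry.revision) not in in_parents]
        return rev, sig, inv_delta, new_texts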

Woo. _doit_

-Rob

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2009-08-13 22:32:23 +0000
3+++ NEWS 2009-08-14 05:35:32 +0000
4@@ -24,6 +24,15 @@
5 Bug Fixes
6 *********
7
8+* Fetching from 2a branches from a version-2 bzr protocol would fail to
9+ copy the internal inventory pages from the CHK store. This cannot happen
10+ in normal use as all 2a compatible clients and servers support the
11+ version-3 protocol, but it does cause test suite failures when testing
12+ downlevel protocol behaviour. (Robert Collins)
13+
14+* Fixed "Pack ... already exists" error when running ``bzr pack`` on a
15+ fully packed 2a repository. (Andrew Bennetts, #382463)
16+
17 * Further tweaks to handling of ``bzr add`` messages about ignored files.
18 (Jason Spashett, #76616)
19
20@@ -32,15 +41,22 @@
21 parent inventories incorrectly, and also not handling when one of the
22 parents was a ghost. (John Arbash Meinel, #402778, #412198)
23
24-* Fetching from 2a branches from a version-2 bzr protocol would fail to
25- copy the internal inventory pages from the CHK store. This cannot happen
26- in normal use as all 2a compatible clients and servers support the
27- version-3 protocol, but it does cause test suite failures when testing
28- downlevel protocol behaviour. (Robert Collins)
29+* ``RemoteStreamSource.get_stream_for_missing_keys`` will fetch CHK
30+ inventory pages when appropriate (by falling back to the vfs stream
31+ source). (Andrew Bennetts, #406686)
32+
33+* StreamSource generates rich roots from non-rich root sources correctly
34+ now. (Andrew Bennetts, #368921)
35
36 Improvements
37 ************
38
39+* Cross-format fetches (such as between 1.9-rich-root and 2a) via the
40+ smart server are more efficient now. They send inventory deltas rather
41+ than full inventories. The smart server has two new requests,
42+ ``Repository.get_stream_1.19`` and ``Repository.insert_stream_1.19`` to
43+ support this. (Andrew Bennetts, #374738, #385826)
44+
45 Documentation
46 *************
47
48@@ -50,6 +66,11 @@
49 Internals
50 *********
51
52+* InterDifferingSerializer is now only used locally. Other fetches that
53+  would have used InterDifferingSerializer now use the more
54+  network-friendly StreamSource, which automatically does the same
55+ transformations as InterDifferingSerializer. (Andrew Bennetts)
56+
57 * RemoteBranch.open now honours ignore_fallbacks correctly on bzr-v2
58 protocols. (Robert Collins)
59
60
61=== modified file 'bzrlib/fetch.py'
62--- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
63+++ bzrlib/fetch.py 2009-08-14 05:35:32 +0000
64@@ -25,16 +25,21 @@
65
66 import operator
67
68+from bzrlib.lazy_import import lazy_import
69+lazy_import(globals(), """
70+from bzrlib import (
71+ tsort,
72+ versionedfile,
73+ )
74+""")
75 import bzrlib
76 from bzrlib import (
77 errors,
78 symbol_versioning,
79 )
80 from bzrlib.revision import NULL_REVISION
81-from bzrlib.tsort import topo_sort
82 from bzrlib.trace import mutter
83 import bzrlib.ui
84-from bzrlib.versionedfile import FulltextContentFactory
85
86
87 class RepoFetcher(object):
88@@ -213,11 +218,9 @@
89
90 def _find_root_ids(self, revs, parent_map, graph):
91 revision_root = {}
92- planned_versions = {}
93 for tree in self.iter_rev_trees(revs):
94 revision_id = tree.inventory.root.revision
95 root_id = tree.get_root_id()
96- planned_versions.setdefault(root_id, []).append(revision_id)
97 revision_root[revision_id] = root_id
98 # Find out which parents we don't already know root ids for
99 parents = set()
100@@ -229,7 +232,7 @@
101 for tree in self.iter_rev_trees(parents):
102 root_id = tree.get_root_id()
103 revision_root[tree.get_revision_id()] = root_id
104- return revision_root, planned_versions
105+ return revision_root
106
107 def generate_root_texts(self, revs):
108 """Generate VersionedFiles for all root ids.
109@@ -238,9 +241,8 @@
110 """
111 graph = self.source.get_graph()
112 parent_map = graph.get_parent_map(revs)
113- rev_order = topo_sort(parent_map)
114- rev_id_to_root_id, root_id_to_rev_ids = self._find_root_ids(
115- revs, parent_map, graph)
116+ rev_order = tsort.topo_sort(parent_map)
117+ rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)
118 root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in
119 rev_order]
120 # Guaranteed stable, this groups all the file id operations together
121@@ -249,20 +251,93 @@
122 # yet, and are unlikely to in non-rich-root environments anyway.
123 root_id_order.sort(key=operator.itemgetter(0))
124 # Create a record stream containing the roots to create.
125- def yield_roots():
126- for key in root_id_order:
127- root_id, rev_id = key
128- rev_parents = parent_map[rev_id]
129- # We drop revision parents with different file-ids, because
130- # that represents a rename of the root to a different location
131- # - its not actually a parent for us. (We could look for that
132- # file id in the revision tree at considerably more expense,
133- # but for now this is sufficient (and reconcile will catch and
134- # correct this anyway).
135- # When a parent revision is a ghost, we guess that its root id
136- # was unchanged (rather than trimming it from the parent list).
137- parent_keys = tuple((root_id, parent) for parent in rev_parents
138- if parent != NULL_REVISION and
139- rev_id_to_root_id.get(parent, root_id) == root_id)
140- yield FulltextContentFactory(key, parent_keys, None, '')
141- return [('texts', yield_roots())]
142+ from bzrlib.graph import FrozenHeadsCache
143+ graph = FrozenHeadsCache(graph)
144+ new_roots_stream = _new_root_data_stream(
145+ root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
146+ return [('texts', new_roots_stream)]
147+
148+
149+def _new_root_data_stream(
150+ root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
151+ """Generate a texts substream of synthesised root entries.
152+
153+ Used in fetches that do rich-root upgrades.
154+
155+ :param root_keys_to_create: iterable of (root_id, rev_id) pairs describing
156+ the root entries to create.
157+ :param rev_id_to_root_id_map: dict of known rev_id -> root_id mappings for
158+ calculating the parents. If a parent rev_id is not found here then it
159+ will be recalculated.
160+ :param parent_map: a parent map for all the revisions in
161+ root_keys_to_create.
162+ :param graph: a graph to use instead of repo.get_graph().
163+ """
164+ for root_key in root_keys_to_create:
165+ root_id, rev_id = root_key
166+ parent_keys = _parent_keys_for_root_version(
167+ root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph)
168+ yield versionedfile.FulltextContentFactory(
169+ root_key, parent_keys, None, '')
170+
171+
172+def _parent_keys_for_root_version(
173+ root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):
174+ """Get the parent keys for a given root id.
175+
176+ A helper function for _new_root_data_stream.
177+ """
178+ # Include direct parents of the revision, but only if they used the same
179+ # root_id and are heads.
180+ rev_parents = parent_map[rev_id]
181+ parent_ids = []
182+ for parent_id in rev_parents:
183+ if parent_id == NULL_REVISION:
184+ continue
185+ if parent_id not in rev_id_to_root_id_map:
186+ # We probably didn't read this revision, go spend the extra effort
187+ # to actually check
188+ try:
189+ tree = repo.revision_tree(parent_id)
190+ except errors.NoSuchRevision:
191+ # Ghost, fill out rev_id_to_root_id in case we encounter this
192+ # again.
193+ # But set parent_root_id to None since we don't really know
194+ parent_root_id = None
195+ else:
196+ parent_root_id = tree.get_root_id()
197+ rev_id_to_root_id_map[parent_id] = None
198+ # XXX: why not:
199+ # rev_id_to_root_id_map[parent_id] = parent_root_id
200+ # memory consumption maybe?
201+ else:
202+ parent_root_id = rev_id_to_root_id_map[parent_id]
203+ if root_id == parent_root_id:
204+ # With stacking we _might_ want to refer to a non-local revision,
205+ # but this code path only applies when we have the full content
206+ # available, so ghosts really are ghosts, not just the edge of
207+ # local data.
208+ parent_ids.append(parent_id)
209+ else:
210+ # root_id may be in the parent anyway.
211+ try:
212+ tree = repo.revision_tree(parent_id)
213+ except errors.NoSuchRevision:
214+ # ghost, can't refer to it.
215+ pass
216+ else:
217+ try:
218+ parent_ids.append(tree.inventory[root_id].revision)
219+ except errors.NoSuchId:
220+ # not in the tree
221+ pass
222+ # Drop non-head parents
223+ if graph is None:
224+ graph = repo.get_graph()
225+ heads = graph.heads(parent_ids)
226+ selected_ids = []
227+ for parent_id in parent_ids:
228+ if parent_id in heads and parent_id not in selected_ids:
229+ selected_ids.append(parent_id)
230+ parent_keys = [(root_id, parent_id) for parent_id in selected_ids]
231+ return parent_keys
232
233=== modified file 'bzrlib/help_topics/en/debug-flags.txt'
234--- bzrlib/help_topics/en/debug-flags.txt 2009-08-04 04:36:34 +0000
235+++ bzrlib/help_topics/en/debug-flags.txt 2009-08-14 05:35:32 +0000
236@@ -12,6 +12,7 @@
237 operations.
238 -Dfetch Trace history copying between repositories.
239 -Dfilters Emit information for debugging content filtering.
240+-Dforceinvdeltas Force use of inventory deltas during generic streaming fetch.
241 -Dgraph Trace graph traversal.
242 -Dhashcache Log every time a working file is read to determine its hash.
243 -Dhooks Trace hook execution.
244@@ -27,3 +28,7 @@
245 -Dunlock Some errors during unlock are treated as warnings.
246 -Dpack Emit information about pack operations.
247 -Dsftp Trace SFTP internals.
248+-Dstream Trace fetch streams.
249+-DIDS_never Never use InterDifferingSerializer when fetching.
250+-DIDS_always Always use InterDifferingSerializer to fetch if appropriate
251+ for the format, even for non-local fetches.
252
253=== modified file 'bzrlib/inventory_delta.py'
254--- bzrlib/inventory_delta.py 2009-04-02 05:53:12 +0000
255+++ bzrlib/inventory_delta.py 2009-08-14 05:35:32 +0000
256@@ -29,6 +29,25 @@
257 from bzrlib import inventory
258 from bzrlib.revision import NULL_REVISION
259
260+FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
261+
262+
263+class InventoryDeltaError(errors.BzrError):
264+ """An error when serializing or deserializing an inventory delta."""
265+
266+ # Most errors when serializing and deserializing are due to bugs, although
267+ # damaged input (i.e. a bug in a different process) could cause
268+ # deserialization errors too.
269+ internal_error = True
270+
271+
272+class IncompatibleInventoryDelta(errors.BzrError):
273+ """The delta could not be deserialised because its contents conflict with
274+ the allow_versioned_root or allow_tree_references flags of the
275+ deserializer.
276+ """
277+ internal_error = False
278+
279
280 def _directory_content(entry):
281 """Serialize the content component of entry which is a directory.
282@@ -49,7 +68,7 @@
283 exec_bytes = ''
284 size_exec_sha = (entry.text_size, exec_bytes, entry.text_sha1)
285 if None in size_exec_sha:
286- raise errors.BzrError('Missing size or sha for %s' % entry.file_id)
287+ raise InventoryDeltaError('Missing size or sha for %s' % entry.file_id)
288 return "file\x00%d\x00%s\x00%s" % size_exec_sha
289
290
291@@ -60,7 +79,7 @@
292 """
293 target = entry.symlink_target
294 if target is None:
295- raise errors.BzrError('Missing target for %s' % entry.file_id)
296+ raise InventoryDeltaError('Missing target for %s' % entry.file_id)
297 return "link\x00%s" % target.encode('utf8')
298
299
300@@ -71,7 +90,8 @@
301 """
302 tree_revision = entry.reference_revision
303 if tree_revision is None:
304- raise errors.BzrError('Missing reference revision for %s' % entry.file_id)
305+ raise InventoryDeltaError(
306+ 'Missing reference revision for %s' % entry.file_id)
307 return "tree\x00%s" % tree_revision
308
309
310@@ -115,11 +135,8 @@
311 return result
312
313
314-
315 class InventoryDeltaSerializer(object):
316- """Serialize and deserialize inventory deltas."""
317-
318- FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
319+ """Serialize inventory deltas."""
320
321 def __init__(self, versioned_root, tree_references):
322 """Create an InventoryDeltaSerializer.
323@@ -141,6 +158,9 @@
324 def delta_to_lines(self, old_name, new_name, delta_to_new):
325 """Return a line sequence for delta_to_new.
326
327+        Both the versioned_root and tree_references flags must be set via
328+        the constructor before calling this.
329+
330 :param old_name: A UTF8 revision id for the old inventory. May be
331 NULL_REVISION if there is no older inventory and delta_to_new
332 includes the entire inventory contents.
333@@ -150,15 +170,20 @@
334 takes.
335 :return: The serialized delta as lines.
336 """
337+ if type(old_name) is not str:
338+ raise TypeError('old_name should be str, got %r' % (old_name,))
339+ if type(new_name) is not str:
340+ raise TypeError('new_name should be str, got %r' % (new_name,))
341 lines = ['', '', '', '', '']
342 to_line = self._delta_item_to_line
343 for delta_item in delta_to_new:
344- lines.append(to_line(delta_item))
345- if lines[-1].__class__ != str:
346- raise errors.BzrError(
347+ line = to_line(delta_item, new_name)
348+ if line.__class__ != str:
349+ raise InventoryDeltaError(
350 'to_line generated non-str output %r' % lines[-1])
351+ lines.append(line)
352 lines.sort()
353- lines[0] = "format: %s\n" % InventoryDeltaSerializer.FORMAT_1
354+ lines[0] = "format: %s\n" % FORMAT_1
355 lines[1] = "parent: %s\n" % old_name
356 lines[2] = "version: %s\n" % new_name
357 lines[3] = "versioned_root: %s\n" % self._serialize_bool(
358@@ -173,7 +198,7 @@
359 else:
360 return "false"
361
362- def _delta_item_to_line(self, delta_item):
363+ def _delta_item_to_line(self, delta_item, new_version):
364 """Convert delta_item to a line."""
365 oldpath, newpath, file_id, entry = delta_item
366 if newpath is None:
367@@ -188,6 +213,10 @@
368 oldpath_utf8 = 'None'
369 else:
370 oldpath_utf8 = '/' + oldpath.encode('utf8')
371+ if newpath == '/':
372+ raise AssertionError(
373+ "Bad inventory delta: '/' is not a valid newpath "
374+ "(should be '') in delta item %r" % (delta_item,))
375 # TODO: Test real-world utf8 cache hit rate. It may be a win.
376 newpath_utf8 = '/' + newpath.encode('utf8')
377 # Serialize None as ''
378@@ -196,58 +225,78 @@
379 last_modified = entry.revision
380 # special cases for /
381 if newpath_utf8 == '/' and not self._versioned_root:
382- if file_id != 'TREE_ROOT':
383- raise errors.BzrError(
384- 'file_id %s is not TREE_ROOT for /' % file_id)
385- if last_modified is not None:
386- raise errors.BzrError(
387- 'Version present for / in %s' % file_id)
388- last_modified = NULL_REVISION
389+            # This is an entry for the root, and this inventory does not
390+ # support versioned roots. So this must be an unversioned
391+ # root, i.e. last_modified == new revision. Otherwise, this
392+ # delta is invalid.
393+ # Note: the non-rich-root repositories *can* have roots with
394+ # file-ids other than TREE_ROOT, e.g. repo formats that use the
395+ # xml5 serializer.
396+ if last_modified != new_version:
397+ raise InventoryDeltaError(
398+ 'Version present for / in %s (%s != %s)'
399+ % (file_id, last_modified, new_version))
400 if last_modified is None:
401- raise errors.BzrError("no version for fileid %s" % file_id)
402+ raise InventoryDeltaError("no version for fileid %s" % file_id)
403 content = self._entry_to_content[entry.kind](entry)
404 return ("%s\x00%s\x00%s\x00%s\x00%s\x00%s\n" %
405 (oldpath_utf8, newpath_utf8, file_id, parent_id, last_modified,
406 content))
407
408+
409+class InventoryDeltaDeserializer(object):
410+ """Deserialize inventory deltas."""
411+
412+ def __init__(self, allow_versioned_root=True, allow_tree_references=True):
413+ """Create an InventoryDeltaDeserializer.
414+
415+        :param allow_versioned_root: If True, any root entry that is seen is
416+            expected to be versioned, and root entries can have any fileid.
417+        :param allow_tree_references: If True, support tree-reference entries.
418+ """
419+ self._allow_versioned_root = allow_versioned_root
420+ self._allow_tree_references = allow_tree_references
421+
422 def _deserialize_bool(self, value):
423 if value == "true":
424 return True
425 elif value == "false":
426 return False
427 else:
428- raise errors.BzrError("value %r is not a bool" % (value,))
429+ raise InventoryDeltaError("value %r is not a bool" % (value,))
430
431 def parse_text_bytes(self, bytes):
432 """Parse the text bytes of a serialized inventory delta.
433
434+        If allow_versioned_root and/or allow_tree_references were set to
435+        False in the constructor, deltas that use those features will raise
436+        IncompatibleInventoryDelta.
437+
438 :param bytes: The bytes to parse. This can be obtained by calling
439 delta_to_lines and then doing ''.join(delta_lines).
440- :return: (parent_id, new_id, inventory_delta)
441+ :return: (parent_id, new_id, versioned_root, tree_references,
442+ inventory_delta)
443 """
444+ if bytes[-1:] != '\n':
445+ last_line = bytes.rsplit('\n', 1)[-1]
446+ raise InventoryDeltaError('last line not empty: %r' % (last_line,))
447 lines = bytes.split('\n')[:-1] # discard the last empty line
448- if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1:
449- raise errors.BzrError('unknown format %r' % lines[0:1])
450+ if not lines or lines[0] != 'format: %s' % FORMAT_1:
451+ raise InventoryDeltaError('unknown format %r' % lines[0:1])
452 if len(lines) < 2 or not lines[1].startswith('parent: '):
453- raise errors.BzrError('missing parent: marker')
454+ raise InventoryDeltaError('missing parent: marker')
455 delta_parent_id = lines[1][8:]
456 if len(lines) < 3 or not lines[2].startswith('version: '):
457- raise errors.BzrError('missing version: marker')
458+ raise InventoryDeltaError('missing version: marker')
459 delta_version_id = lines[2][9:]
460 if len(lines) < 4 or not lines[3].startswith('versioned_root: '):
461- raise errors.BzrError('missing versioned_root: marker')
462+ raise InventoryDeltaError('missing versioned_root: marker')
463 delta_versioned_root = self._deserialize_bool(lines[3][16:])
464 if len(lines) < 5 or not lines[4].startswith('tree_references: '):
465- raise errors.BzrError('missing tree_references: marker')
466+ raise InventoryDeltaError('missing tree_references: marker')
467 delta_tree_references = self._deserialize_bool(lines[4][17:])
468- if delta_versioned_root != self._versioned_root:
469- raise errors.BzrError(
470- "serialized versioned_root flag is wrong: %s" %
471- (delta_versioned_root,))
472- if delta_tree_references != self._tree_references:
473- raise errors.BzrError(
474- "serialized tree_references flag is wrong: %s" %
475- (delta_tree_references,))
476+ if (not self._allow_versioned_root and delta_versioned_root):
477+ raise IncompatibleInventoryDelta("versioned_root not allowed")
478 result = []
479 seen_ids = set()
480 line_iter = iter(lines)
481@@ -258,33 +307,58 @@
482 content) = line.split('\x00', 5)
483 parent_id = parent_id or None
484 if file_id in seen_ids:
485- raise errors.BzrError(
486+ raise InventoryDeltaError(
487 "duplicate file id in inventory delta %r" % lines)
488 seen_ids.add(file_id)
489- if newpath_utf8 == '/' and not delta_versioned_root and (
490- last_modified != 'null:' or file_id != 'TREE_ROOT'):
491- raise errors.BzrError("Versioned root found: %r" % line)
492- elif last_modified[-1] == ':':
493- raise errors.BzrError('special revisionid found: %r' % line)
494- if not delta_tree_references and content.startswith('tree\x00'):
495- raise errors.BzrError("Tree reference found: %r" % line)
496- content_tuple = tuple(content.split('\x00'))
497- entry = _parse_entry(
498- newpath_utf8, file_id, parent_id, last_modified, content_tuple)
499+ if (newpath_utf8 == '/' and not delta_versioned_root and
500+ last_modified != delta_version_id):
501+                # Delta claims to not have a versioned root, yet here's
502+ # a root entry with a non-default version.
503+ raise InventoryDeltaError("Versioned root found: %r" % line)
504+ elif newpath_utf8 != 'None' and last_modified[-1] == ':':
505+ # Deletes have a last_modified of null:, but otherwise special
506+ # revision ids should not occur.
507+ raise InventoryDeltaError('special revisionid found: %r' % line)
508+ if content.startswith('tree\x00'):
509+ if delta_tree_references is False:
510+ raise InventoryDeltaError(
511+ "Tree reference found (but header said "
512+ "tree_references: false): %r" % line)
513+ elif not self._allow_tree_references:
514+ raise IncompatibleInventoryDelta(
515+ "Tree reference not allowed")
516 if oldpath_utf8 == 'None':
517 oldpath = None
518+ elif oldpath_utf8[:1] != '/':
519+ raise InventoryDeltaError(
520+ "oldpath invalid (does not start with /): %r"
521+ % (oldpath_utf8,))
522 else:
523+ oldpath_utf8 = oldpath_utf8[1:]
524 oldpath = oldpath_utf8.decode('utf8')
525 if newpath_utf8 == 'None':
526 newpath = None
527+ elif newpath_utf8[:1] != '/':
528+ raise InventoryDeltaError(
529+ "newpath invalid (does not start with /): %r"
530+ % (newpath_utf8,))
531 else:
532+ # Trim leading slash
533+ newpath_utf8 = newpath_utf8[1:]
534 newpath = newpath_utf8.decode('utf8')
535+ content_tuple = tuple(content.split('\x00'))
536+ if content_tuple[0] == 'deleted':
537+ entry = None
538+ else:
539+ entry = _parse_entry(
540+ newpath, file_id, parent_id, last_modified, content_tuple)
541 delta_item = (oldpath, newpath, file_id, entry)
542 result.append(delta_item)
543- return delta_parent_id, delta_version_id, result
544-
545-
546-def _parse_entry(utf8_path, file_id, parent_id, last_modified, content):
547+ return (delta_parent_id, delta_version_id, delta_versioned_root,
548+ delta_tree_references, result)
549+
550+
551+def _parse_entry(path, file_id, parent_id, last_modified, content):
552 entry_factory = {
553 'dir': _dir_to_entry,
554 'file': _file_to_entry,
555@@ -292,8 +366,10 @@
556 'tree': _tree_to_entry,
557 }
558 kind = content[0]
559- path = utf8_path[1:].decode('utf8')
560+ if path.startswith('/'):
561+ raise AssertionError
562 name = basename(path)
563 return entry_factory[content[0]](
564 content, name, parent_id, file_id, last_modified)
565
566+
567
568=== modified file 'bzrlib/remote.py'
569--- bzrlib/remote.py 2009-08-13 22:32:23 +0000
570+++ bzrlib/remote.py 2009-08-14 05:35:32 +0000
571@@ -426,6 +426,7 @@
572 self._custom_format = None
573 self._network_name = None
574 self._creating_bzrdir = None
575+ self._supports_chks = None
576 self._supports_external_lookups = None
577 self._supports_tree_reference = None
578 self._rich_root_data = None
579@@ -443,6 +444,13 @@
580 return self._rich_root_data
581
582 @property
583+ def supports_chks(self):
584+ if self._supports_chks is None:
585+ self._ensure_real()
586+ self._supports_chks = self._custom_format.supports_chks
587+ return self._supports_chks
588+
589+ @property
590 def supports_external_lookups(self):
591 if self._supports_external_lookups is None:
592 self._ensure_real()
593@@ -1178,9 +1186,9 @@
594 self._ensure_real()
595 return self._real_repository.get_inventory(revision_id)
596
597- def iter_inventories(self, revision_ids):
598+ def iter_inventories(self, revision_ids, ordering=None):
599 self._ensure_real()
600- return self._real_repository.iter_inventories(revision_ids)
601+ return self._real_repository.iter_inventories(revision_ids, ordering)
602
603 @needs_read_lock
604 def get_revision(self, revision_id):
605@@ -1682,43 +1690,61 @@
606 def insert_stream(self, stream, src_format, resume_tokens):
607 target = self.target_repo
608 target._unstacked_provider.missing_keys.clear()
609+ candidate_calls = [('Repository.insert_stream_1.19', (1, 19))]
610 if target._lock_token:
611- verb = 'Repository.insert_stream_locked'
612- extra_args = (target._lock_token or '',)
613- required_version = (1, 14)
614+ candidate_calls.append(('Repository.insert_stream_locked', (1, 14)))
615+ lock_args = (target._lock_token or '',)
616 else:
617- verb = 'Repository.insert_stream'
618- extra_args = ()
619- required_version = (1, 13)
620+ candidate_calls.append(('Repository.insert_stream', (1, 13)))
621+ lock_args = ()
622 client = target._client
623 medium = client._medium
624- if medium._is_remote_before(required_version):
625- # No possible way this can work.
626- return self._insert_real(stream, src_format, resume_tokens)
627 path = target.bzrdir._path_for_remote_call(client)
628- if not resume_tokens:
629- # XXX: Ugly but important for correctness, *will* be fixed during
630- # 1.13 cycle. Pushing a stream that is interrupted results in a
631- # fallback to the _real_repositories sink *with a partial stream*.
632- # Thats bad because we insert less data than bzr expected. To avoid
633- # this we do a trial push to make sure the verb is accessible, and
634- # do not fallback when actually pushing the stream. A cleanup patch
635- # is going to look at rewinding/restarting the stream/partial
636- # buffering etc.
637+ # Probe for the verb to use with an empty stream before sending the
638+ # real stream to it. We do this both to avoid the risk of sending a
639+ # large request that is then rejected, and because we don't want to
640+ # implement a way to buffer, rewind, or restart the stream.
641+ found_verb = False
642+ for verb, required_version in candidate_calls:
643+ if medium._is_remote_before(required_version):
644+ continue
645+ if resume_tokens:
646+ # We've already done the probing (and set _is_remote_before) on
647+ # a previous insert.
648+ found_verb = True
649+ break
650 byte_stream = smart_repo._stream_to_byte_stream([], src_format)
651 try:
652 response = client.call_with_body_stream(
653- (verb, path, '') + extra_args, byte_stream)
654+ (verb, path, '') + lock_args, byte_stream)
655 except errors.UnknownSmartMethod:
656 medium._remember_remote_is_before(required_version)
657- return self._insert_real(stream, src_format, resume_tokens)
658+ else:
659+ found_verb = True
660+ break
661+ if not found_verb:
662+ # Have to use VFS.
663+ return self._insert_real(stream, src_format, resume_tokens)
664+ self._last_inv_record = None
665+ self._last_substream = None
666+ if required_version < (1, 19):
667+ # Remote side doesn't support inventory deltas. Wrap the stream to
668+ # make sure we don't send any. If the stream contains inventory
669+ # deltas we'll interrupt the smart insert_stream request and
670+ # fallback to VFS.
671+ stream = self._stop_stream_if_inventory_delta(stream)
672 byte_stream = smart_repo._stream_to_byte_stream(
673 stream, src_format)
674 resume_tokens = ' '.join(resume_tokens)
675 response = client.call_with_body_stream(
676- (verb, path, resume_tokens) + extra_args, byte_stream)
677+ (verb, path, resume_tokens) + lock_args, byte_stream)
678 if response[0][0] not in ('ok', 'missing-basis'):
679 raise errors.UnexpectedSmartServerResponse(response)
680+ if self._last_substream is not None:
681+ # The stream included an inventory-delta record, but the remote
682+ # side isn't new enough to support them. So we need to send the
683+ # rest of the stream via VFS.
684+ return self._resume_stream_with_vfs(response, src_format)
685 if response[0][0] == 'missing-basis':
686 tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
687 resume_tokens = tokens
688@@ -1727,6 +1753,46 @@
689 self.target_repo.refresh_data()
690 return [], set()
691
692+ def _resume_stream_with_vfs(self, response, src_format):
693+ """Resume sending a stream via VFS, first resending the record and
694+ substream that couldn't be sent via an insert_stream verb.
695+ """
696+ if response[0][0] == 'missing-basis':
697+ tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
698+ # Ignore missing_keys, we haven't finished inserting yet
699+ else:
700+ tokens = []
701+ def resume_substream():
702+ # Yield the substream that was interrupted.
703+ for record in self._last_substream:
704+ yield record
705+ self._last_substream = None
706+ def resume_stream():
707+ # Finish sending the interrupted substream
708+ yield ('inventory-deltas', resume_substream())
709+ # Then simply continue sending the rest of the stream.
710+ for substream_kind, substream in self._last_stream:
711+ yield substream_kind, substream
712+ return self._insert_real(resume_stream(), src_format, tokens)
713+
714+ def _stop_stream_if_inventory_delta(self, stream):
715+ """Normally this just lets the original stream pass-through unchanged.
716+
717+ However, if an 'inventory-deltas' substream occurs it will stop
718+ streaming, and store the interrupted substream and stream in
719+ self._last_substream and self._last_stream so that the stream can be
720+ resumed by _resume_stream_with_vfs.
721+ """
722+
723+ stream_iter = iter(stream)
724+ for substream_kind, substream in stream_iter:
725+ if substream_kind == 'inventory-deltas':
726+ self._last_substream = substream
727+ self._last_stream = stream_iter
728+ return
729+ else:
730+ yield substream_kind, substream
731+
732
733 class RemoteStreamSource(repository.StreamSource):
734 """Stream data from a remote server."""
735@@ -1747,6 +1813,12 @@
736 sources.append(repo)
737 return self.missing_parents_chain(search, sources)
738
739+ def get_stream_for_missing_keys(self, missing_keys):
740+ self.from_repository._ensure_real()
741+ real_repo = self.from_repository._real_repository
742+ real_source = real_repo._get_source(self.to_format)
743+ return real_source.get_stream_for_missing_keys(missing_keys)
744+
745 def _real_stream(self, repo, search):
746 """Get a stream for search from repo.
747
748@@ -1782,18 +1854,26 @@
749 return self._real_stream(repo, search)
750 client = repo._client
751 medium = client._medium
752- if medium._is_remote_before((1, 13)):
753- # streaming was added in 1.13
754- return self._real_stream(repo, search)
755 path = repo.bzrdir._path_for_remote_call(client)
756- try:
757- search_bytes = repo._serialise_search_result(search)
758- response = repo._call_with_body_bytes_expecting_body(
759- 'Repository.get_stream',
760- (path, self.to_format.network_name()), search_bytes)
761- response_tuple, response_handler = response
762- except errors.UnknownSmartMethod:
763- medium._remember_remote_is_before((1,13))
764+ search_bytes = repo._serialise_search_result(search)
765+ args = (path, self.to_format.network_name())
766+ candidate_verbs = [
767+ ('Repository.get_stream_1.19', (1, 19)),
768+ ('Repository.get_stream', (1, 13))]
769+ found_verb = False
770+ for verb, version in candidate_verbs:
771+ if medium._is_remote_before(version):
772+ continue
773+ try:
774+ response = repo._call_with_body_bytes_expecting_body(
775+ verb, args, search_bytes)
776+ except errors.UnknownSmartMethod:
777+ medium._remember_remote_is_before(version)
778+ else:
779+ response_tuple, response_handler = response
780+ found_verb = True
781+ break
782+ if not found_verb:
783 return self._real_stream(repo, search)
784 if response_tuple[0] != 'ok':
785 raise errors.UnexpectedSmartServerResponse(response_tuple)
786
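
Both the sink hunk above and this source hunk implement the same client-side negotiation: try the newest verb first, remember permanently when the server predates a verb (so later calls skip the round trip), and fall back to VFS only when no candidate works. A condensed sketch of the pattern, with hypothetical names rather than the real bzrlib classes:

    class UnknownSmartMethod(Exception):
        pass

    class Medium(object):
        # Remembers the tightest "remote is older than X" bound seen so far.
        def __init__(self):
            self._before = None
        def is_remote_before(self, version):
            return self._before is not None and self._before <= version
        def remember_remote_is_before(self, version):
            if self._before is None or version < self._before:
                self._before = version

    def call_best_verb(medium, candidates, do_call, vfs_fallback):
        # candidates: (verb, minimum server version) pairs, newest first.
        for verb, version in candidates:
            if medium.is_remote_before(version):
                continue  # known too old; skip without a round trip
            try:
                return do_call(verb)
            except UnknownSmartMethod:
                medium.remember_remote_is_before(version)
        return vfs_fallback()
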
787=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
788--- bzrlib/repofmt/groupcompress_repo.py 2009-08-12 18:58:05 +0000
789+++ bzrlib/repofmt/groupcompress_repo.py 2009-08-14 05:35:32 +0000
790@@ -154,6 +154,8 @@
791 self._writer.begin()
792 # what state is the pack in? (open, finished, aborted)
793 self._state = 'open'
794+ # no name until we finish writing the content
795+ self.name = None
796
797 def _check_references(self):
798 """Make sure our external references are present.
799@@ -477,6 +479,15 @@
800 if not self._use_pack(self.new_pack):
801 self.new_pack.abort()
802 return None
803+ self.new_pack.finish_content()
804+ if len(self.packs) == 1:
805+ old_pack = self.packs[0]
806+ if old_pack.name == self.new_pack._hash.hexdigest():
807+ # The single old pack was already optimally packed.
808+ trace.mutter('single pack %s was already optimally packed',
809+ old_pack.name)
810+ self.new_pack.abort()
811+ return None
812 self.pb.update('finishing repack', 6, 7)
813 self.new_pack.finish()
814 self._pack_collection.allocate(self.new_pack)
815@@ -591,7 +602,7 @@
816 packer = GCCHKPacker(self, packs, '.autopack',
817 reload_func=reload_func)
818 try:
819- packer.pack()
820+ result = packer.pack()
821 except errors.RetryWithNewPacks:
822 # An exception is propagating out of this context, make sure
823 # this packer has cleaned up. Packer() doesn't set its new_pack
824@@ -600,6 +611,8 @@
825 if packer.new_pack is not None:
826 packer.new_pack.abort()
827 raise
828+ if result is None:
829+ return
830 for pack in packs:
831 self._remove_pack_from_memory(pack)
832 # record the newly available packs and stop advertising the old
833@@ -781,10 +794,12 @@
834 return inventory.CHKInventory.deserialise(self.chk_bytes, bytes,
835 (revision_id,))
836
837- def _iter_inventories(self, revision_ids):
838+ def _iter_inventories(self, revision_ids, ordering):
839 """Iterate over many inventory objects."""
840+ if ordering is None:
841+ ordering = 'unordered'
842 keys = [(revision_id,) for revision_id in revision_ids]
843- stream = self.inventories.get_record_stream(keys, 'unordered', True)
844+ stream = self.inventories.get_record_stream(keys, ordering, True)
845 texts = {}
846 for record in stream:
847 if record.storage_kind != 'absent':
848@@ -794,7 +809,7 @@
849 for key in keys:
850 yield inventory.CHKInventory.deserialise(self.chk_bytes, texts[key], key)
851
852- def _iter_inventory_xmls(self, revision_ids):
853+ def _iter_inventory_xmls(self, revision_ids, ordering):
854 # Without a native 'xml' inventory, this method doesn't make sense, so
855 # make it raise to trap naughty direct users.
856 raise NotImplementedError(self._iter_inventory_xmls)
857@@ -894,14 +909,11 @@
858
859 def _get_source(self, to_format):
860 """Return a source for streaming from this repository."""
861- if isinstance(to_format, remote.RemoteRepositoryFormat):
862- # Can't just check attributes on to_format with the current code,
863- # work around this:
864- to_format._ensure_real()
865- to_format = to_format._custom_format
866- if to_format.__class__ is self._format.__class__:
867+ if self._format._serializer == to_format._serializer:
868 # We must be exactly the same format, otherwise stuff like the chk
869- # page layout might be different
870+ # page layout might be different.
871+ # Actually, this test is just slightly looser than exact so that
872+ # CHK2 <-> 2a transfers will work.
873 return GroupCHKStreamSource(self, to_format)
874 return super(CHKInventoryRepository, self)._get_source(to_format)
875
876
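
The 'already optimally packed' check above relies on pack names being content digests: repacking a single pack into byte-identical content reproduces the old pack's name, so the new pack can be aborted. A toy illustration (md5 mirrors the _hash.hexdigest() used in these hunks; treat the rest as hypothetical):

    import hashlib

    def pack_name(content):
        # A pack is named after a digest of its content, so identical
        # content always yields the same name.
        return hashlib.md5(content).hexdigest()

    old_pack_name = pack_name(b'record-bytes...')
    new_pack_name = pack_name(b'record-bytes...')
    if old_pack_name == new_pack_name:
        # The repack produced byte-identical content: abort the new pack
        # instead of replacing the old one with itself.
        pass
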
877=== modified file 'bzrlib/repofmt/pack_repo.py'
878--- bzrlib/repofmt/pack_repo.py 2009-08-05 01:05:58 +0000
879+++ bzrlib/repofmt/pack_repo.py 2009-08-14 05:35:32 +0000
880@@ -422,6 +422,8 @@
881 self._writer.begin()
882 # what state is the pack in? (open, finished, aborted)
883 self._state = 'open'
884+ # no name until we finish writing the content
885+ self.name = None
886
887 def abort(self):
888 """Cancel creating this pack."""
889@@ -448,6 +450,14 @@
890 self.signature_index.key_count() or
891 (self.chk_index is not None and self.chk_index.key_count()))
892
893+ def finish_content(self):
894+ if self.name is not None:
895+ return
896+ self._writer.end()
897+ if self._buffer[1]:
898+ self._write_data('', flush=True)
899+ self.name = self._hash.hexdigest()
900+
901 def finish(self, suspend=False):
902 """Finish the new pack.
903
904@@ -459,10 +469,7 @@
905 - stores the index size tuple for the pack in the index_sizes
906 attribute.
907 """
908- self._writer.end()
909- if self._buffer[1]:
910- self._write_data('', flush=True)
911- self.name = self._hash.hexdigest()
912+ self.finish_content()
913 if not suspend:
914 self._check_references()
915 # write indices
916@@ -1567,7 +1574,9 @@
917 # determine which packs need changing
918 pack_operations = [[0, []]]
919 for pack in self.all_packs():
920- if not hint or pack.name in hint:
921+ if hint is None or pack.name in hint:
922+ # Either no hint was provided (so we are packing everything),
923+ # or this pack was included in the hint.
924 pack_operations[-1][0] += pack.get_revision_count()
925 pack_operations[-1][1].append(pack)
926 self._execute_pack_operations(pack_operations, OptimisingPacker)
927@@ -2093,6 +2102,7 @@
928 # when autopack takes no steps, the names list is still
929 # unsaved.
930 return self._save_pack_names()
931+ return []
932
933 def _suspend_write_group(self):
934 tokens = [pack.name for pack in self._resumed_packs]
935
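
The hint plumbing added above amounts to a small contract: commit_write_group may return a list of pack names worth repacking, and pack(hint=...) filters on it, with hint=None meaning 'repack everything'. Schematically (hypothetical names):

    def select_packs(all_packs, hint):
        # hint=None: repack the whole repository.
        # hint=[names]: repack only the packs the last write group touched.
        if hint is None:
            return list(all_packs)
        return [pack for pack in all_packs if pack.name in hint]

    class P(object):
        def __init__(self, name):
            self.name = name

    packs = [P('aaa'), P('bbb')]
    assert len(select_packs(packs, None)) == 2
    assert [p.name for p in select_packs(packs, ['bbb'])] == ['bbb']
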
936=== modified file 'bzrlib/repository.py'
937--- bzrlib/repository.py 2009-08-12 22:28:28 +0000
938+++ bzrlib/repository.py 2009-08-14 05:35:32 +0000
939@@ -31,6 +31,7 @@
940 gpg,
941 graph,
942 inventory,
943+ inventory_delta,
944 lazy_regex,
945 lockable_files,
946 lockdir,
947@@ -924,6 +925,11 @@
948 """
949 if self._write_group is not self.get_transaction():
950 # has an unlock or relock occured ?
951+ if suppress_errors:
952+ mutter(
953+ '(suppressed) mismatched lock context and write group. %r, %r',
954+ self._write_group, self.get_transaction())
955+ return
956 raise errors.BzrError(
957 'mismatched lock context and write group. %r, %r' %
958 (self._write_group, self.get_transaction()))
959@@ -1063,8 +1069,10 @@
960 check_content=True):
961 """Store lines in inv_vf and return the sha1 of the inventory."""
962 parents = [(parent,) for parent in parents]
963- return self.inventories.add_lines((revision_id,), parents, lines,
964+ result = self.inventories.add_lines((revision_id,), parents, lines,
965 check_content=check_content)[0]
966+ self.inventories._access.flush()
967+ return result
968
969 def add_revision(self, revision_id, rev, inv=None, config=None):
970 """Add rev to the revision store as revision_id.
971@@ -1529,6 +1537,8 @@
972 """Commit the contents accrued within the current write group.
973
974 :seealso: start_write_group.
975+
976+ :return: an opaque hint that can be passed to 'pack', if any.
977 """
978 if self._write_group is not self.get_transaction():
979 # has an unlock or relock occured ?
980@@ -2340,7 +2350,7 @@
981 """Get Inventory object by revision id."""
982 return self.iter_inventories([revision_id]).next()
983
984- def iter_inventories(self, revision_ids):
985+ def iter_inventories(self, revision_ids, ordering=None):
986 """Get many inventories by revision_ids.
987
988 This will buffer some or all of the texts used in constructing the
989@@ -2348,30 +2358,57 @@
990 time.
991
992 :param revision_ids: The expected revision ids of the inventories.
993+ :param ordering: optional ordering, e.g. 'topological'. If not
994+ specified, the order of revision_ids will be preserved (by
995+ buffering if necessary).
996 :return: An iterator of inventories.
997 """
998 if ((None in revision_ids)
999 or (_mod_revision.NULL_REVISION in revision_ids)):
1000 raise ValueError('cannot get null revision inventory')
1001- return self._iter_inventories(revision_ids)
1002+ return self._iter_inventories(revision_ids, ordering)
1003
1004- def _iter_inventories(self, revision_ids):
1005+ def _iter_inventories(self, revision_ids, ordering):
1006 """single-document based inventory iteration."""
1007- for text, revision_id in self._iter_inventory_xmls(revision_ids):
1008+ inv_xmls = self._iter_inventory_xmls(revision_ids, ordering)
1009+ for text, revision_id in inv_xmls:
1010 yield self.deserialise_inventory(revision_id, text)
1011
1012- def _iter_inventory_xmls(self, revision_ids):
1013+ def _iter_inventory_xmls(self, revision_ids, ordering):
1014+ if ordering is None:
1015+ order_as_requested = True
1016+ ordering = 'unordered'
1017+ else:
1018+ order_as_requested = False
1019 keys = [(revision_id,) for revision_id in revision_ids]
1020- stream = self.inventories.get_record_stream(keys, 'unordered', True)
1021+ if not keys:
1022+ return
1023+ if order_as_requested:
1024+ key_iter = iter(keys)
1025+ next_key = key_iter.next()
1026+ stream = self.inventories.get_record_stream(keys, ordering, True)
1027 text_chunks = {}
1028 for record in stream:
1029 if record.storage_kind != 'absent':
1030- text_chunks[record.key] = record.get_bytes_as('chunked')
1031+ chunks = record.get_bytes_as('chunked')
1032+ if order_as_requested:
1033+ text_chunks[record.key] = chunks
1034+ else:
1035+ yield ''.join(chunks), record.key[-1]
1036 else:
1037 raise errors.NoSuchRevision(self, record.key)
1038- for key in keys:
1039- chunks = text_chunks.pop(key)
1040- yield ''.join(chunks), key[-1]
1041+ if order_as_requested:
1042+ # Yield as many results as we can while preserving order.
1043+ while next_key in text_chunks:
1044+ chunks = text_chunks.pop(next_key)
1045+ yield ''.join(chunks), next_key[-1]
1046+ try:
1047+ next_key = key_iter.next()
1048+ except StopIteration:
1049+ # We still want to fully consume the get_record_stream,
1050+ # just in case it is not actually finished at this point
1051+ next_key = None
1052+ break
1053
1054 def deserialise_inventory(self, revision_id, xml):
1055 """Transform the xml into an inventory object.
1056@@ -2398,7 +2435,7 @@
1057 @needs_read_lock
1058 def get_inventory_xml(self, revision_id):
1059 """Get inventory XML as a file object."""
1060- texts = self._iter_inventory_xmls([revision_id])
1061+ texts = self._iter_inventory_xmls([revision_id], 'unordered')
1062 try:
1063 text, revision_id = texts.next()
1064 except StopIteration:
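
The order_as_requested branch above is a general technique: drain an unordered record stream while yielding results in the caller's requested order, buffering only the records that arrive early. Reduced to its essentials (hypothetical helper, not bzrlib API):

    def in_requested_order(requested, unordered_pairs):
        # `unordered_pairs` yields (key, value) in arbitrary order; yield
        # values in `requested` order, buffering early arrivals.
        pending = {}
        remaining = list(requested)
        remaining.reverse()  # pop() from the end == front of the request
        for key, value in unordered_pairs:
            pending[key] = value
            while remaining and remaining[-1] in pending:
                yield pending.pop(remaining.pop())

    pairs = [('b', 2), ('a', 1), ('c', 3)]
    assert list(in_requested_order(['a', 'b', 'c'], iter(pairs))) == [1, 2, 3]
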
1065@@ -3664,11 +3701,25 @@
1066 # This is redundant with format.check_conversion_target(), however that
1067 # raises an exception, and we just want to say "False" as in we won't
1068 # support converting between these formats.
1069+ if 'IDS_never' in debug.debug_flags:
1070+ return False
1071 if source.supports_rich_root() and not target.supports_rich_root():
1072 return False
1073 if (source._format.supports_tree_reference
1074 and not target._format.supports_tree_reference):
1075 return False
1076+ if target._fallback_repositories and target._format.supports_chks:
1077+ # IDS doesn't know how to copy CHKs for the parent inventories it
1078+ # adds to stacked repos.
1079+ return False
1080+ if 'IDS_always' in debug.debug_flags:
1081+ return True
1082+ # Only use this code path for local source and target. IDS does far
1083+ # too much IO (both bandwidth and roundtrips) over a network.
1084+ if not source.bzrdir.transport.base.startswith('file:///'):
1085+ return False
1086+ if not target.bzrdir.transport.base.startswith('file:///'):
1087+ return False
1088 return True
1089
1090 def _get_delta_for_revision(self, tree, parent_ids, basis_id, cache):
1091@@ -3690,63 +3741,6 @@
1092 deltas.sort()
1093 return deltas[0][1:]
1094
1095- def _get_parent_keys(self, root_key, parent_map):
1096- """Get the parent keys for a given root id."""
1097- root_id, rev_id = root_key
1098- # Include direct parents of the revision, but only if they used
1099- # the same root_id and are heads.
1100- parent_keys = []
1101- for parent_id in parent_map[rev_id]:
1102- if parent_id == _mod_revision.NULL_REVISION:
1103- continue
1104- if parent_id not in self._revision_id_to_root_id:
1105- # We probably didn't read this revision, go spend the
1106- # extra effort to actually check
1107- try:
1108- tree = self.source.revision_tree(parent_id)
1109- except errors.NoSuchRevision:
1110- # Ghost, fill out _revision_id_to_root_id in case we
1111- # encounter this again.
1112- # But set parent_root_id to None since we don't really know
1113- parent_root_id = None
1114- else:
1115- parent_root_id = tree.get_root_id()
1116- self._revision_id_to_root_id[parent_id] = None
1117- else:
1118- parent_root_id = self._revision_id_to_root_id[parent_id]
1119- if root_id == parent_root_id:
1120- # With stacking we _might_ want to refer to a non-local
1121- # revision, but this code path only applies when we have the
1122- # full content available, so ghosts really are ghosts, not just
1123- # the edge of local data.
1124- parent_keys.append((parent_id,))
1125- else:
1126- # root_id may be in the parent anyway.
1127- try:
1128- tree = self.source.revision_tree(parent_id)
1129- except errors.NoSuchRevision:
1130- # ghost, can't refer to it.
1131- pass
1132- else:
1133- try:
1134- parent_keys.append((tree.inventory[root_id].revision,))
1135- except errors.NoSuchId:
1136- # not in the tree
1137- pass
1138- g = graph.Graph(self.source.revisions)
1139- heads = g.heads(parent_keys)
1140- selected_keys = []
1141- for key in parent_keys:
1142- if key in heads and key not in selected_keys:
1143- selected_keys.append(key)
1144- return tuple([(root_id,)+ key for key in selected_keys])
1145-
1146- def _new_root_data_stream(self, root_keys_to_create, parent_map):
1147- for root_key in root_keys_to_create:
1148- parent_keys = self._get_parent_keys(root_key, parent_map)
1149- yield versionedfile.FulltextContentFactory(root_key,
1150- parent_keys, None, '')
1151-
1152 def _fetch_batch(self, revision_ids, basis_id, cache):
1153 """Fetch across a few revisions.
1154
1155@@ -3798,8 +3792,10 @@
1156 from_texts = self.source.texts
1157 to_texts = self.target.texts
1158 if root_keys_to_create:
1159- root_stream = self._new_root_data_stream(root_keys_to_create,
1160- parent_map)
1161+ from bzrlib.fetch import _new_root_data_stream
1162+ root_stream = _new_root_data_stream(
1163+ root_keys_to_create, self._revision_id_to_root_id, parent_map,
1164+ self.source)
1165 to_texts.insert_record_stream(root_stream)
1166 to_texts.insert_record_stream(from_texts.get_record_stream(
1167 text_keys, self.target._format._fetch_order,
1168@@ -3899,7 +3895,6 @@
1169 # Walk though all revisions; get inventory deltas, copy referenced
1170 # texts that delta references, insert the delta, revision and
1171 # signature.
1172- first_rev = self.source.get_revision(revision_ids[0])
1173 if pb is None:
1174 my_pb = ui.ui_factory.nested_progress_bar()
1175 pb = my_pb
1176@@ -4071,9 +4066,6 @@
1177 self.file_ids = set([file_id for file_id, _ in
1178 self.text_index.iterkeys()])
1179 # text keys is now grouped by file_id
1180- n_weaves = len(self.file_ids)
1181- files_in_revisions = {}
1182- revisions_of_files = {}
1183 n_versions = len(self.text_index)
1184 progress_bar.update('loading text store', 0, n_versions)
1185 parent_map = self.repository.texts.get_parent_map(self.text_index)
1186@@ -4172,6 +4164,8 @@
1187 else:
1188 new_pack.set_write_cache_size(1024*1024)
1189 for substream_type, substream in stream:
1190+ if 'stream' in debug.debug_flags:
1191+ mutter('inserting substream: %s', substream_type)
1192 if substream_type == 'texts':
1193 self.target_repo.texts.insert_record_stream(substream)
1194 elif substream_type == 'inventories':
1195@@ -4181,6 +4175,9 @@
1196 else:
1197 self._extract_and_insert_inventories(
1198 substream, src_serializer)
1199+ elif substream_type == 'inventory-deltas':
1200+ self._extract_and_insert_inventory_deltas(
1201+ substream, src_serializer)
1202 elif substream_type == 'chk_bytes':
1203 # XXX: This doesn't support conversions, as it assumes the
1204 # conversion was done in the fetch code.
1205@@ -4237,18 +4234,45 @@
1206 self.target_repo.pack(hint=hint)
1207 return [], set()
1208
1209- def _extract_and_insert_inventories(self, substream, serializer):
1210+ def _extract_and_insert_inventory_deltas(self, substream, serializer):
1211+ target_rich_root = self.target_repo._format.rich_root_data
1212+ target_tree_refs = self.target_repo._format.supports_tree_reference
1213+ for record in substream:
1214+ # Insert the delta directly
1215+ inventory_delta_bytes = record.get_bytes_as('fulltext')
1216+ deserialiser = inventory_delta.InventoryDeltaDeserializer()
1217+ try:
1218+ parse_result = deserialiser.parse_text_bytes(
1219+ inventory_delta_bytes)
1220+ except inventory_delta.IncompatibleInventoryDelta, err:
1221+ trace.mutter("Incompatible delta: %s", err.msg)
1222+ raise errors.IncompatibleRevision(self.target_repo._format)
1223+ basis_id, new_id, rich_root, tree_refs, inv_delta = parse_result
1224+ revision_id = new_id
1225+ parents = [key[0] for key in record.parents]
1226+ self.target_repo.add_inventory_by_delta(
1227+ basis_id, inv_delta, revision_id, parents)
1228+
1229+ def _extract_and_insert_inventories(self, substream, serializer,
1230+ parse_delta=None):
1231 """Generate a new inventory versionedfile in target, converting data.
1232
1233 The inventory is retrieved from the source, (deserializing it), and
1234 stored in the target (reserializing it in a different format).
1235 """
1236+ target_rich_root = self.target_repo._format.rich_root_data
1237+ target_tree_refs = self.target_repo._format.supports_tree_reference
1238 for record in substream:
1239+ # It's not a delta, so it must be a fulltext in the source
1240+ # serializer's format.
1241 bytes = record.get_bytes_as('fulltext')
1242 revision_id = record.key[0]
1243 inv = serializer.read_inventory_from_string(bytes, revision_id)
1244 parents = [key[0] for key in record.parents]
1245 self.target_repo.add_inventory(revision_id, inv, parents)
1246+ # No need to keep holding this full inv in memory when the rest of
1247+ # the substream is likely to be all deltas.
1248+ del inv
1249
1250 def _extract_and_insert_revisions(self, substream, serializer):
1251 for record in substream:
1252@@ -4303,11 +4327,8 @@
1253 return [('signatures', signatures), ('revisions', revisions)]
1254
1255 def _generate_root_texts(self, revs):
1256- """This will be called by __fetch between fetching weave texts and
1257+ """This will be called by get_stream between fetching weave texts and
1258 fetching the inventory weave.
1259-
1260- Subclasses should override this if they need to generate root texts
1261- after fetching weave texts.
1262 """
1263 if self._rich_root_upgrade():
1264 import bzrlib.fetch
1265@@ -4345,9 +4366,6 @@
1266 # will be valid.
1267 for _ in self._generate_root_texts(revs):
1268 yield _
1269- # NB: This currently reopens the inventory weave in source;
1270- # using a single stream interface instead would avoid this.
1271- from_weave = self.from_repository.inventories
1272 # we fetch only the referenced inventories because we do not
1273 # know for unselected inventories whether all their required
1274 # texts are present in the other repository - it could be
1275@@ -4392,6 +4410,18 @@
1276 if not keys:
1277 # No need to stream something we don't have
1278 continue
1279+ if substream_kind == 'inventories':
1280+ # Some missing keys are genuinely ghosts, filter those out.
1281+ present = self.from_repository.inventories.get_parent_map(keys)
1282+ revs = [key[0] for key in present]
1283+ # Get the inventory stream more-or-less as we do for the
1284+ # original stream; there's no reason to assume that records
1285+ # direct from the source will be suitable for the sink. (Think
1286+ # e.g. 2a -> 1.9-rich-root).
1287+ for info in self._get_inventory_stream(revs, missing=True):
1288+ yield info
1289+ continue
1290+
1291 # Ask for full texts always so that we don't need more round trips
1292 # after this stream.
1293 # Some of the missing keys are genuinely ghosts, so filter absent
1294@@ -4412,129 +4442,116 @@
1295 return (not self.from_repository._format.rich_root_data and
1296 self.to_format.rich_root_data)
1297
1298- def _get_inventory_stream(self, revision_ids):
1299+ def _get_inventory_stream(self, revision_ids, missing=False):
1300 from_format = self.from_repository._format
1301- if (from_format.supports_chks and self.to_format.supports_chks
1302- and (from_format._serializer == self.to_format._serializer)):
1303- # Both sides support chks, and they use the same serializer, so it
1304- # is safe to transmit the chk pages and inventory pages across
1305- # as-is.
1306- return self._get_chk_inventory_stream(revision_ids)
1307- elif (not from_format.supports_chks):
1308- # Source repository doesn't support chks. So we can transmit the
1309- # inventories 'as-is' and either they are just accepted on the
1310- # target, or the Sink will properly convert it.
1311- return self._get_simple_inventory_stream(revision_ids)
1312+ if (from_format.supports_chks and self.to_format.supports_chks and
1313+ from_format.network_name() == self.to_format.network_name()):
1314+ raise AssertionError(
1315+ "this case should be handled by GroupCHKStreamSource")
1316+ elif 'forceinvdeltas' in debug.debug_flags:
1317+ return self._get_convertable_inventory_stream(revision_ids,
1318+ delta_versus_null=missing)
1319+ elif from_format.network_name() == self.to_format.network_name():
1320+ # Same format.
1321+ return self._get_simple_inventory_stream(revision_ids,
1322+ missing=missing)
1323+ elif (not from_format.supports_chks and not self.to_format.supports_chks
1324+ and from_format._serializer == self.to_format._serializer):
1325+ # Essentially the same format.
1326+ return self._get_simple_inventory_stream(revision_ids,
1327+ missing=missing)
1328 else:
1329- # XXX: Hack to make not-chk->chk fetch: copy the inventories as
1330- # inventories. Note that this should probably be done somehow
1331- # as part of bzrlib.repository.StreamSink. Except JAM couldn't
1332- # figure out how a non-chk repository could possibly handle
1333- # deserializing an inventory stream from a chk repo, as it
1334- # doesn't have a way to understand individual pages.
1335- return self._get_convertable_inventory_stream(revision_ids)
1336+ # Any time we switch serializations, we want to use an
1337+ # inventory-delta based approach.
1338+ return self._get_convertable_inventory_stream(revision_ids,
1339+ delta_versus_null=missing)
1340
1341- def _get_simple_inventory_stream(self, revision_ids):
1342+ def _get_simple_inventory_stream(self, revision_ids, missing=False):
1343+ # NB: This currently reopens the inventory weave in source;
1344+ # using a single stream interface instead would avoid this.
1345 from_weave = self.from_repository.inventories
1346+ if missing:
1347+ delta_closure = True
1348+ else:
1349+ delta_closure = not self.delta_on_metadata()
1350 yield ('inventories', from_weave.get_record_stream(
1351 [(rev_id,) for rev_id in revision_ids],
1352- self.inventory_fetch_order(),
1353- not self.delta_on_metadata()))
1354-
1355- def _get_chk_inventory_stream(self, revision_ids):
1356- """Fetch the inventory texts, along with the associated chk maps."""
1357- # We want an inventory outside of the search set, so that we can filter
1358- # out uninteresting chk pages. For now we use
1359- # _find_revision_outside_set, but if we had a Search with cut_revs, we
1360- # could use that instead.
1361- start_rev_id = self.from_repository._find_revision_outside_set(
1362- revision_ids)
1363- start_rev_key = (start_rev_id,)
1364- inv_keys_to_fetch = [(rev_id,) for rev_id in revision_ids]
1365- if start_rev_id != _mod_revision.NULL_REVISION:
1366- inv_keys_to_fetch.append((start_rev_id,))
1367- # Any repo that supports chk_bytes must also support out-of-order
1368- # insertion. At least, that is how we expect it to work
1369- # We use get_record_stream instead of iter_inventories because we want
1370- # to be able to insert the stream as well. We could instead fetch
1371- # allowing deltas, and then iter_inventories, but we don't know whether
1372- # source or target is more 'local' anway.
1373- inv_stream = self.from_repository.inventories.get_record_stream(
1374- inv_keys_to_fetch, 'unordered',
1375- True) # We need them as full-texts so we can find their references
1376- uninteresting_chk_roots = set()
1377- interesting_chk_roots = set()
1378- def filter_inv_stream(inv_stream):
1379- for idx, record in enumerate(inv_stream):
1380- ### child_pb.update('fetch inv', idx, len(inv_keys_to_fetch))
1381- bytes = record.get_bytes_as('fulltext')
1382- chk_inv = inventory.CHKInventory.deserialise(
1383- self.from_repository.chk_bytes, bytes, record.key)
1384- if record.key == start_rev_key:
1385- uninteresting_chk_roots.add(chk_inv.id_to_entry.key())
1386- p_id_map = chk_inv.parent_id_basename_to_file_id
1387- if p_id_map is not None:
1388- uninteresting_chk_roots.add(p_id_map.key())
1389- else:
1390- yield record
1391- interesting_chk_roots.add(chk_inv.id_to_entry.key())
1392- p_id_map = chk_inv.parent_id_basename_to_file_id
1393- if p_id_map is not None:
1394- interesting_chk_roots.add(p_id_map.key())
1395- ### pb.update('fetch inventory', 0, 2)
1396- yield ('inventories', filter_inv_stream(inv_stream))
1397- # Now that we have worked out all of the interesting root nodes, grab
1398- # all of the interesting pages and insert them
1399- ### pb.update('fetch inventory', 1, 2)
1400- interesting = chk_map.iter_interesting_nodes(
1401- self.from_repository.chk_bytes, interesting_chk_roots,
1402- uninteresting_chk_roots)
1403- def to_stream_adapter():
1404- """Adapt the iter_interesting_nodes result to a single stream.
1405-
1406- iter_interesting_nodes returns records as it processes them, along
1407- with keys. However, we only want to return the records themselves.
1408- """
1409- for record, items in interesting:
1410- if record is not None:
1411- yield record
1412- # XXX: We could instead call get_record_stream(records.keys())
1413- # ATM, this will always insert the records as fulltexts, and
1414- # requires that you can hang on to records once you have gone
1415- # on to the next one. Further, it causes the target to
1416- # recompress the data. Testing shows it to be faster than
1417- # requesting the records again, though.
1418- yield ('chk_bytes', to_stream_adapter())
1419- ### pb.update('fetch inventory', 2, 2)
1420-
1421- def _get_convertable_inventory_stream(self, revision_ids):
1422- # XXX: One of source or target is using chks, and they don't have
1423- # compatible serializations. The StreamSink code expects to be
1424- # able to convert on the target, so we need to put
1425- # bytes-on-the-wire that can be converted
1426- yield ('inventories', self._stream_invs_as_fulltexts(revision_ids))
1427-
1428- def _stream_invs_as_fulltexts(self, revision_ids):
1429+ self.inventory_fetch_order(), delta_closure))
1430+
1431+ def _get_convertable_inventory_stream(self, revision_ids,
1432+ delta_versus_null=False):
1433+ # The source is using CHKs, but the target either doesn't, or it has a
1434+ # different serializer. The StreamSink code expects to be able to
1435+ # convert on the target, so we need to put bytes-on-the-wire that can
1436+ # be converted. That means inventory deltas (if the remote is <1.19,
1437+ # RemoteStreamSink will fall back to VFS to insert the deltas).
1438+ yield ('inventory-deltas',
1439+ self._stream_invs_as_deltas(revision_ids,
1440+ delta_versus_null=delta_versus_null))
1441+
1442+ def _stream_invs_as_deltas(self, revision_ids, delta_versus_null=False):
1443+ """Return a stream of inventory-deltas for the given rev ids.
1444+
1445+ :param revision_ids: The list of inventories to transmit
1446+ :param delta_versus_null: If True, don't try to find a minimal delta
1447+ for each entry; instead compute the delta versus NULL_REVISION,
1448+ which effectively streams a complete inventory. Used e.g. for
1449+ filling in missing parents.
1450+ """
1451 from_repo = self.from_repository
1452- from_serializer = from_repo._format._serializer
1453 revision_keys = [(rev_id,) for rev_id in revision_ids]
1454 parent_map = from_repo.inventories.get_parent_map(revision_keys)
1455- for inv in self.from_repository.iter_inventories(revision_ids):
1456- # XXX: This is a bit hackish, but it works. Basically,
1457- # CHKSerializer 'accidentally' supports
1458- # read/write_inventory_to_string, even though that is never
1459- # the format that is stored on disk. It *does* give us a
1460- # single string representation for an inventory, so live with
1461- # it for now.
1462- # This would be far better if we had a 'serialized inventory
1463- # delta' form. Then we could use 'inventory._make_delta', and
1464- # transmit that. This would both be faster to generate, and
1465- # result in fewer bytes-on-the-wire.
1466- as_bytes = from_serializer.write_inventory_to_string(inv)
1467+ # XXX: possibly repos could implement a more efficient iter_inv_deltas
1468+ # method...
1469+ inventories = self.from_repository.iter_inventories(
1470+ revision_ids, 'topological')
1471+ format = from_repo._format
1472+ invs_sent_so_far = set([_mod_revision.NULL_REVISION])
1473+ inventory_cache = lru_cache.LRUCache(50)
1474+ null_inventory = from_repo.revision_tree(
1475+ _mod_revision.NULL_REVISION).inventory
1476+ # XXX: ideally the rich-root/tree-refs flags would be per-revision, not
1477+ # per-repo (e.g. streaming a non-rich-root revision out of a rich-root
1478+ # repo back into a non-rich-root repo ought to be allowed)
1479+ serializer = inventory_delta.InventoryDeltaSerializer(
1480+ versioned_root=format.rich_root_data,
1481+ tree_references=format.supports_tree_reference)
1482+ for inv in inventories:
1483 key = (inv.revision_id,)
1484 parent_keys = parent_map.get(key, ())
1485+ delta = None
1486+ if not delta_versus_null and parent_keys:
1487+ # The caller did not ask for complete inventories and we have
1488+ # some parents that we can delta against. Make a delta against
1489+ # each parent so that we can find the smallest.
1490+ parent_ids = [parent_key[0] for parent_key in parent_keys]
1491+ for parent_id in parent_ids:
1492+ if parent_id not in invs_sent_so_far:
1493+ # We don't know that the remote side has this basis, so
1494+ # we can't use it.
1495+ continue
1496+ if parent_id == _mod_revision.NULL_REVISION:
1497+ parent_inv = null_inventory
1498+ else:
1499+ parent_inv = inventory_cache.get(parent_id, None)
1500+ if parent_inv is None:
1501+ parent_inv = from_repo.get_inventory(parent_id)
1502+ candidate_delta = inv._make_delta(parent_inv)
1503+ if (delta is None or
1504+ len(delta) > len(candidate_delta)):
1505+ delta = candidate_delta
1506+ basis_id = parent_id
1507+ if delta is None:
1508+ # Either none of the parents ended up being suitable, or we
1509+ # were asked to delta against NULL
1510+ basis_id = _mod_revision.NULL_REVISION
1511+ delta = inv._make_delta(null_inventory)
1512+ invs_sent_so_far.add(inv.revision_id)
1513+ inventory_cache[inv.revision_id] = inv
1514+ delta_serialized = ''.join(
1515+ serializer.delta_to_lines(basis_id, key[-1], delta))
1516 yield versionedfile.FulltextContentFactory(
1517- key, parent_keys, None, as_bytes)
1518+ key, parent_keys, None, delta_serialized)
1519
1520
1521 def _iter_for_revno(repo, partial_history_cache, stop_index=None,
1522
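
The loop above picks a delta basis as follows: among parents the receiver is already known to have, choose the one giving the shortest delta; if none qualifies, fall back to a delta against the empty inventory (i.e. a full inventory). The same logic in isolation (hypothetical names; delta size approximated by entry count, as in the hunk):

    NULL_REVISION = 'null:'

    def choose_basis(inv_id, parent_ids, sent_so_far, make_delta):
        # make_delta(basis_id, inv_id) -> list of delta entries; fewer
        # entries means fewer bytes on the wire.
        best_basis = None
        best_delta = None
        for parent_id in parent_ids:
            if parent_id not in sent_so_far:
                continue  # receiver may lack this basis; unusable
            candidate = make_delta(parent_id, inv_id)
            if best_delta is None or len(candidate) < len(best_delta):
                best_basis, best_delta = parent_id, candidate
        if best_delta is None:
            # No usable parent: send the whole inventory as a delta
            # against the empty inventory.
            best_basis = NULL_REVISION
            best_delta = make_delta(NULL_REVISION, inv_id)
        return best_basis, best_delta
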
1523=== modified file 'bzrlib/smart/protocol.py'
1524--- bzrlib/smart/protocol.py 2009-07-17 01:48:56 +0000
1525+++ bzrlib/smart/protocol.py 2009-08-14 05:35:32 +0000
1526@@ -1209,6 +1209,8 @@
1527 except (KeyboardInterrupt, SystemExit):
1528 raise
1529 except Exception:
1530+ mutter('_iter_with_errors caught error')
1531+ log_exception_quietly()
1532 yield sys.exc_info(), None
1533 return
1534
1535
1536=== modified file 'bzrlib/smart/repository.py'
1537--- bzrlib/smart/repository.py 2009-06-16 06:46:32 +0000
1538+++ bzrlib/smart/repository.py 2009-08-14 05:35:32 +0000
1539@@ -30,6 +30,7 @@
1540 graph,
1541 osutils,
1542 pack,
1543+ versionedfile,
1544 )
1545 from bzrlib.bzrdir import BzrDir
1546 from bzrlib.smart.request import (
1547@@ -39,7 +40,10 @@
1548 )
1549 from bzrlib.repository import _strip_NULL_ghosts, network_format_registry
1550 from bzrlib import revision as _mod_revision
1551-from bzrlib.versionedfile import NetworkRecordStream, record_to_fulltext_bytes
1552+from bzrlib.versionedfile import (
1553+ NetworkRecordStream,
1554+ record_to_fulltext_bytes,
1555+ )
1556
1557
1558 class SmartServerRepositoryRequest(SmartServerRequest):
1559@@ -414,8 +418,42 @@
1560 repository.
1561 """
1562 self._to_format = network_format_registry.get(to_network_name)
1563+ if self._should_fake_unknown():
1564+ return FailedSmartServerResponse(
1565+ ('UnknownMethod', 'Repository.get_stream'))
1566 return None # Signal that we want a body.
1567
1568+ def _should_fake_unknown(self):
1569+ """Return True if we should return UnknownMethod to the client.
1570+
1571+ This is a workaround for bugs in pre-1.19 clients that claim to
1572+ support receiving streams of CHK repositories. The pre-1.19 client
1573+ expects inventory records to be serialized in the format defined by
1574+ to_network_name, but in pre-1.19 (at least) that format definition
1575+ tries to use the xml5 serializer, which does not correctly handle
1576+ rich-roots. After 1.19 the client can also accept inventory-deltas
1577+ (which avoids this issue), and those clients will use the
1578+ Repository.get_stream_1.19 verb instead of this one.
1579+ So: if this repository is CHK, and the to_format doesn't match,
1580+ we should just fake an UnknownSmartMethod error so that the client
1581+ will fall back to VFS, rather than sending it a stream we know it
1582+ cannot handle.
1583+ """
1584+ from_format = self._repository._format
1585+ to_format = self._to_format
1586+ if not from_format.supports_chks:
1587+ # Source not CHK: that's ok
1588+ return False
1589+ if (to_format.supports_chks and
1590+ from_format.repository_class is to_format.repository_class and
1591+ from_format._serializer == to_format._serializer):
1592+ # Source is CHK, but target matches: that's ok
1593+ # (e.g. 2a->2a, or CHK2->2a)
1594+ return False
1595+ # Source is CHK, and target is not CHK or incompatible CHK. We can't
1596+ # generate a compatible stream.
1597+ return True
1598+
1599 def do_body(self, body_bytes):
1600 repository = self._repository
1601 repository.lock_read()
1602@@ -451,6 +489,13 @@
1603 repository.unlock()
1604
1605
1606+class SmartServerRepositoryGetStream_1_19(SmartServerRepositoryGetStream):
1607+
1608+ def _should_fake_unknown(self):
1609+ """Returns False; we don't need to workaround bugs in 1.19+ clients."""
1610+ return False
1611+
1612+
1613 def _stream_to_byte_stream(stream, src_format):
1614 """Convert a record stream to a self delimited byte stream."""
1615 pack_writer = pack.ContainerSerialiser()
1616@@ -460,6 +505,8 @@
1617 for record in substream:
1618 if record.storage_kind in ('chunked', 'fulltext'):
1619 serialised = record_to_fulltext_bytes(record)
1620+ elif record.storage_kind == 'inventory-delta':
1621+ serialised = record_to_inventory_delta_bytes(record)
1622 elif record.storage_kind == 'absent':
1623 raise ValueError("Absent factory for %s" % (record.key,))
1624 else:
1625@@ -650,6 +697,23 @@
1626 return SuccessfulSmartServerResponse(('ok', ))
1627
1628
1629+class SmartServerRepositoryInsertStream_1_19(SmartServerRepositoryInsertStreamLocked):
1630+ """Insert a record stream from a RemoteSink into a repository.
1631+
1632+ Same as SmartServerRepositoryInsertStreamLocked, except:
1633+ - the lock token argument is optional
1634+ - servers that implement this verb accept 'inventory-delta' records in the
1635+ stream.
1636+
1637+ New in 1.19.
1638+ """
1639+
1640+ def do_repository_request(self, repository, resume_tokens, lock_token=None):
1641+ """StreamSink.insert_stream for a remote repository."""
1642+ SmartServerRepositoryInsertStreamLocked.do_repository_request(
1643+ self, repository, resume_tokens, lock_token)
1644+
1645+
1646 class SmartServerRepositoryInsertStream(SmartServerRepositoryInsertStreamLocked):
1647 """Insert a record stream from a RemoteSink into an unlocked repository.
1648
1649
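
The rule in _should_fake_unknown above reduces to a small decision table. The sketch below restates it with stub format objects (hypothetical classes, purely to show which combinations are treated as compatible):

    class Fmt(object):
        def __init__(self, supports_chks, repository_class, serializer):
            self.supports_chks = supports_chks
            self.repository_class = repository_class
            self._serializer = serializer

    def should_fake_unknown(from_format, to_format):
        if not from_format.supports_chks:
            return False  # any target can cope with a non-CHK stream
        if (to_format.supports_chks and
            from_format.repository_class is to_format.repository_class and
            from_format._serializer == to_format._serializer):
            return False  # compatible CHK target, e.g. 2a -> 2a
        return True  # CHK -> incompatible target: force the VFS fallback

    class CHKRepo(object): pass
    class KnitRepo(object): pass
    chk = Fmt(True, CHKRepo, 'chk-v10')
    knit = Fmt(False, KnitRepo, 'xml5')
    assert not should_fake_unknown(knit, chk)
    assert not should_fake_unknown(chk, chk)
    assert should_fake_unknown(chk, knit)
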
1650=== modified file 'bzrlib/smart/request.py'
1651--- bzrlib/smart/request.py 2009-07-27 02:06:05 +0000
1652+++ bzrlib/smart/request.py 2009-08-14 05:35:32 +0000
1653@@ -553,6 +553,8 @@
1654 request_handlers.register_lazy(
1655 'Repository.insert_stream', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream')
1656 request_handlers.register_lazy(
1657+ 'Repository.insert_stream_1.19', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream_1_19')
1658+request_handlers.register_lazy(
1659 'Repository.insert_stream_locked', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStreamLocked')
1660 request_handlers.register_lazy(
1661 'Repository.is_shared', 'bzrlib.smart.repository', 'SmartServerRepositoryIsShared')
1662@@ -570,6 +572,9 @@
1663 'Repository.get_stream', 'bzrlib.smart.repository',
1664 'SmartServerRepositoryGetStream')
1665 request_handlers.register_lazy(
1666+ 'Repository.get_stream_1.19', 'bzrlib.smart.repository',
1667+ 'SmartServerRepositoryGetStream_1_19')
1668+request_handlers.register_lazy(
1669 'Repository.tarball', 'bzrlib.smart.repository',
1670 'SmartServerRepositoryTarball')
1671 request_handlers.register_lazy(
1672
1673=== modified file 'bzrlib/tests/__init__.py'
1674--- bzrlib/tests/__init__.py 2009-08-12 18:49:22 +0000
1675+++ bzrlib/tests/__init__.py 2009-08-14 05:35:32 +0000
1676@@ -1938,6 +1938,16 @@
1677 sio.encoding = output_encoding
1678 return sio
1679
1680+ def disable_verb(self, verb):
1681+ """Disable a smart server verb for one test."""
1682+ from bzrlib.smart import request
1683+ request_handlers = request.request_handlers
1684+ orig_method = request_handlers.get(verb)
1685+ request_handlers.remove(verb)
1686+ def restoreVerb():
1687+ request_handlers.register(verb, orig_method)
1688+ self.addCleanup(restoreVerb)
1689+
1690
1691 class CapturedCall(object):
1692 """A helper for capturing smart server calls for easy debug analysis."""
1693
1694=== modified file 'bzrlib/tests/per_branch/test_push.py'
1695--- bzrlib/tests/per_branch/test_push.py 2009-08-05 02:30:59 +0000
1696+++ bzrlib/tests/per_branch/test_push.py 2009-08-14 05:35:32 +0000
1697@@ -261,14 +261,15 @@
1698 self.assertFalse(local.is_locked())
1699 local.push(remote)
1700 hpss_call_names = [item.call.method for item in self.hpss_calls]
1701- self.assertTrue('Repository.insert_stream' in hpss_call_names)
1702- insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
1703+ self.assertTrue('Repository.insert_stream_1.19' in hpss_call_names)
1704+ insert_stream_idx = hpss_call_names.index(
1705+ 'Repository.insert_stream_1.19')
1706 calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
1707 # After inserting the stream the client has no reason to query the
1708 # remote graph any further.
1709 self.assertEqual(
1710- ['Repository.insert_stream', 'Repository.insert_stream', 'get',
1711- 'Branch.set_last_revision_info', 'Branch.unlock'],
1712+ ['Repository.insert_stream_1.19', 'Repository.insert_stream_1.19',
1713+ 'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
1714 calls_after_insert_stream)
1715
1716 def disableOptimisticGetParentMap(self):
1717
1718=== modified file 'bzrlib/tests/per_interbranch/test_push.py'
1719--- bzrlib/tests/per_interbranch/test_push.py 2009-08-05 02:30:59 +0000
1720+++ bzrlib/tests/per_interbranch/test_push.py 2009-08-14 05:35:32 +0000
1721@@ -267,14 +267,15 @@
1722 self.assertFalse(local.is_locked())
1723 local.push(remote)
1724 hpss_call_names = [item.call.method for item in self.hpss_calls]
1725- self.assertTrue('Repository.insert_stream' in hpss_call_names)
1726- insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
1727+ self.assertTrue('Repository.insert_stream_1.19' in hpss_call_names)
1728+ insert_stream_idx = hpss_call_names.index(
1729+ 'Repository.insert_stream_1.19')
1730 calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
1731 # After inserting the stream the client has no reason to query the
1732 # remote graph any further.
1733 self.assertEqual(
1734- ['Repository.insert_stream', 'Repository.insert_stream', 'get',
1735- 'Branch.set_last_revision_info', 'Branch.unlock'],
1736+ ['Repository.insert_stream_1.19', 'Repository.insert_stream_1.19',
1737+ 'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
1738 calls_after_insert_stream)
1739
1740 def disableOptimisticGetParentMap(self):
1741
1742=== modified file 'bzrlib/tests/per_interrepository/__init__.py'
1743--- bzrlib/tests/per_interrepository/__init__.py 2009-08-11 21:02:46 +0000
1744+++ bzrlib/tests/per_interrepository/__init__.py 2009-08-14 05:35:33 +0000
1745@@ -32,8 +32,6 @@
1746 )
1747
1748 from bzrlib.repository import (
1749- InterDifferingSerializer,
1750- InterKnitRepo,
1751 InterRepository,
1752 )
1753 from bzrlib.tests import (
1754@@ -48,18 +46,16 @@
1755 """Transform the input formats to a list of scenarios.
1756
1757 :param formats: A list of tuples:
1758- (interrepo_class, repository_format, repository_format_to).
1759+ (label, repository_format, repository_format_to).
1760 """
1761 result = []
1762- for interrepo_class, repository_format, repository_format_to in formats:
1763- id = '%s,%s,%s' % (interrepo_class.__name__,
1764- repository_format.__class__.__name__,
1765- repository_format_to.__class__.__name__)
1766+ for label, repository_format, repository_format_to in formats:
1767+ id = '%s,%s,%s' % (label, repository_format.__class__.__name__,
1768+ repository_format_to.__class__.__name__)
1769 scenario = (id,
1770 {"transport_server": transport_server,
1771 "transport_readonly_server": transport_readonly_server,
1772 "repository_format": repository_format,
1773- "interrepo_class": interrepo_class,
1774 "repository_format_to": repository_format_to,
1775 })
1776 result.append(scenario)
1777@@ -75,6 +71,8 @@
1778 weaverepo,
1779 )
1780 result = []
1781+ def add_combo(label, from_format, to_format):
1782+ result.append((label, from_format, to_format))
1783 # test the default InterRepository between format 6 and the current
1784 # default format.
1785 # XXX: robertc 20060220 reinstate this when there are two supported
1786@@ -85,40 +83,48 @@
1787 for optimiser_class in InterRepository._optimisers:
1788 format_to_test = optimiser_class._get_repo_format_to_test()
1789 if format_to_test is not None:
1790- result.append((optimiser_class,
1791- format_to_test, format_to_test))
1792+ add_combo(optimiser_class.__name__, format_to_test, format_to_test)
1793 # if there are specific combinations we want to use, we can add them
1794 # here. We want to test rich root upgrading.
1795- result.append((InterRepository,
1796- weaverepo.RepositoryFormat5(),
1797- knitrepo.RepositoryFormatKnit3()))
1798- result.append((InterRepository,
1799- knitrepo.RepositoryFormatKnit1(),
1800- knitrepo.RepositoryFormatKnit3()))
1801- result.append((InterRepository,
1802- knitrepo.RepositoryFormatKnit1(),
1803- knitrepo.RepositoryFormatKnit3()))
1804- result.append((InterKnitRepo,
1805- knitrepo.RepositoryFormatKnit1(),
1806- pack_repo.RepositoryFormatKnitPack1()))
1807- result.append((InterKnitRepo,
1808- pack_repo.RepositoryFormatKnitPack1(),
1809- knitrepo.RepositoryFormatKnit1()))
1810- result.append((InterKnitRepo,
1811- knitrepo.RepositoryFormatKnit3(),
1812- pack_repo.RepositoryFormatKnitPack3()))
1813- result.append((InterKnitRepo,
1814- pack_repo.RepositoryFormatKnitPack3(),
1815- knitrepo.RepositoryFormatKnit3()))
1816- result.append((InterKnitRepo,
1817- pack_repo.RepositoryFormatKnitPack3(),
1818- pack_repo.RepositoryFormatKnitPack4()))
1819- result.append((InterDifferingSerializer,
1820- pack_repo.RepositoryFormatKnitPack1(),
1821- pack_repo.RepositoryFormatKnitPack6RichRoot()))
1822- result.append((InterDifferingSerializer,
1823- pack_repo.RepositoryFormatKnitPack6RichRoot(),
1824- groupcompress_repo.RepositoryFormat2a()))
1825+ # XXX: although we attach InterRepository class names to these scenarios,
1826+ # there's nothing asserting that these labels correspond to what is
1827+ # actually used.
1828+ add_combo('InterRepository',
1829+ weaverepo.RepositoryFormat5(),
1830+ knitrepo.RepositoryFormatKnit3())
1831+ add_combo('InterRepository',
1832+ knitrepo.RepositoryFormatKnit1(),
1833+ knitrepo.RepositoryFormatKnit3())
1834+ add_combo('InterKnitRepo',
1835+ knitrepo.RepositoryFormatKnit1(),
1836+ pack_repo.RepositoryFormatKnitPack1())
1837+ add_combo('InterKnitRepo',
1838+ pack_repo.RepositoryFormatKnitPack1(),
1839+ knitrepo.RepositoryFormatKnit1())
1840+ add_combo('InterKnitRepo',
1841+ knitrepo.RepositoryFormatKnit3(),
1842+ pack_repo.RepositoryFormatKnitPack3())
1843+ add_combo('InterKnitRepo',
1844+ pack_repo.RepositoryFormatKnitPack3(),
1845+ knitrepo.RepositoryFormatKnit3())
1846+ add_combo('InterKnitRepo',
1847+ pack_repo.RepositoryFormatKnitPack3(),
1848+ pack_repo.RepositoryFormatKnitPack4())
1849+ add_combo('InterDifferingSerializer',
1850+ pack_repo.RepositoryFormatKnitPack1(),
1851+ pack_repo.RepositoryFormatKnitPack6RichRoot())
1852+ add_combo('InterDifferingSerializer',
1853+ pack_repo.RepositoryFormatKnitPack6RichRoot(),
1854+ groupcompress_repo.RepositoryFormat2a())
1855+ add_combo('InterDifferingSerializer',
1856+ groupcompress_repo.RepositoryFormat2a(),
1857+ pack_repo.RepositoryFormatKnitPack6RichRoot())
1858+ add_combo('InterRepository',
1859+ groupcompress_repo.RepositoryFormatCHK2(),
1860+ groupcompress_repo.RepositoryFormat2a())
1861+ add_combo('InterDifferingSerializer',
1862+ groupcompress_repo.RepositoryFormatCHK1(),
1863+ groupcompress_repo.RepositoryFormat2a())
1864 return result
1865
1866
1867
1868=== modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
1869--- bzrlib/tests/per_interrepository/test_fetch.py 2009-08-12 02:21:06 +0000
1870+++ bzrlib/tests/per_interrepository/test_fetch.py 2009-08-14 05:35:33 +0000
1871@@ -28,6 +28,9 @@
1872 from bzrlib.errors import (
1873 NoSuchRevision,
1874 )
1875+from bzrlib.graph import (
1876+ SearchResult,
1877+ )
1878 from bzrlib.revision import (
1879 NULL_REVISION,
1880 Revision,
1881@@ -124,6 +127,15 @@
1882 to_repo.texts.get_record_stream([('foo', revid)],
1883 'unordered', True).next().get_bytes_as('fulltext'))
1884
1885+ def test_fetch_parent_inventories_at_stacking_boundary_smart(self):
1886+ self.setup_smart_server_with_call_log()
1887+ self.test_fetch_parent_inventories_at_stacking_boundary()
1888+
1889+ def test_fetch_parent_inventories_at_stacking_boundary_smart_old(self):
1890+ self.setup_smart_server_with_call_log()
1891+ self.disable_verb('Repository.insert_stream_1.19')
1892+ self.test_fetch_parent_inventories_at_stacking_boundary()
1893+
1894 def test_fetch_parent_inventories_at_stacking_boundary(self):
1895 """Fetch to a stacked branch copies inventories for parents of
1896 revisions at the stacking boundary.
1897@@ -180,6 +192,28 @@
1898 self.assertEqual(left_tree.inventory, stacked_left_tree.inventory)
1899 self.assertEqual(right_tree.inventory, stacked_right_tree.inventory)
1900
1901+ # Finally, it's not enough to see that the basis inventories are
1902+ # present. The texts introduced in merge (and only those) should be
1903+ # present, and also generating a stream should succeed without blowing
1904+ # up.
1905+ self.assertTrue(unstacked_repo.has_revision('merge'))
1906+ expected_texts = set([('file-id', 'merge')])
1907+ if stacked_branch.repository.texts.get_parent_map([('root-id',
1908+ 'merge')]):
1909+ # If a (root-id,merge) text exists, it should be in the stacked
1910+ # repo.
1911+ expected_texts.add(('root-id', 'merge'))
1912+ self.assertEqual(expected_texts, unstacked_repo.texts.keys())
1913+ self.assertCanStreamRevision(unstacked_repo, 'merge')
1914+
1915+ def assertCanStreamRevision(self, repo, revision_id):
1916+ exclude_keys = set(repo.all_revision_ids()) - set([revision_id])
1917+ search = SearchResult([revision_id], exclude_keys, 1, [revision_id])
1918+ source = repo._get_source(repo._format)
1919+ for substream_kind, substream in source.get_stream(search):
1920+ # Consume the substream
1921+ list(substream)
1922+
1923 def test_fetch_across_stacking_boundary_ignores_ghost(self):
1924 if not self.repository_format_to.supports_external_lookups:
1925 raise TestNotApplicable("Need stacking support in the target.")
1926@@ -218,6 +252,19 @@
1927 self.addCleanup(stacked_branch.unlock)
1928 stacked_second_tree = stacked_branch.repository.revision_tree('second')
1929 self.assertEqual(second_tree.inventory, stacked_second_tree.inventory)
1930+ # Finally, it's not enough to see that the basis inventories are
1931+ # present. The texts introduced in merge (and only those) should be
1932+ # present, and also generating a stream should succeed without blowing
1933+ # up.
1934+ self.assertTrue(unstacked_repo.has_revision('third'))
1935+ expected_texts = set([('file-id', 'third')])
1936+ if stacked_branch.repository.texts.get_parent_map([('root-id',
1937+ 'third')]):
1938+ # If a (root-id,third) text exists, it should be in the stacked
1939+ # repo.
1940+ expected_texts.add(('root-id', 'third'))
1941+ self.assertEqual(expected_texts, unstacked_repo.texts.keys())
1942+ self.assertCanStreamRevision(unstacked_repo, 'third')
1943
1944 def test_fetch_missing_basis_text(self):
1945 """If fetching a delta, we should die if a basis is not present."""
1946
1947=== modified file 'bzrlib/tests/per_pack_repository.py'
1948--- bzrlib/tests/per_pack_repository.py 2009-08-12 22:28:28 +0000
1949+++ bzrlib/tests/per_pack_repository.py 2009-08-14 05:35:32 +0000
1950@@ -1051,8 +1051,8 @@
1951 tree.branch.push(remote_branch)
1952 autopack_calls = len([call for call in self.hpss_calls if call ==
1953 'PackRepository.autopack'])
1954- streaming_calls = len([call for call in self.hpss_calls if call ==
1955- 'Repository.insert_stream'])
1956+ streaming_calls = len([call for call in self.hpss_calls if call in
1957+ ('Repository.insert_stream', 'Repository.insert_stream_1.19')])
1958 if autopack_calls:
1959 # Non streaming server
1960 self.assertEqual(1, autopack_calls)
1961
1962=== modified file 'bzrlib/tests/test_inventory_delta.py'
1963--- bzrlib/tests/test_inventory_delta.py 2009-04-02 05:53:12 +0000
1964+++ bzrlib/tests/test_inventory_delta.py 2009-08-14 05:35:32 +0000
1965@@ -26,6 +26,7 @@
1966 inventory,
1967 inventory_delta,
1968 )
1969+from bzrlib.inventory_delta import InventoryDeltaError
1970 from bzrlib.inventory import Inventory
1971 from bzrlib.revision import NULL_REVISION
1972 from bzrlib.tests import TestCase
1973@@ -68,7 +69,7 @@
1974 version: entry-version
1975 versioned_root: false
1976 tree_references: false
1977-None\x00/\x00TREE_ROOT\x00\x00null:\x00dir
1978+None\x00/\x00TREE_ROOT\x00\x00entry-version\x00dir
1979 """
1980
1981 reference_lines = """format: bzr inventory delta v1 (bzr 1.14)
1982@@ -93,39 +94,34 @@
1983 """Test InventoryDeltaSerializer.parse_text_bytes."""
1984
1985 def test_parse_no_bytes(self):
1986- serializer = inventory_delta.InventoryDeltaSerializer(
1987- versioned_root=True, tree_references=True)
1988+ deserializer = inventory_delta.InventoryDeltaDeserializer()
1989 err = self.assertRaises(
1990- errors.BzrError, serializer.parse_text_bytes, '')
1991- self.assertContainsRe(str(err), 'unknown format')
1992+ InventoryDeltaError, deserializer.parse_text_bytes, '')
1993+ self.assertContainsRe(str(err), 'last line not empty')
1994
1995 def test_parse_bad_format(self):
1996- serializer = inventory_delta.InventoryDeltaSerializer(
1997- versioned_root=True, tree_references=True)
1998- err = self.assertRaises(errors.BzrError,
1999- serializer.parse_text_bytes, 'format: foo\n')
2000+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2001+ err = self.assertRaises(InventoryDeltaError,
2002+ deserializer.parse_text_bytes, 'format: foo\n')
2003 self.assertContainsRe(str(err), 'unknown format')
2004
2005 def test_parse_no_parent(self):
2006- serializer = inventory_delta.InventoryDeltaSerializer(
2007- versioned_root=True, tree_references=True)
2008- err = self.assertRaises(errors.BzrError,
2009- serializer.parse_text_bytes,
2010+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2011+ err = self.assertRaises(InventoryDeltaError,
2012+ deserializer.parse_text_bytes,
2013 'format: bzr inventory delta v1 (bzr 1.14)\n')
2014 self.assertContainsRe(str(err), 'missing parent: marker')
2015
2016 def test_parse_no_version(self):
2017- serializer = inventory_delta.InventoryDeltaSerializer(
2018- versioned_root=True, tree_references=True)
2019- err = self.assertRaises(errors.BzrError,
2020- serializer.parse_text_bytes,
2021+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2022+ err = self.assertRaises(InventoryDeltaError,
2023+ deserializer.parse_text_bytes,
2024 'format: bzr inventory delta v1 (bzr 1.14)\n'
2025 'parent: null:\n')
2026 self.assertContainsRe(str(err), 'missing version: marker')
2027
2028 def test_parse_duplicate_key_errors(self):
2029- serializer = inventory_delta.InventoryDeltaSerializer(
2030- versioned_root=True, tree_references=True)
2031+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2032 double_root_lines = \
2033 """format: bzr inventory delta v1 (bzr 1.14)
2034 parent: null:
2035@@ -135,24 +131,23 @@
2036 None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
2037 None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
2038 """
2039- err = self.assertRaises(errors.BzrError,
2040- serializer.parse_text_bytes, double_root_lines)
2041+ err = self.assertRaises(InventoryDeltaError,
2042+ deserializer.parse_text_bytes, double_root_lines)
2043 self.assertContainsRe(str(err), 'duplicate file id')
2044
2045 def test_parse_versioned_root_only(self):
2046- serializer = inventory_delta.InventoryDeltaSerializer(
2047- versioned_root=True, tree_references=True)
2048- parse_result = serializer.parse_text_bytes(root_only_lines)
2049+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2050+ parse_result = deserializer.parse_text_bytes(root_only_lines)
2051 expected_entry = inventory.make_entry(
2052 'directory', u'', None, 'an-id')
2053 expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'
2054 self.assertEqual(
2055- ('null:', 'entry-version', [(None, '/', 'an-id', expected_entry)]),
2056+ ('null:', 'entry-version', True, True,
2057+ [(None, '', 'an-id', expected_entry)]),
2058 parse_result)
2059
2060 def test_parse_special_revid_not_valid_last_mod(self):
2061- serializer = inventory_delta.InventoryDeltaSerializer(
2062- versioned_root=False, tree_references=True)
2063+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2064 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
2065 parent: null:
2066 version: null:
2067@@ -160,13 +155,12 @@
2068 tree_references: true
2069 None\x00/\x00TREE_ROOT\x00\x00null:\x00dir\x00\x00
2070 """
2071- err = self.assertRaises(errors.BzrError,
2072- serializer.parse_text_bytes, root_only_lines)
2073+ err = self.assertRaises(InventoryDeltaError,
2074+ deserializer.parse_text_bytes, root_only_lines)
2075 self.assertContainsRe(str(err), 'special revisionid found')
2076
2077 def test_parse_versioned_root_versioned_disabled(self):
2078- serializer = inventory_delta.InventoryDeltaSerializer(
2079- versioned_root=False, tree_references=True)
2080+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2081 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
2082 parent: null:
2083 version: null:
2084@@ -174,39 +168,134 @@
2085 tree_references: true
2086 None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
2087 """
2088- err = self.assertRaises(errors.BzrError,
2089- serializer.parse_text_bytes, root_only_lines)
2090+ err = self.assertRaises(InventoryDeltaError,
2091+ deserializer.parse_text_bytes, root_only_lines)
2092 self.assertContainsRe(str(err), 'Versioned root found')
2093
2094 def test_parse_unique_root_id_root_versioned_disabled(self):
2095- serializer = inventory_delta.InventoryDeltaSerializer(
2096- versioned_root=False, tree_references=True)
2097+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2098 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
2099-parent: null:
2100-version: null:
2101+parent: parent-id
2102+version: a@e\xc3\xa5ample.com--2004
2103 versioned_root: false
2104 tree_references: true
2105-None\x00/\x00an-id\x00\x00null:\x00dir\x00\x00
2106+None\x00/\x00an-id\x00\x00parent-id\x00dir\x00\x00
2107 """
2108- err = self.assertRaises(errors.BzrError,
2109- serializer.parse_text_bytes, root_only_lines)
2110+ err = self.assertRaises(InventoryDeltaError,
2111+ deserializer.parse_text_bytes, root_only_lines)
2112 self.assertContainsRe(str(err), 'Versioned root found')
2113
2114 def test_parse_unversioned_root_versioning_enabled(self):
2115- serializer = inventory_delta.InventoryDeltaSerializer(
2116- versioned_root=True, tree_references=True)
2117- err = self.assertRaises(errors.BzrError,
2118- serializer.parse_text_bytes, root_only_unversioned)
2119- self.assertContainsRe(
2120- str(err), 'serialized versioned_root flag is wrong: False')
2121+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2122+ parse_result = deserializer.parse_text_bytes(root_only_unversioned)
2123+ expected_entry = inventory.make_entry(
2124+ 'directory', u'', None, 'TREE_ROOT')
2125+ expected_entry.revision = 'entry-version'
2126+ self.assertEqual(
2127+ ('null:', 'entry-version', False, False,
2128+ [(None, u'', 'TREE_ROOT', expected_entry)]),
2129+ parse_result)
2130+
2131+ def test_parse_versioned_root_when_disabled(self):
2132+ deserializer = inventory_delta.InventoryDeltaDeserializer(
2133+ allow_versioned_root=False)
2134+ err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
2135+ deserializer.parse_text_bytes, root_only_lines)
2136+ self.assertEquals("versioned_root not allowed", str(err))
2137
2138 def test_parse_tree_when_disabled(self):
2139- serializer = inventory_delta.InventoryDeltaSerializer(
2140- versioned_root=True, tree_references=False)
2141- err = self.assertRaises(errors.BzrError,
2142- serializer.parse_text_bytes, reference_lines)
2143- self.assertContainsRe(
2144- str(err), 'serialized tree_references flag is wrong: True')
2145+ deserializer = inventory_delta.InventoryDeltaDeserializer(
2146+ allow_tree_references=False)
2147+ err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
2148+ deserializer.parse_text_bytes, reference_lines)
2149+ self.assertEquals("Tree reference not allowed", str(err))
2150+
2151+ def test_parse_tree_when_header_disallows(self):
2152+ # A deserializer that allows tree_references to be set or unset.
2153+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2154+ # A serialised inventory delta with a header saying no tree refs, but
2155+ # that has a tree ref in its content.
2156+ lines = """format: bzr inventory delta v1 (bzr 1.14)
2157+parent: null:
2158+version: entry-version
2159+versioned_root: false
2160+tree_references: false
2161+None\x00/foo\x00id\x00TREE_ROOT\x00changed\x00tree\x00subtree-version
2162+"""
2163+ err = self.assertRaises(InventoryDeltaError,
2164+ deserializer.parse_text_bytes, lines)
2165+ self.assertContainsRe(str(err), 'Tree reference found')
2166+
2167+ def test_parse_versioned_root_when_header_disallows(self):
2168+ # A deserializer that allows versioned_root to be set or unset.
2169+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2170+ # A serialised inventory delta with a header saying no versioned root,
2171+ # but that has a versioned root entry in its content.
2172+ lines = """format: bzr inventory delta v1 (bzr 1.14)
2173+parent: null:
2174+version: entry-version
2175+versioned_root: false
2176+tree_references: false
2177+None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir
2178+"""
2179+ err = self.assertRaises(InventoryDeltaError,
2180+ deserializer.parse_text_bytes, lines)
2181+ self.assertContainsRe(str(err), 'Versioned root found')
2182+
2183+ def test_parse_last_line_not_empty(self):
2184+ """newpath must start with / if it is not None."""
2185+ # Trim the trailing newline from a valid serialization
2186+ lines = root_only_lines[:-1]
2187+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2188+ err = self.assertRaises(InventoryDeltaError,
2189+ deserializer.parse_text_bytes, lines)
2190+ self.assertContainsRe(str(err), 'last line not empty')
2191+
2192+ def test_parse_invalid_newpath(self):
2193+ """newpath must start with / if it is not None."""
2194+ lines = empty_lines
2195+ lines += "None\x00bad\x00TREE_ROOT\x00\x00version\x00dir\n"
2196+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2197+ err = self.assertRaises(InventoryDeltaError,
2198+ deserializer.parse_text_bytes, lines)
2199+ self.assertContainsRe(str(err), 'newpath invalid')
2200+
2201+ def test_parse_invalid_oldpath(self):
2202+ """oldpath must start with / if it is not None."""
2203+ lines = root_only_lines
2204+ lines += "bad\x00/new\x00file-id\x00\x00version\x00dir\n"
2205+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2206+ err = self.assertRaises(InventoryDeltaError,
2207+ deserializer.parse_text_bytes, lines)
2208+ self.assertContainsRe(str(err), 'oldpath invalid')
2209+
2210+ def test_parse_new_file(self):
2211+ """a new file is parsed correctly"""
2212+ lines = root_only_lines
2213+ fake_sha = "deadbeef" * 5
2214+ lines += (
2215+ "None\x00/new\x00file-id\x00an-id\x00version\x00file\x00123\x00" +
2216+ "\x00" + fake_sha + "\n")
2217+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2218+ parse_result = deserializer.parse_text_bytes(lines)
2219+ expected_entry = inventory.make_entry(
2220+ 'file', u'new', 'an-id', 'file-id')
2221+ expected_entry.revision = 'version'
2222+ expected_entry.text_size = 123
2223+ expected_entry.text_sha1 = fake_sha
2224+ delta = parse_result[4]
2225+ self.assertEqual(
2226+ (None, u'new', 'file-id', expected_entry), delta[-1])
2227+
2228+ def test_parse_delete(self):
2229+ lines = root_only_lines
2230+ lines += (
2231+ "/old-file\x00None\x00deleted-id\x00\x00null:\x00deleted\x00\x00\n")
2232+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2233+ parse_result = deserializer.parse_text_bytes(lines)
2234+ delta = parse_result[4]
2235+ self.assertEqual(
2236+ (u'old-file', None, 'deleted-id', None), delta[-1])
2237
2238
2239 class TestSerialization(TestCase):
2240@@ -237,12 +326,20 @@
2241 old_inv = Inventory(None)
2242 new_inv = Inventory(None)
2243 root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
2244+ # Implicit roots are considered modified in every revision.
2245+ root.revision = 'entry-version'
2246 new_inv.add(root)
2247 delta = new_inv._make_delta(old_inv)
2248 serializer = inventory_delta.InventoryDeltaSerializer(
2249 versioned_root=False, tree_references=False)
2250+ serialized_lines = serializer.delta_to_lines(
2251+ NULL_REVISION, 'entry-version', delta)
2252 self.assertEqual(StringIO(root_only_unversioned).readlines(),
2253- serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
2254+ serialized_lines)
2255+ deserializer = inventory_delta.InventoryDeltaDeserializer()
2256+ self.assertEqual(
2257+ (NULL_REVISION, 'entry-version', False, False, delta),
2258+ deserializer.parse_text_bytes(''.join(serialized_lines)))
2259
2260 def test_unversioned_non_root_errors(self):
2261 old_inv = Inventory(None)
2262@@ -255,7 +352,7 @@
2263 delta = new_inv._make_delta(old_inv)
2264 serializer = inventory_delta.InventoryDeltaSerializer(
2265 versioned_root=True, tree_references=True)
2266- err = self.assertRaises(errors.BzrError,
2267+ err = self.assertRaises(InventoryDeltaError,
2268 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
2269 self.assertEqual(str(err), 'no version for fileid id')
2270
2271@@ -267,7 +364,7 @@
2272 delta = new_inv._make_delta(old_inv)
2273 serializer = inventory_delta.InventoryDeltaSerializer(
2274 versioned_root=True, tree_references=True)
2275- err = self.assertRaises(errors.BzrError,
2276+ err = self.assertRaises(InventoryDeltaError,
2277 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
2278 self.assertEqual(str(err), 'no version for fileid TREE_ROOT')
2279
2280@@ -280,22 +377,9 @@
2281 delta = new_inv._make_delta(old_inv)
2282 serializer = inventory_delta.InventoryDeltaSerializer(
2283 versioned_root=False, tree_references=True)
2284- err = self.assertRaises(errors.BzrError,
2285- serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
2286- self.assertEqual(str(err), 'Version present for / in TREE_ROOT')
2287-
2288- def test_nonrichroot_non_TREE_ROOT_id_errors(self):
2289- old_inv = Inventory(None)
2290- new_inv = Inventory(None)
2291- root = new_inv.make_entry('directory', '', None, 'my-rich-root-id')
2292- new_inv.add(root)
2293- delta = new_inv._make_delta(old_inv)
2294- serializer = inventory_delta.InventoryDeltaSerializer(
2295- versioned_root=False, tree_references=True)
2296- err = self.assertRaises(errors.BzrError,
2297- serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
2298- self.assertEqual(
2299- str(err), 'file_id my-rich-root-id is not TREE_ROOT for /')
2300+ err = self.assertRaises(InventoryDeltaError,
2301+ serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
2302+ self.assertStartsWith(str(err), 'Version present for / in TREE_ROOT')
2303
2304 def test_unknown_kind_errors(self):
2305 old_inv = Inventory(None)
2306@@ -354,19 +438,22 @@
2307 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
2308
2309 def test_to_inventory_root_id_versioned_not_permitted(self):
2310- delta = [(None, '/', 'TREE_ROOT', inventory.make_entry(
2311- 'directory', '', None, 'TREE_ROOT'))]
2312- serializer = inventory_delta.InventoryDeltaSerializer(False, True)
2313+ root_entry = inventory.make_entry('directory', '', None, 'TREE_ROOT')
2314+ root_entry.revision = 'some-version'
2315+ delta = [(None, '', 'TREE_ROOT', root_entry)]
2316+ serializer = inventory_delta.InventoryDeltaSerializer(
2317+ versioned_root=False, tree_references=True)
2318 self.assertRaises(
2319- errors.BzrError, serializer.delta_to_lines, 'old-version',
2320+ InventoryDeltaError, serializer.delta_to_lines, 'old-version',
2321 'new-version', delta)
2322
2323 def test_to_inventory_root_id_not_versioned(self):
2324- delta = [(None, '/', 'an-id', inventory.make_entry(
2325+ delta = [(None, '', 'an-id', inventory.make_entry(
2326 'directory', '', None, 'an-id'))]
2327- serializer = inventory_delta.InventoryDeltaSerializer(True, True)
2328+ serializer = inventory_delta.InventoryDeltaSerializer(
2329+ versioned_root=True, tree_references=True)
2330 self.assertRaises(
2331- errors.BzrError, serializer.delta_to_lines, 'old-version',
2332+ InventoryDeltaError, serializer.delta_to_lines, 'old-version',
2333 'new-version', delta)
2334
2335 def test_to_inventory_has_tree_not_meant_to(self):
2336@@ -374,13 +461,14 @@
2337 tree_ref = make_entry('tree-reference', 'foo', 'changed-in', 'ref-id')
2338 tree_ref.reference_revision = 'ref-revision'
2339 delta = [
2340- (None, '/', 'an-id',
2341+ (None, '', 'an-id',
2342 make_entry('directory', '', 'changed-in', 'an-id')),
2343- (None, '/foo', 'ref-id', tree_ref)
2344+ (None, 'foo', 'ref-id', tree_ref)
2345 # a file that followed the root move
2346 ]
2347- serializer = inventory_delta.InventoryDeltaSerializer(True, True)
2348- self.assertRaises(errors.BzrError, serializer.delta_to_lines,
2349+ serializer = inventory_delta.InventoryDeltaSerializer(
2350+ versioned_root=True, tree_references=True)
2351+ self.assertRaises(InventoryDeltaError, serializer.delta_to_lines,
2352 'old-version', 'new-version', delta)
2353
2354 def test_to_inventory_torture(self):
2355@@ -430,7 +518,8 @@
2356 executable=True, text_size=30, text_sha1='some-sha',
2357 revision='old-rev')),
2358 ]
2359- serializer = inventory_delta.InventoryDeltaSerializer(True, True)
2360+ serializer = inventory_delta.InventoryDeltaSerializer(
2361+ versioned_root=True, tree_references=True)
2362 lines = serializer.delta_to_lines(NULL_REVISION, 'something', delta)
2363 expected = """format: bzr inventory delta v1 (bzr 1.14)
2364 parent: null:
2365@@ -483,13 +572,13 @@
2366 def test_file_without_size(self):
2367 file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
2368 file_entry.text_sha1 = 'foo'
2369- self.assertRaises(errors.BzrError,
2370+ self.assertRaises(InventoryDeltaError,
2371 inventory_delta._file_content, file_entry)
2372
2373 def test_file_without_sha1(self):
2374 file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
2375 file_entry.text_size = 10
2376- self.assertRaises(errors.BzrError,
2377+ self.assertRaises(InventoryDeltaError,
2378 inventory_delta._file_content, file_entry)
2379
2380 def test_link_empty_target(self):
2381@@ -512,7 +601,7 @@
2382
2383 def test_link_no_target(self):
2384 entry = inventory.make_entry('symlink', 'a link', None)
2385- self.assertRaises(errors.BzrError,
2386+ self.assertRaises(InventoryDeltaError,
2387 inventory_delta._link_content, entry)
2388
2389 def test_reference_null(self):
2390@@ -529,5 +618,5 @@
2391
2392 def test_reference_no_reference(self):
2393 entry = inventory.make_entry('tree-reference', 'a tree', None)
2394- self.assertRaises(errors.BzrError,
2395+ self.assertRaises(InventoryDeltaError,
2396 inventory_delta._reference_content, entry)
2397
2398=== modified file 'bzrlib/tests/test_remote.py'
2399--- bzrlib/tests/test_remote.py 2009-08-11 05:26:57 +0000
2400+++ bzrlib/tests/test_remote.py 2009-08-14 05:35:32 +0000
2401@@ -31,6 +31,8 @@
2402 config,
2403 errors,
2404 graph,
2405+ inventory,
2406+ inventory_delta,
2407 pack,
2408 remote,
2409 repository,
2410@@ -38,6 +40,7 @@
2411 tests,
2412 treebuilder,
2413 urlutils,
2414+ versionedfile,
2415 )
2416 from bzrlib.branch import Branch
2417 from bzrlib.bzrdir import BzrDir, BzrDirFormat
2418@@ -332,15 +335,6 @@
2419 reference_bzrdir_format = bzrdir.format_registry.get('default')()
2420 return reference_bzrdir_format.repository_format
2421
2422- def disable_verb(self, verb):
2423- """Disable a verb for one test."""
2424- request_handlers = smart.request.request_handlers
2425- orig_method = request_handlers.get(verb)
2426- request_handlers.remove(verb)
2427- def restoreVerb():
2428- request_handlers.register(verb, orig_method)
2429- self.addCleanup(restoreVerb)
2430-
2431 def assertFinished(self, fake_client):
2432 """Assert that all of a FakeClient's expected calls have occurred."""
2433 fake_client.finished_test()
2434@@ -2219,63 +2213,219 @@
2435 self.assertEqual([], client._calls)
2436
2437
2438-class TestRepositoryInsertStream(TestRemoteRepository):
2439-
2440- def test_unlocked_repo(self):
2441- transport_path = 'quack'
2442- repo, client = self.setup_fake_client_and_repository(transport_path)
2443- client.add_expected_call(
2444- 'Repository.insert_stream', ('quack/', ''),
2445- 'success', ('ok',))
2446- client.add_expected_call(
2447- 'Repository.insert_stream', ('quack/', ''),
2448- 'success', ('ok',))
2449- sink = repo._get_sink()
2450- fmt = repository.RepositoryFormat.get_default_format()
2451- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2452- self.assertEqual([], resume_tokens)
2453- self.assertEqual(set(), missing_keys)
2454- self.assertFinished(client)
2455-
2456- def test_locked_repo_with_no_lock_token(self):
2457- transport_path = 'quack'
2458- repo, client = self.setup_fake_client_and_repository(transport_path)
2459- client.add_expected_call(
2460- 'Repository.lock_write', ('quack/', ''),
2461- 'success', ('ok', ''))
2462- client.add_expected_call(
2463- 'Repository.insert_stream', ('quack/', ''),
2464- 'success', ('ok',))
2465- client.add_expected_call(
2466- 'Repository.insert_stream', ('quack/', ''),
2467- 'success', ('ok',))
2468- repo.lock_write()
2469- sink = repo._get_sink()
2470- fmt = repository.RepositoryFormat.get_default_format()
2471- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2472- self.assertEqual([], resume_tokens)
2473- self.assertEqual(set(), missing_keys)
2474- self.assertFinished(client)
2475-
2476- def test_locked_repo_with_lock_token(self):
2477- transport_path = 'quack'
2478- repo, client = self.setup_fake_client_and_repository(transport_path)
2479- client.add_expected_call(
2480- 'Repository.lock_write', ('quack/', ''),
2481- 'success', ('ok', 'a token'))
2482- client.add_expected_call(
2483- 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2484- 'success', ('ok',))
2485- client.add_expected_call(
2486- 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2487- 'success', ('ok',))
2488- repo.lock_write()
2489- sink = repo._get_sink()
2490- fmt = repository.RepositoryFormat.get_default_format()
2491- resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2492- self.assertEqual([], resume_tokens)
2493- self.assertEqual(set(), missing_keys)
2494- self.assertFinished(client)
2495+class TestRepositoryInsertStreamBase(TestRemoteRepository):
2496+ """Base class for Repository.insert_stream and .insert_stream_1.19
2497+ tests.
2498+ """
2499+
2500+ def checkInsertEmptyStream(self, repo, client):
2501+ """Insert an empty stream, checking the result.
2502+
2503+ This checks that there are no resume_tokens or missing_keys, and that
2504+ the client is finished.
2505+ """
2506+ sink = repo._get_sink()
2507+ fmt = repository.RepositoryFormat.get_default_format()
2508+ resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
2509+ self.assertEqual([], resume_tokens)
2510+ self.assertEqual(set(), missing_keys)
2511+ self.assertFinished(client)
2512+
2513+
2514+class TestRepositoryInsertStream(TestRepositoryInsertStreamBase):
2515+ """Tests for using Repository.insert_stream verb when the _1.19 variant is
2516+ not available.
2517+
2518+ This test case is very similar to TestRepositoryInsertStream_1_19.
2519+ """
2520+
2521+ def setUp(self):
2522+ TestRemoteRepository.setUp(self)
2523+ self.disable_verb('Repository.insert_stream_1.19')
2524+
2525+ def test_unlocked_repo(self):
2526+ transport_path = 'quack'
2527+ repo, client = self.setup_fake_client_and_repository(transport_path)
2528+ client.add_expected_call(
2529+ 'Repository.insert_stream_1.19', ('quack/', ''),
2530+ 'unknown', ('Repository.insert_stream_1.19',))
2531+ client.add_expected_call(
2532+ 'Repository.insert_stream', ('quack/', ''),
2533+ 'success', ('ok',))
2534+ client.add_expected_call(
2535+ 'Repository.insert_stream', ('quack/', ''),
2536+ 'success', ('ok',))
2537+ self.checkInsertEmptyStream(repo, client)
2538+
2539+ def test_locked_repo_with_no_lock_token(self):
2540+ transport_path = 'quack'
2541+ repo, client = self.setup_fake_client_and_repository(transport_path)
2542+ client.add_expected_call(
2543+ 'Repository.lock_write', ('quack/', ''),
2544+ 'success', ('ok', ''))
2545+ client.add_expected_call(
2546+ 'Repository.insert_stream_1.19', ('quack/', ''),
2547+ 'unknown', ('Repository.insert_stream_1.19',))
2548+ client.add_expected_call(
2549+ 'Repository.insert_stream', ('quack/', ''),
2550+ 'success', ('ok',))
2551+ client.add_expected_call(
2552+ 'Repository.insert_stream', ('quack/', ''),
2553+ 'success', ('ok',))
2554+ repo.lock_write()
2555+ self.checkInsertEmptyStream(repo, client)
2556+
2557+ def test_locked_repo_with_lock_token(self):
2558+ transport_path = 'quack'
2559+ repo, client = self.setup_fake_client_and_repository(transport_path)
2560+ client.add_expected_call(
2561+ 'Repository.lock_write', ('quack/', ''),
2562+ 'success', ('ok', 'a token'))
2563+ client.add_expected_call(
2564+ 'Repository.insert_stream_1.19', ('quack/', '', 'a token'),
2565+ 'unknown', ('Repository.insert_stream_1.19',))
2566+ client.add_expected_call(
2567+ 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2568+ 'success', ('ok',))
2569+ client.add_expected_call(
2570+ 'Repository.insert_stream_locked', ('quack/', '', 'a token'),
2571+ 'success', ('ok',))
2572+ repo.lock_write()
2573+ self.checkInsertEmptyStream(repo, client)
2574+
2575+ def test_stream_with_inventory_deltas(self):
2576+ """'inventory-deltas' substreams cannot be sent to the
2577+ Repository.insert_stream verb, because not all servers that implement
2578+ that verb will accept them. So when one is encountered the RemoteSink
2579+ immediately stops using that verb and falls back to VFS insert_stream.
2580+ """
2581+ transport_path = 'quack'
2582+ repo, client = self.setup_fake_client_and_repository(transport_path)
2583+ client.add_expected_call(
2584+ 'Repository.insert_stream_1.19', ('quack/', ''),
2585+ 'unknown', ('Repository.insert_stream_1.19',))
2586+ client.add_expected_call(
2587+ 'Repository.insert_stream', ('quack/', ''),
2588+ 'success', ('ok',))
2589+ client.add_expected_call(
2590+ 'Repository.insert_stream', ('quack/', ''),
2591+ 'success', ('ok',))
2592+ # Create a fake real repository for insert_stream to fall back on, so
2593+ # that we can directly see the records the RemoteSink passes to the
2594+ # real sink.
2595+ class FakeRealSink:
2596+ def __init__(self):
2597+ self.records = []
2598+ def insert_stream(self, stream, src_format, resume_tokens):
2599+ for substream_kind, substream in stream:
2600+ self.records.append(
2601+ (substream_kind, [record.key for record in substream]))
2602+ return ['fake tokens'], ['fake missing keys']
2603+ fake_real_sink = FakeRealSink()
2604+ class FakeRealRepository:
2605+ def _get_sink(self):
2606+ return fake_real_sink
2607+ repo._real_repository = FakeRealRepository()
2608+ sink = repo._get_sink()
2609+ fmt = repository.RepositoryFormat.get_default_format()
2610+ stream = self.make_stream_with_inv_deltas(fmt)
2611+ resume_tokens, missing_keys = sink.insert_stream(stream, fmt, [])
2612+ # Every record from the first inventory delta should have been sent to
2613+ # the VFS sink.
2614+ expected_records = [
2615+ ('inventory-deltas', [('rev2',), ('rev3',)]),
2616+ ('texts', [('some-rev', 'some-file')])]
2617+ self.assertEqual(expected_records, fake_real_sink.records)
2618+ # The return values from the real sink's insert_stream are propagated
2619+ # back to the original caller.
2620+ self.assertEqual(['fake tokens'], resume_tokens)
2621+ self.assertEqual(['fake missing keys'], missing_keys)
2622+ self.assertFinished(client)
2623+
2624+ def make_stream_with_inv_deltas(self, fmt):
2625+ """Make a simple stream with an inventory delta followed by more
2626+ records and more substreams to test that all records and substreams
2627+ from that point on are used.
2628+
2629+ This sends, in order:
2630+ * inventories substream: rev1 (a fulltext).
2631+ * inventory-deltas substream: rev2, rev3 (deltas against rev1).
2632+ * texts substream: (some-rev, some-file)
2633+ """
2634+ # Define a stream using generators so that it isn't rewindable.
2635+ inv = inventory.Inventory(revision_id='rev1')
2636+ def stream_with_inv_delta():
2637+ yield ('inventories', inventories_substream())
2638+ yield ('inventory-deltas', inventory_delta_substream())
2639+ yield ('texts', [
2640+ versionedfile.FulltextContentFactory(
2641+ ('some-rev', 'some-file'), (), None, 'content')])
2642+ def inventories_substream():
2643+ # An empty inventory fulltext. This will be streamed normally.
2644+ text = fmt._serializer.write_inventory_to_string(inv)
2645+ yield versionedfile.FulltextContentFactory(
2646+ ('rev1',), (), None, text)
2647+ def inventory_delta_substream():
2648+ # An inventory delta. This can't be streamed via this verb, so it
2649+ # will trigger a fallback to VFS insert_stream.
2650+ entry = inv.make_entry(
2651+ 'directory', 'newdir', inv.root.file_id, 'newdir-id')
2652+ entry.revision = 'ghost'
2653+ delta = [(None, 'newdir', 'newdir-id', entry)]
2654+ serializer = inventory_delta.InventoryDeltaSerializer(
2655+ versioned_root=True, tree_references=False)
2656+ lines = serializer.delta_to_lines('rev1', 'rev2', delta)
2657+ yield versionedfile.ChunkedContentFactory(
2658+ ('rev2',), (('rev1',),), None, lines)
2659+ # Another delta.
2660+ lines = serializer.delta_to_lines('rev1', 'rev3', delta)
2661+ yield versionedfile.ChunkedContentFactory(
2662+ ('rev3',), (('rev1',),), None, lines)
2663+ return stream_with_inv_delta()
2664+
2665+
2666+class TestRepositoryInsertStream_1_19(TestRepositoryInsertStreamBase):
2667+
2668+ def test_unlocked_repo(self):
2669+ transport_path = 'quack'
2670+ repo, client = self.setup_fake_client_and_repository(transport_path)
2671+ client.add_expected_call(
2672+ 'Repository.insert_stream_1.19', ('quack/', ''),
2673+ 'success', ('ok',))
2674+ client.add_expected_call(
2675+ 'Repository.insert_stream_1.19', ('quack/', ''),
2676+ 'success', ('ok',))
2677+ self.checkInsertEmptyStream(repo, client)
2678+
2679+ def test_locked_repo_with_no_lock_token(self):
2680+ transport_path = 'quack'
2681+ repo, client = self.setup_fake_client_and_repository(transport_path)
2682+ client.add_expected_call(
2683+ 'Repository.lock_write', ('quack/', ''),
2684+ 'success', ('ok', ''))
2685+ client.add_expected_call(
2686+ 'Repository.insert_stream_1.19', ('quack/', ''),
2687+ 'success', ('ok',))
2688+ client.add_expected_call(
2689+ 'Repository.insert_stream_1.19', ('quack/', ''),
2690+ 'success', ('ok',))
2691+ repo.lock_write()
2692+ self.checkInsertEmptyStream(repo, client)
2693+
2694+ def test_locked_repo_with_lock_token(self):
2695+ transport_path = 'quack'
2696+ repo, client = self.setup_fake_client_and_repository(transport_path)
2697+ client.add_expected_call(
2698+ 'Repository.lock_write', ('quack/', ''),
2699+ 'success', ('ok', 'a token'))
2700+ client.add_expected_call(
2701+ 'Repository.insert_stream_1.19', ('quack/', '', 'a token'),
2702+ 'success', ('ok',))
2703+ client.add_expected_call(
2704+ 'Repository.insert_stream_1.19', ('quack/', '', 'a token'),
2705+ 'success', ('ok',))
2706+ repo.lock_write()
2707+ self.checkInsertEmptyStream(repo, client)
2708
2709
2710 class TestRepositoryTarball(TestRemoteRepository):
2711
2712=== modified file 'bzrlib/tests/test_selftest.py'
2713--- bzrlib/tests/test_selftest.py 2009-08-04 02:09:19 +0000
2714+++ bzrlib/tests/test_selftest.py 2009-08-14 05:35:32 +0000
2715@@ -124,7 +124,7 @@
2716 self.assertEqual(sample_permutation,
2717 get_transport_test_permutations(MockModule()))
2718
2719- def test_scenarios_invlude_all_modules(self):
2720+ def test_scenarios_include_all_modules(self):
2721 # this checks that the scenario generator returns as many permutations
2722 # as there are in all the registered transport modules - we assume if
2723 # this matches its probably doing the right thing especially in
2724@@ -293,18 +293,16 @@
2725 from bzrlib.tests.per_interrepository import make_scenarios
2726 server1 = "a"
2727 server2 = "b"
2728- formats = [(str, "C1", "C2"), (int, "D1", "D2")]
2729+ formats = [("C0", "C1", "C2"), ("D0", "D1", "D2")]
2730 scenarios = make_scenarios(server1, server2, formats)
2731 self.assertEqual([
2732- ('str,str,str',
2733- {'interrepo_class': str,
2734- 'repository_format': 'C1',
2735+ ('C0,str,str',
2736+ {'repository_format': 'C1',
2737 'repository_format_to': 'C2',
2738 'transport_readonly_server': 'b',
2739 'transport_server': 'a'}),
2740- ('int,str,str',
2741- {'interrepo_class': int,
2742- 'repository_format': 'D1',
2743+ ('D0,str,str',
2744+ {'repository_format': 'D1',
2745 'repository_format_to': 'D2',
2746 'transport_readonly_server': 'b',
2747 'transport_server': 'a'})],
2748
2749=== modified file 'bzrlib/tests/test_smart.py'
2750--- bzrlib/tests/test_smart.py 2009-07-23 07:37:05 +0000
2751+++ bzrlib/tests/test_smart.py 2009-08-14 05:35:32 +0000
2752@@ -1242,6 +1242,7 @@
2753 SmartServerResponse(('history-incomplete', 2, r2)),
2754 request.execute('stacked', 1, (3, r3)))
2755
2756+
2757 class TestSmartServerRepositoryGetStream(tests.TestCaseWithMemoryTransport):
2758
2759 def make_two_commit_repo(self):
2760
2761=== modified file 'bzrlib/tests/test_xml.py'
2762--- bzrlib/tests/test_xml.py 2009-04-03 21:50:40 +0000
2763+++ bzrlib/tests/test_xml.py 2009-08-14 05:35:32 +0000
2764@@ -19,6 +19,7 @@
2765 from bzrlib import (
2766 errors,
2767 inventory,
2768+ xml6,
2769 xml7,
2770 xml8,
2771 serializer,
2772@@ -139,6 +140,14 @@
2773 </inventory>
2774 """
2775
2776+_expected_inv_v6 = """<inventory format="6" revision_id="rev_outer">
2777+<directory file_id="tree-root-321" name="" revision="rev_outer" />
2778+<directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
2779+<file file_id="file-id" name="file" parent_id="tree-root-321" revision="rev_outer" text_sha1="A" text_size="1" />
2780+<symlink file_id="link-id" name="link" parent_id="tree-root-321" revision="rev_outer" symlink_target="a" />
2781+</inventory>
2782+"""
2783+
2784 _expected_inv_v7 = """<inventory format="7" revision_id="rev_outer">
2785 <directory file_id="tree-root-321" name="" revision="rev_outer" />
2786 <directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
2787@@ -377,6 +386,17 @@
2788 for path, ie in inv.iter_entries():
2789 self.assertEqual(ie, inv2[ie.file_id])
2790
2791+ def test_roundtrip_inventory_v6(self):
2792+ inv = self.get_sample_inventory()
2793+ txt = xml6.serializer_v6.write_inventory_to_string(inv)
2794+ lines = xml6.serializer_v6.write_inventory_to_lines(inv)
2795+ self.assertEqual(bzrlib.osutils.split_lines(txt), lines)
2796+ self.assertEqualDiff(_expected_inv_v6, txt)
2797+ inv2 = xml6.serializer_v6.read_inventory_from_string(txt)
2798+ self.assertEqual(4, len(inv2))
2799+ for path, ie in inv.iter_entries():
2800+ self.assertEqual(ie, inv2[ie.file_id])
2801+
2802 def test_wrong_format_v7(self):
2803 """Can't accidentally open a file with wrong serializer"""
2804 s_v6 = bzrlib.xml6.serializer_v6
2805
2806=== modified file 'bzrlib/versionedfile.py'
2807--- bzrlib/versionedfile.py 2009-08-04 04:36:34 +0000
2808+++ bzrlib/versionedfile.py 2009-08-14 05:35:32 +0000
2809@@ -1571,13 +1571,14 @@
2810 record.get_bytes_as(record.storage_kind) call.
2811 """
2812 self._bytes_iterator = bytes_iterator
2813- self._kind_factory = {'knit-ft-gz':knit.knit_network_to_record,
2814- 'knit-delta-gz':knit.knit_network_to_record,
2815- 'knit-annotated-ft-gz':knit.knit_network_to_record,
2816- 'knit-annotated-delta-gz':knit.knit_network_to_record,
2817- 'knit-delta-closure':knit.knit_delta_closure_to_records,
2818- 'fulltext':fulltext_network_to_record,
2819- 'groupcompress-block':groupcompress.network_block_to_records,
2820+ self._kind_factory = {
2821+ 'fulltext': fulltext_network_to_record,
2822+ 'groupcompress-block': groupcompress.network_block_to_records,
2823+ 'knit-ft-gz': knit.knit_network_to_record,
2824+ 'knit-delta-gz': knit.knit_network_to_record,
2825+ 'knit-annotated-ft-gz': knit.knit_network_to_record,
2826+ 'knit-annotated-delta-gz': knit.knit_network_to_record,
2827+ 'knit-delta-closure': knit.knit_delta_closure_to_records,
2828 }
2829
2830 def read(self):
2831
2832=== modified file 'bzrlib/xml5.py'
2833--- bzrlib/xml5.py 2009-04-04 02:50:01 +0000
2834+++ bzrlib/xml5.py 2009-08-14 05:35:32 +0000
2835@@ -39,8 +39,8 @@
2836 format = elt.get('format')
2837 if format is not None:
2838 if format != '5':
2839- raise BzrError("invalid format version %r on inventory"
2840- % format)
2841+ raise errors.BzrError("invalid format version %r on inventory"
2842+ % format)
2843 data_revision_id = elt.get('revision_id')
2844 if data_revision_id is not None:
2845 revision_id = cache_utf8.encode(data_revision_id)