Merge lp:~spiv/bzr/inventory-delta into lp:~bzr/bzr/trunk-old

Proposed by Andrew Bennetts
Status: Merged
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/inventory-delta
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 2845 lines
To merge this branch: bzr merge lp:~spiv/bzr/inventory-delta
Reviewer        Review Type    Date Requested    Status
John A Meinel                                    Needs Fixing
Review via email: mp+9676@code.launchpad.net

This proposal supersedes a proposal from 2009-07-22.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This is a pretty big patch. It does lots of things:

 * adds new insert_stream and get_stream verbs
 * adds de/serialization of inventory-delta records on the network
 * fixes rich-root generation in StreamSource
 * adds a bunch of new scenarios to per_interrepository tests
 * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
   the new pack is already optimal).
 * improves the inventory_delta module a little
 * various miscellaneous fixes and new tests that are hopefully self-evident
 * and, most controversially, removes InterDifferingSerializer.

From John's mail a while back there were a bunch of issues with removing IDS. I
think the outstanding ones are:

> 1) Incremental updates. IDS converts batches of 100 revs at a time,
> which also triggers autopacks at 1k revs. Streaming fetch is currently
> an all-or-nothing, which isn't appropriate (IMO) for conversions.
> Consider that conversion can take *days*, it is important to have
> something that can be stopped and resumed.
>
> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
> bloats to 2.4GB before packing back to 25MB. We know the new format is
> even more sensitive to packing efficiency. Not to mention that since a
> single big stream generates a single large pack, it isn't directly obvious
> that we are being so inefficient.

i.e. performance concerns.

The streaming code is pretty similar in how it does the conversion now to the
way IDS did it, but probably still different enough that we will want to measure
the impact of this. I'm definitely concerned about case 2, the lack of packing
as you go, although perhaps the degree of bloat is reduced by using
semantic inventory-delta records?

The reason why I eventually deleted IDS was that it was just too burdensome to
keep two code paths alive, thoroughly tested, and correct. For instance, if we
simply reinstated IDS for local-only fetches then most of the test suite,
including the relevant interrepo tests, would only exercise IDS. Also, IDS
turned out to have a bug when used on a stacked repository that the extending
test suite in this branch revealed (I've forgotten the details, but can dig them
up if you like). It didn't seem worth the hassle of fixing IDS when I already
had a working implementation.

I'm certainly open to reinstating IDS if it's the most expedient way to have
reasonable local performance for upgrades, but I thought I'd try to be bold and
see if we could just live without the extra complexity. Maybe we can improve
performance of streaming rather than resurrect IDS?

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:
>
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Consider that conversion can take *days*, it is important to have
>> something that can be stopped and resumed.

It also picks out the 'optimal' deltas by computing many different ones
and finding whichever one was the 'smallest'. For local conversions, the
time to compute 2-3 deltas was much smaller than to apply an inefficient
delta.
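
Conceptually, the selection is just "compute a delta against each cached
parent and keep the shortest" - a rough sketch only, with make_delta standing
in for Inventory._make_delta or equivalent:

    def pick_best_delta(target_inv, cached_parent_invs, make_delta):
        # Try each available basis; a shorter delta is cheaper to transmit
        # and cheaper to apply on the target.
        best = None
        for basis_inv in cached_parent_invs:
            candidate = make_delta(basis_inv, target_inv)
            if best is None or len(candidate) < len(best[1]):
                best = (basis_inv, candidate)
        return best  # None when no cached parents are available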

>>
>> 2) Also, auto-packing as you go avoids the case you ran into, where bzr
>> bloats to 2.4GB before packing back to 25MB. We know the new format is
>> even more sensitive to packing efficiency. Not to mention that since a
>> single big stream generates a single large pack, it isn't directly obvious
>> that we are being so inefficient.
>
> i.e. performance concerns.
>

Generally, yes.

There is also:

3) Being able to resume because you snapshotted periodically as you
went. This seems even more important for a network transfer.

> The streaming code is pretty similar in how it does the conversion now to the
> way IDS did it, but probably still different enough that we will want to measure
> the impact of this. I'm definitely concerned about case 2, the lack of packing
> as you go, although perhaps the degree of bloat is reduced by using
> semantic inventory-delta records?
>

I don't think bzr bloating from 100MB => 2.4GB (and then back down to
25MB post pack) was because of inventory records. However, if it was
purely because of a bad streaming order, we could probably fix that by
changing how we stream texts.

> The reason why I eventually deleted IDS was that it was just too burdensome to
> keep two code paths alive, thoroughly tested, and correct. For instance, if we
> simply reinstated IDS for local-only fetches then most of the test suite,
> including the relevant interrepo tests, will only exercise IDS. Also, IDS
> turned out to have a bug when used on a stacked repository that the extending
> test suite in this branch revealed (I've forgotten the details, but can dig them
> up if you like). It didn't seem worth the hassle of fixing IDS when I already
> had a working imple...


Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


...
>
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

and

4) Progress indication

This is really quite useful for a process that can take *days* to
complete. The Stream code is often quite nice, but the fact that it
gives you 2 states:
 'getting stream'
 'inserting stream'

and nothing more than that is pretty crummy.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a pretty big patch. It does lots of things:
>
> * adds new insert_stream and get_stream verbs
> * adds de/serialization of inventory-delta records on the network
> * fixes rich-root generation in StreamSource
> * adds a bunch of new scenarios to per_interrepository tests
> * fixes some 'pack already exist' bugs for packing a single GC pack (i.e. when
> the new pack is already optimal).
> * improves the inventory_delta module a little
> * various miscellaneous fixes and new tests that are hopefully self-evident
> * and, most controversially, removes InterDifferingSerializer.
>
> From John's mail a while back there were a bunch of issues with removing IDS. I
> think the outstanding ones are:

So for starters, let me mention what I found wrt performance:

time bzr.dev branch mysql-1k mysql-2a/1k
  real 3m18.490s

time bzr.dev+xml8 branch mysql-1k mysql-2a/1k
  real 2m29.953s

+xml8 is just this patch:
=== modified file 'bzrlib/xml8.py'
--- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-07-16 16:14:38 +0000
@@ -433,9 +433,9 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
-                # if cached_ie.kind == 'directory':
-                #     return cached_ie.copy()
-                # return cached_ie
+                if cached_ie.kind == 'directory':
+                    return cached_ie.copy()
+                return cached_ie
                 return cached_ie.copy()

         kind = elt.tag

It has 2 basic effects:

1) Avoid copying all inventory entries all the time (so reduce the time
spent in InventoryEntry.copy())

2) By re-using exact objects "_make_delta" can do "x is y" comparisons,
rather than having to do:
  x.attribute1 == y.attribute1
  and x.attribute2 == y.attribute2
etc.
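
Schematically (an illustrative comparison, not the actual _make_delta code),
re-using the cached objects lets the equality check short-circuit:

    def entries_equal(x, y):
        if x is y:
            # Same object: guaranteed equal, no attribute comparisons needed.
            return True
        # Distinct objects: fall back to field-by-field comparison.
        return (x.kind == y.kind and x.parent_id == y.parent_id
                and x.revision == y.revision)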

As you can see, it is a big win for this test case (about 4:3, or ~33% faster).

So what about Andrew's work:

time bzr.inv.delta branch mysql-1k mysql-2a/1k
  real 10m14.267s

time bzr.inv.delta+xml8 branch mysql-1k mysql-2a/1k
  real 9m49.372s

It was also stuck at:
[##################- ] Fetching revisions:Inserting stream:Walking
content 912/1043

It stayed there for most of that time, making it really look like it was stalled.

Anyway, this isn't something where it is, say, 10% slower, which would be
acceptable because we get rid of some extra code paths. This ends up
being 3-4x slower and no longer giving any progress information.

If that scales to launchpad sized projects, you are talking 4-days
becoming 16-days (aka > 2 weeks).

So honestly, I don't think we can land this as is. I won't stick on the
performance side if people feel it is acceptable. But I did spend a lot
of time optimizing IDS that clearly hasn't been done with StreamSource.

John
=:->


Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


John A Meinel wrote:
> Andrew Bennetts wrote:

...

> So for starters, let me mention what I found wrt performance:
>
> time bzr.dev branch mysql-1k mysql-2a/1k
> real 3m18.490s
>
> time bzr.dev+xml8 branch mysql-1k mysql-2a/1k
> real 2m29.953s

...

> time bzr.inv.delta branch mysql-1k mysql-2a/1k
> real 10m14.267s
>
> time bzr.inv.delta+xml8 branch mysql-1k mysql-2a/1k
> real 9m49.372s

Also, for real-world space issues:
$ du -ksh mysql-2a*/.bzr/repository/obsolete*
1.9M mysql-2a-bzr.dev/.bzr/repository/obsolete_packs
467M mysql-2a-inv-delta/.bzr/repository/obsolete_packs

The peak size (watch du -ksh mysql-2a-bzr.dev) during conversion using
IDS was 49MB.

$ du -ksh mysql-2a*/.bzr/repository/packs*
11M mysql-2a-bzr.dev/.bzr/repository/packs
9.1M mysql-2a-inv-delta/.bzr/repository/packs

So the new code wins slightly in the final size on disk, because it
packed at the end, rather than at 1k revs (and then there were another
40+ revs inserted.)

However, it bloated from 15MB => 467MB while it was doing the transfer,
versus a peak of 50MB with IDS (almost 10x larger).

John
=:->

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

John A Meinel wrote:
[...]
> It also picks out the 'optimal' deltas by computing many different ones
> and finding whichever one was the 'smallest'. For local conversions, the
> time to compute 2-3 deltas was much smaller than to apply an inefficient
> delta.

FWIW, the streaming code also does this. My guess (not yet measured) is that
sending fewer bytes over the network is also a win, especially when one parent
might be a one-liner and the other might be a large merge from trunk.

[...]
> There is also:
>
> 3) Being able to resume because you snapshotted periodically as you
> went. This seems even more important for a network transfer.

Yes, although we already don't have this for the network. It would be great to
have...

[...]
> I'm certainly open to the suggestion of getting rid of IDS. I don't like
> having multiple code paths. It just happens that there are *big* wins
> and it is often easier to write optimized code in a different framework.

Sure. Like I said for me it was just getting to be a large hassle to maintain
both paths in my branch, even though they were increasingly sharing a lot of
code for e.g. rich root generation before I deleted IDS.

I'd like to try see if we can cheaply fix the performance issues you report in
other mails without needing IDS. If we do need IDS for a while longer then
fine, although I think we'll want to restrict it to local source, local target,
non-stacked cases only.

Thanks for the measurements and quick feedback.

-Andrew.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Thu, 2009-07-16 at 16:06 +0000, John A Meinel wrote:
>
> (3) is an issue I'd like to see addressed, but which Robert seems
> particularly unhappy having us try to do. (See other bug comments, etc
> about how other systems don't do it and he feels it isn't worth
> doing.)

I'd like to be clear about this. I'd be ecstatic *if* we can do it well
and robustly. However I don't think it is *at all* easy to do that. If I'm
wrong - great.

I'm fine with keeping IDS for local fetches. But when networking is
involved IDS is massively slower than the streaming codepath.

> It was fairly straightforward to do with IDS, the argument I think
> from
> Robert is that the client would need to be computing whether it has a
> 'complete' set and thus can commit the current write group. (the
> *source* knows these sort of things, and can just say "and now you
> have
> it", but the client has to re-do all that work to figure it out from a
> stream.)

I think that aspect is simple - we have a stream subtype that says
'checkpoint'. It's the requirement to do all that work that is, I think,
problematic - and that's *without* considering stacking, which makes it
hugely harder.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>
>
> 4) Progress indication
>
> This is really quite useful for a process that can take *days* to
> complete. The Stream code is often quite nice, but the fact that it
> gives you 2 states:
> 'getting stream'
> 'inserting stream'
>
> and nothing more than that is pretty crummy.

That is a separate bug however, and one that affects normal fetches too.
So I don't think tying it to the IDS discussion is necessary or
particularly helpful.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Robert Collins wrote:
> On Thu, 2009-07-16 at 16:12 +0000, John A Meinel wrote:
>>
>> 4) Progress indication
>>
>> This is really quite useful for a process that can take *days* to
>> complete. The Stream code is often quite nice, but the fact that it
>> gives you 2 states:
>> 'getting stream'
>> 'inserting stream'
>>
>> and nothing more than that is pretty crummy.
>
> That is a separate bug however, and one that affects normal fetches too.
> So I don't think tying it to the IDS discussion is necessary or
> particularly helpful.
>
> -Rob
>

It is explicitly relevant that doing "bzr upgrade --2a", which will take
longer than normal, would now not even show a progress bar.

For local fetches, you don't even get the "transport activity"
indicator, so it *really* looks hung. It doesn't even write things into
.bzr.log so that you know it is doing anything other than spinning in a
while True loop. I guess you can tell because your disk consumption is
going way up...

I don't honestly know the performance difference for streaming a lot of
content over the network. Given a 4x performance slowdown, for large
fetches IDS could still be faster. I certainly agree that IDS is
probably significantly more inefficient when doing something like "give
me the last 2 revs".

It honestly wasn't something I was optimizing for (cross format
fetching). I *was* trying to make 'bzr upgrade' be measured in hours
rather than days/weeks/etc.

Also, given that you have to upgrade all of your stacked locations at
the same time, and --2a is a trap door, aren't 95% of upgrades going to
be all at once anyway?

John
=:->


Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This is the same patch updated for bzr.dev, and with InterDifferingSerializer restored for local fetches only. For non-local fetches I'm pretty sure the streaming code path is massively faster based on the timings I've done (even via TCP HPSS to localhost).

I'm sure we do want to get rid of IDS eventually (and fix the shortcomings in streaming that John has pointed out) but doing that shouldn't block the rest of this work, even if it is a small maintenance headache.

[FWIW, if I don't restrict IDS to local-only branches I get test failures like:

Traceback (most recent call last):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 137, in test_fetch_parent_inventories_at_stacking_boundary_smart_old
    self.test_fetch_parent_inventories_at_stacking_boundary()
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 181, in test_fetch_parent_inventories_at_stacking_boundary
    self.assertCanStreamRevision(unstacked_repo, 'merge')
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/tests/per_interrepository/test_fetch.py", line 187, in assertCanStreamRevision
    for substream_kind, substream in source.get_stream(search):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/remote.py", line 1895, in missing_parents_chain
    for kind, stream in self._get_stream(sources[0], search):
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/repository.py", line 537, in record_stream
    for bytes in byte_stream:
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/message.py", line 338, in read_streamed_body
    _translate_error(self._body_error_args)
  File "/home/andrew/warthogs/bzr/inventory-delta/bzrlib/smart/message.py", line 361, in _translate_error
    raise errors.ErrorFromSmartServer(error_tuple)
ErrorFromSmartServer: Error received from smart server: ('error', "<bzrlib.groupcompress.GroupCompressVersionedFiles object at 0xb06348c> has no revision ('sha1:98fd3a13366960dc27dcb4b6ddb2b55aca3aae7b',)")

(for scenarios like Pack6RichRoot->2a).]

Anyway, please review.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

I'm not really sure what testing you've done on this, but I'm getting some really strange results.

Specifically, when I push it turns out to be sending xml fragments over the wire. In particular, it seems to be this code:
        elif (not from_format.supports_chks):
            # Source repository doesn't support chks. So we can transmit the
            # inventories 'as-is' and either they are just accepted on the
            # target, or the Sink will properly convert it.
            # (XXX: this assumes that all non-chk formats are understood as-is
            # by any Sink, but that presumably isn't true for foreign repo
            # formats added by bzr-svn etc?)
            return self._get_simple_inventory_stream(revision_ids,
                    missing=missing)

Which means that the raw xml bytes are being transmitted, and then the target side is extracting the xml, upcasting, and downcasting.

I see that there are code paths in place to do otherwise, but as near as I can tell, "_stream_invs_as_deltas" is only getting called if the *source* format is CHK.

From the profiling I've done, the _generate_root_texts() code is the bulk of the overhead with the new format. But *that* is because all of the data is sent to the server as XML texts, and the server has to do all the work to convert it.

When I just disable the code path that sends 'simple_inventory_stream', then I get:

$ time wbzr push bzr://localhost/test-2a/x
bzr: ERROR: Version present for / in TREE_ROOT

So I have a *strong* feeling the code you've introduced is actually broken, and you just didn't realize it.

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

So, to further my discussion. While investigating this, I found some odd bits in _stream_invs_as_deltas. For example:

+    def _stream_invs_as_deltas(self, revision_ids, fulltexts=False):
         from_repo = self.from_repository

...

+        inventories = self.from_repository.iter_inventories(
+            revision_ids, 'topological')
+        # XXX: ideally these flags would be per-revision, not per-repo (e.g.
+        # streaming a non-rich-root revision out of a rich-root repo back into
+        # a non-rich-root repo ought to be allowed)
+        format = from_repo._format
+        flags = (format.rich_root_data, format.supports_tree_reference)
+        invs_sent_so_far = set([_mod_revision.NULL_REVISION])
+        for inv in inventories:
             key = (inv.revision_id,)

...

+            for parent_id in parent_ids:
...
+                if (best_delta is None or
+                    len(best_delta) > len(candidate_delta)):
+                    best_delta = candidate_delta
+                    basis_id = parent_id
+            delta = best_delta
+            invs_sent_so_far.add(basis_id)

^- Notice that once you've prepared a delta for "inv.revision_id" you then add "basis_id" to "invs_sent_so_far". AFAICT that means you will send most of the inventories as complete texts, because you won't think you've sent what you actually have.

+            yield versionedfile.InventoryDeltaContentFactory(
+                key, parents, None, delta, basis_id, flags, from_repo)

The way you've handled "no parents are available, use NULL" means that you
actually create a delta against NULL for *every* revision and check whether it
is the smallest one, which seems inefficient versus just using NULL when
nothing else is available. (Note that IDS goes as far as to not use parents
that aren't cached even if they have been sent, and to fall back to using the
last-sent revision otherwise.)

I've changed this loop around a bit, to avoid some duplication and make it a little bit clearer (at least to me) what is going on. I also added a quick LRUCache for the parent inventories that we've just sent, as re-extracting them from the repository is not going to be efficient. (Especially extracting them one at a time.)
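
The reworked loop is roughly the following shape (a hedged sketch of the fix,
not the actual patch; make_delta, inv_cache and NULL_REVISION stand in for
their bzrlib counterparts):

    def choose_basis(inv, parent_ids, invs_sent_so_far, inv_cache, make_delta):
        # Only consider parents that were actually sent and are still cached.
        basis_id, best_delta = NULL_REVISION, None
        for parent_id in parent_ids:
            if parent_id in invs_sent_so_far and parent_id in inv_cache:
                candidate = make_delta(inv_cache[parent_id], inv)
                if best_delta is None or len(candidate) < len(best_delta):
                    basis_id, best_delta = parent_id, candidate
        if best_delta is None:
            # No usable parent: fall back to a delta against the empty
            # inventory, instead of computing a NULL delta for every revision.
            best_delta = make_delta(None, inv)
        # The fix: record the revision just sent, not its basis.
        invs_sent_so_far.add(inv.revision_id)
        inv_cache[inv.revision_id] = inv
        return basis_id, best_delta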

Going back to that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.

It seems like we should be consistent at that level. What do you think?

As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:

            serializer.require_flags(*self._format_flags)

And that, IMO, is asking for trouble. Yes it is most likely correct as written, but it means that you have to line up the tuple created at the start of _stream_invs_as_deltas:
        flags = (format.rich_root_data, format.supports_tree_reference)

with the *arguments* to a function nested 3 calls away.
So how about:

  flags = {'rich_root_data': format.rich_root_data,
           'supports_tree_reference': format.supports_tree_reference
          }
...

and
  serializer.require_flags(**self._format_flags)

It isn't vastly better, but ...


review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Tue, 2009-07-28 at 21:51 +0000, John A Meinel wrote:
>
> I suppose one possibility would be to have the client reject the
> stream entirely and request it be re-sent with the basis available.

If we do this then... The server just asks for that inventory again. The
problem though, is that *all* inventories will need to be asked for
again, which could be pathologically bad. It is possible to write a file
and restore it later without network API changes, just pass in write
group save tokens. I don't think we need to block on that for this patch
though - sending a full inventory is no worse than sending the full text
of a file, which 2a does as well - and ChangeLog files etc are very big
- as large or larger than an inventory, I would expect.

> Of course, the get_stream_for_missing_keys code is wired up such that
> clients
> ignore the response about missing inventories if the target is a
> stacked repo,
> and just always force up an extra fulltext copy of all parent
> inventories
> anyway. (Because of the bugs in 1.13, IIRC.)

The condition is wrong... it probably should say 'if we're not told
anything is missing..' - and it should be guarded such that the new push
verb (introduced recently?) doesn't trigger this paranoia.

> Anyway, it means that what you get out the other side of translating
> into and then back out of the serialized form is not exactly equal to
> what went in, which seems very bad (and likely to *not* break when
> using local actions, and then have it break when using remote ones.)

This is a reason to serialise/deserialise locally ;) - at least to start
with.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
> On Tue, 2009-07-28 at 21:51 +0000, John A Meinel wrote:
> >
> > I suppose one possibility would be to have the client reject the
> > stream entirely and request it be re-sent with the basis available.
>
> If we do this then... The server just asks for that inventory again. The
> problem though, is that *all* inventories will need to be asked for
> again, which could be pathologically bad. It is possible to write a file
> and restore it later without network API changes, just pass in write
> group save tokens. I don't think we need to block on that for this patch
> though - sending a full inventory is no worse than sending the full text
> of a file, which 2a does as well - and ChangeLog files etc are very big
> - as large or larger than an inventory, I would expect.

Also, we currently only allow one “get missing keys” stream after the original
stream. So if we optimistically send just a delta and the client rejects it,
then the second time we have to send not only a full delta closure but also all
the inventory parents, because there's no further opportunity for the client to
ask for more keys. This is potentially even worse than just sending a single
fulltext.

I think sending a fulltext cross-format is fine. We aren't trying to maximally
optimise cross-format fetches, just make them acceptable. We should file a bug
for improving this, but I don't think it's particularly urgent.

FWIW, I think the correct way to fix this is to allow the receiving repository
to somehow store inventory-deltas that it can't add yet, so that it can work as
part of the regular suspend_write_group and get missing keys logic. Then the
sender can optimistically send a delta, and the receiver can either insert it or
store it in a temporary upload pack and ask for the parent keys, just as we
already do for all other types of deltas. This is optimal in the case where the
recipient already has the basis, and only requires one extra roundtrip in the
case that it doesn't. We can perhaps do this by creating an upload pack just to
store the fulltexts of inventory-deltas, including that pack in the resume
tokens, but making sure never to insert in the final commit.
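
Schematically, the receiver side of that idea might look like this (purely
hypothetical - has_inventory, the parked dict and the missing set are not
bzrlib API, just names for the moving parts):

    def receive_delta(repo, basis_id, rev_id, parents, delta, parked, missing):
        if repo.has_inventory(basis_id):
            repo.add_inventory_by_delta(basis_id, delta, rev_id, parents)
            # Deltas parked against the inventory just added can now go in.
            for (b, r, p, d) in parked.pop(rev_id, []):
                receive_delta(repo, b, r, p, d, parked, missing)
        else:
            # Stash the delta in the temporary upload pack and ask the
            # sender for its basis via the "get missing keys" mechanism.
            parked.setdefault(basis_id, []).append(
                (basis_id, rev_id, parents, delta))
            missing.add(('inventories', basis_id))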

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

John A Meinel wrote:
> Review: Needs Fixing
> So, to further my discussion. While investigating this, I found some odd bits
> in _stream_invs_as_deltas. For example:
[...]
>
> ^- Notice that once you've prepared a delta for "inv.revision_id" you then add
> "basis_id" to "invs_sent_so_far". AFAICT that means you will send most of the
> inventories as complete texts, because you won't think you've sent what you
> actually have.

Oops, right. I've merged your fix for that.

> +            yield versionedfile.InventoryDeltaContentFactory(
> +                key, parents, None, delta, basis_id, flags, from_repo)
>
> The way you've handled "no parents are available, use NULL" means that you
> actually create a delta against NULL for *every* revision and check whether it
> is the smallest one, which seems inefficient versus just using NULL when
> nothing else is available. (Note that IDS goes as far as to not use parents
> that aren't cached even if they have been sent, and to fall back to using the
> last-sent revision otherwise.)
>
> I've changed this loop around a bit, to avoid some duplication and make it a
> little bit clearer (at least to me) what is going on. I also added a quick
> LRUCache for the parent inventories that we've just sent, as re-extracting
> them from the repository is not going to be efficient. (Especially extracting
> them one at a time.)

Thanks, I've merged your fix.

> Going back to that last line... 'key' is a tuple, and 'parents' is a tuple of tuples (or a list of tuples), but 'basis_id' is a simple string.
>
> It seems like we should be consistent at that level. What do you think?

It would be nice, but not vital.

> As for "flags", wouldn't it be better to pass that in as a *dict*. You pass it directly to:
[...]
> serializer.require_flags(**self._format_flags)
>
> It isn't vastly better, but at least they are named arguments, rather than
> fairly arbitrary positional ones.

Ok, I'll do that.

> In the end, I don't think that it is ideal that an incremental push of 1
> revision will transmit a complete inventory (in delta form). I understand the
> limitation is that we currently are unable to 'buffer' these records anywhere
> in a persistent state between RPC calls. (Even though we have a bytes-on-the
> wire representation, we don't have an index/etc to look them up in.)

(As said elsewhere on this thread...)

This is only an issue for a cross-format push, which doesn't have to be
maximally efficient. It just has to be reasonable. We can look at doing better
later if we want, but this is already a massive improvement in terms of data
transferred compared to IDS.

[...]
> For *me* the ssh handshake is probably signficantly more than the time to push
> up 2 inventories of bzr.dev, but that tradeoff is very different for people
> working over a tiny bandwidth from their GPRS phone, trying to push a critical
> bugfix for a Launchpad branch.

Then they probably should go to the effort of having their local branch in a
matching format :)

We're not intending to use inventory-deltas for anything but cross-format
fetches AFAIK.

> Again, the code as written fails to actually transmit data over the...


Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Mon, 2009-08-03 at 06:33 +0000, Andrew Bennetts wrote:
>
> > Digging further, the bytes on the wire don't include things like:
> > InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
> > InventoryDeltaContentFactory.sha1
>
> Inventories don't have parent lists, IIRC? Revisions do, and text
> versions do,
> but inventories don't. (They may have a compression parent, but not a
> semantic
> parent.)

Indeed. I wouldn't give them one in the deltas; let repositories that
care calculate one.
parents=None is valid for the interface..

> The sha1 is never set, although perhaps it should be.

No it shouldn't. the sha1 is format specific, and we don't convert back
to the original format to check it, so it would, at best, be discarded.
sha1 = None is valid for the interface as well

_Rob

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal


Robert Collins wrote:
> On Mon, 2009-08-03 at 06:33 +0000, Andrew Bennetts wrote:
>>> Digging further, the bytes on the wire don't include things like:
>>> InventoryDeltaContentFactory.parents (list/tuple of tuple keys)
>>> InventoryDeltaContentFactory.sha1
>> Inventories don't have parent lists, IIRC? Revisions do, and text
>> versions do,
>> but inventories don't. (They may have a compression parent, but not a
>> semantic
>> parent.)
>
> Indeed. I wouldn't give them one in the deltas; let repositories that
> care calculate one.
> parents=None is valid for the interface..

So... all of our inventory texts have parents in all repository formats
that exist *today*.

It isn't feasible to have them query the revisions, because in "knit"
streams they haven't received the revision data until after the
inventory data.

At first I thought we had fixed inventories in CHK, but I checked, and
it still has parents. And there is other code that assumes this fact.

Case in point, my new bundle sending/receiving code. It intentionally
queries the inventories.get_parent_map because
revisions.get_parent_map() has *not* been filled in yet.
(Bundle.insert_revisions always inserts objects as texts, inventories,
revisions.)

John
=:->

>
>> The sha1 is never set, although perhaps it should be.
>
> No it shouldn't. the sha1 is format specific, and we don't convert back
> to the original format to check it, so it would, at best, be discarded.
> sha1 = None is valid for the interface as well
>
> _Rob


Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Mon, 2009-08-03 at 13:30 +0000, John A Meinel wrote:
>
> > Indeed. I wouldn't give them one in the deltas; let repositories
> that
> > care calculate one.
> > parents=None is valid for the interface..
>
> So... all of our inventory texts have parents in all repository
> formats
> that exist *today*.
>
> It isn't feasible to have them query the revisions, because in "knit"
> streams they haven't received the revision data until after the
> inventory data.

ugh. ugh ugh ugh swear swear swear.

> At first I thought we had fixed inventories in CHK, but I checked, and
> it still has parents. And there is other code that assumes this fact.
>
> Case in point, my new bundle sending/receiving code. It intentionally
> queries the inventories.get_parent_map because
> revisions.get_parent_map() has *not* been filled in yet.
> (Bundle.insert_revisions always inserts objects as texts, inventories,
> revisions.)

Why does it need to do that though?

We're going to have to break this chain sometime.

If we can't at this stage, we'll need to either supply parents on the
serialised deltas, or add a parents field to the inventory serialisation
form. I prefer the former, myself.

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

Ok, third time lucky! :)

Some notable changes since the last review:

  - Uses an inventory-deltas substream rather than inventing a new content factory;
  - Adds a few debug flags to control whether this code path is used or not;
  - Fixes some bugs relating to rich roots and to deletes in inventory_delta.py!

I was surprised to realise that, despite the expectations of inventory_delta.py, non-rich-root repos can have roots with IDs other than 'TREE_ROOT'. What they can't have is a root entry with a revision that doesn't match the inventory's revision.

This beats InterDifferingSerializer for a push of bzr.dev -r2000 from 1.9->2a over localhost HPSS by about 3x (9.5min vs. ~30min, although not on a totally quiescent laptop). LocalTransport push with IDS is about 10x faster than that, though.

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
[...]
> If we can't at this stage, we'll need to either supply parents on the
> serialised deltas, or add a parents field to the inventory serialisation
> form. I prefer the former, myself.

FWIW, my current code sends the parents for the inventory in the .parents of the
FulltextContentFactory holding the serialised inventory-delta (and uses that
when calling add_inventory_by_delta).

So my patch maintains the status quo here. (I'm actually fairly sure that the
version of the code John reviewed which inspired this discussion was doing this
too, but that's irrelevant at this point.)

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
> Ok, third time lucky! :)
>
> Some notable changes since the last review:
>
> - Uses an inventory-deltas substream rather than inventing a new content factory;
> - Adds a few debug flags to control whether this code path is used or not;
> - Fixes some bugs relating to rich roots and to deletes in inventory_delta.py!
>
> I was surprised to realise that, despite the expectations of inventory_delta.py, non-rich-root repos can have roots with IDs other than 'TREE_ROOT'. What they can't have is a root entry with a revision that doesn't match the inventory's revision.
>
> This beats InterDifferingSerializer for a push of bzr.dev -r2000 from 1.9->2a over localhost HPSS by about 3x (9.5min vs. ~30min, although not on a totally quiescent laptop). LocalTransport push with IDS is about 10x faster than that, though.

So do I understand correctly that:

IDS => bzr:// ~30min
InventoryDelta => bzr:// 9.5min
IDS => file:// < 1min

So when I was looking at InventoryDelta (before we fixed it to actually
send deltas :) the #1 overhead was actually in "_generate_root_texts()"
because that was iterating over revision_trees and having to extract all
of the inventories yet again.

Anyway I'll give the new code a look over. Unfortunately there are still
a lot of conflated factors, like code that wants to transmit all of the
"texts" before we transmit *any* "inventories" content, which means
somewhere you need to do buffering.

IDS "works" by batching 100 at a time, so it only buffers the 100 or so
inventory deltas before it writes the root texts to the target repo.

John
=:->

Revision history for this message
Andrew Bennetts (spiv) wrote :

John Arbash Meinel wrote:
[...]
> So do I understand correctly that:
>
> IDS => bzr:// ~30min
> InventoryDelta => bzr:// 9.5min
> IDS => file:// < 1min

Yes, that's right.

> So when I was looking at InventoryDelta (before we fixed it to actually
> send deltas :) the #1 overhead was actually in "_generate_root_texts()"
> because that was iterating over revision_trees and having to extract all
> of the inventories yet again.

Right, and that's probably still the bottleneck. But even with that bottleneck
it's much faster over the network (even a very fast "network" like the loopback
interface), so I think it's worth merging.

> Anyway I'll give the new code a look over. Unfortunately there are still
> a lot of conflated factors, like code that wants to transmit all of the
> "texts" before we transmit *any* "inventories" content, which means
> somewhere you need to do buffering.
>
> IDS "works" by batching 100 at a time, so it only buffers the 100 or so
> inventory deltas before it writes the root texts to the target repo.

Yeah. It might be nice to somehow arrange similar batching when sending streams
over the network. If we can arrange to make these streams self-contained it
would make it easier to do incremental packing too.

Actually... all we need for incremental packing (which would fix the
"inventory-delta push to 2a is very bloated on disk until the stream is done")
is a way to be able to force a repack of an uncommitted pack (i.e. in upload/,
not inserted). That's probably not too hard to add, and then the StreamSink can
trigger that every N records or when the pack reaches N bytes or something.
I'll have a play with this.
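
The shape of that would be something like the following (a sketch only:
insert_record and repack_uncommitted are assumed names, not existing bzrlib
API):

    RECORDS_PER_REPACK = 10000  # arbitrary threshold for illustration

    def insert_with_periodic_repack(sink, stream):
        inserted = 0
        for substream_kind, substream in stream:
            for record in substream:
                sink.insert_record(substream_kind, record)
                inserted += 1
                if inserted % RECORDS_PER_REPACK == 0:
                    # Compact the not-yet-committed pack in upload/ so disk
                    # usage stays bounded during a long cross-format fetch.
                    sink.target_repo._pack_collection.repack_uncommitted()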

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> So do I understand correctly that:
>
> IDS => bzr:// ~30min
> InventoryDelta => bzr:// 9.5min
> IDS => file:// < 1min

Also:

InventoryDelta => file:// ~9.5min

(Timings are intentionally a bit approximate, I haven't kept my laptop perfectly
idle while running these tests, etc.)

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :

> John A Meinel wrote:
> [...]
> > So do I understand correctly that:
> >
> > IDS => bzr:// ~30min
> > InventoryDelta => bzr:// 9.5min
> > IDS => file:// < 1min
>
> Also:
>
> InventoryDelta => file:// ~9.5min
>
> (Timings are intentionally a bit approximate, I haven't kept my laptop
> perfectly
> idle while running these tests, etc.)
>
> -Andrew.

So something isn't quite right with my timings as:

wbzr init-repo --2a test-2a
time wbzr push -d ../bzr/bzr.dev -r 2000 test-2a/bzr -DIDS:always
11m12.889s

I wonder if you didn't make a mistake in your timing of IDS.

In my timing of IDS versus InventoryDelta for bzrtools, it was more:

15.8s time wbzr push -d bzrtools bzr://localhost/test-2a/bzrt
19.1s time wbzr push -d bzrtools test-2a/bzrt

Which shows that IDS was actually *slower* than pushing using InventoryDelta over the local loopback.

Given the numbers you quote, 1m is *much* closer to just the simple:
  bzr init-repo --1.9 test-19
  bzr branch ../bzr/bzr.dev test-19/bzr

Which would be the simple non-converting time.

I'll be running a couple more tests to see if the new refactoring of IDS that you've done has made anything slower, but at least at first glance the only thing I could find that would be better with IDS is that it doesn't have a second pass over all inventories in order to generate the root text keys. And that certainly wouldn't explain 9.5m => 1.0m.

I suggest you run your timing test again, and make sure you've set everything up correctly.

I at least thought my laptop was faster than yours, though I'm on Windows and you may have upgraded your laptop since then.

$ time wbzr push -d ../bzr/bzr.dev -r 2000 bzr://localhost/test-2a/bzr -DIDS:never
real 4m32.578s

This is 4m32s down from 11m12s for IDS (file:// to file://). Maybe something did get broken. I'll be running some more tests.

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> So something isn't quite right with my timings as:
>
> wbzr init-repo --2a test-2a
> time wbzr push -d ../bzr/bzr.dev -r 2000 test-2a/bzr -DIDS:always
> 11m12.889s
>
> I wonder if you didn't make a mistake in your timing of IDS.

Yeah, that's likely; I think I probably forgot to init --2a on that run. (I
think I reran it with the correct setup then copy-&-pasted the number from the
wrong one). The other numbers should be fine.

My corrected time for that is 23m 1s! bzr.dev's IDS is much slower for me (29.5
minutes), so I certainly haven't regressed the performance. And the test suite
says I haven't regressed the correctness.

So my patch is looking better and better...

> In my timing of IDS versus InventoryDelta for bzrtools, it was more:
>
> 15.8s time wbzr push -d bzrtools bzr://localhost/test-2a/bzrt
> 19.1s time wbzr push -d bzrtools test-2a/bzrt
>
> Which shows that IDS was actually *slower* than pushing using InventoryDelta
> over the local loopback.

Right, I'm seeing that too with bzr -r2000. (And glancing at the log, I think
it's still slower even if you don't count the time IDS spends autopacking.)

[...]
> I'll be running a couple more tests to see if the new refactoring of IDS that
> you've done has made anything slower, but at least at a first glance the only
> thing I could find that would be better with IDS is that it doesn't have a
> second pass over all inventories in order to generate the root texts keys. And
> that certainly wouldn't explain 9.5m => 1.0m.

I've taken another look at my refactoring of IDS, and I don't see any obvious
problems with it.

> I suggest you run your timing test again, and make sure you've set everything
> up correctly.
>
> I at least thought my laptop was faster than yours, though I'm on Windows and
> you may have upgraded your laptop since then.

I haven't upgraded my laptop for, well, years :)

> $ time wbzr push -d ../bzr/bzr.dev -r 2000 bzr://localhost/test-2a/bzr -DIDS:never
> real 4m32.578s
>
> This is 4m32s down from 11m12s for IDS (file:// to file://). Maybe something
> did get broken. I'll be running some more tests.

Where did you get to with your tests?

-Andrew.

Revision history for this message
John A Meinel (jameinel) wrote :

+* InterDifferingSerializer has been removed. The transformations it
+ provided are now done automatically by StreamSource. (Andrew Bennetts)
+

^- So for starters, this isn't true, for now at least.

You added the debug flags "-DIDS:never" and "-DIDS:always". I wonder if they wouldn't be better as IDS_never and IDS_always, just because that would seem to fit better with how we have generally named them. Not a big deal.

+
+
+def _new_root_data_stream(
+    root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):

^- I find the wrapping to be a bit strange, since this and the next function don't have any parameters on the first line. More importantly, though, these should probably at least have a minimal docstring to help understand what is going on.

It is nice that you were able to factor out these helpers so that we can consistently generate the root keys and their parents. It is a shame that _new_root_data_stream is being called on:
        rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)

Which is implemented by iterating all revision trees, which requires parsing all of the revisions. Though it then immediately goes on to use those trees to generate the inventory deltas.

    def _find_root_ids(self, revs, parent_map, graph):
        revision_root = {}
        for tree in self.iter_rev_trees(revs):
            revision_id = tree.inventory.root.revision
            root_id = tree.get_root_id()
            revision_root[revision_id] = root_id
        # Find out which parents we don't already know root ids for
        parents = set()
        for revision_parents in parent_map.itervalues():
            parents.update(revision_parents)
        parents.difference_update(revision_root.keys() + [NULL_REVISION])

^- We've certainly had this pattern enough that we really should consider factoring it out into a helper. But it seems the preferred form is:

parents = set()
map(parents.update, parent_map.itervalues())
parents.difference_update(revision_root)
parents.discard(NULL_REVISION)

That should perform better than what we have. (No need to generate a list from keys() nor append NULL_REVISION to it, potentially creating a 3rd all-keys list.)

        # Limit to revisions present in the versionedfile
        parents = graph.get_parent_map(parents).keys()
        for tree in self.iter_rev_trees(parents):
            root_id = tree.get_root_id()
            revision_root[tree.get_revision_id()] = root_id
        return revision_root

I've been doing some more performance testing, and I've basically seen stuff
like:

bzr branch mysql-525 local -DIDS:always
3m15s

bzr branch mysql-525 local -DIDS:always --XML8-nocopy
2m30s

bzr branch mysql-525 local -DIDS:never
4m01s

bzr branch mysql-525 bzr://localhost/remote -DIDS:never
3m25s

So it is slower, but only about 25% slower (except for the extra-optimized
XML8-nocopy).

I think the new code converts almost as fast, and the extra repacking from IDS
actually costs it a bit of time (other than saving it disk space). The only
major deficit at this point is that there is no progress indication. So I'd
probably stick with IDS for local operations just because of that.
...


Revision history for this message
John A Meinel (jameinel) wrote :

Overall, this is looking pretty good. A few small tweaks here and there, and possible concerns. But only one major concern.

You changed the default return value of "iter_inventories()" to return 'unordered' results, which means that "revision_trees()" also now returns 'unordered' results. That is fairly serious: I was able to find more than one code path that was relying on the ordering of revision_trees(). So I think the idea is sound, but it needs to be exposed in a backwards compatible manner by making it a flag that defaults to "as-requested" ordering, and then we fix up code paths as we can.

I don't specifically need to review that change, though. And I don't really want to review this patch yet again :). So I'm voting tweak, and you can submit if you agree with my findings.

+        from bzrlib.graph import FrozenHeadsCache
+        graph = FrozenHeadsCache(graph)
+        new_roots_stream = _new_root_data_stream(
+            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
+        return [('texts', new_roots_stream)]

^- I'm pretty sure if we are using FrozenHeadsCache that we really want to use KnownGraph instead. We have a dict 'parent_map' which means we can do much more efficient heads() checks since the whole graph is already loaded. This is a minor thing we can do later, but it would probably be good not to forget.
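
i.e. something along these lines (a hedged sketch; KnownGraph takes the
already-loaded parent_map dict directly, so heads() needs no repository
access):

    from bzrlib import _known_graph_py

    # parent_map: dict of revision_id -> tuple of parent revision_ids,
    # which this code already has in hand.
    graph = _known_graph_py.KnownGraph(parent_map)
    heads = graph.heads(candidate_parent_ids)  # any iterable of keys in the map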

...

-        elif last_modified[-1] == ':':
-            raise errors.BzrError('special revisionid found: %r' % line)
-        if not delta_tree_references and content.startswith('tree\x00'):
+        elif newpath_utf8 != 'None' and last_modified[-1] == ':':
+            # Deletes have a last_modified of null:, but otherwise special
+            # revision ids should not occur.
+            raise errors.BzrError('special revisionid found: %r' % line)
+        if delta_tree_references is False and content.startswith('tree\x00'):

^- What does "newpath_utf8 != 'None'" mean if someone does:

touch None
bzr add None
bzr commit -m "adding None"

Is this a serialization bug waiting to be exposed?

I guess not as it seems the paths are always prefixed with "/", right?

(So a path of None would actually be "newpath_utf8 == '/None'")
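
That is, the convention appears to be (a reconstruction from the quoted
parser, not the literal inventory_delta.py code):

    def serialize_path(path):
        # None (entry absent on that side of the delta) goes out as the bare
        # token 'None'; every real path gets a leading '/', so a file
        # literally named "None" serializes as '/None' and cannot collide.
        if path is None:
            return 'None'
        return '/' + path.encode('utf8')

    def deserialize_path(newpath_utf8):
        if newpath_utf8 == 'None':
            return None
        if newpath_utf8[:1] != '/':
            raise ValueError('newpath invalid (does not start with /): %r'
                             % (newpath_utf8,))
        return newpath_utf8[1:].decode('utf8')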

However, further down (sorry about the bad indenting):

         if newpath_utf8 == 'None':
             newpath = None

^- here you set "newpath=None" but you *don't* set "newpath_utf8" to None.

+        elif newpath_utf8[:1] != '/':
+            raise errors.BzrError(
+                "newpath invalid (does not start with /): %r"
+                % (newpath_utf8,))
         else:
+            # Trim leading slash
+            newpath_utf8 = newpath_utf8[1:]
             newpath = newpath_utf8.decode('utf8')
+        content_tuple = tuple(content.split('\x00'))
+        if content_tuple[0] == 'deleted':
+            entry = None
+        else:
+            entry = _parse_entry(
+                newpath_utf8, file_id, parent_id, last_modified,
+                content_tuple)

And then here "newpath_utf8" is passed to _parse_entry.

Now I realize this is probably caught by "content_tuple[0] == 'deleted'". Though it feels a bit icky to rely on "newpath_utf8" in one portion and "content_tuple[0]" in another (since they can potentially be out of sync.)

I think if we just force an early error with:
if newpath_utf8 == 'None':
  newpath = None
  ...

review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-08-06 at 16:15 +0000, John A Meinel wrote:
>
> if utf8_path.startswith('/'):
>
> ^- If this is a core routine (something called for every path) then:
> if utf8_path[:1] == '/':
>
> *is* faster than .startswith() because you
> 1) Don't have a function call
> 2) Don't have an attribute lookup
>
> I'm assuming this is a function that gets called a lot. If not, don't
> worry about it.

utf8_path[:1] == '/' requires a string copy though, for all that it's
heavily tuned in the VM.

> ...
>
>         if required_version < (1, 18):
>             # Remote side doesn't support inventory deltas. Wrap the stream to
>             # make sure we don't send any. If the stream contains inventory
>             # deltas we'll interrupt the smart insert_stream request and
>             # fallback to VFS.
>             stream = self._stop_stream_if_inventory_delta(stream)
>
> ^- it seems a bit of a shame that if we don't support deltas we fall
> back to
> VFS completely, rather than trying something intermediate (like
> falling back to
> the original code path of sending full inventory texts, or IDS, or...)
>
> I think we are probably okay, but this code at least raises a flag. I
> expect a
> bug report along the lines of "fetching between 1.18 and older server
> is very
> slow". I haven't looked at all the code paths to determine if 1.18
> will have
> regressed against a 1.17 server. Especially when *not* converting
> formats. Have
> you at least manually tested this?

We don't want to require rewindable streams; falling back to VFS is by
far the cleanest way to fallback without restarting the stream or
requiring rewinding. I agree that there is a risk of performance issues,
OTOH launchpad, our largest deployment, will upgrade quickly :).
...
> ^- I think this should probably be ".network_name() ==
> other.network_name()"
> and we just customize the names to be the same. Is that possible to
> do?

It would be a little difficult with clients deployed already that don't
know the names are the same, and further, it would make it impossible to
request initialisation of one of them. In my review I'm suggesting just
serializer equality is all that's needed.

Anyhow, my review is about half done; getting back to it once the mail
surge is complete.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-08-05 at 02:45 +0000, Andrew Bennetts wrote:
> === modified file 'NEWS'

>
> +* InterDifferingSerializer has been removed. The transformations it
> + provided are now done automatically by StreamSource. (Andrew Bennetts)

^ has it?

> === modified file 'bzrlib/fetch.py'
> --- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
> +++ bzrlib/fetch.py 2009-07-29 07:08:54 +0000

> @@ -249,20 +251,77 @@
>          # yet, and are unlikely to in non-rich-root environments anyway.
>          root_id_order.sort(key=operator.itemgetter(0))
>          # Create a record stream containing the roots to create.
> -        def yield_roots():
> -            for key in root_id_order:
> -                root_id, rev_id = key
> -                rev_parents = parent_map[rev_id]
> -                # We drop revision parents with different file-ids, because
> -                # that represents a rename of the root to a different location
> -                # - its not actually a parent for us. (We could look for that
> -                # file id in the revision tree at considerably more expense,
> -                # but for now this is sufficient (and reconcile will catch and
> -                # correct this anyway).
> -                # When a parent revision is a ghost, we guess that its root id
> -                # was unchanged (rather than trimming it from the parent list).
> -                parent_keys = tuple((root_id, parent) for parent in rev_parents
> -                    if parent != NULL_REVISION and
> -                    rev_id_to_root_id.get(parent, root_id) == root_id)
> -                yield FulltextContentFactory(key, parent_keys, None, '')
> -        return [('texts', yield_roots())]
> +        from bzrlib.graph import FrozenHeadsCache
> +        graph = FrozenHeadsCache(graph)
> +        new_roots_stream = _new_root_data_stream(
> +            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
> +        return [('texts', new_roots_stream)]
> +

These functions have a lot of parameters. Perhaps that would work better
as state on the Source?

They need docs/more docs respectively.

> +def _new_root_data_stream(
> + root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
> ...
> +def _parent_keys_for_root_version(
> + root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):

> === modified file 'bzrlib/help_topics/en/debug-flags.txt'
> --- bzrlib/help_topics/en/debug-flags.txt 2009-07-24 03:15:56 +0000
> +++ bzrlib/help_topics/en/debug-flags.txt 2009-08-05 02:05:43 +0000
> @@ -12,6 +12,7 @@
> operations.
> -Dfetch Trace history copying between repositories.
> -Dfilters Emit information for debugging content filtering.
> +-Dforceinvdeltas Force use of inventory deltas during generic streaming fetch.
> -Dgraph Trace graph traversal.
> -Dhashcache Log every time a working file is read to determine its hash.
> -Dhooks Trace hook execution.
> @@ -26,3 +27,7 @@
> -Dunlock Some errors during unlock are treated as warnings.
> -Dpack Emit information about pack operations.
> -Ds...

Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
> Review: Needs Fixing
> Overall, this is looking pretty good. A few small tweaks here and there, and
> possible concerns. But only one major concern.

That's great news. Thanks very much for the thorough review.

> You changed the default return value of "iter_inventories()" to return
> 'unordered' results, which means that "revision_trees()" also now returns
> 'unordered' results. Which is fairly serious, and I was able to find more than
> one code path that was relying on the ordering of revision_trees(). So I think
> the idea is sound, but it needs to be exposed in a backwards compatible manner
> by making it a flag that defaults to "as-requested" ordering, and then we fix
> up code paths as we can.
>
> I don't specifically need to review that change, though. And I don't really
> want to review this patch yet again :). So I'm voting tweak, and you can
> submit if you agree with my findings.

Ouch, yes, that does sound serious. I'll fix that.
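
The backwards-compatible shape being asked for, as a rough sketch (names are
illustrative, not the patch's): buffer 'unordered' results and re-emit them
in the order the caller requested:

    def iter_in_requested_order(requested_keys, unordered_results):
        # unordered_results yields (key, value) pairs covering exactly
        # requested_keys, in whatever order the storage layer prefers.
        buffered = {}
        key_iter = iter(requested_keys)
        next_key = key_iter.next()
        for key, value in unordered_results:
            buffered[key] = value
            # Flush every buffered result that is next in requested order.
            while next_key in buffered:
                yield next_key, buffered.pop(next_key)
                try:
                    next_key = key_iter.next()
                except StopIteration:
                    return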

> 119 + from bzrlib.graph import FrozenHeadsCache
> 120 + graph = FrozenHeadsCache(graph)
> 121 + new_roots_stream = _new_root_data_stream(
> 122 + root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
> 123 + return [('texts', new_roots_stream)]
>
> ^- I'm pretty sure if we are using FrozenHeadsCache that we really want to use
> KnownGraph instead. We have a dict 'parent_map' which means we can do much
> more efficient heads() checks since the whole graph is already loaded. This is
> a minor thing we can do later, but it would probably be good not to forget.

That's a good idea, I'll try that. Perhaps we should just grep the entire code
base for FrozenHeadsCache and replace all uses with KnownGraph...
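
Sketched against the snippet quoted above, and assuming bzrlib.graph
re-exports KnownGraph and that it accepts a parent_map dict, answering
heads() from memory:

    from bzrlib.graph import KnownGraph
    graph = KnownGraph(parent_map)  # heads() served from the in-memory dict
    new_roots_stream = _new_root_data_stream(
        root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
    return [('texts', new_roots_stream)]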

> ...
>
> 392 - elif last_modified[-1] == ':':
> 393 - raise errors.BzrError('special revisionid found: %r' % line)
> 394 - if not delta_tree_references and content.startswith('tree\x00'):
> 395 + elif newpath_utf8 != 'None' and last_modified[-1] == ':':
> 396 + # Deletes have a last_modified of null:, but otherwise special
> 397 + # revision ids should not occur.
> 398 + raise errors.BzrError('special revisionid found: %r' % line)
> 399 + if delta_tree_references is False and content.startswith('tree\x00'):
>
> ^- What does "newpath_utf8 != 'None'" mean if someone does:
>
> touch None
> bzr add None
> bzr commit -m "adding None"
>
> Is this a serialization bug waiting to be exposed?
>
> I guess not as it seems the paths are always prefixed with "/", right?
>
> (So a path of None would actually be "newpath_utf8 == '/None'")

That's right. No bug here.
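
To make that concrete, an illustrative (hand-written, not from the test
suite) serialized line: a newly added file whose basename really is "None"
stays unambiguous, because real paths always carry the '/' prefix:

    line = "None\x00/None\x00an-id\x00root-id\x00rev-1\x00file\x000\x00\x00sha\n"
    oldpath_utf8, newpath_utf8 = line.split('\x00', 5)[:2]
    assert oldpath_utf8 == 'None'    # no old path: the entry is being added
    assert newpath_utf8 == '/None'   # a real path whose basename is "None"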

> However, further down (sorry about the bad indenting):
>
> 413 if newpath_utf8 == 'None':
> 414 newpath = None
>
> ^- here you set "newpath=None" but you *don't* set "newpath_utf8" to None.
[...]
> And then here "newpath_utf8" is passed to _parse_entry.
>
> Now I realize this is probably caught by "content_tuple[0] == 'deleted'".
> Though it feels a bit icky to rely on "newpath_utf8" in one portion and
> "content_tuple[0]" in another (since they can potentially be out of sync.)

Yes. There's a bit of ickiness here too that _parse_entry redoes the utf8
decoding. I'll clean this up.

> I think if we jus...

Revision history for this message
Andrew Bennetts (spiv) wrote :

Robert Collins wrote:
> On Wed, 2009-08-05 at 02:45 +0000, Andrew Bennetts wrote:
> > === modified file 'NEWS'
>
> >
> > +* InterDifferingSerializer has been removed. The transformations it
> > + provided are now done automatically by StreamSource. (Andrew Bennetts)
>
> ^ has it?

Oops. Updated:

* InterDifferingSerializer is now only used locally. Other fetches that
  would have used InterDifferingSerializer now use the more network
  friendly StreamSource, which now automatically does the same
  transformations as InterDifferingSerializer. (Andrew Bennetts)

> > === modified file 'bzrlib/fetch.py'
> > --- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
> > +++ bzrlib/fetch.py 2009-07-29 07:08:54 +0000
>
> > @@ -249,20 +251,77 @@
> > # yet, and are unlikely to in non-rich-root environments anyway.
> > root_id_order.sort(key=operator.itemgetter(0))
> > # Create a record stream containing the roots to create.
> > - def yield_roots():
> > - for key in root_id_order:
> > - root_id, rev_id = key
> > - rev_parents = parent_map[rev_id]
> > - # We drop revision parents with different file-ids, because
> > - # that represents a rename of the root to a different location
> > - # - its not actually a parent for us. (We could look for that
> > - # file id in the revision tree at considerably more expense,
> > - # but for now this is sufficient (and reconcile will catch and
> > - # correct this anyway).
> > - # When a parent revision is a ghost, we guess that its root id
> > - # was unchanged (rather than trimming it from the parent list).
> > - parent_keys = tuple((root_id, parent) for parent in rev_parents
> > - if parent != NULL_REVISION and
> > - rev_id_to_root_id.get(parent, root_id) == root_id)
> > - yield FulltextContentFactory(key, parent_keys, None, '')
> > - return [('texts', yield_roots())]
> > + from bzrlib.graph import FrozenHeadsCache
> > + graph = FrozenHeadsCache(graph)
> > + new_roots_stream = _new_root_data_stream(
> > + root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
> > + return [('texts', new_roots_stream)]
> > +
>
> These functions have a lot of parameters. Perhaps that would work better
> as state on the Source?

Perhaps, although InterDifferingSerializer uses it too (that's where I
originally extracted this code from). Something to look at again when we remove
IDS, I think.

> They need docs/more docs respectively.

The methods they were refactored from didn't have docs either :P

Docs extended.

> > === modified file 'bzrlib/help_topics/en/debug-flags.txt'
> > --- bzrlib/help_topics/en/debug-flags.txt 2009-07-24 03:15:56 +0000
> > +++ bzrlib/help_topics/en/debug-flags.txt 2009-08-05 02:05:43 +0000
> > @@ -12,6 +12,7 @@
> > operations.
> > -Dfetch Trace history copying between repositories.
> > -Dfilters Emit information for debugging conte...

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:

I've trimmed things that are fine.

[debug flags]
> I hope we don't need to ask people to use them, but they are a cheap insurance
> policy.
>
> More usefully, they are helpful for benchmarking and testing. I'm ok with
> hiding them if you like.

I don't have a strong opinion. The flags are in our docs though as it
stands, so they should be clear to people reading them rather than
perhaps causing confusion.

> > > === modified file 'bzrlib/inventory_delta.py'
> > > --- bzrlib/inventory_delta.py 2009-04-02 05:53:12 +0000
> > > +++ bzrlib/inventory_delta.py 2009-08-05 02:30:11 +0000
> >
> > >
> > > - def __init__(self, versioned_root, tree_references):
> > > - """Create an InventoryDeltaSerializer.
> > > + def __init__(self):
> > > + """Create an InventoryDeltaSerializer."""
> > > + self._versioned_root = None
> > > + self._tree_references = None
> > > + self._entry_to_content = {
> > > + 'directory': _directory_content,
> > > + 'file': _file_content,
> > > + 'symlink': _link_content,
> > > + }
> > > +
> > > + def require_flags(self, versioned_root=None, tree_references=None):
> > > + """Set the versioned_root and/or tree_references flags for this
> > > + (de)serializer.
> >
> > ^ why is this not in the constructor? You make the fields settable only
> > once, which seems identical to being set from __init__, but harder to
> > use. As its required to be called, it should be documented in the class
> > or __init__ docstring or something like that.
>
> This is a small step towards a larger change that I don't want to tackle right
> now.
>
> We really ought to have separate classes for the serializer and for the
> deserializer. The serializer could indeed have these set at construction time,
> I think.
>
> For deserializing, we don't necessary know, or care, what the flags are; e.g. a
> repo that supports rich-root and tree-refs can deserialise any delta. It was
> pretty ugly trying to cope with declaring this upfront in the code; hence this
> method which defers it.

In principle yes, but our code always knows the source serializer,
doesn't it?

Anyhow, at the moment it seems unclear and confusing, rather than
clearer. I would find it clearer as either being on the __init__, I
think, or perhaps with two separate constructors? No biggy, not enough
to block the patch on, but it is a very awkward halfway house at the
moment - and I wouldn't want to see it left that way - so I'm concerned
that you might switch focus away from this straight after landing it.

> > > === modified file 'bzrlib/remote.py'
> [...]
> > > + @property
> > > + def repository_class(self):
> > > + self._ensure_real()
> > > + return self._custom_format.repository_class
> > > +
> >
> > ^ this property sets off alarm bells for me. What its for?
>
> Hmm, good question... ah, I think it's for the to_format.repository_class in
> CHKRepository._get_source:
>
> def _get_source(self, to_format):
> """Return a source for streaming from this repository."""
> if (to_format.support...

Revision history for this message
Andrew Bennetts (spiv) wrote :

[not a complete reply, just for some points that have been dealt with]

Robert Collins wrote:
[...]
> > Ok, I've deleted the repository_class part of the if. Should I delete the
> > “to_format.supports_chk” part too?
>
> yes. self is chk_supporting, so self._serializer ==
> to_format.serializer, or whatever, is fine.

Done.

> > > > + if not unstacked_repo._format.supports_chks:
> > > > + # these assertions aren't valid for groupcompress repos, which may
> > > > + # transfer data than strictly necessary to avoid breaking up an
> > > > + # already-compressed block of data.
> > > > + self.assertFalse(unstacked_repo.has_revision('left'))
> > > > + self.assertFalse(unstacked_repo.has_revision('right'))
> > >
> > > ^ please check the comment, its not quite clear.
> >
> > s/transfer data/transfer more data/
> >
> > I'm not 100% certain that's actually true? What do you think with the adjusted
> > comment?
>
> Big alarm bell. gc repos aren't permitted to do that for the revision
> objects, because of our invariants about revisions. Something else is
> wrong - that if _must not_ be needed.

That if guard turns out to be unnecessary; that test passes for all scenarios
with those assertions run unconditionally. Whatever bug I had that prompted me
to add it is clearly gone.

-Andrew.

Revision history for this message
Martin Pool (mbp) wrote :

2009/8/7 Robert Collins <email address hidden>:
> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
>
> I've trimmed things that are fine.
>
> [debug flags]
>> I hope we don't need to ask people to use them, but they are a cheap insurance
>> policy.
>>
>> More usefully, they are helpful for benchmarking and testing.  I'm ok with
>> hiding them if you like.
>
> I don't have a strong opinion. The flags are in our docs though as it
> stands, so they should be clear to people reading them rather than
> perhaps causing confusion.

I don't want the merge to get hung up on them, but I think they're
worth putting in.

It might get confusing if there are lots of flags mentioned in the
user documentation; also we may want to distinguish those ones that
change behaviour from those that are safe to leave on all the time to
gather data.

Andrew and I talked about this the other day and the reasoning was
this: we observed he's testing and comparing this by commenting out
some code. If someone's doing that (and not just for a quick
ten-minute comparison), it _may_ be worth leaving a debug option to do
it in future, so that

1- they don't accidentally leave it disabled (as has sometimes happened before)
2- other people can quickly repeat the same test with just the same change
3- if it turns out that the change does have performance or
functionality issues, users can try it with the other behaviour

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
John A Meinel (jameinel) wrote :


Robert Collins wrote:
> On Thu, 2009-08-06 at 16:15 +0000, John A Meinel wrote:
>> if utf8_path.startswith('/'):
>>
>> ^- If this is a core routine (something called for every path) then:
>> if utf8_path[:1] == '/':
>>
>> *is* faster than .startswith() because you
>> 1) Don't have a function call
>> 2) Don't have an attribute lookup
>>
>> I'm assuming this is a function that gets called a lot. If not, don't
>> worry about it.
>
> utf8_path[:1] == '/' requires a string copy though, for all that its
> heavily tuned in the VM.

Well, a single-character string is a singleton, so it doesn't actually
require a malloc. Certainly that is a VM implementation detail, but still,
time it yourself.

A function call plus attribute lookup is *much* slower than mallocing a
string (by my testing).

John
=:->


Revision history for this message
John A Meinel (jameinel) wrote :


...

>>>> + if not unstacked_repo._format.supports_chks:
>>>> + # these assertions aren't valid for groupcompress repos, which may
>>>> + # transfer data than strictly necessary to avoid breaking up an
>>>> + # already-compressed block of data.
>>>> + self.assertFalse(unstacked_repo.has_revision('left'))
>>>> + self.assertFalse(unstacked_repo.has_revision('right'))
>>> ^ please check the comment, its not quite clear.
>> s/transfer data/transfer more data/
>>
>> I'm not 100% certain that's actually true? What do you think with the adjusted
>> comment?
>
> Big alarm bell. gc repos aren't permitted to do that for the revision
> objects, because of our invariants about revisions. Something else is
> wrong - that if _must not_ be needed.

Actually gc repos may send more data in a group, but they don't
*reference* the data. So the comment is actually incorrect. They don't
reference any more data than the strictly necessary set, and something
like "has_revision" should *definitely* be failing.

John
=:->


Revision history for this message
Andrew Bennetts (spiv) wrote :

Robert Collins wrote:
> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
[...]
> [debug flags]
> > I hope we don't need to ask people to use them, but they are a cheap insurance
> > policy.
> >
> > More usefully, they are helpful for benchmarking and testing. I'm ok with
> > hiding them if you like.
>
> I don't have a strong opinion. The flags are in our docs though as it
> stands, so they should be clear to people reading them rather than
> perhaps causing confusion.

I'll leave them in, assuming I can find a way to make the ReST markup cope with
them...

[inventory_delta.py API]
> Anyhow, at the moment it seems unclear and confusing, rather than
> clearer. I would find it clearer as either being on the __init__, I
> think, or perhaps with two seperate constructors? No biggy, not enough
> to block the patch on, but it is a very awkward halfway house at the
> moment - and I wouldn't want to see it left that way - so I'm concerned
> that you might switch focus away from this straight after landing it.

I've now split the class into separate serializer and deserializer, which gives
“two separate constructors”. I think it's an improvement, I hope you'll think
so too!
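
With the split, the two constructors read like this (argument names as in
the inventory_delta.py diff below):

    serializer = inventory_delta.InventoryDeltaSerializer(
        versioned_root=True, tree_references=False)
    deserializer = inventory_delta.InventoryDeltaDeserializer(
        allow_versioned_root=True, allow_tree_references=True)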

[repacking single pack bug]
> > I don't understand why that case isn't covered. Perhaps you should file the bug
> > instead of me?
>
> I'd like to make sure I explain it well enough first; once you
> understand either of us can file it - and I'll be happy enough to be
> that person.
[...]
> What you need to do is change the test from len(self.packs) == 1, to
> len(packs being combined) == 1 and that_pack has the same hash.

But AIUI “self.packs” on a Packer *is* “packs being combined”. If it's not then
your explanation makes sense, but my reading of the code says otherwise.

[...]
> > > > - if not hint or pack.name in hint:
> > > > + if hint is None or pack.name in hint:
> > >
> > > ^- the original form is actually faster, AFAIK. because it skips the in
> > > test for a hint of []. I'd rather we didn't change it, for all that its
> > > not in a common code path.
> >
> > But if hint is None, then “pack.name in hint” will fail.
>
> It would, but it won't execute. As we discussed, if you could change it
> back, but add a comment - its clearly worthy of expanding on (in either
> form the necessity for testing hint isn't obvious, apparently).

Ok. (I still find it cleaner to have the hint-is-None case handled distinctly
from the hint-is-a-list case; it seems to express the intent more clearly to me,
rather than making it appear to simply be a micro-optimisation. But I
understand that tastes vary.)

Oh, actually, it does matter. When hint == [], no packs should be repacked.
When hint is None, all packs should be repacked.

I've added this comment:

                # Either no hint was provided (so we are packing everything),
                # or this pack was included in the hint.
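
So the three cases that comment is guarding, sketched (the pack_operations
bookkeeping is elided):

    # hint is None       -> no hint given: repack everything
    # hint == []         -> the write group touched nothing: repack nothing
    # hint == [name, ..] -> repack only the packs named in the hint
    to_repack = [pack for pack in self.all_packs()
                 if hint is None or pack.name in hint]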

[...]
> > > > + def _should_fake_unknown(self):
[...]
>
> As we discussed, I think the method should say what it does; the fact
> that the base class encodes more complex policy than a child is a smell
> (not one that you need to fix), but it would be very nice...


=== modified file 'NEWS'
--- NEWS 2009-08-13 12:51:59 +0000
+++ NEWS 2009-08-13 03:29:52 +0000
@@ -158,6 +158,10 @@
   lots of backtraces about ``UnknownSmartMethod``, ``do_chunk`` or
   ``do_end``. (Andrew Bennetts, #338561)
 
+* ``RemoteStreamSource.get_stream_for_missing_keys`` will fetch CHK
+  inventory pages when appropriate (by falling back to the vfs stream
+  source). (Andrew Bennetts, #406686)
+
 * Streaming from bzr servers where there is a chain of stacked branches
   (A stacked on B stacked on C) will now work. (Robert Collins, #406597)
 
@@ -249,8 +253,10 @@
 * ``CHKMap.apply_delta`` now raises ``InconsistentDelta`` if a delta adds
   as new a key which was already mapped. (Robert Collins)
 
-* InterDifferingSerializer has been removed. The transformations it
-  provided are now done automatically by StreamSource. (Andrew Bennetts)
+* InterDifferingSerializer is now only used locally. Other fetches that
+  would have used InterDifferingSerializer now use the more network
+  friendly StreamSource, which now automatically does the same
+  transformations as InterDifferingSerializer. (Andrew Bennetts)
 
 * Inventory delta application catches more cases of corruption and can
   prevent corrupt deltas from affecting consistency of data structures on
=== modified file 'bzrlib/fetch.py'
--- bzrlib/fetch.py 2009-07-29 07:08:54 +0000
+++ bzrlib/fetch.py 2009-08-07 04:27:02 +0000
@@ -260,6 +260,19 @@
 
 def _new_root_data_stream(
     root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
+    """Generate a texts substream of synthesised root entries.
+
+    Used in fetches that do rich-root upgrades.
+
+    :param root_keys_to_create: iterable of (root_id, rev_id) pairs describing
+        the root entries to create.
+    :param rev_id_to_root_id_map: dict of known rev_id -> root_id mappings for
+        calculating the parents. If a parent rev_id is not found here then it
+        will be recalculated.
+    :param parent_map: a parent map for all the revisions in
+        root_keys_to_create.
+    :param graph: a graph to use instead of repo.get_graph().
+    """
     for root_key in root_keys_to_create:
         root_id, rev_id = root_key
         parent_keys = _parent_keys_for_root_version(
@@ -270,7 +283,10 @@
 
 def _parent_keys_for_root_version(
     root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):
-    """Get the parent keys for a given root id."""
+    """Get the parent keys for a given root id.
+
+    A helper function for _new_root_data_stream.
+    """
     # Include direct parents of the revision, but only if they used the same
     # root_id and are heads.
     rev_parents = parent_map[rev_id]
=== modified file 'bzrlib/inventory_delta.py'
--- bzrlib/inventory_delta.py 2009-08-05 02:30:11 +0000
+++ bzrlib/inventory_delta.py 2009-08-11 08:40:32 +0000
@@ -29,6 +29,25 @@
 from bzrlib import inventory
 from bzrlib.revision import NULL_REVISION
 
+FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
+
+
+class InventoryDeltaError(errors.BzrError):
+    """An error when serializing or deserializing an inventory delta."""
+
+    # Most errors when serializing and deserializing are due to bugs, although
+    # damaged input (i.e. a bug in a different process) could cause
+    # deserialization errors too.
+    internal_error = True
+
+
+class IncompatibleInventoryDelta(errors.BzrError):
+    """The delta could not be deserialised because its contents conflict with
+    the allow_versioned_root or allow_tree_references flags of the
+    deserializer.
+    """
+    internal_error = False
+
 
 def _directory_content(entry):
     """Serialize the content component of entry which is a directory.
@@ -49,7 +68,7 @@
         exec_bytes = ''
     size_exec_sha = (entry.text_size, exec_bytes, entry.text_sha1)
     if None in size_exec_sha:
-        raise errors.BzrError('Missing size or sha for %s' % entry.file_id)
+        raise InventoryDeltaError('Missing size or sha for %s' % entry.file_id)
     return "file\x00%d\x00%s\x00%s" % size_exec_sha
 
 
@@ -60,7 +79,7 @@
     """
     target = entry.symlink_target
     if target is None:
-        raise errors.BzrError('Missing target for %s' % entry.file_id)
+        raise InventoryDeltaError('Missing target for %s' % entry.file_id)
     return "link\x00%s" % target.encode('utf8')
 
 
@@ -71,7 +90,8 @@
     """
     tree_revision = entry.reference_revision
     if tree_revision is None:
-        raise errors.BzrError('Missing reference revision for %s' % entry.file_id)
+        raise InventoryDeltaError(
+            'Missing reference revision for %s' % entry.file_id)
     return "tree\x00%s" % tree_revision
 
 
@@ -116,39 +136,22 @@
 
 
 class InventoryDeltaSerializer(object):
-    """Serialize and deserialize inventory deltas."""
-
-    # XXX: really, the serializer and deserializer should be two separate
-    # classes.
-
-    FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
-
-    def __init__(self):
-        """Create an InventoryDeltaSerializer."""
-        self._versioned_root = None
-        self._tree_references = None
+    """Serialize inventory deltas."""
+
+    def __init__(self, versioned_root, tree_references):
+        """Create an InventoryDeltaSerializer.
+
+        :param versioned_root: If True, any root entry that is seen is expected
+            to be versioned, and root entries can have any fileid.
+        :param tree_references: If True support tree-reference entries.
+        """
+        self._versioned_root = versioned_root
+        self._tree_references = tree_references
         self._entry_to_content = {
             'directory': _directory_content,
             'file': _file_content,
            'symlink': _link_content,
        }
-
-    def require_flags(self, versioned_root=None, tree_references=None):
-        """Set the versioned_root and/or tree_references flags for this
-        (de)serializer.
-
-        :param versioned_root: If True, any root entry that is seen is expected
-            to be versioned, and root entries can have any fileid.
-        :param tree_references: If True support tree-reference entries.
-        """
-        if versioned_root is not None and self._versioned_root is not None:
-            raise AssertionError(
-                "require_flags(versioned_root=...) already called.")
-        if tree_references is not None and self._tree_references is not None:
-            raise AssertionError(
-                "require_flags(tree_references=...) already called.")
-        self._versioned_root = versioned_root
-        self._tree_references = tree_references
         if tree_references:
             self._entry_to_content['tree-reference'] = _reference_content
 
@@ -167,10 +170,6 @@
             takes.
         :return: The serialized delta as lines.
         """
-        if self._versioned_root is None or self._tree_references is None:
-            raise AssertionError(
-                "Cannot serialise unless versioned_root/tree_references flags "
-                "are both set.")
         if type(old_name) is not str:
             raise TypeError('old_name should be str, got %r' % (old_name,))
         if type(new_name) is not str:
@@ -180,11 +179,11 @@
         for delta_item in delta_to_new:
             line = to_line(delta_item, new_name)
             if line.__class__ != str:
-                raise errors.BzrError(
+                raise InventoryDeltaError(
                     'to_line generated non-str output %r' % lines[-1])
             lines.append(line)
         lines.sort()
-        lines[0] = "format: %s\n" % InventoryDeltaSerializer.FORMAT_1
+        lines[0] = "format: %s\n" % FORMAT_1
         lines[1] = "parent: %s\n" % old_name
         lines[2] = "version: %s\n" % new_name
         lines[3] = "versioned_root: %s\n" % self._serialize_bool(
@@ -234,23 +233,37 @@
             # file-ids other than TREE_ROOT, e.g. repo formats that use the
             # xml5 serializer.
             if last_modified != new_version:
-                raise errors.BzrError(
+                raise InventoryDeltaError(
                     'Version present for / in %s (%s != %s)'
                     % (file_id, last_modified, new_version))
         if last_modified is None:
-            raise errors.BzrError("no version for fileid %s" % file_id)
+            raise InventoryDeltaError("no version for fileid %s" % file_id)
         content = self._entry_to_content[entry.kind](entry)
         return ("%s\x00%s\x00%s\x00%s\x00%s\x00%s\n" %
             (oldpath_utf8, newpath_utf8, file_id, parent_id, last_modified,
              content))
 
+
+class InventoryDeltaDeserializer(object):
+    """Deserialize inventory deltas."""
+
+    def __init__(self, allow_versioned_root=True, allow_tree_references=True):
+        """Create an InventoryDeltaDeserializer.
+
+        :param versioned_root: If True, any root entry that is seen is expected
+            to be versioned, and root entries can have any fileid.
+        :param tree_references: If True support tree-reference entries.
+        """
+        self._allow_versioned_root = allow_versioned_root
+        self._allow_tree_references = allow_tree_references
+
     def _deserialize_bool(self, value):
         if value == "true":
             return True
         elif value == "false":
             return False
         else:
-            raise errors.BzrError("value %r is not a bool" % (value,))
+            raise InventoryDeltaError("value %r is not a bool" % (value,))
 
     def parse_text_bytes(self, bytes):
         """Parse the text bytes of a serialized inventory delta.
@@ -266,32 +279,24 @@
         """
         if bytes[-1:] != '\n':
             last_line = bytes.rsplit('\n', 1)[-1]
-            raise errors.BzrError('last line not empty: %r' % (last_line,))
+            raise InventoryDeltaError('last line not empty: %r' % (last_line,))
         lines = bytes.split('\n')[:-1] # discard the last empty line
-        if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1:
-            raise errors.BzrError('unknown format %r' % lines[0:1])
+        if not lines or lines[0] != 'format: %s' % FORMAT_1:
+            raise InventoryDeltaError('unknown format %r' % lines[0:1])
         if len(lines) < 2 or not lines[1].startswith('parent: '):
-            raise errors.BzrError('missing parent: marker')
+            raise InventoryDeltaError('missing parent: marker')
         delta_parent_id = lines[1][8:]
         if len(lines) < 3 or not lines[2].startswith('version: '):
-            raise errors.BzrError('missing version: marker')
+            raise InventoryDeltaError('missing version: marker')
         delta_version_id = lines[2][9:]
         if len(lines) < 4 or not lines[3].startswith('versioned_root: '):
-            raise errors.BzrError('missing versioned_root: marker')
+            raise InventoryDeltaError('missing versioned_root: marker')
         delta_versioned_root = self._deserialize_bool(lines[3][16:])
         if len(lines) < 5 or not lines[4].startswith('tree_references: '):
-            raise errors.BzrError('missing tree_references: marker')
+            raise InventoryDeltaError('missing tree_references: marker')
         delta_tree_references = self._deserialize_bool(lines[4][17:])
-        if (self._versioned_root is not None and
-            delta_versioned_root != self._versioned_root):
-            raise errors.BzrError(
-                "serialized versioned_root flag is wrong: %s" %
-                (delta_versioned_root,))
-        if (self._tree_references is not None
-            and delta_tree_references != self._tree_references):
-            raise errors.BzrError(
-                "serialized tree_references flag is wrong: %s" %
-                (delta_tree_references,))
+        if (not self._allow_versioned_root and delta_versioned_root):
+            raise IncompatibleInventoryDelta("versioned_root not allowed")
         result = []
         seen_ids = set()
         line_iter = iter(lines)
@@ -302,24 +307,30 @@
                 content) = line.split('\x00', 5)
             parent_id = parent_id or None
             if file_id in seen_ids:
-                raise errors.BzrError(
+                raise InventoryDeltaError(
                     "duplicate file id in inventory delta %r" % lines)
             seen_ids.add(file_id)
             if (newpath_utf8 == '/' and not delta_versioned_root and
                 last_modified != delta_version_id):
-                # Delta claims to be not rich root, yet here's a root entry
-                # with either non-default version, i.e. it's rich...
-                raise errors.BzrError("Versioned root found: %r" % line)
+                # Delta claims to be not have a versioned root, yet here's
+                # a root entry with a non-default version.
+                raise InventoryDeltaError("Versioned root found: %r" % line)
             elif newpath_utf8 != 'None' and last_modified[-1] == ':':
                 # Deletes have a last_modified of null:, but otherwise special
                 # revision ids should not occur.
-                raise errors.BzrError('special revisionid found: %r' % line)
-            if delta_tree_references is False and content.startswith('tree\x00'):
-                raise errors.BzrError("Tree reference found: %r" % line)
+                raise InventoryDeltaError('special revisionid found: %r' % line)
+            if content.startswith('tree\x00'):
+                if delta_tree_references is False:
+                    raise InventoryDeltaError(
+                        "Tree reference found (but header said "
+                        "tree_references: false): %r" % line)
+                elif not self._allow_tree_references:
+                    raise IncompatibleInventoryDelta(
+                        "Tree reference not allowed")
            if oldpath_utf8 == 'None':
                oldpath = None
            elif oldpath_utf8[:1] != '/':
-                raise errors.BzrError(
+                raise InventoryDeltaError(
                    "oldpath invalid (does not start with /): %r"
                    % (oldpath_utf8,))
            else:
@@ -328,7 +339,7 @@
             if newpath_utf8 == 'None':
                 newpath = None
             elif newpath_utf8[:1] != '/':
-                raise errors.BzrError(
+                raise InventoryDeltaError(
                     "newpath invalid (does not start with /): %r"
                     % (newpath_utf8,))
             else:
@@ -340,15 +351,14 @@
                 entry = None
             else:
                 entry = _parse_entry(
-                    newpath_utf8, file_id, parent_id, last_modified,
-                    content_tuple)
+                    newpath, file_id, parent_id, last_modified, content_tuple)
             delta_item = (oldpath, newpath, file_id, entry)
             result.append(delta_item)
         return (delta_parent_id, delta_version_id, delta_versioned_root,
             delta_tree_references, result)
 
 
-def _parse_entry(utf8_path, file_id, parent_id, last_modified, content):
+def _parse_entry(path, file_id, parent_id, last_modified, content):
     entry_factory = {
         'dir': _dir_to_entry,
         'file': _file_to_entry,
@@ -356,10 +366,10 @@
         'tree': _tree_to_entry,
     }
     kind = content[0]
-    if utf8_path.startswith('/'):
+    if path.startswith('/'):
         raise AssertionError
-    path = utf8_path.decode('utf8')
     name = basename(path)
     return entry_factory[content[0]](
         content, name, parent_id, file_id, last_modified)
 
+
 
=== modified file 'bzrlib/remote.py'
--- bzrlib/remote.py 2009-08-13 12:51:59 +0000
+++ bzrlib/remote.py 2009-08-13 08:21:02 +0000
@@ -587,11 +587,6 @@
         self._ensure_real()
         return self._custom_format._serializer
 
-    @property
-    def repository_class(self):
-        self._ensure_real()
-        return self._custom_format.repository_class
-
 
 class RemoteRepository(_RpcHelper):
     """Repository accessed over rpc.
@@ -1684,9 +1679,6 @@
 
 class RemoteStreamSink(repository.StreamSink):
 
-    def __init__(self, target_repo):
-        repository.StreamSink.__init__(self, target_repo)
-
     def _insert_real(self, stream, src_format, resume_tokens):
         self.target_repo._ensure_real()
         sink = self.target_repo._real_repository._get_sink()
@@ -1708,6 +1700,10 @@
         client = target._client
         medium = client._medium
         path = target.bzrdir._path_for_remote_call(client)
+        # Probe for the verb to use with an empty stream before sending the
+        # real stream to it. We do this both to avoid the risk of sending a
+        # large request that is then rejected, and because we don't want to
+        # implement a way to buffer, rewind, or restart the stream.
         found_verb = False
         for verb, required_version in candidate_calls:
             if medium._is_remote_before(required_version):
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-08-13 12:51:59 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-08-13 08:20:28 +0000
@@ -484,6 +484,8 @@
             old_pack = self.packs[0]
             if old_pack.name == self.new_pack._hash.hexdigest():
                 # The single old pack was already optimally packed.
+                trace.mutter('single pack %s was already optimally packed',
+                             old_pack.name)
                 self.new_pack.abort()
                 return None
         self.pb.update('finishing repack', 6, 7)
@@ -600,7 +602,7 @@
         packer = GCCHKPacker(self, packs, '.autopack',
                              reload_func=reload_func)
         try:
-            packer.pack()
+            result = packer.pack()
         except errors.RetryWithNewPacks:
             # An exception is propagating out of this context, make sure
             # this packer has cleaned up. Packer() doesn't set its new_pack
@@ -609,6 +611,8 @@
             if packer.new_pack is not None:
                 packer.new_pack.abort()
             raise
+        if result is None:
+            return
         for pack in packs:
             self._remove_pack_from_memory(pack)
         # record the newly available packs and stop advertising the old
@@ -792,6 +796,8 @@
 
     def _iter_inventories(self, revision_ids, ordering):
         """Iterate over many inventory objects."""
+        if ordering is None:
+            ordering = 'unordered'
         keys = [(revision_id,) for revision_id in revision_ids]
         stream = self.inventories.get_record_stream(keys, ordering, True)
         texts = {}
@@ -903,9 +909,7 @@
 
     def _get_source(self, to_format):
         """Return a source for streaming from this repository."""
-        if (to_format.supports_chks and
-            self._format.repository_class is to_format.repository_class and
-            self._format._serializer == to_format._serializer):
+        if self._format._serializer == to_format._serializer:
             # We must be exactly the same format, otherwise stuff like the chk
             # page layout might be different.
             # Actually, this test is just slightly looser than exact so that
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py 2009-08-13 12:51:59 +0000
+++ bzrlib/repofmt/pack_repo.py 2009-08-13 07:26:29 +0000
@@ -1575,6 +1575,8 @@
         pack_operations = [[0, []]]
         for pack in self.all_packs():
             if hint is None or pack.name in hint:
+                # Either no hint was provided (so we are packing everything),
+                # or this pack was included in the hint.
                 pack_operations[-1][0] += pack.get_revision_count()
                 pack_operations[-1][1].append(pack)
         self._execute_pack_operations(pack_operations, OptimisingPacker)
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-08-13 12:51:59 +0000
+++ bzrlib/repository.py 2009-08-13 08:56:51 +0000
@@ -1537,6 +1537,8 @@
         """Commit the contents accrued within the current write group.
 
         :seealso: start_write_group.
+
+        :return: it may return an opaque hint that can be passed to 'pack'.
         """
         if self._write_group is not self.get_transaction():
             # has an unlock or relock occured ?
@@ -2348,7 +2350,7 @@
         """Get Inventory object by revision id."""
         return self.iter_inventories([revision_id]).next()
 
-    def iter_inventories(self, revision_ids, ordering='unordered'):
+    def iter_inventories(self, revision_ids, ordering=None):
         """Get many inventories by revision_ids.
 
         This will buffer some or all of the texts used in constructing the
@@ -2356,7 +2358,9 @@
         time.
 
         :param revision_ids: The expected revision ids of the inventories.
-        :param ordering: optional ordering, e.g. 'topological'.
+        :param ordering: optional ordering, e.g. 'topological'. If not
+            specified, the order of revision_ids will be preserved (by
+            buffering if necessary).
         :return: An iterator of inventories.
         """
         if ((None in revision_ids)
@@ -2370,29 +2374,41 @@
         for text, revision_id in inv_xmls:
             yield self.deserialise_inventory(revision_id, text)
 
-    def _iter_inventory_xmls(self, revision_ids, ordering='unordered'):
+    def _iter_inventory_xmls(self, revision_ids, ordering):
+        if ordering is None:
+            order_as_requested = True
+            ordering = 'unordered'
+        else:
+            order_as_requested = False
         keys = [(revision_id,) for revision_id in revision_ids]
         if not keys:
             return
-        key_iter = iter(keys)
-        next_key = key_iter.next()
+        if order_as_requested:
+            key_iter = iter(keys)
+            next_key = key_iter.next()
         stream = self.inventories.get_record_stream(keys, ordering, True)
         text_chunks = {}
         for record in stream:
             if record.storage_kind != 'absent':
-                text_chunks[record.key] = record.get_bytes_as('chunked')
+                chunks = record.get_bytes_as('chunked')
+                if order_as_requested:
+                    text_chunks[record.key] = chunks
+                else:
+                    yield ''.join(chunks), record.key[-1]
             else:
                 raise errors.NoSuchRevision(self, record.key)
-            while next_key in text_chunks:
-                chunks = text_chunks.pop(next_key)
-                yield ''.join(chunks), next_key[-1]
-                try:
-                    next_key = key_iter.next()
-                except StopIteration:
-                    # We still want to fully consume the get_record_stream,
-                    # just in case it is not actually finished at this point
-                    next_key = None
-                    break
+            if order_as_requested:
+                # Yield as many results as we can while preserving order.
+                while next_key in text_chunks:
+                    chunks = text_chunks.pop(next_key)
+                    yield ''.join(chunks), next_key[-1]
+                    try:
+                        next_key = key_iter.next()
+                    except StopIteration:
+                        # We still want to fully consume the get_record_stream,
+                        # just in case it is not actually finished at this point
+                        next_key = None
+                        break
 
     def deserialise_inventory(self, revision_id, xml):
         """Transform the xml into an inventory object.
@@ -4224,20 +4240,14 @@
         for record in substream:
             # Insert the delta directly
             inventory_delta_bytes = record.get_bytes_as('fulltext')
-            deserialiser = inventory_delta.InventoryDeltaSerializer()
-            parse_result = deserialiser.parse_text_bytes(inventory_delta_bytes)
+            deserialiser = inventory_delta.InventoryDeltaDeserializer()
+            try:
+                parse_result = deserialiser.parse_text_bytes(
+                    inventory_delta_bytes)
+            except inventory_delta.IncompatibleInventoryDelta, err:
+                trace.mutter("Incompatible delta: %s", err.msg)
+                raise errors.IncompatibleRevision(self.target_repo._format)
             basis_id, new_id, rich_root, tree_refs, inv_delta = parse_result
-            # Make sure the delta is compatible with the target
-            if rich_root and not target_rich_root:
-                raise errors.IncompatibleRevision(self.target_repo._format)
-            if tree_refs and not target_tree_refs:
-                # The source supports tree refs and the target doesn't. Check
-                # the delta for tree refs; if it has any we can't insert it.
-                for delta_item in inv_delta:
-                    entry = delta_item[3]
-                    if entry.kind == 'tree-reference':
-                        raise errors.IncompatibleRevision(
-                            self.target_repo._format)
             revision_id = new_id
             parents = [key[0] for key in record.parents]
             self.target_repo.add_inventory_by_delta(
@@ -4404,10 +4414,6 @@
             # Some missing keys are genuinely ghosts, filter those out.
             present = self.from_repository.inventories.get_parent_map(keys)
             revs = [key[0] for key in present]
-            # As with the original stream, we may need to generate root
-            # texts for the inventories we're about to stream.
-            for _ in self._generate_root_texts(revs):
-                yield _
             # Get the inventory stream more-or-less as we do for the
             # original stream; there's no reason to assume that records
             # direct from the source will be suitable for the sink. (Think
@@ -4474,7 +4480,7 @@
 
     def _get_convertable_inventory_stream(self, revision_ids,
                                           delta_versus_null=False):
-        # The source is using CHKs, but the target either doesn't or is has a
+        # The source is using CHKs, but the target either doesn't or it has a
         # different serializer. The StreamSink code expects to be able to
         # convert on the target, so we need to put bytes-on-the-wire that can
         # be converted. That means inventory deltas (if the remote is <1.18,
@@ -4499,17 +4505,17 @@
         # method...
         inventories = self.from_repository.iter_inventories(
             revision_ids, 'topological')
-        # XXX: ideally these flags would be per-revision, not per-repo (e.g.
-        # streaming a non-rich-root revision out of a rich-root repo back into
-        # a non-rich-root repo ought to be allowed)
         format = from_repo._format
-        flags = (format.rich_root_data, format.supports_tree_reference)
         invs_sent_so_far = set([_mod_revision.NULL_REVISION])
         inventory_cache = lru_cache.LRUCache(50)
         null_inventory = from_repo.revision_tree(
             _mod_revision.NULL_REVISION).inventory
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(*flags)
+        # XXX: ideally the rich-root/tree-refs flags would be per-revision,
+        # not per-repo (e.g. streaming a non-rich-root revision out of a
+        # rich-root repo back into a non-rich-root repo ought to be allowed)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=format.rich_root_data,
+            tree_references=format.supports_tree_reference)
         for inv in inventories:
             key = (inv.revision_id,)
             parent_keys = parent_map.get(key, ())
=== modified file 'bzrlib/smart/repository.py'
--- bzrlib/smart/repository.py 2009-08-04 00:51:24 +0000
+++ bzrlib/smart/repository.py 2009-08-13 08:20:53 +0000
@@ -424,18 +424,21 @@
         return None # Signal that we want a body.
 
     def _should_fake_unknown(self):
-        # This is a workaround for bugs in pre-1.18 clients that claim to
-        # support receiving streams of CHK repositories. The pre-1.18 client
-        # expects inventory records to be serialized in the format defined by
-        # to_network_name, but in pre-1.18 (at least) that format definition
-        # tries to use the xml5 serializer, which does not correctly handle
-        # rich-roots. After 1.18 the client can also accept inventory-deltas
-        # (which avoids this issue), and those clients will use the
-        # Repository.get_stream_1.18 verb instead of this one.
-        # So: if this repository is CHK, and the to_format doesn't match,
-        # we should just fake an UnknownSmartMethod error so that the client
-        # will fallback to VFS, rather than sending it a stream we know it
-        # cannot handle.
+        """Return True if we should return UnknownMethod to the client.
+
+        This is a workaround for bugs in pre-1.18 clients that claim to
+        support receiving streams of CHK repositories. The pre-1.18 client
+        expects inventory records to be serialized in the format defined by
+        to_network_name, but in pre-1.18 (at least) that format definition
+        tries to use the xml5 serializer, which does not correctly handle
+        rich-roots. After 1.18 the client can also accept inventory-deltas
+        (which avoids this issue), and those clients will use the
+        Repository.get_stream_1.18 verb instead of this one.
+        So: if this repository is CHK, and the to_format doesn't match,
+        we should just fake an UnknownSmartMethod error so that the client
+        will fallback to VFS, rather than sending it a stream we know it
+        cannot handle.
+        """
         from_format = self._repository._format
         to_format = self._to_format
         if not from_format.supports_chks:
@@ -489,8 +492,7 @@
 class SmartServerRepositoryGetStream_1_18(SmartServerRepositoryGetStream):
 
     def _should_fake_unknown(self):
-        # The client is at least 1.18, so we don't need to work around any
-        # bugs.
+        """Returns False; we don't need to workaround bugs in 1.18+ clients."""
         return False
 
 
=== modified file 'bzrlib/tests/per_interrepository/__init__.py'
--- bzrlib/tests/per_interrepository/__init__.py 2009-08-13 12:51:59 +0000
+++ bzrlib/tests/per_interrepository/__init__.py 2009-08-13 03:30:41 +0000
@@ -46,12 +46,12 @@
     """Transform the input formats to a list of scenarios.

     :param formats: A list of tuples:
-        (interrepo_class, repository_format, repository_format_to).
+        (label, repository_format, repository_format_to).
     """
     result = []
-    for repository_format, repository_format_to in formats:
-        id = '%s,%s' % (repository_format.__class__.__name__,
+    for label, repository_format, repository_format_to in formats:
+        id = '%s,%s,%s' % (label, repository_format.__class__.__name__,
             repository_format_to.__class__.__name__)
         scenario = (id,
             {"transport_server": transport_server,
              "transport_readonly_server": transport_readonly_server,
@@ -71,8 +71,8 @@
         weaverepo,
         )
     result = []
-    def add_combo(from_format, to_format):
-        result.append((from_format, to_format))
+    def add_combo(label, from_format, to_format):
+        result.append((label, from_format, to_format))
     # test the default InterRepository between format 6 and the current
     # default format.
     # XXX: robertc 20060220 reinstate this when there are two supported
@@ -83,32 +83,47 @@
     for optimiser_class in InterRepository._optimisers:
         format_to_test = optimiser_class._get_repo_format_to_test()
         if format_to_test is not None:
-            add_combo(format_to_test, format_to_test)
+            add_combo(optimiser_class.__name__, format_to_test, format_to_test)
     # if there are specific combinations we want to use, we can add them
     # here. We want to test rich root upgrading.
-    add_combo(weaverepo.RepositoryFormat5(),
-        knitrepo.RepositoryFormatKnit3())
-    add_combo(knitrepo.RepositoryFormatKnit1(),
-        knitrepo.RepositoryFormatKnit3())
-    add_combo(knitrepo.RepositoryFormatKnit1(),
+    # XXX: although we attach InterRepository class names to these scenarios,
+    # there's nothing asserting that these labels correspond to what it
+    # actually used.
+    add_combo('InterRepository',
+        weaverepo.RepositoryFormat5(),
+        knitrepo.RepositoryFormatKnit3())
+    add_combo('InterRepository',
+        knitrepo.RepositoryFormatKnit1(),
+        knitrepo.RepositoryFormatKnit3())
+    add_combo('InterKnitRepo',
+        knitrepo.RepositoryFormatKnit1(),
         pack_repo.RepositoryFormatKnitPack1())
-    add_combo(pack_repo.RepositoryFormatKnitPack1(),
+    add_combo('InterKnitRepo',
+        pack_repo.RepositoryFormatKnitPack1(),
         knitrepo.RepositoryFormatKnit1())
-    add_combo(knitrepo.RepositoryFormatKnit3(),
+    add_combo('InterKnitRepo',
+        knitrepo.RepositoryFormatKnit3(),
         pack_repo.RepositoryFormatKnitPack3())
-    add_combo(pack_repo.RepositoryFormatKnitPack3(),
+    add_combo('InterKnitRepo',
+        pack_repo.RepositoryFormatKnitPack3(),
         knitrepo.RepositoryFormatKnit3())
-    add_combo(pack_repo.RepositoryFormatKnitPack3(),
+    add_combo('InterKnitRepo',
+        pack_repo.RepositoryFormatKnitPack3(),
         pack_repo.RepositoryFormatKnitPack4())
-    add_combo(pack_repo.RepositoryFormatKnitPack1(),
-        pack_repo.RepositoryFormatKnitPack6RichRoot())
-    add_combo(pack_repo.RepositoryFormatKnitPack6RichRoot(),
-        groupcompress_repo.RepositoryFormat2a())
-    add_combo(groupcompress_repo.RepositoryFormat2a(),
-        pack_repo.RepositoryFormatKnitPack6RichRoot())
-    add_combo(groupcompress_repo.RepositoryFormatCHK2(),
-        groupcompress_repo.RepositoryFormat2a())
-    add_combo(groupcompress_repo.RepositoryFormatCHK1(),
+    add_combo('InterDifferingSerializer',
+        pack_repo.RepositoryFormatKnitPack1(),
+        pack_repo.RepositoryFormatKnitPack6RichRoot())
+    add_combo('InterDifferingSerializer',
+        pack_repo.RepositoryFormatKnitPack6RichRoot(),
+        groupcompress_repo.RepositoryFormat2a())
+    add_combo('InterDifferingSerializer',
+        groupcompress_repo.RepositoryFormat2a(),
+        pack_repo.RepositoryFormatKnitPack6RichRoot())
+    add_combo('InterRepository',
+        groupcompress_repo.RepositoryFormatCHK2(),
+        groupcompress_repo.RepositoryFormat2a())
+    add_combo('InterDifferingSerializer',
+        groupcompress_repo.RepositoryFormatCHK1(),
         groupcompress_repo.RepositoryFormat2a())
     return result

=== modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
--- bzrlib/tests/per_interrepository/test_fetch.py 2009-08-13 12:51:59 +0000
+++ bzrlib/tests/per_interrepository/test_fetch.py 2009-08-13 08:55:59 +0000
@@ -191,8 +191,19 @@
             ['left', 'right'])
         self.assertEqual(left_tree.inventory, stacked_left_tree.inventory)
         self.assertEqual(right_tree.inventory, stacked_right_tree.inventory)

+        # Finally, it's not enough to see that the basis inventories are
+        # present. The texts introduced in merge (and only those) should be
+        # present, and also generating a stream should succeed without blowing
+        # up.
         self.assertTrue(unstacked_repo.has_revision('merge'))
+        expected_texts = set([('file-id', 'merge')])
+        if stacked_branch.repository.texts.get_parent_map([('root-id',
+            'merge')]):
+            # If a (root-id,merge) text exists, it should be in the stacked
+            # repo.
+            expected_texts.add(('root-id', 'merge'))
+        self.assertEqual(expected_texts, unstacked_repo.texts.keys())
         self.assertCanStreamRevision(unstacked_repo, 'merge')

     def assertCanStreamRevision(self, repo, revision_id):
@@ -241,6 +252,19 @@
         self.addCleanup(stacked_branch.unlock)
         stacked_second_tree = stacked_branch.repository.revision_tree('second')
         self.assertEqual(second_tree.inventory, stacked_second_tree.inventory)
+        # Finally, it's not enough to see that the basis inventories are
+        # present. The texts introduced in merge (and only those) should be
+        # present, and also generating a stream should succeed without blowing
+        # up.
+        self.assertTrue(unstacked_repo.has_revision('third'))
+        expected_texts = set([('file-id', 'third')])
+        if stacked_branch.repository.texts.get_parent_map([('root-id',
+            'third')]):
+            # If a (root-id,third) text exists, it should be in the stacked
+            # repo.
+            expected_texts.add(('root-id', 'third'))
+        self.assertEqual(expected_texts, unstacked_repo.texts.keys())
+        self.assertCanStreamRevision(unstacked_repo, 'third')

     def test_fetch_missing_basis_text(self):
         """If fetching a delta, we should die if a basis is not present."""

=== modified file 'bzrlib/tests/per_pack_repository.py'
--- bzrlib/tests/per_pack_repository.py 2009-08-12 22:28:28 +0000
+++ bzrlib/tests/per_pack_repository.py 2009-08-13 03:29:52 +0000
@@ -1051,8 +1051,8 @@
         tree.branch.push(remote_branch)
         autopack_calls = len([call for call in self.hpss_calls if call ==
             'PackRepository.autopack'])
-        streaming_calls = len([call for call in self.hpss_calls if call ==
-            'Repository.insert_stream'])
+        streaming_calls = len([call for call in self.hpss_calls if call in
+            ('Repository.insert_stream', 'Repository.insert_stream_1.18')])
         if autopack_calls:
             # Non streaming server
             self.assertEqual(1, autopack_calls)

=== modified file 'bzrlib/tests/test_inventory_delta.py'
--- bzrlib/tests/test_inventory_delta.py 2009-08-05 02:30:11 +0000
+++ bzrlib/tests/test_inventory_delta.py 2009-08-11 06:52:07 +0000
@@ -26,6 +26,7 @@
     inventory,
     inventory_delta,
     )
+from bzrlib.inventory_delta import InventoryDeltaError
 from bzrlib.inventory import Inventory
 from bzrlib.revision import NULL_REVISION
 from bzrlib.tests import TestCase
@@ -93,34 +94,34 @@
     """Test InventoryDeltaSerializer.parse_text_bytes."""

     def test_parse_no_bytes(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         err = self.assertRaises(
-            errors.BzrError, serializer.parse_text_bytes, '')
+            InventoryDeltaError, deserializer.parse_text_bytes, '')
         self.assertContainsRe(str(err), 'last line not empty')

     def test_parse_bad_format(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, 'format: foo\n')
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, 'format: foo\n')
         self.assertContainsRe(str(err), 'unknown format')

     def test_parse_no_parent(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes,
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes,
             'format: bzr inventory delta v1 (bzr 1.14)\n')
         self.assertContainsRe(str(err), 'missing parent: marker')

     def test_parse_no_version(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes,
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes,
             'format: bzr inventory delta v1 (bzr 1.14)\n'
             'parent: null:\n')
         self.assertContainsRe(str(err), 'missing version: marker')

     def test_parse_duplicate_key_errors(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         double_root_lines = \
 """format: bzr inventory delta v1 (bzr 1.14)
 parent: null:
@@ -130,14 +131,13 @@
 None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
 None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
 """
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, double_root_lines)
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, double_root_lines)
         self.assertContainsRe(str(err), 'duplicate file id')

     def test_parse_versioned_root_only(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
-        parse_result = serializer.parse_text_bytes(root_only_lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        parse_result = deserializer.parse_text_bytes(root_only_lines)
         expected_entry = inventory.make_entry(
             'directory', u'', None, 'an-id')
         expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'
@@ -147,8 +147,7 @@
             parse_result)

     def test_parse_special_revid_not_valid_last_mod(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=False, tree_references=True)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
 parent: null:
 version: null:
@@ -156,13 +155,12 @@
 tree_references: true
 None\x00/\x00TREE_ROOT\x00\x00null:\x00dir\x00\x00
 """
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, root_only_lines)
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, root_only_lines)
         self.assertContainsRe(str(err), 'special revisionid found')

     def test_parse_versioned_root_versioned_disabled(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=False, tree_references=True)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
 parent: null:
 version: null:
@@ -170,13 +168,12 @@
 tree_references: true
 None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
 """
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, root_only_lines)
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, root_only_lines)
         self.assertContainsRe(str(err), 'Versioned root found')

     def test_parse_unique_root_id_root_versioned_disabled(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=False, tree_references=True)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
 parent: parent-id
 version: a@e\xc3\xa5ample.com--2004
@@ -184,29 +181,38 @@
 tree_references: true
 None\x00/\x00an-id\x00\x00parent-id\x00dir\x00\x00
 """
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, root_only_lines)
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, root_only_lines)
         self.assertContainsRe(str(err), 'Versioned root found')

     def test_parse_unversioned_root_versioning_enabled(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, root_only_unversioned)
-        self.assertContainsRe(
-            str(err), 'serialized versioned_root flag is wrong: False')
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        parse_result = deserializer.parse_text_bytes(root_only_unversioned)
+        expected_entry = inventory.make_entry(
+            'directory', u'', None, 'TREE_ROOT')
+        expected_entry.revision = 'entry-version'
+        self.assertEqual(
+            ('null:', 'entry-version', False, False,
+             [(None, u'', 'TREE_ROOT', expected_entry)]),
+            parse_result)
+
+    def test_parse_versioned_root_when_disabled(self):
+        deserializer = inventory_delta.InventoryDeltaDeserializer(
+            allow_versioned_root=False)
+        err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
+            deserializer.parse_text_bytes, root_only_lines)
+        self.assertEquals("versioned_root not allowed", str(err))

     def test_parse_tree_when_disabled(self):
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=False)
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, reference_lines)
-        self.assertContainsRe(
-            str(err), 'serialized tree_references flag is wrong: True')
+        deserializer = inventory_delta.InventoryDeltaDeserializer(
+            allow_tree_references=False)
+        err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
+            deserializer.parse_text_bytes, reference_lines)
+        self.assertEquals("Tree reference not allowed", str(err))

     def test_parse_tree_when_header_disallows(self):
         # A deserializer that allows tree_references to be set or unset.
-        serializer = inventory_delta.InventoryDeltaSerializer()
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         # A serialised inventory delta with a header saying no tree refs, but
         # that has a tree ref in its content.
         lines = """format: bzr inventory delta v1 (bzr 1.14)
@@ -216,13 +222,13 @@
 tree_references: false
 None\x00/foo\x00id\x00TREE_ROOT\x00changed\x00tree\x00subtree-version
 """
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, lines)
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, lines)
         self.assertContainsRe(str(err), 'Tree reference found')

     def test_parse_versioned_root_when_header_disallows(self):
         # A deserializer that allows tree_references to be set or unset.
-        serializer = inventory_delta.InventoryDeltaSerializer()
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         # A serialised inventory delta with a header saying no tree refs, but
         # that has a tree ref in its content.
         lines = """format: bzr inventory delta v1 (bzr 1.14)
@@ -232,35 +238,35 @@
 tree_references: false
 None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir
 """
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, lines)
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, lines)
         self.assertContainsRe(str(err), 'Versioned root found')

     def test_parse_last_line_not_empty(self):
         """newpath must start with / if it is not None."""
         # Trim the trailing newline from a valid serialization
         lines = root_only_lines[:-1]
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, lines)
         self.assertContainsRe(str(err), 'last line not empty')

     def test_parse_invalid_newpath(self):
         """newpath must start with / if it is not None."""
         lines = empty_lines
         lines += "None\x00bad\x00TREE_ROOT\x00\x00version\x00dir\n"
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, lines)
         self.assertContainsRe(str(err), 'newpath invalid')

     def test_parse_invalid_oldpath(self):
         """oldpath must start with / if it is not None."""
         lines = root_only_lines
         lines += "bad\x00/new\x00file-id\x00\x00version\x00dir\n"
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        err = self.assertRaises(errors.BzrError,
-            serializer.parse_text_bytes, lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        err = self.assertRaises(InventoryDeltaError,
+            deserializer.parse_text_bytes, lines)
         self.assertContainsRe(str(err), 'oldpath invalid')

     def test_parse_new_file(self):
@@ -270,8 +276,8 @@
         lines += (
             "None\x00/new\x00file-id\x00an-id\x00version\x00file\x00123\x00" +
             "\x00" + fake_sha + "\n")
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        parse_result = serializer.parse_text_bytes(lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        parse_result = deserializer.parse_text_bytes(lines)
         expected_entry = inventory.make_entry(
             'file', u'new', 'an-id', 'file-id')
         expected_entry.revision = 'version'
@@ -285,8 +291,8 @@
         lines = root_only_lines
         lines += (
             "/old-file\x00None\x00deleted-id\x00\x00null:\x00deleted\x00\x00\n")
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        parse_result = serializer.parse_text_bytes(lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
+        parse_result = deserializer.parse_text_bytes(lines)
         delta = parse_result[4]
         self.assertEqual(
             (u'old-file', None, 'deleted-id', None), delta[-1])
@@ -299,8 +305,8 @@
         old_inv = Inventory(None)
         new_inv = Inventory(None)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(True, True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         self.assertEqual(StringIO(empty_lines).readlines(),
             serializer.delta_to_lines(NULL_REVISION, NULL_REVISION, delta))

@@ -311,8 +317,8 @@
         root.revision = 'a@e\xc3\xa5ample.com--2004'
         new_inv.add(root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         self.assertEqual(StringIO(root_only_lines).readlines(),
             serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))

@@ -324,15 +330,16 @@
         root.revision = 'entry-version'
         new_inv.add(root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(False, False)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=False, tree_references=False)
         serialized_lines = serializer.delta_to_lines(
             NULL_REVISION, 'entry-version', delta)
         self.assertEqual(StringIO(root_only_unversioned).readlines(),
             serialized_lines)
+        deserializer = inventory_delta.InventoryDeltaDeserializer()
         self.assertEqual(
             (NULL_REVISION, 'entry-version', False, False, delta),
-            serializer.parse_text_bytes(''.join(serialized_lines)))
+            deserializer.parse_text_bytes(''.join(serialized_lines)))

     def test_unversioned_non_root_errors(self):
         old_inv = Inventory(None)
@@ -343,9 +350,9 @@
         non_root = new_inv.make_entry('directory', 'foo', root.file_id, 'id')
         new_inv.add(non_root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
-        err = self.assertRaises(errors.BzrError,
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
+        err = self.assertRaises(InventoryDeltaError,
             serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
         self.assertEqual(str(err), 'no version for fileid id')

@@ -355,9 +362,9 @@
         root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
         new_inv.add(root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
-        err = self.assertRaises(errors.BzrError,
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
+        err = self.assertRaises(InventoryDeltaError,
             serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
         self.assertEqual(str(err), 'no version for fileid TREE_ROOT')

@@ -368,9 +375,9 @@
         root.revision = 'a@e\xc3\xa5ample.com--2004'
         new_inv.add(root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=False, tree_references=True)
-        err = self.assertRaises(errors.BzrError,
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=False, tree_references=True)
+        err = self.assertRaises(InventoryDeltaError,
             serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
         self.assertStartsWith(str(err), 'Version present for / in TREE_ROOT')

@@ -385,8 +392,8 @@
         non_root.kind = 'strangelove'
         new_inv.add(non_root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(True, True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         # we expect keyerror because there is little value wrapping this.
         # This test aims to prove that it errors more than how it errors.
         err = self.assertRaises(KeyError,
@@ -405,8 +412,8 @@
         non_root.reference_revision = 'subtree-version'
         new_inv.add(non_root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=False)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=False)
         # we expect keyerror because there is little value wrapping this.
         # This test aims to prove that it errors more than how it errors.
         err = self.assertRaises(KeyError,
@@ -425,8 +432,8 @@
         non_root.reference_revision = 'subtree-version'
         new_inv.add(non_root)
         delta = new_inv._make_delta(old_inv)
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         self.assertEqual(StringIO(reference_lines).readlines(),
             serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))

@@ -434,19 +441,19 @@
         root_entry = inventory.make_entry('directory', '', None, 'TREE_ROOT')
         root_entry.revision = 'some-version'
         delta = [(None, '', 'TREE_ROOT', root_entry)]
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=False, tree_references=True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=False, tree_references=True)
         self.assertRaises(
-            errors.BzrError, serializer.delta_to_lines, 'old-version',
+            InventoryDeltaError, serializer.delta_to_lines, 'old-version',
             'new-version', delta)

     def test_to_inventory_root_id_not_versioned(self):
         delta = [(None, '', 'an-id', inventory.make_entry(
             'directory', '', None, 'an-id'))]
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         self.assertRaises(
-            errors.BzrError, serializer.delta_to_lines, 'old-version',
+            InventoryDeltaError, serializer.delta_to_lines, 'old-version',
             'new-version', delta)

     def test_to_inventory_has_tree_not_meant_to(self):
@@ -459,9 +466,9 @@
             (None, 'foo', 'ref-id', tree_ref)
             # a file that followed the root move
             ]
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
-        self.assertRaises(errors.BzrError, serializer.delta_to_lines,
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
+        self.assertRaises(InventoryDeltaError, serializer.delta_to_lines,
             'old-version', 'new-version', delta)

     def test_to_inventory_torture(self):
@@ -511,8 +518,8 @@
                 executable=True, text_size=30, text_sha1='some-sha',
                 revision='old-rev')),
             ]
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(versioned_root=True, tree_references=True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         lines = serializer.delta_to_lines(NULL_REVISION, 'something', delta)
         expected = """format: bzr inventory delta v1 (bzr 1.14)
 parent: null:
@@ -565,13 +572,13 @@
     def test_file_without_size(self):
         file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
         file_entry.text_sha1 = 'foo'
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._file_content, file_entry)

     def test_file_without_sha1(self):
         file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
         file_entry.text_size = 10
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._file_content, file_entry)

     def test_link_empty_target(self):
@@ -594,7 +601,7 @@

     def test_link_no_target(self):
         entry = inventory.make_entry('symlink', 'a link', None)
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._link_content, entry)

     def test_reference_null(self):
@@ -611,5 +618,5 @@

     def test_reference_no_reference(self):
         entry = inventory.make_entry('tree-reference', 'a tree', None)
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._reference_content, entry)

=== modified file 'bzrlib/tests/test_remote.py'
--- bzrlib/tests/test_remote.py 2009-08-13 12:51:59 +0000
+++ bzrlib/tests/test_remote.py 2009-08-13 03:29:52 +0000
@@ -2213,7 +2213,26 @@
         self.assertEqual([], client._calls)


-class TestRepositoryInsertStream(TestRemoteRepository):
+class TestRepositoryInsertStreamBase(TestRemoteRepository):
+    """Base class for Repository.insert_stream and .insert_stream_1.18
+    tests.
+    """
+
+    def checkInsertEmptyStream(self, repo, client):
+        """Insert an empty stream, checking the result.
+
+        This checks that there are no resume_tokens or missing_keys, and that
+        the client is finished.
+        """
+        sink = repo._get_sink()
+        fmt = repository.RepositoryFormat.get_default_format()
+        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
+        self.assertEqual([], resume_tokens)
+        self.assertEqual(set(), missing_keys)
+        self.assertFinished(client)
+
+
+class TestRepositoryInsertStream(TestRepositoryInsertStreamBase):
     """Tests for using Repository.insert_stream verb when the _1.18 variant is
     not available.

@@ -2236,12 +2255,7 @@
         client.add_expected_call(
             'Repository.insert_stream', ('quack/', ''),
             'success', ('ok',))
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+        self.checkInsertEmptyStream(repo, client)

     def test_locked_repo_with_no_lock_token(self):
         transport_path = 'quack'
@@ -2259,12 +2273,7 @@
             'Repository.insert_stream', ('quack/', ''),
             'success', ('ok',))
         repo.lock_write()
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+        self.checkInsertEmptyStream(repo, client)

     def test_locked_repo_with_lock_token(self):
         transport_path = 'quack'
@@ -2282,18 +2291,13 @@
             'Repository.insert_stream_locked', ('quack/', '', 'a token'),
             'success', ('ok',))
         repo.lock_write()
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+        self.checkInsertEmptyStream(repo, client)

     def test_stream_with_inventory_deltas(self):
-        """'inventory-deltas' substreams can't be sent to the
-        Repository.insert_stream verb. So when one is encountered the
-        RemoteSink immediately stops using that verb and falls back to VFS
-        insert_stream.
+        """'inventory-deltas' substreams cannot be sent to the
+        Repository.insert_stream verb, because not all servers that implement
+        that verb will accept them. So when one is encountered the RemoteSink
+        immediately stops using that verb and falls back to VFS insert_stream.
         """
         transport_path = 'quack'
         repo, client = self.setup_fake_client_and_repository(transport_path)
@@ -2368,8 +2372,8 @@
             'directory', 'newdir', inv.root.file_id, 'newdir-id')
         entry.revision = 'ghost'
         delta = [(None, 'newdir', 'newdir-id', entry)]
-        serializer = inventory_delta.InventoryDeltaSerializer()
-        serializer.require_flags(True, False)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=False)
         lines = serializer.delta_to_lines('rev1', 'rev2', delta)
         yield versionedfile.ChunkedContentFactory(
             ('rev2',), (('rev1',)), None, lines)
@@ -2380,7 +2384,7 @@
         return stream_with_inv_delta()


-class TestRepositoryInsertStream_1_18(TestRemoteRepository):
+class TestRepositoryInsertStream_1_18(TestRepositoryInsertStreamBase):

     def test_unlocked_repo(self):
         transport_path = 'quack'
@@ -2391,12 +2395,7 @@
         client.add_expected_call(
             'Repository.insert_stream_1.18', ('quack/', ''),
             'success', ('ok',))
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+        self.checkInsertEmptyStream(repo, client)

     def test_locked_repo_with_no_lock_token(self):
         transport_path = 'quack'
@@ -2411,12 +2410,7 @@
             'Repository.insert_stream_1.18', ('quack/', ''),
             'success', ('ok',))
         repo.lock_write()
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+        self.checkInsertEmptyStream(repo, client)

     def test_locked_repo_with_lock_token(self):
         transport_path = 'quack'
@@ -2431,12 +2425,7 @@
             'Repository.insert_stream_1.18', ('quack/', '', 'a token'),
             'success', ('ok',))
         repo.lock_write()
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+        self.checkInsertEmptyStream(repo, client)


 class TestRepositoryTarball(TestRemoteRepository):

=== modified file 'bzrlib/tests/test_smart.py'
--- bzrlib/tests/test_smart.py 2009-07-23 07:37:05 +0000
+++ bzrlib/tests/test_smart.py 2009-08-07 02:26:45 +0000
@@ -1242,6 +1242,7 @@
             SmartServerResponse(('history-incomplete', 2, r2)),
             request.execute('stacked', 1, (3, r3)))

+
 class TestSmartServerRepositoryGetStream(tests.TestCaseWithMemoryTransport):

     def make_two_commit_repo(self):
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (4.7 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Robert Collins wrote:
>> On Fri, 2009-08-07 at 05:39 +0000, Andrew Bennetts wrote:
> [...]
>> [debug flags]
>>> I hope we don't need to ask people to use them, but they are a cheap insurance
>>> policy.
>>>
>>> More usefully, they are helpful for benchmarking and testing. I'm ok with
>>> hiding them if you like.
>> I don't have a strong opinion. The flags are in our docs though as it
>> stands, so they should be clear to people reading them rather than
>> perhaps causing confusion.
>
> I'll leave them in, assuming I can find a way to make the ReST markup cope with
> them...
>

Well, you could always change them to IDS_always to make it consistent
with our other debug flags.

>>> I don't understand why that case isn't covered. Perhaps you should file the bug
>>> instead of me?
>> I'd like to make sure I explain it well enough first; once you
>> understand either of us can file it - and I'll be happy enough to be
>> that person.
> [...]
>> What you need to do is change the test from len(self.packs) == 1, to
>> len(packs being combined) == 1 and that_pack has the same hash.
>
> But AIUI “self.packs” on a Packer *is* “packs being combined”. If it's not then
> your explanation makes sense, but my reading of the code says otherwise.

So he might have been thinking of PackCollection.packs, but you are
right. Packer.packs is certainly the "packs being combined".
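
In sketch form, the check under discussion might look like this (the
names are illustrative, not the actual bzrlib code; it relies on pack
file names being derived from a hash of the pack's content, so equal
names imply equal content):

    def repack_would_be_noop(packer, new_pack_name):
        # Only when exactly one pack is being combined can the result
        # be byte-for-byte identical to an existing pack.
        if len(packer.packs) != 1:
            return False
        # Pack names are content hashes: same name, same content, so
        # writing the "new" pack would just recreate an existing file.
        return packer.packs[0].name == new_pack_name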

...

> Oh, actually, it does matter. When hint == [], no packs should be repacked.
> When hint is None, all packs should be repacked.

Agreed.
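
A minimal sketch of those agreed hint semantics (select_packs_for_hint
is a made-up helper for illustration, not the bzrlib API):

    def select_packs_for_hint(all_packs, hint):
        if hint is None:
            # No hint: repack the entire repository.
            return list(all_packs)
        # An empty hint means the repository is already optimally
        # packed, so nothing is selected; otherwise only the packs
        # named in the hint are repacked.
        return [pack for pack in all_packs if pack.name in hint]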

...

> A tangent: when we transfer parent inventories to a stacked repo, should we
> transfer the texts introduced by those inventories or not?

No. The point of transferring those parent inventories is so that we can
clearly determine what texts are in the children that are not in the
parents.
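
In sketch form, that determination is a set difference over
(file_id, revision) pairs read from the inventories (illustrative code,
not from this branch):

    def texts_introduced_by_children(child_trees, parent_trees):
        def text_keys(tree):
            return set((entry.file_id, entry.revision)
                       for path, entry in tree.inventory.iter_entries())
        child_keys = set()
        for tree in child_trees:
            child_keys.update(text_keys(tree))
        parent_keys = set()
        for tree in parent_trees:
            parent_keys.update(text_keys(tree))
        # Text versions named by a child inventory but by no parent
        # inventory are exactly the texts the children introduce.
        return child_keys - parent_keys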

> I'm seeing a
> worrisome half-and-half behaviour from the streaming code, where the synthesised
> rich root, but no other texts, are transferred for inventories requested via
> get_stream_for_missing_keys. I think the intention is not (if those parent
> revisions are ever filled in later, then at that time the corresponding texts
> would be backfilled too). I've now fixed this too.

Right. I'm personally not very fond of what we've had to do with
stacking. We had an abstraction in place that made stacked branches
fall back to their parents automatically; then we broke that
abstraction, but only when accessing via Remote, which means we need to
fill in some data, which starts breaking the other assumptions, like
having an inventory implying that we have all the texts, etc.

Anyway, I think you have it correct, that the *new* assertion is
something along the lines of:

    If we have a revision, then we have its inventory, and all the new
    texts introduced by that revision, relative to all of its parents.
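
Spelled out as a check, that would read roughly as below (a hedged
sketch: has_revision, revision_tree and the texts index are real
repository APIs, but the helper itself is invented, and it only checks
plain file texts — root texts exist only in rich-root formats):

    def assert_revision_complete(repo, rev_id):
        if not repo.has_revision(rev_id):
            return
        tree = repo.revision_tree(rev_id)
        # Entries last modified in rev_id are exactly the texts it
        # introduces relative to its parents.
        introduced = set((entry.file_id, entry.revision)
                         for path, entry in tree.inventory.iter_entries()
                         if entry.revision == rev_id
                         and entry.kind == 'file')
        present = set(repo.texts.get_parent_map(introduced))
        missing = introduced - present
        assert not missing, 'missing texts: %r' % (missing,)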

>
> The net result of this is that the stacking tests in
> per_interrepository/test_fetch.py are starting to get pretty large. I suspect
> we can do better here... (but after this branch lands, I hope!)
>

Well, ideally most of that would be in "per_repository_reference" which
is the...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-08-13 at 17:36 +0000, John A Meinel wrote:
>
> Anyway, I think you have it correct, that the *new* assertion is
> something along the lines of:
>
> If we have a revision, then we have its inventory, and all the new
> texts introduced by that revision, relative to all of its parents.

I think that's insufficient because of the need to be able to make a
delta. See my reply to Andrew.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-08-13 at 13:03 +0000, Andrew Bennetts wrote:

{Only replying to things that need actions}

Re: Packer.packs - Garh context and failing memory. Sounds like you're
correct. John agrees that I was misremembering - so cool, your code has
it correct.

> A tangent: when we transfer parent inventories to a stacked repo,
> should we
> transfer the texts introduced by those inventories or not? I'm seeing
> a
> worrisome half-and-half behaviour from the streaming code, where the
> synthesised
> rich root, but no other texts, are transferred for inventories
> requested via
> get_stream_for_missing_keys. I think the intention is not (if those
> parent
> revisions are ever filled in later, then at that time the
> corresponding texts
> would be backfilled too). I've now fixed this too.

There's no requirement that parent inventory entry text references be
filled; the invariant is;
"A repository can always output a delta for any revision object it has
against all the revisions parents; a delta consists of:
-revision
-signature
-inventory delta against an arbitrary parent
-all text versions not present in any parent
"

We should write this down somewhere.
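
Until it is written down properly, here is one hedged rendering of that
invariant (the helper and its name are invented for illustration;
get_revision, has_signature_for_revision_id, get_signature_text,
revision_tree and Inventory._make_delta do exist in bzrlib):

    from bzrlib.revision import NULL_REVISION

    def revision_delta(repo, rev_id, parent_ids):
        revision = repo.get_revision(rev_id)
        if repo.has_signature_for_revision_id(rev_id):
            signature = repo.get_signature_text(rev_id)
        else:
            signature = None
        tree = repo.revision_tree(rev_id)
        # Inventory delta against an arbitrary parent; use the first
        # parent, or the empty revision when there are none.
        if parent_ids:
            basis_id = parent_ids[0]
        else:
            basis_id = NULL_REVISION
        basis_tree = repo.revision_tree(basis_id)
        inv_delta = tree.inventory._make_delta(basis_tree.inventory)
        # All text versions not present in any parent: the entries
        # whose last-modified revision is rev_id itself (file texts
        # only, for brevity).
        new_texts = set((entry.file_id, entry.revision)
                        for path, entry in tree.inventory.iter_entries()
                        if entry.revision == rev_id
                        and entry.kind == 'file')
        return revision, signature, inv_delta, new_texts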

Woo. _doit_

-Rob

Preview Diff

=== modified file 'NEWS'
--- NEWS 2009-08-13 22:32:23 +0000
+++ NEWS 2009-08-14 05:35:32 +0000
@@ -24,6 +24,15 @@
 Bug Fixes
 *********

+* Fetching from 2a branches from a version-2 bzr protocol would fail to
+  copy the internal inventory pages from the CHK store. This cannot happen
+  in normal use as all 2a compatible clients and servers support the
+  version-3 protocol, but it does cause test suite failures when testing
+  downlevel protocol behaviour. (Robert Collins)
+
+* Fixed "Pack ... already exists" error when running ``bzr pack`` on a
+  fully packed 2a repository. (Andrew Bennetts, #382463)
+
 * Further tweaks to handling of ``bzr add`` messages about ignored files.
   (Jason Spashett, #76616)

@@ -32,15 +41,22 @@
   parent inventories incorrectly, and also not handling when one of the
   parents was a ghost. (John Arbash Meinel, #402778, #412198)

-* Fetching from 2a branches from a version-2 bzr protocol would fail to
-  copy the internal inventory pages from the CHK store. This cannot happen
-  in normal use as all 2a compatible clients and servers support the
-  version-3 protocol, but it does cause test suite failures when testing
-  downlevel protocol behaviour. (Robert Collins)
+* ``RemoteStreamSource.get_stream_for_missing_keys`` will fetch CHK
+  inventory pages when appropriate (by falling back to the vfs stream
+  source). (Andrew Bennetts, #406686)
+
+* StreamSource generates rich roots from non-rich root sources correctly
+  now. (Andrew Bennetts, #368921)

 Improvements
 ************

+* Cross-format fetches (such as between 1.9-rich-root and 2a) via the
+  smart server are more efficient now. They send inventory deltas rather
+  than full inventories. The smart server has two new requests,
+  ``Repository.get_stream_1.19`` and ``Repository.insert_stream_1.19`` to
+  support this. (Andrew Bennetts, #374738, #385826)
+
 Documentation
 *************

@@ -50,6 +66,11 @@
 Internals
 *********

+* InterDifferingSerializer is now only used locally. Other fetches that
+  would have used InterDifferingSerializer now use the more network
+  friendly StreamSource, which now automatically does the same
+  transformations as InterDifferingSerializer. (Andrew Bennetts)
+
 * RemoteBranch.open now honours ignore_fallbacks correctly on bzr-v2
   protocols. (Robert Collins)

=== modified file 'bzrlib/fetch.py'
--- bzrlib/fetch.py 2009-07-09 08:59:51 +0000
+++ bzrlib/fetch.py 2009-08-14 05:35:32 +0000
@@ -25,16 +25,21 @@
2525
26import operator26import operator
2727
28from bzrlib.lazy_import import lazy_import
29lazy_import(globals(), """
30from bzrlib import (
31 tsort,
32 versionedfile,
33 )
34""")
28import bzrlib35import bzrlib
29from bzrlib import (36from bzrlib import (
30 errors,37 errors,
31 symbol_versioning,38 symbol_versioning,
32 )39 )
33from bzrlib.revision import NULL_REVISION40from bzrlib.revision import NULL_REVISION
34from bzrlib.tsort import topo_sort
35from bzrlib.trace import mutter41from bzrlib.trace import mutter
36import bzrlib.ui42import bzrlib.ui
37from bzrlib.versionedfile import FulltextContentFactory
3843
3944
40class RepoFetcher(object):45class RepoFetcher(object):
@@ -213,11 +218,9 @@
213218
214 def _find_root_ids(self, revs, parent_map, graph):219 def _find_root_ids(self, revs, parent_map, graph):
215 revision_root = {}220 revision_root = {}
216 planned_versions = {}
217 for tree in self.iter_rev_trees(revs):221 for tree in self.iter_rev_trees(revs):
218 revision_id = tree.inventory.root.revision222 revision_id = tree.inventory.root.revision
219 root_id = tree.get_root_id()223 root_id = tree.get_root_id()
220 planned_versions.setdefault(root_id, []).append(revision_id)
221 revision_root[revision_id] = root_id224 revision_root[revision_id] = root_id
222 # Find out which parents we don't already know root ids for225 # Find out which parents we don't already know root ids for
223 parents = set()226 parents = set()
@@ -229,7 +232,7 @@
229 for tree in self.iter_rev_trees(parents):232 for tree in self.iter_rev_trees(parents):
230 root_id = tree.get_root_id()233 root_id = tree.get_root_id()
231 revision_root[tree.get_revision_id()] = root_id234 revision_root[tree.get_revision_id()] = root_id
232 return revision_root, planned_versions235 return revision_root
233236
234 def generate_root_texts(self, revs):237 def generate_root_texts(self, revs):
235 """Generate VersionedFiles for all root ids.238 """Generate VersionedFiles for all root ids.
@@ -238,9 +241,8 @@
238 """241 """
239 graph = self.source.get_graph()242 graph = self.source.get_graph()
240 parent_map = graph.get_parent_map(revs)243 parent_map = graph.get_parent_map(revs)
241 rev_order = topo_sort(parent_map)244 rev_order = tsort.topo_sort(parent_map)
242 rev_id_to_root_id, root_id_to_rev_ids = self._find_root_ids(245 rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)
243 revs, parent_map, graph)
244 root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in246 root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in
245 rev_order]247 rev_order]
246 # Guaranteed stable, this groups all the file id operations together248 # Guaranteed stable, this groups all the file id operations together
@@ -249,20 +251,93 @@
         # yet, and are unlikely to in non-rich-root environments anyway.
         root_id_order.sort(key=operator.itemgetter(0))
         # Create a record stream containing the roots to create.
-        def yield_roots():
-            for key in root_id_order:
-                root_id, rev_id = key
-                rev_parents = parent_map[rev_id]
-                # We drop revision parents with different file-ids, because
-                # that represents a rename of the root to a different location
-                # - its not actually a parent for us. (We could look for that
-                # file id in the revision tree at considerably more expense,
-                # but for now this is sufficient (and reconcile will catch and
-                # correct this anyway).
-                # When a parent revision is a ghost, we guess that its root id
-                # was unchanged (rather than trimming it from the parent list).
-                parent_keys = tuple((root_id, parent) for parent in rev_parents
-                    if parent != NULL_REVISION and
-                        rev_id_to_root_id.get(parent, root_id) == root_id)
-                yield FulltextContentFactory(key, parent_keys, None, '')
-        return [('texts', yield_roots())]
+        from bzrlib.graph import FrozenHeadsCache
+        graph = FrozenHeadsCache(graph)
+        new_roots_stream = _new_root_data_stream(
+            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
+        return [('texts', new_roots_stream)]
+
+
+def _new_root_data_stream(
+    root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None):
+    """Generate a texts substream of synthesised root entries.
+
+    Used in fetches that do rich-root upgrades.
+
+    :param root_keys_to_create: iterable of (root_id, rev_id) pairs describing
+        the root entries to create.
+    :param rev_id_to_root_id_map: dict of known rev_id -> root_id mappings for
+        calculating the parents.  If a parent rev_id is not found here then it
+        will be recalculated.
+    :param parent_map: a parent map for all the revisions in
+        root_keys_to_create.
+    :param graph: a graph to use instead of repo.get_graph().
+    """
+    for root_key in root_keys_to_create:
+        root_id, rev_id = root_key
+        parent_keys = _parent_keys_for_root_version(
+            root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph)
+        yield versionedfile.FulltextContentFactory(
+            root_key, parent_keys, None, '')
+
+
+def _parent_keys_for_root_version(
+    root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None):
+    """Get the parent keys for a given root id.
+
+    A helper function for _new_root_data_stream.
+    """
+    # Include direct parents of the revision, but only if they used the same
+    # root_id and are heads.
+    rev_parents = parent_map[rev_id]
+    parent_ids = []
+    for parent_id in rev_parents:
+        if parent_id == NULL_REVISION:
+            continue
+        if parent_id not in rev_id_to_root_id_map:
+            # We probably didn't read this revision, go spend the extra effort
+            # to actually check
+            try:
+                tree = repo.revision_tree(parent_id)
+            except errors.NoSuchRevision:
+                # Ghost, fill out rev_id_to_root_id in case we encounter this
+                # again.
+                # But set parent_root_id to None since we don't really know
+                parent_root_id = None
+            else:
+                parent_root_id = tree.get_root_id()
+            rev_id_to_root_id_map[parent_id] = None
+            # XXX: why not:
+            #   rev_id_to_root_id_map[parent_id] = parent_root_id
+            # memory consumption maybe?
+        else:
+            parent_root_id = rev_id_to_root_id_map[parent_id]
+        if root_id == parent_root_id:
+            # With stacking we _might_ want to refer to a non-local revision,
+            # but this code path only applies when we have the full content
+            # available, so ghosts really are ghosts, not just the edge of
+            # local data.
+            parent_ids.append(parent_id)
+        else:
+            # root_id may be in the parent anyway.
+            try:
+                tree = repo.revision_tree(parent_id)
+            except errors.NoSuchRevision:
+                # ghost, can't refer to it.
+                pass
+            else:
+                try:
+                    parent_ids.append(tree.inventory[root_id].revision)
+                except errors.NoSuchId:
+                    # not in the tree
+                    pass
+    # Drop non-head parents
+    if graph is None:
+        graph = repo.get_graph()
+    heads = graph.heads(parent_ids)
+    selected_ids = []
+    for parent_id in parent_ids:
+        if parent_id in heads and parent_id not in selected_ids:
+            selected_ids.append(parent_id)
+    parent_keys = [(root_id, parent_id) for parent_id in selected_ids]
+    return parent_keys
 
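
For illustration, the net effect of the new fetch.py helpers above: a
rich-root upgrade synthesises one empty text per (root_id, rev_id) pair,
parented only on head revisions that share the same root id.  A minimal
standalone sketch of that record shape (the stand-in class here just mirrors
the (key, parents, sha1, text) constructor order of the real
bzrlib.versionedfile.FulltextContentFactory; parent_keys_for stands in for
_parent_keys_for_root_version):

    class FulltextContentFactory(object):
        # Stand-in for bzrlib.versionedfile.FulltextContentFactory.
        def __init__(self, key, parents, sha1, text):
            self.key = key
            self.parents = parents
            self.sha1 = sha1
            self.text = text

    def synthesise_root_texts(root_id_order, parent_keys_for):
        # One empty fulltext per (root_id, rev_id), as _new_root_data_stream
        # does above.
        for root_id, rev_id in root_id_order:
            yield FulltextContentFactory(
                (root_id, rev_id), parent_keys_for(root_id, rev_id), None, '')
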
=== modified file 'bzrlib/help_topics/en/debug-flags.txt'
--- bzrlib/help_topics/en/debug-flags.txt 2009-08-04 04:36:34 +0000
+++ bzrlib/help_topics/en/debug-flags.txt 2009-08-14 05:35:32 +0000
@@ -12,6 +12,7 @@
                     operations.
 -Dfetch            Trace history copying between repositories.
 -Dfilters          Emit information for debugging content filtering.
+-Dforceinvdeltas   Force use of inventory deltas during generic streaming fetch.
 -Dgraph            Trace graph traversal.
 -Dhashcache        Log every time a working file is read to determine its hash.
 -Dhooks            Trace hook execution.
@@ -27,3 +28,7 @@
 -Dunlock           Some errors during unlock are treated as warnings.
 -Dpack             Emit information about pack operations.
 -Dsftp             Trace SFTP internals.
+-Dstream           Trace fetch streams.
+-DIDS_never        Never use InterDifferingSerializer when fetching.
+-DIDS_always       Always use InterDifferingSerializer to fetch if appropriate
+                   for the format, even for non-local fetches.
 
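
The new flags are passed like the existing -D options, for example:

    bzr -Dstream -Dfetch pull
    bzr -DIDS_never branch file:///path/to/source

(-DIDS_never and -DIDS_always only matter where InterDifferingSerializer
would be considered at all, i.e. format-changing fetches such as upgrades.)
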
=== modified file 'bzrlib/inventory_delta.py'
--- bzrlib/inventory_delta.py 2009-04-02 05:53:12 +0000
+++ bzrlib/inventory_delta.py 2009-08-14 05:35:32 +0000
@@ -29,6 +29,25 @@
 from bzrlib import inventory
 from bzrlib.revision import NULL_REVISION
 
+FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
+
+
+class InventoryDeltaError(errors.BzrError):
+    """An error when serializing or deserializing an inventory delta."""
+
+    # Most errors when serializing and deserializing are due to bugs, although
+    # damaged input (i.e. a bug in a different process) could cause
+    # deserialization errors too.
+    internal_error = True
+
+
+class IncompatibleInventoryDelta(errors.BzrError):
+    """The delta could not be deserialised because its contents conflict with
+    the allow_versioned_root or allow_tree_references flags of the
+    deserializer.
+    """
+    internal_error = False
+
 
 def _directory_content(entry):
     """Serialize the content component of entry which is a directory.
@@ -49,7 +68,7 @@
         exec_bytes = ''
     size_exec_sha = (entry.text_size, exec_bytes, entry.text_sha1)
     if None in size_exec_sha:
-        raise errors.BzrError('Missing size or sha for %s' % entry.file_id)
+        raise InventoryDeltaError('Missing size or sha for %s' % entry.file_id)
     return "file\x00%d\x00%s\x00%s" % size_exec_sha
 
 
@@ -60,7 +79,7 @@
60 """79 """
61 target = entry.symlink_target80 target = entry.symlink_target
62 if target is None:81 if target is None:
63 raise errors.BzrError('Missing target for %s' % entry.file_id)82 raise InventoryDeltaError('Missing target for %s' % entry.file_id)
64 return "link\x00%s" % target.encode('utf8')83 return "link\x00%s" % target.encode('utf8')
6584
6685
@@ -71,7 +90,8 @@
71 """90 """
72 tree_revision = entry.reference_revision91 tree_revision = entry.reference_revision
73 if tree_revision is None:92 if tree_revision is None:
74 raise errors.BzrError('Missing reference revision for %s' % entry.file_id)93 raise InventoryDeltaError(
94 'Missing reference revision for %s' % entry.file_id)
75 return "tree\x00%s" % tree_revision95 return "tree\x00%s" % tree_revision
7696
7797
@@ -115,11 +135,8 @@
     return result
 
 
-
 class InventoryDeltaSerializer(object):
-    """Serialize and deserialize inventory deltas."""
-
-    FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'
+    """Serialize inventory deltas."""
 
     def __init__(self, versioned_root, tree_references):
         """Create an InventoryDeltaSerializer.
@@ -141,6 +158,9 @@
     def delta_to_lines(self, old_name, new_name, delta_to_new):
         """Return a line sequence for delta_to_new.
 
+        Both the versioned_root and tree_references flags must be set via
+        require_flags before calling this.
+
         :param old_name: A UTF8 revision id for the old inventory.  May be
             NULL_REVISION if there is no older inventory and delta_to_new
             includes the entire inventory contents.
@@ -150,15 +170,20 @@
             takes.
         :return: The serialized delta as lines.
         """
+        if type(old_name) is not str:
+            raise TypeError('old_name should be str, got %r' % (old_name,))
+        if type(new_name) is not str:
+            raise TypeError('new_name should be str, got %r' % (new_name,))
         lines = ['', '', '', '', '']
         to_line = self._delta_item_to_line
         for delta_item in delta_to_new:
-            lines.append(to_line(delta_item))
-            if lines[-1].__class__ != str:
-                raise errors.BzrError(
+            line = to_line(delta_item, new_name)
+            if line.__class__ != str:
+                raise InventoryDeltaError(
                     'to_line generated non-str output %r' % lines[-1])
+            lines.append(line)
         lines.sort()
-        lines[0] = "format: %s\n" % InventoryDeltaSerializer.FORMAT_1
+        lines[0] = "format: %s\n" % FORMAT_1
         lines[1] = "parent: %s\n" % old_name
         lines[2] = "version: %s\n" % new_name
         lines[3] = "versioned_root: %s\n" % self._serialize_bool(
@@ -173,7 +198,7 @@
         else:
             return "false"
 
-    def _delta_item_to_line(self, delta_item):
+    def _delta_item_to_line(self, delta_item, new_version):
         """Convert delta_item to a line."""
         oldpath, newpath, file_id, entry = delta_item
         if newpath is None:
@@ -188,6 +213,10 @@
             oldpath_utf8 = 'None'
         else:
             oldpath_utf8 = '/' + oldpath.encode('utf8')
+        if newpath == '/':
+            raise AssertionError(
+                "Bad inventory delta: '/' is not a valid newpath "
+                "(should be '') in delta item %r" % (delta_item,))
         # TODO: Test real-world utf8 cache hit rate. It may be a win.
         newpath_utf8 = '/' + newpath.encode('utf8')
         # Serialize None as ''
@@ -196,58 +225,78 @@
             last_modified = entry.revision
         # special cases for /
         if newpath_utf8 == '/' and not self._versioned_root:
-            if file_id != 'TREE_ROOT':
-                raise errors.BzrError(
-                    'file_id %s is not TREE_ROOT for /' % file_id)
-            if last_modified is not None:
-                raise errors.BzrError(
-                    'Version present for / in %s' % file_id)
-            last_modified = NULL_REVISION
+            # This is an entry for the root, this inventory does not
+            # support versioned roots.  So this must be an unversioned
+            # root, i.e. last_modified == new revision.  Otherwise, this
+            # delta is invalid.
+            # Note: the non-rich-root repositories *can* have roots with
+            # file-ids other than TREE_ROOT, e.g. repo formats that use the
+            # xml5 serializer.
+            if last_modified != new_version:
+                raise InventoryDeltaError(
+                    'Version present for / in %s (%s != %s)'
+                    % (file_id, last_modified, new_version))
         if last_modified is None:
-            raise errors.BzrError("no version for fileid %s" % file_id)
+            raise InventoryDeltaError("no version for fileid %s" % file_id)
         content = self._entry_to_content[entry.kind](entry)
         return ("%s\x00%s\x00%s\x00%s\x00%s\x00%s\n" %
             (oldpath_utf8, newpath_utf8, file_id, parent_id, last_modified,
              content))
 
+
+class InventoryDeltaDeserializer(object):
+    """Deserialize inventory deltas."""
+
+    def __init__(self, allow_versioned_root=True, allow_tree_references=True):
+        """Create an InventoryDeltaDeserializer.
+
+        :param versioned_root: If True, any root entry that is seen is expected
+            to be versioned, and root entries can have any fileid.
+        :param tree_references: If True support tree-reference entries.
+        """
+        self._allow_versioned_root = allow_versioned_root
+        self._allow_tree_references = allow_tree_references
+
     def _deserialize_bool(self, value):
         if value == "true":
             return True
         elif value == "false":
             return False
         else:
-            raise errors.BzrError("value %r is not a bool" % (value,))
+            raise InventoryDeltaError("value %r is not a bool" % (value,))
 
     def parse_text_bytes(self, bytes):
         """Parse the text bytes of a serialized inventory delta.
 
+        If versioned_root and/or tree_references flags were set via
+        require_flags, then the parsed flags must match or a BzrError will be
+        raised.
+
         :param bytes: The bytes to parse.  This can be obtained by calling
             delta_to_lines and then doing ''.join(delta_lines).
-        :return: (parent_id, new_id, inventory_delta)
+        :return: (parent_id, new_id, versioned_root, tree_references,
+            inventory_delta)
         """
+        if bytes[-1:] != '\n':
+            last_line = bytes.rsplit('\n', 1)[-1]
+            raise InventoryDeltaError('last line not empty: %r' % (last_line,))
         lines = bytes.split('\n')[:-1] # discard the last empty line
-        if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1:
-            raise errors.BzrError('unknown format %r' % lines[0:1])
+        if not lines or lines[0] != 'format: %s' % FORMAT_1:
+            raise InventoryDeltaError('unknown format %r' % lines[0:1])
         if len(lines) < 2 or not lines[1].startswith('parent: '):
-            raise errors.BzrError('missing parent: marker')
+            raise InventoryDeltaError('missing parent: marker')
         delta_parent_id = lines[1][8:]
         if len(lines) < 3 or not lines[2].startswith('version: '):
-            raise errors.BzrError('missing version: marker')
+            raise InventoryDeltaError('missing version: marker')
         delta_version_id = lines[2][9:]
         if len(lines) < 4 or not lines[3].startswith('versioned_root: '):
-            raise errors.BzrError('missing versioned_root: marker')
+            raise InventoryDeltaError('missing versioned_root: marker')
         delta_versioned_root = self._deserialize_bool(lines[3][16:])
         if len(lines) < 5 or not lines[4].startswith('tree_references: '):
-            raise errors.BzrError('missing tree_references: marker')
+            raise InventoryDeltaError('missing tree_references: marker')
         delta_tree_references = self._deserialize_bool(lines[4][17:])
-        if delta_versioned_root != self._versioned_root:
-            raise errors.BzrError(
-                "serialized versioned_root flag is wrong: %s" %
-                (delta_versioned_root,))
-        if delta_tree_references != self._tree_references:
-            raise errors.BzrError(
-                "serialized tree_references flag is wrong: %s" %
-                (delta_tree_references,))
+        if (not self._allow_versioned_root and delta_versioned_root):
+            raise IncompatibleInventoryDelta("versioned_root not allowed")
         result = []
         seen_ids = set()
         line_iter = iter(lines)
@@ -258,33 +307,58 @@
                 content) = line.split('\x00', 5)
             parent_id = parent_id or None
             if file_id in seen_ids:
-                raise errors.BzrError(
+                raise InventoryDeltaError(
                     "duplicate file id in inventory delta %r" % lines)
             seen_ids.add(file_id)
-            if newpath_utf8 == '/' and not delta_versioned_root and (
-                last_modified != 'null:' or file_id != 'TREE_ROOT'):
-                raise errors.BzrError("Versioned root found: %r" % line)
-            elif last_modified[-1] == ':':
-                raise errors.BzrError('special revisionid found: %r' % line)
-            if not delta_tree_references and content.startswith('tree\x00'):
-                raise errors.BzrError("Tree reference found: %r" % line)
-            content_tuple = tuple(content.split('\x00'))
-            entry = _parse_entry(
-                newpath_utf8, file_id, parent_id, last_modified, content_tuple)
+            if (newpath_utf8 == '/' and not delta_versioned_root and
+                last_modified != delta_version_id):
+                # Delta claims to be not have a versioned root, yet here's
+                # a root entry with a non-default version.
+                raise InventoryDeltaError("Versioned root found: %r" % line)
+            elif newpath_utf8 != 'None' and last_modified[-1] == ':':
+                # Deletes have a last_modified of null:, but otherwise special
+                # revision ids should not occur.
+                raise InventoryDeltaError('special revisionid found: %r' % line)
+            if content.startswith('tree\x00'):
+                if delta_tree_references is False:
+                    raise InventoryDeltaError(
+                        "Tree reference found (but header said "
+                        "tree_references: false): %r" % line)
+                elif not self._allow_tree_references:
+                    raise IncompatibleInventoryDelta(
+                        "Tree reference not allowed")
             if oldpath_utf8 == 'None':
                 oldpath = None
+            elif oldpath_utf8[:1] != '/':
+                raise InventoryDeltaError(
+                    "oldpath invalid (does not start with /): %r"
+                    % (oldpath_utf8,))
             else:
+                oldpath_utf8 = oldpath_utf8[1:]
                 oldpath = oldpath_utf8.decode('utf8')
             if newpath_utf8 == 'None':
                 newpath = None
+            elif newpath_utf8[:1] != '/':
+                raise InventoryDeltaError(
+                    "newpath invalid (does not start with /): %r"
+                    % (newpath_utf8,))
             else:
+                # Trim leading slash
+                newpath_utf8 = newpath_utf8[1:]
                 newpath = newpath_utf8.decode('utf8')
+            content_tuple = tuple(content.split('\x00'))
+            if content_tuple[0] == 'deleted':
+                entry = None
+            else:
+                entry = _parse_entry(
+                    newpath, file_id, parent_id, last_modified, content_tuple)
             delta_item = (oldpath, newpath, file_id, entry)
             result.append(delta_item)
-        return delta_parent_id, delta_version_id, result
+        return (delta_parent_id, delta_version_id, delta_versioned_root,
+            delta_tree_references, result)
 
 
-def _parse_entry(utf8_path, file_id, parent_id, last_modified, content):
+def _parse_entry(path, file_id, parent_id, last_modified, content):
     entry_factory = {
         'dir': _dir_to_entry,
         'file': _file_to_entry,
@@ -292,8 +366,10 @@
         'tree': _tree_to_entry,
         }
     kind = content[0]
-    path = utf8_path[1:].decode('utf8')
+    if path.startswith('/'):
+        raise AssertionError
     name = basename(path)
     return entry_factory[content[0]](
         content, name, parent_id, file_id, last_modified)
 
+
 
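
Putting delta_to_lines and parse_text_bytes together: the serialized form is
five header lines followed by sorted, NUL-separated entry lines.  A
hand-written example of one delta (NUL bytes shown as \x00; the ids and sha1
are invented):

    format: bzr inventory delta v1 (bzr 1.14)
    parent: null:
    version: rev-2
    versioned_root: true
    tree_references: false
    None\x00/\x00ROOT_ID\x00\x00rev-2\x00dir
    None\x00/hello.txt\x00hello-id\x00ROOT_ID\x00rev-2\x00file\x005\x00\x00<sha1>

Each entry line holds oldpath, newpath, file_id, parent_id and last_modified
followed by the content field, which is one of: dir, file (with size, exec
flag and sha1), link (with target), tree (with reference revision), or
deleted.
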
=== modified file 'bzrlib/remote.py'
--- bzrlib/remote.py 2009-08-13 22:32:23 +0000
+++ bzrlib/remote.py 2009-08-14 05:35:32 +0000
@@ -426,6 +426,7 @@
         self._custom_format = None
         self._network_name = None
         self._creating_bzrdir = None
+        self._supports_chks = None
         self._supports_external_lookups = None
         self._supports_tree_reference = None
         self._rich_root_data = None
@@ -443,6 +444,13 @@
         return self._rich_root_data
 
     @property
+    def supports_chks(self):
+        if self._supports_chks is None:
+            self._ensure_real()
+            self._supports_chks = self._custom_format.supports_chks
+        return self._supports_chks
+
+    @property
     def supports_external_lookups(self):
         if self._supports_external_lookups is None:
             self._ensure_real()
@@ -1178,9 +1186,9 @@
         self._ensure_real()
         return self._real_repository.get_inventory(revision_id)
 
-    def iter_inventories(self, revision_ids):
+    def iter_inventories(self, revision_ids, ordering=None):
         self._ensure_real()
-        return self._real_repository.iter_inventories(revision_ids)
+        return self._real_repository.iter_inventories(revision_ids, ordering)
 
     @needs_read_lock
     def get_revision(self, revision_id):
@@ -1682,43 +1690,61 @@
     def insert_stream(self, stream, src_format, resume_tokens):
         target = self.target_repo
         target._unstacked_provider.missing_keys.clear()
+        candidate_calls = [('Repository.insert_stream_1.19', (1, 19))]
         if target._lock_token:
-            verb = 'Repository.insert_stream_locked'
-            extra_args = (target._lock_token or '',)
-            required_version = (1, 14)
+            candidate_calls.append(('Repository.insert_stream_locked', (1, 14)))
+            lock_args = (target._lock_token or '',)
         else:
-            verb = 'Repository.insert_stream'
-            extra_args = ()
-            required_version = (1, 13)
+            candidate_calls.append(('Repository.insert_stream', (1, 13)))
+            lock_args = ()
         client = target._client
         medium = client._medium
-        if medium._is_remote_before(required_version):
-            # No possible way this can work.
-            return self._insert_real(stream, src_format, resume_tokens)
         path = target.bzrdir._path_for_remote_call(client)
-        if not resume_tokens:
-            # XXX: Ugly but important for correctness, *will* be fixed during
-            # 1.13 cycle. Pushing a stream that is interrupted results in a
-            # fallback to the _real_repositories sink *with a partial stream*.
-            # Thats bad because we insert less data than bzr expected. To avoid
-            # this we do a trial push to make sure the verb is accessible, and
-            # do not fallback when actually pushing the stream. A cleanup patch
-            # is going to look at rewinding/restarting the stream/partial
-            # buffering etc.
+        # Probe for the verb to use with an empty stream before sending the
+        # real stream to it.  We do this both to avoid the risk of sending a
+        # large request that is then rejected, and because we don't want to
+        # implement a way to buffer, rewind, or restart the stream.
+        found_verb = False
+        for verb, required_version in candidate_calls:
+            if medium._is_remote_before(required_version):
+                continue
+            if resume_tokens:
+                # We've already done the probing (and set _is_remote_before) on
+                # a previous insert.
+                found_verb = True
+                break
             byte_stream = smart_repo._stream_to_byte_stream([], src_format)
             try:
                 response = client.call_with_body_stream(
-                    (verb, path, '') + extra_args, byte_stream)
+                    (verb, path, '') + lock_args, byte_stream)
             except errors.UnknownSmartMethod:
                 medium._remember_remote_is_before(required_version)
-                return self._insert_real(stream, src_format, resume_tokens)
+            else:
+                found_verb = True
+                break
+        if not found_verb:
+            # Have to use VFS.
+            return self._insert_real(stream, src_format, resume_tokens)
+        self._last_inv_record = None
+        self._last_substream = None
+        if required_version < (1, 19):
+            # Remote side doesn't support inventory deltas.  Wrap the stream to
+            # make sure we don't send any.  If the stream contains inventory
+            # deltas we'll interrupt the smart insert_stream request and
+            # fallback to VFS.
+            stream = self._stop_stream_if_inventory_delta(stream)
         byte_stream = smart_repo._stream_to_byte_stream(
             stream, src_format)
         resume_tokens = ' '.join(resume_tokens)
         response = client.call_with_body_stream(
-            (verb, path, resume_tokens) + extra_args, byte_stream)
+            (verb, path, resume_tokens) + lock_args, byte_stream)
         if response[0][0] not in ('ok', 'missing-basis'):
             raise errors.UnexpectedSmartServerResponse(response)
+        if self._last_substream is not None:
+            # The stream included an inventory-delta record, but the remote
+            # side isn't new enough to support them.  So we need to send the
+            # rest of the stream via VFS.
+            return self._resume_stream_with_vfs(response, src_format)
         if response[0][0] == 'missing-basis':
             tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
             resume_tokens = tokens
@@ -1727,6 +1753,46 @@
         self.target_repo.refresh_data()
         return [], set()
 
+    def _resume_stream_with_vfs(self, response, src_format):
+        """Resume sending a stream via VFS, first resending the record and
+        substream that couldn't be sent via an insert_stream verb.
+        """
+        if response[0][0] == 'missing-basis':
+            tokens, missing_keys = bencode.bdecode_as_tuple(response[0][1])
+            # Ignore missing_keys, we haven't finished inserting yet
+        else:
+            tokens = []
+        def resume_substream():
+            # Yield the substream that was interrupted.
+            for record in self._last_substream:
+                yield record
+            self._last_substream = None
+        def resume_stream():
+            # Finish sending the interrupted substream
+            yield ('inventory-deltas', resume_substream())
+            # Then simply continue sending the rest of the stream.
+            for substream_kind, substream in self._last_stream:
+                yield substream_kind, substream
+        return self._insert_real(resume_stream(), src_format, tokens)
+
+    def _stop_stream_if_inventory_delta(self, stream):
+        """Normally this just lets the original stream pass-through unchanged.
+
+        However if any 'inventory-deltas' substream occurs it will stop
+        streaming, and store the interrupted substream and stream in
+        self._last_substream and self._last_stream so that the stream can be
+        resumed by _resume_stream_with_vfs.
+        """
+
+        stream_iter = iter(stream)
+        for substream_kind, substream in stream_iter:
+            if substream_kind == 'inventory-deltas':
+                self._last_substream = substream
+                self._last_stream = stream_iter
+                return
+            else:
+                yield substream_kind, substream
+
 
 class RemoteStreamSource(repository.StreamSource):
     """Stream data from a remote server."""
@@ -1747,6 +1813,12 @@
             sources.append(repo)
         return self.missing_parents_chain(search, sources)
 
+    def get_stream_for_missing_keys(self, missing_keys):
+        self.from_repository._ensure_real()
+        real_repo = self.from_repository._real_repository
+        real_source = real_repo._get_source(self.to_format)
+        return real_source.get_stream_for_missing_keys(missing_keys)
+
     def _real_stream(self, repo, search):
         """Get a stream for search from repo.
 
@@ -1782,18 +1854,26 @@
             return self._real_stream(repo, search)
         client = repo._client
         medium = client._medium
-        if medium._is_remote_before((1, 13)):
-            # streaming was added in 1.13
-            return self._real_stream(repo, search)
         path = repo.bzrdir._path_for_remote_call(client)
-        try:
-            search_bytes = repo._serialise_search_result(search)
-            response = repo._call_with_body_bytes_expecting_body(
-                'Repository.get_stream',
-                (path, self.to_format.network_name()), search_bytes)
-            response_tuple, response_handler = response
-        except errors.UnknownSmartMethod:
-            medium._remember_remote_is_before((1,13))
+        search_bytes = repo._serialise_search_result(search)
+        args = (path, self.to_format.network_name())
+        candidate_verbs = [
+            ('Repository.get_stream_1.19', (1, 19)),
+            ('Repository.get_stream', (1, 13))]
+        found_verb = False
+        for verb, version in candidate_verbs:
+            if medium._is_remote_before(version):
+                continue
+            try:
+                response = repo._call_with_body_bytes_expecting_body(
+                    verb, args, search_bytes)
+            except errors.UnknownSmartMethod:
+                medium._remember_remote_is_before(version)
+            else:
+                response_tuple, response_handler = response
+                found_verb = True
+                break
+        if not found_verb:
             return self._real_stream(repo, search)
         if response_tuple[0] != 'ok':
             raise errors.UnexpectedSmartServerResponse(response_tuple)
 
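
Both insert_stream and get_stream now share the same probing idiom: walk a
newest-first list of (verb, version) pairs, skip verbs the remote is already
known to predate, and record a rejection so later calls skip that round trip.
A self-contained sketch of the idiom (names invented; the real calls live on
the smart medium and client):

    class UnknownSmartMethod(Exception):
        pass

    def call_newest_supported(candidate_calls, is_remote_before,
                              remember_remote_is_before, do_call):
        # candidate_calls is newest-first, e.g.
        # [('Repository.get_stream_1.19', (1, 19)),
        #  ('Repository.get_stream', (1, 13))]
        for verb, version in candidate_calls:
            if is_remote_before(version):
                continue
            try:
                return do_call(verb)
            except UnknownSmartMethod:
                remember_remote_is_before(version)
        return None  # no verb worked; the caller falls back to VFS
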
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-08-12 18:58:05 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-08-14 05:35:32 +0000
@@ -154,6 +154,8 @@
         self._writer.begin()
         # what state is the pack in? (open, finished, aborted)
         self._state = 'open'
+        # no name until we finish writing the content
+        self.name = None
 
     def _check_references(self):
         """Make sure our external references are present.
@@ -477,6 +479,15 @@
         if not self._use_pack(self.new_pack):
             self.new_pack.abort()
             return None
+        self.new_pack.finish_content()
+        if len(self.packs) == 1:
+            old_pack = self.packs[0]
+            if old_pack.name == self.new_pack._hash.hexdigest():
+                # The single old pack was already optimally packed.
+                trace.mutter('single pack %s was already optimally packed',
+                    old_pack.name)
+                self.new_pack.abort()
+                return None
         self.pb.update('finishing repack', 6, 7)
         self.new_pack.finish()
         self._pack_collection.allocate(self.new_pack)
@@ -591,7 +602,7 @@
             packer = GCCHKPacker(self, packs, '.autopack',
                                  reload_func=reload_func)
             try:
-                packer.pack()
+                result = packer.pack()
             except errors.RetryWithNewPacks:
                 # An exception is propagating out of this context, make sure
                 # this packer has cleaned up. Packer() doesn't set its new_pack
@@ -600,6 +611,8 @@
                 if packer.new_pack is not None:
                     packer.new_pack.abort()
                 raise
+            if result is None:
+                return
             for pack in packs:
                 self._remove_pack_from_memory(pack)
             # record the newly available packs and stop advertising the old
@@ -781,10 +794,12 @@
         return inventory.CHKInventory.deserialise(self.chk_bytes, bytes,
             (revision_id,))
 
-    def _iter_inventories(self, revision_ids):
+    def _iter_inventories(self, revision_ids, ordering):
         """Iterate over many inventory objects."""
+        if ordering is None:
+            ordering = 'unordered'
         keys = [(revision_id,) for revision_id in revision_ids]
-        stream = self.inventories.get_record_stream(keys, 'unordered', True)
+        stream = self.inventories.get_record_stream(keys, ordering, True)
         texts = {}
         for record in stream:
             if record.storage_kind != 'absent':
@@ -794,7 +809,7 @@
         for key in keys:
             yield inventory.CHKInventory.deserialise(self.chk_bytes, texts[key], key)
 
-    def _iter_inventory_xmls(self, revision_ids):
+    def _iter_inventory_xmls(self, revision_ids, ordering):
         # Without a native 'xml' inventory, this method doesn't make sense, so
         # make it raise to trap naughty direct users.
         raise NotImplementedError(self._iter_inventory_xmls)
@@ -894,14 +909,11 @@
 
     def _get_source(self, to_format):
         """Return a source for streaming from this repository."""
-        if isinstance(to_format, remote.RemoteRepositoryFormat):
-            # Can't just check attributes on to_format with the current code,
-            # work around this:
-            to_format._ensure_real()
-            to_format = to_format._custom_format
-        if to_format.__class__ is self._format.__class__:
+        if self._format._serializer == to_format._serializer:
             # We must be exactly the same format, otherwise stuff like the chk
-            # page layout might be different
+            # page layout might be different.
+            # Actually, this test is just slightly looser than exact so that
+            # CHK2 <-> 2a transfers will work.
             return GroupCHKStreamSource(self, to_format)
         return super(CHKInventoryRepository, self)._get_source(to_format)
 
 
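
The "already optimally packed" check above works because a pack is named
after the digest of its byte content, so a byte-identical repack reproduces
the old name.  A toy illustration (assuming, as the running hash in NewPack
appears to be, an MD5 over the pack's bytes):

    import hashlib

    def repack_was_noop(old_pack_name, new_pack_bytes):
        # Byte-identical repack output hashes to the same name, so the new
        # pack can be aborted instead of swapped in.
        return hashlib.md5(new_pack_bytes).hexdigest() == old_pack_name
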
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py 2009-08-05 01:05:58 +0000
+++ bzrlib/repofmt/pack_repo.py 2009-08-14 05:35:32 +0000
@@ -422,6 +422,8 @@
         self._writer.begin()
         # what state is the pack in? (open, finished, aborted)
         self._state = 'open'
+        # no name until we finish writing the content
+        self.name = None
 
     def abort(self):
         """Cancel creating this pack."""
@@ -448,6 +450,14 @@
             self.signature_index.key_count() or
             (self.chk_index is not None and self.chk_index.key_count()))
 
+    def finish_content(self):
+        if self.name is not None:
+            return
+        self._writer.end()
+        if self._buffer[1]:
+            self._write_data('', flush=True)
+        self.name = self._hash.hexdigest()
+
     def finish(self, suspend=False):
         """Finish the new pack.
 
@@ -459,10 +469,7 @@
          - stores the index size tuple for the pack in the index_sizes
            attribute.
         """
-        self._writer.end()
-        if self._buffer[1]:
-            self._write_data('', flush=True)
-        self.name = self._hash.hexdigest()
+        self.finish_content()
         if not suspend:
             self._check_references()
             # write indices
@@ -1567,7 +1574,9 @@
         # determine which packs need changing
         pack_operations = [[0, []]]
         for pack in self.all_packs():
-            if not hint or pack.name in hint:
+            if hint is None or pack.name in hint:
+                # Either no hint was provided (so we are packing everything),
+                # or this pack was included in the hint.
                 pack_operations[-1][0] += pack.get_revision_count()
                 pack_operations[-1][1].append(pack)
         self._execute_pack_operations(pack_operations, OptimisingPacker)
@@ -2093,6 +2102,7 @@
             # when autopack takes no steps, the names list is still
             # unsaved.
             return self._save_pack_names()
+        return []
 
     def _suspend_write_group(self):
         tokens = [pack.name for pack in self._resumed_packs]
 
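
The hint machinery in these hunks composes like this (a sketch of the calling
convention only; 'repo' stands for any pack repository):

    repo.lock_write()
    try:
        repo.start_write_group()
        # ... insert a fetched stream ...
        hint = repo.commit_write_group()  # may be a list of pack names
        if hint:
            # Repack only the packs named in the hint, not everything.
            repo.pack(hint=hint)
    finally:
        repo.unlock()
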
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-08-12 22:28:28 +0000
+++ bzrlib/repository.py 2009-08-14 05:35:32 +0000
@@ -31,6 +31,7 @@
     gpg,
     graph,
     inventory,
+    inventory_delta,
     lazy_regex,
     lockable_files,
     lockdir,
@@ -924,6 +925,11 @@
924 """925 """
925 if self._write_group is not self.get_transaction():926 if self._write_group is not self.get_transaction():
926 # has an unlock or relock occured ?927 # has an unlock or relock occured ?
928 if suppress_errors:
929 mutter(
930 '(suppressed) mismatched lock context and write group. %r, %r',
931 self._write_group, self.get_transaction())
932 return
927 raise errors.BzrError(933 raise errors.BzrError(
928 'mismatched lock context and write group. %r, %r' %934 'mismatched lock context and write group. %r, %r' %
929 (self._write_group, self.get_transaction()))935 (self._write_group, self.get_transaction()))
@@ -1063,8 +1069,10 @@
                                check_content=True):
         """Store lines in inv_vf and return the sha1 of the inventory."""
         parents = [(parent,) for parent in parents]
-        return self.inventories.add_lines((revision_id,), parents, lines,
+        result = self.inventories.add_lines((revision_id,), parents, lines,
             check_content=check_content)[0]
+        self.inventories._access.flush()
+        return result
 
     def add_revision(self, revision_id, rev, inv=None, config=None):
         """Add rev to the revision store as revision_id.
@@ -1529,6 +1537,8 @@
1529 """Commit the contents accrued within the current write group.1537 """Commit the contents accrued within the current write group.
15301538
1531 :seealso: start_write_group.1539 :seealso: start_write_group.
1540
1541 :return: it may return an opaque hint that can be passed to 'pack'.
1532 """1542 """
1533 if self._write_group is not self.get_transaction():1543 if self._write_group is not self.get_transaction():
1534 # has an unlock or relock occured ?1544 # has an unlock or relock occured ?
@@ -2340,7 +2350,7 @@
2340 """Get Inventory object by revision id."""2350 """Get Inventory object by revision id."""
2341 return self.iter_inventories([revision_id]).next()2351 return self.iter_inventories([revision_id]).next()
23422352
2343 def iter_inventories(self, revision_ids):2353 def iter_inventories(self, revision_ids, ordering=None):
2344 """Get many inventories by revision_ids.2354 """Get many inventories by revision_ids.
23452355
2346 This will buffer some or all of the texts used in constructing the2356 This will buffer some or all of the texts used in constructing the
@@ -2348,30 +2358,57 @@
         time.
 
         :param revision_ids: The expected revision ids of the inventories.
+        :param ordering: optional ordering, e.g. 'topological'.  If not
+            specified, the order of revision_ids will be preserved (by
+            buffering if necessary).
         :return: An iterator of inventories.
         """
         if ((None in revision_ids)
             or (_mod_revision.NULL_REVISION in revision_ids)):
             raise ValueError('cannot get null revision inventory')
-        return self._iter_inventories(revision_ids)
+        return self._iter_inventories(revision_ids, ordering)
 
-    def _iter_inventories(self, revision_ids):
+    def _iter_inventories(self, revision_ids, ordering):
         """single-document based inventory iteration."""
-        for text, revision_id in self._iter_inventory_xmls(revision_ids):
+        inv_xmls = self._iter_inventory_xmls(revision_ids, ordering)
+        for text, revision_id in inv_xmls:
             yield self.deserialise_inventory(revision_id, text)
 
-    def _iter_inventory_xmls(self, revision_ids):
+    def _iter_inventory_xmls(self, revision_ids, ordering):
+        if ordering is None:
+            order_as_requested = True
+            ordering = 'unordered'
+        else:
+            order_as_requested = False
         keys = [(revision_id,) for revision_id in revision_ids]
-        stream = self.inventories.get_record_stream(keys, 'unordered', True)
+        if not keys:
+            return
+        if order_as_requested:
+            key_iter = iter(keys)
+            next_key = key_iter.next()
+        stream = self.inventories.get_record_stream(keys, ordering, True)
         text_chunks = {}
         for record in stream:
             if record.storage_kind != 'absent':
-                text_chunks[record.key] = record.get_bytes_as('chunked')
+                chunks = record.get_bytes_as('chunked')
+                if order_as_requested:
+                    text_chunks[record.key] = chunks
+                else:
+                    yield ''.join(chunks), record.key[-1]
             else:
                 raise errors.NoSuchRevision(self, record.key)
-        for key in keys:
-            chunks = text_chunks.pop(key)
-            yield ''.join(chunks), key[-1]
+            if order_as_requested:
+                # Yield as many results as we can while preserving order.
+                while next_key in text_chunks:
+                    chunks = text_chunks.pop(next_key)
+                    yield ''.join(chunks), next_key[-1]
+                    try:
+                        next_key = key_iter.next()
+                    except StopIteration:
+                        # We still want to fully consume the get_record_stream,
+                        # just in case it is not actually finished at this point
+                        next_key = None
+                        break
 
     def deserialise_inventory(self, revision_id, xml):
         """Transform the xml into an inventory object.
@@ -2398,7 +2435,7 @@
     @needs_read_lock
     def get_inventory_xml(self, revision_id):
         """Get inventory XML as a file object."""
-        texts = self._iter_inventory_xmls([revision_id])
+        texts = self._iter_inventory_xmls([revision_id], 'unordered')
         try:
             text, revision_id = texts.next()
         except StopIteration:
@@ -3664,11 +3701,25 @@
         # This is redundant with format.check_conversion_target(), however that
         # raises an exception, and we just want to say "False" as in we won't
         # support converting between these formats.
+        if 'IDS_never' in debug.debug_flags:
+            return False
         if source.supports_rich_root() and not target.supports_rich_root():
             return False
         if (source._format.supports_tree_reference
             and not target._format.supports_tree_reference):
             return False
+        if target._fallback_repositories and target._format.supports_chks:
+            # IDS doesn't know how to copy CHKs for the parent inventories it
+            # adds to stacked repos.
+            return False
+        if 'IDS_always' in debug.debug_flags:
+            return True
+        # Only use this code path for local source and target.  IDS does far
+        # too much IO (both bandwidth and roundtrips) over a network.
+        if not source.bzrdir.transport.base.startswith('file:///'):
+            return False
+        if not target.bzrdir.transport.base.startswith('file:///'):
+            return False
         return True
 
     def _get_delta_for_revision(self, tree, parent_ids, basis_id, cache):
@@ -3690,63 +3741,6 @@
         deltas.sort()
         return deltas[0][1:]
 
-    def _get_parent_keys(self, root_key, parent_map):
-        """Get the parent keys for a given root id."""
-        root_id, rev_id = root_key
-        # Include direct parents of the revision, but only if they used
-        # the same root_id and are heads.
-        parent_keys = []
-        for parent_id in parent_map[rev_id]:
-            if parent_id == _mod_revision.NULL_REVISION:
-                continue
-            if parent_id not in self._revision_id_to_root_id:
-                # We probably didn't read this revision, go spend the
-                # extra effort to actually check
-                try:
-                    tree = self.source.revision_tree(parent_id)
-                except errors.NoSuchRevision:
-                    # Ghost, fill out _revision_id_to_root_id in case we
-                    # encounter this again.
-                    # But set parent_root_id to None since we don't really know
-                    parent_root_id = None
-                else:
-                    parent_root_id = tree.get_root_id()
-                self._revision_id_to_root_id[parent_id] = None
-            else:
-                parent_root_id = self._revision_id_to_root_id[parent_id]
-            if root_id == parent_root_id:
-                # With stacking we _might_ want to refer to a non-local
-                # revision, but this code path only applies when we have the
-                # full content available, so ghosts really are ghosts, not just
-                # the edge of local data.
-                parent_keys.append((parent_id,))
-            else:
-                # root_id may be in the parent anyway.
-                try:
-                    tree = self.source.revision_tree(parent_id)
-                except errors.NoSuchRevision:
-                    # ghost, can't refer to it.
-                    pass
-                else:
-                    try:
-                        parent_keys.append((tree.inventory[root_id].revision,))
-                    except errors.NoSuchId:
-                        # not in the tree
-                        pass
-        g = graph.Graph(self.source.revisions)
-        heads = g.heads(parent_keys)
-        selected_keys = []
-        for key in parent_keys:
-            if key in heads and key not in selected_keys:
-                selected_keys.append(key)
-        return tuple([(root_id,)+ key for key in selected_keys])
-
-    def _new_root_data_stream(self, root_keys_to_create, parent_map):
-        for root_key in root_keys_to_create:
-            parent_keys = self._get_parent_keys(root_key, parent_map)
-            yield versionedfile.FulltextContentFactory(root_key,
-                parent_keys, None, '')
-
     def _fetch_batch(self, revision_ids, basis_id, cache):
         """Fetch across a few revisions.
 
@@ -3798,8 +3792,10 @@
         from_texts = self.source.texts
         to_texts = self.target.texts
         if root_keys_to_create:
-            root_stream = self._new_root_data_stream(root_keys_to_create,
-                parent_map)
+            from bzrlib.fetch import _new_root_data_stream
+            root_stream = _new_root_data_stream(
+                root_keys_to_create, self._revision_id_to_root_id, parent_map,
+                self.source)
             to_texts.insert_record_stream(root_stream)
         to_texts.insert_record_stream(from_texts.get_record_stream(
             text_keys, self.target._format._fetch_order,
@@ -3899,7 +3895,6 @@
         # Walk though all revisions; get inventory deltas, copy referenced
         # texts that delta references, insert the delta, revision and
         # signature.
-        first_rev = self.source.get_revision(revision_ids[0])
         if pb is None:
             my_pb = ui.ui_factory.nested_progress_bar()
             pb = my_pb
@@ -4071,9 +4066,6 @@
         self.file_ids = set([file_id for file_id, _ in
                 self.text_index.iterkeys()])
         # text keys is now grouped by file_id
-        n_weaves = len(self.file_ids)
-        files_in_revisions = {}
-        revisions_of_files = {}
         n_versions = len(self.text_index)
         progress_bar.update('loading text store', 0, n_versions)
         parent_map = self.repository.texts.get_parent_map(self.text_index)
@@ -4172,6 +4164,8 @@
         else:
             new_pack.set_write_cache_size(1024*1024)
         for substream_type, substream in stream:
+            if 'stream' in debug.debug_flags:
+                mutter('inserting substream: %s', substream_type)
             if substream_type == 'texts':
                 self.target_repo.texts.insert_record_stream(substream)
             elif substream_type == 'inventories':
@@ -4181,6 +4175,9 @@
                 else:
                     self._extract_and_insert_inventories(
                         substream, src_serializer)
+            elif substream_type == 'inventory-deltas':
+                self._extract_and_insert_inventory_deltas(
+                    substream, src_serializer)
             elif substream_type == 'chk_bytes':
                 # XXX: This doesn't support conversions, as it assumes the
                 # conversion was done in the fetch code.
@@ -4237,18 +4234,45 @@
             self.target_repo.pack(hint=hint)
         return [], set()
 
-    def _extract_and_insert_inventories(self, substream, serializer):
+    def _extract_and_insert_inventory_deltas(self, substream, serializer):
+        target_rich_root = self.target_repo._format.rich_root_data
+        target_tree_refs = self.target_repo._format.supports_tree_reference
+        for record in substream:
+            # Insert the delta directly
+            inventory_delta_bytes = record.get_bytes_as('fulltext')
+            deserialiser = inventory_delta.InventoryDeltaDeserializer()
+            try:
+                parse_result = deserialiser.parse_text_bytes(
+                    inventory_delta_bytes)
+            except inventory_delta.IncompatibleInventoryDelta, err:
+                trace.mutter("Incompatible delta: %s", err.msg)
+                raise errors.IncompatibleRevision(self.target_repo._format)
+            basis_id, new_id, rich_root, tree_refs, inv_delta = parse_result
+            revision_id = new_id
+            parents = [key[0] for key in record.parents]
+            self.target_repo.add_inventory_by_delta(
+                basis_id, inv_delta, revision_id, parents)
+
+    def _extract_and_insert_inventories(self, substream, serializer,
+            parse_delta=None):
         """Generate a new inventory versionedfile in target, converting data.
 
         The inventory is retrieved from the source, (deserializing it), and
         stored in the target (reserializing it in a different format).
         """
+        target_rich_root = self.target_repo._format.rich_root_data
+        target_tree_refs = self.target_repo._format.supports_tree_reference
         for record in substream:
+            # It's not a delta, so it must be a fulltext in the source
+            # serializer's format.
             bytes = record.get_bytes_as('fulltext')
             revision_id = record.key[0]
             inv = serializer.read_inventory_from_string(bytes, revision_id)
             parents = [key[0] for key in record.parents]
             self.target_repo.add_inventory(revision_id, inv, parents)
+            # No need to keep holding this full inv in memory when the rest of
+            # the substream is likely to be all deltas.
+            del inv
 
     def _extract_and_insert_revisions(self, substream, serializer):
         for record in substream:
@@ -4303,11 +4327,8 @@
         return [('signatures', signatures), ('revisions', revisions)]
 
     def _generate_root_texts(self, revs):
-        """This will be called by __fetch between fetching weave texts and
+        """This will be called by get_stream between fetching weave texts and
         fetching the inventory weave.
-
-        Subclasses should override this if they need to generate root texts
-        after fetching weave texts.
         """
         if self._rich_root_upgrade():
             import bzrlib.fetch
@@ -4345,9 +4366,6 @@
4345 # will be valid.4366 # will be valid.
4346 for _ in self._generate_root_texts(revs):4367 for _ in self._generate_root_texts(revs):
4347 yield _4368 yield _
4348 # NB: This currently reopens the inventory weave in source;
4349 # using a single stream interface instead would avoid this.
4350 from_weave = self.from_repository.inventories
4351 # we fetch only the referenced inventories because we do not4369 # we fetch only the referenced inventories because we do not
4352 # know for unselected inventories whether all their required4370 # know for unselected inventories whether all their required
4353 # texts are present in the other repository - it could be4371 # texts are present in the other repository - it could be
@@ -4392,6 +4410,18 @@
4392 if not keys:4410 if not keys:
4393 # No need to stream something we don't have4411 # No need to stream something we don't have
4394 continue4412 continue
4413 if substream_kind == 'inventories':
4414 # Some missing keys are genuinely ghosts, filter those out.
4415 present = self.from_repository.inventories.get_parent_map(keys)
4416 revs = [key[0] for key in present]
4417 # Get the inventory stream more-or-less as we do for the
4418 # original stream; there's no reason to assume that records
4419 # direct from the source will be suitable for the sink. (Think
4420 # e.g. 2a -> 1.9-rich-root).
4421 for info in self._get_inventory_stream(revs, missing=True):
4422 yield info
4423 continue
4424
4395 # Ask for full texts always so that we don't need more round trips4425 # Ask for full texts always so that we don't need more round trips
4396 # after this stream.4426 # after this stream.
4397 # Some of the missing keys are genuinely ghosts, so filter absent4427 # Some of the missing keys are genuinely ghosts, so filter absent
@@ -4412,129 +4442,116 @@
         return (not self.from_repository._format.rich_root_data and
             self.to_format.rich_root_data)
 
-    def _get_inventory_stream(self, revision_ids):
+    def _get_inventory_stream(self, revision_ids, missing=False):
         from_format = self.from_repository._format
-        if (from_format.supports_chks and self.to_format.supports_chks
-            and (from_format._serializer == self.to_format._serializer)):
-            # Both sides support chks, and they use the same serializer, so it
-            # is safe to transmit the chk pages and inventory pages across
-            # as-is.
-            return self._get_chk_inventory_stream(revision_ids)
-        elif (not from_format.supports_chks):
-            # Source repository doesn't support chks. So we can transmit the
-            # inventories 'as-is' and either they are just accepted on the
-            # target, or the Sink will properly convert it.
-            return self._get_simple_inventory_stream(revision_ids)
+        if (from_format.supports_chks and self.to_format.supports_chks and
+            from_format.network_name() == self.to_format.network_name()):
+            raise AssertionError(
+                "this case should be handled by GroupCHKStreamSource")
+        elif 'forceinvdeltas' in debug.debug_flags:
+            return self._get_convertable_inventory_stream(revision_ids,
+                delta_versus_null=missing)
+        elif from_format.network_name() == self.to_format.network_name():
+            # Same format.
+            return self._get_simple_inventory_stream(revision_ids,
+                missing=missing)
+        elif (not from_format.supports_chks and not self.to_format.supports_chks
+            and from_format._serializer == self.to_format._serializer):
+            # Essentially the same format.
+            return self._get_simple_inventory_stream(revision_ids,
+                missing=missing)
         else:
-            # XXX: Hack to make not-chk->chk fetch: copy the inventories as
-            # inventories. Note that this should probably be done somehow
-            # as part of bzrlib.repository.StreamSink. Except JAM couldn't
-            # figure out how a non-chk repository could possibly handle
-            # deserializing an inventory stream from a chk repo, as it
-            # doesn't have a way to understand individual pages.
-            return self._get_convertable_inventory_stream(revision_ids)
+            # Any time we switch serializations, we want to use an
+            # inventory-delta based approach.
+            return self._get_convertable_inventory_stream(revision_ids,
+                delta_versus_null=missing)
 
-    def _get_simple_inventory_stream(self, revision_ids):
+    def _get_simple_inventory_stream(self, revision_ids, missing=False):
+        # NB: This currently reopens the inventory weave in source;
+        # using a single stream interface instead would avoid this.
         from_weave = self.from_repository.inventories
+        if missing:
+            delta_closure = True
+        else:
+            delta_closure = not self.delta_on_metadata()
         yield ('inventories', from_weave.get_record_stream(
             [(rev_id,) for rev_id in revision_ids],
-            self.inventory_fetch_order(),
-            not self.delta_on_metadata()))
-
-    def _get_chk_inventory_stream(self, revision_ids):
-        """Fetch the inventory texts, along with the associated chk maps."""
-        # We want an inventory outside of the search set, so that we can filter
-        # out uninteresting chk pages. For now we use
-        # _find_revision_outside_set, but if we had a Search with cut_revs, we
-        # could use that instead.
-        start_rev_id = self.from_repository._find_revision_outside_set(
-            revision_ids)
-        start_rev_key = (start_rev_id,)
-        inv_keys_to_fetch = [(rev_id,) for rev_id in revision_ids]
-        if start_rev_id != _mod_revision.NULL_REVISION:
-            inv_keys_to_fetch.append((start_rev_id,))
-        # Any repo that supports chk_bytes must also support out-of-order
-        # insertion. At least, that is how we expect it to work
-        # We use get_record_stream instead of iter_inventories because we want
-        # to be able to insert the stream as well. We could instead fetch
-        # allowing deltas, and then iter_inventories, but we don't know whether
-        # source or target is more 'local' anway.
-        inv_stream = self.from_repository.inventories.get_record_stream(
-            inv_keys_to_fetch, 'unordered',
-            True) # We need them as full-texts so we can find their references
-        uninteresting_chk_roots = set()
-        interesting_chk_roots = set()
-        def filter_inv_stream(inv_stream):
-            for idx, record in enumerate(inv_stream):
-                ### child_pb.update('fetch inv', idx, len(inv_keys_to_fetch))
-                bytes = record.get_bytes_as('fulltext')
-                chk_inv = inventory.CHKInventory.deserialise(
-                    self.from_repository.chk_bytes, bytes, record.key)
-                if record.key == start_rev_key:
-                    uninteresting_chk_roots.add(chk_inv.id_to_entry.key())
-                    p_id_map = chk_inv.parent_id_basename_to_file_id
-                    if p_id_map is not None:
-                        uninteresting_chk_roots.add(p_id_map.key())
-                else:
-                    yield record
-                    interesting_chk_roots.add(chk_inv.id_to_entry.key())
-                    p_id_map = chk_inv.parent_id_basename_to_file_id
-                    if p_id_map is not None:
-                        interesting_chk_roots.add(p_id_map.key())
-        ### pb.update('fetch inventory', 0, 2)
-        yield ('inventories', filter_inv_stream(inv_stream))
-        # Now that we have worked out all of the interesting root nodes, grab
-        # all of the interesting pages and insert them
-        ### pb.update('fetch inventory', 1, 2)
-        interesting = chk_map.iter_interesting_nodes(
-            self.from_repository.chk_bytes, interesting_chk_roots,
-            uninteresting_chk_roots)
-        def to_stream_adapter():
-            """Adapt the iter_interesting_nodes result to a single stream.
-
-            iter_interesting_nodes returns records as it processes them, along
-            with keys. However, we only want to return the records themselves.
-            """
-            for record, items in interesting:
-                if record is not None:
-                    yield record
-        # XXX: We could instead call get_record_stream(records.keys())
-        # ATM, this will always insert the records as fulltexts, and
-        # requires that you can hang on to records once you have gone
-        # on to the next one. Further, it causes the target to
-        # recompress the data. Testing shows it to be faster than
-        # requesting the records again, though.
-        yield ('chk_bytes', to_stream_adapter())
-        ### pb.update('fetch inventory', 2, 2)
-
-    def _get_convertable_inventory_stream(self, revision_ids):
-        # XXX: One of source or target is using chks, and they don't have
-        # compatible serializations. The StreamSink code expects to be
-        # able to convert on the target, so we need to put
-        # bytes-on-the-wire that can be converted
-        yield ('inventories', self._stream_invs_as_fulltexts(revision_ids))
-
-    def _stream_invs_as_fulltexts(self, revision_ids):
+    def _get_convertable_inventory_stream(self, revision_ids,
+                                          delta_versus_null=False):
+        # The source is using CHKs, but the target either doesn't or it has a
+        # different serializer.  The StreamSink code expects to be able to
+        # convert on the target, so we need to put bytes-on-the-wire that can
+        # be converted.  That means inventory deltas (if the remote is <1.19,
+        # RemoteStreamSink will fallback to VFS to insert the deltas).
+        yield ('inventory-deltas',
+            self._stream_invs_as_deltas(revision_ids,
+                delta_versus_null=delta_versus_null))
+
+    def _stream_invs_as_deltas(self, revision_ids, delta_versus_null=False):
+        """Return a stream of inventory-deltas for the given rev ids.
+
+        :param revision_ids: The list of inventories to transmit
+        :param delta_versus_null: Don't try to find a minimal delta for this
+            entry, instead compute the delta versus the NULL_REVISION. This
+            effectively streams a complete inventory. Used for stuff like
+            filling in missing parents, etc.
+        """
         from_repo = self.from_repository
-        from_serializer = from_repo._format._serializer
         revision_keys = [(rev_id,) for rev_id in revision_ids]
         parent_map = from_repo.inventories.get_parent_map(revision_keys)
-        for inv in self.from_repository.iter_inventories(revision_ids):
-            # XXX: This is a bit hackish, but it works. Basically,
-            # CHKSerializer 'accidentally' supports
-            # read/write_inventory_to_string, even though that is never
-            # the format that is stored on disk. It *does* give us a
-            # single string representation for an inventory, so live with
-            # it for now.
-            # This would be far better if we had a 'serialized inventory
-            # delta' form. Then we could use 'inventory._make_delta', and
-            # transmit that. This would both be faster to generate, and
-            # result in fewer bytes-on-the-wire.
-            as_bytes = from_serializer.write_inventory_to_string(inv)
+        # XXX: possibly repos could implement a more efficient iter_inv_deltas
+        # method...
+        inventories = self.from_repository.iter_inventories(
+            revision_ids, 'topological')
+        format = from_repo._format
+        invs_sent_so_far = set([_mod_revision.NULL_REVISION])
+        inventory_cache = lru_cache.LRUCache(50)
+        null_inventory = from_repo.revision_tree(
+            _mod_revision.NULL_REVISION).inventory
+        # XXX: ideally the rich-root/tree-refs flags would be per-revision, not
+        # per-repo (e.g. streaming a non-rich-root revision out of a rich-root
+        # repo back into a non-rich-root repo ought to be allowed)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=format.rich_root_data,
+            tree_references=format.supports_tree_reference)
+        for inv in inventories:
             key = (inv.revision_id,)
             parent_keys = parent_map.get(key, ())
+            delta = None
+            if not delta_versus_null and parent_keys:
+                # The caller did not ask for complete inventories and we have
+                # some parents that we can delta against.  Make a delta against
+                # each parent so that we can find the smallest.
+                parent_ids = [parent_key[0] for parent_key in parent_keys]
+                for parent_id in parent_ids:
+                    if parent_id not in invs_sent_so_far:
+                        # We don't know that the remote side has this basis, so
+                        # we can't use it.
+                        continue
+                    if parent_id == _mod_revision.NULL_REVISION:
+                        parent_inv = null_inventory
+                    else:
+                        parent_inv = inventory_cache.get(parent_id, None)
+                        if parent_inv is None:
+                            parent_inv = from_repo.get_inventory(parent_id)
+                    candidate_delta = inv._make_delta(parent_inv)
+                    if (delta is None or
+                        len(delta) > len(candidate_delta)):
+                        delta = candidate_delta
+                        basis_id = parent_id
+            if delta is None:
+                # Either none of the parents ended up being suitable, or we
+                # were asked to delta against NULL
+                basis_id = _mod_revision.NULL_REVISION
+                delta = inv._make_delta(null_inventory)
+            invs_sent_so_far.add(inv.revision_id)
+            inventory_cache[inv.revision_id] = inv
+            delta_serialized = ''.join(
+                serializer.delta_to_lines(basis_id, key[-1], delta))
             yield versionedfile.FulltextContentFactory(
-                key, parent_keys, None, as_bytes)
+                key, parent_keys, None, delta_serialized)
 
 
 def _iter_for_revno(repo, partial_history_cache, stop_index=None,
 
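
The heart of _stream_invs_as_deltas is its basis-selection loop: only
inventories the remote side is known to already have (those sent earlier in
the stream, plus null:) may serve as a delta basis, and the shortest delta
wins. A hedged restatement of just that rule (the helper name and signature
are illustrative, not from the patch):

    def choose_basis(inv, parent_ids, invs_sent_so_far, get_inventory):
        # Returns (basis_id, delta), or None if no already-sent parent is
        # usable and the caller must delta against NULL_REVISION instead.
        best = None
        for parent_id in parent_ids:
            if parent_id not in invs_sent_so_far:
                continue  # the remote side may not have this inventory
            candidate = inv._make_delta(get_inventory(parent_id))
            if best is None or len(candidate) < len(best[1]):
                best = (parent_id, candidate)
        return best
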
=== modified file 'bzrlib/smart/protocol.py'
--- bzrlib/smart/protocol.py 2009-07-17 01:48:56 +0000
+++ bzrlib/smart/protocol.py 2009-08-14 05:35:32 +0000
@@ -1209,6 +1209,8 @@
         except (KeyboardInterrupt, SystemExit):
             raise
         except Exception:
+            mutter('_iter_with_errors caught error')
+            log_exception_quietly()
             yield sys.exc_info(), None
             return
 
 
=== modified file 'bzrlib/smart/repository.py'
--- bzrlib/smart/repository.py 2009-06-16 06:46:32 +0000
+++ bzrlib/smart/repository.py 2009-08-14 05:35:32 +0000
@@ -30,6 +30,7 @@
     graph,
     osutils,
     pack,
+    versionedfile,
     )
 from bzrlib.bzrdir import BzrDir
 from bzrlib.smart.request import (
@@ -39,7 +40,10 @@
     )
 from bzrlib.repository import _strip_NULL_ghosts, network_format_registry
 from bzrlib import revision as _mod_revision
-from bzrlib.versionedfile import NetworkRecordStream, record_to_fulltext_bytes
+from bzrlib.versionedfile import (
+    NetworkRecordStream,
+    record_to_fulltext_bytes,
+    )
 
 
 class SmartServerRepositoryRequest(SmartServerRequest):
@@ -414,8 +418,42 @@
         repository.
         """
         self._to_format = network_format_registry.get(to_network_name)
+        if self._should_fake_unknown():
+            return FailedSmartServerResponse(
+                ('UnknownMethod', 'Repository.get_stream'))
         return None # Signal that we want a body.
 
+    def _should_fake_unknown(self):
+        """Return True if we should return UnknownMethod to the client.
+
+        This is a workaround for bugs in pre-1.19 clients that claim to
+        support receiving streams of CHK repositories.  The pre-1.19 client
+        expects inventory records to be serialized in the format defined by
+        to_network_name, but in pre-1.19 (at least) that format definition
+        tries to use the xml5 serializer, which does not correctly handle
+        rich-roots.  After 1.19 the client can also accept inventory-deltas
+        (which avoids this issue), and those clients will use the
+        Repository.get_stream_1.19 verb instead of this one.
+        So: if this repository is CHK, and the to_format doesn't match,
+        we should just fake an UnknownSmartMethod error so that the client
+        will fallback to VFS, rather than sending it a stream we know it
+        cannot handle.
+        """
+        from_format = self._repository._format
+        to_format = self._to_format
+        if not from_format.supports_chks:
+            # Source not CHK: that's ok
+            return False
+        if (to_format.supports_chks and
+            from_format.repository_class is to_format.repository_class and
+            from_format._serializer == to_format._serializer):
+            # Source is CHK, but target matches: that's ok
+            # (e.g. 2a->2a, or CHK2->2a)
+            return False
+        # Source is CHK, and target is not CHK or incompatible CHK.  We can't
+        # generate a compatible stream.
+        return True
+
     def do_body(self, body_bytes):
         repository = self._repository
         repository.lock_read()
@@ -451,6 +489,13 @@
         repository.unlock()
 
 
+class SmartServerRepositoryGetStream_1_19(SmartServerRepositoryGetStream):
+
+    def _should_fake_unknown(self):
+        """Returns False; we don't need to workaround bugs in 1.19+ clients."""
+        return False
+
+
 def _stream_to_byte_stream(stream, src_format):
     """Convert a record stream to a self delimited byte stream."""
     pack_writer = pack.ContainerSerialiser()
@@ -460,6 +505,8 @@
         for record in substream:
             if record.storage_kind in ('chunked', 'fulltext'):
                 serialised = record_to_fulltext_bytes(record)
+            elif record.storage_kind == 'inventory-delta':
+                serialised = record_to_inventory_delta_bytes(record)
             elif record.storage_kind == 'absent':
                 raise ValueError("Absent factory for %s" % (record.key,))
             else:
@@ -650,6 +697,23 @@
         return SuccessfulSmartServerResponse(('ok', ))
 
 
+class SmartServerRepositoryInsertStream_1_19(SmartServerRepositoryInsertStreamLocked):
+    """Insert a record stream from a RemoteSink into a repository.
+
+    Same as SmartServerRepositoryInsertStreamLocked, except:
+     - the lock token argument is optional
+     - servers that implement this verb accept 'inventory-delta' records in the
+       stream.
+
+    New in 1.19.
+    """
+
+    def do_repository_request(self, repository, resume_tokens, lock_token=None):
+        """StreamSink.insert_stream for a remote repository."""
+        SmartServerRepositoryInsertStreamLocked.do_repository_request(
+            self, repository, resume_tokens, lock_token)
+
+
 class SmartServerRepositoryInsertStream(SmartServerRepositoryInsertStreamLocked):
     """Insert a record stream from a RemoteSink into an unlocked repository.
 
 
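
The compatibility rule in _should_fake_unknown reads more easily as a
standalone predicate; this sketch only restates the logic of the hunk above:

    def should_fake_unknown(from_format, to_format):
        if not from_format.supports_chks:
            # Non-CHK source: any client can cope with the stream.
            return False
        if (to_format.supports_chks and
            from_format.repository_class is to_format.repository_class and
            from_format._serializer == to_format._serializer):
            # Compatible CHK target, e.g. 2a -> 2a or CHK2 -> 2a.
            return False
        # CHK source with a non-CHK or incompatible-CHK target: claim the
        # verb is unknown so a pre-1.19 client falls back to VFS.
        return True
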
=== modified file 'bzrlib/smart/request.py'
--- bzrlib/smart/request.py 2009-07-27 02:06:05 +0000
+++ bzrlib/smart/request.py 2009-08-14 05:35:32 +0000
@@ -553,6 +553,8 @@
 request_handlers.register_lazy(
     'Repository.insert_stream', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream')
 request_handlers.register_lazy(
+    'Repository.insert_stream_1.19', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStream_1_19')
+request_handlers.register_lazy(
     'Repository.insert_stream_locked', 'bzrlib.smart.repository', 'SmartServerRepositoryInsertStreamLocked')
 request_handlers.register_lazy(
     'Repository.is_shared', 'bzrlib.smart.repository', 'SmartServerRepositoryIsShared')
@@ -570,6 +572,9 @@
     'Repository.get_stream', 'bzrlib.smart.repository',
     'SmartServerRepositoryGetStream')
 request_handlers.register_lazy(
+    'Repository.get_stream_1.19', 'bzrlib.smart.repository',
+    'SmartServerRepositoryGetStream_1_19')
+request_handlers.register_lazy(
     'Repository.tarball', 'bzrlib.smart.repository',
     'SmartServerRepositoryTarball')
 request_handlers.register_lazy(
 
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2009-08-12 18:49:22 +0000
+++ bzrlib/tests/__init__.py 2009-08-14 05:35:32 +0000
@@ -1938,6 +1938,16 @@
         sio.encoding = output_encoding
         return sio
 
+    def disable_verb(self, verb):
+        """Disable a smart server verb for one test."""
+        from bzrlib.smart import request
+        request_handlers = request.request_handlers
+        orig_method = request_handlers.get(verb)
+        request_handlers.remove(verb)
+        def restoreVerb():
+            request_handlers.register(verb, orig_method)
+        self.addCleanup(restoreVerb)
+
 
 class CapturedCall(object):
     """A helper for capturing smart server calls for easy debug analysis."""
 
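
disable_verb is how the new tests force the pre-1.19 code path. A typical use,
sketched for a smart-server test case (the class and test names here are
hypothetical; the per_interrepository changes below use exactly this pattern):

    class TestOldVerbFallback(TestCaseWithTransport):

        def test_fetch_with_new_verb_disabled(self):
            self.setup_smart_server_with_call_log()
            # Unregister the 1.19 verb for this test only; the addCleanup in
            # disable_verb re-registers the original handler afterwards.
            self.disable_verb('Repository.insert_stream_1.19')
            # ... exercise a push or fetch, then assert on self.hpss_calls ...
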
=== modified file 'bzrlib/tests/per_branch/test_push.py'
--- bzrlib/tests/per_branch/test_push.py 2009-08-05 02:30:59 +0000
+++ bzrlib/tests/per_branch/test_push.py 2009-08-14 05:35:32 +0000
@@ -261,14 +261,15 @@
         self.assertFalse(local.is_locked())
         local.push(remote)
         hpss_call_names = [item.call.method for item in self.hpss_calls]
-        self.assertTrue('Repository.insert_stream' in hpss_call_names)
-        insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
+        self.assertTrue('Repository.insert_stream_1.19' in hpss_call_names)
+        insert_stream_idx = hpss_call_names.index(
+            'Repository.insert_stream_1.19')
         calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
         # After inserting the stream the client has no reason to query the
         # remote graph any further.
         self.assertEqual(
-            ['Repository.insert_stream', 'Repository.insert_stream', 'get',
-             'Branch.set_last_revision_info', 'Branch.unlock'],
+            ['Repository.insert_stream_1.19', 'Repository.insert_stream_1.19',
+             'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
             calls_after_insert_stream)
 
     def disableOptimisticGetParentMap(self):
 
=== modified file 'bzrlib/tests/per_interbranch/test_push.py'
--- bzrlib/tests/per_interbranch/test_push.py 2009-08-05 02:30:59 +0000
+++ bzrlib/tests/per_interbranch/test_push.py 2009-08-14 05:35:32 +0000
@@ -267,14 +267,15 @@
         self.assertFalse(local.is_locked())
         local.push(remote)
         hpss_call_names = [item.call.method for item in self.hpss_calls]
-        self.assertTrue('Repository.insert_stream' in hpss_call_names)
-        insert_stream_idx = hpss_call_names.index('Repository.insert_stream')
+        self.assertTrue('Repository.insert_stream_1.19' in hpss_call_names)
+        insert_stream_idx = hpss_call_names.index(
+            'Repository.insert_stream_1.19')
         calls_after_insert_stream = hpss_call_names[insert_stream_idx:]
         # After inserting the stream the client has no reason to query the
         # remote graph any further.
         self.assertEqual(
-            ['Repository.insert_stream', 'Repository.insert_stream', 'get',
-             'Branch.set_last_revision_info', 'Branch.unlock'],
+            ['Repository.insert_stream_1.19', 'Repository.insert_stream_1.19',
+             'get', 'Branch.set_last_revision_info', 'Branch.unlock'],
             calls_after_insert_stream)
 
     def disableOptimisticGetParentMap(self):
 
=== modified file 'bzrlib/tests/per_interrepository/__init__.py'
--- bzrlib/tests/per_interrepository/__init__.py 2009-08-11 21:02:46 +0000
+++ bzrlib/tests/per_interrepository/__init__.py 2009-08-14 05:35:33 +0000
@@ -32,8 +32,6 @@
     )
 
 from bzrlib.repository import (
-    InterDifferingSerializer,
-    InterKnitRepo,
     InterRepository,
     )
 from bzrlib.tests import (
@@ -48,18 +46,16 @@
     """Transform the input formats to a list of scenarios.
 
     :param formats: A list of tuples:
-        (interrepo_class, repository_format, repository_format_to).
+        (label, repository_format, repository_format_to).
     """
     result = []
-    for interrepo_class, repository_format, repository_format_to in formats:
-        id = '%s,%s,%s' % (interrepo_class.__name__,
-            repository_format.__class__.__name__,
-            repository_format_to.__class__.__name__)
+    for label, repository_format, repository_format_to in formats:
+        id = '%s,%s,%s' % (label, repository_format.__class__.__name__,
+            repository_format_to.__class__.__name__)
         scenario = (id,
             {"transport_server": transport_server,
              "transport_readonly_server": transport_readonly_server,
              "repository_format": repository_format,
-             "interrepo_class": interrepo_class,
              "repository_format_to": repository_format_to,
             })
         result.append(scenario)
@@ -75,6 +71,8 @@
         weaverepo,
         )
     result = []
+    def add_combo(label, from_format, to_format):
+        result.append((label, from_format, to_format))
     # test the default InterRepository between format 6 and the current
     # default format.
     # XXX: robertc 20060220 reinstate this when there are two supported
@@ -85,40 +83,48 @@
     for optimiser_class in InterRepository._optimisers:
         format_to_test = optimiser_class._get_repo_format_to_test()
         if format_to_test is not None:
-            result.append((optimiser_class,
-                           format_to_test, format_to_test))
+            add_combo(optimiser_class.__name__, format_to_test, format_to_test)
     # if there are specific combinations we want to use, we can add them
     # here. We want to test rich root upgrading.
-    result.append((InterRepository,
-                   weaverepo.RepositoryFormat5(),
-                   knitrepo.RepositoryFormatKnit3()))
-    result.append((InterRepository,
-                   knitrepo.RepositoryFormatKnit1(),
-                   knitrepo.RepositoryFormatKnit3()))
-    result.append((InterRepository,
-                   knitrepo.RepositoryFormatKnit1(),
-                   knitrepo.RepositoryFormatKnit3()))
-    result.append((InterKnitRepo,
-                   knitrepo.RepositoryFormatKnit1(),
-                   pack_repo.RepositoryFormatKnitPack1()))
-    result.append((InterKnitRepo,
-                   pack_repo.RepositoryFormatKnitPack1(),
-                   knitrepo.RepositoryFormatKnit1()))
-    result.append((InterKnitRepo,
-                   knitrepo.RepositoryFormatKnit3(),
-                   pack_repo.RepositoryFormatKnitPack3()))
-    result.append((InterKnitRepo,
-                   pack_repo.RepositoryFormatKnitPack3(),
-                   knitrepo.RepositoryFormatKnit3()))
-    result.append((InterKnitRepo,
-                   pack_repo.RepositoryFormatKnitPack3(),
-                   pack_repo.RepositoryFormatKnitPack4()))
-    result.append((InterDifferingSerializer,
-                   pack_repo.RepositoryFormatKnitPack1(),
-                   pack_repo.RepositoryFormatKnitPack6RichRoot()))
-    result.append((InterDifferingSerializer,
-                   pack_repo.RepositoryFormatKnitPack6RichRoot(),
-                   groupcompress_repo.RepositoryFormat2a()))
+    # XXX: although we attach InterRepository class names to these scenarios,
+    # there's nothing asserting that these labels correspond to what is
+    # actually used.
+    add_combo('InterRepository',
+              weaverepo.RepositoryFormat5(),
+              knitrepo.RepositoryFormatKnit3())
+    add_combo('InterRepository',
+              knitrepo.RepositoryFormatKnit1(),
+              knitrepo.RepositoryFormatKnit3())
+    add_combo('InterKnitRepo',
+              knitrepo.RepositoryFormatKnit1(),
+              pack_repo.RepositoryFormatKnitPack1())
+    add_combo('InterKnitRepo',
+              pack_repo.RepositoryFormatKnitPack1(),
+              knitrepo.RepositoryFormatKnit1())
+    add_combo('InterKnitRepo',
+              knitrepo.RepositoryFormatKnit3(),
+              pack_repo.RepositoryFormatKnitPack3())
+    add_combo('InterKnitRepo',
+              pack_repo.RepositoryFormatKnitPack3(),
+              knitrepo.RepositoryFormatKnit3())
+    add_combo('InterKnitRepo',
+              pack_repo.RepositoryFormatKnitPack3(),
+              pack_repo.RepositoryFormatKnitPack4())
+    add_combo('InterDifferingSerializer',
+              pack_repo.RepositoryFormatKnitPack1(),
+              pack_repo.RepositoryFormatKnitPack6RichRoot())
+    add_combo('InterDifferingSerializer',
+              pack_repo.RepositoryFormatKnitPack6RichRoot(),
+              groupcompress_repo.RepositoryFormat2a())
+    add_combo('InterDifferingSerializer',
+              groupcompress_repo.RepositoryFormat2a(),
+              pack_repo.RepositoryFormatKnitPack6RichRoot())
+    add_combo('InterRepository',
+              groupcompress_repo.RepositoryFormatCHK2(),
+              groupcompress_repo.RepositoryFormat2a())
+    add_combo('InterDifferingSerializer',
+              groupcompress_repo.RepositoryFormatCHK1(),
+              groupcompress_repo.RepositoryFormat2a())
     return result
 
 
 
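
With scenarios keyed by a plain label instead of an InterRepository class, the
generated test ids are simple string joins; a sketch of the id construction
used by make_scenarios above:

    def scenario_id(label, from_format, to_format):
        return '%s,%s,%s' % (label,
                             from_format.__class__.__name__,
                             to_format.__class__.__name__)

    # e.g. 'InterDifferingSerializer,RepositoryFormatCHK1,RepositoryFormat2a'
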
=== modified file 'bzrlib/tests/per_interrepository/test_fetch.py'
--- bzrlib/tests/per_interrepository/test_fetch.py 2009-08-12 02:21:06 +0000
+++ bzrlib/tests/per_interrepository/test_fetch.py 2009-08-14 05:35:33 +0000
@@ -28,6 +28,9 @@
 from bzrlib.errors import (
     NoSuchRevision,
     )
+from bzrlib.graph import (
+    SearchResult,
+    )
 from bzrlib.revision import (
     NULL_REVISION,
     Revision,
@@ -124,6 +127,15 @@
             to_repo.texts.get_record_stream([('foo', revid)],
             'unordered', True).next().get_bytes_as('fulltext'))
 
+    def test_fetch_parent_inventories_at_stacking_boundary_smart(self):
+        self.setup_smart_server_with_call_log()
+        self.test_fetch_parent_inventories_at_stacking_boundary()
+
+    def test_fetch_parent_inventories_at_stacking_boundary_smart_old(self):
+        self.setup_smart_server_with_call_log()
+        self.disable_verb('Repository.insert_stream_1.19')
+        self.test_fetch_parent_inventories_at_stacking_boundary()
+
     def test_fetch_parent_inventories_at_stacking_boundary(self):
         """Fetch to a stacked branch copies inventories for parents of
         revisions at the stacking boundary.
@@ -180,6 +192,28 @@
         self.assertEqual(left_tree.inventory, stacked_left_tree.inventory)
         self.assertEqual(right_tree.inventory, stacked_right_tree.inventory)
 
+        # Finally, it's not enough to see that the basis inventories are
+        # present.  The texts introduced in merge (and only those) should be
+        # present, and also generating a stream should succeed without blowing
+        # up.
+        self.assertTrue(unstacked_repo.has_revision('merge'))
+        expected_texts = set([('file-id', 'merge')])
+        if stacked_branch.repository.texts.get_parent_map([('root-id',
+            'merge')]):
+            # If a (root-id,merge) text exists, it should be in the stacked
+            # repo.
+            expected_texts.add(('root-id', 'merge'))
+        self.assertEqual(expected_texts, unstacked_repo.texts.keys())
+        self.assertCanStreamRevision(unstacked_repo, 'merge')
+
+    def assertCanStreamRevision(self, repo, revision_id):
+        exclude_keys = set(repo.all_revision_ids()) - set([revision_id])
+        search = SearchResult([revision_id], exclude_keys, 1, [revision_id])
+        source = repo._get_source(repo._format)
+        for substream_kind, substream in source.get_stream(search):
+            # Consume the substream
+            list(substream)
+
     def test_fetch_across_stacking_boundary_ignores_ghost(self):
         if not self.repository_format_to.supports_external_lookups:
             raise TestNotApplicable("Need stacking support in the target.")
@@ -218,6 +252,19 @@
         self.addCleanup(stacked_branch.unlock)
         stacked_second_tree = stacked_branch.repository.revision_tree('second')
         self.assertEqual(second_tree.inventory, stacked_second_tree.inventory)
+        # Finally, it's not enough to see that the basis inventories are
+        # present.  The texts introduced in merge (and only those) should be
+        # present, and also generating a stream should succeed without blowing
+        # up.
+        self.assertTrue(unstacked_repo.has_revision('third'))
+        expected_texts = set([('file-id', 'third')])
+        if stacked_branch.repository.texts.get_parent_map([('root-id',
+            'third')]):
+            # If a (root-id,third) text exists, it should be in the stacked
+            # repo.
+            expected_texts.add(('root-id', 'third'))
+        self.assertEqual(expected_texts, unstacked_repo.texts.keys())
+        self.assertCanStreamRevision(unstacked_repo, 'third')
 
     def test_fetch_missing_basis_text(self):
         """If fetching a delta, we should die if a basis is not present."""
 
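
assertCanStreamRevision drives the source side directly rather than going
through fetch: SearchResult takes (start_keys, exclude_keys, key_count, keys),
so the helper asks the stream source for exactly one revision and consumes
every substream to prove the stream can actually be generated. Condensed
(assuming a read-locked repository named repo):

    exclude_keys = set(repo.all_revision_ids()) - set(['merge'])
    search = SearchResult(['merge'], exclude_keys, 1, ['merge'])
    source = repo._get_source(repo._format)
    for substream_kind, substream in source.get_stream(search):
        list(substream)  # must not blow up, even on a stacked repository
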
=== modified file 'bzrlib/tests/per_pack_repository.py'
--- bzrlib/tests/per_pack_repository.py 2009-08-12 22:28:28 +0000
+++ bzrlib/tests/per_pack_repository.py 2009-08-14 05:35:32 +0000
@@ -1051,8 +1051,8 @@
         tree.branch.push(remote_branch)
         autopack_calls = len([call for call in self.hpss_calls if call ==
                               'PackRepository.autopack'])
-        streaming_calls = len([call for call in self.hpss_calls if call ==
-                               'Repository.insert_stream'])
+        streaming_calls = len([call for call in self.hpss_calls if call in
+                               ('Repository.insert_stream', 'Repository.insert_stream_1.19')])
         if autopack_calls:
             # Non streaming server
             self.assertEqual(1, autopack_calls)
 
=== modified file 'bzrlib/tests/test_inventory_delta.py'
--- bzrlib/tests/test_inventory_delta.py 2009-04-02 05:53:12 +0000
+++ bzrlib/tests/test_inventory_delta.py 2009-08-14 05:35:32 +0000
@@ -26,6 +26,7 @@
26 inventory,26 inventory,
27 inventory_delta,27 inventory_delta,
28 )28 )
29from bzrlib.inventory_delta import InventoryDeltaError
29from bzrlib.inventory import Inventory30from bzrlib.inventory import Inventory
30from bzrlib.revision import NULL_REVISION31from bzrlib.revision import NULL_REVISION
31from bzrlib.tests import TestCase32from bzrlib.tests import TestCase
@@ -68,7 +69,7 @@
68version: entry-version69version: entry-version
69versioned_root: false70versioned_root: false
70tree_references: false71tree_references: false
71None\x00/\x00TREE_ROOT\x00\x00null:\x00dir72None\x00/\x00TREE_ROOT\x00\x00entry-version\x00dir
72"""73"""
7374
74reference_lines = """format: bzr inventory delta v1 (bzr 1.14)75reference_lines = """format: bzr inventory delta v1 (bzr 1.14)
@@ -93,39 +94,34 @@
93 """Test InventoryDeltaSerializer.parse_text_bytes."""94 """Test InventoryDeltaSerializer.parse_text_bytes."""
9495
95 def test_parse_no_bytes(self):96 def test_parse_no_bytes(self):
96 serializer = inventory_delta.InventoryDeltaSerializer(97 deserializer = inventory_delta.InventoryDeltaDeserializer()
97 versioned_root=True, tree_references=True)
98 err = self.assertRaises(98 err = self.assertRaises(
99 errors.BzrError, serializer.parse_text_bytes, '')99 InventoryDeltaError, deserializer.parse_text_bytes, '')
100 self.assertContainsRe(str(err), 'unknown format')100 self.assertContainsRe(str(err), 'last line not empty')
101101
102 def test_parse_bad_format(self):102 def test_parse_bad_format(self):
103 serializer = inventory_delta.InventoryDeltaSerializer(103 deserializer = inventory_delta.InventoryDeltaDeserializer()
104 versioned_root=True, tree_references=True)104 err = self.assertRaises(InventoryDeltaError,
105 err = self.assertRaises(errors.BzrError,105 deserializer.parse_text_bytes, 'format: foo\n')
106 serializer.parse_text_bytes, 'format: foo\n')
107 self.assertContainsRe(str(err), 'unknown format')106 self.assertContainsRe(str(err), 'unknown format')
108107
109 def test_parse_no_parent(self):108 def test_parse_no_parent(self):
110 serializer = inventory_delta.InventoryDeltaSerializer(109 deserializer = inventory_delta.InventoryDeltaDeserializer()
111 versioned_root=True, tree_references=True)110 err = self.assertRaises(InventoryDeltaError,
112 err = self.assertRaises(errors.BzrError,111 deserializer.parse_text_bytes,
113 serializer.parse_text_bytes,
114 'format: bzr inventory delta v1 (bzr 1.14)\n')112 'format: bzr inventory delta v1 (bzr 1.14)\n')
115 self.assertContainsRe(str(err), 'missing parent: marker')113 self.assertContainsRe(str(err), 'missing parent: marker')
116114
117 def test_parse_no_version(self):115 def test_parse_no_version(self):
118 serializer = inventory_delta.InventoryDeltaSerializer(116 deserializer = inventory_delta.InventoryDeltaDeserializer()
119 versioned_root=True, tree_references=True)117 err = self.assertRaises(InventoryDeltaError,
120 err = self.assertRaises(errors.BzrError,118 deserializer.parse_text_bytes,
121 serializer.parse_text_bytes,
122 'format: bzr inventory delta v1 (bzr 1.14)\n'119 'format: bzr inventory delta v1 (bzr 1.14)\n'
123 'parent: null:\n')120 'parent: null:\n')
124 self.assertContainsRe(str(err), 'missing version: marker')121 self.assertContainsRe(str(err), 'missing version: marker')
125 122
126 def test_parse_duplicate_key_errors(self):123 def test_parse_duplicate_key_errors(self):
127 serializer = inventory_delta.InventoryDeltaSerializer(124 deserializer = inventory_delta.InventoryDeltaDeserializer()
128 versioned_root=True, tree_references=True)
129 double_root_lines = \125 double_root_lines = \
130"""format: bzr inventory delta v1 (bzr 1.14)126"""format: bzr inventory delta v1 (bzr 1.14)
131parent: null:127parent: null:
@@ -135,24 +131,23 @@
135None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00131None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
136None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00132None\x00/\x00an-id\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
137"""133"""
138 err = self.assertRaises(errors.BzrError,134 err = self.assertRaises(InventoryDeltaError,
139 serializer.parse_text_bytes, double_root_lines)135 deserializer.parse_text_bytes, double_root_lines)
140 self.assertContainsRe(str(err), 'duplicate file id')136 self.assertContainsRe(str(err), 'duplicate file id')
141137
142 def test_parse_versioned_root_only(self):138 def test_parse_versioned_root_only(self):
143 serializer = inventory_delta.InventoryDeltaSerializer(139 deserializer = inventory_delta.InventoryDeltaDeserializer()
144 versioned_root=True, tree_references=True)140 parse_result = deserializer.parse_text_bytes(root_only_lines)
145 parse_result = serializer.parse_text_bytes(root_only_lines)
146 expected_entry = inventory.make_entry(141 expected_entry = inventory.make_entry(
147 'directory', u'', None, 'an-id')142 'directory', u'', None, 'an-id')
148 expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'143 expected_entry.revision = 'a@e\xc3\xa5ample.com--2004'
149 self.assertEqual(144 self.assertEqual(
150 ('null:', 'entry-version', [(None, '/', 'an-id', expected_entry)]),145 ('null:', 'entry-version', True, True,
146 [(None, '', 'an-id', expected_entry)]),
151 parse_result)147 parse_result)
152148
153 def test_parse_special_revid_not_valid_last_mod(self):149 def test_parse_special_revid_not_valid_last_mod(self):
154 serializer = inventory_delta.InventoryDeltaSerializer(150 deserializer = inventory_delta.InventoryDeltaDeserializer()
155 versioned_root=False, tree_references=True)
156 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)151 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
157parent: null:152parent: null:
158version: null:153version: null:
@@ -160,13 +155,12 @@
160tree_references: true155tree_references: true
161None\x00/\x00TREE_ROOT\x00\x00null:\x00dir\x00\x00156None\x00/\x00TREE_ROOT\x00\x00null:\x00dir\x00\x00
162"""157"""
163 err = self.assertRaises(errors.BzrError,158 err = self.assertRaises(InventoryDeltaError,
164 serializer.parse_text_bytes, root_only_lines)159 deserializer.parse_text_bytes, root_only_lines)
165 self.assertContainsRe(str(err), 'special revisionid found')160 self.assertContainsRe(str(err), 'special revisionid found')
166161
167 def test_parse_versioned_root_versioned_disabled(self):162 def test_parse_versioned_root_versioned_disabled(self):
168 serializer = inventory_delta.InventoryDeltaSerializer(163 deserializer = inventory_delta.InventoryDeltaDeserializer()
169 versioned_root=False, tree_references=True)
170 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)164 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
171parent: null:165parent: null:
172version: null:166version: null:
@@ -174,39 +168,134 @@
174tree_references: true168tree_references: true
175None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00169None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir\x00\x00
176"""170"""
177 err = self.assertRaises(errors.BzrError,171 err = self.assertRaises(InventoryDeltaError,
178 serializer.parse_text_bytes, root_only_lines)172 deserializer.parse_text_bytes, root_only_lines)
179 self.assertContainsRe(str(err), 'Versioned root found')173 self.assertContainsRe(str(err), 'Versioned root found')
180174
181 def test_parse_unique_root_id_root_versioned_disabled(self):175 def test_parse_unique_root_id_root_versioned_disabled(self):
182 serializer = inventory_delta.InventoryDeltaSerializer(176 deserializer = inventory_delta.InventoryDeltaDeserializer()
183 versioned_root=False, tree_references=True)
184 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)177 root_only_lines = """format: bzr inventory delta v1 (bzr 1.14)
185parent: null:178parent: parent-id
186version: null:179version: a@e\xc3\xa5ample.com--2004
187versioned_root: false180versioned_root: false
188tree_references: true181tree_references: true
189None\x00/\x00an-id\x00\x00null:\x00dir\x00\x00182None\x00/\x00an-id\x00\x00parent-id\x00dir\x00\x00
190"""183"""
191 err = self.assertRaises(errors.BzrError,184 err = self.assertRaises(InventoryDeltaError,
192 serializer.parse_text_bytes, root_only_lines)185 deserializer.parse_text_bytes, root_only_lines)
193 self.assertContainsRe(str(err), 'Versioned root found')186 self.assertContainsRe(str(err), 'Versioned root found')
194187
195 def test_parse_unversioned_root_versioning_enabled(self):188 def test_parse_unversioned_root_versioning_enabled(self):
196 serializer = inventory_delta.InventoryDeltaSerializer(189 deserializer = inventory_delta.InventoryDeltaDeserializer()
197 versioned_root=True, tree_references=True)190 parse_result = deserializer.parse_text_bytes(root_only_unversioned)
198 err = self.assertRaises(errors.BzrError,191 expected_entry = inventory.make_entry(
199 serializer.parse_text_bytes, root_only_unversioned)192 'directory', u'', None, 'TREE_ROOT')
200 self.assertContainsRe(193 expected_entry.revision = 'entry-version'
201 str(err), 'serialized versioned_root flag is wrong: False')194 self.assertEqual(
195 ('null:', 'entry-version', False, False,
196 [(None, u'', 'TREE_ROOT', expected_entry)]),
197 parse_result)
198
199 def test_parse_versioned_root_when_disabled(self):
200 deserializer = inventory_delta.InventoryDeltaDeserializer(
201 allow_versioned_root=False)
202 err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
203 deserializer.parse_text_bytes, root_only_lines)
204 self.assertEquals("versioned_root not allowed", str(err))
202205
203 def test_parse_tree_when_disabled(self):206 def test_parse_tree_when_disabled(self):
204 serializer = inventory_delta.InventoryDeltaSerializer(207 deserializer = inventory_delta.InventoryDeltaDeserializer(
205 versioned_root=True, tree_references=False)208 allow_tree_references=False)
206 err = self.assertRaises(errors.BzrError,209 err = self.assertRaises(inventory_delta.IncompatibleInventoryDelta,
207 serializer.parse_text_bytes, reference_lines)210 deserializer.parse_text_bytes, reference_lines)
208 self.assertContainsRe(211 self.assertEquals("Tree reference not allowed", str(err))
209 str(err), 'serialized tree_references flag is wrong: True')212
213 def test_parse_tree_when_header_disallows(self):
214 # A deserializer that allows tree_references to be set or unset.
215 deserializer = inventory_delta.InventoryDeltaDeserializer()
216 # A serialised inventory delta with a header saying no tree refs, but
217 # that has a tree ref in its content.
218 lines = """format: bzr inventory delta v1 (bzr 1.14)
219parent: null:
220version: entry-version
221versioned_root: false
222tree_references: false
223None\x00/foo\x00id\x00TREE_ROOT\x00changed\x00tree\x00subtree-version
224"""
225 err = self.assertRaises(InventoryDeltaError,
226 deserializer.parse_text_bytes, lines)
227 self.assertContainsRe(str(err), 'Tree reference found')
228
229 def test_parse_versioned_root_when_header_disallows(self):
230 # A deserializer that allows tree_references to be set or unset.
231 deserializer = inventory_delta.InventoryDeltaDeserializer()
232 # A serialised inventory delta with a header saying no tree refs, but
233 # that has a tree ref in its content.
234 lines = """format: bzr inventory delta v1 (bzr 1.14)
235parent: null:
236version: entry-version
237versioned_root: false
238tree_references: false
239None\x00/\x00TREE_ROOT\x00\x00a@e\xc3\xa5ample.com--2004\x00dir
240"""
241 err = self.assertRaises(InventoryDeltaError,
242 deserializer.parse_text_bytes, lines)
243 self.assertContainsRe(str(err), 'Versioned root found')
244
245 def test_parse_last_line_not_empty(self):
246 """newpath must start with / if it is not None."""
247 # Trim the trailing newline from a valid serialization
248 lines = root_only_lines[:-1]
249 deserializer = inventory_delta.InventoryDeltaDeserializer()
250 err = self.assertRaises(InventoryDeltaError,
251 deserializer.parse_text_bytes, lines)
252 self.assertContainsRe(str(err), 'last line not empty')
253
254 def test_parse_invalid_newpath(self):
255 """newpath must start with / if it is not None."""
256 lines = empty_lines
257 lines += "None\x00bad\x00TREE_ROOT\x00\x00version\x00dir\n"
258 deserializer = inventory_delta.InventoryDeltaDeserializer()
259 err = self.assertRaises(InventoryDeltaError,
260 deserializer.parse_text_bytes, lines)
261 self.assertContainsRe(str(err), 'newpath invalid')
262
263 def test_parse_invalid_oldpath(self):
264 """oldpath must start with / if it is not None."""
265 lines = root_only_lines
266 lines += "bad\x00/new\x00file-id\x00\x00version\x00dir\n"
267 deserializer = inventory_delta.InventoryDeltaDeserializer()
268 err = self.assertRaises(InventoryDeltaError,
269 deserializer.parse_text_bytes, lines)
270 self.assertContainsRe(str(err), 'oldpath invalid')
271
272 def test_parse_new_file(self):
273 """a new file is parsed correctly"""
274 lines = root_only_lines
275 fake_sha = "deadbeef" * 5
276 lines += (
277 "None\x00/new\x00file-id\x00an-id\x00version\x00file\x00123\x00" +
278 "\x00" + fake_sha + "\n")
279 deserializer = inventory_delta.InventoryDeltaDeserializer()
280 parse_result = deserializer.parse_text_bytes(lines)
281 expected_entry = inventory.make_entry(
282 'file', u'new', 'an-id', 'file-id')
283 expected_entry.revision = 'version'
284 expected_entry.text_size = 123
285 expected_entry.text_sha1 = fake_sha
286 delta = parse_result[4]
287 self.assertEqual(
288 (None, u'new', 'file-id', expected_entry), delta[-1])
289
290 def test_parse_delete(self):
291 lines = root_only_lines
292 lines += (
293 "/old-file\x00None\x00deleted-id\x00\x00null:\x00deleted\x00\x00\n")
294 deserializer = inventory_delta.InventoryDeltaDeserializer()
295 parse_result = deserializer.parse_text_bytes(lines)
296 delta = parse_result[4]
297 self.assertEqual(
298 (u'old-file', None, 'deleted-id', None), delta[-1])
210299
211300
212class TestSerialization(TestCase):301class TestSerialization(TestCase):
@@ -237,12 +326,20 @@
237 old_inv = Inventory(None)326 old_inv = Inventory(None)
238 new_inv = Inventory(None)327 new_inv = Inventory(None)
239 root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')328 root = new_inv.make_entry('directory', '', None, 'TREE_ROOT')
329 # Implicit roots are considered modified in every revision.
330 root.revision = 'entry-version'
240 new_inv.add(root)331 new_inv.add(root)
241 delta = new_inv._make_delta(old_inv)332 delta = new_inv._make_delta(old_inv)
242 serializer = inventory_delta.InventoryDeltaSerializer(333 serializer = inventory_delta.InventoryDeltaSerializer(
243 versioned_root=False, tree_references=False)334 versioned_root=False, tree_references=False)
335 serialized_lines = serializer.delta_to_lines(
336 NULL_REVISION, 'entry-version', delta)
244 self.assertEqual(StringIO(root_only_unversioned).readlines(),337 self.assertEqual(StringIO(root_only_unversioned).readlines(),
245 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))338 serialized_lines)
339 deserializer = inventory_delta.InventoryDeltaDeserializer()
340 self.assertEqual(
341 (NULL_REVISION, 'entry-version', False, False, delta),
342 deserializer.parse_text_bytes(''.join(serialized_lines)))
246343
247 def test_unversioned_non_root_errors(self):344 def test_unversioned_non_root_errors(self):
248 old_inv = Inventory(None)345 old_inv = Inventory(None)
@@ -255,7 +352,7 @@
255 delta = new_inv._make_delta(old_inv)352 delta = new_inv._make_delta(old_inv)
256 serializer = inventory_delta.InventoryDeltaSerializer(353 serializer = inventory_delta.InventoryDeltaSerializer(
257 versioned_root=True, tree_references=True)354 versioned_root=True, tree_references=True)
258 err = self.assertRaises(errors.BzrError,355 err = self.assertRaises(InventoryDeltaError,
259 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)356 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
260 self.assertEqual(str(err), 'no version for fileid id')357 self.assertEqual(str(err), 'no version for fileid id')
261358
@@ -267,7 +364,7 @@
267 delta = new_inv._make_delta(old_inv)364 delta = new_inv._make_delta(old_inv)
268 serializer = inventory_delta.InventoryDeltaSerializer(365 serializer = inventory_delta.InventoryDeltaSerializer(
269 versioned_root=True, tree_references=True)366 versioned_root=True, tree_references=True)
270 err = self.assertRaises(errors.BzrError,367 err = self.assertRaises(InventoryDeltaError,
271 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)368 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
272 self.assertEqual(str(err), 'no version for fileid TREE_ROOT')369 self.assertEqual(str(err), 'no version for fileid TREE_ROOT')
273370
@@ -280,22 +377,9 @@
280 delta = new_inv._make_delta(old_inv)377 delta = new_inv._make_delta(old_inv)
281 serializer = inventory_delta.InventoryDeltaSerializer(378 serializer = inventory_delta.InventoryDeltaSerializer(
282 versioned_root=False, tree_references=True)379 versioned_root=False, tree_references=True)
283 err = self.assertRaises(errors.BzrError,380 err = self.assertRaises(InventoryDeltaError,
284 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)381 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
285 self.assertEqual(str(err), 'Version present for / in TREE_ROOT')382 self.assertStartsWith(str(err), 'Version present for / in TREE_ROOT')
286
287 def test_nonrichroot_non_TREE_ROOT_id_errors(self):
288 old_inv = Inventory(None)
289 new_inv = Inventory(None)
290 root = new_inv.make_entry('directory', '', None, 'my-rich-root-id')
291 new_inv.add(root)
292 delta = new_inv._make_delta(old_inv)
293 serializer = inventory_delta.InventoryDeltaSerializer(
294 versioned_root=False, tree_references=True)
295 err = self.assertRaises(errors.BzrError,
296 serializer.delta_to_lines, NULL_REVISION, 'entry-version', delta)
297 self.assertEqual(
298 str(err), 'file_id my-rich-root-id is not TREE_ROOT for /')
299383
300 def test_unknown_kind_errors(self):384 def test_unknown_kind_errors(self):
301 old_inv = Inventory(None)385 old_inv = Inventory(None)
@@ -354,19 +438,22 @@
354 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))438 serializer.delta_to_lines(NULL_REVISION, 'entry-version', delta))
355439
356 def test_to_inventory_root_id_versioned_not_permitted(self):440 def test_to_inventory_root_id_versioned_not_permitted(self):
357 delta = [(None, '/', 'TREE_ROOT', inventory.make_entry(441 root_entry = inventory.make_entry('directory', '', None, 'TREE_ROOT')
358 'directory', '', None, 'TREE_ROOT'))]442 root_entry.revision = 'some-version'
359 serializer = inventory_delta.InventoryDeltaSerializer(False, True)443 delta = [(None, '', 'TREE_ROOT', root_entry)]
444 serializer = inventory_delta.InventoryDeltaSerializer(
445 versioned_root=False, tree_references=True)
360 self.assertRaises(446 self.assertRaises(
361 errors.BzrError, serializer.delta_to_lines, 'old-version',447 InventoryDeltaError, serializer.delta_to_lines, 'old-version',
362 'new-version', delta)448 'new-version', delta)
363449
364 def test_to_inventory_root_id_not_versioned(self):450 def test_to_inventory_root_id_not_versioned(self):
365 delta = [(None, '/', 'an-id', inventory.make_entry(451 delta = [(None, '', 'an-id', inventory.make_entry(
366 'directory', '', None, 'an-id'))]452 'directory', '', None, 'an-id'))]
367 serializer = inventory_delta.InventoryDeltaSerializer(True, True)453 serializer = inventory_delta.InventoryDeltaSerializer(
454 versioned_root=True, tree_references=True)
368 self.assertRaises(455 self.assertRaises(
369 errors.BzrError, serializer.delta_to_lines, 'old-version',456 InventoryDeltaError, serializer.delta_to_lines, 'old-version',
370 'new-version', delta)457 'new-version', delta)
371458
372 def test_to_inventory_has_tree_not_meant_to(self):459 def test_to_inventory_has_tree_not_meant_to(self):
@@ -374,13 +461,14 @@
         tree_ref = make_entry('tree-reference', 'foo', 'changed-in', 'ref-id')
         tree_ref.reference_revision = 'ref-revision'
         delta = [
-            (None, '/', 'an-id',
+            (None, '', 'an-id',
              make_entry('directory', '', 'changed-in', 'an-id')),
-            (None, '/foo', 'ref-id', tree_ref)
+            (None, 'foo', 'ref-id', tree_ref)
             # a file that followed the root move
             ]
-        serializer = inventory_delta.InventoryDeltaSerializer(True, True)
-        self.assertRaises(errors.BzrError, serializer.delta_to_lines,
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
+        self.assertRaises(InventoryDeltaError, serializer.delta_to_lines,
             'old-version', 'new-version', delta)

     def test_to_inventory_torture(self):
@@ -430,7 +518,8 @@
                 executable=True, text_size=30, text_sha1='some-sha',
                 revision='old-rev')),
             ]
-        serializer = inventory_delta.InventoryDeltaSerializer(True, True)
+        serializer = inventory_delta.InventoryDeltaSerializer(
+            versioned_root=True, tree_references=True)
         lines = serializer.delta_to_lines(NULL_REVISION, 'something', delta)
         expected = """format: bzr inventory delta v1 (bzr 1.14)
 parent: null:
@@ -483,13 +572,13 @@
     def test_file_without_size(self):
         file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
         file_entry.text_sha1 = 'foo'
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._file_content, file_entry)

     def test_file_without_sha1(self):
         file_entry = inventory.make_entry('file', 'a file', None, 'file-id')
         file_entry.text_size = 10
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._file_content, file_entry)

     def test_link_empty_target(self):
@@ -512,7 +601,7 @@

     def test_link_no_target(self):
         entry = inventory.make_entry('symlink', 'a link', None)
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._link_content, entry)

     def test_reference_null(self):
@@ -529,5 +618,5 @@

     def test_reference_no_reference(self):
         entry = inventory.make_entry('tree-reference', 'a tree', None)
-        self.assertRaises(errors.BzrError,
+        self.assertRaises(InventoryDeltaError,
             inventory_delta._reference_content, entry)

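For reviewers unfamiliar with the module under test: the changes above move
InventoryDeltaSerializer to explicit keyword arguments, drop the leading '/'
from delta paths, and raise InventoryDeltaError instead of a bare BzrError.
A minimal usage sketch, assuming these names exist as this branch defines
them in bzrlib.inventory_delta:

    from bzrlib import inventory
    from bzrlib.inventory_delta import InventoryDeltaSerializer
    from bzrlib.revision import NULL_REVISION

    # One-entry delta: (old_path, new_path, file_id, entry); new-style paths
    # carry no leading '/'.
    inv = inventory.Inventory(None)
    root = inv.make_entry('directory', '', None, 'TREE_ROOT')
    root.revision = 'a-version'
    inv.add(root)
    delta = [(None, '', 'TREE_ROOT', root)]

    # The two flags are now explicit keyword arguments rather than the old
    # positional booleans, which is what most of the churn above is about.
    serializer = InventoryDeltaSerializer(
        versioned_root=True, tree_references=True)
    lines = serializer.delta_to_lines(NULL_REVISION, 'a-version', delta)
    # lines[0] should be 'format: bzr inventory delta v1 (bzr 1.14)\n',
    # matching the expected text in test_to_inventory_torture.
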
=== modified file 'bzrlib/tests/test_remote.py'
--- bzrlib/tests/test_remote.py 2009-08-11 05:26:57 +0000
+++ bzrlib/tests/test_remote.py 2009-08-14 05:35:32 +0000
@@ -31,6 +31,8 @@
     config,
     errors,
     graph,
+    inventory,
+    inventory_delta,
     pack,
     remote,
     repository,
@@ -38,6 +40,7 @@
     tests,
     treebuilder,
     urlutils,
+    versionedfile,
     )
 from bzrlib.branch import Branch
 from bzrlib.bzrdir import BzrDir, BzrDirFormat
@@ -332,15 +335,6 @@
         reference_bzrdir_format = bzrdir.format_registry.get('default')()
         return reference_bzrdir_format.repository_format

-    def disable_verb(self, verb):
-        """Disable a verb for one test."""
-        request_handlers = smart.request.request_handlers
-        orig_method = request_handlers.get(verb)
-        request_handlers.remove(verb)
-        def restoreVerb():
-            request_handlers.register(verb, orig_method)
-        self.addCleanup(restoreVerb)
-
     def assertFinished(self, fake_client):
         """Assert that all of a FakeClient's expected calls have occurred."""
         fake_client.finished_test()
@@ -2219,63 +2213,219 @@
         self.assertEqual([], client._calls)


-class TestRepositoryInsertStream(TestRemoteRepository):
-
-    def test_unlocked_repo(self):
-        transport_path = 'quack'
-        repo, client = self.setup_fake_client_and_repository(transport_path)
-        client.add_expected_call(
-            'Repository.insert_stream', ('quack/', ''),
-            'success', ('ok',))
-        client.add_expected_call(
-            'Repository.insert_stream', ('quack/', ''),
-            'success', ('ok',))
+class TestRepositoryInsertStreamBase(TestRemoteRepository):
+    """Base class for Repository.insert_stream and .insert_stream_1.19
+    tests.
+    """
+
+    def checkInsertEmptyStream(self, repo, client):
+        """Insert an empty stream, checking the result.
+
+        This checks that there are no resume_tokens or missing_keys, and that
+        the client is finished.
+        """
         sink = repo._get_sink()
         fmt = repository.RepositoryFormat.get_default_format()
         resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
         self.assertEqual([], resume_tokens)
         self.assertEqual(set(), missing_keys)
         self.assertFinished(client)

-    def test_locked_repo_with_no_lock_token(self):
-        transport_path = 'quack'
-        repo, client = self.setup_fake_client_and_repository(transport_path)
-        client.add_expected_call(
-            'Repository.lock_write', ('quack/', ''),
-            'success', ('ok', ''))
-        client.add_expected_call(
-            'Repository.insert_stream', ('quack/', ''),
-            'success', ('ok',))
-        client.add_expected_call(
-            'Repository.insert_stream', ('quack/', ''),
-            'success', ('ok',))
-        repo.lock_write()
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
-
-    def test_locked_repo_with_lock_token(self):
-        transport_path = 'quack'
-        repo, client = self.setup_fake_client_and_repository(transport_path)
-        client.add_expected_call(
-            'Repository.lock_write', ('quack/', ''),
-            'success', ('ok', 'a token'))
-        client.add_expected_call(
-            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
-            'success', ('ok',))
-        client.add_expected_call(
-            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
-            'success', ('ok',))
-        repo.lock_write()
-        sink = repo._get_sink()
-        fmt = repository.RepositoryFormat.get_default_format()
-        resume_tokens, missing_keys = sink.insert_stream([], fmt, [])
-        self.assertEqual([], resume_tokens)
-        self.assertEqual(set(), missing_keys)
-        self.assertFinished(client)
+
+class TestRepositoryInsertStream(TestRepositoryInsertStreamBase):
+    """Tests for using Repository.insert_stream verb when the _1.19 variant is
+    not available.
+
+    This test case is very similar to TestRepositoryInsertStream_1_19.
+    """
+
+    def setUp(self):
+        TestRemoteRepository.setUp(self)
+        self.disable_verb('Repository.insert_stream_1.19')
+
+    def test_unlocked_repo(self):
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'unknown', ('Repository.insert_stream_1.19',))
+        client.add_expected_call(
+            'Repository.insert_stream', ('quack/', ''),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream', ('quack/', ''),
+            'success', ('ok',))
+        self.checkInsertEmptyStream(repo, client)
+
+    def test_locked_repo_with_no_lock_token(self):
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.lock_write', ('quack/', ''),
+            'success', ('ok', ''))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'unknown', ('Repository.insert_stream_1.19',))
+        client.add_expected_call(
+            'Repository.insert_stream', ('quack/', ''),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream', ('quack/', ''),
+            'success', ('ok',))
+        repo.lock_write()
+        self.checkInsertEmptyStream(repo, client)
+
+    def test_locked_repo_with_lock_token(self):
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.lock_write', ('quack/', ''),
+            'success', ('ok', 'a token'))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', '', 'a token'),
+            'unknown', ('Repository.insert_stream_1.19',))
+        client.add_expected_call(
+            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream_locked', ('quack/', '', 'a token'),
+            'success', ('ok',))
+        repo.lock_write()
+        self.checkInsertEmptyStream(repo, client)
+
+    def test_stream_with_inventory_deltas(self):
+        """'inventory-deltas' substreams cannot be sent to the
+        Repository.insert_stream verb, because not all servers that implement
+        that verb will accept them. So when one is encountered the RemoteSink
+        immediately stops using that verb and falls back to VFS insert_stream.
+        """
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'unknown', ('Repository.insert_stream_1.19',))
+        client.add_expected_call(
+            'Repository.insert_stream', ('quack/', ''),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream', ('quack/', ''),
+            'success', ('ok',))
+        # Create a fake real repository for insert_stream to fall back on, so
+        # that we can directly see the records the RemoteSink passes to the
+        # real sink.
+        class FakeRealSink:
+            def __init__(self):
+                self.records = []
+            def insert_stream(self, stream, src_format, resume_tokens):
+                for substream_kind, substream in stream:
+                    self.records.append(
+                        (substream_kind, [record.key for record in substream]))
+                return ['fake tokens'], ['fake missing keys']
+        fake_real_sink = FakeRealSink()
+        class FakeRealRepository:
+            def _get_sink(self):
+                return fake_real_sink
+        repo._real_repository = FakeRealRepository()
+        sink = repo._get_sink()
+        fmt = repository.RepositoryFormat.get_default_format()
+        stream = self.make_stream_with_inv_deltas(fmt)
+        resume_tokens, missing_keys = sink.insert_stream(stream, fmt, [])
+        # Every record from the first inventory delta should have been sent to
+        # the VFS sink.
+        expected_records = [
+            ('inventory-deltas', [('rev2',), ('rev3',)]),
+            ('texts', [('some-rev', 'some-file')])]
+        self.assertEqual(expected_records, fake_real_sink.records)
+        # The return values from the real sink's insert_stream are propagated
+        # back to the original caller.
+        self.assertEqual(['fake tokens'], resume_tokens)
+        self.assertEqual(['fake missing keys'], missing_keys)
+        self.assertFinished(client)
+
+    def make_stream_with_inv_deltas(self, fmt):
+        """Make a simple stream with an inventory delta followed by more
+        records and more substreams to test that all records and substreams
+        from that point on are used.
+
+        This sends, in order:
+         * inventories substream: rev1, rev2, rev3. rev2 and rev3 are
+           inventory-deltas.
+         * texts substream: (some-rev, some-file)
+        """
+        # Define a stream using generators so that it isn't rewindable.
+        inv = inventory.Inventory(revision_id='rev1')
+        def stream_with_inv_delta():
+            yield ('inventories', inventories_substream())
+            yield ('inventory-deltas', inventory_delta_substream())
+            yield ('texts', [
+                versionedfile.FulltextContentFactory(
+                    ('some-rev', 'some-file'), (), None, 'content')])
+        def inventories_substream():
+            # An empty inventory fulltext. This will be streamed normally.
+            text = fmt._serializer.write_inventory_to_string(inv)
+            yield versionedfile.FulltextContentFactory(
+                ('rev1',), (), None, text)
+        def inventory_delta_substream():
+            # An inventory delta. This can't be streamed via this verb, so it
+            # will trigger a fallback to VFS insert_stream.
+            entry = inv.make_entry(
+                'directory', 'newdir', inv.root.file_id, 'newdir-id')
+            entry.revision = 'ghost'
+            delta = [(None, 'newdir', 'newdir-id', entry)]
+            serializer = inventory_delta.InventoryDeltaSerializer(
+                versioned_root=True, tree_references=False)
+            lines = serializer.delta_to_lines('rev1', 'rev2', delta)
+            yield versionedfile.ChunkedContentFactory(
+                ('rev2',), (('rev1',)), None, lines)
+            # Another delta.
+            lines = serializer.delta_to_lines('rev1', 'rev3', delta)
+            yield versionedfile.ChunkedContentFactory(
+                ('rev3',), (('rev1',)), None, lines)
+        return stream_with_inv_delta()
+
+
+class TestRepositoryInsertStream_1_19(TestRepositoryInsertStreamBase):
+
+    def test_unlocked_repo(self):
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'success', ('ok',))
+        self.checkInsertEmptyStream(repo, client)
+
+    def test_locked_repo_with_no_lock_token(self):
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.lock_write', ('quack/', ''),
+            'success', ('ok', ''))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', ''),
+            'success', ('ok',))
+        repo.lock_write()
+        self.checkInsertEmptyStream(repo, client)
+
+    def test_locked_repo_with_lock_token(self):
+        transport_path = 'quack'
+        repo, client = self.setup_fake_client_and_repository(transport_path)
+        client.add_expected_call(
+            'Repository.lock_write', ('quack/', ''),
+            'success', ('ok', 'a token'))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', '', 'a token'),
+            'success', ('ok',))
+        client.add_expected_call(
+            'Repository.insert_stream_1.19', ('quack/', '', 'a token'),
+            'success', ('ok',))
+        repo.lock_write()
+        self.checkInsertEmptyStream(repo, client)


 class TestRepositoryTarball(TestRemoteRepository):

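The fallback behaviour these tests pin down is, in outline: the client probes
Repository.insert_stream_1.19 first, retries with the older verb when the
server answers 'unknown', and once an 'inventory-deltas' substream appears it
diverts that substream and everything after it to the VFS sink. A
self-contained toy model of that dispatch; the names pick_verb, insert and
vfs_records are invented for illustration and are not bzrlib API:

    def pick_verb(server_verbs):
        # Probe for the newest verb first; fall back if the server
        # does not know it.
        if 'Repository.insert_stream_1.19' in server_verbs:
            return 'Repository.insert_stream_1.19'
        return 'Repository.insert_stream'

    def insert(stream, verb, vfs_records):
        """Consume (kind, substream) pairs, diverting to VFS on deltas."""
        for kind, substream in stream:
            if verb == 'Repository.insert_stream' and kind == 'inventory-deltas':
                # The old verb cannot carry inventory-delta records, so this
                # substream and every later one go to the fallback sink.
                vfs_records.append((kind, list(substream)))
                for later_kind, later_substream in stream:
                    vfs_records.append((later_kind, list(later_substream)))
                return
            list(substream)  # stands in for sending over the smart verb

    vfs_records = []
    stream = iter([
        ('inventories', iter(['inv-rev1'])),
        ('inventory-deltas', iter(['delta-rev2', 'delta-rev3'])),
        ('texts', iter(['text-1']))])
    insert(stream, pick_verb(['Repository.insert_stream']), vfs_records)
    # Mirrors expected_records in test_stream_with_inventory_deltas:
    assert vfs_records == [
        ('inventory-deltas', ['delta-rev2', 'delta-rev3']),
        ('texts', ['text-1'])]
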
=== modified file 'bzrlib/tests/test_selftest.py'
--- bzrlib/tests/test_selftest.py 2009-08-04 02:09:19 +0000
+++ bzrlib/tests/test_selftest.py 2009-08-14 05:35:32 +0000
@@ -124,7 +124,7 @@
         self.assertEqual(sample_permutation,
             get_transport_test_permutations(MockModule()))

-    def test_scenarios_invlude_all_modules(self):
+    def test_scenarios_include_all_modules(self):
         # this checks that the scenario generator returns as many permutations
         # as there are in all the registered transport modules - we assume if
         # this matches its probably doing the right thing especially in
@@ -293,18 +293,16 @@
         from bzrlib.tests.per_interrepository import make_scenarios
         server1 = "a"
         server2 = "b"
-        formats = [(str, "C1", "C2"), (int, "D1", "D2")]
+        formats = [("C0", "C1", "C2"), ("D0", "D1", "D2")]
         scenarios = make_scenarios(server1, server2, formats)
         self.assertEqual([
-            ('str,str,str',
-             {'interrepo_class': str,
-              'repository_format': 'C1',
+            ('C0,str,str',
+             {'repository_format': 'C1',
               'repository_format_to': 'C2',
               'transport_readonly_server': 'b',
               'transport_server': 'a'}),
-            ('int,str,str',
-             {'interrepo_class': int,
-              'repository_format': 'D1',
+            ('D0,str,str',
+             {'repository_format': 'D1',
               'repository_format_to': 'D2',
               'transport_readonly_server': 'b',
               'transport_server': 'a'})],

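The shape change asserted here: each formats entry is now a plain
(label, from_format, to_format) triple, and the generated scenario dict no
longer carries an interrepo_class key. A sketch of the mapping the assertion
expects; this is hand-written for illustration, not the real make_scenarios
body:

    def make_scenarios_sketch(server1, server2, formats):
        # formats: [(label, repository_format, repository_format_to), ...]
        for label, fmt, fmt_to in formats:
            scenario_id = '%s,%s,%s' % (
                label, type(server1).__name__, type(server2).__name__)
            yield (scenario_id,
                   {'repository_format': fmt,
                    'repository_format_to': fmt_to,
                    'transport_server': server1,
                    'transport_readonly_server': server2})

    scenarios = list(make_scenarios_sketch('a', 'b', [('C0', 'C1', 'C2')]))
    assert scenarios[0][0] == 'C0,str,str'
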
=== modified file 'bzrlib/tests/test_smart.py'
--- bzrlib/tests/test_smart.py 2009-07-23 07:37:05 +0000
+++ bzrlib/tests/test_smart.py 2009-08-14 05:35:32 +0000
@@ -1242,6 +1242,7 @@
             SmartServerResponse(('history-incomplete', 2, r2)),
             request.execute('stacked', 1, (3, r3)))

+
 class TestSmartServerRepositoryGetStream(tests.TestCaseWithMemoryTransport):

     def make_two_commit_repo(self):

=== modified file 'bzrlib/tests/test_xml.py'
--- bzrlib/tests/test_xml.py 2009-04-03 21:50:40 +0000
+++ bzrlib/tests/test_xml.py 2009-08-14 05:35:32 +0000
@@ -19,6 +19,7 @@
 from bzrlib import (
     errors,
     inventory,
+    xml6,
     xml7,
     xml8,
     serializer,
@@ -139,6 +140,14 @@
 </inventory>
 """

+_expected_inv_v6 = """<inventory format="6" revision_id="rev_outer">
+<directory file_id="tree-root-321" name="" revision="rev_outer" />
+<directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
+<file file_id="file-id" name="file" parent_id="tree-root-321" revision="rev_outer" text_sha1="A" text_size="1" />
+<symlink file_id="link-id" name="link" parent_id="tree-root-321" revision="rev_outer" symlink_target="a" />
+</inventory>
+"""
+
 _expected_inv_v7 = """<inventory format="7" revision_id="rev_outer">
 <directory file_id="tree-root-321" name="" revision="rev_outer" />
 <directory file_id="dir-id" name="dir" parent_id="tree-root-321" revision="rev_outer" />
@@ -377,6 +386,17 @@
         for path, ie in inv.iter_entries():
             self.assertEqual(ie, inv2[ie.file_id])

+    def test_roundtrip_inventory_v6(self):
+        inv = self.get_sample_inventory()
+        txt = xml6.serializer_v6.write_inventory_to_string(inv)
+        lines = xml6.serializer_v6.write_inventory_to_lines(inv)
+        self.assertEqual(bzrlib.osutils.split_lines(txt), lines)
+        self.assertEqualDiff(_expected_inv_v6, txt)
+        inv2 = xml6.serializer_v6.read_inventory_from_string(txt)
+        self.assertEqual(4, len(inv2))
+        for path, ie in inv.iter_entries():
+            self.assertEqual(ie, inv2[ie.file_id])
+
     def test_wrong_format_v7(self):
         """Can't accidentally open a file with wrong serializer"""
         s_v6 = bzrlib.xml6.serializer_v6

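The new v6 round-trip test exercises the same xml6.serializer_v6 entry points
in both directions. A tiny usage sketch, assuming _expected_inv_v6 as defined
above and canonical v6 output (which the test asserts for its sample
inventory):

    from bzrlib import xml6

    # Parse the expected v6 document back into an Inventory, re-serialize,
    # and check the text survives the round trip unchanged.
    inv = xml6.serializer_v6.read_inventory_from_string(_expected_inv_v6)
    text = xml6.serializer_v6.write_inventory_to_string(inv)
    assert text == _expected_inv_v6
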
=== modified file 'bzrlib/versionedfile.py'
--- bzrlib/versionedfile.py 2009-08-04 04:36:34 +0000
+++ bzrlib/versionedfile.py 2009-08-14 05:35:32 +0000
@@ -1571,13 +1571,14 @@
         record.get_bytes_as(record.storage_kind) call.
         """
         self._bytes_iterator = bytes_iterator
-        self._kind_factory = {'knit-ft-gz':knit.knit_network_to_record,
-            'knit-delta-gz':knit.knit_network_to_record,
-            'knit-annotated-ft-gz':knit.knit_network_to_record,
-            'knit-annotated-delta-gz':knit.knit_network_to_record,
-            'knit-delta-closure':knit.knit_delta_closure_to_records,
-            'fulltext':fulltext_network_to_record,
-            'groupcompress-block':groupcompress.network_block_to_records,
+        self._kind_factory = {
+            'fulltext': fulltext_network_to_record,
+            'groupcompress-block': groupcompress.network_block_to_records,
+            'knit-ft-gz': knit.knit_network_to_record,
+            'knit-delta-gz': knit.knit_network_to_record,
+            'knit-annotated-ft-gz': knit.knit_network_to_record,
+            'knit-annotated-delta-gz': knit.knit_network_to_record,
+            'knit-delta-closure': knit.knit_delta_closure_to_records,
             }

     def read(self):

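The reformatted _kind_factory is a storage-kind to deserializer dispatch
table: read() presumably looks up each incoming record's kind and hands the
bytes to the matching parser. A toy version of that dispatch loop, with
invented names rather than the actual bzrlib class:

    class RecordStreamSketch(object):
        """Turn (storage_kind, payload) pairs into parsed records."""

        def __init__(self, bytes_iterator, kind_factory):
            self._bytes_iterator = bytes_iterator
            self._kind_factory = kind_factory

        def read(self):
            for storage_kind, payload in self._bytes_iterator:
                # An unregistered kind raises KeyError rather than being
                # silently skipped.
                parse = self._kind_factory[storage_kind]
                for record in parse(payload):
                    yield record

    # Usage, with a trivial 'parser' standing in for e.g.
    # fulltext_network_to_record:
    wire = iter([('fulltext', 'abc'), ('fulltext', 'def')])
    stream = RecordStreamSketch(wire, {'fulltext': lambda payload: [payload]})
    assert list(stream.read()) == ['abc', 'def']
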
=== modified file 'bzrlib/xml5.py'
--- bzrlib/xml5.py 2009-04-04 02:50:01 +0000
+++ bzrlib/xml5.py 2009-08-14 05:35:32 +0000
@@ -39,8 +39,8 @@
         format = elt.get('format')
         if format is not None:
             if format != '5':
-                raise BzrError("invalid format version %r on inventory"
+                raise errors.BzrError("invalid format version %r on inventory"
                     % format)
         data_revision_id = elt.get('revision_id')
         if data_revision_id is not None:
             revision_id = cache_utf8.encode(data_revision_id)