Overall, this is looking pretty good. A few small tweaks here and there, and
some possible concerns, but only one major concern: you changed the default
return value of "iter_inventories()" to return 'unordered' results, which
means that "revision_trees()" also now returns 'unordered' results. That is
fairly serious, and I was able to find more than one code path that was
relying on the ordering of revision_trees(). So I think the idea is sound, but
it needs to be exposed in a backwards-compatible manner, by making it a flag
that defaults to "as-requested" ordering, and then we fix up code paths as we
can. I don't specifically need to review that change, though. And I don't
really want to review this patch yet again :). So I'm voting tweak, and you
can submit if you agree with my findings.

119	+        from bzrlib.graph import FrozenHeadsCache
120	+        graph = FrozenHeadsCache(graph)
121	+        new_roots_stream = _new_root_data_stream(
122	+            root_id_order, rev_id_to_root_id, parent_map, self.source, graph)
123	+        return [('texts', new_roots_stream)]

^- I'm pretty sure that if we are using FrozenHeadsCache, we really want to
use KnownGraph instead. We have a dict 'parent_map', which means we can do
much more efficient heads() checks since the whole graph is already loaded.
This is a minor thing we can do later, but it would probably be good not to
forget.

...

392	-            elif last_modified[-1] == ':':
393	-                raise errors.BzrError('special revisionid found: %r' % line)
394	-            if not delta_tree_references and content.startswith('tree\x00'):
395	+            elif newpath_utf8 != 'None' and last_modified[-1] == ':':
396	+                # Deletes have a last_modified of null:, but otherwise special
397	+                # revision ids should not occur.
398	+                raise errors.BzrError('special revisionid found: %r' % line)
399	+            if delta_tree_references is False and content.startswith('tree\x00'):

^- What does "newpath_utf8 != 'None'" mean if someone does:

  touch None
  bzr add None
  bzr commit -m "adding None"

Is this a serialization bug waiting to be exposed? I guess not, as it seems
the paths are always prefixed with "/", right? (So a path of None would
actually be "newpath_utf8 == '/None'".)

However, further down (sorry about the bad indenting):

413	            if newpath_utf8 == 'None':
414	                newpath = None

^- here you set "newpath = None" but you *don't* set "newpath_utf8" to None.

415	+            elif newpath_utf8[:1] != '/':
416	+                raise errors.BzrError(
417	+                    "newpath invalid (does not start with /): %r"
418	+                    % (newpath_utf8,))
419	            else:
420	+                # Trim leading slash
421	+                newpath_utf8 = newpath_utf8[1:]
422	                newpath = newpath_utf8.decode('utf8')
423	+            content_tuple = tuple(content.split('\x00'))
424	+            if content_tuple[0] == 'deleted':
425	+                entry = None
426	+            else:
427	+                entry = _parse_entry(
428	+                    newpath_utf8, file_id, parent_id, last_modified,
429	+                    content_tuple)

And then here "newpath_utf8" is passed to _parse_entry. Now, I realize this is
probably caught by "content_tuple[0] == 'deleted'", though it feels a bit icky
to rely on "newpath_utf8" in one portion and "content_tuple[0]" in another
(since they can potentially be out of sync). I think we should just force an
early error with:

  if newpath_utf8 == 'None':
      newpath = None
      newpath_utf8 = None

Then if there is an issue, at least it fails rather than succeeding at
creating a new file with the name "None".

  if utf8_path.startswith('/'):

^- If this is a core routine (something called for every path) then:

  if utf8_path[:1] == '/':

*is* faster than .startswith() because you

1) Don't have a function call
2) Don't have an attribute lookup

I'm assuming this is a function that gets called a lot. If not, don't worry
about it.

...
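[To make the startswith-vs-slice point above concrete, here is a quick
stand-alone micro-benchmark sketch, not part of the patch. The two checks are
semantically equivalent (including on the empty string); the slicing form
skips the attribute lookup and method call, though the exact difference
depends on the Python build.]

```python
import timeit

path = 'some/deeply/nested/path'

t_startswith = timeit.timeit("path.startswith('/')",
                             globals={'path': path}, number=200000)
t_slice = timeit.timeit("path[:1] == '/'",
                        globals={'path': path}, number=200000)

print('startswith: %.4fs  slice: %.4fs' % (t_startswith, t_slice))

# Equivalence check, including the empty-string edge case:
for p in ('/abs/path', 'rel/path', ''):
    assert p.startswith('/') == (p[:1] == '/')
```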
566	+        if required_version < (1, 18):
567	+            # Remote side doesn't support inventory deltas. Wrap the stream to
568	+            # make sure we don't send any. If the stream contains inventory
569	+            # deltas we'll interrupt the smart insert_stream request and
570	+            # fallback to VFS.
571	+            stream = self._stop_stream_if_inventory_delta(stream)

^- It seems a bit of a shame that if we don't support deltas we fall back to
VFS completely, rather than trying something intermediate (like falling back
to the original code path of sending full inventory texts, or IDS, or ...). I
think we are probably okay, but this code at least raises a flag. I expect a
bug report along the lines of "fetching between 1.18 and an older server is
very slow". I haven't looked at all the code paths to determine whether 1.18
will have regressed against a 1.17 server, especially when *not* converting
formats. Have you at least manually tested this?

...

703	+        self.new_pack.finish_content()
704	+        if len(self.packs) == 1:
705	+            old_pack = self.packs[0]
706	+            if old_pack.name == self.new_pack._hash.hexdigest():
707	+                # The single old pack was already optimally packed.
708	+                self.new_pack.abort()
709	+                return None

^- I think this is a good fix, but:

1) It really should have a NEWS entry and a test case [it might have a test],
   as I'm sure there are people following the bug.
2) It probably should say something about the repository already being
   optimally packed. At a minimum a mutter(), but I'm thinking it is a
   reasonable 'note()'.
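[To illustrate what the hunk above is doing, here is a minimal stand-alone
sketch. It assumes, as the hunk suggests, that a pack's name is the hex digest
of its content hash (md5 used here for illustration); `note` is a stand-in for
bzrlib's trace function, showing where the suggested user-visible message
could go. None of these names are the actual bzrlib API.]

```python
import hashlib

def note(msg):
    # Stand-in for a trace/notification call; just prints.
    print(msg)

def repack_is_noop(old_pack_name, new_content):
    """Return True if repacking a single pack reproduced identical content.

    If the digest of the newly written content matches the old pack's
    name, the old pack was already optimally packed and the new pack
    can be aborted instead of replacing it.
    """
    new_name = hashlib.md5(new_content).hexdigest()
    if old_pack_name == new_name:
        note('repository already optimally packed; skipping repack')
        return True
    return False

content = b'pack-content-bytes'
assert repack_is_noop(hashlib.md5(content).hexdigest(), content)
assert not repack_is_noop('0' * 32, content)
```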
737	    def _get_source(self, to_format):
738	        """Return a source for streaming from this repository."""
739	-        if isinstance(to_format, remote.RemoteRepositoryFormat):
740	-            # Can't just check attributes on to_format with the current code,
741	-            # work around this:
742	-            to_format._ensure_real()
743	-            to_format = to_format._custom_format
744	-        if to_format.__class__ is self._format.__class__:
745	+        if (to_format.supports_chks and
746	+            self._format.repository_class is to_format.repository_class and
747	+            self._format._serializer == to_format._serializer):
748	            # We must be exactly the same format, otherwise stuff like the chk
749	-            # page layout might be different
750	+            # page layout might be different.
751	+            # Actually, this test is just slightly looser than exact so that
752	+            # CHK2 <-> 2a transfers will work.

^- I think this should probably be ".network_name() == other.network_name()"
and we just customize the names to be the same. Is that possible to do?

854	+    def iter_inventories(self, revision_ids, ordering='unordered'):

^- The "iter_inventories" API was written such that it always returned the
inventories in the *same* order as the supplied revisions, and we've had code
that did:

  for rev, inv in izip(revs, iter_inventories(revs)):
      ...

I've tried to get away from it, because of inv.revision_id, but I think this
is a significant API break that should be documented. That is the original
reason *why* _iter_inventory_xmls() always buffered the texts before yielding.

There are a few places that you didn't fix, such as:

bzrlib/bundle/serializer/v4.py:369:
  for inv in self.repository.iter_inventories(needed_inventories)

^- This, specifically, is one of the places I was working on with my bundles +
2a fixes, and the supplied ordering is *required* (though it happens to also
be topological). I'm concerned that we may have more layers that you don't
realize are built on top of this.
For example:

  def revision_trees(self, revision_ids):
      """Return Trees for revisions in this repository.

      :param revision_ids: a sequence of revision-ids;
        a revision-id may not be None or 'null:'
      """
      inventories = self.iter_inventories(revision_ids)
      for inv in inventories:
          yield RevisionTree(self, inv, inv.revision_id)

^- And "revision_trees" is *also* explicitly stated to return the trees in the
order requested. And you get stuff like:

bzrlib/repository.py:580:
  revtrees = list(self.repository.revision_trees(self.parents))

^- This is a rather serious one in "record_iter_changes", as it really
requires the ordering. We could layer the ordering on top of it, etc. However,
I really think that for *your* patch we shouldn't change the return ordering
of "iter_inventories". We could change it as:

  def iter_inventories(self, revisions, ordering=None):
      if ordering is None or ordering == 'as-requested':
          # return inventories in the exact order as requested
          ...

And then for new code that we know we can trust to handle results in arbitrary
order (or to make an explicit request on the order), we can do better. It also
gives us a way to transition, and to mark code that has been fixed. The
parameter could be extended to be passed from "revision_trees()".

1185	+        elif from_format.network_name() == self.to_format.network_name():
1186	+            # Same format.
1187	+            return self._get_simple_inventory_stream(revision_ids,
1188	+                missing=missing)
1189	+        elif (not from_format.supports_chks and not self.to_format.supports_chks
1190	+            and from_format._serializer == self.to_format._serializer):
1191	+            # Essentially the same format.
1192	+            return self._get_simple_inventory_stream(revision_ids,
1193	+                missing=missing)

^- Would the second "elif" be better as just an "or ..." on the first elif?

...

1434	+    def _should_fake_unknown(self):
1435	+        # This is a workaround for bugs in pre-1.18 clients that claim to

^- This seems interesting; is it well tested that this is triggered?
(And did you manually use an older bzr and test it with a newer server?)

1660	-        result.append((InterRepository,
1661	-                       weaverepo.RepositoryFormat5(),
1662	-                       knitrepo.RepositoryFormatKnit3()))
1663	-        result.append((InterRepository,
1664	-                       knitrepo.RepositoryFormatKnit1(),
1665	-                       knitrepo.RepositoryFormatKnit3()))
1666	-        result.append((InterRepository,
1667	-                       knitrepo.RepositoryFormatKnit1(),
1668	-                       knitrepo.RepositoryFormatKnit3()))
1669	-        result.append((InterKnitRepo,
1670	-                       knitrepo.RepositoryFormatKnit1(),
1671	-                       pack_repo.RepositoryFormatKnitPack1()))
1672	-        result.append((InterKnitRepo,
1673	-                       pack_repo.RepositoryFormatKnitPack1(),
1674	-                       knitrepo.RepositoryFormatKnit1()))
1675	-        result.append((InterKnitRepo,
1676	-                       knitrepo.RepositoryFormatKnit3(),
1677	-                       pack_repo.RepositoryFormatKnitPack3()))
1678	-        result.append((InterKnitRepo,
1679	-                       pack_repo.RepositoryFormatKnitPack3(),
1680	-                       knitrepo.RepositoryFormatKnit3()))
1681	-        result.append((InterKnitRepo,
1682	-                       pack_repo.RepositoryFormatKnitPack3(),
1683	-                       pack_repo.RepositoryFormatKnitPack4()))
1684	-        result.append((InterDifferingSerializer,
1685	-                       pack_repo.RepositoryFormatKnitPack1(),
1686	-                       pack_repo.RepositoryFormatKnitPack6RichRoot()))
1687	+        add_combo(weaverepo.RepositoryFormat5(),
1688	+                  knitrepo.RepositoryFormatKnit3())
1689	+        add_combo(knitrepo.RepositoryFormatKnit1(),
1690	+                  knitrepo.RepositoryFormatKnit3())
1691	+        add_combo(knitrepo.RepositoryFormatKnit1(),
1692	+                  pack_repo.RepositoryFormatKnitPack1())

^- So aside from some obvious duplication that needed to be removed, this
looks like it removes some test permutations, namely testing that both
"InterRepository" *and* "InterKnitRepo" work between certain repository
formats. I'm a little concerned that we are opening potential holes here. What
do you think?

I can't say that I did a fully thorough review of all the tests. My primary
concern is the change to "iter_inventories", which breaks a bunch of
assumptions from higher-up code.
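[Returning to that primary concern: a minimal stand-alone sketch of the
transition path proposed above, i.e. layering 'as-requested' ordering over a
producer that yields in arbitrary order. All names here are illustrative
stand-ins, not bzrlib's actual API; the point is that buffering results keyed
by revision id (as _iter_inventory_xmls effectively did) preserves the old
ordering contract while still allowing an explicit 'unordered' opt-in.]

```python
def iter_inventories(revision_ids, get_unordered, ordering='as-requested'):
    """Yield an inventory object for each id in revision_ids.

    get_unordered(revision_ids) may yield (revision_id, inventory) pairs
    in any order; with ordering='as-requested' we buffer and re-emit them
    in the order the caller asked for.
    """
    if ordering == 'unordered':
        for _, inv in get_unordered(revision_ids):
            yield inv
    elif ordering == 'as-requested':
        buffered = {}
        wanted = iter(revision_ids)
        next_id = next(wanted, None)
        for rev_id, inv in get_unordered(revision_ids):
            buffered[rev_id] = inv
            # Flush everything that can now be emitted in request order.
            while next_id is not None and next_id in buffered:
                yield buffered.pop(next_id)
                next_id = next(wanted, None)
    else:
        raise ValueError('unknown ordering: %r' % (ordering,))

# Example: a producer that happens to yield in reversed order.
def producer(ids):
    for rev_id in reversed(list(ids)):
        yield rev_id, 'inv-for-%s' % rev_id

result = list(iter_inventories(['a', 'b', 'c'], producer))
# result == ['inv-for-a', 'inv-for-b', 'inv-for-c']
```

A nice property of flushing incrementally rather than buffering everything
up front is that callers like the izip pattern quoted earlier keep working
unchanged, while memory use only grows as far as the producer's reordering
actually requires.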