+* InterDifferingSerializer has been removed. The transformations it
+ provided are now done automatically by StreamSource. (Andrew Bennetts)
+
^- So for starters this isn't true for now at least.
You added the debug flags "-DIDS:never" and "-DIDS:always". I wonder if they wouldn't be better as IDS_never and IDS_always, just because that would seem to fit better with how we have generally named them. Not a big deal.
^- I find the wrapping to be a bit strange, since this and the next function don't have any parameters on the first line. More importantly, though, these should probably at least have a minimal docstring to help understand what is going on.
It is nice that you were able to factor out these helpers so that we can consistently generate the root keys and their parents. It is a shame that _new_root_data_stream is being called on: rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph)
Which is implemented by iterating all revision trees, which requires parsing all of the revisions. Though it then immediately goes on to use those trees to generate the inventory deltas.
def _find_root_ids(self, revs, parent_map, graph): revision_root = {}
for tree in self.iter_rev_trees(revs): revision_id = tree.inventory.root.revision
root_id = tree.get_root_id() revision_root[revision_id] = root_id
# Find out which parents we don't already know root ids for
parents = set()
for revision_parents in parent_map.itervalues(): parents.update(revision_parents) parents.difference_update(revision_root.keys() + [NULL_REVISION])
^- We've certainly had this pattern enough that we really should consider factoring it out into a helper. But it seems the preferred form is:
That should perform better than what we have. (No need to generate a list from keys() nor append to it NULL_REVISION, creating potentially a 3rd all-keys list.)
# Limit to revisions present in the versionedfile
parents = graph.get_parent_map(parents).keys()
for tree in self.iter_rev_trees(parents):
root_id = tree.get_root_id() revision_root[tree.get_revision_id()] = root_id
return revision_root
I've been doing some more performance testing, and I've basically seen stuff
like:
bzr branch mysql-525 local -DIDS:always
3m15s
bzr branch mysql-525 local -DIDS:always --XML8-nocopy
2m30s
So it is slower, but only about 25% slower (except for the extra-optimized
XML8-nocopy).
I think the new code converts almost as fast, and the extra repacking from IDS
actually costs it a bit of time (other than saving it disk space). The only
major deficit at this point is that there is no progress indication. So I'd
probably stick with IDS for local operations just because of that.
I'll try to finish reviewing the rest, but at least my initial performance
concerns have been addressed.
+* InterDifferingS erializer has been removed. The transformations it
+ provided are now done automatically by StreamSource. (Andrew Bennetts)
+
^- So for starters this isn't true for now at least.
You added the debug flags "-DIDS:never" and "-DIDS:always". I wonder if they wouldn't be better as IDS_never and IDS_always, just because that would seem to fit better with how we have generally named them. Not a big deal.
+ data_stream( to_create, rev_id_ to_root_ id_map, parent_map, repo, graph=None):
+
+def _new_root_
+ root_keys_
^- I find the wrapping to be a bit strange, since this and the next function don't have any parameters on the first line. More importantly, though, these should probably at least have a minimal docstring to help understand what is going on.
It is nice that you were able to factor out these helpers so that we can consistently generate the root keys and their parents. It is a shame that _new_root_ data_stream is being called on:
rev_id_ to_root_ id = self._find_ root_ids( revs, parent_map, graph)
Which is implemented by iterating all revision trees, which requires parsing all of the revisions. Though it then immediately goes on to use those trees to generate the inventory deltas.
def _find_root_ ids(self, revs, parent_map, graph):
revision_ root = {} rev_trees( revs):
revision_ id = tree.inventory. root.revision
revision_ root[revision_ id] = root_id map.itervalues( ):
parents. update( revision_ parents)
parents. difference_ update( revision_ root.keys( ) + [NULL_REVISION])
for tree in self.iter_
root_id = tree.get_root_id()
# Find out which parents we don't already know root ids for
parents = set()
for revision_parents in parent_
^- We've certainly had this pattern enough that we really should consider factoring it out into a helper. But it seems the preferred form is:
parents = set() map.itervalues( )) difference_ update( revision_ root) discard( NULL_REVISION)
map(parents.update, parent_
parents.
parents.
That should perform better than what we have. (No need to generate a list from keys() nor append to it NULL_REVISION, creating potentially a 3rd all-keys list.)
# Limit to revisions present in the versionedfile parent_ map(parents) .keys() rev_trees( parents) :
revision_ root[tree. get_revision_ id()] = root_id
parents = graph.get_
for tree in self.iter_
root_id = tree.get_root_id()
return revision_root
I've been doing some more performance testing, and I've basically seen stuff
like:
bzr branch mysql-525 local -DIDS:always
3m15s
bzr branch mysql-525 local -DIDS:always --XML8-nocopy
2m30s
bzr branch mysql-525 local -DIDS:never
4m01s
bzr branch mysql-525 bzr://localhost /remote -DIDS:never
3m25s
So it is slower, but only about 25% slower (except for the extra-optimized
XML8-nocopy).
I think the new code converts almost as fast, and the extra repacking from IDS
actually costs it a bit of time (other than saving it disk space). The only
major deficit at this point is that there is no progress indication. So I'd
probably stick with IDS for local operations just because of that.
I'll try to finish reviewing the rest, but at least my initial performance
concerns have been addressed.