Merge lp:~jameinel/bzr/1.16-chk-direct into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.16-chk-direct
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 200 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.16-chk-direct
Reviewer        Review Type    Date Requested    Status
Martin Pool                                      Approve
Review via email: mp+7203@code.launchpad.net
John A Meinel (jameinel) wrote:

This is a bit more of an [RFC] than a strict [MERGE], but I figured it would be good to get some feedback on the route I'm taking.

With the attached patch (and most of my other proposed updates for commit with dev6-format repositories) I've managed to get the time for an initial commit of a MySQL repository down to roughly the same as the XML time.

For comparison (initial commit of a MySQL repository):

dev6 (unpatched)             15.5s
1.9 (xml)                    11.6s
dev6 + this patch            13.5s
dev6 + all proposed patches  11.6s

So this patch by itself saves approximately 2s, or 10-20% of the time for
initial commit.

The main change is to get away from calling CHKMap.map() for every object in
the commit, and instead to create a LeafNode and then _split() it.
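In outline, simplified from the chk_map.py hunk below (names are the patch's own; the real code also special-cases single-entry maps):

    # Old path (kept as _create_via_map): apply_delta() ends up calling
    # map() once per key, rebalancing the tree as it goes.
    delta = [(None, key, value) for key, value in initial_value.items()]
    root_key = result.apply_delta(delta)

    # New path (_create_directly): bulk-load a single LeafNode, then
    # _split() it once at the end if it outgrew maximum_size.
    node = LeafNode(search_key_func=search_key_func)
    node.set_maximum_size(maximum_size)
    node._items = dict(initial_value)
    if maximum_size and node._current_size() > maximum_size:
        prefix, node_details = node._split(store)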

This also includes my other change to improve InternalNode.map() performance
(_iter_nodes change: https://code.edge.launchpad.net/~jameinel/bzr/1.16-chkmap-updates/+merge/7113)

However, InternalNode.map() still has a lot of "did the LeafNode shrink"
overhead that we really don't care about for initial commit.

Hopefully someone will approve 7113 for me, and then I'll resubmit this, so we can see that it isn't a very big change.

Martin Pool (mbp) wrote:

I'm not very familiar with the chkmap code you're changing here, but I don't see any problems with this patch. Were you wanting to talk about any particular issue here, or is there a list thread about it?

review: Approve

Preview Diff

=== modified file 'bzrlib/chk_map.py'
--- bzrlib/chk_map.py 2009-06-15 14:49:27 +0000
+++ bzrlib/chk_map.py 2009-06-16 02:36:08 +0000
@@ -203,13 +203,57 @@
             multiple pages.
         :return: The root chk of the resulting CHKMap.
         """
-        result = CHKMap(store, None, search_key_func=search_key_func)
+        # root_key = klass._create_via_map(store, initial_value,
+        #     maximum_size=maximum_size, key_width=key_width,
+        #     search_key_func=search_key_func)
+        root_key = klass._create_directly(store, initial_value,
+            maximum_size=maximum_size, key_width=key_width,
+            search_key_func=search_key_func)
+        # if root_key != alt_root_key:
+        #     result1 = klass(store, root_key, search_key_func=search_key_func)
+        #     result2 = klass(store, alt_root_key,
+        #                     search_key_func=search_key_func)
+        #     import pdb; pdb.set_trace()
+        #     raise ValueError('Failed to serialize via leaf splitting.')
+        return root_key
+
+    @classmethod
+    def _create_via_map(klass, store, initial_value, maximum_size=0,
+                        key_width=1, search_key_func=None):
+        result = klass(store, None, search_key_func=search_key_func)
         result._root_node.set_maximum_size(maximum_size)
         result._root_node._key_width = key_width
         delta = []
         for key, value in initial_value.items():
             delta.append((None, key, value))
-        return result.apply_delta(delta)
+        root_key = result.apply_delta(delta)
+        return root_key
+
+    @classmethod
+    def _create_directly(klass, store, initial_value, maximum_size=0,
+                         key_width=1, search_key_func=None):
+        node = LeafNode(search_key_func=search_key_func)
+        node.set_maximum_size(maximum_size)
+        node._key_width = key_width
+        node._items = dict(initial_value)
+        node._raw_size = sum([node._key_value_len(key, value)
+                              for key, value in initial_value.iteritems()])
+        node._len = len(node._items)
+        node._compute_search_prefix()
+        node._compute_serialised_prefix()
+        if (node._len > 1
+            and maximum_size
+            and node._current_size() > maximum_size):
+            prefix, node_details = node._split(store)
+            if len(node_details) == 1:
+                raise AssertionError('Failed to split using node._split')
+            node = InternalNode(prefix, search_key_func=search_key_func)
+            node.set_maximum_size(maximum_size)
+            node._key_width = key_width
+            for split, subnode in node_details:
+                node.add_node(split, subnode)
+        keys = list(node.serialise(store))
+        return keys[-1]
 
     def iter_changes(self, basis):
         """Iterate over the changes between basis and self.
@@ -764,7 +808,19 @@
                 result[prefix] = node
             else:
                 node = result[prefix]
-            node.map(store, key, value)
+            sub_prefix, node_details = node.map(store, key, value)
+            if len(node_details) > 1:
+                if prefix != sub_prefix:
+                    # This node has been split and is now found via a different
+                    # path
+                    result.pop(prefix)
+                new_node = InternalNode(sub_prefix,
+                    search_key_func=self._search_key_func)
+                new_node.set_maximum_size(self._maximum_size)
+                new_node._key_width = self._key_width
+                for split, node in node_details:
+                    new_node.add_node(split, node)
+                result[prefix] = new_node
         return common_prefix, result.items()
 
     def map(self, store, key, value):

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-06-12 01:11:00 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-06-16 02:36:08 +0000
@@ -674,6 +674,73 @@
         return self._inventory_add_lines(revision_id, parents,
             inv_lines, check_content=False)
 
+    def _get_null_inventory(self):
+        serializer = self._format._serializer
+        null_inv = inventory.CHKInventory(serializer.search_key_name)
+        search_key_func = chk_map.search_key_registry.get(
+            serializer.search_key_name)
+        null_inv.id_to_entry = chk_map.CHKMap(self.chk_bytes,
+            None, search_key_func)
+        null_inv.id_to_entry._root_node.set_maximum_size(
+            serializer.maximum_size)
+        null_inv.parent_id_basename_to_file_id = chk_map.CHKMap(
+            self.chk_bytes, None, search_key_func)
+        null_inv.parent_id_basename_to_file_id._root_node.set_maximum_size(
+            serializer.maximum_size)
+        null_inv.parent_id_basename_to_file_id._root_node._key_width = 2
+        null_inv.root_id = None
+        return null_inv
+
+    def _create_inv_from_null(self, delta, new_revision_id):
+        """This will mutate new_inv directly.
+
+        This is a simplified form of create_by_apply_delta which knows that all
+        the old values must be None, so everything is a create.
+        """
+        serializer = self._format._serializer
+        new_inv = inventory.CHKInventory(serializer.search_key_name)
+        new_inv.revision_id = new_revision_id
+
+        entry_to_bytes = new_inv._entry_to_bytes
+        id_to_entry_dict = {}
+        parent_id_basename_dict = {}
+        for old_path, new_path, file_id, entry in delta:
+            if old_path is not None:
+                raise ValueError('Invalid delta, somebody tried to delete %r'
+                                 ' from the NULL_REVISION'
+                                 % ((old_path, file_id),))
+            if new_path is None:
+                raise ValueError('Invalid delta, delta from NULL_REVISION has'
+                                 ' no new_path %r' % (file_id,))
+            # file id changes
+            if new_path == '':
+                new_inv.root_id = file_id
+                parent_id_basename_key = '', ''
+            else:
+                utf8_entry_name = entry.name.encode('utf-8')
+                parent_id_basename_key = (entry.parent_id, utf8_entry_name)
+            new_value = entry_to_bytes(entry)
+            # Create Caches?
+            ## new_inv._path_to_fileid_cache[new_path] = file_id
+            id_to_entry_dict[(file_id,)] = new_value
+            parent_id_basename_dict[parent_id_basename_key] = file_id
+
+        search_key_func = chk_map.search_key_registry.get(
+            serializer.search_key_name)
+        maximum_size = serializer.maximum_size
+        root_key = chk_map.CHKMap.from_dict(self.chk_bytes, id_to_entry_dict,
+            maximum_size=maximum_size, key_width=1,
+            search_key_func=search_key_func)
+        new_inv.id_to_entry = chk_map.CHKMap(self.chk_bytes, root_key,
+            search_key_func)
+        root_key = chk_map.CHKMap.from_dict(self.chk_bytes,
+            parent_id_basename_dict,
+            maximum_size=maximum_size, key_width=1,
+            search_key_func=search_key_func)
+        new_inv.parent_id_basename_to_file_id = chk_map.CHKMap(self.chk_bytes,
+            root_key, search_key_func)
+        return new_inv
+
     def add_inventory_by_delta(self, basis_revision_id, delta, new_revision_id,
                                parents, basis_inv=None, propagate_caches=False):
         """Add a new inventory expressed as a delta against another revision.
@@ -699,24 +766,29 @@
             repository format specific) of the serialized inventory, and the
             resulting inventory.
         """
-        if basis_revision_id == _mod_revision.NULL_REVISION:
-            return KnitPackRepository.add_inventory_by_delta(self,
-                basis_revision_id, delta, new_revision_id, parents)
         if not self.is_in_write_group():
             raise AssertionError("%r not in write group" % (self,))
         _mod_revision.check_not_reserved_id(new_revision_id)
-        basis_tree = self.revision_tree(basis_revision_id)
-        basis_tree.lock_read()
-        try:
-            if basis_inv is None:
+        basis_tree = None
+        if basis_inv is None:
+            if basis_revision_id == _mod_revision.NULL_REVISION:
+                new_inv = self._create_inv_from_null(delta, new_revision_id)
+                inv_lines = new_inv.to_lines()
+                return self._inventory_add_lines(new_revision_id, parents,
+                    inv_lines, check_content=False), new_inv
+            else:
+                basis_tree = self.revision_tree(basis_revision_id)
+                basis_tree.lock_read()
                 basis_inv = basis_tree.inventory
+        try:
             result = basis_inv.create_by_apply_delta(delta, new_revision_id,
                 propagate_caches=propagate_caches)
             inv_lines = result.to_lines()
             return self._inventory_add_lines(new_revision_id, parents,
                 inv_lines, check_content=False), result
         finally:
-            basis_tree.unlock()
+            if basis_tree is not None:
+                basis_tree.unlock()
 
     def _iter_inventories(self, revision_ids):
         """Iterate over many inventory objects."""