Merge lp:~jelmer/bzr/faster-stacked-tree-build into lp:bzr

Proposed by Jelmer Vernooij
Status: Rejected
Rejected by: Jelmer Vernooij
Proposed branch: lp:~jelmer/bzr/faster-stacked-tree-build
Merge into: lp:bzr
Diff against target: 741 lines (+284/-80)
14 files modified
bzrlib/__init__.py (+4/-7)
bzrlib/bzrdir.py (+21/-4)
bzrlib/chk_map.py (+9/-0)
bzrlib/remote.py (+5/-3)
bzrlib/repofmt/groupcompress_repo.py (+56/-3)
bzrlib/smart/repository.py (+4/-0)
bzrlib/tests/blackbox/test_branch.py (+2/-3)
bzrlib/tests/per_branch/test_stacking.py (+12/-10)
bzrlib/tests/per_repository/test_commit_builder.py (+19/-13)
bzrlib/tests/per_repository_reference/test_commit_with_stacking.py (+11/-17)
bzrlib/tests/test_remote.py (+42/-19)
bzrlib/tests/test_repository.py (+2/-1)
bzrlib/vf_search.py (+93/-0)
doc/en/release-notes/bzr-2.5.txt (+4/-0)
To merge this branch: bzr merge lp:~jelmer/bzr/faster-stacked-tree-build
Reviewer: Vincent Ladeuil (status: Pending)
Review via email: mp+82416@code.launchpad.net

This proposal supersedes a proposal from 2011-08-03.

Description of the change

This branch was originally submitted by spiv. Here is his cover letter:

Here's my branch that makes 'bzr branch --stacked $SMART_SERVER_URL' much, much faster. It adds a new kind of search, “texts for revision”, and adds that to the HPSS protocol (without adding a new verb, just a new search body; as usual, it will fall back to VFS if the server doesn't support the new thing), and checks for the new search in get_stream.
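As a rough standalone model of the protocol piece: the server parses the search body line by line, and any unrecognised search kind gets a BadSearch reply, which is exactly what an old server sends and what lets the client fall back to VFS. The class and return shapes below are simplified stand-ins, not bzrlib's actual API (the real parsing lives in bzrlib/smart/repository.py):

```python
class TextsForRevisionSearchResult:
    """Stand-in for the new search kind: 'all texts needed to build a
    checkout of this revision'."""
    def __init__(self, revision_id):
        self.revision_id = revision_id

def recreate_search(body):
    """Parse an HPSS search body; return (search, error), one of them None."""
    lines = body.split('\n')
    if lines[0] == 'texts-for-rev':
        if len(lines) > 2:
            return (None, ('BadSearch',))   # malformed body
        return (TextsForRevisionSearchResult(lines[1]), None)
    # Unrecognised search kind: same BadSearch an old server would give,
    # which triggers the client-side VFS fallback.
    return (None, ('BadSearch',))

search, err = recreate_search('texts-for-rev\nrev-123')
print(search.revision_id)  # rev-123
```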

The basic idea here is that to build a working tree we need to fetch all the texts, and if the new local repo is stacked we may as well fetch all that data in a single stream and save it into that repo (rather than fetching it text-by-text when we try to build the tree from the empty local repo stacked on the remote fallback).
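That gating decision can be sketched as a small predicate (illustrative names only; the real check lives in the sprout path in bzrdir.py and uses bzrlib types):

```python
NULL_REVISION = 'null:'

def choose_fetch_spec(revision_id, branch_tip, create_tree, is_new_repo,
                      target_is_stacked):
    """Pick a fetch spec when sprouting a branch.

    Only when we are about to build a working tree in a brand-new stacked
    repository do we request the 'texts for revision' stream, so all the
    texts the tree needs arrive in one fetch instead of one VFS round trip
    per file later.
    """
    tip = revision_id or branch_tip or NULL_REVISION
    if (create_tree and is_new_repo and tip != NULL_REVISION
            and target_is_stacked):
        return ('texts-for-rev', tip)
    return ('normal', tip)

print(choose_fetch_spec(None, 'rev-9', True, True, True))
print(choose_fetch_spec(None, 'rev-9', True, True, False))
```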

I forget exactly what is missing from this branch, but probably not much. I think mainly just some clarity about the new SearchResult APIs it adds, and that's perhaps best resolved by having other folks review this anyway. So: review away!

(This is actually all code from while I was still working at Canonical, but the flu stopped me from having time to give it a final self-review and/or polish, so here it is without that.)

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

I haven't read it extremely carefully but it looks basically ok.
Thanks for pushing it up.

It would be nice to get some NEWS and What's New entries that give
some indication of how much performance has actually changed.

Martin Pool (mbp) wrote : Posted in a previous version of this proposal

+ # XXX: this is a gross hack.
+ mutter('HACK adding texts-for-rev:%s', tip)

[tweak] Can you add something more about what's specifically wrong with it or what ought to be done instead (even just a pointer to bug 791704)? Otherwise, for someone who comes to look at it, it's just kind of rubbing it in.

I presume what you're saying is gross is the special case for when this fires? Or is it that it is done as a special search at all? It does look like it will cause a similar performance cliff to what we currently have when you branch into a new empty repository as opposed to creating a new repository.

[optional] Thinking this through makes me wonder if we should have a debug option to force this on or force it off, so that we can eg use it as a workaround and can more easily compare the performance.

+# def get_full_keys(self):
+# return set(sr.get_full_keys() for sr in self.search_results)

[query] If this is really dead can you just remove it? Likewise below.

+ def __init__(self, revision_id, repo=None):
+ """Constructor.
+
+ :param revision_id: A revision ID. All records needed to create a
+ checkout of this revision are referenced by this result (i.e. the
+ revision record and corresponding inventory record, and all
+ referenced texts and chk_bytes).
+ :param repo: An optional repository. If not given, then
+ ``get_full_keys`` cannot be used.
+ """
+ self.revision_id = revision_id
+# self.repo = repo

[query] Is the repo really unused? Maybe the function prototype or docstring needs to be changed?

+ if self.texts_for_inv is not None:
+ # Emit all CHK records for this inventory, and add all texts
+ # those nodes reference.
+ keys = set()
+ for node in self.texts_for_inv.id_to_entry.iter_nodes():
+ keys.add(node.key())
+ if type(node) is chk_map.LeafNode:
+ self._text_keys.update(
+ [chk_map._bytes_to_text_key(b) for b in
+ node._items.values()])
+ stream = self.from_repository.chk_bytes.get_record_stream(
+ keys, 'unordered', True)
+ for record in stream:
+ yield record
+ uninteresting_root_keys.add(
+ self.texts_for_inv.id_to_entry.key())

[comment] This seems to be getting kind of large for a nested function but perhaps refactoring it is out of scope.

[query] It's not obvious to me that there are tests for this actually being reached, but I might be just missing that.

I'd like to get a second review, probably from John, and someone should also build this and just play with it to check the behaviour is reasonable. This needs to be considered as a risk factor for the landing on lp (maybe we should have somewhere to accumulate a list of them).
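For what the _filter_id_to_entry hunk quoted above is doing, a standalone model may help: it walks every node of a CHK-style map (root plus children) to collect the CHK record keys to stream and the text keys the leaves reference. LeafNode/InternalNode here are simplified stand-ins for bzrlib's chk_map classes, not the real ones:

```python
class LeafNode:
    def __init__(self, key, items):
        self._key = key
        self._items = items          # maps file-id -> text key
    def key(self):
        return self._key

class InternalNode:
    def __init__(self, key, children):
        self._key = key
        self._children = children
    def key(self):
        return self._key

def iter_nodes(root):
    """Yield the root node, then (for an internal root) its children."""
    yield root
    if type(root) is InternalNode:
        for child in root._children:
            yield child

def collect_keys(root):
    """Gather all CHK node keys plus the text keys referenced by leaves."""
    chk_keys, text_keys = set(), set()
    for node in iter_nodes(root):
        chk_keys.add(node.key())
        if type(node) is LeafNode:
            text_keys.update(node._items.values())
    return chk_keys, text_keys

root = InternalNode(('sha1:root',), [
    LeafNode(('sha1:leaf1',), {'f1-id': ('f1-id', 'rev-1')}),
    LeafNode(('sha1:leaf2',), {'f2-id': ('f2-id', 'rev-2')}),
])
chk_keys, text_keys = collect_keys(root)
print(sorted(text_keys))  # [('f1-id', 'rev-1'), ('f2-id', 'rev-2')]
```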

Martin Pool (mbp) wrote : Posted in a previous version of this proposal

So, I can do these tweaks, if spiv is busy, but I would definitely like a second review.

Vincent Ladeuil (vila) wrote : Posted in a previous version of this proposal

@jam: AIUI you're working on this bug now; what should be the status of *this* proposal in that respect?

review: Needs Information
Jelmer Vernooij (jelmer) wrote : Posted in a previous version of this proposal

I'm looking at finishing this.

Andrew Bennetts (spiv) wrote :

On Wed, Nov 16, 2011 at 03:52:39PM -0000, Jelmer Vernooij wrote:
> Jelmer Vernooij has proposed merging lp:~jelmer/bzr/faster-stacked-tree-build into lp:bzr.

Thanks for picking this up!

-Andrew.

Preview Diff

=== modified file 'bzrlib/__init__.py'
--- bzrlib/__init__.py	2012-01-16 17:57:43 +0000
+++ bzrlib/__init__.py	2012-01-25 17:56:30 +0000
@@ -148,13 +148,10 @@
     """
     try:
         import ctypes
-    except ImportError:
-        return
-    pythonapi = getattr(ctypes, "pythonapi", None)
-    if pythonapi is None:
-        # Not CPython ctypes implementation
-        return
-    old_ptr = ctypes.c_void_p.in_dll(pythonapi, "Py_FileSystemDefaultEncoding")
+    old_ptr = ctypes.c_void_p.in_dll(ctypes.pythonapi,
+        "Py_FileSystemDefaultEncoding")
+    except (ImportError, ValueError):
+    return # No ctypes or not CPython implementation, do nothing
     new_ptr = ctypes.cast(ctypes.c_char_p(intern(new_enc)), ctypes.c_void_p)
     old_ptr.value = new_ptr.value
     if sys.getfilesystemencoding() != new_enc:

=== modified file 'bzrlib/bzrdir.py'
--- bzrlib/bzrdir.py	2012-01-18 20:27:41 +0000
+++ bzrlib/bzrdir.py	2012-01-25 17:56:30 +0000
@@ -410,8 +410,27 @@
         else:
             target_repo_kind = fetch.TargetRepoKinds.PREEXISTING
         fetch_spec_factory.target_repo_kind = target_repo_kind
+        create_tree = (create_tree_if_local and not result.has_workingtree() and
+            isinstance(target_transport, local.LocalTransport) and
+            (result_repo is None or result_repo.make_working_trees()))
+
         if source_repository is not None:
-            fetch_spec = fetch_spec_factory.make_fetch_spec()
+            if revision_id is not None:
+                tip = revision_id
+            elif source_branch is not None:
+                tip = source_branch.last_revision()
+            else:
+                tip = _mod_revision.NULL_REVISION
+            if (create_tree and is_new_repo
+                    and tip != _mod_revision.NULL_REVISION
+                    and target_repo_kind == fetch.TargetRepoKinds.STACKED):
+                # We're creating a tree for a stacked branch, so make sure we
+                # fetch the texts we're about to need to construct that tree.
+                # XXX: this is a gross hack.
+                mutter('HACK adding texts-for-rev:%s', tip)
+                fetch_spec = vf_search.TextsForRevisionSearchResult(tip)
+            else:
+                fetch_spec = fetch_spec_factory.make_fetch_spec()
             result_repo.fetch(source_repository, fetch_spec=fetch_spec)

         if source_branch is None:
@@ -426,9 +445,7 @@
             mutter("created new branch %r" % (result_branch,))

         # Create/update the result working tree
-        if (create_tree_if_local and not result.has_workingtree() and
-            isinstance(target_transport, local.LocalTransport) and
-            (result_repo is None or result_repo.make_working_trees())):
+        if create_tree:
             wt = result.create_workingtree(accelerator_tree=accelerator_tree,
                 hardlink=hardlink, from_branch=result_branch)
             wt.lock_write()

=== modified file 'bzrlib/chk_map.py'
--- bzrlib/chk_map.py	2011-12-19 13:23:58 +0000
+++ bzrlib/chk_map.py	2012-01-25 17:56:30 +0000
@@ -222,6 +222,15 @@
             result.append('      %r %r' % (tuple(key), value))
         return result

+    def iter_nodes(self):
+        """Yield all nodes of this CHK map."""
+        self._ensure_root()
+        node = self._root_node
+        yield node
+        if type(node) is InternalNode:
+            for node, node_filter in node._iter_nodes(self._store):
+                yield node
+
     @classmethod
     def from_dict(klass, store, initial_value, maximum_size=0, key_width=1,
                   search_key_func=None):

=== modified file 'bzrlib/remote.py'
--- bzrlib/remote.py	2012-01-19 14:17:55 +0000
+++ bzrlib/remote.py	2012-01-25 17:56:30 +0000
@@ -2998,14 +2998,16 @@
             except errors.UnknownSmartMethod:
                 medium._remember_remote_is_before(version)
             except errors.UnknownErrorFromSmartServer, e:
-                if isinstance(search, vf_search.EverythingResult):
+                new_in_2_5_searches = (
+                    vf_search.EverythingResult, vf_search.TextsForRevisionSearchResult)
+                if isinstance(search, new_in_2_5_searches):
                     error_verb = e.error_from_smart_server.error_verb
                     if error_verb == 'BadSearch':
-                        # Pre-2.4 servers don't support this sort of search.
+                        # Pre-2.5 servers don't support this sort of search.
                         # XXX: perhaps falling back to VFS on BadSearch is a
                         # good idea in general? It might provide a little bit
                         # of protection against client-side bugs.
-                        medium._remember_remote_is_before((2, 4))
+                        medium._remember_remote_is_before((2, 5))
                         break
                 raise
             else:

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py	2011-12-19 19:15:58 +0000
+++ bzrlib/repofmt/groupcompress_repo.py	2012-01-25 17:56:30 +0000
@@ -786,7 +786,8 @@
                 pass
             except errors.NoSuchRevision, e:
                 problems.append(
-                    "missing chk node(s) for parent_id_basename_to_file_id maps")
+                    "missing chk node(s) for parent_id_basename_to_file_id maps: %s"
+                    % (str(e),))
         present_text_keys = no_fallback_texts_index.get_parent_map(text_keys)
         missing_text_keys = text_keys.difference(present_text_keys)
         if missing_text_keys:
@@ -1158,6 +1159,8 @@
         self._text_fetch_order = 'groupcompress'
         self._chk_id_roots = None
         self._chk_p_id_roots = None
+        self.texts_for_rev = None
+        self.texts_for_inv = None

     def _get_inventory_stream(self, inventory_keys, allow_absent=False):
         """Get a stream of inventory texts.
@@ -1167,6 +1170,7 @@
         """
         self._chk_id_roots = []
         self._chk_p_id_roots = []
+        self._texts_for_inv = None
         def _filtered_inv_stream():
             id_roots_set = set()
             p_id_roots_set = set()
@@ -1180,8 +1184,13 @@
                     else:
                         raise errors.NoSuchRevision(self, record.key)
                 bytes = record.get_bytes_as('fulltext')
-                chk_inv = inventory.CHKInventory.deserialise(None, bytes,
-                    record.key)
+                if record.key[-1] == self.texts_for_rev:
+                    chk_inv = inventory.CHKInventory.deserialise(
+                        self.from_repository.chk_bytes, bytes, record.key)
+                    self.texts_for_inv = chk_inv
+                else:
+                    chk_inv = inventory.CHKInventory.deserialise(None, bytes,
+                        record.key)
                 key = chk_inv.id_to_entry.key()
                 if key not in id_roots_set:
                     self._chk_id_roots.append(key)
@@ -1222,6 +1231,22 @@
                 inv.parent_id_basename_to_file_id.key())
         chk_bytes = self.from_repository.chk_bytes
         def _filter_id_to_entry():
+            if self.texts_for_inv is not None:
+                # Emit all CHK records for this inventory, and add all texts
+                # those nodes reference.
+                keys = set()
+                for node in self.texts_for_inv.id_to_entry.iter_nodes():
+                    keys.add(node.key())
+                    if type(node) is chk_map.LeafNode:
+                        self._text_keys.update(
+                            [chk_map._bytes_to_text_key(b) for b in
+                             node._items.values()])
+                stream = self.from_repository.chk_bytes.get_record_stream(
+                    keys, 'unordered', True)
+                for record in stream:
+                    yield record
+                uninteresting_root_keys.add(
+                    self.texts_for_inv.id_to_entry.key())
             interesting_nodes = chk_map.iter_interesting_nodes(chk_bytes,
                 self._chk_id_roots, uninteresting_root_keys)
             for record in _filter_text_keys(interesting_nodes, self._text_keys,
@@ -1232,12 +1257,25 @@
             self._chk_id_roots = None
         yield 'chk_bytes', _filter_id_to_entry()
         def _get_parent_id_basename_to_file_id_pages():
+            if self.texts_for_inv is not None:
+                # Emit all CHK records for this inventory.
+                keys = set()
+                inv = self.texts_for_inv
+                for node in inv.parent_id_basename_to_file_id.iter_nodes():
+                    keys.add(node.key())
+                stream = self.from_repository.chk_bytes.get_record_stream(
+                    keys, 'unordered', True)
+                for record in stream:
+                    yield record
+                uninteresting_pid_root_keys.add(
+                    inv.parent_id_basename_to_file_id.key())
             for record, items in chk_map.iter_interesting_nodes(chk_bytes,
                     self._chk_p_id_roots, uninteresting_pid_root_keys):
                 if record is not None:
                     yield record
             # Consumed
             self._chk_p_id_roots = None
+            self.texts_for_inv = None
         yield 'chk_bytes', _get_parent_id_basename_to_file_id_pages()

     def _get_text_stream(self):
@@ -1260,6 +1298,20 @@
                 yield record

         revision_ids = search.get_keys()
+        texts_for_rev = None
+        for special in search.get_special():
+            if special[0] == 'texts-for-rev':
+                if texts_for_rev is not None:
+                    raise errors.BzrError("Cannot handle special search: %r" %
+                        (search,))
+                texts_for_rev = special[1]
+                if texts_for_rev not in revision_ids:
+                    raise AssertionError('texts_for_rev %r not in %r'
+                        % (texts_for_rev, revision_ids))
+            else:
+                raise errors.BzrError("Cannot handle special search: %r" %
+                    (search,))
+        revision_ids = frozenset(revision_ids)
         pb = ui.ui_factory.nested_progress_bar()
         rc = self._record_counter
         self._record_counter.setup(len(revision_ids))
@@ -1278,6 +1330,7 @@
             # Clear the repo's get_parent_map cache too.
             self.from_repository._unstacked_provider.disable_cache()
             self.from_repository._unstacked_provider.enable_cache()
+        self.texts_for_rev = texts_for_rev
         s = self._get_inventory_stream(self._revision_keys)
         yield (s[0], wrap_and_count(pb, rc, s[1]))
         self.from_repository.inventories.clear_cache()

=== modified file 'bzrlib/smart/repository.py'
--- bzrlib/smart/repository.py	2011-12-19 13:23:58 +0000
+++ bzrlib/smart/repository.py	2012-01-25 17:56:30 +0000
@@ -99,6 +99,10 @@
         elif lines[0] == 'search':
             return self.recreate_search_from_recipe(repository, lines[1:],
                 discard_excess=discard_excess)
+        elif lines[0] == 'texts-for-rev':
+            if len(lines) > 2:
+                return (None, FailedSmartServerResponse(('BadSearch',)))
+            return (vf_search.TextsForRevisionSearchResult(lines[1]), None)
         else:
             return (None, FailedSmartServerResponse(('BadSearch',)))

=== modified file 'bzrlib/tests/blackbox/test_branch.py'
--- bzrlib/tests/blackbox/test_branch.py	2012-01-19 12:01:12 +0000
+++ bzrlib/tests/blackbox/test_branch.py	2012-01-25 17:56:30 +0000
@@ -424,13 +424,12 @@

     def test_branch_stacked(self):
         # We have a mainline
-        trunk_tree = self.make_branch_and_tree('mainline',
-            format='1.9')
+        trunk_tree = self.make_branch_and_tree('mainline')
         original_revid = trunk_tree.commit('mainline')
         self.assertRevisionInRepository('mainline', original_revid)
         # and a branch from it which is stacked
         out, err = self.run_bzr(['branch', '--stacked', 'mainline',
-            'newbranch'])
+            'newbranch', '--no-tree'])
         self.assertEqual('', out)
         self.assertEqual('Created new stacked branch referring to %s.\n' %
             trunk_tree.branch.base, err)

=== modified file 'bzrlib/tests/per_branch/test_stacking.py'
--- bzrlib/tests/per_branch/test_stacking.py	2011-11-24 17:24:21 +0000
+++ bzrlib/tests/per_branch/test_stacking.py	2012-01-25 17:56:30 +0000
@@ -143,7 +143,8 @@
         trunk_revid = trunk_tree.commit('mainline')
         # and make branch from it which is stacked
         try:
-            new_dir = trunk_tree.bzrdir.sprout('newbranch', stacked=True)
+            new_dir = trunk_tree.bzrdir.sprout('newbranch', stacked=True,
+                create_tree_if_local=False)
         except unstackable_format_errors, e:
             raise TestNotApplicable(e)
         # stacked repository
@@ -160,14 +161,16 @@
         trunk_revid = trunk_tree.commit('mainline')
         # Make sure that we can make a stacked branch from it
         try:
-            trunk_tree.bzrdir.sprout('testbranch', stacked=True)
+            trunk_tree.bzrdir.sprout('testbranch', stacked=True,
+                create_tree_if_local=False)
         except unstackable_format_errors, e:
             raise TestNotApplicable(e)
         # Now serve the original mainline from a smart server
         remote_transport = self.make_smart_server('mainline')
         remote_bzrdir = bzrdir.BzrDir.open_from_transport(remote_transport)
         # and make branch from the smart server which is stacked
-        new_dir = remote_bzrdir.sprout('newbranch', stacked=True)
+        new_dir = remote_bzrdir.sprout('newbranch', stacked=True,
+            create_tree_if_local=False)
         # stacked repository
         self.assertRevisionNotInRepository('newbranch', trunk_revid)
         tree = new_dir.open_branch().create_checkout('local')
@@ -187,7 +190,8 @@
         mainline_revid = 'rev-1'
         # and make branch from it which is stacked (with no tags)
         try:
-            new_dir = trunk.bzrdir.sprout(self.get_url('newbranch'), stacked=True)
+            new_dir = trunk.bzrdir.sprout(self.get_url('newbranch'),
+                stacked=True, create_tree_if_local=False)
         except unstackable_format_errors, e:
             raise TestNotApplicable(e)
         # stacked repository
@@ -534,14 +538,12 @@
         stack_on.commit('first commit', rev_id='rev1')
         try:
             stacked_dir = stack_on.bzrdir.sprout(
-                self.get_url('stacked'), stacked=True)
+                self.get_url('stacked'), stacked=True,
+                create_tree_if_local=False)
         except unstackable_format_errors, e:
             raise TestNotApplicable('Format does not support stacking.')
-        try:
-            stacked = stacked_dir.open_workingtree()
-        except errors.NoWorkingTree:
-            stacked = stacked_dir.open_branch().create_checkout(
-                'stacked-checkout', lightweight=True)
+        stacked = stacked_dir.open_branch().create_checkout(
+            'stacked-checkout', lightweight=True)
         tree = stacked.branch.create_checkout('local')
         tree.commit('second commit', rev_id='rev2')
         # Sanity check: stacked's repo should not contain rev1, otherwise this

=== modified file 'bzrlib/tests/per_repository/test_commit_builder.py'
--- bzrlib/tests/per_repository/test_commit_builder.py	2011-12-22 16:17:28 +0000
+++ bzrlib/tests/per_repository/test_commit_builder.py	2012-01-25 17:56:30 +0000
@@ -284,6 +284,7 @@
         if not builder.supports_record_entry_contents:
             raise tests.TestNotApplicable("CommitBuilder doesn't support "
                 "record_entry_contents")
+        builder.will_record_deletes()
         ie = inventory.make_entry('directory', '', None,
             tree.get_root_id())
         delta, version_recorded, fs_hash = builder.record_entry_contents(
@@ -299,7 +300,7 @@
         if got_new_revision:
             self.assertEqual(('', '', ie.file_id, ie), delta)
             # The delta should be tracked
-            self.assertEqual(delta, builder._basis_delta[-1])
+            self.assertEqual(delta, builder.get_basis_delta()[-1])
         else:
             self.assertEqual(None, delta)
         # Directories do not get hashed.
@@ -441,7 +442,7 @@
             # the delta should be returned, and recorded in _basis_delta
             delta = builder.record_delete("foo", "foo-id")
             self.assertEqual(("foo", None, "foo-id", None), delta)
-            self.assertEqual(delta, builder._basis_delta[-1])
+            self.assertEqual(delta, builder.get_basis_delta()[-1])
             builder.finish_inventory()
             rev_id2 = builder.commit('delete foo')
         except:
@@ -463,13 +464,14 @@
         try:
             builder = tree.branch.get_commit_builder([rev_id])
             try:
+                builder.will_record_deletes()
                 delete_change = ('foo-id', ('foo', None), True, (True, False),
                     (tree.path2id(''), None), ('foo', None), ('file', None),
                     (False, None))
                 list(builder.record_iter_changes(tree, rev_id,
                     [delete_change]))
                 self.assertEqual(("foo", None, "foo-id", None),
-                    builder._basis_delta[0])
+                    builder.get_basis_delta()[0])
                 self.assertTrue(builder.any_changes())
                 builder.finish_inventory()
                 rev_id2 = builder.commit('delete foo')
@@ -866,6 +868,7 @@
         if not builder.supports_record_entry_contents:
             raise tests.TestNotApplicable("CommitBuilder doesn't "
                 "support record_entry_contents")
+        builder.will_record_deletes()
         parent_tree = tree.basis_tree()
         parent_tree.lock_read()
         self.addCleanup(parent_tree.unlock)
@@ -913,7 +916,8 @@
         if delta_against_basis:
             expected_delta = (name, new_name, file_id, new_entry)
             # The delta should be recorded
-            self.assertEqual(expected_delta, builder._basis_delta[-1])
+            self.assertEqual(expected_delta,
+                builder.get_basis_delta()[-1])
         else:
             expected_delta = None
             self.assertEqual(expected_delta, delta)
@@ -954,6 +958,7 @@
         # record_entry_contents.
         parent_ids = tree.get_parent_ids()
         builder = tree.branch.get_commit_builder(parent_ids)
+        builder.will_record_deletes()
         parent_tree = tree.basis_tree()
         parent_tree.lock_read()
         self.addCleanup(parent_tree.unlock)
@@ -975,15 +980,6 @@
             self.assertEqualStat(result[2][1], tree_file_stat[1])
         else:
             self.assertEqual([], result)
-        delta = builder._basis_delta
-        delta_dict = dict((change[2], change) for change in delta)
-        version_recorded = (file_id in delta_dict and
-            delta_dict[file_id][3] is not None and
-            delta_dict[file_id][3].revision == builder._new_revision_id)
-        if records_version:
-            self.assertTrue(version_recorded)
-        else:
-            self.assertFalse(version_recorded)
         self.assertIs(None, builder.new_inventory)
         builder.finish_inventory()
         if tree.branch.repository._format.supports_full_versioned_files:
@@ -993,6 +989,16 @@
             self.assertEqual(inv_sha1, builder.inv_sha1)
             self.assertIs(None, builder.new_inventory)
         rev2 = builder.commit('')
+        delta = builder.get_basis_delta()
+        delta_dict = dict((change[2], change) for change in delta)
+        version_recorded = (file_id in delta_dict and
+            delta_dict[file_id][3] is not None and
+            delta_dict[file_id][3].revision == rev2)
+        if records_version:
+            self.assertTrue(version_recorded)
+        else:
+            self.assertFalse(version_recorded)
+
         new_inventory = builder.revision_tree().inventory
         new_entry = new_inventory[file_id]
         if delta_against_basis:

=== modified file 'bzrlib/tests/per_repository_reference/test_commit_with_stacking.py'
--- bzrlib/tests/per_repository_reference/test_commit_with_stacking.py	2011-01-11 00:36:10 +0000
+++ bzrlib/tests/per_repository_reference/test_commit_with_stacking.py	2012-01-25 17:56:30 +0000
@@ -39,13 +39,10 @@
         base_tree.commit('base adds f2', rev_id=self.r2_key[0])
         stacked_url = urlutils.join(base_tree.branch.base, '../stacked')
         stacked_bzrdir = base_tree.bzrdir.sprout(stacked_url,
-            stacked=True)
-        if isinstance(stacked_bzrdir, remote.RemoteBzrDir):
-            stacked_branch = stacked_bzrdir.open_branch()
-            stacked_tree = stacked_branch.create_checkout('stacked',
-                lightweight=True)
-        else:
-            stacked_tree = stacked_bzrdir.open_workingtree()
+            stacked=True, create_tree_if_local=False)
+        stacked_branch = stacked_bzrdir.open_branch()
+        stacked_tree = stacked_branch.create_checkout('stacked-tree',
+            lightweight=True)
         return base_tree, stacked_tree


@@ -79,7 +76,7 @@
         base_tree, stacked_tree = self.make_stacked_target()
         self.assertEqual(1,
             len(stacked_tree.branch.repository._fallback_repositories))
-        self.build_tree_contents([('stacked/f1.txt', 'new content\n')])
+        self.build_tree_contents([('stacked-tree/f1.txt', 'new content\n')])
         stacked_tree.commit('new content', rev_id='new-rev-id')
         # We open the repository without fallbacks to ensure the data is
         # locally true
@@ -142,7 +139,7 @@
     def test_multi_stack(self):
         """base + stacked + stacked-on-stacked"""
         base_tree, stacked_tree = self.make_stacked_target()
-        self.build_tree(['stacked/f3.txt'])
+        self.build_tree(['stacked-tree/f3.txt'])
         stacked_tree.add(['f3.txt'], ['f3.txt-id'])
         stacked_key = ('stacked-rev-id',)
         stacked_tree.commit('add f3', rev_id=stacked_key[0])
@@ -153,16 +150,13 @@
         stacked2_url = urlutils.join(base_tree.branch.base, '../stacked2')
         stacked2_bzrdir = stacked_tree.bzrdir.sprout(stacked2_url,
             revision_id=self.r1_key[0],
-            stacked=True)
-        if isinstance(stacked2_bzrdir, remote.RemoteBzrDir):
-            stacked2_branch = stacked2_bzrdir.open_branch()
-            stacked2_tree = stacked2_branch.create_checkout('stacked2',
-                lightweight=True)
-        else:
-            stacked2_tree = stacked2_bzrdir.open_workingtree()
+            stacked=True, create_tree_if_local=False)
+        stacked2_branch = stacked2_bzrdir.open_branch()
+        stacked2_tree = stacked2_branch.create_checkout('stacked2-tree',
+            lightweight=True)
         # stacked2 is stacked on stacked, but its content is rev1, so
         # it needs to pull the basis information from a fallback-of-fallback.
-        self.build_tree(['stacked2/f3.txt'])
+        self.build_tree(['stacked2-tree/f3.txt'])
         stacked2_only_repo = self.get_only_repo(stacked2_tree)
         self.assertPresent([], stacked2_only_repo.inventories,
             [self.r1_key, self.r2_key])

=== modified file 'bzrlib/tests/test_remote.py'
--- bzrlib/tests/test_remote.py	2012-01-06 22:06:36 +0000
+++ bzrlib/tests/test_remote.py	2012-01-25 17:56:30 +0000
@@ -3934,11 +3934,11 @@
         self.setup_smart_server_with_call_log()
         tree1 = self.make_branch_and_tree('tree1', format='1.9')
         tree1.commit('rev1', rev_id='rev1')
-        tree2 = tree1.branch.bzrdir.sprout('tree2', stacked=True
-            ).open_workingtree()
-        local_tree = tree2.branch.create_checkout('local')
+        stacked_b = tree1.branch.bzrdir.sprout('branch2',
+            stacked=True, create_tree_if_local=False).open_branch()
+        local_tree = stacked_b.create_checkout('local')
         local_tree.commit('local changes make me feel good.')
-        branch2 = Branch.open(self.get_url('tree2'))
+        branch2 = Branch.open(self.get_url('branch2'))
         branch2.lock_read()
         self.addCleanup(branch2.unlock)
         return tree1.branch, branch2
@@ -4013,9 +4013,9 @@
         # is itself stacked yields the full data from all three sources.
         def make_stacked_stacked():
             _, stacked = self.prepare_stacked_remote_branch()
-            tree = stacked.bzrdir.sprout('tree3', stacked=True
-                ).open_workingtree()
-            local_tree = tree.branch.create_checkout('local-tree3')
+            local_branch = stacked.bzrdir.sprout('tree3', stacked=True,
+                create_tree_if_local=False).open_branch()
+            local_tree = local_branch.create_checkout('local-tree3')
             local_tree.commit('more local changes are better')
             branch = Branch.open(self.get_url('tree3'))
             branch.lock_read()
@@ -4113,37 +4113,60 @@
             override_existing=True, info=orig_info)

     def test_fetch_everything_backwards_compat(self):
-        """Can fetch with EverythingResult even with pre 2.4 servers.
+        """Can fetch with EverythingResult even with pre 2.5 servers.

-        Pre-2.4 do not support 'everything' searches with the
+        Pre-2.5 do not support 'everything' searches with the
         Repository.get_stream_1.19 verb.
         """
-        verb_log = []
+        self.disable_search(lambda bytes: bytes == 'everything')
+        local = self.make_branch('local')
+        builder = self.make_branch_builder('remote')
+        builder.build_commit(message="Commit.")
+        remote_branch_url = self.smart_server.get_url() + 'remote'
+        remote_branch = bzrdir.BzrDir.open(remote_branch_url).open_branch()
+        self.hpss_calls = []
+        fetch_spec = vf_search.EverythingResult(remote_branch.repository)
+        local.repository.fetch(remote_branch.repository, fetch_spec=fetch_spec)
+        # make sure the overridden verb was used
+        self.assertLength(1, self.verb_log)
+        # more than one HPSS call is needed, but because it's a VFS callback
+        # its hard to predict exactly how many.
+        self.assertTrue(len(self.hpss_calls) > 1)
+
+    def disable_search(self, reject_search_func):
+        self.verb_log = verb_log = []
         class OldGetStreamVerb(SmartServerRepositoryGetStream_1_19):
             """A version of the Repository.get_stream_1.19 verb patched to
-            reject 'everything' searches the way 2.3 and earlier do.
+            reject searches matching reject_search_func (to emulate old
+            servers).
             """
             def recreate_search(self, repository, search_bytes,
                     discard_excess=False):
                 verb_log.append(search_bytes.split('\n', 1)[0])
-                if search_bytes == 'everything':
-                    return (None,
-                        request.FailedSmartServerResponse(('BadSearch',)))
+                if reject_search_func(search_bytes):
+                    return (None, request.FailedSmartServerResponse(('BadSearch',)))
                 return super(OldGetStreamVerb,
                     self).recreate_search(repository, search_bytes,
                         discard_excess=discard_excess)
         self.override_verb('Repository.get_stream_1.19', OldGetStreamVerb)
+
+    def test_fetch_texts_for_rev_backwards_compat(self):
+        """Can fetch with TextsForRevisionSearchResult even with pre 2.5 servers.
+
+        Pre-2.5 do not support 'texts-for-rev' searches with the
+        Repository.get_stream_1.19 verb.
+        """
+        self.disable_search(lambda bytes: bytes.startswith('texts-for-rev\n'))
         local = self.make_branch('local')
         builder = self.make_branch_builder('remote')
-        builder.build_commit(message="Commit.")
+        builder.build_commit(message="Commit.", rev_id='rev-1')
         remote_branch_url = self.smart_server.get_url() + 'remote'
         remote_branch = bzrdir.BzrDir.open(remote_branch_url).open_branch()
         self.hpss_calls = []
-        local.repository.fetch(
-            remote_branch.repository,
-            fetch_spec=vf_search.EverythingResult(remote_branch.repository))
+        local.repository.fetch(remote_branch.repository,
+            fetch_spec=vf_search.TextsForRevisionSearchResult('rev-1'))
         # make sure the overridden verb was used
-        self.assertLength(1, verb_log)
+        self.assertLength(1, self.verb_log)
         # more than one HPSS call is needed, but because it's a VFS callback
         # its hard to predict exactly how many.
         self.assertTrue(len(self.hpss_calls) > 1)

=== modified file 'bzrlib/tests/test_repository.py'
--- bzrlib/tests/test_repository.py	2011-12-22 19:54:56 +0000
+++ bzrlib/tests/test_repository.py	2012-01-25 17:56:30 +0000
@@ -1559,7 +1559,8 @@
         """
         b_source = self.make_abc_branch()
         b_base = b_source.bzrdir.sprout('base', revision_id='A').open_branch()
-        b_stacked = b_base.bzrdir.sprout('stacked', stacked=True).open_branch()
+        b_stacked = b_base.bzrdir.sprout('stacked', stacked=True,
+            create_tree_if_local=False).open_branch()
        b_stacked.lock_write()
        self.addCleanup(b_stacked.unlock)
        b_stacked.fetch(b_source, 'B')

620=== modified file 'bzrlib/vf_search.py'
621--- bzrlib/vf_search.py 2011-12-19 13:23:58 +0000
622+++ bzrlib/vf_search.py 2012-01-25 17:56:30 +0000
623@@ -61,6 +61,9 @@
624 """
625 raise NotImplementedError(self.get_keys)
626
627+ def get_special(self):
628+ return []
629+
630 def is_empty(self):
631 """Return false if the search lists 1 or more revisions."""
632 raise NotImplementedError(self.is_empty)
633@@ -509,3 +512,93 @@
634 # walking, so we can exclude them from the start list.
635 start_keys = set(start_keys).difference(found_heads)
636 return start_keys, exclude_keys, len(keys)
637+
638+
639+class UnionSearchResult(AbstractSearchResult):
640+ """The union of multiple search results."""
641+
642+ def __init__(self, search_results):
643+ self.search_results = frozenset(search_results)
644+
645+ def get_recipe(self):
646+ """Return a recipe that can be used to replay this search.
647+
648+ :return: A tuple of ('union', (recipe, ...)).
649+ """
650+ return ('union', tuple(sr.get_recipe() for sr in self.search_results))
651+
652+ def get_network_struct(self):
653+ """Return a tuple that can be transmitted via the HPSS protocol."""
654+ return ('union',
655+ tuple(sr.get_network_struct() for sr in self.search_results))
656+
657+ def get_keys(self):
658+ """Return the keys found in this search.
659+
660+ :return: A set of keys.
661+ """
662+ return set().union(*(sr.get_keys() for sr in self.search_results))
663+
664+ def get_special(self):
665+ specials = []
666+ for sr in self.search_results:
667+ specials.extend(sr.get_special())
668+ return specials
669+
670+ def is_empty(self):
671+ """Return false if the search lists 1 or more revisions."""
672+ return all(sr.is_empty() for sr in self.search_results)
673+
674+ def refine(self, seen, referenced):
675+ """Create a new search by refining this search.
676+
677+ :param seen: Revisions that have been satisfied.
678+ :param referenced: Revision references observed while satisfying some
679+ of this search.
680+ :return: A search result.
681+ """
682+ raise NotImplementedError(self.refine)
683+
684+
685+class TextsForRevisionSearchResult(AbstractSearchResult):
686+ """A search result specifying all the texts for a given revision."""
687+
688+ def __init__(self, revision_id):
689+ """Constructor.
690+
691+ :param revision_id: A revision ID. All records needed to create a
692+ checkout of this revision are referenced by this result (i.e. the
693+ revision record and corresponding inventory record, and all
694+ referenced texts and chk_bytes).
695+ """
696+ self.revision_id = revision_id
697+
698+ def get_recipe(self):
699+ """Return a recipe that can be used to replay this search.
700+
701+ :return: A tuple of ('texts-for-rev', revision_id).
702+ """
703+ return ('texts-for-rev', self.revision_id)
704+
705+ def get_network_struct(self):
706+ """Return a tuple that can be transmitted via the HPSS protocol."""
707+ return ('texts-for-rev', self.revision_id)
708+
709+ def get_keys(self):
710+ return [self.revision_id]
711+
712+ def get_special(self):
713+ return [('texts-for-rev', self.revision_id)]
714+
715+ def is_empty(self):
716+ """Return false if the search lists 1 or more revisions."""
717+ return False
718+
719+ def refine(self, seen, referenced):
720+ # XXX: This is not strictly correct, but 'texts-for-rev' is
721+ # advisory rather than mandatory, so refining as a plain
722+ # SearchResult over the referenced revisions is acceptable.
723+ res = SearchResult(set(), set(referenced), 0, set()).refine(
724+ seen, referenced)
725+ return res
727
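The aggregation behaviour `UnionSearchResult` adds can be sketched in isolation (simplified stand-in classes, not bzrlib's real ones): keys are unioned across members, `get_special` entries are concatenated, and the union is empty only when every member is empty.

```python
# Standalone sketch of UnionSearchResult-style aggregation.

class FakeResult:
    """Stand-in for a single search result."""

    def __init__(self, keys, special=()):
        self._keys = set(keys)
        self._special = list(special)

    def get_keys(self):
        return set(self._keys)

    def get_special(self):
        return list(self._special)

    def is_empty(self):
        return not self._keys and not self._special


class Union:
    """Stand-in for the union of several search results."""

    def __init__(self, results):
        self.results = frozenset(results)

    def get_keys(self):
        # Union of member key sets; a set *of* key collections would
        # fail, since sets and lists are unhashable.
        return set().union(*(r.get_keys() for r in self.results))

    def get_special(self):
        specials = []
        for r in self.results:
            specials.extend(r.get_special())
        return specials

    def is_empty(self):
        return all(r.is_empty() for r in self.results)


u = Union([FakeResult(['rev-1', 'rev-2']),
           FakeResult([], special=[('texts-for-rev', 'rev-3')])])
```

Here `u.get_keys()` yields the combined revision keys while the `('texts-for-rev', 'rev-3')` entry survives in `u.get_special()`, mirroring how an advisory texts-for-rev search rides along with an ordinary revision search.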
728=== modified file 'doc/en/release-notes/bzr-2.5.txt'
729--- doc/en/release-notes/bzr-2.5.txt 2012-01-23 15:02:51 +0000
730+++ doc/en/release-notes/bzr-2.5.txt 2012-01-25 17:56:30 +0000
731@@ -36,6 +36,10 @@
732 a control directory (but no branch or working tree).
733 (Jelmer Vernooij, #913980)
734
735+* Much improved performance of ``bzr branch --stacked`` against a
736+ smart server, through the introduction of a new search type.
737+ (Andrew Bennetts, Jelmer Vernooij, #791704)
738+
739 * New HPSS call for ``BzrDir.get_branches``. (Jelmer Vernooij, #894460)
740
741 * Two new command hooks, ``pre_command`` and ``post_command``,