Merge lp:~spiv/bzr/hardlink-2a-408193 into lp:bzr

Proposed by Andrew Bennetts
Status: Merged
Approved by: Martin Pool
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/hardlink-2a-408193
Merge into: lp:bzr
Diff against target: 202 lines (+60/-40)
8 files modified
NEWS (+4/-0)
bzrlib/builtins.py (+0/-3)
bzrlib/tests/blackbox/test_branch.py (+1/-9)
bzrlib/tests/blackbox/test_checkout.py (+1/-9)
bzrlib/tests/per_tree/test_iter_search_rules.py (+0/-1)
bzrlib/tests/test_transform.py (+45/-2)
bzrlib/transform.py (+6/-2)
bzrlib/workingtree_4.py (+3/-14)
To merge this branch: bzr merge lp:~spiv/bzr/hardlink-2a-408193
Reviewer Review Type Date Requested Status
Martin Pool Approve
John A Meinel Approve
Review via email: mp+15591@code.launchpad.net

This proposal supersedes a proposal from 2009-11-27.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This tries to fix bug 408193, --hardlink not working on recent tree formats. I'm not certain this is a correct fix, as I haven't looked at content filtering code before, so I'd love some careful feedback on this approach.

This patch doesn't seem to break any tests, and it doesn't seem to change performance for the worse. For formats that don't support content filtering, nothing changes. For formats that do, this (assuming I've written this correctly) looks at each file to see if any filtering rules apply to it and if so skips hardlinking of it.

In my testing, if you have no rules configured on a 2a tree, on bzr trees it gives the same performance and hardlinks as --hardlink of a 1.9 format tree. (Both cases with 2a branch and repo, just varying tree format.)

In practice, for me, this makes branching 2a format bzr.dev trees speed up from 3s to 1s. Plus of course uses less disk.

This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Fri, 2009-11-27 at 00:50 +0000, Andrew Bennetts wrote:

> This patch almost certainly deserves new tests. I'd appreciate some suggestions about which tests to add, and where. I guess maybe in test_transform?

That sounds appropriate to me, and the approach you've taken sounds
plausible.

Revision history for this message
Martin Pool (mbp) wrote : Posted in a previous version of this proposal

news please.

This looks like a plausible fix to me. I probably wouldn't trust it to go into 2.0 though, at least not without a shakedown in trunk.

There is another test relevant to this in test_checkout so you should at least reenable that.

I guess ideally you want a test that shows that things are not hardlinked when the filters cause them to have different content. However, you could make a case that we should optimistically land this and then see if things actually fail that way.

review: Needs Fixing
Revision history for this message
Andrew Bennetts (spiv) wrote :

Thanks very much for the reviews so far. This patch adds a test, as well as the other changes suggested in the earlier reviews.

The test is a bit ugly, but I'm not sure if there's a better way to write it. There are quite a few layers involved in the content filtering APIs. I think it's probably ok, but an extra review won't hurt :)

I agree that this is a bit risky for 2.0, so I wouldn't propose backporting it without a) someone clamouring for it, and b) time for it to be tested in trunk.

Revision history for this message
John A Meinel (jameinel) wrote :

46 + self.assertEqual(source_stat, target_stat)

^- It is generally better to use "self.assertEqualStat()".

I realize it isn't strictly required here, since you aren't doing an fstat versus lstat, etc.

118 + os.mkdir(self.test_home_dir + '/.bazaar')
119 + rules_filename = self.test_home_dir + '/.bazaar/rules'
120 + f = open(rules_filename, 'w')
121 + f.write('[name %s]\nrot13=yes\n' % (pattern,))
122 + f.close()

^- Could you use ".build_tree()" and ".build_tree_contents()" here? I'll also note that you should at least use 'wb', since I don't think you want to get \r\n content on Windows. (maybe this doesn't matter as much since it is a config file?)

Would it be easier to use something like "fn = config.get_rules_filename()" ? I don't know how that path is currently determined, but it doesn't seem like you should be going via self.test_home_dir. (If there isn't an api for it, just submit a bug, and leave the test alone.)

Otherwise I think the change is reasonable.

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
> Review: Approve
> 46 + self.assertEqual(source_stat, target_stat)
>
> ^- It is generally better to use "self.assertEqualStat()".
>
> I realize it isn't strictly required here, since you aren't doing an fstat
> versus lstat, etc.

Ok, although there's no assertNotEqualStat, so I'll leave the corresponding
assertNotEqual as it is.

> 118 + os.mkdir(self.test_home_dir + '/.bazaar')
> 119 + rules_filename = self.test_home_dir + '/.bazaar/rules'
> 120 + f = open(rules_filename, 'w')
> 121 + f.write('[name %s]\nrot13=yes\n' % (pattern,))
> 122 + f.close()
>
> ^- Could you use ".build_tree()" and ".build_tree_contents()" here? I'll also
> note that you should at least use 'wb', since I don't think you want to get
> \r\n content on Windows. (maybe this doesn't matter as much since it is a
> config file?)

Good point about 'wb', changed (although I don't know the precise rules for
newlines in config files).

I cannot easily use build_tree_contents, because doesn't provide a convenient
way to build a tree relative to a dir other than '.'. Perhaps it should be
updated to take an optional transport and use put_bytes_non_atomic like
TestCase.build_tree? And I still need to remove or rename the file afterwards
before I call reset_rules()... probably that's something that should be handled
by the base TestCase, but it isn't at the moment.

> Would it be easier to use something like "fn = config.get_rules_filename()" ?
> I don't know how that path is currently determined, but it doesn't seem like
> you should be going via self.test_home_dir. (If there isn't an api for it,
> just submit a bug, and leave the test alone.)
>
> Otherwise I think the change is reasonable.

Hmm, there's bzrlib.rules.rules_filename(). I kind of like the safety of using
the test_home_dir explicitly though, it's easy to imagine a naïve optimisation
of rules_filename() caching and returning the wrong value — after all I need to
call reset_rules() because of similar caching. I might be too pessimistic
though. Anyway, I'd still need to makedirs(os.path.dirname(fn)), so it wouldn't
end up being much easier.

You do have a point that ideally I'd have essentially a one-liner:

     write_file(rules_filename(), '''...''')

And instead I have 5 lines, even ignoring the reset_rules call and cleanups.

So this is something to revisit if we write more tests that want to temporarily
install some rules, but I hope it's good enough for now.

-Andrew.

Revision history for this message
Martin Pool (mbp) wrote :

If I touched this again I'd probably add assertIsHardLinked. Not needed now. (And this is a case where testtools's assertIs would be nice, so you didn't need to separately write assertIsNotHardLinked.)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-12-03 02:24:54 +0000
3+++ NEWS 2009-12-04 06:17:11 +0000
4@@ -48,6 +48,10 @@
5 * Terminate ssh subprocesses when no references to them remain, fixing
6 subprocess and file descriptor leaks. (Andrew Bennetts, #426662)
7
8+* The ``--hardlink`` option of ``bzr branch`` and ``bzr checkout`` now
9+ works for 2a format trees. Only files unaffected by content filters
10+ will be hardlinked. (Andrew Bennetts, #408193)
11+
12 * The new glob expansion on Windows would replace all ``\`` characters
13 with ``/`` even if it there wasn't a glob to expand, the arg was quoted,
14 etc. Now only change slashes if there is something being glob expanded.
15
16=== modified file 'bzrlib/builtins.py'
17--- bzrlib/builtins.py 2009-12-03 20:56:18 +0000
18+++ bzrlib/builtins.py 2009-12-04 06:17:11 +0000
19@@ -1209,9 +1209,6 @@
20 from bzrlib.tag import _merge_tags_if_possible
21 accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
22 from_location)
23- if (accelerator_tree is not None and
24- accelerator_tree.supports_content_filtering()):
25- accelerator_tree = None
26 revision = _get_one_revision('branch', revision)
27 br_from.lock_read()
28 try:
29
30=== modified file 'bzrlib/tests/blackbox/test_branch.py'
31--- bzrlib/tests/blackbox/test_branch.py 2009-08-20 12:30:49 +0000
32+++ bzrlib/tests/blackbox/test_branch.py 2009-12-04 06:17:11 +0000
33@@ -171,15 +171,7 @@
34 out, err = self.run_bzr(['branch', 'source', 'target', '--hardlink'])
35 source_stat = os.stat('source/file1')
36 target_stat = os.stat('target/file1')
37- same_file = (source_stat == target_stat)
38- if same_file:
39- pass
40- else:
41- # https://bugs.edge.launchpad.net/bzr/+bug/408193
42- self.assertContainsRe(err, "hardlinking working copy files is "
43- "not currently supported")
44- raise KnownFailure("--hardlink doesn't work in formats "
45- "that support content filtering (#408193)")
46+ self.assertEqual(source_stat, target_stat)
47
48 def test_branch_standalone(self):
49 shared_repo = self.make_repository('repo', shared=True)
50
51=== modified file 'bzrlib/tests/blackbox/test_checkout.py'
52--- bzrlib/tests/blackbox/test_checkout.py 2009-08-03 04:19:03 +0000
53+++ bzrlib/tests/blackbox/test_checkout.py 2009-12-04 06:17:11 +0000
54@@ -160,12 +160,4 @@
55 '--hardlink'])
56 source_stat = os.stat('source/file1')
57 target_stat = os.stat('target/file1')
58- same_file = (source_stat == target_stat)
59- if same_file:
60- pass
61- else:
62- # https://bugs.edge.launchpad.net/bzr/+bug/408193
63- self.assertContainsRe(err, "hardlinking working copy files is "
64- "not currently supported")
65- raise KnownFailure("--hardlink doesn't work in formats "
66- "that support content filtering (#408193)")
67+ self.assertEqual(source_stat, target_stat)
68
69=== modified file 'bzrlib/tests/per_tree/test_iter_search_rules.py'
70--- bzrlib/tests/per_tree/test_iter_search_rules.py 2009-07-10 07:14:02 +0000
71+++ bzrlib/tests/per_tree/test_iter_search_rules.py 2009-12-04 06:17:11 +0000
72@@ -18,7 +18,6 @@
73
74 from bzrlib import (
75 rules,
76- tests,
77 )
78 from bzrlib.tests.per_tree import TestCaseWithTree
79
80
81=== modified file 'bzrlib/tests/test_transform.py'
82--- bzrlib/tests/test_transform.py 2009-11-18 16:10:18 +0000
83+++ bzrlib/tests/test_transform.py 2009-12-04 06:17:11 +0000
84@@ -15,18 +15,18 @@
85 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
86
87 import os
88-import stat
89 from StringIO import StringIO
90 import sys
91
92 from bzrlib import (
93 bencode,
94 errors,
95+ filters,
96 generate_ids,
97 osutils,
98 progress,
99 revision as _mod_revision,
100- symbol_versioning,
101+ rules,
102 tests,
103 urlutils,
104 )
105@@ -1868,6 +1868,49 @@
106 self.assertEqual([], list(target.iter_changes(revision_tree)))
107 self.assertTrue(source.is_executable('file1-id'))
108
109+ def install_rot13_content_filter(self, pattern):
110+ original_registry = filters._reset_registry()
111+ def restore_registry():
112+ filters._reset_registry(original_registry)
113+ self.addCleanup(restore_registry)
114+ def rot13(chunks, context=None):
115+ return [''.join(chunks).encode('rot13')]
116+ rot13filter = filters.ContentFilter(rot13, rot13)
117+ filters.register_filter_stack_map('rot13', {'yes': [rot13filter]}.get)
118+ os.mkdir(self.test_home_dir + '/.bazaar')
119+ rules_filename = self.test_home_dir + '/.bazaar/rules'
120+ f = open(rules_filename, 'wb')
121+ f.write('[name %s]\nrot13=yes\n' % (pattern,))
122+ f.close()
123+ def uninstall_rules():
124+ os.remove(rules_filename)
125+ rules.reset_rules()
126+ self.addCleanup(uninstall_rules)
127+ rules.reset_rules()
128+
129+ def test_build_tree_content_filtered_files_are_not_hardlinked(self):
130+ """build_tree will not hardlink files that have content filtering rules
131+ applied to them (but will still hardlink other files from the same tree
132+ if it can).
133+ """
134+ self.requireFeature(HardlinkFeature)
135+ self.install_rot13_content_filter('file1')
136+ source = self.create_ab_tree()
137+ target = self.make_branch_and_tree('target')
138+ revision_tree = source.basis_tree()
139+ revision_tree.lock_read()
140+ self.addCleanup(revision_tree.unlock)
141+ build_tree(revision_tree, target, source, hardlink=True)
142+ target.lock_read()
143+ self.addCleanup(target.unlock)
144+ self.assertEqual([], list(target.iter_changes(revision_tree)))
145+ source_stat = os.stat('source/file1')
146+ target_stat = os.stat('target/file1')
147+ self.assertNotEqual(source_stat, target_stat)
148+ source_stat = os.stat('source/file2')
149+ target_stat = os.stat('target/file2')
150+ self.assertEqualStat(source_stat, target_stat)
151+
152 def test_case_insensitive_build_tree_inventory(self):
153 if (tests.CaseInsensitiveFilesystemFeature.available()
154 or tests.CaseInsCasePresFilenameFeature.available()):
155
156=== modified file 'bzrlib/transform.py'
157--- bzrlib/transform.py 2009-12-03 02:24:54 +0000
158+++ bzrlib/transform.py 2009-12-04 06:17:11 +0000
159@@ -2300,8 +2300,12 @@
160 new_desired_files = desired_files
161 else:
162 iter = accelerator_tree.iter_changes(tree, include_unchanged=True)
163- unchanged = dict((f, p[1]) for (f, p, c, v, d, n, k, e)
164- in iter if not (c or e[0] != e[1]))
165+ unchanged = [(f, p[1]) for (f, p, c, v, d, n, k, e)
166+ in iter if not (c or e[0] != e[1])]
167+ if accelerator_tree.supports_content_filtering():
168+ unchanged = [(f, p) for (f, p) in unchanged
169+ if not accelerator_tree.iter_search_rules([p]).next()]
170+ unchanged = dict(unchanged)
171 new_desired_files = []
172 count = 0
173 for file_id, (trans_id, tree_path) in desired_files:
174
175=== modified file 'bzrlib/workingtree_4.py'
176--- bzrlib/workingtree_4.py 2009-11-14 11:06:51 +0000
177+++ bzrlib/workingtree_4.py 2009-12-04 06:17:11 +0000
178@@ -1447,21 +1447,10 @@
179 if basis_root_id is not None:
180 wt._set_root_id(basis_root_id)
181 wt.flush()
182- # If content filtering is supported, do not use the accelerator
183- # tree - the cost of transforming the content both ways and
184- # checking for changed content can outweight the gains it gives.
185- # Note: do NOT move this logic up higher - using the basis from
186- # the accelerator tree is still desirable because that can save
187- # a minute or more of processing on large trees!
188- # The original tree may not have the same content filters
189- # applied so we can't safely build the inventory delta from
190- # the source tree.
191 if wt.supports_content_filtering():
192- if hardlink:
193- # see https://bugs.edge.launchpad.net/bzr/+bug/408193
194- trace.warning("hardlinking working copy files is not currently "
195- "supported in %r" % (wt,))
196- accelerator_tree = None
197+ # The original tree may not have the same content filters
198+ # applied so we can't safely build the inventory delta from
199+ # the source tree.
200 delta_from_tree = False
201 else:
202 delta_from_tree = True