Merge lp:~jameinel/bzr/2.4-status-no-changes-765881 into lp:bzr

Proposed by John A Meinel
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 5809
Proposed branch: lp:~jameinel/bzr/2.4-status-no-changes-765881
Merge into: lp:bzr
Diff against target: 302 lines (+84/-49)
5 files modified
bzrlib/_dirstate_helpers_pyx.pyx (+16/-4)
bzrlib/dirstate.py (+6/-1)
bzrlib/tests/test__dirstate_helpers.py (+23/-7)
bzrlib/tests/test_dirstate.py (+33/-37)
doc/en/release-notes/bzr-2.4.txt (+6/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.4-status-no-changes-765881
Reviewer Review Type Date Requested Status
Jelmer Vernooij (community) code Approve
bzr-core Pending
Review via email: mp+58313@code.launchpad.net

Commit message

Fix bug #765881, we don't need to save the dirstate file for trivial changes.

Description of the change

This fixes a bug that Mark Shuttleworth pointed out to me (bug #765881)

Basically, our 'bzr status' times were a lot slower than just the 'iter_changes()' times, which was confusing. It turned out that whenever you had files that were just newly added, *every* 'bzr status' from then on would rewrite the dirstate file. This happened because the shortcut case of "don't compute the sha1" would trigger a "and the dirstate is considered modified".

While I was there, I went ahead and updated some tests to also treat newly-changed directory entries as not-worth-saving. The reasoning is that all of the information we would be updating is trivially derived from only the stat object. And we have to stat the entry anyway, so we don't save any future work.
There is more that we could do there, but I think this is an incremental improvement.

It turns out that this also affected a couple of tests that were using trivial changes to trigger IN_MEMORY_MODIFIED. I rewrote them a bit to use slightly-less trivial changes (though eventually we may not save those, either.)

To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Nice improvement :) I'm voting approve, but I should note I don't have a lot of experience working on dirstate. Perhaps somebody else can have a quick look as well?

review: Approve (code)
Revision history for this message
John A Meinel (jameinel) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/_dirstate_helpers_pyx.pyx'
2--- bzrlib/_dirstate_helpers_pyx.pyx 2010-08-27 18:02:22 +0000
3+++ bzrlib/_dirstate_helpers_pyx.pyx 2011-04-19 14:27:37 +0000
4@@ -866,6 +866,7 @@
5 # _st mode of the compiled stat objects.
6 cdef int minikind, saved_minikind
7 cdef void * details
8+ cdef int worth_saving
9 minikind = minikind_from_mode(stat_value.st_mode)
10 if 0 == minikind:
11 return None
12@@ -900,6 +901,7 @@
13 # If we have gotten this far, that means that we need to actually
14 # process this entry.
15 link_or_sha1 = None
16+ worth_saving = 1
17 if minikind == c'f':
18 executable = self._is_executable(stat_value.st_mode,
19 saved_executable)
20@@ -916,10 +918,15 @@
21 entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
22 executable, packed_stat)
23 else:
24- entry[1][0] = ('f', '', stat_value.st_size,
25- executable, DirState.NULLSTAT)
26+ # This file is not worth caching the sha1. Either it is too new, or
27+ # it is newly added. Regardless, the only things we are changing
28+ # are derived from the stat, and so are not worth caching. So we do
29+ # *not* set the IN_MEMORY_MODIFIED flag. (But we'll save the
30+ # updated values if there is *other* data worth saving.)
31+ entry[1][0] = ('f', '', stat_value.st_size, executable,
32+ DirState.NULLSTAT)
33+ worth_saving = 0
34 elif minikind == c'd':
35- link_or_sha1 = None
36 entry[1][0] = ('d', '', 0, False, packed_stat)
37 if saved_minikind != c'd':
38 # This changed from something into a directory. Make sure we
39@@ -929,6 +936,10 @@
40 self._get_block_entry_index(entry[0][0], entry[0][1], 0)
41 self._ensure_block(block_index, entry_index,
42 pathjoin(entry[0][0], entry[0][1]))
43+ else:
44+ # Any changes are derived trivially from the stat object, not worth
45+ # re-writing a dirstate for just this
46+ worth_saving = 0
47 elif minikind == c'l':
48 link_or_sha1 = self._read_link(abspath, saved_link_or_sha1)
49 if self._cutoff_time is None:
50@@ -940,7 +951,8 @@
51 else:
52 entry[1][0] = ('l', '', stat_value.st_size,
53 False, DirState.NULLSTAT)
54- self._dirblock_state = DirState.IN_MEMORY_MODIFIED
55+ if worth_saving:
56+ self._dirblock_state = DirState.IN_MEMORY_MODIFIED
57 return link_or_sha1
58
59
60
61=== modified file 'bzrlib/dirstate.py'
62--- bzrlib/dirstate.py 2011-04-08 12:28:05 +0000
63+++ bzrlib/dirstate.py 2011-04-19 14:27:37 +0000
64@@ -3205,6 +3205,7 @@
65 # If we have gotten this far, that means that we need to actually
66 # process this entry.
67 link_or_sha1 = None
68+ worth_saving = True
69 if minikind == 'f':
70 executable = state._is_executable(stat_value.st_mode,
71 saved_executable)
72@@ -3226,6 +3227,7 @@
73 else:
74 entry[1][0] = ('f', '', stat_value.st_size,
75 executable, DirState.NULLSTAT)
76+ worth_saving = False
77 elif minikind == 'd':
78 link_or_sha1 = None
79 entry[1][0] = ('d', '', 0, False, packed_stat)
80@@ -3237,6 +3239,8 @@
81 state._get_block_entry_index(entry[0][0], entry[0][1], 0)
82 state._ensure_block(block_index, entry_index,
83 osutils.pathjoin(entry[0][0], entry[0][1]))
84+ else:
85+ worth_saving = False
86 elif minikind == 'l':
87 link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
88 if state._cutoff_time is None:
89@@ -3248,7 +3252,8 @@
90 else:
91 entry[1][0] = ('l', '', stat_value.st_size,
92 False, DirState.NULLSTAT)
93- state._dirblock_state = DirState.IN_MEMORY_MODIFIED
94+ if worth_saving:
95+ state._dirblock_state = DirState.IN_MEMORY_MODIFIED
96 return link_or_sha1
97
98
99
100=== modified file 'bzrlib/tests/test__dirstate_helpers.py'
101--- bzrlib/tests/test__dirstate_helpers.py 2011-01-10 22:20:12 +0000
102+++ bzrlib/tests/test__dirstate_helpers.py 2011-04-19 14:27:37 +0000
103@@ -871,10 +871,12 @@
104 stat_value=stat_value)
105 self.assertEqual(None, link_or_sha1)
106
107- # The dirblock entry should not have cached the file's sha1 (too new)
108+ # The dirblock entry should not have computed or cached the file's
109+ # sha1, but it did update the files' st_size. However, this is not
110+ # worth writing a dirstate file for, so we leave the state UNMODIFIED
111 self.assertEqual(('f', '', 14, False, dirstate.DirState.NULLSTAT),
112 entry[1][0])
113- self.assertEqual(dirstate.DirState.IN_MEMORY_MODIFIED,
114+ self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
115 state._dirblock_state)
116 mode = stat_value.st_mode
117 self.assertEqual([('is_exec', mode, False)], state._log)
118@@ -883,9 +885,8 @@
119 self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
120 state._dirblock_state)
121
122- # If we do it again right away, we don't know if the file has changed
123- # so we will re-read the file. Roll the clock back so the file is
124- # guaranteed to look too new.
125+ # Roll the clock back so the file is guaranteed to look too new. We
126+ # should still not compute the sha1.
127 state.adjust_time(-10)
128 del state._log[:]
129
130@@ -893,7 +894,7 @@
131 stat_value=stat_value)
132 self.assertEqual([('is_exec', mode, False)], state._log)
133 self.assertEqual(None, link_or_sha1)
134- self.assertEqual(dirstate.DirState.IN_MEMORY_MODIFIED,
135+ self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
136 state._dirblock_state)
137 self.assertEqual(('f', '', 14, False, dirstate.DirState.NULLSTAT),
138 entry[1][0])
139@@ -909,6 +910,8 @@
140 self.assertEqual([('is_exec', mode, False)], state._log)
141 self.assertEqual(('f', '', 14, False, dirstate.DirState.NULLSTAT),
142 entry[1][0])
143+ self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
144+ state._dirblock_state)
145
146 # If the file is no longer new, and the clock has been moved forward
147 # sufficiently, it will cache the sha.
148@@ -1005,12 +1008,25 @@
149 self.build_tree(['a/'])
150 state.adjust_time(+20)
151 self.assertIs(None, self.do_update_entry(state, entry, 'a'))
152+ # a/ used to be a file, but is now a directory, worth saving
153 self.assertEqual(dirstate.DirState.IN_MEMORY_MODIFIED,
154 state._dirblock_state)
155 state.save()
156 self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
157 state._dirblock_state)
158- self.assertIs(None, self.do_update_entry(state, entry, 'a'))
159+ # No changes to a/ means not worth saving.
160+ self.assertIs(None, self.do_update_entry(state, entry, 'a'))
161+ self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
162+ state._dirblock_state)
163+ # Change the last-modified time for the directory
164+ t = time.time() - 100.0
165+ os.utime('a', (t, t))
166+ saved_packed_stat = entry[1][0][-1]
167+ self.assertIs(None, self.do_update_entry(state, entry, 'a'))
168+ # We *do* go ahead and update the information in the dirblocks, but we
169+ # don't bother setting IN_MEMORY_MODIFIED because it is trivial to
170+ # recompute.
171+ self.assertNotEqual(saved_packed_stat, entry[1][0][-1])
172 self.assertEqual(dirstate.DirState.IN_MEMORY_UNMODIFIED,
173 state._dirblock_state)
174
175
176=== modified file 'bzrlib/tests/test_dirstate.py'
177--- bzrlib/tests/test_dirstate.py 2011-01-25 22:54:08 +0000
178+++ bzrlib/tests/test_dirstate.py 2011-04-19 14:27:37 +0000
179@@ -527,6 +527,19 @@
180
181 class TestDirStateOnFile(TestCaseWithDirState):
182
183+ def create_updated_dirstate(self):
184+ self.build_tree(['a-file'])
185+ tree = self.make_branch_and_tree('.')
186+ tree.add(['a-file'], ['a-id'])
187+ tree.commit('add a-file')
188+ # Save and unlock the state, re-open it in readonly mode
189+ state = dirstate.DirState.from_tree(tree, 'dirstate')
190+ state.save()
191+ state.unlock()
192+ state = dirstate.DirState.on_file('dirstate')
193+ state.lock_read()
194+ return state
195+
196 def test_construct_with_path(self):
197 tree = self.make_branch_and_tree('tree')
198 state = dirstate.DirState.from_tree(tree, 'dirstate.from_tree')
199@@ -561,36 +574,24 @@
200 state.unlock()
201
202 def test_can_save_in_read_lock(self):
203- self.build_tree(['a-file'])
204- state = dirstate.DirState.initialize('dirstate')
205- try:
206- # No stat and no sha1 sum.
207- state.add('a-file', 'a-file-id', 'file', None, '')
208- state.save()
209- finally:
210- state.unlock()
211-
212- # Now open in readonly mode
213- state = dirstate.DirState.on_file('dirstate')
214- state.lock_read()
215+ state = self.create_updated_dirstate()
216 try:
217 entry = state._get_entry(0, path_utf8='a-file')
218 # The current size should be 0 (default)
219 self.assertEqual(0, entry[1][0][2])
220 # We should have a real entry.
221 self.assertNotEqual((None, None), entry)
222- # Make sure everything is old enough
223+ # Set the cutoff-time into the future, so things look cacheable
224 state._sha_cutoff_time()
225- state._cutoff_time += 10
226- # Change the file length
227- self.build_tree_contents([('a-file', 'shorter')])
228- sha1sum = dirstate.update_entry(state, entry, 'a-file',
229- os.lstat('a-file'))
230- # new file, no cached sha:
231- self.assertEqual(None, sha1sum)
232+ state._cutoff_time += 10.0
233+ st = os.lstat('a-file')
234+ sha1sum = dirstate.update_entry(state, entry, 'a-file', st)
235+ # We updated the current sha1sum because the file is cacheable
236+ self.assertEqual('ecc5374e9ed82ad3ea3b4d452ea995a5fd3e70e3',
237+ sha1sum)
238
239 # The dirblock has been updated
240- self.assertEqual(7, entry[1][0][2])
241+ self.assertEqual(st.st_size, entry[1][0][2])
242 self.assertEqual(dirstate.DirState.IN_MEMORY_MODIFIED,
243 state._dirblock_state)
244
245@@ -606,29 +607,24 @@
246 state.lock_read()
247 try:
248 entry = state._get_entry(0, path_utf8='a-file')
249- self.assertEqual(7, entry[1][0][2])
250+ self.assertEqual(st.st_size, entry[1][0][2])
251 finally:
252 state.unlock()
253
254 def test_save_fails_quietly_if_locked(self):
255 """If dirstate is locked, save will fail without complaining."""
256- self.build_tree(['a-file'])
257- state = dirstate.DirState.initialize('dirstate')
258- try:
259- # No stat and no sha1 sum.
260- state.add('a-file', 'a-file-id', 'file', None, '')
261- state.save()
262- finally:
263- state.unlock()
264-
265- state = dirstate.DirState.on_file('dirstate')
266- state.lock_read()
267+ state = self.create_updated_dirstate()
268 try:
269 entry = state._get_entry(0, path_utf8='a-file')
270- sha1sum = dirstate.update_entry(state, entry, 'a-file',
271- os.lstat('a-file'))
272- # No sha - too new
273- self.assertEqual(None, sha1sum)
274+ # No cached sha1 yet.
275+ self.assertEqual('', entry[1][0][1])
276+ # Set the cutoff-time into the future, so things look cacheable
277+ state._sha_cutoff_time()
278+ state._cutoff_time += 10.0
279+ st = os.lstat('a-file')
280+ sha1sum = dirstate.update_entry(state, entry, 'a-file', st)
281+ self.assertEqual('ecc5374e9ed82ad3ea3b4d452ea995a5fd3e70e3',
282+ sha1sum)
283 self.assertEqual(dirstate.DirState.IN_MEMORY_MODIFIED,
284 state._dirblock_state)
285
286
287=== modified file 'doc/en/release-notes/bzr-2.4.txt'
288--- doc/en/release-notes/bzr-2.4.txt 2011-04-19 14:12:43 +0000
289+++ doc/en/release-notes/bzr-2.4.txt 2011-04-19 14:27:37 +0000
290@@ -304,6 +304,12 @@
291 * ``bzr serve`` no longer crashes when a server_started hook is installed and
292 IPv6 support is available on the system. (Jelmer Vernooij, #293697)
293
294+* ``bzr status`` will not rewrite the dirstate file if it only has
295+ 'trivial' changes. (Currently limited to dir updates and newly-added
296+ files changing state.) This saves a bit of time for regular operations.
297+ eg. ``bzr status`` in a 100k tree takes 1.4s to compute the status, but 1s
298+ to re-save the dirstate file. (John Arbash Meinel, #765881)
299+
300 * ``bzr tags`` will no longer choke on branches with ghost revisions in
301 their mainline and tags on revisions not in the branch ancestry.
302 (Jelmer Vernooij, #397556)