Merge lp:~vila/bzr/948339-config-caching into lp:bzr

Proposed by Vincent Ladeuil on 2012-03-13
Status: Merged
Approved by: Martin Packman on 2012-03-16
Approved revision: 6501
Merged at revision: 6512
Proposed branch: lp:~vila/bzr/948339-config-caching
Merge into: lp:bzr
Diff against target: 242 lines (+61/-24)
8 files modified
bzrlib/branch.py (+3/-3)
bzrlib/config.py (+14/-11)
bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py (+4/-1)
bzrlib/tests/test_branch.py (+15/-9)
bzrlib/tests/test_config.py (+14/-0)
bzrlib/tests/test_merge_core.py (+1/-0)
doc/developers/configuration.txt (+7/-0)
doc/en/release-notes/bzr-2.6.txt (+3/-0)
To merge this branch: bzr merge lp:~vila/bzr/948339-config-caching
Reviewer Review Type Date Requested Status
Martin Packman (community) 2012-03-13 Approve on 2012-03-16
Review via email: mp+97256@code.launchpad.net

Commit message

Properly share mutable config sections and save the branch config only during the final unlock

Description of the change

This fix two bugs in the config stack API:

1) the branch config was saved (if changes have been made) for every
   unlock(). This now happens only for the highest level lock.

2) mutable sections weren't properly shared

On top of this proposal I did two experiments to purge the cached config on
unlock():

1) Define a _clear_cached_state() method for BzrBranch adding a
   self.conf_store_unload().

Chaos ensued, more than 130 tests failed. _clear_cached_state() cannot be
used to unload the config store because it's used in specific ways that go
beyond what can be deduced from its name (see last_revision_info() and
_set_revision_history() for examples :-/)

More concretely, _clear_cached_state() is not a general clear-branch-state
method so it can't be used here.

2) calls self.conf_store.unload() just after save_changes() in unlock().

All test passed *but* using -Econfig_stats, we went, for the whole test
suite from config.load : 54487 to config.load : 114282, roughly doubling
branch.conf accesses. This is caused by all the methods decorated with
@needs_read_lock trigerring a config purge which of course goes against the
aim of reducing the IOs.

Wrong idea IMHO.

So unlocked branch objects will keep their config cached specifically to
cover the use case of unlocked branch objects.

Fixing this requires switching to a model where we never use unlocked
branches which in itself is a really big change which may be valuable but
which has so many fallouts that I don't think we want to tackle it for now.

To post a comment you must log in.
Martin Packman (gz) wrote :

Summarising from discussion on IRC yesterday,

The lock count change doesn't seem to be directly asserted by test_set_delays_write. Checking now though the test does in fact fail without the fix, as does test_set_from_config_stack_get_from_config in a similar manner, so that's fine.

Flipping the expectedFailure assertions where the behaviour isn't worth fixing is a little odd, though the added reasoning is nice. Ideally, it would live not here but somewhere that people trying to use config and running into problems might actually find.

The added test_mutable_section_shared is nice and easy to follow, so even though I find the list->dict fixes slightly mysterious without working out more about the config details, I'm happy with the change.

review: Approve
Vincent Ladeuil (vila) wrote :

> Summarising from discussion on IRC yesterday,
>
> The lock count change doesn't seem to be directly asserted by
> test_set_delays_write. Checking now though the test does in fact fail without
> the fix, as does test_set_from_config_stack_get_from_config in a similar
> manner, so that's fine.

Yeah, the more I think about remarks like this (or one I do myself during reviews), the more I believe I should keep more detailed notes while working on the fix and have a way to mention the narrative in the cover letter... You rightly understand how I proceeded, well done Watson ;) Err, Sherlock ? ;-p

>
> Flipping the expectedFailure assertions where the behaviour isn't worth fixing
> is a little odd,

It's more than 'not worth fixing', it's really that trying to maintain compatibility between the two APIs can't be done while still reducing the IOs. The old API *really* relies in many ways on the fact that each and every change is saved to disk. No matter how you use the old API, you will never miss a change: from different objects in the same process, from different processes, everything works. This came with a price: read the config file for each and every option. Read *and* write the config file for each and every change.

The new implementation already dropped some possible optimisations to ease the transition and is safe (AFAIK) to mix with the old one *as long* as one API only is used for a given option.

So dropping the IO reduction for compatibility sake ? I say no. Some migrations will be harder than others but well, they'll have to happen, so the sooner the better.

> though the added reasoning is nice. Ideally, it would live
> not here but somewhere that people trying to use config and running into
> problems might actually find.

Should I refer to these tests in doc/developers/configuration.txt then ?

>
> The added test_mutable_section_shared is nice and easy to follow, so even
> though I find the list->dict fixes slightly mysterious without working out
> more about the config details, I'm happy with the change.

Ok. I'll add a comment to better explain why it's important that MutableSection should stay unique to be truly shared in the __init__ method then, would that have helped ?

Martin Packman (gz) wrote :

> So dropping the IO reduction for compatibility sake ? I say no. Some
> migrations will be harder than others but well, they'll have to happen, so the
> sooner the better.
>
> > though the added reasoning is nice. Ideally, it would live
> > not here but somewhere that people trying to use config and running into
> > problems might actually find.
>
> Should I refer to these tests in doc/developers/configuration.txt then ?

Not the tests, but something that would have helped Aaron when he was wondering why the things these tests exercise don't work would be good. A general migration thing with warnings about the pitfalls would be nice, currently that doc just lists some some ai changes without further advice.

> Ok. I'll add a comment to better explain why it's important that
> MutableSection should stay unique to be truly shared in the __init__ method
> then, would that have helped ?

Maybe a little, my issue was mostly that I'm not familiar with what the code is doing.

lp:~vila/bzr/948339-config-caching updated on 2012-03-28
6502. By Vincent Ladeuil on 2012-03-28

Clarify the dict usage to share dirty sections.

Vincent Ladeuil (vila) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/branch.py'
2--- bzrlib/branch.py 2012-03-14 15:26:01 +0000
3+++ bzrlib/branch.py 2012-03-28 15:52:30 +0000
4@@ -968,8 +968,8 @@
5 This means the next call to revision_history will need to call
6 _gen_revision_history.
7
8- This API is semi-public; it only for use by subclasses, all other code
9- should consider it to be private.
10+ This API is semi-public; it is only for use by subclasses, all other
11+ code should consider it to be private.
12 """
13 self._revision_history_cache = None
14 self._revision_id_to_revno_cache = None
15@@ -2495,7 +2495,7 @@
16
17 @only_raises(errors.LockNotHeld, errors.LockBroken)
18 def unlock(self):
19- if self.conf_store is not None:
20+ if self.control_files._lock_count == 1 and self.conf_store is not None:
21 self.conf_store.save_changes()
22 try:
23 self.control_files.unlock()
24
25=== modified file 'bzrlib/config.py'
26--- bzrlib/config.py 2012-03-14 09:30:48 +0000
27+++ bzrlib/config.py 2012-03-28 15:52:30 +0000
28@@ -2956,7 +2956,7 @@
29 # Report concurrent updates in an ad-hoc way. This should only
30 # occurs when different processes try to update the same option
31 # which is not supported (as in: the config framework is not meant
32- # to be used a sharing mechanism).
33+ # to be used as a sharing mechanism).
34 if expected != reloaded:
35 if actual is _DeletedOption:
36 actual = '<DELETED>'
37@@ -2982,8 +2982,9 @@
38 mutable_section_class = MutableSection
39
40 def __init__(self):
41- # Which sections need to be saved
42- self.dirty_sections = []
43+ # Which sections need to be saved (by section id). We use a dict here
44+ # so the dirty sections can be shared by multiple callers.
45+ self.dirty_sections = {}
46
47 def is_loaded(self):
48 """Returns True if the Store has been loaded.
49@@ -3032,7 +3033,7 @@
50 raise NotImplementedError(self.save)
51
52 def _need_saving(self):
53- for s in self.dirty_sections:
54+ for s in self.dirty_sections.values():
55 if s.orig:
56 # At least one dirty section contains a modification
57 return True
58@@ -3052,11 +3053,11 @@
59 # get_mutable_section() call below.
60 self.unload()
61 # Apply the changes from the preserved dirty sections
62- for dirty in dirty_sections:
63- clean = self.get_mutable_section(dirty.id)
64+ for section_id, dirty in dirty_sections.iteritems():
65+ clean = self.get_mutable_section(section_id)
66 clean.apply_changes(dirty, self)
67 # Everything is clean now
68- self.dirty_sections = []
69+ self.dirty_sections = {}
70
71 def save_changes(self):
72 """Saves the Store to persistent storage if changes occurred.
73@@ -3142,7 +3143,7 @@
74
75 def unload(self):
76 self._config_obj = None
77- self.dirty_sections = []
78+ self.dirty_sections = {}
79
80 def _load_content(self):
81 """Load the config file bytes.
82@@ -3196,8 +3197,7 @@
83 if not self._need_saving():
84 return
85 # Preserve the current version
86- current = self._config_obj
87- dirty_sections = list(self.dirty_sections)
88+ dirty_sections = dict(self.dirty_sections.items())
89 self.apply_changes(dirty_sections)
90 # Save to the persistent storage
91 self.save()
92@@ -3238,13 +3238,16 @@
93 except errors.NoSuchFile:
94 # The file doesn't exist, let's pretend it was empty
95 self._load_from_string('')
96+ if section_id in self.dirty_sections:
97+ # We already created a mutable section for this id
98+ return self.dirty_sections[section_id]
99 if section_id is None:
100 section = self._config_obj
101 else:
102 section = self._config_obj.setdefault(section_id, {})
103 mutable_section = self.mutable_section_class(section_id, section)
104 # All mutable sections can become dirty
105- self.dirty_sections.append(mutable_section)
106+ self.dirty_sections[section_id] = mutable_section
107 return mutable_section
108
109 def quote(self, value):
110
111=== modified file 'bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py'
112--- bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py 2012-01-27 22:18:07 +0000
113+++ bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py 2012-03-28 15:52:30 +0000
114@@ -186,7 +186,10 @@
115 base_text, True)
116 builder.change_contents('clog-id', other=other_text, this=this_text)
117 merger = builder.make_merger(merge.Merge3Merger, ['clog-id'])
118- merger.this_branch.get_config_stack().set(
119+ # The following can't use config stacks until the plugin itself does
120+ # ('this_branch' is already write locked at this point and as such
121+ # won't write the new value to disk where get_user_option can get it).
122+ merger.this_branch.get_config().set_user_option(
123 'changelog_merge_files', 'ChangeLog')
124 merge_hook_params = merge.MergeFileHookParams(merger, 'clog-id', None,
125 'file', 'file', 'conflict')
126
127=== modified file 'bzrlib/tests/test_branch.py'
128--- bzrlib/tests/test_branch.py 2012-03-14 15:26:01 +0000
129+++ bzrlib/tests/test_branch.py 2012-03-28 15:52:30 +0000
130@@ -661,10 +661,13 @@
131 finally:
132 copy.unlock()
133 self.assertFalse(self.branch.is_locked())
134- result = self.branch.get_config_stack().get('foo')
135- # Bug: https://bugs.launchpad.net/bzr/+bug/948339
136- self.expectFailure('Unlocked branches cache their configs',
137- self.assertEqual, 'bar', result)
138+ # Since the branch is locked, the option value won't be saved on disk
139+ # so trying to access the config of locked branch via another older
140+ # non-locked branch object pointing to the same branch is not supported
141+ self.assertEqual(None, self.branch.get_config_stack().get('foo'))
142+ # Using a newly created branch object works as expected
143+ fresh = _mod_branch.Branch.open(self.branch.base)
144+ self.assertEqual('bar', fresh.get_config_stack().get('foo'))
145
146 def test_set_from_config_get_from_config_stack(self):
147 self.branch.lock_write()
148@@ -679,18 +682,21 @@
149 self.branch.lock_write()
150 self.addCleanup(self.branch.unlock)
151 self.branch.get_config_stack().set('foo', 'bar')
152- self.assertEqual('bar',
153+ # Since the branch is locked, the option value won't be saved on disk
154+ # so mixing get() and get_user_option() is broken by design.
155+ self.assertEqual(None,
156 self.branch.get_config().get_user_option('foo'))
157
158- def test_set_delays_write(self):
159+ def test_set_delays_write_when_branch_is_locked(self):
160 self.branch.lock_write()
161 self.addCleanup(self.branch.unlock)
162 self.branch.get_config_stack().set('foo', 'bar')
163 copy = _mod_branch.Branch.open(self.branch.base)
164 result = copy.get_config_stack().get('foo')
165- # Bug: https://bugs.launchpad.net/bzr/+bug/948339
166- self.expectFailure("Config writes are not cached.", self.assertIs,
167- None, result)
168+ # Accessing from a different branch object is like accessing from a
169+ # different process: the option has not been saved yet and the new
170+ # value cannot be seen.
171+ self.assertIs(None, result)
172
173
174 class TestPullResult(tests.TestCase):
175
176=== modified file 'bzrlib/tests/test_config.py'
177--- bzrlib/tests/test_config.py 2012-03-14 09:30:48 +0000
178+++ bzrlib/tests/test_config.py 2012-03-28 15:52:30 +0000
179@@ -2860,6 +2860,20 @@
180 store.save()
181 self.assertEquals(False, self.has_store(store))
182
183+ def test_mutable_section_shared(self):
184+ store = self.get_store(self)
185+ store._load_from_string('foo=bar\n')
186+ # FIXME: There should be a better way than relying on the test
187+ # parametrization to identify branch.conf -- vila 2011-0526
188+ if self.store_id in ('branch', 'remote_branch'):
189+ # branch stores requires write locked branches
190+ self.addCleanup(store.branch.lock_write().unlock)
191+ section1 = store.get_mutable_section(None)
192+ section2 = store.get_mutable_section(None)
193+ # If we get different sections, different callers won't share the
194+ # modification
195+ self.assertIs(section1, section2)
196+
197 def test_save_emptied_succeeds(self):
198 store = self.get_store(self)
199 store._load_from_string('foo=bar\n')
200
201=== modified file 'bzrlib/tests/test_merge_core.py'
202--- bzrlib/tests/test_merge_core.py 2012-02-23 23:26:35 +0000
203+++ bzrlib/tests/test_merge_core.py 2012-03-28 15:52:30 +0000
204@@ -43,6 +43,7 @@
205
206
207 class MergeBuilder(object):
208+
209 def __init__(self, dir=None):
210 self.dir = osutils.mkdtemp(prefix="merge-test", dir=dir)
211 self.tree_root = generate_ids.gen_root_id()
212
213=== modified file 'doc/developers/configuration.txt'
214--- doc/developers/configuration.txt 2012-01-28 15:37:27 +0000
215+++ doc/developers/configuration.txt 2012-03-28 15:52:30 +0000
216@@ -98,6 +98,13 @@
217 * convert the unicode string provided by the user into a suitable
218 representation (integer, list, etc).
219
220+If you start migrating a given option to the config stacks, don't stop
221+mid-way, all its uses should be covered (tests included). There are some
222+edge cases where updates via one API will be not be seen by the other API
223+(see http://pad.lv/948339 and http://pad.lv/948344 for details). Roughly,
224+the old API always trigger an IO while the new one cache values to avoid
225+them. This works fine as long as a given option is handled via a single API.
226+
227 Adding a new stack
228 ------------------
229
230
231=== modified file 'doc/en/release-notes/bzr-2.6.txt'
232--- doc/en/release-notes/bzr-2.6.txt 2012-03-16 15:05:05 +0000
233+++ doc/en/release-notes/bzr-2.6.txt 2012-03-28 15:52:30 +0000
234@@ -117,6 +117,9 @@
235 * Fix ``bzr config`` display for ``RegistryOption`` values.
236 (Vincent Ladeuil, #930182)
237
238+* Option values set on locked branches should be saved only when the branch
239+ is finally unlocked. (Vincent Ladeuil, #948339)
240+
241 Documentation
242 *************
243