Merge lp:~vila/bzr/948339-config-caching into lp:bzr

Proposed by Vincent Ladeuil on 2012-03-13
Status: Merged
Approved by: Martin Packman on 2012-03-16
Approved revision: 6501
Merged at revision: 6512
Proposed branch: lp:~vila/bzr/948339-config-caching
Merge into: lp:bzr
Diff against target: 242 lines (+61/-24)
8 files modified
bzrlib/branch.py (+3/-3)
bzrlib/config.py (+14/-11)
bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py (+4/-1)
bzrlib/tests/test_branch.py (+15/-9)
bzrlib/tests/test_config.py (+14/-0)
bzrlib/tests/test_merge_core.py (+1/-0)
doc/developers/configuration.txt (+7/-0)
doc/en/release-notes/bzr-2.6.txt (+3/-0)
To merge this branch: bzr merge lp:~vila/bzr/948339-config-caching
Reviewer Review Type Date Requested Status
Martin Packman (community) 2012-03-13 Approve on 2012-03-16
Review via email: mp+97256@code.launchpad.net

Commit message

Properly share mutable config sections and save the branch config only during the final unlock

Description of the change

This fix two bugs in the config stack API:

1) the branch config was saved (if changes have been made) for every
   unlock(). This now happens only for the highest level lock.

2) mutable sections weren't properly shared

On top of this proposal I did two experiments to purge the cached config on
unlock():

1) Define a _clear_cached_state() method for BzrBranch adding a
   self.conf_store_unload().

Chaos ensued, more than 130 tests failed. _clear_cached_state() cannot be
used to unload the config store because it's used in specific ways that go
beyond what can be deduced from its name (see last_revision_info() and
_set_revision_history() for examples :-/)

More concretely, _clear_cached_state() is not a general clear-branch-state
method so it can't be used here.

2) calls self.conf_store.unload() just after save_changes() in unlock().

All test passed *but* using -Econfig_stats, we went, for the whole test
suite from config.load : 54487 to config.load : 114282, roughly doubling
branch.conf accesses. This is caused by all the methods decorated with
@needs_read_lock trigerring a config purge which of course goes against the
aim of reducing the IOs.

Wrong idea IMHO.

So unlocked branch objects will keep their config cached specifically to
cover the use case of unlocked branch objects.

Fixing this requires switching to a model where we never use unlocked
branches which in itself is a really big change which may be valuable but
which has so many fallouts that I don't think we want to tackle it for now.

To post a comment you must log in.
Martin Packman (gz) wrote :

Summarising from discussion on IRC yesterday,

The lock count change doesn't seem to be directly asserted by test_set_delays_write. Checking now though the test does in fact fail without the fix, as does test_set_from_config_stack_get_from_config in a similar manner, so that's fine.

Flipping the expectedFailure assertions where the behaviour isn't worth fixing is a little odd, though the added reasoning is nice. Ideally, it would live not here but somewhere that people trying to use config and running into problems might actually find.

The added test_mutable_section_shared is nice and easy to follow, so even though I find the list->dict fixes slightly mysterious without working out more about the config details, I'm happy with the change.

review: Approve
Vincent Ladeuil (vila) wrote :

> Summarising from discussion on IRC yesterday,
>
> The lock count change doesn't seem to be directly asserted by
> test_set_delays_write. Checking now though the test does in fact fail without
> the fix, as does test_set_from_config_stack_get_from_config in a similar
> manner, so that's fine.

Yeah, the more I think about remarks like this (or one I do myself during reviews), the more I believe I should keep more detailed notes while working on the fix and have a way to mention the narrative in the cover letter... You rightly understand how I proceeded, well done Watson ;) Err, Sherlock ? ;-p

>
> Flipping the expectedFailure assertions where the behaviour isn't worth fixing
> is a little odd,

It's more than 'not worth fixing', it's really that trying to maintain compatibility between the two APIs can't be done while still reducing the IOs. The old API *really* relies in many ways on the fact that each and every change is saved to disk. No matter how you use the old API, you will never miss a change: from different objects in the same process, from different processes, everything works. This came with a price: read the config file for each and every option. Read *and* write the config file for each and every change.

The new implementation already dropped some possible optimisations to ease the transition and is safe (AFAIK) to mix with the old one *as long* as one API only is used for a given option.

So dropping the IO reduction for compatibility sake ? I say no. Some migrations will be harder than others but well, they'll have to happen, so the sooner the better.

> though the added reasoning is nice. Ideally, it would live
> not here but somewhere that people trying to use config and running into
> problems might actually find.

Should I refer to these tests in doc/developers/configuration.txt then ?

>
> The added test_mutable_section_shared is nice and easy to follow, so even
> though I find the list->dict fixes slightly mysterious without working out
> more about the config details, I'm happy with the change.

Ok. I'll add a comment to better explain why it's important that MutableSection should stay unique to be truly shared in the __init__ method then, would that have helped ?

Martin Packman (gz) wrote :

> So dropping the IO reduction for compatibility sake ? I say no. Some
> migrations will be harder than others but well, they'll have to happen, so the
> sooner the better.
>
> > though the added reasoning is nice. Ideally, it would live
> > not here but somewhere that people trying to use config and running into
> > problems might actually find.
>
> Should I refer to these tests in doc/developers/configuration.txt then ?

Not the tests, but something that would have helped Aaron when he was wondering why the things these tests exercise don't work would be good. A general migration thing with warnings about the pitfalls would be nice, currently that doc just lists some some ai changes without further advice.

> Ok. I'll add a comment to better explain why it's important that
> MutableSection should stay unique to be truly shared in the __init__ method
> then, would that have helped ?

Maybe a little, my issue was mostly that I'm not familiar with what the code is doing.

lp:~vila/bzr/948339-config-caching updated on 2012-03-28
6502. By Vincent Ladeuil on 2012-03-28

Clarify the dict usage to share dirty sections.

Vincent Ladeuil (vila) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/branch.py'
--- bzrlib/branch.py 2012-03-14 15:26:01 +0000
+++ bzrlib/branch.py 2012-03-28 15:52:30 +0000
@@ -968,8 +968,8 @@
968 This means the next call to revision_history will need to call968 This means the next call to revision_history will need to call
969 _gen_revision_history.969 _gen_revision_history.
970970
971 This API is semi-public; it only for use by subclasses, all other code971 This API is semi-public; it is only for use by subclasses, all other
972 should consider it to be private.972 code should consider it to be private.
973 """973 """
974 self._revision_history_cache = None974 self._revision_history_cache = None
975 self._revision_id_to_revno_cache = None975 self._revision_id_to_revno_cache = None
@@ -2495,7 +2495,7 @@
24952495
2496 @only_raises(errors.LockNotHeld, errors.LockBroken)2496 @only_raises(errors.LockNotHeld, errors.LockBroken)
2497 def unlock(self):2497 def unlock(self):
2498 if self.conf_store is not None:2498 if self.control_files._lock_count == 1 and self.conf_store is not None:
2499 self.conf_store.save_changes()2499 self.conf_store.save_changes()
2500 try:2500 try:
2501 self.control_files.unlock()2501 self.control_files.unlock()
25022502
=== modified file 'bzrlib/config.py'
--- bzrlib/config.py 2012-03-14 09:30:48 +0000
+++ bzrlib/config.py 2012-03-28 15:52:30 +0000
@@ -2956,7 +2956,7 @@
2956 # Report concurrent updates in an ad-hoc way. This should only2956 # Report concurrent updates in an ad-hoc way. This should only
2957 # occurs when different processes try to update the same option2957 # occurs when different processes try to update the same option
2958 # which is not supported (as in: the config framework is not meant2958 # which is not supported (as in: the config framework is not meant
2959 # to be used a sharing mechanism).2959 # to be used as a sharing mechanism).
2960 if expected != reloaded:2960 if expected != reloaded:
2961 if actual is _DeletedOption:2961 if actual is _DeletedOption:
2962 actual = '<DELETED>'2962 actual = '<DELETED>'
@@ -2982,8 +2982,9 @@
2982 mutable_section_class = MutableSection2982 mutable_section_class = MutableSection
29832983
2984 def __init__(self):2984 def __init__(self):
2985 # Which sections need to be saved2985 # Which sections need to be saved (by section id). We use a dict here
2986 self.dirty_sections = []2986 # so the dirty sections can be shared by multiple callers.
2987 self.dirty_sections = {}
29872988
2988 def is_loaded(self):2989 def is_loaded(self):
2989 """Returns True if the Store has been loaded.2990 """Returns True if the Store has been loaded.
@@ -3032,7 +3033,7 @@
3032 raise NotImplementedError(self.save)3033 raise NotImplementedError(self.save)
30333034
3034 def _need_saving(self):3035 def _need_saving(self):
3035 for s in self.dirty_sections:3036 for s in self.dirty_sections.values():
3036 if s.orig:3037 if s.orig:
3037 # At least one dirty section contains a modification3038 # At least one dirty section contains a modification
3038 return True3039 return True
@@ -3052,11 +3053,11 @@
3052 # get_mutable_section() call below.3053 # get_mutable_section() call below.
3053 self.unload()3054 self.unload()
3054 # Apply the changes from the preserved dirty sections3055 # Apply the changes from the preserved dirty sections
3055 for dirty in dirty_sections:3056 for section_id, dirty in dirty_sections.iteritems():
3056 clean = self.get_mutable_section(dirty.id)3057 clean = self.get_mutable_section(section_id)
3057 clean.apply_changes(dirty, self)3058 clean.apply_changes(dirty, self)
3058 # Everything is clean now3059 # Everything is clean now
3059 self.dirty_sections = []3060 self.dirty_sections = {}
30603061
3061 def save_changes(self):3062 def save_changes(self):
3062 """Saves the Store to persistent storage if changes occurred.3063 """Saves the Store to persistent storage if changes occurred.
@@ -3142,7 +3143,7 @@
31423143
3143 def unload(self):3144 def unload(self):
3144 self._config_obj = None3145 self._config_obj = None
3145 self.dirty_sections = []3146 self.dirty_sections = {}
31463147
3147 def _load_content(self):3148 def _load_content(self):
3148 """Load the config file bytes.3149 """Load the config file bytes.
@@ -3196,8 +3197,7 @@
3196 if not self._need_saving():3197 if not self._need_saving():
3197 return3198 return
3198 # Preserve the current version3199 # Preserve the current version
3199 current = self._config_obj3200 dirty_sections = dict(self.dirty_sections.items())
3200 dirty_sections = list(self.dirty_sections)
3201 self.apply_changes(dirty_sections)3201 self.apply_changes(dirty_sections)
3202 # Save to the persistent storage3202 # Save to the persistent storage
3203 self.save()3203 self.save()
@@ -3238,13 +3238,16 @@
3238 except errors.NoSuchFile:3238 except errors.NoSuchFile:
3239 # The file doesn't exist, let's pretend it was empty3239 # The file doesn't exist, let's pretend it was empty
3240 self._load_from_string('')3240 self._load_from_string('')
3241 if section_id in self.dirty_sections:
3242 # We already created a mutable section for this id
3243 return self.dirty_sections[section_id]
3241 if section_id is None:3244 if section_id is None:
3242 section = self._config_obj3245 section = self._config_obj
3243 else:3246 else:
3244 section = self._config_obj.setdefault(section_id, {})3247 section = self._config_obj.setdefault(section_id, {})
3245 mutable_section = self.mutable_section_class(section_id, section)3248 mutable_section = self.mutable_section_class(section_id, section)
3246 # All mutable sections can become dirty3249 # All mutable sections can become dirty
3247 self.dirty_sections.append(mutable_section)3250 self.dirty_sections[section_id] = mutable_section
3248 return mutable_section3251 return mutable_section
32493252
3250 def quote(self, value):3253 def quote(self, value):
32513254
=== modified file 'bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py'
--- bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py 2012-01-27 22:18:07 +0000
+++ bzrlib/plugins/changelog_merge/tests/test_changelog_merge.py 2012-03-28 15:52:30 +0000
@@ -186,7 +186,10 @@
186 base_text, True)186 base_text, True)
187 builder.change_contents('clog-id', other=other_text, this=this_text)187 builder.change_contents('clog-id', other=other_text, this=this_text)
188 merger = builder.make_merger(merge.Merge3Merger, ['clog-id'])188 merger = builder.make_merger(merge.Merge3Merger, ['clog-id'])
189 merger.this_branch.get_config_stack().set(189 # The following can't use config stacks until the plugin itself does
190 # ('this_branch' is already write locked at this point and as such
191 # won't write the new value to disk where get_user_option can get it).
192 merger.this_branch.get_config().set_user_option(
190 'changelog_merge_files', 'ChangeLog')193 'changelog_merge_files', 'ChangeLog')
191 merge_hook_params = merge.MergeFileHookParams(merger, 'clog-id', None,194 merge_hook_params = merge.MergeFileHookParams(merger, 'clog-id', None,
192 'file', 'file', 'conflict')195 'file', 'file', 'conflict')
193196
=== modified file 'bzrlib/tests/test_branch.py'
--- bzrlib/tests/test_branch.py 2012-03-14 15:26:01 +0000
+++ bzrlib/tests/test_branch.py 2012-03-28 15:52:30 +0000
@@ -661,10 +661,13 @@
661 finally:661 finally:
662 copy.unlock()662 copy.unlock()
663 self.assertFalse(self.branch.is_locked())663 self.assertFalse(self.branch.is_locked())
664 result = self.branch.get_config_stack().get('foo')664 # Since the branch is locked, the option value won't be saved on disk
665 # Bug: https://bugs.launchpad.net/bzr/+bug/948339665 # so trying to access the config of locked branch via another older
666 self.expectFailure('Unlocked branches cache their configs',666 # non-locked branch object pointing to the same branch is not supported
667 self.assertEqual, 'bar', result)667 self.assertEqual(None, self.branch.get_config_stack().get('foo'))
668 # Using a newly created branch object works as expected
669 fresh = _mod_branch.Branch.open(self.branch.base)
670 self.assertEqual('bar', fresh.get_config_stack().get('foo'))
668671
669 def test_set_from_config_get_from_config_stack(self):672 def test_set_from_config_get_from_config_stack(self):
670 self.branch.lock_write()673 self.branch.lock_write()
@@ -679,18 +682,21 @@
679 self.branch.lock_write()682 self.branch.lock_write()
680 self.addCleanup(self.branch.unlock)683 self.addCleanup(self.branch.unlock)
681 self.branch.get_config_stack().set('foo', 'bar')684 self.branch.get_config_stack().set('foo', 'bar')
682 self.assertEqual('bar',685 # Since the branch is locked, the option value won't be saved on disk
686 # so mixing get() and get_user_option() is broken by design.
687 self.assertEqual(None,
683 self.branch.get_config().get_user_option('foo'))688 self.branch.get_config().get_user_option('foo'))
684689
685 def test_set_delays_write(self):690 def test_set_delays_write_when_branch_is_locked(self):
686 self.branch.lock_write()691 self.branch.lock_write()
687 self.addCleanup(self.branch.unlock)692 self.addCleanup(self.branch.unlock)
688 self.branch.get_config_stack().set('foo', 'bar')693 self.branch.get_config_stack().set('foo', 'bar')
689 copy = _mod_branch.Branch.open(self.branch.base)694 copy = _mod_branch.Branch.open(self.branch.base)
690 result = copy.get_config_stack().get('foo')695 result = copy.get_config_stack().get('foo')
691 # Bug: https://bugs.launchpad.net/bzr/+bug/948339696 # Accessing from a different branch object is like accessing from a
692 self.expectFailure("Config writes are not cached.", self.assertIs,697 # different process: the option has not been saved yet and the new
693 None, result)698 # value cannot be seen.
699 self.assertIs(None, result)
694700
695701
696class TestPullResult(tests.TestCase):702class TestPullResult(tests.TestCase):
697703
=== modified file 'bzrlib/tests/test_config.py'
--- bzrlib/tests/test_config.py 2012-03-14 09:30:48 +0000
+++ bzrlib/tests/test_config.py 2012-03-28 15:52:30 +0000
@@ -2860,6 +2860,20 @@
2860 store.save()2860 store.save()
2861 self.assertEquals(False, self.has_store(store))2861 self.assertEquals(False, self.has_store(store))
28622862
2863 def test_mutable_section_shared(self):
2864 store = self.get_store(self)
2865 store._load_from_string('foo=bar\n')
2866 # FIXME: There should be a better way than relying on the test
2867 # parametrization to identify branch.conf -- vila 2011-0526
2868 if self.store_id in ('branch', 'remote_branch'):
2869 # branch stores requires write locked branches
2870 self.addCleanup(store.branch.lock_write().unlock)
2871 section1 = store.get_mutable_section(None)
2872 section2 = store.get_mutable_section(None)
2873 # If we get different sections, different callers won't share the
2874 # modification
2875 self.assertIs(section1, section2)
2876
2863 def test_save_emptied_succeeds(self):2877 def test_save_emptied_succeeds(self):
2864 store = self.get_store(self)2878 store = self.get_store(self)
2865 store._load_from_string('foo=bar\n')2879 store._load_from_string('foo=bar\n')
28662880
=== modified file 'bzrlib/tests/test_merge_core.py'
--- bzrlib/tests/test_merge_core.py 2012-02-23 23:26:35 +0000
+++ bzrlib/tests/test_merge_core.py 2012-03-28 15:52:30 +0000
@@ -43,6 +43,7 @@
4343
4444
45class MergeBuilder(object):45class MergeBuilder(object):
46
46 def __init__(self, dir=None):47 def __init__(self, dir=None):
47 self.dir = osutils.mkdtemp(prefix="merge-test", dir=dir)48 self.dir = osutils.mkdtemp(prefix="merge-test", dir=dir)
48 self.tree_root = generate_ids.gen_root_id()49 self.tree_root = generate_ids.gen_root_id()
4950
=== modified file 'doc/developers/configuration.txt'
--- doc/developers/configuration.txt 2012-01-28 15:37:27 +0000
+++ doc/developers/configuration.txt 2012-03-28 15:52:30 +0000
@@ -98,6 +98,13 @@
98* convert the unicode string provided by the user into a suitable98* convert the unicode string provided by the user into a suitable
99 representation (integer, list, etc).99 representation (integer, list, etc).
100100
101If you start migrating a given option to the config stacks, don't stop
102mid-way, all its uses should be covered (tests included). There are some
103edge cases where updates via one API will be not be seen by the other API
104(see http://pad.lv/948339 and http://pad.lv/948344 for details). Roughly,
105the old API always trigger an IO while the new one cache values to avoid
106them. This works fine as long as a given option is handled via a single API.
107
101Adding a new stack108Adding a new stack
102------------------109------------------
103110
104111
=== modified file 'doc/en/release-notes/bzr-2.6.txt'
--- doc/en/release-notes/bzr-2.6.txt 2012-03-16 15:05:05 +0000
+++ doc/en/release-notes/bzr-2.6.txt 2012-03-28 15:52:30 +0000
@@ -117,6 +117,9 @@
117* Fix ``bzr config`` display for ``RegistryOption`` values.117* Fix ``bzr config`` display for ``RegistryOption`` values.
118 (Vincent Ladeuil, #930182)118 (Vincent Ladeuil, #930182)
119119
120* Option values set on locked branches should be saved only when the branch
121 is finally unlocked. (Vincent Ladeuil, #948339)
122
120Documentation123Documentation
121*************124*************
122125