Merge lp:~spiv/bzr/chk-canonicalizer-613698 into lp:bzr

Proposed by Andrew Bennetts on 2010-08-11
Status: Merged
Approved by: Andrew Bennetts on 2010-08-16
Approved revision: 5381
Merged at revision: 5377
Proposed branch: lp:~spiv/bzr/chk-canonicalizer-613698
Merge into: lp:bzr
Diff against target: 322 lines (+199/-10)
4 files modified
bzrlib/builtins.py (+8/-2)
bzrlib/reconcile.py (+26/-7)
bzrlib/repofmt/groupcompress_repo.py (+125/-1)
bzrlib/versionedfile.py (+40/-0)
To merge this branch: bzr merge lp:~spiv/bzr/chk-canonicalizer-613698
Reviewer        Review Type    Date Requested    Status
bzr-core                       2010-08-11        Pending
Review via email: mp+32294@code.launchpad.net

Commit Message

Add hidden option 'bzr reconcile --canonicalize-chks' to repair repos affected by bug 522637.

Description of the Change

This adds 'bzr reconcile --canonicalize-chks' as a hidden option to repair repositories affected by bug 522637. The basic approach is like John's bzr-check-chk plugin: regenerate the CHK map from scratch and verify that the key is the same.

Because so few people will ever run this code (not many repositories seem to have been adversely affected by bug 522637), in principle I'd have been happy to have this as a plugin, where its rough edges wouldn't grate. But it would have required a bit more duplication of code than I'd like (although now it's done, a bit less than I was expecting, I think). So I wrote this as part of the reconcile infrastructure in the core tool, but left it as a hidden option.

I didn't integrate it into the regular reconcile code path because a) it's already 2x slower than regular reconcile, even without doing the other things reconcile does, and b) it's going to be so rarely used.

The use of NoDupeAddLinesDecorator is not strictly necessary, but it avoids exploding a 5MB repo of lp:do into over 200MB. That is fixed by repacking, but that sort of waste might have made this repair tool impossible to run on a larger repository.

This takes about 3 minutes to fix the broken lp:do repository on my laptop.
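The basic approach (regenerate the CHK map from scratch and compare keys) can be modelled in miniature. This is only an illustrative sketch, not bzrlib's actual CHKMap API: the real map is a hash trie whose root key depends on how nodes were split, which is exactly why an incremental update could produce a different (non-canonical) root key than a from-scratch build.

```python
import hashlib

def chk_key(entries):
    # Content key: hash of a canonical serialisation of the map.
    payload = "\n".join("%s\x00%s" % item for item in sorted(entries.items()))
    return "sha1:" + hashlib.sha1(payload.encode("utf-8")).hexdigest()

def is_canonical(stored_root_key, entries):
    # Regenerate the map from scratch and compare root keys, which is
    # the check the canonicalizer performs per inventory.
    return stored_root_key == chk_key(entries)

entries = {"file-id-1": "entry-1", "file-id-2": "entry-2"}
good = chk_key(entries)
assert is_canonical(good, entries)
# A root key recorded from a badly-structured map no longer matches:
assert not is_canonical("sha1:stale", entries)
```

In the real repair, a mismatch means the freshly built map (and the inventory text referencing it) is written out in place of the old one.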

John A Meinel (jameinel) wrote :

Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/chk-canonicalizer-613698 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #613698 need reconcile fixer for bug 522637 "missing referenced chk root keys"
> https://bugs.launchpad.net/bugs/613698
>
> [...]

- - repo_reconciler = self.repo.reconcile(thorough=True)
+ if self.canonicalize_chks:
+ repo_reconciler = self.repo.reconcile_canonicalize_chks()
+ else:
+ repo_reconciler = self.repo.reconcile(thorough=True)

^- Why use a new function rather than passing the parameter through?

I know there is a design principle that says don't add a parameter that
is static for callers (always True, always False for a given caller).
This isn't quite the same, since a user is selecting it.

Anyway, it *felt* to me more like a parameter to .reconcile()
(especially since it is a parameter up the stack).

The other thing I'm not sure about, it looks like you added:
+ @needs_write_lock
+ def reconcile_canonicalize_chks(self):

only to one repository type, which means you'd get an attribute error
running reconcile --canonicalize-chks on another repo type. But maybe I
misread that. (Better to just raise a BzrCommandError, I guess)

+class NoDupeAddLinesDecorator(object):

^- Is this actually necessary? I would have sworn I had a patch earlier
that did something exactly like this, because I saw conversions (I
think) waste tons of space that disappeared at repack. Then again, maybe
that was in InterDifferingSerializer or something along those lines.

Anyway, I think it might be reasonable to just push this as the common
case, rather than needing a wrapper. Certainly there you might be able
to do it better than a get_parent_map() call.

John
=:->

John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
[...]

btw, if we wanted to make this faster...

I think we could rework it so that the initial inventory is produced
from scratch, and then all others just use apply_inventory_delta.
(Possibly special cased if it actually finds an incorrect inventory.)

The total shape of the inventory is correct (otherwise all of this
wouldn't work anyway), and the deltas are correct. As such, it would
require a lot less regeneration of pages that are already correct.

That would make it possible to run it on all reconcile runs, rather than
requiring a flag.

John
=:->

Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> - - repo_reconciler = self.repo.reconcile(thorough=True)
> + if self.canonicalize_chks:
> + repo_reconciler = self.repo.reconcile_canonicalize_chks()
> + else:
> + repo_reconciler = self.repo.reconcile(thorough=True)
>
> ^- Why use a new function rather than passing the parameter through?
>
> I know there is a design principle that says don't add a parameter that
> is static for callers (always True, always False for a given caller).
> This isn't quite the same, since a user is selecting it.

The reason was that I didn't want to add it to all implementations of
reconcile, which felt a bit wrong considering it only applies to one
repository type.

> Anyway, it *felt* to me more like a parameter to .reconcile()
> (especially since it is a parameter up the stack).
>
> The other thing I'm not sure about, it looks like you added:
> + @needs_write_lock
> + def reconcile_canonicalize_chks(self):
>
> only to one repository type, which means you'd get an attribute error
> running reconcile --canonicalize-chks on another repo type. But maybe I
> misread that. (Better to just raise a BzrCommandError, I guess)

Yeah, I'll make cmd_reconcile check the format and raise that.

> +class NoDupeAddLinesDecorator(object):
>
> ^- Is this actually necessary. I would have sworn I had a patch earlier
> that did something exactly like this, because I saw conversions (I
> think) waste tons of space that disappeared at repack. Then again, maybe
> that was in InterDifferingSerializer or something along those lines.

Yep, it's necessary. 200MB vs. 5MB!

> Anyway, I think it might be reasonable to just push this as the common
> case, rather than needing a wrapper. Certainly there you might be able
> to do it better than a get_parent_map() call.

Yeah, quite possibly. This implementation does have the cost of
querying the index for 1 key every add_lines, which would be pretty bad
for performance where latency is a concern. But I'm also a bit
reluctant to make add_lines take any more parameters...
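The trade-off described here, one get_parent_map() index query per add_lines call in exchange for never re-inserting a record, can be sketched with a toy store. ToyStore and its methods are hypothetical stand-ins, not bzrlib's VersionedFiles API; only the decorator's shape mirrors the patch.

```python
import hashlib

class ToyStore:
    """Minimal stand-in for a VersionedFiles-like store (hypothetical)."""
    def __init__(self):
        self._texts = {}
        self.adds = 0  # count real insertions, to show dupes are skipped
    def get_parent_map(self, keys):
        # Index query: which of these keys are already present?
        return {k: () for k in keys if k in self._texts}
    def add_lines(self, key, parents, lines):
        self.adds += 1
        self._texts[key] = lines
        sha1 = hashlib.sha1(b"".join(lines)).hexdigest()
        return sha1, sum(map(len, lines)), None

class NoDupeAddLines:
    """Skip add_lines when the key is already present (cf. the patch)."""
    def __init__(self, store):
        self._store = store
    def add_lines(self, key, parents, lines):
        # One index lookup per add_lines: the latency cost noted above.
        if key in self._store.get_parent_map([key]):
            sha1 = hashlib.sha1(b"".join(lines)).hexdigest()
            return sha1, sum(map(len, lines)), None
        return self._store.add_lines(key, parents, lines)

store = ToyStore()
vf = NoDupeAddLines(store)
vf.add_lines(("chk-root-1",), (), [b"node\n"])
vf.add_lines(("chk-root-1",), (), [b"node\n"])  # duplicate: skipped
assert store.adds == 1
```

Batching the presence checks (or keeping a set of keys inserted so far) would amortise the lookup without widening the add_lines signature.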

-Andrew.

Andrew Bennetts (spiv) wrote :

John A Meinel wrote:
[...]
> btw, if we wanted to make this faster...
>
> I think we could rework it so that the initial inventory is produced
> from scratch, and then all others just use apply_inventory_delta.
> (Possibly special cased if it actually finds an incorrect inventory.)

That's a good point (although it relies on apply_inventory_delta being
as reliable as CHKMap.from_dict, which was the bug in the first place,
but that seems reasonable).

I'd prefer to stop spending time on this, though :) The number of
affected repositories, as judged from the bug report, seems to be pretty
low, which makes sense given the fairly particular conditions needed to
trigger it (deleting < 5 entries from an inventory that is just over the
size that splits it from one CHK node to many). So for now I think it's
good enough to get this relatively crude repair tool in the hands of
the few users that need it. We can always come back and improve it
later if there's a reason to do that.

-Andrew.

John A Meinel (jameinel) wrote :


Andrew Bennetts wrote:
> John A Meinel wrote:
> [...]
>> btw, if we wanted to make this faster...
>>
>> I think we could rework it so that the initial inventory is produced
>> from scratch, and then all others just use apply_inventory_delta.
>> (Possibly special cased if it actually finds an incorrect inventory.)
>
> That's a good point (although it relies on apply_inventory_delta being
> as reliable as CHKMap.from_dict, which was the bug in the first place,
> but that seems reasonable).
>
> I'd prefer to stop spending time on this, though :) The number of
> affected repositories, as judged from the bug report, seems to be pretty
> low, which makes sense given the fairly particular conditions needed to
> trigger it (deleting < 5 entries from an inventory that is just over the
> size that splits it from one CHK node to many). So for now I think it's
> good enough to get this relatively crude repair tool in the hands of
> the few users that need it. We can always come back and improve it
> later if there's a reason to do that.
>
> -Andrew.
>

Actually, it is

if delete_Count > _INTERESTING ...:

so it is deleting *more* than 5 entries from an inventory that causes it
to cross the boundary.

Which just means anything that deletes a lot of entries, enough to cross
the threshold.

That said, I don't know that you need to spend a lot more time on this.
Ideally we would also have 'bzr check' notice it, though.

John
=:->

5381. By Andrew Bennetts on 2010-08-12

Don't traceback if a repository doesn't support reconcile_canonicalize_chks.

Andrew Bennetts (spiv) wrote :

John wrote:
> That said, I don't know that you need to spend a lot more time on this.

I'm going to go ahead and merge this then. I hope I'm not taking too many liberties by doing so, but I'm confident we can incrementally improve this after landing if we need to, and it'll be of more benefit landed than not.

> Ideally we would also have 'bzr check' notice it, though.

I agree. At the moment I'm not planning to spend time on that.

Andrew Bennetts (spiv) wrote :

sent to pqm by email

Preview Diff

=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py  2010-08-07 20:31:22 +0000
+++ bzrlib/builtins.py  2010-08-12 04:26:01 +0000
@@ -1592,11 +1592,17 @@
 
     _see_also = ['check']
     takes_args = ['branch?']
+    takes_options = [
+        Option('canonicalize-chks',
+               help='Make sure CHKs are in canonical form (repairs '
+                    'bug 522637).',
+               hidden=True),
+        ]
 
-    def run(self, branch="."):
+    def run(self, branch=".", canonicalize_chks=False):
         from bzrlib.reconcile import reconcile
         dir = bzrdir.BzrDir.open(branch)
-        reconcile(dir)
+        reconcile(dir, canonicalize_chks=canonicalize_chks)
 
 
 class cmd_revision_history(Command):
 
=== modified file 'bzrlib/reconcile.py'
--- bzrlib/reconcile.py  2010-05-20 18:23:17 +0000
+++ bzrlib/reconcile.py  2010-08-12 04:26:01 +0000
@@ -36,7 +36,7 @@
 from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
 
 
-def reconcile(dir, other=None):
+def reconcile(dir, canonicalize_chks=False):
     """Reconcile the data in dir.
 
     Currently this is limited to a inventory 'reweave'.
@@ -46,18 +46,19 @@
     Directly using Reconciler is recommended for library users that
     desire fine grained control or analysis of the found issues.
 
-    :param other: another bzrdir to reconcile against.
+    :param canonicalize_chks: Make sure CHKs are in canonical form.
     """
-    reconciler = Reconciler(dir, other=other)
+    reconciler = Reconciler(dir, canonicalize_chks=canonicalize_chks)
     reconciler.reconcile()
 
 
 class Reconciler(object):
     """Reconcilers are used to reconcile existing data."""
 
-    def __init__(self, dir, other=None):
+    def __init__(self, dir, other=None, canonicalize_chks=False):
         """Create a Reconciler."""
         self.bzrdir = dir
+        self.canonicalize_chks = canonicalize_chks
 
     def reconcile(self):
         """Perform reconciliation.
@@ -98,7 +99,15 @@
         ui.ui_factory.note('Reconciling repository %s' %
             self.repo.user_url)
         self.pb.update("Reconciling repository", 0, 1)
-        repo_reconciler = self.repo.reconcile(thorough=True)
+        if self.canonicalize_chks:
+            try:
+                self.repo.reconcile_canonicalize_chks
+            except AttributeError:
+                raise errors.BzrError(
+                    "%s cannot canonicalize CHKs." % (self.repo,))
+            repo_reconciler = self.repo.reconcile_canonicalize_chks()
+        else:
+            repo_reconciler = self.repo.reconcile(thorough=True)
         self.inconsistent_parents = repo_reconciler.inconsistent_parents
         self.garbage_inventories = repo_reconciler.garbage_inventories
         if repo_reconciler.aborted:
@@ -496,6 +505,12 @@
         # - unlock the names list
         # https://bugs.launchpad.net/bzr/+bug/154173
 
+    def __init__(self, repo, other=None, thorough=False,
+            canonicalize_chks=False):
+        super(PackReconciler, self).__init__(repo, other=other,
+            thorough=thorough)
+        self.canonicalize_chks = canonicalize_chks
+
     def _reconcile_steps(self):
         """Perform the steps to reconcile this repository."""
         if not self.thorough:
@@ -509,8 +524,12 @@
         total_inventories = len(list(
             collection.inventory_index.combined_index.iter_all_entries()))
         if len(all_revisions):
-            new_pack = self.repo._reconcile_pack(collection, packs,
-                ".reconcile", all_revisions, self.pb)
+            if self.canonicalize_chks:
+                reconcile_meth = self.repo._canonicalize_chks_pack
+            else:
+                reconcile_meth = self.repo._reconcile_pack
+            new_pack = reconcile_meth(collection, packs, ".reconcile",
+                all_revisions, self.pb)
             if new_pack is not None:
                 self._discard_and_save(packs)
         else:
 
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py  2010-05-14 13:25:05 +0000
+++ bzrlib/repofmt/groupcompress_repo.py  2010-08-12 04:26:01 +0000
@@ -32,11 +32,13 @@
     revision as _mod_revision,
     trace,
     ui,
+    versionedfile,
     )
 from bzrlib.btree_index import (
     BTreeGraphIndex,
     BTreeBuilder,
     )
+from bzrlib.decorators import needs_write_lock
 from bzrlib.groupcompress import (
     _GCGraphIndex,
     GroupCompressVersionedFiles,
@@ -425,8 +427,11 @@
         self._copy_stream(source_vf, target_vf, inventory_keys,
                           'inventories', self._get_filtered_inv_stream, 2)
 
+    def _get_chk_vfs_for_copy(self):
+        return self._build_vfs('chk', False, False)
+
     def _copy_chk_texts(self):
-        source_vf, target_vf = self._build_vfs('chk', False, False)
+        source_vf, target_vf = self._get_chk_vfs_for_copy()
         # TODO: This is technically spurious... if it is a performance issue,
         # remove it
         total_keys = source_vf.keys()
@@ -578,6 +583,111 @@
         return new_pack.data_inserted() and self._data_changed
 
 
+class GCCHKCanonicalizingPacker(GCCHKPacker):
+    """A packer that ensures inventories have canonical-form CHK maps.
+
+    Ideally this would be part of reconcile, but it's very slow and rarely
+    needed.  (It repairs repositories affected by
+    https://bugs.launchpad.net/bzr/+bug/522637).
+    """
+
+    def __init__(self, *args, **kwargs):
+        super(GCCHKCanonicalizingPacker, self).__init__(*args, **kwargs)
+        self._data_changed = False
+
+    def _exhaust_stream(self, source_vf, keys, message, vf_to_stream, pb_offset):
+        """Create and exhaust a stream, but don't insert it.
+
+        This is useful to get the side-effects of generating a stream.
+        """
+        self.pb.update('scanning %s' % (message,), pb_offset)
+        child_pb = ui.ui_factory.nested_progress_bar()
+        try:
+            list(vf_to_stream(source_vf, keys, message, child_pb))
+        finally:
+            child_pb.finished()
+
+    def _copy_inventory_texts(self):
+        source_vf, target_vf = self._build_vfs('inventory', True, True)
+        source_chk_vf, target_chk_vf = self._get_chk_vfs_for_copy()
+        inventory_keys = source_vf.keys()
+        # First, copy the existing CHKs on the assumption that most of them
+        # will be correct.  This will save us from having to reinsert (and
+        # recompress) these records later at the cost of perhaps preserving a
+        # few unused CHKs.
+        # (Iterate but don't insert _get_filtered_inv_stream to populate the
+        # variables needed by GCCHKPacker._copy_chk_texts.)
+        self._exhaust_stream(source_vf, inventory_keys, 'inventories',
+                self._get_filtered_inv_stream, 2)
+        GCCHKPacker._copy_chk_texts(self)
+        # Now copy and fix the inventories, and any regenerated CHKs.
+        def chk_canonicalizing_inv_stream(source_vf, keys, message, pb=None):
+            return self._get_filtered_canonicalizing_inv_stream(
+                source_vf, keys, message, pb, source_chk_vf, target_chk_vf)
+        self._copy_stream(source_vf, target_vf, inventory_keys,
+                          'inventories', chk_canonicalizing_inv_stream, 4)
+
+    def _copy_chk_texts(self):
+        # No-op; in this class this happens during _copy_inventory_texts.
+        pass
+
+    def _get_filtered_canonicalizing_inv_stream(self, source_vf, keys, message,
+            pb=None, source_chk_vf=None, target_chk_vf=None):
+        """Filter the texts of inventories, regenerating CHKs to make sure they
+        are canonical.
+        """
+        total_keys = len(keys)
+        target_chk_vf = versionedfile.NoDupeAddLinesDecorator(target_chk_vf)
+        def _filtered_inv_stream():
+            stream = source_vf.get_record_stream(keys, 'groupcompress', True)
+            search_key_name = None
+            for idx, record in enumerate(stream):
+                # Inventories should always be with revisions; assume success.
+                bytes = record.get_bytes_as('fulltext')
+                chk_inv = inventory.CHKInventory.deserialise(
+                    source_chk_vf, bytes, record.key)
+                if pb is not None:
+                    pb.update('inv', idx, total_keys)
+                chk_inv.id_to_entry._ensure_root()
+                if search_key_name is None:
+                    # Find the name corresponding to the search_key_func
+                    search_key_reg = chk_map.search_key_registry
+                    for search_key_name, func in search_key_reg.iteritems():
+                        if func == chk_inv.id_to_entry._search_key_func:
+                            break
+                canonical_inv = inventory.CHKInventory.from_inventory(
+                    target_chk_vf, chk_inv,
+                    maximum_size=chk_inv.id_to_entry._root_node._maximum_size,
+                    search_key_name=search_key_name)
+                if chk_inv.id_to_entry.key() != canonical_inv.id_to_entry.key():
+                    trace.mutter(
+                        'Non-canonical CHK map for id_to_entry of inv: %s '
+                        '(root is %s, should be %s)' % (chk_inv.revision_id,
+                        chk_inv.id_to_entry.key()[0],
+                        canonical_inv.id_to_entry.key()[0]))
+                    self._data_changed = True
+                p_id_map = chk_inv.parent_id_basename_to_file_id
+                p_id_map._ensure_root()
+                canon_p_id_map = canonical_inv.parent_id_basename_to_file_id
+                if p_id_map.key() != canon_p_id_map.key():
+                    trace.mutter(
+                        'Non-canonical CHK map for parent_id_to_basename of '
+                        'inv: %s (root is %s, should be %s)'
+                        % (chk_inv.revision_id, p_id_map.key()[0],
+                           canon_p_id_map.key()[0]))
+                    self._data_changed = True
+                yield versionedfile.ChunkedContentFactory(record.key,
+                        record.parents, record.sha1,
+                        canonical_inv.to_lines())
+            # We have finished processing all of the inventory records, we
+            # don't need these sets anymore
+        return _filtered_inv_stream()
+
+    def _use_pack(self, new_pack):
+        """Override _use_pack to check for reconcile having changed content."""
+        return new_pack.data_inserted() and self._data_changed
+
+
 class GCRepositoryPackCollection(RepositoryPackCollection):
 
     pack_factory = GCPack
@@ -999,10 +1109,24 @@
         finally:
             pb.finished()
 
+    @needs_write_lock
+    def reconcile_canonicalize_chks(self):
+        """Reconcile this repository to make sure all CHKs are in canonical
+        form.
+        """
+        from bzrlib.reconcile import PackReconciler
+        reconciler = PackReconciler(self, thorough=True, canonicalize_chks=True)
+        reconciler.reconcile()
+        return reconciler
+
     def _reconcile_pack(self, collection, packs, extension, revs, pb):
         packer = GCCHKReconcilePacker(collection, packs, extension)
         return packer.pack(pb)
 
+    def _canonicalize_chks_pack(self, collection, packs, extension, revs, pb):
+        packer = GCCHKCanonicalizingPacker(collection, packs, extension, revs)
+        return packer.pack(pb)
+
     def _get_source(self, to_format):
         """Return a source for streaming from this repository."""
         if self._format._serializer == to_format._serializer:
 
=== modified file 'bzrlib/versionedfile.py'
--- bzrlib/versionedfile.py  2010-08-11 01:27:46 +0000
+++ bzrlib/versionedfile.py  2010-08-12 04:26:01 +0000
@@ -1725,6 +1725,46 @@
         yield (l, key)
 
 
+class NoDupeAddLinesDecorator(object):
+    """Decorator for a VersionedFiles that skips doing an add_lines if the key
+    is already present.
+    """
+
+    def __init__(self, store):
+        self._store = store
+
+    def add_lines(self, key, parents, lines, parent_texts=None,
+            left_matching_blocks=None, nostore_sha=None, random_id=False,
+            check_content=True):
+        """See VersionedFiles.add_lines.
+
+        This implementation may return None as the third element of the return
+        value when the original store wouldn't.
+        """
+        if nostore_sha:
+            raise NotImplementedError(
+                "NoDupeAddLinesDecorator.add_lines does not implement the "
+                "nostore_sha behaviour.")
+        if key[-1] is None:
+            sha1 = osutils.sha_strings(lines)
+            key = ("sha1:" + sha1,)
+        else:
+            sha1 = None
+        if key in self._store.get_parent_map([key]):
+            # This key has already been inserted, so don't do it again.
+            if sha1 is None:
+                sha1 = osutils.sha_strings(lines)
+            return sha1, sum(map(len, lines)), None
+        return self._store.add_lines(key, parents, lines,
+            parent_texts=parent_texts,
+            left_matching_blocks=left_matching_blocks,
+            nostore_sha=nostore_sha, random_id=random_id,
+            check_content=check_content)
+
+    def __getattr__(self, name):
+        return getattr(self._store, name)
+
+
 def network_bytes_to_kind_and_offset(network_bytes):
     """Strip of a record kind from the front of network_bytes.
 