Merge lp:~lifeless/bzr/apply-inventory-delta into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Superseded
Proposed branch: lp:~lifeless/bzr/apply-inventory-delta
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 827 lines (has conflicts)
Text conflict in bzrlib/tests/test_chk_map.py
To merge this branch: bzr merge lp:~lifeless/bzr/apply-inventory-delta
Reviewer Review Type Date Requested Status
Ian Clatworthy Needs Fixing
Review via email: mp+8730@code.launchpad.net

This proposal supersedes a proposal from 2009-07-13.

This proposal has been superseded by a proposal from 2009-07-17.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

Add WorkingTree.apply_inventory_delta to the set of delta
implementations interface tested.

This implementation wasn't preventing 'parent is not a directory'
errors. Other than that the changes that are made are essentially
cosmetic - consistent exceptions and catching corruption that we're very
unlikely to cause.

-Rob

--

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal

Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> Add WorkingTree.apply_inventory_delta to the set of delta
> implementations interface tested.
>
> This implementation wasn't preventing 'parent is not a directory'
> errors. Other than that the changes that are made are essentially
> cosmetic - consistent exceptions and catching corruption that we're very
> unlikely to cause.
>
> -Rob
>
>
The dirstate, doc and test changes look fine.

I'm really concerned though about the performance impact of calling
id2path for every entry in an inventory delta to do the additional
checks you've added. For chk-based inventories in particular, path
calculations *really* hurt performance.

Do you have any measurements re performance impact? I'm happy to help
here if you'd like.

Ian C.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

On Tue, 2009-07-14 at 04:21 +0000, Ian Clatworthy wrote:

> The dirstate, doc and test changes look fine.
>
> I'm really concerned though about the performance impact of calling
> id2path for every entry in an inventory delta to do the additional
> checks you've added. For chk-based inventories in particular, path
> calculations *really* hurt performance.
>
> Do you have any measurements re performance impact? I'm happy to help
> here if you'd like.

I don't, and I'd love some. id2path only occurs on the items being
changed, not on the full inventory so it should be tolerable (and
because its not path2id we load in the items we're changing and above
only). The effect on importers and commit should be marginal. I won't be
surprised if its visible, but even if it is I'm fairly sure that we need
this. Over the last week and a bit I've found a frighteningly large
number of ways to completely bork inventories via damaged deltas.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

Ian report a test failure;
ERROR: per_repository.test_fetch.TestFetchSameRepository.test_fetch_to_rich_root_set_parent_1_parent_different_id_gone(RepositoryFormatCHK2)
#
    An inconsistent delta was supplied: [(('', ''), None, None), (None, ('', ''), 'my-root')]
#
reason: New items are already in the map [(('', ''), 'TREE_ROOT')].

This is a bogus delta new checks have uncovered; we need to fix the cause of this.

Revision history for this message
Robert Collins (lifeless) wrote :

also from igc
ERROR: per_inventory.basics.TestInventoryApplyDelta.test_apply_delta_delete(Inventory)
    An inconsistent delta was supplied involving 'a', None
reason: Entry with no new_path

(This indicates a broken sample delta, I suspect)

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Running usertest on OOo comparing rev 4538 vs same+patch shows:

1. commit of a new files increasing from 1.9s to 2.1s
2. commit of a changed file increasing from 1.8s to 3.3s.

The former is probably ok; the latter needs investigation.

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-16 at 03:07 +0000, Ian Clatworthy wrote:
> Running usertest on OOo comparing rev 4538 vs same+patch shows:
>
> 1. commit of a new files increasing from 1.9s to 2.1s
> 2. commit of a changed file increasing from 1.8s to 3.3s.
>
> The former is probably ok; the latter needs investigation.

Is the commit of a changed file whole-tree or named file?

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-16 at 03:15 +0000, Robert Collins wrote:
> On Thu, 2009-07-16 at 03:07 +0000, Ian Clatworthy wrote:
> > Running usertest on OOo comparing rev 4538 vs same+patch shows:
> >
> > 1. commit of a new files increasing from 1.9s to 2.1s
> > 2. commit of a changed file increasing from 1.8s to 3.3s.
> >
> > The former is probably ok; the latter needs investigation.
>
> Is the commit of a changed file whole-tree or named file?

Ian says 'whole tree with one changed file'.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta into lp:bzr.
>
>
>
Some feedback on the dirstate.py changes ...

> + try:
> + entry = self._dirblocks[block_i][1][entry_i]
> + except IndexError:
> + raise errors.InconsistentDelta(path, file_id,
> + "Wrong path for old path.")
> + if not f_present or entry[1][0][0] in 'ar':
> + raise errors.InconsistentDelta(path, file_id,
> + "Wrong path for old path.")
> + if file_id != entry[0][2]:
> + raise errors.InconsistentDelta(path, file_id,
> + "Attempt to remove wrong has wrong id - found %r."
> + % entry[0][2])
>

The first 2 messages here are identical. Is that by design or
cut-and-paste? (Sometimes it's worth making the messages slightly
different to assist support.)

The third message needs tweaking: "wrong has wrong ..."

> + # New record. Check there isn't a entry at this path already.
> + if not fullscan:
> + low_index, _ = self._find_entry_index(key[0:2] + ('',), block)
> + while low_index < len(block):
> + entry = block[low_index]
> + if entry[0][0:2] == key[0:2]:
> + if entry[1][0][0] not in 'ar':
> + # This entry has the same path (but a different id) as
> + # the new entry we're adding, and is present in ths
> + # tree.
> + raise errors.InconsistentDelta(
> + ("%s/%s" % key[0:2]).decode('utf8'), key[2],
> + "Attempt to add item at path already occupied by "
> + "id %r" % entry[0][2])
> + else:
> + break
>
The while loop looks like it can/will loop forever. Also s/ths/this/ in
the comment.

Ian C.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Doc changes and tests all look fine. More code feedback: I don't think it's safe removing these few lines from inventory.py ...

         for old_path, file_id in sorted(((op, f) for op, np, f, e in delta
                                         if op is not None), reverse=True):
- if file_id not in self:
- # adds come later
- continue
             # Preserve unaltered children of file_id for later reinsertion.
             file_id_children = getattr(self[file_id], 'children', {})

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-16 at 03:28 +0000, Ian Clatworthy wrote:

> Doc changes and tests all look fine. More code feedback: I don't think
> it's safe removing these few lines from inventory.py ...

Why not?

> for old_path, file_id in sorted(((op, f) for op, np, f, e in
> delta
> if op is not None),
> reverse=True):
> - if file_id not in self:
> - # adds come later
> - continue
> # Preserve unaltered children of file_id for later
> reinsertion.
> file_id_children = getattr(self[file_id], 'children', {})

If file_id is not in self, then the old_path must be None, or the delta
is deleting a file_id that isn't present. Thats a corrupt delta and we
should not silently accomodate that.

-Rob

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Robert Collins wrote:
> On Thu, 2009-07-16 at 03:28 +0000, Ian Clatworthy wrote:
>
>
>> Doc changes and tests all look fine. More code feedback: I don't think
>> it's safe removing these few lines from inventory.py ...
>>
>
> Why not?
>
>
>> for old_path, file_id in sorted(((op, f) for op, np, f, e in
>> delta
>> if op is not None),
>> reverse=True):
>> - if file_id not in self:
>> - # adds come later
>> - continue
>> # Preserve unaltered children of file_id for later
>> reinsertion.
>> file_id_children = getattr(self[file_id], 'children', {})
>>
>
> If file_id is not in self, then the old_path must be None, or the delta
> is deleting a file_id that isn't present. Thats a corrupt delta and we
> should not silently accomodate that.
>
> -Rob
>
>
You need to trap the KeyError self[file_id] can throw if you remove it.

Ian C.

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-16 at 03:57 +0000, Ian Clatworthy wrote:

> You need to trap the KeyError self[file_id] can throw if you remove it.

In this branch I think there are tests that deleting an absent id raises
the right error. That code is dead code ;).

-Rob

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

See linked bug for callgrind before and after attachments.

review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

> Robert Collins wrote:
> > Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta
> into lp:bzr.
> >
> >
> >
> Some feedback on the dirstate.py changes ...
>
> > + try:
> > + entry = self._dirblocks[block_i][1][entry_i]
> > + except IndexError:
> > + raise errors.InconsistentDelta(path, file_id,
> > + "Wrong path for old path.")
> > + if not f_present or entry[1][0][0] in 'ar':
> > + raise errors.InconsistentDelta(path, file_id,
> > + "Wrong path for old path.")
> > + if file_id != entry[0][2]:
> > + raise errors.InconsistentDelta(path, file_id,
> > + "Attempt to remove wrong has wrong id - found %r."
> > + % entry[0][2])
> >
>
> The first 2 messages here are identical. Is that by design or
> cut-and-paste? (Sometimes it's worth making the messages slightly
> different to assist support.)

Deliberate - its a series of checks with the same outcome.

> The third message needs tweaking: "wrong has wrong ..."

Changed: s/wrong has wrong/path has wrong/.

> The while loop looks like it can/will loop forever. Also s/ths/this/ in
> the comment.

Thanks, was missing a low_index += 1.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-07-16 08:00:03 +0000
3+++ NEWS 2009-07-16 16:36:05 +0000
4@@ -57,6 +57,13 @@
5 Internals
6 *********
7
8+* ``CHKMap.apply_delta`` now raises ``InconsistentDelta`` if a delta adds
9+ as new a key which was already mapped. (Robert Collins)
10+
11+* Inventory delta application catches more cases of corruption and can
12+ prevent corrupt deltas from affecting consistency of data structures on
13+ disk. (Robert Collins)
14+
15
16 bzr 1.17rc1 "So late it's brunch" 2009-07-13
17 ############################################
18
19=== modified file 'bzrlib/chk_map.py'
20--- bzrlib/chk_map.py 2009-07-14 17:33:13 +0000
21+++ bzrlib/chk_map.py 2009-07-16 16:36:05 +0000
22@@ -26,9 +26,9 @@
23
24 Updates to a CHKMap are done preferentially via the apply_delta method, to
25 allow optimisation of the update operation; but individual map/unmap calls are
26-possible and supported. All changes via map/unmap are buffered in memory until
27-the _save method is called to force serialisation of the tree. apply_delta
28-performs a _save implicitly.
29+possible and supported. Individual changes via map/unmap are buffered in memory
30+until the _save method is called to force serialisation of the tree.
31+apply_delta records its changes immediately by performing an implicit _save.
32
33 TODO:
34 -----
35@@ -41,7 +41,10 @@
36
37 from bzrlib import lazy_import
38 lazy_import.lazy_import(globals(), """
39-from bzrlib import versionedfile
40+from bzrlib import (
41+ errors,
42+ versionedfile,
43+ )
44 """)
45 from bzrlib import (
46 lru_cache,
47@@ -105,6 +108,14 @@
48 of old_key is removed.
49 """
50 delete_count = 0
51+ # Check preconditions first.
52+ new_items = set([key for (old, key, value) in delta if key is not None
53+ and old is None])
54+ existing_new = list(self.iteritems(key_filter=new_items))
55+ if existing_new:
56+ raise errors.InconsistentDeltaDelta(delta,
57+ "New items are already in the map %r." % existing_new)
58+ # Now apply changes.
59 for old, new, value in delta:
60 if old is not None and old != new:
61 self.unmap(old, check_remap=False)
62@@ -482,7 +493,11 @@
63 return len(self._root_node)
64
65 def map(self, key, value):
66- """Map a key tuple to value."""
67+ """Map a key tuple to value.
68+
69+ :param key: A key to map.
70+ :param value: The value to assign to key.
71+ """
72 # Need a root object.
73 self._ensure_root()
74 prefix, node_details = self._root_node.map(self._store, key, value)
75
76=== modified file 'bzrlib/dirstate.py'
77--- bzrlib/dirstate.py 2009-07-10 05:33:07 +0000
78+++ bzrlib/dirstate.py 2009-07-16 16:36:05 +0000
79@@ -204,6 +204,7 @@
80 import bisect
81 import binascii
82 import errno
83+import operator
84 import os
85 from stat import S_IEXEC
86 import stat
87@@ -1277,22 +1278,49 @@
88 def update_by_delta(self, delta):
89 """Apply an inventory delta to the dirstate for tree 0
90
91+ This is the workhorse for apply_inventory_delta in dirstate based
92+ trees.
93+
94 :param delta: An inventory delta. See Inventory.apply_delta for
95 details.
96 """
97 self._read_dirblocks_if_needed()
98+ encode = cache_utf8.encode
99 insertions = {}
100 removals = {}
101- for old_path, new_path, file_id, inv_entry in sorted(delta, reverse=True):
102+ # Accumulate parent references (path_utf8, id), to check for parentless
103+ # items or items placed under files/links/tree-references. We get
104+ # references from every item in the delta that is not a deletion and
105+ # is not itself the root.
106+ parents = set()
107+ # Added ids must not be in the dirstate already. This set holds those
108+ # ids.
109+ new_ids = set()
110+ # This loop transforms the delta to single atomic operations that can
111+ # be executed and validated.
112+ for old_path, new_path, file_id, inv_entry in sorted(
113+ inventory._check_delta_unique_old_paths(
114+ inventory._check_delta_unique_new_paths(
115+ inventory._check_delta_ids_match_entry(
116+ inventory._check_delta_new_path_entry_both_or_None(delta)))),
117+ reverse=True):
118 if (file_id in insertions) or (file_id in removals):
119 raise errors.InconsistentDelta(old_path or new_path, file_id,
120 "repeated file_id")
121 if old_path is not None:
122 old_path = old_path.encode('utf-8')
123 removals[file_id] = old_path
124+ else:
125+ new_ids.add(file_id)
126 if new_path is not None:
127+ if inv_entry is None:
128+ raise errors.InconsistentDelta(new_path, file_id,
129+ "new_path with no entry")
130 new_path = new_path.encode('utf-8')
131 dirname, basename = osutils.split(new_path)
132+ dirname_utf8 = encode(dirname)
133+ if basename:
134+ parents.add((dirname_utf8, inv_entry.parent_id))
135 key = (dirname, basename, file_id)
136 minikind = DirState._kind_to_minikind[inv_entry.kind]
137 if minikind == 't':
138@@ -1321,15 +1349,39 @@
139 child_basename)
140 insertions[child[0][2]] = (key, minikind, executable,
141 fingerprint, new_child_path)
142- self._apply_removals(removals.values())
143- self._apply_insertions(insertions.values())
144+ self._check_delta_ids_absent(new_ids, delta, 0)
145+ try:
146+ self._apply_removals(removals.iteritems())
147+ self._apply_insertions(insertions.values())
148+ # Validate parents
149+ self._after_delta_check_parents(parents, 0)
150+ except errors.BzrError, e:
151+ if 'integrity error' not in str(e):
152+ raise
153+ # _get_entry raises BzrError when a request is inconsistent; we
154+ # want such errors to be shown as InconsistentDelta - and that
155+ # fits the behaviour we trigger.
156+ self._changes_aborted = True
157+ raise errors.InconsistentDeltaDelta(delta, "error from _get_entry.")
158
159 def _apply_removals(self, removals):
160- for path in sorted(removals, reverse=True):
161+ for file_id, path in sorted(removals, reverse=True,
162+ key=operator.itemgetter(1)):
163 dirname, basename = osutils.split(path)
164 block_i, entry_i, d_present, f_present = \
165 self._get_block_entry_index(dirname, basename, 0)
166- entry = self._dirblocks[block_i][1][entry_i]
167+ try:
168+ entry = self._dirblocks[block_i][1][entry_i]
169+ except IndexError:
170+ raise errors.InconsistentDelta(path, file_id,
171+ "Wrong path for old path.")
172+ if not f_present or entry[1][0][0] in 'ar':
173+ raise errors.InconsistentDelta(path, file_id,
174+ "Wrong path for old path.")
175+ if file_id != entry[0][2]:
176+ raise errors.InconsistentDelta(path, file_id,
177+ "Attempt to remove wrong has wrong id - found %r."
178+ % entry[0][2])
179 self._make_absent(entry)
180 # See if we have a malformed delta: deleting a directory must not
181 # leave crud behind. This increases the number of bisects needed
182@@ -1348,9 +1400,13 @@
183 "not deleted.")
184
185 def _apply_insertions(self, adds):
186- for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
187- self.update_minimal(key, minikind, executable, fingerprint,
188- path_utf8=path_utf8)
189+ try:
190+ for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
191+ self.update_minimal(key, minikind, executable, fingerprint,
192+ path_utf8=path_utf8)
193+ except errors.NotVersionedError:
194+ raise errors.InconsistentDelta(path_utf8.decode('utf8'), key[2],
195+ "Missing parent")
196
197 def update_basis_by_delta(self, delta, new_revid):
198 """Update the parents of this tree after a commit.
199@@ -1400,16 +1456,22 @@
200 # At the same time, to reduce interface friction we convert the input
201 # inventory entries to dirstate.
202 root_only = ('', '')
203- # Accumulate parent references (path and id), to check for parentless
204+ # Accumulate parent references (path_utf8, id), to check for parentless
205 # items or items placed under files/links/tree-references. We get
206 # references from every item in the delta that is not a deletion and
207 # is not itself the root.
208 parents = set()
209+ # Added ids must not be in the dirstate already. This set holds those
210+ # ids.
211+ new_ids = set()
212 for old_path, new_path, file_id, inv_entry in delta:
213 if inv_entry is not None and file_id != inv_entry.file_id:
214 raise errors.InconsistentDelta(new_path, file_id,
215 "mismatched entry file_id %r" % inv_entry)
216 if new_path is not None:
217+ if inv_entry is None:
218+ raise errors.InconsistentDelta(new_path, file_id,
219+ "new_path with no entry")
220 new_path_utf8 = encode(new_path)
221 # note the parent for validation
222 dirname_utf8, basename_utf8 = osutils.split(new_path_utf8)
223@@ -1418,6 +1480,7 @@
224 if old_path is None:
225 adds.append((None, encode(new_path), file_id,
226 inv_to_entry(inv_entry), True))
227+ new_ids.add(file_id)
228 elif new_path is None:
229 deletes.append((encode(old_path), None, file_id, None, True))
230 elif (old_path, new_path) != root_only:
231@@ -1466,7 +1529,7 @@
232 # of everything.
233 changes.append((encode(old_path), encode(new_path), file_id,
234 inv_to_entry(inv_entry)))
235-
236+ self._check_delta_ids_absent(new_ids, delta, 1)
237 try:
238 # Finish expunging deletes/first half of renames.
239 self._update_basis_apply_deletes(deletes)
240@@ -1475,7 +1538,7 @@
241 # Apply in-situ changes.
242 self._update_basis_apply_changes(changes)
243 # Validate parents
244- self._update_basis_check_parents(parents)
245+ self._after_delta_check_parents(parents, 1)
246 except errors.BzrError, e:
247 if 'integrity error' not in str(e):
248 raise
249@@ -1492,6 +1555,29 @@
250 self._id_index = None
251 return
252
253+ def _check_delta_ids_absent(self, new_ids, delta, tree_index):
254+ """Check that none of the file_ids in new_ids are present in a tree."""
255+ if not new_ids:
256+ return
257+ id_index = self._get_id_index()
258+ for file_id in new_ids:
259+ for key in id_index.get(file_id, []):
260+ block_i, entry_i, d_present, f_present = \
261+ self._get_block_entry_index(key[0], key[1], tree_index)
262+ if not f_present:
263+ # In a different tree
264+ continue
265+ entry = self._dirblocks[block_i][1][entry_i]
266+ if entry[0][2] != file_id:
267+ # Different file_id, so not what we want.
268+ continue
269+ # NB: No changes made before this helper is called, so no need
270+ # to set the _changes_aborted flag.
271+ raise errors.InconsistentDelta(
272+ ("%s/%s" % key[0:2]).decode('utf8'), file_id,
273+ "This file_id is new in the delta but already present in "
274+ "the target")
275+
276 def _update_basis_apply_adds(self, adds):
277 """Apply a sequence of adds to tree 1 during update_basis_by_delta.
278
279@@ -1600,18 +1686,24 @@
280 # it is being resurrected here, so blank it out temporarily.
281 self._dirblocks[block_index][1][entry_index][1][1] = null
282
283- def _update_basis_check_parents(self, parents):
284- """Check that parents required by the delta are all intact."""
285+ def _after_delta_check_parents(self, parents, index):
286+ """Check that parents required by the delta are all intact.
287+
288+ :param parents: An iterable of (path_utf8, file_id) tuples which are
289+ required to be present in tree 'index' at path_utf8 with id file_id
290+ and be a directory.
291+ :param index: The column in the dirstate to check for parents in.
292+ """
293 for dirname_utf8, file_id in parents:
294 # Get the entry - the ensures that file_id, dirname_utf8 exists and
295 # has the right file id.
296- entry = self._get_entry(1, file_id, dirname_utf8)
297+ entry = self._get_entry(index, file_id, dirname_utf8)
298 if entry[1] is None:
299 self._changes_aborted = True
300 raise errors.InconsistentDelta(dirname_utf8.decode('utf8'),
301 file_id, "This parent is not present.")
302 # Parents of things must be directories
303- if entry[1][1][0] != 'd':
304+ if entry[1][index][0] != 'd':
305 self._changes_aborted = True
306 raise errors.InconsistentDelta(dirname_utf8.decode('utf8'),
307 file_id, "This parent is not a directory.")
308@@ -2478,7 +2570,8 @@
309 # old is finished: insert current_new into the state.
310 self.update_minimal(new_entry_key, current_new_minikind,
311 executable=current_new[1].executable,
312- path_utf8=new_path_utf8, fingerprint=fingerprint)
313+ path_utf8=new_path_utf8, fingerprint=fingerprint,
314+ fullscan=True)
315 current_new = advance(new_iterator)
316 elif not current_new:
317 # new is finished
318@@ -2496,7 +2589,8 @@
319 current_old[1][0][0] != current_new_minikind):
320 self.update_minimal(current_old[0], current_new_minikind,
321 executable=current_new[1].executable,
322- path_utf8=new_path_utf8, fingerprint=fingerprint)
323+ path_utf8=new_path_utf8, fingerprint=fingerprint,
324+ fullscan=True)
325 # both sides are dealt with, move on
326 current_old = advance(old_iterator)
327 current_new = advance(new_iterator)
328@@ -2507,7 +2601,8 @@
329 # add a entry for this and advance new
330 self.update_minimal(new_entry_key, current_new_minikind,
331 executable=current_new[1].executable,
332- path_utf8=new_path_utf8, fingerprint=fingerprint)
333+ path_utf8=new_path_utf8, fingerprint=fingerprint,
334+ fullscan=True)
335 current_new = advance(new_iterator)
336 else:
337 # we've advanced past the place where the old key would be,
338@@ -2571,7 +2666,7 @@
339 return last_reference
340
341 def update_minimal(self, key, minikind, executable=False, fingerprint='',
342- packed_stat=None, size=0, path_utf8=None):
343+ packed_stat=None, size=0, path_utf8=None, fullscan=False):
344 """Update an entry to the state in tree 0.
345
346 This will either create a new entry at 'key' or update an existing one.
347@@ -2588,6 +2683,9 @@
348 :param size: Size information for new entry
349 :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
350 extra computation.
351+ :param fullscan: If True then a complete scan of the dirstate is being
352+ done and checking for duplicate rows should not be done. This
353+ should only be set by set_state_from_inventory and similar methods.
354
355 If packed_stat and fingerprint are not given, they're invalidated in
356 the entry.
357@@ -2602,6 +2700,22 @@
358 new_details = (minikind, fingerprint, size, executable, packed_stat)
359 id_index = self._get_id_index()
360 if not present:
361+ # New record. Check there isn't a entry at this path already.
362+ if not fullscan:
363+ low_index, _ = self._find_entry_index(key[0:2] + ('',), block)
364+ while low_index < len(block):
365+ entry = block[low_index]
366+ if entry[0][0:2] == key[0:2]:
367+ if entry[1][0][0] not in 'ar':
368+ # This entry has the same path (but a different id) as
369+ # the new entry we're adding, and is present in ths
370+ # tree.
371+ raise errors.InconsistentDelta(
372+ ("%s/%s" % key[0:2]).decode('utf8'), key[2],
373+ "Attempt to add item at path already occupied by "
374+ "id %r" % entry[0][2])
375+ else:
376+ break
377 # new entry, synthesis cross reference here,
378 existing_keys = id_index.setdefault(key[2], set())
379 if not existing_keys:
380
381=== modified file 'bzrlib/inventory.py'
382--- bzrlib/inventory.py 2009-07-10 05:18:29 +0000
383+++ bzrlib/inventory.py 2009-07-16 16:36:05 +0000
384@@ -1089,6 +1089,9 @@
385 See the inventory developers documentation for the theory behind
386 inventory deltas.
387
388+ If delta application fails the inventory is left in an indeterminate
389+ state and must not be used.
390+
391 :param delta: A list of changes to apply. After all the changes are
392 applied the final inventory must be internally consistent, but it
393 is ok to supply changes which, if only half-applied would have an
394@@ -1129,7 +1132,8 @@
395 # facility.
396 list(_check_delta_unique_ids(_check_delta_unique_new_paths(
397 _check_delta_unique_old_paths(_check_delta_ids_match_entry(
398- delta)))))
399+ _check_delta_new_path_entry_both_or_None(
400+ delta))))))
401
402 children = {}
403 # Remove all affected items which were in the original inventory,
404@@ -1138,13 +1142,13 @@
405 # modified children remaining by the time we examine it.
406 for old_path, file_id in sorted(((op, f) for op, np, f, e in delta
407 if op is not None), reverse=True):
408- if file_id not in self:
409- # adds come later
410- continue
411 # Preserve unaltered children of file_id for later reinsertion.
412 file_id_children = getattr(self[file_id], 'children', {})
413 if len(file_id_children):
414 children[file_id] = file_id_children
415+ if self.id2path(file_id) != old_path:
416+ raise errors.InconsistentDelta(old_path, file_id,
417+ "Entry was at wrong other path %r." % self.id2path(file_id))
418 # Remove file_id and the unaltered children. If file_id is not
419 # being deleted it will be reinserted back later.
420 self.remove_recursive_id(file_id)
421@@ -1153,7 +1157,7 @@
422 # longest, ensuring that items which were modified and whose parents in
423 # the resulting inventory were also modified, are inserted after their
424 # parents.
425- for new_path, new_entry in sorted((np, e) for op, np, f, e in
426+ for new_path, f, new_entry in sorted((np, f, e) for op, np, f, e in
427 delta if np is not None):
428 if new_entry.kind == 'directory':
429 # Pop the child which to allow detection of children whose
430@@ -1166,9 +1170,15 @@
431 new_entry = replacement
432 try:
433 self.add(new_entry)
434+ except errors.DuplicateFileId:
435+ raise errors.InconsistentDelta(new_path, new_entry.file_id,
436+ "New id is already present in target.")
437 except AttributeError:
438 raise errors.InconsistentDelta(new_path, new_entry.file_id,
439 "Parent is not a directory.")
440+ if self.id2path(new_entry.file_id) != new_path:
441+ raise errors.InconsistentDelta(new_path, new_entry.file_id,
442+ "New path is not consistent with parent path.")
443 if len(children):
444 # Get the parent id that was deleted
445 parent_id, children = children.popitem()
446@@ -1254,12 +1264,11 @@
447 To add a file to a branch ready to be committed, use Branch.add,
448 which calls this.
449
450- Returns the new entry object.
451+ :return: entry
452 """
453 if entry.file_id in self._byid:
454 raise errors.DuplicateFileId(entry.file_id,
455 self._byid[entry.file_id])
456-
457 if entry.parent_id is None:
458 self.root = entry
459 else:
460@@ -1590,6 +1599,7 @@
461 copied to and updated for the result.
462 :return: The new CHKInventory.
463 """
464+ split = osutils.split
465 result = CHKInventory(self._search_key_name)
466 if propagate_caches:
467 # Just propagate the path-to-fileid cache for now
468@@ -1630,8 +1640,18 @@
469 inventory_delta = _check_delta_unique_new_paths(inventory_delta)
470 # Check for entries that don't match the fileid
471 inventory_delta = _check_delta_ids_match_entry(inventory_delta)
472- # All changed entries need to have their parents be directories.
473+ # Check for new_path <-> entry consistency
474+ inventory_delta = _check_delta_new_path_entry_both_or_None(
475+ inventory_delta)
476+ # All changed entries need to have their parents be directories and be
477+ # at the right path. This set contains (path, id) tuples.
478 parents = set()
479+ # When we delete an item, all the children of it must be either deleted
480+ # or altered in their own right. As we batch process the change via
481+ # CHKMap.apply_delta, we build a set of things to use to validate the
482+ # delta.
483+ deletes = set()
484+ altered = set()
485 for old_path, new_path, file_id, entry in inventory_delta:
486 # file id changes
487 if new_path == '':
488@@ -1646,17 +1666,23 @@
489 del result._path_to_fileid_cache[old_path]
490 except KeyError:
491 pass
492+ deletes.add(file_id)
493 else:
494 new_key = (file_id,)
495 new_value = result._entry_to_bytes(entry)
496 # Update caches. It's worth doing this whether
497 # we're propagating the old caches or not.
498 result._path_to_fileid_cache[new_path] = file_id
499- parents.add(entry.parent_id)
500+ parents.add((split(new_path)[0], entry.parent_id))
501 if old_path is None:
502 old_key = None
503 else:
504 old_key = (file_id,)
505+ if self.id2path(file_id) != old_path:
506+ raise errors.InconsistentDelta(old_path, file_id,
507+ "Entry was at wrong other path %r." %
508+ self.id2path(file_id))
509+ altered.add(file_id)
510 id_to_entry_delta.append((old_key, new_key, new_value))
511 if result.parent_id_basename_to_file_id is not None:
512 # parent_id, basename changes
513@@ -1675,11 +1701,23 @@
514 # If the two keys are the same, the value will be unchanged
515 # as its always the file id.
516 parent_id_basename_delta.append((old_key, new_key, new_value))
517+ # validate that deletes are complete.
518+ for file_id in deletes:
519+ entry = self[file_id]
520+ if entry.kind != 'directory':
521+ continue
522+ # This loop could potentially be better by using the id_basename
523+ # map to just get the child file ids.
524+ for child in entry.children.values():
525+ if child.file_id not in altered:
526+ raise errors.InconsistentDelta(self.id2path(child.file_id),
527+ child.file_id, "Child not deleted or reparented when "
528+ "parent deleted.")
529 result.id_to_entry.apply_delta(id_to_entry_delta)
530 if parent_id_basename_delta:
531 result.parent_id_basename_to_file_id.apply_delta(parent_id_basename_delta)
532- parents.discard(None)
533- for parent in parents:
534+ parents.discard(('', None))
535+ for parent_path, parent in parents:
536 try:
537 if result[parent].kind != 'directory':
538 raise errors.InconsistentDelta(result.id2path(parent), parent,
539@@ -1687,6 +1725,9 @@
540 except errors.NoSuchId:
541 raise errors.InconsistentDelta("<unknown>", parent,
542 "Parent is not present in resulting inventory.")
543+ if result.path2id(parent_path) != parent:
544+ raise errors.InconsistentDelta(parent_path, parent,
545+ "Parent has wrong path %r." % result.path2id(parent_path))
546 return result
547
548 @classmethod
549@@ -2156,3 +2197,20 @@
550 raise errors.InconsistentDelta(item[0] or item[1], item[2],
551 "mismatched id with %r" % entry)
552 yield item
553+
554+
555+def _check_delta_new_path_entry_both_or_None(delta):
556+ """Decorate a delta and check that the new_path and entry are paired.
557+
558+ :return: A generator over delta.
559+ """
560+ for item in delta:
561+ new_path = item[1]
562+ entry = item[3]
563+ if new_path is None and entry is not None:
564+ raise errors.InconsistentDelta(item[0], item[1],
565+ "Entry with no new_path")
566+ if new_path is not None and entry is None:
567+ raise errors.InconsistentDelta(new_path, item[1],
568+ "new_path with no entry")
569+ yield item
570
571=== modified file 'bzrlib/tests/test_chk_map.py'
572--- bzrlib/tests/test_chk_map.py 2009-07-14 17:33:13 +0000
573+++ bzrlib/tests/test_chk_map.py 2009-07-16 16:36:05 +0000
574@@ -20,7 +20,11 @@
575
576 from bzrlib import (
577 chk_map,
578+<<<<<<< TREE
579 groupcompress,
580+=======
581+ errors,
582+>>>>>>> MERGE-SOURCE
583 osutils,
584 tests,
585 )
586@@ -466,6 +470,18 @@
587 # updated key.
588 self.assertEqual(new_root, chkmap._root_node._key)
589
590+ def test_apply_new_keys_must_be_new(self):
591+ # applying a delta (None, "a", "b") to a map with 'a' in it generates
592+ # an error.
593+ chk_bytes = self.get_chk_bytes()
594+ root_key = CHKMap.from_dict(chk_bytes, {("a",):"b"})
595+ chkmap = CHKMap(chk_bytes, root_key)
596+ self.assertRaises(errors.InconsistentDelta, chkmap.apply_delta,
597+ [(None, ("a",), "b")])
598+ # As an error occured, the update should have left us without changing
599+ # anything (the root should be unchanged).
600+ self.assertEqual(root_key, chkmap._root_node._key)
601+
602 def test_apply_delta_is_deterministic(self):
603 chk_bytes = self.get_chk_bytes()
604 chkmap1 = CHKMap(chk_bytes, None)
605
606=== modified file 'bzrlib/tests/test_inv.py'
607--- bzrlib/tests/test_inv.py 2009-07-10 12:40:36 +0000
608+++ bzrlib/tests/test_inv.py 2009-07-16 16:36:05 +0000
609@@ -54,9 +54,14 @@
610 'apply_delta':apply_inventory_Repository_add_inventory_by_delta,
611 'format':format}))
612 for format in workingtree_formats():
613- scenarios.append((str(format.__class__.__name__), {
614+ scenarios.append(
615+ (str(format.__class__.__name__) + ".update_basis_by_delta", {
616 'apply_delta':apply_inventory_WT_basis,
617 'format':format}))
618+ scenarios.append(
619+ (str(format.__class__.__name__) + ".apply_inventory_delta", {
620+ 'apply_delta':apply_inventory_WT,
621+ 'format':format}))
622 return multiply_tests(to_adapt, scenarios, result)
623
624
625@@ -71,6 +76,42 @@
626 return basis
627
628
629+def apply_inventory_WT(self, basis, delta):
630+ """Apply delta to basis and return the result.
631+
632+ This sets the tree state to be basis, and then calls apply_inventory_delta.
633+
634+ :param basis: An inventory to be used as the basis.
635+ :param delta: The inventory delta to apply:
636+ :return: An inventory resulting from the application.
637+ """
638+ control = self.make_bzrdir('tree', format=self.format._matchingbzrdir)
639+ control.create_repository()
640+ control.create_branch()
641+ tree = self.format.initialize(control)
642+ tree.lock_write()
643+ try:
644+ tree._write_inventory(basis)
645+ finally:
646+ tree.unlock()
647+ # Fresh object, reads disk again.
648+ tree = tree.bzrdir.open_workingtree()
649+ tree.lock_write()
650+ try:
651+ tree.apply_inventory_delta(delta)
652+ finally:
653+ tree.unlock()
654+ # reload tree - ensure we get what was written.
655+ tree = tree.bzrdir.open_workingtree()
656+ tree.lock_read()
657+ self.addCleanup(tree.unlock)
658+ # One could add 'tree._validate' here but that would cause 'early' failues
659+ # as far as higher level code is concerned. Possibly adding an
660+ # expect_fail parameter to this function and if that is False then do a
661+ # validate call.
662+ return tree.inventory
663+
664+
665 def apply_inventory_WT_basis(self, basis, delta):
666 """Apply delta to basis and return the result.
667
668@@ -118,7 +159,7 @@
669 paths = {}
670 parents = set()
671 for old, new, id, entry in delta:
672- if entry is None:
673+ if None in (new, entry):
674 continue
675 paths[new] = (entry.file_id, entry.kind)
676 parents.add(osutils.dirname(new))
677@@ -296,6 +337,22 @@
678 self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
679 inv, delta)
680
681+ def test_mismatched_new_path_entry_None(self):
682+ inv = self.get_empty_inventory()
683+ delta = [(None, u'path', 'id', None)]
684+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
685+ inv, delta)
686+
687+ def test_mismatched_new_path_None_entry(self):
688+ inv = self.get_empty_inventory()
689+ file1 = inventory.InventoryFile('id1', 'path', inv.root.file_id)
690+ file1.revision = 'result'
691+ file1.text_size = 0
692+ file1.text_sha1 = ""
693+ delta = [(u"path", None, 'id1', file1)]
694+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
695+ inv, delta)
696+
697 def test_parent_is_not_directory(self):
698 inv = self.get_empty_inventory()
699 file1 = inventory.InventoryFile('id1', 'path', inv.root.file_id)
700@@ -321,6 +378,106 @@
701 self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
702 inv, delta)
703
704+ def test_new_parent_path_has_wrong_id(self):
705+ inv = self.get_empty_inventory()
706+ parent1 = inventory.InventoryDirectory('p-1', 'dir', inv.root.file_id)
707+ parent1.revision = 'result'
708+ parent2 = inventory.InventoryDirectory('p-2', 'dir2', inv.root.file_id)
709+ parent2.revision = 'result'
710+ file1 = inventory.InventoryFile('id', 'path', 'p-2')
711+ file1.revision = 'result'
712+ file1.text_size = 0
713+ file1.text_sha1 = ""
714+ inv.add(parent1)
715+ inv.add(parent2)
716+ # This delta claims that file1 is at dir/path, but actually its at
717+ # dir2/path if you follow the inventory parent structure.
718+ delta = [(None, u'dir/path', 'id', file1)]
719+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
720+ inv, delta)
721+
722+ def test_old_parent_path_is_wrong(self):
723+ inv = self.get_empty_inventory()
724+ parent1 = inventory.InventoryDirectory('p-1', 'dir', inv.root.file_id)
725+ parent1.revision = 'result'
726+ parent2 = inventory.InventoryDirectory('p-2', 'dir2', inv.root.file_id)
727+ parent2.revision = 'result'
728+ file1 = inventory.InventoryFile('id', 'path', 'p-2')
729+ file1.revision = 'result'
730+ file1.text_size = 0
731+ file1.text_sha1 = ""
732+ inv.add(parent1)
733+ inv.add(parent2)
734+ inv.add(file1)
735+ # This delta claims that file1 was at dir/path, but actually it was at
736+ # dir2/path if you follow the inventory parent structure.
737+ delta = [(u'dir/path', None, 'id', None)]
738+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
739+ inv, delta)
740+
741+ def test_old_parent_path_is_for_other_id(self):
742+ inv = self.get_empty_inventory()
743+ parent1 = inventory.InventoryDirectory('p-1', 'dir', inv.root.file_id)
744+ parent1.revision = 'result'
745+ parent2 = inventory.InventoryDirectory('p-2', 'dir2', inv.root.file_id)
746+ parent2.revision = 'result'
747+ file1 = inventory.InventoryFile('id', 'path', 'p-2')
748+ file1.revision = 'result'
749+ file1.text_size = 0
750+ file1.text_sha1 = ""
751+ file2 = inventory.InventoryFile('id2', 'path', 'p-1')
752+ file2.revision = 'result'
753+ file2.text_size = 0
754+ file2.text_sha1 = ""
755+ inv.add(parent1)
756+ inv.add(parent2)
757+ inv.add(file1)
758+ inv.add(file2)
759+ # This delta claims that file1 was at dir/path, but actually it was at
760+ # dir2/path if you follow the inventory parent structure. At dir/path
761+ # is another entry we should not delete.
762+ delta = [(u'dir/path', None, 'id', None)]
763+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
764+ inv, delta)
765+
766+ def test_add_existing_id_new_path(self):
767+ inv = self.get_empty_inventory()
768+ parent1 = inventory.InventoryDirectory('p-1', 'dir1', inv.root.file_id)
769+ parent1.revision = 'result'
770+ parent2 = inventory.InventoryDirectory('p-1', 'dir2', inv.root.file_id)
771+ parent2.revision = 'result'
772+ inv.add(parent1)
773+ delta = [(None, u'dir2', 'p-1', parent2)]
774+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
775+ inv, delta)
776+
777+ def test_add_new_id_existing_path(self):
778+ inv = self.get_empty_inventory()
779+ parent1 = inventory.InventoryDirectory('p-1', 'dir1', inv.root.file_id)
780+ parent1.revision = 'result'
781+ parent2 = inventory.InventoryDirectory('p-2', 'dir1', inv.root.file_id)
782+ parent2.revision = 'result'
783+ inv.add(parent1)
784+ delta = [(None, u'dir1', 'p-2', parent2)]
785+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
786+ inv, delta)
787+
788+ def test_remove_dir_leaving_dangling_child(self):
789+ inv = self.get_empty_inventory()
790+ dir1 = inventory.InventoryDirectory('p-1', 'dir1', inv.root.file_id)
791+ dir1.revision = 'result'
792+ dir2 = inventory.InventoryDirectory('p-2', 'child1', 'p-1')
793+ dir2.revision = 'result'
794+ dir3 = inventory.InventoryDirectory('p-3', 'child2', 'p-1')
795+ dir3.revision = 'result'
796+ inv.add(dir1)
797+ inv.add(dir2)
798+ inv.add(dir3)
799+ delta = [(u'dir1', None, 'p-1', None),
800+ (u'dir1/child2', None, 'p-3', None)]
801+ self.assertRaises(errors.InconsistentDelta, self.apply_delta, self,
802+ inv, delta)
803+
804
805 class TestInventoryEntry(TestCase):
806
807
808=== modified file 'doc/developers/inventory.txt'
809--- doc/developers/inventory.txt 2009-07-02 07:22:27 +0000
810+++ doc/developers/inventory.txt 2009-07-16 16:36:05 +0000
811@@ -545,6 +545,8 @@
812 times.
813 - has a fileid field different to the entry.fileid in the same item in the
814 delta.
815+ - has an entry that is in an impossible state (e.g. a directory with a text
816+ size)
817
818 Forms of inventory inconsistency deltas can carry/cause:
819 - An entry newly introduced to a path without also removing or relocating any
820@@ -554,6 +556,7 @@
821 (Wrong path)
822 - An entry whose parent is not a directory. (Under non-directory).
823 - An entry that is internally inconsistent.
824+ - An entry that is already present in the tree (Duplicate id)
825
826 Known causes of inconsistency:
827 - A 'new' entry which the inventory already has - when this is a directory