Merge lp:~lifeless/bzr/apply-inventory-delta into lp:~bzr/bzr/trunk-old
Status: Superseded
Proposed branch: lp:~lifeless/bzr/apply-inventory-delta
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 827 lines (has conflicts)
    Text conflict in bzrlib/tests/test_chk_map.py
To merge this branch: bzr merge lp:~lifeless/bzr/apply-inventory-delta
Related bugs:
Reviewer: Ian Clatworthy (Needs Fixing)
Review via email: mp+8730@code.launchpad.net
This proposal supersedes a proposal from 2009-07-13.
This proposal has been superseded by a proposal from 2009-07-17.
Robert Collins (lifeless) wrote (posted in a previous version of this proposal):
Ian Clatworthy (ian-clatworthy) wrote (posted in a previous version of this proposal):
Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> Add WorkingTree.
> implementations interface tested.
>
> This implementation wasn't preventing 'parent is not a directory'
> errors. Other than that the changes that are made are essentially
> cosmetic - consistent exceptions and catching corruption that we're very
> unlikely to cause.
>
> -Rob
>
>
The dirstate, doc and test changes look fine.
I'm really concerned though about the performance impact of calling
id2path for every entry in an inventory delta to do the additional
checks you've added. For chk-based inventories in particular, path
calculations *really* hurt performance.
Do you have any measurements re performance impact? I'm happy to help
here if you'd like.
Ian C.
Robert Collins (lifeless) wrote (posted in a previous version of this proposal):
On Tue, 2009-07-14 at 04:21 +0000, Ian Clatworthy wrote:
> The dirstate, doc and test changes look fine.
>
> I'm really concerned though about the performance impact of calling
> id2path for every entry in an inventory delta to do the additional
> checks you've added. For chk-based inventories in particular, path
> calculations *really* hurt performance.
>
> Do you have any measurements re performance impact? I'm happy to help
> here if you'd like.
I don't, and I'd love some. id2path only occurs on the items being
changed, not on the full inventory, so it should be tolerable (and
because it's not path2id, we load in only the items we're changing and
above). The effect on importers and commit should be marginal. I won't
be surprised if it's visible, but even if it is I'm fairly sure that we
need this. Over the last week and a bit I've found a frighteningly large
number of ways to completely bork inventories via damaged deltas.
-Rob
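Rob's point is that id2path is consulted once per row of the delta, so the validation cost scales with the number of changed entries rather than the size of the tree. As a rough illustration (hypothetical `SimpleInventory` and `check_old_paths` helpers in modern Python, not bzrlib's actual API):

```python
# A minimal sketch of old-path validation: id2path is called once per
# changed row, so cost grows with the delta, not the whole inventory.

class SimpleInventory:
    def __init__(self, paths_by_id):
        # file_id -> current path, e.g. {'root-id': '', 'f-id': 'dir/f'}
        self._paths_by_id = dict(paths_by_id)

    def id2path(self, file_id):
        return self._paths_by_id[file_id]


def check_old_paths(inv, delta):
    """Validate rows of (old_path, new_path, file_id, entry)."""
    for old_path, new_path, file_id, entry in delta:
        if old_path is None:
            continue  # an add; there is no old path to cross-check
        if inv.id2path(file_id) != old_path:
            raise ValueError(
                "entry %r was at %r, delta says %r"
                % (file_id, inv.id2path(file_id), old_path))


inv = SimpleInventory({'root-id': '', 'f-id': 'a'})
check_old_paths(inv, [('a', 'b', 'f-id', None)])  # consistent: no error
```

Only the rows actually named in the delta trigger a path lookup, which is why the cost should be proportional to the change being applied.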
Robert Collins (lifeless) wrote:
Ian reported a test failure:
ERROR: per_repository.
An inconsistent delta was supplied: [(('', ''), None, None), (None, ('', ''), 'my-root')]
reason: New items are already in the map [(('', ''), 'TREE_ROOT')].
This is a bogus delta that the new checks have uncovered; we need to fix the cause of this.
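The failure corresponds to the new precondition in `CHKMap.apply_delta`: a row whose old key is None is an add, and adding a key that is already mapped is inconsistent (the delta above unmaps the root key and then re-adds it "as new" instead of expressing a replacement). A dict-backed toy illustrating the check, not bzrlib's actual CHKMap:

```python
# Toy dict-backed map; delta rows are (old_key, new_key, value). The
# precondition runs before any change is applied, so a removal earlier
# in the same delta does not excuse an inconsistent "add".

class TinyMap:
    def __init__(self, items=()):
        self._items = dict(items)

    def apply_delta(self, delta):
        # Keys added "as new" (old_key is None) must not already be mapped.
        new_keys = set(new for old, new, value in delta
                       if old is None and new is not None)
        clashes = sorted(k for k in new_keys if k in self._items)
        if clashes:
            raise ValueError(
                "New items are already in the map %r." % (clashes,))
        for old, new, value in delta:
            if old is not None and old != new:
                del self._items[old]
            if new is not None:
                self._items[new] = value


m = TinyMap({('', ''): 'TREE_ROOT'})
# The bogus delta from the failure: unmap root, then re-add it "as new".
bogus = [(('', ''), None, None), (None, ('', ''), 'my-root')]
try:
    m.apply_delta(bogus)
except ValueError as e:
    print(e)
```

A well-formed replacement of the root would instead carry the old key in the same row, which passes the precondition.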
Robert Collins (lifeless) wrote:
Also from igc:
ERROR: per_inventory.
An inconsistent delta was supplied involving 'a', None
reason: Entry with no new_path
(This indicates a broken sample delta, I suspect.)
Ian Clatworthy (ian-clatworthy) wrote:
Running usertest on OOo comparing rev 4538 vs same+patch shows:
1. commit of a new file increasing from 1.9s to 2.1s
2. commit of a changed file increasing from 1.8s to 3.3s.
The former is probably ok; the latter needs investigation.
Robert Collins (lifeless) wrote:
On Thu, 2009-07-16 at 03:07 +0000, Ian Clatworthy wrote:
> Running usertest on OOo comparing rev 4538 vs same+patch shows:
>
> 1. commit of a new file increasing from 1.9s to 2.1s
> 2. commit of a changed file increasing from 1.8s to 3.3s.
>
> The former is probably ok; the latter needs investigation.
Is the commit of a changed file whole-tree or named file?
Robert Collins (lifeless) wrote:
On Thu, 2009-07-16 at 03:15 +0000, Robert Collins wrote:
> On Thu, 2009-07-16 at 03:07 +0000, Ian Clatworthy wrote:
> > Running usertest on OOo comparing rev 4538 vs same+patch shows:
> >
> > 1. commit of a new file increasing from 1.9s to 2.1s
> > 2. commit of a changed file increasing from 1.8s to 3.3s.
> >
> > The former is probably ok; the latter needs investigation.
>
> Is the commit of a changed file whole-tree or named file?
Ian says 'whole tree with one changed file'.
Ian Clatworthy (ian-clatworthy) wrote:
Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta into lp:bzr.
>
>
>
Some feedback on the dirstate.py changes ...
> + try:
> +     entry = self._dirblocks[block_i][1][entry_i]
> + except IndexError:
> +     raise errors.InconsistentDelta(path, file_id,
> +         "Wrong path for old path.")
> + if not f_present or entry[1][0][0] in 'ar':
> +     raise errors.InconsistentDelta(path, file_id,
> +         "Wrong path for old path.")
> + if file_id != entry[0][2]:
> +     raise errors.InconsistentDelta(path, file_id,
> +         "Attempt to remove wrong has wrong id - found %r."
> +         % entry[0][2])
>
The first 2 messages here are identical. Is that by design or
cut-and-paste? (Sometimes it's worth making the messages slightly
different to assist support.)
The third message needs tweaking: "wrong has wrong ..."
> + # New record. Check there isn't a entry at this path already.
> + if not fullscan:
> +     low_index, _ = self._find_entry_index(key[0:2] + ('',), block)
> +     while low_index < len(block):
> +         entry = block[low_index]
> +         if entry[0][0:2] == key[0:2]:
> +             if entry[1][0][0] not in 'ar':
> +                 # This entry has the same path (but a different id) as
> +                 # the new entry we're adding, and is present in ths
> +                 # tree.
> +                 raise errors.InconsistentDelta(
> +                     ("%s/%s" % key[0:2]).decode('utf8'), key[2],
> +                     "Attempt to add item at path already occupied by "
> +                     "id %r" % entry[0][2])
> +         else:
> +             break
>
The while loop looks like it can/will loop forever. Also s/ths/this/ in
the comment.
Ian C.
Ian Clatworthy (ian-clatworthy) wrote:
Doc changes and tests all look fine. More code feedback: I don't think it's safe removing these few lines from inventory.py ...
 for old_path, file_id in sorted(((op, f) for op, np, f, e in delta
     if op is not None), reverse=True):
-    if file_id not in self:
-        # adds come later
-        continue
     # Preserve unaltered children of file_id for later reinsertion.
Robert Collins (lifeless) wrote:
On Thu, 2009-07-16 at 03:28 +0000, Ian Clatworthy wrote:
> Doc changes and tests all look fine. More code feedback: I don't think
> it's safe removing these few lines from inventory.py ...
Why not?
> for old_path, file_id in sorted(((op, f) for op, np, f, e in delta
>     if op is not None), reverse=True):
> -    if file_id not in self:
> -        # adds come later
> -        continue
>     # Preserve unaltered children of file_id for later reinsertion.
>     file_id_children = getattr(
If file_id is not in self, then the old_path must be None, or the delta
is deleting a file_id that isn't present. That's a corrupt delta and we
should not silently accommodate that.
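Rob's argument in miniature: the removed guard was hiding corrupt deltas. A toy sketch (hypothetical `TinyInv` and `apply_removals`, not bzrlib code) of the behaviour being defended, where deleting an absent id surfaces as an explicit error instead of being silently skipped:

```python
# With the `if file_id not in self` guard gone, a delta that deletes an
# id the inventory doesn't have is reported rather than ignored.

class TinyInv:
    def __init__(self, ids):
        self._byid = dict(ids)  # file_id -> entry object

    def __contains__(self, file_id):
        return file_id in self._byid


def apply_removals(inv, delta):
    """Remove phase for rows (old_path, new_path, file_id, entry)."""
    for old_path, new_path, file_id, entry in delta:
        if old_path is None:
            continue  # a pure add; nothing to remove
        if file_id not in inv:
            raise ValueError(
                "delta deletes %r which is not present" % (file_id,))
        del inv._byid[file_id]


inv = TinyInv({'f-id': object()})
apply_removals(inv, [('a', None, 'f-id', None)])  # consistent delete
```

The trade-off under discussion is exactly this: skipping the row keeps old callers working, raising makes the corruption visible.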
-Rob
Ian Clatworthy (ian-clatworthy) wrote:
Robert Collins wrote:
> On Thu, 2009-07-16 at 03:28 +0000, Ian Clatworthy wrote:
>
>
>> Doc changes and tests all look fine. More code feedback: I don't think
>> it's safe removing these few lines from inventory.py ...
>>
>
> Why not?
>
>
>> for old_path, file_id in sorted(((op, f) for op, np, f, e in delta
>>     if op is not None), reverse=True):
>> -    if file_id not in self:
>> -        # adds come later
>> -        continue
>>     # Preserve unaltered children of file_id for later reinsertion.
>>     file_id_children = getattr(
>>
>
> If file_id is not in self, then the old_path must be None, or the delta
> is deleting a file_id that isn't present. That's a corrupt delta and we
> should not silently accommodate that.
>
> -Rob
>
>
You need to trap the KeyError self[file_id] can throw if you remove it.
Ian C.
Robert Collins (lifeless) wrote:
On Thu, 2009-07-16 at 03:57 +0000, Ian Clatworthy wrote:
> You need to trap the KeyError self[file_id] can throw if you remove it.
In this branch I think there are tests that deleting an absent id raises
the right error. That code is dead code ;).
-Rob
Ian Clatworthy (ian-clatworthy) wrote:
See the linked bug for callgrind before-and-after attachments.
Robert Collins (lifeless) wrote:
> Robert Collins wrote:
> > Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta into lp:bzr.
> >
> >
> >
> Some feedback on the dirstate.py changes ...
>
> > + try:
> > +     entry = self._dirblocks[block_i][1][entry_i]
> > + except IndexError:
> > +     raise errors.InconsistentDelta(path, file_id,
> > +         "Wrong path for old path.")
> > + if not f_present or entry[1][0][0] in 'ar':
> > +     raise errors.InconsistentDelta(path, file_id,
> > +         "Wrong path for old path.")
> > + if file_id != entry[0][2]:
> > +     raise errors.InconsistentDelta(path, file_id,
> > +         "Attempt to remove wrong has wrong id - found %r."
> > +         % entry[0][2])
> >
>
> The first 2 messages here are identical. Is that by design or
> cut-and-paste? (Sometimes it's worth making the messages slightly
> different to assist support.)
Deliberate: it's a series of checks with the same outcome.
> The third message needs tweaking: "wrong has wrong ..."
Changed: s/wrong has wrong/path has wrong/.
> The while loop looks like it can/will loop forever. Also s/ths/this/ in
> the comment.
Thanks, was missing a low_index += 1.
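The loop Ian flagged spins forever on an absent ('a') or renamed ('r') row because nothing advances the index. Extracted as a standalone sketch (a hypothetical helper, not the dirstate method itself), with the missing increment restored; block rows are ((dirname, basename, file_id), details) tuples sorted by key:

```python
import bisect

def check_path_free(block, key):
    """Raise if a present entry already occupies key's (dirname, basename).

    Entries whose first minikind char is 'a' (absent) or 'r' (renamed)
    do not occupy the path in this tree and are stepped over.
    """
    keys = [entry[0] for entry in block]
    low_index = bisect.bisect_left(keys, key[0:2] + ('',))
    while low_index < len(block):
        entry = block[low_index]
        if entry[0][0:2] == key[0:2]:
            if entry[1][0][0] not in 'ar':
                raise ValueError(
                    "Attempt to add item at path already occupied by "
                    "id %r" % (entry[0][2],))
            low_index += 1  # the missing increment: step past 'a'/'r' rows
        else:
            break


block = [
    (('', 'a', 'id-old'), [('a', '', 0, False, '')]),        # absent row
    (('', 'a', 'id-live'), [('f', 'sha1', 12, False, '')]),  # present file
]
check_path_free(block, ('', 'b', 'id-new'))  # path 'b' is free: no error
```

Without the increment, the first matching 'a' row would match again on the next iteration and the loop would never reach the present entry or the exit condition.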
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-07-16 08:00:03 +0000 |
3 | +++ NEWS 2009-07-16 16:36:05 +0000 |
4 | @@ -57,6 +57,13 @@ |
5 | Internals |
6 | ********* |
7 | |
8 | +* ``CHKMap.apply_delta`` now raises ``InconsistentDelta`` if a delta adds |
9 | + as new a key which was already mapped. (Robert Collins) |
10 | + |
11 | +* Inventory delta application catches more cases of corruption and can |
12 | + prevent corrupt deltas from affecting consistency of data structures on |
13 | + disk. (Robert Collins) |
14 | + |
15 | |
16 | bzr 1.17rc1 "So late it's brunch" 2009-07-13 |
17 | ############################################ |
18 | |
19 | === modified file 'bzrlib/chk_map.py' |
20 | --- bzrlib/chk_map.py 2009-07-14 17:33:13 +0000 |
21 | +++ bzrlib/chk_map.py 2009-07-16 16:36:05 +0000 |
22 | @@ -26,9 +26,9 @@ |
23 | |
24 | Updates to a CHKMap are done preferentially via the apply_delta method, to |
25 | allow optimisation of the update operation; but individual map/unmap calls are |
26 | -possible and supported. All changes via map/unmap are buffered in memory until |
27 | -the _save method is called to force serialisation of the tree. apply_delta |
28 | -performs a _save implicitly. |
29 | +possible and supported. Individual changes via map/unmap are buffered in memory |
30 | +until the _save method is called to force serialisation of the tree. |
31 | +apply_delta records its changes immediately by performing an implicit _save. |
32 | |
33 | TODO: |
34 | ----- |
35 | @@ -41,7 +41,10 @@ |
36 | |
37 | from bzrlib import lazy_import |
38 | lazy_import.lazy_import(globals(), """ |
39 | -from bzrlib import versionedfile |
40 | +from bzrlib import ( |
41 | + errors, |
42 | + versionedfile, |
43 | + ) |
44 | """) |
45 | from bzrlib import ( |
46 | lru_cache, |
47 | @@ -105,6 +108,14 @@ |
48 | of old_key is removed. |
49 | """ |
50 | delete_count = 0 |
51 | + # Check preconditions first. |
52 | + new_items = set([key for (old, key, value) in delta if key is not None |
53 | + and old is None]) |
54 | + existing_new = list(self.iteritems(key_filter=new_items)) |
55 | + if existing_new: |
56 | + raise errors.InconsistentDeltaDelta(delta, |
57 | + "New items are already in the map %r." % existing_new) |
58 | + # Now apply changes. |
59 | for old, new, value in delta: |
60 | if old is not None and old != new: |
61 | self.unmap(old, check_remap=False) |
62 | @@ -482,7 +493,11 @@ |
63 | return len(self._root_node) |
64 | |
65 | def map(self, key, value): |
66 | - """Map a key tuple to value.""" |
67 | + """Map a key tuple to value. |
68 | + |
69 | + :param key: A key to map. |
70 | + :param value: The value to assign to key. |
71 | + """ |
72 | # Need a root object. |
73 | self._ensure_root() |
74 | prefix, node_details = self._root_node.map(self._store, key, value) |
75 | |
76 | === modified file 'bzrlib/dirstate.py' |
77 | --- bzrlib/dirstate.py 2009-07-10 05:33:07 +0000 |
78 | +++ bzrlib/dirstate.py 2009-07-16 16:36:05 +0000 |
79 | @@ -204,6 +204,7 @@ |
80 | import bisect |
81 | import binascii |
82 | import errno |
83 | +import operator |
84 | import os |
85 | from stat import S_IEXEC |
86 | import stat |
87 | @@ -1277,22 +1278,49 @@ |
88 | def update_by_delta(self, delta): |
89 | """Apply an inventory delta to the dirstate for tree 0 |
90 | |
91 | + This is the workhorse for apply_inventory_delta in dirstate based |
92 | + trees. |
93 | + |
94 | :param delta: An inventory delta. See Inventory.apply_delta for |
95 | details. |
96 | """ |
97 | self._read_dirblocks_if_needed() |
98 | + encode = cache_utf8.encode |
99 | insertions = {} |
100 | removals = {} |
101 | - for old_path, new_path, file_id, inv_entry in sorted(delta, reverse=True): |
102 | + # Accumulate parent references (path_utf8, id), to check for parentless |
103 | + # items or items placed under files/links/tree-references. We get |
104 | + # references from every item in the delta that is not a deletion and |
105 | + # is not itself the root. |
106 | + parents = set() |
107 | + # Added ids must not be in the dirstate already. This set holds those |
108 | + # ids. |
109 | + new_ids = set() |
110 | + # This loop transforms the delta to single atomic operations that can |
111 | + # be executed and validated. |
112 | + for old_path, new_path, file_id, inv_entry in sorted( |
113 | + inventory._check_delta_unique_old_paths( |
114 | + inventory._check_delta_unique_new_paths( |
115 | + inventory._check_delta_ids_match_entry( |
116 | + inventory._check_delta_new_path_entry_both_or_None(delta)))), |
117 | + reverse=True): |
118 | if (file_id in insertions) or (file_id in removals): |
119 | raise errors.InconsistentDelta(old_path or new_path, file_id, |
120 | "repeated file_id") |
121 | if old_path is not None: |
122 | old_path = old_path.encode('utf-8') |
123 | removals[file_id] = old_path |
124 | + else: |
125 | + new_ids.add(file_id) |
126 | if new_path is not None: |
127 | + if inv_entry is None: |
128 | + raise errors.InconsistentDelta(new_path, file_id, |
129 | + "new_path with no entry") |
130 | new_path = new_path.encode('utf-8') |
131 | dirname, basename = osutils.split(new_path) |
132 | + dirname_utf8 = encode(dirname) |
133 | + if basename: |
134 | + parents.add((dirname_utf8, inv_entry.parent_id)) |
135 | key = (dirname, basename, file_id) |
136 | minikind = DirState._kind_to_minikind[inv_entry.kind] |
137 | if minikind == 't': |
138 | @@ -1321,15 +1349,39 @@ |
139 | child_basename) |
140 | insertions[child[0][2]] = (key, minikind, executable, |
141 | fingerprint, new_child_path) |
142 | - self._apply_removals(removals.values()) |
143 | - self._apply_insertions(insertions.values()) |
144 | + self._check_delta_ids_absent(new_ids, delta, 0) |
145 | + try: |
146 | + self._apply_removals(removals.iteritems()) |
147 | + self._apply_insertions(insertions.values()) |
148 | + # Validate parents |
149 | + self._after_delta_check_parents(parents, 0) |
150 | + except errors.BzrError, e: |
151 | + if 'integrity error' not in str(e): |
152 | + raise |
153 | + # _get_entry raises BzrError when a request is inconsistent; we |
154 | + # want such errors to be shown as InconsistentDelta - and that |
155 | + # fits the behaviour we trigger. |
156 | + self._changes_aborted = True |
157 | + raise errors.InconsistentDeltaDelta(delta, "error from _get_entry.") |
158 | |
159 | def _apply_removals(self, removals): |
160 | - for path in sorted(removals, reverse=True): |
161 | + for file_id, path in sorted(removals, reverse=True, |
162 | + key=operator.itemgetter(1)): |
163 | dirname, basename = osutils.split(path) |
164 | block_i, entry_i, d_present, f_present = \ |
165 | self._get_block_entry_index(dirname, basename, 0) |
166 | - entry = self._dirblocks[block_i][1][entry_i] |
167 | + try: |
168 | + entry = self._dirblocks[block_i][1][entry_i] |
169 | + except IndexError: |
170 | + raise errors.InconsistentDelta(path, file_id, |
171 | + "Wrong path for old path.") |
172 | + if not f_present or entry[1][0][0] in 'ar': |
173 | + raise errors.InconsistentDelta(path, file_id, |
174 | + "Wrong path for old path.") |
175 | + if file_id != entry[0][2]: |
176 | + raise errors.InconsistentDelta(path, file_id, |
177 | + "Attempt to remove wrong has wrong id - found %r." |
178 | + % entry[0][2]) |
179 | self._make_absent(entry) |
180 | # See if we have a malformed delta: deleting a directory must not |
181 | # leave crud behind. This increases the number of bisects needed |
182 | @@ -1348,9 +1400,13 @@ |
183 | "not deleted.") |
184 | |
185 | def _apply_insertions(self, adds): |
186 | - for key, minikind, executable, fingerprint, path_utf8 in sorted(adds): |
187 | - self.update_minimal(key, minikind, executable, fingerprint, |
188 | - path_utf8=path_utf8) |
189 | + try: |
190 | + for key, minikind, executable, fingerprint, path_utf8 in sorted(adds): |
191 | + self.update_minimal(key, minikind, executable, fingerprint, |
192 | + path_utf8=path_utf8) |
193 | + except errors.NotVersionedError: |
194 | + raise errors.InconsistentDelta(path_utf8.decode('utf8'), key[2], |
195 | + "Missing parent") |
196 | |
197 | def update_basis_by_delta(self, delta, new_revid): |
198 | """Update the parents of this tree after a commit. |
199 | @@ -1400,16 +1456,22 @@ |
200 | # At the same time, to reduce interface friction we convert the input |
201 | # inventory entries to dirstate. |
202 | root_only = ('', '') |
203 | - # Accumulate parent references (path and id), to check for parentless |
204 | + # Accumulate parent references (path_utf8, id), to check for parentless |
205 | # items or items placed under files/links/tree-references. We get |
206 | # references from every item in the delta that is not a deletion and |
207 | # is not itself the root. |
208 | parents = set() |
209 | + # Added ids must not be in the dirstate already. This set holds those |
210 | + # ids. |
211 | + new_ids = set() |
212 | for old_path, new_path, file_id, inv_entry in delta: |
213 | if inv_entry is not None and file_id != inv_entry.file_id: |
214 | raise errors.InconsistentDelta(new_path, file_id, |
215 | "mismatched entry file_id %r" % inv_entry) |
216 | if new_path is not None: |
217 | + if inv_entry is None: |
218 | + raise errors.InconsistentDelta(new_path, file_id, |
219 | + "new_path with no entry") |
220 | new_path_utf8 = encode(new_path) |
221 | # note the parent for validation |
222 | dirname_utf8, basename_utf8 = osutils.split(new_path_utf8) |
223 | @@ -1418,6 +1480,7 @@ |
224 | if old_path is None: |
225 | adds.append((None, encode(new_path), file_id, |
226 | inv_to_entry(inv_entry), True)) |
227 | + new_ids.add(file_id) |
228 | elif new_path is None: |
229 | deletes.append((encode(old_path), None, file_id, None, True)) |
230 | elif (old_path, new_path) != root_only: |
231 | @@ -1466,7 +1529,7 @@ |
232 | # of everything. |
233 | changes.append((encode(old_path), encode(new_path), file_id, |
234 | inv_to_entry(inv_entry))) |
235 | - |
236 | + self._check_delta_ids_absent(new_ids, delta, 1) |
237 | try: |
238 | # Finish expunging deletes/first half of renames. |
239 | self._update_basis_apply_deletes(deletes) |
240 | @@ -1475,7 +1538,7 @@ |
241 | # Apply in-situ changes. |
242 | self._update_basis_apply_changes(changes) |
243 | # Validate parents |
244 | - self._update_basis_check_parents(parents) |
245 | + self._after_delta_check_parents(parents, 1) |
246 | except errors.BzrError, e: |
247 | if 'integrity error' not in str(e): |
248 | raise |
249 | @@ -1492,6 +1555,29 @@ |
250 | self._id_index = None |
251 | return |
252 | |
253 | + def _check_delta_ids_absent(self, new_ids, delta, tree_index): |
254 | + """Check that none of the file_ids in new_ids are present in a tree.""" |
255 | + if not new_ids: |
256 | + return |
257 | + id_index = self._get_id_index() |
258 | + for file_id in new_ids: |
259 | + for key in id_index.get(file_id, []): |
260 | + block_i, entry_i, d_present, f_present = \ |
261 | + self._get_block_entry_index(key[0], key[1], tree_index) |
262 | + if not f_present: |
263 | + # In a different tree |
264 | + continue |
265 | + entry = self._dirblocks[block_i][1][entry_i] |
266 | + if entry[0][2] != file_id: |
267 | + # Different file_id, so not what we want. |
268 | + continue |
269 | + # NB: No changes made before this helper is called, so no need |
270 | + # to set the _changes_aborted flag. |
271 | + raise errors.InconsistentDelta( |
272 | + ("%s/%s" % key[0:2]).decode('utf8'), file_id, |
273 | + "This file_id is new in the delta but already present in " |
274 | + "the target") |
275 | + |
276 | def _update_basis_apply_adds(self, adds): |
277 | """Apply a sequence of adds to tree 1 during update_basis_by_delta. |
278 | |
279 | @@ -1600,18 +1686,24 @@ |
280 | # it is being resurrected here, so blank it out temporarily. |
281 | self._dirblocks[block_index][1][entry_index][1][1] = null |
282 | |
283 | - def _update_basis_check_parents(self, parents): |
284 | - """Check that parents required by the delta are all intact.""" |
285 | + def _after_delta_check_parents(self, parents, index): |
286 | + """Check that parents required by the delta are all intact. |
287 | + |
288 | + :param parents: An iterable of (path_utf8, file_id) tuples which are |
289 | + required to be present in tree 'index' at path_utf8 with id file_id |
290 | + and be a directory. |
291 | + :param index: The column in the dirstate to check for parents in. |
292 | + """ |
293 | for dirname_utf8, file_id in parents: |
294 | # Get the entry - the ensures that file_id, dirname_utf8 exists and |
295 | # has the right file id. |
296 | - entry = self._get_entry(1, file_id, dirname_utf8) |
297 | + entry = self._get_entry(index, file_id, dirname_utf8) |
298 | if entry[1] is None: |
299 | self._changes_aborted = True |
300 | raise errors.InconsistentDelta(dirname_utf8.decode('utf8'), |
301 | file_id, "This parent is not present.") |
302 | # Parents of things must be directories |
303 | - if entry[1][1][0] != 'd': |
304 | + if entry[1][index][0] != 'd': |
305 | self._changes_aborted = True |
306 | raise errors.InconsistentDelta(dirname_utf8.decode('utf8'), |
307 | file_id, "This parent is not a directory.") |
308 | @@ -2478,7 +2570,8 @@ |
309 | # old is finished: insert current_new into the state. |
310 | self.update_minimal(new_entry_key, current_new_minikind, |
311 | executable=current_new[1].executable, |
312 | - path_utf8=new_path_utf8, fingerprint=fingerprint) |
313 | + path_utf8=new_path_utf8, fingerprint=fingerprint, |
314 | + fullscan=True) |
315 | current_new = advance(new_iterator) |
316 | elif not current_new: |
317 | # new is finished |
318 | @@ -2496,7 +2589,8 @@ |
319 | current_old[1][0][0] != current_new_minikind): |
320 | self.update_minimal(current_old[0], current_new_minikind, |
321 | executable=current_new[1].executable, |
322 | - path_utf8=new_path_utf8, fingerprint=fingerprint) |
323 | + path_utf8=new_path_utf8, fingerprint=fingerprint, |
324 | + fullscan=True) |
325 | # both sides are dealt with, move on |
326 | current_old = advance(old_iterator) |
327 | current_new = advance(new_iterator) |
328 | @@ -2507,7 +2601,8 @@ |
329 | # add a entry for this and advance new |
330 | self.update_minimal(new_entry_key, current_new_minikind, |
331 | executable=current_new[1].executable, |
332 | - path_utf8=new_path_utf8, fingerprint=fingerprint) |
333 | + path_utf8=new_path_utf8, fingerprint=fingerprint, |
334 | + fullscan=True) |
335 | current_new = advance(new_iterator) |
336 | else: |
337 | # we've advanced past the place where the old key would be, |
338 | @@ -2571,7 +2666,7 @@ |
339 | return last_reference |
340 | |
341 | def update_minimal(self, key, minikind, executable=False, fingerprint='', |
342 | - packed_stat=None, size=0, path_utf8=None): |
343 | + packed_stat=None, size=0, path_utf8=None, fullscan=False): |
344 | """Update an entry to the state in tree 0. |
345 | |
346 | This will either create a new entry at 'key' or update an existing one. |
347 | @@ -2588,6 +2683,9 @@ |
348 | :param size: Size information for new entry |
349 | :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing |
350 | extra computation. |
351 | + :param fullscan: If True then a complete scan of the dirstate is being |
352 | + done and checking for duplicate rows should not be done. This |
353 | + should only be set by set_state_from_inventory and similar methods. |
354 | |
355 | If packed_stat and fingerprint are not given, they're invalidated in |
356 | the entry. |
357 | @@ -2602,6 +2700,22 @@ |
358 | new_details = (minikind, fingerprint, size, executable, packed_stat) |
359 | id_index = self._get_id_index() |
360 | if not present: |
361 | + # New record. Check there isn't a entry at this path already. |
362 | + if not fullscan: |
363 | + low_index, _ = self._find_entry_index(key[0:2] + ('',), block) |
364 | + while low_index < len(block): |
365 | + entry = block[low_index] |
366 | + if entry[0][0:2] == key[0:2]: |
367 | + if entry[1][0][0] not in 'ar': |
368 | + # This entry has the same path (but a different id) as |
369 | + # the new entry we're adding, and is present in ths |
370 | + # tree. |
371 | + raise errors.InconsistentDelta( |
372 | + ("%s/%s" % key[0:2]).decode('utf8'), key[2], |
373 | + "Attempt to add item at path already occupied by " |
374 | + "id %r" % entry[0][2]) |
375 | + else: |
376 | + break |
377 | # new entry, synthesis cross reference here, |
378 | existing_keys = id_index.setdefault(key[2], set()) |
379 | if not existing_keys: |
380 | |
381 | === modified file 'bzrlib/inventory.py' |
382 | --- bzrlib/inventory.py 2009-07-10 05:18:29 +0000 |
383 | +++ bzrlib/inventory.py 2009-07-16 16:36:05 +0000 |
384 | @@ -1089,6 +1089,9 @@ |
385 | See the inventory developers documentation for the theory behind |
386 | inventory deltas. |
387 | |
388 | + If delta application fails the inventory is left in an indeterminate |
389 | + state and must not be used. |
390 | + |
391 | :param delta: A list of changes to apply. After all the changes are |
392 | applied the final inventory must be internally consistent, but it |
393 | is ok to supply changes which, if only half-applied would have an |
394 | @@ -1129,7 +1132,8 @@ |
395 | # facility. |
396 | list(_check_delta_unique_ids(_check_delta_unique_new_paths( |
397 | _check_delta_unique_old_paths(_check_delta_ids_match_entry( |
398 | - delta))))) |
399 | + _check_delta_new_path_entry_both_or_None( |
400 | + delta)))))) |
401 | |
402 | children = {} |
403 | # Remove all affected items which were in the original inventory, |
404 | @@ -1138,13 +1142,13 @@ |
405 | # modified children remaining by the time we examine it. |
406 | for old_path, file_id in sorted(((op, f) for op, np, f, e in delta |
407 | if op is not None), reverse=True): |
408 | - if file_id not in self: |
409 | - # adds come later |
410 | - continue |
411 | # Preserve unaltered children of file_id for later reinsertion. |
412 | file_id_children = getattr(self[file_id], 'children', {}) |
413 | if len(file_id_children): |
414 | children[file_id] = file_id_children |
415 | + if self.id2path(file_id) != old_path: |
416 | + raise errors.InconsistentDelta(old_path, file_id, |
417 | + "Entry was at wrong other path %r." % self.id2path(file_id)) |
418 | # Remove file_id and the unaltered children. If file_id is not |
419 | # being deleted it will be reinserted back later. |
420 | self.remove_recursive_id(file_id) |
421 | @@ -1153,7 +1157,7 @@ |
422 | # longest, ensuring that items which were modified and whose parents in |
423 | # the resulting inventory were also modified, are inserted after their |
424 | # parents. |
425 | - for new_path, new_entry in sorted((np, e) for op, np, f, e in |
426 | + for new_path, f, new_entry in sorted((np, f, e) for op, np, f, e in |
427 | delta if np is not None): |
428 | if new_entry.kind == 'directory': |
429 | # Pop the child which to allow detection of children whose |
430 | @@ -1166,9 +1170,15 @@ |
431 | new_entry = replacement |
432 | try: |
433 | self.add(new_entry) |
434 | + except errors.DuplicateFileId: |
435 | + raise errors.InconsistentDelta(new_path, new_entry.file_id, |
436 | + "New id is already present in target.") |
437 | except AttributeError: |
438 | raise errors.InconsistentDelta(new_path, new_entry.file_id, |
439 | "Parent is not a directory.") |
440 | + if self.id2path(new_entry.file_id) != new_path: |
441 | + raise errors.InconsistentDelta(new_path, new_entry.file_id, |
442 | + "New path is not consistent with parent path.") |
443 | if len(children): |
444 | # Get the parent id that was deleted |
445 | parent_id, children = children.popitem() |
446 | @@ -1254,12 +1264,11 @@ |
447 | To add a file to a branch ready to be committed, use Branch.add, |
448 | which calls this. |
449 | |
450 | - Returns the new entry object. |
451 | + :return: entry |
452 | """ |
453 | if entry.file_id in self._byid: |
454 | raise errors.DuplicateFileId(entry.file_id, |
455 | self._byid[entry.file_id]) |
456 | - |
457 | if entry.parent_id is None: |
458 | self.root = entry |
459 | else: |
460 | @@ -1590,6 +1599,7 @@ |
461 | copied to and updated for the result. |
462 | :return: The new CHKInventory. |
463 | """ |
464 | + split = osutils.split |
465 | result = CHKInventory(self._search_key_name) |
466 | if propagate_caches: |
467 | # Just propagate the path-to-fileid cache for now |
468 | @@ -1630,8 +1640,18 @@ |
469 | inventory_delta = _check_delta_unique_new_paths(inventory_delta) |
470 | # Check for entries that don't match the fileid |
471 | inventory_delta = _check_delta_ids_match_entry(inventory_delta) |
472 | - # All changed entries need to have their parents be directories. |
473 | + # Check for new_path <-> entry consistency |
474 | + inventory_delta = _check_delta_new_path_entry_both_or_None( |
475 | + inventory_delta) |
476 | + # All changed entries need to have their parents be directories and be |
477 | + # at the right path. This set contains (path, id) tuples. |
478 | parents = set() |
479 | + # When we delete an item, all the children of it must be either deleted |
480 | + # or altered in their own right. As we batch process the change via |
481 | + # CHKMap.apply_delta, we build a set of things to use to validate the |
482 | + # delta. |
483 | + deletes = set() |
484 | + altered = set() |
485 | for old_path, new_path, file_id, entry in inventory_delta: |
486 | # file id changes |
487 | if new_path == '': |
488 | @@ -1646,17 +1666,23 @@ |
489 | del result._path_to_fileid_cache[old_path] |
490 | except KeyError: |
491 | pass |
492 | + deletes.add(file_id) |
493 | else: |
494 | new_key = (file_id,) |
495 | new_value = result._entry_to_bytes(entry) |
496 | # Update caches. It's worth doing this whether |
497 | # we're propagating the old caches or not. |
498 | result._path_to_fileid_cache[new_path] = file_id |
499 | - parents.add(entry.parent_id) |
500 | + parents.add((split(new_path)[0], entry.parent_id)) |
501 | if old_path is None: |
502 | old_key = None |
503 | else: |
504 | old_key = (file_id,) |
505 | + if self.id2path(file_id) != old_path: |
506 | + raise errors.InconsistentDelta(old_path, file_id, |
507 | + "Entry was at wrong other path %r." % |
508 | + self.id2path(file_id)) |
509 | + altered.add(file_id) |
510 | id_to_entry_delta.append((old_key, new_key, new_value)) |
511 | if result.parent_id_basename_to_file_id is not None: |
512 | # parent_id, basename changes |
513 | @@ -1675,11 +1701,23 @@ |
514 | # If the two keys are the same, the value will be unchanged |
515 | # as its always the file id. |
516 | parent_id_basename_delta.append((old_key, new_key, new_value)) |
517 | + # validate that deletes are complete. |
518 | + for file_id in deletes: |
519 | + entry = self[file_id] |
520 | + if entry.kind != 'directory': |
521 | + continue |
522 | + # This loop could potentially be better by using the id_basename |
523 | + # map to just get the child file ids. |
524 | + for child in entry.children.values(): |
525 | + if child.file_id not in altered: |
526 | + raise errors.InconsistentDelta(self.id2path(child.file_id), |
527 | + child.file_id, "Child not deleted or reparented when " |
528 | + "parent deleted.") |
529 | result.id_to_entry.apply_delta(id_to_entry_delta) |
530 | if parent_id_basename_delta: |
531 | result.parent_id_basename_to_file_id.apply_delta(parent_id_basename_delta) |
532 | - parents.discard(None) |
533 | - for parent in parents: |
534 | + parents.discard(('', None)) |
535 | + for parent_path, parent in parents: |
536 | try: |
537 | if result[parent].kind != 'directory': |
538 | raise errors.InconsistentDelta(result.id2path(parent), parent, |
539 | @@ -1687,6 +1725,9 @@ |
540 | except errors.NoSuchId: |
541 | raise errors.InconsistentDelta("<unknown>", parent, |
542 | "Parent is not present in resulting inventory.") |
543 | + if result.path2id(parent_path) != parent: |
544 | + raise errors.InconsistentDelta(parent_path, parent, |
545 | + "Parent has wrong path %r." % result.path2id(parent_path)) |
546 | return result |
547 | |
548 | @classmethod |
549 | @@ -2156,3 +2197,20 @@ |
550 | raise errors.InconsistentDelta(item[0] or item[1], item[2], |
551 | "mismatched id with %r" % entry) |
552 | yield item |
553 | + |
554 | + |
555 | +def _check_delta_new_path_entry_both_or_None(delta): |
556 | + """Decorate a delta and check that the new_path and entry are paired. |
557 | + |
558 | + :return: A generator over delta. |
559 | + """ |
560 | + for item in delta: |
561 | + new_path = item[1] |
562 | + entry = item[3] |
563 | + if new_path is None and entry is not None: |
564 | + raise errors.InconsistentDelta(item[0], item[1], |
565 | + "Entry with no new_path") |
566 | + if new_path is not None and entry is None: |
567 | + raise errors.InconsistentDelta(new_path, item[1], |
568 | + "new_path with no entry") |
569 | + yield item |
570 | |
571 | === modified file 'bzrlib/tests/test_chk_map.py' |
572 | --- bzrlib/tests/test_chk_map.py 2009-07-14 17:33:13 +0000 |
573 | +++ bzrlib/tests/test_chk_map.py 2009-07-16 16:36:05 +0000 |
574 | @@ -20,7 +20,11 @@ |
575 | |
576 | from bzrlib import ( |
577 | chk_map, |
578 | +<<<<<<< TREE |
579 | groupcompress, |
580 | +======= |
581 | + errors, |
582 | +>>>>>>> MERGE-SOURCE |
583 | osutils, |
584 | tests, |
585 | ) |
586 | @@ -466,6 +470,18 @@ |
587 | # updated key. |
588 | self.assertEqual(new_root, chkmap._root_node._key) |
589 | |
590 | + def test_apply_new_keys_must_be_new(self): |
591 | + # applying a delta (None, "a", "b") to a map with 'a' in it generates |
592 | + # an error. |
593 | + chk_bytes = self.get_chk_bytes() |
594 | + root_key = CHKMap.from_dict(chk_bytes, {("a",):"b"}) |
595 | + chkmap = CHKMap(chk_bytes, root_key) |
596 | + self.assertRaises(errors.InconsistentDelta, chkmap.apply_delta, |
597 | + [(None, ("a",), "b")]) |
598 | + # As an error occurred, the update should have left us without changing |
599 | + # anything (the root should be unchanged). |
600 | + self.assertEqual(root_key, chkmap._root_node._key) |
601 | + |
602 | def test_apply_delta_is_deterministic(self): |
603 | chk_bytes = self.get_chk_bytes() |
604 | chkmap1 = CHKMap(chk_bytes, None) |
605 | |
606 | === modified file 'bzrlib/tests/test_inv.py' |
607 | --- bzrlib/tests/test_inv.py 2009-07-10 12:40:36 +0000 |
608 | +++ bzrlib/tests/test_inv.py 2009-07-16 16:36:05 +0000 |
609 | @@ -54,9 +54,14 @@ |
610 | 'apply_delta':apply_inventory_Repository_add_inventory_by_delta, |
611 | 'format':format})) |
612 | for format in workingtree_formats(): |
613 | - scenarios.append((str(format.__class__.__name__), { |
614 | + scenarios.append( |
615 | + (str(format.__class__.__name__) + ".update_basis_by_delta", { |
616 | 'apply_delta':apply_inventory_WT_basis, |
617 | 'format':format})) |
618 | + scenarios.append( |
619 | + (str(format.__class__.__name__) + ".apply_inventory_delta", { |
620 | + 'apply_delta':apply_inventory_WT, |
621 | + 'format':format})) |
622 | return multiply_tests(to_adapt, scenarios, result) |
623 | |
624 | |
625 | @@ -71,6 +76,42 @@ |
626 | return basis |
627 | |
628 | |
629 | +def apply_inventory_WT(self, basis, delta): |
630 | + """Apply delta to basis and return the result. |
631 | + |
632 | + This sets the tree state to be basis, and then calls apply_inventory_delta. |
633 | + |
634 | + :param basis: An inventory to be used as the basis. |
635 | + :param delta: The inventory delta to apply: |
636 | + :return: An inventory resulting from the application. |
637 | + """ |
638 | + control = self.make_bzrdir('tree', format=self.format._matchingbzrdir) |
639 | + control.create_repository() |
640 | + control.create_branch() |
641 | + tree = self.format.initialize(control) |
642 | + tree.lock_write() |
643 | + try: |
644 | + tree._write_inventory(basis) |
645 | + finally: |
646 | + tree.unlock() |
647 | + # Fresh object, reads disk again. |
648 | + tree = tree.bzrdir.open_workingtree() |
649 | + tree.lock_write() |
650 | + try: |
651 | + tree.apply_inventory_delta(delta) |
652 | + finally: |
653 | + tree.unlock() |
654 | + # reload tree - ensure we get what was written. |
655 | + tree = tree.bzrdir.open_workingtree() |
656 | + tree.lock_read() |
657 | + self.addCleanup(tree.unlock) |
658 | + # One could add 'tree._validate' here but that would cause 'early' failures |
659 | + # as far as higher level code is concerned. Possibly adding an |
660 | + # expect_fail parameter to this function and if that is False then do a |
661 | + # validate call. |
662 | + return tree.inventory |
663 | + |
664 | + |
665 | def apply_inventory_WT_basis(self, basis, delta): |
666 | """Apply delta to basis and return the result. |
667 | |
668 | @@ -118,7 +159,7 @@ |
669 | paths = {} |
670 | parents = set() |
671 | for old, new, id, entry in delta: |
672 | - if entry is None: |
673 | + if None in (new, entry): |
674 | continue |
675 | paths[new] = (entry.file_id, entry.kind) |
676 | parents.add(osutils.dirname(new)) |
677 | @@ -296,6 +337,22 @@ |
678 | self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
679 | inv, delta) |
680 | |
681 | + def test_mismatched_new_path_entry_None(self): |
682 | + inv = self.get_empty_inventory() |
683 | + delta = [(None, u'path', 'id', None)] |
684 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
685 | + inv, delta) |
686 | + |
687 | + def test_mismatched_new_path_None_entry(self): |
688 | + inv = self.get_empty_inventory() |
689 | + file1 = inventory.InventoryFile('id1', 'path', inv.root.file_id) |
690 | + file1.revision = 'result' |
691 | + file1.text_size = 0 |
692 | + file1.text_sha1 = "" |
693 | + delta = [(u"path", None, 'id1', file1)] |
694 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
695 | + inv, delta) |
696 | + |
697 | def test_parent_is_not_directory(self): |
698 | inv = self.get_empty_inventory() |
699 | file1 = inventory.InventoryFile('id1', 'path', inv.root.file_id) |
700 | @@ -321,6 +378,106 @@ |
701 | self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
702 | inv, delta) |
703 | |
704 | + def test_new_parent_path_has_wrong_id(self): |
705 | + inv = self.get_empty_inventory() |
706 | + parent1 = inventory.InventoryDirectory('p-1', 'dir', inv.root.file_id) |
707 | + parent1.revision = 'result' |
708 | + parent2 = inventory.InventoryDirectory('p-2', 'dir2', inv.root.file_id) |
709 | + parent2.revision = 'result' |
710 | + file1 = inventory.InventoryFile('id', 'path', 'p-2') |
711 | + file1.revision = 'result' |
712 | + file1.text_size = 0 |
713 | + file1.text_sha1 = "" |
714 | + inv.add(parent1) |
715 | + inv.add(parent2) |
716 | + # This delta claims that file1 is at dir/path, but actually it's at |
717 | + # dir2/path if you follow the inventory parent structure. |
718 | + delta = [(None, u'dir/path', 'id', file1)] |
719 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
720 | + inv, delta) |
721 | + |
722 | + def test_old_parent_path_is_wrong(self): |
723 | + inv = self.get_empty_inventory() |
724 | + parent1 = inventory.InventoryDirectory('p-1', 'dir', inv.root.file_id) |
725 | + parent1.revision = 'result' |
726 | + parent2 = inventory.InventoryDirectory('p-2', 'dir2', inv.root.file_id) |
727 | + parent2.revision = 'result' |
728 | + file1 = inventory.InventoryFile('id', 'path', 'p-2') |
729 | + file1.revision = 'result' |
730 | + file1.text_size = 0 |
731 | + file1.text_sha1 = "" |
732 | + inv.add(parent1) |
733 | + inv.add(parent2) |
734 | + inv.add(file1) |
735 | + # This delta claims that file1 was at dir/path, but actually it was at |
736 | + # dir2/path if you follow the inventory parent structure. |
737 | + delta = [(u'dir/path', None, 'id', None)] |
738 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
739 | + inv, delta) |
740 | + |
741 | + def test_old_parent_path_is_for_other_id(self): |
742 | + inv = self.get_empty_inventory() |
743 | + parent1 = inventory.InventoryDirectory('p-1', 'dir', inv.root.file_id) |
744 | + parent1.revision = 'result' |
745 | + parent2 = inventory.InventoryDirectory('p-2', 'dir2', inv.root.file_id) |
746 | + parent2.revision = 'result' |
747 | + file1 = inventory.InventoryFile('id', 'path', 'p-2') |
748 | + file1.revision = 'result' |
749 | + file1.text_size = 0 |
750 | + file1.text_sha1 = "" |
751 | + file2 = inventory.InventoryFile('id2', 'path', 'p-1') |
752 | + file2.revision = 'result' |
753 | + file2.text_size = 0 |
754 | + file2.text_sha1 = "" |
755 | + inv.add(parent1) |
756 | + inv.add(parent2) |
757 | + inv.add(file1) |
758 | + inv.add(file2) |
759 | + # This delta claims that file1 was at dir/path, but actually it was at |
760 | + # dir2/path if you follow the inventory parent structure. At dir/path |
761 | + # is another entry we should not delete. |
762 | + delta = [(u'dir/path', None, 'id', None)] |
763 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
764 | + inv, delta) |
765 | + |
766 | + def test_add_existing_id_new_path(self): |
767 | + inv = self.get_empty_inventory() |
768 | + parent1 = inventory.InventoryDirectory('p-1', 'dir1', inv.root.file_id) |
769 | + parent1.revision = 'result' |
770 | + parent2 = inventory.InventoryDirectory('p-1', 'dir2', inv.root.file_id) |
771 | + parent2.revision = 'result' |
772 | + inv.add(parent1) |
773 | + delta = [(None, u'dir2', 'p-1', parent2)] |
774 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
775 | + inv, delta) |
776 | + |
777 | + def test_add_new_id_existing_path(self): |
778 | + inv = self.get_empty_inventory() |
779 | + parent1 = inventory.InventoryDirectory('p-1', 'dir1', inv.root.file_id) |
780 | + parent1.revision = 'result' |
781 | + parent2 = inventory.InventoryDirectory('p-2', 'dir1', inv.root.file_id) |
782 | + parent2.revision = 'result' |
783 | + inv.add(parent1) |
784 | + delta = [(None, u'dir1', 'p-2', parent2)] |
785 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
786 | + inv, delta) |
787 | + |
788 | + def test_remove_dir_leaving_dangling_child(self): |
789 | + inv = self.get_empty_inventory() |
790 | + dir1 = inventory.InventoryDirectory('p-1', 'dir1', inv.root.file_id) |
791 | + dir1.revision = 'result' |
792 | + dir2 = inventory.InventoryDirectory('p-2', 'child1', 'p-1') |
793 | + dir2.revision = 'result' |
794 | + dir3 = inventory.InventoryDirectory('p-3', 'child2', 'p-1') |
795 | + dir3.revision = 'result' |
796 | + inv.add(dir1) |
797 | + inv.add(dir2) |
798 | + inv.add(dir3) |
799 | + delta = [(u'dir1', None, 'p-1', None), |
800 | + (u'dir1/child2', None, 'p-3', None)] |
801 | + self.assertRaises(errors.InconsistentDelta, self.apply_delta, self, |
802 | + inv, delta) |
803 | + |
804 | |
805 | class TestInventoryEntry(TestCase): |
806 | |
807 | |
808 | === modified file 'doc/developers/inventory.txt' |
809 | --- doc/developers/inventory.txt 2009-07-02 07:22:27 +0000 |
810 | +++ doc/developers/inventory.txt 2009-07-16 16:36:05 +0000 |
811 | @@ -545,6 +545,8 @@ |
812 | times. |
813 | - has a fileid field different to the entry.fileid in the same item in the |
814 | delta. |
815 | + - has an entry that is in an impossible state (e.g. a directory with a text |
816 | + size) |
817 | |
818 | Forms of inventory inconsistency deltas can carry/cause: |
819 | - An entry newly introduced to a path without also removing or relocating any |
820 | @@ -554,6 +556,7 @@ |
821 | (Wrong path) |
822 | - An entry whose parent is not a directory. (Under non-directory). |
823 | - An entry that is internally inconsistent. |
824 | + - An entry that is already present in the tree (Duplicate id) |
825 | |
826 | Known causes of inconsistency: |
827 | - A 'new' entry which the inventory already has - when this is a directory |
Add WorkingTree.apply_inventory_delta to the set of delta
implementations interface tested.
This implementation wasn't preventing 'parent is not a directory'
errors. Other than that the changes that are made are essentially
cosmetic - consistent exceptions and catching corruption that we're very
unlikely to cause.
-Rob
--
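For readers following the diff: the `_check_delta_new_path_entry_both_or_None` generator it adds enforces that each delta item either names both a `new_path` and an `entry`, or neither (a deletion). The snippet below is a minimal stand-alone sketch of that pairing rule, not bzrlib's actual code; `InconsistentDelta` here is a stand-in class for `bzrlib.errors.InconsistentDelta`, and the function name is simplified.

```python
# Sketch of the "new_path and entry must be paired" validation from the
# branch. An inventory delta is a list of
# (old_path, new_path, file_id, entry) tuples; a deletion has both
# new_path and entry set to None.

class InconsistentDelta(Exception):
    """Stand-in for bzrlib.errors.InconsistentDelta."""


def check_new_path_entry_both_or_none(delta):
    """Yield delta items, rejecting any where exactly one of new_path
    and entry is None."""
    for old_path, new_path, file_id, entry in delta:
        if (new_path is None) != (entry is None):
            raise InconsistentDelta(
                "new_path and entry must both be set or both be None "
                "for %r" % (file_id,))
        yield old_path, new_path, file_id, entry


# A deletion (new_path and entry both None) passes through unchanged.
ok = list(check_new_path_entry_both_or_none(
    [("a/b", None, "id-1", None)]))
```

Because it is a generator, the check runs lazily as the delta is consumed, which matches how the real decorator is chained in front of `CHKMap.apply_delta` processing.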