Merge lp:~lifeless/bzr/apply-inventory-delta into lp:~bzr/bzr/trunk-old

Proposed by Robert Collins
Status: Merged
Merged at revision: not available
Proposed branch: lp:~lifeless/bzr/apply-inventory-delta
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: None lines
To merge this branch: bzr merge lp:~lifeless/bzr/apply-inventory-delta
Reviewer Review Type Date Requested Status
John A Meinel Approve
Review via email: mp+8118@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

While investigating iter changes based commit I ran into repeated 'what
if' questions relating to the safety of partial deltas. As a result I've
written up a semi-formal model of what we do and how they work.

I'm going to follow this up with tests for the apply delta logic we have
to ensure that they deal correctly with the cases of corruption we know
about. Assuming (and I do) that they currently silently corrupt, this
may be a few days off.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Robert Collins has proposed merging lp:~lifeless/bzr/apply-inventory-delta into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> While investigating iter changes based commit I ran into repeated 'what
> if' questions relating to the safety of partial deltas. As a result I've
> written up a semi-formal model of what we do and how they work.
>
> I'm going to follow this up with tests for the apply delta logic we have
> to ensure that they deal correctly with the cases of corruption we know
> about. Assuming (and I do) that they currently silently corrupt, this
> may be a few days off.
>
> -Rob
>
>

 review approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpM5oIACgkQJdeBCYSNAAMdeACfXmeOq2V3RFDp002PaomI4eyz
JuMAn1yCJcU6Fkkh/rn7Jn/JJYi6CvzA
=6RHS
-----END PGP SIGNATURE-----

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/inventory.py'
--- bzrlib/inventory.py 2009-06-18 18:18:36 +0000
+++ bzrlib/inventory.py 2009-07-02 07:22:27 +0000
@@ -1090,6 +1090,9 @@
1090 def apply_delta(self, delta):1090 def apply_delta(self, delta):
1091 """Apply a delta to this inventory.1091 """Apply a delta to this inventory.
10921092
1093 See the inventory developers documentation for the theory behind
1094 inventory deltas.
1095
1093 :param delta: A list of changes to apply. After all the changes are1096 :param delta: A list of changes to apply. After all the changes are
1094 applied the final inventory must be internally consistent, but it1097 applied the final inventory must be internally consistent, but it
1095 is ok to supply changes which, if only half-applied would have an1098 is ok to supply changes which, if only half-applied would have an
@@ -1578,6 +1581,9 @@
1578 propagate_caches=False):1581 propagate_caches=False):
1579 """Create a new CHKInventory by applying inventory_delta to this one.1582 """Create a new CHKInventory by applying inventory_delta to this one.
15801583
1584 See the inventory developers documentation for the theory behind
1585 inventory deltas.
1586
1581 :param inventory_delta: The inventory delta to apply. See1587 :param inventory_delta: The inventory delta to apply. See
1582 Inventory.apply_delta for details.1588 Inventory.apply_delta for details.
1583 :param new_revision_id: The revision id of the resulting CHKInventory.1589 :param new_revision_id: The revision id of the resulting CHKInventory.
15841590
=== modified file 'bzrlib/mutabletree.py'
--- bzrlib/mutabletree.py 2009-06-10 03:56:49 +0000
+++ bzrlib/mutabletree.py 2009-07-02 07:22:27 +0000
@@ -523,6 +523,9 @@
523 for commit which is not required to handle situations that do not arise523 for commit which is not required to handle situations that do not arise
524 outside of commit.524 outside of commit.
525525
526 See the inventory developers documentation for the theory behind
527 inventory deltas.
528
526 :param new_revid: The new revision id for the trees parent.529 :param new_revid: The new revision id for the trees parent.
527 :param delta: An inventory delta (see apply_inventory_delta) describing530 :param delta: An inventory delta (see apply_inventory_delta) describing
528 the changes from the current left most parent revision to new_revid.531 the changes from the current left most parent revision to new_revid.
529532
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-06-26 09:24:34 +0000
+++ bzrlib/repository.py 2009-07-02 07:22:27 +0000
@@ -1018,6 +1018,9 @@
1018 parents, basis_inv=None, propagate_caches=False):1018 parents, basis_inv=None, propagate_caches=False):
1019 """Add a new inventory expressed as a delta against another revision.1019 """Add a new inventory expressed as a delta against another revision.
10201020
1021 See the inventory developers documentation for the theory behind
1022 inventory deltas.
1023
1021 :param basis_revision_id: The inventory id the delta was created1024 :param basis_revision_id: The inventory id the delta was created
1022 against. (This does not have to be a direct parent.)1025 against. (This does not have to be a direct parent.)
1023 :param delta: The inventory delta (see Inventory.apply_delta for1026 :param delta: The inventory delta (see Inventory.apply_delta for
10241027
=== modified file 'doc/developers/inventory.txt'
--- doc/developers/inventory.txt 2009-04-04 02:50:01 +0000
+++ doc/developers/inventory.txt 2009-07-02 07:22:27 +0000
@@ -498,3 +498,90 @@
498delta, the only other valid fields are OLDPATH and 'file-id'. PARENT_ID is ''498delta, the only other valid fields are OLDPATH and 'file-id'. PARENT_ID is ''
499when a delete has been recorded or when recording a new root entry.499when a delete has been recorded or when recording a new root entry.
500500
501
502Delta consistency
503=================
504
505Inventory deltas and more broadly changes between trees are a significant part
506of bzr's core operations: they are key components in status, diff, commit,
507and merge (although merge uses tree transform, deltas contain the changes that
508are applied to the transform). Our ability to perform a given operation depends
509on us creating consistent deltas between trees. Inconsistent deltas lead to
510errors and bugs, or even just unexpected conflicts.
511
512An inventory delta is a transform to change an inventory A into another
513inventory B (in patch terms its a perfect patch). Sometimes, for instance in a
514regular commit, inventory B is known at the time we create the delta. Other
515times, B is not known because the user is requesting that some parts of the
516second inventory they have are masked out from consideration. When this happens
517we create a delta that when applied to A creates a B we haven't seen in total
518before. In this situation we need to ensure that B will be internally
519consistent. Deltas are unidirectional, a delta(A, B) creates B from A, but
520cannot be used to create A from B.
521
522Deltas are expressed as a list of (oldpath, newpath, fileid, entry) tuples. The
523fileid, entry elements are normative; the old and new paths are strong hints
524but not currently guaranteed to be accurate. (This is a shame and something we
525should tighten up). Deltas are required to list all removals explicitly -
526removing the parent of an entry doesn't remove the entry.
527
528Applying a delta to an inventory consists of:
529 - removing all fileids for which entry is None
530 - adding or replacing all other fileids
531 - detecting consistency errors
532
533An interesting aspect of delta inconsistencies is when we notice them:
534 - Silent errors which our application logic misses
535 - Visible errors we catch during application, so bad data isn't stored in
536 the system.
537
538The minimum safe level for our application logic would be to catch all errors
539during application. Making generation never generate inconsistent deltas is
540a seperate but necessary condition for robust code.
541
542An inconsistent delta is one which:
543 - after application to an inventory the inventory is an impossible state.
544 - has the same fileid, or oldpath(not-None), or newpath(not-None) multiple
545 times.
546 - has a fileid field different to the entry.fileid in the same item in the
547 delta.
548
549Forms of inventory inconsistency deltas can carry/cause:
550 - An entry newly introduced to a path without also removing or relocating any
551 existing entry at that path. (Duplicate paths)
552 - An entry whose parent id isn't present in the tree. (Missing parent).
553 - Having oldpath or newpath not be actual original path or resulting path.
554 (Wrong path)
555 - An entry whose parent is not a directory. (Under non-directory).
556 - An entry that is internally inconsistent.
557
558Known causes of inconsistency:
559 - A 'new' entry which the inventory already has - when this is a directory
560 even arbitrary file ids under the 'new' entry are more likely to collide on
561 paths.
562 - Removing a directory without recursively removing its children - causes
563 Missing parent.
564 - Recording a change to an entry without including all changed entries found
565 following its parents up to and includin the root - can cause duplicate
566 paths, missing parents, wrong path, under non-directory.
567
568Avoiding inconsistent deltas
569----------------------------
570
571The simplest thing is to never create partial deltas, as it is trivial to
572be consistent when all data is examined every time. However users sometimes
573want to specify a subset of the changes in their tree when they do an operation
574which needs to create a delta - such as commit.
575
576We have a choice about handling user requests that can generate inconsistent
577deltas. We can alter or interpret the request in such a way that the delta will
578be consistent, but perhaps larger than the user had intended. Or we can
579identify problematic situations and abort, specifying to the user why we have
580aborted and likely things they can do to make their request generate a
581consistent delta.
582
583Currently we attempt to expand/interpret the request so that the user is not
584required to understand all the internal constraints of the system: if they
585request 'foo/bar' we automatically include foo. This works to an extent but on
586review we are creating inconsistent deltas by the way we do this. We need to
587avoid all the known causes of inconsistency in our delta creation logic.
501588