Merge lp:~pbeaman/akiban-persistit/fix-1032846-assertion-in-MVV-prune into lp:akiban-persistit

Proposed by Peter Beaman
Status: Merged
Approved by: Nathan Williams
Approved revision: 369
Merged at revision: 369
Proposed branch: lp:~pbeaman/akiban-persistit/fix-1032846-assertion-in-MVV-prune
Merge into: lp:akiban-persistit
Diff against target: 52 lines (+21/-4)
1 file modified
src/main/java/com/persistit/TransactionIndex.java (+21/-4)
To merge this branch: bzr merge lp:~pbeaman/akiban-persistit/fix-1032846-assertion-in-MVV-prune
Reviewer Review Type Date Requested Status
Nathan Williams Approve
Review via email: mp+124050@code.launchpad.net

Description of the change

Fix MVCC isolation protocol failures.

See

- https://bugs.launchpad.net/akiban-persistit/+bug/1032392
- https://bugs.launchpad.net/akiban-persistit/+bug/1032846

Two conditional statements in TransactionIndex are subject to possible race conditions. In addition to fixing these two statements, I carefully reviewed all uses of TransactionStatus state accessors outside of locked code blocks. The change in getStatus() is the result of code-review; I was never able to induce a failure as it was written, and it is possible that as written it was correct. However, as rewritten I can prove correctness.

The change in wwDependency is the result of detecting an actual failure using an instrumented version of the code. The key is that the value received by target.tc() changed between two AND clauses in the "Target committed and is not concurrent or it aborted" case. The condition

   target.tc() > 0

was true because the tc value received was UNCOMMITTED (Long.MAX_VALUE).

The condition

   target.getTc() < source.getTs()

was then true because the tc value received was -tcommit, the intermediate state prior to final commit. But there was never a value of target.getTc() that satisfied both conditions of the AND expression.

I could not think of a way to write unit tests for these conditions. In particular, the opportunity for target.getTc() to change in mid-expression has simply been removed. But I have rerun the tests that previously failed with moderate probability many times without failure since these changes.

I think the only way to guarantee correctness of these changes is through careful code review and analysis.

Note that I have been unable to link the AIOOBE in bug 1032392 with these changes; there is still another elusive bug in that bug report.

To post a comment you must log in.
Revision history for this message
Nathan Williams (nwilliams) wrote :

Delicate, but makes sense I think.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/main/java/com/persistit/TransactionIndex.java'
2--- src/main/java/com/persistit/TransactionIndex.java 2012-08-24 13:57:19 +0000
3+++ src/main/java/com/persistit/TransactionIndex.java 2012-09-12 20:26:18 +0000
4@@ -699,10 +699,25 @@
5 * These values are all visible to us with respect to a particular tsv
6 * because we could not have seen the tsv without its corresponding
7 * transaction status having been registered.
8+ *
9+ * Note: if tsv >= floor it is not sufficient to look only in current.
10+ * This is because the TransactionIndexBucket#reduce() method moves the
11+ * TransactionStatus to longRunning or aborted before it changes the
12+ * floor. But the converse is okay: if tsv < floor, then the
13+ * TransactionStatus must be in longRunning or aborted if present at
14+ * all.
15 */
16- if ((bucket.getCurrent() == null || tsv < bucket.getFloor()) && bucket.getLongRunning() == null
17+ final long floor = bucket.getFloor();
18+ if ((tsv >= floor && bucket.getCurrent() == null || tsv < floor) && bucket.getLongRunning() == null
19 && bucket.getAborted() == null) {
20- return null;
21+ /*
22+ * Ensure the floor was stable while reading these variables.
23+ * Otherwise lock and retry safely. Tests show this is almost always
24+ * the case, but there are very occasional misses.
25+ */
26+ if (floor == bucket.getFloor()) {
27+ return null;
28+ }
29 }
30
31 /*
32@@ -806,6 +821,8 @@
33 */
34 return 0;
35 }
36+
37+ final long tcommit = target.getTc();
38 if (target.getTs() != tsv) {
39 /*
40 * By the time the selected TransactionStatus has been found, it may
41@@ -817,9 +834,9 @@
42 return 0;
43 }
44
45- if (target.getTc() > 0 && target.getTc() < source.getTs() || target.getTc() == ABORTED) {
46+ if (tcommit > 0 && tcommit < source.getTs() || tcommit == ABORTED) {
47 /*
48- * Target is committed or aborted
49+ * Target committed and is not concurrent or it aborted
50 */
51 return 0;
52 }

Subscribers

People subscribed via source and target branches