Merge lp:~pbeaman/akiban-persistit/fix_1010079_prune_long_mvv_corruption into lp:akiban-persistit

Proposed by Peter Beaman
Status: Merged
Approved by: Nathan Williams
Approved revision: 320
Merged at revision: 318
Proposed branch: lp:~pbeaman/akiban-persistit/fix_1010079_prune_long_mvv_corruption
Merge into: lp:akiban-persistit
Diff against target: 299 lines (+125/-30)
5 files modified
src/main/java/com/persistit/Buffer.java (+7/-20)
src/main/java/com/persistit/Exchange.java (+4/-4)
src/main/java/com/persistit/IntegrityCheck.java (+1/-1)
src/test/java/com/persistit/Bug1010079Test.java (+108/-0)
src/test/java/com/persistit/MVCCPruneBufferTest.java (+5/-5)
To merge this branch: bzr merge lp:~pbeaman/akiban-persistit/fix_1010079_prune_long_mvv_corruption
Reviewer Review Type Date Requested Status
Akiban Build User Needs Fixing
Nathan Williams Approve
Review via email: mp+109261@code.launchpad.net

Description of the change

Branch fixes two issues related to pruning of long-record MVVs:

1. When refreshing state of the original buffer after copy has been pruned, the FastIndex for the original needs to be invalidated if any keys were removed.

2. Buffer#pruneMvvValues must not use the thread-local Value object if called from Exchange#storeInternal. (We never saw a corruption due to this, but know it would break.)

Other changes include a simple test, a new stress test script and a minor change in JournalManager so to get rid of a stale value of BaseAddress. This is needed to get through the mixture_txn_2.plan.

(Note: in another branch I will propose a better way of restarting the same Persistit instance in tests.)

To post a comment you must log in.
Revision history for this message
Nathan Williams (nwilliams) wrote :

The new flag to prevent the double usage of thread local is fine (though if I were being picky I'd like a new method instead of a boolean) but it is also indirect. I can't think of a better way at the moment, but it might be less error prone to pass the thread locals down from some top level place (e.g. Exchange) instead of using them wherever, as we have no other way to protect ourselves.

There isn't a change in this diff in JournalManager. Need to push or is that not relevant anymore?

Are my eyes playing tricks on me or is the new .plan file the same block repeated 12 time? Another thing is that this uses, as many of the stress tests do, 8k pages. Probably a change for another time though.

review: Needs Information
Revision history for this message
Peter Beaman (pbeaman) wrote :

During the drive home I realized there isn't much reason to commit mixture_txn_2.plan. It was just a hack to repeat the test lots of times while keeping the debugger attached. So I just removed it.

That eliminates the need for the change in JournalManager - best not to change it right now.

I'm hoping that in the Exchange refactoring we'll be able to reduce the number of Value objects required and get rid of the ThreadLocal altogether. I appreciate your comments but would like to defer action until we do the new design.

Revision history for this message
Nathan Williams (nwilliams) wrote :

Works for now.

review: Approve
Revision history for this message
Akiban Build User (build-akiban) wrote :

There was one failure during build/test:

* unknown exception (check log)

review: Needs Fixing
Revision history for this message
Nathan Williams (nwilliams) wrote :

Passed persistit-build fine, but timed out waiting for AMI to run server build on. Will restart.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/main/java/com/persistit/Buffer.java'
2--- src/main/java/com/persistit/Buffer.java 2012-06-04 19:12:21 +0000
3+++ src/main/java/com/persistit/Buffer.java 2012-06-08 00:57:19 +0000
4@@ -3073,8 +3073,7 @@
5
6 int size = (decodeTailBlockSize(tbData) + ~TAILBLOCK_MASK) & TAILBLOCK_MASK;
7 if (size <= 0) {
8- _persistit.fatal("Buffer has invalid tailblock length " + size + " at " + tail + " in "
9- + this, null);
10+ _persistit.fatal("Buffer has invalid tailblock length " + size + " at " + tail + " in " + this, null);
11 }
12
13 if ((tbData & TAILBLOCK_INUSE_MASK) != 0) {
14@@ -3571,11 +3570,10 @@
15 * @return
16 * @throws PersistitException
17 */
18- boolean pruneMvvValues(final Tree tree) throws PersistitException {
19+ boolean pruneMvvValues(final Tree tree, boolean pruneLongMVVs) throws PersistitException {
20
21 boolean changed = false;
22 try {
23- boolean bumped = false;
24 boolean hasLongMvvRecords = false;
25
26 if (!isMine()) {
27@@ -3630,15 +3628,9 @@
28 incCountIfMvv(_bytes, offset, newSize);
29 }
30
31- boolean prunedAntiValue = pruneAntiValue(valueByte, p, tree);
32- changed |= prunedAntiValue;
33- if (prunedAntiValue) {
34+ if (pruneAntiValue(valueByte, p, tree)) {
35 changed = true;
36 p -= KEYBLOCK_LENGTH;
37- if (!bumped) {
38- bumpGeneration();
39- bumped = true;
40- }
41 }
42 }
43 }
44@@ -3649,7 +3641,7 @@
45 deallocatePrunedVersions(_persistit, _vol, prunedVersions);
46 prunedVersions.clear();
47
48- if (hasLongMvvRecords) {
49+ if (pruneLongMVVs && hasLongMvvRecords) {
50 List<PersistitException> deferredExceptions = new ArrayList<PersistitException>();
51 List<Long> oldChainsToDeallocate = new ArrayList<Long>();
52
53@@ -3664,9 +3656,9 @@
54 _alloc = copy._alloc;
55 _slack = copy._slack;
56 _mvvCount = copy._mvvCount;
57- _keyBlockEnd = copy._keyBlockEnd;
58- if (copy.getGeneration() > getGeneration()) {
59- bumpGeneration();
60+ if (_keyBlockEnd != copy._keyBlockEnd) {
61+ _keyBlockEnd = copy._keyBlockEnd;
62+ invalidateFastIndex();
63 }
64 setDirtyAtTimestamp(copyTimestamp);
65 deallocatePrunedVersions(_persistit, _vol, prunedVersions);
66@@ -3700,7 +3692,6 @@
67 final List<PersistitException> deferredExceptions, final List<Long> toDeallocate) {
68
69 boolean changed = false;
70- boolean bumped = false;
71 for (int p = KEY_BLOCK_START; p < _keyBlockEnd; p += KEYBLOCK_LENGTH) {
72 final int kbData = getInt(p);
73 final int tail = decodeKeyBlockTail(kbData);
74@@ -3743,10 +3734,6 @@
75 if (pruneAntiValue(valueByte, p, tree)) {
76 changed = true;
77 p -= KEYBLOCK_LENGTH;
78- if (!bumped) {
79- bumpGeneration();
80- bumped = true;
81- }
82 }
83 }
84 }
85
86=== modified file 'src/main/java/com/persistit/Exchange.java'
87--- src/main/java/com/persistit/Exchange.java 2012-05-30 04:09:46 +0000
88+++ src/main/java/com/persistit/Exchange.java 2012-06-08 00:57:19 +0000
89@@ -1548,7 +1548,7 @@
90 if (splitRequired && !treeClaimAcquired) {
91 if (!didPrune && buffer.isDataPage()) {
92 didPrune = true;
93- if (buffer.pruneMvvValues(_tree)) {
94+ if (buffer.pruneMvvValues(_tree, false)) {
95 continue;
96 }
97 }
98@@ -3616,7 +3616,7 @@
99 search(key, true);
100 buffer = _levelCache[0]._buffer;
101 if (buffer != null) {
102- return buffer.pruneMvvValues(_tree);
103+ return buffer.pruneMvvValues(_tree, true);
104 } else {
105 return false;
106 }
107@@ -3638,7 +3638,7 @@
108
109 while (buffer != null) {
110 checkPageType(buffer, Buffer.PAGE_TYPE_DATA, false);
111- pruned |= buffer.pruneMvvValues(_tree);
112+ pruned |= buffer.pruneMvvValues(_tree, true);
113 final int foundAt = buffer.findKey(key2);
114 if (!buffer.isAfterRightEdge(foundAt)) {
115 break;
116@@ -3664,7 +3664,7 @@
117 Buffer buffer = null;
118 try {
119 buffer = _pool.get(_volume, page, true, true);
120- return buffer.pruneMvvValues(_tree);
121+ return buffer.pruneMvvValues(_tree, true);
122 } finally {
123 if (buffer != null) {
124 buffer.release();
125
126=== modified file 'src/main/java/com/persistit/IntegrityCheck.java'
127--- src/main/java/com/persistit/IntegrityCheck.java 2012-05-25 18:50:59 +0000
128+++ src/main/java/com/persistit/IntegrityCheck.java 2012-06-08 00:57:19 +0000
129@@ -1134,7 +1134,7 @@
130 _counters._mvvPageCount++;
131 if (_prune && !_currentVolume.isReadOnly() && _counters._pruningErrorCount < MAX_PRUNING_ERRORS) {
132 try {
133- buffer.pruneMvvValues(tree);
134+ buffer.pruneMvvValues(tree, true);
135 _counters._prunedPageCount++;
136 } catch (PersistitException e) {
137 _counters._pruningErrorCount++;
138
139=== added file 'src/test/java/com/persistit/Bug1010079Test.java'
140--- src/test/java/com/persistit/Bug1010079Test.java 1970-01-01 00:00:00 +0000
141+++ src/test/java/com/persistit/Bug1010079Test.java 2012-06-08 00:57:19 +0000
142@@ -0,0 +1,108 @@
143+/**
144+ * Copyright © 2012 Akiban Technologies, Inc. All rights reserved.
145+ *
146+ * This program is free software: you can redistribute it and/or modify
147+ * it under the terms of the GNU Affero General Public License as
148+ * published by the Free Software Foundation, version 3 (only) of the
149+ * License.
150+ *
151+ * This program is distributed in the hope that it will be useful,
152+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
153+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
154+ * GNU Affero General Public License for more details.
155+ *
156+ * You should have received a copy of the GNU Affero General Public License
157+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
158+ *
159+ * This program may also be available under different license terms. For more
160+ * information, see www.akiban.com or contact licensing@akiban.com.
161+ */
162+
163+package com.persistit;
164+
165+import static org.junit.Assert.assertTrue;
166+
167+import org.junit.Test;
168+
169+/**
170+ * https://bugs.launchpad.net/akiban-persistit/+bug/1010079
171+ *
172+ * Various symptoms found while running stress test mixture_txn_1, including
173+ * CorruptVolumeException, CorruptValueException, and several different asserts.
174+ * These all stem from new code introduced in
175+ * lp:~pbeaman/akiban-persistit/fix_1006576_long_record_pruning. This bug is
176+ * related to 1005206 and 1006576. Diagnosis:
177+ *
178+ * New code added to prune LongRecord MVVs operates on a copy of the original
179+ * buffer and then atomically refreshes the original buffer from the copy. That
180+ * operation fails to invalidate the FastIndex of the original buffer in the
181+ * event a key is removed due to a LongRecord MVV becoming a primordial
182+ * anti-value.
183+ *
184+ * In addition, pruneLongMvvValues uses a Value object obtained from a
185+ * ThreadLocal. However, this same Value is already in use by
186+ * Exchange#storeInternal.
187+ *
188+ * @author peter
189+ *
190+ */
191+public class Bug1010079Test extends MVCCTestBase {
192+
193+ @Test
194+ public void induceCorruption() throws Exception {
195+ /*
196+ * 1. Create a page with a long record MVV.
197+ *
198+ * 2. Read the value to set up the LevelCache
199+ *
200+ * 3. Abort the transaction so that long record MVV will become an
201+ * AntiValue
202+ *
203+ * 4. Prune the Buffer to liberate the long record chain.
204+ *
205+ * 5. Store a value in the page (not yet certain why this is needed to
206+ * induce the bug, but it is.)
207+ *
208+ * 6. Attempt to the read the first value back in.
209+ *
210+ * Theory of bug is that the read attempt will not see a generation
211+ * change, will use the LevelCache, and will attempt to read a long
212+ * record from a page that has now become a data page.
213+ */
214+
215+ /*
216+ * Disable background pruning
217+ */
218+ _persistit.getCleanupManager().setPollInterval(-1);
219+
220+ /*
221+ * Bump the generation number -- perhaps unnecessary
222+ */
223+ for (int i = 0; i < 10; i++) {
224+ ex1.to(i).store();
225+ ex1.remove();
226+ }
227+
228+ /*
229+ * Store a single Long MVV
230+ */
231+ trx1.begin();
232+ storeLongMVV(ex1, 1);
233+ ex1.fetch();
234+ trx1.rollback();
235+ trx1.end();
236+
237+ _persistit.getTransactionIndex().updateActiveTransactionCache();
238+
239+ ex1.prune();
240+
241+ ex1.getValue().put(RED_FOX);
242+ ex1.to(0).store();
243+
244+ /*
245+ * This method throws a CorruptVolumeException
246+ */
247+ ex1.to(1).fetch();
248+ assertTrue("Value should have been removed by rollback", !ex1.getValue().isDefined());
249+ }
250+}
251
252=== modified file 'src/test/java/com/persistit/MVCCPruneBufferTest.java'
253--- src/test/java/com/persistit/MVCCPruneBufferTest.java 2012-06-04 14:59:07 +0000
254+++ src/test/java/com/persistit/MVCCPruneBufferTest.java 2012-06-08 00:57:19 +0000
255@@ -48,7 +48,7 @@
256 int available = buffer1.getAvailableSize();
257 int keys = buffer1.getKeyCount();
258 buffer1.claim(true);
259- buffer1.pruneMvvValues(null);
260+ buffer1.pruneMvvValues(null, true);
261 assertEquals(keys, buffer1.getKeyCount());
262 assertTrue(buffer1.getMvvCount() > 0);
263 assertEquals(available, buffer1.getAvailableSize());
264@@ -58,7 +58,7 @@
265
266 _persistit.getTransactionIndex().updateActiveTransactionCache();
267
268- buffer1.pruneMvvValues(null);
269+ buffer1.pruneMvvValues(null, true);
270 assertEquals(0, _persistit.getCleanupManager().getAcceptedCount());
271 assertTrue("Pruning should have removed primordial Anti-values", keys > buffer1.getKeyCount());
272 // mvvCount is 1 because there is still a leading primordial AntiValue
273@@ -83,7 +83,7 @@
274 int available = buffer1.getAvailableSize();
275 int keys = buffer1.getKeyCount();
276 buffer1.claim(true);
277- buffer1.pruneMvvValues(null);
278+ buffer1.pruneMvvValues(null, true);
279 buffer1.release();
280 assertEquals(keys, buffer1.getKeyCount());
281 assertTrue(buffer1.getMvvCount() > 0);
282@@ -107,7 +107,7 @@
283 for (long page = 1; page < pageCount; page++) {
284 final Buffer buffer = ex1.getBufferPool().get(ex1.getVolume(), page, true, true);
285 try {
286- buffer.pruneMvvValues(null);
287+ buffer.pruneMvvValues(null, true);
288 } finally {
289 buffer.release();
290 }
291@@ -128,7 +128,7 @@
292 for (long page = 1; page < pageCount; page++) {
293 final Buffer buffer = ex1.getBufferPool().get(ex1.getVolume(), page, true, true);
294 try {
295- buffer.pruneMvvValues(ex1.getTree());
296+ buffer.pruneMvvValues(ex1.getTree(), true);
297 } finally {
298 buffer.release();
299 }

Subscribers

People subscribed via source and target branches