Merge lp:~pbeaman/akiban-persistit/fix-rebalance-exception2 into lp:akiban-persistit

Proposed by Peter Beaman
Status: Merged
Approved by: Nathan Williams
Approved revision: 400
Merged at revision: 400
Proposed branch: lp:~pbeaman/akiban-persistit/fix-rebalance-exception2
Merge into: lp:akiban-persistit
Diff against target: 420 lines (+279/-17)
6 files modified
src/main/java/com/persistit/Buffer.java (+6/-5)
src/main/java/com/persistit/CleanupManager.java (+1/-1)
src/main/java/com/persistit/Exchange.java (+93/-6)
src/main/java/com/persistit/TransactionPlayer.java (+3/-4)
src/main/java/com/persistit/VolumeStructure.java (+1/-1)
src/test/java/com/persistit/ExchangeRebalanceTest.java (+175/-0)
To merge this branch: bzr merge lp:~pbeaman/akiban-persistit/fix-rebalance-exception2
Reviewer Review Type Date Requested Status
Nathan Williams Approve
Peter Beaman Needs Resubmitting
Review via email: mp+136225@code.launchpad.net

Description of the change

This is a replacement for the original fix-rebalance-exception branch. This version has been demonstrated to work repeatedly in stress tests. I started from a fresh version of trunk at r396 in order to get rid of conflicts.

Here is the original description:

This proposal fixes two fairly delicate issues and should not be approved until Server 1.4.3 has shipped.

1. RebalanceException

There is a rarely encountered finge case where deleting a record can require two pages to be split into three. The case requires two adjacent pages that are very full, and a pair of keys A and B at the left edge of the right sibling page to be long and have a small elided byte count. In other words, B is different from A in such a way that all or nearly all of the bytes in B are required to represent its value. In addition, the key the left of A and A have a deep elision count. The RebalanceException occurs deleting A requires the max key of the left sibling page to become B, and B is too big to fit there. In all previous versions of Persistit, the application would simply receive a RebalanceException in response to the attempt to remove the key.

This phenomenon has never been seen in application code, including Akiban Server testing, but it does occur routinely in Persistit stress tests.

This branch fixes the problem by catching the RebalanceException (thrown by Buffer#join(..)) and handling it properly in Exchange#storeInternal(...). The code splits the left page and then sets up internal variables appropriately for a retry.

A new text ExchangeRebalanceTest creates the preconditions for exercising this code and then verifies by counting pages that the rebalance split has occurred. This code has also passed numerous stress test runs.

2. InUseException in stress tests

Thread A attempts to get page P from the buffer pool. Thread B attempts to get page Q, discovers that it needs to read the page from disk, and before A gets a claim on the buffer, choses the buffer containing P to evict and reused to hold Q. In this scenario, thread A does not detect that the buffer it is waiting for has changed until it can get a claim on the buffer. And further, because the identities of the pages being awaited by the two threads is different, a deadlock and result.

This bug was described by https://bugs.launchpad.net/akiban-persistit/+bug/1021734. However, the fix isn't quite right.

The original fix was for thread A, while waiting for a claim on the buffer containing page P, to wait for only a short interval. A would then recheck the identity of the page contained in the awaited buffer to verify it still contains P before retrying.

The problem with this is that on a very heavily loaded system (or when stress tests run an an ec2 instance with unreliable I/O times), there can be a livelock. Each time Thread A times out, it loses its place in the queue and goes back to the end. We think this is the mechanism for occasional timeouts now seen in stress tests, especially on ec2 instances.

The newly proposed solution is for Thread A to wait for page P using its original timeout (default is 60 seconds) and never give up its position in the queue. However, when Thread B changes the identity of the page held in the awaited buffer to Q, it ends Thread A's wait by interrupting it. Thread A receives the interrupt and uses a simple mechanism to detect whether the interrupt occurred for this reason or because there truly was an interrupt created within the environment.

Since creating the livelock in the first place is extremely difficult, I have no unit test for this. (In fact, I have no proof other than an empirical improvement in test failures to support the hypothesis about the mechanism.) The modified code has run through several stress test cycles, and we will be adding more experience with it over time.

To post a comment you must log in.
Revision history for this message
Nathan Williams (nwilliams) wrote :

Just a few questions, but otherwise seems OK.

Why was MINIMUM_PRUNE_OBSOLETE_TRANSACTIONS_INTERVAL_NS decreased by 10x?

Why was _spareKey2/3 changed to 3/4, diff line 209ish? Are 1 and 2 not safe to use here? Can we add a comment or asserts to remind us if so?

The _id and changed() handling int SharedResource seems vulnerable to false failures. As it is a volatile int and changed with ++, another thread can come into claim it and have seen it in one of two states: pre and post increment. In the former, it will return out of the catch block with false (as intended). In the latter, an non-user generated PersistitInterruptedException will be propagated. Is that what we want?

review: Needs Information
398. By Peter Beaman

Remove interrupt-changed logic

399. By Peter Beaman

Merge from trunk

Revision history for this message
Peter Beaman (pbeaman) wrote :

Re MINIMUM_PRUNE_OBSOLETE_TRANSACTIONS_INTERVAL_MS -

The intended value was 5 seconds, and due to a typo, the value has instead been 50 seconds. (Yes, someday we should do a time-standard sweep through the whole product - we have a story for this in the icebox.) The original intention was to avoid spending CPU to perform the pruneObsoleteTransaction loop on every CLEANUP_MANAGER polling cycle, which when the system is busy can soak the CPU. But the problem with 50 seconds is that it can allow obsolete transactions to pile up for a very long time. 5 seconds seems like a good compromise.

This change is not related to fixing the rebalance issue, and if you think we really need to, I could propose it as a separate branch. However, all the stress tests I've been running recently have used the new value.

Re _spareKey1/2 vs _spareKey3/4

Yes 1/2 are not safe to use here as I discovered the hard way. Again, we need to refactor Exchange some day to get rid of all these undocumented private contracts, but for the purpose of this branch I simple reviewed the code path and determined that 3/4 available and safe to use.

Re the changed() method and _id logic:

I have removed those changes and will explain why. I believe the increment of _changed is always safe because it was done only by a thread that had exclusive ownership of the page. However, there is an unsafe part of the logic and I could not figure out a way to avoid it. Specifically, if changed() detects that the buffer has changed identity, it will try to wake up all the threads waiting to claim it by interrupting them. The _id hack is there so that those threads can then try to figure out why they were interrupted. But even with the addition of the isQueued(Thread) call, I observed a case where a thread was interrupted while doing something else and so the application got a spurious PersistitInterruptedException.

The original purpose of this logic was to try to avoid a potential live-lock situation in which every time a thread wakes up after a short interval, it loses its place in the waiting thread queue. I believed but had no proof that this was the mechanism that caused some of the InUseException cases. However, recently I found an actual deadlock that probably explains those instances, so am less convinced of the live-lock scenario.

Also, removing that stuff makes this proposal specific to the RebalanceException which is cleaner. If we need the interrupt changes, we can always merge them in separately. I few more stress-test runs will tell the story.

Revision history for this message
Nathan Williams (nwilliams) wrote :

The interval change is fine. I just like to limit the number of things changing in each revision. It's already here and, as you say, tested.

Could we add asserts around the spareKey change? Yes a large refactor could possibly fix it, but misewell be as safe as we can until them.

I wasn't saying the increment of _id wasn't safe. I said it was vulnerable to false exceptions, which you then went on to say you saw (different cause, of course). Having it gone is all the better though.

Revision history for this message
Peter Beaman (pbeaman) wrote :

I would be happy with asserts on the spareKey change, but am not sure what
we would assert. Can you suggest the code you'd like to see?

On Wed, Nov 28, 2012 at 11:48 AM, Nathan Williams <email address hidden>wrote:

> The interval change is fine. I just like to limit the number of things
> changing in each revision. It's already here and, as you say, tested.
>
> Could we add asserts around the spareKey change? Yes a large refactor
> could possibly fix it, but misewell be as safe as we can until them.
>
> I wasn't saying the increment of _id wasn't safe. I said it was vulnerable
> to false exceptions, which you then went on to say you saw (different
> cause, of course). Having it gone is all the better though.
> --
>
> https://code.launchpad.net/~pbeaman/akiban-persistit/fix-rebalance-exception2/+merge/136225
> You are the owner of lp:~pbeaman/akiban-persistit/fix-rebalance-exception2.
>

Revision history for this message
Nathan Williams (nwilliams) wrote :

The problem is that raw_removeKeyRangeInternal already uses 1 and 2 in its implementation, right? So assert that the keys it is passed are not 1 and 2.

If for some reason that isn't how it is being used, at least add a comment next to the change so we don't ever use 1 and 2 again there.

400. By Peter Beaman

Assert that keys in raw_removeKeyRangeInternal are safe and use safe keys in TransactionPlayer

Revision history for this message
Peter Beaman (pbeaman) wrote :

Actually, this led to a great catch Nathan. Thanks. It used to be that sending _spareKey1/2 into raw_removeKeyStateInternal was safe; the keys are mutated by the method but no caller cares. (And the only callers that did use them were TransactionPlayer and pruneLeftEdgeValue.

But now the rebalanceSplit code causes a retry loop, and the mutated values do matter.

So, I added the requested asserts, modified TransactionPlayer (which required adding accessors for _spareKey3/4) and wrote a short comment.

The good catch is due to the possibility of a rebalanceSplit occurring during transaction replay from TransactionPlayer. We would have found it extremely difficult to diagnose such an occurrence in the field.

Revision history for this message
Peter Beaman (pbeaman) :
review: Needs Resubmitting
Revision history for this message
Nathan Williams (nwilliams) wrote :

Glad my pestering led to something useful.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/main/java/com/persistit/Buffer.java'
2--- src/main/java/com/persistit/Buffer.java 2012-10-26 19:44:17 +0000
3+++ src/main/java/com/persistit/Buffer.java 2012-11-28 18:57:32 +0000
4@@ -246,6 +246,7 @@
5 final static int LONGREC_PREFIX_OFFSET = 20;
6 final static int LONGREC_SIZE = LONGREC_PREFIX_OFFSET + LONGREC_PREFIX_SIZE;
7
8+ final static int ANTIVALUE_TYPE = Value.CLASS_ANTIVALUE;
9 /**
10 * Implicit overhead size
11 */
12@@ -1182,7 +1183,7 @@
13 return pointer;
14 }
15
16- void setLongRecordPointer(final int foundAt, final long pointer) {
17+ void neuterLongRecord(final int foundAt) {
18 assert isDataPage() : "Invalid page type for long records: " + this;
19 final int kbData = getInt(foundAt & P_MASK);
20 final int tail = decodeKeyBlockTail(kbData);
21@@ -1196,9 +1197,10 @@
22 if ((_bytes[tail + _tailHeaderSize + klength] & 0xFF) != LONGREC_TYPE) {
23 return;
24 }
25-
26- putLong(tail + _tailHeaderSize + klength + LONGREC_PAGE_OFFSET, (int) pointer);
27-
28+ /*
29+ * Value will now be undefined rather than a long record.
30+ */
31+ putByte(tail + _tailHeaderSize + klength, ANTIVALUE_TYPE);
32 }
33
34 long getPointer(final int foundAt) throws PersistitException {
35@@ -3597,7 +3599,6 @@
36 * @throws PersistitException
37 */
38 boolean pruneMvvValues(final Tree tree, final boolean pruneLongMVVs) throws PersistitException {
39-
40 boolean changed = false;
41 try {
42 boolean hasLongMvvRecords = false;
43
44=== modified file 'src/main/java/com/persistit/CleanupManager.java'
45--- src/main/java/com/persistit/CleanupManager.java 2012-09-26 18:34:01 +0000
46+++ src/main/java/com/persistit/CleanupManager.java 2012-11-28 18:57:32 +0000
47@@ -44,7 +44,7 @@
48
49 private final static long MINIMUM_MAINTENANCE_INTERVAL_NS = 1000000000L;
50
51- private final static long MINIMUM_PRUNE_OBSOLETE_TRANSACTIONS_INTERVAL_NS = 50000000000L;
52+ private final static long MINIMUM_PRUNE_OBSOLETE_TRANSACTIONS_INTERVAL_NS = 5000000000L;
53
54 private final static long DEFAULT_MINIMUM_PRUNING_DELAY_NS = 1000;
55
56
57=== modified file 'src/main/java/com/persistit/Exchange.java'
58--- src/main/java/com/persistit/Exchange.java 2012-11-25 20:14:58 +0000
59+++ src/main/java/com/persistit/Exchange.java 2012-11-28 18:57:32 +0000
60@@ -49,6 +49,7 @@
61 import com.persistit.exception.PersistitException;
62 import com.persistit.exception.PersistitInterruptedException;
63 import com.persistit.exception.ReadOnlyVolumeException;
64+import com.persistit.exception.RebalanceException;
65 import com.persistit.exception.RetryException;
66 import com.persistit.exception.RollbackException;
67 import com.persistit.exception.TreeNotFoundException;
68@@ -969,6 +970,26 @@
69 * An additional <code>Key</code> maintained for the convenience of
70 * {@link Transaction}, {@link PersistitMap} and {@link JournalManager}.
71 *
72+ * @return spareKey3
73+ */
74+ Key getAuxiliaryKey3() {
75+ return _spareKey3;
76+ }
77+
78+ /**
79+ * An additional <code>Key</code> maintained for the convenience of
80+ * {@link Transaction}, {@link PersistitMap} and {@link JournalManager}.
81+ *
82+ * @return spareKey4
83+ */
84+ Key getAuxiliaryKey4() {
85+ return _spareKey4;
86+ }
87+
88+ /**
89+ * An additional <code>Key</code> maintained for the convenience of
90+ * {@link Transaction}, {@link PersistitMap} and {@link JournalManager}.
91+ *
92 * @return spareKey2
93 */
94 Key getAuxiliaryKey2() {
95@@ -3196,6 +3217,13 @@
96 */
97 boolean raw_removeKeyRangeInternal(final Key key1, final Key key2, final boolean fetchFirst,
98 final boolean removeOnlyAntiValue) throws PersistitException {
99+ /*
100+ * _spareKey1 and _spareKey2 are mutated within the method and are then
101+ * wrong in the event of a retry loop.
102+ */
103+
104+ assert key1 != _spareKey1 && key2 != _spareKey1 && key1 != _spareKey2 && key2 != _spareKey2;
105+
106 _persistit.checkClosed();
107 _persistit.checkSuspended();
108
109@@ -3422,8 +3450,15 @@
110
111 Debug.$assert0.t(_tree.isOwnedAsWriterByMe() && buffer1.isOwnedAsWriterByMe()
112 && buffer2.isOwnedAsWriterByMe());
113- final boolean rebalanced = buffer1.join(buffer2, foundAt1, foundAt2, _spareKey1,
114- _spareKey2, _joinPolicy);
115+ boolean rebalanced = false;
116+ try {
117+ rebalanced = buffer1.join(buffer2, foundAt1, foundAt2, _spareKey1, _spareKey2,
118+ _joinPolicy);
119+ } catch (final RebalanceException rbe) {
120+ rebalanceSplit(lc);
121+ level++;
122+ continue;
123+ }
124 if (buffer1.isDataPage()) {
125 _tree.bumpChangeCount();
126 }
127@@ -3581,6 +3616,58 @@
128 return result;
129 }
130
131+ /**
132+ * Handle the extremely rare case where removing a key from a pair of
133+ * adjacent pages requires the left page to be split. To split the page this
134+ * method inserts an empty record with key being deleted, allowing the
135+ * {@link Buffer#split(Buffer, Key, ValueHelper, int, Key, Sequence, SplitPolicy)}
136+ * method to be used.
137+ *
138+ * @param lc
139+ * LevelCache set up by raw_removeKeyRangeInternal
140+ * @throws PersistitException
141+ */
142+ private void rebalanceSplit(final LevelCache lc) throws PersistitException {
143+ //
144+ // Allocate a new page
145+ //
146+ final int level = lc._level;
147+ final int foundAt = lc._leftFoundAt;
148+ final Buffer left = lc._leftBuffer;
149+ final Buffer inserted = _volume.getStructure().allocPage();
150+ try {
151+ final long timestamp = timestamp();
152+ left.writePageOnCheckpoint(timestamp);
153+ inserted.writePageOnCheckpoint(timestamp);
154+
155+ Debug.$assert0.t(inserted.getPageAddress() != 0);
156+ Debug.$assert0.t(inserted != left);
157+
158+ inserted.init(left.getPageType());
159+
160+ final Value value = _persistit.getThreadLocalValue();
161+ value.clear();
162+ _rawValueWriter.init(value);
163+ final Key key = _persistit.getThreadLocalKey();
164+ lc._rightBuffer.nextKey(key, Buffer.HEADER_SIZE);
165+
166+ left.split(inserted, key, _rawValueWriter, foundAt | EXACT_MASK, _spareKey1, Sequence.NONE,
167+ SplitPolicy.EVEN_BIAS);
168+
169+ inserted.setRightSibling(left.getRightSibling());
170+ left.setRightSibling(inserted.getPageAddress());
171+ left.setDirtyAtTimestamp(timestamp);
172+ inserted.setDirtyAtTimestamp(timestamp);
173+ lc._leftBuffer = inserted;
174+ lc._leftFoundAt = inserted.findKey(key);
175+
176+ _persistit.getCleanupManager().offer(
177+ new CleanupManager.CleanupIndexHole(_tree.getHandle(), inserted.getPageAddress(), level));
178+ } finally {
179+ left.releaseTouched();
180+ }
181+ }
182+
183 private void removeKeyRangeReleaseLevel(final int level) {
184
185 final LevelCache lc = _levelCache[level];
186@@ -3710,12 +3797,12 @@
187 final int offset = (int) (at >>> 32);
188 final int size = (int) at;
189 if (size == 1 && buffer.getBytes()[offset] == MVV.TYPE_ANTIVALUE) {
190- buffer.nextKey(_spareKey1, Buffer.KEY_BLOCK_START);
191+ buffer.nextKey(_spareKey3, Buffer.KEY_BLOCK_START);
192 buffer.release();
193 buffer = null;
194- _spareKey1.copyTo(_spareKey2);
195- _spareKey2.nudgeDeeper();
196- raw_removeKeyRangeInternal(_spareKey1, _spareKey2, false, true);
197+ _spareKey3.copyTo(_spareKey4);
198+ _spareKey4.nudgeDeeper();
199+ raw_removeKeyRangeInternal(_spareKey3, _spareKey4, false, true);
200 return true;
201 }
202 }
203
204=== modified file 'src/main/java/com/persistit/TransactionPlayer.java'
205--- src/main/java/com/persistit/TransactionPlayer.java 2012-10-11 20:35:03 +0000
206+++ src/main/java/com/persistit/TransactionPlayer.java 2012-11-28 18:57:32 +0000
207@@ -199,8 +199,8 @@
208 final int elisionCount = DR.getKey2Elision(bb);
209 final Exchange exchange = getExchange(DR.getTreeHandle(bb), address, startTimestamp);
210 exchange.ignoreTransactions();
211- final Key key1 = exchange.getAuxiliaryKey1();
212- final Key key2 = exchange.getAuxiliaryKey2();
213+ final Key key1 = exchange.getAuxiliaryKey3();
214+ final Key key2 = exchange.getAuxiliaryKey4();
215 System.arraycopy(bb.array(), bb.position() + DR.OVERHEAD, key1.getEncodedBytes(), 0, key1Size);
216 key1.setEncodedSize(key1Size);
217 final int key2Size = innerSize - DR.OVERHEAD - key1Size;
218@@ -208,8 +208,7 @@
219 System.arraycopy(bb.array(), bb.position() + DR.OVERHEAD + key1Size, key2.getEncodedBytes(),
220 elisionCount, key2Size);
221 key2.setEncodedSize(key2Size + elisionCount);
222- listener.removeKeyRange(address, startTimestamp, exchange, exchange.getAuxiliaryKey1(),
223- exchange.getAuxiliaryKey2());
224+ listener.removeKeyRange(address, startTimestamp, exchange, key1, key2);
225 appliedUpdates.incrementAndGet();
226 releaseExchange(exchange);
227 break;
228
229=== modified file 'src/main/java/com/persistit/VolumeStructure.java'
230--- src/main/java/com/persistit/VolumeStructure.java 2012-10-03 14:43:25 +0000
231+++ src/main/java/com/persistit/VolumeStructure.java 2012-11-28 18:57:32 +0000
232@@ -624,7 +624,7 @@
233 * Detects whether and prevents same pointer from being read
234 * and deallocated twice.
235 */
236- buffer.setLongRecordPointer(p, INVALID_PAGE_ADDRESS);
237+ buffer.neuterLongRecord(p);
238 }
239 }
240 }
241
242=== added file 'src/test/java/com/persistit/ExchangeRebalanceTest.java'
243--- src/test/java/com/persistit/ExchangeRebalanceTest.java 1970-01-01 00:00:00 +0000
244+++ src/test/java/com/persistit/ExchangeRebalanceTest.java 2012-11-28 18:57:32 +0000
245@@ -0,0 +1,175 @@
246+/**
247+ * Copyright © 2012 Akiban Technologies, Inc. All rights reserved.
248+ *
249+ * This program and the accompanying materials are made available
250+ * under the terms of the Eclipse Public License v1.0 which
251+ * accompanies this distribution, and is available at
252+ * http://www.eclipse.org/legal/epl-v10.html
253+ *
254+ * This program may also be available under different license terms.
255+ * For more information, see www.akiban.com or contact licensing@akiban.com.
256+ *
257+ * Contributors:
258+ * Akiban Technologies, Inc.
259+ */
260+
261+package com.persistit;
262+
263+import static org.junit.Assert.assertEquals;
264+
265+import org.junit.Test;
266+
267+import com.persistit.exception.PersistitException;
268+import com.persistit.policy.SplitPolicy;
269+
270+public class ExchangeRebalanceTest extends PersistitUnitTestCase {
271+
272+ final StringBuilder sb = new StringBuilder();
273+
274+ @Override
275+ public void setUp() throws Exception {
276+ super.setUp();
277+ while (sb.length() < Buffer.MAX_BUFFER_SIZE) {
278+ sb.append(RED_FOX);
279+ }
280+ }
281+
282+ /**
283+ * Construct a tree with two adjacent pages that are nearly full such that
284+ * the second key B of the right page is long and has very small elided byte
285+ * count relative to the first key A on that page. Removing key A requires
286+ * the left sibling to have B as its max key. If B won't fit on the left
287+ * page, then Persistit splits the left page to make room for it. Note that
288+ * changes in key or page structure will likely break this test.
289+ *
290+ * @throws Exception
291+ */
292+ @Test
293+ public void testRebalanceException() throws Exception {
294+ final Exchange exchange = _persistit.getExchange("persistit", "rebalance", true);
295+ exchange.setSplitPolicy(SplitPolicy.LEFT_BIAS);
296+ setUpPrettyFullBuffers(exchange, false, RED_FOX.length(), true);
297+ System.out.printf("\ntestRebalanceException\n");
298+ _persistit.checkAllVolumes();
299+ final int beforeRemove = countSiblings(exchange, 0);
300+ exchange.clear().append("b").previous(true);
301+ exchange.remove();
302+ _persistit.getCleanupManager().poll();
303+ _persistit.checkAllVolumes();
304+ final int afterRemove = countSiblings(exchange, 0);
305+ assertEquals("Remove should have caused rebalance", beforeRemove + 1, afterRemove);
306+ }
307+
308+ /**
309+ * Similar logic to {@link #testRebalanceException()} except that the keys
310+ * are carefully constructed on the index leaf level rather than the data
311+ * level. To do this we use three long-ish records per major key in the data
312+ * pages so that the key pattern in the index leaf table aligns as desired.
313+ *
314+ * @throws Exception
315+ */
316+ @Test
317+ public void testRebalanceIndexException() throws Exception {
318+ final Exchange exchange = _persistit.getExchange("persistit", "rebalance", true);
319+ exchange.setSplitPolicy(SplitPolicy.LEFT_BIAS);
320+ setUpPrettyFullBuffers(exchange, true, 0, true);
321+ System.out.printf("\ntestRebalanceIndexException\n");
322+ _persistit.checkAllVolumes();
323+ final int beforeRemove = countSiblings(exchange, 1);
324+
325+ exchange.clear().append("b").previous(true);
326+ exchange.cut();
327+ exchange.remove(Key.GT);
328+ _persistit.getCleanupManager().poll();
329+ _persistit.checkAllVolumes();
330+ final int afterRemove = countSiblings(exchange, 1);
331+ assertEquals("Remove should have caused rebalance", beforeRemove + 1, afterRemove);
332+
333+ }
334+
335+ private void setUpPrettyFullBuffers(final Exchange ex, final boolean asIndex, final int valueLength,
336+ final boolean discontinuous) throws PersistitException {
337+
338+ final int depth = asIndex ? 1 : 0;
339+ int a, b;
340+ long leftPage = 0;
341+
342+ // load page A with keys of increasing length
343+ for (a = 10;; a++) {
344+ setUpDeepKey(ex, 'a', a);
345+ storeValue(ex, valueLength, asIndex);
346+ if (ex.getTree().getDepth() > depth) {
347+ final Buffer b1 = ex.fetchBufferCopy(depth);
348+ // Stop when nearly full
349+ if (b1.getAvailableSize() < valueLength + 20) {
350+ break;
351+ }
352+ leftPage = b1.getPageAddress();
353+ }
354+ }
355+
356+ // load additional keys into page B
357+ for (b = a + 1;; b++) {
358+ if (ex.getTree().getDepth() > depth) {
359+ final Buffer b2 = ex.fetchBufferCopy(depth);
360+ if (b2.getPageAddress() != leftPage && b2.getAvailableSize() < valueLength + 20) {
361+ break;
362+ }
363+ }
364+ setUpDeepKey(ex, discontinuous && b > a ? 'b' : 'a', b);
365+ storeValue(ex, valueLength, asIndex);
366+ }
367+ }
368+
369+ private void setUpDeepKey(final Exchange ex, final char fill, final int n) {
370+ ex.getKey().clear().append(keyString(fill, n, n - 34, 4, n)).append(1);
371+ }
372+
373+ private void setupValue(final Exchange ex, final int valueLength) {
374+ ex.getValue().put(sb.toString().substring(0, valueLength));
375+ }
376+
377+ private void storeValue(final Exchange ex, final int valueLength, final boolean asIndex) throws PersistitException {
378+ if (asIndex) {
379+ // Fill up one data page so that the base key will be inserted into
380+ // an index page
381+ setupValue(ex, ex.getBufferPool().getBufferSize() / 4);
382+ for (int i = 1; i < 4; i++) {
383+ ex.cut().append(i).store();
384+ }
385+ } else {
386+ setupValue(ex, valueLength);
387+ ex.store();
388+ }
389+ }
390+
391+ String keyString(final char fill, final int length, final int prefix, final int width, final int k) {
392+ final StringBuilder sb = new StringBuilder();
393+ for (int i = 0; i < prefix && i < length; i++) {
394+ sb.append(fill);
395+ }
396+ sb.append(String.format("%0" + width + "d", k));
397+ for (int i = length - sb.length(); i < length; i++) {
398+ sb.append(fill);
399+ }
400+ sb.setLength(length);
401+ return sb.toString();
402+ }
403+
404+ private int countSiblings(final Exchange exchange, final int level) throws PersistitException {
405+ int count = 0;
406+ exchange.clear().append(Key.BEFORE);
407+ Buffer b = exchange.fetchBufferCopy(level);
408+ while (true) {
409+ count++;
410+ final long rightSibling = b.getRightSibling();
411+ if (rightSibling != 0) {
412+ b = exchange.getBufferPool().getBufferCopy(exchange.getVolume(), rightSibling);
413+ } else {
414+ break;
415+ }
416+ }
417+ return count;
418+ }
419+
420+}

Subscribers

People subscribed via source and target branches