Merge lp:~pbeaman/akiban-persistit/fix-rebalance-exception2 into lp:akiban-persistit
Status: | Merged |
---|---|
Approved by: | Nathan Williams |
Approved revision: | 400 |
Merged at revision: | 400 |
Proposed branch: | lp:~pbeaman/akiban-persistit/fix-rebalance-exception2 |
Merge into: | lp:akiban-persistit |
Diff against target: |
420 lines (+279/-17) 6 files modified
src/main/java/com/persistit/Buffer.java (+6/-5) src/main/java/com/persistit/CleanupManager.java (+1/-1) src/main/java/com/persistit/Exchange.java (+93/-6) src/main/java/com/persistit/TransactionPlayer.java (+3/-4) src/main/java/com/persistit/VolumeStructure.java (+1/-1) src/test/java/com/persistit/ExchangeRebalanceTest.java (+175/-0) |
To merge this branch: | bzr merge lp:~pbeaman/akiban-persistit/fix-rebalance-exception2 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Nathan Williams | Approve | ||
Peter Beaman | Needs Resubmitting | ||
Review via email: mp+136225@code.launchpad.net |
Description of the change
This is a replacement for the original fix-rebalance-
Here is the original description:
This proposal fixes two fairly delicate issues and should not be approved until Server 1.4.3 has shipped.
1. RebalanceException
There is a rarely encountered finge case where deleting a record can require two pages to be split into three. The case requires two adjacent pages that are very full, and a pair of keys A and B at the left edge of the right sibling page to be long and have a small elided byte count. In other words, B is different from A in such a way that all or nearly all of the bytes in B are required to represent its value. In addition, the key the left of A and A have a deep elision count. The RebalanceException occurs deleting A requires the max key of the left sibling page to become B, and B is too big to fit there. In all previous versions of Persistit, the application would simply receive a RebalanceException in response to the attempt to remove the key.
This phenomenon has never been seen in application code, including Akiban Server testing, but it does occur routinely in Persistit stress tests.
This branch fixes the problem by catching the RebalanceException (thrown by Buffer#join(..)) and handling it properly in Exchange#
A new text ExchangeRebalan
2. InUseException in stress tests
Thread A attempts to get page P from the buffer pool. Thread B attempts to get page Q, discovers that it needs to read the page from disk, and before A gets a claim on the buffer, choses the buffer containing P to evict and reused to hold Q. In this scenario, thread A does not detect that the buffer it is waiting for has changed until it can get a claim on the buffer. And further, because the identities of the pages being awaited by the two threads is different, a deadlock and result.
This bug was described by https:/
The original fix was for thread A, while waiting for a claim on the buffer containing page P, to wait for only a short interval. A would then recheck the identity of the page contained in the awaited buffer to verify it still contains P before retrying.
The problem with this is that on a very heavily loaded system (or when stress tests run an an ec2 instance with unreliable I/O times), there can be a livelock. Each time Thread A times out, it loses its place in the queue and goes back to the end. We think this is the mechanism for occasional timeouts now seen in stress tests, especially on ec2 instances.
The newly proposed solution is for Thread A to wait for page P using its original timeout (default is 60 seconds) and never give up its position in the queue. However, when Thread B changes the identity of the page held in the awaited buffer to Q, it ends Thread A's wait by interrupting it. Thread A receives the interrupt and uses a simple mechanism to detect whether the interrupt occurred for this reason or because there truly was an interrupt created within the environment.
Since creating the livelock in the first place is extremely difficult, I have no unit test for this. (In fact, I have no proof other than an empirical improvement in test failures to support the hypothesis about the mechanism.) The modified code has run through several stress test cycles, and we will be adding more experience with it over time.
Just a few questions, but otherwise seems OK.
Why was MINIMUM_ PRUNE_OBSOLETE_ TRANSACTIONS_ INTERVAL_ NS decreased by 10x?
Why was _spareKey2/3 changed to 3/4, diff line 209ish? Are 1 and 2 not safe to use here? Can we add a comment or asserts to remind us if so?
The _id and changed() handling int SharedResource seems vulnerable to false failures. As it is a volatile int and changed with ++, another thread can come into claim it and have seen it in one of two states: pre and post increment. In the former, it will return out of the catch block with false (as intended). In the latter, an non-user generated PersistitInterr uptedException will be propagated. Is that what we want?