Where S means: skip S events at a time, then save the current state if the
current event is not a GTID_EVENT; otherwise save the state at the first
following event that is not a GTID_EVENT.
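A minimal Python sketch of the sampling rule. It assumes a hypothetical model in which the binlog is a list of (position, event_type) pairs; `sparse_checkpoints` and the event-type strings are illustrative names, not server code. It returns the positions at which a state would be saved:

```python
from typing import List, Tuple

GTID_EVENT = "GTID_EVENT"


def sparse_checkpoints(events: List[Tuple[int, str]], s: int) -> List[int]:
    """Return positions where a state snapshot is saved (hypothetical model).

    events: (position, event_type) pairs in binlog order.
    s: sparse factor, i.e. how many events pass between snapshots.
    """
    saved = []
    since_last = 0
    for pos, ev_type in events:
        since_last += 1
        # Once S events have passed, save at this event unless it is a
        # GTID_EVENT; in that case the save is deferred to the next
        # event that is not a GTID_EVENT.
        if since_last >= s and ev_type != GTID_EVENT:
            saved.append(pos)
            since_last = 0
    return saved
```

With S = 3 and events at positions 4, 100 (GTID), 150, 300 (GTID), 350, 500 (GTID) and 550, states are saved at 150 and 550: the third event after 150 is the GTID_EVENT at 500, so the save moves to 550.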
We store the states in a contiguous GTID array and keep, in an RB-tree
indexed by event position, the indexes into that array for each saved
state together with the seek position.
In gtid_state_from_pos() we first find the nearest element whose pos is
less than the requested position and restore its state; then we start a
normal file scan from that element's seek_pos. seek_pos is the position
of the event following pos.
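The lookup can be sketched in Python as below. This is an illustration under assumptions: a sorted list searched with `bisect` stands in for the RB-tree (both support the "greatest key below the target" search needed here), a GTID is modeled as a `(domain_id, server_id, seq_no)` tuple, a state as a dict keyed by domain, and `SparseCache`/`state_from_pos` are hypothetical names.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Gtid = Tuple[int, int, int]   # (domain_id, server_id, seq_no), hypothetical model
State = Dict[int, Gtid]       # domain_id -> most recent GTID in that domain


class SparseCache:
    def __init__(self) -> None:
        self.positions: List[int] = []              # sorted saved positions
        self.entries: List[Tuple[State, int]] = []  # (state, seek_pos)

    def add(self, pos: int, state: State, seek_pos: int) -> None:
        # Events arrive in binlog order, so appending keeps positions sorted.
        self.positions.append(pos)
        self.entries.append((dict(state), seek_pos))

    def state_from_pos(self, target: int,
                       events: List[Tuple[int, Optional[Gtid]]]) -> State:
        # Greatest saved position strictly less than target.
        i = bisect.bisect_left(self.positions, target) - 1
        if i >= 0:
            saved, seek_pos = self.entries[i]
            state = dict(saved)
        else:
            state, seek_pos = {}, 0   # nothing cached: scan from the start
        # "Normal file scan": replay events from seek_pos up to target.
        for pos, gtid in events:
            if pos < seek_pos:
                continue
            if pos >= target:
                break
            if gtid is not None:
                state[gtid[0]] = gtid
        return state
```

Because the chosen element satisfies pos < target, the scan only replays the events between that saved state and the requested position.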
For the tests to pass, the result returned by gtid_state_from_pos() had
to be sorted: the GTID state stored by the sparse algorithm has no
guaranteed order, because get_most_recent_gtid_list() does not guarantee
one.
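Why sorting is needed can be illustrated with a small Python helper (hypothetical, not part of the test suite): two GTID lists holding the same GTIDs in a different order only compare equal after normalization. It assumes the usual comma-separated `domain-server-seqno` text format, and sorts numerically so that domain 10 comes after domain 2.

```python
def normalize_gtid_list(gtid_list: str) -> str:
    """Sort a comma-separated 'domain-server-seqno' list numerically."""
    if not gtid_list:
        return ""  # empty state stays empty
    parts = [p.strip() for p in gtid_list.split(",")]
    # Compare (domain, server, seqno) as integers, not as strings.
    parts.sort(key=lambda g: tuple(int(x) for x in g.split("-")))
    return ",".join(parts)
```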
Size optimization
When binlog_gtid_pos_cache_sparse is 0, the cache uses the contiguous
algorithm; otherwise it uses the sparse algorithm.
Contiguous algorithm:
Each new GTID takes 16 bytes in the gtids array.
Each new event takes 4 (key) + 16 (array element) + 8 (pos_hash_element) = 28 bytes in pos_hash.
With 2 events per GTID the cache takes 28 + 16 / 2 = 36 bytes per event.
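The per-event figure comes from spreading the 16-byte GTID element over the events that share it. A small Python check of that arithmetic, using only the sizes listed above (the function name is illustrative):

```python
def contiguous_bytes_per_event(events_per_gtid: float) -> float:
    gtid_element = 16                                   # per GTID, gtids array
    key, array_element, pos_hash_element = 4, 16, 8     # per event, pos_hash
    per_event = key + array_element + pos_hash_element  # 28 bytes
    # One GTID element is shared by all events of that GTID.
    return per_event + gtid_element / events_per_gtid


print(contiguous_bytes_per_event(2))  # 36.0
```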
Sparse algorithm:
Each new entry takes 32 (RB node base) + 16 * D (GTID state) + 4 (key) + 16 (value) bytes.
So the cache takes 52 + 16 * D bytes per S events, where D is the average
number of domains and S is the sparse factor.
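Dividing the entry size by S gives a per-event cost that can be compared with the contiguous figure. A Python sketch of that arithmetic using the sizes above; the D and S values in the example are arbitrary:

```python
def sparse_bytes_per_event(domains: float, sparse_factor: int) -> float:
    rb_node_base, gtid_size, key, value = 32, 16, 4, 16
    entry = rb_node_base + gtid_size * domains + key + value  # 52 + 16 * D
    # One entry is stored per S events.
    return entry / sparse_factor


print(sparse_bytes_per_event(1, 16))  # 4.25
print(sparse_bytes_per_event(2, 16))  # 5.25
```

For one domain and S = 16 that is 4.25 bytes per event, versus 36 for the contiguous algorithm with 2 events per GTID; the price is a file scan from the nearest stored entry on each lookup.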
The cache size can be inspected in gdb, for example:
  p mysql_bin_log.gtid_state_cache->gtids.elements
  p mysql_bin_log.gtid_state_cache->pos_hash.records
  p rpl_global_gtid_binlog_state.binlog_hash.records
  p *(GTID_state_cache *)(rpl_global_gtid_binlog_state.binlog_list.first.next.next)
In-memory cache mapping binlog position -> GTID state, used by the SQL
function binlog_gtid_pos(). The cache grows as events are written to
the binlog file. It is configured with (default value 1):
set @@global.binlog_gtid_pos_cache= N;
N = 0 means no caching is done; the existing cache is deleted on
FLUSH BINARY LOGS.
N > 0 means the cache is kept for the N last binary log files. On FLUSH
BINARY LOGS the cache is rotated according to the current
binlog_gtid_pos_cache value.
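The rotation rule can be modeled in Python as below. This is a sketch under assumptions: `CacheRotation` is a hypothetical name, each binlog file gets its own per-file cache, and the file currently being written counts as one of the N files. As described above, changing N takes effect only at the next FLUSH BINARY LOGS.

```python
from collections import OrderedDict


class CacheRotation:
    """Models @@global.binlog_gtid_pos_cache = N (hypothetical sketch)."""

    def __init__(self, first_file: str, n_files: int = 1) -> None:
        self.n_files = n_files
        # binlog file name -> per-file cache, oldest file first
        self.caches = OrderedDict()
        if n_files > 0:
            self.caches[first_file] = {}

    def set_limit(self, n_files: int) -> None:
        # Existing caches are untouched; the new value is applied
        # at the next flush_binary_logs().
        self.n_files = n_files

    def flush_binary_logs(self, new_file: str) -> None:
        if self.n_files == 0:
            self.caches.clear()          # N = 0: drop everything, cache nothing
            return
        self.caches[new_file] = {}       # start caching the new file
        while len(self.caches) > self.n_files:
            self.caches.popitem(last=False)  # evict the oldest file's cache
```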
The cache consists of a GTID array in chronological order and a hash
mapping event offsets to the index of the last GTID. gtid_state->load()
in gtid_state_from_pos() then loads this array up to the mapped index
and keeps only the last GTID for each domain (as load() originally
does).
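A Python sketch of this layout, under stated assumptions: a list stands in for the GTID array, a dict for the hash, GTIDs are `(domain_id, server_id, seq_no)` tuples, and `ContiguousCache` is a hypothetical name. It also assumes an offset maps to the last GTID written before the event at that offset, so a lookup yields the state at that position; here -1 stands for "no GTIDs".

```python
from typing import Dict, List, Optional, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no), hypothetical model


class ContiguousCache:
    def __init__(self) -> None:
        self.gtids: List[Gtid] = []            # chronological GTID array
        self.pos_to_index: Dict[int, int] = {}  # event offset -> last GTID index

    def on_event(self, pos: int, gtid: Optional[Gtid] = None) -> None:
        # Record the index of the last GTID before this event (-1: none yet),
        # then append this event's own GTID, if it has one.
        self.pos_to_index[pos] = len(self.gtids) - 1
        if gtid is not None:
            self.gtids.append(gtid)

    def load(self, pos: int) -> Dict[int, Gtid]:
        idx = self.pos_to_index[pos]
        state: Dict[int, Gtid] = {}
        # Walk the array up to the mapped index; later GTIDs overwrite
        # earlier ones, so only the last GTID per domain survives.
        for gtid in self.gtids[: idx + 1]:
            state[gtid[0]] = gtid
        return state
```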
To indicate the "no GTIDs" state, the hash stores SIZE_T_MAX.
gtid_state_from_pos() returns an empty string for that state and NULL
on error.
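How the two special results could be produced is sketched below in Python. Assumptions: size_t is 64 bits, the hash lookup result is passed in directly with `None` standing for an error, `gtid_pos_string` is a hypothetical name, and domains are printed in ascending order only to make the output deterministic. An unsigned index has no negative value to spare, which is presumably why the largest size_t value serves as the marker.

```python
from typing import Dict, List, Optional, Tuple

Gtid = Tuple[int, int, int]   # (domain_id, server_id, seq_no), hypothetical model
SIZE_T_MAX = 2**64 - 1        # assumption: 64-bit size_t


def gtid_pos_string(gtids: List[Gtid], index: Optional[int]) -> Optional[str]:
    """Format the state for a hash value; None models the error case."""
    if index is None:
        return None                  # error state -> SQL NULL
    if index == SIZE_T_MAX:
        return ""                    # "no GTIDs" marker -> empty string
    last: Dict[int, Gtid] = {}
    for g in gtids[: index + 1]:
        last[g[0]] = g               # keep only the last GTID per domain
    return ",".join(f"{d}-{s}-{n}" for d, s, n in (last[k] for k in sorted(last)))
```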
Testing is done in binlog_hash.test, which compares the results of the
cached and non-cached algorithms via the cache_off/cache_on combinations.
Debug tracing can be used to check cache hits/misses as well as hash
rotation:
- Remove -Wimplicit-fallthrough=2 for gcc versions < 6
- Don't do git submodule update on fresh git clones.
  This fixes an issue with git 1.0, which gives errors on empty
  submodule directories.