maria:bb-11.0-midenok-MDEV-27180

Last commit made on 2023-05-18
Get this branch:
git clone -b bb-11.0-midenok-MDEV-27180 https://git.launchpad.net/maria

Branch information

Name:
bb-11.0-midenok-MDEV-27180
Repository:
lp:maria

Recent commits

8e6aa5f... by midenok

MDEV-27180 Fully atomic partitioning DDL operations

Atomic DDL for partitioning originally covered crash-safety, but it did
not recover fully from failures. For example, if an error happened
during the rename or drop of partitions, the ALTER operation did not
return the table to its original state.

The patch solves the above problem similarly to MDEV-25292 (atomic
CREATE OR REPLACE): new partitions are created as temporary, old
partitions are backed up, and everything is guarded by two DDL log
chains: rollback and cleanup. The rollback chain contains the actions
that bring back the original table. The cleanup chain deletes the
backup files.

The generic operation of alter partition is as follows:

  1. Create new partitions as tmp partitions;
  2. Fill tmp partitions with data;
  3. Rename old and no longer needed partitions to backup partitions;
  4. Rename tmp partitions to original partitions;
  5. Do any required logging (mariabackup, binary);
  6. If everything is ok, drop the backup partitions;
  7. If any of steps 1-5 fails, drop the new partitions and rename the
     backup partitions back.
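
The steps above can be illustrated with a small stand-alone model. This
is only a sketch under stated assumptions, not the server code: the
file set, the #TMP#/#BAK# suffixes, rename_file(), alter_partition()
and the fail_at_step parameter are all hypothetical. It shows each undo
action being logged before the operation it guards, and the rollback
chain being replayed newest first.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for partition files on disk.
using Files = std::set<std::string>;

// Rename `from` to `to`; returns false (and changes nothing) if `from` is missing.
static bool rename_file(Files &fs, const std::string &from, const std::string &to)
{
  if (fs.erase(from) == 0)
    return false;
  fs.insert(to);
  return true;
}

// Replace partition `old_part` with `new_part`.
// `fail_at_step` injects a failure at steps 1-5 (0 means no failure).
bool alter_partition(Files &fs, const std::string &old_part,
                     const std::string &new_part, int fail_at_step)
{
  const std::string tmp = new_part + "#TMP#";
  const std::string bak = old_part + "#BAK#";
  std::vector<std::function<void()>> rollback; // brings back the original table
  std::vector<std::function<void()>> cleanup;  // deletes the backup files

  // Step 7: run the rollback chain, newest action first.
  auto fail = [&rollback]() {
    for (auto it = rollback.rbegin(); it != rollback.rend(); ++it)
      (*it)();
    return false;
  };

  // Steps 1-2: log the undo action first, then create and fill the tmp partition.
  rollback.push_back([&fs, tmp] { fs.erase(tmp); });
  if (fail_at_step == 1) return fail();
  fs.insert(tmp);
  if (fail_at_step == 2) return fail();

  // Step 3: rename the old partition to a backup.
  rollback.push_back([&fs, bak, old_part] { rename_file(fs, bak, old_part); });
  cleanup.push_back([&fs, bak] { fs.erase(bak); });
  if (fail_at_step == 3 || !rename_file(fs, old_part, bak)) return fail();

  // Step 4: rename the tmp partition to its final name.
  rollback.push_back([&fs, tmp, new_part] { rename_file(fs, new_part, tmp); });
  if (fail_at_step == 4 || !rename_file(fs, tmp, new_part)) return fail();

  // Step 5: binary / mariabackup logging would happen here.
  if (fail_at_step == 5) return fail();

  // Step 6: success, drop the backup.
  for (auto &action : cleanup)
    action();
  return true;
}
```

In this model a failure injected at any of steps 1-5 leaves the file
set exactly as it was before the call, which is the property the patch
aims for.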

Each rename operation is executed right after the corresponding DDL
logging. Originally they were done in different loops:
write_log_changed_partitions(), mysql_change_partitions(), etc. The
actual table operations were done in the ha_partition handler:
ha_change_partitions(), ha_drop_partitions(),
ha_rename_partitions(). These calls are now removed.

Instead of the above deprecated interfaces, the following classes now
handle DDL logging and table operations:

      Alter_partition_logger
      Alter_partition_add
      Alter_partition_change

Alter_partition_logger handles the basic operation, DDL logging and
table renames. Alter_partition_add handles adding the partitions.
Alter_partition_change combines Alter_partition_logger and
Alter_partition_add for complex operations such as REORGANIZE.

The ha_partition::change_partitions() call was not fully removed.
Stages 1-4 of that call were moved into allocate_partitions(), which
is used by Alter_partition_add for initializing the file handler
arrays (m_added_file, m_reorged_file).
ha_partition::prepare_new_partition() is renamed to create_partition()
because it actually does ha_create() and the new name better describes
what it does (see previous commit "ha_partition refactoring").

DDL_LOG_DELETE/DDL_LOG_RENAME/DDL_LOG_REPLACE are now pure file
operations. To process both .par and .frm files we now issue two
actions instead of one. That makes the interface simpler, and it will
be used by MDEV-16417 in the future. All table operations are done via
other actions, such as DDL_LOG_RENAME_TABLE, DDL_LOG_DROP_TABLE, etc.
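
As a rough illustration of "two actions instead of one", the sketch
below emits one pure file action per definition file. The FileAction
enum, the FileLogEntry struct, log_rename_definition_files() and the
order of the two files are assumptions made for this example; they are
not the ddl_log API.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical mirror of the three pure file actions named in the message.
enum class FileAction { Delete, Rename, Replace };

// Hypothetical log record: one record describes exactly one file.
struct FileLogEntry {
  FileAction action;
  std::string from;
  std::string to; // unused for Delete
};

// Emit one rename action per definition file instead of a single
// combined action that implicitly covers both files.
std::vector<FileLogEntry> log_rename_definition_files(const std::string &from_base,
                                                      const std::string &to_base)
{
  std::vector<FileLogEntry> entries;
  for (const char *ext : {".frm", ".par"})
    entries.push_back({FileAction::Rename, from_base + ext, to_base + ext});
  return entries;
}
```

Keeping each record tied to a single file means the executor needs no
knowledge of which extensions belong together.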

DDL_LOG_RENAME_TABLE has a new alter_partition option. With this
option it does a simple ha_rename_table(), which is needed for renaming
the partitions. The action for renaming the partitions must start from
phase DDL_RENAME_PHASE_TABLE.

The testing was refactored a bit. Many new fail/crash points were
added, the test was extended with binary and mariabackup logging, and
it tests more ALTER commands with variations of partitions-only and
subpartitions, both without and under LOCK TABLES.

EXCHANGE PARTITION was fixed: after a failure under LOCK TABLES the
table became unlocked. Fixed by correcting the placement of
reopen_tables(). Tested by partition_debug.

HA_PARTITION_ONE_PHASE handling was removed from
fast_alter_partition() as native partitioning is not supported.

54dc47c... by midenok

report_error mode for ddl_log_execute_action()

Report handler errors for ddl_log_revert().

An additional execute_entry parameter is used for extended trace logging.

7668e46... by midenok

Cleanups

Renamed
log_partition_alter_to_ddl_log() -> backup_log_alter_partition()

because the name was misleading: the function writes to the
mariabackup log, not to the DDL log.

ba1baf7... by midenok

Vanilla cleanup: unused WFRM_KEEP_SHARE

ce0c3ff... by midenok

ha_partition refactoring

prepare_new_partition() gets the file from data members and updates
m_added_file before a successful return.

partition_element::serial_id() returns the positional id in the
ha_partition::m_file array. Element IDs are updated in
prep_alter_part_table() for REORGANIZE.

partition_element::parent_part holds the parent partition_element for
a subpartition.

change_partitions() is split into allocate_partitions() and
change_partitions().

prepare_new_partition() is renamed to create_partition() because it
actually does ha_create().

allocate_partitions() and create_partition() are used in MDEV-27180.

3f16297... by midenok

MDEV-29831 Galera crashes when running CoR for a locked table after
    setting the minimum memory for a user session.

The failure happens when finalize_atomic_replace() has already finished
and we have removed the table from the locked tables list.
finalize_locked_tables() doesn't know about that, so it doesn't add
back the last deleted lock because operation_failed == true.
reopen_tables() doesn't reopen the table and as a result we have NULL
in pos_in_locked_tables->table.

The fix adds the knowledge that locked_tables_count has changed since
the start of the command. If that happened, we call
add_back_last_deleted_lock(). That makes the MDEV-29544 fix with
locked_tables_decremented obsolete.
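
A toy model of that check is sketched below. The struct, its fields and
the function signature are hypothetical; only the idea of comparing the
lock count against its value at the start of the command is taken from
the text above.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for the locked tables list.
struct LockedTablesList {
  int count = 0;
  bool has_deleted_lock = false;

  // The table is taken out of the list during the operation.
  void remove_lock() { --count; has_deleted_lock = true; }

  // Put the last removed lock back, if there is one.
  void add_back_last_deleted_lock()
  {
    if (has_deleted_lock) {
      ++count;
      has_deleted_lock = false;
    }
  }
};

// Decide by observing the list itself rather than by the outcome of the
// operation: if the count differs from the value recorded at the start
// of the command, a lock was removed and must be added back.
void finalize_locked_tables(LockedTablesList &list, int count_at_start)
{
  if (list.count != count_at_start)
    list.add_back_last_deleted_lock();
}
```

With this condition a failure that happens after the lock was removed
still restores it, while a failure before the removal changes nothing.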

An alternative fix would add atomic_replace_finished to Atomic_info and
update it on successful finalize_atomic_replace(). Then the condition
would look like this:

  if (atomic_replace_finished || !operation_failed)
  {
    /*
      Add back the deleted table and re-created table as a locked table
      This should always work as we have a meta lock on the table.
    */
    thd->locked_tables_list.add_back_last_deleted_lock(pos_in_locked_tables);
  }

e3e5db6... by midenok

MDEV-29824 Galera crashes when running CoR for a locked table

locked_tables_list.reopen_tables() does external_lock(), which in
InnoDB opens a new transaction. The previous commit in send_eof() sets
the s_committed state in wsrep. The new transaction hasn't changed
that state yet, so on commit the s_committed state is unexpected.

The fix is similar to MDEV-22222 (2b8b7394a12): we update the wsrep
state before starting a new transaction. A better wsrep fix would track
the state properly when the new transaction is started.

b4681a5... by midenok

MDEV-29770 Broken table cannot be CREATE OR REPLACE -ed anymore

finalize_atomic_replace():

If the rename to backup fails, we fall back to the mechanism of
dropping the old table. For that we write these events into the
cleanup chain (executed in reverse order):

  1. rename from temporary to original;
  2. drop the original table.

After binlogging is done, finalize_ddl() executes that cleanup chain
in case of success. In case of error, finalize_ddl() closes the
cleanup chain and executes the rollback chain, which drops the backup
(non-existent at that point) and the new table under its tmp name.

mysql_rename_table():

ha_rename_table() is non-atomic: it can make partial changes and then
fail. In that case we revert these partial changes to make it more
atomic-friendly.
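
The general technique of undoing a partially applied multi-file rename
can be sketched as follows. This is an illustration on a hypothetical
in-memory file set, with made-up names (rename_file(),
rename_all_or_nothing()); it is not how mysql_rename_table() is
implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for files on disk.
using Files = std::set<std::string>;
using Rename = std::pair<std::string, std::string>; // (from, to)

// Rename `from` to `to`; returns false (and changes nothing) if `from` is missing.
static bool rename_file(Files &fs, const std::string &from, const std::string &to)
{
  if (fs.erase(from) == 0)
    return false;
  fs.insert(to);
  return true;
}

// Apply the renames in order. If one fails, revert the renames that
// already succeeded so the caller sees either all changes or none.
bool rename_all_or_nothing(Files &fs, const std::vector<Rename> &renames)
{
  std::size_t done = 0;
  while (done < renames.size() &&
         rename_file(fs, renames[done].first, renames[done].second))
    ++done;
  if (done == renames.size())
    return true;

  // Revert the completed renames, newest first.
  while (done-- > 0)
    rename_file(fs, renames[done].second, renames[done].first);
  return false;
}
```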

ddl_log API changes:

A DROP sequence of actions can now be written into a non-empty
chain. After the sequence is replayed in straight order, the chain
continues from what was there before the DROP_INIT action. That is done
by remembering the last entry in ddl_log_drop_init() and setting
next_entry to it in each ddl_log_drop(). As before, each subsequent
ddl_log_drop() updates the previous DROP (or DROP_INIT) to point to
itself.
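
The linking scheme described above can be modelled with a small index
based list. This is a sketch under assumptions: the DdlChain struct,
its fields and method names are hypothetical, and ordinary entries are
assumed to be prepended so that the chain replays newest first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a DDL log chain; -1 marks the end of the chain.
struct DdlChain {
  struct Entry {
    std::string action;
    int next;
  };
  std::vector<Entry> entries;
  int head = -1;       // first entry to execute
  int drop_tail = -1;  // last DROP or DROP_INIT written so far
  int saved_next = -1; // chain head remembered by drop_init()

  // Ordinary entry: prepended, so the chain replays newest first.
  void write(const std::string &action)
  {
    entries.push_back({action, head});
    head = static_cast<int>(entries.size()) - 1;
  }

  // Start a DROP sequence and remember what the chain contained before it.
  void drop_init()
  {
    saved_next = head;
    entries.push_back({"DROP_INIT", saved_next});
    head = drop_tail = static_cast<int>(entries.size()) - 1;
  }

  // Append a DROP: it continues into the remembered chain, and the
  // previous DROP (or DROP_INIT) is updated to point to it.
  void drop(const std::string &table)
  {
    entries.push_back({"DROP " + table, saved_next});
    const int idx = static_cast<int>(entries.size()) - 1;
    entries[drop_tail].next = idx;
    drop_tail = idx;
  }

  // Walk the chain from the head and return the execution order.
  std::vector<std::string> replay() const
  {
    std::vector<std::string> order;
    for (int i = head; i != -1; i = entries[i].next)
      order.push_back(entries[i].action);
    return order;
  }
};
```

In this model, writing A and B, then drop_init() followed by two
drop() calls, replays the DROP entries in the order they were written
and then continues with B and A.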

ddl_log_start_atomic_block()/ddl_log_commit_atomic_block(): these
functions allow writing multiple chain entries without updating the
execute entry. This is required when a block of several actions must
be atomic.
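
One way to picture the atomic block is as a deferred publish of the
execute entry. The model below is an assumption made for illustration
(the DdlChain struct, the `published` counter and recover() are
hypothetical): entries written inside the block exist in the log, but
recovery does not see them until the block is committed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: `published` plays the role of the execute entry
// and says how many written entries recovery is able to reach.
struct DdlChain {
  std::vector<std::string> entries;
  std::size_t published = 0;
  bool in_atomic_block = false;

  void start_atomic_block() { in_atomic_block = true; }

  // Outside a block every write is published immediately.
  // Inside a block the entry is written but not yet published.
  void write(const std::string &action)
  {
    entries.push_back(action);
    if (!in_atomic_block)
      published = entries.size();
  }

  // Publish all entries written in the block in a single step.
  void commit_atomic_block()
  {
    in_atomic_block = false;
    published = entries.size();
  }

  // What recovery would execute after a crash: published entries, newest first.
  std::vector<std::string> recover() const
  {
    std::vector<std::string> visible;
    for (std::size_t i = published; i-- > 0;)
      visible.push_back(entries[i]);
    return visible;
  }
};
```

A crash between start and commit therefore replays none of the
entries from the block, and a crash after commit replays all of them.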

d8605a9... by midenok

MDEV-29802 #sql-backup tables are visible upon CREATE OR REPLACE
    and get stuck in S3 storage

Now, when detecting the possibility of atomic replace, we check
db_type->flags of the existing table. Before that we checked only the
db_type of the new table.

5c5f265... by midenok

MDEV-29802 Rename s3.partition_create_fail to s3.debug