maria:preview-10.10-optimizer

Last commit made on 2022-07-13
Get this branch:
git clone -b preview-10.10-optimizer https://git.launchpad.net/maria

Branch merges

Branch information

Name:
preview-10.10-optimizer
Repository:
lp:maria

Recent commits

ec4e590... by Oleg Smirnov

MDEV-28881 Fix memory leak caused by STL usage

Dep_analysis_context::create_unique_pseudo_key_if_needed() was using
std::set which allocated memory outside MEM_ROOT and so required
explicit deallocation. This has been solved by replacing std::set with
MY_BITMAP which is allocated on MEM_ROOT

1af0382... by Oleg Smirnov

MDEV-28881 Server crash in Dep_analysis_context::create_table_value

SELECT_LEX::first_select()->join is NULL for degenerate derived tables
which are known to have just one row and so were already materialized
by the optimizer.
This commit adds a check for this.

916c2d3... by Oleg Smirnov

MDEV-26278 Add functionality to eliminate derived tables from the query

Elimination of unnecessary tables from SQL queries is already present
in MariaDB. But it only works for regular tables and not for derived ones.

Imagine we have a view:
  CREATE VIEW v1 AS SELECT a, b, max(c) AS maxc FROM t1 GROUP BY a, b

Due to "GROUP BY a, b" the values of combinations {a, b} are unique,
and this fact can be treated as like derived table "v1" has a unique key
on fields {a, b}.

Suppose we have a SQL query:
  SELECT t2.* FROM t2 LEFT JOIN v1 ON t2.a=v1.a and t2.b=v1.b

1. Since {v1.a, v1.b} is unique and both these fields are bound to t2,
   "v1" is functionally dependent on t2.
   This means every record of "t2" will be either joined with
   a single record of "v1" or NULL-complemented.
2. No fields of "v1" are present on the SELECT list

These two facts allow the server to completely exclude (eliminate)
the derived table "v1" from the query.

6f979f8... by Monty <email address hidden>

Reduced size of POSITION

Replaced Cost_estimate prefix_cost with a double as prefix_cost was
only used to store and retrive total prefix cost.

This also speeds up things (a bit) as don't have to call
Cost_estimate::total_cost() for every access to the prefix_cost.

Sizeof POSITION decreased from 304 to 256.

581bdc9... by Monty <email address hidden>

Added current_cost and best_cost to optimizer trace when pruning by cost

This helps a lot when trying to find out why a table combination was pruned

7d2cbde... by Monty <email address hidden>

Added EQ_REF chaining to the greedy_optimizer

MDEV-28073 Slow query performance in MariaDB when using many table

The idea is to prefer and chain EQ_REF tables (tables that uses an
unique key to find a row) when searching for the best table combination.
This significantly reduces row combinations that has to be examined.
This is optimization is enabled when setting optimizer_prune_level=2
(which is now default).

Implementation:
- optimizer_prune_level has a new level, 2, which enables EQ_REF
  optimization in addition to the pruning done by level 1.
  Level 2 is now default.
- Added JOIN::eq_ref_tables that contains bits of tables that could use
  potentially use EQ_REF access in the query. This is calculated
  in sort_and_filter_keyuse()

Under optimizer_prune_level=2:
- When the greedy_optimizer notices that the preceding table was an
  EQ_REF table, it tries to add an EQ_REF table next. If an EQ_REF
  table exists, only this one will be considered at this level.
  We also collect all EQ_REF tables chained by the next levels and these
  are ignored on the starting level as we have already examined these.
  If no EQ_REF table exists, we continue as normal.

This optimization speeds up the greedy_optimizer combination test with
~25%

Other things:
- I ported the changes in MySQL 5.7 to greedy_optimizer.test to MariaDB
  to be able to ensure we can handle all cases that MySQL can do.
- I have run all tests with --mysqld=--optimizer_prune_level=1 to verify that
  there where no test changes.

85d23bc... by Monty <email address hidden>

Remove unnecessary testing of join dependency when sorting tables

The dependency checking is not needed when using
best_extension_by_limited_search() as this function ensures
that we don't use tables that are depending on any of the remaning
tables.

552cf89... by Monty <email address hidden>

Added get_allowed_nj_tables() to speed up gready_search()

"Get the tables that one is allowed to have as the next table in the
current plan"

Main author: Sergei Petrunia <email address hidden>
Co author: Monty

dedc3d7... by Monty <email address hidden>

Improve pruning in greedy_search by sorting tables during search

MDEV-28073 Slow query performance in MariaDB when using many tables

The faster we can find a good query plan, the more options we have for
finding and pruning (ignoring) bad plans.

This patch adds sorting of plans to best_extension_by_limited_search().
The plans, from best_access_path() are sorted according to the numbers
of found rows. This allows us to faster find 'good tables' and we are
thus able to eliminate 'bad plans' faster.

One side effect of this patch is that if two tables have equal cost,
the table that which was used earlier in the query is preferred.
This allows users to improve plans by reordering eq_ref tables in the
order they would like them to be uses.

Result changes caused by the patch:
- Traces are different as now we print the cost for using tables before
  we start considering them in the plan.
- Table order are changed for some plans. In most cases this is because
  the plans are equal and tables are in this case sorted according to
  their usage in the original query.
- A few plans was changed as the optimizer was able to find a better
  plan (that was pruned by the original code).

Other things:

- Added a new statistic variable: "optimizer_join_prefixes_check_calls",
  which counts number of calls to best_extension_by_limited_search().
  This can be used to check the prune efficiency in greedy_search().
- Added variable "JOIN_TAB::embedded_dependent" to be able to handle
  XX IN (SELECT..) in the greedy_optimizer. The idea is that we
  should prune a table if any of the tables in embedded_dependent is
  not yet read.
- When using many tables in a query, there will be some additional
  memory usage as we need to pre-allocate table of
  table_count*table_count*sizeof(POSITION) objects (POSITION is 312
  bytes for now) to hold the pre-calculated best_access_path()
  information. This memory usage is offset by the expected
  performance improvement when using many tables in a query.
- Removed the code from an earlier patch to keep the table order in
  join->best_ref in the original order. This is not needed anymore as we
  are now sorting the tables for each best_extension_by_limited_search()
  call.

63961a0... by Marko Mäkelä

Merge 10.9 into 10.10