maria:10.6-limit

Last commit made on 2021-02-25
Get this branch:
git clone -b 10.6-limit https://git.launchpad.net/maria

Branch merges

Branch information

Name:
10.6-limit
Repository:
lp:maria

Recent commits

f7c82a5... by varun

Attempt to have sort-nest as a bush

d542c8d... by varun

Review fixes

37a10dd... by varun

Enabled the order by limit optimization by default

1a18aee... by varun

MDEV-24740: Selectivity for equi-join predicates not involed in ref access is not taken into account for join cardinality estimation

Introduced a new function to evaluate the selectivity for equi-join predicates
which are not part of ref access.
Enabled this calculation only when one can use the ORDER BY LIMIT optimization.

ee9acd2... by varun

MDEV-8306: Complete cost-based optimization for ORDER BY with LIMIT

The motivation behind this optimization is to shortcut the join execution
for queries having ORDER BY with LIMIT clause. In other words we would like to
avoid computing the entire join for queries having ORDER BY with LIMIT.

The main idea behind this optimization is to push the LIMIT to a partial join.
For pushing the LIMIT there is one pre-requisite and that is the partial
join MUST resolve the ORDER BY clause.

The optimization has got its separate switch. One can disable the optimization
by running
     set optimizer_switch='cost_based_order_by_limit=off';

IMPLEMENTATION DETAILS

Let us divide the implementation details in 3 stages:

OPTIMIZATION STAGE
------------------

a) The cost of every potentially usable execution plan such that
   its first joined tables forms a bush the result of which is
   sorted in accordance with the ORDER BY clause.
   The evaluations takes into account that the LIMIT
   operation can be pushed right after the sort operation.

  The recursive procedure that enumerates such execution plans considers
  inserting a sort operation for any partial join prefix
  that can resolve the ORDER BY clause

  So for each such partial join prefix the procedure considers two options:
    1) to insert the sort operation immediately
    2) to add it later after expanding this partial join.

  For a partial prefix that cannot resolve the required ordering the procedure
  just extends the partial join.

b) Access methods that ensure pre-existing ordering are also
   taken into account inside the join planner.

  This helps us to enumerate all plans where on can push LIMIT to different
  partial plans. Finally the plan with the lowest cost is picked by the join
  planner.

COMPILATION STAGE
-----------------

Sort nest is a materialized nest of tables which can resolve
the ORDER BY CLAUSE.

Preparation of Sort Nest

Let's say the best join order is:

  t1, t2, t3, t4 .............tk,tk+1.........................tn
  |<---------prefix------------>|<-------suffix--------------->|

The array of join_tab structures would look like

  t1, t2, t3, t4 .............tk, <sort nest>, tk+1............tn

Consider the execution plan finally chosen by the planner.
This is a linear plan whose first node is a temporary table
that is created for the sort nest.

Join used for the sort nest is also executed by a linear plan.

                                  materialize
  t1, t2, t3, t4..............tk ============> <sort nest>
  |<---------prefix----------->|

  Here the sort nest is the first node as stated above:

  <sort nest> [sort], tk+1.........................tn
                      |<-------suffix-------------->|

After the temporary table for the sort nest is created the
conditions that can be pushed there are extracted from the
WHERE clause. Thus the join with the sort nest can use only
remainder of the extraction. This new condition has to
be re-bound to refer to the columns of the temporary table
whenever references to inner tables of the nest were used.

Similarly for ON clause, SELECT list, ORDER BY clause and REF items
this rebinding needs to be done.

EXECUTION STAGE
---------------

Let's say the best join order is:

  t1, t2, t3, t4 .............tk,tk+1...............tn
  |<---------prefix------------>|<-------suffix----->|

  The prefix are the inner tables of the sort nest while the
  suffix are the tables outside the sort nest.

  The join execution for this optimization can be split in 3 parts

  a) Materialize the prefix
                                     materialize
    t1, t2, t3, t4 .............tk ============> <sort nest>
    |<---------prefix------------>|

  b) Sort the <sort nest> in accordance with the ORDER BY clause

  c) Read records from the the result of sorting one by one
     and join with the tables in the suffix with
     NESTED LOOP JOIN

     <sort nest>, tk+1.........................tn
                  |<----------suffix----------->|

    The execution stops as soon as we get LIMIT records in the output.

cb5223a... by varun

Using recurisive descent instead of walk to check if the join cardinality estimates are precise

957531a... by varun

MDEV-22360: Sufficient conditions for accurate calculation of join cardinality

The aim of this task is to check if the estimate of join cardinality are accurate or not.
The implementation to check if we have the accurate estimate of the join cardinality is a
simple one, we have to walk over the WHERE clause.

The approach can be broken into 2 cases:

Case 1: WHERE clause is an AND conjunct
 For an AND item at the top level, we need to walk over all the top level conjuncts and call walk
 individually on them. This is done in such a way because for an AND conjunct at the top
 level we may have accurate selectivity, even if the predicate belongs to a different column.
 Eg: t1.a > 10 and t2.a < 5.

 For this AND item we will have accurate selectivities.
 For AND conjuncts (not at the top level), the entire conjunct needs to be resolved to one column.
 Eg: t1.a = t2.a AND ( (t1.a > 5 AND t2.a < 10) OR t1.a <= 0)

Case 2:
 2a) OR item
  For an OR item at the top level, we need to make sure that all the columns inside the OR
  conjunct need to belong to one column directly or indirectly.
  This needs to happen for an OR conjunct even if it is not at the
  top level.
  Eg: (t1.a > 5 or t1.a < 0);

 2b) Single predicate at the top level
     Eg:
       t1.a= t2.a [ For this case we need to make sure we know number of distinct values for t1.a and t2.a ]
       t1.a > 5 [ sargable predicate, get the estimate from the range optimizer ]

We need to make sure that for the predicates in the WHERE clause we have estimates either
from the first component of the index or from the EITS.
The implementation of these is covered with the callback
function passed to walk function.

1936b3c... by Karel Picman <email address hidden>

-s stands for silent, copy-paste mistake? (#1733)

fa14c42... by Hartmut Holzgraefe

MDEV-10271: add master host/port info to slave thread exit messages

Sample log error message generated:

mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 8 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329, master: 127.0.0.1:16000
mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 7 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329, master 127.0.0.1:16000
mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 12 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329; GTID position '', master: 127.0.0.1:16000

Reviewer: <email address hidden>, Andrei and Sachin

9a08fcb... by Daniel Black

.gitattributes - correct language detection

Attempting to sign up to https://bestpractices.coreinfrastructure.org/en/projects
complains with "Implementation languages is too long (maximum is 254 characters)"

Looking at:

 curl -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/MariaDB/server/languages

It does appear that we're (just) over the limit.

As github is based on https://github.com/github/linguist, let's just
play nice and make our languages look realistic.