maria:10.11-psergey-sel-v1

Last commit made on 2022-09-09
Get this branch:
git clone -b 10.11-psergey-sel-v1 https://git.launchpad.net/maria

Branch information

Name:
10.11-psergey-sel-v1
Repository:
lp:maria

Recent commits

cc6cd19... by Michael Widenius <email address hidden>

Added optimizer_costs.h which includes all optimizer costs

This makes it easier to see how costs change over commits

3763056... by Monty <email address hidden>

Split cost calculations into fetch and total

This patch causes no changes in costs or result files.

Changes:
- Store row compare cost separately in Cost_estimate::comp_cost
- Store cost of fetching rows separately in OPT_RANGE
- Use range->fetch_cost instead of adjust_quick_cost(total_cost)

This was done to simplify cost calculation in sql_select.cc:
- We can use range->fetch_cost directly without having to call
  adjust_quick_cost(). adjust_quick_cost() is now removed.
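
A minimal sketch of the split (only Cost_estimate::comp_cost and
OPT_RANGE's fetch cost are named by this commit; everything else here is
illustrative):

  // Total cost is now the cost of fetching the rows plus the separately
  // stored cost of comparing them against the attached WHERE clause.
  struct Cost_split_sketch
  {
    double fetch_cost;     // cost of fetching the rows
    double comp_cost;      // row compare cost
    double total() const { return fetch_cost + comp_cost; }
  };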

Other things:
- Removed some unused functions in Cost_estimate

11aa43e... by Monty <email address hidden>

Make trace.add() usage uniform

- Before any sequence of multiple add() calls, always guard with
  'if (trace_started())'.
- Add unlikely() around all tests of trace_started().
- Change trace.add(); trace.add(); to trace.add().add();
- When a trace.add() chain spans several lines, use the following formatting:
  trace.
    add(xxx).
    add(yyy).
    add(zzz);

This format was chosen after a discussion between Sergei Petrunia and
me, as it looks the same whether 'trace' is an object or a
pointer. It is also better suited to an editor's auto-indentation.
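
A self-contained stand-in showing the call style (the real trace writer
class in the server differs; this only illustrates the chaining and the
unlikely() guard):

  #include <iostream>
  #include <string>

  #define unlikely(x) __builtin_expect(!!(x), 0)

  class Trace
  {
  public:
    bool trace_started() const { return true; }   // stub for illustration
    // Returning *this is what makes add() calls chainable.
    Trace &add(const std::string &key, double value)
    {
      std::cout << key << ": " << value << '\n';
      return *this;
    }
  };

  void example(Trace &trace)
  {
    if (unlikely(trace.trace_started()))
      trace.
        add("rows", 10).
        add("cost", 2.5).
        add("chosen", 1);
  }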

Other things:

Added DBUG_ASSERT(thd->trace_started()) to a few functions that should
only be called if trace is enabled.

"use_roworder_index_merge: true" changed to "use_sort_index_merge: false"
As the original output was often not correct.
Also fixed the related 'cause' to be correct.

In best_access_path(), print the cost (and number of rows) before
checking whether the plan should be used. This removes the need to print
the cost in two places.

Changed a few "read_time" tags to "cost".

b1b8f20... by Sergey Petrunia

Stabilize main.subselect_sj2* tests

f20d1b6... by Monty <email address hidden>

Update cost for hash and cached joins

The old code didn't correctly add TIME_FOR_COMPARE to the rows that are
part of the scan and will be compared with the attached WHERE clause.

Now the cost calculations for hash join and full join cache join are
identical, except for HASH_FANOUT (10%).

The cost for a join using keys is now also uniform.
The total cost of using a key for lookups is calculated in one place as:

(cost_of_finding_rows_through_key(records) + records/TIME_FOR_COMPARE) *
record_count_of_previous_row_combinations + startup_cost

startup_cost is the cost of creating a temporary table (if needed).
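
For example, with purely hypothetical numbers (records=100,
cost_of_finding_rows_through_key(100)=50, TIME_FOR_COMPARE=5, 10 previous
row combinations, no startup cost):

  (50 + 100/5) * 10 + 0 = (50 + 20) * 10 = 700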

Best_cost now includes the cost of comparing all WHERE clauses and also
the cost of joining with previous row combinations.

Other things:
- The optimizer trace now prints the total costs, including testing the
  WHERE clause (TIME_FOR_COMPARE) and comparing with all previous rows.
- The optimizer trace also includes the total cost of the query together
  with the final join order. This makes it easier to find out where the
  cost was calculated.
- The old code used a filter even if its cost was higher than not using a
  filter. This is now corrected.

d971b9b... by Monty <email address hidden>

Adjust costs for doing index scan in cost_group_min_max()

The idea is that when doing a tree dive (once per group), we only need to
compare key values, which is fast. For each new group, we have to check
the full WHERE clause for the row.
Compared to the original code, the cost of group_min_max() has slightly
increased, which affects some tests with only a few rows.
main.group_min_max and main.distinct have been modified to show the
effect of the change.
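
The resulting per-group cost shape is roughly (names hypothetical, not
the actual variables in cost_group_min_max()):

  cost ~ num_groups * (tree_dive_cost        // cheap key-value compares
                       + where_compare_cost) // full WHERE check per group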

The patch also adjusts the number of groups in the case of quick selects
(a clamping sketch follows this list):
- For simple WHERE clauses, ensure that we have at least as many groups
  as we have conditions on the used group-by key parts.
  The assumption is that each condition will create at least one group.
- Ensure that there are no more groups than rows found by quick_select
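
A minimal sketch of that clamping, assuming the server's set_if_bigger()
and set_if_smaller() helper macros (the variable names are hypothetical):

  double num_groups= estimated_groups;             // from index statistics
  set_if_bigger(num_groups, key_part_conditions);  // >= one group per condition
  set_if_smaller(num_groups, quick_select_rows);   // <= rows found by quick select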

Test changes:
- For some small tables the chosen plan has changed:
  Using index for group-by -> Using index for group-by (scanning)
  Range -> Index
  Using index for group-by -> Using index

b7467d6... by Monty <email address hidden>

Return >= 1 from matching_candidates_in_table if records > 0.0

Keeping rows >= 1.0 helps ensure that when we calculate the total rows of
a join, the number of resulting rows will not be smaller after the join.
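
A minimal sketch of the clamp (function and parameter names hypothetical):

  // Never return a row estimate below 1 for a non-empty table; otherwise
  // a join could be estimated to return fewer rows than its inputs.
  double matching_candidates_sketch(double records, double filter)
  {
    double rows= records * filter;
    if (records > 0.0 && rows < 1.0)
      rows= 1.0;              // clamp fractional estimates up to one row
    return rows;
  }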

Changes in test cases:
- Join order change for some tables with few records
- 'Filtered' is much higher for tables with few rows, as 1 row is a high
  percentage of a table with few rows.

0c20ba7... by Monty <email address hidden>

Update matching_candidates_in_table() to treat all conditions similarly

Also fixed that the 'with_found_constraint' parameter to
matching_candidates_in_table() behaves as documented: it is now true only
if there is a reference to a previous table in the WHERE condition for
the currently examined table (as it was originally documented).

Changes in test results:
- 'Filtered' was 25% smaller for some queries (expected).
- Some join orders changed (probably because the tables had very few rows).
- Some more table scans, probably because fewer rows would be returned.
- Some tests expose a bug: if there are more filtered rows, the cost for a
  table scan will be higher. This will be fixed in a later commit.

0809587... by Monty <email address hidden>

Fix calculation of selectivity

calculate_cond_selectivity_for_table() is largely rewritten:
- Process keys in the order of rows found, smaller ranges first. If two
  ranges have an equal number of rows, use the one with more key parts.
  This helps us mark more of the used fields as excluded from further
  selectivity calculations. See cmp_quick_ranges() (a sketch follows
  this list).
- Ignore keys with fields that were used by previous keys.
- Don't use rec_per_key[] to calculate selectivity for smaller
  secondary key parts. This does not work, as the rec_per_key[] value
  is calculated in the context of the previous key parts, not for the
  key part itself. The one exception is if the previous key parts
  are constants.
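
A sketch of that ordering (not the actual cmp_quick_ranges() code; the
element type is hypothetical):

  struct Range_info { double rows; unsigned key_parts; };

  // qsort-style comparator: fewer rows first; on a tie, more key parts first.
  static int cmp_quick_ranges_sketch(const Range_info *a, const Range_info *b)
  {
    if (a->rows != b->rows)
      return a->rows < b->rows ? -1 : 1;
    if (a->key_parts != b->key_parts)
      return a->key_parts > b->key_parts ? -1 : 1;
    return 0;
  }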

Other things:
- Ensure that select->cond_selectivity is always between 0 and 1.
- Ensure that select->opt_range_condition_rows is never updated to
  a higher value. It is initially set to the number of rows in the table.
- We now store in table->opt_range_condition_rows the lowest number of
  rows that any row-read method has found so far (a one-line sketch
  follows this list). Before, this was only done for
  QUICK_SELECT_I::QS_TYPE_ROR_UNION and
  QUICK_SELECT_I::QS_TYPE_INDEX_MERGE.
  Now it is done for many more methods. See
  calculate_cond_selectivity_for_table() for details.
- Calculate and use the selectivity of the first key part of a multi-part
  key if that first key part is a constant, e.g.
  WHERE key1_part1=5 AND key2_part1=5: if key1 is used, we can still
  use the selectivity from key2.
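
A one-line sketch of the opt_range_condition_rows update, assuming the
server's set_if_smaller() macro ('found_rows' is hypothetical):

  // Keep the lowest row estimate any row-read method has produced so far.
  set_if_smaller(table->opt_range_condition_rows, found_rows);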

Changes in test results:
- 'filtered' is slightly changed, usually to something slightly smaller.
- A few cases where the table order changed for GROUP BY queries. This was
  because the number of resulting rows from a GROUP BY query with MIN/MAX
  is now estimated to be smaller.
- A few chosen indexes changed, as we now prefer the index with more key
  parts if the number of resulting rows is the same.

b83877d... by Monty <email address hidden>

Fixed bug in SQL_SELECT_LIMIT

We were comparing costs when we should have been comparing the number of
rows that will be examined.
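
A minimal sketch of the corrected comparison (names hypothetical):

  // With a row limit, compare the number of rows each alternative must
  // examine, not their total costs.
  bool prefer_first(double first_rows_examined, double second_rows_examined)
  {
    return first_rows_examined < second_rows_examined;
  }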