maria:10.7-selectivity-old

Last commit made on 2022-08-11
Get this branch:
git clone -b 10.7-selectivity-old https://git.launchpad.net/maria

Branch merges

Branch information

Name:
10.7-selectivity-old
Repository:
lp:maria

Recent commits

07ffb3a... by Monty <email address hidden>

TEMPORARY PUSH: Changing all cost calculations to be given in ms

- Added tests/check_costs.pl, a tool to verify optimizer cost calculations.
  - Most costs have been found with this program. All steps to calculate
    the new costs are documented in Docs/optimizer.costs
- User optimizer_cost variables are given in usec (as individual
  costs can be very small). Internally they are stored in ms.
- Changed DISK_READ_COST (was DISK_SEEK_BASE_COST) from a hard disk cost
  (9 ms) to a common SSD cost (400 MB/sec).
- Changed the following handler functions to return IO_AND_CPU_COST.
  This makes it easy to apply different cost modifiers in ha_..time()
  functions for IO and CPU costs (see the sketch after this list).
  - scan_time()
  - rndpos_time()
  - keyread_time()
- Enhanced keyread_time() to calculate the full cost of reading a set
  of keys with a given number of ranges and, optionally, the number of
  blocks that need to be accessed.
- Removed read_time() as keyread_time() + rndpos_time() is the same thing.
- Added the following new optimizer_variables:
  - optimizer_scan_lookup_cost
  - optimizer_row_lookup_cost
  - optimizer_index_lookup_cost
  - optimizer_disk_read_cost
- Added include/my_tracker.h ; Useful include file to quickly test costs
  of a function.
- Tuned sequence and heap engine costs (the rest will be done in an
  updated commit)
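
Below is a minimal standalone sketch of the io/cpu cost split described in
the list above. IO_AND_CPU_COST here is only a stand-in struct and
apply_cost_factors() is a hypothetical helper, not the server's actual API:

    #include <cstdio>

    // Stand-in for an io/cpu cost pair: keeping the two parts separate lets
    // per-engine modifiers be applied to each of them independently.
    struct IO_AND_CPU_COST
    {
      double io;   // block read cost, in ms
      double cpu;  // CPU cost (row copies, comparisons), in ms
    };

    // Hypothetical helper in the spirit of a ha_...time() wrapper: combine
    // the two parts with engine-specific factors.
    static double apply_cost_factors(IO_AND_CPU_COST c, double io_factor,
                                     double cpu_factor)
    {
      return c.io * io_factor + c.cpu * cpu_factor;
    }

    int main()
    {
      IO_AND_CPU_COST scan = {4.0, 0.5};   // e.g. what a scan_time() could return
      std::printf("scan cost: %.3f ms\n", apply_cost_factors(scan, 1.0, 1.2));
      return 0;
    }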

0352e85... by Monty <email address hidden>

Fix cost calculation in test_if_cheaper_ordering() to be cost based

The original code was mostly rule based and preferred clustered or
covering indexes independently of cost.

There were a few test changes:
- Some tests changed from using filesort to an index or table scan. This
  happened when most of the rows had to be sorted and the ORDER BY could
  use a covering or clustered index (innodb_mysql, create_spatial_index).
- Some tests changed from range to filesort. This was mainly because the
  range was scanning most of the rows or using index scan + row lookup, and
  filesort with a table scan is cheaper (order_by).
- The change in join_cache was because sorting 2 rows is faster than
  retrieving 10 rows.
- In selectivity_innodb.test one test changed to use a cheaper index.
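
As a toy illustration (made-up numbers and simplified names, not the actual
optimizer code), the decision is now a plain cost comparison rather than a
preference rule:

    #include <cstdio>

    int main()
    {
      // Made-up estimates, in ms: table scan + filesort vs. index scan + lookups.
      double filesort_cost = 12.5;
      double ordered_index_cost = 48.0;

      // The old code preferred a covering or clustered index regardless of
      // cost; a cost-based test simply picks the cheaper plan.
      if (filesort_cost < ordered_index_cost)
        std::printf("resolve ORDER BY with filesort\n");
      else
        std::printf("resolve ORDER BY with an index\n");
      return 0;
    }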

b34b66a... by Vicențiu Ciorbaru

Implement cost_of_filesort()

The sort length is extracted similarly to how the sortlength() function
does it. The function makes use of the filesort_use_addons() function to
compute the length of addon fields. Finally, by calling compute_sort_costs()
we get the fastest_sort possible.

Other changes:
* Sort_param::using_addon_fields() assumes addon fields are already
  allocated. This makes Sort_param unusable for compute_sort_costs()
  *if* we don't want to allocate addon fields.

  As a preliminary fix, pass "with_addon_fields" as a bool value to
  compute_sort_costs() and make the internal functions use that value
  instead of the Sort_param::using_addon_fields() method (sketched below).

  The ideal fix would be to define a "leaner" struct with only the
  necessary members, but this can be done as a separate commit.
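
A simplified, standalone sketch of the interim fix; the real
compute_sort_costs() in filesort.cc looks different, and the struct name and
cost formula below are stand-ins:

    #include <cstddef>
    #include <cstdio>

    struct Sort_costs_sketch
    {
      double fastest_sort;
    };

    // The caller states explicitly whether addon fields will be used, instead
    // of the cost code asking Sort_param::using_addon_fields(), which only
    // works once addon fields have been allocated.
    static Sort_costs_sketch compute_sort_costs(size_t num_rows,
                                                size_t sort_length,
                                                size_t addon_length,
                                                bool with_addon_fields)
    {
      size_t row_length = sort_length + (with_addon_fields ? addon_length : 0);
      Sort_costs_sketch costs;
      costs.fastest_sort = 0.001 * num_rows * row_length;   // made-up cost model
      return costs;
    }

    int main()
    {
      Sort_costs_sketch c = compute_sort_costs(1000, 16, 64, true);
      std::printf("fastest sort: %.3f\n", c.fastest_sort);
      return 0;
    }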

Reviewer: Monty

0890c48... by Vicențiu Ciorbaru

Refactor Sort_param::init_for_filesort

No logic changes.
Extract some of init_for_filesort logic into a separate function:
* Sort_param::setup_lengths_and_limit can be used to fill in the various
  xxx_length members of Sort_param, without having to allocate any of the
  other buffers.
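
A sketch of the shape of the split, using a heavily reduced stand-in for
Sort_param (the real class has many more members and init_for_filesort()
does much more):

    #include <cstddef>

    struct Sort_param_sketch
    {
      size_t sort_length;    // length of the sort key
      size_t addon_length;   // length of packed addon fields
      size_t limit_rows;     // LIMIT, if any

      // Fills in the xxx_length members and the limit without allocating
      // any buffers, so cost code can call it on its own.
      void setup_lengths_and_limit(size_t sort_len, size_t addon_len,
                                   size_t limit)
      {
        sort_length = sort_len;
        addon_length = addon_len;
        limit_rows = limit;
      }

      // init_for_filesort() keeps its old behaviour, but now starts by
      // calling setup_lengths_and_limit() and only then allocates buffers
      // (allocation omitted here).
      void init_for_filesort(size_t sort_len, size_t addon_len, size_t limit)
      {
        setup_lengths_and_limit(sort_len, addon_len, limit);
      }
    };

    int main()
    {
      Sort_param_sketch param;
      param.setup_lengths_and_limit(16, 64, 100);   // lengths only, no buffers
      return 0;
    }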

Reviewer: Monty

e50c7d2... by Vicențiu Ciorbaru

Rewrite cost computation for filesort operations

This is a rework of how filesort calculates costs so that functions
like test_if_skip_sort_order() can calculate the cost of filesort when
deciding between filesort and using a key to resolve ORDER BY.

Changes:
- Split cost calculation of qsort + optional merge sort and priority queue
  to dedicated functions.
- Fixed some wrong cost calculations in the old code (use of log() instead
  of log2()).
- Added costs related to fetching the rows if addon fields are not used.
- Updated get_merge_cost() to take into account that we are going to
  read data from temporary files in big chunks (DISK_CHUNK_SIZE, 64K) and
  not in IO_SIZE (4K); see the sketch below.
- More code documentation including various variables in Sort_param.

One effect of the cost update is that the cost of the priority queue
with addon fields has decreased slightly, so it is used in more cases.
When the rowid is large (like with InnoDB, where the rowid is the primary
key), using addon fields is in many cases preferable.
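
A standalone sketch of the cost shape described above; the constants and
formulas below are illustrative assumptions, not the server's actual cost
model:

    #include <cmath>
    #include <cstdio>

    static const double ROW_COMPARE_COST = 0.001;  // made-up per-compare cost, ms
    static const double DISK_READ_COST   = 0.002;  // made-up cost per read request, ms
    static const double DISK_CHUNK_KB    = 64.0;   // temp files now read in 64K chunks
    static const double IO_SIZE_KB       = 4.0;    // old assumption: 4K reads

    // In-memory sort part: n * log2(n) comparisons (the old code used log()).
    static double qsort_cost(double num_rows)
    {
      return num_rows * std::log2(num_rows) * ROW_COMPARE_COST;
    }

    // Merge pass IO: charged per read request, so fewer, larger reads are cheaper.
    static double merge_io_cost(double total_kb, double chunk_kb)
    {
      return std::ceil(total_kb / chunk_kb) * DISK_READ_COST;
    }

    int main()
    {
      std::printf("qsort, 100000 rows:      %.3f ms\n", qsort_cost(100000));
      std::printf("merge IO, 64K chunks:    %.3f ms\n",
                  merge_io_cost(20000, DISK_CHUNK_KB));
      std::printf("merge IO, old 4K reads:  %.3f ms\n",
                  merge_io_cost(20000, IO_SIZE_KB));
      return 0;
    }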

Reviewer: Monty

6040afb... by Vicențiu Ciorbaru

cleanup: Don't pass THD to get_merge_many_buff_cost_fast

We can pass the cost directly.

Reviewer: Monty

41ebf2e... by Vicențiu Ciorbaru

cleanup: Make tempfile creation uniform with DISK_CHUNK_SIZE

Replace READ_RECORD_SIZE and DISK_BUFFER_SIZE with DISK_CHUNK_SIZE (a
rename of the latter), used across all open_cached_file calls.

Reviewer: Monty

62c77b2... by Vicențiu Ciorbaru

cleanup: Rename Sort_param::max_rows to limit_rows

This makes the code easier to read as the intent of the parameter is
clearer.

Reviewer: Monty

bddedae... by Monty <email address hidden>

Added checking of arguments to COST_ADD and COST_MULT

These functions don't work with negative values and should never be
called with them. Added an assert to ensure this does not happen.
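
A standalone sketch of the check; the server's real COST_ADD/COST_MULT
definitions differ, the point here is only the non-negative assertion on
both arguments:

    #include <cassert>
    #include <cstdio>

    // Stand-in macros: assert that neither argument is negative, then combine.
    #define COST_ADD(c, d)  (assert((c) >= 0 && (d) >= 0), (c) + (d))
    #define COST_MULT(c, d) (assert((c) >= 0 && (d) >= 0), (c) * (d))

    int main()
    {
      double cost = COST_ADD(1.5, COST_MULT(2.0, 0.25));
      std::printf("%.2f\n", cost);   // 2.00
      return 0;
    }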

f486362... by Monty <email address hidden>

Adjust cost for re-creating a row from the JOIN CACHE

Creating a record from the join cache is faster than getting a row from
the engine (less and simpler code to execute).

Added JOIN_CACHE_ROW_COPY_COST_FACTOR (0.5 for now) as the factor to
take this into account. This is multiplied by ROW_COPY_COST (see the
sketch below).

Other things:
- Added cost of copying rows to hash join, similar to join_cache joins.
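
A tiny worked example of how the factor is applied; the constant names are
taken from the commit text, but the numeric value of ROW_COPY_COST below is
made up:

    #include <cstdio>

    static const double ROW_COPY_COST = 0.060;                 // made-up per-row copy cost
    static const double JOIN_CACHE_ROW_COPY_COST_FACTOR = 0.5; // from this commit

    int main()
    {
      // Re-creating a row from the join cache is charged at half the normal
      // row copy cost, since less and simpler code runs than in the engine.
      double join_cache_row_copy_cost =
        ROW_COPY_COST * JOIN_CACHE_ROW_COPY_COST_FACTOR;
      std::printf("row copy from join cache: %.3f\n", join_cache_row_copy_cost);
      return 0;
    }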