TEMPORARY PUSH: Change all cost calculations to be given in ms
- Added tests/check_costs.pl, a tool to verify optimizer cost calculations.
- Most costs have been found with this program. All steps used to calculate
the new costs are documented in Docs/optimizer.costs
- User optimizer_cost variables are given in usec (as individual
costs can be very small). Internally they are stored in ms.
- Changed DISK_READ_COST (was DISK_SEEK_BASE_COST) from a hard disk cost
(9 ms) to a common SSD cost (400MB/sec).
- Changed the following handler functions to return IO_AND_CPU_COST.
This makes it easy to apply different cost modifiers in ha_..time()
functions for io and cpu costs.
- scan_time()
- rndpos_time()
- keyread_time()
- Enhanced keyread_time() to calculate the full cost of reading a set
of keys with a given number of ranges and an optional number of blocks that
need to be accessed.
- Removed read_time() as keyread_time() + rndpos_time() is the same thing.
- Added the following new optimizer_variables:
- optimizer_scan_lookup_cost
- optimizer_row_lookup_cost
- optimizer_index_lookup_cost
- optimizer_disk_read_cost
- Added include/my_tracker.h, a useful include file for quickly testing the
cost of a function.
- Tuned sequence and heap engine costs (rest will be done in an updated
commit)
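The unit handling and the io/cpu split listed above can be sketched as follows. This is an illustration only, not code from this commit: IoAndCpuCost, usec_to_ms() and disk_read_cost_ms() are hypothetical names, and the 4K block size and decimal megabytes (10^6 bytes) are assumptions.

```python
from dataclasses import dataclass

IO_SIZE = 4096                   # assumed block size in bytes (4K)
SSD_BYTES_PER_SEC = 400 * 10**6  # 400MB/sec, assuming decimal megabytes


def usec_to_ms(user_value_usec: float) -> float:
    """Convert a user-visible cost (usec) to the internal unit (ms)."""
    return user_value_usec / 1000.0


def disk_read_cost_ms(block_size: int = IO_SIZE) -> float:
    """Time in ms to read one block at the assumed SSD throughput."""
    return block_size / SSD_BYTES_PER_SEC * 1000.0


@dataclass
class IoAndCpuCost:
    """Hypothetical stand-in for IO_AND_CPU_COST: io and cpu are kept apart."""
    io: float = 0.0
    cpu: float = 0.0

    def __add__(self, other: "IoAndCpuCost") -> "IoAndCpuCost":
        # Costs from two steps (e.g. key read + row lookup) add per component.
        return IoAndCpuCost(self.io + other.io, self.cpu + other.cpu)

    def total(self, io_modifier: float = 1.0, cpu_modifier: float = 1.0) -> float:
        """Apply a separate modifier to each component, then sum them."""
        return self.io * io_modifier + self.cpu * cpu_modifier
```

With such a split, the removed read_time() corresponds to adding a key read cost and a row lookup cost and only then applying the io and cpu modifiers to the sum.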
Fix cost calculation in test_if_cheaper_ordering() to be cost based
The original code was mostly rule based and preferred clustered or
covering indexes independent of cost.
There were a few test changes:
- Some tests changed from using filesort to index or table scan. This
happened when most of the rows had to be sorted and the ORDER BY could
use a covering or clustered index (innodb_mysql, create_spatial_index).
- Some tests changed from range to filesort. This was mainly because the
range was scanning most of the rows or using index scan + row lookup, and
filesort with table scan is cheaper (order_by).
- Change in join_cache was because sorting 2 rows is faster than retrieving
10 rows.
- In selectivity_innodb.test one test changed to use a cheaper index.
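The difference between the rule based and the cost based choice can be sketched as below. This is a simplified illustration, not the logic of test_if_cheaper_ordering(): the Plan type, both functions and all cost numbers are hypothetical.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    name: str
    read_cost_ms: float          # cost of fetching the rows
    sort_cost_ms: float          # 0.0 if rows already arrive in ORDER BY order
    covering_or_clustered: bool  # access through a covering or clustered index


def rule_based_choice(plans: List[Plan]) -> Plan:
    """Simplified old behaviour: take a covering/clustered index if one exists."""
    for plan in plans:
        if plan.covering_or_clustered:
            return plan
    return plans[0]


def cost_based_choice(plans: List[Plan]) -> Plan:
    """New behaviour: take the plan with the lowest total cost."""
    return min(plans, key=lambda p: p.read_cost_ms + p.sort_cost_ms)


# Made-up numbers: reading few rows and sorting them beats a long index read.
plans = [
    Plan("covering index", read_cost_ms=10.0, sort_cost_ms=0.0,
         covering_or_clustered=True),
    Plan("table scan + filesort", read_cost_ms=2.0, sort_cost_ms=1.0,
         covering_or_clustered=False),
]
```

With these made-up numbers the rule based choice keeps the covering index, while the cost based choice switches to table scan + filesort, which is the kind of plan change seen in the tests.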
The sort length is extracted similarly to how the sortlength() function does
it. The function makes use of the filesort_use_addons function to compute
the length of addon fields. Finally, by calling compute_sort_costs we
get the fastest_sort possible.
Other changes:
* Sort_param::using_addon_fields() assumes addon fields are already
allocated. This makes Sort_param unusable for
compute_sort_costs *if* we don't want to allocate addon fields.
As a preliminary fix, pass "with_addon_fields" as bool value to
compute_sort_costs() and make the internal functions use that value
instead of Sort_param::using_addon_fields() method.
The ideal fix would be to define a "leaner" struct with only the
necessary members, but this can be done as a separate commit.
No logic changes.
Extract some of init_for_filesort logic into a separate function:
* Sort_param::setup_lengths_and_limit can be used to fill in the various
xxx_length members of Sort_param, without having to allocate any of the
other buffers.
This is a rework of how filesort calculates costs to allow functions
like test_if_skip_sort_order() to calculate the cost of filesort to
decide between filesort and using a key to resolve ORDER BY.
Changes:
- Split cost calculation of qsort + optional merge sort and priority queue
to dedicated functions.
- Fixed some wrong calculations of cost in old code (use of log() instead
of log2()).
- Added costs related to fetching the rows if addon fields are not used.
- Updated get_merge_cost() to take into account that we are going to
read data from temporary files in big chunks (DISK_CHUNCK_SIZE, 64K) and
not in IO_SIZE (4K) blocks.
- More code documentation including various variables in Sort_param.
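Two of the points above, log() versus log2() and the larger read size, can be illustrated with a small sketch. It does not reproduce the formulas used in the filesort cost functions or in get_merge_cost(); the function names are hypothetical and n * log2(n) is only the usual approximation for comparison sorts.

```python
import math

IO_SIZE = 4 * 1024      # 4K, as stated above
DISK_CHUNK = 64 * 1024  # 64K, the chunk size stated above


def sort_compares(rows: int) -> float:
    """Approximate key comparisons for sorting `rows` rows: n * log2(n)."""
    return rows * math.log2(rows) if rows > 1 else 0.0


def sort_compares_natural_log(rows: int) -> float:
    """Same estimate with log() instead of log2(); too low by a factor ln(2)."""
    return rows * math.log(rows) if rows > 1 else 0.0


def reads_needed(file_bytes: int, read_size: int) -> int:
    """Number of reads needed to scan a temporary file, rounded up."""
    return (file_bytes + read_size - 1) // read_size
```

For 1024 rows the log2 estimate is 10240 comparisons, while the natural log variant gives only about 69% of that. Scanning a 1 MiB temporary file takes 256 reads of IO_SIZE but only 16 reads of 64K.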
One effect of the cost update is that the cost of priority queue
with addon fields has decreased slightly and is used in more cases.
When the rowid is large (like with InnoDB where rowid is the primary key),
using addon fields is in many cases preferable.