MDEV-26247 [wip] Make spider_create_gbh go through append_join too
After this, we can get rid of append_from_and_tables().
There are two tricky issues remaining.
1. By the time the query reaches the spider GBH, the optimizer may
have modified it so that it is no longer valid. See the test
tmppp.test
2. Somehow this change also causes spider/bg.ha and spider.ha to
fail: failing links are no longer detected and their link status is no
longer updated to NG, so spider keeps using them even though they are
unavailable.
MDEV-26247 [PoC] Re-implement spider gbh query rewrite of tables
Spider GBH's query rewrite of table joins is overly complex and
error-prone. In this PoC commit, we explore replacing it with
something closer to what dbug_print() (more specifically,
print_join()) does, but tailored to spider. It seems to work well
on the two examples I have tested:
select * from t3 left join t1 on t3.a = t1.a left join t2 on t3.a = t2.a;
select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a;
Note that we have not removed the old functions yet, and the new code
does not support const tables or (presumably) eliminated tables.
However, the second example still fails, due to issues in *item*
printing. Unfortunately, these issues already exist in the spider GBH
without the change in this commit (i.e. this is not a regression), so
we will have to fix them too, perhaps as a separate task. See below
for the failure.
--8<---------------cut here---------------start------------->8---
select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a
select t0.`a` `a`,t1.`a` `a`,t2.`a` `a` from `auto_test_remote`.`t3` t2 left join (`auto_test_remote`.`t1` t0 left join `auto_test_remote`.`t2` t1 on (t1.`a` = t2.`a`)) on (t0.`a` = t2.`a`) where 1
select t0.`a` `a`,t1.`a` `a`,t2.`a` `a` from `auto_test_remote`.`t3` t2 left join ( left join `auto_test_remote`.`t2` t1 on (t1.`a` = t2.`a`) join `auto_test_remote`.`t1` t0) on (t0.`a` = t2.`a`) where 1
--8<---------------cut here---------------end--------------->8---
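The print_join()-style approach mentioned above can be sketched as
follows. This is a simplified standalone illustration, not the actual
server code: JoinNode and its members are hypothetical stand-ins for
the server's TABLE_LIST/NESTED_JOIN structures, but the shape of the
recursion (print each join operand, parenthesizing nested joins) is
the idea being borrowed.

```cpp
// Sketch (hypothetical names, not the actual MariaDB API): print a join
// tree the way print_join() does -- walk the list of join operands,
// wrapping nested joins in parentheses and appending ON conditions.
#include <memory>
#include <string>
#include <vector>

struct JoinNode {
  std::string table;                               // leaf: table alias
  std::vector<std::unique_ptr<JoinNode>> children; // nested join operands
  std::string on_cond;     // ON condition; empty for the first operand
  bool left_outer = false; // LEFT JOIN vs inner JOIN
};

static std::string print_join(const JoinNode &n) {
  if (n.children.empty())
    return n.table;
  std::string out;
  for (size_t i = 0; i < n.children.size(); i++) {
    const JoinNode &c = *n.children[i];
    // A nested join gets its own parentheses, preserving nesting.
    std::string piece =
        c.children.empty() ? print_join(c) : "(" + print_join(c) + ")";
    if (i == 0)
      out = piece;
    else
      out += (c.left_outer ? " left join " : " join ") + piece +
             (c.on_cond.empty() ? "" : " on (" + c.on_cond + ")");
  }
  return out;
}
```

For the second example above, after the optimizer converts the RIGHT
JOIN, the tree is t3 joined with the nested (t1 left join t2) operand,
and the recursion reproduces the correctly-nested form.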
MDEV-26247 [wip] Spider gbh query rewrite should get table for fields in a simple way
Add a method spider_fields::find_table that searches its table holders
to find the table for a given field. This will let us get rid of the
first pass during GBH creation, where field_chains and field_holders
are created.
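The intended lookup can be sketched as below. The struct layouts and
names here are hypothetical simplifications of the spider_fields
table-holder machinery; the point is that a linear scan keyed on the
field's table pointer replaces the precomputed field_chains/
field_holders lists.

```cpp
// Sketch (hypothetical, not the actual spider_fields API): resolve the
// table holder for a field by scanning the holder array.
#include <cstddef>

struct TABLE {};                // stand-in for the server's TABLE
struct Field { TABLE *table; }; // a field knows which table owns it

struct table_holder {
  TABLE *table;
  int idx; // e.g. the table's position in the query
};

// Linear scan; returns nullptr if the field's table is not one of the
// tables handled by this GBH.
static table_holder *find_table(table_holder *holders, size_t n,
                                Field *field) {
  for (size_t i = 0; i < n; i++)
    if (holders[i].table == field->table)
      return &holders[i];
  return nullptr;
}
```

Since the number of tables in a join is small, the linear scan is
cheap, and it avoids keeping a second data structure in sync.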
MDEV-29502 Fix some issues with spider direct aggregate
The direct aggregate (DA) mechanism seems to be intended only for the
case where, without it, a full table scan query would be executed from
the spider node and the aggregation also done at the spider node.
Typically this happens in sub_select(). In the test
spider.direct_aggregate_part, direct aggregate sends COUNT statements
directly to the data nodes and adds up the results at the spider node,
instead of iterating over the rows one by one there.
By contrast, the group by handler (GBH) typically sends aggregated
queries directly to the data nodes, in which case DA does not improve
anything.
That is why we fix the bug by disabling DA when a GBH is used.
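What DA buys in the COUNT case can be sketched in a few lines: each
data node returns a partial count and the spider node merely sums
them. This is an illustrative stand-in, not the spider code path
(which goes through direct_add()/add() on the Item_sum objects).

```cpp
// Sketch: merging partial COUNT results from data nodes at the spider
// node, instead of fetching and counting every row.
#include <cstdint>
#include <vector>

static uint64_t direct_count(const std::vector<uint64_t> &partial_counts) {
  uint64_t total = 0;
  for (uint64_t c : partial_counts)
    total += c; // corresponds to accumulating into the COUNT item
  return total;
}
```

When the GBH is active, the data nodes already receive the aggregated
query, so this merging step has nothing left to do.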
There are other reasons supporting this change. First, the creation of
a GBH results in a call to change_to_use_tmp_fields() (as opposed to
setup_copy_fields()), which causes the spider DA function
spider_db_fetch_for_item_sum_funcs() to work on the wrong items.
Second, the spider DA function only calls direct_add() on the items;
the follow-up add() needs to be called by the SQL layer code. In
do_select(), after executing the query with the GBH, the required
add() would not necessarily be called.
Disabling DA when GBH is used does fix the bug. There are a few
other things included in this commit to improve the situation with
spider DA:
1. Add a session variable that allows the user to disable DA
completely, which will help as a temporary measure if/when further
bugs with DA emerge.
2. Move the increment of direct_aggregate_count into the spider DA
function. Currently this is done in rather bizarre and random
locations.
3. Fix the spider_db_mbase_row creation so that the last of its row
fields (the sentinel) is NULL. The code already does a null check, but
the sentinel field was at an invalid address, causing segfaults. With
a correct implementation of the row creation, we avoid these
segfaults.
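The sentinel fix amounts to the following pattern (simplified; the
real spider_db_mbase_row layout and allocation differ): reserve one
extra slot and set it to NULL so that the existing null check
terminates on a valid address instead of reading past the array.

```cpp
// Sketch: build a NULL-terminated copy of a row of field pointers, so
// that consumers looping until a NULL sentinel stay in bounds.
#include <cstring>

static const char **make_row(const char **src, size_t num_fields) {
  const char **row = new const char *[num_fields + 1]; // +1 for sentinel
  memcpy(row, src, num_fields * sizeof(*src));
  row[num_fields] = nullptr; // the sentinel the null check relies on
  return row;
}
```

Without the extra slot, row[num_fields] is whatever happens to follow
the allocation, which is exactly the invalid-address segfault
described above.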
MDEV-31117 clean up spider connection info parsing
A Spider connection string is a comma-separated list of parameter
definitions, where each definition is of the form
"<param_title> <param_value>", with <param_value> delimited by quotes
on both ends and backslashes acting as an escape prefix.
The code, however, treated the param title the same way as the param
value when assigning, and had nonsensical fields like delim_title_len
and delim_title. We remove these.
We also clean up the spider comment connection string parsing,
including:
- Factoring out some code from the parsing function
- Rewriting the struct `st_spider_param_string_parse`, including
replacing its messy methods with cleaner ones
- Making any further changes necessitated by the above
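For reference, the grammar described above (comma-separated
"<param_title> <param_value>" pairs, quoted values, backslash escapes)
can be parsed as sketched below. This is a hypothetical standalone
parser, not the cleaned-up st_spider_param_string_parse, but it shows
why the title needs no quote/escape handling while the value does.

```cpp
// Sketch: parse a Spider-style connection string into (title, value)
// pairs. Titles are bare words; values are quote-delimited with
// backslash escapes.
#include <string>
#include <utility>
#include <vector>

static std::vector<std::pair<std::string, std::string>>
parse_conn_string(const std::string &s) {
  std::vector<std::pair<std::string, std::string>> params;
  size_t i = 0, n = s.size();
  while (i < n) {
    while (i < n && (s[i] == ' ' || s[i] == ',')) i++; // skip separators
    size_t start = i;
    while (i < n && s[i] != ' ') i++;                  // title: bare word
    std::string title = s.substr(start, i - start);
    while (i < n && s[i] == ' ') i++;
    if (i >= n || s[i] != '"') break;                  // value must be quoted
    i++;                                               // opening quote
    std::string value;
    while (i < n && s[i] != '"') {
      if (s[i] == '\\' && i + 1 < n) i++;              // escape prefix
      value += s[i++];
    }
    i++;                                               // closing quote
    params.emplace_back(title, value);
  }
  return params;
}
```

Note the asymmetry: only the value scanner deals with delimiters and
escapes, which is why fields like delim_title_len never made sense.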