maria:bb-11.0-ycp-mdev-26247-unsquashed

Last commit made on 2023-09-26
Get this branch:
git clone -b bb-11.0-ycp-mdev-26247-unsquashed https://git.launchpad.net/maria

Branch information

Name:
bb-11.0-ycp-mdev-26247-unsquashed
Repository:
lp:maria

Recent commits

7e8c56a... by Yuchen Pei <email address hidden>

MDEV-26247 [wip] Make spider_create_gbh go through append_join too

After this, we can get rid of append_from_and_tables().

There are two tricky issues remaining.

1. By the time the query reaches the spider gbh, the optimizer may
have modified the query so that it is no longer valid. See the test
tmppp.test.

2. For reasons not yet understood, this change also causes spider/bg.ha
and spider.ha to fail: failing links are no longer detected and marked
with link status NG, so spider keeps using them even though they are
unavailable.

139967b... by Yuchen Pei <email address hidden>

MDEV-26247 [PoC] Re-implement spider gbh query rewrite of tables

Spider GBH's query rewrite of table joins is overly complex and
error-prone. In this PoC commit, we explore replacing it with
something closer to what dbug_print() (more specifically,
print_join()) does, but catered to spider. It seems to be working
well, based on two examples I have tested:

select * from t3 left join t1 on t3.a = t1.a left join t2 on t3.a = t2.a;
select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a;

Note that we have not removed the old functions yet, and the new code
does not support const tables or (presumably) eliminated tables.

However, the second example fails due to issues in *item* printing.
These issues already exist in the spider GBH without the change in
this commit (i.e. this is not a regression), so we have to fix them
too, perhaps as a separate task. The failure is shown below:

--8<---------------cut here---------------start------------->8---
select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a

select t0.`a` `a`,t1.`a` `a`,t2.`a` `a` from `auto_test_remote`.`t3` t2 left join (`auto_test_remote`.`t1` t0 left join `auto_test_remote`.`t2` t1 on (t1.`a` = t2.`a`)) on (t0.`a` = t2.`a`) where 1

select t0.`a` `a`,t1.`a` `a`,t2.`a` `a` from `auto_test_remote`.`t3` t2 left join ( left join `auto_test_remote`.`t2` t1 on (t1.`a` = t2.`a`) join `auto_test_remote`.`t1` t0) on (t0.`a` = t2.`a`) where 1
--8<---------------cut here---------------end--------------->8---

c60ac63... by Yuchen Pei <email address hidden>

MDEV-26247 [wip] Spider gbh should send correct queries involving const tables

- if it's not an outer join, simply skip the const table
- if it is an outer join, print (select 1) alias

Passes the included test, but fails the regression test
direct_right_left_join_nullable.
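
The two rules above can be sketched roughly as follows. This is only an
illustration: TableRef, its members and print_table() are hypothetical
names invented for the sketch, not Spider's actual data structures.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one table in the join being rewritten.
struct TableRef {
  std::string remote_name;  // e.g. "`auto_test_remote`.`t1`"
  std::string alias;        // e.g. "t0"
  bool is_const;            // treated as a const table by the optimizer
  bool in_outer_join;       // takes part in an outer join
};

// Returns the text to emit for this table, or an empty string when the
// table should be skipped in the rewritten query.
std::string print_table(const TableRef &t) {
  if (!t.is_const) return t.remote_name + " " + t.alias;
  // Const table in an outer join: keep a placeholder so the alias exists.
  if (t.in_outer_join) return "(select 1) " + t.alias;
  // Const table outside an outer join: simply skip it.
  return "";
}
```

The placeholder keeps the alias in the join so that the outer join
still has something to attach to.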

e0e0135... by Yuchen Pei <email address hidden>

MDEV-26247 clean up spider_group_by_handler::init_scan()

14eed9f... by Yuchen Pei <email address hidden>

MDEV-26247 [wip] Spider gbh query rewrite should get table for fields in a simple way

Add a method spider_fields::find_table that searches its table holders
to find the table for a given field. This way we will be able to get
rid of the first pass during the gbh creation, where field_chains and
field_holders are created.
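
A minimal sketch of such a lookup, assuming each field carries a
pointer to its table. Table, Field and TableHolder here are simplified,
hypothetical stand-ins, and the free function find_table() only mirrors
the idea of the method, not its real signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the server and spider types.
struct Table { std::string name; };
struct Field { std::string name; const Table *table; };
struct TableHolder { const Table *table; std::string alias; };

// Linear search over the table holders for the one owning the field's
// table. Returns nullptr when no holder matches.
const TableHolder *find_table(const std::vector<TableHolder> &holders,
                              const Field &field) {
  for (const TableHolder &h : holders)
    if (h.table == field.table) return &h;
  return nullptr;
}
```

Because the lookup is done on demand, no per-field bookkeeping has to
be built beforehand.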

6cf66aa... by Yuchen Pei <email address hidden>

MDEV-26247 Remove two unused methods of spider_fields

There are probably more of these conn_holder-related methods that can
be removed.

4e374f3... by Yuchen Pei <email address hidden>

MDEV-29502 Fix some issues with spider direct aggregate

The direct aggregate (DA) mechanism seems to be intended to work only
when a full table scan query would otherwise be executed from the
spider node, with the aggregation also done at the spider node.
Typically this happens in sub_select(). In the test
spider.direct_aggregate_part, direct aggregate allows COUNT statements
to be sent directly to the data nodes, with the results added up at
the spider node, instead of iterating over the rows one by one at the
spider node.
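
The "add up the results" step can be pictured as below. This is a toy
sketch: combine_counts() is a hypothetical name, and each element of
partial_counts is assumed to be the COUNT result returned by one data
node.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Each element is the COUNT returned by one data node; the spider node
// only has to sum these partial results instead of fetching the rows.
std::uint64_t combine_counts(const std::vector<std::uint64_t> &partial_counts) {
  return std::accumulate(partial_counts.begin(), partial_counts.end(),
                         std::uint64_t{0});
}
```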

By contrast, the group by handler (GBH) typically sends aggregated
queries directly to data nodes, in which case DA brings no
improvement.

That is why we fix the bug by disabling DA when GBH is used.

There are other reasons supporting this change. First, the creation of
GBH results in a call to change_to_use_tmp_fields() (as opposed to
setup_copy_fields()) which causes the spider DA function
spider_db_fetch_for_item_sum_funcs() to work on the wrong items. Second,
the spider DA function only calls direct_add() on the items, and the
follow-up add() needs to be called by the sql layer code. In
do_select(), after executing the query with the GBH, it seems that the
required add() would not necessarily be called.

Disabling DA when GBH is used does fix the bug. There are a few
other things included in this commit to improve the situation with
spider DA:

1. Add a session variable that allows the user to disable DA
completely. This will help as a temporary measure if/when further bugs
with DA emerge.

2. Move the increment of direct_aggregate_count to the spider DA
function. Currently this is done in rather bizarre and random
locations.

3. Fix the spider_db_mbase_row creation so that its last row field
(the sentinel) is NULL. The code already does a null check, but the
sentinel field somehow ends up at an invalid address, causing
segfaults. With a correct implementation of the row creation, we can
avoid such segfaults.
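
The sentinel idea in item 3 can be illustrated as follows. This is a
sketch under assumptions: Row, make_row() and count_fields() are
hypothetical, and for simplicity every real field is assumed to be a
non-NULL pointer, which is not true of SQL NULL values in a real row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fetched row: C-string field pointers
// followed by one extra NULL slot acting as the sentinel.
struct Row {
  std::vector<const char *> fields;  // size is field_count + 1
};

// Builds the row so that the slot after the last field is a valid,
// explicitly NULL sentinel rather than whatever follows in memory.
Row make_row(const char *const *src, std::size_t field_count) {
  Row row;
  row.fields.assign(src, src + field_count);
  row.fields.push_back(nullptr);  // the sentinel
  return row;
}

// Walks the fields until the sentinel is reached. This is only safe
// because make_row() guarantees the sentinel slot exists.
std::size_t count_fields(const Row &row) {
  assert(!row.fields.empty() && row.fields.back() == nullptr);
  std::size_t n = 0;
  for (const char *const *p = row.fields.data(); *p != nullptr; ++p) ++n;
  return n;
}
```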

bb7646e... by Yuchen Pei <email address hidden>

MDEV-29502 Remove spider_db_handler::need_lock_before_set_sql_for_exec

This function trivially returns false.

3b5cecb... by Yuchen Pei <email address hidden>

MDEV-29502 Clean up spider_db_seek_next() a bit

Also move spider_conn_before_query() and spider_conn_after_query() so
that they can be used in more places, and clean up other functions in
the same file (spd_db_conn.cc).

6dfe8d1... by Yuchen Pei <email address hidden>

MDEV-31117 clean up spider connection info parsing

A Spider connection string is a comma-separated list of parameter
definitions, where each definition is of the form
"<param_title> <param_value>", and <param_value> is quote-delimited on
both ends, with backslashes acting as an escaping prefix.
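
To make the format concrete, here is a small standalone parser sketch.
It is not Spider's parser: parse_conn_string() and ParamList are
hypothetical, and it assumes that either a single or a double quote may
delimit a value (the closing quote must match the opening one) and that
a backslash simply makes the next character literal.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using ParamList = std::vector<std::pair<std::string, std::string>>;

// Parses `title "value", title "value"`. Returns std::nullopt on
// malformed input (missing title, unquoted or unterminated value).
std::optional<ParamList> parse_conn_string(const std::string &s) {
  ParamList out;
  std::size_t i = 0;
  const std::size_t n = s.size();
  auto is_space = [&](std::size_t k) {
    return std::isspace(static_cast<unsigned char>(s[k])) != 0;
  };
  auto skip_ws = [&] { while (i < n && is_space(i)) ++i; };

  skip_ws();
  while (i < n) {
    // Title: everything up to whitespace, a quote or a comma.
    const std::size_t start = i;
    while (i < n && !is_space(i) && s[i] != '"' && s[i] != '\'' && s[i] != ',')
      ++i;
    if (i == start) return std::nullopt;
    std::string title = s.substr(start, i - start);
    skip_ws();

    // Value: must start with a quote and end with the same quote.
    if (i >= n || (s[i] != '"' && s[i] != '\'')) return std::nullopt;
    const char quote = s[i++];
    std::string value;
    bool closed = false;
    while (i < n) {
      const char c = s[i++];
      if (c == '\\') {
        if (i >= n) return std::nullopt;  // dangling backslash
        value.push_back(s[i++]);          // escaped char taken literally
      } else if (c == quote) {
        closed = true;
        break;
      } else {
        value.push_back(c);
      }
    }
    if (!closed) return std::nullopt;
    out.emplace_back(std::move(title), std::move(value));

    // Definitions are separated by commas.
    skip_ws();
    if (i == n) break;
    if (s[i] != ',') return std::nullopt;
    ++i;
    skip_ws();
  }
  return out;
}
```

Note that only the value is quoted and unescaped here; the title is
read as a bare token.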

The code, however, treated the param title the same way as the param
value when assigning, and had nonsensical fields like delim_title_len
and delim_title. We remove these.

We also clean up the spider comment connection string parsing,
including:

- Factoring out some code from the parsing function
- Rewriting the struct `st_spider_param_string_parse`, including
  replacing its messy methods with cleaner ones
- Making any changes required by the above