MDEV-28881Server crashes in Dep_analysis_context::create_table_value/check_func_dependency
Closed
MDEV-30007SIGSEGV in st_select_lex_unit::is_derived_eliminated, runtime error: member access within null pointer of type 'struct TABLE' in st_select_lex_unit::is_derived_eliminated()
Closed
relates to
MDEV-27201Non-merged derived table: do not compute unused fields
In Review
MDEV-28818Eliminate derived tables having DISTINCT and/or UNION
Open
MDEV-28869Eliminated tables are not shown in EXPLAIN [FORMAT=JSON] output
In Review
MDEV-28817Derived table elimination does not work for multiple occurencies of a field
Related to MDEV-27201. MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.
Sergei Petrunia
added a comment - Related to MDEV-27201 . MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.
The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201.
The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key".
If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY:
explain
select t1.*
from
t1 leftjoin v2b on v2b.PRIMARY_KEY=t1.a;
then it will be able to eliminate v2b entirely.
Sergei Petrunia
added a comment -
The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201 .
Another goal which wasn't explicitly mentioned so far: can we eliminate the whole derived table in this case, like it is shown here:
https://jira.mariadb.org/browse/MDEV-26278?focusedCommentId=195684&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195684
The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key".
If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY:
explain
select t1.*
from
t1 left join v2b on v2b.PRIMARY_KEY=t1.a;
then it will be able to eliminate v2b entirely.
Sergei Petrunia
added a comment - Question: could we borrow functional dependency check code from https://github.com/MariaDB/server/tree/10.5-mdev-11588 ?
Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary:
(opt_table_elimination.cc line 1594):
for (uint i=0; i < table->s->keys; i++)
{
KEY *key= table->key_info + i;
if (key->flags & HA_NOSAME)
{
Dep_module_key *key_dep;
if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts)))
return NULL;
*key_list= key_dep;
key_list= &(key_dep->next_table_key);
}
}
Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?
Oleg Smirnov
added a comment - - edited Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary:
(opt_table_elimination.cc line 1594):
for (uint i=0; i < table->s->keys; i++)
{
KEY *key= table->key_info + i;
if (key->flags & HA_NOSAME)
{
Dep_module_key *key_dep;
if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts)))
return NULL;
*key_list= key_dep;
key_list= &(key_dep->next_table_key);
}
}
Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?
Answering to my previous question to Sergey: there is a special code for generating keys for derived tables:
bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array)
And eventually this function calls:
if (table->add_tmp_key(table->s->keys, parts,
get_next_field_for_derived_key,
(uchar *) &first_keyuse,
FALSE)) // bool unique
I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().
Oleg Smirnov
added a comment - Answering to my previous question to Sergey: there is a special code for generating keys for derived tables:
bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array)
And eventually this function calls:
if (table->add_tmp_key(table->s->keys, parts,
get_next_field_for_derived_key,
(uchar *) &first_keyuse,
FALSE)) // bool unique
I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().
generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys.
Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need.
Consider this example:
createtable t20 (a int);
createtable t21 (b int);
insertinto t20 values (1),(2),(3);
insertinto t21 values (1),(2),(3);
createtable t22 (a int, b int, c int);
insertinto t22 select seq/100, seq/100, seq/100 from seq_1_to_10000;
explain
select * from
t20 join t21
leftjoin (select a, b, count(*) as CNT
from t22 groupby a, b) T
on T.a=t20.a and T.b=t21.b;
Debugging, I can see two keys created:
KEY(a)
KEY(b)
while the unique key would be the KEY(a,b) here.
Sergei Petrunia
added a comment - generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys.
Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need.
Consider this example:
create table t20 (a int );
create table t21 (b int );
insert into t20 values (1),(2),(3);
insert into t21 values (1),(2),(3);
create table t22 (a int , b int , c int );
insert into t22 select seq/100, seq/100, seq/100 from seq_1_to_10000;
explain
select * from
t20 join t21
left join ( select a, b, count (*) as CNT
from t22 group by a, b) T
on T.a=t20.a and T.b=t21.b;
Debugging, I can see two keys created:
KEY(a)
KEY(b)
while the unique key would be the KEY(a,b) here.
A piece in TABLE::add_tmp_key() which might be relevant to this MDEV:
if (key_parts == select_list_items)
{
if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) ||
derived->check_distinct_in_union())
keyinfo->rec_per_key[key_parts - 1]= 1;
}
They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.
Sergei Petrunia
added a comment - A piece in TABLE::add_tmp_key() which might be relevant to this MDEV:
if (key_parts == select_list_items)
{
if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) ||
derived->check_distinct_in_union())
keyinfo->rec_per_key[key_parts - 1]= 1;
}
They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.
Related to MDEV-27201. MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.