[MDEV-26278] Table elimination does not work across derived tables Created: 2021-07-29  Updated: 2023-03-21  Resolved: 2022-07-31

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.10.0

Type: Task Priority: Critical
Reporter: Sergei Petrunia Assignee: Oleg Smirnov
Resolution: Fixed Votes: 0
Labels: Preview_10.10, optimizer-easy, optimizer-feature

Issue Links:
Problem/Incident
causes MDEV-28881 Server crashes in Dep_analysis_contex... Closed
causes MDEV-30007 SIGSEGV in st_select_lex_unit::is_der... Closed
Relates
relates to MDEV-27201 Non-merged derived table: do not comp... In Review
relates to MDEV-28818 Eliminate derived tables having DISTI... Open
relates to MDEV-28869 Eliminated tables are not shown in EX... In Review
relates to MDEV-28817 Derived table elimination does not wo... In Review

 Description   

A dataset (not necessarily minimal):

create table t1 (a int);
insert into t1 select seq from seq_1_to_10;
 
create table t11 (
  a int not null,
  b int,
  key(a)
);
insert into t11 select A.seq, A.seq+B.seq 
from 
  seq_1_to_100 A,
  seq_1_to_1000 B;

create table t12 (
  pk int primary key,
  col1 int
);
insert into t12 select seq, seq from seq_1_to_100000;

A non-mergeable view where table t12 can be eliminated:

create view v2b as 
select 
  t11.a as a,
  count(*) as b
from
  t11 left join t12 on t12.pk=t11.b
group by
  t11.a;

EXPLAIN shows it it is indeed eliminated:

explain
select t1.*
from 
  t1 left join v2b on v2b.a=t1.a;

+------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+
| id   | select_type     | table      | type | possible_keys | key  | key_len | ref     | rows | Extra       |
+------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+
|    1 | PRIMARY         | t1         | ALL  | NULL          | NULL | NULL    | NULL    | 10   |             |
|    1 | PRIMARY         | <derived2> | ref  | key0          | key0 | 5       | j5.t1.a | 2    | Using where |
|    2 | LATERAL DERIVED | t11        | ref  | a             | a    | 4       | j5.t1.a | 1    |             |
+------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+

Now, let's add a column from t12 into the select list:

create view v2c as 
select 
  t11.a as a,
  max(t12.col1) as b
from
  t11 left join t12 on t12.pk=t11.b
group by
  t11.a;

and run a query that doesn't use it:

explain
select t1.* 
from 
  t1 left join v2c on v2c.a=t1.a;

EXPLAIN shows t12 was not eliminated:

+------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| id   | select_type     | table      | type   | possible_keys | key     | key_len | ref      | rows | Extra       |
+------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+
|    1 | PRIMARY         | t1         | ALL    | NULL          | NULL    | NULL    | NULL     | 10   |             |
|    1 | PRIMARY         | <derived2> | ref    | key0          | key0    | 5       | j5.t1.a  | 2    | Using where |
|    2 | LATERAL DERIVED | t11        | ref    | a             | a       | 4       | j5.t1.a  | 1    |             |
|    2 | LATERAL DERIVED | t12        | eq_ref | PRIMARY       | PRIMARY | 4       | j5.t11.b | 1    | Using where |
+------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+



 Comments   
Comment by Sergei Petrunia [ 2022-01-18 ]

Related to MDEV-27201. MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.

Comment by Sergei Petrunia [ 2022-01-18 ]

The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201.

Another goal which wasn't explicitly mentioned so far: can we eliminate the whole derived table in this case, like it is shown here:
https://jira.mariadb.org/browse/MDEV-26278?focusedCommentId=195684&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195684

The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key".

If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY:

explain
select t1.*
from 
  t1 left join v2b on v2b.PRIMARY_KEY=t1.a;

then it will be able to eliminate v2b entirely.

Comment by Sergei Petrunia [ 2022-01-18 ]

Question: could we borrow functional dependency check code from https://github.com/MariaDB/server/tree/10.5-mdev-11588 ?

Comment by Sergei Petrunia [ 2022-01-21 ]

... upon a closer examination, it looks like we could not.

Comment by Oleg Smirnov [ 2022-02-05 ]

Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary:

(opt_table_elimination.cc line 1594):
for (uint i=0; i < table->s->keys; i++)
{
KEY *key= table->key_info + i;
if (key->flags & HA_NOSAME)

{ Dep_module_key *key_dep; if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts))) return NULL; *key_list= key_dep; key_list= &(key_dep->next_table_key); }

}

Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?

Comment by Oleg Smirnov [ 2022-02-11 ]

Answering to my previous question to Sergey: there is a special code for generating keys for derived tables:
bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array)

And eventually this function calls:
if (table->add_tmp_key(table->s->keys, parts,
get_next_field_for_derived_key,
(uchar *) &first_keyuse,
FALSE)) // bool unique

I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().

Comment by Sergei Petrunia [ 2022-02-13 ]

generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys.

Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need.
Consider this example:

create table t20 (a int);
create table t21 (b int);
insert into t20 values (1),(2),(3);
insert into t21 values (1),(2),(3);
 
create table t22 (a int, b int, c int);
insert into t22 select seq/100, seq/100, seq/100 from seq_1_to_10000;
 
explain
select * from
  t20 join t21
  left join (select a, b, count(*) as CNT
             from t22 group by a, b) T
             on T.a=t20.a and T.b=t21.b;

Debugging, I can see two keys created:

  KEY(a)
  KEY(b)

while the unique key would be the KEY(a,b) here.

Comment by Sergei Petrunia [ 2022-02-13 ]

A piece in TABLE::add_tmp_key() which might be relevant to this MDEV:

    if (key_parts == select_list_items)
    {
      if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) ||
          derived->check_distinct_in_union())
        keyinfo->rec_per_key[key_parts - 1]= 1;
    }

They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.

Comment by Oleg Smirnov [ 2022-04-02 ]

The implementation is pushed to bb-10.9-MDEV-26278, ready for review.

Comment by Sergei Petrunia [ 2022-05-19 ]

Review: https://lists.launchpad.net/maria-developers/msg13155.html

Comment by Oleg Smirnov [ 2022-05-24 ]

I believe I have addressed all the review comments.

Comment by Sergei Petrunia [ 2022-06-09 ]

Second review: https://lists.launchpad.net/maria-developers/msg13159.html

Comment by Sergei Petrunia [ 2022-06-12 ]

Third review: https://lists.launchpad.net/maria-developers/msg13161.html

Comment by Oleg Smirnov [ 2022-06-13 ]

Final version is pushed to new branch preview-10.10-optimizer

Comment by Alice Sherepa [ 2022-07-20 ]

It is ok to push it into 10.10. Thanks!

Generated at Thu Feb 08 09:44:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.