[MDEV-26278] Table elimination does not work across derived tables - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.10.0
Component/s: Optimizer
Labels:

Description

A dataset (not necessarily minimal):

create table t1 (a int);

insert into t1 select seq from seq_1_to_10;

create table t11 (

  a int not null,

  b int,

  key(a)

);

insert into t11 select A.seq, A.seq+B.seq

from

  seq_1_to_100 A,

  seq_1_to_1000 B;

create table t12 (

  pk int primary key,

  col1 int

);

insert into t12 select seq, seq from seq_1_to_100000;

A non-mergeable view where table t12 can be eliminated:

create view v2b as

select

  t11.a as a,

  count(*) as b

from

  t11 left join t12 on t12.pk=t11.b

group by

  t11.a;

EXPLAIN shows it it is indeed eliminated:

explain

select t1.*

from

  t1 left join v2b on v2b.a=t1.a;

+------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+

| id   | select_type     | table      | type | possible_keys | key  | key_len | ref     | rows | Extra       |

+------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+

|    1 | PRIMARY         | t1         | ALL  | NULL          | NULL | NULL    | NULL    | 10   |             |

|    1 | PRIMARY         | <derived2> | ref  | key0          | key0 | 5       | j5.t1.a | 2    | Using where |

|    2 | LATERAL DERIVED | t11        | ref  | a             | a    | 4       | j5.t1.a | 1    |             |

+------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+

Now, let's add a column from t12 into the select list:

create view v2c as

select

  t11.a as a,

  max(t12.col1) as b

from

  t11 left join t12 on t12.pk=t11.b

group by

  t11.a;

and run a query that doesn't use it:

explain

select t1.*

from

  t1 left join v2c on v2c.a=t1.a;

EXPLAIN shows t12 was not eliminated:

+------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+

| id   | select_type     | table      | type   | possible_keys | key     | key_len | ref      | rows | Extra       |

+------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+

|    1 | PRIMARY         | t1         | ALL    | NULL          | NULL    | NULL    | NULL     | 10   |             |

|    1 | PRIMARY         | <derived2> | ref    | key0          | key0    | 5       | j5.t1.a  | 2    | Using where |

|    2 | LATERAL DERIVED | t11        | ref    | a             | a       | 4       | j5.t1.a  | 1    |             |

|    2 | LATERAL DERIVED | t12        | eq_ref | PRIMARY       | PRIMARY | 4       | j5.t11.b | 1    | Using where |

+------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+

Attachments

Issue Links

causes

MDEV-28881 Server crashes in Dep_analysis_context::create_table_value/check_func_dependency

Closed

MDEV-30007 SIGSEGV in st_select_lex_unit::is_derived_eliminated, runtime error: member access within null pointer of type 'struct TABLE' in st_select_lex_unit::is_derived_eliminated()

Closed

relates to

MDEV-27201 Non-merged derived table: do not compute unused fields

In Review

MDEV-28818 Eliminate derived tables having DISTINCT and/or UNION

Open

MDEV-28869 Eliminated tables are not shown in EXPLAIN [FORMAT=JSON] output

In Review

MDEV-28817 Derived table elimination does not work for multiple occurencies of a field

In Review

(1 relates to)

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia added a comment - 2022-01-18 15:44

Related to MDEV-27201. MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.

Sergei Petrunia added a comment - 2022-01-18 15:44 Related to MDEV-27201 . MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.

Sergei Petrunia added a comment - 2022-01-18 16:08

The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201.

Another goal which wasn't explicitly mentioned so far: can we eliminate the whole derived table in this case, like it is shown here:
https://jira.mariadb.org/browse/MDEV-26278?focusedCommentId=195684&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195684

The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key".

If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY:

explain

select t1.*

from

  t1 left join v2b on v2b.PRIMARY_KEY=t1.a;

then it will be able to eliminate v2b entirely.

Sergei Petrunia added a comment - 2022-01-18 16:08 The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201 . Another goal which wasn't explicitly mentioned so far: can we eliminate the whole derived table in this case, like it is shown here: https://jira.mariadb.org/browse/MDEV-26278?focusedCommentId=195684&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195684 The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key". If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY: explain select t1.* from t1 left join v2b on v2b.PRIMARY_KEY=t1.a; then it will be able to eliminate v2b entirely.

Sergei Petrunia added a comment - 2022-01-18 17:22

Question: could we borrow functional dependency check code from https://github.com/MariaDB/server/tree/10.5-mdev-11588 ?

Sergei Petrunia added a comment - 2022-01-18 17:22 Question: could we borrow functional dependency check code from https://github.com/MariaDB/server/tree/10.5-mdev-11588 ?

Sergei Petrunia added a comment - 2022-01-21 13:59

... upon a closer examination, it looks like we could not.

Sergei Petrunia added a comment - 2022-01-21 13:59 ... upon a closer examination, it looks like we could not.

Oleg Smirnov added a comment - 2022-02-05 14:40 - edited

Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary:

(opt_table_elimination.cc line 1594):
for (uint i=0; i < table->s->keys; i++)
{
KEY *key= table->key_info + i;
if (key->flags & HA_NOSAME)

{ Dep_module_key *key_dep; if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts))) return NULL; *key_list= key_dep; key_list= &(key_dep->next_table_key); }

}

Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?

Oleg Smirnov added a comment - 2022-02-05 14:40 - edited Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary: (opt_table_elimination.cc line 1594): for (uint i=0; i < table->s->keys; i++) { KEY *key= table->key_info + i; if (key->flags & HA_NOSAME) { Dep_module_key *key_dep; if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts))) return NULL; *key_list= key_dep; key_list= &(key_dep->next_table_key); } } Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?

Oleg Smirnov added a comment - 2022-02-11 11:04

Answering to my previous question to Sergey: there is a special code for generating keys for derived tables:
bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array)

And eventually this function calls:
if (table->add_tmp_key(table->s->keys, parts,
get_next_field_for_derived_key,
(uchar *) &first_keyuse,
FALSE)) // bool unique

I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().

Oleg Smirnov added a comment - 2022-02-11 11:04 Answering to my previous question to Sergey: there is a special code for generating keys for derived tables: bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array) And eventually this function calls: if (table->add_tmp_key(table->s->keys, parts, get_next_field_for_derived_key, (uchar *) &first_keyuse, FALSE)) // bool unique I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().

Sergei Petrunia added a comment - 2022-02-13 16:48

generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys.

Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need.
Consider this example:

create table t20 (a int);

create table t21 (b int);

insert into t20 values (1),(2),(3);

insert into t21 values (1),(2),(3);

create table t22 (a int, b int, c int);

insert into t22 select seq/100, seq/100, seq/100 from seq_1_to_10000;

explain

select * from

  t20 join t21

  left join (select a, b, count(*) as CNT

             from t22 group by a, b) T

             on T.a=t20.a and T.b=t21.b;

Debugging, I can see two keys created:

  KEY(a)

  KEY(b)

while the unique key would be the KEY(a,b) here.

Sergei Petrunia added a comment - 2022-02-13 16:48 generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys. Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need. Consider this example: create table t20 (a int ); create table t21 (b int ); insert into t20 values (1),(2),(3); insert into t21 values (1),(2),(3); create table t22 (a int , b int , c int ); insert into t22 select seq/100, seq/100, seq/100 from seq_1_to_10000; explain select * from t20 join t21 left join ( select a, b, count (*) as CNT from t22 group by a, b) T on T.a=t20.a and T.b=t21.b; Debugging, I can see two keys created: KEY(a) KEY(b) while the unique key would be the KEY(a,b) here.

Sergei Petrunia added a comment - 2022-02-13 17:29

A piece in TABLE::add_tmp_key() which might be relevant to this MDEV:

    if (key_parts == select_list_items)

      if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) ||

          derived->check_distinct_in_union())

        keyinfo->rec_per_key[key_parts - 1]= 1;

They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.

Sergei Petrunia added a comment - 2022-02-13 17:29 A piece in TABLE::add_tmp_key() which might be relevant to this MDEV: if (key_parts == select_list_items) { if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) || derived->check_distinct_in_union()) keyinfo->rec_per_key[key_parts - 1]= 1; } They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.

Oleg Smirnov added a comment - 2022-04-02 06:52

The implementation is pushed to bb-10.9-~~MDEV-26278~~, ready for review.

Oleg Smirnov added a comment - 2022-04-02 06:52 The implementation is pushed to bb-10.9- MDEV-26278 , ready for review.

Sergei Petrunia added a comment - 2022-05-19 16:00

Review: https://lists.launchpad.net/maria-developers/msg13155.html

Sergei Petrunia added a comment - 2022-05-19 16:00 Review: https://lists.launchpad.net/maria-developers/msg13155.html

Oleg Smirnov added a comment - 2022-05-24 02:46

I believe I have addressed all the review comments.

Oleg Smirnov added a comment - 2022-05-24 02:46 I believe I have addressed all the review comments.

Sergei Petrunia added a comment - 2022-06-09 11:11

Second review: https://lists.launchpad.net/maria-developers/msg13159.html

Sergei Petrunia added a comment - 2022-06-09 11:11 Second review: https://lists.launchpad.net/maria-developers/msg13159.html

Sergei Petrunia added a comment - 2022-06-12 15:23

Third review: https://lists.launchpad.net/maria-developers/msg13161.html

Sergei Petrunia added a comment - 2022-06-12 15:23 Third review: https://lists.launchpad.net/maria-developers/msg13161.html

Oleg Smirnov added a comment - 2022-06-13 07:32 - edited

Final version is pushed to new branch preview-10.10-optimizer

Oleg Smirnov added a comment - 2022-06-13 07:32 - edited Final version is pushed to new branch preview-10.10-optimizer

Alice Sherepa added a comment - 2022-07-20 08:34

It is ok to push it into 10.10. Thanks!

Alice Sherepa added a comment - 2022-07-20 08:34 It is ok to push it into 10.10. Thanks!

People

Assignee:: Oleg Smirnov

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2021-07-29 17:57

Updated:: 2023-03-21 10:44

Resolved:: 2022-07-31 14:12

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server