Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26278

Table elimination does not work across derived tables

Details

    Description

      A dataset (not necessarily minimal):

      create table t1 (a int);
      insert into t1 select seq from seq_1_to_10;
       
      create table t11 (
        a int not null,
        b int,
        key(a)
      );
      insert into t11 select A.seq, A.seq+B.seq 
      from 
        seq_1_to_100 A,
        seq_1_to_1000 B;
      

      create table t12 (
        pk int primary key,
        col1 int
      );
      insert into t12 select seq, seq from seq_1_to_100000;
      

      A non-mergeable view where table t12 can be eliminated:

      create view v2b as 
      select 
        t11.a as a,
        count(*) as b
      from
        t11 left join t12 on t12.pk=t11.b
      group by
        t11.a;
      

      EXPLAIN shows it it is indeed eliminated:

      explain
      select t1.*
      from 
        t1 left join v2b on v2b.a=t1.a;
      

      +------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+
      | id   | select_type     | table      | type | possible_keys | key  | key_len | ref     | rows | Extra       |
      +------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+
      |    1 | PRIMARY         | t1         | ALL  | NULL          | NULL | NULL    | NULL    | 10   |             |
      |    1 | PRIMARY         | <derived2> | ref  | key0          | key0 | 5       | j5.t1.a | 2    | Using where |
      |    2 | LATERAL DERIVED | t11        | ref  | a             | a    | 4       | j5.t1.a | 1    |             |
      +------+-----------------+------------+------+---------------+------+---------+---------+------+-------------+
      

      Now, let's add a column from t12 into the select list:

      create view v2c as 
      select 
        t11.a as a,
        max(t12.col1) as b
      from
        t11 left join t12 on t12.pk=t11.b
      group by
        t11.a;
      

      and run a query that doesn't use it:

      explain
      select t1.* 
      from 
        t1 left join v2c on v2c.a=t1.a;
      

      EXPLAIN shows t12 was not eliminated:

      +------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+
      | id   | select_type     | table      | type   | possible_keys | key     | key_len | ref      | rows | Extra       |
      +------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+
      |    1 | PRIMARY         | t1         | ALL    | NULL          | NULL    | NULL    | NULL     | 10   |             |
      |    1 | PRIMARY         | <derived2> | ref    | key0          | key0    | 5       | j5.t1.a  | 2    | Using where |
      |    2 | LATERAL DERIVED | t11        | ref    | a             | a       | 4       | j5.t1.a  | 1    |             |
      |    2 | LATERAL DERIVED | t12        | eq_ref | PRIMARY       | PRIMARY | 4       | j5.t11.b | 1    | Using where |
      +------+-----------------+------------+--------+---------------+---------+---------+----------+------+-------------+
      

      Attachments

        Issue Links

          Activity

            Related to MDEV-27201. MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.

            psergei Sergei Petrunia added a comment - Related to MDEV-27201 . MDEV-27201 focuses on the point that the unused fields in derived tables are still created and computed.

            The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201.

            Another goal which wasn't explicitly mentioned so far: can we eliminate the whole derived table in this case, like it is shown here:
            https://jira.mariadb.org/browse/MDEV-26278?focusedCommentId=195684&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195684

            The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key".

            If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY:

            explain
            select t1.*
            from 
              t1 left join v2b on v2b.PRIMARY_KEY=t1.a;
            

            then it will be able to eliminate v2b entirely.

            psergei Sergei Petrunia added a comment - The report text in this MDEV describes how Table Elimination could be made to work when eliminated tables are used in unused columns of derived tables. This is worth implementing, there is also another MDEV about the same topic - MDEV-27201 . Another goal which wasn't explicitly mentioned so far: can we eliminate the whole derived table in this case, like it is shown here: https://jira.mariadb.org/browse/MDEV-26278?focusedCommentId=195684&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195684 The idea is fairly basic: VIEWs v2b and v2c have "group by t11.a" which means v2b.a (or v2c.a) is the derived table's "primary key". If Table Elimination in the top-level select is aware that v2b.a is v2b's PRIMARY_KEY: explain select t1.* from t1 left join v2b on v2b.PRIMARY_KEY=t1.a; then it will be able to eliminate v2b entirely.

            Question: could we borrow functional dependency check code from https://github.com/MariaDB/server/tree/10.5-mdev-11588 ?

            psergei Sergei Petrunia added a comment - Question: could we borrow functional dependency check code from https://github.com/MariaDB/server/tree/10.5-mdev-11588 ?

            ... upon a closer examination, it looks like we could not.

            psergei Sergei Petrunia added a comment - ... upon a closer examination, it looks like we could not.
            oleg.smirnov Oleg Smirnov added a comment - - edited

            Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary:

            (opt_table_elimination.cc line 1594):
            for (uint i=0; i < table->s->keys; i++)
            {
            KEY *key= table->key_info + i;
            if (key->flags & HA_NOSAME)

            { Dep_module_key *key_dep; if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts))) return NULL; *key_list= key_dep; key_list= &(key_dep->next_table_key); }

            }

            Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?

            oleg.smirnov Oleg Smirnov added a comment - - edited Sergey, I can see the derived table v2c has one key as I debug the elimination code, but this key doesn't have HA_NOSAME flag and thus not considered as primary: (opt_table_elimination.cc line 1594): for (uint i=0; i < table->s->keys; i++) { KEY *key= table->key_info + i; if (key->flags & HA_NOSAME) { Dep_module_key *key_dep; if (!(key_dep= new Dep_module_key(tbl_dep, i, key->user_defined_key_parts))) return NULL; *key_list= key_dep; key_list= &(key_dep->next_table_key); } } Since the key exists, I believe there should be some specific code that assigns keys for derived tables but I can't find it. Can you please give me a hint where to look at?
            oleg.smirnov Oleg Smirnov added a comment -

            Answering to my previous question to Sergey: there is a special code for generating keys for derived tables:
            bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array)

            And eventually this function calls:
            if (table->add_tmp_key(table->s->keys, parts,
            get_next_field_for_derived_key,
            (uchar *) &first_keyuse,
            FALSE)) // bool unique

            I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().

            oleg.smirnov Oleg Smirnov added a comment - Answering to my previous question to Sergey: there is a special code for generating keys for derived tables: bool generate_derived_keys(DYNAMIC_ARRAY *keyuse_array) And eventually this function calls: if (table->add_tmp_key(table->s->keys, parts, get_next_field_for_derived_key, (uchar *) &first_keyuse, FALSE)) // bool unique I.e. generated keys are always non-unique and this prevents the table elimination logic from correct determination of functional dependency. So now the task is to determine whether the generated key is unique and pass the correct bool flag to add_tmp_table().

            generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys.

            Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need.
            Consider this example:

            create table t20 (a int);
            create table t21 (b int);
            insert into t20 values (1),(2),(3);
            insert into t21 values (1),(2),(3);
             
            create table t22 (a int, b int, c int);
            insert into t22 select seq/100, seq/100, seq/100 from seq_1_to_10000;
             
            explain
            select * from
              t20 join t21
              left join (select a, b, count(*) as CNT
                         from t22 group by a, b) T
                         on T.a=t20.a and T.b=t21.b;
            

            Debugging, I can see two keys created:

              KEY(a)
              KEY(b)
            

            while the unique key would be the KEY(a,b) here.

            psergei Sergei Petrunia added a comment - generate_derived_keys() generates "keys"(or rather, their descriptions) for derived_with_keys optimization. Initially, my intent for this MDEV was not to make use of derived_with_keys. Now, I'm thinking of whether we should do that. One argument against it is that it doesn't generate the set of keys we need. Consider this example: create table t20 (a int ); create table t21 (b int ); insert into t20 values (1),(2),(3); insert into t21 values (1),(2),(3);   create table t22 (a int , b int , c int ); insert into t22 select seq/100, seq/100, seq/100 from seq_1_to_10000;   explain select * from t20 join t21 left join ( select a, b, count (*) as CNT from t22 group by a, b) T on T.a=t20.a and T.b=t21.b; Debugging, I can see two keys created: KEY(a) KEY(b) while the unique key would be the KEY(a,b) here.

            A piece in TABLE::add_tmp_key() which might be relevant to this MDEV:

                if (key_parts == select_list_items)
                {
                  if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) ||
                      derived->check_distinct_in_union())
                    keyinfo->rec_per_key[key_parts - 1]= 1;
                }
            

            They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.

            psergei Sergei Petrunia added a comment - A piece in TABLE::add_tmp_key() which might be relevant to this MDEV: if (key_parts == select_list_items) { if ((!first->is_part_of_union() && (first->options & SELECT_DISTINCT)) || derived->check_distinct_in_union()) keyinfo->rec_per_key[key_parts - 1]= 1; } They detect another case where the derived table has no duplicates. They only adjust the index statistics, though.
            oleg.smirnov Oleg Smirnov added a comment -

            The implementation is pushed to bb-10.9-MDEV-26278, ready for review.

            oleg.smirnov Oleg Smirnov added a comment - The implementation is pushed to bb-10.9- MDEV-26278 , ready for review.
            psergei Sergei Petrunia added a comment - Review: https://lists.launchpad.net/maria-developers/msg13155.html
            oleg.smirnov Oleg Smirnov added a comment -

            I believe I have addressed all the review comments.

            oleg.smirnov Oleg Smirnov added a comment - I believe I have addressed all the review comments.
            psergei Sergei Petrunia added a comment - Second review: https://lists.launchpad.net/maria-developers/msg13159.html
            psergei Sergei Petrunia added a comment - Third review: https://lists.launchpad.net/maria-developers/msg13161.html
            oleg.smirnov Oleg Smirnov added a comment - - edited

            Final version is pushed to new branch preview-10.10-optimizer

            oleg.smirnov Oleg Smirnov added a comment - - edited Final version is pushed to new branch preview-10.10-optimizer
            alice Alice Sherepa added a comment -

            It is ok to push it into 10.10. Thanks!

            alice Alice Sherepa added a comment - It is ok to push it into 10.10. Thanks!

            People

              oleg.smirnov Oleg Smirnov
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.