Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27201

Non-merged derived table: do not compute unused fields

    XMLWordPrintable

Details

    Description

      (Filing this based on my observations from analyzing queries at a prospect considering migration to MariaDB)

      The problem: If a non-merged derived table has unused columns, they are still created and computed.

      • Creating a column might not be a big deal
      • Computing a column can cause some unused overhead
      • The worst part is that Table Elimination cannot eliminate the parts of query plan that compute useless information.

      Testcase:

      create table t1 (
        group_id int,
        a int, 
        b int,
        c int
      );
      insert into t1 select  seq/100, seq, seq, seq from seq_1_to_10000;
       
      create table t2 (a int);
      insert into t2 values (1),(2),(3);
      

      explain 
      select
        T.group_id,
        T.MAXA
      from 
        t2,
        (select 
           group_id,
           max(a) as MAXA,
           sum(b) as SUMB
         from
           t1
         group by group_id
        ) as T 
      where
       T.group_id = t2.a;
      

      +------+-------------+------------+------+---------------+------+---------+-----------+-------+---------------------------------+
      | id   | select_type | table      | type | possible_keys | key  | key_len | ref       | rows  | Extra                           |
      +------+-------------+------------+------+---------------+------+---------+-----------+-------+---------------------------------+
      |    1 | PRIMARY     | t2         | ALL  | NULL          | NULL | NULL    | NULL      | 3     | Using where                     |
      |    1 | PRIMARY     | <derived2> | ref  | key0          | key0 | 5       | test.t2.a | 100   |                                 |
      |    2 | DERIVED     | t1         | ALL  | NULL          | NULL | NULL    | NULL      | 10000 | Using temporary; Using filesort |
      +------+-------------+------------+------+---------------+------+---------+-----------+-------+---------------------------------+
      

      Note that sum(b) as SUMB is not used.

      Put breakpoints in

      • create_tmp_table
      • Item_sum_sum::update_field

      one can still see that the column for SUM() is created and the value is computed.

      How it affects table elimination

      Add another table into the subquery:

      create table t3 (
        pk int primary key,
        col int
      );
      insert into t3 select  seq, seq from seq_1_to_10000;
      

      explain 
      select
        T.group_id,
        T.MAXA
      from 
        t2,
        (select 
           group_id,
           max(a) as MAXA,
           sum(b) as SUMB
         from
           t1 left join t3 on t3.pk=t1.c
         group by group_id
        ) as T 
      where
       T.group_id = t2.a;
      

      +------+-------------+------------+------+---------------+------+---------+-----------+------+---------------------------------+
      | id   | select_type | table      | type | possible_keys | key  | key_len | ref       | rows | Extra                           |
      +------+-------------+------------+------+---------------+------+---------+-----------+------+---------------------------------+
      |    1 | PRIMARY     | t2         | ALL  | NULL          | NULL | NULL    | NULL      | 3    | Using where                     |
      |    1 | PRIMARY     | <derived2> | ref  | key0          | key0 | 5       | test.t2.a | 99   |                                 |
      |    2 | DERIVED     | t1         | ALL  | NULL          | NULL | NULL    | NULL      | 9963 | Using temporary; Using filesort |
      +------+-------------+------------+------+---------------+------+---------+-----------+------+---------------------------------+
      

      Ok, t3 is eliminated.

      Now, use t3.b in the SUM:

      explain 
      select
        T.group_id,
        T.MAXA
      from 
        t2,
        (select 
           group_id,
           max(a) as MAXA,
           sum(t3.col) as SUMB
         from
           t1 left join t3 on t3.pk=t1.c
         group by group_id
        ) as T 
      where
       T.group_id = t2.a;
      

      +------+-------------+------------+--------+---------------+---------+---------+-----------+------+---------------------------------+
      | id   | select_type | table      | type   | possible_keys | key     | key_len | ref       | rows | Extra                           |
      +------+-------------+------------+--------+---------------+---------+---------+-----------+------+---------------------------------+
      |    1 | PRIMARY     | t2         | ALL    | NULL          | NULL    | NULL    | NULL      | 3    | Using where                     |
      |    1 | PRIMARY     | <derived2> | ref    | key0          | key0    | 5       | test.t2.a | 99   |                                 |
      |    2 | DERIVED     | t1         | ALL    | NULL          | NULL    | NULL    | NULL      | 9963 | Using temporary; Using filesort |
      |    2 | DERIVED     | t3         | eq_ref | PRIMARY       | PRIMARY | 4       | test.t1.c | 1    | Using where                     |
      +------+-------------+------------+--------+---------------+---------+---------+-----------+------+---------------------------------+
      

      And now table elimination is unable to remove t3.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.