Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34894

Poor query plan, because range estimates are not reused for ref(const)

Details

    Description

      Discovered this when analyzing poor performance in TODO-4858.

      MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).

      This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.

      Testcase:

      create table t0 (
        a int,
        b int,
        dummy int
      );
      insert into t0 select seq,seq,seq from seq_1_to_10;
       
      create table t1 (
        pk1 int,
        pk2 int,
        pk3 int,
        key1 int,
        key(key1),
        filler char(100),
        primary key(pk1,pk2,pk3)
      );
       
      insert into t1
      select
        seq, seq, seq,
        FLOOR(seq/2),
        'filler-data'
      from seq_1_to_10000;
      analyze table t1;
      update t1 set pk1=1 where pk1 between 1 and 200;
      create table t2 (
       col int
      );
      insert into t2 select seq from seq_1_to_10000;
      

      On average, there is just one row with with t1.pk1=... :

      MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
      +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref       | rows | Extra       |
      +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL    | NULL    | NULL      | 10   | Using where |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | test.t0.a | 1    |             |
      +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
      

      However the value of pk1=1 is an outlier with 200 rows:

      MariaDB [test]> explain select * from t1 where pk1=1;
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
      |    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | const | 200  |       |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
      

      In basic case, the optimizer can see this and use key1 instead of PK:

      MariaDB [test]> explain select * from t0, t1, t2 where   t1.pk1=1  and   t1.key1=t0.b;
      +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref             | rows | Extra                              |
      +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL            | 10   | Using where                        |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY,key1  | key1 | 9       | test.t0.b,const | 1    |                                    |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL            | 9980 | Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
      

      If we force use of PK, #rows shows 200:

      MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where   t1.pk1=1  and   t1.key1=t0.b;
      +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                              |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL    | NULL    | NULL  | 10   |                                    |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | const | 200  | Using where                        |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL    | NULL    | NULL  | 9980 | Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
      

      However if we make situation convoluted by adding unusable conditions on pk2 and pk3:

      explain select * from t0, t1, t2
      where
        t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and
        t1.key1=t0.b;
      

      we'll get key=PRIMARY, rows=1:

      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                                           |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL    | NULL    | NULL  | 10   |                                                 |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY,key1  | PRIMARY | 4       | const | 1    | Using where                                     |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL    | NULL    | NULL  | 9980 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
      

      Attachments

        Activity

          psergei Sergei Petrunia created issue -
          psergei Sergei Petrunia made changes -
          Field Original Value New Value
          psergei Sergei Petrunia made changes -
          Description Discovered this when analyzing poor performance in TODO-4858.

          MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).

          This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.

          Testcase:
          {code}
          create table t0 (
            a int,
            b int,
            dummy int
          );
          insert into t0 select seq,seq,seq from seq_1_to_10;

          create table t1 (
            pk1 int,
            pk2 int,
            pk3 int,
            key1 int,
            key(key1),
            filler char(100),
            primary key(pk1,pk2,pk3)
          );

          insert into t1
          select
            seq, seq, seq,
            FLOOR(seq/2),
            'filler-data'
          from seq_1_to_10000;
          analyze table t1;
          update t1 set pk1=1 where pk1 between 1 and 200;
          create table t2 (
           col int
          );
          insert into t2 select seq from seq_1_to_10000;
          {code}
          Discovered this when analyzing poor performance in TODO-4858.

          MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).

          This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.

          Testcase:
          {code:sql}
          create table t0 (
            a int,
            b int,
            dummy int
          );
          insert into t0 select seq,seq,seq from seq_1_to_10;

          create table t1 (
            pk1 int,
            pk2 int,
            pk3 int,
            key1 int,
            key(key1),
            filler char(100),
            primary key(pk1,pk2,pk3)
          );

          insert into t1
          select
            seq, seq, seq,
            FLOOR(seq/2),
            'filler-data'
          from seq_1_to_10000;
          analyze table t1;
          update t1 set pk1=1 where pk1 between 1 and 200;
          create table t2 (
           col int
          );
          insert into t2 select seq from seq_1_to_10000;
          {code}

          On average, there is just one row with with t1.pk1=... :
          {code}
          MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | |
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          {code}

          However the value of pk1=1 is an outlier with 200 rows:
          {code}
          MariaDB [test]> explain select * from t1 where pk1=1;
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          {code}

          In basic case, the optimizer can see this and use key1 instead of PK:
          {code}
          MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b;
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
          | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          {code}

          If we force use of PK, #rows shows 200:
          {code}
          MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b;
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          {code}

          However if we make situation convoluted by adding unusable conditions on pk2 and pk3:
          {code:sql}
          explain select * from t0, t1, t2
          where
            t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and
            t1.key1=t0.b;
          we'll get key=PRIMARY, rows=1:
          {code}
          {code}
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
          | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          {code}
          psergei Sergei Petrunia made changes -
          Description Discovered this when analyzing poor performance in TODO-4858.

          MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).

          This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.

          Testcase:
          {code:sql}
          create table t0 (
            a int,
            b int,
            dummy int
          );
          insert into t0 select seq,seq,seq from seq_1_to_10;

          create table t1 (
            pk1 int,
            pk2 int,
            pk3 int,
            key1 int,
            key(key1),
            filler char(100),
            primary key(pk1,pk2,pk3)
          );

          insert into t1
          select
            seq, seq, seq,
            FLOOR(seq/2),
            'filler-data'
          from seq_1_to_10000;
          analyze table t1;
          update t1 set pk1=1 where pk1 between 1 and 200;
          create table t2 (
           col int
          );
          insert into t2 select seq from seq_1_to_10000;
          {code}

          On average, there is just one row with with t1.pk1=... :
          {code}
          MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | |
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          {code}

          However the value of pk1=1 is an outlier with 200 rows:
          {code}
          MariaDB [test]> explain select * from t1 where pk1=1;
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          {code}

          In basic case, the optimizer can see this and use key1 instead of PK:
          {code}
          MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b;
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
          | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          {code}

          If we force use of PK, #rows shows 200:
          {code}
          MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b;
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          {code}

          However if we make situation convoluted by adding unusable conditions on pk2 and pk3:
          {code:sql}
          explain select * from t0, t1, t2
          where
            t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and
            t1.key1=t0.b;
          we'll get key=PRIMARY, rows=1:
          {code}
          {code}
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
          | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          {code}
          Discovered this when analyzing poor performance in TODO-4858.

          MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).

          This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.

          Testcase:
          {code:sql}
          create table t0 (
            a int,
            b int,
            dummy int
          );
          insert into t0 select seq,seq,seq from seq_1_to_10;

          create table t1 (
            pk1 int,
            pk2 int,
            pk3 int,
            key1 int,
            key(key1),
            filler char(100),
            primary key(pk1,pk2,pk3)
          );

          insert into t1
          select
            seq, seq, seq,
            FLOOR(seq/2),
            'filler-data'
          from seq_1_to_10000;
          analyze table t1;
          update t1 set pk1=1 where pk1 between 1 and 200;
          create table t2 (
           col int
          );
          insert into t2 select seq from seq_1_to_10000;
          {code}

          On average, there is just one row with with t1.pk1=... :
          {code}
          MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | |
          +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
          {code}

          However the value of pk1=1 is an outlier with 200 rows:
          {code}
          MariaDB [test]> explain select * from t1 where pk1=1;
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
          {code}

          In basic case, the optimizer can see this and use key1 instead of PK:
          {code}
          MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b;
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
          | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
          {code}

          If we force use of PK, #rows shows 200:
          {code}
          MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b;
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
          | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
          {code}

          However if we make situation convoluted by adding unusable conditions on pk2 and pk3:
          {code:sql}
          explain select * from t0, t1, t2
          where
            t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and
            t1.key1=t0.b;
          {code}
          we'll get key=PRIMARY, rows=1:
          {code}
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
          | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where |
          | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
          +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
          {code}
          psergei Sergei Petrunia made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          psergei Sergei Petrunia made changes -
          Fix Version/s 10.6.20 [ 29903 ]
          Fix Version/s 10.6 [ 24028 ]
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          JIraAutomate JiraAutomate made changes -
          Fix Version/s 10.11.10 [ 29904 ]
          Fix Version/s 11.2.6 [ 29906 ]
          Fix Version/s 11.4.4 [ 29907 ]
          psergei Sergei Petrunia made changes -
          Fix Version/s 11.6.2 [ 29908 ]
          psergei Sergei Petrunia made changes -
          Labels cardinality

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.