Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34894

Poor query plan, because range estimates are not reused for ref(const)

    XMLWordPrintable

Details

    Description

      Discovered this when analyzing poor performance in TODO-4858.

      MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).

      This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.

      Testcase:

      create table t0 (
        a int,
        b int,
        dummy int
      );
      insert into t0 select seq,seq,seq from seq_1_to_10;
       
      create table t1 (
        pk1 int,
        pk2 int,
        pk3 int,
        key1 int,
        key(key1),
        filler char(100),
        primary key(pk1,pk2,pk3)
      );
       
      insert into t1
      select
        seq, seq, seq,
        FLOOR(seq/2),
        'filler-data'
      from seq_1_to_10000;
      analyze table t1;
      update t1 set pk1=1 where pk1 between 1 and 200;
      create table t2 (
       col int
      );
      insert into t2 select seq from seq_1_to_10000;
      

      On average, there is just one row with with t1.pk1=... :

      MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
      +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref       | rows | Extra       |
      +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL    | NULL    | NULL      | 10   | Using where |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | test.t0.a | 1    |             |
      +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
      

      However the value of pk1=1 is an outlier with 200 rows:

      MariaDB [test]> explain select * from t1 where pk1=1;
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
      |    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | const | 200  |       |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
      

      In basic case, the optimizer can see this and use key1 instead of PK:

      MariaDB [test]> explain select * from t0, t1, t2 where   t1.pk1=1  and   t1.key1=t0.b;
      +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref             | rows | Extra                              |
      +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL            | 10   | Using where                        |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY,key1  | key1 | 9       | test.t0.b,const | 1    |                                    |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL            | 9980 | Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
      

      If we force use of PK, #rows shows 200:

      MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where   t1.pk1=1  and   t1.key1=t0.b;
      +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                              |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL    | NULL    | NULL  | 10   |                                    |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | const | 200  | Using where                        |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL    | NULL    | NULL  | 9980 | Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
      

      However if we make situation convoluted by adding unusable conditions on pk2 and pk3:

      explain select * from t0, t1, t2
      where
        t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and
        t1.key1=t0.b;
      

      we'll get key=PRIMARY, rows=1:

      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                                           |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
      |    1 | SIMPLE      | t0    | ALL  | NULL          | NULL    | NULL    | NULL  | 10   |                                                 |
      |    1 | SIMPLE      | t1    | ref  | PRIMARY,key1  | PRIMARY | 4       | const | 1    | Using where                                     |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL    | NULL    | NULL  | 9980 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
      

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.