Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5980

EITS: if condition is used for REF access, its selectivity is still in filtered%

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.9
    • 10.0.11
    • None

    Description

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table t1(key1 int, col1 int, key(key1));
      insert into t1 select A.a, A.a from ten A, ten B, ten C;
       
      set histogram_size=100;
      set use_stat_tables='preferably';
      set optimizer_use_condition_selectivity=4;
      analyze table t1 persistent for all;

      mysql> explain extended select * from t1 where col1=2;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

      This is ok.

      mysql> explain extended select * from t1 where key1=2;
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 5       | const |  100 |    10.00 |       |
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+

      This is not ok. "key1=2" is used for ref access. However, its selectivity is still present in "filtered%". Ref access is employed when a condition is very selective, counting its selectivity twice will cause gross under-estimation of output cardinality.

      Attachments

        Activity

          psergei Sergei Petrunia created issue -
          psergei Sergei Petrunia made changes -
          Field Original Value New Value
          Assignee Igor Babaev [ igor ]
          psergei Sergei Petrunia made changes -
          Assignee Igor Babaev [ igor ] Sergei Petrunia [ psergey ]
          psergei Sergei Petrunia made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          psergei Sergei Petrunia made changes -
          Description
          {noformat}
          create table ten(a int);
          insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
          create table t1(key1 int, col1 int, key(key1));
          insert into t1 select A.a, A.a from ten A, ten B, ten C;

          set histogram_size=100;
          set use_stat_tables='preferably';
          set optimizer_use_condition_selectivity=4;
          analyze table t1 persistent for all;
          {noformat}

          {noformat}
          mysql> explain extended select * from t1 where col1=2;
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | 9.90 | Using where |
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          {noformat}


          {noformat}
          mysql> explain extended select * from t1 where key1=2;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | const | 100 | 10.00 | |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          {noformat}
          {noformat}
          create table ten(a int);
          insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
          create table t1(key1 int, col1 int, key(key1));
          insert into t1 select A.a, A.a from ten A, ten B, ten C;

          set histogram_size=100;
          set use_stat_tables='preferably';
          set optimizer_use_condition_selectivity=4;
          analyze table t1 persistent for all;
          {noformat}

          {noformat}
          mysql> explain extended select * from t1 where col1=2;
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | 9.90 | Using where |
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          {noformat}

          This is ok.

          {noformat}
          mysql> explain extended select * from t1 where key1=2;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | 1 | SIMPLE | t1 | ref | key1 | key1 | 5 | const | 100 | 10.00 | |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          {noformat}

          This is not ok. "key1=2" is used for ref access. However, its selectivity is still present in "filtered%". Ref access is employed when a condition is very selective, counting its selectivity twice will cause gross under-estimation of output cardinality.
          psergei Sergei Petrunia made changes -
          Labels eits
          serg Sergei Golubchik made changes -
          Fix Version/s 10.0.11 [ 15200 ]
          Fix Version/s 10.0.10 [ 14500 ]
          • so it is ref(const).
          • matching_candidates_in_table() is not called.

          We should recongize it as an equality that was bound by ref and so discount its
          selectivity (the selectivity comes from ... range... how do we figure out that
          ref-bound equality has contributed as a possible range scan?)

          Idea: keep array of induced selectivity

          table->added_selectivity[field_no] = 0.123456

          and if $field_no is used for ref access, than it's not contributing to
          selectivity, and we remove its selectivity from the product.

          Q: what to do about range accesses that span multiple key parts? It's not
          possible to remove one key part from selectivity calculations...

          psergei Sergei Petrunia added a comment - so it is ref(const). matching_candidates_in_table() is not called. We should recongize it as an equality that was bound by ref and so discount its selectivity (the selectivity comes from ... range... how do we figure out that ref-bound equality has contributed as a possible range scan?) Idea: keep array of induced selectivity table->added_selectivity [field_no] = 0.123456 and if $field_no is used for ref access, than it's not contributing to selectivity, and we remove its selectivity from the product. Q: what to do about range accesses that span multiple key parts? It's not possible to remove one key part from selectivity calculations...

          Modified the testcase a little bit so that data distribution is not uniform:

          create table t1a like t1;
          insert into t1a select * from t1;
          update t1a set key1=9 where key1=5 limit 50;
          update t1a set col1=9 where col1=5 limit 50;
          analyze table t1a persistent for all;

          explain extended select * from t1a where key1=5;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |   50 |     5.00 |       |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
           
          explain extended select * from t1a where key1=9;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |  150 |    15.00 |       |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+

          Now, will debug this query:

          explain extended select * from t1a where key1=5;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |   50 |     5.00 |       |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+

          == calculate_cond_selectivity_for_table() ==
          finds quick_cond_selectivity=0.05
          sets it as table->cond_selectivity
          never attempts to perform records_in_column_ranges()-based analysis (this is
          the expected behavior)

          == table_cond_selectivity ==
          Starts with

          double sel= s->table->cond_selectivity; // = 0.05

          Eventually it runs this code:

                if (keyuse->val->const_item())
                  sel*= table->field[fldno]->cond_selectivity; 

          However, field->cond_selectivity=1 there (because we've used range access
          estimates, not records_in_column_ranges() estimates).

          The rest of the function (including table_multi_eq_cond_selectivity() call) doesn't modify 'sel'. sel remains 0.05
          We return to best_extension_by_limited_search() and see

          (gdb) p partial_join_cardinality
            $8 = 2.5
          (gdb) print current_record_count
            $9 = 50
          (gdb) print pushdown_cond_selectivity
            $10 = 0.050000000000000003
          (gdb) print partial_join_cardinality
            $11 = 2.5

          psergei Sergei Petrunia added a comment - Modified the testcase a little bit so that data distribution is not uniform: create table t1a like t1; insert into t1a select * from t1; update t1a set key1=9 where key1=5 limit 50; update t1a set col1=9 where col1=5 limit 50; analyze table t1a persistent for all; explain extended select * from t1a where key1=5; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 50 | 5.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+   explain extended select * from t1a where key1=9; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 150 | 15.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ Now, will debug this query: explain extended select * from t1a where key1=5; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 50 | 5.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ == calculate_cond_selectivity_for_table() == finds quick_cond_selectivity=0.05 sets it as table->cond_selectivity never attempts to perform records_in_column_ranges()-based analysis (this is the expected behavior) == table_cond_selectivity == Starts with double sel= s->table->cond_selectivity; // = 0.05 Eventually it runs this code: if (keyuse->val->const_item()) sel*= table->field[fldno]->cond_selectivity; However, field->cond_selectivity=1 there (because we've used range access estimates, not records_in_column_ranges() estimates). The rest of the function (including table_multi_eq_cond_selectivity() call) doesn't modify 'sel'. sel remains 0.05 We return to best_extension_by_limited_search() and see (gdb) p partial_join_cardinality $8 = 2.5 (gdb) print current_record_count $9 = 50 (gdb) print pushdown_cond_selectivity $10 = 0.050000000000000003 (gdb) print partial_join_cardinality $11 = 2.5

          Patch submitted for review

          psergei Sergei Petrunia added a comment - Patch submitted for review
          psergei Sergei Petrunia made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow defaullt [ 37802 ] MariaDB v2 [ 43654 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 43654 ] MariaDB v3 [ 63682 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 63682 ] MariaDB v4 [ 147719 ]

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.