Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5980

EITS: if condition is used for REF access, its selectivity is still in filtered%

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.9
    • 10.0.11
    • None

    Description

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table t1(key1 int, col1 int, key(key1));
      insert into t1 select A.a, A.a from ten A, ten B, ten C;
       
      set histogram_size=100;
      set use_stat_tables='preferably';
      set optimizer_use_condition_selectivity=4;
      analyze table t1 persistent for all;

      mysql> explain extended select * from t1 where col1=2;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

      This is ok.

      mysql> explain extended select * from t1 where key1=2;
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 5       | const |  100 |    10.00 |       |
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+

      This is not ok. "key1=2" is used for ref access. However, its selectivity is still present in "filtered%". Ref access is employed when a condition is very selective, counting its selectivity twice will cause gross under-estimation of output cardinality.

      Attachments

        Activity

          • so it is ref(const).
          • matching_candidates_in_table() is not called.

          We should recongize it as an equality that was bound by ref and so discount its
          selectivity (the selectivity comes from ... range... how do we figure out that
          ref-bound equality has contributed as a possible range scan?)

          Idea: keep array of induced selectivity

          table->added_selectivity[field_no] = 0.123456

          and if $field_no is used for ref access, than it's not contributing to
          selectivity, and we remove its selectivity from the product.

          Q: what to do about range accesses that span multiple key parts? It's not
          possible to remove one key part from selectivity calculations...

          psergei Sergei Petrunia added a comment - so it is ref(const). matching_candidates_in_table() is not called. We should recongize it as an equality that was bound by ref and so discount its selectivity (the selectivity comes from ... range... how do we figure out that ref-bound equality has contributed as a possible range scan?) Idea: keep array of induced selectivity table->added_selectivity [field_no] = 0.123456 and if $field_no is used for ref access, than it's not contributing to selectivity, and we remove its selectivity from the product. Q: what to do about range accesses that span multiple key parts? It's not possible to remove one key part from selectivity calculations...

          Modified the testcase a little bit so that data distribution is not uniform:

          create table t1a like t1;
          insert into t1a select * from t1;
          update t1a set key1=9 where key1=5 limit 50;
          update t1a set col1=9 where col1=5 limit 50;
          analyze table t1a persistent for all;

          explain extended select * from t1a where key1=5;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |   50 |     5.00 |       |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
           
          explain extended select * from t1a where key1=9;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |  150 |    15.00 |       |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+

          Now, will debug this query:

          explain extended select * from t1a where key1=5;
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
          |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |   50 |     5.00 |       |
          +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+

          == calculate_cond_selectivity_for_table() ==
          finds quick_cond_selectivity=0.05
          sets it as table->cond_selectivity
          never attempts to perform records_in_column_ranges()-based analysis (this is
          the expected behavior)

          == table_cond_selectivity ==
          Starts with

          double sel= s->table->cond_selectivity; // = 0.05

          Eventually it runs this code:

                if (keyuse->val->const_item())
                  sel*= table->field[fldno]->cond_selectivity; 

          However, field->cond_selectivity=1 there (because we've used range access
          estimates, not records_in_column_ranges() estimates).

          The rest of the function (including table_multi_eq_cond_selectivity() call) doesn't modify 'sel'. sel remains 0.05
          We return to best_extension_by_limited_search() and see

          (gdb) p partial_join_cardinality
            $8 = 2.5
          (gdb) print current_record_count
            $9 = 50
          (gdb) print pushdown_cond_selectivity
            $10 = 0.050000000000000003
          (gdb) print partial_join_cardinality
            $11 = 2.5

          psergei Sergei Petrunia added a comment - Modified the testcase a little bit so that data distribution is not uniform: create table t1a like t1; insert into t1a select * from t1; update t1a set key1=9 where key1=5 limit 50; update t1a set col1=9 where col1=5 limit 50; analyze table t1a persistent for all; explain extended select * from t1a where key1=5; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 50 | 5.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+   explain extended select * from t1a where key1=9; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 150 | 15.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ Now, will debug this query: explain extended select * from t1a where key1=5; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 50 | 5.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ == calculate_cond_selectivity_for_table() == finds quick_cond_selectivity=0.05 sets it as table->cond_selectivity never attempts to perform records_in_column_ranges()-based analysis (this is the expected behavior) == table_cond_selectivity == Starts with double sel= s->table->cond_selectivity; // = 0.05 Eventually it runs this code: if (keyuse->val->const_item()) sel*= table->field[fldno]->cond_selectivity; However, field->cond_selectivity=1 there (because we've used range access estimates, not records_in_column_ranges() estimates). The rest of the function (including table_multi_eq_cond_selectivity() call) doesn't modify 'sel'. sel remains 0.05 We return to best_extension_by_limited_search() and see (gdb) p partial_join_cardinality $8 = 2.5 (gdb) print current_record_count $9 = 50 (gdb) print pushdown_cond_selectivity $10 = 0.050000000000000003 (gdb) print partial_join_cardinality $11 = 2.5

          Patch submitted for review

          psergei Sergei Petrunia added a comment - Patch submitted for review

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.