Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5985

EITS: selectivity estimates look illogical for join and non-key equalities

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.0.9
    • 10.0.12
    • None

    Description

      Create a dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
      create table one_k2 as select * from one_k;
      set histogram_size=100;
      set use_stat_tables='preferably';
      set optimizer_use_condition_selectivity=4;
      analyze table one_k persistent for all;
      analyze table one_k2 persistent for all;

      Let's see what histograms give us

      explain extended select * from one_k A where A.a < 40;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

      explain extended select * from one_k2 B where B.a < 100;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

      The numbers looks ok.

      Now, let's try a join:

      explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      |    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where                                     |
      |    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

      Looks ok. Now, let's add a condition.

      explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100 and B.a=A.a;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      |    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where                                     |
      |    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

      And filtered% becomes 100%. It used to be 9.90%, we have added another condition into WHERE and now the optimizer expects the condition to be less selective! This looks wrong.

      Attachments

        Issue Links

          Activity

            Look at the code near this comment:

                If the field f from the table is equal to a field from one the
                earlier joined tables then the selectivity of the range conditions
                over the field f must be discounted.

            Maybe, this is the cause of this bug?

            psergei Sergei Petrunia added a comment - Look at the code near this comment: If the field f from the table is equal to a field from one the earlier joined tables then the selectivity of the range conditions over the field f must be discounted. Maybe, this is the cause of this bug?

            Patch submitted for review

            psergei Sergei Petrunia added a comment - Patch submitted for review

            Summary of discussion with igor earlier this week:

            The committed patch doesn't handle all cases. When the multi-equality is a part
            of ref access, its selectivity is counted twice. Example:

            create table one_k3 (a int, b int, key(a));
            insert into one_k3 select A.a + 10*B.a, 12345 from ten A, ten B, ten C;
            analyze table one_k3 persistent for all;

            The query plan will be:

            explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a;
            +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
            | id   | select_type | table  | type | possible_keys | key  | key_len | ref          | rows | filtered | Extra       |
            +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
            |    1 | SIMPLE      | one_k2 | ALL  | NULL          | NULL | NULL    | NULL         | 1000 |   100.00 | Using where |
            |    1 | SIMPLE      | one_k3 | ref  | a             | a    | 5       | j22.one_k2.a |   10 |   100.00 |             |
            +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

            Now, add a condition:

            explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a and one_k2.a<10;
            +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
            | id   | select_type | table  | type | possible_keys | key  | key_len | ref          | rows | filtered | Extra       |
            +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
            |    1 | SIMPLE      | one_k2 | ALL  | NULL          | NULL | NULL    | NULL         | 1000 |     0.99 | Using where |
            |    1 | SIMPLE      | one_k3 | ref  | a             | a    | 5       | j22.one_k2.a |   10 |     9.90 |             |
            +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

            one_k2 has values 1...1000, selectivity is 10/1000 = 1%.
            one_k3 has values 1...100. The shown selectivity is 10%.

            However, we fail to take into account that ref access only gives us rows that
            have one_k2.a=one_k3.a < 10. Because of that, the inferred condition
            "one_k3.a < 10" will always be true, and we should have filtered=100%.

            psergei Sergei Petrunia added a comment - Summary of discussion with igor earlier this week: The committed patch doesn't handle all cases. When the multi-equality is a part of ref access, its selectivity is counted twice. Example: create table one_k3 (a int, b int, key(a)); insert into one_k3 select A.a + 10*B.a, 12345 from ten A, ten B, ten C; analyze table one_k3 persistent for all; The query plan will be: explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a; +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | 1 | SIMPLE | one_k2 | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where | | 1 | SIMPLE | one_k3 | ref | a | a | 5 | j22.one_k2.a | 10 | 100.00 | | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ Now, add a condition: explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a and one_k2.a<10; +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | 1 | SIMPLE | one_k2 | ALL | NULL | NULL | NULL | NULL | 1000 | 0.99 | Using where | | 1 | SIMPLE | one_k3 | ref | a | a | 5 | j22.one_k2.a | 10 | 9.90 | | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ one_k2 has values 1...1000, selectivity is 10/1000 = 1%. one_k3 has values 1...100. The shown selectivity is 10%. However, we fail to take into account that ref access only gives us rows that have one_k2.a=one_k3.a < 10. Because of that, the inferred condition "one_k3.a < 10" will always be true, and we should have filtered=100%.

            About the initial example (the one not using ref access):

            Debugging table_cond_selectivity(B)
            Table B uses full table scan (i.e. not ref acces).
            This means, sel=1.

            Then, we go into the part of code that starts with
            " If the field f from the table is equal to a field from one the..."

            We find that B.a is a member in multiple_equality(B.a, A.a).
            (B.a)->cond_selectivity=0.049, we execute

            if (field->cond_selectivity > 0)
            sel/= field->cond_selectivity;

            and get
            sel=20.19

            psergei Sergei Petrunia added a comment - About the initial example (the one not using ref access): Debugging table_cond_selectivity(B) Table B uses full table scan (i.e. not ref acces). This means, sel=1. Then, we go into the part of code that starts with " If the field f from the table is equal to a field from one the..." We find that B.a is a member in multiple_equality(B.a, A.a). (B.a)->cond_selectivity=0.049, we execute if (field->cond_selectivity > 0) sel/= field->cond_selectivity; and get sel=20.19

            So.

            1. When using full table scan, we must not try to discount field->cond_selectivity
            from "sel". With full table scan, we start with sel=1, that is,
            field->cond_selectivity was not in "sel" (it was dealt with in
            matching_cond_for_table)

            2. When using ref access, there are two possibilities:

            2.1 A member of multiple-equality is a part of ref_key. In this case, the
            condition for which field->cond_selectivity was produced (further $COND) will
            be pulled to some preceding table and checked there. Use of ref access on
            this table will guarantee that we will only get rows that already satisfy
            $COND.

            2.2 No member of multiple equality is a part of ref_key.
            In this case, field->cond_selectivity must not be discounted.

            psergei Sergei Petrunia added a comment - So. 1. When using full table scan, we must not try to discount field->cond_selectivity from "sel". With full table scan, we start with sel=1, that is, field->cond_selectivity was not in "sel" (it was dealt with in matching_cond_for_table) 2. When using ref access, there are two possibilities: 2.1 A member of multiple-equality is a part of ref_key. In this case, the condition for which field->cond_selectivity was produced (further $COND) will be pulled to some preceding table and checked there. Use of ref access on this table will guarantee that we will only get rows that already satisfy $COND. 2.2 No member of multiple equality is a part of ref_key. In this case, field->cond_selectivity must not be discounted.

            Debugging the example in the comment (with one_k3 table), I see another problem:

            The code tries to discount for selectivity of "one_k3.a < 3". It reaches the lines

                      if (field->cond_selectivity > 0)
                        sel/= field->cond_selectivity;

            However, field->cond_selectivity==1, because selectivity was obtained using range optimization.

            psergei Sergei Petrunia added a comment - Debugging the example in the comment (with one_k3 table), I see another problem: The code tries to discount for selectivity of "one_k3.a < 3". It reaches the lines if (field->cond_selectivity > 0) sel/= field->cond_selectivity; However, field->cond_selectivity==1, because selectivity was obtained using range optimization.

            Current EITS optimization code has multiple problems. I will not try to fix them all at once.

            Pushed the fix for the problem demonstrated by the example in this issue. The issue demonstrated in the comment dated 2014-04-25 20:29 is branched off into MDEV-6325.

            psergei Sergei Petrunia added a comment - Current EITS optimization code has multiple problems. I will not try to fix them all at once. Pushed the fix for the problem demonstrated by the example in this issue. The issue demonstrated in the comment dated 2014-04-25 20:29 is branched off into MDEV-6325 .

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.