[MDEV-5985] EITS: selectivity estimates look illogical for join and non-key equalities Created: 2014-03-31  Updated: 2014-06-10  Resolved: 2014-06-10

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.9
Fix Version/s: 10.0.12

Type: Bug Priority: Critical
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: eits

Issue Links:
Relates
relates to MDEV-6325 EITS: wrong selectivity for REF(t1.ke... Closed

 Description   

Create a dataset:

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
create table one_k2 as select * from one_k;
set histogram_size=100;
set use_stat_tables='preferably';
set optimizer_use_condition_selectivity=4;
analyze table one_k persistent for all;
analyze table one_k2 persistent for all;

Let's see what histograms give us

explain extended select * from one_k A where A.a < 40;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

explain extended select * from one_k2 B where B.a < 100;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

The numbers looks ok.

Now, let's try a join:

explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where                                     |
|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

Looks ok. Now, let's add a condition.

explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100 and B.a=A.a;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where                                     |
|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

And filtered% becomes 100%. It used to be 9.90%, we have added another condition into WHERE and now the optimizer expects the condition to be less selective! This looks wrong.



 Comments   
Comment by Sergei Petrunia [ 2014-03-31 ]

Look at the code near this comment:

    If the field f from the table is equal to a field from one the
    earlier joined tables then the selectivity of the range conditions
    over the field f must be discounted.

Maybe, this is the cause of this bug?

Comment by Sergei Petrunia [ 2014-04-21 ]

Patch submitted for review

Comment by Sergei Petrunia [ 2014-04-25 ]

Summary of discussion with igor earlier this week:

The committed patch doesn't handle all cases. When the multi-equality is a part
of ref access, its selectivity is counted twice. Example:

create table one_k3 (a int, b int, key(a));
insert into one_k3 select A.a + 10*B.a, 12345 from ten A, ten B, ten C;
analyze table one_k3 persistent for all;

The query plan will be:

explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a;
+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
| id   | select_type | table  | type | possible_keys | key  | key_len | ref          | rows | filtered | Extra       |
+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
|    1 | SIMPLE      | one_k2 | ALL  | NULL          | NULL | NULL    | NULL         | 1000 |   100.00 | Using where |
|    1 | SIMPLE      | one_k3 | ref  | a             | a    | 5       | j22.one_k2.a |   10 |   100.00 |             |
+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

Now, add a condition:

explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a and one_k2.a<10;
+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
| id   | select_type | table  | type | possible_keys | key  | key_len | ref          | rows | filtered | Extra       |
+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+
|    1 | SIMPLE      | one_k2 | ALL  | NULL          | NULL | NULL    | NULL         | 1000 |     0.99 | Using where |
|    1 | SIMPLE      | one_k3 | ref  | a             | a    | 5       | j22.one_k2.a |   10 |     9.90 |             |
+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

one_k2 has values 1...1000, selectivity is 10/1000 = 1%.
one_k3 has values 1...100. The shown selectivity is 10%.

However, we fail to take into account that ref access only gives us rows that
have one_k2.a=one_k3.a < 10. Because of that, the inferred condition
"one_k3.a < 10" will always be true, and we should have filtered=100%.

Comment by Sergei Petrunia [ 2014-04-25 ]

About the initial example (the one not using ref access):

Debugging table_cond_selectivity(B)
Table B uses full table scan (i.e. not ref acces).
This means, sel=1.

Then, we go into the part of code that starts with
" If the field f from the table is equal to a field from one the..."

We find that B.a is a member in multiple_equality(B.a, A.a).
(B.a)->cond_selectivity=0.049, we execute

if (field->cond_selectivity > 0)
sel/= field->cond_selectivity;

and get
sel=20.19

Comment by Sergei Petrunia [ 2014-04-26 ]

So.

1. When using full table scan, we must not try to discount field->cond_selectivity
from "sel". With full table scan, we start with sel=1, that is,
field->cond_selectivity was not in "sel" (it was dealt with in
matching_cond_for_table)

2. When using ref access, there are two possibilities:

2.1 A member of multiple-equality is a part of ref_key. In this case, the
condition for which field->cond_selectivity was produced (further $COND) will
be pulled to some preceding table and checked there. Use of ref access on
this table will guarantee that we will only get rows that already satisfy
$COND.

2.2 No member of multiple equality is a part of ref_key.
In this case, field->cond_selectivity must not be discounted.

Comment by Sergei Petrunia [ 2014-04-28 ]

Debugging the example in the comment (with one_k3 table), I see another problem:

The code tries to discount for selectivity of "one_k3.a < 3". It reaches the lines

          if (field->cond_selectivity > 0)
            sel/= field->cond_selectivity;

However, field->cond_selectivity==1, because selectivity was obtained using range optimization.

Comment by Sergei Petrunia [ 2014-06-10 ]

Current EITS optimization code has multiple problems. I will not try to fix them all at once.

Pushed the fix for the problem demonstrated by the example in this issue. The issue demonstrated in the comment dated 2014-04-25 20:29 is branched off into MDEV-6325.

Generated at Thu Feb 08 07:08:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.