[MDEV-5985] EITS: selectivity estimates look illogical for join and non-key equalities - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.0.9
Fix Version/s: 10.0.12
Component/s: None
Labels:
- eits

Description

Create a dataset:

create table ten(a int);

insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);

insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table one_k2 as select * from one_k;

set histogram_size=100;

set use_stat_tables='preferably';

set optimizer_use_condition_selectivity=4;

analyze table one_k persistent for all;

analyze table one_k2 persistent for all;

Let's see what histograms give us

explain extended select * from one_k A where A.a < 40;

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

explain extended select * from one_k2 B where B.a < 100;

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

The numbers looks ok.

Now, let's try a join:

explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100;

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where                                     |

|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where; Using join buffer (flat, BNL join) |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

Looks ok. Now, let's add a condition.

explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100 and B.a=A.a;

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     4.95 | Using where                                     |

|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using where; Using join buffer (flat, BNL join) |

+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

And filtered% becomes 100%. It used to be 9.90%, we have added another condition into WHERE and now the optimizer expects the condition to be less selective! This looks wrong.

Attachments

Issue Links

relates to

MDEV-6325 EITS: wrong selectivity for REF(t1.key=t0.col) AND COND(t0.col)

Closed

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia created issue - 2014-03-31 11:03

Sergei Petrunia made changes - 2014-03-31 11:06

Field	Original Value	New Value
Description	Create a dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table one_k2 as select * from one_k; set histogram_size=100; set use_stat_tables='preferably'; set optimizer_use_condition_selectivity=4; analyze table one_k persistent for all; analyze table one_k2 persistent for all; {noformat} Let's see what histograms give us } explain extended select * from one_k A where A.a < 40; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| 1 \| SIMPLE \| A \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 4.95 \| Using where \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ {noformat} {noformat} explain extended select * from one_k2 B where B.a < 100; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| 1 \| SIMPLE \| B \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 9.90 \| Using where \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ {noformat} {noformat} explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| 1 \| SIMPLE \| A \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 4.95 \| Using where \| \| 1 \| SIMPLE \| B \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 9.90 \| Using where; Using join buffer (flat, BNL join) \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ {noformat} {noformat} explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100 and B.a=A.a; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| 1 \| SIMPLE \| A \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 4.95 \| Using where \| \| 1 \| SIMPLE \| B \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 100.00 \| Using where; Using join buffer (flat, BNL join) \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ {noformat}	Create a dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table one_k2 as select * from one_k; set histogram_size=100; set use_stat_tables='preferably'; set optimizer_use_condition_selectivity=4; analyze table one_k persistent for all; analyze table one_k2 persistent for all; {noformat} Let's see what histograms give us {noformat} explain extended select * from one_k A where A.a < 40; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| 1 \| SIMPLE \| A \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 4.95 \| Using where \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ {noformat} {noformat} explain extended select * from one_k2 B where B.a < 100; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ \| 1 \| SIMPLE \| B \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 9.90 \| Using where \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ {noformat} The numbers looks ok. Now, let's try a join: {noformat} explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| 1 \| SIMPLE \| A \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 4.95 \| Using where \| \| 1 \| SIMPLE \| B \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 9.90 \| Using where; Using join buffer (flat, BNL join) \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ {noformat} Looks ok. Now, let's add a condition. {noformat} explain extended select * from one_k A, one_k2 B where A.a < 40 and B.a < 100 and B.a=A.a; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ \| 1 \| SIMPLE \| A \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 4.95 \| Using where \| \| 1 \| SIMPLE \| B \| ALL \| NULL \| NULL \| NULL \| NULL \| 1000 \| 100.00 \| Using where; Using join buffer (flat, BNL join) \| +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ {noformat} And filtered% becomes 100%. It used to be 9.90%, we have added another condition into WHERE and now the optimizer expects the condition to be less selective! This looks wrong.

Sergei Petrunia made changes - 2014-03-31 11:06

Assignee

Igor Babaev [ igor ]

Sergei Petrunia made changes - 2014-03-31 11:06

Assignee

Igor Babaev [ igor ]

Sergei Petrunia [ psergey ]

Sergei Petrunia made changes - 2014-03-31 11:06

Status

Open [ 1 ]

In Progress [ 3 ]

Sergei Petrunia added a comment - 2014-03-31 22:19

Look at the code near this comment:

    If the field f from the table is equal to a field from one the

    earlier joined tables then the selectivity of the range conditions

    over the field f must be discounted.

Maybe, this is the cause of this bug?

Sergei Petrunia added a comment - 2014-03-31 22:19 Look at the code near this comment: If the field f from the table is equal to a field from one the earlier joined tables then the selectivity of the range conditions over the field f must be discounted. Maybe, this is the cause of this bug?

Sergei Petrunia made changes - 2014-04-21 14:42

Fix Version/s

10.0.11 [ 15200 ]

Sergei Petrunia added a comment - 2014-04-21 14:43

Patch submitted for review

Sergei Petrunia added a comment - 2014-04-21 14:43 Patch submitted for review

Sergei Petrunia added a comment - 2014-04-25 19:29

Summary of discussion with igor earlier this week:

The committed patch doesn't handle all cases. When the multi-equality is a part
of ref access, its selectivity is counted twice. Example:

create table one_k3 (a int, b int, key(a));

insert into one_k3 select A.a + 10*B.a, 12345 from ten A, ten B, ten C;

analyze table one_k3 persistent for all;

The query plan will be:

explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a;

+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

| id   | select_type | table  | type | possible_keys | key  | key_len | ref          | rows | filtered | Extra       |

+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

|    1 | SIMPLE      | one_k2 | ALL  | NULL          | NULL | NULL    | NULL         | 1000 |   100.00 | Using where |

|    1 | SIMPLE      | one_k3 | ref  | a             | a    | 5       | j22.one_k2.a |   10 |   100.00 |             |

+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

Now, add a condition:

explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a and one_k2.a<10;

+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

| id   | select_type | table  | type | possible_keys | key  | key_len | ref          | rows | filtered | Extra       |

+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

|    1 | SIMPLE      | one_k2 | ALL  | NULL          | NULL | NULL    | NULL         | 1000 |     0.99 | Using where |

|    1 | SIMPLE      | one_k3 | ref  | a             | a    | 5       | j22.one_k2.a |   10 |     9.90 |             |

+------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+

one_k2 has values 1...1000, selectivity is 10/1000 = 1%.
one_k3 has values 1...100. The shown selectivity is 10%.

However, we fail to take into account that ref access only gives us rows that
have one_k2.a=one_k3.a < 10. Because of that, the inferred condition
"one_k3.a < 10" will always be true, and we should have filtered=100%.

Sergei Petrunia added a comment - 2014-04-25 19:29 Summary of discussion with igor earlier this week: The committed patch doesn't handle all cases. When the multi-equality is a part of ref access, its selectivity is counted twice. Example: create table one_k3 (a int, b int, key(a)); insert into one_k3 select A.a + 10*B.a, 12345 from ten A, ten B, ten C; analyze table one_k3 persistent for all; The query plan will be: explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a; +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | 1 | SIMPLE | one_k2 | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where | | 1 | SIMPLE | one_k3 | ref | a | a | 5 | j22.one_k2.a | 10 | 100.00 | | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ Now, add a condition: explain extended select * from one_k2 straight_join one_k3 where one_k2.a=one_k3.a and one_k2.a<10; +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ | 1 | SIMPLE | one_k2 | ALL | NULL | NULL | NULL | NULL | 1000 | 0.99 | Using where | | 1 | SIMPLE | one_k3 | ref | a | a | 5 | j22.one_k2.a | 10 | 9.90 | | +------+-------------+--------+------+---------------+------+---------+--------------+------+----------+-------------+ one_k2 has values 1...1000, selectivity is 10/1000 = 1%. one_k3 has values 1...100. The shown selectivity is 10%. However, we fail to take into account that ref access only gives us rows that have one_k2.a=one_k3.a < 10. Because of that, the inferred condition "one_k3.a < 10" will always be true, and we should have filtered=100%.

Sergei Petrunia added a comment - 2014-04-25 23:00

About the initial example (the one not using ref access):

Debugging table_cond_selectivity(B)
Table B uses full table scan (i.e. not ref acces).
This means, sel=1.

Then, we go into the part of code that starts with
" If the field f from the table is equal to a field from one the..."

We find that B.a is a member in multiple_equality(B.a, A.a).
(B.a)->cond_selectivity=0.049, we execute

if (field->cond_selectivity > 0)
sel/= field->cond_selectivity;

and get
sel=20.19

Sergei Petrunia added a comment - 2014-04-25 23:00 About the initial example (the one not using ref access): Debugging table_cond_selectivity(B) Table B uses full table scan (i.e. not ref acces). This means, sel=1. Then, we go into the part of code that starts with " If the field f from the table is equal to a field from one the..." We find that B.a is a member in multiple_equality(B.a, A.a). (B.a)->cond_selectivity=0.049, we execute if (field->cond_selectivity > 0) sel/= field->cond_selectivity; and get sel=20.19

Sergei Petrunia added a comment - 2014-04-26 00:56

So.

1. When using full table scan, we must not try to discount field->cond_selectivity
from "sel". With full table scan, we start with sel=1, that is,
field->cond_selectivity was not in "sel" (it was dealt with in
matching_cond_for_table)

2. When using ref access, there are two possibilities:

2.1 A member of multiple-equality is a part of ref_key. In this case, the
condition for which field->cond_selectivity was produced (further $COND) will
be pulled to some preceding table and checked there. Use of ref access on
this table will guarantee that we will only get rows that already satisfy
$COND.

2.2 No member of multiple equality is a part of ref_key.
In this case, field->cond_selectivity must not be discounted.

Sergei Petrunia added a comment - 2014-04-26 00:56 So. 1. When using full table scan, we must not try to discount field->cond_selectivity from "sel". With full table scan, we start with sel=1, that is, field->cond_selectivity was not in "sel" (it was dealt with in matching_cond_for_table) 2. When using ref access, there are two possibilities: 2.1 A member of multiple-equality is a part of ref_key. In this case, the condition for which field->cond_selectivity was produced (further $COND) will be pulled to some preceding table and checked there. Use of ref access on this table will guarantee that we will only get rows that already satisfy $COND. 2.2 No member of multiple equality is a part of ref_key. In this case, field->cond_selectivity must not be discounted.

Sergei Petrunia added a comment - 2014-04-28 22:50

Debugging the example in the comment (with one_k3 table), I see another problem:

The code tries to discount for selectivity of "one_k3.a < 3". It reaches the lines

          if (field->cond_selectivity > 0)

            sel/= field->cond_selectivity;

However, field->cond_selectivity==1, because selectivity was obtained using range optimization.

Sergei Petrunia added a comment - 2014-04-28 22:50 Debugging the example in the comment (with one_k3 table), I see another problem: The code tries to discount for selectivity of "one_k3.a < 3". It reaches the lines if (field->cond_selectivity > 0) sel/= field->cond_selectivity; However, field->cond_selectivity==1, because selectivity was obtained using range optimization.

Sergei Golubchik made changes - 2014-05-10 23:12

Fix Version/s		10.0.12 [ 15201 ]
Fix Version/s	10.0.11 [ 15200 ]

Sergei Petrunia made changes - 2014-06-03 23:31

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Petrunia made changes - 2014-06-10 23:05

Link

This issue relates to ~~MDEV-6325~~ [ ~~MDEV-6325~~ ]

Sergei Petrunia added a comment - 2014-06-10 23:07

Current EITS optimization code has multiple problems. I will not try to fix them all at once.

Pushed the fix for the problem demonstrated by the example in this issue. The issue demonstrated in the comment dated 2014-04-25 20:29 is branched off into ~~MDEV-6325~~.

Sergei Petrunia added a comment - 2014-06-10 23:07 Current EITS optimization code has multiple problems. I will not try to fix them all at once. Pushed the fix for the problem demonstrated by the example in this issue. The issue demonstrated in the comment dated 2014-04-25 20:29 is branched off into MDEV-6325 .

Sergei Petrunia made changes - 2014-06-10 23:08

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Sergei Petrunia made changes - 2014-06-10 23:09

Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2014-06-13 15:06

Workflow

defaullt [ 37900 ]

MariaDB v2 [ 43827 ]

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow

MariaDB v2 [ 43827 ]

MariaDB v3 [ 63026 ]

Sergei Golubchik made changes - 2021-12-06 21:39

Workflow

MariaDB v3 [ 63026 ]

MariaDB v4 [ 147723 ]

People

Assignee:: Sergei Petrunia

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 2014-03-31 11:03

Updated:: 2014-06-10 23:09

Resolved:: 2014-06-10 23:09

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration