[MDEV-4362] Selectivity estimates for IN (...) do not depend on whether the values are in range - Jira

XML

Word

Printable

Create the dataset:

create table t5 (col1 int);

set @a=-1;

create table one_k (a int)  select (@a:=@a+1) as a from information_schema.session_variables A, information_schema.session_variables B limit 1000;

insert into t5 select A.a from one_k A, one_k B where A.a < 100 and B.a < 100;

set histogram_size=100;

analyze table t5 persistent for all;

select *, hex(histogram) from mysql.column_stats where table_name='t5'\G

*************************** 1. row ***************************

       db_name: j10

    table_name: t5

   column_name: col1

     min_value: 0

     max_value: 99

   nulls_ratio: 0.0000

    avg_length: 4.0000

 avg_frequency: 100.0000

     hist_size: 100

     hist_type:

     histogram:  (100 bytes here)

Ok, so we've got a table with 100 rows of 0, 100 rows of 1, and so forth up to 99.

Let's see how estimating works:

MariaDB [j10]> explain extended select * from t5 where col1 in (1,2,3);

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

|    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.79 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

1 row in set, 1 warning (10.64 sec)

The real selectivity is 3%, we've got 3.79. Good.

Now, let's try values that are certainly not in the table:

MariaDB [j10]> explain extended select * from t5 where col1 in (-1,-2,-3);

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

|    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.79 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

OOps, again 3.79%.

Let's see what non-equality range shows:

MariaDB [j10]> explain extended select * from t5 where col1<=-1;

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

|    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     0.99 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+

1%. It's better.

I consider selecitivity obtained for "where col1 in (-1,-2,-3)" to be a bug.

relates to

MDEV-4145 Take into account the selectivity of single-table range predicates on non-indexed columns when searching for the best execution plan

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.