Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Description
InnoDB's records_in_range estimates seem to be capped at ~50% of the table. (We used to observe this on various occasions before but I haven't been able to find an MDEV for this).
If I pass a range that contains a bigger fraction of the table, the estimated number of rows is still around 50% of the total rows in the table.
Test dataset:
create table ten(a int); |
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); |
|
create table one_k(a int); |
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; |
|
create table t1 ( |
a int, |
b varchar(32), |
c int, |
key(a), |
key(b) |
) engine=innodb;
|
# 100K NULLs
|
insert into t1 select |
null, |
null, |
A.a + 1000*B.a
|
from
|
one_k A,
|
one_k B
|
where |
B.a<100;
|
|
# 900 K non-NULLs
|
insert into t1 select |
A.a + 1000*B.a,
|
A.a + 1000*B.a,
|
A.a + 1000*B.a
|
from
|
one_k A,
|
one_k B
|
where |
B.a>=100;
|
Now, both a IS NOT NULL or b IS NOT NULL match 900K rows (90% of the table).
But EXPLAIN will show the estimates of about 500K rows, which is 50% of the table:
explain select * from t1 force index (a) where a is not null ;
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|
| 1 | SIMPLE | t1 | range | a | a | 5 | NULL | 494308 | Using index condition |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|
explain select * from t1 force index (b) where b is not null ;
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|
| 1 | SIMPLE | t1 | range | b | b | 35 | NULL | 494308 | Using index condition |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|
When the optimizer was simple, this property was not a problem.
A simple optimizer would only use range estimates to construct range access. Range access is cheaper than full table if it covers about 30% of the table. Returning 50% of the table instead of 90% was not an issue.
A more advanced optimizer also attempts to use range estimates for condition selectivity, etc. Here, returning 50% selectivity instead of 90% is a problem. (One must take into account that selectivity is computed for multiple indexes. For example, for 5 indexes 0.5^2= 1/32 . 32x under-estimation of selectivity)
Attachments
Issue Links
- relates to
-
MDEV-21895 Refactor handler::records_in_range()
- Stalled
-
MDEV-15777 Use inferred IS NOT NULL predicates in the range optimizer
- Closed
-
MDEV-17111 Estimates are low for a condition that satisfies all the rows of an INNODB table
- Open
-
MDEV-21136 InnoDB's records_in_range estimates can be way off
- Closed