Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Description
InnoDB's records_in_range estimates seem to be capped at ~50% of the table. (We have observed this on various occasions before, but I haven't been able to find an MDEV for it.)
If I pass a range that covers a bigger fraction of the table, the estimated number of rows is still around 50% of the total rows in the table.
Test dataset:
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a*10 + C.a*100 from ten A, ten B, ten C;

create table t1 (
  a int,
  b varchar(32),
  c int,
  key(a),
  key(b)
) engine=innodb;

# 100K NULLs
insert into t1 select
  null,
  null,
  A.a + 1000*B.a
from
  one_k A,
  one_k B
where
  B.a < 100;

# 900K non-NULLs
insert into t1 select
  A.a + 1000*B.a,
  A.a + 1000*B.a,
  A.a + 1000*B.a
from
  one_k A,
  one_k B
where
  B.a >= 100;
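As a sanity check on the dataset (a quick query, assuming the inserts above completed), t1 should contain 1M rows, of which 100K have NULL in columns a and b:

select count(*) as total_rows,             -- 1000000
       count(*) - count(a) as null_rows    -- 100000 (rows where a is NULL)
from t1;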
Now, both a IS NOT NULL and b IS NOT NULL match 900K rows (90% of the table).
But EXPLAIN shows estimates of about 500K rows, i.e. 50% of the table:
explain select * from t1 force index (a) where a is not null;

+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|    1 | SIMPLE      | t1    | range | a             | a    | 5       | NULL | 494308 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+

explain select * from t1 force index (b) where b is not null;

+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
|    1 | SIMPLE      | t1    | range | b             | b    | 35      | NULL | 494308 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
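The actual counts can be checked directly (assuming the test dataset above was loaded; both conditions match 900K of the 1M rows):

select count(*) from t1 where a is not null;  -- 900000, vs. the 494308 estimate
select count(*) from t1 where b is not null;  -- 900000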
|
When the optimizer was simple, this property was not a problem. A simple optimizer used range estimates only to construct range accesses, and a range access is cheaper than a full table scan only when it covers up to roughly 30% of the table, so an estimate of 50% of the table instead of 90% did not change the plan.
A more advanced optimizer also uses range estimates for condition selectivity computations, and there an estimate of 50% instead of 90% is a problem. One must take into account that selectivity is computed and multiplied across multiple indexes: for example, with 5 such indexes the combined estimate is 0.5^5 = 1/32 ≈ 0.03 instead of the true 0.9^5 ≈ 0.59, an under-estimation of roughly 19x.
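To make the compounding effect concrete, here is a minimal back-of-the-envelope check (a sketch, assuming five independent indexes whose true per-index selectivity of 0.9 is each estimated as 0.5):

select pow(0.9, 5) as true_combined_selectivity,       -- ~ 0.5905
       pow(0.5, 5) as estimated_combined_selectivity,  -- = 0.03125 (1/32)
       pow(0.9, 5) / pow(0.5, 5) as underestimation;   -- ~ 18.9x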
Attachments
Issue Links
- relates to:
  - MDEV-21895 Refactor handler::records_in_range() (Stalled)
  - MDEV-15777 Use inferred IS NOT NULL predicates in the range optimizer (Closed)
  - MDEV-17111 Estimates are low for a condition that satisfies all the rows of an INNODB table (Closed)
  - MDEV-21136 InnoDB's records_in_range estimates can be way off (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Description | (edited three times; the revisions refined the test case and added the optimizer discussion, resulting in the text shown above) | |
Link | This issue relates to |
Assignee | Sachin Setiya [ sachin.setiya.007 ] |
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] |
Link | This issue relates to |
Labels | records_in_range |
Link | This issue relates to |
Link | This issue relates to MDEV-21895 [ MDEV-21895 ] |
Fix Version/s | 10.1 [ 16100 ] |
Workflow | MariaDB v3 [ 96631 ] | MariaDB v4 [ 141284 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] | Sergei Petrunia [ psergey ] |
Fix Version/s | 10.2 [ 14601 ] |
Fix Version/s | 10.3 [ 22126 ] |
Fix Version/s | 11.0.1 [ 28548 ] | |
Fix Version/s | 10.4(EOL) [ 22408 ] | |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
According to marko, this part of the InnoDB code has not been touched by anyone for a very long time.