Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6
Description
Discovered this when analyzing poor performance in TODO-4858.
MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).
This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.
Testcase:
create table t0 ( |
a int, |
b int, |
dummy int |
);
|
insert into t0 select seq,seq,seq from seq_1_to_10; |
|
create table t1 ( |
pk1 int, |
pk2 int, |
pk3 int, |
key1 int, |
key(key1), |
filler char(100), |
primary key(pk1,pk2,pk3) |
);
|
|
insert into t1 |
select
|
seq, seq, seq,
|
FLOOR(seq/2),
|
'filler-data' |
from seq_1_to_10000; |
analyze table t1; |
update t1 set pk1=1 where pk1 between 1 and 200; |
create table t2 ( |
col int |
);
|
insert into t2 select seq from seq_1_to_10000; |
On average, there is just one row with with t1.pk1=... :
MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
|
+------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
|
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | |
|
+------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
|
However the value of pk1=1 is an outlier with 200 rows:
MariaDB [test]> explain select * from t1 where pk1=1;
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
|
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
|
In basic case, the optimizer can see this and use key1 instead of PK:
MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b;
|
+------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
|
| 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | |
|
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
|
If we force use of PK, #rows shows 200:
MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b;
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
|
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where |
|
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
|
However if we make situation convoluted by adding unusable conditions on pk2 and pk3:
explain select * from t0, t1, t2 |
where
|
t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and |
t1.key1=t0.b;
|
we'll get key=PRIMARY, rows=1:
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
|
| 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where |
|
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
|
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to TODO-4858 [ TODO-4858 ] |
Description |
Discovered this when analyzing poor performance in TODO-4858.
MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc). This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan. Testcase: {code} create table t0 ( a int, b int, dummy int ); insert into t0 select seq,seq,seq from seq_1_to_10; create table t1 ( pk1 int, pk2 int, pk3 int, key1 int, key(key1), filler char(100), primary key(pk1,pk2,pk3) ); insert into t1 select seq, seq, seq, FLOOR(seq/2), 'filler-data' from seq_1_to_10000; analyze table t1; update t1 set pk1=1 where pk1 between 1 and 200; create table t2 ( col int ); insert into t2 select seq from seq_1_to_10000; {code} |
Discovered this when analyzing poor performance in TODO-4858.
MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc). This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan. Testcase: {code:sql} create table t0 ( a int, b int, dummy int ); insert into t0 select seq,seq,seq from seq_1_to_10; create table t1 ( pk1 int, pk2 int, pk3 int, key1 int, key(key1), filler char(100), primary key(pk1,pk2,pk3) ); insert into t1 select seq, seq, seq, FLOOR(seq/2), 'filler-data' from seq_1_to_10000; analyze table t1; update t1 set pk1=1 where pk1 between 1 and 200; create table t2 ( col int ); insert into t2 select seq from seq_1_to_10000; {code} On average, there is just one row with with t1.pk1=... : {code} MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a; +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | | +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ {code} However the value of pk1=1 is an outlier with 200 rows: {code} MariaDB [test]> explain select * from t1 where pk1=1; +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ {code} In basic case, the optimizer can see this and use key1 instead of PK: {code} MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b; +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ {code} If we force use of PK, #rows shows 200: {code} MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b; +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ {code} However if we make situation convoluted by adding unusable conditions on pk2 and pk3: {code:sql} explain select * from t0, t1, t2 where t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and t1.key1=t0.b; we'll get key=PRIMARY, rows=1: {code} {code} +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ {code} |
Description |
Discovered this when analyzing poor performance in TODO-4858.
MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc). This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan. Testcase: {code:sql} create table t0 ( a int, b int, dummy int ); insert into t0 select seq,seq,seq from seq_1_to_10; create table t1 ( pk1 int, pk2 int, pk3 int, key1 int, key(key1), filler char(100), primary key(pk1,pk2,pk3) ); insert into t1 select seq, seq, seq, FLOOR(seq/2), 'filler-data' from seq_1_to_10000; analyze table t1; update t1 set pk1=1 where pk1 between 1 and 200; create table t2 ( col int ); insert into t2 select seq from seq_1_to_10000; {code} On average, there is just one row with with t1.pk1=... : {code} MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a; +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | | +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ {code} However the value of pk1=1 is an outlier with 200 rows: {code} MariaDB [test]> explain select * from t1 where pk1=1; +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ {code} In basic case, the optimizer can see this and use key1 instead of PK: {code} MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b; +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ {code} If we force use of PK, #rows shows 200: {code} MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b; +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ {code} However if we make situation convoluted by adding unusable conditions on pk2 and pk3: {code:sql} explain select * from t0, t1, t2 where t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and t1.key1=t0.b; we'll get key=PRIMARY, rows=1: {code} {code} +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ {code} |
Discovered this when analyzing poor performance in TODO-4858.
MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc). This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan. Testcase: {code:sql} create table t0 ( a int, b int, dummy int ); insert into t0 select seq,seq,seq from seq_1_to_10; create table t1 ( pk1 int, pk2 int, pk3 int, key1 int, key(key1), filler char(100), primary key(pk1,pk2,pk3) ); insert into t1 select seq, seq, seq, FLOOR(seq/2), 'filler-data' from seq_1_to_10000; analyze table t1; update t1 set pk1=1 where pk1 between 1 and 200; create table t2 ( col int ); insert into t2 select seq from seq_1_to_10000; {code} On average, there is just one row with with t1.pk1=... : {code} MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a; +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | | +------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+ {code} However the value of pk1=1 is an outlier with 200 rows: {code} MariaDB [test]> explain select * from t1 where pk1=1; +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------+ {code} In basic case, the optimizer can see this and use key1 instead of PK: {code} MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b; +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+ {code} If we force use of PK, #rows shows 200: {code} MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b; +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+ {code} However if we make situation convoluted by adding unusable conditions on pk2 and pk3: {code:sql} explain select * from t0, t1, t2 where t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and t1.key1=t0.b; {code} we'll get key=PRIMARY, rows=1: {code} +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where | | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+ {code} |
Status | Open [ 1 ] | In Progress [ 3 ] |
Fix Version/s | 10.6.20 [ 29903 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Fix Version/s | 10.11.10 [ 29904 ] | |
Fix Version/s | 11.2.6 [ 29906 ] | |
Fix Version/s | 11.4.4 [ 29907 ] |
Fix Version/s | 11.6.2 [ 29908 ] |
Labels | cardinality |