Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6
Description
Discovered this when analyzing poor performance in TODO-4858.
MySQL/ MariaDB has code that tries to adjust estimated #rows for ref access based on available quick select estimates for the same index. (Grep for ReuseRangeEstimateForRef in sql_select.cc).
This code may malfunction for multi-part indexes. The estimate from range access is not reused. If the data distribution is not uniform, this causes the optimizer to use a very wrong estimate and so pick a bad query plan.
Testcase:
create table t0 ( |
a int, |
b int, |
dummy int |
);
|
insert into t0 select seq,seq,seq from seq_1_to_10; |
|
create table t1 ( |
pk1 int, |
pk2 int, |
pk3 int, |
key1 int, |
key(key1), |
filler char(100), |
primary key(pk1,pk2,pk3) |
);
|
|
insert into t1 |
select
|
seq, seq, seq,
|
FLOOR(seq/2),
|
'filler-data' |
from seq_1_to_10000; |
analyze table t1; |
update t1 set pk1=1 where pk1 between 1 and 200; |
create table t2 ( |
col int |
);
|
insert into t2 select seq from seq_1_to_10000; |
On average, there is just one row with with t1.pk1=... :
MariaDB [test]> explain select * from t0,t1 where t1.pk1=t0.a;
|
+------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
|
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | test.t0.a | 1 | |
|
+------+-------------+-------+------+---------------+---------+---------+-----------+------+-------------+
|
However the value of pk1=1 is an outlier with 200 rows:
MariaDB [test]> explain select * from t1 where pk1=1;
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
|
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------+
|
In basic case, the optimizer can see this and use key1 instead of PK:
MariaDB [test]> explain select * from t0, t1, t2 where t1.pk1=1 and t1.key1=t0.b;
|
+------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
|
| 1 | SIMPLE | t1 | ref | PRIMARY,key1 | key1 | 9 | test.t0.b,const | 1 | |
|
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+------+---------+-----------------+------+------------------------------------+
|
If we force use of PK, #rows shows 200:
MariaDB [test]> explain select * from t0, t1 FORCE INDEX(PRIMARY), t2 where t1.pk1=1 and t1.key1=t0.b;
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
|
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | const | 200 | Using where |
|
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+------------------------------------+
|
However if we make situation convoluted by adding unusable conditions on pk2 and pk3:
explain select * from t0, t1, t2 |
where
|
t1.pk1=1 and t1.pk2=t2.col and t1.pk3=t0.dummy and |
t1.key1=t0.b;
|
we'll get key=PRIMARY, rows=1:
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
|
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
|
| 1 | SIMPLE | t1 | ref | PRIMARY,key1 | PRIMARY | 4 | const | 1 | Using where |
|
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------------------------------------------+
|