Details
-
Technical task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
11.4
-
None
-
None
-
Q1/2026 Server Development, Q1/2026 Server Maintenance
Description
(split from MDEV-38129)
When estimating number of rows produced by a join after `ref` access (i.e, lookups into a non-unique index), the optimizer assumes all driving table ( * ) values will find matches in the inner table ( ** ). This causes overestimation when the driving table has more distinct values than the inner table's key.
( * ) driving table is the one preceding in the join order
( ** ) inner table is the one following the driving table in the join order
Example
CREATE TABLE t_driving (a INT); -- 100 rows, 100 distinct values |
CREATE TABLE t_inner (a INT, KEY(a)); -- 300 rows, 3 distinct values (0,1,2) |
INSERT INTO t_driving SELECT seq FROM seq_1_to_100; |
INSERT INTO t_inner SELECT seq % 3 FROM seq_1_to_300; |
SELECT * FROM t_driving JOIN t_inner ON t_driving.a = t_inner.a; |
rec_per_key for KEY(t_inner.a) == 100.
Current estimate: 100 records in `t_driving` * 100 `rec_per_key` = 10,000 rows.
Actual: Only values 0,1,2 match => 300 rows.