Details
Description
Short
LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not.
LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time.
The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive.
Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous.
Long
(This is based on a real-world case but the dataset I am using is totally artificial)
Tables that will make a prefix before the lateral:
# 5 values
|
create table t1(a int, b int);
|
insert into t1 select seq,seq from seq_1_to_5;
|
|
# 5 value groups of size 2 each
|
create table t2(a int, b int, key(a));
|
insert into t2
|
select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B;
|
|
# 5 value groups of size 3 each
|
create table t3(a int, b int, key(a));
|
insert into t3
|
select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B;
|
|
analyze table t1,t2,t3 persistent for all;
|
explain
|
select * from
|
(t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b;
|
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
|
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | |
|
| 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where |
|
| 1 | SIMPLE | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where |
|
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
|
Now, tables for the LATERAL DERIVED:
create table t10 (
|
grp_id int,
|
col1 int,
|
key(grp_id)
|
);
|
|
# 100 groups of 100 values each
|
insert into t10
|
select
|
A.seq,
|
B.seq
|
from
|
seq_1_to_100 A,
|
seq_1_to_100 B;
|
|
# and X10 multiplier
|
create table t11 (
|
col1 int,
|
col2 int
|
);
|
insert into t11
|
select A.seq, A.seq from seq_1_to_10 A;
|
|
analyze table t10,t11 persistent for all;
|
Now, the query:
explain
|
select * from |
(
|
(t1 left join t2 on t2.a=t1.b) |
left join t3 on t3.a=t1.b |
) left join (select grp_id, count(*) |
from t10 left join t11 on t11.col1=t10.col1 |
group by grp_id) T on T.grp_id=t1.b; |
Note that the join is between T and t1, the first table. t1 has 5 rows. However, the optimizer assumes that the table is refilled t1xt2xt3: 30 times.
Because of that, it doesn't pick LATERAL DERIVED:
+------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 5 | |
|
| 1 | PRIMARY | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where |
|
| 1 | PRIMARY | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where |
|
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 5 | test.t1.b | 1000 | Using where |
|
| 2 | DERIVED | t10 | ALL | grp_id | NULL | NULL | NULL | 10000 | Using temporary; Using filesort |
|
| 2 | DERIVED | t11 | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (flat, BNL join) |
|
+------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+
|
Attachments
Issue Links
- causes
-
MDEV-29371 Rowid Filtering optimization can cause slowdowns
-
- Open
-
-
MDEV-31240 Crash with condition pushable into derived and containing outer reference
-
- Closed
-
-
MDEV-31241 Crashes in subselects in choose_best_splitting after upgrade to 10.6.13
-
- Closed
-
-
MDEV-31301 sql/opt_split.cc:1043:5: warning: ‘best_param_tables’ may be used uninitialized
-
- Closed
-
-
MDEV-31408 Second SELECT from VIEW based on information_schema.optimizer_trace gives NULL result
-
- Closed
-
-
MDEV-31440 choose_best_splitting: crash on update query using correlated subquery after minor update
-
- Confirmed
-
-
MDEV-32064 Crash when searching for the best split of derived table
-
- Closed
-
-
MDEV-32571 A specific query crashes MariaDB in choose_best_splitting
-
- Closed
-
-
MDEV-32602 Server crash after query
-
- Closed
-
-
MDEV-32624 Mariadb crashes with query
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to MENT-1259 [ MENT-1259 ] |
Assignee | Sergei Petrunia [ psergey ] |
Affects Version/s | 10.5 [ 23123 ] | |
Affects Version/s | 10.6 [ 24028 ] |
Component/s | Optimizer [ 10200 ] |
Fix Version/s | 10.5 [ 23123 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Labels | split_materialized |
Description |
LATERAL DERIVED optimization will refill the temp. table every time the derived table is accessed, regardless of whether the inputs were changed or not.
The cost calculations also make this assumption. Provided that the join optimizer typically over-estimates the partial join cardinalities, this makes the optimizer think that using LATERAL DERIVED optimization is more expensive than it actually is. |
h2. Short
LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not. LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time. The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive. Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous. h2. Long (This is based on a real-world case but the dataset I am using is totally artificial) Tables that will make a prefix before the lateral: {code} # 5 values create table t1(a int, b int); insert into t1 select seq,seq from seq_1_to_5; # 5 value groups of size 2 each create table t2(a int, b int, key(a)); insert into t2 select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B; # 5 value groups of size 3 each create table t3(a int, b int, key(a)); insert into t3 select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B; analyze table t1,t2,t3 persistent for all; {code} {code} explain select * from (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b; {code} {code} +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where | | 1 | SIMPLE | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ {code} |
Description |
h2. Short
LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not. LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time. The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive. Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous. h2. Long (This is based on a real-world case but the dataset I am using is totally artificial) Tables that will make a prefix before the lateral: {code} # 5 values create table t1(a int, b int); insert into t1 select seq,seq from seq_1_to_5; # 5 value groups of size 2 each create table t2(a int, b int, key(a)); insert into t2 select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B; # 5 value groups of size 3 each create table t3(a int, b int, key(a)); insert into t3 select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B; analyze table t1,t2,t3 persistent for all; {code} {code} explain select * from (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b; {code} {code} +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where | | 1 | SIMPLE | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ {code} |
h2. Short
LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not. LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time. The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive. Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous. h2. Long (This is based on a real-world case but the dataset I am using is totally artificial) Tables that will make a prefix before the lateral: {code} # 5 values create table t1(a int, b int); insert into t1 select seq,seq from seq_1_to_5; # 5 value groups of size 2 each create table t2(a int, b int, key(a)); insert into t2 select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B; # 5 value groups of size 3 each create table t3(a int, b int, key(a)); insert into t3 select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B; analyze table t1,t2,t3 persistent for all; {code} {code} explain select * from (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b; {code} {code} +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where | | 1 | SIMPLE | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ {code} Now, tables for the LATERAL DERIVED: {code} create table t10 ( grp_id int, col1 int, key(grp_id) ); # 100 groups of 100 values each insert into t10 select A.seq, B.seq from seq_1_to_100 A, seq_1_to_100 B; # and X10 multiplier create table t11 ( col1 int, col2 int ); insert into t11 select A.seq, A.seq from seq_1_to_10 A; analyze table t10,t11 persistent for all; {code} |
Description |
h2. Short
LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not. LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time. The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive. Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous. h2. Long (This is based on a real-world case but the dataset I am using is totally artificial) Tables that will make a prefix before the lateral: {code} # 5 values create table t1(a int, b int); insert into t1 select seq,seq from seq_1_to_5; # 5 value groups of size 2 each create table t2(a int, b int, key(a)); insert into t2 select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B; # 5 value groups of size 3 each create table t3(a int, b int, key(a)); insert into t3 select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B; analyze table t1,t2,t3 persistent for all; {code} {code} explain select * from (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b; {code} {code} +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where | | 1 | SIMPLE | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ {code} Now, tables for the LATERAL DERIVED: {code} create table t10 ( grp_id int, col1 int, key(grp_id) ); # 100 groups of 100 values each insert into t10 select A.seq, B.seq from seq_1_to_100 A, seq_1_to_100 B; # and X10 multiplier create table t11 ( col1 int, col2 int ); insert into t11 select A.seq, A.seq from seq_1_to_10 A; analyze table t10,t11 persistent for all; {code} |
h2. Short
LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not. LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time. The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive. Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous. h2. Long (This is based on a real-world case but the dataset I am using is totally artificial) Tables that will make a prefix before the lateral: {code} # 5 values create table t1(a int, b int); insert into t1 select seq,seq from seq_1_to_5; # 5 value groups of size 2 each create table t2(a int, b int, key(a)); insert into t2 select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B; # 5 value groups of size 3 each create table t3(a int, b int, key(a)); insert into t3 select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B; analyze table t1,t2,t3 persistent for all; {code} {code} explain select * from (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b; {code} {code} +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where | | 1 | SIMPLE | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ {code} Now, tables for the LATERAL DERIVED: {code} create table t10 ( grp_id int, col1 int, key(grp_id) ); # 100 groups of 100 values each insert into t10 select A.seq, B.seq from seq_1_to_100 A, seq_1_to_100 B; # and X10 multiplier create table t11 ( col1 int, col2 int ); insert into t11 select A.seq, A.seq from seq_1_to_10 A; analyze table t10,t11 persistent for all; {code} Now, the query: {code:sql} explain select * from ( (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b ) left join (select grp_id, count(*) from t10 left join t11 on t11.col1=t10.col1 group by grp_id) T on T.grp_id=t1.b; {code} Note that the join is between T and t1, the first table. t1 has 5 rows. However, the optimizer assumes that the table is refilled t1xt2xt3: 30 times. Because of that, it doesn't pick LATERAL DERIVED: {code} +------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | PRIMARY | t2 | ref | a | a | 5 | test.t1.b | 2 | Using where | | 1 | PRIMARY | t3 | ref | a | a | 5 | test.t1.b | 3 | Using where | | 1 | PRIMARY | <derived2> | ref | key0 | key0 | 5 | test.t1.b | 1000 | Using where | | 2 | DERIVED | t10 | ALL | grp_id | NULL | NULL | NULL | 10000 | Using temporary; Using filesort | | 2 | DERIVED | t11 | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (flat, BNL join) | +------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+ {code} |
Workflow | MariaDB v3 [ 124078 ] | MariaDB v4 [ 144608 ] |
Link | This issue causes MDEV-29371 [ MDEV-29371 ] |
Summary | LATERAL DERIVED refills the temp. table too many times | Split optimization refills temp. table too many times |
Summary | Split optimization refills temp. table too many times | Split optimization refills temporary table too many times |
Priority | Major [ 3 ] | Critical [ 2 ] |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] |
Affects Version/s | 10.4 [ 22408 ] |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Attachment | spetrunia-mdev26301-tracing.diff [ 69858 ] |
Link | This issue blocks TODO-3922 [ TODO-3922 ] |
Fix Version/s | 10.4.29 [ 28510 ] | |
Fix Version/s | 10.5.20 [ 28512 ] | |
Fix Version/s | 10.6.13 [ 28514 ] | |
Fix Version/s | 10.9.6 [ 28520 ] | |
Fix Version/s | 10.10.4 [ 28522 ] | |
Fix Version/s | 10.11.3 [ 28524 ] | |
Fix Version/s | 11.1.1 [ 28704 ] | |
Fix Version/s | 10.8.8 [ 28518 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link | This issue causes MDEV-31440 [ MDEV-31440 ] |
Link |
This issue causes |
Zendesk Related Tickets | 109159 |