Details
-
Task
-
Status: In Review (View Workflow)
-
Critical
-
Resolution: Unresolved
-
Q2/2025 Development, Q3/2025 Server Development
Description
Consider a query starting with table t1 and doing two LEFT JOINs to it:
set optimizer_trace=1; |
 |
explain
|
select * |
from
|
t1
|
left join t3 on t3.key1=t1.a |
left join t2 on t2.key1=t1.a; |
+------+-------------+-------+------+-----------------+---------+---------+---------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+-----------------+---------+---------+---------+------+-------------+
|
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | |
|
| 1 | SIMPLE | t3 | ref | t3_key2 | t3_key2 | 5 | j1.t1.a | 100 | Using where |
|
| 1 | SIMPLE | t2 | ref | t2_key1,t3_key1 | t2_key1 | 5 | j1.t1.a | 10 | Using where |
|
+------+-------------+-------+------+-----------------+---------+---------+---------+------+-------------+
|
The optimizer trace shows that t1-t3-t2 is the only acceptable join order:
{
|
"table_dependencies": [
|
{
|
"table": "t1",
|
"row_may_be_null": false,
|
"map_bit": 0,
|
"depends_on_map_bits": []
|
},
|
{
|
"table": "t3",
|
"row_may_be_null": true,
|
"map_bit": 1,
|
"depends_on_map_bits": ["0"]
|
},
|
{
|
"table": "t2",
|
"row_may_be_null": true,
|
"map_bit": 2,
|
"depends_on_map_bits": ["0", "1"]
|
}
|
]
|
But is t1-t2-t3 order also allowed?
Notes
MySQL also has this limitation, and it seems so does PostgreSQL.
Rephrasing the task
t1
|
left join t3 on t3.key1=t1.a |
left join t2 on t2.key1=t1.a |
equivalent to
t1
|
left join t2 on t2.key1=t1.a |
left join t3 on t3.key1=t1.a |
Q: More generally, given a JOIN, is it possible to change the optimizer such that it picks an equivalent reordered JOIN that is executed faster?
A: yes, it seems to be easily possible. But when we do that, "pruned_by_heuristic" pruning starts to prune away good join orders in favor of ones with higher cost.
Other details
If this changes the way join optimization is done, we'll need a way to provide old behavior.
Formally speaking
A function P that preprocesses a join J, including noting down the table dependencies. It produces a join_tab, which is passed to greedy search G to find a good join order:
G(P(J))
Denote G1 and P1 to be the algorithm in the base. The basic patch under testing in MDEV-37340 corrects the table dependencies, so we now have a new preprocessor P2 such that
G1(P2(J) is better than G1(P1(J)) for some J.
The problem is that there exists join J such that
G1(P2(J)) is worse than G1(P1(J)).
Let S( n ) be the set of joins of at most n tables and S be the set of all joins. Apparently we could patch the greedy search to G2 such that for all J in S( n ) for small n, by disabling heuristic pruning for such J,
G2(P2(J)) is no worse than G1(P1(J))
The main question is now:
Find a new G3 such that G3(P2(J)) is no worse than G1(P1(J)) for any join J in S, and still better than G1(P1(J)) for some J.
More specifically, P2 expands the search space over P1. A greedy search G attempts to find
argmin_{r in R(P(J))} cost(r),
|
where R(P(J)) is the set of join orders allowed by P(J). By removing table dependencies, R(P1(J)) is a subset of R(P2(J)). All the permissable join orders are leaves of the search tree for the greedy search. So P2 adds branches and leaves compared to P1. The known problem is that for certain J's some "bad" branches are added, and G1 erroneously thinks these branches are better, and prunes "good" branches in R(P1(J)) instead.
One idea could be to find a characterisation of the added "bad" branches, and make G remove them based on these characteristics.
Attachments
Issue Links
- blocks
-
MDEV-37274 Re-order independent LEFT JOINs
-
- Stalled
-
- relates to
-
MDEV-13817 add support for oracle outer join syntax - the ( + )
-
- Closed
-
- split to
-
MDEV-37340 Testing for MDEV-36055: Optimise reorderable LEFT JOINs
-
- Open
-
-
MDEV-37346 Join optimizer prunes good join orders
-
- Confirmed
-