Details
-
New Feature
-
Status: In Progress (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
Q4/2025 Server Development
Description
The idea is to do a simplified implementation of MDEV-13648.
Phase 1: both sides of FULL OUTER JOIN operation must have one table (This also rules out nested full outer joins).
Phase 2: Allow one side to have multiple tables (it will be put first into the join order),
Phase 3: Allow multiple tables on both sides. This gets us to (or close to) to full MDEV-13648.
The rest of the text describes Phase 1. There will be sections about Phase 2 and 3 at the end.
1. Parser
2. Name resolution/semantic analysis
3. Optimizer
3.1 Simplify_joins
3.2 Table elimination
3.3 Range analysis
3.4 Constant table detection
3.5 Join optimization
3.5.1 Allowed join orders
3.5.2 Access method limits
3.5.3 Adding cost of null-complement scan
3.5.4 Join plan data structure
3.6 Plan refinement
3.6.1 Condition for null-complement-scan
4. Join Execution
1. Parser
Fix the parser. (an attempt to just add FULL OUTER JOIN alongside with LEFT and INNER and that causes some mysterious reduce/reduce conflicts: full-outer-join-parser-FAIL.diff ).
The produced data structure for
(t1 FULL OUTER JOIN t2 ON ... )
|
TABLE_LIST(t1) will have a mark noting it's left table of full outer join
TABLE_LIST(t2) will have a mark noting it's right table of full outer join
and on_expr (like right tables of LEFT JOINs do).
2. Name resolution/semantic analysis
Should just happen.
Make sure that tables on both sides of the FULL OUTER JOIN get table->maybe_null=1.
3. Optimizer
3.1 Simplify_joins
Conversion from full outer to left/right/inner join will be done in Phase 2.
This might be a good place to check that for any
t1 FULL OUTER JOIN t2
|
the t1 and t2 are indeed base tables (or un-merged derived) and abort with an error if not.
3.2 Table elimination
In Phase 1: Make sure it's disabled for tables in a full-outer-join.
3.3 Range analysis
Consider
select *
|
from
|
t1 full outer join t2 on on_cond
|
where
|
where_cond
|
Here, we can use on_cond to construct table quick selects. We can't use where_cond with FULL OUTER JOIN.
get_sargable_cond() implements this logic for LEFT JOIN. Need to modify it for FULL OUTER JOIN.
3.4 Constant table detection
If a table that's in full outer join is found to be const table... TODO (Don't mark it as such?
We have some special processing for constant tables on the right side of outer joins)
3.5 Join optimization
(Should JOIN_TAB::dependent bits be set for full outer join peers? I think, no.)
3.5.1 Allowed join orders
If the query has
t1 full outer join t2 on on_cond
then t1 and t2 may only be together in the join order.
It can be ... t1 t2 ... or ... t2 t1...
Need to modify check_interleaving_with_nj() and JOIN::get_allowed_nj_tables()
to enforce that.
(Just have each table remember its full-outer-join peer, if any. If the last table
in the join order has a peer, only allow that peer as the next table)
3.5.2 Access method limits
Disallow use of Join buffer (or hash join) for tables participating in full outer join.
3.5.3 Adding cost of null-complement scan
Suppose we have
FROM ... (t10 full outer join t11 on cond) ...
Then if we've built a join prefix of
join_prefix t10 t11
Then
- Remove the table t11 from join prefix
- Add cost/cardinality of the null-complement-scan.
TODO: is null-complement-scan the best way to read t11 "on its own" or
the best way to read "join_prefix t11" ??
3.5.4 Join plan data structure
Currently the join plan is an array of JOIN_TAB structures:
[ JOIN_TAB_1 ... JOIN_TAB_N ]
|
Some JOIN_TABs may be SJM-Nests and have pointers to child arrays.
For full outer joins, we would generate nested join tabs:
[ JOIN_TAB(...) ... JOIN_TAB(t1) jt2a=JOIN_TAB(t2) ]
|
|
|
v
|
jt2b=JOIN_TAB(t2, null-complement-scan)
|
The primary join tab array describes how to execute
... t1 left join t2 ...
|
but jt2a also has a pointer to single JOIN_TAB object jt2b.
jt2b refers to the same table t2. It is set to execute the null-complement-scan.
3.6 Plan refinement
- Create a temporary table to track records that have been matches. We can reuse class SJ_TMP_TABLE for this.
- In check_join_cache_usage, disallow use of Join buffer (or hash join) for tables participating in full outer join. Yes, we already did it in best_access_path but check_join_cache_usage() doesn't always follow the prescription.
- If first non-constant table participates in full outer join, make test_if_skip_sort_order() not consider join output to be ordered.
3.6.1 Condition for null-complement-scan
If we are executing
... t1 full outer join t2 on on_expr ...
|
and the join order is t1, t2, then we need to extract from on_expr a part of the condition that refers to table t2. We will use it in null-complement-scan.
4. Join Execution
So, we're executing
... t1 full outer join t2 on on_expr ...
|
and the join order is t1,t2 with join_tab structure of:
[ JOIN_TAB(...) ... JOIN_TAB(t1) jt2a=JOIN_TAB(t2) ]
|
|
|
v
|
jt2b=JOIN_TAB(t2, null-complement-scan)
|
sj_weedout_delete_rows
We should put a call sj_weedout_delete_rows() at the start of sub_select() for table t1.
For table t2, evaluate_join_record() should have a call to write t2's rowid.
Attachments
Issue Links
- is part of
-
MDEV-13648 Add FULL OUTER JOIN to MariaDB
-
- In Progress
-