Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36813

DRAFT: FULL OUTER JOIN for single table

    XMLWordPrintable

Details

    • New Feature
    • Status: In Progress (View Workflow)
    • Major
    • Resolution: Unresolved
    • 12.2
    • Optimizer
    • None
    • Q4/2025 Server Development

    Description

      The idea is to do a simplified implementation of MDEV-13648.

      Phase 1: both sides of FULL OUTER JOIN operation must have one table (This also rules out nested full outer joins).
      Phase 2: Allow one side to have multiple tables (it will be put first into the join order),
      Phase 3: Allow multiple tables on both sides. This gets us to (or close to) to full MDEV-13648.

      The rest of the text describes Phase 1. There will be sections about Phase 2 and 3 at the end.

      1. Parser
      2. Name resolution/semantic analysis
      3. Optimizer
      3.1 Simplify_joins
      3.2 Table elimination
      3.3 Range analysis
      3.4 Constant table detection
      3.5 Join optimization
      3.5.1 Allowed join orders
      3.5.2 Access method limits
      3.5.3 Adding cost of null-complement scan
      3.5.4 Join plan data structure
      3.6 Plan refinement
      3.6.1 Condition for null-complement-scan
      4. Join Execution

      1. Parser

      Fix the parser. (an attempt to just add FULL OUTER JOIN alongside with LEFT and INNER and that causes some mysterious reduce/reduce conflicts: full-outer-join-parser-FAIL.diff ).

      The produced data structure for

        (t1 FULL OUTER JOIN t2 ON ... )
      

      TABLE_LIST(t1) will have a mark noting it's left table of full outer join
      TABLE_LIST(t2) will have a mark noting it's right table of full outer join
      and on_expr (like right tables of LEFT JOINs do).

      2. Name resolution/semantic analysis

      Should just happen.

      Make sure that tables on both sides of the FULL OUTER JOIN get table->maybe_null=1.

      3. Optimizer

      3.1 Simplify_joins

      Conversion from full outer to left/right/inner join will be done in Phase 2.

      This might be a good place to check that for any

        t1 FULL OUTER JOIN t2 
      

      the t1 and t2 are indeed base tables (or un-merged derived) and abort with an error if not.

      3.2 Table elimination

      In Phase 1: Make sure it's disabled for tables in a full-outer-join.

      3.3 Range analysis

      Consider

      select * 
      from 
        t1 full outer join t2 on on_cond
      where
        where_cond
      

      Here, we can use on_cond to construct table quick selects. We can't use where_cond with FULL OUTER JOIN.

      get_sargable_cond() implements this logic for LEFT JOIN. Need to modify it for FULL OUTER JOIN.

      3.4 Constant table detection

      If a table that's in full outer join is found to be const table... TODO (Don't mark it as such?
      We have some special processing for constant tables on the right side of outer joins)

      3.5 Join optimization

      (Should JOIN_TAB::dependent bits be set for full outer join peers? I think, no.)

      3.5.1 Allowed join orders

      If the query has

      t1 full outer join t2 on on_cond

      then t1 and t2 may only be together in the join order.
      It can be ... t1 t2 ... or ... t2 t1...

      Need to modify check_interleaving_with_nj() and JOIN::get_allowed_nj_tables()
      to enforce that.

      (Just have each table remember its full-outer-join peer, if any. If the last table
      in the join order has a peer, only allow that peer as the next table)

      3.5.2 Access method limits

      Disallow use of Join buffer (or hash join) for tables participating in full outer join.

      3.5.3 Adding cost of null-complement scan

      Suppose we have

      FROM ... (t10 full outer join t11 on cond) ...

      Then if we've built a join prefix of

      join_prefix t10 t11

      Then

      • Remove the table t11 from join prefix
      • Add cost/cardinality of the null-complement-scan.

      TODO: is null-complement-scan the best way to read t11 "on its own" or
      the best way to read "join_prefix t11" ??

      3.5.4 Join plan data structure

      Currently the join plan is an array of JOIN_TAB structures:

         [ JOIN_TAB_1 ... JOIN_TAB_N ]
      

      Some JOIN_TABs may be SJM-Nests and have pointers to child arrays.

      For full outer joins, we would generate nested join tabs:

         [ JOIN_TAB(...) ...  JOIN_TAB(t1) jt2a=JOIN_TAB(t2) ]
                                                   |
                                                   v
                                           jt2b=JOIN_TAB(t2, null-complement-scan)
      

      The primary join tab array describes how to execute

        ... t1 left join t2 ...
      

      but jt2a also has a pointer to single JOIN_TAB object jt2b.
      jt2b refers to the same table t2. It is set to execute the null-complement-scan.

      3.6 Plan refinement

      • Create a temporary table to track records that have been matches. We can reuse class SJ_TMP_TABLE for this.
      • In check_join_cache_usage, disallow use of Join buffer (or hash join) for tables participating in full outer join. Yes, we already did it in best_access_path but check_join_cache_usage() doesn't always follow the prescription.
      • If first non-constant table participates in full outer join, make test_if_skip_sort_order() not consider join output to be ordered.

      3.6.1 Condition for null-complement-scan

      If we are executing

        ... t1 full outer join t2 on on_expr ...
      

      and the join order is t1, t2, then we need to extract from on_expr a part of the condition that refers to table t2. We will use it in null-complement-scan.

      4. Join Execution

      So, we're executing

        ... t1 full outer join t2 on on_expr ...
      

      and the join order is t1,t2 with join_tab structure of:

         [ JOIN_TAB(...) ...  JOIN_TAB(t1) jt2a=JOIN_TAB(t2) ]
                                                   |
                                                   v
                                           jt2b=JOIN_TAB(t2, null-complement-scan)
      

      sj_weedout_delete_rows

      We should put a call sj_weedout_delete_rows() at the start of sub_select() for table t1.

      For table t2, evaluate_join_record() should have a call to write t2's rowid.

      Attachments

        Issue Links

          Activity

            People

              monty Michael Widenius
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.