Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36055

RESEARCH: Optimise reorderable LEFT JOINs

    XMLWordPrintable

Details

    • Q2/2025 Development, Q3/2025 Server Development

    Description

      Consider a query starting with table t1 and doing two LEFT JOINs to it:

      set optimizer_trace=1;
       
      explain
      select * 
      from
        t1
        left join t3 on t3.key1=t1.a
        left join t2 on t2.key1=t1.a;
      

      +------+-------------+-------+------+-----------------+---------+---------+---------+------+-------------+
      | id   | select_type | table | type | possible_keys   | key     | key_len | ref     | rows | Extra       |
      +------+-------------+-------+------+-----------------+---------+---------+---------+------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL            | NULL    | NULL    | NULL    | 10   |             |
      |    1 | SIMPLE      | t3    | ref  | t3_key2         | t3_key2 | 5       | j1.t1.a | 100  | Using where |
      |    1 | SIMPLE      | t2    | ref  | t2_key1,t3_key1 | t2_key1 | 5       | j1.t1.a | 10   | Using where |
      +------+-------------+-------+------+-----------------+---------+---------+---------+------+-------------+
      

      The optimizer trace shows that t1-t3-t2 is the only acceptable join order:

                {
                  "table_dependencies": [
                    {
                      "table": "t1",
                      "row_may_be_null": false,
                      "map_bit": 0,
                      "depends_on_map_bits": []
                    },
                    {
                      "table": "t3",
                      "row_may_be_null": true,
                      "map_bit": 1,
                      "depends_on_map_bits": ["0"]
                    },
                    {
                      "table": "t2",
                      "row_may_be_null": true,
                      "map_bit": 2,
                      "depends_on_map_bits": ["0", "1"]
                    }
                  ]
      

      But is t1-t2-t3 order also allowed?

      Notes

      MySQL also has this limitation, and it seems so does PostgreSQL.

      Rephrasing the task

        t1
        left join t3 on t3.key1=t1.a
        left join t2 on t2.key1=t1.a
      

      equivalent to

        t1
        left join t2 on t2.key1=t1.a
        left join t3 on t3.key1=t1.a
      

      Q: More generally, given a JOIN, is it possible to change the optimizer such that it picks an equivalent reordered JOIN that is executed faster?
      A: yes, it seems to be easily possible. But when we do that, "pruned_by_heuristic" pruning starts to prune away good join orders in favor of ones with higher cost.

      Other details

      If this changes the way join optimization is done, we'll need a way to provide old behavior.

      Formally speaking

      A function P that preprocesses a join J, including noting down the table dependencies. It produces a join_tab, which is passed to greedy search G to find a good join order:

      G(P(J))

      Denote G1 and P1 to be the algorithm in the base. The basic patch under testing in MDEV-37340 corrects the table dependencies, so we now have a new preprocessor P2 such that

      G1(P2(J) is better than G1(P1(J)) for some J.

      The problem is that there exists join J such that

      G1(P2(J)) is worse than G1(P1(J)).

      Let S( n ) be the set of joins of at most n tables and S be the set of all joins. Apparently we could patch the greedy search to G2 such that for all J in S( n ) for small n, by disabling heuristic pruning for such J,

      G2(P2(J)) is no worse than G1(P1(J))

      The main question is now:

      Find a new G3 such that G3(P2(J)) is no worse than G1(P1(J)) for any join J in S, and still better than G1(P1(J)) for some J.

      More specifically, P2 expands the search space over P1. A greedy search G attempts to find

      argmin_{r in R(P(J))} cost(r),
      

      where R(P(J)) is the set of join orders allowed by P(J). By removing table dependencies, R(P1(J)) is a subset of R(P2(J)). All the permissable join orders are leaves of the search tree for the greedy search. So P2 adds branches and leaves compared to P1. The known problem is that for certain J's some "bad" branches are added, and G1 erroneously thinks these branches are better, and prunes "good" branches in R(P1(J)) instead.

      One idea could be to find a characterisation of the added "bad" branches, and make G remove them based on these characteristics.

      Attachments

        1. mdev-36055-testcase1.sql
          0.5 kB
          Sergei Petrunia
        2. mdev-36055-testcase3-same-fanout.sql
          0.8 kB
          Sergei Petrunia

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.