Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34720

Poor plan choice for large JOIN with ORDER BY and small LIMIT

Details

    Description

      Task setting

      This is a subset of MDEV-8306 that hits the customer.
      They have

      select 
      ...
      from 
         big_table
         join small_table1 on small_table1.key=big_table.key
         join small_table2 on small_table2.key=big_table.key
         ... and so forth ...
      order by bug_table.key limit N
      

      for a very small value of N.
      The optimizer sets

        join->sort_by_table = (big_table);
      

      so join optimization adds extra costs for query plans that do not start from big_table (denote this TMP-TABLE-COST).

      But this doesn't help in all cases. Join orders of

         big_table, small_table1, small_table_2 ....
      

      and

         small_table1, big_table (using ref access), small_table2, ...
      

      have close costs. And indeed, the execution times for the two join orders (with ORDER BY ... LIMIT removed) are close.
      TMP-TABLE-COST is a small fraction of join cost and doesn't make a difference (this seems to match the reality, too).

      What does make a difference is short-cutting of execution due to LIMIT, but this is not taken into account.

      Doing it in general case requires everything from MDEV-8306.

      This MDEV is to have a fix for the join optimizer which adjusts the join order cost for cases with very small limit.
      The new behavior is to be controlled by a setting.
      If one runs OLTP-mostly workloads, enabling new behavior is an obvious choice.

      Implementation

      First, identify sort_by_table - the table that allows to use LIMIT short-cutting.
      (Due to multiple equalities there can be multiple but we can assume it's just one table)

      Do adjustment of cost/cardinality only for full join orders

      Let's assume we've built a complete join order (ignoring ORDER BY ... LIMIT).
      Having done that, we can compare join_output_size with the LIMIT number and see which "fraction" of join output we need and adjust the cost accordingly.

      (An alternative to this: try to account for ORDER BY ...LIMIT earlier, while building join prefixes. We don't try to do this).

      Adjusting join order cost and cardinality

      Do it as follows:
      1: If the join output is already produced in the required order, we can assume that $FRACT of join output will take $FRACT of the cost to produce. (Note: this is incorrect when using e.g. semi-join materialization or materialized derived tables. We ignore this fact for now).

      Otherwise, we have two options:

      2A: Change the access method for sort_by_table so that we do get rows in the desired order. After that, we can take a fraction of the cost like described above.

      2B. Pass sort_by_table to filesort. We will need to spend the entire cost of reading sort_by_table but then we will be able to short-cut joining with subsequent tables.

      Interplay with join optimizer pruning

      So, we will adjust the costs only for complete join orders. The adjustment will reduce the cost and output-records.

      Join pruning compares costs of prefix with full join order costs. Comparing un-adjusted costs with adjusted can produce wrong results.

      • we can prune un-adjusted prefix that will be adjusted

      Hooking into the join optimizer - variant 1

      First, run the join optimization as usual. This produces QUERY_PLAN1 and a tight upper bound of how many records the join would produce. (If it's smaller than LIMIT, short-cutting will probably not be beneficial).

      Save join->best_read and join->best_positions.

      Then, start the optimization again:
      set join->best_read= DBL_MAX,
      Run the join optimization with first table=sort_by_table.
      This produces QUERY_PLAN2 query plan that can take advantage of short-cutting.
      Normally it's more expensive than QUERY_PLAN1.

      Now, account for LIMIT short-cutting as specified in Adjusting join order's cost and cardinality section. This produces cost of QUERY_PLAN2-WITH-SHORTCUTTING. We can compare it with the cost of QUERY_PLAN1.

      Hooking into the join optimizer - variant 2

      Make the join optimizer first start with sort_table as the first table.
      Once we have constructed a full join order, perform #rows/cost adjustments for LIMIT.
      We can set the adjusted cost as join->best_read.

      Then, proceed to run the join optimizer as usual.

      Account for risks

      QUERY_PLAN2-WITH-SHORTCUTTING is inherently more "risky". For example, if the join has a condition with selectivity=0.5 that the optimizer is not aware of, the actual cost of producing #LIMIT rows will be 1/0.5=2 times higher.

      To account for this, we will add a multiplier, e.g. use QUERY_PLAN2-WITH-SHORTCUTTING only if promises a 100x speedup over QUERY_PLAN1.

      Attachments

        Issue Links

          Activity

            psergei, was a 10.11 version of this implemented, tested and reviewed? There are conflicts when attempting to merge this to 10.11.

            marko Marko Mäkelä added a comment - psergei , was a 10.11 version of this implemented, tested and reviewed? There are conflicts when attempting to merge this to 10.11.

            marko it should be similar. let me look.

            psergei Sergei Petrunia added a comment - marko it should be similar. let me look.
            ycp Yuchen Pei added a comment - - edited

            marko and psergei: I also noticed some non-trivial conflict when trying to merge 10.6->10.11 today, mainly with the following commits:

            commit b3c74bdc1f09b2b8babd7f2bd9d52df2749ddcc3
            Author: Monty <monty@mariadb.org>
            Date:   Tue May 31 17:36:32 2022 +0300
             
                Improve pruning in greedy_search by sorting tables during search
                
                MDEV-28073 Slow query performance in MariaDB when using many tables
            ...
            commit 8c2faad576d6a77314e92755a389de2c41e21242 (github/bb-10.10-optimizer-features)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Tue Jul 19 14:13:17 2022 +0300
             
                MDEV-28929: Plan selection takes forever with MDEV-28852 ...
                Part #2: Extend heuristic pruning to use multiple tables as the
                "Model tables".
            ...
            

            The nontrivial conflict is this one in sql_select.cc:

            @@@ -10738,98 -10853,31 +11096,109 @@@ best_extension_by_limited_search(JOI
              
                DBUG_EXECUTE("opt", print_plan(join, idx, record_count, read_time, read_time,
                                              "part_plan"););
            # [... 32 lines elided]
            <<<<<<< A: HEAD
            # [... 57 lines elided]
              for (SORT_POSITION *pos= sort ; pos < sort_end ; pos++)
              {
                s= *pos->join_tab;
                if (!(found_eq_ref_tables & s->table->map) &&
                    !check_interleaving_with_nj(s))
            ||||||| Ancestor
                if ((allowed_tables & real_table_bit) &&
                    !(remaining_tables & s->dependent) &&
                    !check_interleaving_with_nj(s))
            =======
                if ((allowed_tables & real_table_bit) &&
                    !(remaining_tables & s->dependent) &&
                    !check_interleaving_with_nj(s) &&
                    join_limit_shortcut_allows_table(join, idx, s))
            >>>>>>> B: Incoming
                  {
             +      table_map real_table_bit= s->table->map;
                    double current_record_count, current_read_time;
                    double partial_join_cardinality;
             -      POSITION *position= join->positions + idx;
             -      POSITION loose_scan_pos;
             +      POSITION *position= join->positions + idx, *loose_scan_pos;
                    Json_writer_object trace_one_table(thd);
            

            Can you advise how to resolve it, psergei?

            ycp Yuchen Pei added a comment - - edited marko and psergei : I also noticed some non-trivial conflict when trying to merge 10.6->10.11 today, mainly with the following commits: commit b3c74bdc1f09b2b8babd7f2bd9d52df2749ddcc3 Author: Monty <monty@mariadb.org> Date: Tue May 31 17:36:32 2022 +0300   Improve pruning in greedy_search by sorting tables during search MDEV-28073 Slow query performance in MariaDB when using many tables ... commit 8c2faad576d6a77314e92755a389de2c41e21242 (github/bb-10.10-optimizer-features) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Jul 19 14:13:17 2022 +0300   MDEV-28929: Plan selection takes forever with MDEV-28852 ... Part #2: Extend heuristic pruning to use multiple tables as the "Model tables". ... The nontrivial conflict is this one in sql_select.cc: @@@ -10738,98 -10853,31 +11096,109 @@@ best_extension_by_limited_search(JOI DBUG_EXECUTE("opt", print_plan(join, idx, record_count, read_time, read_time, "part_plan");); # [... 32 lines elided] <<<<<<< A: HEAD # [... 57 lines elided] for (SORT_POSITION *pos= sort ; pos < sort_end ; pos++) { s= *pos->join_tab; if (!(found_eq_ref_tables & s->table->map) && !check_interleaving_with_nj(s)) ||||||| Ancestor if ((allowed_tables & real_table_bit) && !(remaining_tables & s->dependent) && !check_interleaving_with_nj(s)) ======= if ((allowed_tables & real_table_bit) && !(remaining_tables & s->dependent) && !check_interleaving_with_nj(s) && join_limit_shortcut_allows_table(join, idx, s)) >>>>>>> B: Incoming { + table_map real_table_bit= s->table->map; double current_record_count, current_read_time; double partial_join_cardinality; - POSITION *position= join->positions + idx; - POSITION loose_scan_pos; + POSITION *position= join->positions + idx, *loose_scan_pos; Json_writer_object trace_one_table(thd); Can you advise how to resolve it, psergei ?

            I just pushed a merge of this to 10.11 using this for reference. There will be the following conflicts left for a merge to 11.2:

            	both modified:   mysql-test/main/mysqld--help.result
            	both modified:   mysql-test/suite/sys_vars/r/sysvars_server_embedded.result
            	both modified:   mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
            	both modified:   sql/sql_select.cc
            

            I didn’t check if there would be further conflicts for a merge to 11.4.

            marko Marko Mäkelä added a comment - I just pushed a merge of this to 10.11 using this for reference. There will be the following conflicts left for a merge to 11.2: both modified: mysql-test/main/mysqld--help.result both modified: mysql-test/suite/sys_vars/r/sysvars_server_embedded.result both modified: mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result both modified: sql/sql_select.cc I didn’t check if there would be further conflicts for a merge to 11.4.

            ycp, I see that you must have been trying to merge 10.6→10.11 in order to resolve conflicts that your recently pushed changes introduced. You should be able to continue with that now.

            marko Marko Mäkelä added a comment - ycp , I see that you must have been trying to merge 10.6→10.11 in order to resolve conflicts that your recently pushed changes introduced. You should be able to continue with that now.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.