[MDEV-28929] Plan selection takes forever with MDEV-28852 comparing to reasonable time without Created: 2022-06-22  Updated: 2022-07-30  Resolved: 2022-07-20

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 10.10.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File chart03-reasons-for-not-pruning.png     PNG File chart04-improved-pruning.png     PNG File chart04-pruning-in-table0.png     PNG File chart05-improved-pruning-v3-bap.png     PNG File chart06-improved-pruning-v3.png     PNG File chart1-best-extensions-counts.png     PNG File chart2-best-extensions-grows.png     File psergey-mdev28929-improved-pruning-v3.diff     PNG File slide01.png    
Issue Links:
Problem/Incident
is caused by MDEV-28928 Plan selection takes 2+ times longer ... Closed
Relates
relates to MDEV-28852 Improve optimization of joins with ma... Closed

 Description   

By "forever" in the summary I mean that I have never seen it finish, the longest I waited before interrupting was ~1 hour.

The test uses all default server startup options, loads original data set from MDEV-28073 (xpo_full_testcase.sql), runs ANALYZE PERSISTENT FOR ALL on all tables, and executes EXPLAIN for a query structured identically to MDEV-28073 (minus a couple of last JOINs), but the tables and fields in the query are mixed up - that is, different tables and fields from the dataset appear in different JOIN positions and conditions. The exact query will be provided.

A slight abnormality of this particular query is that one of ON conditions compares a datetime field to a string literal (not convertible into a proper date time), and thus produces Truncated incorrect datetime value warning (ER_TRUNCATED_WRONG_VALUE). I'm not sure whether it's a decisive factor for the problem, but it's possible. Except for this condition, the query is identical to MDEV-28928.

The query was executed with EXPLAIN FORMAT=JSON on several different versions/builds for comparison. All builds are non-debug.

preview-10.10-optimizer f332260c98

|  4 | root | localhost:36260 | test | Query   |  590 | Statistics | EXPLAIN FORMAT=JSON SELECT * FROM ((((((((((((((((
...

It looks the same with optimizer_prune_level=2 (default) and 1.

preview-10.10-optimizer, MDEV-28852 baseline (07a31de109)

MariaDB [test]> EXPLAIN FORMAT=JSON SELECT * FROM (((((((((((((((( 
...
1 row in set, 1 warning (36.832 sec)

10.6 main 0e4cf497

MariaDB [test]> EXPLAIN FORMAT=JSON SELECT * FROM (((((((((((((((( 
...
1 row in set, 1 warning (9.314 sec)



 Comments   
Comment by Sergei Petrunia [ 2022-06-23 ]

Trying to analyze... on 10.6, it takes 43 seconds on debug build, but with optimizer_trace=1, optimizer_trace_max_mem_size=1G it hasn't finished in 30 min.

Looking at the trace:

...
#24 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4044963, idx=9, record_count=0, read_time=88.86961349375116, search_depth=53, 
    prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#25 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4045475, idx=8, record_count=0, read_time=88.86961349375116, search_depth=54, 
    prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#26 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4176547, idx=7, record_count=0, read_time=78.86961349375116, search_depth=55, 
    prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#27 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4176563, idx=6, record_count=5.4170105866999343, 
    read_time=44.269613493751166, search_depth=56, prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#28 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4176819, idx=5, record_count=43, read_time=23.011406774488858, 
    search_depth=57, prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#29 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4176827, idx=4, record_count=1, read_time=5.310106805004577, search_depth=58, 
    prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#30 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4193211, idx=3, record_count=1, read_time=3.9827265791882818, 
    search_depth=59, prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#31 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4194235, idx=2, record_count=1, read_time=2.6553463533719865, 
    search_depth=60, prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#32 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4194299, idx=1, record_count=1, read_time=1.3276731766859933, 
    search_depth=61, prune_level=1, use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#33 0x00005588b0b7650f in best_extension_by_limited_search (join=0x7f0ca02f9fc0, remaining_tables=4194303, idx=0, record_count=1, read_time=0, search_depth=62, prune_level=1, 
    use_cond_selectivity=4) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:10130
#34 0x00005588b0b73ca7 in greedy_search (join=0x7f0ca02f9fc0, remaining_tables=4194303, search_depth=62, prune_level=1, use_cond_selectivity=4)
    at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:9165
#35 0x00005588b0b72f0d in choose_plan (join=0x7f0ca02f9fc0, join_tables=4194303) at /home/psergey/dev-git2/10.6-optimizer-look/sql/sql_select.cc:8728
...

I can see that record_count becomes 0 at some point. I suppose this would prevent pruning...

Comment by Sergei Petrunia [ 2022-06-24 ]

Note: the used 10.6 main 0e4cf497 does include commits for MDEV-28073.

Comment by Michael Widenius [ 2022-06-27 ]

What you experience with MDEV-28929 is some what expected:

  • The solution i created for for MDEV-28073 will make it much more likely to find a good plan fast.
    Finding a good plan fast, will allow the greedy_optimizer to shortcut most of the other plans and thus
    produce a plan in a reasonable time
  • If there is no good reasonable easy to find query plan for multiple tables, then it will take "forever" for
    the optimizer to find a good plan as all plans are bad! Executing any of the plans (if there are a lot of rows) is also likely to take forever. For such bad plans, the only solution with current code is to reduce 'optimizer_search_depth' and try again. For example your query runs in 0.5 seconds with search depth 10.
  • The original problematic query had reasonable keys that could be used and the new code will be able to find a good plan reasonable fast (< 1 second and 238978 Optimizer_join_prefixes_check_calls)
Comment by Sergei Petrunia [ 2022-06-28 ]

Table statistics according to 10.6:

table rows  after_selectivity
tbl0  1328  (TODO check)
tbl1   1328  -
tbl2   1314  1
tbl3   86    0 << !!!
tbl4   123   -
tbl5   1314  -
tbl6   1314  1
tbl7   1314  1.92
tbl8   86    -
tbl9   619   - 
tbl10  123   1
tbl11  1328 0.001 // NOTE: very low
tbl12  344460   // !!! no indexes?
tbl13  1314 -
tbl14  344460 1
tbl15  1314
tbl16  86
tbl17  123
tbl18  619
tbl19  344460 // !! no indexes?
tbl20  1328
tbl21  344460 // !! no indexes ?

Note the tbl3 with rows=0. This does cause rows_for_plan=0 during join optimization. And this causes very low join->best_read=8155.6 for the query.

In 10.10, tbl3 has

86  0.69

And the join cost (with optimizer_search_depth=0, which means "auto") is: join->best_read=9.92807e+40

Comment by Sergei Petrunia [ 2022-06-29 ]

There are two other cases examples of Bad Tables in the query: tbl13 and tbl15 have plenty of rows and no usable keys. They are "not as bad" as tbl19, tbl21, but probably may cause the same effect:

*************************** 13. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl13
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1314
        Extra: Using where; Using join buffer (flat, BNL join)
*************************** 14. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl15
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1314
        Extra: Using where; Using join buffer (incremental, BNL join)

Note that they are in the middle of the join order:

+------+-------------+-------+------+-----------------------------------+-----------------+---------+----------------------------------------------------------------+------+--------------------------------------------------------+
| id   | select_type | table | type | possible_keys                     | key             | key_len | ref                                                            | rows | Extra                                                  |     
+------+-------------+-------+------+-----------------------------------+-----------------+---------+----------------------------------------------------------------+------+--------------------------------------------------------+
|    1 | SIMPLE      | tbl4  | ALL  | NULL                              | NULL            | NULL    | NULL                                                           | 123  | Using where                                            |
|    1 | SIMPLE      | tbl2  | ref  | assignment_group,number,a_str_4   | number          | 123     | const                                                          | 1    | Using index condition; Using where                     |
|    1 | SIMPLE      | tbl6  | ref  | number,a_ref_9                    | number          | 123     | const                                                          | 1    | Using index condition; Using where                     |
|    1 | SIMPLE      | tbl10 | ref  | category                          | category        | 99      | const                                                          | 1    | Using index condition; Using where                     |
|    1 | SIMPLE      | tbl14 | ref  | PRIMARY,sys_scope,sys_update_name | sys_scope       | 99      | const                                                          | 1    | Using index condition; Using where                     |
|    1 | SIMPLE      | tbl3  | ref  | related_id_2                      | related_id_2    | 97      | test.tbl2.assignment_group                                     | 43   | Using index condition                                  |
|    1 | SIMPLE      | tbl17 | ALL  | NULL                              | NULL            | NULL    | NULL                                                           | 123  | Using where; Using join buffer (flat, BNL join)        |     
|    1 | SIMPLE      | tbl9  | ALL  | cost_center                       | NULL            | NULL    | NULL                                                           | 619  | Using where; Using join buffer (incremental, BNL join) |
|    1 | SIMPLE      | tbl8  | ref  | metric_type                       | metric_type     | 99      | test.tbl9.cost_center                                          | 21   | Using where                                            |
|    1 | SIMPLE      | tbl18 | ref  | cost_center,sys_domain_path       | sys_domain_path | 768     | test.tbl2.a_str_4                                              | 610  | Using index condition                                  |     
|    1 | SIMPLE      | tbl5  | ref  | sys_domain_path                   | sys_domain_path | 768     | test.tbl4.description                                          | 1314 | Using index condition                                  |
|    1 | SIMPLE      | tbl7  | ref  | a_ref_8                           | a_ref_8         | 99      | test.tbl6.a_ref_9                                              | 641  |                                                        |
|    1 | SIMPLE      | tbl13 | ALL  | NULL                              | NULL            | NULL    | NULL                                                           | 1314 | Using where; Using join buffer (flat, BNL join)        |
|    1 | SIMPLE      | tbl15 | ALL  | NULL                              | NULL            | NULL    | NULL                                                           | 1314 | Using where; Using join buffer (incremental, BNL join) |
|    1 | SIMPLE      | tbl11 | ref  | reference_value                   | reference_value | 97      | test.tbl10.string_option                                       | 649  | Using index condition                                  |
|    1 | SIMPLE      | tbl20 | ref  | source_table,instance             | source_table    | 243     | test.tbl14.sys_update_name                                     | 442  | Using index condition                                  |
|    1 | SIMPLE      | tbl1  | ref  | reference_value,source_id         | source_id       | 97      | test.tbl20.instance                                            | 57   | Using index condition; Using where                     |
|    1 | SIMPLE      | tbl16 | ref  | related_id_4                      | related_id_4    | 97      | test.tbl1.reference_value                                      | 43   | Using where                                            |
|    1 | SIMPLE      | tbl0  | ref  | source_id                         | source_id       | 97      | test.tbl1.reference_table                                      | 57   | Using index condition                                  |
|    1 | SIMPLE      | tbl12 | ref  | sys_update_name,sys_class_name    | sys_update_name | 996     | test.tbl16.related_table_1,test.tbl13.additional_assignee_list | 1    | Using index condition; Using where                     |
+------+-------------+-------+------+-----------------------------------+-----------------+---------+----------------------------------------------------------------+------+--------------------------------------------------------+
20 rows in set (4.754 sec)

Comment by Sergei Petrunia [ 2022-06-29 ]

Indeed, removing them makes the query fast on 10.6:

EXPLAIN time on 10.6:

original query 40 sec
remove tbl19, tbl21 4 sec
also remove tbl13, tbl15 0.14 sec

On 10.10, it still never finishes.

Comment by Sergei Petrunia [ 2022-06-29 ]

... Modified 10.6 to NOT return rows=0 from histogram estimates (return 1 instead).

For query without
tbl19, tbl21, tbl13, tbl15, EXPLAIN now runs in 1.2 sec (slower). join->best_read=7.05545e+18 (a lot closer to 10.10)
bap_calls=1,403,449.

Comment by Sergei Petrunia [ 2022-06-29 ]

Number of best_access_path calls for the query without tbl19, tbl21, tbl13, tbl15 :

MariaDB 10.6 has finished after doing 1.4M calls. MariaDB 10.10 was interrupted after it has made 20M best_access_path calls:

idx= MariaDB-10.6 MariaDB-10.10
0 18 18
1 34 17
2 64 32
3 90 60
4 84 140
5 312 429
6 720 2148
7 1188 12771
8 2280 55310
9 6372 205749
10 11616 649456
11 21000 1554553
12 51696 2899647
13 133740 4827560
14 312784 5133676
15 416364 3393786
16 269238 856000
17 175849 408648
TOTAL 1403449 20000000

10.10 starts to make more calls at idx=4...

Comment by Sergei Petrunia [ 2022-06-29 ]

Looking at the number of extensions that best_extension_by_limited_search is exploring for different prefix size.

Note that 10.10 is exploring more for prefix_size=6.

I continued to run 10.10 and note the extension counts:

pruning doesn't seem to be working at prefix_size=4.

Comment by Sergei Petrunia [ 2022-06-29 ]

On the question of cost: 10.10 Does seem to find query plans with better cost eventually:
MariaDB 10.6:

BAP_calls join_best_read
1.4M 7.05545e+18

MariaDB-10.10

BAP_calls join_best_read
1M 1.98955e+17
2M 5.78458e+11
40M 2.34823e+09
Comment by Sergei Petrunia [ 2022-06-29 ]

So, pruning works poorly in 10.10.
Trying to find the reason for it.

        /*
          Prune some less promising partial plans. This heuristic may miss
          the optimal QEPs, thus it results in a non-exhaustive search.
        */
        if (prune_level >= 1)
        {
          if (best_record_count > current_record_count ||
              best_read_time > current_read_time ||
              (idx == join->const_tables &&  // 's' is the first table in the QEP
              s->table == join->sort_by_table))
          {

Added:

            if (best_record_count >= current_record_count &&
                !(best_read_time >= current_read_time))
              join->n_count_not_time[idx]++;

  
            if (!(best_record_count >= current_record_count) &&
                (best_read_time >= current_read_time))
              join->n_time_not_count[idx]++;

Comment by Sergei Petrunia [ 2022-06-29 ]

So, n_time_not_count is large, that is, typically we have

  • (best_record_count < current_record_count)
  • (best_read_time >= current_read_time)

that is

1. current_record_count > best_record_count, but
2. current_read_time <= best_read_time

The first makes sense, because sort_position() orders SORT_POSITION entries by records_read:

  sort_positions(SORT_POSITION *a, SORT_POSITION *b)
  {
    int cmp;              
    if ((cmp= compare_embedding_subqueries(*a->join_tab, *b->join_tab)) != 0)
      return cmp;
  
    if (a->position->records_read > b->position->records_read)
      return 1;
    if (a->position->records_read < b->position->records_read)
      return -1;

and then we try different extensions ordered by their records_read.

However we get the plans that have current_read_time <= best_read_time, that is, their read_time higher than read_time of the first option we've looked at.

This is a bit surprising as read_time includes #records/TIME_FOR_COMPARE so generally entries with higher read_time would have higher records_read as well?

Comment by Sergei Petrunia [ 2022-06-29 ]

An example from the debugger (query with no t13,t15,t19,t21):

join prefix: tbl8 tbl2 tbl4 tbl20 tbl9
record_count=15419.253 

pos[-1]->position:
  tbl12
  type= JT_ALL
  records_read= 0.281752
  read_time= 1158815.9(1.15M)

position:
  tbl6 
  type=JT_REF
  records_read = 1
  read_time = 17387.8

pos[1]->position:
  tbl10
  JT_REF
  records_read= 1
  read_time= 17383.3

pos[2]->position:
  tbl14
  JT_REF
  records_read=1
  read_time= 15624.9

pos[3]->position:
  tbl1
  JT_REF
  records_read=57.73
  read_time=896387.20

tbl12 is sorted first as it has low records_read. However, it has a high read_time!

Comment by Sergei Petrunia [ 2022-06-29 ]

Do the numbers for tbl12 make sense?

They look plausible:

                "selectivity_for_indexes": [],
                "selectivity_for_columns": [
                  {
                    "column_name": "sys_created_on",
                    "ranges": [
                      "0000-00-00 00:00:00 <= sys_created_on <= 0000-00-00 00:00:00"
                    ],
                    "selectivity_from_histogram": 8.179528e-7
                  }
                ],
                "cond_selectivity": 8.179528e-7
              },
              {
                "table": "tbl12",
                "table_scan": {
                  "rows": 344460,
                  "cost": 3949
                }
              },

The table has a lot of rows but the condition is very selective:
344460 * 8.179528e-7 = 0.28175 rows.

Comment by Sergei Petrunia [ 2022-06-30 ]

Playing with a patch which fixes the heuristic pruning (Description to follow).

Now the query without the 4 tables finishes in reasonable time.

  10.6 10.10-fixed-pruning
explain time 1.197 sec 4.506 sec
BAP_calls 1.4M 5.8M
join->best_read 7.05545e+18 2.0189e+17

(explain time is inexact as this is debug build talking to stderr, etc)
Pruning:

Comment by Sergei Petrunia [ 2022-07-03 ]

Improved heuristic pruning certainly helps, but is still does a lot less pruning than 10.6.
Investigating what happens for 10.0:

Table # read_time records outcome
tbl8 1 0.125977 20.2 new_best-1
tbl12 2 0.281752 72841 pruned_by_heuristic
tbl2 3 1 1.32767 new_best-2
tbl4 4 1 33.6 pruned_by_heuristic
tbl6 5 1 1.32767 no_pruning
tbl10 6 1 1.32738 no_pruning
tbl14 7 1 1.32738 no_pruning
tbl3 8 86 20.2 pruned_by_heuristic
tbl16 9 86 20.2 pruned_by_heuristic
tbl17 10 123 33.6 pruned_by_heuristic
tbl9 11 619 142.8 pruned_by_heuristic
tbl18 12 619 142.8 pruned_by_heuristic
tbl5 13 1314 359.8 pruned_by_heuristic
tbl7 14 1314 359.8 pruned_by_heuristic
tbl0 15 1328 362.6 pruned_by_heuristic
tbl1 16 1328 362.6 pruned_by_heuristic
tbl11 17 1328 362.6 pruned_by_heuristic
tbl20 18 1328 362.6 pruned_by_heuristic

Comment by Sergei Petrunia [ 2022-07-04 ]

Made a fix for the pruning, created "v3" patch.

Comment by Sergei Petrunia [ 2022-07-04 ]

The V3 patch: psergey-mdev28929-improved-pruning-v3.diff

Comment by Sergei Petrunia [ 2022-07-04 ]

The V3 patch helps with all EXPLAINs:

  • the query without tbl19, tbl21, tbl13, tbl15 - finishes faster than in 10.6
  • the query without tbl19, tbl21 - finishes in 0.68 sec
  • the original query finishes in 3.341 sec.
    (This is on a debug build)
Comment by Sergei Petrunia [ 2022-07-04 ]

The idea behind pruned_by_heuristic

Let's think of possible query plans considered in best_extension_by_limited_search. The query plans have common join prefix. After adding a table, we get:

1. read_time. This includes the time spent reading the added table.

2. record_count. The number of row combination we will pass to further parts of the query plan.

3. The cost of the rest of the query plan. We dont know this but it's a function of the choice we've made right now.

Doing the pruning.

Let's not think about #3, and just focus on #1 and #2. This is "the heuristic".

We don't know the "exchange ratio" between read_time and record_count. But we can tell that if both are bigger, then the query plan is less promising.

This is how pruning works. We discard the plans that have both higher read_time and higher record_count as they are less promising.

Limitations of pruning

The optimizer considers tables ordered by their record_count.

But it uses A DIFFERENT pruning criteria:

  PLAN_X.record_count >= PLAN_Y.record_count AND PLAN_X.read_time >= PLAN_Y.read_time

Which means the "best" tables to do pruning are not necessarily the first. We can get cases like this:

where the the table with least record_count is picked as "best" and is unable to do any pruning.

Ideally, we should have ordered the tables by their capacity to prune away other tables. This seems hard to do (I can only think of N^2 algorithms).

A workaround algorithm:

  • walk the tables from smaller to bigger record_count.
  • also keep another kind of "best" table, the one with the smallest read_time.
  • use both to do pruning.
    This is what V3 patch in the psergey-mdev28929-improved-pruning-v3.diff does.
Comment by Sergei Petrunia [ 2022-07-19 ]

The final version of the patch:

https://github.com/MariaDB/server/commit/ac3b3807cc9b21e23e65b1a70c7dcda1f45a3c57

in bb-10.10-mdev28929-v4

Comment by Sergei Petrunia [ 2022-07-20 ]

Fixed in the feature tree.

Generated at Thu Feb 08 10:04:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.