[MDEV-26301] Split optimization refills temporary table too many times Created: 2021-08-03  Updated: 2023-10-30  Resolved: 2023-05-04

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.4, 10.5, 10.6
Fix Version/s: 11.1.1, 10.11.3, 10.4.29, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Blocker
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 1
Labels: split_materialized

Attachments: File spetrunia-mdev26301-tracing.diff    
Issue Links:
Blocks
Problem/Incident
causes MDEV-29371 Rowid Filtering optimization can caus... Open
causes MDEV-31240 Crash with condition pushable into de... Closed
causes MDEV-31241 Crashes in subselects in choose_best_... Closed
causes MDEV-31301 sql/opt_split.cc:1043:5: warning: ‘be... Closed
causes MDEV-31408 Second SELECT from VIEW based on info... Closed
causes MDEV-31440 choose_best_splitting: crash on updat... Confirmed
causes MDEV-32064 Crash when searching for the best spl... Closed
causes MDEV-32571 A specific query crashes MariaDB in c... Closed
causes MDEV-32602 Server crash after query Closed
causes MDEV-32624 Mariadb crashes with query Closed
Relates

 Description   

Short

LATERAL DERIVED optimization refills the temp. table every time the derived table is accessed, regardless of whether the inputs have changed or not.

LATERAL DERIVED's cost calculations match the execution and also assume the temp.table is refilled every time.
The join optimizer typically over-estimates join output cardinality, so the optimizer considers LATERAL DERIVED optimization to be even more expensive.

Taken together, these two properties cause the optimizer to miss using LATERAL DERIVED optimization where it is very advantageous.

Long

(This is based on a real-world case but the dataset I am using is totally artificial)

Tables that will make a prefix before the lateral:

# 5 values
create table t1(a int, b int);
insert into t1 select seq,seq from seq_1_to_5;
 
# 5 value groups of size 2 each
create table t2(a int, b int, key(a));
insert into t2
select A.seq,B.seq from seq_1_to_25 A, seq_1_to_2 B;
 
# 5 value groups of size 3 each
create table t3(a int, b int, key(a));
insert into t3
select A.seq,B.seq from seq_1_to_5 A, seq_1_to_3 B;
 
analyze table t1,t2,t3 persistent for all;

explain
select * from
  (t1 left join t2 on t2.a=t1.b) left join t3 on t3.a=t1.b;

+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL      | 5    |             |
|    1 | SIMPLE      | t2    | ref  | a             | a    | 5       | test.t1.b | 2    | Using where |
|    1 | SIMPLE      | t3    | ref  | a             | a    | 5       | test.t1.b | 3    | Using where |
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+

Now, tables for the LATERAL DERIVED:

create table t10 (
  grp_id int,
  col1 int,
  key(grp_id)
);
 
# 100 groups of 100 values each
insert into t10
select
  A.seq,
  B.seq
from
  seq_1_to_100 A,
  seq_1_to_100 B;
 
# and X10 multiplier
create table t11 (
  col1 int,
  col2 int
);
insert into t11
select A.seq, A.seq from seq_1_to_10 A;
 
analyze table t10,t11 persistent for all;

Now, the query:

explain
select * from
  (
    (t1 left join t2 on t2.a=t1.b)
    left join t3 on t3.a=t1.b
  ) left join (select grp_id, count(*)
               from t10 left join t11 on t11.col1=t10.col1
               group by grp_id) T on T.grp_id=t1.b;

Note that the join is between T and t1, the first table. t1 has 5 rows. However, the optimizer assumes that the table is refilled t1xt2xt3: 30 times.
Because of that, it doesn't pick LATERAL DERIVED:

+------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+
| id   | select_type | table      | type | possible_keys | key  | key_len | ref       | rows  | Extra                                           |
+------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+
|    1 | PRIMARY     | t1         | ALL  | NULL          | NULL | NULL    | NULL      | 5     |                                                 |
|    1 | PRIMARY     | t2         | ref  | a             | a    | 5       | test.t1.b | 2     | Using where                                     |
|    1 | PRIMARY     | t3         | ref  | a             | a    | 5       | test.t1.b | 3     | Using where                                     |
|    1 | PRIMARY     | <derived2> | ref  | key0          | key0 | 5       | test.t1.b | 1000  | Using where                                     |
|    2 | DERIVED     | t10        | ALL  | grp_id        | NULL | NULL    | NULL      | 10000 | Using temporary; Using filesort                 |
|    2 | DERIVED     | t11        | ALL  | NULL          | NULL | NULL    | NULL      | 10    | Using where; Using join buffer (flat, BNL join) |
+------+-------------+------------+------+---------------+------+---------+-----------+-------+-------------------------------------------------+



 Comments   
Comment by Sergei Petrunia [ 2021-08-05 ]

Execution part should collect lateral's dependencies and keep them in a Cached_item

Comment by Sergei Petrunia [ 2021-08-08 ]

http://lists.askmonty.org/pipermail/commits/2021-August/014711.html

Also in the ES repo in 10.5.10-7-ment1259-v2

Comment by Sergei Petrunia [ 2023-04-28 ]

Input for the latest patch: https://lists.launchpad.net/maria-developers/msg13314.html

Comment by Sergei Petrunia [ 2023-05-02 ]

A patch to be applied over Igor's fix. It just adds optimizer trace coverage: spetrunia-mdev26301-tracing.diff

Comment by Sergei Petrunia [ 2023-05-02 ]

spetrunia-mdev26301-tracing.diff

Comment by Sergei Petrunia [ 2023-05-15 ]

For this MDEV, there were two commits pushed:

  • The fix itself

    commit ce7ffe61d836fe9f0cfc1087f058bc40d66e5cfb
    Author: Igor Babaev <igor@askmonty.org>
    Date:   Tue May 2 23:17:07 2023 -0700
     
        MDEV-26301 Split optimization refills temporary table too many times
        
        This patch optimizes the number of refills for the lateral derived table
        to which a materialized derived table subject to split optimization is
        is converted. This optimized number of refills is now considered as the
        expected number of refills of the materialized derived table when searching
        for the best possible splitting of the table.
    

  • A patch adding optimizer trace coverage:

    commit ed3e6f66a265952afded33fb134f6f8bcc31aa89
    Author: Sergei Petrunia <sergey@mariadb.com>
    Date:   Wed May 3 13:49:32 2023 +0300
     
        MDEV-26301: Split optimization refills: Optimizer Trace coverage
        
        Add Optimizer Trace printouts.
    

Generated at Thu Feb 08 09:44:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.