[MDEV-30511] Query slower when split_materialized=on - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Not a Bug
Affects Version/s: 10.10.2
Fix Version/s: N/A
Component/s: N/A
Labels:
- performance
Environment:
Ubuntu 22.04

Description

Hi,

I've got a case where split_materialized optimization introduced in Mariadb 10.3 is not effective.

You can easily reproduce this behavior with this sample dataset attached in the file included (it creates 4 tables and fill them enough data to make evidence of the issue). The step to set up the environment with docker are the following :

docker run --detach --name mariadb --env MARIADB_USER=user --env MARIADB_PASSWORD=my_cool_secret --env MARIADB_ROOT_PASSWORD=my-secret-pw  mariadb:10.10.2

docker cp dataset.sql mariadb:dataset.sql

docker exec -it mariadb mysql -u root --password=my-secret-pw

set global innodb_buffer_pool_size = 1342177280;

CREATE DATABASE IF NOT EXISTS split_materialized;

USE split_materialized;

source dataset.sql

On my dev computer (i7 with 12 vcores, 16 GB RAM, disks SSD and OS Ubuntu 22.04), it takes about one minute to executes.

It creates 4 tables:

country with 300 rows
town with 20 000 rows
ticket with about 3 800 000 rows
opened_ticket with about 800 000 rows

Table ticket is in relationships with country and town.
Table opened_ticket is in relationships with ticket, country and town.

Once the test dataset is loaded, then you can execute the following query :

with my_opened_tickets as(

select

    town.id  as id,

    opened_ticket.nb2,

    opened_ticket.nb1

from

    town ,

    ticket ,

    opened_ticket

where

    opened_ticket.id = ticket.id

    and opened_ticket.country_id = 186

    and opened_ticket.town_id = town.id

),

max_nb2 as(

select

id,

    max(nb2) nb2

from

    my_opened_tickets

group by

    id ),

     max_by_id_nb2 as (

select

    max_nb2.id,

    max_nb2.nb2,

    max(my_opened_tickets.nb1)

from

    my_opened_tickets ,

    max_nb2

where

    my_opened_tickets.id = max_nb2.id

    and max_nb2.nb2 = my_opened_tickets.nb2

group by

    max_nb2.id,max_nb2.nb2)

    select * from max_by_id_nb2;

I executed it several times, it took an average 6.5s to run.

If I run explain, on this slow query I've this output :

+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+

| id   | select_type     | table         | type   | possible_keys                                                                          | key                       | key_len | ref                                      | rows  | Extra                                                     |

+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+

|    1 | PRIMARY         | <derived4>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                     | 40802 |                                                           |

|    4 | DERIVED         | town          | index  | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | NULL                                     | 20401 | Using index; Using temporary; Using filesort              |

|    4 | DERIVED         | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const         | 1     |                                                           |

|    4 | DERIVED         | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id      | 1     | Using index                                               |

|    4 | DERIVED         | <derived3>    | ref    | key0                                                                                   | key0                      | 4       | split_materialized.town.id               | 2     | Using where                                               |

|    3 | LATERAL DERIVED | town          | eq_ref | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | split_materialized.opened_ticket.town_id | 1     | Using where; Using index; Using temporary; Using filesort |

|    3 | LATERAL DERIVED | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const         | 1     |                                                           |

|    3 | LATERAL DERIVED | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id      | 1     | Using index                                               |

+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+

We can see the "LATERAL DERIVED" in the select time, this keyword bring me to this page https://mariadb.com/kb/en/lateral-derived-optimization/
where discovered the split_materialized optimization of MariaDB introduced

Then if I disable the option with this cli

set optimizer_switch='split_materialized=off';

I execute again the query, the average execution time is about one seconds.

An explain give the following output :

+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+

| id   | select_type | table         | type   | possible_keys                                                                          | key                       | key_len | ref                                 | rows  | Extra                           |

+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+

|    1 | PRIMARY     | <derived4>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                | 20401 |                                 |

|    4 | DERIVED     | <derived3>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                | 20401 | Using temporary; Using filesort |

|    4 | DERIVED     | town          | eq_ref | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | max_nb2.id                          | 1     | Using index                     |

|    4 | DERIVED     | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | max_nb2.id,const                    | 1     | Using where                     |

|    4 | DERIVED     | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id | 1     | Using index                     |

|    3 | DERIVED     | town          | index  | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | NULL                                | 20401 | Using index                     |

|    3 | DERIVED     | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const    | 1     |                                 |

|    3 | DERIVED     | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id | 1     | Using index                     |

+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+

Is that possible to fix the split_materialized optimization to work better with this kind of query ?

ps: I have reduced my dataset to make quicker to reproduce, but on production environment this query is very long to execute (several minutes) with split_materialized=on

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

dataset.sql
2 kB
2023-01-30 11:10

Query slower when split_materialized=on

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration