[MDEV-30511] Query slower when split_materialized=on Created: 2023-01-30  Updated: 2023-02-01  Resolved: 2023-02-01

Status: Closed
Project: MariaDB Server
Component/s: N/A
Affects Version/s: 10.10.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jerome Barotin Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: performance
Environment:

Ubuntu 22.04


Attachments: File dataset.sql    

 Description   

Hi,

I've got a case where split_materialized optimization introduced in Mariadb 10.3 is not effective.

You can easily reproduce this behavior with this sample dataset attached in the file included (it creates 4 tables and fill them enough data to make evidence of the issue). The step to set up the environment with docker are the following :

docker run --detach --name mariadb --env MARIADB_USER=user --env MARIADB_PASSWORD=my_cool_secret --env MARIADB_ROOT_PASSWORD=my-secret-pw  mariadb:10.10.2
 
docker cp dataset.sql mariadb:dataset.sql
 
docker exec -it mariadb mysql -u root --password=my-secret-pw 
 
set global innodb_buffer_pool_size = 1342177280;
CREATE DATABASE IF NOT EXISTS split_materialized;
USE split_materialized;
 
source dataset.sql

On my dev computer (i7 with 12 vcores, 16 GB RAM, disks SSD and OS Ubuntu 22.04), it takes about one minute to executes.

It creates 4 tables:

country with 300 rows
town with 20 000 rows
ticket with about 3 800 000 rows
opened_ticket with about 800 000 rows

Table ticket is in relationships with country and town.
Table opened_ticket is in relationships with ticket, country and town.

Once the test dataset is loaded, then you can execute the following query :

with my_opened_tickets as(
select
    town.id  as id,
    opened_ticket.nb2,
    opened_ticket.nb1
from
    town ,
    ticket ,
    opened_ticket
where
    opened_ticket.id = ticket.id
    and opened_ticket.country_id = 186
    and opened_ticket.town_id = town.id
    ),
max_nb2 as(
select
    id,
    max(nb2) nb2
from
    my_opened_tickets
group by
    id ),
     max_by_id_nb2 as (
select
    max_nb2.id,
    max_nb2.nb2,
    max(my_opened_tickets.nb1)
from
    my_opened_tickets ,
    max_nb2
where
    my_opened_tickets.id = max_nb2.id
    and max_nb2.nb2 = my_opened_tickets.nb2
group by
    max_nb2.id,max_nb2.nb2)
    select * from max_by_id_nb2;

I executed it several times, it took an average 6.5s to run.

If I run explain, on this slow query I've this output :

+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
| id   | select_type     | table         | type   | possible_keys                                                                          | key                       | key_len | ref                                      | rows  | Extra                                                     |
+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
|    1 | PRIMARY         | <derived4>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                     | 40802 |                                                           |
|    4 | DERIVED         | town          | index  | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | NULL                                     | 20401 | Using index; Using temporary; Using filesort              |
|    4 | DERIVED         | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const         | 1     |                                                           |
|    4 | DERIVED         | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id      | 1     | Using index                                               |
|    4 | DERIVED         | <derived3>    | ref    | key0                                                                                   | key0                      | 4       | split_materialized.town.id               | 2     | Using where                                               |
|    3 | LATERAL DERIVED | town          | eq_ref | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | split_materialized.opened_ticket.town_id | 1     | Using where; Using index; Using temporary; Using filesort |
|    3 | LATERAL DERIVED | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const         | 1     |                                                           |
|    3 | LATERAL DERIVED | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id      | 1     | Using index                                               |
+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+

We can see the "LATERAL DERIVED" in the select time, this keyword bring me to this page https://mariadb.com/kb/en/lateral-derived-optimization/
where discovered the split_materialized optimization of MariaDB introduced

Then if I disable the option with this cli

set optimizer_switch='split_materialized=off';

I execute again the query, the average execution time is about one seconds.

An explain give the following output :

+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
| id   | select_type | table         | type   | possible_keys                                                                          | key                       | key_len | ref                                 | rows  | Extra                           |
+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
|    1 | PRIMARY     | <derived4>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                | 20401 |                                 |
|    4 | DERIVED     | <derived3>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                | 20401 | Using temporary; Using filesort |
|    4 | DERIVED     | town          | eq_ref | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | max_nb2.id                          | 1     | Using index                     |
|    4 | DERIVED     | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | max_nb2.id,const                    | 1     | Using where                     |
|    4 | DERIVED     | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id | 1     | Using index                     |
|    3 | DERIVED     | town          | index  | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | NULL                                | 20401 | Using index                     |
|    3 | DERIVED     | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const    | 1     |                                 |
|    3 | DERIVED     | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id | 1     | Using index                     |
+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
 

Is that possible to fix the split_materialized optimization to work better with this kind of query ?

ps: I have reduced my dataset to make quicker to reproduce, but on production environment this query is very long to execute (several minutes) with split_materialized=on



 Comments   
Comment by Andrew Hutchings [ 2023-02-01 ]

Communication via Zulip to show how to setup histograms so that the optimizer makes better choices as to which algorithm to use.

Comment by Jerome Barotin [ 2023-02-01 ]

So I have run :

ANALYZE TABLE ticket PERSISTENT FOR ALL;
ANALYZE TABLE opend_ticket PERSISTENT FOR ALL;

And then, query execution have increased to about 1 seconds (similar time with split_materialized=off).

The ticket could be closed

Generated at Thu Feb 08 10:16:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.