Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Not a Bug
-
10.10.2
-
Ubuntu 22.04
Description
Hi,
I've got a case where split_materialized optimization introduced in Mariadb 10.3 is not effective.
You can easily reproduce this behavior with this sample dataset attached in the file included (it creates 4 tables and fill them enough data to make evidence of the issue). The step to set up the environment with docker are the following :
docker run --detach --name mariadb --env MARIADB_USER=user --env MARIADB_PASSWORD=my_cool_secret --env MARIADB_ROOT_PASSWORD=my-secret-pw mariadb:10.10.2 |
|
docker cp dataset.sql mariadb:dataset.sql
|
|
docker exec -it mariadb mysql -u root --password=my-secret-pw
|
|
set global innodb_buffer_pool_size = 1342177280; |
CREATE DATABASE IF NOT EXISTS split_materialized;
|
USE split_materialized;
|
|
source dataset.sql
|
|
On my dev computer (i7 with 12 vcores, 16 GB RAM, disks SSD and OS Ubuntu 22.04), it takes about one minute to executes.
It creates 4 tables:
country with 300 rows
town with 20 000 rows
ticket with about 3 800 000 rows
opened_ticket with about 800 000 rows
Table ticket is in relationships with country and town.
Table opened_ticket is in relationships with ticket, country and town.
Once the test dataset is loaded, then you can execute the following query :
with my_opened_tickets as(
|
select
|
town.id as id,
|
opened_ticket.nb2,
|
opened_ticket.nb1
|
from
|
town ,
|
ticket ,
|
opened_ticket
|
where
|
opened_ticket.id = ticket.id
|
and opened_ticket.country_id = 186 |
and opened_ticket.town_id = town.id
|
),
|
max_nb2 as(
|
select
|
id,
|
max(nb2) nb2
|
from
|
my_opened_tickets
|
group by
|
id ),
|
max_by_id_nb2 as (
|
select
|
max_nb2.id,
|
max_nb2.nb2,
|
max(my_opened_tickets.nb1)
|
from
|
my_opened_tickets ,
|
max_nb2
|
where
|
my_opened_tickets.id = max_nb2.id
|
and max_nb2.nb2 = my_opened_tickets.nb2
|
group by
|
max_nb2.id,max_nb2.nb2)
|
select * from max_by_id_nb2;
|
|
I executed it several times, it took an average 6.5s to run.
If I run explain, on this slow query I've this output :
+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
|
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 40802 | | |
| 4 | DERIVED | town | index | PRIMARY,town_id_IDX | PRIMARY | 4 | NULL | 20401 | Using index; Using temporary; Using filesort | |
| 4 | DERIVED | opened_ticket | ref | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10 | split_materialized.town.id,const | 1 | | |
| 4 | DERIVED | ticket | eq_ref | PRIMARY,ticket_id_IDX | PRIMARY | 4 | split_materialized.opened_ticket.id | 1 | Using index | |
| 4 | DERIVED | <derived3> | ref | key0 | key0 | 4 | split_materialized.town.id | 2 | Using where | |
| 3 | LATERAL DERIVED | town | eq_ref | PRIMARY,town_id_IDX | PRIMARY | 4 | split_materialized.opened_ticket.town_id | 1 | Using where; Using index; Using temporary; Using filesort | |
| 3 | LATERAL DERIVED | opened_ticket | ref | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10 | split_materialized.town.id,const | 1 | | |
| 3 | LATERAL DERIVED | ticket | eq_ref | PRIMARY,ticket_id_IDX | PRIMARY | 4 | split_materialized.opened_ticket.id | 1 | Using index | |
+------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
|
|
We can see the "LATERAL DERIVED" in the select time, this keyword bring me to this page https://mariadb.com/kb/en/lateral-derived-optimization/
where discovered the split_materialized optimization of MariaDB introduced
Then if I disable the option with this cli
set optimizer_switch='split_materialized=off'; |
I execute again the query, the average execution time is about one seconds.
An explain give the following output :
+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
|
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 20401 | | |
| 4 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 20401 | Using temporary; Using filesort | |
| 4 | DERIVED | town | eq_ref | PRIMARY,town_id_IDX | PRIMARY | 4 | max_nb2.id | 1 | Using index | |
| 4 | DERIVED | opened_ticket | ref | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10 | max_nb2.id,const | 1 | Using where | |
| 4 | DERIVED | ticket | eq_ref | PRIMARY,ticket_id_IDX | PRIMARY | 4 | split_materialized.opened_ticket.id | 1 | Using index | |
| 3 | DERIVED | town | index | PRIMARY,town_id_IDX | PRIMARY | 4 | NULL | 20401 | Using index | |
| 3 | DERIVED | opened_ticket | ref | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10 | split_materialized.town.id,const | 1 | | |
| 3 | DERIVED | ticket | eq_ref | PRIMARY,ticket_id_IDX | PRIMARY | 4 | split_materialized.opened_ticket.id | 1 | Using index | |
+------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
|
|
|
Is that possible to fix the split_materialized optimization to work better with this kind of query ?
ps: I have reduced my dataset to make quicker to reproduce, but on production environment this query is very long to execute (several minutes) with split_materialized=on