Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30511

Query slower when split_materialized=on

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 10.10.2
    • N/A
    • N/A
    • Ubuntu 22.04

    Description

      Hi,

      I've got a case where split_materialized optimization introduced in Mariadb 10.3 is not effective.

      You can easily reproduce this behavior with this sample dataset attached in the file included (it creates 4 tables and fill them enough data to make evidence of the issue). The step to set up the environment with docker are the following :

      docker run --detach --name mariadb --env MARIADB_USER=user --env MARIADB_PASSWORD=my_cool_secret --env MARIADB_ROOT_PASSWORD=my-secret-pw  mariadb:10.10.2
       
      docker cp dataset.sql mariadb:dataset.sql
       
      docker exec -it mariadb mysql -u root --password=my-secret-pw 
       
      set global innodb_buffer_pool_size = 1342177280;
      CREATE DATABASE IF NOT EXISTS split_materialized;
      USE split_materialized;
       
      source dataset.sql
      
      

      On my dev computer (i7 with 12 vcores, 16 GB RAM, disks SSD and OS Ubuntu 22.04), it takes about one minute to executes.

      It creates 4 tables:

      country with 300 rows
      town with 20 000 rows
      ticket with about 3 800 000 rows
      opened_ticket with about 800 000 rows

      Table ticket is in relationships with country and town.
      Table opened_ticket is in relationships with ticket, country and town.

      Once the test dataset is loaded, then you can execute the following query :

      with my_opened_tickets as(
      select
          town.id  as id,
          opened_ticket.nb2,
          opened_ticket.nb1
      from
          town ,
          ticket ,
          opened_ticket
      where
          opened_ticket.id = ticket.id
          and opened_ticket.country_id = 186
          and opened_ticket.town_id = town.id
          ),
      max_nb2 as(
      select
          id,
          max(nb2) nb2
      from
          my_opened_tickets
      group by
          id ),
           max_by_id_nb2 as (
      select
          max_nb2.id,
          max_nb2.nb2,
          max(my_opened_tickets.nb1)
      from
          my_opened_tickets ,
          max_nb2
      where
          my_opened_tickets.id = max_nb2.id
          and max_nb2.nb2 = my_opened_tickets.nb2
      group by
          max_nb2.id,max_nb2.nb2)
          select * from max_by_id_nb2;
      
      

      I executed it several times, it took an average 6.5s to run.

      If I run explain, on this slow query I've this output :

      +------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
      | id   | select_type     | table         | type   | possible_keys                                                                          | key                       | key_len | ref                                      | rows  | Extra                                                     |
      +------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
      |    1 | PRIMARY         | <derived4>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                     | 40802 |                                                           |
      |    4 | DERIVED         | town          | index  | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | NULL                                     | 20401 | Using index; Using temporary; Using filesort              |
      |    4 | DERIVED         | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const         | 1     |                                                           |
      |    4 | DERIVED         | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id      | 1     | Using index                                               |
      |    4 | DERIVED         | <derived3>    | ref    | key0                                                                                   | key0                      | 4       | split_materialized.town.id               | 2     | Using where                                               |
      |    3 | LATERAL DERIVED | town          | eq_ref | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | split_materialized.opened_ticket.town_id | 1     | Using where; Using index; Using temporary; Using filesort |
      |    3 | LATERAL DERIVED | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const         | 1     |                                                           |
      |    3 | LATERAL DERIVED | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id      | 1     | Using index                                               |
      +------+-----------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+------------------------------------------+-------+-----------------------------------------------------------+
      
      

      We can see the "LATERAL DERIVED" in the select time, this keyword bring me to this page https://mariadb.com/kb/en/lateral-derived-optimization/
      where discovered the split_materialized optimization of MariaDB introduced

      Then if I disable the option with this cli

      set optimizer_switch='split_materialized=off';
      

      I execute again the query, the average execution time is about one seconds.

      An explain give the following output :

      +------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
      | id   | select_type | table         | type   | possible_keys                                                                          | key                       | key_len | ref                                 | rows  | Extra                           |
      +------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
      |    1 | PRIMARY     | <derived4>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                | 20401 |                                 |
      |    4 | DERIVED     | <derived3>    | ALL    | NULL                                                                                   | NULL                      | NULL    | NULL                                | 20401 | Using temporary; Using filesort |
      |    4 | DERIVED     | town          | eq_ref | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | max_nb2.id                          | 1     | Using index                     |
      |    4 | DERIVED     | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | max_nb2.id,const                    | 1     | Using where                     |
      |    4 | DERIVED     | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id | 1     | Using index                     |
      |    3 | DERIVED     | town          | index  | PRIMARY,town_id_IDX                                                                    | PRIMARY                   | 4       | NULL                                | 20401 | Using index                     |
      |    3 | DERIVED     | opened_ticket | ref    | PRIMARY,fk_opened_ticket_town1,fk_opened_ticket_country1_idx,opened_ticket_town_id_IDX | opened_ticket_town_id_IDX | 10      | split_materialized.town.id,const    | 1     |                                 |
      |    3 | DERIVED     | ticket        | eq_ref | PRIMARY,ticket_id_IDX                                                                  | PRIMARY                   | 4       | split_materialized.opened_ticket.id | 1     | Using index                     |
      +------+-------------+---------------+--------+----------------------------------------------------------------------------------------+---------------------------+---------+-------------------------------------+-------+---------------------------------+
       
      
      

      Is that possible to fix the split_materialized optimization to work better with this kind of query ?

      ps: I have reduced my dataset to make quicker to reproduce, but on production environment this query is very long to execute (several minutes) with split_materialized=on

      Attachments

        Activity

          People

            Unassigned Unassigned
            jbarotin Jerome Barotin
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.