Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7220

Materialization strategy is not used for REPLACE ... SELECT

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.15
    • 10.0.17
    • Optimizer
    • None

    Description

      Materialization strategy is not used for REPLACE ... SELECT. Filing this based on CSC#8345.

      create table t1 (a int, b int, c int);
      insert into t1 select a,a,a from test.one_k A;
       
      create table t2 (a int, b int, c int);
       
      insert into t2 select A.a, A.a, A.a from test.one_k A;
      insert into t2 select * from t2;
      insert into t2 select * from t2;
      insert into t2 select * from t2;
      insert into t2 select * from t2;

      Let's make a query with a subquery that is better executed with Materialization:

      explain
      select * from t1 where (a,b) in (select max(a),b from t2 group by b);
      +------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+
      | id   | select_type  | table       | type   | possible_keys | key          | key_len | ref               | rows  | Extra           |
      +------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+
      |    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL              |  1000 | Using where     |
      |    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 8       | j12.t1.a,j12.t1.b |     1 |                 |
      |    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL              | 16000 | Using temporary |
      +------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+

      Ok.
      Now, let's run a REPLACE ... SELECT for it:

      create table t3 as select * from t1 limit 1;
       
      replace into t3
      select * from t1 where (a,b) in (select max(a),b from t2 group by b);
       
      # Query OK, 1000 rows affected (3 min 30.18 sec)
      # Records: 1000  Duplicates: 0  Warnings: 0

      3 min looks like it is too long for the above EXPLAIN plan. Indeed, SHOW EXPLAIN shows:

      MariaDB [(none)]> show explain for 2;
      +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
      | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
      +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
      |    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL |  1000 | Using where     |
      |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
      +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+

      It is using IN->EXISTS conversion, not materialization.

      CREATE TABLE ... SELECT is not affected:

      MariaDB [j12]> create table tmp1 as select * from t1 where (a,b) in (select max(a),b from t2 group by b);
      Query OK, 1000 rows affected (0.84 sec)
      Records: 1000  Duplicates: 0  Warnings: 0

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value

            Debugging...
            CREATE TABLE ... SELECT calls convert_subq_to_jtbm.

            REPLACE ... SELECT calls convert_join_subqueries_to_semijoins

            but then it exits here:

              if (join->select_lex->sj_subselects.is_empty())
                DBUG_RETURN(FALSE);

            psergei Sergei Petrunia added a comment - Debugging... CREATE TABLE ... SELECT calls convert_subq_to_jtbm. REPLACE ... SELECT calls convert_join_subqueries_to_semijoins but then it exits here: if (join->select_lex->sj_subselects.is_empty()) DBUG_RETURN(FALSE);

            (gdb) wher
              #0  is_materialization_applicable (thd=0x31d2150, in_subs=0x7fff980c7e10, child_select=0x7fff980078f0) at /home/psergey/dev2/10.0/sql/opt_subselect.cc:523
              #1  0x00000000007cd26f in check_and_do_in_subquery_rewrites (join=0x7fff980c8b38) at /home/psergey/dev2/10.0/sql/opt_subselect.cc:719
              #2  0x0000000000698d55 in JOIN::prepare (this=0x7fff980c8b38, rref_pointer_array=0x7fff98007b68, tables_init=0x7fff980c7738, wild_num=0, conds_init=0x0, og_num=1, order_init=0x0, skip_order_by=false, group_init=0x7fff98008728, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x7fff980078f0, unit_arg=0x7fff98007c68) at /home/psergey/dev2/10.0/sql/sql_select.cc:827
              #3  0x00000000009039f0 in subselect_single_select_engine::prepare (this=0x7fff980c7fc0) at /home/psergey/dev2/10.0/sql/item_subselect.cc:3480
              #4  0x00000000008fa654 in Item_subselect::fix_fields (this=0x7fff980c7e10, thd_param=0x31d2150, ref=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/item_subselect.cc:258
              #5  0x0000000000902cf5 in Item_in_subselect::fix_fields (this=0x7fff980c7e10, thd_arg=0x31d2150, ref=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/item_subselect.cc:3163
              #6  0x000000000060df1e in setup_conds (thd=0x31d2150, tables=0x7fff98007008, leaves=..., conds=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/sql_base.cc:8549
              #7  0x00000000006dc907 in setup_without_group (thd=0x31d2150, ref_pointer_array=0x7fff980c8980, tables=0x7fff98007008, leaves=..., fields=..., all_fields=..., conds=0x7fff980c8500, order=0x0, group=0x0, hidden_group_fields=0x7fff980c83e0, reserved=0x31d67e4) at /home/psergey/dev2/10.0/sql/sql_select.cc:633
              #8  0x0000000000698976 in JOIN::prepare (this=0x7fff980c80e0, rref_pointer_array=0x31d67c0, tables_init=0x7fff98007008, wild_num=1, conds_init=0x7fff980c7e10, og_num=0, order_init=0x0, skip_order_by=false, group_init=0x0, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x31d6548, unit_arg=0x31d5e60) at /home/psergey/dev2/10.0/sql/sql_select.cc:790
              #9  0x00000000006a17bb in mysql_select (thd=0x31d2150, rref_pointer_array=0x31d67c0, tables=0x7fff98007008, wild_num=1, fields=..., conds=0x7fff980c7e10, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489925888, result=0x7fff980c8048, unit=0x31d5e60, select_lex=0x31d6548) at /home/psergey/dev2/10.0/sql/sql_select.cc:3286
              #10 0x0000000000697b7b in handle_select (thd=0x31d2150, lex=0x31d5d98, result=0x7fff980c8048, setup_tables_done_option=1073741824) at /home/psergey/dev2/10.0/sql/sql_select.cc:373

            Here in is_materialization_applicable () we have:

              if (optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION) &&      // 0
                    !child_select->is_part_of_union() &&                          // 1
                    parent_unit->first_select()->leaf_tables.elements &&          // 2
                    (thd->lex->sql_command == SQLCOM_SELECT ||                     // *
                     thd->lex->sql_command == SQLCOM_CREATE_TABLE) &&              // *

            while we have:

            (gdb) p thd->lex->sql_command
              $226 = SQLCOM_REPLACE_SELECT

            psergei Sergei Petrunia added a comment - (gdb) wher #0 is_materialization_applicable (thd=0x31d2150, in_subs=0x7fff980c7e10, child_select=0x7fff980078f0) at /home/psergey/dev2/10.0/sql/opt_subselect.cc:523 #1 0x00000000007cd26f in check_and_do_in_subquery_rewrites (join=0x7fff980c8b38) at /home/psergey/dev2/10.0/sql/opt_subselect.cc:719 #2 0x0000000000698d55 in JOIN::prepare (this=0x7fff980c8b38, rref_pointer_array=0x7fff98007b68, tables_init=0x7fff980c7738, wild_num=0, conds_init=0x0, og_num=1, order_init=0x0, skip_order_by=false, group_init=0x7fff98008728, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x7fff980078f0, unit_arg=0x7fff98007c68) at /home/psergey/dev2/10.0/sql/sql_select.cc:827 #3 0x00000000009039f0 in subselect_single_select_engine::prepare (this=0x7fff980c7fc0) at /home/psergey/dev2/10.0/sql/item_subselect.cc:3480 #4 0x00000000008fa654 in Item_subselect::fix_fields (this=0x7fff980c7e10, thd_param=0x31d2150, ref=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/item_subselect.cc:258 #5 0x0000000000902cf5 in Item_in_subselect::fix_fields (this=0x7fff980c7e10, thd_arg=0x31d2150, ref=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/item_subselect.cc:3163 #6 0x000000000060df1e in setup_conds (thd=0x31d2150, tables=0x7fff98007008, leaves=..., conds=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/sql_base.cc:8549 #7 0x00000000006dc907 in setup_without_group (thd=0x31d2150, ref_pointer_array=0x7fff980c8980, tables=0x7fff98007008, leaves=..., fields=..., all_fields=..., conds=0x7fff980c8500, order=0x0, group=0x0, hidden_group_fields=0x7fff980c83e0, reserved=0x31d67e4) at /home/psergey/dev2/10.0/sql/sql_select.cc:633 #8 0x0000000000698976 in JOIN::prepare (this=0x7fff980c80e0, rref_pointer_array=0x31d67c0, tables_init=0x7fff98007008, wild_num=1, conds_init=0x7fff980c7e10, og_num=0, order_init=0x0, skip_order_by=false, group_init=0x0, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x31d6548, unit_arg=0x31d5e60) at /home/psergey/dev2/10.0/sql/sql_select.cc:790 #9 0x00000000006a17bb in mysql_select (thd=0x31d2150, rref_pointer_array=0x31d67c0, tables=0x7fff98007008, wild_num=1, fields=..., conds=0x7fff980c7e10, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489925888, result=0x7fff980c8048, unit=0x31d5e60, select_lex=0x31d6548) at /home/psergey/dev2/10.0/sql/sql_select.cc:3286 #10 0x0000000000697b7b in handle_select (thd=0x31d2150, lex=0x31d5d98, result=0x7fff980c8048, setup_tables_done_option=1073741824) at /home/psergey/dev2/10.0/sql/sql_select.cc:373 Here in is_materialization_applicable () we have: if (optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION) && // 0 !child_select->is_part_of_union() && // 1 parent_unit->first_select()->leaf_tables.elements && // 2 (thd->lex->sql_command == SQLCOM_SELECT || // * thd->lex->sql_command == SQLCOM_CREATE_TABLE) && // * while we have: (gdb) p thd->lex->sql_command $226 = SQLCOM_REPLACE_SELECT

            is_materialization_applicable() check is also used for non-semi-join materialization.

            psergei Sergei Petrunia added a comment - is_materialization_applicable() check is also used for non-semi-join materialization.

            Tracking where this check of thd->lex->sql_command comes from...
            I have added SQLCOM_CREATE_TABLE in a bugfix...
            but the original thd->lex->sql_command == SQLCOM_SELECT comes from timour@askmonty.org-20100930153244-euf58b5c27dpdj0u

            MWL#89: Cost-based choice between Materialization and IN->EXISTS transformation
             
              Phase 3: Implementation of re-optimization of subqueries with injected predicates
                       and cost comparison between Materialization and IN->EXISTS strategies.

            psergei Sergei Petrunia added a comment - Tracking where this check of thd->lex->sql_command comes from... I have added SQLCOM_CREATE_TABLE in a bugfix... but the original thd->lex->sql_command == SQLCOM_SELECT comes from timour@askmonty.org-20100930153244-euf58b5c27dpdj0u MWL#89: Cost-based choice between Materialization and IN->EXISTS transformation   Phase 3: Implementation of re-optimization of subqueries with injected predicates and cost comparison between Materialization and IN->EXISTS strategies.

            Thinking of the reasons that Materialization won't work with non-select commands. Not finding any, so far.

            psergei Sergei Petrunia added a comment - Thinking of the reasons that Materialization won't work with non-select commands. Not finding any, so far.
            psergei Sergei Petrunia added a comment - - edited

            Btw, there is a non intrusive fix possible: just change

                    (thd->lex->sql_command == SQLCOM_SELECT ||                     // *
                     thd->lex->sql_command == SQLCOM_CREATE_TABLE) &&              // *

            to

                    (thd->lex->sql_command == SQLCOM_SELECT ||                     // *
                     thd->lex->sql_command == SQLCOM_CREATE_TABLE ||
                     thd->lex->sql_command == SQLCOM_INSERT_SELECT ||
                     thd->lex->sql_command == SQLCOM_REPLACE_SELECT ) &&              // *

            This is certainly safe

            psergei Sergei Petrunia added a comment - - edited Btw, there is a non intrusive fix possible: just change (thd->lex->sql_command == SQLCOM_SELECT || // * thd->lex->sql_command == SQLCOM_CREATE_TABLE) && // * to (thd->lex->sql_command == SQLCOM_SELECT || // * thd->lex->sql_command == SQLCOM_CREATE_TABLE || thd->lex->sql_command == SQLCOM_INSERT_SELECT || thd->lex->sql_command == SQLCOM_REPLACE_SELECT ) && // * This is certainly safe
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            psergei Sergei Petrunia added a comment - http://lists.askmonty.org/pipermail/commits/2014-November/007074.html

            elenst, could you run tests with the above commit (it is for current 10.0). The interesting test mixes would be those that use subquery materialization (need uncorrelated IN subqueries), and non-SELECT statements.

            psergei Sergei Petrunia added a comment - elenst , could you run tests with the above commit (it is for current 10.0). The interesting test mixes would be those that use subquery materialization (need uncorrelated IN subqueries), and non-SELECT statements.
            elenst Elena Stepanova made changes -
            Assignee Sergei Petrunia [ psergey ] Elena Stepanova [ elenst ]

            With the same test case as in the description, but instead of REPLACE, I do

            update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b);

            Isn't it supposed to use materialization now, too? It doesn't:

            MariaDB [test]> explain update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b);
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            |    1 | PRIMARY            | t3    | ALL  | NULL          | NULL | NULL    | NULL |  1001 | Using where     |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            2 rows in set (0.00 sec)

            and

            MariaDB [test]> show processlist;
            +----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
            | Id | User | Host            | db   | Command | Time | State                | Info                                                                       | Progress |
            +----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
            |  6 | root | localhost:48852 | test | Query   |  120 | Copying to tmp table | update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b) |    0.000 |
            |  7 | root | localhost:48875 | test | Query   |    0 | init                 | show processlist                                                           |    0.000 |
            +----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
            2 rows in set (0.00 sec)
             
            MariaDB [test]> show explain for 6;
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            |    1 | PRIMARY            | t3    | ALL  | NULL          | NULL | NULL    | NULL |  1001 | Using where     |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            2 rows in set, 1 warning (0.00 sec)

            elenst Elena Stepanova added a comment - With the same test case as in the description, but instead of REPLACE, I do update t3 set c = 100 where (a,b) in ( select max (a),b from t2 group by b); Isn't it supposed to use materialization now, too? It doesn't: MariaDB [test]> explain update t3 set c = 100 where (a,b) in ( select max (a),b from t2 group by b); + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 1001 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 15794 | Using temporary | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ 2 rows in set (0.00 sec) and MariaDB [test]> show processlist; + ----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | + ----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+ | 6 | root | localhost:48852 | test | Query | 120 | Copying to tmp table | update t3 set c = 100 where (a,b) in ( select max (a),b from t2 group by b) | 0.000 | | 7 | root | localhost:48875 | test | Query | 0 | init | show processlist | 0.000 | + ----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+ 2 rows in set (0.00 sec)   MariaDB [test]> show explain for 6; + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 1001 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 15794 | Using temporary | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ 2 rows in set , 1 warning (0.00 sec)
            elenst Elena Stepanova made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            We discussed earlier that it's supposed to be looked at; re-assigning back so that it doesn't get lost.

            I ran a number of test rounds, didn't get wrong results or crashes. I didn't run performance tests since when DML is involved, it's too volatile to be run in an environment not specifically tuned for benchmarking.

            The issue above is what I got from manual testing.
            If it's not a bug or is a different bug, please push your bugfix for MDEV-7220.

            elenst Elena Stepanova added a comment - We discussed earlier that it's supposed to be looked at; re-assigning back so that it doesn't get lost. I ran a number of test rounds, didn't get wrong results or crashes. I didn't run performance tests since when DML is involved, it's too volatile to be run in an environment not specifically tuned for benchmarking. The issue above is what I got from manual testing. If it's not a bug or is a different bug, please push your bugfix for MDEV-7220 .
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Sergei Petrunia [ psergey ]

            elenst, the difference between the statements is quite big under the hood. REPLACE ... SELECT has a SELECT part (which has a JOIN structure, which allows for semi-join optimizations and so forth). The UPDATE you've found belongs to a class of single-table UPDATE/DELETE statements. We have a task for enabling semi-join optimization for those, MDEV-7487. I guess we should look at non-semijoin optimizations, too.

            psergei Sergei Petrunia added a comment - elenst , the difference between the statements is quite big under the hood. REPLACE ... SELECT has a SELECT part (which has a JOIN structure, which allows for semi-join optimizations and so forth). The UPDATE you've found belongs to a class of single-table UPDATE/DELETE statements. We have a task for enabling semi-join optimization for those, MDEV-7487 . I guess we should look at non-semijoin optimizations, too.

            Ok I will push this fix

            psergei Sergei Petrunia added a comment - Ok I will push this fix
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.0.17 [ 18300 ]
            Fix Version/s 10.0 [ 16000 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 58740 ] MariaDB v3 [ 65746 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 65746 ] MariaDB v4 [ 148554 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.