[MDEV-7220] Materialization strategy is not used for REPLACE ... SELECT Created: 2014-11-26  Updated: 2015-02-20  Resolved: 2015-02-20

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.15
Fix Version/s: 10.0.17

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-7215 EXPLAIN REPLACE produces an error: Co... Closed

 Description   

Materialization strategy is not used for REPLACE ... SELECT. Filing this based on CSC#8345.

create table t1 (a int, b int, c int);
insert into t1 select a,a,a from test.one_k A;
 
create table t2 (a int, b int, c int);
 
insert into t2 select A.a, A.a, A.a from test.one_k A;
insert into t2 select * from t2;
insert into t2 select * from t2;
insert into t2 select * from t2;
insert into t2 select * from t2;

Let's make a query with a subquery that is better executed with Materialization:

explain
select * from t1 where (a,b) in (select max(a),b from t2 group by b);
+------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref               | rows  | Extra           |
+------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+
|    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL              |  1000 | Using where     |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 8       | j12.t1.a,j12.t1.b |     1 |                 |
|    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL              | 16000 | Using temporary |
+------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+

Ok.
Now, let's run a REPLACE ... SELECT for it:

create table t3 as select * from t1 limit 1;
 
replace into t3
select * from t1 where (a,b) in (select max(a),b from t2 group by b);
 
# Query OK, 1000 rows affected (3 min 30.18 sec)
# Records: 1000  Duplicates: 0  Warnings: 0

3 min looks like it is too long for the above EXPLAIN plan. Indeed, SHOW EXPLAIN shows:

MariaDB [(none)]> show explain for 2;
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
|    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL |  1000 | Using where     |
|    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+

It is using IN->EXISTS conversion, not materialization.

CREATE TABLE ... SELECT is not affected:

MariaDB [j12]> create table tmp1 as select * from t1 where (a,b) in (select max(a),b from t2 group by b);
Query OK, 1000 rows affected (0.84 sec)
Records: 1000  Duplicates: 0  Warnings: 0



 Comments   
Comment by Sergei Petrunia [ 2014-11-26 ]

Debugging...
CREATE TABLE ... SELECT calls convert_subq_to_jtbm.

REPLACE ... SELECT calls convert_join_subqueries_to_semijoins

but then it exits here:

  if (join->select_lex->sj_subselects.is_empty())
    DBUG_RETURN(FALSE);

Comment by Sergei Petrunia [ 2014-11-26 ]

(gdb) wher
  #0  is_materialization_applicable (thd=0x31d2150, in_subs=0x7fff980c7e10, child_select=0x7fff980078f0) at /home/psergey/dev2/10.0/sql/opt_subselect.cc:523
  #1  0x00000000007cd26f in check_and_do_in_subquery_rewrites (join=0x7fff980c8b38) at /home/psergey/dev2/10.0/sql/opt_subselect.cc:719
  #2  0x0000000000698d55 in JOIN::prepare (this=0x7fff980c8b38, rref_pointer_array=0x7fff98007b68, tables_init=0x7fff980c7738, wild_num=0, conds_init=0x0, og_num=1, order_init=0x0, skip_order_by=false, group_init=0x7fff98008728, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x7fff980078f0, unit_arg=0x7fff98007c68) at /home/psergey/dev2/10.0/sql/sql_select.cc:827
  #3  0x00000000009039f0 in subselect_single_select_engine::prepare (this=0x7fff980c7fc0) at /home/psergey/dev2/10.0/sql/item_subselect.cc:3480
  #4  0x00000000008fa654 in Item_subselect::fix_fields (this=0x7fff980c7e10, thd_param=0x31d2150, ref=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/item_subselect.cc:258
  #5  0x0000000000902cf5 in Item_in_subselect::fix_fields (this=0x7fff980c7e10, thd_arg=0x31d2150, ref=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/item_subselect.cc:3163
  #6  0x000000000060df1e in setup_conds (thd=0x31d2150, tables=0x7fff98007008, leaves=..., conds=0x7fff980c8500) at /home/psergey/dev2/10.0/sql/sql_base.cc:8549
  #7  0x00000000006dc907 in setup_without_group (thd=0x31d2150, ref_pointer_array=0x7fff980c8980, tables=0x7fff98007008, leaves=..., fields=..., all_fields=..., conds=0x7fff980c8500, order=0x0, group=0x0, hidden_group_fields=0x7fff980c83e0, reserved=0x31d67e4) at /home/psergey/dev2/10.0/sql/sql_select.cc:633
  #8  0x0000000000698976 in JOIN::prepare (this=0x7fff980c80e0, rref_pointer_array=0x31d67c0, tables_init=0x7fff98007008, wild_num=1, conds_init=0x7fff980c7e10, og_num=0, order_init=0x0, skip_order_by=false, group_init=0x0, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x31d6548, unit_arg=0x31d5e60) at /home/psergey/dev2/10.0/sql/sql_select.cc:790
  #9  0x00000000006a17bb in mysql_select (thd=0x31d2150, rref_pointer_array=0x31d67c0, tables=0x7fff98007008, wild_num=1, fields=..., conds=0x7fff980c7e10, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489925888, result=0x7fff980c8048, unit=0x31d5e60, select_lex=0x31d6548) at /home/psergey/dev2/10.0/sql/sql_select.cc:3286
  #10 0x0000000000697b7b in handle_select (thd=0x31d2150, lex=0x31d5d98, result=0x7fff980c8048, setup_tables_done_option=1073741824) at /home/psergey/dev2/10.0/sql/sql_select.cc:373

Here in is_materialization_applicable () we have:

  if (optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION) &&      // 0
        !child_select->is_part_of_union() &&                          // 1
        parent_unit->first_select()->leaf_tables.elements &&          // 2
        (thd->lex->sql_command == SQLCOM_SELECT ||                     // *
         thd->lex->sql_command == SQLCOM_CREATE_TABLE) &&              // *

while we have:

(gdb) p thd->lex->sql_command
  $226 = SQLCOM_REPLACE_SELECT

Comment by Sergei Petrunia [ 2014-11-26 ]

is_materialization_applicable() check is also used for non-semi-join materialization.

Comment by Sergei Petrunia [ 2014-11-26 ]

Tracking where this check of thd->lex->sql_command comes from...
I have added SQLCOM_CREATE_TABLE in a bugfix...
but the original thd->lex->sql_command == SQLCOM_SELECT comes from timour@askmonty.org-20100930153244-euf58b5c27dpdj0u

MWL#89: Cost-based choice between Materialization and IN->EXISTS transformation
 
  Phase 3: Implementation of re-optimization of subqueries with injected predicates
           and cost comparison between Materialization and IN->EXISTS strategies.

Comment by Sergei Petrunia [ 2014-11-26 ]

Thinking of the reasons that Materialization won't work with non-select commands. Not finding any, so far.

Comment by Sergei Petrunia [ 2014-11-26 ]

Btw, there is a non intrusive fix possible: just change

        (thd->lex->sql_command == SQLCOM_SELECT ||                     // *
         thd->lex->sql_command == SQLCOM_CREATE_TABLE) &&              // *

to

        (thd->lex->sql_command == SQLCOM_SELECT ||                     // *
         thd->lex->sql_command == SQLCOM_CREATE_TABLE ||
         thd->lex->sql_command == SQLCOM_INSERT_SELECT ||
         thd->lex->sql_command == SQLCOM_REPLACE_SELECT ) &&              // *

This is certainly safe

Comment by Sergei Petrunia [ 2014-11-26 ]

http://lists.askmonty.org/pipermail/commits/2014-November/007074.html

Comment by Sergei Petrunia [ 2014-11-26 ]

elenst, could you run tests with the above commit (it is for current 10.0). The interesting test mixes would be those that use subquery materialization (need uncorrelated IN subqueries), and non-SELECT statements.

Comment by Elena Stepanova [ 2014-12-23 ]

With the same test case as in the description, but instead of REPLACE, I do

update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b);

Isn't it supposed to use materialization now, too? It doesn't:

MariaDB [test]> explain update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b);
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
|    1 | PRIMARY            | t3    | ALL  | NULL          | NULL | NULL    | NULL |  1001 | Using where     |
|    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
2 rows in set (0.00 sec)

and

MariaDB [test]> show processlist;
+----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
| Id | User | Host            | db   | Command | Time | State                | Info                                                                       | Progress |
+----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
|  6 | root | localhost:48852 | test | Query   |  120 | Copying to tmp table | update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b) |    0.000 |
|  7 | root | localhost:48875 | test | Query   |    0 | init                 | show processlist                                                           |    0.000 |
+----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
2 rows in set (0.00 sec)
 
MariaDB [test]> show explain for 6;
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
|    1 | PRIMARY            | t3    | ALL  | NULL          | NULL | NULL    | NULL |  1001 | Using where     |
|    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
2 rows in set, 1 warning (0.00 sec)

Comment by Elena Stepanova [ 2015-01-10 ]

We discussed earlier that it's supposed to be looked at; re-assigning back so that it doesn't get lost.

I ran a number of test rounds, didn't get wrong results or crashes. I didn't run performance tests since when DML is involved, it's too volatile to be run in an environment not specifically tuned for benchmarking.

The issue above is what I got from manual testing.
If it's not a bug or is a different bug, please push your bugfix for MDEV-7220.

Comment by Sergei Petrunia [ 2015-02-20 ]

elenst, the difference between the statements is quite big under the hood. REPLACE ... SELECT has a SELECT part (which has a JOIN structure, which allows for semi-join optimizations and so forth). The UPDATE you've found belongs to a class of single-table UPDATE/DELETE statements. We have a task for enabling semi-join optimization for those, MDEV-7487. I guess we should look at non-semijoin optimizations, too.

Comment by Sergei Petrunia [ 2015-02-20 ]

Ok I will push this fix

Generated at Thu Feb 08 07:17:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.