Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7220

Materialization strategy is not used for REPLACE ... SELECT

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.15
    • 10.0.17
    • Optimizer
    • None

    Description

      Materialization strategy is not used for REPLACE ... SELECT. Filing this based on CSC#8345.

      create table t1 (a int, b int, c int);
      insert into t1 select a,a,a from test.one_k A;
       
      create table t2 (a int, b int, c int);
       
      insert into t2 select A.a, A.a, A.a from test.one_k A;
      insert into t2 select * from t2;
      insert into t2 select * from t2;
      insert into t2 select * from t2;
      insert into t2 select * from t2;

      Let's make a query with a subquery that is better executed with Materialization:

      explain
      select * from t1 where (a,b) in (select max(a),b from t2 group by b);
      +------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+
      | id   | select_type  | table       | type   | possible_keys | key          | key_len | ref               | rows  | Extra           |
      +------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+
      |    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL              |  1000 | Using where     |
      |    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 8       | j12.t1.a,j12.t1.b |     1 |                 |
      |    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL              | 16000 | Using temporary |
      +------+--------------+-------------+--------+---------------+--------------+---------+-------------------+-------+-----------------+

      Ok.
      Now, let's run a REPLACE ... SELECT for it:

      create table t3 as select * from t1 limit 1;
       
      replace into t3
      select * from t1 where (a,b) in (select max(a),b from t2 group by b);
       
      # Query OK, 1000 rows affected (3 min 30.18 sec)
      # Records: 1000  Duplicates: 0  Warnings: 0

      3 min looks like it is too long for the above EXPLAIN plan. Indeed, SHOW EXPLAIN shows:

      MariaDB [(none)]> show explain for 2;
      +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
      | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
      +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
      |    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL |  1000 | Using where     |
      |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
      +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+

      It is using IN->EXISTS conversion, not materialization.

      CREATE TABLE ... SELECT is not affected:

      MariaDB [j12]> create table tmp1 as select * from t1 where (a,b) in (select max(a),b from t2 group by b);
      Query OK, 1000 rows affected (0.84 sec)
      Records: 1000  Duplicates: 0  Warnings: 0

      Attachments

        Issue Links

          Activity

            elenst, could you run tests with the above commit (it is for current 10.0). The interesting test mixes would be those that use subquery materialization (need uncorrelated IN subqueries), and non-SELECT statements.

            psergei Sergei Petrunia added a comment - elenst , could you run tests with the above commit (it is for current 10.0). The interesting test mixes would be those that use subquery materialization (need uncorrelated IN subqueries), and non-SELECT statements.

            With the same test case as in the description, but instead of REPLACE, I do

            update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b);

            Isn't it supposed to use materialization now, too? It doesn't:

            MariaDB [test]> explain update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b);
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            |    1 | PRIMARY            | t3    | ALL  | NULL          | NULL | NULL    | NULL |  1001 | Using where     |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            2 rows in set (0.00 sec)

            and

            MariaDB [test]> show processlist;
            +----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
            | Id | User | Host            | db   | Command | Time | State                | Info                                                                       | Progress |
            +----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
            |  6 | root | localhost:48852 | test | Query   |  120 | Copying to tmp table | update t3 set c = 100  where (a,b) in (select max(a),b from t2 group by b) |    0.000 |
            |  7 | root | localhost:48875 | test | Query   |    0 | init                 | show processlist                                                           |    0.000 |
            +----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+
            2 rows in set (0.00 sec)
             
            MariaDB [test]> show explain for 6;
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            |    1 | PRIMARY            | t3    | ALL  | NULL          | NULL | NULL    | NULL |  1001 | Using where     |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 15794 | Using temporary |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            2 rows in set, 1 warning (0.00 sec)

            elenst Elena Stepanova added a comment - With the same test case as in the description, but instead of REPLACE, I do update t3 set c = 100 where (a,b) in ( select max (a),b from t2 group by b); Isn't it supposed to use materialization now, too? It doesn't: MariaDB [test]> explain update t3 set c = 100 where (a,b) in ( select max (a),b from t2 group by b); + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 1001 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 15794 | Using temporary | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ 2 rows in set (0.00 sec) and MariaDB [test]> show processlist; + ----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | + ----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+ | 6 | root | localhost:48852 | test | Query | 120 | Copying to tmp table | update t3 set c = 100 where (a,b) in ( select max (a),b from t2 group by b) | 0.000 | | 7 | root | localhost:48875 | test | Query | 0 | init | show processlist | 0.000 | + ----+------+-----------------+------+---------+------+----------------------+----------------------------------------------------------------------------+----------+ 2 rows in set (0.00 sec)   MariaDB [test]> show explain for 6; + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 1001 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 15794 | Using temporary | + ------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ 2 rows in set , 1 warning (0.00 sec)

            We discussed earlier that it's supposed to be looked at; re-assigning back so that it doesn't get lost.

            I ran a number of test rounds, didn't get wrong results or crashes. I didn't run performance tests since when DML is involved, it's too volatile to be run in an environment not specifically tuned for benchmarking.

            The issue above is what I got from manual testing.
            If it's not a bug or is a different bug, please push your bugfix for MDEV-7220.

            elenst Elena Stepanova added a comment - We discussed earlier that it's supposed to be looked at; re-assigning back so that it doesn't get lost. I ran a number of test rounds, didn't get wrong results or crashes. I didn't run performance tests since when DML is involved, it's too volatile to be run in an environment not specifically tuned for benchmarking. The issue above is what I got from manual testing. If it's not a bug or is a different bug, please push your bugfix for MDEV-7220 .

            elenst, the difference between the statements is quite big under the hood. REPLACE ... SELECT has a SELECT part (which has a JOIN structure, which allows for semi-join optimizations and so forth). The UPDATE you've found belongs to a class of single-table UPDATE/DELETE statements. We have a task for enabling semi-join optimization for those, MDEV-7487. I guess we should look at non-semijoin optimizations, too.

            psergei Sergei Petrunia added a comment - elenst , the difference between the statements is quite big under the hood. REPLACE ... SELECT has a SELECT part (which has a JOIN structure, which allows for semi-join optimizations and so forth). The UPDATE you've found belongs to a class of single-table UPDATE/DELETE statements. We have a task for enabling semi-join optimization for those, MDEV-7487 . I guess we should look at non-semijoin optimizations, too.

            Ok I will push this fix

            psergei Sergei Petrunia added a comment - Ok I will push this fix

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.