Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25008

Single-table UPDATE/DELETE: make cost-based choice between subquery strategies

Details

    Description

      There is a query in my application to delete duplicate records (specific fields), this query used to work fine on MySQL (v 8.0.21) on Windows 10 and takes about a minute,

      When I migrated to MariaDB 10.5.9 on Ubuntu 20.04 , this query stucks, I left it for +6 hours and it didn't come back.

      The table has about 4 million records, table is:

      CREATE TABLE `item_variant_price` (
        `seller_variant_id` int(11) NOT NULL AUTO_INCREMENT,
        `item_id` varchar(100) DEFAULT NULL,
        `price` float DEFAULT NULL,
        `seller_name` varchar(400) DEFAULT NULL,
        `variant` varchar(400) DEFAULT NULL,
        `is_fulfilled` int(11) NOT NULL DEFAULT 0,
        `insertion_date` timestamp NOT NULL DEFAULT current_timestamp(),
        `modification_date` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
        `is_main_page` int(11) DEFAULT 0,
        `is_cheapest` int(11) NOT NULL DEFAULT 0,
        `variant_url` varchar(400) DEFAULT NULL,
        PRIMARY KEY (`variant_id`),
        FULLTEXT KEY `item_variant_price_serial_IDX` (`item_id`,`seller_name`,`variant`)
      ) ENGINE=InnoDB AUTO_INCREMENT=4309337 DEFAULT CHARSET=utf8mb4;
      

      and this is the query:

      delete from item_variant_price where seller_variant_id not in (select m from (select max(seller_variant_id) m from item_variant_price group by item_id, seller_name, variant) as innerTable);
      

      the inner select statement takes 27 seconds, so there has to be an issue with delete ?

      Thanks for your support

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment - - edited

            I tried with 10000 rows, 2.8s in MariaDB 10.5.9 vs 0.07 Mysql 8.0.21. After setting optimizer_switch to 'in_to_exists=off' - MariaDB 10.5.9 -0.02

            --source include/have_innodb.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1 ( 
              id int NOT NULL PRIMARY KEY,
              item_id varchar(100),
              seller_name varchar(400),
              variant varchar(400),
              FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant)
            )engine=innodb;
             
            insert into t1 select seq,seq,seq,seq from seq_1_to_10000;
             
            analyze format=json
            DELETE FROM t1 WHERE id NOT IN
              (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            
            

            MariaDB 10.5.9

            MariaDB [test]> analyze  DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
            | id   | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows  | r_rows   | filtered | r_filtered | Extra                           |
            +------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
            |    1 | PRIMARY            | t1         | ALL  | NULL          | NULL | NULL    | NULL | 10000 | 10000.00 |   100.00 |       0.00 | Using where                     |
            |    2 | DEPENDENT SUBQUERY | <derived3> | ALL  | NULL          | NULL | NULL    | NULL | 10000 | 5000.50  |   100.00 |       0.02 | Using where                     |
            |    3 | DERIVED            | t1         | ALL  | NULL          | NULL | NULL    | NULL | 10000 | 10000.00 |   100.00 |     100.00 | Using temporary; Using filesort |
            +------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
            3 rows in set (2.900 sec)
             
            MariaDB [test]> analyze format=json  DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | ANALYZE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
            +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | {
              "query_block": {
                "select_id": 1,
                "r_total_time_ms": 2958.619018,
                "table": {
                  "delete": 1,
                  "table_name": "t1",
                  "access_type": "ALL",
                  "rows": 10000,
                  "r_rows": 10000,
                  "r_filtered": 0,
                  "r_total_time_ms": 1.089836561,
                  "attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))"
                },
                "subqueries": [
                  {
                    "query_block": {
                      "select_id": 2,
                      "r_loops": 10000,
                      "r_total_time_ms": 2956.083461,
                      "having_condition": "innertable.m is null",
                      "table": {
                        "table_name": "<derived3>",
                        "access_type": "ALL",
                        "r_loops": 10000,
                        "rows": 10000,
                        "r_rows": 5000.5,
                        "r_table_time_ms": 624.9038041,
                        "r_other_time_ms": 2316.295274,
                        "filtered": 100,
                        "r_filtered": 0.019998,
                        "attached_condition": "10000 = innertable.m or innertable.m is null",
                        "materialized": {
                          "query_block": {
                            "select_id": 3,
                            "r_loops": 1,
                            "r_total_time_ms": 15.52120094,
                            "filesort": {
                              "sort_key": "t1.item_id, t1.seller_name, t1.variant",
                              "r_loops": 1,
                              "r_total_time_ms": 4.314342992,
                              "r_used_priority_queue": false,
                              "r_output_rows": 10000,
                              "r_buffer_size": "2047Kb",
                              "r_sort_mode": "packed_sort_key,rowid",
                              "temporary_table": {
                                "table": {
                                  "table_name": "t1",
                                  "access_type": "ALL",
                                  "r_loops": 1,
                                  "rows": 10000,
                                  "r_rows": 10000,
                                  "r_table_time_ms": 1.606918251,
                                  "r_other_time_ms": 7.823054272,
                                  "filtered": 100,
                                  "r_filtered": 100
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                ]
              }
            } |
            +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            1 row in set (2.957 sec)
            
            

            Mysql 8.0.21

            mysql> explain DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
            | id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
            +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
            |  1 | DELETE      | t1         | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10000 |   100.00 | Using where     |
            |  2 | SUBQUERY    | <derived3> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10000 |   100.00 | NULL            |
            |  3 | DERIVED     | t1         | NULL       | ALL  | t1_serial_IDX | NULL | NULL    | NULL | 10000 |   100.00 | Using temporary |
            +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
            3 rows in set, 1 warning (0.00 sec)
             
            Note (Code 1003): delete from `test`.`t1` where <in_optimizer>(`test`.`t1`.`id`,`test`.`t1`.`id` in ( <materialize> (/* select#2 */ select `innertable`.`m` from (/* select#3 */ select max(`test`.`t1`.`id`) AS `m` from `test`.`t1` group by `test`.`t1`.`item_id`,`test`.`t1`.`seller_name`,`test`.`t1`.`variant`) `innertable` where true having true ), <primary_index_lookup>(`test`.`t1`.`id` in <temporary table> on <auto_distinct_key> where ((`test`.`t1`.`id` = `<materialized_subquery>`.`m`)))) is false)
             
            mysql> explain analyze  DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
            +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | -> Delete from t1 (immediate)
                -> Filter: <in_optimizer>(t1.id,t1.id in (select #2) is false)  (cost=1015.12 rows=10000) (actual time=75.521..75.521 rows=0 loops=1)
                    -> Table scan on t1  (cost=1015.12 rows=10000) (actual time=0.023..17.489 rows=10000 loops=1)
                    -> Select #2 (subquery in condition; run only once)
                        -> Filter: ((t1.id = `<materialized_subquery>`.m))  (actual time=0.000..0.000 rows=1 loops=10001)
                            -> Limit: 1 row(s)  (actual time=0.000..0.000 rows=1 loops=10001)
                                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (m=t1.id)  (actual time=0.000..0.000 rows=1 loops=10001)
                                    -> Materialize with deduplication  (actual time=0.006..0.006 rows=1 loops=10001)
                                        -> Table scan on innertable  (actual time=0.001..0.336 rows=10000 loops=1)
                                            -> Materialize  (actual time=49.180..49.904 rows=10000 loops=1)
                                                -> Table scan on <temporary>  (actual time=0.001..0.652 rows=10000 loops=1)
                                                    -> Aggregate using temporary table  (actual time=47.329..48.414 rows=10000 loops=1)
                                                        -> Table scan on t1  (cost=1015.12 rows=10000) (actual time=0.009..31.229 rows=10000 loops=1)
             |
            +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            1 row in set (0.08 sec)
             
            mysql>  DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            Query OK, 0 rows affected (0.07 sec)
            

            with optimizer_switch='in_to_exists=off':

            MariaDB 10.5.9

            MariaDB [test]>   set optimizer_switch='in_to_exists=off';
            Query OK, 0 rows affected (0.000 sec)
             
            MariaDB [test]> analyze DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
            | id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows  | r_rows   | filtered | r_filtered | Extra                           |
            +------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
            |    1 | PRIMARY      | t1         | ALL  | NULL          | NULL | NULL    | NULL | 10000 | 10000.00 |   100.00 |       0.00 | Using where                     |
            |    2 | MATERIALIZED | <derived3> | ALL  | NULL          | NULL | NULL    | NULL | 10000 | 10000.00 |   100.00 |     100.00 |                                 |
            |    3 | DERIVED      | t1         | ALL  | NULL          | NULL | NULL    | NULL | 10000 | 10000.00 |   100.00 |     100.00 | Using temporary; Using filesort |
            +------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
            3 rows in set (0.022 sec)
            

            alice Alice Sherepa added a comment - - edited I tried with 10000 rows, 2.8s in MariaDB 10.5.9 vs 0.07 Mysql 8.0.21. After setting optimizer_switch to 'in_to_exists=off' - MariaDB 10.5.9 -0.02 --source include/have_innodb.inc --source include/have_sequence.inc   CREATE TABLE t1 ( id int NOT NULL PRIMARY KEY , item_id varchar (100), seller_name varchar (400), variant varchar (400), FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant) )engine=innodb;   insert into t1 select seq,seq,seq,seq from seq_1_to_10000;   analyze format=json DELETE FROM t1 WHERE id NOT IN ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); MariaDB 10.5.9 MariaDB [test]> analyze DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); +------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 0.00 | Using where | | 2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 10000 | 5000.50 | 100.00 | 0.02 | Using where | | 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 100.00 | Using temporary; Using filesort | +------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+ 3 rows in set (2.900 sec)   MariaDB [test]> analyze format=json DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ANALYZE | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | { "query_block": { "select_id": 1, "r_total_time_ms": 2958.619018, "table": { "delete": 1, "table_name": "t1", "access_type": "ALL", "rows": 10000, "r_rows": 10000, "r_filtered": 0, "r_total_time_ms": 1.089836561, "attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))" }, "subqueries": [ { "query_block": { "select_id": 2, "r_loops": 10000, "r_total_time_ms": 2956.083461, "having_condition": "innertable.m is null", "table": { "table_name": "<derived3>", "access_type": "ALL", "r_loops": 10000, "rows": 10000, "r_rows": 5000.5, "r_table_time_ms": 624.9038041, "r_other_time_ms": 2316.295274, "filtered": 100, "r_filtered": 0.019998, "attached_condition": "10000 = innertable.m or innertable.m is null", "materialized": { "query_block": { "select_id": 3, "r_loops": 1, "r_total_time_ms": 15.52120094, "filesort": { "sort_key": "t1.item_id, t1.seller_name, t1.variant", "r_loops": 1, "r_total_time_ms": 4.314342992, "r_used_priority_queue": false, "r_output_rows": 10000, "r_buffer_size": "2047Kb", "r_sort_mode": "packed_sort_key,rowid", "temporary_table": { "table": { "table_name": "t1", "access_type": "ALL", "r_loops": 1, "rows": 10000, "r_rows": 10000, "r_table_time_ms": 1.606918251, "r_other_time_ms": 7.823054272, "filtered": 100, "r_filtered": 100 } } } } } } } } ] } } | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (2.957 sec) Mysql 8.0.21 mysql> explain DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+ | 1 | DELETE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | Using where | | 2 | SUBQUERY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | NULL | | 3 | DERIVED | t1 | NULL | ALL | t1_serial_IDX | NULL | NULL | NULL | 10000 | 100.00 | Using temporary | +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+ 3 rows in set, 1 warning (0.00 sec)   Note (Code 1003): delete from `test`.`t1` where <in_optimizer>(`test`.`t1`.`id`,`test`.`t1`.`id` in ( <materialize> (/* select#2 */ select `innertable`.`m` from (/* select#3 */ select max(`test`.`t1`.`id`) AS `m` from `test`.`t1` group by `test`.`t1`.`item_id`,`test`.`t1`.`seller_name`,`test`.`t1`.`variant`) `innertable` where true having true ), <primary_index_lookup>(`test`.`t1`.`id` in <temporary table> on <auto_distinct_key> where ((`test`.`t1`.`id` = `<materialized_subquery>`.`m`)))) is false)   mysql> explain analyze DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Delete from t1 (immediate) -> Filter: <in_optimizer>(t1.id,t1.id in (select #2) is false) (cost=1015.12 rows=10000) (actual time=75.521..75.521 rows=0 loops=1) -> Table scan on t1 (cost=1015.12 rows=10000) (actual time=0.023..17.489 rows=10000 loops=1) -> Select #2 (subquery in condition; run only once) -> Filter: ((t1.id = `<materialized_subquery>`.m)) (actual time=0.000..0.000 rows=1 loops=10001) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=1 loops=10001) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (m=t1.id) (actual time=0.000..0.000 rows=1 loops=10001) -> Materialize with deduplication (actual time=0.006..0.006 rows=1 loops=10001) -> Table scan on innertable (actual time=0.001..0.336 rows=10000 loops=1) -> Materialize (actual time=49.180..49.904 rows=10000 loops=1) -> Table scan on <temporary> (actual time=0.001..0.652 rows=10000 loops=1) -> Aggregate using temporary table (actual time=47.329..48.414 rows=10000 loops=1) -> Table scan on t1 (cost=1015.12 rows=10000) (actual time=0.009..31.229 rows=10000 loops=1) | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.08 sec)   mysql> DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); Query OK, 0 rows affected (0.07 sec) with optimizer_switch='in_to_exists=off': MariaDB 10.5.9 MariaDB [test]> set optimizer_switch='in_to_exists=off'; Query OK, 0 rows affected (0.000 sec)   MariaDB [test]> analyze DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); +------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 0.00 | Using where | | 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 100.00 | | | 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 100.00 | Using temporary; Using filesort | +------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+ 3 rows in set (0.022 sec)
            ahmedadel47 Ahmed Wahba added a comment -

            Thanks Alice for you reply,

            indeed after disabling: set optimizer_switch='in_to_exists=off';

            the query worked, and it took 2m 35s .

            ahmedadel47 Ahmed Wahba added a comment - Thanks Alice for you reply, indeed after disabling: set optimizer_switch='in_to_exists=off'; the query worked, and it took 2m 35s .
            ahmedadel47 Ahmed Wahba added a comment -

            Hi Team, following up on this issue, has there been an update on it?

            Thanks,
            Ahmed

            ahmedadel47 Ahmed Wahba added a comment - Hi Team, following up on this issue, has there been an update on it? Thanks, Ahmed

            Sorry, no. Let me bump the priority

            serg Sergei Golubchik added a comment - Sorry, no. Let me bump the priority
            ahmedadel47 Ahmed Wahba added a comment -

            Hi Team, thanks for verifying the issue.

            what is the way forward? is there a target date? target fix version?

            Thanks.

            ahmedadel47 Ahmed Wahba added a comment - Hi Team, thanks for verifying the issue. what is the way forward? is there a target date? target fix version? Thanks.

            Looks like the same issue as MDEV-22415.

            psergei Sergei Petrunia added a comment - Looks like the same issue as MDEV-22415 .
            ycp Yuchen Pei added a comment - - edited

            A poc solution, following the ideas at https://jira.mariadb.org/browse/MDEV-22415?focusedCommentId=288641&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-288641

            d28236bc26e bb-10.5-mdev-25008 MDEV-25008 [poc] Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            ycp Yuchen Pei added a comment - - edited A poc solution, following the ideas at https://jira.mariadb.org/browse/MDEV-22415?focusedCommentId=288641&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-288641 d28236bc26e bb-10.5-mdev-25008 MDEV-25008 [poc] Estimate outer_lookup_keys later in mysql_delete and mysql_update
            ycp Yuchen Pei added a comment - - edited

            After comparing 10.5 with 11.1, I have a different patch that fixes this testcase and a similar update statement:

            modified   sql/sql_derived.cc
            @@ -850,7 +850,7 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived)
               */
               thd->create_tmp_table_for_derived= TRUE;
               if (!(derived->table) &&
            -      derived->derived_result->create_result_table(thd, &unit->types, FALSE,
            +      derived->derived_result->create_result_table(thd, &unit->types, unit->distinct,
                                                                (first_select->options |
                                                                thd->variables.option_bits |
                                                                TMP_TABLE_ALL_COLUMNS),

            See also monty's change below (with no MDEV reference) that was pushed to 11.0:

            commit 5e5a8eda1641eda1d915a7eb5736e494d2179795
            Author: Monty <monty@mariadb.org>
            Date:   Wed May 4 17:26:43 2022 +0300
             
                Derived tables and union can now create distinct keys
                
                The idea is that instead of marking all select_lex's with DISTINCT, we
                only mark those that really need distinct result.
                
                Benefits of this change:
                - Temporary tables used with derived tables, UNION, IN are now smaller
                  as duplicates are removed already on the insert phase.
                - The optimizer can now produce better plans with EQ_REF. This can be
                  seen from the tests where several queries does not anymore materialize
                  derived tables twice.
                - Queries affected by 'in_predicate_conversion_threshold' where large IN
                  lists are converted to sub query produces better plans.
                
                Other things:
                - Removed on duplicate call to sel->init_select() in
                  LEX::add_primary_to_query_expression_body()
                - I moved the testing of
                  tab->table->pos_in_table_list->is_materialized_derived()
                  in join_read_const_table() to the caller as it caused problems for
                  derived tables that could be proven to be const tables.
                  This also is likely to fix some bugs as if join_read_const_table()
                  was aborted, the table was left marked as JT_CONST, which cannot
                  be good.  I added an ASSERT there for now that can be removed when
                  the code has been properly tested.

            Performance-wise, let's consider this test:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1 (
              id int NOT NULL PRIMARY KEY,
              item_id varchar(100),
              seller_name varchar(400),
              variant varchar(400),
              FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant)
            )engine=innodb;
             
            insert into t1 select seq,seq,seq,seq from seq_1_to_10000;
             
            set profiling=1;
            DELETE FROM t1 WHERE id NOT IN
              (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
             
            UPDATE t1 SET item_id="foo" WHERE id NOT IN
              (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            show profiles;
            set profiling=0;
             
            drop table t1;

            And I note the output duration of show profiles for the DELETE and UPDATE queries respectively

            • at 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2
              11.38s (DELETE) + 11.30s (UPDATE)
            • at monty's original commit:
              0.87 + 0.76
            • at parent commit to monty's commit:
              10.92 + 10.96
            • at dc7d83a9005 bb-10.5-mdev-25008-monty-patch, which is the backport of monty's commit to 10.5
              1.18 + 1.07
            • at b402994270e bb-10.5-mdev-25008-unit-distinct (the diff in this comment, applied to 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2)
              1.17 + 1.01
            • at 687b7fd83f6 bb-10.5-mdev-25008-outer-lookup-keys (the poc patch in the previous comment, applied to 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2)
              1.16 + 1.06

            From this we could conclude that the two fixes result in similar performance improvements, and that monty's commit seems to be the reason that the bug is not present in 11.0+.

            In terms of tests, bb-10.5-mdev-25008-unit-distinct b402994270ea89f5e186a430fa8810d29cbcc4db fails a massive number of tests. bb-10.5-mdev-25008-monty-patch dc7d83a90056f82c92817c3e555ad4ba86a33c94 fails an assertion in main.derived_opt. And bb-10.5-mdev-25008-outer-lookup-keys looks ok.

            ycp Yuchen Pei added a comment - - edited After comparing 10.5 with 11.1, I have a different patch that fixes this testcase and a similar update statement: modified sql/sql_derived.cc @@ -850,7 +850,7 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived) */ thd->create_tmp_table_for_derived= TRUE; if (!(derived->table) && - derived->derived_result->create_result_table(thd, &unit->types, FALSE, + derived->derived_result->create_result_table(thd, &unit->types, unit->distinct, (first_select->options | thd->variables.option_bits | TMP_TABLE_ALL_COLUMNS), See also monty's change below (with no MDEV reference) that was pushed to 11.0: commit 5e5a8eda1641eda1d915a7eb5736e494d2179795 Author: Monty <monty@mariadb.org> Date: Wed May 4 17:26:43 2022 +0300   Derived tables and union can now create distinct keys The idea is that instead of marking all select_lex's with DISTINCT, we only mark those that really need distinct result. Benefits of this change: - Temporary tables used with derived tables, UNION, IN are now smaller as duplicates are removed already on the insert phase. - The optimizer can now produce better plans with EQ_REF. This can be seen from the tests where several queries does not anymore materialize derived tables twice. - Queries affected by 'in_predicate_conversion_threshold' where large IN lists are converted to sub query produces better plans. Other things: - Removed on duplicate call to sel->init_select() in LEX::add_primary_to_query_expression_body() - I moved the testing of tab->table->pos_in_table_list->is_materialized_derived() in join_read_const_table() to the caller as it caused problems for derived tables that could be proven to be const tables. This also is likely to fix some bugs as if join_read_const_table() was aborted, the table was left marked as JT_CONST, which cannot be good. I added an ASSERT there for now that can be removed when the code has been properly tested. Performance-wise, let's consider this test: --source include/have_innodb.inc --source include/have_sequence.inc   CREATE TABLE t1 ( id int NOT NULL PRIMARY KEY , item_id varchar (100), seller_name varchar (400), variant varchar (400), FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant) )engine=innodb;   insert into t1 select seq,seq,seq,seq from seq_1_to_10000;   set profiling=1; DELETE FROM t1 WHERE id NOT IN ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);   UPDATE t1 SET item_id= "foo" WHERE id NOT IN ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); show profiles; set profiling=0;   drop table t1; And I note the output duration of show profiles for the DELETE and UPDATE queries respectively at 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2 11.38s (DELETE) + 11.30s (UPDATE) at monty's original commit: 0.87 + 0.76 at parent commit to monty's commit: 10.92 + 10.96 at dc7d83a9005 bb-10.5-mdev-25008-monty-patch, which is the backport of monty's commit to 10.5 1.18 + 1.07 at b402994270e bb-10.5-mdev-25008-unit-distinct (the diff in this comment, applied to 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2) 1.17 + 1.01 at 687b7fd83f6 bb-10.5-mdev-25008-outer-lookup-keys (the poc patch in the previous comment, applied to 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2) 1.16 + 1.06 From this we could conclude that the two fixes result in similar performance improvements, and that monty's commit seems to be the reason that the bug is not present in 11.0+. In terms of tests, bb-10.5-mdev-25008-unit-distinct b402994270ea89f5e186a430fa8810d29cbcc4db fails a massive number of tests. bb-10.5-mdev-25008-monty-patch dc7d83a90056f82c92817c3e555ad4ba86a33c94 fails an assertion in main.derived_opt. And bb-10.5-mdev-25008-outer-lookup-keys looks ok.

            d28236bc26e bb-10.5-mdev-25008 MDEV-25008 [poc] Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            diff --git a/sql/opt_subselect.cc b/sql/opt_subselect.cc
            index 41f46beb54b..776ad34223b 100644
            --- a/sql/opt_subselect.cc
            +++ b/sql/opt_subselect.cc
            @@ -6606,11 +6606,8 @@ bool JOIN::choose_subquery_plan(table_map join_tables)
                 }
                 else
                 {
            -      /*
            -        TODO: outer_join can be NULL for DELETE statements.
            -        How to compute its cost?
            -      */
            -      outer_lookup_keys= 1;
            +      outer_lookup_keys=
            +        (double) unit->outer_select()->table_list.first->table->stat_records();
                 }
             
                 /*
            

            So, for

            update world_population w
            where
              w.country_index='Finland' and
              w.column IN (SELECT ...)
            

            we will use outer_lookup_keys=6Billion even if the query is likely to use an index INDEX(country_index) and have a QUICK_RANGE_SELECT with rows=5 million.

            There a question if this rows=5M is stored anywhere, except in some local variable in mysql_update. If it is not, it should be.
            I guess, the answer to this question may be different in 11.1 where MDEV-28883 has introduced Sql_cmd_update/delete classes.

            psergei Sergei Petrunia added a comment - d28236bc26e bb-10.5-mdev-25008 MDEV-25008 [poc] Estimate outer_lookup_keys later in mysql_delete and mysql_update diff --git a/sql/opt_subselect.cc b/sql/opt_subselect.cc index 41f46beb54b..776ad34223b 100644 --- a/sql/opt_subselect.cc +++ b/sql/opt_subselect.cc @@ -6606,11 +6606,8 @@ bool JOIN::choose_subquery_plan(table_map join_tables) } else { - /* - TODO: outer_join can be NULL for DELETE statements. - How to compute its cost? - */ - outer_lookup_keys= 1; + outer_lookup_keys= + (double) unit->outer_select()->table_list.first->table->stat_records(); } /* So, for update world_population w where w.country_index= 'Finland' and w. column IN ( SELECT ...) we will use outer_lookup_keys=6Billion even if the query is likely to use an index INDEX(country_index) and have a QUICK_RANGE_SELECT with rows=5 million. There a question if this rows=5M is stored anywhere, except in some local variable in mysql_update. If it is not, it should be. I guess , the answer to this question may be different in 11.1 where MDEV-28883 has introduced Sql_cmd_update/delete classes.

            A testcase which still fails for me in 11.6

            create table t1 (a int, b int);
            insert into t1 select seq,seq from seq_1_to_1000;
             
            create table t2 (a int, b int);
            insert into t2 select seq,floor(seq/100) from seq_1_to_20000;
             
            create table t3 (a int, b int, key(a));
            insert into t3 select seq,floor(seq/100) from seq_1_to_20000;
            

            explain 
            select * from t1
            where 
              b>3 or
              a in (select t3.b from t2, t3 where t3.a=t2.b and t2.a<99999);
            

            +------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+
            | id   | select_type  | table | type | possible_keys | key  | key_len | ref      | rows  | Extra       |
            +------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+
            |    1 | PRIMARY      | t1    | ALL  | NULL          | NULL | NULL    | NULL     | 1000  | Using where |
            |    2 | MATERIALIZED | t2    | ALL  | NULL          | NULL | NULL    | NULL     | 19874 | Using where |
            |    2 | MATERIALIZED | t3    | ref  | a             | a    | 5       | j20.t2.b | 1     |             |
            +------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+
            

            Ok.
            Now, let's try DELETE

            explain 
            delete from t1
            where 
              b>3 or
              a in (select t3.b from t2, t3 where t3.a=t2.b and t2.a<99999);
            

            +------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref      | rows  | Extra       |
            +------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+
            |    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL     | 1000  | Using where |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL     | 19874 | Using where |
            |    2 | DEPENDENT SUBQUERY | t3    | ref  | a             | a    | 5       | j20.t2.b | 1     | Using where |
            +------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+
            

            and see that it uses IN->EXISTS which looks to be obviously worse than Materialization.

            psergei Sergei Petrunia added a comment - A testcase which still fails for me in 11.6 create table t1 (a int , b int ); insert into t1 select seq,seq from seq_1_to_1000;   create table t2 (a int , b int ); insert into t2 select seq,floor(seq/100) from seq_1_to_20000;   create table t3 (a int , b int , key (a)); insert into t3 select seq,floor(seq/100) from seq_1_to_20000; explain select * from t1 where b>3 or a in ( select t3.b from t2, t3 where t3.a=t2.b and t2.a<99999); +------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 19874 | Using where | | 2 | MATERIALIZED | t3 | ref | a | a | 5 | j20.t2.b | 1 | | +------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+ Ok. Now, let's try DELETE explain delete from t1 where b>3 or a in ( select t3.b from t2, t3 where t3.a=t2.b and t2.a<99999); +------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 19874 | Using where | | 2 | DEPENDENT SUBQUERY | t3 | ref | a | a | 5 | j20.t2.b | 1 | Using where | +------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+ and see that it uses IN->EXISTS which looks to be obviously worse than Materialization.

            Debugging the above, I can see the execution to enter JOIN(select_lex->select_number=2)::choose_subquery_plan().
            It reaches this if statement and takes does NOT take it:

                  if (outer_join && outer_join->table_count > 0 && // (1)
                      outer_join->join_tab &&                      // (2)
                      !in_subs->const_item())
                  {
                    outer_join->get_partial_cost_and_fanout(in_subs->get_join_tab_idx(),
                                                            table_map(-1),
                                                            &dummy,
                                                            &outer_lookup_keys);
            

            Why does it not take it? Because outer_join->join_tab==0...

            (gdb) p outer_join->table_count
              $13 = 1
            (gdb) p outer_join->join_tab
              $14 = (JOIN_TAB *) 0x0
            (gdb) p in_subs->const_item()
              $15 = false
            

            psergei Sergei Petrunia added a comment - Debugging the above, I can see the execution to enter JOIN(select_lex->select_number=2)::choose_subquery_plan(). It reaches this if statement and takes does NOT take it: if (outer_join && outer_join->table_count > 0 && // (1) outer_join->join_tab && // (2) !in_subs->const_item()) { outer_join->get_partial_cost_and_fanout(in_subs->get_join_tab_idx(), table_map(-1), &dummy, &outer_lookup_keys); Why does it not take it? Because outer_join->join_tab==0... (gdb) p outer_join->table_count $13 = 1 (gdb) p outer_join->join_tab $14 = (JOIN_TAB *) 0x0 (gdb) p in_subs->const_item() $15 = false
            psergei Sergei Petrunia added a comment - - edited

            ... and if I debug ANALYZE DELETE... I see that outer_join->join_tab remains NULL for the entire duration of the statement... Looks like the join object is an imitation ?

            I'm wondering on which condition JOIN object stops being an imitation? if the top-level JOIN has more than one table? In case it has one table, should we continue the imitation and create JOIN_TAB and POSITION and fill them with data describing the single-table access? Or, don't imitate and have JOIN have a pointer to Sql_cmd_delete/update so that choose_subquery_plan() can fetch the real number of subquery invocations from there?

            psergei Sergei Petrunia added a comment - - edited ... and if I debug ANALYZE DELETE... I see that outer_join->join_tab remains NULL for the entire duration of the statement... Looks like the join object is an imitation ? I'm wondering on which condition JOIN object stops being an imitation? if the top-level JOIN has more than one table? In case it has one table, should we continue the imitation and create JOIN_TAB and POSITION and fill them with data describing the single-table access? Or, don't imitate and have JOIN have a pointer to Sql_cmd_delete/update so that choose_subquery_plan() can fetch the real number of subquery invocations from there?
            ycp Yuchen Pei added a comment -

            Hi psergei, following up the discussions on tuesday, ptal at this patch thanks

            0190cb9ad3d bb-11.1-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            Two questions:

            • the bug is for 10.5+, but this change requires MDEV-28883 which is 11.1+ - do we need to devise a separate patch for 10.5?
            • main.ps and main.ps_mem_leaks fail in multiple CI builders. The failure (Assertion `(mem_root->flags & 4) == 0' failed) is bizarre and I could not reproduce it locally - do you have any clue what may have caused it? Here are a list of builder:
              • amd64-windows
              • amd64-ubuntu-2004-debug
              • amd64-ubuntu-2204-debug-ps
              • amd64-debian-11-msan
              • amd64-debian-11-msan-clang-16
            ycp Yuchen Pei added a comment - Hi psergei , following up the discussions on tuesday, ptal at this patch thanks 0190cb9ad3d bb-11.1-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update Two questions: the bug is for 10.5+, but this change requires MDEV-28883 which is 11.1+ - do we need to devise a separate patch for 10.5? main.ps and main.ps_mem_leaks fail in multiple CI builders. The failure (Assertion `(mem_root->flags & 4) == 0' failed) is bizarre and I could not reproduce it locally - do you have any clue what may have caused it? Here are a list of builder: amd64-windows amd64-ubuntu-2004-debug amd64-ubuntu-2204-debug-ps amd64-debian-11-msan amd64-debian-11-msan-clang-16
            ycp Yuchen Pei added a comment - - edited

            The main.ps assertion failure (in a MDEV-8756 test case but the ONLY_FULL_GROUP_BY is not needed to get the assertion failure) can only be reproduced with the flag PROTECT_STATEMENT_MEMROOT, introduced in MDEV-14959. The failing assertion used to be protected by this flag, but the protection was removed in the commit below. Perhaps we should re-protect the assertion with the flag for consistency, also considering that all the other uses of ROOT_FLAG_READ_ONLY are protected by this flag? cc serg

            commit 69d78cd3f858748c0ee5ccd6ae1e77cfc063d57b
            Author: Sergei Golubchik <serg@mariadb.org>
            Date:   Thu Nov 23 09:56:56 2023 +0100
             
                move MEM_ROOT::read_only into flags
            

            @@@ -281,42 -209,16 +272,37 @@@ void *alloc_root(MEM_ROOT *mem_root, si
                DBUG_ENTER("alloc_root");
                DBUG_PRINT("enter",("root: %p", mem_root));
                DBUG_ASSERT(alloc_root_inited(mem_root));
            -
            - #ifdef PROTECT_STATEMENT_MEMROOT
            -   DBUG_ASSERT(mem_root->read_only == 0);
            - #endif
            +   DBUG_ASSERT((mem_root->flags & ROOT_FLAG_READ_ONLY) == 0);
            

            ycp Yuchen Pei added a comment - - edited The main.ps assertion failure (in a MDEV-8756 test case but the ONLY_FULL_GROUP_BY is not needed to get the assertion failure) can only be reproduced with the flag PROTECT_STATEMENT_MEMROOT , introduced in MDEV-14959 . The failing assertion used to be protected by this flag, but the protection was removed in the commit below. Perhaps we should re-protect the assertion with the flag for consistency, also considering that all the other uses of ROOT_FLAG_READ_ONLY are protected by this flag? cc serg commit 69d78cd3f858748c0ee5ccd6ae1e77cfc063d57b Author: Sergei Golubchik <serg@mariadb.org> Date: Thu Nov 23 09:56:56 2023 +0100   move MEM_ROOT::read_only into flags @@@ -281,42 -209,16 +272,37 @@@ void *alloc_root(MEM_ROOT *mem_root, si DBUG_ENTER("alloc_root"); DBUG_PRINT("enter",("root: %p", mem_root)); DBUG_ASSERT(alloc_root_inited(mem_root)); - - #ifdef PROTECT_STATEMENT_MEMROOT - DBUG_ASSERT(mem_root->read_only == 0); - #endif + DBUG_ASSERT((mem_root->flags & ROOT_FLAG_READ_ONLY) == 0);

            The Assert means that memory leaks in PS/SP, so you should need to find why. In fact it mean that it continue to allocate memory on the statement mem_root after first execution.

            sanja Oleksandr Byelkin added a comment - The Assert means that memory leaks in PS/SP, so you should need to find why. In fact it mean that it continue to allocate memory on the statement mem_root after first execution.
            serg Sergei Golubchik added a comment - - edited

            ycp, mem_root->read_only didn't exist unless PROTECT_STATEMENT_MEMROOT was defined, so any access had to be protected with an #ifdef. But mem_root->flags is always declared, there is no need to protect accesses to it.

            Note that ROOT_FLAG_READ_ONLY is always set under #ifdef PROTECT_STATEMENT_MEMROOT, so without it the assert should always be satisfied.

            serg Sergei Golubchik added a comment - - edited ycp , mem_root->read_only didn't exist unless PROTECT_STATEMENT_MEMROOT was defined, so any access had to be protected with an #ifdef . But mem_root->flags is always declared, there is no need to protect accesses to it. Note that ROOT_FLAG_READ_ONLY is always set under #ifdef PROTECT_STATEMENT_MEMROOT , so without it the assert should always be satisfied.
            ycp Yuchen Pei added a comment - - edited

            serg, exactly, the assertion can only fail when the compiler flag PROTECT_STATEMENT_MEMROOT is on, which seems to be defeating the purpose of an assertion. It can also be confusing if it fails in the CI but not locally, as has happened to me. The confusion can be mitigated by placing a code comment "this assertion is only failable with the PROTECT_STATEMENT_MEMROOT flag set". But overall it seems rather inconsistent - what is the downside of protecting the assertion again with PROTECT_STATEMENT_MEMROOT?

            ycp Yuchen Pei added a comment - - edited serg , exactly, the assertion can only fail when the compiler flag PROTECT_STATEMENT_MEMROOT is on, which seems to be defeating the purpose of an assertion. It can also be confusing if it fails in the CI but not locally, as has happened to me. The confusion can be mitigated by placing a code comment "this assertion is only failable with the PROTECT_STATEMENT_MEMROOT flag set". But overall it seems rather inconsistent - what is the downside of protecting the assertion again with PROTECT_STATEMENT_MEMROOT ?
            ycp Yuchen Pei added a comment -

            Hi psergei, I've updated the patch according to the discussion yesterday. Note that the call to optimize_unflattened_subqueries(false) in case of no matching records needs to happen earlier as otherwise the the inner join would be destroyed in free_underlaid_joins() causing the optimization to be skipped below:

            // in st_select_lex::optimize_unflattened_subqueries:
                    if (!inner_join)
                      continue;
            

            ptal thanks:

            7a42d1299ff upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
            6040dfcdd07 MDEV-25008 Re-protect assertion of not ROOT_FLAG_READ_ONLY
            

            ycp Yuchen Pei added a comment - Hi psergei , I've updated the patch according to the discussion yesterday. Note that the call to optimize_unflattened_subqueries(false) in case of no matching records needs to happen earlier as otherwise the the inner join would be destroyed in free_underlaid_joins() causing the optimization to be skipped below: // in st_select_lex::optimize_unflattened_subqueries: if (!inner_join) continue ; ptal thanks: 7a42d1299ff upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update 6040dfcdd07 MDEV-25008 Re-protect assertion of not ROOT_FLAG_READ_ONLY
            psergei Sergei Petrunia added a comment - - edited

            ycp, ok I've checked the

             6040dfcdd07 MDEV-25008 Re-protect assertion of not ROOT_FLAG_READ_ONLY
            

            before reading the comments above and I agree with serg that this patch is not needed.

            Checking of ROOT_FLAG_READ_ONLY is a feature of MEM_ROOT code (which is in mysys).

            Currently there is only one user - PROTECT_STATEMENT_MEMROOT. Note that "statement" is a concept from sql/. The code in mysys/ has no idea about statements, so use of PROTECT_STATEMENT_MEMROOT looks out-of-place in mysys/my_alloc.c.

            For me, PROTECT_STATEMENT_MEMROOT is automatically enabled in a debug build. (EDIT but not in 11.6-based tree for some reason ...)

            psergei Sergei Petrunia added a comment - - edited ycp , ok I've checked the 6040dfcdd07 MDEV-25008 Re-protect assertion of not ROOT_FLAG_READ_ONLY before reading the comments above and I agree with serg that this patch is not needed. Checking of ROOT_FLAG_READ_ONLY is a feature of MEM_ROOT code (which is in mysys). Currently there is only one user - PROTECT_STATEMENT_MEMROOT. Note that "statement" is a concept from sql/. The code in mysys/ has no idea about statements, so use of PROTECT_STATEMENT_MEMROOT looks out-of-place in mysys/my_alloc.c. For me, PROTECT_STATEMENT_MEMROOT is automatically enabled in a debug build. ( EDIT but not in 11.6-based tree for some reason ...)
            psergei Sergei Petrunia added a comment - - edited

            As for

            7a42d1299ff upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            --- a/sql/sql_cmd.h
            +++ b/sql/sql_cmd.h
            @@ -324,10 +324,12 @@ class Sql_cmd_dml : public Sql_cmd
             
               select_result *get_result() { return result; }
             
            +  my_off_t get_scanned_rows() { return scanned_rows; }
            +
            

            Please use ha_rows. This is the type for row counts.

            diff --git a/sql/sql_update.cc b/sql/sql_update.cc
            index ea07caabf35..324761424e8 100644
            --- a/sql/sql_update.cc
            +++ b/sql/sql_update.cc
            @@ -423,7 +423,7 @@ bool Sql_cmd_update::update_single_table(THD *thd)
               switch_to_nullable_trigger_fields(*values, table);
             
               /* Apply the IN=>EXISTS transformation to all subqueries and optimize them */
            -  if (select_lex->optimize_unflattened_subqueries(false))
            +  if (select_lex->optimize_unflattened_subqueries(true))
                 DBUG_RETURN(TRUE);
            

            Fix comment. It's "all constant subqueries" now.
            Please also make the comments the same in sql_update.cc and sql_delete.cc. It's not very nice but if the code is duplicated we need to have duplicate comments too.

            @@ -539,8 +545,18 @@ bool Sql_cmd_update::update_single_table(THD *thd)
             
               table->update_const_key_parts(conds);
               order= simple_remove_const(order, conds);
            -  query_plan.scanned_rows= select? select->records: table->file->stats.records;
            -        
            +
            +  /*
            +    Estimate the number of scanned rows and have it accessible in
            +    JOIN::choose_subquery_plan() from the outer join through
            +    JOIN::sql_cmd_dml
            +  */
            +  scanned_rows= query_plan.scanned_rows= select ?
            +    select->records : table->file->stats.records;
            +  select_lex->join->sql_cmd_dml= (Sql_cmd_dml *) this;
            

            Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me).

            +    /*
            +      In case of a DELETE or UPDATE, get number of scanned rows as an
            +      (upper bound) estimate of how many times the subquery will be
            +      executed.
            +    */
            +    else if (outer_join && outer_join->sql_cmd_dml)
            +      outer_lookup_keys= (double) outer_join->sql_cmd_dml->get_scanned_rows();
            

            Please use rows2double.

            psergei Sergei Petrunia added a comment - - edited As for 7a42d1299ff upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update --- a/sql/sql_cmd.h +++ b/sql/sql_cmd.h @@ -324,10 +324,12 @@ class Sql_cmd_dml : public Sql_cmd select_result *get_result() { return result; } + my_off_t get_scanned_rows() { return scanned_rows; } + Please use ha_rows. This is the type for row counts. diff --git a/sql/sql_update.cc b/sql/sql_update.cc index ea07caabf35..324761424e8 100644 --- a/sql/sql_update.cc +++ b/sql/sql_update.cc @@ -423,7 +423,7 @@ bool Sql_cmd_update::update_single_table(THD *thd) switch_to_nullable_trigger_fields(*values, table); /* Apply the IN=>EXISTS transformation to all subqueries and optimize them */ - if (select_lex->optimize_unflattened_subqueries(false)) + if (select_lex->optimize_unflattened_subqueries(true)) DBUG_RETURN(TRUE); Fix comment. It's "all constant subqueries" now. Please also make the comments the same in sql_update.cc and sql_delete.cc. It's not very nice but if the code is duplicated we need to have duplicate comments too. @@ -539,8 +545,18 @@ bool Sql_cmd_update::update_single_table(THD *thd) table->update_const_key_parts(conds); order= simple_remove_const(order, conds); - query_plan.scanned_rows= select? select->records: table->file->stats.records; - + + /* + Estimate the number of scanned rows and have it accessible in + JOIN::choose_subquery_plan() from the outer join through + JOIN::sql_cmd_dml + */ + scanned_rows= query_plan.scanned_rows= select ? + select->records : table->file->stats.records; + select_lex->join->sql_cmd_dml= (Sql_cmd_dml *) this; Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me). + /* + In case of a DELETE or UPDATE, get number of scanned rows as an + (upper bound) estimate of how many times the subquery will be + executed. + */ + else if (outer_join && outer_join->sql_cmd_dml) + outer_lookup_keys= (double) outer_join->sql_cmd_dml->get_scanned_rows(); Please use rows2double.

            So I take the current branch, compile with {{cmake ... -DWITH_PROTECT_STATEMENT_MEMROOT=1 }}, apply this patch:

            diff --git a/mysql-test/main/ps.test b/mysql-test/main/ps.test
            index 9cb9eaf3e76..1c903583d59 100644
            --- a/mysql-test/main/ps.test
            +++ b/mysql-test/main/ps.test
            @@ -3658,7 +3658,7 @@ CREATE TABLE t2 ( id INT(10) );
             SET @save_sql_mode= @@sql_mode;
             SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';
             
            -PREPARE stmt FROM 'UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id)'; 
            +PREPARE stmt FROM 'explain UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id)'; 
             execute stmt;
             insert into t1 values (1,10),(2,10),(3,10);
             insert into t2 values (1),(2);
            

            and I still see the crash.
            ycp, we need test coverage for all cases where execution of UPDATE/DELETE takes some non-regular path. These are : EXPLAIN (like shown above), test coverage for the impossible WHERE piece:

            if (conds)                                                                                 
              {                                                                                          
                Item::cond_result result;
                conds= conds->remove_eq_conds(thd, &result, true);                                       
                if (result == Item::COND_FALSE)             // Impossible where                          
                {
                  limit= 0;
                  query_plan.set_impossible_where();                                                     
                  if (thd->lex->describe || thd->lex->analyze_stmt)                                      
                    goto produce_explain_and_leave;
                } 
              } 
            

            is this:

            set @var1=1; 
            set @var2=2;
            CREATE TABLE t1 ( id INT(10), value INT(10) );
            CREATE TABLE t2 ( id INT(10) );
            insert into t1 values (1,10),(2,10),(3,10);
            insert into t2 values (1),(2);
             
            PREPARE stmt FROM 'UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id) WHERE  ?=?';
            execute stmt using @var1, @var2;
            execute stmt using @var1, @var1;
            deallocate prepare stmt;
            DROP TABLE t1,t2;
            

            also test coverage for the case where prune_partitions() produces empty set for the first execution and non-empty for the second.

            psergei Sergei Petrunia added a comment - So I take the current branch, compile with {{cmake ... -DWITH_PROTECT_STATEMENT_MEMROOT=1 }}, apply this patch: diff --git a/mysql-test/main/ps.test b/mysql-test/main/ps.test index 9cb9eaf3e76..1c903583d59 100644 --- a/mysql-test/main/ps.test +++ b/mysql-test/main/ps.test @@ -3658,7 +3658,7 @@ CREATE TABLE t2 ( id INT(10) ); SET @save_sql_mode= @@sql_mode; SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY'; -PREPARE stmt FROM 'UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id)'; +PREPARE stmt FROM 'explain UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id)'; execute stmt; insert into t1 values (1,10),(2,10),(3,10); insert into t2 values (1),(2); and I still see the crash. ycp , we need test coverage for all cases where execution of UPDATE/DELETE takes some non-regular path. These are : EXPLAIN (like shown above), test coverage for the impossible WHERE piece: if (conds) { Item::cond_result result; conds= conds->remove_eq_conds(thd, &result, true ); if (result == Item::COND_FALSE) // Impossible where { limit= 0; query_plan.set_impossible_where(); if (thd->lex->describe || thd->lex->analyze_stmt) goto produce_explain_and_leave; } } is this: set @var1=1; set @var2=2; CREATE TABLE t1 ( id INT (10), value INT (10) ); CREATE TABLE t2 ( id INT (10) ); insert into t1 values (1,10),(2,10),(3,10); insert into t2 values (1),(2);   PREPARE stmt FROM 'UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id) WHERE ?=?' ; execute stmt using @var1, @var2; execute stmt using @var1, @var1; deallocate prepare stmt; DROP TABLE t1,t2; also test coverage for the case where prune_partitions() produces empty set for the first execution and non-empty for the second.
            ycp Yuchen Pei added a comment - - edited

            New patch:

            d295367afaf upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            Onto review comments:

            before reading the comments above and I agree with serg that this patch is not needed.

            Checking of ROOT_FLAG_READ_ONLY is a feature of MEM_ROOT code (which is in mysys).

            Currently there is only one user - PROTECT_STATEMENT_MEMROOT. Note that "statement" is a concept from sql/. The code in mysys/ has no idea about statements, so use of PROTECT_STATEMENT_MEMROOT looks out-of-place in mysys/my_alloc.c.

            OK, I've removed that patch.

            For me, PROTECT_STATEMENT_MEMROOT is automatically enabled in a debug build. (EDIT but not in 11.6-based tree for some reason ...)

            The following patch made it so that the flag is enabled by default in 10.6-11.1, but not 10.5 or 11.2+ - why is that, sanja?

            commit 8ed3c375929b449ef8556cfa0c49a35707d59b84
            Author: Oleksandr Byelkin <sanja@mariadb.com>
            Date:   Thu Jul 4 09:27:30 2024 +0200
             
                Make PROTECT_STATEMENT_MEMROOT default for version less then 11.2
            

            Please use ha_rows. This is the type for row counts.

            Done. I didn't use ha_rows to avoid introducing the first #include to the immaculate sql_cmd.h.

            Fix comment. It's "all constant subqueries" now.
            Please also make the comments the same in sql_update.cc and sql_delete.cc. It's not very nice but if the code is duplicated we need to have duplicate comments too.

            Done.

            Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me).

            To reduce code duplication. Done.

            Please use rows2double.

            Done.

            ycp, we need test coverage for all cases where execution of UPDATE/DELETE takes some non-regular path. These are : EXPLAIN (like shown above), test coverage for the impossible WHERE piece:

            Done.

            also test coverage for the case where prune_partitions() produces empty set for the first execution and non-empty for the second.

            Done-ish, as my current patch covers this code path now. However, I still get an assertion failure when prune_partitions() produces empty set for the first ps and non-empty for the second. Turned out it is an existing bug for 10.5+, so I opened MDEV-34757 and mark it as blocking this issue.

            ycp Yuchen Pei added a comment - - edited New patch: d295367afaf upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update Onto review comments: before reading the comments above and I agree with serg that this patch is not needed. Checking of ROOT_FLAG_READ_ONLY is a feature of MEM_ROOT code (which is in mysys). Currently there is only one user - PROTECT_STATEMENT_MEMROOT. Note that "statement" is a concept from sql/. The code in mysys/ has no idea about statements, so use of PROTECT_STATEMENT_MEMROOT looks out-of-place in mysys/my_alloc.c. OK, I've removed that patch. For me, PROTECT_STATEMENT_MEMROOT is automatically enabled in a debug build. ( EDIT but not in 11.6-based tree for some reason ...) The following patch made it so that the flag is enabled by default in 10.6-11.1, but not 10.5 or 11.2+ - why is that, sanja ? commit 8ed3c375929b449ef8556cfa0c49a35707d59b84 Author: Oleksandr Byelkin <sanja@mariadb.com> Date: Thu Jul 4 09:27:30 2024 +0200   Make PROTECT_STATEMENT_MEMROOT default for version less then 11.2 Please use ha_rows. This is the type for row counts. Done. I didn't use ha_rows to avoid introducing the first #include to the immaculate sql_cmd.h . Fix comment. It's "all constant subqueries" now. Please also make the comments the same in sql_update.cc and sql_delete.cc. It's not very nice but if the code is duplicated we need to have duplicate comments too. Done. Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me). To reduce code duplication. Done. Please use rows2double. Done. ycp , we need test coverage for all cases where execution of UPDATE/DELETE takes some non-regular path. These are : EXPLAIN (like shown above), test coverage for the impossible WHERE piece: Done. also test coverage for the case where prune_partitions() produces empty set for the first execution and non-empty for the second. Done-ish, as my current patch covers this code path now. However, I still get an assertion failure when prune_partitions() produces empty set for the first ps and non-empty for the second. Turned out it is an existing bug for 10.5+, so I opened MDEV-34757 and mark it as blocking this issue.
            ycp Yuchen Pei added a comment - - edited

            An updated patch, which compared to the one in the previous comments has added tests for delete, as well as explanatory comments in the test main.ps_mem_leaks describing the coverage of various calls to optimize_unflattened_subqueries(false):

            28ce0d1f062 upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            Note that it is on top of an MDEV-34757 patch, as the testcases overlap - all the PROTECT_STATEMENT_MEMROOT cases are in main.ps_mem_leaks. The patch of this issue does not introduce any new failures, and can be pushed without MDEV-34757, as long as we ensure the test coverage by including the testcases in the push.

            ycp Yuchen Pei added a comment - - edited An updated patch, which compared to the one in the previous comments has added tests for delete, as well as explanatory comments in the test main.ps_mem_leaks describing the coverage of various calls to optimize_unflattened_subqueries(false) : 28ce0d1f062 upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update Note that it is on top of an MDEV-34757 patch, as the testcases overlap - all the PROTECT_STATEMENT_MEMROOT cases are in main.ps_mem_leaks . The patch of this issue does not introduce any new failures, and can be pushed without MDEV-34757 , as long as we ensure the test coverage by including the testcases in the push.
            ycp Yuchen Pei added a comment - - edited

            The current patch causes a regression on the following 2nd ps execution test.

            # .opt file
            --enable-sql-safe-updates

            # .test file
            create table t1 (a int, b int, primary key (a), key (b));
            insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8);
            CREATE TABLE t2 ( id INT(10) );
            insert into t2 values (1),(2);
             
            # calls select_lex->optimize_unflattened_subqueries(false) in the err label
            set @@optimizer_switch="index_merge=off";
            set @var1=1;
            set @var2=2;
            prepare stmt from 'update t1 set b=(SELECT 1 FROM t2 WHERE id = t1.a) where 1=? or b=2';
            --error ER_UPDATE_WITHOUT_KEY_IN_SAFE_MODE
            execute stmt using @var1;
            execute stmt using @var2;
            drop table t1, t2;

            This is because, when there is an error, calling st_select_lex::optimize_unflattened_subqueries(false) will set first_cond_optimization for the inner select to false, but skip saving leaf tables:

            // in JOIN::optimize_inner():
                if (thd->is_error() ||
                  (!select_lex->leaf_tables_saved && select_lex->save_leaf_tables(thd)))
                {

            If this happens during the first ps execution, then at the second execution, it will cause an reconstruction of the leaf tables from the (not) saved leaf tables leaf_tables_exec:

            // in setup_tables():
              if (select_lex->first_cond_optimization)
              {
                // reconstruct leaf tables
              }
              else
              {
                // construct empty leaf tables from the empty select_lex->leaf_tables_exec
              }

            But it seems that the leaf tables should not be empty here, and the empty-ness is the cause of the segv in make_join_statistics().

            There are two fixes I can think of:

            1. Guard the call to st_select_lex::optimize_unflattened_subqueries(false) with a check on thd->is_error(). This will not result in mem leak failures in the second ps execution as seen before, because failures in the first ps execution causes the memroot to not be marked as read only:

               // in Prepared_statement::execute_loop():
               #ifdef PROTECT_STATEMENT_MEMROOT
                 if (!error)
                 {
                   mem_root->flags |= ROOT_FLAG_READ_ONLY;
                   ++executed_counter;
               
                   DBUG_PRINT("info", ("execute counter: %lu", executed_counter));
                 }
                 else
                 {
                   // Error on call shouldn't be counted as a normal run
                   executed_counter= 0;
                   mem_root->flags &= ~ROOT_FLAG_READ_ONLY;
                 }
               #endif
               

            2. Add a check for select_lex->leaf_table_saved in setup_tables() before deciding how to construct leaf tables to prevent the scenario where select_lex->first_cond_optimization is set to false but the call to save_leaf_tables() is skipped:

               // in setup_tables():
                 if (select_lex->first_cond_optimization || !select_lex->leaf_table_saved)
                 {
               

            We can incorporate both fixes if no regressions are caused.

            ycp Yuchen Pei added a comment - - edited The current patch causes a regression on the following 2nd ps execution test. # .opt file --enable-sql-safe-updates # .test file create table t1 (a int , b int , primary key (a), key (b)); insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8); CREATE TABLE t2 ( id INT (10) ); insert into t2 values (1),(2);   # calls select_lex->optimize_unflattened_subqueries( false ) in the err label set @@optimizer_switch= "index_merge=off" ; set @var1=1; set @var2=2; prepare stmt from 'update t1 set b=(SELECT 1 FROM t2 WHERE id = t1.a) where 1=? or b=2' ; --error ER_UPDATE_WITHOUT_KEY_IN_SAFE_MODE execute stmt using @var1; execute stmt using @var2; drop table t1, t2; This is because, when there is an error, calling st_select_lex::optimize_unflattened_subqueries(false) will set first_cond_optimization for the inner select to false, but skip saving leaf tables: // in JOIN::optimize_inner(): if (thd->is_error() || (!select_lex->leaf_tables_saved && select_lex->save_leaf_tables(thd))) { If this happens during the first ps execution, then at the second execution, it will cause an reconstruction of the leaf tables from the (not) saved leaf tables leaf_tables_exec : // in setup_tables(): if (select_lex->first_cond_optimization) { // reconstruct leaf tables } else { // construct empty leaf tables from the empty select_lex->leaf_tables_exec } But it seems that the leaf tables should not be empty here, and the empty-ness is the cause of the segv in make_join_statistics() . There are two fixes I can think of: 1. Guard the call to st_select_lex::optimize_unflattened_subqueries(false) with a check on thd->is_error() . This will not result in mem leak failures in the second ps execution as seen before, because failures in the first ps execution causes the memroot to not be marked as read only: // in Prepared_statement::execute_loop(): #ifdef PROTECT_STATEMENT_MEMROOT if (!error) { mem_root->flags |= ROOT_FLAG_READ_ONLY; ++executed_counter; DBUG_PRINT( "info" , ( "execute counter: %lu" , executed_counter)); } else { // Error on call shouldn't be counted as a normal run executed_counter= 0; mem_root->flags &= ~ROOT_FLAG_READ_ONLY; } #endif 2. Add a check for select_lex->leaf_table_saved in setup_tables() before deciding how to construct leaf tables to prevent the scenario where select_lex->first_cond_optimization is set to false but the call to save_leaf_tables() is skipped: // in setup_tables(): if (select_lex->first_cond_optimization || !select_lex->leaf_table_saved) { We can incorporate both fixes if no regressions are caused.
            ycp Yuchen Pei added a comment - - edited

            Hi psergei, ptal at this patch thanks

            fc52f5e8c50 upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
            

            As before it is based on an MDEV-34757 patch, but can progress independently

            It fails main.ps_3innodb main.ps_2myisam main.ps_4heap main.ps_5merge, but so does 11.6 (MDEV-34790).

            Note that the verbose low-level comments about which select_lex->optimize_unflattened_subqueries(false) is called in main.ps_mem_leaks is for development only and will be removed before push

            ycp Yuchen Pei added a comment - - edited Hi psergei , ptal at this patch thanks fc52f5e8c50 upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update As before it is based on an MDEV-34757 patch, but can progress independently It fails main.ps_3innodb main.ps_2myisam main.ps_4heap main.ps_5merge, but so does 11.6 ( MDEV-34790 ). Note that the verbose low-level comments about which select_lex->optimize_unflattened_subqueries(false) is called in main.ps_mem_leaks is for development only and will be removed before push

            ycp, input piece #1: why sql_cmd_update and sql_cmd_delete are needed? Here's a patch to unify them https://gist.github.com/spetrunia/d648e160d07568cda2e3969c9f428a3b . Any objections to applying it?

            psergei Sergei Petrunia added a comment - ycp , input piece #1: why sql_cmd_update and sql_cmd_delete are needed? Here's a patch to unify them https://gist.github.com/spetrunia/d648e160d07568cda2e3969c9f428a3b . Any objections to applying it?
            ycp Yuchen Pei added a comment - - edited

            > input piece #1: why sql_cmd_update and sql_cmd_delete are needed? Here's a patch to unify them https://gist.github.com/spetrunia/d648e160d07568cda2e3969c9f428a3b . Any objections to applying it?

            psergei: No objection, but this is how I did it in the first version of my patch. But you commented in your review:

            >> + select_lex->join->sql_cmd_dml= (Sql_cmd_dml *) this;
            > Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me).

            I understood it as using Sql_cmd_dml is less clear than using Sql_cmd_update and Sql_cmd_delete separately. Did you mean it was ok to use Sql_cmd_dml, but simply the cast was not needed?

            ycp Yuchen Pei added a comment - - edited > input piece #1: why sql_cmd_update and sql_cmd_delete are needed? Here's a patch to unify them https://gist.github.com/spetrunia/d648e160d07568cda2e3969c9f428a3b . Any objections to applying it? psergei : No objection, but this is how I did it in the first version of my patch. But you commented in your review: >> + select_lex->join->sql_cmd_dml= (Sql_cmd_dml *) this; > Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me). I understood it as using Sql_cmd_dml is less clear than using Sql_cmd_update and Sql_cmd_delete separately. Did you mean it was ok to use Sql_cmd_dml , but simply the cast was not needed?

            ycp, exactly, about the cast. Sorry for misunderstanding.

            psergei Sergei Petrunia added a comment - ycp , exactly, about the cast. Sorry for misunderstanding.

            Discussed this PROTECT_STATEMENT_MEMROOT and ROOT_FLAG_READ_ONLY issue
            with people on sql-processor call on Thursday.

            An alternative suggestion was floated:

            if statement took a shortcut: do not set ROOT_FLAG_READ_ONLY.

            They have had a similar issue with SP and its statements: SP has mem_root protection too, an SP execution will not necessarily execute all of its statements. So, they have this:

              #ifdef PROTECT_STATEMENT_MEMROOT
                if (!err_status)
                {
                  if (!(main_mem_root.flags & ROOT_FLAG_READ_ONLY) &&
                      has_all_instrs_executed())
                  { 
                    main_mem_root.flags |= ROOT_FLAG_READ_ONLY;
                  }
                  ++executed_counter;
            

            Note the has_all_instrs_executed() check.

            For regular prepared statement, the protection is activated in Prepared_statement::execute_loop:

              #ifdef PROTECT_STATEMENT_MEMROOT
                if (!error)
                {
                  mem_root->flags |= ROOT_FLAG_READ_ONLY;
                  ++executed_counter;
              
                  DBUG_PRINT("info", ("execute counter: %lu", executed_counter));
                }
                else
                {
                  // Error on call shouldn't be counted as a normal run
                  executed_counter= 0;
                  mem_root->flags &= ~ROOT_FLAG_READ_ONLY;
                }
            

            Note that it is not activated if there was an error...

            psergei Sergei Petrunia added a comment - Discussed this PROTECT_STATEMENT_MEMROOT and ROOT_FLAG_READ_ONLY issue with people on sql-processor call on Thursday. An alternative suggestion was floated: if statement took a shortcut: do not set ROOT_FLAG_READ_ONLY. They have had a similar issue with SP and its statements: SP has mem_root protection too, an SP execution will not necessarily execute all of its statements. So, they have this: #ifdef PROTECT_STATEMENT_MEMROOT if (!err_status) { if (!(main_mem_root.flags & ROOT_FLAG_READ_ONLY) && has_all_instrs_executed()) { main_mem_root.flags |= ROOT_FLAG_READ_ONLY; } ++executed_counter; Note the has_all_instrs_executed() check. For regular prepared statement, the protection is activated in Prepared_statement::execute_loop: #ifdef PROTECT_STATEMENT_MEMROOT if (!error) { mem_root->flags |= ROOT_FLAG_READ_ONLY; ++executed_counter; DBUG_PRINT( "info" , ( "execute counter: %lu" , executed_counter)); } else { // Error on call shouldn't be counted as a normal run executed_counter= 0; mem_root->flags &= ~ROOT_FLAG_READ_ONLY; } Note that it is not activated if there was an error...

            .. so we could have

            • introduced Prepared_statement::partial_execution,
            • Set that on any "partial" execution
            • Checked that here.

            I'm not sure which approach is better. The approach in the current patch has an advantage that UPDATEs work the same way as SELECTs...

            psergei Sergei Petrunia added a comment - .. so we could have introduced Prepared_statement::partial_execution , Set that on any "partial" execution Checked that here. I'm not sure which approach is better. The approach in the current patch has an advantage that UPDATEs work the same way as SELECTs...
            ycp Yuchen Pei added a comment -

            psergei: thanks for the alternative suggestion. It looks inferior to the existing solution though as it adds more complexity - what is the advantage of it?

            ycp Yuchen Pei added a comment - psergei : thanks for the alternative suggestion. It looks inferior to the existing solution though as it adds more complexity - what is the advantage of it?
            psergei Sergei Petrunia added a comment - - edited

            Take-aways from the call: decided against the alternative suggestion above. Keeping the current approach.

            Other input:

            After this, we can submit for testing.

            psergei Sergei Petrunia added a comment - - edited Take-aways from the call: decided against the alternative suggestion above. Keeping the current approach. Other input: Please act on this comment: https://jira.mariadb.org/browse/MDEV-25008?focusedCommentId=290099&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-290099 need_to_optimize is misleading. Please rename it to something like optimize_suqueries . After this, we can submit for testing.
            ycp Yuchen Pei added a comment - - edited

            Thanks for the review psergei.

            Hi lstartseva, ptal thanks:

            664cfb80717 upstream/bb-11.7-mdev-25008 MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization
            ca56c461703 MDEV-34757 Check leaf_tables_saved in partition pruning in UPDATE and DELETE
            

            ycp Yuchen Pei added a comment - - edited Thanks for the review psergei . Hi lstartseva , ptal thanks: 664cfb80717 upstream/bb-11.7-mdev-25008 MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization ca56c461703 MDEV-34757 Check leaf_tables_saved in partition pruning in UPDATE and DELETE
            psergei Sergei Petrunia added a comment - - edited

            ycp, the commit comment is very cryptic. Could you please change it to this, or something better:

            MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization
             
            Single-table UPDATE/DELETE didn't provide outer_lookup_keys value for
            subqueries. This didn't allow to make a meaningful choice between IN->EXISTS
            and Materialization strategies for subquery.
             
            Fix this:
            * Make UPDATE/DELETE savel Sql_cmd_dml::scanned_rows,
            * Then, subquery's JOIN::choose_subquery_plan() can fetch it from there.
             
            Details:
            UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries() twice,
            like SELECT does:
            1. Call with const_only=true before any optimizations. This allows range 
            optimizer and others to use the values of cheap const subqueries.
            2. Call it with const_only=false after range optimizer, partition pruning, etc.
            outer_lookup_keys value is provided, so it's possible to pick a good subquery strategy.
             
            Extra change: PROTECT_STATEMENT_MEMROOT requires that first SP execution performs
            subquery optimization for all subqueries, even for degenerate query plans like 
            "Impossible WHERE". Due to that, we call optimize_unflattened_subqueries (with 
            const_only=false) even for degenerate query plans.
            

            psergei Sergei Petrunia added a comment - - edited ycp , the commit comment is very cryptic. Could you please change it to this, or something better: MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization   Single-table UPDATE/DELETE didn't provide outer_lookup_keys value for subqueries. This didn't allow to make a meaningful choice between IN->EXISTS and Materialization strategies for subquery.   Fix this: * Make UPDATE/DELETE savel Sql_cmd_dml::scanned_rows, * Then, subquery's JOIN::choose_subquery_plan() can fetch it from there.   Details: UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries() twice, like SELECT does: 1. Call with const_only=true before any optimizations. This allows range optimizer and others to use the values of cheap const subqueries. 2. Call it with const_only=false after range optimizer, partition pruning, etc. outer_lookup_keys value is provided, so it's possible to pick a good subquery strategy.   Extra change: PROTECT_STATEMENT_MEMROOT requires that first SP execution performs subquery optimization for all subqueries, even for degenerate query plans like "Impossible WHERE". Due to that, we call optimize_unflattened_subqueries (with const_only=false) even for degenerate query plans.
            ycp Yuchen Pei added a comment -

            psergei, thanks, done. I made to two changes to your suggested message and updated my testing request with the new commit hashes:

            1. Elaborated how select does the call twice
            2. Reworded the "extra change" as it is not a change in behaviour, as previously the early call to optimize_unflattened_subqueries(false) ensured optimization for degnerate query plans

            ycp Yuchen Pei added a comment - psergei , thanks, done. I made to two changes to your suggested message and updated my testing request with the new commit hashes: 1. Elaborated how select does the call twice 2. Reworded the "extra change" as it is not a change in behaviour, as previously the early call to optimize_unflattened_subqueries(false) ensured optimization for degnerate query plans

            Case for comparison of execution time N=100000, rows for delete is about 48%, Release build:

            CREATE TABLE t1 (
              id int NOT NULL PRIMARY KEY,
              item_id varchar(100),
              seller_name varchar(400),
              variant varchar(400),
              FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant)
            )engine=InnoDB  DEFAULT CHARSET=utf8mb4;
             
            insert into t1 select
            seq,
            FLOOR(1 + (RAND() * 100)),
            FLOOR(1 + (RAND() * 100)),
            FLOOR(1 + (RAND() * 100))
            from seq_1_to_100000;
             
            select count(*) FROM t1;
            +----------+
            | count(*) |
            +----------+
            |   100000 |
            +----------+
             
            select count(*) FROM t1 WHERE id NOT IN    (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +----------+
            | count(*) |
            +----------+
            |    48140 |
            +----------+
            

            Before changes (optimizer_switch='in_to_exists=on'):

            MariaDB [test]> explain
                -> DELETE FROM t1 WHERE id NOT IN
                ->   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
            | id   | select_type | table      | type           | possible_keys | key          | key_len | ref  | rows   | Extra                           |
            +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
            |    1 | PRIMARY     | t1         | ALL            | NULL          | NULL         | NULL    | NULL | 100242 | Using where                     |
            |    2 | SUBQUERY    | <derived3> | index_subquery | NULL          | distinct_key | 5       | func | 20050  |                                 |
            |    3 | DERIVED     | t1         | ALL            | NULL          | NULL         | NULL    | NULL | 100242 | Using temporary; Using filesort |
            +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
             
            MariaDB [test]> analyze format=json
                -> DELETE FROM t1 WHERE id NOT IN
                ->   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
                
            | {
              "query_optimization": {
                "r_total_time_ms": 0.293148733
              },
              "query_block": {
                "select_id": 1,
                "r_total_time_ms": 1453.569477,
                "table": {
                  "delete": 1,
                  "table_name": "t1",
                  "access_type": "ALL",
                  "rows": 100242,
                  "r_rows": 100000,
                  "r_total_filtered": 48.14,
                  "r_total_time_ms": 185.3549061,
                  "r_engine_stats": {
                    "pages_accessed": 241506,
                    "pages_updated": 145092
                  },
                  "attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))",
                  "r_filtered": 48.14
                },
                "subqueries": [
                  {
                    "query_block": {
                      "select_id": 2,
                      "cost": 2.24023997,
                      "having_condition": "innertable.m is null",
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "<derived3>",
                            "access_type": "index_subquery",
                            "key": "distinct_key",
                            "key_length": "5",
                            "used_key_parts": ["m"],
                            "ref": ["func"],
                            "loops": 1,
                            "r_loops": 0,
                            "rows": 20050,
                            "r_rows": null,
                            "cost": 2.24023997,
                            "r_table_time_ms": 2.962863545,
                            "r_other_time_ms": 249.7625712,
                            "filtered": 100,
                            "r_total_filtered": null,
                            "r_filtered": null,
                            "materialized": {
                              "query_block": {
                                "select_id": 3,
                                "cost": 84.00059774,
                                "r_loops": 1,
                                "r_total_time_ms": 837.9879334,
                                "filesort": {
                                  "sort_key": "t1.item_id, t1.seller_name, t1.variant",
                                  "r_loops": 1,
                                  "r_total_time_ms": 264.7830164,
                                  "r_used_priority_queue": false,
                                  "r_output_rows": 51860,
                                  "r_sort_passes": 102,
                                  "r_buffer_size": "253Kb",
                                  "r_sort_mode": "sort_key,rowid",
                                  "temporary_table": {
                                    "nested_loop": [
                                      {
                                        "table": {
                                          "table_name": "t1",
                                          "access_type": "ALL",
                                          "loops": 1,
                                          "r_loops": 1,
                                          "rows": 100242,
                                          "r_rows": 100000,
                                          "cost": 16.5964832,
                                          "r_table_time_ms": 32.3232741,
                                          "r_other_time_ms": 288.0603587,
                                          "r_engine_stats": {
                                            "pages_accessed": 267
                                          },
                                          "filtered": 100,
                                          "r_total_filtered": 100,
                                          "r_filtered": 100
                                        }
                                      }
                                    ]
                                  }
                                }
                              }
                            }
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            } |
            

            Before changes (optimizer_switch='in_to_exists=off'):

            MariaDB [test]> explain
                -> DELETE FROM t1 WHERE id NOT IN
                ->   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
            | id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                           |
            +------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
            |    1 | PRIMARY      | t1         | ALL  | NULL          | NULL | NULL    | NULL | 97701 | Using where                     |
            |    2 | MATERIALIZED | <derived3> | ALL  | NULL          | NULL | NULL    | NULL | 97701 |                                 |
            |    3 | DERIVED      | t1         | ALL  | NULL          | NULL | NULL    | NULL | 97701 | Using temporary; Using filesort |
            +------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
            3 rows in set (0,000 sec)
             
             
            MariaDB [test]> analyze format=json
                -> DELETE FROM t1 WHERE id NOT IN
                ->   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
             
            | {
              "query_optimization": {
                "r_total_time_ms": 0.270655265
              },
              "query_block": {
                "select_id": 1,
                "r_total_time_ms": 1687.263844,
                "table": {
                  "delete": 1,
                  "table_name": "t1",
                  "access_type": "ALL",
                  "rows": 97701,
                  "r_rows": 100000,
                  "r_total_filtered": 48.101,
                  "r_total_time_ms": 181.3790106,
                  "r_engine_stats": {
                    "pages_accessed": 241313,
                    "pages_updated": 144975
                  },
                  "attached_condition": "!(<in_optimizer>(t1.`id`,t1.`id` in (subquery#2)))",
                  "r_filtered": 48.101
                },
                "subqueries": [
                  {
                    "materialization": {
                      "r_strategy": "index_lookup",
                      "r_loops": 100000,
                      "query_block": {
                        "select_id": 2,
                        "cost": 4.375730105,
                        "r_loops": 1,
                        "r_total_time_ms": 1089.840414,
                        "nested_loop": [
                          {
                            "table": {
                              "table_name": "<derived3>",
                              "access_type": "ALL",
                              "loops": 1,
                              "r_loops": 1,
                              "rows": 97701,
                              "r_rows": 51899,
                              "cost": 4.375730105,
                              "r_table_time_ms": 3.370080881,
                              "r_other_time_ms": 269.7888225,
                              "filtered": 100,
                              "r_total_filtered": 100,
                              "r_filtered": 100,
                              "materialized": {
                                "query_block": {
                                  "select_id": 3,
                                  "cost": 81.86802309,
                                  "r_loops": 1,
                                  "r_total_time_ms": 863.9443314,
                                  "filesort": {
                                    "sort_key": "t1.item_id, t1.seller_name, t1.variant",
                                    "r_loops": 1,
                                    "r_total_time_ms": 264.7268764,
                                    "r_used_priority_queue": false,
                                    "r_output_rows": 51899,
                                    "r_sort_passes": 102,
                                    "r_buffer_size": "253Kb",
                                    "r_sort_mode": "sort_key,rowid",
                                    "temporary_table": {
                                      "nested_loop": [
                                        {
                                          "table": {
                                            "table_name": "t1",
                                            "access_type": "ALL",
                                            "loops": 1,
                                            "r_loops": 1,
                                            "rows": 97701,
                                            "r_rows": 100000,
                                            "cost": 16.1823002,
                                            "r_table_time_ms": 33.74205653,
                                            "r_other_time_ms": 292.216494,
                                            "r_engine_stats": {
                                              "pages_accessed": 267
                                            },
                                            "filtered": 100,
                                            "r_total_filtered": 100,
                                            "r_filtered": 100
                                          }
                                        }
                                      ]
                                    }
                                  }
                                }
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                ]
              }
            } |
            

            After changes:

            MariaDB [test]> explain
                -> DELETE FROM t1 WHERE id NOT IN
                ->   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            +------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
            | id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
            +------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
            |    1 | PRIMARY      | t1         | ALL  | NULL          | NULL | NULL    | NULL | 100175 | Using where                     |
            |    2 | MATERIALIZED | <derived3> | ALL  | NULL          | NULL | NULL    | NULL | 100175 |                                 |
            |    3 | DERIVED      | t1         | ALL  | NULL          | NULL | NULL    | NULL | 100175 | Using temporary; Using filesort |
            +------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
            3 rows in set (0,001 sec)
             
            MariaDB [test]> analyze format=json
                -> DELETE FROM t1 WHERE id NOT IN
                ->   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
             
            | {
              "query_optimization": {
                "r_total_time_ms": 0.310001332
              },
              "query_block": {
                "select_id": 1,
                "r_total_time_ms": 1702.036735,
                "table": {
                  "delete": 1,
                  "table_name": "t1",
                  "access_type": "ALL",
                  "rows": 100175,
                  "r_rows": 100000,
                  "r_total_filtered": 48.2,
                  "r_total_time_ms": 190.1354219,
                  "r_engine_stats": {
                    "pages_accessed": 241810,
                    "pages_updated": 145277
                  },
                  "attached_condition": "!(<in_optimizer>(t1.`id`,t1.`id` in (subquery#2)))",
                  "r_filtered": 48.2
                },
                "subqueries": [
                  {
                    "materialization": {
                      "r_strategy": "index_lookup",
                      "r_loops": 100000,
                      "query_block": {
                        "select_id": 2,
                        "cost": 2.238774758,
                        "r_loops": 1,
                        "r_total_time_ms": 1087.639802,
                        "nested_loop": [
                          {
                            "table": {
                              "table_name": "<derived3>",
                              "access_type": "ALL",
                              "loops": 1,
                              "r_loops": 1,
                              "rows": 100175,
                              "r_rows": 51800,
                              "cost": 4.486279805,
                              "r_table_time_ms": 3.019494164,
                              "r_other_time_ms": 254.3995114,
                              "filtered": 100,
                              "r_total_filtered": 100,
                              "r_filtered": 100,
                              "materialized": {
                                "query_block": {
                                  "select_id": 3,
                                  "cost": 83.94436367,
                                  "r_loops": 1,
                                  "r_total_time_ms": 848.9372393,
                                  "filesort": {
                                    "sort_key": "t1.item_id, t1.seller_name, t1.variant",
                                    "r_loops": 1,
                                    "r_total_time_ms": 264.4373285,
                                    "r_used_priority_queue": false,
                                    "r_output_rows": 51800,
                                    "r_sort_passes": 102,
                                    "r_buffer_size": "253Kb",
                                    "r_sort_mode": "sort_key,rowid",
                                    "temporary_table": {
                                      "nested_loop": [
                                        {
                                          "table": {
                                            "table_name": "t1",
                                            "access_type": "ALL",
                                            "loops": 1,
                                            "r_loops": 1,
                                            "rows": 100175,
                                            "r_rows": 100000,
                                            "cost": 16.5855622,
                                            "r_table_time_ms": 33.04229617,
                                            "r_other_time_ms": 293.9514345,
                                            "r_engine_stats": {
                                              "pages_accessed": 267
                                            },
                                            "filtered": 100,
                                            "r_total_filtered": 100,
                                            "r_filtered": 100
                                          }
                                        }
                                      ]
                                    }
                                  }
                                }
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                ]
              }
            } |
            

            lstartseva Lena Startseva added a comment - Case for comparison of execution time N=100000, rows for delete is about 48%, Release build : CREATE TABLE t1 ( id int NOT NULL PRIMARY KEY , item_id varchar (100), seller_name varchar (400), variant varchar (400), FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant) )engine=InnoDB DEFAULT CHARSET=utf8mb4;   insert into t1 select seq, FLOOR(1 + (RAND() * 100)), FLOOR(1 + (RAND() * 100)), FLOOR(1 + (RAND() * 100)) from seq_1_to_100000;   select count (*) FROM t1; + ----------+ | count (*) | + ----------+ | 100000 | + ----------+   select count (*) FROM t1 WHERE id NOT IN ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); + ----------+ | count (*) | + ----------+ | 48140 | + ----------+ Before changes (optimizer_switch='in_to_exists=on'): MariaDB [test]> explain -> DELETE FROM t1 WHERE id NOT IN -> ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); + ------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 100242 | Using where | | 2 | SUBQUERY | <derived3> | index_subquery | NULL | distinct_key | 5 | func | 20050 | | | 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 100242 | Using temporary ; Using filesort | + ------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+   MariaDB [test]> analyze format=json -> DELETE FROM t1 WHERE id NOT IN -> ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); | { "query_optimization" : { "r_total_time_ms" : 0.293148733 }, "query_block" : { "select_id" : 1, "r_total_time_ms" : 1453.569477, "table" : { "delete" : 1, "table_name" : "t1" , "access_type" : "ALL" , "rows" : 100242, "r_rows" : 100000, "r_total_filtered" : 48.14, "r_total_time_ms" : 185.3549061, "r_engine_stats" : { "pages_accessed" : 241506, "pages_updated" : 145092 }, "attached_condition" : "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))" , "r_filtered" : 48.14 }, "subqueries" : [ { "query_block" : { "select_id" : 2, "cost" : 2.24023997, "having_condition" : "innertable.m is null" , "nested_loop" : [ { "table" : { "table_name" : "<derived3>" , "access_type" : "index_subquery" , "key" : "distinct_key" , "key_length" : "5" , "used_key_parts" : [ "m" ], "ref" : [ "func" ], "loops" : 1, "r_loops" : 0, "rows" : 20050, "r_rows" : null , "cost" : 2.24023997, "r_table_time_ms" : 2.962863545, "r_other_time_ms" : 249.7625712, "filtered" : 100, "r_total_filtered" : null , "r_filtered" : null , "materialized" : { "query_block" : { "select_id" : 3, "cost" : 84.00059774, "r_loops" : 1, "r_total_time_ms" : 837.9879334, "filesort" : { "sort_key" : "t1.item_id, t1.seller_name, t1.variant" , "r_loops" : 1, "r_total_time_ms" : 264.7830164, "r_used_priority_queue" : false , "r_output_rows" : 51860, "r_sort_passes" : 102, "r_buffer_size" : "253Kb" , "r_sort_mode" : "sort_key,rowid" , "temporary_table" : { "nested_loop" : [ { "table" : { "table_name" : "t1" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 100242, "r_rows" : 100000, "cost" : 16.5964832, "r_table_time_ms" : 32.3232741, "r_other_time_ms" : 288.0603587, "r_engine_stats" : { "pages_accessed" : 267 }, "filtered" : 100, "r_total_filtered" : 100, "r_filtered" : 100 } } ] } } } } } } ] } } ] } } | Before changes (optimizer_switch='in_to_exists=off'): MariaDB [test]> explain -> DELETE FROM t1 WHERE id NOT IN -> ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); + ------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 97701 | Using where | | 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 97701 | | | 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 97701 | Using temporary ; Using filesort | + ------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ 3 rows in set (0,000 sec)     MariaDB [test]> analyze format=json -> DELETE FROM t1 WHERE id NOT IN -> ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);   | { "query_optimization" : { "r_total_time_ms" : 0.270655265 }, "query_block" : { "select_id" : 1, "r_total_time_ms" : 1687.263844, "table" : { "delete" : 1, "table_name" : "t1" , "access_type" : "ALL" , "rows" : 97701, "r_rows" : 100000, "r_total_filtered" : 48.101, "r_total_time_ms" : 181.3790106, "r_engine_stats" : { "pages_accessed" : 241313, "pages_updated" : 144975 }, "attached_condition" : "!(<in_optimizer>(t1.`id`,t1.`id` in (subquery#2)))" , "r_filtered" : 48.101 }, "subqueries" : [ { "materialization" : { "r_strategy" : "index_lookup" , "r_loops" : 100000, "query_block" : { "select_id" : 2, "cost" : 4.375730105, "r_loops" : 1, "r_total_time_ms" : 1089.840414, "nested_loop" : [ { "table" : { "table_name" : "<derived3>" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 97701, "r_rows" : 51899, "cost" : 4.375730105, "r_table_time_ms" : 3.370080881, "r_other_time_ms" : 269.7888225, "filtered" : 100, "r_total_filtered" : 100, "r_filtered" : 100, "materialized" : { "query_block" : { "select_id" : 3, "cost" : 81.86802309, "r_loops" : 1, "r_total_time_ms" : 863.9443314, "filesort" : { "sort_key" : "t1.item_id, t1.seller_name, t1.variant" , "r_loops" : 1, "r_total_time_ms" : 264.7268764, "r_used_priority_queue" : false , "r_output_rows" : 51899, "r_sort_passes" : 102, "r_buffer_size" : "253Kb" , "r_sort_mode" : "sort_key,rowid" , "temporary_table" : { "nested_loop" : [ { "table" : { "table_name" : "t1" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 97701, "r_rows" : 100000, "cost" : 16.1823002, "r_table_time_ms" : 33.74205653, "r_other_time_ms" : 292.216494, "r_engine_stats" : { "pages_accessed" : 267 }, "filtered" : 100, "r_total_filtered" : 100, "r_filtered" : 100 } } ] } } } } } } ] } } } ] } } | After changes: MariaDB [test]> explain -> DELETE FROM t1 WHERE id NOT IN -> ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); + ------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 100175 | Using where | | 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 100175 | | | 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 100175 | Using temporary ; Using filesort | + ------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+ 3 rows in set (0,001 sec)   MariaDB [test]> analyze format=json -> DELETE FROM t1 WHERE id NOT IN -> ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);   | { "query_optimization" : { "r_total_time_ms" : 0.310001332 }, "query_block" : { "select_id" : 1, "r_total_time_ms" : 1702.036735, "table" : { "delete" : 1, "table_name" : "t1" , "access_type" : "ALL" , "rows" : 100175, "r_rows" : 100000, "r_total_filtered" : 48.2, "r_total_time_ms" : 190.1354219, "r_engine_stats" : { "pages_accessed" : 241810, "pages_updated" : 145277 }, "attached_condition" : "!(<in_optimizer>(t1.`id`,t1.`id` in (subquery#2)))" , "r_filtered" : 48.2 }, "subqueries" : [ { "materialization" : { "r_strategy" : "index_lookup" , "r_loops" : 100000, "query_block" : { "select_id" : 2, "cost" : 2.238774758, "r_loops" : 1, "r_total_time_ms" : 1087.639802, "nested_loop" : [ { "table" : { "table_name" : "<derived3>" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 100175, "r_rows" : 51800, "cost" : 4.486279805, "r_table_time_ms" : 3.019494164, "r_other_time_ms" : 254.3995114, "filtered" : 100, "r_total_filtered" : 100, "r_filtered" : 100, "materialized" : { "query_block" : { "select_id" : 3, "cost" : 83.94436367, "r_loops" : 1, "r_total_time_ms" : 848.9372393, "filesort" : { "sort_key" : "t1.item_id, t1.seller_name, t1.variant" , "r_loops" : 1, "r_total_time_ms" : 264.4373285, "r_used_priority_queue" : false , "r_output_rows" : 51800, "r_sort_passes" : 102, "r_buffer_size" : "253Kb" , "r_sort_mode" : "sort_key,rowid" , "temporary_table" : { "nested_loop" : [ { "table" : { "table_name" : "t1" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 100175, "r_rows" : 100000, "cost" : 16.5855622, "r_table_time_ms" : 33.04229617, "r_other_time_ms" : 293.9514345, "r_engine_stats" : { "pages_accessed" : 267 }, "filtered" : 100, "r_total_filtered" : 100, "r_filtered" : 100 } } ] } } } } } } ] } } } ] } } |
            ycp Yuchen Pei added a comment -

            Hi lstartseva, sorry that in the discussion about comparison of performance before and after patch in 11.7, I forgot the pertinent point lost in the long time frame of this task that the bug is not present for 11.0+ for the testcase in the description, see also my comment above[1]. If my reading is correct, this is also reflected from your test results above.

            So the question now is whether we want to find a testcase that shows the performance improvements before and after the patch, in 11.7. Please advise, psergei.

            [1] https://jira.mariadb.org/browse/MDEV-25008?focusedCommentId=288799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-288799

            ycp Yuchen Pei added a comment - Hi lstartseva , sorry that in the discussion about comparison of performance before and after patch in 11.7, I forgot the pertinent point lost in the long time frame of this task that the bug is not present for 11.0+ for the testcase in the description, see also my comment above [1] . If my reading is correct, this is also reflected from your test results above. So the question now is whether we want to find a testcase that shows the performance improvements before and after the patch, in 11.7. Please advise, psergei . [1] https://jira.mariadb.org/browse/MDEV-25008?focusedCommentId=288799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-288799

            Here's the testcase, it's fairly trivial:

            create table t1 (
              grp_id int, 
              b int
            );
            insert into t1 
              select MOD(seq, 100), seq
            from 
              seq_1_to_1000;
             
             
            create table t2 (
              grp_id int,
              col1 int
            );
            insert into t2 
              select MOD(seq, 100), seq
            from 
              seq_1_to_50000;
            analyze table t1,t2;
            

            explain
            update t1 set b=b+1
            where
              grp_id not in (select grp_id from t2 group by grp_id having sum(col1)>100);
            

            analyze format=json
            update t1 set b=b+1
            where
              grp_id not in (select grp_id from t2 group by grp_id having sum(col1)>100);
            

            This query doesn't modify any rows, that's just to allow easier re-runs.

            EXPLAIN before the patch:

            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            |    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000  | Using where     |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 50283 | Using temporary |
            +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
            

            EXPLAIN after the patch:

            +------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+
            | id   | select_type  | table | type | possible_keys | key  | key_len | ref  | rows  | Extra           |
            +------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+
            |    1 | PRIMARY      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000  | Using where     |
            |    2 | MATERIALIZED | t2    | ALL  | NULL          | NULL | NULL    | NULL | 50283 | Using temporary |
            +------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+
            

            Timings:
            Before the patch: 16 seconds
            After the patch: 0.02 seconds.

            This is release build on a deskop machine.
            Before: commit 7a65dcb59efe10392e74ecec3236298be138f083
            After: commit 664cfb80717a4dfb05738af089d5813a253b5d59

            psergei Sergei Petrunia added a comment - Here's the testcase, it's fairly trivial: create table t1 ( grp_id int , b int ); insert into t1 select MOD(seq, 100), seq from seq_1_to_1000;     create table t2 ( grp_id int , col1 int ); insert into t2 select MOD(seq, 100), seq from seq_1_to_50000; analyze table t1,t2; explain update t1 set b=b+1 where grp_id not in ( select grp_id from t2 group by grp_id having sum (col1)>100); analyze format=json update t1 set b=b+1 where grp_id not in ( select grp_id from t2 group by grp_id having sum (col1)>100); This query doesn't modify any rows, that's just to allow easier re-runs. EXPLAIN before the patch: +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 50283 | Using temporary | +------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+ EXPLAIN after the patch: +------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 50283 | Using temporary | +------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+ Timings: Before the patch: 16 seconds After the patch: 0.02 seconds. This is release build on a deskop machine. Before: commit 7a65dcb59efe10392e74ecec3236298be138f083 After: commit 664cfb80717a4dfb05738af089d5813a253b5d59
            psergei Sergei Petrunia added a comment - - edited

            Summary of lstartseva's comment above

            https://jira.mariadb.org/browse/MDEV-25008?focusedCommentId=291189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-291189 :

            Before:

            • default plan is: 1453 ms
            • in_to_exists=off: 1687 ms (1.16x slowdown)

            After:

            • (same plan as before with in_to_exists_off): 1702 ms
            psergei Sergei Petrunia added a comment - - edited Summary of lstartseva 's comment above https://jira.mariadb.org/browse/MDEV-25008?focusedCommentId=291189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-291189 : Before: default plan is: 1453 ms in_to_exists=off: 1687 ms (1.16x slowdown) After: (same plan as before with in_to_exists_off): 1702 ms

            One odd thing I observe in lstartseva's data:

            The query plan for Before changes (optimizer_switch='in_to_exists=on') uses SUBQUERY, that is, uncorrelated subquery. (Correlated subquery would use DEPENDENT SUBQUERY).

            The query plan with After changes uses MATERIALIZED. But why would we need to materialize if we only need to compute the subquery once?

            ycp, could you check this?

            psergei Sergei Petrunia added a comment - One odd thing I observe in lstartseva 's data: The query plan for Before changes (optimizer_switch='in_to_exists=on') uses SUBQUERY , that is, uncorrelated subquery. (Correlated subquery would use DEPENDENT SUBQUERY ). The query plan with After changes uses MATERIALIZED . But why would we need to materialize if we only need to compute the subquery once? ycp , could you check this?
            ycp Yuchen Pei added a comment - - edited

            psergei: not a direct answer to your question, but I notice that monty's change that fixed the original case in 11.0+ also resulted in the query type changing from "DEPENDENT SUBQUERY" to "SUBQUERY":

            # before 5e5a8eda1641eda1d915a7eb5736e494d2179795 Derived tables and union can now create distinct keys
            explain
            DELETE FROM t1 WHERE id NOT IN
            (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	10000	Using where
            2	DEPENDENT SUBQUERY	<derived3>	ALL	NULL	NULL	NULL	NULL	10000	Using where
            3	DERIVED	t1	ALL	NULL	NULL	NULL	NULL	10000	Using temporary; Using filesort
            # after
            explain
            DELETE FROM t1 WHERE id NOT IN
            (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	10000	Using where
            2	SUBQUERY	<derived3>	index_subquery	distinct_key	distinct_key	5	func	2002
            3	DERIVED	t1	ALL	NULL	NULL	NULL	NULL	10000	Using temporary; Using filesort
            

            At the parent of the patch for this issue, if we do the following then we will get "DEPENDENT SUBQUERY" again

            modified   sql/sql_derived.cc
            @@ -656,7 +656,7 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived)
             {
               SELECT_LEX_UNIT *unit= derived->get_unit();
               SELECT_LEX *first_select;
            -  bool res= FALSE, keep_row_order, distinct;
            +  bool res= FALSE, keep_row_order/* , distinct */;
               DBUG_ENTER("mysql_derived_prepare");
               DBUG_PRINT("enter", ("unit: %p  table_list: %p  alias: '%s'",
                                    unit, derived, derived->alias.str));
            @@ -868,13 +868,15 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived)
                 Note that the underlying query will also control distinct condition.
               */
               thd->create_tmp_table_for_derived= TRUE;
            -  distinct= (unit->first_select()->next_select() ?
            -             unit->union_distinct && !unit->union_distinct->next_select() :
            -             unit->distinct);
            +  /*
            +    distinct= (unit->first_select()->next_select() ?
            +               unit->union_distinct && !unit->union_distinct->next_select() :
            +               unit->distinct);
            +   */
             
               if (!(derived->table) &&
                   derived->derived_result->create_result_table(thd, &unit->types,
            -                                                   distinct,
            +                                                   false,
                                                                (first_select->options |
                                                                thd->variables.option_bits |
                                                                TMP_TABLE_ALL_COLUMNS),
            

            ycp Yuchen Pei added a comment - - edited psergei : not a direct answer to your question, but I notice that monty's change that fixed the original case in 11.0+ also resulted in the query type changing from "DEPENDENT SUBQUERY" to "SUBQUERY": # before 5e5a8eda1641eda1d915a7eb5736e494d2179795 Derived tables and union can now create distinct keys explain DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 10000 Using where 2 DEPENDENT SUBQUERY <derived3> ALL NULL NULL NULL NULL 10000 Using where 3 DERIVED t1 ALL NULL NULL NULL NULL 10000 Using temporary; Using filesort # after explain DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 10000 Using where 2 SUBQUERY <derived3> index_subquery distinct_key distinct_key 5 func 2002 3 DERIVED t1 ALL NULL NULL NULL NULL 10000 Using temporary; Using filesort At the parent of the patch for this issue, if we do the following then we will get "DEPENDENT SUBQUERY" again modified sql/sql_derived.cc @@ -656,7 +656,7 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived) { SELECT_LEX_UNIT *unit= derived->get_unit(); SELECT_LEX *first_select; - bool res= FALSE, keep_row_order, distinct; + bool res= FALSE, keep_row_order/* , distinct */; DBUG_ENTER("mysql_derived_prepare"); DBUG_PRINT("enter", ("unit: %p table_list: %p alias: '%s'", unit, derived, derived->alias.str)); @@ -868,13 +868,15 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived) Note that the underlying query will also control distinct condition. */ thd->create_tmp_table_for_derived= TRUE; - distinct= (unit->first_select()->next_select() ? - unit->union_distinct && !unit->union_distinct->next_select() : - unit->distinct); + /* + distinct= (unit->first_select()->next_select() ? + unit->union_distinct && !unit->union_distinct->next_select() : + unit->distinct); + */ if (!(derived->table) && derived->derived_result->create_result_table(thd, &unit->types, - distinct, + false, (first_select->options | thd->variables.option_bits | TMP_TABLE_ALL_COLUMNS),

            TL,DR: This "SUBQUERY" in EXPLAIN output doesn't seem to reflect what's going in. Execution looks like "DEPENDENT SUBQUERY".

            Running Alice's testcase on 11.2, I get this EXPLAIN:

            +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
            | id   | select_type | table      | type           | possible_keys | key          | key_len | ref  | rows   | Extra                           |
            +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
            |    1 | PRIMARY     | t1         | ALL            | NULL          | NULL         | NULL    | NULL | 100048 | Using where                     |
            |    2 | SUBQUERY    | <derived3> | index_subquery | NULL          | distinct_key | 5       | func | 20011  |                                 |
            |    3 | DERIVED     | t1         | ALL            | NULL          | NULL         | NULL    | NULL | 100048 | Using temporary; Using filesort |
            +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
            

            and this ANALYZE:

            analyze format=json DELETE FROM t1 WHERE id NOT IN   (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
            

            {
              "query_optimization": {
                "r_total_time_ms": 1.214205366
              },
              "query_block": {
                "select_id": 1,
                "r_total_time_ms": 294566.2563,
                "table": {
                  "delete": 1,
                  "table_name": "t1",
                  "access_type": "ALL",
                  "rows": 100048,
                  "r_rows": 100000,
                  "r_filtered": 48.155,
                  "r_total_time_ms": 60829.51264,
                  "r_engine_stats": {
                    "pages_accessed": 241689,
                    "pages_updated": 145142
                  },
                  "attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))"
                },
                "subqueries": [
                  {
                    "query_block": {
                      "select_id": 2,
                      "cost": 2.235889134,
                      "having_condition": "innertable.m is null",
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "<derived3>",
                            "access_type": "index_subquery",
                            "key": "distinct_key",
                            "key_length": "5",
                            "used_key_parts": ["m"],
                            "ref": ["func"],
                            "loops": 1,
                            "r_loops": 0,
                            "rows": 20011,
                            "r_rows": null,
                            "cost": 2.235889134,
                            "r_table_time_ms": 88.26781154,
                            "r_other_time_ms": 9677.190753,
                            "filtered": 100,
                            "r_filtered": null,
                            "materialized": {
                              "query_block": {
                                "select_id": 3,
                                "cost": 83.83777121,
                                "r_loops": 1,
                                "r_total_time_ms": 5724.475274,
                                "filesort": {
                                  "sort_key": "t1.item_id, t1.seller_name, t1.variant",
                                  "r_loops": 1,
                                  "r_total_time_ms": 103.603556,
                                  "r_used_priority_queue": false,
                                  "r_output_rows": 51845,
                                  "r_buffer_size": "2045Kb",
                                  "r_sort_mode": "packed_sort_key,rowid",
                                  "temporary_table": {
                                    "nested_loop": [
                                      {
                                        "table": {
                                          "table_name": "t1",
                                          "access_type": "ALL",
                                          "loops": 1,
                                          "r_loops": 1,
                                          "rows": 100048,
                                          "r_rows": 100000,
                                          "cost": 16.5648612,
                                          "r_table_time_ms": 1464.248393,
                                          "r_other_time_ms": 3917.197298,
                                          "r_engine_stats": {
                                            "pages_accessed": 267
                                          },
                                          "filtered": 100,
                                          "r_filtered": 100
                                        }
                                      }
                                    ]
                                  }
                                }
                              }
                            }
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            } 
            

            Note the

                            "table_name": "<derived3>",
                            "access_type": "index_subquery",
                            "key": "distinct_key",
                            "key_length": "5",
                            "used_key_parts": ["m"],
                            "ref": ["func"],
                            "loops": 1,
                            "r_loops": 0,
            

            But in debugger I can see that subselect_indexsubquery_engine::exec() is invoked many times... Why do we get r_loops:0 here...

            psergei Sergei Petrunia added a comment - TL,DR: This "SUBQUERY" in EXPLAIN output doesn't seem to reflect what's going in. Execution looks like "DEPENDENT SUBQUERY". Running Alice's testcase on 11.2, I get this EXPLAIN: +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 100048 | Using where | | 2 | SUBQUERY | <derived3> | index_subquery | NULL | distinct_key | 5 | func | 20011 | | | 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 100048 | Using temporary; Using filesort | +------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ and this ANALYZE: analyze format=json DELETE FROM t1 WHERE id NOT IN ( SELECT m FROM ( SELECT max (id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); { "query_optimization" : { "r_total_time_ms" : 1.214205366 }, "query_block" : { "select_id" : 1, "r_total_time_ms" : 294566.2563, "table" : { "delete" : 1, "table_name" : "t1" , "access_type" : "ALL" , "rows" : 100048, "r_rows" : 100000, "r_filtered" : 48.155, "r_total_time_ms" : 60829.51264, "r_engine_stats" : { "pages_accessed" : 241689, "pages_updated" : 145142 }, "attached_condition" : "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))" }, "subqueries" : [ { "query_block" : { "select_id" : 2, "cost" : 2.235889134, "having_condition" : "innertable.m is null" , "nested_loop" : [ { "table" : { "table_name" : "<derived3>" , "access_type" : "index_subquery" , "key" : "distinct_key" , "key_length" : "5" , "used_key_parts" : [ "m" ], "ref" : [ "func" ], "loops" : 1, "r_loops" : 0, "rows" : 20011, "r_rows" : null , "cost" : 2.235889134, "r_table_time_ms" : 88.26781154, "r_other_time_ms" : 9677.190753, "filtered" : 100, "r_filtered" : null , "materialized" : { "query_block" : { "select_id" : 3, "cost" : 83.83777121, "r_loops" : 1, "r_total_time_ms" : 5724.475274, "filesort" : { "sort_key" : "t1.item_id, t1.seller_name, t1.variant" , "r_loops" : 1, "r_total_time_ms" : 103.603556, "r_used_priority_queue" : false , "r_output_rows" : 51845, "r_buffer_size" : "2045Kb" , "r_sort_mode" : "packed_sort_key,rowid" , "temporary_table" : { "nested_loop" : [ { "table" : { "table_name" : "t1" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 100048, "r_rows" : 100000, "cost" : 16.5648612, "r_table_time_ms" : 1464.248393, "r_other_time_ms" : 3917.197298, "r_engine_stats" : { "pages_accessed" : 267 }, "filtered" : 100, "r_filtered" : 100 } } ] } } } } } } ] } } ] } } Note the "table_name": "<derived3>", "access_type": "index_subquery", "key": "distinct_key", "key_length": "5", "used_key_parts": ["m"], "ref": ["func"], "loops": 1, "r_loops": 0, But in debugger I can see that subselect_indexsubquery_engine::exec() is invoked many times... Why do we get r_loops:0 here...

            As for 11.0 changing how this query is optimized... Yes it does change it.

            DELETE FROM t1 
            WHERE id NOT IN  (SELECT m  -- select #2
                              FROM (SELECT max(id) m  -- select #3
                                    FROM t1 
                                    GROUP BY item_id, seller_name, variant
                                   ) AS innertable)
            

            Before 11.0:

            select #3 was materialized into a temporary table which didn't have a useful index.
            Then, NOT IN subquery used IN-to-EXISTS which had to scan the temptable many times.

            After 11:0:

            select #3 is materialized into a table with a potential index on innertable.m
            The NOT IN still uses IN-to-EXISTS.
            IN-to-EXISTS pushes the "t1.id=innertable.m" into subquery#2.
            potential index on innertable.m becomes a real index, and that index is now used to perform
            [non-unique] index lookups.
            This way, the query plan is now similar to Materialization strategy - have a temptable with a suitable index, make lookups there when computing the subquery.
            (Also EXPLAIN is wrong as shown above but this doesn't seem to have much effect).

            psergei Sergei Petrunia added a comment - As for 11.0 changing how this query is optimized... Yes it does change it. DELETE FROM t1 WHERE id NOT IN ( SELECT m -- select #2 FROM ( SELECT max (id) m -- select #3 FROM t1 GROUP BY item_id, seller_name, variant ) AS innertable) Before 11.0: select #3 was materialized into a temporary table which didn't have a useful index. Then, NOT IN subquery used IN-to-EXISTS which had to scan the temptable many times. After 11:0: select #3 is materialized into a table with a potential index on innertable.m The NOT IN still uses IN-to-EXISTS. IN-to-EXISTS pushes the "t1.id=innertable.m" into subquery#2. potential index on innertable.m becomes a real index, and that index is now used to perform [non-unique] index lookups. This way, the query plan is now similar to Materialization strategy - have a temptable with a suitable index, make lookups there when computing the subquery. (Also EXPLAIN is wrong as shown above but this doesn't seem to have much effect).

            Take away: need to fix EXPLAIN for the above case but this is not a blocker for this MDEV.

            psergei Sergei Petrunia added a comment - Take away: need to fix EXPLAIN for the above case but this is not a blocker for this MDEV.

            Filed MDEV-35231 for fixing the EXPLAIN. It's a different problem from this MDEV, this MDEV doesn't need to wait on it.

            psergei Sergei Petrunia added a comment - Filed MDEV-35231 for fixing the EXPLAIN. It's a different problem from this MDEV, this MDEV doesn't need to wait on it.

            Testing done. Ok to push. The problems discussed above will be fixed in MDEV-35231

            lstartseva Lena Startseva added a comment - Testing done. Ok to push. The problems discussed above will be fixed in MDEV-35231
            ycp Yuchen Pei added a comment -

            pushed 4b6922a315fa5411665ac99c0b40fd7238093403 to 11.7/main

            ycp Yuchen Pei added a comment - pushed 4b6922a315fa5411665ac99c0b40fd7238093403 to 11.7/main

            People

              ycp Yuchen Pei
              ahmedadel47 Ahmed Wahba
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.