Details
-
New Feature
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
Description
There is a query in my application to delete duplicate records (specific fields), this query used to work fine on MySQL (v 8.0.21) on Windows 10 and takes about a minute,
When I migrated to MariaDB 10.5.9 on Ubuntu 20.04 , this query stucks, I left it for +6 hours and it didn't come back.
The table has about 4 million records, table is:
CREATE TABLE `item_variant_price` ( |
`seller_variant_id` int(11) NOT NULL AUTO_INCREMENT, |
`item_id` varchar(100) DEFAULT NULL, |
`price` float DEFAULT NULL, |
`seller_name` varchar(400) DEFAULT NULL, |
`variant` varchar(400) DEFAULT NULL, |
`is_fulfilled` int(11) NOT NULL DEFAULT 0, |
`insertion_date` timestamp NOT NULL DEFAULT current_timestamp(), |
`modification_date` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), |
`is_main_page` int(11) DEFAULT 0, |
`is_cheapest` int(11) NOT NULL DEFAULT 0, |
`variant_url` varchar(400) DEFAULT NULL, |
PRIMARY KEY (`variant_id`), |
FULLTEXT KEY `item_variant_price_serial_IDX` (`item_id`,`seller_name`,`variant`) |
) ENGINE=InnoDB AUTO_INCREMENT=4309337 DEFAULT CHARSET=utf8mb4; |
and this is the query:
delete from item_variant_price where seller_variant_id not in (select m from (select max(seller_variant_id) m from item_variant_price group by item_id, seller_name, variant) as innerTable); |
the inner select statement takes 27 seconds, so there has to be an issue with delete ?
Thanks for your support
Attachments
Issue Links
- causes
-
MDEV-35318 Assertion `tl->jtbm_subselect' failed in JOIN::calc_allowed_top_level_tables on 2nd execution of PS
-
- Closed
-
-
MDEV-35662 Assertion failure in diagnostics area upon EXPLAIN UPDATE
-
- In Review
-
- duplicates
-
MDEV-22415 Single table UPDATE/DELETE doesn't use non-semijoin Materialization
-
- Closed
-
- relates to
-
MDEV-22415 Single table UPDATE/DELETE doesn't use non-semijoin Materialization
-
- Closed
-
-
MDEV-35231 Wrong EXPLAIN or ANALYZE output: SUBQUERY instead of DEPENDENT SUBQUERY
-
- Open
-
Activity
Thanks Alice for you reply,
indeed after disabling: set optimizer_switch='in_to_exists=off';
the query worked, and it took 2m 35s .
Hi Team, following up on this issue, has there been an update on it?
Thanks,
Ahmed
Hi Team, thanks for verifying the issue.
what is the way forward? is there a target date? target fix version?
Thanks.
A poc solution, following the ideas at https://jira.mariadb.org/browse/MDEV-22415?focusedCommentId=288641&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-288641
d28236bc26e bb-10.5-mdev-25008 MDEV-25008 [poc] Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
After comparing 10.5 with 11.1, I have a different patch that fixes this testcase and a similar update statement:
modified sql/sql_derived.cc
|
@@ -850,7 +850,7 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived)
|
*/
|
thd->create_tmp_table_for_derived= TRUE;
|
if (!(derived->table) &&
|
- derived->derived_result->create_result_table(thd, &unit->types, FALSE,
|
+ derived->derived_result->create_result_table(thd, &unit->types, unit->distinct,
|
(first_select->options |
|
thd->variables.option_bits |
|
TMP_TABLE_ALL_COLUMNS),
|
See also monty's change below (with no MDEV reference) that was pushed to 11.0:
commit 5e5a8eda1641eda1d915a7eb5736e494d2179795
|
Author: Monty <monty@mariadb.org>
|
Date: Wed May 4 17:26:43 2022 +0300
|
|
Derived tables and union can now create distinct keys
|
|
The idea is that instead of marking all select_lex's with DISTINCT, we
|
only mark those that really need distinct result.
|
|
Benefits of this change:
|
- Temporary tables used with derived tables, UNION, IN are now smaller
|
as duplicates are removed already on the insert phase.
|
- The optimizer can now produce better plans with EQ_REF. This can be
|
seen from the tests where several queries does not anymore materialize
|
derived tables twice.
|
- Queries affected by 'in_predicate_conversion_threshold' where large IN
|
lists are converted to sub query produces better plans.
|
|
Other things:
|
- Removed on duplicate call to sel->init_select() in
|
LEX::add_primary_to_query_expression_body()
|
- I moved the testing of
|
tab->table->pos_in_table_list->is_materialized_derived()
|
in join_read_const_table() to the caller as it caused problems for
|
derived tables that could be proven to be const tables.
|
This also is likely to fix some bugs as if join_read_const_table()
|
was aborted, the table was left marked as JT_CONST, which cannot
|
be good. I added an ASSERT there for now that can be removed when
|
the code has been properly tested.
|
Performance-wise, let's consider this test:
--source include/have_innodb.inc
|
--source include/have_sequence.inc
|
|
CREATE TABLE t1 ( |
id int NOT NULL PRIMARY KEY, |
item_id varchar(100), |
seller_name varchar(400), |
variant varchar(400), |
FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant) |
)engine=innodb;
|
|
insert into t1 select seq,seq,seq,seq from seq_1_to_10000; |
|
set profiling=1; |
DELETE FROM t1 WHERE id NOT IN |
(SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
|
UPDATE t1 SET item_id="foo" WHERE id NOT IN |
(SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
show profiles;
|
set profiling=0; |
|
drop table t1; |
And I note the output duration of show profiles for the DELETE and UPDATE queries respectively
- at 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2
11.38s (DELETE) + 11.30s (UPDATE) - at monty's original commit:
0.87 + 0.76 - at parent commit to monty's commit:
10.92 + 10.96 - at dc7d83a9005 bb-10.5-mdev-25008-monty-patch, which is the backport of monty's commit to 10.5
1.18 + 1.07 - at b402994270e bb-10.5-mdev-25008-unit-distinct (the diff in this comment, applied to 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2)
1.17 + 1.01 - at 687b7fd83f6 bb-10.5-mdev-25008-outer-lookup-keys (the poc patch in the previous comment, applied to 10.5 b8f92ade57691a78cc97c5d79eae0a27a10cb8f2)
1.16 + 1.06
From this we could conclude that the two fixes result in similar performance improvements, and that monty's commit seems to be the reason that the bug is not present in 11.0+.
In terms of tests, bb-10.5-mdev-25008-unit-distinct b402994270ea89f5e186a430fa8810d29cbcc4db fails a massive number of tests. bb-10.5-mdev-25008-monty-patch dc7d83a90056f82c92817c3e555ad4ba86a33c94 fails an assertion in main.derived_opt. And bb-10.5-mdev-25008-outer-lookup-keys looks ok.
d28236bc26e bb-10.5-mdev-25008 MDEV-25008 [poc] Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
diff --git a/sql/opt_subselect.cc b/sql/opt_subselect.cc
|
index 41f46beb54b..776ad34223b 100644
|
--- a/sql/opt_subselect.cc
|
+++ b/sql/opt_subselect.cc
|
@@ -6606,11 +6606,8 @@ bool JOIN::choose_subquery_plan(table_map join_tables)
|
}
|
else
|
{
|
- /*
|
- TODO: outer_join can be NULL for DELETE statements.
|
- How to compute its cost?
|
- */
|
- outer_lookup_keys= 1;
|
+ outer_lookup_keys=
|
+ (double) unit->outer_select()->table_list.first->table->stat_records();
|
}
|
|
/* |
So, for
update world_population w |
where
|
w.country_index='Finland' and |
w.column IN (SELECT ...) |
we will use outer_lookup_keys=6Billion even if the query is likely to use an index INDEX(country_index) and have a QUICK_RANGE_SELECT with rows=5 million.
There a question if this rows=5M is stored anywhere, except in some local variable in mysql_update. If it is not, it should be.
I guess, the answer to this question may be different in 11.1 where MDEV-28883 has introduced Sql_cmd_update/delete classes.
A testcase which still fails for me in 11.6
create table t1 (a int, b int); |
insert into t1 select seq,seq from seq_1_to_1000; |
|
create table t2 (a int, b int); |
insert into t2 select seq,floor(seq/100) from seq_1_to_20000; |
|
create table t3 (a int, b int, key(a)); |
insert into t3 select seq,floor(seq/100) from seq_1_to_20000; |
explain
|
select * from t1 |
where |
b>3 or |
a in (select t3.b from t2, t3 where t3.a=t2.b and t2.a<99999); |
+------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
|
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 19874 | Using where |
|
| 2 | MATERIALIZED | t3 | ref | a | a | 5 | j20.t2.b | 1 | |
|
+------+--------------+-------+------+---------------+------+---------+----------+-------+-------------+
|
Ok.
Now, let's try DELETE
explain
|
delete from t1 |
where |
b>3 or |
a in (select t3.b from t2, t3 where t3.a=t2.b and t2.a<99999); |
+------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
|
| 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 19874 | Using where |
|
| 2 | DEPENDENT SUBQUERY | t3 | ref | a | a | 5 | j20.t2.b | 1 | Using where |
|
+------+--------------------+-------+------+---------------+------+---------+----------+-------+-------------+
|
and see that it uses IN->EXISTS which looks to be obviously worse than Materialization.
Debugging the above, I can see the execution to enter JOIN(select_lex->select_number=2)::choose_subquery_plan().
It reaches this if statement and takes does NOT take it:
if (outer_join && outer_join->table_count > 0 && // (1)
|
outer_join->join_tab && // (2)
|
!in_subs->const_item())
|
{
|
outer_join->get_partial_cost_and_fanout(in_subs->get_join_tab_idx(),
|
table_map(-1),
|
&dummy,
|
&outer_lookup_keys);
|
Why does it not take it? Because outer_join->join_tab==0...
(gdb) p outer_join->table_count
|
$13 = 1
|
(gdb) p outer_join->join_tab
|
$14 = (JOIN_TAB *) 0x0
|
(gdb) p in_subs->const_item()
|
$15 = false
|
... and if I debug ANALYZE DELETE... I see that outer_join->join_tab remains NULL for the entire duration of the statement... Looks like the join object is an imitation ?
I'm wondering on which condition JOIN object stops being an imitation? if the top-level JOIN has more than one table? In case it has one table, should we continue the imitation and create JOIN_TAB and POSITION and fill them with data describing the single-table access? Or, don't imitate and have JOIN have a pointer to Sql_cmd_delete/update so that choose_subquery_plan() can fetch the real number of subquery invocations from there?
Hi psergei, following up the discussions on tuesday, ptal at this patch thanks
0190cb9ad3d bb-11.1-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
Two questions:
- the bug is for 10.5+, but this change requires
MDEV-28883which is 11.1+ - do we need to devise a separate patch for 10.5? - main.ps and main.ps_mem_leaks fail in multiple CI builders. The failure (Assertion `(mem_root->flags & 4) == 0' failed) is bizarre and I could not reproduce it locally - do you have any clue what may have caused it? Here are a list of builder:
- amd64-windows
- amd64-ubuntu-2004-debug
- amd64-ubuntu-2204-debug-ps
- amd64-debian-11-msan
- amd64-debian-11-msan-clang-16
The main.ps assertion failure (in a MDEV-8756 test case but the ONLY_FULL_GROUP_BY is not needed to get the assertion failure) can only be reproduced with the flag PROTECT_STATEMENT_MEMROOT, introduced in MDEV-14959. The failing assertion used to be protected by this flag, but the protection was removed in the commit below. Perhaps we should re-protect the assertion with the flag for consistency, also considering that all the other uses of ROOT_FLAG_READ_ONLY are protected by this flag? cc serg
commit 69d78cd3f858748c0ee5ccd6ae1e77cfc063d57b
|
Author: Sergei Golubchik <serg@mariadb.org>
|
Date: Thu Nov 23 09:56:56 2023 +0100
|
|
move MEM_ROOT::read_only into flags
|
@@@ -281,42 -209,16 +272,37 @@@ void *alloc_root(MEM_ROOT *mem_root, si
|
DBUG_ENTER("alloc_root");
|
DBUG_PRINT("enter",("root: %p", mem_root));
|
DBUG_ASSERT(alloc_root_inited(mem_root));
|
-
|
- #ifdef PROTECT_STATEMENT_MEMROOT
|
- DBUG_ASSERT(mem_root->read_only == 0);
|
- #endif
|
+ DBUG_ASSERT((mem_root->flags & ROOT_FLAG_READ_ONLY) == 0); |
The Assert means that memory leaks in PS/SP, so you should need to find why. In fact it mean that it continue to allocate memory on the statement mem_root after first execution.
ycp, mem_root->read_only didn't exist unless PROTECT_STATEMENT_MEMROOT was defined, so any access had to be protected with an #ifdef. But mem_root->flags is always declared, there is no need to protect accesses to it.
Note that ROOT_FLAG_READ_ONLY is always set under #ifdef PROTECT_STATEMENT_MEMROOT, so without it the assert should always be satisfied.
serg, exactly, the assertion can only fail when the compiler flag PROTECT_STATEMENT_MEMROOT is on, which seems to be defeating the purpose of an assertion. It can also be confusing if it fails in the CI but not locally, as has happened to me. The confusion can be mitigated by placing a code comment "this assertion is only failable with the PROTECT_STATEMENT_MEMROOT flag set". But overall it seems rather inconsistent - what is the downside of protecting the assertion again with PROTECT_STATEMENT_MEMROOT?
Hi psergei, I've updated the patch according to the discussion yesterday. Note that the call to optimize_unflattened_subqueries(false) in case of no matching records needs to happen earlier as otherwise the the inner join would be destroyed in free_underlaid_joins() causing the optimization to be skipped below:
// in st_select_lex::optimize_unflattened_subqueries:
|
if (!inner_join) |
continue; |
ptal thanks:
7a42d1299ff upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
6040dfcdd07 MDEV-25008 Re-protect assertion of not ROOT_FLAG_READ_ONLY
|
ycp, ok I've checked the
6040dfcdd07 MDEV-25008 Re-protect assertion of not ROOT_FLAG_READ_ONLY
|
before reading the comments above and I agree with serg that this patch is not needed.
Checking of ROOT_FLAG_READ_ONLY is a feature of MEM_ROOT code (which is in mysys).
Currently there is only one user - PROTECT_STATEMENT_MEMROOT. Note that "statement" is a concept from sql/. The code in mysys/ has no idea about statements, so use of PROTECT_STATEMENT_MEMROOT looks out-of-place in mysys/my_alloc.c.
For me, PROTECT_STATEMENT_MEMROOT is automatically enabled in a debug build. (EDIT but not in 11.6-based tree for some reason ...)
As for
7a42d1299ff upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
--- a/sql/sql_cmd.h
|
+++ b/sql/sql_cmd.h
|
@@ -324,10 +324,12 @@ class Sql_cmd_dml : public Sql_cmd |
|
select_result *get_result() { return result; } |
|
+ my_off_t get_scanned_rows() { return scanned_rows; } |
+
|
Please use ha_rows. This is the type for row counts.
diff --git a/sql/sql_update.cc b/sql/sql_update.cc
|
index ea07caabf35..324761424e8 100644
|
--- a/sql/sql_update.cc
|
+++ b/sql/sql_update.cc
|
@@ -423,7 +423,7 @@ bool Sql_cmd_update::update_single_table(THD *thd)
|
switch_to_nullable_trigger_fields(*values, table);
|
|
/* Apply the IN=>EXISTS transformation to all subqueries and optimize them */
|
- if (select_lex->optimize_unflattened_subqueries(false))
|
+ if (select_lex->optimize_unflattened_subqueries(true))
|
DBUG_RETURN(TRUE);
|
Fix comment. It's "all constant subqueries" now.
Please also make the comments the same in sql_update.cc and sql_delete.cc. It's not very nice but if the code is duplicated we need to have duplicate comments too.
@@ -539,8 +545,18 @@ bool Sql_cmd_update::update_single_table(THD *thd)
|
|
table->update_const_key_parts(conds);
|
order= simple_remove_const(order, conds);
|
- query_plan.scanned_rows= select? select->records: table->file->stats.records;
|
-
|
+
|
+ /*
|
+ Estimate the number of scanned rows and have it accessible in
|
+ JOIN::choose_subquery_plan() from the outer join through
|
+ JOIN::sql_cmd_dml
|
+ */
|
+ scanned_rows= query_plan.scanned_rows= select ?
|
+ select->records : table->file->stats.records;
|
+ select_lex->join->sql_cmd_dml= (Sql_cmd_dml *) this; |
Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me).
+ /*
|
+ In case of a DELETE or UPDATE, get number of scanned rows as an
|
+ (upper bound) estimate of how many times the subquery will be
|
+ executed.
|
+ */
|
+ else if (outer_join && outer_join->sql_cmd_dml)
|
+ outer_lookup_keys= (double) outer_join->sql_cmd_dml->get_scanned_rows(); |
Please use rows2double.
So I take the current branch, compile with {{cmake ... -DWITH_PROTECT_STATEMENT_MEMROOT=1 }}, apply this patch:
diff --git a/mysql-test/main/ps.test b/mysql-test/main/ps.test
|
index 9cb9eaf3e76..1c903583d59 100644
|
--- a/mysql-test/main/ps.test
|
+++ b/mysql-test/main/ps.test
|
@@ -3658,7 +3658,7 @@ CREATE TABLE t2 ( id INT(10) );
|
SET @save_sql_mode= @@sql_mode;
|
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';
|
|
-PREPARE stmt FROM 'UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id)';
|
+PREPARE stmt FROM 'explain UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id)';
|
execute stmt;
|
insert into t1 values (1,10),(2,10),(3,10);
|
insert into t2 values (1),(2); |
and I still see the crash.
ycp, we need test coverage for all cases where execution of UPDATE/DELETE takes some non-regular path. These are : EXPLAIN (like shown above), test coverage for the impossible WHERE piece:
if (conds) |
{
|
Item::cond_result result;
|
conds= conds->remove_eq_conds(thd, &result, true); |
if (result == Item::COND_FALSE) // Impossible where |
{
|
limit= 0;
|
query_plan.set_impossible_where();
|
if (thd->lex->describe || thd->lex->analyze_stmt) |
goto produce_explain_and_leave; |
}
|
}
|
is this:
set @var1=1; |
set @var2=2; |
CREATE TABLE t1 ( id INT(10), value INT(10) ); |
CREATE TABLE t2 ( id INT(10) ); |
insert into t1 values (1,10),(2,10),(3,10); |
insert into t2 values (1),(2); |
|
PREPARE stmt FROM 'UPDATE t1 t1 SET value = (SELECT 1 FROM t2 WHERE id = t1.id) WHERE ?=?'; |
execute stmt using @var1, @var2; |
execute stmt using @var1, @var1; |
deallocate prepare stmt; |
DROP TABLE t1,t2; |
also test coverage for the case where prune_partitions() produces empty set for the first execution and non-empty for the second.
New patch:
d295367afaf upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
Onto review comments:
before reading the comments above and I agree with serg that this patch is not needed.
Checking of ROOT_FLAG_READ_ONLY is a feature of MEM_ROOT code (which is in mysys).
Currently there is only one user - PROTECT_STATEMENT_MEMROOT. Note that "statement" is a concept from sql/. The code in mysys/ has no idea about statements, so use of PROTECT_STATEMENT_MEMROOT looks out-of-place in mysys/my_alloc.c.
OK, I've removed that patch.
For me, PROTECT_STATEMENT_MEMROOT is automatically enabled in a debug build. (EDIT but not in 11.6-based tree for some reason ...)
The following patch made it so that the flag is enabled by default in 10.6-11.1, but not 10.5 or 11.2+ - why is that, sanja?
commit 8ed3c375929b449ef8556cfa0c49a35707d59b84
|
Author: Oleksandr Byelkin <sanja@mariadb.com>
|
Date: Thu Jul 4 09:27:30 2024 +0200
|
|
Make PROTECT_STATEMENT_MEMROOT default for version less then 11.2
|
Please use ha_rows. This is the type for row counts.
Done. I didn't use ha_rows to avoid introducing the first #include to the immaculate sql_cmd.h.
Fix comment. It's "all constant subqueries" now.
Please also make the comments the same in sql_update.cc and sql_delete.cc. It's not very nice but if the code is duplicated we need to have duplicate comments too.
Done.
Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me).
To reduce code duplication. Done.
Please use rows2double.
Done.
ycp, we need test coverage for all cases where execution of UPDATE/DELETE takes some non-regular path. These are : EXPLAIN (like shown above), test coverage for the impossible WHERE piece:
Done.
also test coverage for the case where prune_partitions() produces empty set for the first execution and non-empty for the second.
Done-ish, as my current patch covers this code path now. However, I still get an assertion failure when prune_partitions() produces empty set for the first ps and non-empty for the second. Turned out it is an existing bug for 10.5+, so I opened MDEV-34757 and mark it as blocking this issue.
An updated patch, which compared to the one in the previous comments has added tests for delete, as well as explanatory comments in the test main.ps_mem_leaks describing the coverage of various calls to optimize_unflattened_subqueries(false):
28ce0d1f062 upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
Note that it is on top of an MDEV-34757 patch, as the testcases overlap - all the PROTECT_STATEMENT_MEMROOT cases are in main.ps_mem_leaks. The patch of this issue does not introduce any new failures, and can be pushed without MDEV-34757, as long as we ensure the test coverage by including the testcases in the push.
The current patch causes a regression on the following 2nd ps execution test.
# .opt file
|
--enable-sql-safe-updates
|
# .test file
|
create table t1 (a int, b int, primary key (a), key (b)); |
insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8); |
CREATE TABLE t2 ( id INT(10) ); |
insert into t2 values (1),(2); |
|
# calls select_lex->optimize_unflattened_subqueries(false) in the err label |
set @@optimizer_switch="index_merge=off"; |
set @var1=1; |
set @var2=2; |
prepare stmt from 'update t1 set b=(SELECT 1 FROM t2 WHERE id = t1.a) where 1=? or b=2'; |
--error ER_UPDATE_WITHOUT_KEY_IN_SAFE_MODE
|
execute stmt using @var1; |
execute stmt using @var2; |
drop table t1, t2; |
This is because, when there is an error, calling st_select_lex::optimize_unflattened_subqueries(false) will set first_cond_optimization for the inner select to false, but skip saving leaf tables:
// in JOIN::optimize_inner():
|
if (thd->is_error() || |
(!select_lex->leaf_tables_saved && select_lex->save_leaf_tables(thd)))
|
{
|
If this happens during the first ps execution, then at the second execution, it will cause an reconstruction of the leaf tables from the (not) saved leaf tables leaf_tables_exec:
// in setup_tables():
|
if (select_lex->first_cond_optimization) |
{
|
// reconstruct leaf tables |
}
|
else |
{
|
// construct empty leaf tables from the empty select_lex->leaf_tables_exec |
}
|
But it seems that the leaf tables should not be empty here, and the empty-ness is the cause of the segv in make_join_statistics().
There are two fixes I can think of:
1. Guard the call to st_select_lex::optimize_unflattened_subqueries(false) with a check on thd->is_error(). This will not result in mem leak failures in the second ps execution as seen before, because failures in the first ps execution causes the memroot to not be marked as read only:
// in Prepared_statement::execute_loop(): |
#ifdef PROTECT_STATEMENT_MEMROOT
|
if (!error) |
{
|
mem_root->flags |= ROOT_FLAG_READ_ONLY;
|
++executed_counter;
|
|
DBUG_PRINT("info", ("execute counter: %lu", executed_counter)); |
}
|
else |
{
|
// Error on call shouldn't be counted as a normal run |
executed_counter= 0;
|
mem_root->flags &= ~ROOT_FLAG_READ_ONLY;
|
}
|
#endif
|
|
2. Add a check for select_lex->leaf_table_saved in setup_tables() before deciding how to construct leaf tables to prevent the scenario where select_lex->first_cond_optimization is set to false but the call to save_leaf_tables() is skipped:
// in setup_tables(): |
if (select_lex->first_cond_optimization || !select_lex->leaf_table_saved) |
{
|
|
We can incorporate both fixes if no regressions are caused.
Hi psergei, ptal at this patch thanks
fc52f5e8c50 upstream/bb-11.6-mdev-25008 MDEV-25008 Estimate outer_lookup_keys later in mysql_delete and mysql_update
|
As before it is based on an MDEV-34757 patch, but can progress independently
It fails main.ps_3innodb main.ps_2myisam main.ps_4heap main.ps_5merge, but so does 11.6 (MDEV-34790).
Note that the verbose low-level comments about which select_lex->optimize_unflattened_subqueries(false) is called in main.ps_mem_leaks is for development only and will be removed before push
ycp, input piece #1: why sql_cmd_update and sql_cmd_delete are needed? Here's a patch to unify them https://gist.github.com/spetrunia/d648e160d07568cda2e3969c9f428a3b . Any objections to applying it?
> input piece #1: why sql_cmd_update and sql_cmd_delete are needed? Here's a patch to unify them https://gist.github.com/spetrunia/d648e160d07568cda2e3969c9f428a3b . Any objections to applying it?
psergei: No objection, but this is how I did it in the first version of my patch. But you commented in your review:
>> + select_lex->join->sql_cmd_dml= (Sql_cmd_dml *) this;
> Why (Sql_cmd_dml *) cast? It is not necessary and looks misleading (at least to me).
I understood it as using Sql_cmd_dml is less clear than using Sql_cmd_update and Sql_cmd_delete separately. Did you mean it was ok to use Sql_cmd_dml, but simply the cast was not needed?
Discussed this PROTECT_STATEMENT_MEMROOT and ROOT_FLAG_READ_ONLY issue
with people on sql-processor call on Thursday.
An alternative suggestion was floated:
if statement took a shortcut: do not set ROOT_FLAG_READ_ONLY.
They have had a similar issue with SP and its statements: SP has mem_root protection too, an SP execution will not necessarily execute all of its statements. So, they have this:
#ifdef PROTECT_STATEMENT_MEMROOT
|
if (!err_status) |
{
|
if (!(main_mem_root.flags & ROOT_FLAG_READ_ONLY) && |
has_all_instrs_executed())
|
{
|
main_mem_root.flags |= ROOT_FLAG_READ_ONLY;
|
}
|
++executed_counter;
|
Note the has_all_instrs_executed() check.
For regular prepared statement, the protection is activated in Prepared_statement::execute_loop:
#ifdef PROTECT_STATEMENT_MEMROOT
|
if (!error) |
{
|
mem_root->flags |= ROOT_FLAG_READ_ONLY;
|
++executed_counter;
|
|
DBUG_PRINT("info", ("execute counter: %lu", executed_counter)); |
}
|
else |
{
|
// Error on call shouldn't be counted as a normal run |
executed_counter= 0;
|
mem_root->flags &= ~ROOT_FLAG_READ_ONLY;
|
}
|
Note that it is not activated if there was an error...
.. so we could have
- introduced Prepared_statement::partial_execution,
- Set that on any "partial" execution
- Checked that here.
I'm not sure which approach is better. The approach in the current patch has an advantage that UPDATEs work the same way as SELECTs...
psergei: thanks for the alternative suggestion. It looks inferior to the existing solution though as it adds more complexity - what is the advantage of it?
Take-aways from the call: decided against the alternative suggestion above. Keeping the current approach.
Other input:
- Please act on this comment: https://jira.mariadb.org/browse/MDEV-25008?focusedCommentId=290099&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-290099
- need_to_optimize is misleading. Please rename it to something like optimize_suqueries.
After this, we can submit for testing.
Thanks for the review psergei.
Hi lstartseva, ptal thanks:
664cfb80717 upstream/bb-11.7-mdev-25008 MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization
|
ca56c461703 MDEV-34757 Check leaf_tables_saved in partition pruning in UPDATE and DELETE
|
ycp, the commit comment is very cryptic. Could you please change it to this, or something better:
MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization
|
|
Single-table UPDATE/DELETE didn't provide outer_lookup_keys value for
|
subqueries. This didn't allow to make a meaningful choice between IN->EXISTS
|
and Materialization strategies for subquery.
|
|
Fix this:
|
* Make UPDATE/DELETE savel Sql_cmd_dml::scanned_rows,
|
* Then, subquery's JOIN::choose_subquery_plan() can fetch it from there.
|
|
Details:
|
UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries() twice,
|
like SELECT does:
|
1. Call with const_only=true before any optimizations. This allows range
|
optimizer and others to use the values of cheap const subqueries.
|
2. Call it with const_only=false after range optimizer, partition pruning, etc.
|
outer_lookup_keys value is provided, so it's possible to pick a good subquery strategy.
|
|
Extra change: PROTECT_STATEMENT_MEMROOT requires that first SP execution performs
|
subquery optimization for all subqueries, even for degenerate query plans like
|
"Impossible WHERE". Due to that, we call optimize_unflattened_subqueries (with
|
const_only=false) even for degenerate query plans.
|
psergei, thanks, done. I made to two changes to your suggested message and updated my testing request with the new commit hashes:
1. Elaborated how select does the call twice
2. Reworded the "extra change" as it is not a change in behaviour, as previously the early call to optimize_unflattened_subqueries(false) ensured optimization for degnerate query plans
Case for comparison of execution time N=100000, rows for delete is about 48%, Release build:
CREATE TABLE t1 ( |
id int NOT NULL PRIMARY KEY, |
item_id varchar(100), |
seller_name varchar(400), |
variant varchar(400), |
FULLTEXT KEY t1_serial_IDX (item_id,seller_name,variant) |
)engine=InnoDB DEFAULT CHARSET=utf8mb4; |
|
insert into t1 select |
seq,
|
FLOOR(1 + (RAND() * 100)),
|
FLOOR(1 + (RAND() * 100)),
|
FLOOR(1 + (RAND() * 100))
|
from seq_1_to_100000; |
|
select count(*) FROM t1; |
+----------+ |
| count(*) | |
+----------+ |
| 100000 |
|
+----------+ |
|
select count(*) FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
+----------+ |
| count(*) | |
+----------+ |
| 48140 |
|
+----------+ |
Before changes (optimizer_switch='in_to_exists=on'):
MariaDB [test]> explain
|
-> DELETE FROM t1 WHERE id NOT IN |
-> (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
+------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ |
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ |
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 100242 | Using where | |
| 2 | SUBQUERY | <derived3> | index_subquery | NULL | distinct_key | 5 | func | 20050 | | |
| 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 100242 | Using temporary; Using filesort | |
+------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+ |
|
MariaDB [test]> analyze format=json
|
-> DELETE FROM t1 WHERE id NOT IN |
-> (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
|
| {
|
"query_optimization": { |
"r_total_time_ms": 0.293148733 |
},
|
"query_block": { |
"select_id": 1, |
"r_total_time_ms": 1453.569477, |
"table": { |
"delete": 1, |
"table_name": "t1", |
"access_type": "ALL", |
"rows": 100242, |
"r_rows": 100000, |
"r_total_filtered": 48.14, |
"r_total_time_ms": 185.3549061, |
"r_engine_stats": { |
"pages_accessed": 241506, |
"pages_updated": 145092 |
},
|
"attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))", |
"r_filtered": 48.14 |
},
|
"subqueries": [ |
{
|
"query_block": { |
"select_id": 2, |
"cost": 2.24023997, |
"having_condition": "innertable.m is null", |
"nested_loop": [ |
{
|
"table": { |
"table_name": "<derived3>", |
"access_type": "index_subquery", |
"key": "distinct_key", |
"key_length": "5", |
"used_key_parts": ["m"], |
"ref": ["func"], |
"loops": 1, |
"r_loops": 0, |
"rows": 20050, |
"r_rows": null, |
"cost": 2.24023997, |
"r_table_time_ms": 2.962863545, |
"r_other_time_ms": 249.7625712, |
"filtered": 100, |
"r_total_filtered": null, |
"r_filtered": null, |
"materialized": { |
"query_block": { |
"select_id": 3, |
"cost": 84.00059774, |
"r_loops": 1, |
"r_total_time_ms": 837.9879334, |
"filesort": { |
"sort_key": "t1.item_id, t1.seller_name, t1.variant", |
"r_loops": 1, |
"r_total_time_ms": 264.7830164, |
"r_used_priority_queue": false, |
"r_output_rows": 51860, |
"r_sort_passes": 102, |
"r_buffer_size": "253Kb", |
"r_sort_mode": "sort_key,rowid", |
"temporary_table": { |
"nested_loop": [ |
{
|
"table": { |
"table_name": "t1", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 100242, |
"r_rows": 100000, |
"cost": 16.5964832, |
"r_table_time_ms": 32.3232741, |
"r_other_time_ms": 288.0603587, |
"r_engine_stats": { |
"pages_accessed": 267 |
},
|
"filtered": 100, |
"r_total_filtered": 100, |
"r_filtered": 100 |
}
|
}
|
]
|
}
|
}
|
}
|
}
|
}
|
}
|
]
|
}
|
}
|
]
|
}
|
} |
|
Before changes (optimizer_switch='in_to_exists=off'):
MariaDB [test]> explain
|
-> DELETE FROM t1 WHERE id NOT IN |
-> (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
+------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ |
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ |
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 97701 | Using where | |
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 97701 | | |
| 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 97701 | Using temporary; Using filesort | |
+------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ |
3 rows in set (0,000 sec) |
|
|
MariaDB [test]> analyze format=json
|
-> DELETE FROM t1 WHERE id NOT IN |
-> (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
|
| {
|
"query_optimization": { |
"r_total_time_ms": 0.270655265 |
},
|
"query_block": { |
"select_id": 1, |
"r_total_time_ms": 1687.263844, |
"table": { |
"delete": 1, |
"table_name": "t1", |
"access_type": "ALL", |
"rows": 97701, |
"r_rows": 100000, |
"r_total_filtered": 48.101, |
"r_total_time_ms": 181.3790106, |
"r_engine_stats": { |
"pages_accessed": 241313, |
"pages_updated": 144975 |
},
|
"attached_condition": "!(<in_optimizer>(t1.`id`,t1.`id` in (subquery#2)))", |
"r_filtered": 48.101 |
},
|
"subqueries": [ |
{
|
"materialization": { |
"r_strategy": "index_lookup", |
"r_loops": 100000, |
"query_block": { |
"select_id": 2, |
"cost": 4.375730105, |
"r_loops": 1, |
"r_total_time_ms": 1089.840414, |
"nested_loop": [ |
{
|
"table": { |
"table_name": "<derived3>", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 97701, |
"r_rows": 51899, |
"cost": 4.375730105, |
"r_table_time_ms": 3.370080881, |
"r_other_time_ms": 269.7888225, |
"filtered": 100, |
"r_total_filtered": 100, |
"r_filtered": 100, |
"materialized": { |
"query_block": { |
"select_id": 3, |
"cost": 81.86802309, |
"r_loops": 1, |
"r_total_time_ms": 863.9443314, |
"filesort": { |
"sort_key": "t1.item_id, t1.seller_name, t1.variant", |
"r_loops": 1, |
"r_total_time_ms": 264.7268764, |
"r_used_priority_queue": false, |
"r_output_rows": 51899, |
"r_sort_passes": 102, |
"r_buffer_size": "253Kb", |
"r_sort_mode": "sort_key,rowid", |
"temporary_table": { |
"nested_loop": [ |
{
|
"table": { |
"table_name": "t1", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 97701, |
"r_rows": 100000, |
"cost": 16.1823002, |
"r_table_time_ms": 33.74205653, |
"r_other_time_ms": 292.216494, |
"r_engine_stats": { |
"pages_accessed": 267 |
},
|
"filtered": 100, |
"r_total_filtered": 100, |
"r_filtered": 100 |
}
|
}
|
]
|
}
|
}
|
}
|
}
|
}
|
}
|
]
|
}
|
}
|
}
|
]
|
}
|
} |
|
After changes:
MariaDB [test]> explain
|
-> DELETE FROM t1 WHERE id NOT IN |
-> (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
+------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+ |
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+ |
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 100175 | Using where | |
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 100175 | | |
| 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 100175 | Using temporary; Using filesort | |
+------+--------------+------------+------+---------------+------+---------+------+--------+---------------------------------+ |
3 rows in set (0,001 sec) |
|
MariaDB [test]> analyze format=json
|
-> DELETE FROM t1 WHERE id NOT IN |
-> (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
|
| {
|
"query_optimization": { |
"r_total_time_ms": 0.310001332 |
},
|
"query_block": { |
"select_id": 1, |
"r_total_time_ms": 1702.036735, |
"table": { |
"delete": 1, |
"table_name": "t1", |
"access_type": "ALL", |
"rows": 100175, |
"r_rows": 100000, |
"r_total_filtered": 48.2, |
"r_total_time_ms": 190.1354219, |
"r_engine_stats": { |
"pages_accessed": 241810, |
"pages_updated": 145277 |
},
|
"attached_condition": "!(<in_optimizer>(t1.`id`,t1.`id` in (subquery#2)))", |
"r_filtered": 48.2 |
},
|
"subqueries": [ |
{
|
"materialization": { |
"r_strategy": "index_lookup", |
"r_loops": 100000, |
"query_block": { |
"select_id": 2, |
"cost": 2.238774758, |
"r_loops": 1, |
"r_total_time_ms": 1087.639802, |
"nested_loop": [ |
{
|
"table": { |
"table_name": "<derived3>", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 100175, |
"r_rows": 51800, |
"cost": 4.486279805, |
"r_table_time_ms": 3.019494164, |
"r_other_time_ms": 254.3995114, |
"filtered": 100, |
"r_total_filtered": 100, |
"r_filtered": 100, |
"materialized": { |
"query_block": { |
"select_id": 3, |
"cost": 83.94436367, |
"r_loops": 1, |
"r_total_time_ms": 848.9372393, |
"filesort": { |
"sort_key": "t1.item_id, t1.seller_name, t1.variant", |
"r_loops": 1, |
"r_total_time_ms": 264.4373285, |
"r_used_priority_queue": false, |
"r_output_rows": 51800, |
"r_sort_passes": 102, |
"r_buffer_size": "253Kb", |
"r_sort_mode": "sort_key,rowid", |
"temporary_table": { |
"nested_loop": [ |
{
|
"table": { |
"table_name": "t1", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 100175, |
"r_rows": 100000, |
"cost": 16.5855622, |
"r_table_time_ms": 33.04229617, |
"r_other_time_ms": 293.9514345, |
"r_engine_stats": { |
"pages_accessed": 267 |
},
|
"filtered": 100, |
"r_total_filtered": 100, |
"r_filtered": 100 |
}
|
}
|
]
|
}
|
}
|
}
|
}
|
}
|
}
|
]
|
}
|
}
|
}
|
]
|
}
|
} |
|
Hi lstartseva, sorry that in the discussion about comparison of performance before and after patch in 11.7, I forgot the pertinent point lost in the long time frame of this task that the bug is not present for 11.0+ for the testcase in the description, see also my comment above[1]. If my reading is correct, this is also reflected from your test results above.
So the question now is whether we want to find a testcase that shows the performance improvements before and after the patch, in 11.7. Please advise, psergei.
Here's the testcase, it's fairly trivial:
create table t1 ( |
grp_id int, |
b int |
);
|
insert into t1 |
select MOD(seq, 100), seq |
from |
seq_1_to_1000;
|
|
|
create table t2 ( |
grp_id int, |
col1 int |
);
|
insert into t2 |
select MOD(seq, 100), seq |
from |
seq_1_to_50000;
|
analyze table t1,t2; |
explain
|
update t1 set b=b+1 |
where
|
grp_id not in (select grp_id from t2 group by grp_id having sum(col1)>100); |
analyze format=json
|
update t1 set b=b+1 |
where
|
grp_id not in (select grp_id from t2 group by grp_id having sum(col1)>100); |
This query doesn't modify any rows, that's just to allow easier re-runs.
EXPLAIN before the patch:
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
|
| 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 50283 | Using temporary |
|
+------+--------------------+-------+------+---------------+------+---------+------+-------+-----------------+
|
EXPLAIN after the patch:
+------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
|
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 50283 | Using temporary |
|
+------+--------------+-------+------+---------------+------+---------+------+-------+-----------------+
|
Timings:
Before the patch: 16 seconds
After the patch: 0.02 seconds.
This is release build on a deskop machine.
Before: commit 7a65dcb59efe10392e74ecec3236298be138f083
After: commit 664cfb80717a4dfb05738af089d5813a253b5d59
Summary of lstartseva's comment above
Before:
- default plan is: 1453 ms
- in_to_exists=off: 1687 ms (1.16x slowdown)
After:
- (same plan as before with in_to_exists_off): 1702 ms
One odd thing I observe in lstartseva's data:
The query plan for Before changes (optimizer_switch='in_to_exists=on') uses SUBQUERY, that is, uncorrelated subquery. (Correlated subquery would use DEPENDENT SUBQUERY).
The query plan with After changes uses MATERIALIZED. But why would we need to materialize if we only need to compute the subquery once?
ycp, could you check this?
psergei: not a direct answer to your question, but I notice that monty's change that fixed the original case in 11.0+ also resulted in the query type changing from "DEPENDENT SUBQUERY" to "SUBQUERY":
# before 5e5a8eda1641eda1d915a7eb5736e494d2179795 Derived tables and union can now create distinct keys
|
explain
|
DELETE FROM t1 WHERE id NOT IN
|
(SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
|
id select_type table type possible_keys key key_len ref rows Extra
|
1 PRIMARY t1 ALL NULL NULL NULL NULL 10000 Using where
|
2 DEPENDENT SUBQUERY <derived3> ALL NULL NULL NULL NULL 10000 Using where
|
3 DERIVED t1 ALL NULL NULL NULL NULL 10000 Using temporary; Using filesort
|
# after
|
explain
|
DELETE FROM t1 WHERE id NOT IN
|
(SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
|
id select_type table type possible_keys key key_len ref rows Extra
|
1 PRIMARY t1 ALL NULL NULL NULL NULL 10000 Using where
|
2 SUBQUERY <derived3> index_subquery distinct_key distinct_key 5 func 2002
|
3 DERIVED t1 ALL NULL NULL NULL NULL 10000 Using temporary; Using filesort
|
At the parent of the patch for this issue, if we do the following then we will get "DEPENDENT SUBQUERY" again
modified sql/sql_derived.cc
|
@@ -656,7 +656,7 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived)
|
{
|
SELECT_LEX_UNIT *unit= derived->get_unit();
|
SELECT_LEX *first_select;
|
- bool res= FALSE, keep_row_order, distinct;
|
+ bool res= FALSE, keep_row_order/* , distinct */;
|
DBUG_ENTER("mysql_derived_prepare");
|
DBUG_PRINT("enter", ("unit: %p table_list: %p alias: '%s'",
|
unit, derived, derived->alias.str));
|
@@ -868,13 +868,15 @@ bool mysql_derived_prepare(THD *thd, LEX *lex, TABLE_LIST *derived)
|
Note that the underlying query will also control distinct condition.
|
*/
|
thd->create_tmp_table_for_derived= TRUE;
|
- distinct= (unit->first_select()->next_select() ?
|
- unit->union_distinct && !unit->union_distinct->next_select() :
|
- unit->distinct);
|
+ /*
|
+ distinct= (unit->first_select()->next_select() ?
|
+ unit->union_distinct && !unit->union_distinct->next_select() :
|
+ unit->distinct);
|
+ */
|
|
if (!(derived->table) &&
|
derived->derived_result->create_result_table(thd, &unit->types,
|
- distinct,
|
+ false,
|
(first_select->options |
|
thd->variables.option_bits |
|
TMP_TABLE_ALL_COLUMNS),
|
TL,DR: This "SUBQUERY" in EXPLAIN output doesn't seem to reflect what's going in. Execution looks like "DEPENDENT SUBQUERY".
Running Alice's testcase on 11.2, I get this EXPLAIN:
+------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 100048 | Using where |
|
| 2 | SUBQUERY | <derived3> | index_subquery | NULL | distinct_key | 5 | func | 20011 | |
|
| 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 100048 | Using temporary; Using filesort |
|
+------+-------------+------------+----------------+---------------+--------------+---------+------+--------+---------------------------------+
|
and this ANALYZE:
analyze format=json DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable); |
{
|
"query_optimization": { |
"r_total_time_ms": 1.214205366 |
},
|
"query_block": { |
"select_id": 1, |
"r_total_time_ms": 294566.2563, |
"table": { |
"delete": 1, |
"table_name": "t1", |
"access_type": "ALL", |
"rows": 100048, |
"r_rows": 100000, |
"r_filtered": 48.155, |
"r_total_time_ms": 60829.51264, |
"r_engine_stats": { |
"pages_accessed": 241689, |
"pages_updated": 145142 |
},
|
"attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))" |
},
|
"subqueries": [ |
{
|
"query_block": { |
"select_id": 2, |
"cost": 2.235889134, |
"having_condition": "innertable.m is null", |
"nested_loop": [ |
{
|
"table": { |
"table_name": "<derived3>", |
"access_type": "index_subquery", |
"key": "distinct_key", |
"key_length": "5", |
"used_key_parts": ["m"], |
"ref": ["func"], |
"loops": 1, |
"r_loops": 0, |
"rows": 20011, |
"r_rows": null, |
"cost": 2.235889134, |
"r_table_time_ms": 88.26781154, |
"r_other_time_ms": 9677.190753, |
"filtered": 100, |
"r_filtered": null, |
"materialized": { |
"query_block": { |
"select_id": 3, |
"cost": 83.83777121, |
"r_loops": 1, |
"r_total_time_ms": 5724.475274, |
"filesort": { |
"sort_key": "t1.item_id, t1.seller_name, t1.variant", |
"r_loops": 1, |
"r_total_time_ms": 103.603556, |
"r_used_priority_queue": false, |
"r_output_rows": 51845, |
"r_buffer_size": "2045Kb", |
"r_sort_mode": "packed_sort_key,rowid", |
"temporary_table": { |
"nested_loop": [ |
{
|
"table": { |
"table_name": "t1", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 100048, |
"r_rows": 100000, |
"cost": 16.5648612, |
"r_table_time_ms": 1464.248393, |
"r_other_time_ms": 3917.197298, |
"r_engine_stats": { |
"pages_accessed": 267 |
},
|
"filtered": 100, |
"r_filtered": 100 |
}
|
}
|
]
|
}
|
}
|
}
|
}
|
}
|
}
|
]
|
}
|
}
|
]
|
}
|
}
|
Note the
"table_name": "<derived3>",
|
"access_type": "index_subquery",
|
"key": "distinct_key",
|
"key_length": "5",
|
"used_key_parts": ["m"],
|
"ref": ["func"],
|
"loops": 1,
|
"r_loops": 0,
|
But in debugger I can see that subselect_indexsubquery_engine::exec() is invoked many times... Why do we get r_loops:0 here...
As for 11.0 changing how this query is optimized... Yes it does change it.
DELETE FROM t1 |
WHERE id NOT IN (SELECT m -- select #2 |
FROM (SELECT max(id) m -- select #3 |
FROM t1 |
GROUP BY item_id, seller_name, variant |
) AS innertable) |
Before 11.0:
select #3 was materialized into a temporary table which didn't have a useful index.
Then, NOT IN subquery used IN-to-EXISTS which had to scan the temptable many times.
After 11:0:
select #3 is materialized into a table with a potential index on innertable.m
The NOT IN still uses IN-to-EXISTS.
IN-to-EXISTS pushes the "t1.id=innertable.m" into subquery#2.
potential index on innertable.m becomes a real index, and that index is now used to perform
[non-unique] index lookups.
This way, the query plan is now similar to Materialization strategy - have a temptable with a suitable index, make lookups there when computing the subquery.
(Also EXPLAIN is wrong as shown above but this doesn't seem to have much effect).
Take away: need to fix EXPLAIN for the above case but this is not a blocker for this MDEV.
Filed MDEV-35231 for fixing the EXPLAIN. It's a different problem from this MDEV, this MDEV doesn't need to wait on it.
Testing done. Ok to push. The problems discussed above will be fixed in MDEV-35231
I tried with 10000 rows, 2.8s in MariaDB 10.5.9 vs 0.07 Mysql 8.0.21. After setting optimizer_switch to 'in_to_exists=off' - MariaDB 10.5.9 -0.02
--source include/have_innodb.inc
--source include/have_sequence.inc
)engine=innodb;
analyze format=json
MariaDB 10.5.9
MariaDB [test]> analyze DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
+------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 0.00 | Using where |
| 2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 10000 | 5000.50 | 100.00 | 0.02 | Using where |
| 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 100.00 | Using temporary; Using filesort |
+------+--------------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
3 rows in set (2.900 sec)
MariaDB [test]> analyze format=json DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ANALYZE |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"query_block": {
"select_id": 1,
"r_total_time_ms": 2958.619018,
"table": {
"delete": 1,
"table_name": "t1",
"access_type": "ALL",
"rows": 10000,
"r_rows": 10000,
"r_filtered": 0,
"r_total_time_ms": 1.089836561,
"attached_condition": "!(<in_optimizer>(t1.`id`,<exists>(subquery#2)))"
},
"subqueries": [
{
"query_block": {
"select_id": 2,
"r_loops": 10000,
"r_total_time_ms": 2956.083461,
"having_condition": "innertable.m is null",
"table": {
"table_name": "<derived3>",
"access_type": "ALL",
"r_loops": 10000,
"rows": 10000,
"r_rows": 5000.5,
"r_table_time_ms": 624.9038041,
"r_other_time_ms": 2316.295274,
"filtered": 100,
"r_filtered": 0.019998,
"attached_condition": "10000 = innertable.m or innertable.m is null",
"materialized": {
"query_block": {
"select_id": 3,
"r_loops": 1,
"r_total_time_ms": 15.52120094,
"filesort": {
"sort_key": "t1.item_id, t1.seller_name, t1.variant",
"r_loops": 1,
"r_total_time_ms": 4.314342992,
"r_used_priority_queue": false,
"r_output_rows": 10000,
"r_buffer_size": "2047Kb",
"r_sort_mode": "packed_sort_key,rowid",
"temporary_table": {
"table": {
"table_name": "t1",
"access_type": "ALL",
"r_loops": 1,
"rows": 10000,
"r_rows": 10000,
"r_table_time_ms": 1.606918251,
"r_other_time_ms": 7.823054272,
"filtered": 100,
"r_filtered": 100
}
}
}
}
}
}
}
}
]
}
} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.957 sec)
Mysql 8.0.21
mysql> explain DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| 1 | DELETE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | Using where |
| 2 | SUBQUERY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | NULL |
| 3 | DERIVED | t1 | NULL | ALL | t1_serial_IDX | NULL | NULL | NULL | 10000 | 100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
3 rows in set, 1 warning (0.00 sec)
Note (Code 1003): delete from `test`.`t1` where <in_optimizer>(`test`.`t1`.`id`,`test`.`t1`.`id` in ( <materialize> (/* select#2 */ select `innertable`.`m` from (/* select#3 */ select max(`test`.`t1`.`id`) AS `m` from `test`.`t1` group by `test`.`t1`.`item_id`,`test`.`t1`.`seller_name`,`test`.`t1`.`variant`) `innertable` where true having true ), <primary_index_lookup>(`test`.`t1`.`id` in <temporary table> on <auto_distinct_key> where ((`test`.`t1`.`id` = `<materialized_subquery>`.`m`)))) is false)
mysql> explain analyze DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Delete from t1 (immediate)
-> Filter: <in_optimizer>(t1.id,t1.id in (select #2) is false) (cost=1015.12 rows=10000) (actual time=75.521..75.521 rows=0 loops=1)
-> Table scan on t1 (cost=1015.12 rows=10000) (actual time=0.023..17.489 rows=10000 loops=1)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.id = `<materialized_subquery>`.m)) (actual time=0.000..0.000 rows=1 loops=10001)
-> Limit: 1 row(s) (actual time=0.000..0.000 rows=1 loops=10001)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (m=t1.id) (actual time=0.000..0.000 rows=1 loops=10001)
-> Materialize with deduplication (actual time=0.006..0.006 rows=1 loops=10001)
-> Table scan on innertable (actual time=0.001..0.336 rows=10000 loops=1)
-> Materialize (actual time=49.180..49.904 rows=10000 loops=1)
-> Table scan on <temporary> (actual time=0.001..0.652 rows=10000 loops=1)
-> Aggregate using temporary table (actual time=47.329..48.414 rows=10000 loops=1)
-> Table scan on t1 (cost=1015.12 rows=10000) (actual time=0.009..31.229 rows=10000 loops=1)
|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.08 sec)
mysql> DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
Query OK, 0 rows affected (0.07 sec)
with optimizer_switch='in_to_exists=off':
MariaDB 10.5.9
MariaDB [test]> set optimizer_switch='in_to_exists=off';
Query OK, 0 rows affected (0.000 sec)
MariaDB [test]> analyze DELETE FROM t1 WHERE id NOT IN (SELECT m FROM (SELECT max(id) m FROM t1 GROUP BY item_id, seller_name, variant) AS innertable);
+------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 0.00 | Using where |
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 100.00 | |
| 3 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | 10000.00 | 100.00 | 100.00 | Using temporary; Using filesort |
+------+--------------+------------+------+---------------+------+---------+------+-------+----------+----------+------------+---------------------------------+
3 rows in set (0.022 sec)