|
Debugging the case 1, with GROUP BY
select * from t1 where a in (select a from t2 group by a);
|
In the function remove_redundant_subquery_clauses(select_lex), we remove the GROUP BY clause, here is the code snippet
/*
|
Remove GROUP BY if there are no aggregate functions and no HAVING
|
clause
|
*/
|
if (subq_select_lex->group_list.elements &&
|
!subq_select_lex->with_sum_func && !subq_select_lex->join->having)
|
{
|
for (ORDER *ord= subq_select_lex->group_list.first; ord; ord= ord->next)
|
{
|
(*ord->item)->walk(&Item::eliminate_subselect_processor, FALSE, NULL);
|
}
|
subq_select_lex->join->group_list= NULL;
|
subq_select_lex->group_list.empty();
|
DBUG_PRINT("info", ("GROUP BY removed"));
|
}
|
So the first case is also converted to a semi-join. We can look this in the optimizer trace to be sure too
"join_preparation": {
|
"select_id": 2,
|
"steps": [
|
{
|
"transformation": {
|
"select_id": 2,
|
"from": "IN (SELECT)",
|
"to": "materialization",
|
"sjm_scan_allowed": true,
|
"possible": true
|
}
|
},
|
{
|
"transformation": {
|
"select_id": 2,
|
"from": "IN (SELECT)",
|
"to": "semijoin",
|
"chosen": true
|
}
|
},
|
|
|
lets make sure we consider the case of JTBM
the query is:
select * from t1 where a in (select max(a) from t2 group by a);
|
We do have a keyuse, looking into the optimizer trace:
{
|
"ref_optimizer_key_uses": [
|
{
|
"table": "<subquery2>",
|
"field": "max(a)",
|
"equals": "t1.a",
|
"null_rejecting": true
|
}
|
]
|
},
|
and the choice of scanning the <subquery2> or making a lookup in the <subquery2> is cost based
|
|
Ok I did some digging while fixing issue for MDEV-8306:
MariaDB [test]> explain extended select * from t0 where a in (select a from t2);
|
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------+
|
| 1 | PRIMARY | t0 | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | |
|
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | 100.00 | |
|
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | |
|
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------+
|
The data used here is:
create table t0 (a int);
|
insert into t0 values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
|
create table t2 (a int);
|
insert into t2 values (12), (88), (47), (33), (28);
|
insert into t2 values (12), (88), (47), (33), (28);
|
So the lookups are made in the materialized table for semi-joins too and they are not dependent on the fact that a keyuse should be present.
In the function setup_sj_materialization_part2 , we create the ref access that would be used to make lookup into the materialized table
Also in best_extension_by_limited_search, there is a function advance_sj_state which after adding a table, tries to see if a semi-join strategy is possible and then there is a cost based approach to pick the best semi-join strategy.
|
This becomes a real problem when t1 and t2 are 100 times bigger.
Trying to do that:
create table ten(a int);
|
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
|
|
create table one_k(a int);
|
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
|
|
create table t1big like t1;
|
create table t2big like t2;
|
insert into t1big select A.a +1000*B.a from one_k A, ten B;
|
insert into t2big select floor(rand()*1000) from one_k limit 500;
|
explain select * from t1big where a in (select a from t2big group by a);
|
+------+--------------+-------------+------+---------------+------+---------+------+-------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+-------------+------+---------------+------+---------+------+-------+-------------------------------------------------+
|
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 500 | |
|
| 1 | PRIMARY | t1big | ALL | NULL | NULL | NULL | NULL | 10000 | Using where; Using join buffer (flat, BNL join) |
|
| 2 | MATERIALIZED | t2big | ALL | NULL | NULL | NULL | NULL | 500 | |
|
+------+--------------+-------------+------+---------------+------+---------+------+-------+-------------------------------------------------+
|
Looks fairly bad.
But if one disables the join buffering, the plan is good:
set join_cache_level=0;
|
explain select * from t1big where a in (select a from t2big group by a);
|
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+-------+
|
| 1 | PRIMARY | t1big | ALL | NULL | NULL | NULL | NULL | 9980 | |
|
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | |
|
| 2 | MATERIALIZED | t2big | ALL | NULL | NULL | NULL | NULL | 500 | |
|
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+-------+
|
|
|
Trying on mysql-8.0:
mysql> explain select * from t1big where a in (select a from t2big group by a);
|
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
|
| 1 | SIMPLE | t1big | NULL | ALL | NULL | NULL | NULL | NULL | 9980 | 10.00 | Using where; Using join buffer (Block Nested Loop) |
|
| 2 | MATERIALIZED | t2big | NULL | ALL | NULL | NULL | NULL | NULL | 500 | 100.00 | NULL |
|
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|
3 rows in set, 1 warning (0.01 sec)
|
The same plan.
mysql> set optimizer_switch='block_nested_loop=off';
|
Query OK, 0 rows affected (0.00 sec)
|
|
mysql> explain select * from t1big where a in (select a from t2big group by a);
|
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
|
| 1 | SIMPLE | t1big | NULL | ALL | NULL | NULL | NULL | NULL | 9980 | 10.00 | Using where |
|
| 2 | MATERIALIZED | t2big | NULL | ALL | NULL | NULL | NULL | NULL | 500 | 100.00 | NULL |
|
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
3 rows in set, 1 warning (0.00 sec)
|
that is, without join buffering, it still would not use the index of the materialized table
|
|
Reading MySQL's Optimizer Trace for this:
It starts empty prefix, puts table t2big there first:
"considered_execution_plans": [
|
{
|
"plan_prefix": [],
|
"table": "`t2big`",
|
"best_access_path": { ...},
|
"semijoin_strategy_choice": [...],
|
Then adds t1big:
"rest_of_plan": [
|
{
|
"plan_prefix": [ "`t2big`" ],
|
"table": "`t1big`",
|
"best_access_path": { ... },
|
"semijoin_strategy_choice": [
|
Considers MaterializeScan and DuplicatesWeedout (coosing the former:
"semijoin_strategy_choice": [
|
{
|
"strategy": "MaterializeScan",
|
...
|
},
|
{
|
"strategy": "DuplicatesWeedout",
|
...
|
}
|
],
|
"chosen": true
|
}
|
]
|
},
|
Then, it puts t1big first:
{
|
"plan_prefix": [ ],
|
"table": "`t1big`",
|
"best_access_path": {...},
|
"semijoin_strategy_choice": [],
|
"pruned_by_heuristic": true
|
},
|
]
|
and prunes away this plan. The join order "t1big, t2big" is never considered.
|
|
In MariaDB, the trace shows that "t2big,t1big" order is considered:
"rest_of_plan": [
|
{
|
"plan_prefix": ["t2big"],
|
"table": "t1big",
|
"best_access_path": {
|
"considered_access_paths": [
|
{
|
"access_type": "scan",
|
"resulting_rows": 9980,
|
"cost": 21,
|
"chosen": true
|
}
|
]
|
}
|
}
|
]
|
What happens with semi-join optimization is not clear as semi-joins are not yet printed (and maybe some other necessary details are not printed, either).
|
|
Debugging how MariaDB optimizer processes this
Join order: t1big, t2big
t1big:
|
current_record_count=9980
|
current_read_time=2017 (= 21 cost===
|
So we get:
|
+ 1996 output size penalty)
|
t2big:
|
current_record_count=4,990,000
|
// position->read_time=1
|
current_read_time =
|
read_time + // 2017
|
position->read_time + // 1
|
current_record_count/TIME_FOR_COMPARE // ~1M
|
= 1,000,018
|
advance_sj_state() for "t1big, t2big"
|
// FirstMatch finds the plan applicable
|
// SJ-Materialization:
|
mat_info->materialization_cost.total_cost()=26
|
mat_info->lookup_cost.total_cost()= 0.05
|
|
mat_read_time = 2542
|
// ^^ this is 2017 + 26 + 9980 * lookup_cost
|
|
After that, we have:
|
(gdb) print current_record_count
|
$54 = 9980
|
(gdb) print current_read_time
|
$55 = 2542
|
(gdb) p join->positions[idx].sj_strategy
|
$57 = SJ_OPT_MATERIALIZE
|
Join order: t2big, t1big
t2big:
|
current_record_count=500
|
current_read_time=101 (= 1 cost + 100 output size penalty)
|
t1big:
|
current_record_count=4,990,000
|
// position->read_time=21
|
|
current_read_time =
|
read_time + // 10
|
position->read_time + // 21
|
current_record_count/TIME_FOR_COMPARE // ~1M
|
= 998,122
|
advance_sj_state() for "t2big, t1big":
|
|
// SJ-Materialization:
|
mat_info->materialization_cost.total_cost()= 26
|
mat_info->scan_cost.total_cost()=25
|
|
prefix_cost= (sum of the above two)=51
|
prefix_rec_count=500
|
|
optimize_wo_join_buffering:
|
(gdb) p curpos.records_read
|
$92 = 9980
|
(gdb) p curpos.read_time
|
$93 = 21
|
|
(gdb) print prefix_rec_count
|
$95 = 4990000
|
(gdb) print prefix_cost
|
$96 = 72
|
back in best_extension_by_limited_search, we get:
(gdb) p *current_read_time
|
$100 = 72
|
(gdb) p *current_record_count
|
$101 = 4990000
|
Two things are wrong here:
- subquery's fanout is not removed from current_record_count
- there is no out_rows/TIME_FOR_COMPARE penalty.
Because of that, the cost comparison is not apples-to-apples.
(also note that the two errors work to cancel each other out here)
mysql> explain select * from t1big where a in (select a from t2big);
|
+------+--------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 500 | |
|
| 1 | PRIMARY | t1big | ALL | NULL | NULL | NULL | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
|
| 2 | MATERIALIZED | t2big | ALL | NULL | NULL | NULL | NULL | 500 | |
|
+------+--------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|
mysql> show status like 'Last_query_cost%';
|
+-----------------+-----------+
|
| Variable_name | Value |
|
+-----------------+-----------+
|
| Last_query_cost | 71.999000 |
|
+-----------------+-----------+
|
|
|
Second variant of the patch (smaller):
http://lists.askmonty.org/pipermail/commits/2019-August/013944.html
|
|
Checked the patch against related fixes:
Testcase from MDEV-20083 is fixed
https://jira.mariadb.org/browse/MDEV-20083?focusedCommentId=131221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131221
MDEV-20105, Testcase 1 is fixed
https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131291&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131291 (a4.test) - this is fixed.
MDEV-20105, Test case 2 not fixed
https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131292&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131292 (a5.test)
This is not fixed:
id select_type table type possible_keys key key_len ref rows Extra
|
1 PRIMARY regression_1_478389 ALL NULL NULL NULL NULL 300000
|
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 5000 Using where; FirstMatch(regression_1_478389); Using join buffer (flat, BNL join)
|
3 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used
|
"r_total_time_ms": 353638, (in MariaDB 10.2: "r_total_time_ms": 384.23)
Materialization is not used because constants are strings (as they are quoted)?
MDEV-20105, Test case 3 - not fixed
https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131293 (a6.test)
id select_type table type possible_keys key key_len ref rows Extra
|
1 PRIMARY a ALL PRIMARY NULL NULL NULL 300000
|
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 5000 Using where; FirstMatch(a); Using join buffer (flat, BNL join)
|
1 PRIMARY b eq_ref PRIMARY PRIMARY 4 test.a.id 1
|
1 PRIMARY <derived5> ALL NULL NULL NULL NULL 5000 Using where; FirstMatch(b); Using join buffer (flat, BNL join)
|
5 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used
|
3 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used
|
"r_total_time_ms": 367334,
|
In 10.2:
id select_type table type possible_keys key key_len ref rows Extra
|
1 SIMPLE a ALL PRIMARY NULL NULL NULL 300000 Using where
|
1 SIMPLE b eq_ref PRIMARY PRIMARY 4 test.a.id 1 Using where
|
"r_total_time_ms": 502.61,
|
Materialization is not used because constants are strings (as they are quoted)?
MDEV-20105, Test case 4 - not fixed
https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131294&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131294 (a7.test)
id select_type table type possible_keys key key_len ref rows Extra
|
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 5000 Start temporary
|
1 PRIMARY a eq_ref PRIMARY PRIMARY 4 tvc_0._col_1 1 Using index condition; End temporary
|
1 PRIMARY b eq_ref PRIMARY PRIMARY 4 test.a.id 1
|
1 PRIMARY <derived5> ALL NULL NULL NULL NULL 5000 Using where; FirstMatch(b); Using join buffer (flat, BNL join)
|
5 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used
|
3 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used
|
"r_total_time_ms": 3683.4,
|
In 10.2 it was:
1 SIMPLE a range PRIMARY PRIMARY 4 NULL 4996 Using index condition
|
1 SIMPLE b eq_ref PRIMARY PRIMARY 4 test.a.id 1 Using where
|
"r_total_time_ms": 131.5,
|
Materialization is not used because constants are strings (as they are quoted)?
|
|
Hi Sergei, that's great news thanks for the comment. That basically should fix it all at least for us, as we're using only large integer predicate sets, we are using strings too but much lower amounts than 1000 in one IN(...).
I run these not-yet-fixed tests (2,3,4) on 10.4.7 to be sure conversion between int column type in the table and string IN (...) plays no role and yes, times differ insignificiantly with different CREATE TABLE with all columns being varchar (<500ms) just different data type inside IN(...) seems to result in much different performance (for unpatched server), so yes that seems to be the cause...
Is there some way of "forcing" (like FORCE INDEX xxx) of this materialization / unique key?
Thanks,
Slawomir.
|
|
... Addressed review input, and backported the patch to 10.3. Currently the code is being tested at: https://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.3-mdev20109
|
|
Hi pslawek83,
> Is there some way of "forcing" (like FORCE INDEX xxx) of this materialization / unique key?
Currently no, there's no way to force it.
|
|
Ok, the fix is pushed into the MariaDB 10.3 tree:
https://github.com/MariaDB/server/commit/ef76f81c982bdbcfa4797ce26224db9c016ddebd
https://github.com/MariaDB/server/commit/a379f151b18e4bc7e4c26be5f5e1254a59e27bfa
The patches should be applicable to the earlier versions of MariaDB as well. They are not pushed there, because those server versions have been stable for a long time so change in the query optimizer might not be welcome there.
On the other hand, this and related bugs show there is an issue in 10.3 (where in-predicate-to-subquery conversion feature was introduced) , so the patch goes into 10.3 (and not just into 10.4)
|
|
After this was merged to 10.4, the testcase was failing intermittently. Fixed it by collecting EITS stats for the tables (also fixed similar failure in other parts of the test )
http://lists.askmonty.org/pipermail/commits/2019-September/013964.html
|
|
Marking this MDEV is closed as the patch has been pushed. The linked issues will be closed after checking the quoted variant.
|