Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
12.0
-
None
Description
colored textFrom MDEV-34888: For SEMIJOIN hints, if no strategies are named, semijoin is used if possible based on the strategies enabled according to the optimizer_switch system variable. If strategies are named but inapplicable for the statement, DUPSWEEDOUT is used.
Testcase:
CREATE TABLE t1 (a INTEGER NOT NULL, b INT, PRIMARY KEY (a)); |
CREATE TABLE t2 (a INTEGER NOT NULL, KEY (a)); |
CREATE TABLE t3 (a INTEGER NOT NULL, b INT, KEY (a)); |
INSERT INTO t1 VALUES (1,10), (2,20), (3,30), (4,40); |
INSERT INTO t2 VALUES (2), (2), (3), (3), (4), (5); |
INSERT INTO t3 VALUES (10,3), (15,3), (20,4), (30,5); |
|
EXPLAIN EXTENDED
|
SELECT /*+ SEMIJOIN(@subq1 LOOSESCAN) SEMIJOIN(@subq2 LOOSESCAN) */ * FROM t1 |
WHERE t1.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3) |
AND t1.b IN (SELECT /*+ QB_NAME(subq2) */ a FROM t2); |
|
DROP TABLE t1,t2,t3; |
This query assumes that LooseScan is not possible for both subqueries at the same time, so in the second case it should be replaced with DuplicateWeedout.
For engine=MyISAM everything looks good:
EXPLAIN EXTENDED
|
SELECT /*+ SEMIJOIN(@subq1 LOOSESCAN) SEMIJOIN(@subq2 LOOSESCAN) */ * FROM t1 |
WHERE t1.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3) |
AND t1.b IN (SELECT /*+ QB_NAME(subq2) */ a FROM t2); |
id select_type table type possible_keys key key_len ref rows filtered Extra |
1 PRIMARY t3 index a a 4 NULL 4 100.00 Using index; LooseScan |
1 PRIMARY t1 eq_ref PRIMARY PRIMARY 4 test.t3.a 1 25.00 Using where |
1 PRIMARY t2 ref a a 4 test.t1.b 1 66.67 Using index; Start temporary; End temporary |
Warnings:
|
Note 1003 select /*+ SEMIJOIN(@`subq1` LOOSESCAN) SEMIJOIN(@`subq2` LOOSESCAN) */ `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t3`) semi join (`test`.`t2`) where `test`.`t1`.`a` = `test`.`t3`.`a` and `test`.`t2`.`a` = `test`.`t1`.`b` |
But for engine=InnoDB (Aria) DuplicateWeedout is not used:
XPLAIN EXTENDED
|
SELECT /*+ SEMIJOIN(@subq1 LOOSESCAN) SEMIJOIN(@subq2 LOOSESCAN) */ * FROM t1 |
WHERE t1.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3) |
AND t1.b IN (SELECT /*+ QB_NAME(subq2) */ a FROM t2); |
id select_type table type possible_keys key key_len ref rows filtered Extra |
1 PRIMARY t2 index a a 4 NULL 6 100.00 Using index |
1 PRIMARY t3 index a a 4 NULL 4 100.00 Using index; LooseScan |
1 PRIMARY t1 eq_ref PRIMARY PRIMARY 4 test.t3.a 1 25.00 Using where |
Warnings:
|
Note 1003 select /*+ SEMIJOIN(@`subq1` LOOSESCAN) SEMIJOIN(@`subq2` LOOSESCAN) */ `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t3`) semi join (`test`.`t2`) where `test`.`t1`.`a` = `test`.`t3`.`a` and `test`.`t1`.`b` = `test`.`t2`.`a` |
But if we use the DuplicateWeedout strategy in both cases, we can see that its use is acceptable:
EXPLAIN EXTENDED
|
SELECT /*+ SEMIJOIN(@subq1 DUPSWEEDOUT) SEMIJOIN(@subq2 DUPSWEEDOUT) */ * FROM t1 |
WHERE t1.a IN (SELECT /*+ QB_NAME(subq1) */ a FROM t3) |
AND t1.b IN (SELECT /*+ QB_NAME(subq2) */ a FROM t2); |
id select_type table type possible_keys key key_len ref rows filtered Extra |
1 PRIMARY t3 index a a 4 NULL 4 100.00 Using index; Start temporary |
1 PRIMARY t1 eq_ref PRIMARY PRIMARY 4 test.t3.a 1 25.00 Using where; End temporary |
1 PRIMARY t2 ref a a 4 test.t1.b 1 66.67 Using index; Start temporary; End temporary |
Warnings:
|
Note 1003 select /*+ SEMIJOIN(@`subq1` DUPSWEEDOUT) SEMIJOIN(@`subq2` DUPSWEEDOUT) */ `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t3`) semi join (`test`.`t2`) where `test`.`t1`.`a` = `test`.`t3`.`a` and `test`.`t2`.`a` = `test`.`t1`.`b` |
Attachments
Issue Links
- blocks
-
MDEV-34888 Implement subquery optimizer hints
-
- Stalled
-
-
MDEV-35504 MySQL 8-style optimizer hints: milestone 1
-
- Stalled
-
- is caused by
-
MDEV-34888 Implement subquery optimizer hints
-
- Stalled
-
- split to
-
MDEV-36707 Semi-joins can introduce incorrect cardinality
-
- Open
-
A bit simplified notation (no change to the logic):
EXPLAIN EXTENDED
The behaviour appears to be dependent on the uniqueness of KEY(a) of `t1`. If the key is not unique:
then we get
EXPLAIN EXTENDED
LooseScan/DupsWeedout, which is expected.
However, if the key is unique
then the EXPLAIN output
EXPLAIN EXTENDED
Warnings:
looks like a case of table pullout of the table `t2`. However, table pullout does not seem applicable here since values of `t2.a` are not unique.