Details
Description
A COUNT() query containing EXISTS and subqueries returns wrong results.
Sometimes disabling the optimizer switch exists_to_in works as expected, I can't always reproduce this behaviour thou.
Dropping an index (see sample.txt) also seems to fix, but there it's still 1 row off: 468 vs 469)
Attachments
- sample.txt
- 5 kB
- test_case_scr.sql
- 486 kB
Activity
Test case |
DROP TABLE IF EXISTS t1, t2, t3; |
|
CREATE TABLE t1 (a INT NOT NULL); |
INSERT INTO t1 VALUES (1),(1),(1),(5),(5); |
|
CREATE TABLE t2 (b INT); |
INSERT INTO t2 VALUES (5),(1); |
|
CREATE TABLE t3 (c INT, KEY(c)); |
INSERT INTO t3 VALUES (5),(5); |
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a); |
|
SET optimizer_switch='exists_to_in=off'; |
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a); |
|
DROP TABLE t1, t2, t3; |
Result with the default switch (wrong) |
+---+
|
| a |
|
+---+
|
| 5 |
|
| 5 |
|
| 5 |
|
| 5 |
|
+---+
|
4 rows in set (0.00 sec)
|
plan |
MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|
| 1 | PRIMARY | t3 | index | c | c | 5 | NULL | 2 | 100.00 | Using index |
|
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | 100.00 | Using where |
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where; Using join buffer (flat, BNL join) |
|
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | |
|
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|
Result with exists_to_in=OFF (correct) |
+---+
|
| a |
|
+---+
|
| 5 |
|
| 5 |
|
+---+
|
2 rows in set (0.00 sec)
|
plan |
+------+--------------------+-------+------+---------------+------+---------+-----------+------+----------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+--------------------+-------+------+---------------+------+---------+-----------+------+----------+-------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where |
|
| 3 | DEPENDENT SUBQUERY | t3 | ref | c | c | 5 | test.t1.a | 1 | 100.00 | Using index |
|
| 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
|
+------+--------------------+-------+------+---------------+------+---------+-----------+------+----------+-------------+
|
Reproducible on 10.x, both with MyISAM and InnoDB. Not reproducible on 5.5.
It looks like the result is just result of usual join instead of 'semi-'
select `test`.`t1`.`a` AS `a` from `test`.`t1` join (`test`.`t2`) join (`test`.`t3`) where ((`test`.`t2`.`b` = `test`.`t3`.`c`) and (`test`.`t1`.`a` = `test`.`t3`.`c`));
a
5
5
5
5
This is a bug of semi-join, here is the proof (2 last selects return correct results):
SET @optimiser_switch_save= @@optimizer_switch;
|
|
CREATE TABLE t1 (a INT NOT NULL);
|
INSERT INTO t1 VALUES (1),(1),(1),(5),(5);
|
|
CREATE TABLE t2 (b INT);
|
INSERT INTO t2 VALUES (5),(1);
|
|
CREATE TABLE t3 (c INT, KEY(c));
|
INSERT INTO t3 VALUES (5),(5);
|
|
SET optimizer_switch='exists_to_in=on';
|
|
explain
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
|
SET optimizer_switch='semijoin=off';
|
explain
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
|
SET optimizer_switch='exists_to_in=off';
|
explain
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
|
SET @@optimiser_switch= @optimizer_switch_save;
|
DROP TABLE t1, t2, t3;
|
It is only semi-join problem:
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
|
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
|
a
|
5
|
5
|
5
|
5
|
SET optimizer_switch='semijoin=off';
|
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
|
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
|
a
|
5
|
5
|
MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
|
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|
| 1 | PRIMARY | t3 | index | c | c | 5 | NULL | 2 | 100.00 | Using index |
|
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | 100.00 | Using where |
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where; Using join buffer (flat, BNL join) |
|
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | |
|
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|
The query plan is incorrect.
- EXPLAIN EXTENDED + SHOW WARNINGS show that the subqueries were converted into semi-joins (but not inner joins).
- The query plan has no ways to remove duplicate matches produced by table t3.
Debugging. The optimizer constructs this join prefix:
idx=0, table=t3
|
idx=1, table=t2
|
idx=2, table=t1
|
without any parts to remove duplicates (dups_producing_tables=6, 6=2+4={t2,t3}).
Then, in advance_sj_state():
- LooseScan strategy is used to remove duplicates produced by table t3.
- SJ-Materialization-scan is used to remove duplicates produced by table t2.
However, one can store only one value at a time in join->positions[2].sj_strategy.
Thus, information about the need to use LooseScan for t3 is lost.
fix_semijoin_strategies_for_picked_join_order() only sets up execution for SJ_MATERIALIZE_SCAN.
Possible ways to fix this:
- Allow multiple semi-join strategies to be attached to one POSITION object.
- * I am not sure about all consequences of an attempt to do this
- Resolve such "collisions" between strategies by falling back to DuplicateElimination. That strategy is special as it can handle multiple semi-joins at once and is always applicable (so all fanout will be removed). It is not always the most performant, though.
5.5 test suite
SET @optimiser_switch_save= @@optimizer_switch;
|
|
CREATE TABLE t1 (a INT NOT NULL);
|
INSERT INTO t1 VALUES (1),(1),(1),(5),(5);
|
|
CREATE TABLE t2 (b INT);
|
INSERT INTO t2 VALUES (5),(1);
|
|
CREATE TABLE t3 (c INT, KEY(c));
|
INSERT INTO t3 VALUES (5),(5);
|
|
SET optimizer_switch='semijoin=on';
|
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
|
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
|
|
SET optimizer_switch='semijoin=off';
|
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
|
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
|
|
SET @@optimiser_switch= @optimizer_switch_save;
|
DROP TABLE t1, t2, t3;
|
I was unable to find any issues in the patch.
Let's still ask elenst (or should we ask alice now?) to do a test pass with a semi-join RQG grammar, with various values of the firstmatch,loosescan,materialization flags in @@optimizer_switch.
claudio.nanni, the 1-row difference between the conditional SELECT and unconditional one seems to be legitimate, at least in the attached test case.
There is one line in sceb205fa121 with dia34a99d3bc = 21, which is naturally counted by the unconditional SELECT. The WHERE clause in the conditional SELECT has two EXISTS:
For the first EXISTS, there is no row in ed8ec6a6b8d1 which would have values (21,4,34), so the row is excluded from the count.