[MDEV-13933] Wrong results in COUNT() query with EXISTS and exists_to_in Created: 2017-09-28  Updated: 2020-08-25  Resolved: 2018-01-10

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.1.26, 10.2
Fix Version/s: 5.5.59, 10.0.34, 10.3.4, 10.1.31, 10.2.13

Type: Bug Priority: Critical
Reporter: Claudio Nanni Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux


Attachments: Text File sample.txt     File test_case_scr.sql    
Sprint: 10.2.10, 5.5.58, 10.1.29, 10.1.30, 10.2.12, 5.5.59

 Description   

A COUNT() query containing EXISTS and subqueries returns wrong results.
Sometimes disabling the optimizer switch exists_to_in works as expected, I can't always reproduce this behaviour thou.
Dropping an index (see sample.txt) also seems to fix, but there it's still 1 row off: 468 vs 469)



 Comments   
Comment by Elena Stepanova [ 2017-09-28 ]

claudio.nanni, the 1-row difference between the conditional SELECT and unconditional one seems to be legitimate, at least in the attached test case.
There is one line in sceb205fa121 with dia34a99d3bc = 21, which is naturally counted by the unconditional SELECT. The WHERE clause in the conditional SELECT has two EXISTS:

SELECT COUNT(1) FROM sceb205fa121 sch 
WHERE 
  EXISTS (SELECT 1 FROM ed8ec6a6b8d1 WHERE dia34a99d3bc = sch.dia34a99d3bc AND moc979966566 = 4 AND ye91b4a70018 = 34 ) 
AND 
  EXISTS (SELECT 1 FROM us23a3719e74 usr INNER JOIN di569c818b1e dis ON usr.dia34a99d3bc = dis.idb80bb77402 WHERE usr.st9ed39e2ea9 = 0 AND usr.dia34a99d3bc = sch.dia34a99d3bc AND usr.ty94757cae63 IN (3, 4, 5));

For the first EXISTS, there is no row in ed8ec6a6b8d1 which would have values (21,4,34), so the row is excluded from the count.

Comment by Elena Stepanova [ 2017-09-28 ]

Test case

DROP TABLE IF EXISTS t1, t2, t3;
 
CREATE TABLE t1 (a INT NOT NULL);
INSERT INTO t1 VALUES (1),(1),(1),(5),(5);
 
CREATE TABLE t2 (b INT);
INSERT INTO t2 VALUES (5),(1);
 
CREATE TABLE t3 (c INT, KEY(c));
INSERT INTO t3 VALUES (5),(5);
 
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
 
SET optimizer_switch='exists_to_in=off';
 
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
 
DROP TABLE t1, t2, t3;

Result with the default switch (wrong)

+---+
| a |
+---+
| 5 |
| 5 |
| 5 |
| 5 |
+---+
4 rows in set (0.00 sec)

plan

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type  | table       | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | PRIMARY      | t3          | index | c             | c    | 5       | NULL |    2 |   100.00 | Using index                                     |
|    1 | PRIMARY      | <subquery2> | ALL   | distinct_key  | NULL | NULL    | NULL |    2 |   100.00 | Using where                                     |
|    1 | PRIMARY      | t1          | ALL   | NULL          | NULL | NULL    | NULL |    5 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2          | ALL   | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                                                 |
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+

Result with exists_to_in=OFF (correct)

+---+
| a |
+---+
| 5 |
| 5 |
+---+
2 rows in set (0.00 sec)

plan

+------+--------------------+-------+------+---------------+------+---------+-----------+------+----------+-------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref       | rows | filtered | Extra       |
+------+--------------------+-------+------+---------------+------+---------+-----------+------+----------+-------------+
|    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL      |    5 |   100.00 | Using where |
|    3 | DEPENDENT SUBQUERY | t3    | ref  | c             | c    | 5       | test.t1.a |    1 |   100.00 | Using index |
|    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL      |    2 |   100.00 | Using where |
+------+--------------------+-------+------+---------------+------+---------+-----------+------+----------+-------------+

Reproducible on 10.x, both with MyISAM and InnoDB. Not reproducible on 5.5.

Comment by Oleksandr Byelkin [ 2017-10-05 ]

It looks like the result is just result of usual join instead of 'semi-'

select `test`.`t1`.`a` AS `a` from `test`.`t1` join (`test`.`t2`) join (`test`.`t3`) where ((`test`.`t2`.`b` = `test`.`t3`.`c`) and (`test`.`t1`.`a` = `test`.`t3`.`c`));
a
5
5
5
5

Comment by Oleksandr Byelkin [ 2017-10-05 ]

This is a bug of semi-join, here is the proof (2 last selects return correct results):

SET @optimiser_switch_save= @@optimizer_switch;
 
CREATE TABLE t1 (a INT NOT NULL);
INSERT INTO t1 VALUES (1),(1),(1),(5),(5);
 
CREATE TABLE t2 (b INT);
INSERT INTO t2 VALUES (5),(1);
 
CREATE TABLE t3 (c INT, KEY(c));
INSERT INTO t3 VALUES (5),(5);
 
SET optimizer_switch='exists_to_in=on';
 
explain
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
 
SET optimizer_switch='semijoin=off';
explain
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
 
SET optimizer_switch='exists_to_in=off';
explain
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
 
SET @@optimiser_switch= @optimizer_switch_save;
DROP TABLE t1, t2, t3;

Comment by Oleksandr Byelkin [ 2017-10-05 ]

It is only semi-join problem:

select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
a
5
5
5
5
SET optimizer_switch='semijoin=off';
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
a
5
5

Comment by Sergei Petrunia [ 2017-10-05 ]

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type  | table       | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | PRIMARY      | t3          | index | c             | c    | 5       | NULL |    2 |   100.00 | Using index                                     |
|    1 | PRIMARY      | <subquery2> | ALL   | distinct_key  | NULL | NULL    | NULL |    2 |   100.00 | Using where                                     |
|    1 | PRIMARY      | t1          | ALL   | NULL          | NULL | NULL    | NULL |    5 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2          | ALL   | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                                                 |
+------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+

The query plan is incorrect.

  • EXPLAIN EXTENDED + SHOW WARNINGS show that the subqueries were converted into semi-joins (but not inner joins).
  • The query plan has no ways to remove duplicate matches produced by table t3.
Comment by Sergei Petrunia [ 2017-10-05 ]

Debugging. The optimizer constructs this join prefix:

idx=0, table=t3  
idx=1, table=t2
idx=2, table=t1 

without any parts to remove duplicates (dups_producing_tables=6, 6=2+4={t2,t3}).
Then, in advance_sj_state():

  • LooseScan strategy is used to remove duplicates produced by table t3.
  • SJ-Materialization-scan is used to remove duplicates produced by table t2.

However, one can store only one value at a time in join->positions[2].sj_strategy.

Thus, information about the need to use LooseScan for t3 is lost.
fix_semijoin_strategies_for_picked_join_order() only sets up execution for SJ_MATERIALIZE_SCAN.

Comment by Sergei Petrunia [ 2017-10-05 ]

Possible ways to fix this:

  • Allow multiple semi-join strategies to be attached to one POSITION object.
  • * I am not sure about all consequences of an attempt to do this
  • Resolve such "collisions" between strategies by falling back to DuplicateElimination. That strategy is special as it can handle multiple semi-joins at once and is always applicable (so all fanout will be removed). It is not always the most performant, though.
Comment by Oleksandr Byelkin [ 2017-10-06 ]

5.5 test suite

SET @optimiser_switch_save= @@optimizer_switch;
 
CREATE TABLE t1 (a INT NOT NULL);
INSERT INTO t1 VALUES (1),(1),(1),(5),(5);
 
CREATE TABLE t2 (b INT);
INSERT INTO t2 VALUES (5),(1);
 
CREATE TABLE t3 (c INT, KEY(c));
INSERT INTO t3 VALUES (5),(5);
 
SET optimizer_switch='semijoin=on';
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
 
SET optimizer_switch='semijoin=off';
select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
 
SET @@optimiser_switch= @optimizer_switch_save;
DROP TABLE t1, t2, t3;

Comment by Sergei Petrunia [ 2017-12-19 ]

I was unable to find any issues in the patch.

Let's still ask elenst (or should we ask alice now?) to do a test pass with a semi-join RQG grammar, with various values of the firstmatch,loosescan,materialization flags in @@optimizer_switch.

Generated at Thu Feb 08 08:09:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.