[MDEV-12145] Wrong result (missing rows) on query with IN and EXISTS subqueries from InnoDB tables Created: 2017-02-27  Updated: 2019-04-17  Resolved: 2019-04-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Duplicate Votes: 0
Labels: None

Attachments: File diff-12145.diff    
Issue Links:
Duplicate
duplicates MDEV-12429 Wrong result from a query with IN sub... Closed
Sprint: 10.2.6-3

 Description   

Results are different before and after ANALYZE TABLE.

Result before ANALYZE (wrong)

MariaDB [test]> SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
Empty set (0.00 sec)

Execution plan before ANALYZE

+------+--------------------+-------------+--------+---------------+--------------+---------+------------+------+----------+--------------------------------------------------------------+
| id   | select_type        | table       | type   | possible_keys | key          | key_len | ref        | rows | filtered | Extra                                                        |
+------+--------------------+-------------+--------+---------------+--------------+---------+------------+------+----------+--------------------------------------------------------------+
|    1 | PRIMARY            | t1          | ALL    | NULL          | NULL         | NULL    | NULL       |    2 |   100.00 | Using where                                                  |
|    2 | DEPENDENT SUBQUERY | <subquery3> | eq_ref | distinct_key  | distinct_key | 4       | func       |    1 |   100.00 |                                                              |
|    2 | DEPENDENT SUBQUERY | t2          | index  | i2            | i2           | 5       | NULL       |    3 |   100.00 | Using where; Using index; Using join buffer (flat, BNL join) |
|    2 | DEPENDENT SUBQUERY | t3          | ref    | i3            | i3           | 5       | test.t2.i2 |    4 |   100.00 | Using where; Using index                                     |
|    3 | MATERIALIZED       | t3          | ALL    | NULL          | NULL         | NULL    | NULL       |    8 |   100.00 | Using where                                                  |
|    3 | MATERIALIZED       | t4          | ref    | i4            | i4           | 5       | test.t3.f3 |    1 |   100.00 | Using where                                                  |
+------+--------------------+-------------+--------+---------------+--------------+---------+------------+------+----------+--------------------------------------------------------------+
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                                                                         |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'test.t1.f1' of SELECT #2 was resolved in SELECT #1                                                                                                                                                                                                                                                                                          |
| Note  | 1003 | select `test`.`t1`.`f1` AS `f1` from `test`.`t1` where <expr_cache><`test`.`t1`.`f1`>(exists(select 1 from `test`.`t2` semi join (`test`.`t3` join `test`.`t4`) join `test`.`t3` where ((`test`.`t3`.`i3` = `test`.`t2`.`i2`) and (`test`.`t4`.`i4` = `test`.`t3`.`f3`) and (`test`.`t4`.`f4` <> `test`.`t3`.`f3`) and (`test`.`t1`.`f1` = `test`.`t3`.`f3`)))) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Result after ANALYZE (correct)

MariaDB [test]> ANALYZE TABLE t1, t2, t3, t4;
...
MariaDB [test]> SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
+------+
| f1   |
+------+
|    6 |
+------+
1 row in set (0.01 sec)

Execution plan after ANALYZE

+------+--------------------+-------+-------+---------------+------+---------+------------+------+----------+--------------------------------------------------------------+
| id   | select_type        | table | type  | possible_keys | key  | key_len | ref        | rows | filtered | Extra                                                        |
+------+--------------------+-------+-------+---------------+------+---------+------------+------+----------+--------------------------------------------------------------+
|    1 | PRIMARY            | t1    | ALL   | NULL          | NULL | NULL    | NULL       |    2 |   100.00 | Using where                                                  |
|    2 | DEPENDENT SUBQUERY | t3    | ALL   | NULL          | NULL | NULL    | NULL       |    8 |   100.00 | Using where                                                  |
|    2 | DEPENDENT SUBQUERY | t4    | ref   | i4            | i4   | 5       | test.t3.f3 |    1 |   100.00 | Using where; FirstMatch                                      |
|    2 | DEPENDENT SUBQUERY | t2    | index | i2            | i2   | 5       | NULL       |    3 |   100.00 | Using where; Using index; Using join buffer (flat, BNL join) |
|    2 | DEPENDENT SUBQUERY | t3    | ref   | i3            | i3   | 5       | test.t2.i2 |    1 |   100.00 | Using where; Using index                                     |
+------+--------------------+-------+-------+---------------+------+---------+------------+------+----------+--------------------------------------------------------------+
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                                                                         |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'test.t1.f1' of SELECT #2 was resolved in SELECT #1                                                                                                                                                                                                                                                                                          |
| Note  | 1003 | select `test`.`t1`.`f1` AS `f1` from `test`.`t1` where <expr_cache><`test`.`t1`.`f1`>(exists(select 1 from `test`.`t2` semi join (`test`.`t3` join `test`.`t4`) join `test`.`t3` where ((`test`.`t3`.`i3` = `test`.`t2`.`i2`) and (`test`.`t4`.`i4` = `test`.`t3`.`f3`) and (`test`.`t4`.`f4` <> `test`.`t3`.`f3`) and (`test`.`t1`.`f1` = `test`.`t3`.`f3`)))) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Test case

--source include/have_innodb.inc
 
CREATE TABLE t1 (f1 INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (4),(6);
 
CREATE TABLE t2 (i2 INT, KEY(i2)) ENGINE=InnoDB;
INSERT INTO t2 VALUES (8),(7),(1);
 
CREATE TABLE t3 (f3 INT, i3 INT, KEY(i3)) ENGINE=InnoDB;
CREATE ALGORITHM=MERGE VIEW v3 AS SELECT * FROM t3;
INSERT INTO t3 VALUES (8,0),(6,3),(2,8),(3,8),(1,6),(0,0),(1,0),(1,5);
 
CREATE TABLE t4 (i4 INT, f4 INT, KEY(i4)) ENGINE=InnoDB;
INSERT INTO t4 VALUES (0,0),(4,0),(0,2),(1,0),(2,1),(2,7),(6,3),(8,7),(8,1),(1,0),(7,2),(6,0),(8,1);
 
SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
ANALYZE TABLE t1,t2,t3,t4;
SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
 
DROP VIEW v3;
DROP TABLE t1, t2, t3, t4;

The problem isn't always reproducible right away. Here is the alternative technical test case (only to reproduce it more easily, don't put it into the test suite!)

--source include/have_innodb.inc
 
CREATE TABLE t1 (f1 INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (4),(6);
 
CREATE TABLE t2 (i2 INT, KEY(i2)) ENGINE=InnoDB;
INSERT INTO t2 VALUES (8),(7),(1);
 
CREATE TABLE t3 (f3 INT, i3 INT, KEY(i3)) ENGINE=InnoDB;
CREATE ALGORITHM=MERGE VIEW v3 AS SELECT * FROM t3;
INSERT INTO t3 VALUES (8,0),(6,3),(2,8),(3,8),(1,6),(0,0),(1,0),(1,5);
 
CREATE TABLE t4 (i4 INT, f4 INT, KEY(i4)) ENGINE=InnoDB;
INSERT INTO t4 VALUES (0,0),(4,0),(0,2),(1,0),(2,1),(2,7),(6,3),(8,7),(8,1),(1,0),(7,2),(6,0),(8,1);
 
--let $query = SELECT COUNT(*) FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) )
 
--let $try = 100
--let $found_zero= 0
--let $found_non_zero = 0
--let $printed_zero_plan= 0
--let $printed_non_zero_plan= 0
 
while ($try)
{
    --let $res= `$query`
    dec $try;
    --echo # 
    if ($res)
    {
        --let $found_non_zero= $res
        if (!$printed_non_zero_plan)
        {
            --echo #------- NON-ZERO PLAN -------
            eval EXPLAIN EXTENDED $query;
            --let $printed_non_zero_plan= 1
            --echo #-----------------------------
        }
    }
    if (!$res)
    {
        --let $found_zero= 1
        if (!$printed_zero_plan)
        {
            --echo #------- ZERO PLAN -------
            eval EXPLAIN EXTENDED $query;
            --let $printed_zero_plan= 1
            --echo #-------------------------
        }
    }
    if ($found_zero)
    {
        if ($found_non_zero)
        {
            --let $try= 0
        }
    }
    --disable_query_log
    --disable_result_log
    ANALYZE TABLE t1,t2,t3,t4;
    --enable_result_log
    --enable_query_log
}
 
--echo #-------- RESULT --------
if ($found_zero)
{
    --echo # Found 0
}
if ($found_non_zero)
{
    --echo # Found $found_non_zero
}
--echo #------------------------
 
DROP VIEW v3;
DROP TABLE t1, t2, t3, t4;

The indication that the problem was reproduced is this output at the end of the test:

#-------- RESULT --------
# Found 0
# Found 1
#------------------------

Before it, there will also be plans for the zero- and non-zero results.



 Comments   
Comment by Sergei Petrunia [ 2017-03-17 ]

varun, please investigate.

Comment by Varun Gupta (Inactive) [ 2017-03-22 ]

I was just playing with the queries, so first change I made to the above query, was replacing the IN clause by exist

the new query would be:

 
SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND EXISTS ( SELECT 1 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 and f3 = f1 ) );
 
Executing it returns
f1
6

which is what we expect the output from the above query. So we can see that the issue is with the transformation of IN to EXIST

Comment by Varun Gupta (Inactive) [ 2017-03-27 ]

set optimizer_switch = "semijoin=off";
SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM t3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
f1
6
SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
f1
6

So here we see if we turn off the semi join optimization then the above test does not fail, so we can see that the issue is with the semi-join optimization

Comment by Varun Gupta (Inactive) [ 2017-03-28 ]

Also if we see the explain in the test case above, we see for the first query we use the
Semi-join materialization optimization
while for the later one we use First match optimization.

So our issue is to resolve why are we not getting results for the semi-join materialization

Comment by Sergei Petrunia [ 2017-03-29 ]

EXPLAIN FORMAT=JSON: https://gist.github.com/spetrunia/890d15995aa20e2fc9ec583b807d72ff

Interesting things as pointed out by Varun:

                    "attached_condition": "1 and t3.f3 is not null"

where does the "1" come from?

Comment by Sergei Petrunia [ 2017-03-29 ]

Right after make_join_readinfo it is:

(gdb) p dbug_print_item(join_tab->bush_children->start->select.cond)
  $47 = 0x5555570c1040 <dbug_item_print_buf> "t1.f1 = t3.f3 and t3.f3 is not null"

(And this could be already wrong, will post why)

"t1.f1 = t3.f3" is replaced with "1" here:

  #0  base_list_iterator::replace (this=0x7ffff7ed3560, element=0x7ffff418cc68) at /home/psergey/dev-git/10.2-r3/sql/sql_list.h:475
  #1  0x0000555555a9ad13 in List_iterator<Item>::replace (this=0x7ffff7ed3560, a=0x7ffff418cc68) at /home/psergey/dev-git/10.2-r3/sql/sql_list.h:563
  #2  0x0000555555c95699 in remove_sj_conds (thd=0x7ffff4047070, tree=0x7ffff4199718) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:3870
  #3  0x0000555555c94d26 in setup_sj_materialization_part2 (sjm_tab=0x7ffff419ebe8) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:3686
  #4  0x0000555555b54efb in make_join_readinfo (join=0x7ffff4192da0, options=4, no_jbuf_after=4) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:11253
  #5  0x0000555555b3b8eb in JOIN::optimize_inner (this=0x7ffff4192da0) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1955
  #6  0x0000555555b386fc in JOIN::optimize (this=0x7ffff4192da0) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1084
  #7  0x0000555555ae8d6b in st_select_lex::optimize_unflattened_subqueries (this=0x7ffff404b280, const_only=false) at /home/psergey/dev-git/10.2-r3/sql/sql_lex.cc:3823
  #8  0x0000555555c9831c in JOIN::optimize_unflattened_subqueries (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:5047
  #9  0x0000555555b3baf2 in JOIN::optimize_inner (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1987
  #10 0x0000555555b386fc in JOIN::optimize (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1084
  #11 0x0000555555b414c6 in mysql_select (thd=0x7ffff4047070, tables=0x7ffff4095370, wild_num=1, fields=..., conds=0x7ffff4099aa0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748612, result=0x7ffff409ada8, unit=0x7ffff404ab50, select_lex=0x7ffff404b280) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:3644

Comment by Sergei Petrunia [ 2017-03-29 ]

re "And this could be already wrong" in the previous comment: this is materialization part of the subquery. Why does it have a reference to t1.f1? This doesn't look valid.

Reference to t1.f1 is a reference to outside of the subquery. however it is not valid to have it, because materialization of subquery#3 will be done once per statement execution (and not once per execution of subquery#2).

Comment by Sergei Petrunia [ 2017-03-29 ]

I gather make_join_select() has a bug which caused it to attach "t1.f1 = t3.f3" to subquery's table t3.

Comment by Varun Gupta (Inactive) [ 2017-04-01 ]

diff-12145.diff

Comment by Igor Babaev [ 2017-04-03 ]

The following simplified query returns wrong result (an empty set) for the MyISAM tables with the same contents:

SELECT * FROM t1 
     WHERE EXISTS ( SELECT * FROM t2, t3
                                      WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM t3 ) );

Comment by Igor Babaev [ 2017-04-03 ]

Varun:
The following patch applied to 5.5 fixes mdev-12145, mdev-12429, mdev-9686:

iff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 43b3b6b..d83e8e2 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -8794,7 +8794,7 @@ bool TABLE_LIST::is_active_sjm()
        It solve problem with select like SELECT * FROM t1 WHERE rand() > 0.5
       */
       if (tab == join->join_tab + join->top_join_tab_count - 1)
-       current_map|= OUTER_REF_TABLE_BIT | RAND_TABLE_BIT;
+        current_map|= RAND_TABLE_BIT;
       used_tables|=current_map;
 
       if (tab->type == JT_REF && tab->quick &&

I've checked that your patch above does not fix at least mdev-12429

Comment by Igor Babaev [ 2019-04-17 ]

Closed a duplicate of MDEV-12429 pushed into 5.5 by the commit b0395d8701ec49f49ad23f9917a3b2369bb49e7a in April 2017

Generated at Thu Feb 08 07:55:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.