Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12145

Wrong result (missing rows) on query with IN and EXISTS subqueries from InnoDB tables

Details

    • 10.2.6-3

    Description

      Results are different before and after ANALYZE TABLE.

      Result before ANALYZE (wrong)

      MariaDB [test]> SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
      Empty set (0.00 sec)
      

      Execution plan before ANALYZE

      +------+--------------------+-------------+--------+---------------+--------------+---------+------------+------+----------+--------------------------------------------------------------+
      | id   | select_type        | table       | type   | possible_keys | key          | key_len | ref        | rows | filtered | Extra                                                        |
      +------+--------------------+-------------+--------+---------------+--------------+---------+------------+------+----------+--------------------------------------------------------------+
      |    1 | PRIMARY            | t1          | ALL    | NULL          | NULL         | NULL    | NULL       |    2 |   100.00 | Using where                                                  |
      |    2 | DEPENDENT SUBQUERY | <subquery3> | eq_ref | distinct_key  | distinct_key | 4       | func       |    1 |   100.00 |                                                              |
      |    2 | DEPENDENT SUBQUERY | t2          | index  | i2            | i2           | 5       | NULL       |    3 |   100.00 | Using where; Using index; Using join buffer (flat, BNL join) |
      |    2 | DEPENDENT SUBQUERY | t3          | ref    | i3            | i3           | 5       | test.t2.i2 |    4 |   100.00 | Using where; Using index                                     |
      |    3 | MATERIALIZED       | t3          | ALL    | NULL          | NULL         | NULL    | NULL       |    8 |   100.00 | Using where                                                  |
      |    3 | MATERIALIZED       | t4          | ref    | i4            | i4           | 5       | test.t3.f3 |    1 |   100.00 | Using where                                                  |
      +------+--------------------+-------------+--------+---------------+--------------+---------+------------+------+----------+--------------------------------------------------------------+
      +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Level | Code | Message                                                                                                                                                                                                                                                                                                                                                         |
      +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Note  | 1276 | Field or reference 'test.t1.f1' of SELECT #2 was resolved in SELECT #1                                                                                                                                                                                                                                                                                          |
      | Note  | 1003 | select `test`.`t1`.`f1` AS `f1` from `test`.`t1` where <expr_cache><`test`.`t1`.`f1`>(exists(select 1 from `test`.`t2` semi join (`test`.`t3` join `test`.`t4`) join `test`.`t3` where ((`test`.`t3`.`i3` = `test`.`t2`.`i2`) and (`test`.`t4`.`i4` = `test`.`t3`.`f3`) and (`test`.`t4`.`f4` <> `test`.`t3`.`f3`) and (`test`.`t1`.`f1` = `test`.`t3`.`f3`)))) |
      +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      2 rows in set (0.00 sec)
      

      Result after ANALYZE (correct)

      MariaDB [test]> ANALYZE TABLE t1, t2, t3, t4;
      ...
      MariaDB [test]> SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
      +------+
      | f1   |
      +------+
      |    6 |
      +------+
      1 row in set (0.01 sec)
      

      Execution plan after ANALYZE

      +------+--------------------+-------+-------+---------------+------+---------+------------+------+----------+--------------------------------------------------------------+
      | id   | select_type        | table | type  | possible_keys | key  | key_len | ref        | rows | filtered | Extra                                                        |
      +------+--------------------+-------+-------+---------------+------+---------+------------+------+----------+--------------------------------------------------------------+
      |    1 | PRIMARY            | t1    | ALL   | NULL          | NULL | NULL    | NULL       |    2 |   100.00 | Using where                                                  |
      |    2 | DEPENDENT SUBQUERY | t3    | ALL   | NULL          | NULL | NULL    | NULL       |    8 |   100.00 | Using where                                                  |
      |    2 | DEPENDENT SUBQUERY | t4    | ref   | i4            | i4   | 5       | test.t3.f3 |    1 |   100.00 | Using where; FirstMatch                                      |
      |    2 | DEPENDENT SUBQUERY | t2    | index | i2            | i2   | 5       | NULL       |    3 |   100.00 | Using where; Using index; Using join buffer (flat, BNL join) |
      |    2 | DEPENDENT SUBQUERY | t3    | ref   | i3            | i3   | 5       | test.t2.i2 |    1 |   100.00 | Using where; Using index                                     |
      +------+--------------------+-------+-------+---------------+------+---------+------------+------+----------+--------------------------------------------------------------+
      +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Level | Code | Message                                                                                                                                                                                                                                                                                                                                                         |
      +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Note  | 1276 | Field or reference 'test.t1.f1' of SELECT #2 was resolved in SELECT #1                                                                                                                                                                                                                                                                                          |
      | Note  | 1003 | select `test`.`t1`.`f1` AS `f1` from `test`.`t1` where <expr_cache><`test`.`t1`.`f1`>(exists(select 1 from `test`.`t2` semi join (`test`.`t3` join `test`.`t4`) join `test`.`t3` where ((`test`.`t3`.`i3` = `test`.`t2`.`i2`) and (`test`.`t4`.`i4` = `test`.`t3`.`f3`) and (`test`.`t4`.`f4` <> `test`.`t3`.`f3`) and (`test`.`t1`.`f1` = `test`.`t3`.`f3`)))) |
      +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      Test case

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (f1 INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (4),(6);
       
      CREATE TABLE t2 (i2 INT, KEY(i2)) ENGINE=InnoDB;
      INSERT INTO t2 VALUES (8),(7),(1);
       
      CREATE TABLE t3 (f3 INT, i3 INT, KEY(i3)) ENGINE=InnoDB;
      CREATE ALGORITHM=MERGE VIEW v3 AS SELECT * FROM t3;
      INSERT INTO t3 VALUES (8,0),(6,3),(2,8),(3,8),(1,6),(0,0),(1,0),(1,5);
       
      CREATE TABLE t4 (i4 INT, f4 INT, KEY(i4)) ENGINE=InnoDB;
      INSERT INTO t4 VALUES (0,0),(4,0),(0,2),(1,0),(2,1),(2,7),(6,3),(8,7),(8,1),(1,0),(7,2),(6,0),(8,1);
       
      SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
      ANALYZE TABLE t1,t2,t3,t4;
      SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
       
      DROP VIEW v3;
      DROP TABLE t1, t2, t3, t4;
      

      The problem isn't always reproducible right away. Here is the alternative technical test case (only to reproduce it more easily, don't put it into the test suite!)

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (f1 INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (4),(6);
       
      CREATE TABLE t2 (i2 INT, KEY(i2)) ENGINE=InnoDB;
      INSERT INTO t2 VALUES (8),(7),(1);
       
      CREATE TABLE t3 (f3 INT, i3 INT, KEY(i3)) ENGINE=InnoDB;
      CREATE ALGORITHM=MERGE VIEW v3 AS SELECT * FROM t3;
      INSERT INTO t3 VALUES (8,0),(6,3),(2,8),(3,8),(1,6),(0,0),(1,0),(1,5);
       
      CREATE TABLE t4 (i4 INT, f4 INT, KEY(i4)) ENGINE=InnoDB;
      INSERT INTO t4 VALUES (0,0),(4,0),(0,2),(1,0),(2,1),(2,7),(6,3),(8,7),(8,1),(1,0),(7,2),(6,0),(8,1);
       
      --let $query = SELECT COUNT(*) FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) )
       
      --let $try = 100
      --let $found_zero= 0
      --let $found_non_zero = 0
      --let $printed_zero_plan= 0
      --let $printed_non_zero_plan= 0
       
      while ($try)
      {
          --let $res= `$query`
          dec $try;
          --echo # 
          if ($res)
          {
              --let $found_non_zero= $res
              if (!$printed_non_zero_plan)
              {
                  --echo #------- NON-ZERO PLAN -------
                  eval EXPLAIN EXTENDED $query;
                  --let $printed_non_zero_plan= 1
                  --echo #-----------------------------
              }
          }
          if (!$res)
          {
              --let $found_zero= 1
              if (!$printed_zero_plan)
              {
                  --echo #------- ZERO PLAN -------
                  eval EXPLAIN EXTENDED $query;
                  --let $printed_zero_plan= 1
                  --echo #-------------------------
              }
          }
          if ($found_zero)
          {
              if ($found_non_zero)
              {
                  --let $try= 0
              }
          }
          --disable_query_log
          --disable_result_log
          ANALYZE TABLE t1,t2,t3,t4;
          --enable_result_log
          --enable_query_log
      }
       
      --echo #-------- RESULT --------
      if ($found_zero)
      {
          --echo # Found 0
      }
      if ($found_non_zero)
      {
          --echo # Found $found_non_zero
      }
      --echo #------------------------
       
      DROP VIEW v3;
      DROP TABLE t1, t2, t3, t4;
      

      The indication that the problem was reproduced is this output at the end of the test:

      #-------- RESULT --------
      # Found 0
      # Found 1
      #------------------------
      

      Before it, there will also be plans for the zero- and non-zero results.

      Attachments

        Issue Links

          Activity

            varun, please investigate.

            psergei Sergei Petrunia added a comment - varun , please investigate.
            varun Varun Gupta (Inactive) added a comment - - edited

            I was just playing with the queries, so first change I made to the above query, was replacing the IN clause by exist

            the new query would be:

             
            SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND EXISTS ( SELECT 1 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 and f3 = f1 ) );
             
            Executing it returns
            f1
            6
            

            which is what we expect the output from the above query. So we can see that the issue is with the transformation of IN to EXIST

            varun Varun Gupta (Inactive) added a comment - - edited I was just playing with the queries, so first change I made to the above query, was replacing the IN clause by exist the new query would be:   SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND EXISTS ( SELECT 1 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 and f3 = f1 ) );   Executing it returns f1 6 which is what we expect the output from the above query. So we can see that the issue is with the transformation of IN to EXIST
            varun Varun Gupta (Inactive) added a comment - - edited

            set optimizer_switch = "semijoin=off";
            SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM t3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
            f1
            6
            SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) );
            f1
            6
            
            

            So here we see if we turn off the semi join optimization then the above test does not fail, so we can see that the issue is with the semi-join optimization

            varun Varun Gupta (Inactive) added a comment - - edited set optimizer_switch = "semijoin=off"; SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM t3, t4 WHERE i4 = f3 AND f4 <> f3 ) ); f1 6 SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM v3, t4 WHERE i4 = f3 AND f4 <> f3 ) ); f1 6 So here we see if we turn off the semi join optimization then the above test does not fail, so we can see that the issue is with the semi-join optimization

            Also if we see the explain in the test case above, we see for the first query we use the
            Semi-join materialization optimization
            while for the later one we use First match optimization.

            So our issue is to resolve why are we not getting results for the semi-join materialization

            varun Varun Gupta (Inactive) added a comment - Also if we see the explain in the test case above, we see for the first query we use the Semi-join materialization optimization while for the later one we use First match optimization. So our issue is to resolve why are we not getting results for the semi-join materialization

            EXPLAIN FORMAT=JSON: https://gist.github.com/spetrunia/890d15995aa20e2fc9ec583b807d72ff

            Interesting things as pointed out by Varun:

                                "attached_condition": "1 and t3.f3 is not null"
            

            where does the "1" come from?

            psergei Sergei Petrunia added a comment - EXPLAIN FORMAT=JSON: https://gist.github.com/spetrunia/890d15995aa20e2fc9ec583b807d72ff Interesting things as pointed out by Varun: "attached_condition": "1 and t3.f3 is not null" where does the "1" come from?

            Right after make_join_readinfo it is:

            (gdb) p dbug_print_item(join_tab->bush_children->start->select.cond)
              $47 = 0x5555570c1040 <dbug_item_print_buf> "t1.f1 = t3.f3 and t3.f3 is not null"
            

            (And this could be already wrong, will post why)

            "t1.f1 = t3.f3" is replaced with "1" here:

              #0  base_list_iterator::replace (this=0x7ffff7ed3560, element=0x7ffff418cc68) at /home/psergey/dev-git/10.2-r3/sql/sql_list.h:475
              #1  0x0000555555a9ad13 in List_iterator<Item>::replace (this=0x7ffff7ed3560, a=0x7ffff418cc68) at /home/psergey/dev-git/10.2-r3/sql/sql_list.h:563
              #2  0x0000555555c95699 in remove_sj_conds (thd=0x7ffff4047070, tree=0x7ffff4199718) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:3870
              #3  0x0000555555c94d26 in setup_sj_materialization_part2 (sjm_tab=0x7ffff419ebe8) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:3686
              #4  0x0000555555b54efb in make_join_readinfo (join=0x7ffff4192da0, options=4, no_jbuf_after=4) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:11253
              #5  0x0000555555b3b8eb in JOIN::optimize_inner (this=0x7ffff4192da0) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1955
              #6  0x0000555555b386fc in JOIN::optimize (this=0x7ffff4192da0) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1084
              #7  0x0000555555ae8d6b in st_select_lex::optimize_unflattened_subqueries (this=0x7ffff404b280, const_only=false) at /home/psergey/dev-git/10.2-r3/sql/sql_lex.cc:3823
              #8  0x0000555555c9831c in JOIN::optimize_unflattened_subqueries (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:5047
              #9  0x0000555555b3baf2 in JOIN::optimize_inner (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1987
              #10 0x0000555555b386fc in JOIN::optimize (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1084
              #11 0x0000555555b414c6 in mysql_select (thd=0x7ffff4047070, tables=0x7ffff4095370, wild_num=1, fields=..., conds=0x7ffff4099aa0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748612, result=0x7ffff409ada8, unit=0x7ffff404ab50, select_lex=0x7ffff404b280) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:3644
            

            psergei Sergei Petrunia added a comment - Right after make_join_readinfo it is: (gdb) p dbug_print_item(join_tab->bush_children->start->select.cond) $47 = 0x5555570c1040 <dbug_item_print_buf> "t1.f1 = t3.f3 and t3.f3 is not null" (And this could be already wrong, will post why) "t1.f1 = t3.f3" is replaced with "1" here: #0 base_list_iterator::replace (this=0x7ffff7ed3560, element=0x7ffff418cc68) at /home/psergey/dev-git/10.2-r3/sql/sql_list.h:475 #1 0x0000555555a9ad13 in List_iterator<Item>::replace (this=0x7ffff7ed3560, a=0x7ffff418cc68) at /home/psergey/dev-git/10.2-r3/sql/sql_list.h:563 #2 0x0000555555c95699 in remove_sj_conds (thd=0x7ffff4047070, tree=0x7ffff4199718) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:3870 #3 0x0000555555c94d26 in setup_sj_materialization_part2 (sjm_tab=0x7ffff419ebe8) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:3686 #4 0x0000555555b54efb in make_join_readinfo (join=0x7ffff4192da0, options=4, no_jbuf_after=4) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:11253 #5 0x0000555555b3b8eb in JOIN::optimize_inner (this=0x7ffff4192da0) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1955 #6 0x0000555555b386fc in JOIN::optimize (this=0x7ffff4192da0) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1084 #7 0x0000555555ae8d6b in st_select_lex::optimize_unflattened_subqueries (this=0x7ffff404b280, const_only=false) at /home/psergey/dev-git/10.2-r3/sql/sql_lex.cc:3823 #8 0x0000555555c9831c in JOIN::optimize_unflattened_subqueries (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/opt_subselect.cc:5047 #9 0x0000555555b3baf2 in JOIN::optimize_inner (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1987 #10 0x0000555555b386fc in JOIN::optimize (this=0x7ffff4192620) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:1084 #11 0x0000555555b414c6 in mysql_select (thd=0x7ffff4047070, tables=0x7ffff4095370, wild_num=1, fields=..., conds=0x7ffff4099aa0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748612, result=0x7ffff409ada8, unit=0x7ffff404ab50, select_lex=0x7ffff404b280) at /home/psergey/dev-git/10.2-r3/sql/sql_select.cc:3644

            re "And this could be already wrong" in the previous comment: this is materialization part of the subquery. Why does it have a reference to t1.f1? This doesn't look valid.

            Reference to t1.f1 is a reference to outside of the subquery. however it is not valid to have it, because materialization of subquery#3 will be done once per statement execution (and not once per execution of subquery#2).

            psergei Sergei Petrunia added a comment - re "And this could be already wrong" in the previous comment: this is materialization part of the subquery. Why does it have a reference to t1.f1 ? This doesn't look valid. Reference to t1.f1 is a reference to outside of the subquery. however it is not valid to have it, because materialization of subquery#3 will be done once per statement execution (and not once per execution of subquery#2).

            I gather make_join_select() has a bug which caused it to attach "t1.f1 = t3.f3" to subquery's table t3.

            psergei Sergei Petrunia added a comment - I gather make_join_select() has a bug which caused it to attach "t1.f1 = t3.f3" to subquery's table t3.

            The following simplified query returns wrong result (an empty set) for the MyISAM tables with the same contents:

            SELECT * FROM t1 
                 WHERE EXISTS ( SELECT * FROM t2, t3
                                                  WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM t3 ) );
            

            igor Igor Babaev (Inactive) added a comment - The following simplified query returns wrong result (an empty set) for the MyISAM tables with the same contents: SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2, t3 WHERE i3 = i2 AND f1 IN ( SELECT f3 FROM t3 ) );
            igor Igor Babaev (Inactive) added a comment - - edited

            Varun:
            The following patch applied to 5.5 fixes mdev-12145, mdev-12429, mdev-9686:

            iff --git a/sql/sql_select.cc b/sql/sql_select.cc
            index 43b3b6b..d83e8e2 100644
            --- a/sql/sql_select.cc
            +++ b/sql/sql_select.cc
            @@ -8794,7 +8794,7 @@ bool TABLE_LIST::is_active_sjm()
                    It solve problem with select like SELECT * FROM t1 WHERE rand() > 0.5
                   */
                   if (tab == join->join_tab + join->top_join_tab_count - 1)
            -       current_map|= OUTER_REF_TABLE_BIT | RAND_TABLE_BIT;
            +        current_map|= RAND_TABLE_BIT;
                   used_tables|=current_map;
             
                   if (tab->type == JT_REF && tab->quick &&
            

            I've checked that your patch above does not fix at least mdev-12429

            igor Igor Babaev (Inactive) added a comment - - edited Varun: The following patch applied to 5.5 fixes mdev-12145, mdev-12429, mdev-9686: iff --git a/sql/sql_select.cc b/sql/sql_select.cc index 43b3b6b..d83e8e2 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -8794,7 +8794,7 @@ bool TABLE_LIST::is_active_sjm() It solve problem with select like SELECT * FROM t1 WHERE rand() > 0.5 */ if (tab == join->join_tab + join->top_join_tab_count - 1) - current_map|= OUTER_REF_TABLE_BIT | RAND_TABLE_BIT; + current_map|= RAND_TABLE_BIT; used_tables|=current_map; if (tab->type == JT_REF && tab->quick && I've checked that your patch above does not fix at least mdev-12429

            Closed a duplicate of MDEV-12429 pushed into 5.5 by the commit b0395d8701ec49f49ad23f9917a3b2369bb49e7a in April 2017

            igor Igor Babaev (Inactive) added a comment - Closed a duplicate of MDEV-12429 pushed into 5.5 by the commit b0395d8701ec49f49ad23f9917a3b2369bb49e7a in April 2017

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.