Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30333

Wrong result with not_null_range_scan and LEFT JOIN with empty table

Details

    Description

      CREATE TABLE t1 (a INT, b INT) ENGINE=MyISAM;
      INSERT INTO t1 (b) VALUES (1),(2);
       
      CREATE TABLE t2 (c INT) ENGINE=MyISAM;
       
      SET optimizer_switch= 'not_null_range_scan=off'; # Default
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
       
      SET optimizer_switch = 'not_null_range_scan=on';
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
       
      DROP TABLE t1, t2;
      

      10.5 b8f4b984

      SET optimizer_switch= 'not_null_range_scan=off';
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      b
      1
      2
      SET optimizer_switch = 'not_null_range_scan=on';
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      b
      DROP TABLE t1, t2;
      

      The expected result is 2 rows.

      When the test case is executed exactly this way (nothing else is done), it produces the wrong result as above, and the execution plan for the 2nd SELECT is

      EXPLAIN EXTENDED SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	SIMPLE	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE noticed after reading const tables
      Warnings:
      Note	1003	select `test`.`t1`.`b` AS `b` from `test`.`t1` where 0 order by `test`.`t1`.`b`
      

      However, important!
      The outcome also depends on some other circumstances. For example, if we add FLUSH TABLES and re-execute the query, we can get a different (correct) result and a different plan:

      CREATE TABLE t1 (a INT, b INT) ENGINE=MyISAM;
      INSERT INTO t1 (b) VALUES (1),(2);
      CREATE TABLE t2 (c INT) ENGINE=MyISAM;
      SET optimizer_switch= 'not_null_range_scan=off';
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      b
      1
      2
      SET optimizer_switch = 'not_null_range_scan=on';
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      b
      FLUSH TABLES;
      EXPLAIN EXTENDED SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	SIMPLE	t2	system	NULL	NULL	NULL	NULL	0	0.00	Const row not found
      1	SIMPLE	t1	ALL	NULL	NULL	NULL	NULL	2	100.00	Using where; Using filesort
      Warnings:
      Note	1003	select `test`.`t1`.`b` AS `b` from `test`.`t1` where `test`.`t1`.`a` is null order by `test`.`t1`.`b`
      SELECT b FROM t1 LEFT JOIN t2 ON c = a WHERE a IS NULL ORDER BY b;
      b
      1
      2
      

      Please be aware of this, as it may affect reproducibility and cause confusion (especially if you are running the test case in the client, not in MTR which does everything from scratch every time). Also, since I don't know the reasons of the non-determinism, it may show up in other forms, e.g. depending on the environment, settings or whatever.

      Attachments

        Activity

          Another test case which seems to be less volatile (or maybe I'm being more lucky with it) and somewhat different in the plan, but otherwise similar in characteristics

          DROP TABLE IF EXISTS t1, t2, t3;
           
          CREATE TABLE t1 (a int, b varchar(10)) ENGINE=MyISAM;
          INSERT INTO t1 VALUES (69,'foo'),(71,'bar');
           
          CREATE TABLE t2 (c int) ENGINE=MyISAM;
          INSERT INTO t2 VALUES (1),(2);
           
          CREATE TABLE t3 (d int, e int, KEY(e)) ENGINE=MyISAM;
           
          SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1;
          SET optimizer_switch = 'not_null_range_scan=on';
          SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1;
           
          # Cleanup
          DROP TABLE t1, t2, t3;
          

          10.5 b8f4b984

          SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1;
          a	b	c	d	e
          69	foo	1	NULL	NULL
          71	bar	1	NULL	NULL
          69	foo	2	NULL	NULL
          71	bar	2	NULL	NULL
          SET optimizer_switch = 'not_null_range_scan=on';
          SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1;
          a	b	c	d	e
          69	foo	NULL	NULL	NULL
          71	bar	NULL	NULL	NULL
          

          EXPLAIN EXTENDED
          SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1;
          id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
          1	SIMPLE	t3	const	NULL	NULL	NULL	NULL	1	100.00	Impossible ON condition
          1	SIMPLE	t2	const	NULL	NULL	NULL	NULL	1	100.00	Impossible ON condition
          1	SIMPLE	t1	ALL	NULL	NULL	NULL	NULL	2	100.00	
          Warnings:
          Note	1003	select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,NULL AS `c`,NULL AS `d`,NULL AS `e` from `test`.`t1` left join (`test`.`t2` left join `test`.`t3` on(multiple equal(NULL, NULL))) on(1) where 1
          

          elenst Elena Stepanova added a comment - Another test case which seems to be less volatile (or maybe I'm being more lucky with it) and somewhat different in the plan, but otherwise similar in characteristics DROP TABLE IF EXISTS t1, t2, t3;   CREATE TABLE t1 (a int , b varchar (10)) ENGINE=MyISAM; INSERT INTO t1 VALUES (69, 'foo' ),(71, 'bar' );   CREATE TABLE t2 (c int ) ENGINE=MyISAM; INSERT INTO t2 VALUES (1),(2);   CREATE TABLE t3 (d int , e int , KEY (e)) ENGINE=MyISAM;   SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1; SET optimizer_switch = 'not_null_range_scan=on' ; SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1;   # Cleanup DROP TABLE t1, t2, t3; 10.5 b8f4b984 SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1; a b c d e 69 foo 1 NULL NULL 71 bar 1 NULL NULL 69 foo 2 NULL NULL 71 bar 2 NULL NULL SET optimizer_switch = 'not_null_range_scan=on' ; SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1; a b c d e 69 foo NULL NULL NULL 71 bar NULL NULL NULL EXPLAIN EXTENDED SELECT * FROM t1 LEFT JOIN t2 LEFT JOIN t3 ON t3.e = t3.d ON 1; id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t3 const NULL NULL NULL NULL 1 100.00 Impossible ON condition 1 SIMPLE t2 const NULL NULL NULL NULL 1 100.00 Impossible ON condition 1 SIMPLE t1 ALL NULL NULL NULL NULL 2 100.00 Warnings: Note 1003 select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`, NULL AS `c`, NULL AS `d`, NULL AS `e` from `test`.`t1` left join (`test`.`t2` left join `test`.`t3` on (multiple equal( NULL , NULL ))) on (1) where 1
          hholzgra Hartmut Holzgraefe added a comment - - edited

          Another apparently related case, here a query returns the correct result first, but after another query has been run it returns empty result until FLUSH TABLES, or a specific other query, resets the situation. Just running EXPLAIN is actually already enough to reproduce this, actually executing the queries is not necessary:

          CREATE TABLE t1 (
           id   int,
           name varchar(100),
           PRIMARY KEY (id)
          ) ENGINE=InnoDB;
           
          CREATE TABLE t2 (
           id    int,
           t1_id int,
           PRIMARY KEY (id)
          ) ENGINE=InnoDB;
           
          CREATE TABLE t3 (
           id     int,
           status int,
           PRIMARY KEY (id)
          ) ENGINE=InnoDB;
           
          CREATE TABLE t4 (
           id    int,
           t2_id int,
           t3_id int,
           PRIMARY KEY (id)
          ) ENGINE=InnoDB;
           
          insert into t1 values (1, "test");
          insert into t2 values (1, 1);
          insert into t4 values (1, 1, NULL);
           
          set optimizer_switch='not_null_range_scan=on';
           
          EXPLAIN select t1.id
            from t1
            JOIN t2 ON t1.id = t2.t1_id
            JOIN t4 ON t2.id = t4.t2_id
           WHERE t2.t1_id=1
             and t4.t3_id is null;
             
          explain select t1.id
            from t1
            join t2 on t1.id = t2.t1_id
            join t4 on t2.id = t4.t2_id
            join t3 on t3.id = t4.t3_id
           where t2.id=1
             and t3.status=0;
           
          EXPLAIN select t1.id
            from t1
            JOIN t2 ON t1.id = t2.t1_id
            JOIN t4 ON t2.id = t4.t2_id
           WHERE t2.t1_id=1
             and t4.t3_id is null;
           
          FLUSH TABLES;
           
          EXPLAIN select t1.id
            from t1
            JOIN t2 ON t1.id = t2.t1_id
            JOIN t4 ON t2.id = t4.t2_id
           WHERE t2.t1_id=1
             and t4.t3_id is null;
          

          Before running the EXPLAIN on the 2nd query the plan for the first one is:

          +------+-------------+-------+--------+---------------+---------+---------+---------------+------+-------------+
          | id   | select_type | table | type   | possible_keys | key     | key_len | ref           | rows | Extra       |
          +------+-------------+-------+--------+---------------+---------+---------+---------------+------+-------------+
          |    1 | SIMPLE      | t2    | const  | PRIMARY       | PRIMARY | 4       | const         | 1    |             |
          |    1 | SIMPLE      | t1    | const  | PRIMARY       | PRIMARY | 4       | const         | 1    | Using index |
          |    1 | SIMPLE      | t4    | ALL    | NULL          | NULL    | NULL    | NULL          | 1    | Using where |
          |    1 | SIMPLE      | t3    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t4.t3_id | 1    | Using where |
          +------+-------------+-------+--------+---------------+---------+---------+---------------+------+-------------+
          

          after the EXPLAIN on the 2nd query re-running EXPLAIN for the first one gives:

          +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
          +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
          |    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
          +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
          

          After running FLUSH TABLES the original query plan is restored. The actual result set of the query contains exactly one row, so "Impossible WHERE condition" is obviously not correct. A specific query resetting to the correct plan here is:

          select *
            from t4
           where t2_id <> 0;
          

          hholzgra Hartmut Holzgraefe added a comment - - edited Another apparently related case, here a query returns the correct result first, but after another query has been run it returns empty result until FLUSH TABLES, or a specific other query, resets the situation. Just running EXPLAIN is actually already enough to reproduce this, actually executing the queries is not necessary: CREATE TABLE t1 ( id int, name varchar(100), PRIMARY KEY (id) ) ENGINE=InnoDB;   CREATE TABLE t2 ( id int, t1_id int, PRIMARY KEY (id) ) ENGINE=InnoDB;   CREATE TABLE t3 ( id int, status int, PRIMARY KEY (id) ) ENGINE=InnoDB;   CREATE TABLE t4 ( id int, t2_id int, t3_id int, PRIMARY KEY (id) ) ENGINE=InnoDB;   insert into t1 values (1, "test"); insert into t2 values (1, 1); insert into t4 values (1, 1, NULL);   set optimizer_switch='not_null_range_scan=on';   EXPLAIN select t1.id from t1 JOIN t2 ON t1.id = t2.t1_id JOIN t4 ON t2.id = t4.t2_id WHERE t2.t1_id=1 and t4.t3_id is null; explain select t1.id from t1 join t2 on t1.id = t2.t1_id join t4 on t2.id = t4.t2_id join t3 on t3.id = t4.t3_id where t2.id=1 and t3.status=0;   EXPLAIN select t1.id from t1 JOIN t2 ON t1.id = t2.t1_id JOIN t4 ON t2.id = t4.t2_id WHERE t2.t1_id=1 and t4.t3_id is null;   FLUSH TABLES;   EXPLAIN select t1.id from t1 JOIN t2 ON t1.id = t2.t1_id JOIN t4 ON t2.id = t4.t2_id WHERE t2.t1_id=1 and t4.t3_id is null; Before running the EXPLAIN on the 2nd query the plan for the first one is: +------+-------------+-------+--------+---------------+---------+---------+---------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+--------+---------------+---------+---------+---------------+------+-------------+ | 1 | SIMPLE | t2 | const | PRIMARY | PRIMARY | 4 | const | 1 | | | 1 | SIMPLE | t1 | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index | | 1 | SIMPLE | t4 | ALL | NULL | NULL | NULL | NULL | 1 | Using where | | 1 | SIMPLE | t3 | eq_ref | PRIMARY | PRIMARY | 4 | test.t4.t3_id | 1 | Using where | +------+-------------+-------+--------+---------------+---------+---------+---------------+------+-------------+ after the EXPLAIN on the 2nd query re-running EXPLAIN for the first one gives: +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables | +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+ After running FLUSH TABLES the original query plan is restored. The actual result set of the query contains exactly one row, so "Impossible WHERE condition" is obviously not correct. A specific query resetting to the correct plan here is: select * from t4 where t2_id <> 0;

          There was a bug in JOIN::make_notnull_conds_for_range_scans() when clearing TABLE->tmp_set, which was used to mark fields that could not be null.
          This function was only used if 'not_null_range_scan=on' is set.

          The effect was that tmp_set had a 'random value' and this caused the optimizer to think that some fields that could be null could not.
          FLUSH TABLES caused things to temporary work as tmp_set was then cleared.

          Fixed by clearing the tmp_set properly.

          monty Michael Widenius added a comment - There was a bug in JOIN::make_notnull_conds_for_range_scans() when clearing TABLE->tmp_set, which was used to mark fields that could not be null. This function was only used if 'not_null_range_scan=on' is set. The effect was that tmp_set had a 'random value' and this caused the optimizer to think that some fields that could be null could not. FLUSH TABLES caused things to temporary work as tmp_set was then cleared. Fixed by clearing the tmp_set properly.

          People

            monty Michael Widenius
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.