Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13933

Wrong results in COUNT() query with EXISTS and exists_to_in

Details

    • 10.2.10, 5.5.58, 10.1.29, 10.1.30, 10.2.12, 5.5.59

    Description

      A COUNT() query containing EXISTS and subqueries returns wrong results.
      Sometimes disabling the optimizer switch exists_to_in works as expected, I can't always reproduce this behaviour thou.
      Dropping an index (see sample.txt) also seems to fix, but there it's still 1 row off: 468 vs 469)

      Attachments

        Activity

          MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a);
          +------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
          | id   | select_type  | table       | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
          +------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
          |    1 | PRIMARY      | t3          | index | c             | c    | 5       | NULL |    2 |   100.00 | Using index                                     |
          |    1 | PRIMARY      | <subquery2> | ALL   | distinct_key  | NULL | NULL    | NULL |    2 |   100.00 | Using where                                     |
          |    1 | PRIMARY      | t1          | ALL   | NULL          | NULL | NULL    | NULL |    5 |   100.00 | Using where; Using join buffer (flat, BNL join) |
          |    2 | MATERIALIZED | t2          | ALL   | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                                                 |
          +------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+
          

          The query plan is incorrect.

          • EXPLAIN EXTENDED + SHOW WARNINGS show that the subqueries were converted into semi-joins (but not inner joins).
          • The query plan has no ways to remove duplicate matches produced by table t3.
          psergei Sergei Petrunia added a comment - MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE b = t1.a) AND EXISTS (SELECT 1 FROM t3 WHERE c = t1.a); +------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+ | 1 | PRIMARY | t3 | index | c | c | 5 | NULL | 2 | 100.00 | Using index | | 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | 100.00 | Using where | | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where; Using join buffer (flat, BNL join) | | 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | | +------+--------------+-------------+-------+---------------+------+---------+------+------+----------+-------------------------------------------------+ The query plan is incorrect. EXPLAIN EXTENDED + SHOW WARNINGS show that the subqueries were converted into semi-joins (but not inner joins). The query plan has no ways to remove duplicate matches produced by table t3 .
          psergei Sergei Petrunia added a comment - - edited

          Debugging. The optimizer constructs this join prefix:

          idx=0, table=t3  
          idx=1, table=t2
          idx=2, table=t1 
          

          without any parts to remove duplicates (dups_producing_tables=6, 6=2+4={t2,t3}).
          Then, in advance_sj_state():

          • LooseScan strategy is used to remove duplicates produced by table t3.
          • SJ-Materialization-scan is used to remove duplicates produced by table t2.

          However, one can store only one value at a time in join->positions[2].sj_strategy.

          Thus, information about the need to use LooseScan for t3 is lost.
          fix_semijoin_strategies_for_picked_join_order() only sets up execution for SJ_MATERIALIZE_SCAN.

          psergei Sergei Petrunia added a comment - - edited Debugging. The optimizer constructs this join prefix: idx=0, table=t3 idx=1, table=t2 idx=2, table=t1 without any parts to remove duplicates (dups_producing_tables=6, 6=2+4={t2,t3}). Then, in advance_sj_state(): LooseScan strategy is used to remove duplicates produced by table t3. SJ-Materialization-scan is used to remove duplicates produced by table t2. However, one can store only one value at a time in join->positions[2].sj_strategy . Thus, information about the need to use LooseScan for t3 is lost. fix_semijoin_strategies_for_picked_join_order() only sets up execution for SJ_MATERIALIZE_SCAN.

          Possible ways to fix this:

          • Allow multiple semi-join strategies to be attached to one POSITION object.
          • * I am not sure about all consequences of an attempt to do this
          • Resolve such "collisions" between strategies by falling back to DuplicateElimination. That strategy is special as it can handle multiple semi-joins at once and is always applicable (so all fanout will be removed). It is not always the most performant, though.
          psergei Sergei Petrunia added a comment - Possible ways to fix this: Allow multiple semi-join strategies to be attached to one POSITION object. * I am not sure about all consequences of an attempt to do this Resolve such "collisions" between strategies by falling back to DuplicateElimination. That strategy is special as it can handle multiple semi-joins at once and is always applicable (so all fanout will be removed). It is not always the most performant, though.

          5.5 test suite

          SET @optimiser_switch_save= @@optimizer_switch;
           
          CREATE TABLE t1 (a INT NOT NULL);
          INSERT INTO t1 VALUES (1),(1),(1),(5),(5);
           
          CREATE TABLE t2 (b INT);
          INSERT INTO t2 VALUES (5),(1);
           
          CREATE TABLE t3 (c INT, KEY(c));
          INSERT INTO t3 VALUES (5),(5);
           
          SET optimizer_switch='semijoin=on';
          select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
          and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
           
          SET optimizer_switch='semijoin=off';
          select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`)
          and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);
           
          SET @@optimiser_switch= @optimizer_switch_save;
          DROP TABLE t1, t2, t3;
          

          sanja Oleksandr Byelkin added a comment - 5.5 test suite SET @optimiser_switch_save= @@optimizer_switch;   CREATE TABLE t1 (a INT NOT NULL); INSERT INTO t1 VALUES (1),(1),(1),(5),(5);   CREATE TABLE t2 (b INT); INSERT INTO t2 VALUES (5),(1);   CREATE TABLE t3 (c INT, KEY(c)); INSERT INTO t3 VALUES (5),(5);   SET optimizer_switch='semijoin=on'; select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`) and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);   SET optimizer_switch='semijoin=off'; select t1.a from t1 where t1.a in (select `test`.`t2`.`b` from `test`.`t2`) and t1.a in (select `test`.`t3`.`c` from `test`.`t3`);   SET @@optimiser_switch= @optimizer_switch_save; DROP TABLE t1, t2, t3;

          I was unable to find any issues in the patch.

          Let's still ask elenst (or should we ask alice now?) to do a test pass with a semi-join RQG grammar, with various values of the firstmatch,loosescan,materialization flags in @@optimizer_switch.

          psergei Sergei Petrunia added a comment - I was unable to find any issues in the patch. Let's still ask elenst (or should we ask alice now?) to do a test pass with a semi-join RQG grammar, with various values of the firstmatch,loosescan,materialization flags in @@optimizer_switch.

          People

            sanja Oleksandr Byelkin
            claudio.nanni Claudio Nanni
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.