Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27442

Wrong result upon query with DISTINCT and EXISTS subquery

Details

    Description

      CREATE TABLE t (a int, b int, KEY b (b,a)) ENGINE=MyISAM;
      INSERT INTO t VALUES (0,100),(2,100),(2,101),(3,102);
       
      SELECT DISTINCT b FROM t WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );
       
      # Cleanup
      DROP TABLE t;
      

      Actual result on 10.5 4c3ad244

      SELECT DISTINCT b FROM t WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );
      b
      101
      102
      

      This is naturally wrong, 100 should also be there.

      The failure appeared in 10.5 branch after this commit in 10.5.3:

      commit eb483c5181ab430877c135c16224284cfc517b3d
      Author: Monty
      Date:   Fri Feb 28 12:59:30 2020 +0200
       
          Updated optimizer costs in multi_range_read_info_const() and sql_select.cc
      

      Plan on the current branch (wrong result)

      {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t",
            "access_type": "range",
            "key": "b",
            "key_length": "5",
            "used_key_parts": ["b"],
            "rows": 5,
            "filtered": 100,
            "attached_condition": "<in_optimizer>(1,exists(subquery#2))",
            "using_index_for_group_by": true
          },
          "subqueries": [
            {
              "expression_cache": {
                "state": "uninitialized",
                "query_block": {
                  "select_id": 2,
                  "table": {
                    "message": "No tables used"
                  }
                }
              }
            }
          ]
        }
      }
      

      Plan before the change (correct result)

      {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t",
            "access_type": "index",
            "key": "b",
            "key_length": "10",
            "used_key_parts": ["b", "a"],
            "rows": 4,
            "filtered": 100,
            "attached_condition": "<in_optimizer>(1,exists(subquery#2))",
            "using_index": true
          },
          "subqueries": [
            {
              "expression_cache": {
                "state": "uninitialized",
                "query_block": {
                  "select_id": 2,
                  "table": {
                    "message": "No tables used"
                  }
                }
              }
            }
          ]
        }
      }
      

      Attachments

        Activity

          elenst Elena Stepanova created issue -

          This has nothing to do with my commit.
          One has the same problem with 10.4, with this slight modified test:

          CREATE TABLE t1 (a int, b int, KEY b (b,a)) ENGINE=MyISAM;
          INSERT INTO t1 VALUES (0,100), (2,100),(2,101),(3,102);
          INSERT INTO t1 SELECT -seq,-seq from seq_1_to_100;
          SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );

          monty Michael Widenius added a comment - This has nothing to do with my commit. One has the same problem with 10.4, with this slight modified test: CREATE TABLE t1 (a int, b int, KEY b (b,a)) ENGINE=MyISAM; INSERT INTO t1 VALUES (0,100), (2,100),(2,101),(3,102); INSERT INTO t1 SELECT -seq,-seq from seq_1_to_100; SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );
          monty Michael Widenius made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]

          The problem was that get_best_group_min_max() did not check if fields used
          by the "group_min_max optimization" where used in sub queries.
          Because of this, it did not detect that a key (b,a) was used in the WHERE
          clause for the statement:
          SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 ).

          Fixed by also traversing the sub queries when checking if a field is used.
          This disables group_min_max_optimization for the above query.

          Fix created by Sergei Petrunia

          monty Michael Widenius added a comment - The problem was that get_best_group_min_max() did not check if fields used by the "group_min_max optimization" where used in sub queries. Because of this, it did not detect that a key (b,a) was used in the WHERE clause for the statement: SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 ). Fixed by also traversing the sub queries when checking if a field is used. This disables group_min_max_optimization for the above query. Fix created by Sergei Petrunia
          monty Michael Widenius made changes -
          issue.field.resolutiondate 2022-02-08 14:13:45.0 2022-02-08 14:13:45.546
          monty Michael Widenius made changes -
          Fix Version/s 10.5.15 [ 26810 ]
          Fix Version/s 10.6.7 [ 26812 ]
          Fix Version/s 10.7.3 [ 26814 ]
          Fix Version/s 10.5 [ 23123 ]
          Fix Version/s 10.6 [ 24028 ]
          Fix Version/s 10.7 [ 24805 ]
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          marko Marko Mäkelä made changes -
          Fix Version/s 10.8.2 [ 26816 ]

          People

            monty Michael Widenius
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.