[MDEV-27442] Wrong result upon query with DISTINCT and EXISTS subquery Created: 2022-01-07  Updated: 2022-02-09  Resolved: 2022-02-08

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5, 10.6, 10.7
Fix Version/s: 10.5.15, 10.6.7, 10.7.3, 10.8.2

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None


 Description   

CREATE TABLE t (a int, b int, KEY b (b,a)) ENGINE=MyISAM;
INSERT INTO t VALUES (0,100),(2,100),(2,101),(3,102);
 
SELECT DISTINCT b FROM t WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );
 
# Cleanup
DROP TABLE t;

Actual result on 10.5 4c3ad244

SELECT DISTINCT b FROM t WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );
b
101
102

This is naturally wrong, 100 should also be there.

The failure appeared in 10.5 branch after this commit in 10.5.3:

commit eb483c5181ab430877c135c16224284cfc517b3d
Author: Monty
Date:   Fri Feb 28 12:59:30 2020 +0200
 
    Updated optimizer costs in multi_range_read_info_const() and sql_select.cc

Plan on the current branch (wrong result)

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t",
      "access_type": "range",
      "key": "b",
      "key_length": "5",
      "used_key_parts": ["b"],
      "rows": 5,
      "filtered": 100,
      "attached_condition": "<in_optimizer>(1,exists(subquery#2))",
      "using_index_for_group_by": true
    },
    "subqueries": [
      {
        "expression_cache": {
          "state": "uninitialized",
          "query_block": {
            "select_id": 2,
            "table": {
              "message": "No tables used"
            }
          }
        }
      }
    ]
  }
}

Plan before the change (correct result)

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t",
      "access_type": "index",
      "key": "b",
      "key_length": "10",
      "used_key_parts": ["b", "a"],
      "rows": 4,
      "filtered": 100,
      "attached_condition": "<in_optimizer>(1,exists(subquery#2))",
      "using_index": true
    },
    "subqueries": [
      {
        "expression_cache": {
          "state": "uninitialized",
          "query_block": {
            "select_id": 2,
            "table": {
              "message": "No tables used"
            }
          }
        }
      }
    ]
  }
}



 Comments   
Comment by Michael Widenius [ 2022-02-01 ]

This has nothing to do with my commit.
One has the same problem with 10.4, with this slight modified test:

CREATE TABLE t1 (a int, b int, KEY b (b,a)) ENGINE=MyISAM;
INSERT INTO t1 VALUES (0,100), (2,100),(2,101),(3,102);
INSERT INTO t1 SELECT -seq,-seq from seq_1_to_100;
SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 );

Comment by Michael Widenius [ 2022-02-08 ]

The problem was that get_best_group_min_max() did not check if fields used
by the "group_min_max optimization" where used in sub queries.
Because of this, it did not detect that a key (b,a) was used in the WHERE
clause for the statement:
SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 ).

Fixed by also traversing the sub queries when checking if a field is used.
This disables group_min_max_optimization for the above query.

Fix created by Sergei Petrunia

Generated at Thu Feb 08 09:52:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.