[MDEV-16450] Optimizer seems to make incorrect choice when AND clause on 2 varchar columns and those 2 columns are indexed (includes ORDER BY different_column and LIMIT 1) Created: 2018-06-08  Updated: 2021-04-22

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: None

Type: Task Priority: Major
Reporter: Chris Calender (Inactive) Assignee: Ralf Gebhardt
Resolution: Unresolved Votes: 1
Labels: order-by-optimization

Issue Links:
Relates
relates to MDEV-18079 Opportunistic optimization for ORDER ... Open

 Description   

We are seeing where a seemingly simple query is not using the correct index, and instead preferring an full table scan, which is resulting in a slower query.

Here is a simplified version of the CREATE TABLE and SELECT:

CREATE TABLE `t1` (
`ts` datetime NOT NULL DEFAULT current_timestamp(),
`val1` varchar(5) NOT NULL DEFAULT '0',
`val2` varchar(3) NOT NULL DEFAULT '0',
KEY `idx_val1_val2` (`val1`,`val2`),
KEY `idx_ts` (`ts`)
) ENGINE=InnoDB;

SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;

If I run this through EXPLAIN with 0 data, I see:

mysql> EXPLAIN SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts LIMIT 1;
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t1 | index | idx_val1_val2 | idx_ts | 5 | NULL | 1 | Using where |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

If I add some rows, I still see the same:

INSERT INTO t1 VALUES (now(),'04643','955');
INSERT INTO t1 VALUES (now(),'04643','999');
INSERT INTO t1 VALUES (now(),'04643','955');
INSERT INTO t1 VALUES (now(),'04643','999');
INSERT INTO t1 VALUES (now(),'04643','955');
INSERT INTO t1 VALUES (now(),'04643','999');
INSERT INTO t1 VALUES (now(),'04643','955');
INSERT INTO t1 VALUES (now(),'04643','999');
INSERT INTO t1 VALUES (now(),'04643','955');
INSERT INTO t1 VALUES (now(),'04643','999');

mysql> EXPLAIN SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t1 | index | idx_val1_val2 | idx_ts | 5 | NULL | 2 | Using where |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

If I remove the "LIMIT 1", then it chooses the correct index.

Note that in the above EXPLAIN, only 1 index is listed for "possible_keys", yet the other is chosen for "key", so this tells me the full table scan was preferred over the index, and then the secondary index was used for the ORDER BY.

This incorrect choice leads to slower times as opposed to when it chooses the correct index (or FORCE INDEX is used).



 Comments   
Comment by Alice Sherepa [ 2018-06-11 ]

Reproduced on MariaDB 10.1-10.3
(I used ANALYZE, that was introduced in 10.1, I guess the same issue exists from 5.5, but fix version should be decided by developers).
testcase:

--source include/have_innodb.inc
 
CREATE TABLE `t1` (
`ts` datetime NOT NULL ,
`val1` varchar(5) NOT NULL,
`val2` varchar(3) NOT NULL,
KEY `k1` (`val1`,`val2`),
KEY `k2` (`ts`)
)ENGINE=InnoDB;
 
DELIMITER $$;
CREATE PROCEDURE prepare_data()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i < 200 DO
    INSERT INTO t1 VALUES (now(), '04643','956'), (now(), '04643','955');
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;$$
 
CALL prepare_data();
 
analyze
SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
 
analyze
SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3;
 
DROP table t1;
DROP procedure prepare_data;

the results differ when InnoDB or MyISAM engine is used, or int instead of varchar datatype.
in the second ANALYZE query, index k2 is not listed as a possible key

analyze
SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
+------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref         | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | index | k1            | k2   | 5       | const,const |  201 |   1.00 |    49.75 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+
1 row in set (0.000 sec)
 
--------------
analyze
SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3;
+------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | r_rows | filtered | r_filtered | Extra                                              |
+------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+
|    1 | SIMPLE      | t1    | ref  | k1            | k1   | 12      | const,const |  100 |   3.00 |   100.00 |     100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+
1 row in set (0.002 sec)

Comment by Varun Gupta (Inactive) [ 2018-07-18 ]

If we look closely at both the analyze statements for the different queries , we don't see key k2 as
a possible key in either of the plans. This is because possible_keys shows indexes or keys that can be used to evaluate the WHERE clause.

For the first query (with limit = 1) there are 2 plans
Plan 1

  • start reading though index ts
  • stop as soon as we have found a row that matches `WHERE val1='04643' AND val2='955'`

Plan 2

  • read all rows that match
  • sort them and pick the 1st one

There is a cost based approach to decide between these 2 plans, so for the first query we chose plan 1 and for the second query we chose plan 2.

Comment by Varun Gupta (Inactive) [ 2018-07-18 ]

Another idea that psergey came up with

we could do both:

  • start sscanning on ts, scan the expected number of rows.
  • If we havent found N matching rows, switch to the other plan.

This way the worst case performance would be better but this is more of a new optimizer feature than a bugfix.

Generated at Thu Feb 08 08:29:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.