[MDEV-16450] Optimizer seems to make incorrect choice when AND clause on 2 varchar columns and those 2 columns are indexed (includes ORDER BY different_column and LIMIT 1) - Jira

Details

Type: Task
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer
Labels:
- order-by-optimization

Description

We are seeing where a seemingly simple query is not using the correct index, and instead preferring an full table scan, which is resulting in a slower query.

Here is a simplified version of the CREATE TABLE and SELECT:

CREATE TABLE `t1` (

`ts` datetime NOT NULL DEFAULT current_timestamp(),

`val1` varchar(5) NOT NULL DEFAULT '0',

`val2` varchar(3) NOT NULL DEFAULT '0',

KEY `idx_val1_val2` (`val1`,`val2`),

KEY `idx_ts` (`ts`)

) ENGINE=InnoDB;

SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;

If I run this through EXPLAIN with 0 data, I see:

mysql> EXPLAIN SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts LIMIT 1;

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

| 1 | SIMPLE | t1 | index | idx_val1_val2 | idx_ts | 5 | NULL | 1 | Using where |

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

If I add some rows, I still see the same:

INSERT INTO t1 VALUES (now(),'04643','955');

INSERT INTO t1 VALUES (now(),'04643','999');

INSERT INTO t1 VALUES (now(),'04643','955');

INSERT INTO t1 VALUES (now(),'04643','999');

INSERT INTO t1 VALUES (now(),'04643','955');

INSERT INTO t1 VALUES (now(),'04643','999');

INSERT INTO t1 VALUES (now(),'04643','955');

INSERT INTO t1 VALUES (now(),'04643','999');

INSERT INTO t1 VALUES (now(),'04643','955');

INSERT INTO t1 VALUES (now(),'04643','999');

mysql> EXPLAIN SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

| 1 | SIMPLE | t1 | index | idx_val1_val2 | idx_ts | 5 | NULL | 2 | Using where |

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

If I remove the "LIMIT 1", then it chooses the correct index.

Note that in the above EXPLAIN, only 1 index is listed for "possible_keys", yet the other is chosen for "key", so this tells me the full table scan was preferred over the index, and then the secondary index was used for the ORDER BY.

This incorrect choice leads to slower times as opposed to when it chooses the correct index (or FORCE INDEX is used).

Attachments

Issue Links

relates to

MDEV-18079 Opportunistic optimization for ORDER BY LIMIT N queries

Open

Activity

Ascending order - Click to sort in descending order

Alice Sherepa added a comment - 2018-06-11 11:57 - edited

Reproduced on MariaDB 10.1-10.3
(I used ANALYZE, that was introduced in 10.1, I guess the same issue exists from 5.5, but fix version should be decided by developers).
testcase:

--source include/have_innodb.inc

CREATE TABLE `t1` (

`ts` datetime NOT NULL ,

`val1` varchar(5) NOT NULL,

`val2` varchar(3) NOT NULL,

KEY `k1` (`val1`,`val2`),

KEY `k2` (`ts`)

)ENGINE=InnoDB;

DELIMITER $$;

CREATE PROCEDURE prepare_data()

BEGIN

  DECLARE i INT DEFAULT 1;

  WHILE i < 200 DO

    INSERT INTO t1 VALUES (now(), '04643','956'), (now(), '04643','955');

    SET i = i + 1;

  END WHILE;

END$$

DELIMITER ;$$

CALL prepare_data();

analyze

SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;

analyze

SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3;

DROP table t1;

DROP procedure prepare_data;

the results differ when InnoDB or MyISAM engine is used, or int instead of varchar datatype.
in the second ANALYZE query, index k2 is not listed as a possible key

analyze

SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;

+------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+

| id   | select_type | table | type  | possible_keys | key  | key_len | ref         | rows | r_rows | filtered | r_filtered | Extra       |

+------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+

|    1 | SIMPLE      | t1    | index | k1            | k2   | 5       | const,const |  201 |   1.00 |    49.75 |     100.00 | Using where |

+------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+

1 row in set (0.000 sec)

--------------

analyze

SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3;

+------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | r_rows | filtered | r_filtered | Extra                                              |

+------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+

|    1 | SIMPLE      | t1    | ref  | k1            | k1   | 12      | const,const |  100 |   3.00 |   100.00 |     100.00 | Using index condition; Using where; Using filesort |

+------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+

1 row in set (0.002 sec)

Alice Sherepa added a comment - 2018-06-11 11:57 - edited Reproduced on MariaDB 10.1-10.3 (I used ANALYZE, that was introduced in 10.1, I guess the same issue exists from 5.5, but fix version should be decided by developers). testcase: --source include/have_innodb.inc CREATE TABLE `t1` ( `ts` datetime NOT NULL , `val1` varchar (5) NOT NULL , `val2` varchar (3) NOT NULL , KEY `k1` (`val1`,`val2`), KEY `k2` (`ts`) )ENGINE=InnoDB; DELIMITER $$; CREATE PROCEDURE prepare_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 200 DO INSERT INTO t1 VALUES (now(), '04643' , '956' ), (now(), '04643' , '955' ); SET i = i + 1; END WHILE; END $$ DELIMITER ;$$ CALL prepare_data(); analyze SELECT * FROM t1 WHERE val1= '04643' AND val2= '955' ORDER BY ts DESC LIMIT 1; analyze SELECT * FROM t1 WHERE val1= '04643' AND val2= '955' ORDER BY ts DESC LIMIT 3; DROP table t1; DROP procedure prepare_data; the results differ when InnoDB or MyISAM engine is used, or int instead of varchar datatype. in the second ANALYZE query, index k2 is not listed as a possible key analyze SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1; +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+ | 1 | SIMPLE | t1 | index | k1 | k2 | 5 | const,const | 201 | 1.00 | 49.75 | 100.00 | Using where | +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+ 1 row in set (0.000 sec) -------------- analyze SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3; +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+ | 1 | SIMPLE | t1 | ref | k1 | k1 | 12 | const,const | 100 | 3.00 | 100.00 | 100.00 | Using index condition; Using where; Using filesort | +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+ 1 row in set (0.002 sec)

Varun Gupta (Inactive) added a comment - 2018-07-18 05:44

If we look closely at both the analyze statements for the different queries , we don't see key k2 as
a possible key in either of the plans. This is because possible_keys shows indexes or keys that can be used to evaluate the WHERE clause.

For the first query (with limit = 1) there are 2 plans
Plan 1

start reading though index ts
stop as soon as we have found a row that matches `WHERE val1='04643' AND val2='955'`

Plan 2

read all rows that match
sort them and pick the 1st one

There is a cost based approach to decide between these 2 plans, so for the first query we chose plan 1 and for the second query we chose plan 2.

Varun Gupta (Inactive) added a comment - 2018-07-18 05:44 If we look closely at both the analyze statements for the different queries , we don't see key k2 as a possible key in either of the plans. This is because possible_keys shows indexes or keys that can be used to evaluate the WHERE clause. For the first query (with limit = 1) there are 2 plans Plan 1 start reading though index ts stop as soon as we have found a row that matches `WHERE val1='04643' AND val2='955'` Plan 2 read all rows that match sort them and pick the 1st one There is a cost based approach to decide between these 2 plans, so for the first query we chose plan 1 and for the second query we chose plan 2.

Varun Gupta (Inactive) added a comment - 2018-07-18 05:50 - edited

Another idea that psergey came up with

we could do both:

start sscanning on ts, scan the expected number of rows.
If we havent found N matching rows, switch to the other plan.

This way the worst case performance would be better but this is more of a new optimizer feature than a bugfix.

Varun Gupta (Inactive) added a comment - 2018-07-18 05:50 - edited Another idea that psergey came up with we could do both: start sscanning on ts, scan the expected number of rows. If we havent found N matching rows, switch to the other plan. This way the worst case performance would be better but this is more of a new optimizer feature than a bugfix.

MariaDB Server

Optimizer seems to make incorrect choice when AND clause on 2 varchar columns and those 2 columns are indexed (includes ORDER BY different_column and LIMIT 1)

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration