[MDEV-15310] Range optimizer does not work well for "WHERE temporal_column NOT IN (const_list)" Created: 2018-02-14  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Data types, Optimizer, Temporal Types
Affects Version/s: 10.0, 10.1, 10.2, 10.3
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None

Epic Link: Data type cleanups

 Description   

I create a table with a TIME column and populate it with some data, and run an EXPLAIN for a SELECT with a ``WHERE a NOT IN()'' condition:

CREATE OR REPLACE TABLE t1 (a TIME, filler CHAR(200), KEY(a));
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:01', 'no');
INSERT INTO t1 VALUES ('23:00:02', 'no');
INSERT INTO t1 VALUES ('23:00:03', 'yes');
INSERT INTO t1 VALUES ('23:00:04', 'yes');
EXPLAIN SELECT * FROM t1 WHERE a NOT IN ('23:00:01','23:00:02');
DROP TABLE t1;

+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | a             | a    | 4       | NULL |    4 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

Notice, the condition covers only two records. The optimizer correctly choses to use the range search.

Now I do the same for negative TIME values:

CREATE OR REPLACE TABLE t1 (a TIME, filler CHAR(200), KEY(a));
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:01', 'no');
INSERT INTO t1 VALUES ('-23:00:02', 'no');
INSERT INTO t1 VALUES ('-23:00:03', 'yes');
INSERT INTO t1 VALUES ('-23:00:04', 'yes');
EXPLAIN SELECT * FROM t1 WHERE a NOT IN ('-23:00:01','-23:00:02');
DROP TABLE t1;

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL |   27 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

As in the first example, the condition matches only two records again, but the range search does not work any more and it goes through the full table scan.

Now I do the same for positive time values, but outside of the time-of-the-day range, greater than 23:59:59.

CREATE OR REPLACE TABLE t1 (a TIME, filler CHAR(200), KEY(a));
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:01', 'no');
INSERT INTO t1 VALUES ('24:00:02', 'no');
INSERT INTO t1 VALUES ('24:00:03', 'yes');
INSERT INTO t1 VALUES ('24:00:04', 'yes');
EXPLAIN SELECT * FROM t1 WHERE a NOT IN ('24:00:01','24:00:02');
DROP TABLE t1;

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL |   27 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

It erroneously uses the full table scan again.

Now I do the same for a TIME(1) column with positive fractional data, within the time-of-the-day range:

CREATE OR REPLACE TABLE t1 (a TIME(1), filler CHAR(200), KEY(a));
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:01.1', 'no');
INSERT INTO t1 VALUES ('23:00:02.1', 'no');
INSERT INTO t1 VALUES ('23:00:03.1', 'yes');
INSERT INTO t1 VALUES ('23:00:04.1', 'yes');
EXPLAIN SELECT * FROM t1 WHERE a NOT IN ('23:00:01.1','23:00:02.1');
DROP TABLE t1;

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL |   27 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

Again, although the condition matches only two records, it erroneously goes through the full table scan.

Now I do the same for a DATETIME column:

CREATE OR REPLACE TABLE t1 (a DATETIME, filler CHAR(200), KEY(a));
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:02', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:03', 'yes');
INSERT INTO t1 VALUES ('2001-01-01 23:00:04', 'yes');
EXPLAIN SELECT * FROM t1 WHERE a NOT IN ('2001-01-01 23:00:01','2001-01-01 23:00:02');
DROP TABLE t1;

+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | a             | a    | 6       | NULL |    4 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

Looks good. It uses the range search.

Now I do the same for a DATETIME(1) column, with fractional data:

CREATE OR REPLACE TABLE t1 (a DATETIME(1), filler CHAR(200), KEY(a));
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:01.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:02.1', 'no');
INSERT INTO t1 VALUES ('2001-01-01 23:00:03.1', 'yes');
INSERT INTO t1 VALUES ('2001-01-01 23:00:04.1', 'yes');
EXPLAIN SELECT * FROM t1 WHERE a NOT IN ('2001-01-01 23:00:01.1','2001-01-01 23:00:02.1');
DROP TABLE t1;

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL |   27 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

It erroneously goes through the full table scan.


Generated at Thu Feb 08 08:20:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.