[MDEV-18898] SELECT using wrong index when using operator IN with mixed types Created: 2019-03-12  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1.37, 10.1.38, 10.2, 10.3, 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Přemysl Karbula Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: datatype, innodb, optimizer, performance, types
Environment:

Experienced on CentOS 7 (centos-release-7-5.1804.5.el7.centos.x86_64) and able to reproduce also on WSL Ubuntu 18.04 under Win10 Insiders build 18351


Issue Links:
Blocks
is blocked by MDEV-15759 Expect "Impossible WHERE" for indexed... Closed
is blocked by MDEV-18964 Non-optimal execution of WHERE num_fi... Open
is blocked by MDEV-18968 Both (WHERE 0.1) and (WHERE NOT 0.1) ... Closed
is blocked by MDEV-19008 Slow EXPLAIN SELECT ... WHERE col IN ... Closed
Relates
relates to MDEV-8787 Return Impossible WHERE instead of Fu... Open
relates to MDEV-18319 BIGINT UNSIGNED Performance issue Stalled

 Description   

Hello good people,

I accidentaly stumbled upon a weird behavior of SELECT query optimizer, which doesn't seem to handle values of mixed types very well when using operator IN in WHERE condition. This ultimately results in suboptimal usage of indexes, even though sufficient information is IMHO available for handling the query better (i.e. using correct index).

I was able to come up with MVCE of this sort:

CREATE TEMPORARY TABLE __t1 (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  name varchar(100) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO __t1 SELECT seq, MD5(seq) FROM seq_1_to_500000;
 
ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, 2);
ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', 2); -- Uses wrong index
ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, '2'); -- Uses wrong index
ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', '2');
ANALYZE SELECT id, name FROM __t1 WHERE id = 1;
ANALYZE SELECT id, name FROM __t1 WHERE id = '2';
ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = 2;
ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = '2';
ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = '2';
ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = 2;

Resulting in:

> ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, 2);
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', 2); -- Wrong index
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | r_rows     | filtered | r_filtered | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|    1 | SIMPLE      | __t1  | index | PRIMARY       | name | 302     | NULL | 996723 | 1000000.00 |   100.00 |       0.00 | Using where; Using index |
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
1 row in set (0.21 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, '2'); -- Wrong index
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | r_rows     | filtered | r_filtered | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|    1 | SIMPLE      | __t1  | index | PRIMARY       | name | 302     | NULL | 996723 | 1000000.00 |   100.00 |       0.00 | Using where; Using index |
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
1 row in set (0.22 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', '2');
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id = 1;
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|    1 | SIMPLE      | __t1  | const | PRIMARY       | PRIMARY | 4       | const |    1 |   NULL |   100.00 |       NULL |       |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id = '2';
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|    1 | SIMPLE      | __t1  | const | PRIMARY       | PRIMARY | 4       | const |    1 |   NULL |   100.00 |       NULL |       |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = 2;
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = '2';
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = '2';
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
1 row in set (0.00 sec)
 
> ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = 2;
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
1 row in set (0.00 sec)

As you can see the second and third SELECT queries employ wrong index and thus both result in (unnecessary?) full scan, even though it is clear that the condition is based solely on the id column, which acts as a primary key.

SELECT queries with mixed types (integer and string) inside IN condition are the ones affected by this suboptimal index usage.

SELECTs with values of a single common type use correct indexes - even if they're not matching the id column's type (integer), e.g. when values inside IN are numeric strings. These queries are handled well.

SELECTs with values of mixed types used with the OR operator instead (the last four queries) also use correct the index. These queries are handled well, too.

The performance difference there is not trivial - and it becomes worse with bigger tables.

One might argue that one should just always use the correct types of values with these conditions, but there are environments in the wild that don't really care about differences between integers and numeric strings - one prime example being PHP. These can (and will) pass these values in unexpected and inconsistent fashion. In fact this is how I found out about this.

(I was unable test this with different versions of MariaDB than those mentioned above. Also I apologize in advance if I made any wrong assumptions about the MariaDB's internals.)


Generated at Thu Feb 08 08:47:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.