Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
10.1.37, 10.1.38, 5.5(EOL), 10.0(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
-
Experienced on CentOS 7 (centos-release-7-5.1804.5.el7.centos.x86_64) and able to reproduce also on WSL Ubuntu 18.04 under Win10 Insiders build 18351
Description
Hello good people,
I accidentaly stumbled upon a weird behavior of SELECT query optimizer, which doesn't seem to handle values of mixed types very well when using operator IN in WHERE condition. This ultimately results in suboptimal usage of indexes, even though sufficient information is IMHO available for handling the query better (i.e. using correct index).
I was able to come up with MVCE of this sort:
CREATE TEMPORARY TABLE __t1 ( |
id int(10) unsigned NOT NULL AUTO_INCREMENT, |
name varchar(100) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, |
PRIMARY KEY (`id`), |
UNIQUE KEY `name` (`name`) |
) ENGINE=InnoDB DEFAULT CHARSET=utf8; |
INSERT INTO __t1 SELECT seq, MD5(seq) FROM seq_1_to_500000; |
|
ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, 2); |
ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', 2); -- Uses wrong index |
ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, '2'); -- Uses wrong index |
ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', '2'); |
ANALYZE SELECT id, name FROM __t1 WHERE id = 1; |
ANALYZE SELECT id, name FROM __t1 WHERE id = '2'; |
ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = 2; |
ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = '2'; |
ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = '2'; |
ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = 2; |
Resulting in:
> ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, 2);
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| 1 | SIMPLE | __t1 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 2.00 | 100.00 | 100.00 | Using where |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', 2); -- Wrong index
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|
| 1 | SIMPLE | __t1 | index | PRIMARY | name | 302 | NULL | 996723 | 1000000.00 | 100.00 | 0.00 | Using where; Using index |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|
1 row in set (0.21 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, '2'); -- Wrong index
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|
| 1 | SIMPLE | __t1 | index | PRIMARY | name | 302 | NULL | 996723 | 1000000.00 | 100.00 | 0.00 | Using where; Using index |
|
+------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
|
1 row in set (0.22 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', '2');
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| 1 | SIMPLE | __t1 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 2.00 | 100.00 | 100.00 | Using where |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id = 1;
|
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|
| 1 | SIMPLE | __t1 | const | PRIMARY | PRIMARY | 4 | const | 1 | NULL | 100.00 | NULL | |
|
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id = '2';
|
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|
| 1 | SIMPLE | __t1 | const | PRIMARY | PRIMARY | 4 | const | 1 | NULL | 100.00 | NULL | |
|
+------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = 2;
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| 1 | SIMPLE | __t1 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 2.00 | 100.00 | 100.00 | Using where |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = '2';
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| 1 | SIMPLE | __t1 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 2.00 | 100.00 | 100.00 | Using where |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = '2';
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| 1 | SIMPLE | __t1 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 2.00 | 100.00 | 100.00 | Using where |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
1 row in set (0.00 sec)
|
|
> ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = 2;
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
| 1 | SIMPLE | __t1 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 2.00 | 100.00 | 100.00 | Using where |
|
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
|
1 row in set (0.00 sec)
|
As you can see the second and third SELECT queries employ wrong index and thus both result in (unnecessary?) full scan, even though it is clear that the condition is based solely on the id column, which acts as a primary key.
SELECT queries with mixed types (integer and string) inside IN condition are the ones affected by this suboptimal index usage.
SELECTs with values of a single common type use correct indexes - even if they're not matching the id column's type (integer), e.g. when values inside IN are numeric strings. These queries are handled well.
SELECTs with values of mixed types used with the OR operator instead (the last four queries) also use correct the index. These queries are handled well, too.
The performance difference there is not trivial - and it becomes worse with bigger tables.
One might argue that one should just always use the correct types of values with these conditions, but there are environments in the wild that don't really care about differences between integers and numeric strings - one prime example being PHP. These can (and will) pass these values in unexpected and inconsistent fashion. In fact this is how I found out about this.
(I was unable test this with different versions of MariaDB than those mentioned above. Also I apologize in advance if I made any wrong assumptions about the MariaDB's internals.)
Attachments
Issue Links
- is blocked by
-
MDEV-15759 Expect "Impossible WHERE" for indexed_int_column=out_of_range_int_constant
- Closed
-
MDEV-18964 Non-optimal execution of WHERE num_field=const AND num_field=const
- Open
-
MDEV-18968 Both (WHERE 0.1) and (WHERE NOT 0.1) return empty set
- Closed
-
MDEV-19008 Slow EXPLAIN SELECT ... WHERE col IN (const1,const2,(subquery))
- Closed
- relates to
-
MDEV-8787 Return Impossible WHERE instead of Full table scan on some admittedly false field=const expressions
- Open
-
MDEV-31303 Key not used when IN clause has both signed and unsigned values
- Closed
-
MDEV-18319 BIGINT UNSIGNED Performance issue
- Closed