Details
-
Bug
-
Status: Closed (View Workflow)
-
Minor
-
Resolution: Duplicate
-
10.2.26, 10.3.17, 10.4.7
-
None
-
CentOS 7
Description
When querying a MyISAM or Aria table on a non-key field using an IN predicate to match values in an un-indexed column, results are fast so long as there are fewer than 1000 elements in the list so the plain select uses the key cache. The 1000th element causes the optimizer to materialize the list and do a join on an unindexed field. The result is that selecting 999 rows from a 5m row table takes 11 seconds while selecting 1000 takes over 24 minutes:
MariaDB [test]> show create table tkcm\G
|
*************************** 1. row ***************************
|
Table: tkcm
|
Create Table: CREATE TABLE `tkcm` (
|
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
|
`other_id` bigint(20) unsigned NOT NULL,
|
PRIMARY KEY (`id`)
|
) ENGINE=MyISAM DEFAULT CHARSET=latin1
|
1 row in set (0.000 sec)
|
|
MariaDB [test]> show indexes from tkcm;
|
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
|
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
| tkcm | 0 | PRIMARY | 1 | id | A | 5000000 | NULL | NULL | | BTREE | | |
|
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
1 row in set (0.000 sec)
|
|
MariaDB [test]> \. tkcm999.sql
|
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|
| 1 | SIMPLE | tkcm | ALL | NULL | NULL | NULL | NULL | 5000000 | Using where |
|
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|
1 row in set (0.002 sec)
|
|
MariaDB [test]> \. tkcm1000.sql
|
+------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
|
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 1000 | |
|
| 1 | PRIMARY | tkcm | ALL | NULL | NULL | NULL | NULL | 5000000 | Using where; Using join buffer (flat, BNL join) |
|
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 1000 | |
|
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
|
+------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
|
4 rows in set (0.002 sec)
|
Running a query to select a specific number of rows based on distinct values in the non-key field:
Sat Aug 31 17:11:06 EDT 2019 MyISAM 999 took 11 sec
|
Sat Aug 31 17:11:17 EDT 2019 Aria 999r took 11 sec
|
Sat Aug 31 17:36:12 EDT 2019 MyISAM 1000 took 1495 sec
|
Sat Aug 31 18:00:38 EDT 2019 Aria 1000 took 1466 sec
|
Note that indexing the second column makes the problem go away, since the query plan can then use that second index. This is just to point out that the results are not good when the list exceeds 1000 items and the queried column is not indexed.
Attachments
Issue Links
- duplicates
-
MDEV-20109 Optimizer ignores distinct key created for materialized semi-join subquery when searching for best execution plan
- Closed
- relates to
-
MDEV-16871 in_predicate_conversion_threshold cannot be set in my.cnf
- Closed