[MDEV-20482] MyISAM & Aria very slow when IN predicates containing more than 999 elements reference unindexed column. Created: 2019-09-03  Updated: 2020-08-25  Resolved: 2019-09-25

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - Aria, Storage Engine - MyISAM
Affects Version/s: 10.2.26, 10.3.17, 10.4.7
Fix Version/s: 10.3.18, 10.4.8

Type: Bug Priority: Minor
Reporter: Juan Assignee: Sergei Petrunia
Resolution: Duplicate Votes: 0
Labels: None
Environment:

CentOS 7


Issue Links:
Duplicate
duplicates MDEV-20109 Optimizer ignores distinct key create... Closed
Relates
relates to MDEV-16871 in_predicate_conversion_threshold can... Closed

 Description   

When querying a MyISAM or Aria table on a non-key field using an IN predicate to match values in an un-indexed column, results are fast so long as there are fewer than 1000 elements in the list so the plain select uses the key cache. The 1000th element causes the optimizer to materialize the list and do a join on an unindexed field. The result is that selecting 999 rows from a 5m row table takes 11 seconds while selecting 1000 takes over 24 minutes:

MariaDB [test]> show create table tkcm\G
*************************** 1. row ***************************
       Table: tkcm
Create Table: CREATE TABLE `tkcm` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `other_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.000 sec)
 
MariaDB [test]> show indexes from tkcm;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tkcm  |          0 | PRIMARY  |            1 | id          | A         |     5000000 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.000 sec)
 
MariaDB [test]> \. tkcm999.sql
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | tkcm  | ALL  | NULL          | NULL | NULL    | NULL | 5000000 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.002 sec)
 
MariaDB [test]> \. tkcm1000.sql
+------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
| id   | select_type  | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra                                           |
+------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL |    1000 |                                                 |
|    1 | PRIMARY      | tkcm        | ALL  | NULL          | NULL | NULL    | NULL | 5000000 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | <derived3>  | ALL  | NULL          | NULL | NULL    | NULL |    1000 |                                                 |
|    3 | DERIVED      | NULL        | NULL | NULL          | NULL | NULL    | NULL |    NULL | No tables used                                  |
+------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
4 rows in set (0.002 sec)

Running a query to select a specific number of rows based on distinct values in the non-key field:

Sat Aug 31 17:11:06 EDT 2019 MyISAM 999 took 11 sec
Sat Aug 31 17:11:17 EDT 2019 Aria 999r took 11 sec
Sat Aug 31 17:36:12 EDT 2019 MyISAM 1000 took 1495 sec
Sat Aug 31 18:00:38 EDT 2019 Aria 1000 took 1466 sec

Note that indexing the second column makes the problem go away, since the query plan can then use that second index. This is just to point out that the results are not good when the list exceeds 1000 items and the queried column is not indexed.



 Comments   
Comment by Igor Babaev [ 2019-09-04 ]

Juan,
This is another manifestation of bug MDEV-20109.

Comment by Juan [ 2019-09-04 ]

igor
Any value in keeping this open or adding this information to MDEV-20109, or shall I just close it then?

Comment by Igor Babaev [ 2019-09-04 ]

Juan,
Please keep it open until the next release that is supposed to be this week, then check the issue again.

Comment by Juan [ 2019-09-25 ]

Confirmed fixed in 10.3.18 & 10.4.8

Generated at Thu Feb 08 08:59:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.