Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20482

MyISAM & Aria very slow when IN predicates containing more than 999 elements reference unindexed column.

    XMLWordPrintable

Details

    Description

      When querying a MyISAM or Aria table on a non-key field using an IN predicate to match values in an un-indexed column, results are fast so long as there are fewer than 1000 elements in the list so the plain select uses the key cache. The 1000th element causes the optimizer to materialize the list and do a join on an unindexed field. The result is that selecting 999 rows from a 5m row table takes 11 seconds while selecting 1000 takes over 24 minutes:

      MariaDB [test]> show create table tkcm\G
      *************************** 1. row ***************************
             Table: tkcm
      Create Table: CREATE TABLE `tkcm` (
        `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `other_id` bigint(20) unsigned NOT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=MyISAM DEFAULT CHARSET=latin1
      1 row in set (0.000 sec)
       
      MariaDB [test]> show indexes from tkcm;
      +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
      +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | tkcm  |          0 | PRIMARY  |            1 | id          | A         |     5000000 |     NULL | NULL   |      | BTREE      |         |               |
      +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      1 row in set (0.000 sec)
       
      MariaDB [test]> \. tkcm999.sql
      +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
      |    1 | SIMPLE      | tkcm  | ALL  | NULL          | NULL | NULL    | NULL | 5000000 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
      1 row in set (0.002 sec)
       
      MariaDB [test]> \. tkcm1000.sql
      +------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
      | id   | select_type  | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra                                           |
      +------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
      |    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL |    1000 |                                                 |
      |    1 | PRIMARY      | tkcm        | ALL  | NULL          | NULL | NULL    | NULL | 5000000 | Using where; Using join buffer (flat, BNL join) |
      |    2 | MATERIALIZED | <derived3>  | ALL  | NULL          | NULL | NULL    | NULL |    1000 |                                                 |
      |    3 | DERIVED      | NULL        | NULL | NULL          | NULL | NULL    | NULL |    NULL | No tables used                                  |
      +------+--------------+-------------+------+---------------+------+---------+------+---------+-------------------------------------------------+
      4 rows in set (0.002 sec)
      

      Running a query to select a specific number of rows based on distinct values in the non-key field:

      Sat Aug 31 17:11:06 EDT 2019 MyISAM 999 took 11 sec
      Sat Aug 31 17:11:17 EDT 2019 Aria 999r took 11 sec
      Sat Aug 31 17:36:12 EDT 2019 MyISAM 1000 took 1495 sec
      Sat Aug 31 18:00:38 EDT 2019 Aria 1000 took 1466 sec
      

      Note that indexing the second column makes the problem go away, since the query plan can then use that second index. This is just to point out that the results are not good when the list exceeds 1000 items and the queried column is not indexed.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              juan.vera Juan
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.