Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18898

SELECT using wrong index when using operator IN with mixed types

    XMLWordPrintable

Details

    Description

      Hello good people,

      I accidentaly stumbled upon a weird behavior of SELECT query optimizer, which doesn't seem to handle values of mixed types very well when using operator IN in WHERE condition. This ultimately results in suboptimal usage of indexes, even though sufficient information is IMHO available for handling the query better (i.e. using correct index).

      I was able to come up with MVCE of this sort:

      CREATE TEMPORARY TABLE __t1 (
        id int(10) unsigned NOT NULL AUTO_INCREMENT,
        name varchar(100) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `name` (`name`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      INSERT INTO __t1 SELECT seq, MD5(seq) FROM seq_1_to_500000;
       
      ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, 2);
      ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', 2); -- Uses wrong index
      ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, '2'); -- Uses wrong index
      ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', '2');
      ANALYZE SELECT id, name FROM __t1 WHERE id = 1;
      ANALYZE SELECT id, name FROM __t1 WHERE id = '2';
      ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = 2;
      ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = '2';
      ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = '2';
      ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = 2;
      

      Resulting in:

      > ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, 2);
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', 2); -- Wrong index
      +------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | r_rows     | filtered | r_filtered | Extra                    |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
      |    1 | SIMPLE      | __t1  | index | PRIMARY       | name | 302     | NULL | 996723 | 1000000.00 |   100.00 |       0.00 | Using where; Using index |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
      1 row in set (0.21 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id IN (1, '2'); -- Wrong index
      +------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | r_rows     | filtered | r_filtered | Extra                    |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
      |    1 | SIMPLE      | __t1  | index | PRIMARY       | name | 302     | NULL | 996723 | 1000000.00 |   100.00 |       0.00 | Using where; Using index |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+------------+----------+------------+--------------------------+
      1 row in set (0.22 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id IN ('1', '2');
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id = 1;
      +------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | r_rows | filtered | r_filtered | Extra |
      +------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
      |    1 | SIMPLE      | __t1  | const | PRIMARY       | PRIMARY | 4       | const |    1 |   NULL |   100.00 |       NULL |       |
      +------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id = '2';
      +------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | r_rows | filtered | r_filtered | Extra |
      +------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
      |    1 | SIMPLE      | __t1  | const | PRIMARY       | PRIMARY | 4       | const |    1 |   NULL |   100.00 |       NULL |       |
      +------+-------------+-------+-------+---------------+---------+---------+-------+------+--------+----------+------------+-------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = 2;
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = '2';
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id = 1 OR id = '2';
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      1 row in set (0.00 sec)
       
      > ANALYZE SELECT id, name FROM __t1 WHERE id = '1' OR id = 2;
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | __t1  | range | PRIMARY       | PRIMARY | 4       | NULL |    2 |   2.00 |   100.00 |     100.00 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------+----------+------------+-------------+
      1 row in set (0.00 sec)
      

      As you can see the second and third SELECT queries employ wrong index and thus both result in (unnecessary?) full scan, even though it is clear that the condition is based solely on the id column, which acts as a primary key.

      SELECT queries with mixed types (integer and string) inside IN condition are the ones affected by this suboptimal index usage.

      SELECTs with values of a single common type use correct indexes - even if they're not matching the id column's type (integer), e.g. when values inside IN are numeric strings. These queries are handled well.

      SELECTs with values of mixed types used with the OR operator instead (the last four queries) also use correct the index. These queries are handled well, too.

      The performance difference there is not trivial - and it becomes worse with bigger tables.

      One might argue that one should just always use the correct types of values with these conditions, but there are environments in the wild that don't really care about differences between integers and numeric strings - one prime example being PHP. These can (and will) pass these values in unexpected and inconsistent fashion. In fact this is how I found out about this.

      (I was unable test this with different versions of MariaDB than those mentioned above. Also I apologize in advance if I made any wrong assumptions about the MariaDB's internals.)

      Attachments

        Issue Links

          Activity

            People

              bar Alexander Barkov
              smuuf Přemysl Karbula
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.