Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16450

Optimizer seems to make incorrect choice when AND clause on 2 varchar columns and those 2 columns are indexed (includes ORDER BY different_column and LIMIT 1)

Details

    Description

      We are seeing where a seemingly simple query is not using the correct index, and instead preferring an full table scan, which is resulting in a slower query.

      Here is a simplified version of the CREATE TABLE and SELECT:

      CREATE TABLE `t1` (
      `ts` datetime NOT NULL DEFAULT current_timestamp(),
      `val1` varchar(5) NOT NULL DEFAULT '0',
      `val2` varchar(3) NOT NULL DEFAULT '0',
      KEY `idx_val1_val2` (`val1`,`val2`),
      KEY `idx_ts` (`ts`)
      ) ENGINE=InnoDB;
      

      SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
      

      If I run this through EXPLAIN with 0 data, I see:

      mysql> EXPLAIN SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts LIMIT 1;
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
      | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
      | 1 | SIMPLE | t1 | index | idx_val1_val2 | idx_ts | 5 | NULL | 1 | Using where |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
      

      If I add some rows, I still see the same:

      INSERT INTO t1 VALUES (now(),'04643','955');
      INSERT INTO t1 VALUES (now(),'04643','999');
      INSERT INTO t1 VALUES (now(),'04643','955');
      INSERT INTO t1 VALUES (now(),'04643','999');
      INSERT INTO t1 VALUES (now(),'04643','955');
      INSERT INTO t1 VALUES (now(),'04643','999');
      INSERT INTO t1 VALUES (now(),'04643','955');
      INSERT INTO t1 VALUES (now(),'04643','999');
      INSERT INTO t1 VALUES (now(),'04643','955');
      INSERT INTO t1 VALUES (now(),'04643','999');
      

      mysql> EXPLAIN SELECT ts, val1, val2 FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
      | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
      | 1 | SIMPLE | t1 | index | idx_val1_val2 | idx_ts | 5 | NULL | 2 | Using where |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
      

      If I remove the "LIMIT 1", then it chooses the correct index.

      Note that in the above EXPLAIN, only 1 index is listed for "possible_keys", yet the other is chosen for "key", so this tells me the full table scan was preferred over the index, and then the secondary index was used for the ORDER BY.

      This incorrect choice leads to slower times as opposed to when it chooses the correct index (or FORCE INDEX is used).

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment - - edited

            Reproduced on MariaDB 10.1-10.3
            (I used ANALYZE, that was introduced in 10.1, I guess the same issue exists from 5.5, but fix version should be decided by developers).
            testcase:

            --source include/have_innodb.inc
             
            CREATE TABLE `t1` (
            `ts` datetime NOT NULL ,
            `val1` varchar(5) NOT NULL,
            `val2` varchar(3) NOT NULL,
            KEY `k1` (`val1`,`val2`),
            KEY `k2` (`ts`)
            )ENGINE=InnoDB;
             
            DELIMITER $$;
            CREATE PROCEDURE prepare_data()
            BEGIN
              DECLARE i INT DEFAULT 1;
              WHILE i < 200 DO
                INSERT INTO t1 VALUES (now(), '04643','956'), (now(), '04643','955');
                SET i = i + 1;
              END WHILE;
            END$$
            DELIMITER ;$$
             
            CALL prepare_data();
             
            analyze
            SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
             
            analyze
            SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3;
             
            DROP table t1;
            DROP procedure prepare_data;
            

            the results differ when InnoDB or MyISAM engine is used, or int instead of varchar datatype.
            in the second ANALYZE query, index k2 is not listed as a possible key

            analyze
            SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1;
            +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+
            | id   | select_type | table | type  | possible_keys | key  | key_len | ref         | rows | r_rows | filtered | r_filtered | Extra       |
            +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+
            |    1 | SIMPLE      | t1    | index | k1            | k2   | 5       | const,const |  201 |   1.00 |    49.75 |     100.00 | Using where |
            +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+
            1 row in set (0.000 sec)
             
            --------------
            analyze
            SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3;
            +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | r_rows | filtered | r_filtered | Extra                                              |
            +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+
            |    1 | SIMPLE      | t1    | ref  | k1            | k1   | 12      | const,const |  100 |   3.00 |   100.00 |     100.00 | Using index condition; Using where; Using filesort |
            +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+
            1 row in set (0.002 sec)
            

            alice Alice Sherepa added a comment - - edited Reproduced on MariaDB 10.1-10.3 (I used ANALYZE, that was introduced in 10.1, I guess the same issue exists from 5.5, but fix version should be decided by developers). testcase: --source include/have_innodb.inc   CREATE TABLE `t1` ( `ts` datetime NOT NULL , `val1` varchar (5) NOT NULL , `val2` varchar (3) NOT NULL , KEY `k1` (`val1`,`val2`), KEY `k2` (`ts`) )ENGINE=InnoDB;   DELIMITER $$; CREATE PROCEDURE prepare_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 200 DO INSERT INTO t1 VALUES (now(), '04643' , '956' ), (now(), '04643' , '955' ); SET i = i + 1; END WHILE; END $$ DELIMITER ;$$   CALL prepare_data();   analyze SELECT * FROM t1 WHERE val1= '04643' AND val2= '955' ORDER BY ts DESC LIMIT 1;   analyze SELECT * FROM t1 WHERE val1= '04643' AND val2= '955' ORDER BY ts DESC LIMIT 3;   DROP table t1; DROP procedure prepare_data; the results differ when InnoDB or MyISAM engine is used, or int instead of varchar datatype. in the second ANALYZE query, index k2 is not listed as a possible key analyze SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 1; +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+ | 1 | SIMPLE | t1 | index | k1 | k2 | 5 | const,const | 201 | 1.00 | 49.75 | 100.00 | Using where | +------+-------------+-------+-------+---------------+------+---------+-------------+------+--------+----------+------------+-------------+ 1 row in set (0.000 sec)   -------------- analyze SELECT * FROM t1 WHERE val1='04643' AND val2='955' ORDER BY ts DESC LIMIT 3; +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+ | 1 | SIMPLE | t1 | ref | k1 | k1 | 12 | const,const | 100 | 3.00 | 100.00 | 100.00 | Using index condition; Using where; Using filesort | +------+-------------+-------+------+---------------+------+---------+-------------+------+--------+----------+------------+----------------------------------------------------+ 1 row in set (0.002 sec)

            If we look closely at both the analyze statements for the different queries , we don't see key k2 as
            a possible key in either of the plans. This is because possible_keys shows indexes or keys that can be used to evaluate the WHERE clause.

            For the first query (with limit = 1) there are 2 plans
            Plan 1

            • start reading though index ts
            • stop as soon as we have found a row that matches `WHERE val1='04643' AND val2='955'`

            Plan 2

            • read all rows that match
            • sort them and pick the 1st one

            There is a cost based approach to decide between these 2 plans, so for the first query we chose plan 1 and for the second query we chose plan 2.

            varun Varun Gupta (Inactive) added a comment - If we look closely at both the analyze statements for the different queries , we don't see key k2 as a possible key in either of the plans. This is because possible_keys shows indexes or keys that can be used to evaluate the WHERE clause. For the first query (with limit = 1) there are 2 plans Plan 1 start reading though index ts stop as soon as we have found a row that matches `WHERE val1='04643' AND val2='955'` Plan 2 read all rows that match sort them and pick the 1st one There is a cost based approach to decide between these 2 plans, so for the first query we chose plan 1 and for the second query we chose plan 2.
            varun Varun Gupta (Inactive) added a comment - - edited

            Another idea that psergey came up with

            we could do both:

            • start sscanning on ts, scan the expected number of rows.
            • If we havent found N matching rows, switch to the other plan.

            This way the worst case performance would be better but this is more of a new optimizer feature than a bugfix.

            varun Varun Gupta (Inactive) added a comment - - edited Another idea that psergey came up with we could do both: start sscanning on ts, scan the expected number of rows. If we havent found N matching rows, switch to the other plan. This way the worst case performance would be better but this is more of a new optimizer feature than a bugfix.

            People

              Unassigned Unassigned
              ccalender Chris Calender (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.