[MDEV-6402] Optimizer doesn't choose best execution plan when composite key is used. Created: 2014-06-27  Updated: 2018-12-19  Resolved: 2014-10-06

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.11
Fix Version/s: 10.1.1

Type: Bug Priority: Major
Reporter: Seunguck Lee Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: optimizer, order-by-optimization
Environment:

Linux matt001 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux


Attachments: File mdev6402-dataset1.sql    

 Description   

MariaDB optimizer doesn't choose best execution plan when they use composite key.

There's composite key with two columns (pk1 + fd5).
The the query which have (pk1=? and fd5>?) where condition and ORDER BY fd5 clause
generate plan using only "pk1" column.
It also happen in MariaDB 5.5.24.

See below test case.
I can't upload sample data of table, becaus it's too big and it's also real service data. I think you can generate test data with index cardinality and table status.

Test case ------------------------------------------------------------------------------

MariaDB [test]> select version();
+---------------------+
| version()           |
+---------------------+
| 10.0.11-MariaDB-log |
+---------------------+
1 row in set (0.00 sec)

MariaDB [test]> show variables like 'optimizer_switch'\G
*************************** 1. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=off
1 row in set (0.00 sec)

MariaDB [test]> show create table tb_test\G
*************************** 1. row ***************************
       Table: tb_test
Create Table: CREATE TABLE `tb_test` (
  `pk1` int(11) NOT NULL,
  `pk2` int(11) NOT NULL,
  `fd1` int(11) NOT NULL,
  `fd2` bigint(20) NOT NULL DEFAULT '0',
  `fd3` bigint(20) NOT NULL DEFAULT '0',
  `fd4` datetime NOT NULL,
  `fd5` bigint(20) DEFAULT NULL,
  `fd6` varchar(64) DEFAULT NULL,
  `fd7` text,
  `fd8` varchar(64) DEFAULT NULL,
  `fd9` tinyint(1) NOT NULL DEFAULT '1',
  `fd10` bigint(20) NOT NULL DEFAULT '0',
  `fd11` tinyint(1) DEFAULT NULL,
  `fd12` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`pk1`,`pk2`),
  UNIQUE KEY `ux_pk1_fd5` (`pk1`,`fd5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY KEY (pk1)
PARTITIONS 5 */
1 row in set (0.00 sec)

MariaDB [test]> alter table tb_test engine=innodb;
Query OK, 0 rows affected (7 min 3.85 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> show table status like 'tb_test'\G
*************************** 1. row ***************************
           Name: tb_test
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 8366199
 Avg_row_length: 96
    Data_length: 806895616
Max_data_length: 0
   Index_length: 195002368
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_general_ci
       Checksum: NULL
 Create_options: partitioned
        Comment:
1 row in set (0.02 sec)

MariaDB [test]> show index from tb_test;
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tb_test |          0 | PRIMARY    |            1 | pk1         | A         |     4183099 |     NULL | NULL   |      | BTREE      |         |               |
| tb_test |          0 | PRIMARY    |            2 | pk2         | A         |     8366199 |     NULL | NULL   |      | BTREE      |         |               |
| tb_test |          0 | ux_pk1_fd5 |            1 | pk1         | A         |     8366199 |     NULL | NULL   |      | BTREE      |         |               |
| tb_test |          0 | ux_pk1_fd5 |            2 | fd5         | A         |     8366199 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

MariaDB [test]> EXPLAIN SELECT * FROM tb_test USE INDEX(ux_pk1_fd5) WHERE pk1=8287001 AND fd5<91866952691281442 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
| id   | select_type | table   | type  | possible_keys | key        | key_len | ref  | rows  | Extra       |
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
|    1 | SIMPLE      | tb_test | range | ux_pk1_fd5    | ux_pk1_fd5 | 13      | NULL | 14590 | Using where |
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
1 row in set (0.01 sec)
 
MariaDB [test]> EXPLAIN SELECT * FROM tb_test WHERE pk1=8287001 AND fd5<91866952691281442 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
| id   | select_type | table   | type | possible_keys      | key        | key_len | ref   | rows  | Extra       |
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
|    1 | SIMPLE      | tb_test | ref  | PRIMARY,ux_pk1_fd5 | ux_pk1_fd5 | 4       | const | 65108 | Using where |
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
1 row in set (0.01 sec)
 
MariaDB [test]> show status like 'Handler_read_prev';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_read_prev | 0     |
+-------------------+-------+
1 row in set (0.02 sec)

MariaDB [test]> SELECT * FROM tb_test WHERE pk1=8287001 AND fd5<91866952691281442 ORDER BY fd5 DESC LIMIT 201,1;
...
1 row in set (0.27 sec)
 
MariaDB [test]> show status like 'Handler_read_prev';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_read_prev | 28201 |
+-------------------+-------+
1 row in set (0.00 sec)
 
MariaDB [test]> SELECT * FROM tb_test USE INDEX(ux_pk1_fd5) WHERE pk1=8287001 AND fd5<91866952691281442 ORDER BY fd5 DESC LIMIT 201,1;
...
1 row in set (0.01 sec)
 
MariaDB [test]> show status like 'Handler_read_prev';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_read_prev | 28402 |
+-------------------+-------+
1 row in set (0.00 sec)



 Comments   
Comment by Sergei Petrunia [ 2014-07-09 ]

I tried to create a dataset that would match the provided descriptions. I used these two queries

EXPLAIN SELECT * FROM tb_test                       WHERE pk1=8287001 AND fd5 < 1000000 ORDER BY fd5 DESC LIMIT 201,1;
EXPLAIN SELECT * FROM tb_test USE INDEX(ux_pk1_fd5) WHERE pk1=8287001 AND fd5 < 1000000 ORDER BY fd5 DESC LIMIT 201,1;

and I got this:

MariaDB [j6]> EXPLAIN SELECT * FROM tb_test                       WHERE pk1=8287001 AND fd5 < 1000000 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
| id   | select_type | table   | type | possible_keys      | key        | key_len | ref   | rows  | Extra       |
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
|    1 | SIMPLE      | tb_test | ref  | PRIMARY,ux_pk1_fd5 | ux_pk1_fd5 | 4       | const | 55678 | Using where |
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+

MariaDB [j6]> EXPLAIN SELECT * FROM tb_test USE INDEX(ux_pk1_fd5) WHERE pk1=8287001 AND fd5 < 1000000 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
| id   | select_type | table   | type  | possible_keys | key        | key_len | ref  | rows  | Extra       |
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
|    1 | SIMPLE      | tb_test | range | ux_pk1_fd5    | ux_pk1_fd5 | 13      | NULL | 55594 | Using where |
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+

Comment by Sergei Petrunia [ 2014-07-09 ]

Both queries run in 0.00 sec for me. However, it is still peculiar that we would use a ref access where there is a possible range access on the same index that will scan a proper subset of the records. The original bug report shows a situation where range access reads fewer records.

Comment by Sergei Petrunia [ 2014-07-09 ]

I've tried to replicate the situation from the bug report. the choice seems to depend on thevalues of the constants.

When the range is very small, both queries pick range(2 keyparts):

EXPLAIN SELECT * FROM tb_test                       WHERE pk1=8287001 AND fd5 < 100000 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+-------+--------------------+------------+---------+------+------+-------------+
| id   | select_type | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra       |
+------+-------------+---------+-------+--------------------+------------+---------+------+------+-------------+
|    1 | SIMPLE      | tb_test | range | PRIMARY,ux_pk1_fd5 | ux_pk1_fd5 | 13      | NULL |    1 | Using where |
+------+-------------+---------+-------+--------------------+------------+---------+------+------+-------------+
 
EXPLAIN SELECT * FROM tb_test USE INDEX(ux_pk1_fd5) WHERE pk1=8287001 AND fd5 < 100000 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+-------+---------------+------------+---------+------+------+-------------+
| id   | select_type | table   | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+------+-------------+---------+-------+---------------+------------+---------+------+------+-------------+
|    1 | SIMPLE      | tb_test | range | ux_pk1_fd5    | ux_pk1_fd5 | 13      | NULL |    1 | Using where |
+------+-------------+---------+-------+---------------+------------+---------+------+------+-------------+

when the range gets bigger, the query without hint switches to ref(one keypart):

EXPLAIN SELECT * FROM tb_test                       WHERE pk1=8287001 AND fd5 < 1000000 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
| id   | select_type | table   | type | possible_keys      | key        | key_len | ref   | rows  | Extra       |
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
|    1 | SIMPLE      | tb_test | ref  | PRIMARY,ux_pk1_fd5 | ux_pk1_fd5 | 4       | const | 55678 | Using where |
+------+-------------+---------+------+--------------------+------------+---------+-------+-------+-------------+
 
MariaDB [j6]> EXPLAIN SELECT * FROM tb_test USE INDEX(ux_pk1_fd5) WHERE pk1=8287001 AND fd5 < 1000000 ORDER BY fd5 DESC LIMIT 201,1;
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
| id   | select_type | table   | type  | possible_keys | key        | key_len | ref  | rows  | Extra       |
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+
|    1 | SIMPLE      | tb_test | range | ux_pk1_fd5    | ux_pk1_fd5 | 13      | NULL | 55594 | Using where |
+------+-------------+---------+-------+---------------+------------+---------+------+-------+-------------+

Comment by Sergei Petrunia [ 2014-07-09 ]

The above experiments were done on a tree that didn't have the fix for MDEV-6384. Retried on the tree with fix for MDEV-6384, got the same result.

Comment by Sergei Petrunia [ 2014-07-09 ]

Debugging...
best_access_path() picks ref(PRIMARY)
(an alternative could be when it picks range(ux_pk1_fd5))

test_if_skip_sort_order() finds that ref(PRIMARY) does not provide the desired
ordering and goes to search for an index that produces records in order.

The code has a special branch that handles cases when there is another index
with the same prefix that can be used for ref access:

      if ((new_ref_key= test_if_subkey(order, table, ref_key, ref_key_parts,
				       &usable_keys)) < MAX_KEY)
      {
	if (tab->ref.key >= 0)
	{
          /*
            We'll use ref access method on key new_ref_key. In general case 
            the index search tuple for new_ref_key will be different (e.g.
            when one index is defined as (part1, part2, ...) and another as
            (part1, part2(N), ...) and the WHERE clause contains 
            "part1 = const1 AND part2=const2". 
            So we build tab->ref from scratch here.
          */
          KEYUSE *keyuse= tab->keyuse;
          while (keyuse->key != new_ref_key && keyuse->table == tab->table)
            keyuse++;
          if (create_ref_for_key(tab->join, tab, keyuse, FALSE,
                                 (tab->join->const_table_map |
                                  OUTER_REF_TABLE_BIT)))
            goto use_filesort;

We don't make a choice between ref and range access, we use ref
unconditionally. test_if_subkey() goes through suitable indexes and picks one
that has the least key length, but that's it.

Comment by Sergei Petrunia [ 2014-07-09 ]

Debugging, #rows and costs:

== range optimization ==
=== PRIMARY ===

rows=55678
cost={io_count = 742.951, avg_io_cost = 1, cpu_cost = 11135.610, }
cost.total_cost()= 11878.561

This gets returned as tab->quick.

=== ux_pk1_fd5 ===

cost={io_count = 55595, avg_io_cost = 1, cpu_cost = 11118.809,}
cost.total_cost()=66713.809

== Join optimization ==

=== PRIMARY ===

rows=55678 
tmp= table->file->read_time() = 742.951

This ref is picked as best. Note that its cost is much smaller than cost of range access over the same interval. The difference is caused by range optimizer adding "CPU cost" (which is cpu_cost = 11135.610 > io_cost)

=== ux_pk1_fd5 ===

rows=55594
tmp= table->file->read_time() = 55595

again, range access's CPU cost is not added here.
=== join optimization/quick select ===
We don't consider quick select because it fails this condition:

    (2) This doesn't hold: the best way to perform table scan is to to perform
        'range' access using index IDX, and the best way to perform 'ref' 
        access is to use the same index IDX, with the same or more key parts.
        (note: it is not clear how this rule is/should be extended to 
        index_merge quick selects)

== test_if_skip_sort_order ==
(see above comments, no cost-based comparisons are done).

Comment by Sergei Petrunia [ 2014-07-10 ]

Test dataset fill script, mdev6402-dataset1.sql

Comment by Sergei Petrunia [ 2014-07-10 ]

Observations about cost calculations:

#rows is very close for the PRIMARY key and for ux_pk1_fd5.
But the cost of PK access is much smaller. The difference comes from
ha_innobase::read_time() which uses different cost formulas for PK and
non-PK (PK costs are much smaller).
The difference in costs looks unrealistically high.

Then, test_if_skip_sort_order() doesn't care about costs. It changes to using
ux_pk1_fd5 based on the fact that ref access would scan the same range there.

Comment by Sergei Petrunia [ 2014-07-10 ]

Possible solutions

== CHECK-RANGE-ALSO ==
Having switched to ref(const) access, check if the used index has a range
access also. We know for sure taht range will scan a smaller set of rows.
(Q: does this mean that range will be cheaper? Is scanning fewer rows with
index_next() faster than scanning a bit more rows with index_next_same()?)

== REF-IS-THE-FIRST-STEP ==
(This makes sense after MDEV-6384). So we've used "equal ranges are scanned"
as justification to pick a ref(const) access. Now, continue to consider all other ways
to read rows in the required ordering. Make a cost-based decision between the available options. (Q: how to account for condition selectivity and LIMIT? correlation between
WHERE clause and the used access method is a big problem when both are highly selective)

Comment by Sergei Petrunia [ 2014-09-09 ]

Re-tried on 10.1 with fixes for MDEV-6384 and MDEV-6657 (https://github.com/MariaDB/server/tree/bb-10.1-mdev6657-r2). The issue is still repeatable.

Comment by Sergei Petrunia [ 2014-09-12 ]

Pushed a patch here: https://github.com/MariaDB/server/tree/bb-10.1-mdev6402

elenst, I need a testing pass for this tree.

Comment by George Diamantopoulos [ 2018-12-19 ]

I think there might be a regression with 10.3.11 but it also may be a different issue. My use case is similar, composite primary key with (id, date), indexes for date and callid. `date` and `callid` used in WHERE clause, `id` in ORDER BY clause.

Mariadb version is: 10.3.11-MariaDB-1:10.3.11+maria~stretch-log mariadb.org binary distribution

This is a galera cluster installation, in case it makes any difference.

With ORDER BY clause key selection is sub-optimal, leading to the query taking a long time.

Without ORDER BY clause, execution is very fast.

I've pasted more information at https://pastebin.com/dPWEBs25

Please let me know if I need to open a separate issue for this...

Generated at Thu Feb 08 07:11:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.