[MDEV-6384] It seems like the optimizer takes the order of indexes in the table into account. Created: 2014-06-25  Updated: 2014-10-11  Resolved: 2014-09-09

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.39, 10.0.11
Fix Version/s: 10.1.1

Type: Bug Priority: Critical
Reporter: Seunguck Lee Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: optimizer, order-by-optimization
Environment:

Linux matt001 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux


Attachments: Text File global_variables.txt     File mdev6384-psergey-try1.sql    
Issue Links:
Relates
relates to MDEV-465 Optimizer : wrong index choice, leadi... Closed
relates to MDEV-6454 Performance degradation (suboptimal e... Closed

 Description   

I created two test tables. The only difference between them is the order of the secondary indexes.
I ran EXPLAIN on the same query against both tables after inserting the same test data into each.
But the query execution plans are different.
<< First see the test case below >>

Here's the problem.
If the (`fd1`,`fd2`) index is UNIQUE, the optimizer never uses the "ix_fd_fdpk" index in this case, because a UNIQUE index is positioned before the normal secondary indexes (as in the table below), even though it was added later.

CREATE TABLE `tb_test1` (
  ...
  PRIMARY KEY (`fd_pk`),
  UNIQUE KEY `ux_fd1_fd2` (`fd1`,`fd2`),
  KEY `ix_fd_fdpk` (`fd1`,`fd_pk`),
  KEY `ix_fd1_fd2` (`fd1`,`fd2`)
);

How can I make the optimizer use ix_fd_fdpk to avoid the filesort operation (without an index hint ^^)?
And is this expected behavior?

-- TEST CASE ------------------------
// Prepare some test data and table
MariaDB [test]> INSERT INTO tb_test1 SELECT NULL, ORDINAL_POSITION, IFNULL(CHARACTER_MAXIMUM_LENGTH, ROUND(RAND()*10000)), NOW(), 'dummy' FROM information_schema.COLUMNS;
Query OK, 1886 rows affected (0.26 sec)
Records: 1886  Duplicates: 0  Warnings: 0
 
MariaDB [test]> INSERT INTO tb_test2 SELECT NULL, ORDINAL_POSITION, IFNULL(CHARACTER_MAXIMUM_LENGTH, ROUND(RAND()*10000)), NOW(), 'dummy' FROM information_schema.COLUMNS;
Query OK, 1886 rows affected (0.23 sec)
Records: 1886  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table tb_test1 engine=innodb;
Query OK, 0 rows affected (0.79 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table tb_test2 engine=innodb;
Query OK, 0 rows affected (0.45 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> show table status like 'tb_test1'\G
*************************** 1. row ***************************
           Name: tb_test1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1886
 Avg_row_length: 78
    Data_length: 147456
Max_data_length: 0
   Index_length: 147456
      Data_free: 0
 Auto_increment: 2048
    Create_time: 2014-06-25 08:29:11
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

MariaDB [test]> show table status like 'tb_test2'\G
*************************** 1. row ***************************
           Name: tb_test2
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1886
 Avg_row_length: 78
    Data_length: 147456
Max_data_length: 0
   Index_length: 147456
      Data_free: 0
 Auto_increment: 2048
    Create_time: 2014-06-25 08:29:13
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

MariaDB [test]> show index from tb_test1;
+----------+------------+------------+--------------+-------------+-----------+-------------+..
| Table    | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality |..
+----------+------------+------------+--------------+-------------+-----------+-------------+..
| tb_test1 |          0 | PRIMARY    |            1 | fd_pk       | A         |        1886 |..
| tb_test1 |          1 | ix_fd_fdpk |            1 | fd1         | A         |         157 |..
| tb_test1 |          1 | ix_fd_fdpk |            2 | fd_pk       | A         |        1886 |..
| tb_test1 |          1 | ix_fd1_fd2 |            1 | fd1         | A         |         157 |..
| tb_test1 |          1 | ix_fd1_fd2 |            2 | fd2         | A         |        1886 |..
+----------+------------+------------+--------------+-------------+-----------+-------------+..
5 rows in set (0.00 sec)
MariaDB [test]> show index from tb_test2;
+----------+------------+------------+--------------+-------------+-----------+-------------+..
| Table    | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality |..
+----------+------------+------------+--------------+-------------+-----------+-------------+..
| tb_test2 |          0 | PRIMARY    |            1 | fd_pk       | A         |        1886 |..
| tb_test2 |          1 | ix_fd1_fd2 |            1 | fd1         | A         |         157 |..
| tb_test2 |          1 | ix_fd1_fd2 |            2 | fd2         | A         |        1886 |..
| tb_test2 |          1 | ix_fd_fdpk |            1 | fd1         | A         |         157 |..
| tb_test2 |          1 | ix_fd_fdpk |            2 | fd_pk       | A         |        1886 |..
+----------+------------+------------+--------------+-------------+-----------+-------------+..
5 rows in set (0.00 sec)

MariaDB [test]> explain select * from tb_test1 where fd1=1 order by fd_pk limit 1000;
+------+-------------+----------+------+-----------------------+------------+---------+-------+------+-------------+
| id   | select_type | table    | type | possible_keys         | key        | key_len | ref   | rows | Extra       |
+------+-------------+----------+------+-----------------------+------------+---------+-------+------+-------------+
|    1 | SIMPLE      | tb_test1 | ref  | ix_fd_fdpk,ix_fd1_fd2 | ix_fd_fdpk | 8       | const |  172 | Using where |
+------+-------------+----------+------+-----------------------+------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> explain select * from tb_test2 where fd1=1 order by fd_pk limit 1000;
+------+-------------+----------+------+-----------------------+------------+---------+-------+------+-----------------------------+
| id   | select_type | table    | type | possible_keys         | key        | key_len | ref   | rows | Extra                       |
+------+-------------+----------+------+-----------------------+------------+---------+-------+------+-----------------------------+
|    1 | SIMPLE      | tb_test2 | ref  | ix_fd1_fd2,ix_fd_fdpk | ix_fd1_fd2 | 8       | const |  172 | Using where; Using filesort |
+------+-------------+----------+------+-----------------------+------------+---------+-------+------+-----------------------------+
1 row in set (0.00 sec)

==> The last query would not need a filesort operation if it used the ix_fd_fdpk index. But it does not.



 Comments   
Comment by Sergei Petrunia [ 2014-07-02 ]

The report doesn't specify the dataset (types of columns not known, and the query that fills the tables uses information_schema.columns whose contents are different on every database). Matt74, if you still have tb_test1 and tb_test2 tables, could you please attach a mysqldump of these to make sure we're looking at the same problem?

Comment by Sergei Petrunia [ 2014-07-02 ]

I've created a test dataset that I believe (but not 100% certain) to be similar. Attaching mdev6384-psergey-try1.sql.

Comment by Sergei Petrunia [ 2014-07-02 ]

The problem I'm observing is as follows: two tables, t1 and t2, are identical except for the order of their indexes:

--- t1.sql
+++ t2.sql
@@ -1,10 +1,10 @@
-CREATE TABLE `t1` (
+CREATE TABLE `t2` (
   fd_pk int auto_increment,
   fd1 int,
   fd2 int,
   datecol datetime,
   dummy varchar(32),
   PRIMARY KEY (`fd_pk`),
-  KEY `ix_fd1_fd2` (`fd1`,`fd2`),
-  KEY `ix_fd1_fdpk` (`fd1`,`fd_pk`)
+  KEY `ix_fd1_fdpk` (`fd1`,`fd_pk`),
+  KEY `ix_fd1_fd2` (`fd1`,`fd2`)
 );

Now, if we run the same query on t1 and t2, we will see different query plans:

+------+-------------+-------+------+------------------------+------------+---------+-------+------+-----------------------------+
| id   | select_type | table | type | possible_keys          | key        | key_len | ref   | rows | Extra                       |
+------+-------------+-------+------+------------------------+------------+---------+-------+------+-----------------------------+
|    1 | SIMPLE      | t1    | ref  | ix_fd1_fd2,ix_fd1_fdpk | ix_fd1_fd2 | 5       | const |  192 | Using where; Using filesort |
+------+-------------+-------+------+------------------------+------------+---------+-------+------+-----------------------------+

+------+-------------+-------+------+------------------------+-------------+---------+-------+------+-------------+
| id   | select_type | table | type | possible_keys          | key         | key_len | ref   | rows | Extra       |
+------+-------------+-------+------+------------------------+-------------+---------+-------+------+-------------+
|    1 | SIMPLE      | t2    | ref  | ix_fd1_fdpk,ix_fd1_fd2 | ix_fd1_fdpk | 5       | const |  192 | Using where |
+------+-------------+-------+------+------------------------+-------------+---------+-------+------+-------------+

Both plans scan the same range of values, and the plan with "Using filesort" is clearly more expensive. Why is it picked? Why does the optimizer's choice between /different/ query plans depend on the order of indexes in the table?

I'll investigate this.

Comment by Sergei Petrunia [ 2014-07-02 ]

Debugging

explain select * from t2 where fd1=1 order by fd_pk limit 1000; 

one can see that the range optimizer chooses ref(ix_fd1_fdpk), and later test_if_skip_sort_order() finds that using this index produces the required ordering. Everything looks OK.

Debugging

explain select * from t1 where fd1=1 order by fd_pk limit 1000; 

one can see that:

  • the range optimizer chooses ref(ix_fd1_fd2);
  • test_if_skip_sort_order() finds that this index doesn't produce the required ordering;
  • test_if_cheaper_ordering() is called;
  • it looks for an index that produces the required ordering;
  • it considers PRIMARY (which I skip here) and then ix_fd1_fdpk.
There, it has

(gdb) print refkey_rows_estimate
  $34 = 192

OK

(gdb) print table_records
  $35 = 1640
(gdb) print select_limit
  $36 = 1640

Now, we think that with index `ix_fd1_fdpk` we will read more rows than with `ix_fd1_fd2`. This is wrong. Eventually we arrive at this estimate

(gdb) print index_scan_time
  $41 = 1640

while the original estimate was

(gdb) print read_time
  $42 = 22

This is apparently wrong. I am not sure yet which part of test_if_cheaper_ordering() is at fault; I'll investigate further.

Comment by Sergei Petrunia [ 2014-07-02 ]

The source of read_time=22 is best_access_path(), with tmp= table->quick_rows[key]= 192 and s->worst_seeks= 21:

    tmp= table->file->read_time(key, 1,
                                (ha_rows) MY_MIN(tmp,s->worst_seeks));

rows + ranges = 21 + 1 = 22.

Comment by Sergei Petrunia [ 2014-07-03 ]

Debugging further, I find two problems.
The first is this piece of code

        /*
          We assume that each of the tested indexes is not correlated
          with ref_key. Thus, to select first N records we have to scan
          N/selectivity(ref_key) index entries. 
          selectivity(ref_key) = #scanned_records/#table_records =
          refkey_rows_estimate/table_records.
          In any case we can't select more than #table_records.
          N/(refkey_rows_estimate/table_records) > table_records
          <=> N > refkey_rows_estimate.
         */
        if (select_limit > refkey_rows_estimate)
          select_limit= table_records;
        else
          select_limit= (ha_rows) (select_limit *
                                   (double) table_records /
                                    refkey_rows_estimate);

we have

refkey_rows_estimate=192
select_limit=1000
table_records=1640

if we assume that condition on the index being tested (ix_fd1_fdpk) is not correlated with ref_key (ix_fd1_fd2), then

where_selectivity= refkey_rows_estimate / table_records ≈ 0.1

"Every 10th record matches the WHERE clause." To get select_limit matching records, we'd need to read select_limit / where_selectivity = 10000 rows. The table has only table_records=1640 rows, so we set select_limit=1640.

So, we don't take correlation into account, and that causes us to use the wrong LIMIT value.

I've tried setting the right LIMIT value in the debugger, and the optimizer still picks the wrong plan.

Comment by Sergei Petrunia [ 2014-07-03 ]

The second problem is here:

        /*
          Here we take into account the fact that rows are
          accessed in sequences rec_per_key records in each.
          Rows in such a sequence are supposed to be ordered
          by rowid/primary key. When reading the data
          in a sequence we'll touch not more pages than the
          table file contains.
          TODO. Use the formula for a disk sweep sequential access
          to calculate the cost of accessing data rows for one 
          index entry.
        */
        index_scan_time= select_limit/rec_per_key *
                         MY_MIN(rec_per_key, table->file->scan_time());

This is where we get index_scan_time=1640 (or 1000, if I force select_limit=1000 as in the previous comment).

This totally ignores the fact that the index in question allows a potential range access with a much lower #rows and cost.

Comment by Sergei Petrunia [ 2014-07-03 ]

So, if we set select_limit = table->quick_rows[nr] = 192 right before the line quoted above, that still doesn't help; we get:

(gdb) print index_scan_time
  $163 = 192
(gdb) print read_time
  $164 = 22

Comment by Sergei Petrunia [ 2014-07-03 ]

Let's compare cost numbers from different parts of the optimizer:

== range optimizer ==
#rows=192
idx=1: cost=231 # {io_count = 193, avg_io_cost = 1, cpu_cost = 38.409999999999997,
idx=2: cost=231

== ref optimizer (best_access_path() function) ==
#rows is reused from table->quick_rows[key]=192
idx=1: cost=22 # this is because of s->worst_seeks=21
idx=2: cost=22

== ORDER BY optimizer ==
#if we manually set select_limit=table->quick_rows[key]=192
index_scan_time= 192.

== Conclusions ==
It looks like different cost formulas are used in different parts of the optimizer.

Comment by Sergei Petrunia [ 2014-07-03 ]

Unification of the cost model across the optimizer is definitely outside the scope of this bugfix. As an interim solution:

  • let the range optimizer save quick_cost[idx] (like it already saves table->quick_rows[idx]);
  • let best_access_path() save the costs of the ref(const) accesses it considers;
  • let test_if_cheaper_ordering() reuse the cost of the ref(const) or range access;
  • test_if_cheaper_ordering() should take both limit_rows and table->quick_rows[key] into account.

  range_or_ref_cost= have_const_ref_cost[key] ? const_ref_cost[key] : range_cost[key];
  if (limit_rows > table->quick_rows[key])
  {
    // We will not have to read limit_rows: range estimates say
    // there will be at most table->quick_rows[key] rows.
    cost= range_or_ref_cost;
  }
  else
  {
    double fraction_to_read= (double) limit_rows / table->quick_rows[key];
    cost= range_or_ref_cost * fraction_to_read;
  }

Comment by Sergei Petrunia [ 2014-07-07 ]

There are issues with the above approach. If we solve it the simple way, we will need to allocate arrays to store potential ref(const) costs for each index of each used table.

We could do some pre-processing to make the set of potentially usable indexes smaller, but that would require touching a lot of optimizer code.

How about an alternative: re-calculate the costs in test_if_cheaper_ordering(), on demand.

Comment by Sergei Petrunia [ 2014-07-07 ]

In order to re-calculate the costs, we need to tell whether we're calculating ref(const) costs or range access costs.

table->quick_rows/quick_costs can provide range access costs and #rows.
As for ref access, we need to be able to detect whether ref(const) was
applicable and what its cost was.

Note that a potential range access over one interval is not the same as a ref(const) access. Consider:

  t.key IN (10, 20) AND t.key > 15

This is a one-interval range access, but ref(const) is not considered for this query.

Q: Could we reuse best_access_path()?
A: Hardly. We need to consider ref(const) over one specific index. For range access, we likewise need to consider only one specific index (not just the best possible range access, as is done in best_access_path()).

Comment by Sergei Petrunia [ 2014-07-08 ]

Committed a patch: http://lists.askmonty.org/pipermail/commits/2014-July/006282.html

Comment by Sergei Petrunia [ 2014-07-10 ]

The patch is going through QA.

Comment by Sergei Petrunia [ 2014-08-27 ]

See the updates at MDEV-6454. After some input from QA, the patch was modified. It is now being tested again.

One problem with this fix is that it is getting too big for 10.0. It is OK to include it in 10.1, but pushing this change into 10.0 might cause an unwanted regression for somebody.
