[MDEV-29151] Index choice depends on index definition order Created: 2022-07-22  Updated: 2023-10-20  Resolved: 2023-10-06

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.4.22, 10.4.26, 10.5.17
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Oleg Smirnov
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: File MDEV-29151-pages_accessed.test     File wip-newph.test    
Issue Links:
Relates
relates to MDEV-32286 ANALYZE displays a huge number of Inn... Confirmed
relates to MDEV-32358 Improve optimizer cost model to take ... Open

 Description   

Consider this primitive example:

MariaDB [test]> drop table t;
Query OK, 0 rows affected (0.012 sec)
 
MariaDB [test]> create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int, key i1(c1), key i2(c1,c2,c3,c4));
Query OK, 0 rows affected (0.025 sec)
 
MariaDB [test]> explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
+------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
|    1 | SIMPLE      | t     | ref  | i1,i2         | i1   | 5       | const | 1    | Using where |
+------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.001 sec)
 
MariaDB [test]> drop table t;
Query OK, 0 rows affected (0.020 sec)
 
MariaDB [test]> create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int, key i2(c1,c2,c3,c4), key i1(c1));
Query OK, 0 rows affected (0.029 sec)
 
MariaDB [test]> explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
+------+-------------+-------+------+---------------+------+---------+-------------+------+-----------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                 |
+------+-------------+-------+------+---------------+------+---------+-------------+------+-----------------------+
|    1 | SIMPLE      | t     | ref  | i2,i1         | i2   | 10      | const,const | 1    | Using index condition |
+------+-------------+-------+------+---------------+------+---------+-------------+------+-----------------------+
1 row in set (0.001 sec)

It shows that optimizer choice depends on order of indexes definition and may be suboptimal. On real data using multiple column index may give much better performance (or not), the choice should be statistics-based and deterministic no matter in what order the indexes are added.



 Comments   
Comment by Sergei Petrunia [ 2023-09-17 ]

Does index statistics indicate that one index is better than the other here? If it does, then indeed it's a shortcoming in the optimizer.
We also need to check if newer versions still have this property or not.

oleg.smirnov, can you investigate please?

Comment by Oleg Smirnov [ 2023-09-20 ]

I confirm this bug is present in the recent 10.4. At least a few rows of data are required to reproduce the initial example:

--source include/have_sequence.inc
 
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
                key i1(c1), key i2(c1,c2,c3,c4));
insert into t select seq, seq+1, seq+2, seq+3, seq+4, seq+5 from seq_1_to_10;
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
drop table t;
 
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
                key i2(c1,c2,c3,c4), key i1(c1));
insert into t select seq, seq+1, seq+2, seq+3, seq+4, seq+5 from seq_1_to_10;
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
drop table t;

Comment by Oleg Smirnov [ 2023-09-22 ]

Sorry, my last confirmation of the bug was not correct. If the costs of some execution plans are the same, the optimizer may choose any one of them. In this case the order indexes were created affects which one is chosen.

Moreover, adding more realistic data to the dataset makes the optimizer choose index i2 in both cases:

create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
                key i1(c1), key i2(c1,c2,c3,c4));
insert into t select seq, seq/10, seq/100, seq/10, seq, seq from seq_1_to_10000;
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5) group by c1;
drop table t;
 
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
               key i2(c1,c2,c3,c4), key i1(c1));
insert into t select seq, seq/10, seq/100, seq/10, seq, seq from seq_1_to_10000;
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5) group by c1;
drop table t;

explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5) group by c1;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t	ref	i2,i1	i2	10	const,const	1	Using index condition

So such a simplification of the customer issue if not completely valid.

Comment by Valerii Kravchuk [ 2023-09-22 ]

The following test case based on your previous one with ANALYZE TABLE added produces different results on recent enough 10.5 (no dependency on order of indexes creation) vs 10.4:

Yuliyas-Air:mysql-test Valerii$ ./mtr innodb.mdev29151
Logging: ./mtr  innodb.mdev29151
VS config: 
vardir: /Users/Valerii/dbs/maria10.5/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/Users/Valerii/dbs/maria10.5/mysql-test/var'...
Checking supported features...
MariaDB Version 10.5.23-MariaDB
 - SSL connections supported
 - binaries built with wsrep patch
Collecting tests...
Installing system database...
 
==============================================================================
 
TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
 
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
key i1(c1), key i2(c1,c2,c3,c4));
insert into t select seq, seq+1, seq+2, seq+3, seq+4, seq+5 from seq_1_to_10;
analyze table t;
Table	Op	Msg_type	Msg_text
test.t	analyze	status	Engine-independent statistics collected
test.t	analyze	status	Table is already up to date
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t	ref	i1,i2	i1	5	const	1	Using where
drop table t;
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
key i2(c1,c2,c3,c4), key i1(c1));
insert into t select seq, seq+1, seq+2, seq+3, seq+4, seq+5 from seq_1_to_10;
analyze table t;
Table	Op	Msg_type	Msg_text
test.t	analyze	status	Engine-independent statistics collected
test.t	analyze	status	Table is already up to date
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t	ref	i2,i1	i1	5	const	1	Using where
drop table t;
innodb.mdev29151                         [ pass ]     50
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.050 of 11 seconds executing testcases
 
Completed: All 1 tests were successful.
 
Yuliyas-Air:mysql-test Valerii$ cd -
/Users/Valerii/dbs/maria10.4/mysql-test
Yuliyas-Air:mysql-test Valerii$ ./mtr innodb.mdev29151
Logging: ./mtr  innodb.mdev29151
VS config: 
vardir: /Users/Valerii/dbs/maria10.4/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/Users/Valerii/dbs/maria10.4/mysql-test/var'...
Checking supported features...
MariaDB Version 10.4.32-MariaDB
 - SSL connections supported
 - binaries built with wsrep patch
Collecting tests...
Installing system database...
 
==============================================================================
 
TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
 
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
key i1(c1), key i2(c1,c2,c3,c4));
insert into t select seq, seq+1, seq+2, seq+3, seq+4, seq+5 from seq_1_to_10;
analyze table t;
Table	Op	Msg_type	Msg_text
test.t	analyze	status	Engine-independent statistics collected
test.t	analyze	status	Table is already up to date
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t	ref	i1,i2	i1	5	const	1	Using where
drop table t;
create table t (id int primary key, c1 int, c2 int, c3 int, c4 int, c5 int,
key i2(c1,c2,c3,c4), key i1(c1));
insert into t select seq, seq+1, seq+2, seq+3, seq+4, seq+5 from seq_1_to_10;
analyze table t;
Table	Op	Msg_type	Msg_text
test.t	analyze	status	Engine-independent statistics collected
test.t	analyze	status	Table is already up to date
explain select * from t where c1 = 1 and c2 = 1 and (c3 = 0 or c4 = 5);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t	ref	i2,i1	i2	10	const,const	1	Using index condition
drop table t;
innodb.mdev29151                         [ pass ]     36
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.036 of 10 seconds executing testcases
 
Completed: All 1 tests were successful.

I'd say 10.4 still has a problem to resolve.

Comment by Oleg Smirnov [ 2023-09-23 ]

Here is a sample script intended to be more or less close to the customer's data:

CREATE TABLE t1 (
 `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `domain` char(2) NOT NULL,
 `cnk` mediumint(7) unsigned zerofill NOT NULL,
 `start_date` datetime NOT NULL,
 `end_date` datetime DEFAULT NULL,
 `promo_type` tinyint(1) unsigned NOT NULL,
 `sale_price` decimal(8,2) unsigned NOT NULL,
 `sale_price_eur` decimal(8,2) unsigned NOT NULL,
 `reference_price` decimal(8,2) DEFAULT NULL,
 `reference_price_eur` decimal(8,2) DEFAULT NULL,
 `crossed_price` decimal(8,2) DEFAULT NULL,
 `crossed_price_eur` decimal(8,2) DEFAULT NULL,
 `price_indication_directive` decimal(8,2) DEFAULT NULL,
 `currency_code` char(3) NOT NULL,
 `conversion_rate` decimal(8,4) NOT NULL,
 `force_medical` tinyint(1) unsigned NOT NULL DEFAULT 0,
 `true_promo` tinyint(1) unsigned NOT NULL DEFAULT 0,
 `min_stock` int(10) unsigned NOT NULL DEFAULT 1,
 `remark` text DEFAULT NULL,
 `creation_date` datetime NOT NULL DEFAULT current_timestamp(),
 `import_file_name` varchar(255) NOT NULL,
 `status` tinyint(1) unsigned NOT NULL DEFAULT 1 COMMENT '1 enable, 0 disable',
 `row_start` datetime NOT NULL DEFAULT current_timestamp(),
 `row_end` datetime NOT NULL DEFAULT current_timestamp(),
 PRIMARY KEY (`id`),
 KEY `cnk` (`cnk`),
 KEY `promo_type` (`promo_type`),
 KEY `create` (`creation_date`),
 KEY `cnk_domain_promo_type` (`cnk`,`domain`,`promo_type`,`start_date`,`end_date`,`status`) USING BTREE,
 KEY `domain` (`domain`),
 KEY `active_epims` (`domain`,`status`,`start_date`,`end_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
create table domains (id int primary key, domain char(2));
insert into domains values (1, 'ab'),(2, 'ac'),(3, 'ad'), (4, 'ba'), (5, 'bb'),
  (6, 'be'), (7, 'ca'), (8, 'cb'), (9, 'cc'), (10, 'cd');
 
insert into t1
  (id, domain, 
  status, start_date, end_date, cnk,
  promo_type, sale_price, sale_price_eur, currency_code, conversion_rate, force_medical,
  true_promo, min_stock, creation_date, import_file_name)
select 
  seq, (select domain from domains d where d.id = (s.seq mod 10 + 1)), 
  seq mod 2, date_add('2022-07-18 00:00:00', interval s.seq/100 minute), 
    date_add('2022-07-18 00:00:00', interval s.seq/100 minute), seq/100,
  seq mod 2 + 1, seq/1000, seq/1000, 'EUR', 1.07, 0,
  1, seq/100, date_add('2022-07-16 00:00:00', interval s.seq minute), 'filename1.txt'
  from seq_1_to_1000000 s;
 
analyze table t1 persistent for all;

First let's disable Index Condition Pushdown:

SET optimizer_switch='index_condition_pushdown=off';

and run the statement:

analyze format=json SELECT SQL_NO_CACHE
 `cnk`, `domain`, IFNULL(`reference_price`, 0), -1
FROM t1
WHERE `domain` = 'be' AND status = 1 AND 
  (`start_date`= '2022-07-18 00:00:00' OR `end_date` = '2022-07-18 00:00:00')
GROUP BY cnk;

Optimizer chooses index "domain":

| {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 3678.5,
    "table": {
      "table_name": "t1",
      "access_type": "ref",
      "possible_keys": ["domain", "active_epims"],
      "key": "domain",
      "key_length": "6",
      "used_key_parts": ["domain"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 179670,
      "r_rows": 100000,
      "r_total_time_ms": 3635.9,
      "filtered": 8.9838,
      "r_filtered": 0.005,
      "attached_condition": "t1.`status` = 1 and t1.domain = 'be' and (t1.start_date = TIMESTAMP'2022-07-18 00:00:00' or t1.end_date = TIMESTAMP'2022-07-18 00:00:00')"
    }
  }
} |
1 row in set (3.677 sec)

If we force using the multi-column index "active_epims":

analyze format=json SELECT SQL_NO_CACHE
 cnk, domain, IFNULL(reference_price, 0), -1
 FROM t1
 FORCE INDEX(active_epims)
 WHERE domain = 'be' AND status = 1 AND 
   (start_date= '2022-07-18 00:00:00' OR end_date = '2022-07-18 00:00:00')
GROUP BY cnk;

, the results are slightly worse:

  
"query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 3791.9,
    "table": {
      "table_name": "t1",
      "access_type": "ref",
      "possible_keys": ["active_epims"],
      "key": "active_epims",
      "key_length": "7",
      "used_key_parts": ["domain", "status"],
      "ref": ["const", "const"],
      "r_loops": 1,
      "rows": 181048,
      "r_rows": 100000,
      "r_total_time_ms": 3753.5,
      "filtered": 100,
      "r_filtered": 0.005,
      "attached_condition": "t1.domain = 'be' and (t1.start_date = TIMESTAMP'2022-07-18 00:00:00' or t1.end_date = TIMESTAMP'2022-07-18 00:00:00')"
    }
  }
} |
1 row in set (3.789 sec)

Let's enable the Index Condition Pushdown back:

SET optimizer_switch='index_condition_pushdown=on';

And the performance of the query where the index "active_epims" is forced increases drastically:

  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 205.35,
    "table": {
      "table_name": "t1",
      "access_type": "ref",
      "possible_keys": ["active_epims"],
      "key": "active_epims",
      "key_length": "7",
      "used_key_parts": ["domain", "status"],
      "ref": ["const", "const"],
      "r_loops": 1,
      "rows": 181048,
      "r_rows": 5,
      "r_total_time_ms": 205.28,
      "filtered": 100,
      "r_filtered": 100,
      "index_condition": "t1.domain = 'be' and (t1.start_date = TIMESTAMP'2022-07-18 00:00:00' or t1.end_date = TIMESTAMP'2022-07-18 00:00:00')"
    }
  }
} |
1 row in set (0.208 sec)

In fact, ICP (Index Condition Pushdown) very effectively filters out rows leaving only 5 of them ("r_rows": 5). But the problem is the optimizer does not take it into account (or does it incorrectly), and the optimizer trace shows that the cost of using both indexes equals to 1001. Since the costs are equal, optimizer chooses the first index from the dictionary in order of creation, so it's either "domain" or "active_epims" which is observed by the customer.

Now there is the question should the optimizer evaluate the cost of ICP and is it possible at all?

Comment by Sergei Petrunia [ 2023-09-25 ]

The first thing to take into account here is selectivity of pushed index condition. Which will affect the number of full rows the query reads, which affects the execution speed.

The optimizer is not currently able to take into account the selectivity of pushed index condition. It won't be easy to add this, as pushed index condition is computed after the query plan is picked.

The selective part of pushed condition here is:

(t1.start_date = TIMESTAMP'2022-07-18 00:00:00' or t1.end_date = TIMESTAMP'2022-07-18 00:00:00')

Note the "OR", one will need to combine selectivities of both OR-ed conditions...

If one had histograms on both columns... one could compute the selectivity of condition like the above... But again: it would be rather hard to use the selectivity value as Pushed Index Condition is inferred from the WHERE clause after the join order has been picked.

Comment by Oleg Smirnov [ 2023-09-25 ]

Running same tests on 11.3: optimizer chooses full scan of "t1" instead of using any indexes:

MariaDB [test]> analyze SELECT SQL_NO_CACHE  cnk, domain, IFNULL(reference_price, 0), -1 FROM t1 WHERE domain = 'be' AND status = 1 AND    (start_date= '2022-07-18 00:00:00' OR end_date = '2022-07-18 00:00:00') GROUP BY cnk;
+------+-------------+-------+------+---------------------+------+---------+------+---------+------------+----------+------------+----------------------------------------------+
| id   | select_type | table | type | possible_keys       | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra                                        |
+------+-------------+-------+------+---------------------+------+---------+------+---------+------------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | t1    | ALL  | domain,active_epims | NULL | NULL    | NULL | 1000000 | 1000000.00 |    18.35 |       0.00 | Using where; Using temporary; Using filesort |
+------+-------------+-------+------+---------------------+------+---------+------+---------+------------+----------+------------+----------------------------------------------+
1 row in set (4.102 sec)
 
MariaDB [test]> analyze SELECT SQL_NO_CACHE  cnk, domain, IFNULL(reference_price, 0), -1 FROM t1 FORCE INDEX(domain) WHERE domain = 'be' AND status = 1 AND  (start_date= '2022-07-18 00:00:00' OR end_date = '2022-07-18 00:00:00') GROUP BY cnk;
+------+-------------+-------+------+---------------+--------+---------+-------+--------+-----------+----------+------------+----------------------------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows   | r_rows    | filtered | r_filtered | Extra                                        |
+------+-------------+-------+------+---------------+--------+---------+-------+--------+-----------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | t1    | ref  | domain        | domain | 6       | const | 199378 | 100000.00 |    50.00 |       0.01 | Using where; Using temporary; Using filesort |
+------+-------------+-------+------+---------------+--------+---------+-------+--------+-----------+----------+------------+----------------------------------------------+
1 row in set (1.923 sec)
 
MariaDB [test]> analyze SELECT SQL_NO_CACHE  cnk, domain, IFNULL(reference_price, 0), -1 FROM t1 FORCE INDEX(active_epims) WHERE domain = 'be' AND status = 1 AND  (start_date= '2022-07-18 00:00:00' OR end_date = '2022-07-18 00:00:00') GROUP BY cnk;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-----------+----------+------------+----------------------------------------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | r_rows    | filtered | r_filtered | Extra                                        |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-----------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | t1    | ref  | active_epims  | active_epims | 7       | const,const | 183462 | 100000.00 |   100.00 |       0.01 | Using where; Using temporary; Using filesort |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-----------+----------+------------+----------------------------------------------+
1 row in set (1.931 sec)

Comment by Marko Mäkelä [ 2023-09-28 ]

MDEV-29151-pages_accessed.test is a simplified and smaller variant of wip-newph.test that shows a huge difference on InnoDB pages_accessed: 200079 for the first query and 28 for the second query.

Comment by Marko Mäkelä [ 2023-09-28 ]

I debugged the pages_accessed anomaly. It is a bug in MDEV-31577, but outside InnoDB. InnoDB is merely incrementing a thread-local counter. I tested it by removing the DROP TABLE t1 statement from the end of MDEV-29151-pages_accessed.test and then starting a new server on the data directory that was left behind by the test. I did disassemble buf_page_get_low to find the place where the thread-local counter is being incremented (easy: on AMD64 Linux, thread-local variables use the fs register):

   0x000055555668f85b <+107>:	call   0x555556664080 <__tls_init()>
   0x000055555668f860 <+112>:	mov    %fs:0x0,%rax
   0x000055555668f869 <+121>:	lea    -0x78(%rax),%rax
   0x000055555668f870 <+128>:	mov    %r12,%rdi
   0x000055555668f873 <+131>:	mov    0x3f46d6(%rip),%rcx        # 0x555556a83f50 <buf_pool+16784>
   0x000055555668f87a <+138>:	xor    %edx,%edx
   0x000055555668f87c <+140>:	shr    $0x20,%rdi
   0x000055555668f880 <+144>:	mov    (%rax),%rax
   0x000055555668f883 <+147>:	incq   (%rax)

I set a breakpoint to the incq instruction and then set a hardware watch on the address that $rax would point to. Then I would execute the analyze format=json select … that does not use FORCE INDEX. I observed that the counter would never be reset between statements, and InnoDB would increment it for each invocation of the statement from where it was left, by 27 steps.

The bug is that ANALYZE FORMAT=JSON is reporting the wrong counter value. Even if I use FORCE INDEX, the thread-local counter will not be reset; InnoDB will just keep incrementing it. I guess that the SQL layer is reporting the difference between the previous value and the new value. That difference is being computed incorrectly for the funnily acting statement.

Even for the ANALYZE FORMAT=JSON…FORCE INDEX statement, I see something funny. InnoDB would increment the counter by 26, but the output claims pages_accessed to be 2 more. I also got pages_accessed reported as 3 for a simple ANALYZE FORMAT=JSON SELECT * FROM t1 LIMIT 1;. If there are no BLOBs, that should be a single page access.

oleg.smirnov, please file a separate bug for these anomalies and make sure that some mtr coverage of pages_accessed will be added in the regression test.

Comment by Oleg Smirnov [ 2023-10-06 ]

MDEV-32286 has been created for the pages_accessed anomaly, otherwise it seems to be nothing to fix here. To resume: better performance of an access by key was caused by an effective ICP, but the optimizer doesn't evaluate the ICP impact when calculating costs (MDEV-32358 created).

What about 11.0 and higher, where the optimizer prefers full scan of the table instead of using any of the indexes: on a release build the difference of the execution times is not significant, so I don't think this can be considered as a bug.

Generated at Thu Feb 08 10:06:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.