[MDEV-29121] Optimizer does not use the index for SELECT while it gives notable improvement in the execution time when FORCEd Created: 2022-07-18  Updated: 2023-10-30

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: None

Type: Task Priority: Major
Reporter: Valerii Kravchuk Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File mdev29121-try-adding-indexes.sql     Text File optimizer_trace_explain_without_force_index_2022-07-18.txt    
Issue Links:
Relates
relates to MDEV-32358 Improve optimizer cost model to take ... Open

 Description   

For the specific table and data:

> show create table mll_app_magna_pure
Create Table: CREATE TABLE `mll_app_magna_pure` (
`mp_id` char(36) NOT NULL DEFAULT '',
`mp_create_date` bigint(20) unsigned NOT NULL,
`mp_create_user` varchar(40) NOT NULL DEFAULT '',
`mp_change_date` bigint(20) unsigned NOT NULL,
`mp_change_user` varchar(40) NOT NULL DEFAULT '',
`mp_valid_from` bigint(20) unsigned NOT NULL,
`mp_valid_to` bigint(20) unsigned NOT NULL,
`mp_outputplate` varchar(30) DEFAULT NULL,
`mp_transferplate` varchar(20) DEFAULT NULL,
KEY `mp_id_idx` (`mp_id`),
KEY `mp_transferplate_idx` (`mp_valid_to`,`mp_transferplate`,`mp_outputplate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
 
> select count(*) from mll_app_magna_pure;
1408028
 
> select count(*) from mll_app_magna_pure where mp_valid_to=99991231235555;
806751
 
> SELECT count(*) FROM mll_app_magna_pure WHERE mp_valid_to=99991231235555 AND mp_transferplate='0502212000095962';
0

the index is not used for this query:

> ANALYZE FORMAT=JSON
SELECT SQL_NO_CACHE * FROM mll_app_magna_pure WHERE mp_valid_to=99991231235555 AND (mp_transferplate='0502212000095962' OR mp_outputplate='0502212000095962')\G
ANALYZE
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 4160.197218,
"table": {
"table_name": "mll_app_magna_pure",
"access_type": "ALL",
"possible_keys": [
"mp_sample_no_idx",
"mp_valid_to",
"mp_status_barcode_idx",
"mp_exas_id_idx",
"mp_cdna_charge_idx",
"mp_transferplate_idx",
"mp_mpr_no_idx"
],
"r_loops": 1,
"rows": 1364085,
"r_rows": 1412113,
"r_table_time_ms": 3975.964708,
"r_other_time_ms": 184.1642187,
"filtered": 49.99996185,
"r_filtered": 0.00325753,
"attached_condition": "mll_app_magna_pure.mp_valid_to = 99991231235555 and (mll_app_magna_pure.mp_transferplate = '0502212000095962' or mll_app_magna_pure.mp_outputplate = '0502212000095962')"
}
}
}

even though WHERE clause is covered by the index (in a way) and the index is clearly beneficial when forced:

> ANALYZE FORMAT=JSON
SELECT SQL_NO_CACHE * FROM mll_app_magna_pure FORCE INDEX (`mp_transferplate_idx`) WHERE mp_valid_to=99991231235555 AND (mp_transferplate='0502212000095962' OR mp_outputplate='0502212000095962');
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 162.173876,
"table": {
"table_name": "mll_app_magna_pure",
"access_type": "ref",
"possible_keys": ["mp_transferplate_idx"],
"key": "mp_transferplate_idx",
"key_length": "8",
"used_key_parts": ["mp_valid_to"],
"ref": ["const"],
"r_loops": 1,
"rows": 632958,
"r_rows": 46,
"r_table_time_ms": 161.9810586,
"r_other_time_ms": 0.135228766,
"filtered": 100,
"r_filtered": 100,
"index_condition": "mll_app_magna_pure.mp_transferplate = '0502212000095962' or mll_app_magna_pure.mp_outputplate = '0502212000095962'"
}
}
}

and is used by the optimizer for other content of the table:

MariaDB [test]> explain SELECT SQL_NO_CACHE * FROM mll_app_magna_pure WHERE mp_valid_to=99991231235555 AND (mp_transferplate='0502212000095962' OR mp_outputplate='0502212000095962');
+------+-------------+--------------------+------+----------------------+----------------------+---------+-------+------+-----------------------+
| id   | select_type | table              | type | possible_keys        | key                  | key_len | ref   | rows | Extra                 |
+------+-------------+--------------------+------+----------------------+----------------------+---------+-------+------+-----------------------+
|    1 | SIMPLE      | mll_app_magna_pure | ref  | mp_transferplate_idx | mp_transferplate_idx | 8       | const | 1    | Using index condition |
+------+-------------+--------------------+------+----------------------+----------------------+---------+-------+------+-----------------------+
1 row in set (0.037 sec)

The statistics for the table in the real case (InnoDB persistent one) is close to reality and was collected over as many sample pages as we have data pages. But looks like number of rows that match the condition on the first column in the index forces optimizer to ignore it.



 Comments   
Comment by Sergei Petrunia [ 2022-10-09 ]

The report says:

the index is not used for this query:

> ANALYZE FORMAT=JSON
SELECT SQL_NO_CACHE * 
FROM mll_app_magna_pure 
WHERE 
  mp_valid_to=99991231235555 AND 
  (mp_transferplate='0502212000095962' OR mp_outputplate='0502212000095962')\G

but then:

and is used by the optimizer for other content of the table:

MariaDB [test]> 
explain SELECT SQL_NO_CACHE * 
FROM mll_app_magna_pure 
WHERE 
  mp_valid_to=99991231235555 AND 
  (mp_transferplate='0502212000095962' OR mp_outputplate='0502212000095962');

(I've formatted the query in both cases).

But it's the same query? Please clarify this.

Comment by Sergei Petrunia [ 2022-10-09 ]

There is only one index that's usable for the query:

KEY `mp_transferplate_idx` (`mp_valid_to`,`mp_transferplate`,`mp_outputplate`)

The WHERE condition is:

WHERE 
  mp_valid_to=99991231235555 AND -- keypart#1 
  (mp_transferplate='0502212000095962'  -- keypart#2
   OR 
   mp_outputplate='0502212000095962') -- keypart#3

.

range access is only able to use the first restriction:

mp_valid_to=99991231235555

it matches 806K rows (out of 1.4M rows in the table).

Comment by Sergei Petrunia [ 2022-10-09 ]

however, as ANALYZE output with FORCE INDEX shows, the whole WHERE is very selective, it matches just 46 rows:

  "key": "mp_transferplate_idx",
  "key_length": "8",
  "used_key_parts": ["mp_valid_to"],
  "ref": ["const"],
  "rows": 632958,
  "r_rows": 46,
  "r_table_time_ms": 161.9810586,
  "index_condition": "mll_app_magna_pure.mp_transferplate = '0502212000095962' 
                     or mll_app_magna_pure.mp_outputplate = '0502212000095962'"

That is, the pushed index condition is very selective. It filters out most of the rows.

Comment by Sergei Petrunia [ 2022-10-09 ]

One can imagine the optimizer figuring out that ICP condition would be very selective but I don't expect us being able to add this feature soon (and even if we added it, it would require some extra statistics).

Comment by Sergei Petrunia [ 2022-10-09 ]

Why not just add indexes?

If the conditions on mp_transferplate or mp_outputplate are always very selective, use:

alter table mll_app_magna_pure add key (mp_transferplate);
alter table mll_app_magna_pure add key (mp_outputplate);

Alternatively can add one index

alter table mll_app_magna_pure (`mp_valid_to`,`mp_outputplate`);

Then, index_merge will be used and I expect the queries to run very fast.

Comment by Valerii Kravchuk [ 2022-10-10 ]

I've asked customer to provide some outputs to clarify the impact of your new indexes suggested. But I think that adding some statistics to help optimizer decide that ICP in this case would produce a very selective condition is a way to proceed here, no matter what workaround (FORCE or new indexes) may help in this specific case.

Comment by Ralf Gebhardt [ 2022-10-18 ]

psergei, I am changing this to a tasks for now with a lower priority, given that a solution was provided. Please comment if this is a edge case use case.

Generated at Thu Feb 08 10:06:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.