[MDEV-18808] Index_merge intersect plan is picked where ref access would be faster Created: 2019-03-04  Updated: 2020-04-27

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: index_merge

Attachments: File test_order_customer.sql    

 Description   

(I am using 10.4 for debugging but this most likely affects earlier versions too)

Use the attached file to fill the dataset. We then have this table:

CREATE TABLE orders (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  customer_id int(10) unsigned DEFAULT NULL,
  is_processed tinyint(3) unsigned DEFAULT NULL,
 
  PRIMARY KEY (id),
  KEY customer_id (customer_id),
  KEY is_processed (is_processed)
) ENGINE=InnoDB;
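The real data comes from the attached test_order_customer.sql, which is not shown here. A hypothetical fill using MariaDB's Sequence engine that approximates the distribution reported below (~1% of rows with is_processed = 0, customer 10000 owning 10000+ orders) might look like:

```sql
-- Hypothetical data fill (an approximation; the actual attachment may differ).
-- seq_1_to_1048576 is provided by MariaDB's Sequence storage engine.
INSERT INTO orders (customer_id, is_processed)
SELECT IF(seq % 100 = 0, 10000, seq % 20000),  -- ~1% of rows go to customer 10000
       IF(seq % 100 = 1, 0, 1)                 -- ~1% of rows get is_processed = 0
FROM seq_1_to_1048576;
```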

The query

SELECT COUNT(id) 
FROM orders
WHERE
  is_processed = 1 AND 
  customer_id = 10000;

Produces this query plan:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: index_merge
possible_keys: customer_id,is_processed
          key: customer_id,is_processed
      key_len: 5,2
          ref: NULL
         rows: 9401
        Extra: Using intersect(customer_id,is_processed); Using where; Using index

This is much slower than just doing a ref access on customer_id.

Data distribution for is_processed is as follows

+--------------+---------------------+
| is_processed | count(is_processed) |
+--------------+---------------------+
|            0 |               10542 |
|            1 |             1038034 |
+--------------+---------------------+

so the condition "is_processed=1" has 99% selectivity (it matches almost every row).
Need to investigate why the optimizer chooses to use it for index_merge.
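For comparison, the faster ref plan can be forced by hiding the is_processed index from the optimizer (a sketch; index names are taken from the CREATE TABLE above):

```sql
-- Force ref access on customer_id by ignoring the is_processed index:
SELECT COUNT(id)
FROM orders IGNORE INDEX (is_processed)
WHERE is_processed = 1 AND customer_id = 10000;
```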



 Comments   
Comment by Sergei Petrunia [ 2019-03-04 ]

Index_merge uses "Using index", and is not reading full table rows, while ref access on customer_id would read full table rows.

This should not be relevant in this case, though, because the condition on is_processed has 99% selectivity. It is hard to believe that scanning 99% of the secondary index is cheaper than reading full table rows for rows that match "customer_id=10000".

This is especially true for the example we are looking at - rows in the table `orders` are not large.

Comment by Sergei Petrunia [ 2019-03-04 ]

Looking at optimizer trace:

"analyzing_range_alternatives": {
...
 
    {
      "index": "is_processed",
      "ranges": ["1 <= is_processed <= 1"],
      "rowid_ordered": true,
      "using_mrr": false,
      "index_only": false,
      "rows": 513861,
      "cost": 642741,
      "chosen": false,
      "cause": "cost"
    }

The estimate for #rows on "is_processed=1" is 500K rows.

(Is this the known problem with InnoDB where records_in_range() may return rows_in_table/2 even when the range covers a much greater portion of the index? See MDEV-17111 for an example; it has been observed on other occasions, too.)

An overly-optimistic estimate from records_in_range() might be one cause, but it is not the sole cause of this issue.
Scanning 500K rows as part of index_merge still seems like a poor choice. Will continue investigating the optimizer part too.
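The trace excerpt above can be reproduced with MariaDB 10.4's optimizer trace, roughly like this (a sketch):

```sql
SET SESSION optimizer_trace='enabled=on';

SELECT COUNT(id) FROM orders WHERE is_processed = 1 AND customer_id = 10000;

-- The trace for the last traced statement is exposed here:
SELECT trace FROM information_schema.OPTIMIZER_TRACE;

SET SESSION optimizer_trace='enabled=off';
```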

Comment by Sergei Petrunia [ 2019-03-04 ]

Query execution times:

SELECT COUNT(id) 
FROM orders ignore index(is_processed) 
WHERE is_processed = 1 AND customer_id = 10000

0.04 sec
Handler_read_key +1
Handler_read_next +10,447

SELECT COUNT(id) 
FROM orders
WHERE is_processed = 1 AND customer_id = 10000

0.22 sec
Handler_read_key +2
Handler_read_next +1,048,476
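The Handler_read_* deltas above can be collected with the session status counters (a sketch; the IGNORE INDEX variant is shown, drop the hint for the index_merge plan):

```sql
FLUSH STATUS;  -- reset the session counters

SELECT COUNT(id)
FROM orders IGNORE INDEX (is_processed)
WHERE is_processed = 1 AND customer_id = 10000;

-- Handler_read_key / Handler_read_next show how many index entries
-- the chosen plan actually touched:
SHOW SESSION STATUS LIKE 'Handler_read%';
```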

Comment by Jesse [ 2020-04-27 ]

A few more tests:

1. Customer with few orders

select count(id) from orders where customer_id IN (9999) and is_processed=1 \G
*************************** 1. row ***************************
count(id): 106
1 row in set (0.002 sec)

2. Customer with 10000+ orders, plus the customer above, in the IN list

select count(id) from orders where customer_id IN (9999,10000) and is_processed=1 \G
*************************** 1. row ***************************
count(id): 10332
1 row in set (0.038 sec)

3. Customer with 10000+ orders and a non-existing customer_id in the IN list

select count(id) from orders where customer_id IN (10000, 234897234) and is_processed=1 \G
*************************** 1. row ***************************
count(id): 10226
1 row in set (0.032 sec)

4. And the slow one: just the customer with 10000+ orders

select count(id) from orders where customer_id IN (10000) and is_processed=1 \G
*************************** 1. row ***************************
count(id): 10226
1 row in set (0.562 sec)

This can be fixed with an index hint for customer_id, or with:

set session optimizer_switch="index_merge_intersection=off"
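The switch only affects the current session and can be restored after the problematic query (assuming the default value is on, which it is in 10.4):

```sql
SET SESSION optimizer_switch='index_merge_intersection=off';

SELECT COUNT(id) FROM orders
WHERE customer_id IN (10000) AND is_processed = 1;

-- restore the default for the rest of the session
SET SESSION optimizer_switch='index_merge_intersection=on';
```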

Generated at Thu Feb 08 08:46:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.