[MDEV-588] LP:1005898 - index_merge/intersection is used when ref(const) is faster Created: 2012-05-29  Updated: 2017-10-16  Resolved: 2017-10-16

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.2.14, 5.3.12
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Not a Bug Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug1005898.xml    

 Description   

The queries were provided by Stephane Varoqui:

SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+--------+
| amount |
+--------+
|   9552 |
+--------+
1 row in set (1.03 sec)
 
mysql> set optimizer_switch="index_merge=off";Query OK, 0 rows affected (0.00 sec)
mysql> SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+--------+
| amount |
+--------+
|   9552 |
+--------+
1 row in set (0.03 sec)
 
mysql> SELECT SQL_NO_CACHE count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+--------+
| amount |
+--------+
|   9552 |
+--------+

The dataset is lots.tgz, uploaded to ftp.askmonty.org/private/

EXPLAIN outputs and query time:

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| id | select_type | table | type        | possible_keys           | key                     | key_len | ref  | rows | Extra                                                              |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
|  1 | SIMPLE      | lots  | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5     | NULL | 3257 | Using intersect(contractNumber,tsClosed); Using where; Using index |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
1.85 sec.

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots ignore index(tsClosed) WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys  | key            | key_len | ref   | rows  | Extra       |
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
|  1 | SIMPLE      | lots  | ref  | contractNumber | contractNumber | 5       | const | 28600 | Using where |
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
0.30 sec

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots ignore index(contractNumber) WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref   | rows   | Extra                              |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
|  1 | SIMPLE      | lots  | ref  | tsClosed      | tsClosed | 5       | const | 243422 | Using index condition; Using where |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
4.50 sec

As one can see, index_merge/intersect is about 1.85/0.30=6 times slower than ref access on contractNumber.



 Comments   
Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
Let's explore the dataset: here's numbers of matching records for both parts of the WHERE:

Total rows: 2 137 152
lots.tsClosed IS NULL - 544 288 (estimate: 243 422)
contractNumber='1478876' - 30 000 (estimate: 28 600)
contractNumber='1478876' AND lots.tsClosed IS NULL - 10 000 (index_merge's estimate: 3257)

  • index_merge's estimate of 3257 was obtained assuming both parts of WHERE are not correlated. In fact, they're strongly correlated.
  • Estimate for "lots.tsClosed IS NULL" misses the real value by the order of two.
Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
If I put a breakpoint in ha_myisam::records_in_range() and return the exact number for records_in_range("lots.tsClosed IS NULL") call, index_merge will still be used:

MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
--------------------------------------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

--------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 7283 Using intersect(contractNumber,tsClosed); Using where; Using index

--------------------------------------------------------------------------------------------------------------------------------------------------------------+

Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
In MariaDB 5.2, I get:

MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876';
---------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

---------------------------------------------------------------------------------------------+

1 SIMPLE lots ref contractNumber contractNumber 5 const 28600 Using where; Using index

---------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE lots.tsClosed IS NULL;
---------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

---------------------------------------------------------------------------------------+

1 SIMPLE lots ref tsClosed tsClosed 5 const 243422 Using where; Using index

---------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
--------------------------------------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

--------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 3257 Using intersect(contractNumber,tsClosed); Using where; Using index

--------------------------------------------------------------------------------------------------------------------------------------------------------------+
contractNumber: 0.30 sec
tsClosed: 4.34 sec
Index_merge: 2.00 sec

Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
Conclusion: MariaDB 5.2 is the same as 5.3 for this bug.

Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
=== Range analyzer ===
lots.tsClosed IS NULL
(gdb) print found_records
$59 = 243422
(gdb) p cost
$58 =

{io_count = 243423, avg_io_cost = 1, cpu_cost = 48684.410000000003, mem_cost = 0, import_cost = 0}

contractNumber='1478876'
(gdb) print found_records
$60 = 28600
(gdb) p cost
$61 =

{io_count = 28601, avg_io_cost = 1, cpu_cost = 5720.0100000000002, mem_cost = 0, import_cost = 0}

As a result, we have:
(gdb) print read_time
$62 = 34321.010000000002
(gdb) print best_records
$63 = 28600

=== Index Merge
T@10 : | | | | | | | | | | | | info: ROR key scans (original): tsClosed,contractNumber
T@10 : | | | | | | | | | | | | info: ROR key scans (ordered): contractNumber,tsClosed

T@10 : | | | | | | | | | | | >ror_intersect_add
T@10 : | | | | | | | | | | | | info: Current out_rows= 2.13715e+06
T@10 : | | | | | | | | | | | | info: Adding scan on contractNumber
T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0
T@10 : | | | | | | | | | | | | >ror_scan_selectivity
T@10 : | | | | | | | | | | | | | info: sel_arg step
T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.0133823
T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.0133823
T@10 : | | | | | | | | | | | | <ror_scan_selectivity
(gdb) p intersect->out_rows
$69 = 28600
(gdb) p intersect->index_records
$70 = 28600
(gdb) p intersect->index_scan_costs
$71 = 602.50860420650099
(gdb) p intersect->total_cost
$72 = 17919.86261827108

T@10 : | | | | | | | | | | | >ror_intersect_add
T@10 : | | | | | | | | | | | | info: Current out_rows= 28600
T@10 : | | | | | | | | | | | | info: Adding scan on tsClosed
T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0
T@10 : | | | | | | | | | | | | >ror_scan_selectivity
T@10 : | | | | | | | | | | | | | info: sel_arg step
T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.1139
T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.1139
T@10 : | | | | | | | | | | | | <ror_scan_selectivity
T@10 : | | | | | | | | | | | | info: ROR-intersect is covering now
(gdb) p intersect->out_rows
$73 = 3257.5451816248915
(gdb) p intersect->index_records
$74 = 272022
(gdb) p intersect->index_scan_costs
$75 = 5723.2619502868065
(gdb) p intersect->total_cost
$76 = 5723.2619502868065

Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
Final comparison of range-vs-index_merge:
(gdb) print min_cost ## this is index_merge
$77 = 5723.2619502868065
(gdb) print read_time ## this is read_time
$78 = 34321.010000000002

Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
Index_merge code assume conditions to be uncorrelated and "tsClosed IS NULL" adds selectivity of " Selectivity multiplier: 0.1139".

The real value of the multiplier is 0.3. If I set that, I ge this plan:

MariaDB [lots]> explain  SELECT count(*) AS amount FROM lots   WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| id | select_type | table | type        | possible_keys           | key                     | key_len | ref  | rows | Extra                                                              |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
|  1 | SIMPLE      | lots  | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5     | NULL | 8580 | Using intersect(contractNumber,tsClosed); Using where; Using index |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+

#rows is closer to reality now (8580 is closer to the real value of 10,000), but still, index_merge is chosen.

Comment by Sergei Petrunia [ 2012-05-29 ]

Re: index_merge/intersection is used when ref(const) is faster
The problem seems to be the mismatch between range and index_merge costs:

Range access costs

  • are calculated in handler::multi_range_read_info_const(), the formula is roughly

handler::read_time() + #rows / TIME_FOR_COMPARE

index_merge costs:

  • are calculated in opt_range.cc.
  • for a 2-way intersection over a MyISAM table the formula is roughly:

handler->keyread_time(index1, ...) + handler->keyread_time(index2, ...) + get_sweep_read_cost(...)

note that the formula only includes index costs, #rows / TIME_FOR_COMPARE was not added.

Comment by Rasmus Johansson (Inactive) [ 2012-05-29 ]

Launchpad bug id: 1005898

Comment by Sergei Golubchik [ 2017-09-25 ]

psergey, is that still an issue?

Generated at Thu Feb 08 06:29:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.