[MDEV-25707] SELECT queries not using index after upgrade to 10.5.9 (from 10.3.21) Created: 2021-05-17 Updated: 2022-01-06 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.5.9 |
| Fix Version/s: | 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maria M Pflaum | Assignee: | Sergei Petrunia |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | eq_range_index_dive_limit, optimizer-hard | ||
| Environment: |
RHEL8 |
||
| Attachments: |
|
| Description |
|
The following type of query is using where in EXPLAIN instead of index for some large quantity of OR's (i.e. on 10.5.9 uses WHERE, on 10.3.x uses INDEX also on 10.5.9 takes many minutes, on 10.3.x takes seconds) query: SELECT col2 FROM mytable WHERE col1=a OR's col1=b ... with lots of OR's If we force the query to use the index, "FORCE INDEX(mytable_col1_idx')", then the 10.5.9 query will finish in time similar to 10.3.x See attached files to reproduce. |
| Comments |
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
I am trying on current 10.5 (10.5.11) and I see this plan:
I get the same on 10.5.9. | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
Table def
The table in the test dataset has 1.7M rows, the query is an OR of 5661 conditions in form col1=NNN OR .... | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
The number of range exceeds eq_range_index_dive_limit, so the estimate of #values in one range is taken from the index statistics. Could ANALYZE TABLE help here? | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
Ok, I was able to reproduce.
Relevant portion of Optimizer trace:
That is, it is not picking the range access because it thinks its cost is bigger than that of full table scan. The number of rows it expects: 416694 | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
Index statistics:
| ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
The estimate of 74 rows per lookup key agrees with the real dataset. However the query itself will return only 107 rows. | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-05-31 ] | ||||||||||||||||||||||||||||||
|
Suggested workaround: set eq_range_index_dive_limit to a value that's higher than the number of elements in the IN-list (5641). Or, set it it to 0, which was the default before MariaDB 10.4.2. | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-06-04 ] | ||||||||||||||||||||||||||||||
|
Note: MySQL-8 suffers from the same issue (with innodb: except when one runs the query after loading the data. After one runs ANALYZE TABLE, the bad query plan is chosen). PostgreSQL picks a good query plan, but it will switch to poor query plan if the query has more "col1=const" disjuncts. | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-06-09 ] | ||||||||||||||||||||||||||||||
|
Notes from the optimizer call: | ||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-06-24 ] | ||||||||||||||||||||||||||||||
The causeThis bug is a consequence of eq_range_index_dive_limit optimization (let's denote it as EQ-OPT). The goal of EQ-OPT was to speed up the query optimization (right! optimization, not execution) by using index statistics estimates (this is table-wide average, cheap to use but imprecise), instead of using so-called "records-in-range estimates" (more precise but expensive to get) for queries that scan a lot of equality ranges. Using imprecise estimates can cause the optimizer to pick a bad query plan. It happens when the reality is far from the estimate. The impactThe optimizer will use bad query plans for workloads that scan the "outliers" - values that are much more or much less common than others. The workaroundSetting the eq_range_index_dive_limit will cause the optimizer to spend more time but construct better query plans. |