  MariaDB Server / MDEV-17761

Odd optimizer choice with ORDER BY LIMIT and condition selectivity

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Minor
    • Resolution: Unresolved
    • Affects versions: 10.1 (EOL), 10.2 (EOL), 10.3 (EOL), 10.4 (EOL), 10.5, 10.6, 10.7 (EOL), 10.8 (EOL), 10.9 (EOL), 10.10 (EOL)
    • Fix versions: 11.4, 11.8
    • Component: Optimizer

    Description

      The ORDER BY ... LIMIT optimizer displays strange effects when one adds a
      histogram and enables use_condition_selectivity.

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
      create table ten_k(a int);
      insert into ten_k select A.a + 1000 *B.a from one_k A, ten B;
       
      create table t12 (
        a int,
        b int,
        c int,
        filler1 char(255),
        filler2 char(255),
        key(a)
      );
      insert into t12 select a,a,a, a,a from ten_k;
      

      With current default settings and @@optimizer_use_condition_selectivity=1:

      mysql> explain extended select * from t12 where b < 5000;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 9646 |   100.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      

      Ok, filtered=100%, the optimizer has no clue about selectivity.

      Now, the ORDER BY ... LIMIT query:

      mysql> explain extended select * from t12 where b < 5000 order by a limit 600;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------+
      |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 9646 |   100.00 | Using where; Using filesort |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------+
      

      To return 600 rows it will use filesort. (With a lower LIMIT value, e.g.
      LIMIT 400, the optimizer uses the index instead.)

      Now, let's give the optimizer a clue about the condition selectivity:

      set histogram_size=100;
      set use_stat_tables=preferably;
      set optimizer_use_condition_selectivity=4;
      analyze table t12 persistent for columns(b) indexes ();
      

      Now, the optimizer knows about condition selectivity:

      mysql> explain extended select * from t12 where b < 5000 ;
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 10000 |    50.50 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      

      The query plan for the ORDER BY...LIMIT query becomes:

      mysql> explain extended select * from t12 where b < 5000 order by a limit 600;
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t12   | index | NULL          | a    | 5       | NULL |  600 |   100.00 | Using where |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+-------------+
      

      The odd parts about this are:

      (Minor) filtered=100%, even though the optimizer now has information
      about the condition's selectivity.

      (Major) Why did the query plan change from filesort to an index scan?
      The selectivity is ~50%, which means we will need to scan about 2x more
      rows before we find #LIMIT matching rows.
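The 2x figure is simple arithmetic: with selectivity s, one expects to read about LIMIT/s rows in index order before #LIMIT of them match. A minimal sketch of that estimate (the function below is illustrative, not server code):

```python
def expected_rows_scanned(limit, selectivity):
    # Rows read in index order before `limit` of them pass the WHERE clause,
    # assuming matching rows are spread evenly through the index.
    return limit / selectivity

expected_rows_scanned(600, 1.0)    # 600.0: what the optimizer assumes here
expected_rows_scanned(600, 0.505)  # ~1188: about 2x, matching filtered=50.50
```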

      Activity

            psergei Sergei Petrunia added a comment (edited)

            An example with the same use_condition_selectivity setting:

            The optimizer has no clue about the condition's selectivity,
            assumes it to be 100%, and intends to use filesort:

            MariaDB [j1]> explain  select * from t12 where b+0 < 9000 order by a limit 600;
            +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
            +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------+
            |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Using where; Using filesort |
            +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------+
            

            Ok, using an index to find 600 matching rows was not considered a
            good plan; we used a full table scan and filesort.

            Now, change the condition so that the optimizer knows the
            condition's selectivity is 0.9. This means that we will need to
            read more than 600 rows from the index to get the 600 matches.
            And bam, now we decide to use the index for that:

            MariaDB [j1]> explain  select * from t12 where b < 9000 order by a limit 600;
            +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
            | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
            +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
            |    1 | SIMPLE      | t12   | index | NULL          | a    | 5       | NULL |  600 | Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
            

            psergei Sergei Petrunia added a comment (edited)

            The difference comes from here:

            in best_access_path():

            • both queries reach matching_candidates_in_table() call.
            • For both, matching_candidates_in_table()=10K
            • for both, in JOIN_TAB::scan_time(): found_records=10K, read_time=417

            then, tmp=417, and we reach here:

                    /*
                      For each record we have to:
                      - read the whole table record 
                      - skip rows which does not satisfy join condition
                    */
                    tmp= record_count *
                      (tmp +
                       (s->records - rnd_records)/(double) TIME_FOR_COMPARE);
            

            • the "b+0<9K" case has s->records = rnd_records = 10K, so the
              cost remains 417.
            • the "b<9K" case has s->records=10K, rnd_records=9K; the
              difference is 1K, so the cost becomes
              417 + 1000/TIME_FOR_COMPARE ≈ 615.
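The arithmetic can be checked with a toy model of the quoted expression; TIME_FOR_COMPARE=5 is my assumption (the server's traditional constant), which is why the model gives ~617 where the numbers above round to 615:

```python
TIME_FOR_COMPARE = 5  # assumed value of the server constant

def full_scan_cost(scan_time, s_records, rnd_records, record_count=1.0):
    # Models the best_access_path() snippet quoted above: read every row,
    # plus one comparison per row expected to fail the condition.
    return record_count * (scan_time +
                           (s_records - rnd_records) / TIME_FOR_COMPARE)

full_scan_cost(417, 10000, 10000)  # 417.0 when s->records == rnd_records
full_scan_cost(417, 10000, 9000)   # 617.0 when they differ by 1K (~615 above)
```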

            then, in test_if_cheaper_ordering(), here:

                    if ((ref_key < 0 && (group || table->force_index || is_covering)) ||
                        index_scan_time < read_time)
            

            index_scan_time=600 for both, read_time=417 or 615, respectively.
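The plan flip then follows mechanically from that comparison. A sketch of the quoted condition (parameter names and defaults are illustrative, not server identifiers):

```python
def use_index_for_order_by(index_scan_time, read_time, ref_key=-1,
                           group=False, force_index=False, is_covering=False):
    # Mirrors the test_if_cheaper_ordering() condition quoted above.
    return ((ref_key < 0 and (group or force_index or is_covering)) or
            index_scan_time < read_time)

use_index_for_order_by(600, 417)  # False: full scan + filesort is kept
use_index_for_order_by(600, 615)  # True: the index scan wins the comparison
```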

            To sum up:

            • When the selectivity is known, the cost of doing a full table
              scan goes UP (because of the s->records - rnd_records term
              above).
            • The number of records that the optimizer thinks it will need to
              enumerate in the index to get #LIMIT matches is the same in both
              cases: the optimizer does not take condition selectivity into
              account when computing it (it does take
              table->quick_condition_rows into account, but not this).
            psergei Sergei Petrunia added a comment (edited)

            Pushed a patch that fixes the second point (into 10.4)


            julien.fritsch Julien Fritsch added a comment

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to
            move it back to Stalled.

            People

              Assignee: psergei Sergei Petrunia
              Reporter: psergei Sergei Petrunia
              Votes: 1
              Watchers: 8
