Details

Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 11.4
Fix Version/s: None
Description
This comes from customer case analysis in TODO-5687.
Consider a table that defines many long VARCHAR columns whose values are much smaller than the maximum length (this is fairly common):
create table t1 (
  pk1 int not null,
  pk2 int not null,
  filler1 varchar(500),
  filler2 varchar(500),
  filler3 varchar(500),
  filler4 varchar(500),
  filler5 varchar(500),
  filler6 varchar(500),
  filler7 varchar(500),
  filler8 varchar(500),
  filler9 varchar(500),
  fillerA varchar(500),
  fillerB varchar(500),
  fillerC varchar(500),
  fillerD varchar(500),
  fillerE varchar(500),
  fillerF varchar(500),
  primary key(pk1, pk2)
) collate utf8mb4_general_ci engine=innodb;

insert into t1 (pk1, pk2)
select mod(seq, 8), seq
from seq_1_to_10000;

analyze table t1;
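As a sanity check of the data distribution: pk1 = mod(seq, 8) takes the values 0..7, each exactly 1,250 times, which can be confirmed with a query along these lines:

select pk1, count(*) from t1 group by pk1 order by pk1;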
The table has 10K rows; let's use a condition on the PK prefix to select 12.5% of them:
explain select * from t1 where pk1=1;
Surprisingly, this picks a full table scan:
id  select_type  table  type  possible_keys  key   key_len  ref   rows   Extra
1   SIMPLE       t1     ALL   PRIMARY        NULL  NULL     NULL  10000  Using where
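For comparison, forcing the primary key should make the optimizer take the range plan it rejects here on cost (a hypothetical cross-check, output omitted):

explain select * from t1 force index(primary) where pk1=1;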
The optimizer trace shows that:
- the estimated row counts are OK for both access methods
- the estimated cost of the full table scan is OK
- the estimated cost of the range scan is very high:
"range_analysis": {"table_scan": {"rows": 10000,"cost": 1.6673536},"potential_range_indexes": [{"index": "PRIMARY","usable": true,"key_parts": ["pk1", "pk2"]}],"setup_range_conditions": [],"analyzing_range_alternatives": {"range_scan_alternatives": [{"index": "PRIMARY","ranges": ["(1) <= (pk1) <= (1)"],"rowid_ordered": true,"using_mrr": false,"index_only": false,"rows": 1250,"cost": 2.82014029,"chosen": false,"cause": "cost"}],
The reason is that the cost of range access over the clustered PK is computed as the cost of reading N index tuples of TABLE_SHARE::stored_rec_length bytes each. That number is a worst-case, upper-bound estimate and is very far from reality in this case.

The cost of the full table scan, on the other hand, is computed from the handler->stats.data_file_length estimate, which does account for the values being much smaller than their maximum possible size.
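To get a feel for the gap, assuming stored_rec_length is close to the maximum in-memory record size: for t1 that is about 16 * 500 * 4 bytes, i.e. roughly 32 KB per row, while the actual average row length can be checked with:

select data_length, table_rows, avg_row_length
from information_schema.tables
where table_schema = database() and table_name = 't1';

Since all the filler columns are NULL here, the actual rows are orders of magnitude smaller than the 32 KB worst case, which is why the per-row size assumed for the range estimate is so far from what the full-scan estimate (based on data_file_length) sees.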