MariaDB Server / MDEV-38164

Poor cost calculations for index access cause bad query plans for big VARCHARs in 11.x


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.4
    • Fix Version/s: 11.4
    • Component/s: Optimizer
    • Labels: None

    Description

      This comes from customer case analysis in TODO-5687.

      Consider a table that defines many long VARCHAR columns whose actual values are much smaller than the declared maximum length (this is fairly common):

      create table t1 (
        pk1 int not null,
        pk2 int not null,
        filler1 varchar(500),
        filler2 varchar(500),
        filler3 varchar(500),
        filler4 varchar(500),
        filler5 varchar(500),
        filler6 varchar(500),
        filler7 varchar(500),
        filler8 varchar(500),
        filler9 varchar(500),
        fillerA varchar(500),
        fillerB varchar(500),
        fillerC varchar(500),
        fillerD varchar(500),
        fillerE varchar(500),
        fillerF varchar(500),
        primary key(pk1, pk2)
      ) collate utf8mb4_general_ci engine=innodb;
       
      insert into t1 (pk1, pk2) select
        mod(seq, 8), seq
      from
        seq_1_to_10000;
      analyze table t1;
      

      The table has 10K rows; let's use a condition on the PK prefix to select 12.5% of them:

      explain select * from t1 where pk1=1;
      

      Surprisingly, this picks a full table scan:

      id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
      1       SIMPLE  t1      ALL     PRIMARY NULL    NULL    NULL    10000   Using where
      
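      For comparison (this check is not from the original report, and the exact rows/key_len output may vary by version), forcing the primary key makes the optimizer show the range plan it rejected on cost grounds:

      -- Hypothetical check: force use of the clustered PK range.
      -- Shown only to contrast with the ALL plan above.
      explain select * from t1 force index(primary) where pk1=1;
      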

      The optimizer trace shows that:

      • estimated record numbers are OK for both access methods
      • estimated cost of full scan is OK
      • estimated cost for RANGE is very high:

                        "range_analysis": {
                          "table_scan": {
                            "rows": 10000,
                            "cost": 1.6673536
                          },
                          "potential_range_indexes": [
                            {
                              "index": "PRIMARY",
                              "usable": true,
                              "key_parts": ["pk1", "pk2"]
                            }
                          ],
                          "setup_range_conditions": [],
                          "analyzing_range_alternatives": {
                            "range_scan_alternatives": [
                              {
                                "index": "PRIMARY",
                                "ranges": ["(1) <= (pk1) <= (1)"],
                                "rowid_ordered": true,
                                "using_mrr": false,
                                "index_only": false,
                                "rows": 1250,
                                "cost": 2.82014029,
                                "chosen": false,
                                "cause": "cost"
                              }
                            ],
        

      The reason is that the cost of range access over the clustered PK is computed as the cost of reading N index tuples of TABLE_SHARE::stored_rec_length bytes each. That number is a worst-case upper-bound estimate, and in this case it is very far from reality.
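      As a rough sketch (exact InnoDB per-row overheads are ignored), the worst-case stored record length here is dominated by the sixteen utf8mb4 VARCHAR(500) columns:

      -- 16 VARCHARs of up to 500 chars * 4 bytes/char in utf8mb4, each with
      -- a 2-byte length prefix, plus two 4-byte INTs (overheads omitted):
      select 16 * (500 * 4 + 2) + 2 * 4 as approx_stored_rec_length;
      -- roughly 32 KB per row, so 1250 range rows are costed as ~40 MB of
      -- reads, even though every filler column is actually NULL.
      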

      The cost of full table scan is computed using handler->stats.data_file_length estimate, which does account for values being much smaller than their maximum possible size.
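      The actual on-disk footprint that the table-scan cost is based on can be seen (approximately) in information_schema:

      -- avg_row_length reflects real stored sizes, not declared maxima;
      -- with all filler columns NULL it is a few dozen bytes, not ~32 KB.
      select table_rows, avg_row_length, data_length
      from information_schema.tables
      where table_schema = database() and table_name = 't1';
      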


          People

            Assignee: psergei Sergei Petrunia
            Reporter: psergei Sergei Petrunia

