[MDEV-26389] (records_in_range) Query plan regression after upgrading from 10.3 to 10.5 Created: 2021-08-17  Updated: 2022-03-04

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.10, 10.5.11
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Alex Sladkov Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: records_in_range
Environment:

CentOS 7


Issue Links:
Relates
relates to MDEV-21535 Unnecessarily large ha_innobase::reco... Closed

 Description   

After recently upgrading MariaDB from version 10.3.22 to 10.5.10 we started seeing some of our queries' performance drastically decreased.
Turned out that for some reason optimizer goes for a full table scan instead of using an index.
The most prominent example is described below.

Given a table

CREATE TABLE `feedback_questions` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id_cat` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `id_subcat` int(10) unsigned DEFAULT NULL,
  `date_create` int(10) unsigned NOT NULL DEFAULT 0,
  `id_reg_user` int(10) unsigned NOT NULL DEFAULT 0,
  `id_hr_user` int(10) unsigned NOT NULL DEFAULT 0,
  `id_resume` int(10) unsigned NOT NULL DEFAULT 0,
  `id_vacancy` int(10) unsigned DEFAULT NULL,
  `id_broadcast_transmission` int(10) unsigned NOT NULL DEFAULT 0,
  `status` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `date_status` int(10) unsigned NOT NULL DEFAULT 0,
  `id_admin` int(10) unsigned NOT NULL DEFAULT 0,
  `date_id_admin` int(10) unsigned NOT NULL DEFAULT 0,
  `email` varchar(255) NOT NULL DEFAULT '',
  `phone` varchar(255) NOT NULL DEFAULT '',
  `subject` varchar(255) NOT NULL DEFAULT '',
  `body` longtext NOT NULL,
  `name` varchar(255) NOT NULL DEFAULT '',
  `file_extension` enum('','jpg','gif','png','jpeg','bmp','doc','docx','xls','msg','eml','tiff','pcx') NOT NULL,
  `location_key` enum('ru','ua','uz') NOT NULL DEFAULT 'ru',
  `id_profile_hackwork` int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  KEY `id_reg_user` (`id_reg_user`),
  KEY `id_hr_user` (`id_hr_user`),
  KEY `id_resume` (`id_resume`),
  KEY `status` (`status`),
  KEY `date_create` (`date_create`),
  KEY `id_cat` (`id_cat`),
  KEY `id_admin` (`id_admin`),
  KEY `subject` (`subject`(6)),
  KEY `email` (`email`(191)),
  KEY `id_vacancy` (`id_vacancy`),
  KEY `id_subcat` (`id_subcat`),
  KEY `location_key` (`location_key`),
  KEY `location_key__status__id_admin` (`location_key`,`status`,`id_admin`),
  KEY `id_profile_hackwork` (`id_profile_hackwork`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC

and a query

SELECT * FROM `region`.`feedback_questions` `t` WHERE status = 10

MariaDB 10.3.22 chooses a plan like this

ANALYZE: {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.0719,
    "table": {
      "table_name": "t",
      "access_type": "ref",
      "possible_keys": ["status"],
      "key": "status",
      "key_length": "1",
      "used_key_parts": ["status"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 7,
      "r_rows": 7,
      "r_total_time_ms": 0.0437,
      "filtered": 100,
      "r_filtered": 100
    }
  }
}

and after upgrading to MariaDB 10.5.10 the plan becomes

ANALYZE: {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 4370.607334,
    "table": {
      "table_name": "t",
      "access_type": "ALL",
      "possible_keys": ["status"],
      "r_loops": 1,
      "rows": 1418875,
      "r_rows": 1647148,
      "r_table_time_ms": 4286.20105,
      "r_other_time_ms": 84.39859076,
      "filtered": 22.71969032,
      "r_filtered": 6.0711e-5,
      "attached_condition": "region.t.`status` = 10"
    }
  }
}

Distribution of "status" column values is the following

+--------+----------+
| status | count(*) |
+--------+----------+
|     10 |        8 |
|     15 |       35 |
|     20 |        9 |
|    110 |  1061454 |
|    111 |       39 |
|    112 |       48 |
|    113 |       11 |
|    114 |        6 |
|    120 |   248403 |
|    130 |   232643 |
|    190 |    97839 |
|    200 |     6073 |
|    210 |      571 |
|    220 |      674 |
|    230 |     2035 |
+--------+----------+

Optimizer trace output

                            QUERY: ANALYZE FORMAT=JSON
SELECT * FROM `region`.`feedback_questions` `t` WHERE status = 10
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select_id": 1,
        "steps": [
          {
            "expanded_query": "select region.t.`id` AS `id`,region.t.id_cat AS id_cat,region.t.id_subcat AS id_subcat,region.t.date_create AS date_create,region.t.id_reg_user AS id_reg_user,region.t.id_hr_user AS id_hr_user,region.t.id_resume AS id_resume,region.t.id_vacancy AS id_vacancy,region.t.id_broadcast_transmission AS id_broadcast_transmission,region.t.`status` AS `status`,region.t.date_status AS date_status,region.t.id_admin AS id_admin,region.t.date_id_admin AS date_id_admin,region.t.email AS email,region.t.phone AS phone,region.t.`subject` AS `subject`,region.t.`body` AS `body`,region.t.`name` AS `name`,region.t.file_extension AS file_extension,region.t.location_key AS location_key,region.t.id_profile_hackwork AS id_profile_hackwork from feedback_questions t where region.t.`status` = 10"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select_id": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "region.t.`status` = 10",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "multiple equal(10, region.t.`status`)"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "multiple equal(10, region.t.`status`)"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "multiple equal(10, region.t.`status`)"
                }
              ]
            }
          },
          {
            "table_dependencies": [
              {
                "table": "t",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": []
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
              {
                "table": "t",
                "field": "status",
                "equals": "10",
                "null_rejecting": false
              }
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "t",
                "range_analysis": {
                  "table_scan": {
                    "rows": 1418875,
                    "cost": 332481
                  },
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_reg_user",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_hr_user",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_resume",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "status",
                      "usable": true,
                      "key_parts": ["status", "id"]
                    },
                    {
                      "index": "date_create",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_cat",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_admin",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "subject",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "email",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_vacancy",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_subcat",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "location_key",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "location_key__status__id_admin",
                      "usable": false,
                      "cause": "not applicable"
                    },
                    {
                      "index": "id_profile_hackwork",
                      "usable": false,
                      "cause": "not applicable"
                    }
                  ],
                  "setup_range_conditions": [],
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "status",
                        "ranges": ["(10) <= (status) <= (10)"],
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 322364,
                        "cost": 386986.6193,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
                    "analyzing_roworder_intersect": {
                      "cause": "too few roworder scans"
                    },
                    "analyzing_index_merge_union": []
                  },
                  "group_index_range": {
                    "chosen": false,
                    "cause": "no group by or distinct"
                  }
                }
              },
              {
                "selectivity_for_indexes": [
                  {
                    "index_name": "status",
                    "selectivity_from_index": 0.227196899
                  }
                ],
                "selectivity_for_columns": [],
                "cond_selectivity": 0.227196899
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [],
                "table": "t",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "status",
                      "used_range_estimates": true,
                      "rows": 322364,
                      "cost": 322513.7993,
                      "chosen": true
                    },
                    {
                      "access_type": "scan",
                      "resulting_rows": 322364,
                      "cost": 268006.2,
                      "chosen": true
                    }
                  ],
                  "chosen_access_method": {
                    "type": "scan",
                    "records": 322364,
                    "cost": 268006.2,
                    "uses_join_buffering": false
                  }
                },
                "rows_for_plan": 322364,
                "cost_for_plan": 332479,
                "estimated_join_cardinality": 322364
              }
            ]
          },
          {
            "best_join_order": ["t"]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "region.t.`status` = 10",
              "attached_conditions_computation": [],
              "attached_conditions_summary": [
                {
                  "table": "t",
                  "attached": "region.t.`status` = 10"
                }
              ]
            }
          }
        ]
      }
    },
    {
      "join_execution": {
        "select_id": 1,
        "steps": []
      }
    }
  ]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0

Historically we've been using "use_stat_tables=never" and "optimizer_use_condition_selectivity=1", so I thought that maybe that was the cause of what I'm describing, but trying to use engine-independent histogram based statistics didn't help (unless I missed something and there is something else that should be done besides tweaking these settings and doing "ANALYZE TABLE `region`.`feedback_questions` PERSISTENT FOR ALL").
Index and column statistics all seem to be in place

MariaDB [(none)]> select min_value, max_value, nulls_ratio, avg_length, avg_frequency, hist_size, hist_type, hex(histogram), decode_histogram(hist_type,histogram) from mysql.column_stats where db_name = 'region' and table_name = 'feedback_questions' and column_name = 'status' \G
*************************** 1. row ***************************
                            min_value: 10
                            max_value: 230
                          nulls_ratio: 0.0000
                           avg_length: 1.0000
                        avg_frequency: 109989.3333
                            hist_size: 254
                            hist_type: DOUBLE_PREC_HB
                       hex(histogram): 5C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C745C74FF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FFF7FA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28BA28B73D173D173D173D173D173D173D173D1
decode_histogram(hist_type,histogram): 0.45454,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.04546,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.04546,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.27272,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,0.18183
1 row in set (0.00 sec)

MariaDB [(none)]> select * from mysql.index_stats where db_name = 'region' and table_name = 'feedback_questions' and index_name = 'status' ;
+---------+--------------------+------------+--------------+---------------+
| db_name | table_name         | index_name | prefix_arity | avg_frequency |
+---------+--------------------+------------+--------------+---------------+
| region  | feedback_questions | status     |            1 |   109989.3333 |
| region  | feedback_questions | status     |            2 |        1.0000 |
+---------+--------------------+------------+--------------+---------------+
2 rows in set (0.00 sec)

MariaDB [(none)]> select * from mysql.table_stats where db_name = 'region' and table_name = 'feedback_questions' ;
+---------+--------------------+-------------+
| db_name | table_name         | cardinality |
+---------+--------------------+-------------+
| region  | feedback_questions |     1649840 |
+---------+--------------------+-------------+
1 row in set (0.01 sec)

Also, unfortunately, I couldn't reproduce the issue in a "clean" manner.
I tried the following ways of reproducing:
1. loading the table from sql dump into a fresh 10.5.10 instance with similar settings: query plan uses the index as expected;
2. loading the table from sql dump into a fresh 10.3.22 instance with similar settings and the performing an upgrade to 10.5.10: query plan uses the index as expected;
3. importing the tablespace from a read-only replica: couldn't test, because query "ALTER TABLE region.feedback_questions IMPORT TABLESPACE" failed with an error

ERROR 1808 (HY000) at line 1: Schema mismatch (Column email max prefix mismatch it's 764 in the table and 0 in the tablespace meta file)

When I tried to dump+drop+reimport this table on a replica, it made the issue go away on that replica.

For now I'm just adding FORCE INDEX wherever appropriate, but would like to know if there's a way to fix this issue without re-importing the tables in question.



 Comments   
Comment by Sergei Petrunia [ 2021-08-19 ]

Thanks for the detailed report.

Looking at the trace:

            "rows_estimation": [

                  
                  "table_scan": {
                    "rows": 1 418 875,
                    "cost": 332 481
                  },

ok the table has 1.4M rows

                    "range_scan_alternatives": [
                      {
                        "index": "status",
                        "ranges": ["(10) <= (status) <= (10)"],
                        ...
                        "rows": 322 364,
                        "cost": 386 986.6193,
                        "chosen": false,
                        "cause": "cost"
                      }

When the optimizer calls records_in_range to ask InnoDB about the number of rows matching status=10, it returns 332K rows.

The cost of reading that many rows is higher than the cost of doing a full scan, so the optimizer chooses to do full scan.

The actual number of rows is indeed very small:

+--------+----------+
| status | count(*) |
+--------+----------+
|     10 |        8 |

Comment by Sergei Petrunia [ 2021-08-19 ]

ANALYZE TABLE `region`.`feedback_questions` PERSISTENT FOR ALL" won't help in this case. The optimizer is coded to prefer the data from records_in_range call over the data from the histogram (i.e. from mysql.column_stats). There's no setting to change this.

Comment by Sergei Petrunia [ 2021-08-19 ]

There are known issues with InnoDB's records_in_range :MDEV-21136 MDEV-21895 but this one doesn't look like any of them.

Comment by Sergei Petrunia [ 2021-08-19 ]

Workarounds:

One can use

ALTER TABLE region ENGINE=INNODB;

or

alter table region drop key status;
alter table region add key(status);

Both will cause the index to be rebuilt and the issue to go away.

Comment by Sergei Petrunia [ 2021-08-19 ]

How can we investigate/fix this

It's obvious that a certain particular data layout is required to reproduce the issue.

According to marko, exporting/importing a tablespace may change the data layout (e.g. it will remove delete-marked records). So even if we figured how to avoid the Schema mismatch error, this would not help.

narkq, is it possible to share a tarball of the data directory where this issue can be observed with MariaDB developers? You can upload it to the private folder as described at https://mariadb.com/kb/en/meta/mariadb-ftp-server/ , in that case it would be only accessible by the MariaDB developers.

Comment by Alex Sladkov [ 2021-08-19 ]

Alex Sladkov, is it possible to share a tarball of the data directory where this issue can be observed with MariaDB developers?

The whole data directory is quite large (around 2TB), so I think it wouldn't be of any use to upload it in its entirety.
I'll try to figure out a way to prepare a tarball with just the table in question (approx 1.3GB).
I'm planning to make a replica via LVM snapshot and then drop all other tables on that replica. Would that suffice?

Comment by Sergei Golubchik [ 2021-08-23 ]

yes, I suppose you can drop all other tables, they shouldn't affect anything.

Comment by Alex Sladkov [ 2021-08-23 ]

Uploaded a tarball of the data directory to the private folder at ftp://ftp.mariadb.org
Filename is

MDEV-26389_data_dir.<sha1sum>.tar.gz


Unfortunately, I don't know a way to shrink ibdata1 and ib_logfile0 files without restoring all tables from sql dump, so the archive turned out to be larger than I initially thought it would be: 5.5G (24G uncompressed).

Comment by Marko Mäkelä [ 2022-01-05 ]

narkq, I wonder if 10.4 shows the same regression. The way how handler::records_in_range() is invoked was changed in MDEV-16188. It caused a performance regression that MDEV-21535 was expected to fix.

I am familiar with InnoDB, not the query execution, so I cannot say whether disabling MDEV-16188 could theoretically have any impact on this:

SET SESSION optimizer_switch='rowid_filter=off';

Comment by Igor Babaev [ 2022-01-05 ]

marko,
The optimizer trace shows that rowid_filter=on has nothing to do with the wrong estimate for records_in_range=10. My guess is that this estimate is due to the new Monty's code for records_in_range(). The code was added to fix MDEV-21535. I also think that recreation of the index should help. Monty's code plays somehow with index page numbers. You can look at his code.

Generated at Thu Feb 08 09:44:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.