[MDEV-24137] Optimizer choosing the wrong index despite of the PK showing better explain Created: 2020-11-05  Updated: 2020-12-15

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.43, 10.4.12, 10.4.13, 10.4.14, 10.4.15
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Manuel Arostegui Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: optimizer, optimizer_trace
Environment:

debian


Attachments: Text File built-trees.txt     File xpl.diff    

 Description   

We have the following table:

CREATE TABLE `templatelinks` (
  `tl_from` int(8) unsigned NOT NULL DEFAULT 0,
  `tl_namespace` int(11) NOT NULL DEFAULT 0,
  `tl_title` varbinary(255) NOT NULL DEFAULT '',
  `tl_from_namespace` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`tl_from`,`tl_namespace`,`tl_title`),
  KEY `tl_namespace` (`tl_namespace`,`tl_title`,`tl_from`),
  KEY `tl_backlinks_namespace` (`tl_from_namespace`,`tl_namespace`,`tl_title`,`tl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED

We have run an analyze table to refresh all the stats.
Having the following query: (pasted here as otherwise it goes above the characters limit for bugs reporting: https://phabricator.wikimedia.org/P13194

The explain for it:

mysql:root@localhost [cswiki]> show explain for 73439158;
+------+-------------+---------------+------+----------------------+--------------+---------+-------+---------+--------------------------+
| id   | select_type | table         | type | possible_keys        | key          | key_len | ref   | rows    | Extra                    |
+------+-------------+---------------+------+----------------------+--------------+---------+-------+---------+--------------------------+
|    1 | SIMPLE      | templatelinks | ref  | PRIMARY,tl_namespace | tl_namespace | 4       | const | 1294541 | Using where; Using index |
+------+-------------+---------------+------+----------------------+--------------+---------+-------+---------+--------------------------+
1 row in set, 1 warning (0.000 sec)

However, using a hint like this, changes the query plan drastically:

SELECT / tl_from, tl_title FROM `templatelinks` USE INDEX (PRIMARY) WHERE....

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	templatelinks	range	PRIMARY	PRIMARY	265	NULL	2480	Using where; Using index

However, the optimizer keeps choosing the tl_namespace.

This is the optimizer trace for the first and original query: https://phabricator.wikimedia.org/P13220

And the same but with the hint to use the PK: https://phabricator.wikimedia.org/P13221

At a first glance we can see the large difference on scanned rows (and run time):
What the optimizer picks by default

    "r_total_time_ms": 162170,
    "table": {
      "table_name": "templatelinks",
      "access_type": "ref",
      "possible_keys": ["PRIMARY", "tl_namespace"],
      "key": "tl_namespace",
      "key_length": "4",
      "used_key_parts": ["tl_namespace"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 1294541,

And the runtime and rows if we force the PK to be used:

   "r_total_time_ms": 8.3322,
    "table": {
      "table_name": "templatelinks",
      "access_type": "range",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "265",
      "used_key_parts": ["tl_from", "tl_namespace", "tl_title"],
      "r_loops": 1,
      "rows": 2480,

The query run time is way different with and without the hint:
And this is true if we try a `USE (PRIMARY) on the query shows way better results:

265 rows in set (0.146 sec)

And this is NOT doing USE (PRIMARY) and just leaving the query choosing whatever it prefers (tl_namespace);

265 rows in set (2 min 9.220 sec)



 Comments   
Comment by Sergei Petrunia [ 2020-12-06 ]

SELECT /* GrowthExperiments\NewcomerTasks\TemplateFilter::buildResultsMap */ 
  tl_from, tl_title 
FROM `templatelinks` 
WHERE 
  (tl_from = 670356 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  (tl_from = 1066533 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  (tl_from = 1333958 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  ...

The WHERE clause has 248 lines, the only difference is the constant in the tl_from=N

The default query plan

Uses ref acess on tl_namespace=10.

Range optimizer didn't construct any potential query plans.
(Note that the optimizer trace doesn't have the "analyzing_range_alternatives"
node).

The query plan with force index

uses range plan over all 3 key parts of the primary key:

  PRIMARY KEY (tl_from,tl_namespace,tl_title),

which is basically the whole WHERE clause. 248 lines * 10 elements in the in-list gives 2480 ranges.

Optimizer trace confirms this.

Comment by Sergei Petrunia [ 2020-12-06 ]

Indeed, the range optimizer is hitting MAX_SEL_ARGS limitation. After the primary get_mm_tree call:

(gdb) p param.alloced_sel_args
  $11 = 16003
(gdb) p tree
  $12 = (SEL_TREE *) 0x0

Comment by Sergei Petrunia [ 2020-12-06 ]

Increased the MAX_SEL_ARGS limit 10x . The query would consume this many:

(gdb) p param.alloced_sel_args
  $37 = 16258

Comment by Sergei Petrunia [ 2020-12-06 ]

The query doesn't have restrictions on the first part of the key tl_backlinks_namespace. I've dropped it to make sure it's not interfering.
So, we're left with:

  PRIMARY KEY (`tl_from`,`tl_namespace`,`tl_title`),
  KEY `tl_namespace` (`tl_namespace`,`tl_title`,`tl_from`),

Re-running the explain and noting the alloced_sel_args after the get_mm_tree:

(no hint)                  - param.alloced_sel_args=16258
FORCE INDEX(primary)       - param.alloced_sel_args=0
FORCE INDEX(tl_namespace)  - param.alloced_sel_args=9

Comment by Sergei Petrunia [ 2020-12-06 ]

That is, the code that constructs ranges for a specific index is not the issue. The high usage is caused by the code that combines scans on multiple indexes.

Comment by Sergei Petrunia [ 2020-12-06 ]

Indeed, the code in tree_or does the cloning in order to construct index_merge:
Outline of how that happens:

sel_trees_can_be_ored(param, tree1, tree2, &ored_keys)) --> true

    bool must_be_ored= sel_trees_must_be_ored(param, tree1, tree2, ored_keys); -->false

must_be_ored=false, because indexes primary and tl_namespace are not infixes of each other. Because of that,
sel_trees_must_be_ored() returns FALSE:

      if (!is_key_infix(key1_init, key1_end, key2_init, key2_end) &&
          !is_key_infix(key2_init, key2_end, key1_init, key1_end))
        DBUG_RETURN(FALSE);

Then, this code in the main per-key key_or() loop copies the trees:

    while ((key_no= it++) != key_map::Iterator::BITMAP_END)
    {
      SEL_ARG *key1= tree1->keys[key_no];
      SEL_ARG *key2= tree2->keys[key_no];
      if (!must_be_ored)
      {
        key1->incr_refs();
        key2->incr_refs();
      }
      if ((result->keys[key_no]= key_or(param, key1, key2)))
        result->keys_map.set_bit(key_no);

copying the trees is fully justified, because tree_or will later construct a SEL_IMERGE from them.

but if we have an N-way OR condition, with each OR-branch producing one range, the number of copying will be:

1 + 2 +3 + ... + N = N^2  

ranges.

Comment by Sergei Petrunia [ 2020-12-12 ]

A patch for printing the generated trees: xpl.diff and dump of data structures obtained with it: built-trees.txt .

This covers only the first rows of the query:

 
explain
SELECT /* GrowthExperiments\NewcomerTasks\TemplateFilter::buildResultsMap */ 
  tl_from, tl_title 
FROM `templatelinks`
WHERE 
  (tl_from = 670356 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  (tl_from = 1066533 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  (tl_from = 1333958 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  (tl_from = 921514 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
  (tl_from = 12783 AND tl_title IN ('Kdy?', 'Kdo?', 'Pravopis', 'Sloh', 'Transkripce', 'Reklama', 'NPOV', 'Kým?', 'Jaký?', 'Který?') AND tl_namespace = 10) OR 
...

Apparently the optimizer is building a lot of options to construct index_merge plans (including a 3-way index_merge even when there are two indexes!). It is not clear for me how to fix this, though.

Generated at Thu Feb 08 09:27:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.