[MDEV-34993] Incorrect cardinality estimation causes poor query plan - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5
Fix Version/s: 10.5.27, 10.6.20, 10.11.10, 11.2.6, 11.4.4
Component/s: Optimizer
Labels:
None

Description

Filing this based on investigation in TODO-4922.

There is a class of known issues one can observe with optimizer_use_condition_selectivity=3 setting. MariaDB 11.0 has a rewrite of relevant code.

This MDEV is about making a minimal fix to stop pre-11.0 versions from picking very poor query plans.

Attachments

Issue Links

causes

MDEV-35074 main.selectivity_notembedded and main.selectivity_innodb_notembedded fail with view protocol

Closed

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia added a comment - 2024-09-24 13:08

commit ccf1510ae6aa369fd2e6e588f2a946f8085cd482 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993)

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Tue Sep 24 10:23:34 2024 +0300

    MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs

    And make the fix for MDEV-34993 switchable.

commit 19e654dec144bc00cc5b4d60892e209688437036

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Tue Sep 24 09:44:39 2024 +0300

    MDEV-34993: Incorrect cardinality estimation causes poor query plan

    When calculate_cond_selectivity_for_table() takes into account multi-

    column selectivities from range access, it tries to take-into account

    that selectivity for some columns may have been already taken into account.

    For example, for range access on IDX1 using {kp1, kp2}, the selectivity

    of restrictions on "kp2" might have already been taken into account

    to some extent.

    So, the code tries to "discount" that using rec_per_key[] estimates.

    This seems to be wrong and unreliable: the "discounting" may produce a

    rselectivity_multiplier number that hints that the overall selectivity

    of range access on IDX1 was greater than 1.

    Do a conservative fix: if we arrive at conclusion that selectivity of

    range access on condition in IDX1 >1.0, clip it down to 1.

Sergei Petrunia added a comment - 2024-09-24 13:08 commit ccf1510ae6aa369fd2e6e588f2a946f8085cd482 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 10:23:34 2024 +0300 MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs And make the fix for MDEV-34993 switchable. commit 19e654dec144bc00cc5b4d60892e209688437036 Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 09:44:39 2024 +0300 MDEV-34993: Incorrect cardinality estimation causes poor query plan When calculate_cond_selectivity_for_table() takes into account multi- column selectivities from range access, it tries to take-into account that selectivity for some columns may have been already taken into account. For example, for range access on IDX1 using {kp1, kp2}, the selectivity of restrictions on "kp2" might have already been taken into account to some extent. So, the code tries to "discount" that using rec_per_key[] estimates. This seems to be wrong and unreliable: the "discounting" may produce a rselectivity_multiplier number that hints that the overall selectivity of range access on IDX1 was greater than 1. Do a conservative fix: if we arrive at conclusion that selectivity of range access on condition in IDX1 >1.0, clip it down to 1.

Sergei Petrunia added a comment - 2024-09-25 12:06

Final version with review input addressed:

commit 772c58b9dc77dc36c5a87521bab5ab485683e362 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993)

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Tue Sep 24 10:23:34 2024 +0300

    MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs

    ...and make the fix for MDEV-34993 switchable. It is enabled by default

    and controlled with @optimizer_adjust_secondary_key_costs=fix_card_multiplier

commit 8f07ecef64a4ad4f4ca2d896f35696dfbec2a264

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Tue Sep 24 09:44:39 2024 +0300

    MDEV-34993: Incorrect cardinality estimation causes poor query plan

lstartseva, could you test it as discussed?

Sergei Petrunia added a comment - 2024-09-25 12:06 Final version with review input addressed: commit 772c58b9dc77dc36c5a87521bab5ab485683e362 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 10:23:34 2024 +0300 MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs ...and make the fix for MDEV-34993 switchable. It is enabled by default and controlled with @optimizer_adjust_secondary_key_costs=fix_card_multiplier commit 8f07ecef64a4ad4f4ca2d896f35696dfbec2a264 Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 09:44:39 2024 +0300 MDEV-34993: Incorrect cardinality estimation causes poor query plan lstartseva , could you test it as discussed?

Sergei Petrunia added a comment - 2024-09-26 12:11

A testcase showing good query plan after this fix and bad query plan before:

Based on the MTR testcase for this MDEV, but add table t2:

create table t1 (

  pk int,

  key1 int,

  key2 int,

  filler char(100),

  index (key1, pk),

  index(key2),

  primary key (pk)

);

insert into t1

select

  seq, FLOOR(seq/100), seq, 'filler'

from

  seq_1_to_1000;

analyze table t1;

create table t2 (a int, b int, key(a));

insert into t2 select mod(seq, 1000), seq from seq_1_to_100000;

analyze table t2;

After the fix:

MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2;

+------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+

| id   | select_type | table | type  | possible_keys     | key     | key_len | ref           | rows | Extra       |

+------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+

|    1 | SIMPLE      | t1    | range | PRIMARY,key1,key2 | PRIMARY | 4       | NULL          | 5    | Using where |

|    1 | SIMPLE      | t2    | ref   | a                 | a       | 5       | test2.t1.key2 | 51   |             |

+------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+

2 rows in set (0.002 sec)

This will read 5 + 5 *51 = 260 rows.

Try with fix disabled:

MariaDB [test2]> set optimizer_adjust_secondary_key_costs='';

Query OK, 0 rows affected (0.001 sec)

MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2;

+------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+

| id   | select_type | table | type       | possible_keys     | key       | key_len | ref        | rows    | Extra                                                  |

+------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+

|    1 | SIMPLE      | t2    | ALL        | a                 | NULL      | NULL    | NULL       | 100113  | Using where                                            |

|    1 | SIMPLE      | t1    | ref|filter | PRIMARY,key1,key2 | key2|key1 | 5|9     | test2.t2.a | 1 (40%) | Using index condition; Using where; Using rowid filter |

+------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+

2 rows in set (0.003 sec)

100K rows for table t2 + 100K lookups into t1 gives ~ 200K rows read.

Sergei Petrunia added a comment - 2024-09-26 12:11 A testcase showing good query plan after this fix and bad query plan before: Based on the MTR testcase for this MDEV, but add table t2: create table t1 ( pk int , key1 int , key2 int , filler char (100), index (key1, pk), index (key2), primary key (pk) ); insert into t1 select seq, FLOOR(seq/100), seq, 'filler' from seq_1_to_1000; analyze table t1; create table t2 (a int , b int , key (a)); insert into t2 select mod(seq, 1000), seq from seq_1_to_100000; analyze table t2; After the fix: MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2; +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+ | 1 | SIMPLE | t1 | range | PRIMARY,key1,key2 | PRIMARY | 4 | NULL | 5 | Using where | | 1 | SIMPLE | t2 | ref | a | a | 5 | test2.t1.key2 | 51 | | +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+ 2 rows in set (0.002 sec) This will read 5 + 5 *51 = 260 rows. Try with fix disabled: MariaDB [test2]> set optimizer_adjust_secondary_key_costs=''; Query OK, 0 rows affected (0.001 sec) MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2; +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | a | NULL | NULL | NULL | 100113 | Using where | | 1 | SIMPLE | t1 | ref|filter | PRIMARY,key1,key2 | key2|key1 | 5|9 | test2.t2.a | 1 (40%) | Using index condition; Using where; Using rowid filter | +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+ 2 rows in set (0.003 sec) 100K rows for table t2 + 100K lookups into t1 gives ~ 200K rows read.

Michael Widenius added a comment - 2024-09-30 11:54

Ok to push.

Michael Widenius added a comment - 2024-09-30 11:54 Ok to push.

Lena Startseva added a comment - 2024-09-30 16:17

Testing done. Ok to push.

Lena Startseva added a comment - 2024-09-30 16:17 Testing done. Ok to push.

Sergei Petrunia added a comment - 2024-10-02 12:03

Pushed into 10.5. It would be nice to manually merge to 11.x but currently there's a lot of unmerged stuff which has a lot of conflicts.

Sergei Petrunia added a comment - 2024-10-02 12:03 Pushed into 10.5. It would be nice to manually merge to 11.x but currently there's a lot of unmerged stuff which has a lot of conflicts.

Marko Mäkelä added a comment - 2024-10-03 11:14

Merged to 11.2

Marko Mäkelä added a comment - 2024-10-03 11:14 Merged to 11.2

Michael Widenius added a comment - 2024-10-04 09:24

A quick note about this MDEV:
This patch is only relevant for 10.5 - 10.11. 11.0 and above does not need it as the selectivity code was already fixed.
This was taking into consideration by Marko and Sergei when doing the merge to 11.2.

Michael Widenius added a comment - 2024-10-04 09:24 A quick note about this MDEV: This patch is only relevant for 10.5 - 10.11. 11.0 and above does not need it as the selectivity code was already fixed. This was taking into consideration by Marko and Sergei when doing the merge to 11.2.

Michael Widenius added a comment - 2024-10-04 09:28

Fix pushed to 10.5 tree by Sergei

Michael Widenius added a comment - 2024-10-04 09:28 Fix pushed to 10.5 tree by Sergei

People

Assignee:: Sergei Petrunia

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2024-09-24 05:13

Updated:: 2024-10-16 13:46

Resolved:: 2024-10-04 09:28

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server