Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34993

Incorrect cardinality estimation causes poor query plan

Details

    Description

      Filing this based on investigation in TODO-4922.

      There is a class of known issues one can observe with optimizer_use_condition_selectivity=3 setting. MariaDB 11.0 has a rewrite of relevant code.

      This MDEV is about making a minimal fix to stop pre-11.0 versions from picking very poor query plans.

      Attachments

        Issue Links

          Activity

            commit ccf1510ae6aa369fd2e6e588f2a946f8085cd482 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Tue Sep 24 10:23:34 2024 +0300
             
                MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs
                
                And make the fix for MDEV-34993 switchable.
            
            

            commit 19e654dec144bc00cc5b4d60892e209688437036
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Tue Sep 24 09:44:39 2024 +0300
             
                MDEV-34993: Incorrect cardinality estimation causes poor query plan
                
                When calculate_cond_selectivity_for_table() takes into account multi-
                column selectivities from range access, it tries to take-into account
                that selectivity for some columns may have been already taken into account.
                
                For example, for range access on IDX1 using {kp1, kp2}, the selectivity
                of restrictions on "kp2" might have already been taken into account
                to some extent.
                So, the code tries to "discount" that using rec_per_key[] estimates.
                
                This seems to be wrong and unreliable: the "discounting" may produce a
                rselectivity_multiplier number that hints that the overall selectivity
                of range access on IDX1 was greater than 1.
                
                Do a conservative fix: if we arrive at conclusion that selectivity of
                range access on condition in IDX1 >1.0, clip it down to 1.
            
            

            psergei Sergei Petrunia added a comment - commit ccf1510ae6aa369fd2e6e588f2a946f8085cd482 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 10:23:34 2024 +0300   MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs And make the fix for MDEV-34993 switchable. commit 19e654dec144bc00cc5b4d60892e209688437036 Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 09:44:39 2024 +0300   MDEV-34993: Incorrect cardinality estimation causes poor query plan When calculate_cond_selectivity_for_table() takes into account multi- column selectivities from range access, it tries to take-into account that selectivity for some columns may have been already taken into account. For example, for range access on IDX1 using {kp1, kp2}, the selectivity of restrictions on "kp2" might have already been taken into account to some extent. So, the code tries to "discount" that using rec_per_key[] estimates. This seems to be wrong and unreliable: the "discounting" may produce a rselectivity_multiplier number that hints that the overall selectivity of range access on IDX1 was greater than 1. Do a conservative fix: if we arrive at conclusion that selectivity of range access on condition in IDX1 >1.0, clip it down to 1.

            Final version with review input addressed:

            commit 772c58b9dc77dc36c5a87521bab5ab485683e362 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Tue Sep 24 10:23:34 2024 +0300
             
                MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs
                
                ...and make the fix for MDEV-34993 switchable. It is enabled by default
                and controlled with @optimizer_adjust_secondary_key_costs=fix_card_multiplier
             
            commit 8f07ecef64a4ad4f4ca2d896f35696dfbec2a264
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Tue Sep 24 09:44:39 2024 +0300
             
                MDEV-34993: Incorrect cardinality estimation causes poor query plan
            

            lstartseva, could you test it as discussed?

            psergei Sergei Petrunia added a comment - Final version with review input addressed: commit 772c58b9dc77dc36c5a87521bab5ab485683e362 (HEAD -> bb-10.5-mdev34993, origin/bb-10.5-mdev34993) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 10:23:34 2024 +0300   MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs ...and make the fix for MDEV-34993 switchable. It is enabled by default and controlled with @optimizer_adjust_secondary_key_costs=fix_card_multiplier   commit 8f07ecef64a4ad4f4ca2d896f35696dfbec2a264 Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Sep 24 09:44:39 2024 +0300   MDEV-34993: Incorrect cardinality estimation causes poor query plan lstartseva , could you test it as discussed?

            A testcase showing good query plan after this fix and bad query plan before:

            Based on the MTR testcase for this MDEV, but add table t2:

            create table t1 (
              pk int,
              key1 int,
              key2 int,
              filler char(100),
              index (key1, pk),
              index(key2),
              primary key (pk)
            );
             
            insert into t1
            select 
              seq, FLOOR(seq/100), seq, 'filler'
            from 
              seq_1_to_1000;
            analyze table t1;
             
            create table t2 (a int, b int, key(a));
            insert into t2 select mod(seq, 1000), seq from seq_1_to_100000;
            analyze table t2;
            

            After the fix:

            MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2;
            +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+
            | id   | select_type | table | type  | possible_keys     | key     | key_len | ref           | rows | Extra       |
            +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+
            |    1 | SIMPLE      | t1    | range | PRIMARY,key1,key2 | PRIMARY | 4       | NULL          | 5    | Using where |
            |    1 | SIMPLE      | t2    | ref   | a                 | a       | 5       | test2.t1.key2 | 51   |             |
            +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+
            2 rows in set (0.002 sec)
            

            This will read 5 + 5 *51 = 260 rows.

            Try with fix disabled:

            MariaDB [test2]> set optimizer_adjust_secondary_key_costs='';
            Query OK, 0 rows affected (0.001 sec)
             
            MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2;
            +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+
            | id   | select_type | table | type       | possible_keys     | key       | key_len | ref        | rows    | Extra                                                  |
            +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+
            |    1 | SIMPLE      | t2    | ALL        | a                 | NULL      | NULL    | NULL       | 100113  | Using where                                            |
            |    1 | SIMPLE      | t1    | ref|filter | PRIMARY,key1,key2 | key2|key1 | 5|9     | test2.t2.a | 1 (40%) | Using index condition; Using where; Using rowid filter |
            +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+
            2 rows in set (0.003 sec)
            

            100K rows for table t2 + 100K lookups into t1 gives ~ 200K rows read.

            psergei Sergei Petrunia added a comment - A testcase showing good query plan after this fix and bad query plan before: Based on the MTR testcase for this MDEV, but add table t2: create table t1 ( pk int , key1 int , key2 int , filler char (100), index (key1, pk), index (key2), primary key (pk) );   insert into t1 select seq, FLOOR(seq/100), seq, 'filler' from seq_1_to_1000; analyze table t1;   create table t2 (a int , b int , key (a)); insert into t2 select mod(seq, 1000), seq from seq_1_to_100000; analyze table t2; After the fix: MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2; +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+ | 1 | SIMPLE | t1 | range | PRIMARY,key1,key2 | PRIMARY | 4 | NULL | 5 | Using where | | 1 | SIMPLE | t2 | ref | a | a | 5 | test2.t1.key2 | 51 | | +------+-------------+-------+-------+-------------------+---------+---------+---------------+------+-------------+ 2 rows in set (0.002 sec) This will read 5 + 5 *51 = 260 rows. Try with fix disabled: MariaDB [test2]> set optimizer_adjust_secondary_key_costs=''; Query OK, 0 rows affected (0.001 sec)   MariaDB [test2]> explain select * from t1, t2 where pk in (1,2,3,4,5) and key1 <= 4 and t2.a=t1.key2; +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | a | NULL | NULL | NULL | 100113 | Using where | | 1 | SIMPLE | t1 | ref|filter | PRIMARY,key1,key2 | key2|key1 | 5|9 | test2.t2.a | 1 (40%) | Using index condition; Using where; Using rowid filter | +------+-------------+-------+------------+-------------------+-----------+---------+------------+---------+--------------------------------------------------------+ 2 rows in set (0.003 sec) 100K rows for table t2 + 100K lookups into t1 gives ~ 200K rows read.

            Ok to push.

            monty Michael Widenius added a comment - Ok to push.

            Testing done. Ok to push.

            lstartseva Lena Startseva added a comment - Testing done. Ok to push.

            Pushed into 10.5. It would be nice to manually merge to 11.x but currently there's a lot of unmerged stuff which has a lot of conflicts.

            psergei Sergei Petrunia added a comment - Pushed into 10.5. It would be nice to manually merge to 11.x but currently there's a lot of unmerged stuff which has a lot of conflicts.

            A quick note about this MDEV:
            This patch is only relevant for 10.5 - 10.11. 11.0 and above does not need it as the selectivity code was already fixed.
            This was taking into consideration by Marko and Sergei when doing the merge to 11.2.

            monty Michael Widenius added a comment - A quick note about this MDEV: This patch is only relevant for 10.5 - 10.11. 11.0 and above does not need it as the selectivity code was already fixed. This was taking into consideration by Marko and Sergei when doing the merge to 11.2.

            Fix pushed to 10.5 tree by Sergei

            monty Michael Widenius added a comment - Fix pushed to 10.5 tree by Sergei

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.