MariaDB Server: MDEV-31380

Assertion `s->table->opt_range_condition_rows <= s->found_records' failed with optimizer_use_condition_selectivity=1

Details

    Description

      --source include/have_innodb.inc
      --source include/have_sequence.inc
       
      CREATE TABLE t (a INT, b INT, PRIMARY KEY(a), KEY(b)) ENGINE=InnoDB;
      INSERT INTO t SELECT seq, seq FROM seq_1_to_100;
       
      SET optimizer_use_condition_selectivity = 1;
      SELECT DISTINCT * FROM t WHERE a IN (1, 2);
       
      # Cleanup
      DROP TABLE t;
      

      10.6 a3e5b5c4

      mariadbd: /data/src/10.6/sql/sql_select.cc:7643: double apply_selectivity_for_table(JOIN_TAB*, uint, bool*): Assertion `s->table->opt_range_condition_rows <= s->found_records' failed.
      230601 11:44:04 [ERROR] mysqld got signal 6 ;
       
      #9  0x00007fb7d9253df2 in __GI___assert_fail (assertion=0x55e831d9d740 "s->table->opt_range_condition_rows <= s->found_records", file=0x55e831d98fc0 "/data/src/10.6/sql/sql_select.cc", line=7643, function=0x55e831d9d7a0 "double apply_selectivity_for_table(JOIN_TAB*, uint, bool*)") at ./assert/assert.c:101
      #10 0x000055e82fe3c413 in apply_selectivity_for_table (s=0x62b0000c69d8, use_cond_selectivity=1, force_estimate=0x7fb7c83bf330) at /data/src/10.6/sql/sql_select.cc:7643
      #11 0x000055e82fe422ef in best_access_path (join=0x62b0000c5e08, s=0x62b0000c69d8, remaining_tables=1, join_positions=0x62b0000c6fc8, idx=0, disable_jbuf=true, record_count=1, pos=0x62b0000c6fc8, loose_scan_pos=0x7fb7c83bf790) at /data/src/10.6/sql/sql_select.cc:8646
      #12 0x000055e82fe4b344 in best_extension_by_limited_search (join=0x62b0000c5e08, remaining_tables=1, idx=0, record_count=1, read_time=0, search_depth=62, prune_level=1, use_cond_selectivity=1) at /data/src/10.6/sql/sql_select.cc:10335
      #13 0x000055e82fe46fce in greedy_search (join=0x62b0000c5e08, remaining_tables=1, search_depth=62, prune_level=1, use_cond_selectivity=1) at /data/src/10.6/sql/sql_select.cc:9472
      #14 0x000055e82fe44ce5 in choose_plan (join=0x62b0000c5e08, join_tables=1) at /data/src/10.6/sql/sql_select.cc:9032
      #15 0x000055e82fe2eb44 in make_join_statistics (join=0x62b0000c5e08, tables_list=..., keyuse_array=0x62b0000c6130) at /data/src/10.6/sql/sql_select.cc:5963
      #16 0x000055e82fe0b006 in JOIN::optimize_inner (this=0x62b0000c5e08) at /data/src/10.6/sql/sql_select.cc:2509
      #17 0x000055e82fe04001 in JOIN::optimize (this=0x62b0000c5e08) at /data/src/10.6/sql/sql_select.cc:1848
      #18 0x000055e82fe259a6 in mysql_select (thd=0x62b0000bd218, tables=0x62b0000c4950, fields=..., conds=0x62b0000c5340, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748609, result=0x62b0000c5dd8, unit=0x62b0000c1588, select_lex=0x62b0000c4300) at /data/src/10.6/sql/sql_select.cc:5055
      #19 0x000055e82fdf58bc in handle_select (thd=0x62b0000bd218, lex=0x62b0000c14c0, result=0x62b0000c5dd8, setup_tables_done_option=0) at /data/src/10.6/sql/sql_select.cc:559
      #20 0x000055e82fd5a501 in execute_sqlcom_select (thd=0x62b0000bd218, all_tables=0x62b0000c4950) at /data/src/10.6/sql/sql_parse.cc:6273
      #21 0x000055e82fd48bda in mysql_execute_command (thd=0x62b0000bd218, is_called_from_prepared_stmt=false) at /data/src/10.6/sql/sql_parse.cc:3949
      #22 0x000055e82fd65479 in mysql_parse (thd=0x62b0000bd218, rawbuf=0x62b0000c4238 "SELECT DISTINCT * FROM t WHERE a IN (1, 2)", length=42, parser_state=0x7fb7c83c1a30) at /data/src/10.6/sql/sql_parse.cc:8036
      #23 0x000055e82fd3b2c0 in dispatch_command (command=COM_QUERY, thd=0x62b0000bd218, packet=0x629000280219 "SELECT DISTINCT * FROM t WHERE a IN (1, 2)", packet_length=42, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1896
      #24 0x000055e82fd37ffe in do_command (thd=0x62b0000bd218, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1409
      #25 0x000055e8301a5308 in do_handle_one_connection (connect=0x608000003938, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1416
      #26 0x000055e8301a4cc9 in handle_one_connection (arg=0x6080000038b8) at /data/src/10.6/sql/sql_connect.cc:1318
      #27 0x000055e830e01e2e in pfs_spawn_thread (arg=0x617000008218) at /data/src/10.6/storage/perfschema/pfs.cc:2201
      #28 0x00007fb7d92a7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #29 0x00007fb7d93285bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

      The failure started happening on 10.6-10.11 after this commit in 10.6.13:

      commit 4329ec5d3b109cb0bcbee151b5800dc7b19d1945
      Author:     Sergei Petrunia
      AuthorDate: Thu Mar 9 17:04:07 2023 +0300
      CommitDate: Fri Apr 28 16:24:08 2023 +0300
       
          MDEV-30812: Improve output cardinality estimates for hash join
      

      Not reproducible with the provided test case on 11.0.
      No obvious immediate problem on a non-debug build.


          Activity

            elenst Elena Stepanova created issue
            ralf.gebhardt Ralf Gebhardt made changes:
              Affects Version/s: 10.6.13 [ 28514 ]
              Affects Version/s: 10.6 [ 24028 ]
            ralf.gebhardt Ralf Gebhardt made changes
            psergei Sergei Petrunia made changes:
              Status: Open [ 1 ] → In Progress [ 3 ]

            The assertion is caused by the following sequence of events:

            • The table has 100 rows, head->stat_records()=101.
            • The range optimizer finds a quick range select with found_records=2.
            • There is a potential loose scan with group_trp->records=102.

            We hit this (wrong) code in opt_range.cc:

                  if (!only_single_index_range_scan)
                  { 
                    TRP_GROUP_MIN_MAX *group_trp;
                    if (tree)
                      restore_nonrange_trees(&param, tree, backup_keys);
                    if ((group_trp= get_best_group_min_max(&param, tree, read_time)))
                    {
                      param.table->opt_range_condition_rows= MY_MIN(group_trp->records,
                                                                head->stat_records());
            

            Note the

                  opt_range_condition_rows= MY_MIN(group_trp->records, head->stat_records())
            

            We already had opt_range_condition_rows=2 (set by the quick range select), but the MY_MIN(...)
            above ignores that value, so it is set back to 101.

            • However, the QUICK_RANGE_SELECT is cheaper than the loose scan, so it is returned as join_tab->quick,
              and the code in make_join_statistics() sets join_tab->found_records from quick->records=2.

            This is how we end up with

            s->found_records=2
            s->table->opt_range_condition_rows=101
            

            which hits the assertion.
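
            To make this concrete, here is a small standalone sketch (an illustration only, not server code: std::min stands in for MY_MIN, plain integers stand in for ha_rows, and the row counts are the ones quoted in this comment):

              #include <algorithm>
              #include <cstdio>

              int main()
              {
                unsigned long long opt_range_condition_rows= 2;  // set earlier by the quick range select
                unsigned long long found_records= 2;             // join_tab->found_records, from quick->records
                unsigned long long group_trp_records= 102;       // loose scan estimate
                unsigned long long stat_records= 101;            // head->stat_records()

                // The (wrong) assignment in opt_range.cc ignores the value that is already set:
                opt_range_condition_rows= std::min(group_trp_records, stat_records);  // -> 101

                // The invariant asserted in apply_selectivity_for_table() now fails:
                printf("invariant holds: %d\n",
                       (int)(opt_range_condition_rows <= found_records));             // prints 0
                return 0;
              }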

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia added a comment - - edited

            Does this have anything to do with the code added by the patch for MDEV-30812?

            That patch added the assert that is firing:

            +  DBUG_ASSERT(s->table->opt_range_condition_rows <= s->found_records);
            

            ... but that's it.
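
            As a rough, hypothetical illustration of why such an invariant is useful (this is not the actual apply_selectivity_for_table() code): a selectivity derived from these two row estimates is expected to be a fraction of at most 1, which only holds when opt_range_condition_rows <= found_records.

              #include <cstdio>

              // Illustration only: a filtering fraction derived from the two estimates.
              static double selectivity_sketch(double opt_range_condition_rows, double found_records)
              {
                return opt_range_condition_rows / found_records;
              }

              int main()
              {
                // With the values from this bug: 101 / 2 = 50.5, a "selectivity" above 1.
                printf("%.1f\n", selectivity_sketch(101, 2));
                return 0;
              }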


            The problem code itself:

                      param.table->opt_range_condition_rows= MY_MIN(group_trp->records,
                                                                head->stat_records());
            

            is rather old; the last meaningful change (not counting rename/typecast cleanups) was in 2013 or before...

            psergei Sergei Petrunia added a comment -

            bb-10.6-MDEV-31380.

            Note to the reviewer: the crash can be observed starting from 10.6, because we have the patch for MDEV-30812 in 10.6. The fix for this MDEV is applicable to earlier versions, too. (I think we should push it there.)
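
            The branch contents are not quoted in this ticket, so the following is only a hedged sketch of the direction described in the analysis above (never let the loose-scan clamp raise an estimate that a cheaper quick range select has already lowered). set_if_smaller mirrors MariaDB's existing helper macro; everything else is a local stand-in, not the actual patch:

              #include <algorithm>
              #include <cstdio>

              // Local stand-in mirroring MariaDB's set_if_smaller(a, b) macro.
              #define set_if_smaller(a, b) do { if ((a) > (b)) (a)= (b); } while (0)

              int main()
              {
                unsigned long long opt_range_condition_rows= 2;  // already lowered by the quick range select
                unsigned long long group_trp_records= 102;       // loose scan estimate
                unsigned long long stat_records= 101;            // head->stat_records()

                // Clamp with the loose-scan figure without ever raising the estimate:
                set_if_smaller(opt_range_condition_rows,
                               std::min(group_trp_records, stat_records));

                printf("%llu\n", opt_range_condition_rows);      // stays 2
                return 0;
              }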

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia made changes:
              Affects Version/s: 10.5 [ 23123 ]

            The problem doesn't exist in 10.4 because there the range optimizer considers the loose scan plan first, and the range plans after that. This way, the lines

                  param.table->quick_condition_rows= MY_MIN(group_trp->records,
                                                         head->stat_records());
            

            are correct.

            But starting from 10.5, range access is checked first and loose scan is checked after that.

            For this MDEV's testcase, having opt_range_condition_rows > quick->rows causes best_access_path() to use DBL_MAX as the cost of the range scan:

                        "considered_execution_plans": [
                          {
                            "plan_prefix": [],
                            "table": "t",
                            "best_access_path": {
                              "considered_access_paths": [
                                {
                                  "access_type": "range",
                                  "resulting_rows": 100,
                                  "cost": 1.79769e308,
                                  "chosen": true
                                } 
                              ],
            

            This doesn't make any difference for this MDEV's testcase (the best quick select, range(PK), is used regardless of what best_access_path() thinks about its cost).

            What if there is:

            • A very cheap range access on IDX1, using a non-singlepoint range.
            • A const ref access on index IDX2 (and so also a range access), which is more expensive and returns more rows than range(IDX1).
            • A loose scan plan on another index (possibly on IDX2, too).

            Then

            • The range optimizer will pick range(IDX1).
            • Loose scan will set quick_condition_rows to a number higher than range(IDX1).rows.
            • best_access_path() will pick ref(IDX2) as the first choice and, due to the DBL_MAX cost, will fail to use range(IDX1).

            It looks like 10.5 could potentially make poor query plan choices.

            psergei Sergei Petrunia added a comment -

            bb-10.5-MDEV-31380-v2

            psergei Sergei Petrunia added a comment -

            The problem was introduced by this commit:

            commit b3ab3105fdb34dae6c2d4270751bc0694c3d9df8
            Author:	Monty <monty@mariadb.org>  Wed Mar  4 20:52:19 2020
            Committer:	Monty <monty@mariadb.org>  Fri Mar 27 04:54:45 2020
             
            Removed double calls to records_in_range from distinct and group by
             
            Fixed by moving testing of get_best_group_min_max() after range testing.
            

            which was pushed into 10.5.
            Before that commit, QUICK_GROUP_MIN_MAX was checked before QUICK_RANGE_SELECTs were, so
            this code was ok:

                      param.table->opt_range_condition_rows= MY_MIN(group_trp->records,
                                                                head->stat_records());
            

            as the only quick select we had considered was the one in group_trp.

            Starting from 10.5, the above code is executed after the quick range select has already set opt_range_condition_rows to a value lower than group_trp->records.
            This causes the situation where opt_range_condition_rows > join_tab->quick->rows.
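
            A standalone sketch of the ordering effect (not server code, just the two writes applied in the two orders, with the row counts from this bug):

              #include <algorithm>
              #include <cstdio>

              int main()
              {
                const unsigned long long stat_records= 101, group_trp_records= 102, quick_rows= 2;

                // 10.4 order: the loose-scan clamp runs first, then the quick range select lowers the estimate.
                unsigned long long rows_104= std::min(group_trp_records, stat_records);  // 101
                rows_104= quick_rows;                                                    // 2, consistent

                // 10.5+ order: the quick range select runs first, then the clamp overwrites its lower estimate.
                unsigned long long rows_105= quick_rows;                                 // 2
                rows_105= std::min(group_trp_records, stat_records);                     // 101, inconsistent

                printf("10.4 order: %llu, 10.5+ order: %llu\n", rows_104, rows_105);
                return 0;
              }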

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia made changes:
              Assignee: Sergei Petrunia [ psergey ] → Michael Widenius [ monty ]
              Status: In Progress [ 3 ] → In Review [ 10002 ]
            psergei Sergei Petrunia made changes:
              Fix Version/s: 10.5.21 [ 28913 ]
              Fix Version/s: 10.6.14 [ 28914 ]
              Fix Version/s: 10.9.7 [ 28916 ]
              Fix Version/s: 10.10.5 [ 28917 ]
              Fix Version/s: 10.11.4 [ 28918 ]
              Fix Version/s: 11.0.3 [ 28920 ]
              Fix Version/s: 11.1.2 [ 28921 ]
              Fix Version/s: 10.6 [ 24028 ]
              Fix Version/s: 10.9 [ 26905 ]
              Fix Version/s: 10.10 [ 27530 ]
              Fix Version/s: 10.11 [ 27614 ]
              Resolution: Fixed [ 1 ]
              Status: In Review [ 10002 ] → Closed [ 6 ]
            dbart Daniel Bartholomew made changes:
              Fix Version/s: 10.5.22 [ 29011 ]
              Fix Version/s: 10.6.15 [ 29013 ]
              Fix Version/s: 10.9.8 [ 29015 ]
              Fix Version/s: 10.10.6 [ 29017 ]
              Fix Version/s: 10.11.5 [ 29019 ]
              Fix Version/s: 10.5.21 [ 28913 ]
              Fix Version/s: 10.6.14 [ 28914 ]
              Fix Version/s: 10.9.7 [ 28916 ]
              Fix Version/s: 10.10.5 [ 28917 ]
              Fix Version/s: 10.11.4 [ 28918 ]
            elenst Elena Stepanova made changes:
              Labels: regression
            psergei Sergei Petrunia added a comment - - edited

            The testcase fails in --emb runs because it uses the optimizer trace, which is not available in the embedded server. The fix for this is already pushed.


            People

              Assignee: monty Michael Widenius
              Reporter: elenst Elena Stepanova
              Votes: 0
              Watchers: 3
