Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17155

Incorrect ORDER BY optimization: full index scan is used instead of range

Details

    Description

      This is based on a customer testcase.

      I wasn't able to construct an actual testcase so far, but I was able to get the invalid query plan by tampering with rows/cost estimates in the debugger.

      The problem looks like this:

      EXPLAIN 
      SELECT 
        non_idex_col 
      FROM 
        t2
      WHERE 
        key1part1 = 1500 and key1part2 IN (5, 60, 133, 387) AND 
        pk1 = 700000
      ORDER BY
        pk2 DESC LIMIT 1;
      

      +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
      |    1 | SIMPLE      | t2    | index | PRIMARY,key1  | PRIMARY | 6       | NULL |   26 | Using where |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
      

      The plan uses a full index scan, even if there is a potential ref (or range) access over the PRIMARY key: pk1 = 700000. Costs and selectivity numbers are irrelevant - if we pick to do a full index scan, then range on that index will not be slower.

      The optimizer actually has this logic and was expected to use range access. It even typically does. Getting the above query plan requires some peculiar relationship between costs of range/ref accesses through key1 and primary key.

      Attachments

        Activity

          psergei Sergei Petrunia added a comment - - edited

          Steps to reproduce:

          1. Fill the dataset

          CREATE TABLE t2 (
            pk1 int(10) unsigned NOT NULL,
            pk2 smallint(6) NOT NULL DEFAULT '0',
           
            key1part1 mediumint(8) unsigned NOT NULL,
            key1part2 smallint(5) unsigned NOT NULL,
            non_idex_col smallint(6) NOT NULL,
            other_data varchar(100),
           
            PRIMARY KEY (pk1, pk2),
            KEY key1 (key1part1, key1part2)
          ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
           
          insert into t2 select A.a+1000*B.a, 123, A.a+1000*B.a, A.a, 123, 'filler-data' from one_k A, one_k B;
           
          update t2 set key1part1=1500, key1part2=5 where pk1 between 700000 and 700000+20000;
          set @a=0;
          update t2 set pk2=(@a:=@a+1) where pk1 between 700000 and 700000+20000;
          update t2 set pk1=700000 where pk1 between 700000 and 700000+20000;
          

          2. In the debugger, put a breakpoint in get_key_scans_params after this call:

                found_records= check_quick_select(param, idx, read_index_only, key,
                                                  update_tbl_stats, &mrr_flags,
                                                  &buf_size, &cost);
          

          Condition for breakpoint: keynr == 1

          Actions to do:

          set cost.io_count=10
          set found_records=found_records * 0.5
          

          Continue execution. It should be that best_idx=1 after loop exit.

          3. Put a breakpoint in best_access_path BEFORE this code:

          =>      if (tmp + 0.0001 < best_time - records/(double) TIME_FOR_COMPARE)
                  {
                    best_time= tmp + records/(double) TIME_FOR_COMPARE;
                    best= tmp;
                    best_records= records;
                    best_key= start_key;
                    best_max_key_part= max_key_part;
                    best_ref_depends_map= found_ref;
                  }
          

          On the second breakpoint hit, you will see tmp=10291 or something like that.

          Actions to do:

          set tmp=200
          

          It should pick ref access on key1

          Then continue execution and test_if_skip_sort_order/test_if_cheaper_ordering will choose to switch to PRIMARY key and use index scan.

          psergei Sergei Petrunia added a comment - - edited Steps to reproduce: 1. Fill the dataset CREATE TABLE t2 ( pk1 int (10) unsigned NOT NULL , pk2 smallint (6) NOT NULL DEFAULT '0' ,   key1part1 mediumint(8) unsigned NOT NULL , key1part2 smallint (5) unsigned NOT NULL , non_idex_col smallint (6) NOT NULL , other_data varchar (100),   PRIMARY KEY (pk1, pk2), KEY key1 (key1part1, key1part2) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;   insert into t2 select A.a+1000*B.a, 123, A.a+1000*B.a, A.a, 123, 'filler-data' from one_k A, one_k B;   update t2 set key1part1=1500, key1part2=5 where pk1 between 700000 and 700000+20000; set @a=0; update t2 set pk2=(@a:=@a+1) where pk1 between 700000 and 700000+20000; update t2 set pk1=700000 where pk1 between 700000 and 700000+20000; 2. In the debugger, put a breakpoint in get_key_scans_params after this call: found_records= check_quick_select(param, idx, read_index_only, key, update_tbl_stats, &mrr_flags, &buf_size, &cost); Condition for breakpoint: keynr == 1 Actions to do: set cost.io_count=10 set found_records=found_records * 0.5 Continue execution. It should be that best_idx=1 after loop exit. 3. Put a breakpoint in best_access_path BEFORE this code: => if (tmp + 0.0001 < best_time - records/( double ) TIME_FOR_COMPARE) { best_time= tmp + records/( double ) TIME_FOR_COMPARE; best= tmp; best_records= records; best_key= start_key; best_max_key_part= max_key_part; best_ref_depends_map= found_ref; } On the second breakpoint hit, you will see tmp=10291 or something like that. Actions to do: set tmp=200 It should pick ref access on key1 Then continue execution and test_if_skip_sort_order/test_if_cheaper_ordering will choose to switch to PRIMARY key and use index scan.

          I then follow the execution all the way to here

            #0  SQL_SELECT::test_quick_select (this=0x7ff8b402b078, thd=0x7ff8b4000b00, keys_to_use=..., prev_tables=0, limit=1, force_quick_range=true, ordered_output=false, remove_false_parts_of_where=false) at /home/psergey/dev-git/10.3-r3/sql/opt_range.cc:2545
            #1  0x00007ff92ad930b2 in test_if_skip_sort_order (tab=0x7ff8b4029f90, order=0x7ff8b4027430, select_limit=26, no_changes=false, map=0x7ff8b40a8a70) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:22128
            #2  0x00007ff92ad5ebb0 in JOIN::optimize_stage2 (this=0x7ff8b4027bd8) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:2567
            #3  0x00007ff92ad5c7eb in JOIN::optimize_inner (this=0x7ff8b4027bd8) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:1921
            #4  0x00007ff92ad5acff in JOIN::optimize (this=0x7ff8b4027bd8) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:1448
            #5  0x00007ff92ad648be in mysql_select (thd=0x7ff8b4000b00, tables=0x7ff8b4025f98, wild_num=0, fields=..., conds=0x7ff8b4026ec0, og_num=1, order=0x7ff8b4027430, group=0x0, having=0x0, proc_param=0x0, select_options=2147748612, result=0x7ff8b4027550, unit=0x7ff8b40049b0, select_lex=0x7ff8b4005120) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:4220
          

          The calculations show that yes, it is better to switch to a quick select for the key PRIMARY.
          And it calls SQL_SELECT::test_quick_select to get the quick select.
          And it's not going to succeed, because the condition that is passed to get_mm_tree is taken from SQL_SELECT::cond and it is:

          (gdb) p dbug_print_item(cond)
            $141 = 0x7ff92c4a0c80 <dbug_item_print_buf> "t2.key1part1 <=> 1500"
          

          psergei Sergei Petrunia added a comment - I then follow the execution all the way to here #0 SQL_SELECT::test_quick_select (this=0x7ff8b402b078, thd=0x7ff8b4000b00, keys_to_use=..., prev_tables=0, limit=1, force_quick_range=true, ordered_output=false, remove_false_parts_of_where=false) at /home/psergey/dev-git/10.3-r3/sql/opt_range.cc:2545 #1 0x00007ff92ad930b2 in test_if_skip_sort_order (tab=0x7ff8b4029f90, order=0x7ff8b4027430, select_limit=26, no_changes=false, map=0x7ff8b40a8a70) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:22128 #2 0x00007ff92ad5ebb0 in JOIN::optimize_stage2 (this=0x7ff8b4027bd8) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:2567 #3 0x00007ff92ad5c7eb in JOIN::optimize_inner (this=0x7ff8b4027bd8) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:1921 #4 0x00007ff92ad5acff in JOIN::optimize (this=0x7ff8b4027bd8) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:1448 #5 0x00007ff92ad648be in mysql_select (thd=0x7ff8b4000b00, tables=0x7ff8b4025f98, wild_num=0, fields=..., conds=0x7ff8b4026ec0, og_num=1, order=0x7ff8b4027430, group=0x0, having=0x0, proc_param=0x0, select_options=2147748612, result=0x7ff8b4027550, unit=0x7ff8b40049b0, select_lex=0x7ff8b4005120) at /home/psergey/dev-git/10.3-r3/sql/sql_select.cc:4220 The calculations show that yes, it is better to switch to a quick select for the key PRIMARY. And it calls SQL_SELECT::test_quick_select to get the quick select. And it's not going to succeed, because the condition that is passed to get_mm_tree is taken from SQL_SELECT::cond and it is: (gdb) p dbug_print_item(cond) $141 = 0x7ff92c4a0c80 <dbug_item_print_buf> "t2.key1part1 <=> 1500"
          psergei Sergei Petrunia added a comment - - edited

          Where did the condition go...

          In the case that I am debugging it seems to be pushed down by Index Condition Pushdown:

          (gdb) p dbug_print_item(tab->pre_idx_push_select_cond)
            $150 = 0x7ff92c4a0c80 <dbug_item_print_buf> "t2.key1part1 <=> 1500 and t2.pk1 = 700000 and t2.key1part2 in (5,60,133,387)"
          

          Another potential concern is that if ref access was chosen, then the equalities that used to guarantee to be true were removed?

          JOIN::conds:

          (gdb) p dbug_print_item(join->conds)
            $154 = 0x7ff92c4a0c80 <dbug_item_print_buf> "t2.key1part1 = 1500 and t2.pk1 = 700000 and t2.key1part2 in (5,60,133,387)"
          

          psergei Sergei Petrunia added a comment - - edited Where did the condition go... In the case that I am debugging it seems to be pushed down by Index Condition Pushdown: (gdb) p dbug_print_item(tab->pre_idx_push_select_cond) $150 = 0x7ff92c4a0c80 <dbug_item_print_buf> "t2.key1part1 <=> 1500 and t2.pk1 = 700000 and t2.key1part2 in (5,60,133,387)" Another potential concern is that if ref access was chosen, then the equalities that used to guarantee to be true were removed? JOIN::conds: (gdb) p dbug_print_item(join->conds) $154 = 0x7ff92c4a0c80 <dbug_item_print_buf> "t2.key1part1 = 1500 and t2.pk1 = 700000 and t2.key1part2 in (5,60,133,387)"

          Another potential concern is that if ref access was chosen, then the equalities that used to guarantee to be true were removed?

          They would be, but then add_ref_to_table_cond is called to put them back. So this is not a concern.

          psergei Sergei Petrunia added a comment - Another potential concern is that if ref access was chosen, then the equalities that used to guarantee to be true were removed? They would be, but then add_ref_to_table_cond is called to put them back. So this is not a concern.
          psergei Sergei Petrunia added a comment - The patch: http://lists.askmonty.org/pipermail/commits/2018-September/012901.html

          Igor please review

          psergei Sergei Petrunia added a comment - Igor please review
          igor Igor Babaev added a comment -

          ok to push

          igor Igor Babaev added a comment - ok to push

          I had this issue on my environment too, and 10.1.37 fixed it. The optimizer now behaves correctly

          marostegui Manuel Arostegui added a comment - I had this issue on my environment too, and 10.1.37 fixed it. The optimizer now behaves correctly

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.