MariaDB Server / MDEV-18073

get_range_limit_read_cost() doesn't adjust LIMIT for the range access


      Description

      Basically, get_range_limit_read_cost() ignores the selectivity of the range access being considered.

      Consider an example:

      create table t1  (
        pk int not null,
        a int,
        b int,
        c int,
        filler char(100),
        KEY a_c(a,c),
        KEY a_b(a,b)
      );
       
      insert into t1 
      select a, a, a, a, 'filler-dataaa' from test.one_k;
       
      update t1 set a=1 where pk between 0 and 150;
      update t1 set b=2 where pk between 0 and 20;
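
      As a sanity check, the row counts these UPDATEs produce can be simulated outside the server (a minimal Python sketch; it assumes test.one_k supplies the values 0..999 and only models the columns that matter here):

```python
# Model the table after the INSERT and the two UPDATEs above.
# Assumption: test.one_k yields values 0..999 for column a.
rows = [{"pk": i, "a": i, "b": i} for i in range(1000)]

for r in rows:
    if 0 <= r["pk"] <= 150:
        r["a"] = 1   # update t1 set a=1 where pk between 0 and 150
    if 0 <= r["pk"] <= 20:
        r["b"] = 2   # update t1 set b=2 where pk between 0 and 20

n_a1 = sum(r["a"] == 1 for r in rows)                      # rows matching a=1
n_a1_b2 = sum(r["a"] == 1 and r["b"] == 2 for r in rows)   # rows matching a=1 and b=2
print(n_a1, n_a1_b2)   # prints: 151 21
```

      So a range over the whole WHERE (a=1 AND b=2) on index a_b sees 21 rows, while a range on a=1 alone covers 151 rows; both numbers reappear in the cost calculations below.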
      

      Let's debug this query:

      explain 
      select * from t1
      where a=1 and b=2
      order by c limit 10;
      

      Put a breakpoint in test_if_cheaper_ordering.
      We will have:

      • select_limit=10
      • refkey_rows_estimate=21 (this is from range on index a_b)
      • table_records=1000

      The selectivity of the WHERE condition would be about 2%:

      refkey_rows_estimate / table_records = 21 / 1000 = 0.021
      

      This code is executed:

                select_limit= (ha_rows) (select_limit *
                                         (double) table_records /
                                          refkey_rows_estimate);
      

      and adjusts select_limit to be: 10 / 0.021 = 476.
      The meaning of this number is:

      Assuming we are doing a full index scan on an index compatible with the ORDER BY clause,
      this is how many records we would need to scan before we get the LIMIT (10) matches we want.

      Good so far.
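
      The adjustment can be reproduced with plain arithmetic (a sketch using the numbers from the debug session above, not the actual server code):

```python
select_limit = 10
table_records = 1000
refkey_rows_estimate = 21

# Same formula as the server code quoted above; the ha_rows cast
# truncates, which int() mimics here.
adjusted = int(select_limit * table_records / refkey_rows_estimate)
print(adjusted)   # prints: 476
```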

      Then we enter get_range_limit_read_cost(..., rows_limit=476, ...)

      and arrive here:

          if (best_rows > rows_limit)
          {
            /*
              LIMIT clause specifies that we will need to read fewer records than
              quick select will return. Assume that quick select's cost is
              proportional to the number of records we need to return (e.g. if we 
              only need 1/3rd of records, it will cost us 1/3rd of quick select's
              read time)
            */
            best_cost *= rows_limit / best_rows;
          }
          *read_time= best_cost;
      

      Look at the if-condition.
      It compares best_rows, the number of rows the quick select will return (151 rows, all of which satisfy a=1), with rows_limit, which is the number of rows a full ordered index scan would read before the condition a=1 is applied.

      This seems wrong.

      Instead of rows_limit, we should use "the number of rows we need to read from the quick select before we find #LIMIT matches".
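
      One way to see the difference, using this example's numbers (an illustrative sketch, not server code; the variable names are mine): the quick select returns 151 rows, of which 21 satisfy the whole WHERE, so finding LIMIT=10 matches should take roughly 10 * 151 / 21 ≈ 72 rows read from the quick select, not 476:

```python
limit = 10
table_records = 1000
best_rows = 151       # rows the quick select returns (a=1)
full_cond_rows = 21   # rows satisfying the whole WHERE (a=1 and b=2)

# What the code compares against today: rows of a full ordered index
# scan read before #LIMIT rows match the WHERE (no condition applied).
rows_limit = limit * table_records / full_cond_rows        # ~476

# What the comparison arguably should use: rows read from the quick
# select before #LIMIT rows match the remaining condition b=2.
rows_from_quick = limit * best_rows / full_cond_rows       # ~72

print(round(rows_limit), round(rows_from_quick))   # prints: 476 72
```

      With rows_from_quick, the `best_rows > rows_limit` branch would scale best_cost by about 72/151; with rows_limit it does not fire at all here, since 151 < 476, so the quick select's cost is never reduced.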

      People

      Assignee: psergey (Sergei Petrunia)
      Reporter: psergey (Sergei Petrunia)