Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19424

InnoDB's records_in_range estimates are capped at about 50%

    Details

      Description

      InnoDB's records_in_range estimates seem to be capped at ~50% of the table. (We used to observe this on various occasions before but I haven't been able to find an MDEV for this).

      If I pass a range that contains a bigger fraction of the table, the estimated number of rows is still around 50% of the total rows in the table.

      Test dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t1 (
        a int,
        b varchar(32),
        c int,
        key(a),
        key(b)
      ) engine=innodb;
      

      # 100K NULLs
      insert into t1 select
        null,
        null,
        A.a + 1000*B.a 
      from
        one_k A,
        one_k B
      where 
        B.a<100;
       
      # 900 K non-NULLs
      insert into t1 select
        A.a + 1000*B.a,
        A.a + 1000*B.a,
        A.a + 1000*B.a 
      from
        one_k A,
        one_k B
      where 
        B.a>=100;
      

      Now, both a IS NOT NULL or b IS NOT NULL match 900K rows (90% of the table).
      But EXPLAIN will show the estimates of about 500K rows, which is 50% of the table:

      explain select * from t1 force index (a) where a is not null ;
      +------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                 |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
      |    1 | SIMPLE      | t1    | range | a             | a    | 5       | NULL | 494308 | Using index condition |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
      

      explain select * from t1 force index (b) where b is not null ;
      +------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                 |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
      |    1 | SIMPLE      | t1    | range | b             | b    | 35      | NULL | 494308 | Using index condition |
      +------+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------+
      

      When the optimizer was simple, this property was not a problem.

      A simple optimizer would only use range estimates to construct range access. Range access is cheaper than full table if it covers about 30% of the table. Returning 50% of the table instead of 90% was not an issue.

      A more advanced optimizer also attempts to use range estimates for condition selectivity, etc. Here, returning 50% selectivity instead of 90% is a problem. (One must take into account that selectivity is computed for multiple indexes. For example, for 5 indexes 0.5^2= 1/32 . 32x under-estimation of selectivity)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                sachin.setiya.007 Sachin Setiya
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated: