Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6634

Wrong estimates for ref(const) and key IS NULL predicate

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5.39, 10.0.13
    • Fix Version/s: 10.1.1
    • Component/s: None
    • Labels:

      Description

      Create the dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t1p (                                                                                                      
        pk int primary key,                                                                                                             
        removed enum('true') DEFAULT NULL,                                                                                              
        key(removed)                                                                                                                    
      ) engine=innodb pack_keys=1;                                                                                                      
       
      # This inserts 10M records:
      insert into t1p
       select 
         A.a + 1000*B.a+1000*1000*C.a, 
         IF(A.a+1000*B.a > 100, NULL, 'true')
       from one_k A, one_k B,ten C;

      Let's explore the dataset

      MariaDB [test2]> explain select count(*) from t1p force index(removed) where removed is null;
      +------+-------------+-------+------+---------------+---------+---------+-------+---------+--------------------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows    | Extra                    |
      +------+-------------+-------+------+---------------+---------+---------+-------+---------+--------------------------+
      |    1 | SIMPLE      | t1p   | ref  | removed       | removed | 2       | const | 5003505 | Using where; Using index |
      +------+-------------+-------+------+---------------+---------+---------+-------+---------+--------------------------+

      Here, it underestimates the amount of NULLs (by about two times). This is not a problem, yet.

      MariaDB [test2]> explain select count(*) from t1p force index(removed) where removed is not null;
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
      |    1 | SIMPLE      | t1p   | range | removed       | removed | 2       | NULL | 1009 | Using where; Using index |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

      This is close to reality.

      MariaDB [test2]> explain
          -> select * from ten left join t1p on ten.a=3 and t1p.removed is null;
      +------+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
      | id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                    |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
      |    1 | SIMPLE      | ten   | ALL  | NULL          | NULL    | NULL    | NULL  |   10 |                          |
      |    1 | SIMPLE      | t1p   | ref  | removed       | removed | 2       | const |   10 | Using where; Using index |
      +------+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+

      Now, here the scan on "removed IS NULL" produces 10 rows, even if we saw above that the optimizer knows it will produce about 1K rows.

      This might be not a problem for this particular query, but it may cause the optimizer not to pick a good query plan.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              psergey Sergei Petrunia
              Reporter:
              psergey Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: