[MDEV-6634] Wrong estimates for ref(const) and key IS NULL predicate - Jira

XML

Word

Printable

Create the dataset:

create table ten(a int);

insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);

insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t1p (

  pk int primary key,

  removed enum('true') DEFAULT NULL,

  key(removed)

) engine=innodb pack_keys=1;

# This inserts 10M records:

insert into t1p

 select

   A.a + 1000*B.a+1000*1000*C.a,

   IF(A.a+1000*B.a > 100, NULL, 'true')

 from one_k A, one_k B,ten C;

Let's explore the dataset

MariaDB [test2]> explain select count(*) from t1p force index(removed) where removed is null;

+------+-------------+-------+------+---------------+---------+---------+-------+---------+--------------------------+

| id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows    | Extra                    |

+------+-------------+-------+------+---------------+---------+---------+-------+---------+--------------------------+

|    1 | SIMPLE      | t1p   | ref  | removed       | removed | 2       | const | 5003505 | Using where; Using index |

+------+-------------+-------+------+---------------+---------+---------+-------+---------+--------------------------+

Here, it underestimates the amount of NULLs (by about two times). This is not a problem, yet.

MariaDB [test2]> explain select count(*) from t1p force index(removed) where removed is not null;

+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |

+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

|    1 | SIMPLE      | t1p   | range | removed       | removed | 2       | NULL | 1009 | Using where; Using index |

+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

This is close to reality.

MariaDB [test2]> explain

    -> select * from ten left join t1p on ten.a=3 and t1p.removed is null;

+------+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+

| id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                    |

+------+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+

|    1 | SIMPLE      | ten   | ALL  | NULL          | NULL    | NULL    | NULL  |   10 |                          |

|    1 | SIMPLE      | t1p   | ref  | removed       | removed | 2       | const |   10 | Using where; Using index |

+------+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+

Now, here the scan on "removed IS NULL" produces 10 rows, even if we saw above that the optimizer knows it will produce about 1K rows.

This might be not a problem for this particular query, but it may cause the optimizer not to pick a good query plan.

relates to

MDEV-6672 Performance degradation on a query with joins and ORDER BY .. LIMIT

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.