[MDEV-22664] Estimates for range conditions from histograms are way off Created: 2020-05-22  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Varun Gupta (Inactive) Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: eits


 Description   

Here is the simple dataset used:

CREATE TABLE t1(a INT);
INSERT INTO t1 SELECT 5 from seq_1_to_99;
INSERT INTO t1 VALUES (10);

set optimizer_use_condition_selectivity=4;
set use_stat_tables='preferably';
set histogram_size=255;
ANALYZE TABLE t1 PERSISTENT FOR ALL;  # Collect EITS

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a < 5;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 100  |    98.44 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.001 sec)

There are no rows with a < 5, so the expected value of filtered is 0%, yet the estimate is 98.44%.

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a > 5;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 100  |    99.22 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.002 sec)

There is only 1 row with a > 5 (1% of the table), yet the estimate for filtered is 99.22%.
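
For reference, the true selectivities can be checked outside the server. The following is a minimal sketch in Python that rebuilds the same 100 values in memory and compares them with the figures reported by EXPLAIN EXTENDED above. The helper name `true_filtered` is hypothetical and exists only for this illustration.

```python
# Same data as table t1 in the report: 99 rows of 5 and one row of 10.
rows = [5] * 99 + [10]

def true_filtered(predicate):
    """Percentage of rows that satisfy the predicate (hypothetical helper)."""
    return 100.0 * sum(1 for a in rows if predicate(a)) / len(rows)

actual_lt = true_filtered(lambda a: a < 5)
actual_gt = true_filtered(lambda a: a > 5)

# Estimates copied from the EXPLAIN EXTENDED output in this report.
reported_lt, reported_gt = 98.44, 99.22

print(f"a < 5: actual {actual_lt:.2f}%, reported {reported_lt}%")
print(f"a > 5: actual {actual_gt:.2f}%, reported {reported_gt}%")
```

Both estimates are off by roughly 98 percentage points.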



 Comments   
Comment by Varun Gupta (Inactive) [ 2020-05-22 ]

This is not a bug but a feature missing from the implementation that derives range selectivity from the histogram.
It would be good to take the endpoints of ranges into consideration. After a discussion with psergey, he said that taking endpoints into consideration would make more sense once we store the actual values in the histogram instead of storing a fraction in [0,1].
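
The effect of ignoring endpoints can be illustrated with a small model. The Python sketch below is a simplification written for this illustration, not the server's code. It assumes an equi-height histogram with 127 stored boundaries (128 buckets), each kept as a fraction of the [min, max] span, and a lookup that treats both range ends as inclusive. The names `build_histogram`, `position` and `range_selectivity` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_histogram(sorted_values, n_values):
    """Return (min, max, boundaries); each boundary is a fraction in [0, 1]."""
    lo, hi = sorted_values[0], sorted_values[-1]
    n = len(sorted_values)
    fractions = []
    for i in range(1, n_values + 1):
        # Row sitting at the i-th of (n_values + 1) equal-height cut points.
        idx = min(n - 1, i * n // (n_values + 1))
        fractions.append((sorted_values[idx] - lo) / (hi - lo))
    return lo, hi, fractions

def position(value, lo, hi):
    """Map a constant to its fraction of the [min, max] span, clamped to [0, 1]."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def range_selectivity(fractions, min_pos, max_pos):
    """Count the buckets touched by [min_pos, max_pos], both ends inclusive."""
    n_buckets = len(fractions) + 1
    first = bisect_left(fractions, min_pos)       # first boundary >= min_pos
    last = bisect_right(fractions, max_pos) - 1   # last boundary <= max_pos
    return max(0, last - first + 1) / n_buckets

rows = sorted([5] * 99 + [10])
lo, hi, hist = build_histogram(rows, n_values=127)  # 127 is an assumption
p5 = position(5, lo, hi)  # 5 is the column minimum, so its fraction is 0.0

# Neither range carries a flag saying that the endpoint 5 is excluded.
sel_lt = range_selectivity(hist, 0.0, p5)   # a < 5 becomes [0.0, 0.0]
sel_gt = range_selectivity(hist, p5, 1.0)   # a > 5 becomes [0.0, 1.0]

print(f"a < 5: {100 * sel_lt:.2f}%")
print(f"a > 5: {100 * sel_gt:.2f}%")
```

Under these assumptions, 126 of the 127 boundaries equal 0.0, so the model yields 126/128 (98.44%) for a < 5 and 127/128 (99.22%) for a > 5, the same figures as in the description. The match suggests, but does not prove, that the estimates come from counting every bucket whose boundary coincides with the excluded endpoint.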
