[MDEV-31327] Histogram range selectivity estimates should be merged, not added. Created: 2023-05-22  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 11.1.0, 10.11.3, 11.0.1, 10.4.29, 10.5.20, 10.6.13, 10.9.6, 10.10
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Rex Johnston Assignee: Rex Johnston
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-31067 selectivity_from_histogram >1.0 for c... Closed
relates to MDEV-22360 Sufficient conditions for accurate ca... Stalled

 Description   

When the optimizer evaluates a join order and estimates the number of rows it produces, it loops through the column constraints and adds their selectivity estimates rather than merging them. When many single-point ranges fall into the same histogram bucket, the sum can exceed 1.0.
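The difference can be sketched outside the server. This is an illustrative sketch, not MariaDB source code: the function names (`estimate_additive`, `estimate_merged`) are invented for this example, and it assumes an equi-height histogram where every bucket holds 1/n_buckets of the rows.

```python
# Illustrative sketch (not MariaDB code): contrast adding vs merging
# per-range selectivity estimates when all ranges hit the same bucket.

def estimate_additive(n_ranges, n_buckets):
    # Current behaviour: each single-point range contributes 1/n_buckets,
    # and the contributions are simply summed.
    return n_ranges * (1.0 / n_buckets)

def estimate_merged(n_ranges, n_buckets, buckets_hit):
    # Merged behaviour: ranges that hit the same bucket cannot select more
    # rows than that bucket contains, so cap the estimate at the combined
    # fraction of the buckets actually hit.
    return min(n_ranges * (1.0 / n_buckets), buckets_hit * (1.0 / n_buckets))

# 129 distinct IN-list values, 128-bucket histogram, all in one bucket:
print(estimate_additive(129, 128))   # 1.0078125 -- impossible selectivity
print(estimate_merged(129, 128, 1))  # 0.0078125 -- one bucket's fraction
```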



 Comments   
Comment by Rex Johnston [ 2023-05-22 ]

drop table if exists bucket, one_third_of_bucket, t10;
create table bucket(a int);  # one histogram bucket's worth of rows
insert into bucket select 1 from seq_1_to_780;
 
create table one_third_of_bucket(a int);  # one-third of a bucket
insert into one_third_of_bucket select 1 from seq_1_to_260;
 
create table t10 (a int);
insert into t10 select 0 from bucket, seq_1_to_4;
 
insert into t10 select 86930 from one_third_of_bucket;
insert into t10 select 86940 from one_third_of_bucket;
insert into t10 select 86950 from one_third_of_bucket;
 
 
insert into t10 select 347830 from one_third_of_bucket;
insert into t10 select 347840 from one_third_of_bucket;
insert into t10 select 347850 from one_third_of_bucket;
 
 
insert into t10 select 347850 from bucket, seq_1_to_8;
 
insert into t10 select 652140 from one_third_of_bucket;
insert into t10 select 652150 from one_third_of_bucket;
insert into t10 select 652160 from one_third_of_bucket;
 
insert into t10 select 652160 from bucket, seq_1_to_52;
 
insert into t10 select 652170 from one_third_of_bucket;
insert into t10 select 652180 from one_third_of_bucket;
insert into t10 select 652190 from one_third_of_bucket;
 
insert into t10 select 652190 from bucket;
 
 
insert into t10 select 739130 from one_third_of_bucket;
insert into t10 select 739140 from one_third_of_bucket;
insert into t10 select 739150 from one_third_of_bucket;
 
insert into t10 select 739150 from bucket, seq_1_to_40;
 
 
insert into t10 select 782570 from one_third_of_bucket;
insert into t10 select 782580 from one_third_of_bucket;
insert into t10 select 782590 from one_third_of_bucket;
 
insert into t10 select 913000 from one_third_of_bucket;
insert into t10 select 913010 from one_third_of_bucket;
insert into t10 select 913020 from one_third_of_bucket;
 
insert into t10 select 913020 from bucket, seq_1_to_6;
 
insert into t10 select 913030 from one_third_of_bucket;
insert into t10 select 913040 from one_third_of_bucket;
insert into t10 select 913050 from one_third_of_bucket;
 
insert into t10 select 913050 from bucket, seq_1_to_8;
 
insert into t10 select  999980 from one_third_of_bucket;
insert into t10 select  999990 from one_third_of_bucket;
insert into t10 select 1000000 from one_third_of_bucket;
 
analyze table t10 persistent for all;

and

set optimizer_trace=1;
explain select * from t10 where a in (86930, 86931, 86932, 86933, 86934, 86935, 86936, 86937, 86938, 86939, 86940, 86941, 86942, 86943, 86944, 86945, 86946, 86947, 86948, 86949, 86950, 86951, 86952, 86953, 86954, 86955, 86956, 86957, 86958, 86959, 86960, 86961, 86962, 86963, 86964, 86965, 86966, 86967, 86968, 86969, 86960, 86971, 86972, 86973, 86974, 86975, 86976, 86977, 86978, 86979, 86980, 86981, 86982, 86983, 86984, 86985, 86986, 86987, 86988, 86989, 86990, 86991, 86992, 86993, 86994, 86995, 86996, 86997, 86998, 86999, 87000, 87001, 87002, 87003, 87004, 87005, 87006, 87007, 87008, 87009, 87010, 87011, 87012, 87013, 87014, 87015, 87016, 87017, 87018, 87019, 87020, 87021, 87022, 87023, 87024, 87025, 87026, 87027, 87028, 87029, 87030, 87031, 87032, 87033, 87034, 87035, 87036, 87037, 87038, 87039, 87040, 87041, 87042, 87043, 87044, 87045, 87046, 87047, 87048, 87049, 87050, 87051, 87052, 87053, 87054, 87055, 87056, 87057, 87058, 87059);
select json_detailed(json_extract(trace, '$**.selectivity_for_columns')) from information_schema.optimizer_trace;

produces

| [
    [
        {
            "column_name": "a",
            "ranges": 
            [
                "86930 <= a <= 86930",
                "86931 <= a <= 86931",
                "86932 <= a <= 86932",
                "86933 <= a <= 86933",
                "86934 <= a <= 86934",
                "86935 <= a <= 86935",
<SNIP>
                "87055 <= a <= 87055",
                "87056 <= a <= 87056",
                "87057 <= a <= 87057",
                "87058 <= a <= 87058",
                "87059 <= a <= 87059"
            ],
            "selectivity_from_histogram": 1.0078
        }
    ]
] |

All these values fit into one bucket, so the correct selectivity is one bucket's fraction: 1/128 = 0.0078125.

Comment by Sergei Petrunia [ 2023-05-22 ]

Agree.

For the record: the following happens in the code with the pushed variant for MDEV-31067:

The column has 30 distinct values, the histogram has 128 buckets.
All lookup constants fit into one bucket so we get 1/128th (as it is less than 1/30th) as the estimate for each a=const.
The IN-list has 129 elements so the estimate sums up to a value slightly greater than 1.
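The arithmetic above can be checked directly. This is a sketch: taking min(1/n_buckets, 1/n_distinct) as the per-constant estimate is inferred from the description in this comment, not lifted from the server source.

```python
# Sketch of the estimate described above (not MariaDB source code).
n_distinct = 30    # distinct values in the column
n_buckets  = 128   # histogram buckets
in_list    = 129   # elements in the IN-list

# Per-constant estimate: the smaller of the bucket fraction and the
# per-distinct-value fraction, as described in the comment above.
per_const = min(1.0 / n_buckets, 1.0 / n_distinct)   # 1/128

# Summing one such estimate per IN-list element overshoots 1.0:
total = in_list * per_const
print(per_const)   # 0.0078125
print(total)       # 1.0078125, matching the trace's 1.0078
```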

With the old code (before any patches for MDEV-31067) this example did not show the effect, because the bucket the values hit is "narrow", and the logic that the MDEV-31067 patch refers to as the "brave heuristic" makes the estimate smaller.

Comment by Sergei Petrunia [ 2023-05-22 ]

... that is, in order to observe the issue the column had to have 30 distinct values and the query had to ask for more values than that (129).

Comment by Julien Fritsch [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated for 6 weeks, it is time to move it back to Stalled.


Generated at Thu Feb 08 10:23:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.