MariaDB Server / MDEV-31327

Histogram range selectivity estimates should be merged, not added.

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.0.1, 11.1.0, 10.4.29, 10.5.20, 10.6.13, 10.9.6, 10.11.3, 10.10 (EOL)
    • Fix Version/s: 10.4 (EOL)
    • Component/s: Optimizer
    • Labels: None

    Description

      When the optimizer evaluates a join order and estimates how many rows it will produce, it loops over a column's range constraints and adds their selectivity estimates instead of merging them, so ranges that fall into the same histogram bucket are counted repeatedly.
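
      A minimal sketch of the difference, with hypothetical numbers: suppose three single-value ranges all land in the same histogram bucket, and one bucket covers 1/128 of the table's rows. Adding counts the shared bucket once per range; merging counts it once.

      select 3 * (1e0/128) as added_selectivity,   -- 0.0234375: the bucket counted three times
             1e0/128       as merged_selectivity;  -- 0.0078125: the bucket counted once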

    Attachments

    Issue Links

    Activity

            Johnston Rex Johnston added a comment -

            drop table if exists bucket, one_third_of_bucket, t10;
            create table bucket(a int);  # one histogram bucket's worth of rows
            insert into bucket select 1 from seq_1_to_780;
             
            create table one_third_of_bucket(a int);  # one third of a bucket
            insert into one_third_of_bucket select 1 from seq_1_to_260;
             
            create table t10 (a int);
            insert into t10 select 0 from bucket, seq_1_to_4;
             
            insert into t10 select 86930 from one_third_of_bucket;
            insert into t10 select 86940 from one_third_of_bucket;
            insert into t10 select 86950 from one_third_of_bucket;
             
             
            insert into t10 select 347830 from one_third_of_bucket;
            insert into t10 select 347840 from one_third_of_bucket;
            insert into t10 select 347850 from one_third_of_bucket;
             
             
            insert into t10 select 347850 from bucket, seq_1_to_8;
             
            insert into t10 select 652140 from one_third_of_bucket;
            insert into t10 select 652150 from one_third_of_bucket;
            insert into t10 select 652160 from one_third_of_bucket;
             
            insert into t10 select 652160 from bucket, seq_1_to_52;
             
            insert into t10 select 652170 from one_third_of_bucket;
            insert into t10 select 652180 from one_third_of_bucket;
            insert into t10 select 652190 from one_third_of_bucket;
             
            insert into t10 select 652190 from bucket;
             
             
            insert into t10 select 739130 from one_third_of_bucket;
            insert into t10 select 739140 from one_third_of_bucket;
            insert into t10 select 739150 from one_third_of_bucket;
             
            insert into t10 select 739150 from bucket, seq_1_to_40;
             
             
            insert into t10 select 782570 from one_third_of_bucket;
            insert into t10 select 782580 from one_third_of_bucket;
            insert into t10 select 782590 from one_third_of_bucket;
             
            insert into t10 select 913000 from one_third_of_bucket;
            insert into t10 select 913010 from one_third_of_bucket;
            insert into t10 select 913020 from one_third_of_bucket;
             
            insert into t10 select 913020 from bucket, seq_1_to_6;
             
            insert into t10 select 913030 from one_third_of_bucket;
            insert into t10 select 913040 from one_third_of_bucket;
            insert into t10 select 913050 from one_third_of_bucket;
             
            insert into t10 select 913050 from bucket, seq_1_to_8;
             
            insert into t10 select  999980 from one_third_of_bucket;
            insert into t10 select  999990 from one_third_of_bucket;
            insert into t10 select 1000000 from one_third_of_bucket;
             
            analyze table t10 persistent for all;
            

            and then:

            set optimizer_trace=1;
            explain select * from t10 where a in (86930, 86931, 86932, 86933, 86934, 86935, 86936, 86937, 86938, 86939, 86940, 86941, 86942, 86943, 86944, 86945, 86946, 86947, 86948, 86949, 86950, 86951, 86952, 86953, 86954, 86955, 86956, 86957, 86958, 86959, 86960, 86961, 86962, 86963, 86964, 86965, 86966, 86967, 86968, 86969, 86960, 86971, 86972, 86973, 86974, 86975, 86976, 86977, 86978, 86979, 86980, 86981, 86982, 86983, 86984, 86985, 86986, 86987, 86988, 86989, 86990, 86991, 86992, 86993, 86994, 86995, 86996, 86997, 86998, 86999, 87000, 87001, 87002, 87003, 87004, 87005, 87006, 87007, 87008, 87009, 87010, 87011, 87012, 87013, 87014, 87015, 87016, 87017, 87018, 87019, 87020, 87021, 87022, 87023, 87024, 87025, 87026, 87027, 87028, 87029, 87030, 87031, 87032, 87033, 87034, 87035, 87036, 87037, 87038, 87039, 87040, 87041, 87042, 87043, 87044, 87045, 87046, 87047, 87048, 87049, 87050, 87051, 87052, 87053, 87054, 87055, 87056, 87057, 87058, 87059);
            select json_detailed(json_extract(trace, '$**.selectivity_for_columns')) from information_schema.optimizer_trace;
            

            produces

            | [
                [
                    {
                        "column_name": "a",
                        "ranges": 
                        [
                            "86930 <= a <= 86930",
                            "86931 <= a <= 86931",
                            "86932 <= a <= 86932",
                            "86933 <= a <= 86933",
                            "86934 <= a <= 86934",
                            "86935 <= a <= 86935",
            <SNIP>
                            "87055 <= a <= 87055",
                            "87056 <= a <= 87056",
                            "87057 <= a <= 87057",
                            "87058 <= a <= 87058",
                            "87059 <= a <= 87059"
                        ],
                        "selectivity_from_histogram": 1.0078
                    }
                ]
            ] |
            
            

            All these values fit into one bucket. The correct selectivity is 0.0078125.
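
            For reference, 0.0078125 is exactly one bucket's share of the 128-bucket histogram, and the reported 1.0078 is that share summed over all 129 ranges:

            select 1e0/128 as merged_estimate,            -- 0.0078125, the correct value
                   round(129 * (1e0/128), 4) as reported; -- 1.0078, what the trace shows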


            psergei Sergei Petrunia added a comment -

            Agree.

            For the record: the following happens in the code with the pushed variant for MDEV-31067:

            The column has 30 distinct values and the histogram has 128 buckets.
            All lookup constants fall into one bucket, so each a=const gets an estimate of 1/128 (as it is less than 1/30).
            The IN-list has 129 elements, so the estimates sum to slightly more than 1; see the sketch below.
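
            A sketch of that arithmetic in SQL (just the numbers above, not the optimizer code): 1e0/128 is one bucket's share, 1e0/30 the share of one distinct value.

            select least(1e0/128, 1e0/30) as estimate_per_constant,        -- 0.0078125: the bucket share wins
                   round(129 * least(1e0/128, 1e0/30), 4) as summed_total; -- 1.0078, slightly above 1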

            With the old code (before any patches for MDEV-31067), this example did not show the effect, because the bucket the values hit is "narrow", and the logic that the MDEV-31067 patch calls the "brave heuristic" makes the estimate smaller.


            psergei Sergei Petrunia added a comment -

            ... that is, to observe the issue the column had to have 30 distinct values and the query had to ask for more values than that (129).


            julien.fritsch Julien Fritsch added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.

            JIraAutomate JiraAutomate added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.


            People

              Assignee: Rex Johnston
              Reporter: Rex Johnston
              Votes: 0
              Watchers: 3

