MariaDB Server / MDEV-31327

Histogram range selectivity estimates should be merged, not added.

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.0.1, 11.1.0, 10.4.29, 10.5.20, 10.6.13, 10.9.6, 10.11.3, 10.10 (EOL)
    • Fix Version/s: 10.4 (EOL)
    • Component/s: Optimizer
    • Labels: None

    Description

      When the optimizer evaluates a join order and estimates how many rows it will produce, it loops over a column's range constraints and adds their selectivity estimates instead of merging them, so ranges that fall into the same histogram bucket are counted repeatedly.
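
      A minimal sketch of the difference, with hypothetical numbers: suppose three single-value ranges all land in the same histogram bucket, and one bucket covers 1/128 of the table's rows. Adding counts the shared bucket once per range; merging counts it once.

      select 3 * (1e0/128) as added_selectivity,   -- 0.0234375: the bucket counted three times
             1e0/128       as merged_selectivity;  -- 0.0078125: the bucket counted once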

    Attachments

    Issue Links

    Activity

            Johnston Rex Johnston added a comment -

            drop table if exists bucket, one_third_of_bucket, t10;
            create table bucket(a int);  # one histogram bucket's worth of rows
            insert into bucket select 1 from seq_1_to_780;
             
            create table one_third_of_bucket(a int);  # one third of a bucket
            insert into one_third_of_bucket select 1 from seq_1_to_260;
             
            create table t10 (a int);
            insert into t10 select 0 from bucket, seq_1_to_4;
             
            insert into t10 select 86930 from one_third_of_bucket;
            insert into t10 select 86940 from one_third_of_bucket;
            insert into t10 select 86950 from one_third_of_bucket;
             
             
            insert into t10 select 347830 from one_third_of_bucket;
            insert into t10 select 347840 from one_third_of_bucket;
            insert into t10 select 347850 from one_third_of_bucket;
             
             
            insert into t10 select 347850 from bucket, seq_1_to_8;
             
            insert into t10 select 652140 from one_third_of_bucket;
            insert into t10 select 652150 from one_third_of_bucket;
            insert into t10 select 652160 from one_third_of_bucket;
             
            insert into t10 select 652160 from bucket, seq_1_to_52;
             
            insert into t10 select 652170 from one_third_of_bucket;
            insert into t10 select 652180 from one_third_of_bucket;
            insert into t10 select 652190 from one_third_of_bucket;
             
            insert into t10 select 652190 from bucket;
             
             
            insert into t10 select 739130 from one_third_of_bucket;
            insert into t10 select 739140 from one_third_of_bucket;
            insert into t10 select 739150 from one_third_of_bucket;
             
            insert into t10 select 739150 from bucket, seq_1_to_40;
             
             
            insert into t10 select 782570 from one_third_of_bucket;
            insert into t10 select 782580 from one_third_of_bucket;
            insert into t10 select 782590 from one_third_of_bucket;
             
            insert into t10 select 913000 from one_third_of_bucket;
            insert into t10 select 913010 from one_third_of_bucket;
            insert into t10 select 913020 from one_third_of_bucket;
             
            insert into t10 select 913020 from bucket, seq_1_to_6;
             
            insert into t10 select 913030 from one_third_of_bucket;
            insert into t10 select 913040 from one_third_of_bucket;
            insert into t10 select 913050 from one_third_of_bucket;
             
            insert into t10 select 913050 from bucket, seq_1_to_8;
             
            insert into t10 select  999980 from one_third_of_bucket;
            insert into t10 select  999990 from one_third_of_bucket;
            insert into t10 select 1000000 from one_third_of_bucket;
             
            analyze table t10 persistent for all;
            

            and then:

            set optimizer_trace=1;
            explain select * from t10 where a in (86930, 86931, 86932, 86933, 86934, 86935, 86936, 86937, 86938, 86939, 86940, 86941, 86942, 86943, 86944, 86945, 86946, 86947, 86948, 86949, 86950, 86951, 86952, 86953, 86954, 86955, 86956, 86957, 86958, 86959, 86960, 86961, 86962, 86963, 86964, 86965, 86966, 86967, 86968, 86969, 86960, 86971, 86972, 86973, 86974, 86975, 86976, 86977, 86978, 86979, 86980, 86981, 86982, 86983, 86984, 86985, 86986, 86987, 86988, 86989, 86990, 86991, 86992, 86993, 86994, 86995, 86996, 86997, 86998, 86999, 87000, 87001, 87002, 87003, 87004, 87005, 87006, 87007, 87008, 87009, 87010, 87011, 87012, 87013, 87014, 87015, 87016, 87017, 87018, 87019, 87020, 87021, 87022, 87023, 87024, 87025, 87026, 87027, 87028, 87029, 87030, 87031, 87032, 87033, 87034, 87035, 87036, 87037, 87038, 87039, 87040, 87041, 87042, 87043, 87044, 87045, 87046, 87047, 87048, 87049, 87050, 87051, 87052, 87053, 87054, 87055, 87056, 87057, 87058, 87059);
            select json_detailed(json_extract(trace, '$**.selectivity_for_columns')) from information_schema.optimizer_trace;
            

            produces

            | [
                [
                    {
                        "column_name": "a",
                        "ranges": 
                        [
                            "86930 <= a <= 86930",
                            "86931 <= a <= 86931",
                            "86932 <= a <= 86932",
                            "86933 <= a <= 86933",
                            "86934 <= a <= 86934",
                            "86935 <= a <= 86935",
            <SNIP>
                            "87055 <= a <= 87055",
                            "87056 <= a <= 87056",
                            "87057 <= a <= 87057",
                            "87058 <= a <= 87058",
                            "87059 <= a <= 87059"
                        ],
                        "selectivity_from_histogram": 1.0078
                    }
                ]
            ] |
            
            

            All these values fit into one bucket. The correct selectivity is 0.0078125.
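
            For reference, 0.0078125 is exactly one bucket's share of the 128-bucket histogram, and the reported 1.0078 is that share summed over all 129 ranges:

            select 1e0/128 as merged_estimate,            -- 0.0078125, the correct value
                   round(129 * (1e0/128), 4) as reported; -- 1.0078, what the trace shows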


            psergei Sergei Petrunia added a comment -

            Agree.

            For the record: the following happens in the code with the pushed variant for MDEV-31067:

            The column has 30 distinct values and the histogram has 128 buckets.
            All lookup constants fall into one bucket, so each a=const gets an estimate of 1/128 (as it is less than 1/30).
            The IN-list has 129 elements, so the estimates sum to slightly more than 1; see the sketch below.
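
            A sketch of that arithmetic in SQL (just the numbers above, not the optimizer code): 1e0/128 is one bucket's share, 1e0/30 the share of one distinct value.

            select least(1e0/128, 1e0/30) as estimate_per_constant,        -- 0.0078125: the bucket share wins
                   round(129 * least(1e0/128, 1e0/30), 4) as summed_total; -- 1.0078, slightly above 1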

            With the old code (before any patches for MDEV-31067), this example did not show the effect, because the bucket the values hit is "narrow", and the logic that the MDEV-31067 patch calls the "brave heuristic" makes the estimate smaller.


            psergei Sergei Petrunia added a comment -

            ... that is, to observe the issue the column had to have 30 distinct values and the query had to ask for more values than that (129).


            julien.fritsch Julien Fritsch added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.

            JIraAutomate JiraAutomate added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.


            People

              Assignee: Rex Johnston
              Reporter: Rex Johnston
              Votes: 0
              Watchers: 3

