Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33697

Improve selectivity computations for multi-part keys

Details

    Description

      Selectivity computations can produce poor estimates when multi-part keys are involved.

      Motivating example:

      create table t1 (
        id int auto_increment primary key, 
        a int, 
        b int, 
        f varchar(256) default 'xxx', 
        index idx1(a), 
        index idx2(b)
      );
      insert into t1(a,b) select  rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
      create index idx3 on t1(a,b);
      # Use query plans that are easier to read:
      set optimizer_switch='rowid_filter=off';
      

      First, let's use single-column indexes only:

      explain extended select * from t1 ignore index (idx3)
      where a between 50 and 80 and b between 33 and 41;
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                              |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
      |    1 | SIMPLE      | t1    | range | idx1,idx2     | idx1 | 5       | NULL | 149  |    17.50 | Using index condition; Using where |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
      

      The estimated #rows in the output is 149 * 0.175= 26 rows.
      The estimates are obtained from

                "range_scan_alternatives": [
                  {
                    "index": "idx1",
                    "ranges": ["(50) <= (a) <= (80)"],
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 149,
                    "cost": 0.1537364,
                    "chosen": true
                  },
                  {
                    "index": "idx2",
                    "ranges": ["(33) <= (b) <= (41)"],
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 175,
                    "cost": 0.17956584,
                    "chosen": false,
                    "cause": "cost"
                  }
      

      Now, with three indexes:

      explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
      

      +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
      | id   | select_type | table | type  | possible_keys  | key  | key_len | ref  | rows | filtered | Extra                 |
      +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
      |    1 | SIMPLE      | t1    | range | idx1,idx2,idx3 | idx3 | 10      | NULL | 144  |   100.00 | Using index condition |
      +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
      

      It is using this plan

                        "chosen_range_access_summary": {
                          "range_access_plan": {
                            "type": "range_scan",
                            "index": "idx3",
                            "rows": 144,
                            "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                          },
                          "rows_for_plan": 144,
                          "cost_for_plan": 0.1487692,
                          "chosen": true
                        }
      

      It has these estimate numbers at its disposal:

                          "range_scan_alternatives": [
                            {
                              "index": "idx1",
                              "ranges": ["(50) <= (a) <= (80)"],
                              "rowid_ordered": false,
                              "using_mrr": false,
                              "index_only": false,
                              "rows": 149,
                              "cost": 0.1537364,
                              "chosen": true
                            },
                            {
                              "index": "idx2",
                              "ranges": ["(33) <= (b) <= (41)"],
                              "rowid_ordered": false,
                              "using_mrr": false,
                              "index_only": false,
                              "rows": 175,
                              "cost": 0.17956584,
                              "chosen": false,
                              "cause": "cost"
                            },
                            {
                              "index": "idx3",
                              "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                              "rowid_ordered": false,
                              "using_mrr": false,
                              "index_only": false,
                              "rows": 144,
                              "cost": 0.1487692,
                              "chosen": true
                            }
                          ],
      

      and the point is that it could come up with a tighter estimate of output bound by taking into account the estimates on a and b separately.

      Igor's point is that "keyparts >=1 do not matter for non-equalities".

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            Description Selectivity computations can produce poor estimates when multi-part keys are involved.

            Motivating example:

            {code:sql}
            create table t1 (
              id int auto_increment primary key,
              a int,
              b int,
              f varchar(256) default 'xxx',
              index idx1(a),
              index idx2(b)
            );
            insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
            create index idx3 on t1(a,b);
            # Use query plans that are easier to read:
            set optimizer_switch='rowid_filter=off';
            {code}

            {code:sql}
            explain extended select * from t1 ignore index (idx3)
            where a between 50 and 80 and b between 33 and 41;
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            {code}

            The estimates are obtained from
            {code:js}
                      "range_scan_alternatives": [
                        {
                          "index": "idx1",
                          "ranges": ["(50) <= (a) <= (80)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 149,
                          "cost": 0.1537364,
                          "chosen": true
                        },
                        {
                          "index": "idx2",
                          "ranges": ["(33) <= (b) <= (41)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 175,
                          "cost": 0.17956584,
                          "chosen": false,
                          "cause": "cost"
                        }
            {code}

            Now, with three indexes:
            {code:sql}
            explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
            {code}
            {code}
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            {code}

            It is using this plan
            {code:js}
                              "chosen_range_access_summary": {
                                "range_access_plan": {
                                  "type": "range_scan",
                                  "index": "idx3",
                                  "rows": 144,
                                  "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                                },
                                "rows_for_plan": 144,
                                "cost_for_plan": 0.1487692,
                                "chosen": true
                              }
            {code}

            It has these estimate numbers at its disposal:
            {code:js}
                                "range_scan_alternatives": [
                                  {
                                    "index": "idx1",
                                    "ranges": ["(50) <= (a) <= (80)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 149,
                                    "cost": 0.1537364,
                                    "chosen": true
                                  },
                                  {
                                    "index": "idx2",
                                    "ranges": ["(33) <= (b) <= (41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 175,
                                    "cost": 0.17956584,
                                    "chosen": false,
                                    "cause": "cost"
                                  },
                                  {
                                    "index": "idx3",
                                    "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 144,
                                    "cost": 0.1487692,
                                    "chosen": true
                                  }
                                ],
            {code}

            and the point is that it could come up with a tighter estimate of output bound by taking
            into account the estimates on {{a} and {{b}} separately.

            Igor's point is that "keyparts >=1 do not matter for non-equalities".



            Selectivity computations can produce poor estimates when multi-part keys are involved.

            Motivating example:

            {code:sql}
            create table t1 (
              id int auto_increment primary key,
              a int,
              b int,
              f varchar(256) default 'xxx',
              index idx1(a),
              index idx2(b)
            );
            insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
            create index idx3 on t1(a,b);
            # Use query plans that are easier to read:
            set optimizer_switch='rowid_filter=off';
            {code}

            First, let's use single-column indexes only:
            {code:sql}
            explain extended select * from t1 ignore index (idx3)
            where a between 50 and 80 and b between 33 and 41;
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            {code}

            The estimates are obtained from
            {code:js}
                      "range_scan_alternatives": [
                        {
                          "index": "idx1",
                          "ranges": ["(50) <= (a) <= (80)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 149,
                          "cost": 0.1537364,
                          "chosen": true
                        },
                        {
                          "index": "idx2",
                          "ranges": ["(33) <= (b) <= (41)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 175,
                          "cost": 0.17956584,
                          "chosen": false,
                          "cause": "cost"
                        }
            {code}

            Now, with three indexes:
            {code:sql}
            explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
            {code}
            {code}
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            {code}

            It is using this plan
            {code:js}
                              "chosen_range_access_summary": {
                                "range_access_plan": {
                                  "type": "range_scan",
                                  "index": "idx3",
                                  "rows": 144,
                                  "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                                },
                                "rows_for_plan": 144,
                                "cost_for_plan": 0.1487692,
                                "chosen": true
                              }
            {code}

            It has these estimate numbers at its disposal:
            {code:js}
                                "range_scan_alternatives": [
                                  {
                                    "index": "idx1",
                                    "ranges": ["(50) <= (a) <= (80)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 149,
                                    "cost": 0.1537364,
                                    "chosen": true
                                  },
                                  {
                                    "index": "idx2",
                                    "ranges": ["(33) <= (b) <= (41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 175,
                                    "cost": 0.17956584,
                                    "chosen": false,
                                    "cause": "cost"
                                  },
                                  {
                                    "index": "idx3",
                                    "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 144,
                                    "cost": 0.1487692,
                                    "chosen": true
                                  }
                                ],
            {code}

            and the point is that it could come up with a tighter estimate of output bound by taking
            into account the estimates on {{a} and {{b}} separately.

            Igor's point is that "keyparts >=1 do not matter for non-equalities".



            psergei Sergei Petrunia made changes -
            Description Selectivity computations can produce poor estimates when multi-part keys are involved.

            Motivating example:

            {code:sql}
            create table t1 (
              id int auto_increment primary key,
              a int,
              b int,
              f varchar(256) default 'xxx',
              index idx1(a),
              index idx2(b)
            );
            insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
            create index idx3 on t1(a,b);
            # Use query plans that are easier to read:
            set optimizer_switch='rowid_filter=off';
            {code}

            First, let's use single-column indexes only:
            {code:sql}
            explain extended select * from t1 ignore index (idx3)
            where a between 50 and 80 and b between 33 and 41;
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            {code}

            The estimates are obtained from
            {code:js}
                      "range_scan_alternatives": [
                        {
                          "index": "idx1",
                          "ranges": ["(50) <= (a) <= (80)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 149,
                          "cost": 0.1537364,
                          "chosen": true
                        },
                        {
                          "index": "idx2",
                          "ranges": ["(33) <= (b) <= (41)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 175,
                          "cost": 0.17956584,
                          "chosen": false,
                          "cause": "cost"
                        }
            {code}

            Now, with three indexes:
            {code:sql}
            explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
            {code}
            {code}
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            {code}

            It is using this plan
            {code:js}
                              "chosen_range_access_summary": {
                                "range_access_plan": {
                                  "type": "range_scan",
                                  "index": "idx3",
                                  "rows": 144,
                                  "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                                },
                                "rows_for_plan": 144,
                                "cost_for_plan": 0.1487692,
                                "chosen": true
                              }
            {code}

            It has these estimate numbers at its disposal:
            {code:js}
                                "range_scan_alternatives": [
                                  {
                                    "index": "idx1",
                                    "ranges": ["(50) <= (a) <= (80)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 149,
                                    "cost": 0.1537364,
                                    "chosen": true
                                  },
                                  {
                                    "index": "idx2",
                                    "ranges": ["(33) <= (b) <= (41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 175,
                                    "cost": 0.17956584,
                                    "chosen": false,
                                    "cause": "cost"
                                  },
                                  {
                                    "index": "idx3",
                                    "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 144,
                                    "cost": 0.1487692,
                                    "chosen": true
                                  }
                                ],
            {code}

            and the point is that it could come up with a tighter estimate of output bound by taking
            into account the estimates on {{a} and {{b}} separately.

            Igor's point is that "keyparts >=1 do not matter for non-equalities".



            Selectivity computations can produce poor estimates when multi-part keys are involved.

            Motivating example:

            {code:sql}
            create table t1 (
              id int auto_increment primary key,
              a int,
              b int,
              f varchar(256) default 'xxx',
              index idx1(a),
              index idx2(b)
            );
            insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
            create index idx3 on t1(a,b);
            # Use query plans that are easier to read:
            set optimizer_switch='rowid_filter=off';
            {code}

            First, let's use single-column indexes only:
            {code:sql}
            explain extended select * from t1 ignore index (idx3)
            where a between 50 and 80 and b between 33 and 41;
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            {code}
            The estimated #rows in the output is 149 * 0.175= 26 rows.
            The estimates are obtained from
            {code:js}
                      "range_scan_alternatives": [
                        {
                          "index": "idx1",
                          "ranges": ["(50) <= (a) <= (80)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 149,
                          "cost": 0.1537364,
                          "chosen": true
                        },
                        {
                          "index": "idx2",
                          "ranges": ["(33) <= (b) <= (41)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 175,
                          "cost": 0.17956584,
                          "chosen": false,
                          "cause": "cost"
                        }
            {code}

            Now, with three indexes:
            {code:sql}
            explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
            {code}
            {code}
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            {code}

            It is using this plan
            {code:js}
                              "chosen_range_access_summary": {
                                "range_access_plan": {
                                  "type": "range_scan",
                                  "index": "idx3",
                                  "rows": 144,
                                  "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                                },
                                "rows_for_plan": 144,
                                "cost_for_plan": 0.1487692,
                                "chosen": true
                              }
            {code}

            It has these estimate numbers at its disposal:
            {code:js}
                                "range_scan_alternatives": [
                                  {
                                    "index": "idx1",
                                    "ranges": ["(50) <= (a) <= (80)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 149,
                                    "cost": 0.1537364,
                                    "chosen": true
                                  },
                                  {
                                    "index": "idx2",
                                    "ranges": ["(33) <= (b) <= (41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 175,
                                    "cost": 0.17956584,
                                    "chosen": false,
                                    "cause": "cost"
                                  },
                                  {
                                    "index": "idx3",
                                    "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 144,
                                    "cost": 0.1487692,
                                    "chosen": true
                                  }
                                ],
            {code}

            and the point is that it could come up with a tighter estimate of output bound by taking
            into account the estimates on {{a} and {{b}} separately.

            Igor's point is that "keyparts >=1 do not matter for non-equalities".



            psergei Sergei Petrunia made changes -
            Description Selectivity computations can produce poor estimates when multi-part keys are involved.

            Motivating example:

            {code:sql}
            create table t1 (
              id int auto_increment primary key,
              a int,
              b int,
              f varchar(256) default 'xxx',
              index idx1(a),
              index idx2(b)
            );
            insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
            create index idx3 on t1(a,b);
            # Use query plans that are easier to read:
            set optimizer_switch='rowid_filter=off';
            {code}

            First, let's use single-column indexes only:
            {code:sql}
            explain extended select * from t1 ignore index (idx3)
            where a between 50 and 80 and b between 33 and 41;
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            {code}
            The estimated #rows in the output is 149 * 0.175= 26 rows.
            The estimates are obtained from
            {code:js}
                      "range_scan_alternatives": [
                        {
                          "index": "idx1",
                          "ranges": ["(50) <= (a) <= (80)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 149,
                          "cost": 0.1537364,
                          "chosen": true
                        },
                        {
                          "index": "idx2",
                          "ranges": ["(33) <= (b) <= (41)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 175,
                          "cost": 0.17956584,
                          "chosen": false,
                          "cause": "cost"
                        }
            {code}

            Now, with three indexes:
            {code:sql}
            explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
            {code}
            {code}
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            {code}

            It is using this plan
            {code:js}
                              "chosen_range_access_summary": {
                                "range_access_plan": {
                                  "type": "range_scan",
                                  "index": "idx3",
                                  "rows": 144,
                                  "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                                },
                                "rows_for_plan": 144,
                                "cost_for_plan": 0.1487692,
                                "chosen": true
                              }
            {code}

            It has these estimate numbers at its disposal:
            {code:js}
                                "range_scan_alternatives": [
                                  {
                                    "index": "idx1",
                                    "ranges": ["(50) <= (a) <= (80)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 149,
                                    "cost": 0.1537364,
                                    "chosen": true
                                  },
                                  {
                                    "index": "idx2",
                                    "ranges": ["(33) <= (b) <= (41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 175,
                                    "cost": 0.17956584,
                                    "chosen": false,
                                    "cause": "cost"
                                  },
                                  {
                                    "index": "idx3",
                                    "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 144,
                                    "cost": 0.1487692,
                                    "chosen": true
                                  }
                                ],
            {code}

            and the point is that it could come up with a tighter estimate of output bound by taking
            into account the estimates on {{a} and {{b}} separately.

            Igor's point is that "keyparts >=1 do not matter for non-equalities".



            Selectivity computations can produce poor estimates when multi-part keys are involved.

            Motivating example:

            {code:sql}
            create table t1 (
              id int auto_increment primary key,
              a int,
              b int,
              f varchar(256) default 'xxx',
              index idx1(a),
              index idx2(b)
            );
            insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
            create index idx3 on t1(a,b);
            # Use query plans that are easier to read:
            set optimizer_switch='rowid_filter=off';
            {code}

            First, let's use single-column indexes only:
            {code:sql}
            explain extended select * from t1 ignore index (idx3)
            where a between 50 and 80 and b between 33 and 41;
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
            {code}
            The estimated #rows in the output is 149 * 0.175= 26 rows.
            The estimates are obtained from
            {code:js}
                      "range_scan_alternatives": [
                        {
                          "index": "idx1",
                          "ranges": ["(50) <= (a) <= (80)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 149,
                          "cost": 0.1537364,
                          "chosen": true
                        },
                        {
                          "index": "idx2",
                          "ranges": ["(33) <= (b) <= (41)"],
                          "rowid_ordered": false,
                          "using_mrr": false,
                          "index_only": false,
                          "rows": 175,
                          "cost": 0.17956584,
                          "chosen": false,
                          "cause": "cost"
                        }
            {code}

            Now, with three indexes:
            {code:sql}
            explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
            {code}
            {code}
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            | 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
            +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
            {code}

            It is using this plan
            {code:js}
                              "chosen_range_access_summary": {
                                "range_access_plan": {
                                  "type": "range_scan",
                                  "index": "idx3",
                                  "rows": 144,
                                  "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                                },
                                "rows_for_plan": 144,
                                "cost_for_plan": 0.1487692,
                                "chosen": true
                              }
            {code}

            It has these estimate numbers at its disposal:
            {code:js}
                                "range_scan_alternatives": [
                                  {
                                    "index": "idx1",
                                    "ranges": ["(50) <= (a) <= (80)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 149,
                                    "cost": 0.1537364,
                                    "chosen": true
                                  },
                                  {
                                    "index": "idx2",
                                    "ranges": ["(33) <= (b) <= (41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 175,
                                    "cost": 0.17956584,
                                    "chosen": false,
                                    "cause": "cost"
                                  },
                                  {
                                    "index": "idx3",
                                    "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                                    "rowid_ordered": false,
                                    "using_mrr": false,
                                    "index_only": false,
                                    "rows": 144,
                                    "cost": 0.1487692,
                                    "chosen": true
                                  }
                                ],
            {code}

            and the point is that it could come up with a tighter estimate of output bound by taking into account the estimates on {{a}} and {{b}} separately.

            Igor's point is that "keyparts >=1 do not matter for non-equalities".



            psergei Sergei Petrunia made changes -
            Fix Version/s 11.0 [ 28320 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            Labels selectivity-after-11.0

            bb-11.6-MDEV-33697

            psergei Sergei Petrunia added a comment - bb-11.6- MDEV-33697
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.0 [ 28320 ]
            psergei Sergei Petrunia made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Michael Widenius [ monty ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            psergei Sergei Petrunia made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            Got review input from monty, addressed it.

            psergei Sergei Petrunia added a comment - Got review input from monty , addressed it.
            psergei Sergei Petrunia made changes -
            Assignee Michael Widenius [ monty ] Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Status Stalled [ 10000 ] In Testing [ 10301 ]

            commit ce504779b16ff8274fb23c3c2b39ec2538120c38 (HEAD -> bb-11.6-MDEV-33697, origin/bb-11.6-MDEV-33697)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Tue Jun 4 09:53:27 2024 +0300
             
                MDEV-33697: Improve selectivity computations for multi-part keys
            

            psergei Sergei Petrunia added a comment - commit ce504779b16ff8274fb23c3c2b39ec2538120c38 (HEAD -> bb-11.6-MDEV-33697, origin/bb-11.6-MDEV-33697) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Jun 4 09:53:27 2024 +0300   MDEV-33697: Improve selectivity computations for multi-part keys
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Lena Startseva [ JIRAUSER50478 ]
            psergei Sergei Petrunia made changes -
            Affects Version/s 11.0 [ 28320 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            Assignee Lena Startseva [ JIRAUSER50478 ] Sergei Petrunia [ psergey ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Labels selectivity-after-11.0 Preview_11.6 selectivity-after-11.0
            serg Sergei Golubchik made changes -
            Fix Version/s 11.7 [ 29815 ]
            Fix Version/s 11.6 [ 29515 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.7 [ 29815 ]
            psergei Sergei Petrunia made changes -
            Labels Preview_11.6 selectivity-after-11.0 Preview_11.6 optimizer-feature selectivity-after-11.0

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.