[MDEV-33697] Improve selectivity computations for multi-part keys - Jira

Details

Type: Task
Status: In Testing (View Workflow)
Priority: Critical
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer
Labels:

Description

Selectivity computations can produce poor estimates when multi-part keys are involved.

Motivating example:

create table t1 (

  id int auto_increment primary key,

  a int,

  b int,

  f varchar(256) default 'xxx',

  index idx1(a),

  index idx2(b)

);

insert into t1(a,b) select  rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;

create index idx3 on t1(a,b);

# Use query plans that are easier to read:

set optimizer_switch='rowid_filter=off';

First, let's use single-column indexes only:

explain extended select * from t1 ignore index (idx3)

where a between 50 and 80 and b between 33 and 41;

+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+

| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                              |

+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+

|    1 | SIMPLE      | t1    | range | idx1,idx2     | idx1 | 5       | NULL | 149  |    17.50 | Using index condition; Using where |

+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+

The estimated #rows in the output is 149 * 0.175= 26 rows.
The estimates are obtained from

          "range_scan_alternatives": [

              "index": "idx1",

              "ranges": ["(50) <= (a) <= (80)"],

              "rowid_ordered": false,

              "using_mrr": false,

              "index_only": false,

              "rows": 149,

              "cost": 0.1537364,

              "chosen": true

},

              "index": "idx2",

              "ranges": ["(33) <= (b) <= (41)"],

              "rowid_ordered": false,

              "using_mrr": false,

              "index_only": false,

              "rows": 175,

              "cost": 0.17956584,

              "chosen": false,

              "cause": "cost"

Now, with three indexes:

explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;

+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+

| id   | select_type | table | type  | possible_keys  | key  | key_len | ref  | rows | filtered | Extra                 |

+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+

|    1 | SIMPLE      | t1    | range | idx1,idx2,idx3 | idx3 | 10      | NULL | 144  |   100.00 | Using index condition |

+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+

It is using this plan

                  "chosen_range_access_summary": {

                    "range_access_plan": {

                      "type": "range_scan",

                      "index": "idx3",

                      "rows": 144,

                      "ranges": ["(50,33) <= (a,b) <= (80,41)"]

},

                    "rows_for_plan": 144,

                    "cost_for_plan": 0.1487692,

                    "chosen": true

It has these estimate numbers at its disposal:

                    "range_scan_alternatives": [

                        "index": "idx1",

                        "ranges": ["(50) <= (a) <= (80)"],

                        "rowid_ordered": false,

                        "using_mrr": false,

                        "index_only": false,

                        "rows": 149,

                        "cost": 0.1537364,

                        "chosen": true

},

                        "index": "idx2",

                        "ranges": ["(33) <= (b) <= (41)"],

                        "rowid_ordered": false,

                        "using_mrr": false,

                        "index_only": false,

                        "rows": 175,

                        "cost": 0.17956584,

                        "chosen": false,

                        "cause": "cost"

},

                        "index": "idx3",

                        "ranges": ["(50,33) <= (a,b) <= (80,41)"],

                        "rowid_ordered": false,

                        "using_mrr": false,

                        "index_only": false,

                        "rows": 144,

                        "cost": 0.1487692,

                        "chosen": true

],

and the point is that it could come up with a tighter estimate of output bound by taking into account the estimates on a and b separately.

Igor's point is that "keyparts >=1 do not matter for non-equalities".

Attachments

Issue Links

is part of

MDEV-23707 Fix condition selectivity computation for join prefixes

Stalled

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia created issue - 2024-03-17 14:36

Sergei Petrunia made changes - 2024-03-17 14:37

Field	Original Value	New Value
Description	Selectivity computations can produce poor estimates when multi-part keys are involved. Motivating example: {code:sql} create table t1 ( id int auto_increment primary key, a int, b int, f varchar(256) default 'xxx', index idx1(a), index idx2(b) ); insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000; create index idx3 on t1(a,b); # Use query plans that are easier to read: set optimizer_switch='rowid_filter=off'; {code} {code:sql} explain extended select * from t1 ignore index (idx3) where a between 50 and 80 and b between 33 and 41; +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+ \| 1 \| SIMPLE \| t1 \| range \| idx1,idx2 \| idx1 \| 5 \| NULL \| 149 \| 17.50 \| Using index condition; Using where \| +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+ {code} The estimates are obtained from {code:js} "range_scan_alternatives": [ { "index": "idx1", "ranges": ["(50) <= (a) <= (80)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 149, "cost": 0.1537364, "chosen": true }, { "index": "idx2", "ranges": ["(33) <= (b) <= (41)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 175, "cost": 0.17956584, "chosen": false, "cause": "cost" } {code} Now, with three indexes: {code:sql} explain extended select * from t1 where a between 50 and 80 and b between 33 and 41; {code} {code} +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+ \| 1 \| SIMPLE \| t1 \| range \| idx1,idx2,idx3 \| idx3 \| 10 \| NULL \| 144 \| 100.00 \| Using index condition \| +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+ {code} It is using this plan {code:js} "chosen_range_access_summary": { "range_access_plan": { "type": "range_scan", "index": "idx3", "rows": 144, "ranges": ["(50,33) <= (a,b) <= (80,41)"] }, "rows_for_plan": 144, "cost_for_plan": 0.1487692, "chosen": true } {code} It has these estimate numbers at its disposal: {code:js} "range_scan_alternatives": [ { "index": "idx1", "ranges": ["(50) <= (a) <= (80)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 149, "cost": 0.1537364, "chosen": true }, { "index": "idx2", "ranges": ["(33) <= (b) <= (41)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 175, "cost": 0.17956584, "chosen": false, "cause": "cost" }, { "index": "idx3", "ranges": ["(50,33) <= (a,b) <= (80,41)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 144, "cost": 0.1487692, "chosen": true } ], {code} and the point is that it could come up with a tighter estimate of output bound by taking into account the estimates on {{a} and {{b}} separately. Igor's point is that "keyparts >=1 do not matter for non-equalities".	Selectivity computations can produce poor estimates when multi-part keys are involved. Motivating example: {code:sql} create table t1 ( id int auto_increment primary key, a int, b int, f varchar(256) default 'xxx', index idx1(a), index idx2(b) ); insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000; create index idx3 on t1(a,b); # Use query plans that are easier to read: set optimizer_switch='rowid_filter=off'; {code} First, let's use single-column indexes only: {code:sql} explain extended select * from t1 ignore index (idx3) where a between 50 and 80 and b between 33 and 41; +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+ \| 1 \| SIMPLE \| t1 \| range \| idx1,idx2 \| idx1 \| 5 \| NULL \| 149 \| 17.50 \| Using index condition; Using where \| +------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+ {code} The estimates are obtained from {code:js} "range_scan_alternatives": [ { "index": "idx1", "ranges": ["(50) <= (a) <= (80)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 149, "cost": 0.1537364, "chosen": true }, { "index": "idx2", "ranges": ["(33) <= (b) <= (41)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 175, "cost": 0.17956584, "chosen": false, "cause": "cost" } {code} Now, with three indexes: {code:sql} explain extended select * from t1 where a between 50 and 80 and b between 33 and 41; {code} {code} +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| filtered \| Extra \| +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+ \| 1 \| SIMPLE \| t1 \| range \| idx1,idx2,idx3 \| idx3 \| 10 \| NULL \| 144 \| 100.00 \| Using index condition \| +------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+ {code} It is using this plan {code:js} "chosen_range_access_summary": { "range_access_plan": { "type": "range_scan", "index": "idx3", "rows": 144, "ranges": ["(50,33) <= (a,b) <= (80,41)"] }, "rows_for_plan": 144, "cost_for_plan": 0.1487692, "chosen": true } {code} It has these estimate numbers at its disposal: {code:js} "range_scan_alternatives": [ { "index": "idx1", "ranges": ["(50) <= (a) <= (80)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 149, "cost": 0.1537364, "chosen": true }, { "index": "idx2", "ranges": ["(33) <= (b) <= (41)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 175, "cost": 0.17956584, "chosen": false, "cause": "cost" }, { "index": "idx3", "ranges": ["(50,33) <= (a,b) <= (80,41)"], "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 144, "cost": 0.1487692, "chosen": true } ], {code} and the point is that it could come up with a tighter estimate of output bound by taking into account the estimates on {{a} and {{b}} separately. Igor's point is that "keyparts >=1 do not matter for non-equalities".

Sergei Petrunia made changes - 2024-03-17 14:38

Description

Selectivity computations can produce poor estimates when multi-part keys are involved.

Motivating example:

{code:sql}
create table t1 (
  id int auto_increment primary key,
  a int,
  b int,
  f varchar(256) default 'xxx',
  index idx1(a),
  index idx2(b)
);
insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
create index idx3 on t1(a,b);
# Use query plans that are easier to read:
set optimizer_switch='rowid_filter=off';
{code}

First, let's use single-column indexes only:
{code:sql}
explain extended select * from t1 ignore index (idx3)
where a between 50 and 80 and b between 33 and 41;
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
{code}

The estimates are obtained from
{code:js}
          "range_scan_alternatives": [
            {
              "index": "idx1",
              "ranges": ["(50) <= (a) <= (80)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 149,
              "cost": 0.1537364,
              "chosen": true
            },
            {
              "index": "idx2",
              "ranges": ["(33) <= (b) <= (41)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 175,
              "cost": 0.17956584,
              "chosen": false,
              "cause": "cost"
            }
{code}

Now, with three indexes:
{code:sql}
explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
{code}
{code}
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
{code}

It is using this plan
{code:js}
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx3",
                      "rows": 144,
                      "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                    },
                    "rows_for_plan": 144,
                    "cost_for_plan": 0.1487692,
                    "chosen": true
                  }
{code}

It has these estimate numbers at its disposal:
{code:js}
                    "range_scan_alternatives": [
                      {
                        "index": "idx1",
                        "ranges": ["(50) <= (a) <= (80)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 149,
                        "cost": 0.1537364,
                        "chosen": true
                      },
                      {
                        "index": "idx2",
                        "ranges": ["(33) <= (b) <= (41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 175,
                        "cost": 0.17956584,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "idx3",
                        "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 144,
                        "cost": 0.1487692,
                        "chosen": true
                      }
                    ],
{code}

and the point is that it could come up with a tighter estimate of output bound by taking
into account the estimates on {{a} and {{b}} separately.

Igor's point is that "keyparts >=1 do not matter for non-equalities".

Selectivity computations can produce poor estimates when multi-part keys are involved.

Motivating example:

{code:sql}
create table t1 (
  id int auto_increment primary key,
  a int,
  b int,
  f varchar(256) default 'xxx',
  index idx1(a),
  index idx2(b)
);
insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
create index idx3 on t1(a,b);
# Use query plans that are easier to read:
set optimizer_switch='rowid_filter=off';
{code}

First, let's use single-column indexes only:
{code:sql}
explain extended select * from t1 ignore index (idx3)
where a between 50 and 80 and b between 33 and 41;
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
{code}
The estimated #rows in the output is 149 * 0.175= 26 rows.
The estimates are obtained from
{code:js}
          "range_scan_alternatives": [
            {
              "index": "idx1",
              "ranges": ["(50) <= (a) <= (80)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 149,
              "cost": 0.1537364,
              "chosen": true
            },
            {
              "index": "idx2",
              "ranges": ["(33) <= (b) <= (41)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 175,
              "cost": 0.17956584,
              "chosen": false,
              "cause": "cost"
            }
{code}

Now, with three indexes:
{code:sql}
explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
{code}
{code}
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
{code}

It is using this plan
{code:js}
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx3",
                      "rows": 144,
                      "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                    },
                    "rows_for_plan": 144,
                    "cost_for_plan": 0.1487692,
                    "chosen": true
                  }
{code}

It has these estimate numbers at its disposal:
{code:js}
                    "range_scan_alternatives": [
                      {
                        "index": "idx1",
                        "ranges": ["(50) <= (a) <= (80)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 149,
                        "cost": 0.1537364,
                        "chosen": true
                      },
                      {
                        "index": "idx2",
                        "ranges": ["(33) <= (b) <= (41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 175,
                        "cost": 0.17956584,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "idx3",
                        "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 144,
                        "cost": 0.1487692,
                        "chosen": true
                      }
                    ],
{code}

and the point is that it could come up with a tighter estimate of output bound by taking
into account the estimates on {{a} and {{b}} separately.

Igor's point is that "keyparts >=1 do not matter for non-equalities".

Sergei Petrunia made changes - 2024-03-17 14:39

Description

Selectivity computations can produce poor estimates when multi-part keys are involved.

Motivating example:

{code:sql}
create table t1 (
  id int auto_increment primary key,
  a int,
  b int,
  f varchar(256) default 'xxx',
  index idx1(a),
  index idx2(b)
);
insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
create index idx3 on t1(a,b);
# Use query plans that are easier to read:
set optimizer_switch='rowid_filter=off';
{code}

First, let's use single-column indexes only:
{code:sql}
explain extended select * from t1 ignore index (idx3)
where a between 50 and 80 and b between 33 and 41;
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
{code}
The estimated #rows in the output is 149 * 0.175= 26 rows.
The estimates are obtained from
{code:js}
          "range_scan_alternatives": [
            {
              "index": "idx1",
              "ranges": ["(50) <= (a) <= (80)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 149,
              "cost": 0.1537364,
              "chosen": true
            },
            {
              "index": "idx2",
              "ranges": ["(33) <= (b) <= (41)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 175,
              "cost": 0.17956584,
              "chosen": false,
              "cause": "cost"
            }
{code}

Now, with three indexes:
{code:sql}
explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
{code}
{code}
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
{code}

It is using this plan
{code:js}
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx3",
                      "rows": 144,
                      "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                    },
                    "rows_for_plan": 144,
                    "cost_for_plan": 0.1487692,
                    "chosen": true
                  }
{code}

It has these estimate numbers at its disposal:
{code:js}
                    "range_scan_alternatives": [
                      {
                        "index": "idx1",
                        "ranges": ["(50) <= (a) <= (80)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 149,
                        "cost": 0.1537364,
                        "chosen": true
                      },
                      {
                        "index": "idx2",
                        "ranges": ["(33) <= (b) <= (41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 175,
                        "cost": 0.17956584,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "idx3",
                        "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 144,
                        "cost": 0.1487692,
                        "chosen": true
                      }
                    ],
{code}

and the point is that it could come up with a tighter estimate of output bound by taking
into account the estimates on {{a} and {{b}} separately.

Igor's point is that "keyparts >=1 do not matter for non-equalities".

Selectivity computations can produce poor estimates when multi-part keys are involved.

Motivating example:

{code:sql}
create table t1 (
  id int auto_increment primary key,
  a int,
  b int,
  f varchar(256) default 'xxx',
  index idx1(a),
  index idx2(b)
);
insert into t1(a,b) select rand(13) * 1000 mod 200, rand(17) * 1000 mod 50 from seq_1_to_1000;
create index idx3 on t1(a,b);
# Use query plans that are easier to read:
set optimizer_switch='rowid_filter=off';
{code}

First, let's use single-column indexes only:
{code:sql}
explain extended select * from t1 ignore index (idx3)
where a between 50 and 80 and b between 33 and 41;
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 149 | 17.50 | Using index condition; Using where |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+------------------------------------+
{code}
The estimated #rows in the output is 149 * 0.175= 26 rows.
The estimates are obtained from
{code:js}
          "range_scan_alternatives": [
            {
              "index": "idx1",
              "ranges": ["(50) <= (a) <= (80)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 149,
              "cost": 0.1537364,
              "chosen": true
            },
            {
              "index": "idx2",
              "ranges": ["(33) <= (b) <= (41)"],
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 175,
              "cost": 0.17956584,
              "chosen": false,
              "cause": "cost"
            }
{code}

Now, with three indexes:
{code:sql}
explain extended select * from t1 where a between 50 and 80 and b between 33 and 41;
{code}
{code}
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2,idx3 | idx3 | 10 | NULL | 144 | 100.00 | Using index condition |
+------+-------------+-------+-------+----------------+------+---------+------+------+----------+-----------------------+
{code}

It is using this plan
{code:js}
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx3",
                      "rows": 144,
                      "ranges": ["(50,33) <= (a,b) <= (80,41)"]
                    },
                    "rows_for_plan": 144,
                    "cost_for_plan": 0.1487692,
                    "chosen": true
                  }
{code}

It has these estimate numbers at its disposal:
{code:js}
                    "range_scan_alternatives": [
                      {
                        "index": "idx1",
                        "ranges": ["(50) <= (a) <= (80)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 149,
                        "cost": 0.1537364,
                        "chosen": true
                      },
                      {
                        "index": "idx2",
                        "ranges": ["(33) <= (b) <= (41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 175,
                        "cost": 0.17956584,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "idx3",
                        "ranges": ["(50,33) <= (a,b) <= (80,41)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 144,
                        "cost": 0.1487692,
                        "chosen": true
                      }
                    ],
{code}

and the point is that it could come up with a tighter estimate of output bound by taking into account the estimates on {{a}} and {{b}} separately.

Igor's point is that "keyparts >=1 do not matter for non-equalities".

Sergei Petrunia made changes - 2024-03-19 10:27

Fix Version/s

11.0 [ 28320 ]

Sergei Petrunia made changes - 2024-03-19 10:27

Assignee

Sergei Petrunia [ psergey ]

Sergei Petrunia made changes - 2024-03-19 10:27

Status

Open [ 1 ]

Confirmed [ 10101 ]

Sergei Petrunia made changes - 2024-03-19 10:28

Link

This issue is part of MDEV-23707 [ MDEV-23707 ]

Sergei Petrunia made changes - 2024-03-19 10:28

Labels

selectivity-after-11.0

Sergei Petrunia added a comment - 2024-06-04 08:19

bb-11.6-MDEV-33697

Sergei Petrunia added a comment - 2024-06-04 08:19 bb-11.6- MDEV-33697

Sergei Petrunia made changes - 2024-06-04 08:19

Fix Version/s		11.6 [ 29515 ]
Fix Version/s	11.0 [ 28320 ]

Sergei Petrunia made changes - 2024-06-04 08:19

Status

Confirmed [ 10101 ]

In Progress [ 3 ]

Sergei Petrunia made changes - 2024-06-04 08:19

Assignee	Sergei Petrunia [ psergey ]	Michael Widenius [ monty ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Petrunia made changes - 2024-06-04 08:20

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Petrunia added a comment - 2024-06-10 14:50

Got review input from monty, addressed it.

Sergei Petrunia added a comment - 2024-06-10 14:50 Got review input from monty , addressed it.

Sergei Petrunia made changes - 2024-06-10 14:50

Assignee

Michael Widenius [ monty ]

Sergei Petrunia [ psergey ]

Sergei Petrunia made changes - 2024-06-10 14:50

Status

In Review [ 10002 ]

Stalled [ 10000 ]

Sergei Petrunia made changes - 2024-06-10 14:50

Status

Stalled [ 10000 ]

In Testing [ 10301 ]

Sergei Petrunia added a comment - 2024-06-10 14:51

commit ce504779b16ff8274fb23c3c2b39ec2538120c38 (HEAD -> bb-11.6-MDEV-33697, origin/bb-11.6-MDEV-33697)

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Tue Jun 4 09:53:27 2024 +0300

    MDEV-33697: Improve selectivity computations for multi-part keys

Sergei Petrunia added a comment - 2024-06-10 14:51 commit ce504779b16ff8274fb23c3c2b39ec2538120c38 (HEAD -> bb-11.6-MDEV-33697, origin/bb-11.6-MDEV-33697) Author: Sergei Petrunia <sergey@mariadb.com> Date: Tue Jun 4 09:53:27 2024 +0300 MDEV-33697: Improve selectivity computations for multi-part keys

Sergei Petrunia made changes - 2024-06-11 18:03

Assignee

Sergei Petrunia [ psergey ]

Lena Startseva [ JIRAUSER50478 ]

Sergei Petrunia made changes - 2024-06-13 12:06

Affects Version/s	11.0 [ 28320 ]
Issue Type	Bug [ 1 ]	Task [ 3 ]

Sergei Petrunia made changes - 2024-06-13 12:07

Link

This issue is part of TODO-4765 [ TODO-4765 ]

Sergei Petrunia made changes - 2024-06-24 13:30

Assignee

Lena Startseva [ JIRAUSER50478 ]

Sergei Petrunia [ psergey ]

Ralf Gebhardt made changes - 2024-06-25 06:53

Labels

selectivity-after-11.0

Preview_11.6 selectivity-after-11.0

Sergei Golubchik made changes - 2024-07-23 15:05

Fix Version/s		11.7 [ 29815 ]
Fix Version/s	11.6 [ 29515 ]

Sergei Petrunia made changes - 2024-08-20 15:19

Fix Version/s

11.7 [ 29815 ]

Sergei Petrunia made changes - 2024-08-20 15:19

Labels

Preview_11.6 selectivity-after-11.0

Preview_11.6 optimizer-feature selectivity-after-11.0

People

Assignee:: Sergei Petrunia

Reporter:: Sergei Petrunia

Votes:: 1 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2024-03-17 14:36

Updated:: 2024-08-20 15:19

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration