[MDEV-20459] Selectivity of equality condition in ref access not discounted if range access on same index involved a non-equality condition - Jira

Details

Type: Bug
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Fix Version/s: 10.4(EOL)
Component/s: Optimizer
Labels:
None

Description

CREATE TABLE t1 (

pk1 INT,

pk2 INT,

f1 VARCHAR(3),

f2 VARCHAR(1021),

PRIMARY KEY (pk1,pk2),

KEY(f2)

) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1,2,'2','abc'),(2,3,'3','def');

INSERT INTO t1 VALUES (2,0,'2','abc');

INSERT INTO t1 VALUES (2,5,'2','abc');

INSERT INTO t1 VALUES (2,4,'2','abc');

Also use

set optimizer_use_condition_selecitivity=4;

MariaDB [test]> analyze select * from t1 force index(f2)  where pk1 <= 5 and pk2 <=5 and f2 = 'abc' and f1 <= '3';

+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+------------------------------------+

| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                              |

+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+------------------------------------+

|    1 | SIMPLE      | t1    | range | f2            | f2   | 1032    | NULL | 4    | 4.00   |    80.00 |     100.00 | Using index condition; Using where |

+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+------------------------------------+

1 row in set (0.004 sec)

So filtered here shows 80%. The reason is we set filtered during best_access_path and there we actually picked ref access instead of range and later in plan refinement stage we switched the plan to use range access instead of ref access.

The optimizer trace shows that we picked ref access first

            "considered_execution_plans": [

                "plan_prefix": [],

                "table": "t1",

                "best_access_path": {

                  "considered_access_paths": [

                      "access_type": "ref",

                      "index": "f2",

                      "used_range_estimates": true,

                      "rows": 4,

                      "cost": 3,

                      "chosen": true

},

                      "access_type": "range",

                      "resulting_rows": 4,

                      "cost": 6.3004,

                      "chosen": false

},

                "rows_for_plan": 4,

                "cost_for_plan": 3.8,

                "selectivity": 0.8,

                "join_cardinality": 3.2

ref access on f2 is cheaper than range access on f2.
The selectivity is 0.8 (80%) which we see in the analyze output.
The issue with such cases is that for ref access estimates we sometimes use range access estimates(because they are more accurate).
One can see in the optimizer trace used_range_estimates which means that we actually used the range estimates. So if we used the range estimates we took care of the selecitivity of the entire condition.
So the selectivity here should have been 1. There is some bug while discounting selectivity for conditions that were already taken into account.
We only discount the selectivity when the range access and ref access on the index has the same keyparts and they are of the type keypart_i = const

Attachments

Issue Links

relates to

MDEV-8306 Complete cost-based optimization for ORDER BY with LIMIT

Stalled

Activity

Ascending order - Click to sort in descending order

Varun Gupta (Inactive) added a comment - 2019-09-02 14:56 - edited

Lets consider an example:
and index idx1 is defined as (a,b,c)
the condition is: WHERE a=1 and b=2 and c < 6;

So ref access in this case would be picked for 2 keyparts (a=1 and b=2)
and range access will be considered for 3 keyparts (a,b,c)

Issue 1:

There are cases when ref access estimates are too optimistic and we prefer to use ranges estimates instead. The problem with this case is if we use range estimates
then we need to make sure that we also discount the selectivity for the range estimate from the original selectivity, else the estimates we have would become optimistic rather than pessimistic.

Issue 2:

It would be better if we could get the individual selectiivties of columns that are part of the index. In this case we can easily get the selectivity for columns a,b,c.
rec_per_key has estimates that we can use to predict the selectivitiy of each field inside the key.

This would gives us the selectivity for column a and b for the index idx.
sel(a) = rec_per_key[0]/table_records
sel(b) = rec_per_key[1]/rec_per_key[0]
sel(c) = rec_per_key[2]/rec_per_key[1]

If ref access is picked then we should discount this selectivity of a and b from the original selectivity and the selectivity
that would remain would only be for c < 6, which makes sense.

Varun Gupta (Inactive) added a comment - 2019-09-02 14:56 - edited Lets consider an example: and index idx1 is defined as (a,b,c) the condition is: WHERE a=1 and b=2 and c < 6; So ref access in this case would be picked for 2 keyparts (a=1 and b=2) and range access will be considered for 3 keyparts (a,b,c) Issue 1: There are cases when ref access estimates are too optimistic and we prefer to use ranges estimates instead. The problem with this case is if we use range estimates then we need to make sure that we also discount the selectivity for the range estimate from the original selectivity, else the estimates we have would become optimistic rather than pessimistic. Issue 2: It would be better if we could get the individual selectiivties of columns that are part of the index. In this case we can easily get the selectivity for columns a,b,c. rec_per_key has estimates that we can use to predict the selectivitiy of each field inside the key. This would gives us the selectivity for column a and b for the index idx. sel(a) = rec_per_key [0] /table_records sel(b) = rec_per_key [1] /rec_per_key [0] sel(c) = rec_per_key [2] /rec_per_key [1] If ref access is picked then we should discount this selectivity of a and b from the original selectivity and the selectivity that would remain would only be for c < 6, which makes sense.

Varun Gupta (Inactive) added a comment - 2019-09-03 18:07 - edited

Inside the function table_cond_selectivity we have this code where we try to discount of the selectivity for the range scan if all they const key parts are part both the range and ref access.

But also in this code I see that the for loop is not needed, looks like dead code to me.

    if (!is_hash_join_key_no(key))

      key_part_map quick_key_map= (key_part_map(1) << table->quick_key_parts[key]) - 1;

      if (table->quick_rows[key] &&

          !(quick_key_map & ~table->const_key_parts[key]))

/*

          Ok, there is an equality for each of the key parts used by the

          quick select. This means, quick select's estimate can be reused to

          discount the selectivity of a prefix of a ref access.

*/

        for (; quick_key_map & 1 ; quick_key_map>>= 1)

          while (keyuse->table == table && keyuse->key == key &&

                 keyuse->keypart == keyparts)

            keyuse++;

          keyparts++;

        sel /= (double)table->quick_rows[key] / (double) table->stat_records();

        used_range_selectivity= true;

Varun Gupta (Inactive) added a comment - 2019-09-03 18:07 - edited Inside the function table_cond_selectivity we have this code where we try to discount of the selectivity for the range scan if all they const key parts are part both the range and ref access. But also in this code I see that the for loop is not needed, looks like dead code to me. if (!is_hash_join_key_no(key)) { key_part_map quick_key_map= (key_part_map(1) << table->quick_key_parts[key]) - 1; if (table->quick_rows[key] && !(quick_key_map & ~table->const_key_parts[key])) { /* Ok, there is an equality for each of the key parts used by the quick select. This means, quick select's estimate can be reused to discount the selectivity of a prefix of a ref access. */ for (; quick_key_map & 1 ; quick_key_map>>= 1) { while (keyuse->table == table && keyuse->key == key && keyuse->keypart == keyparts) { keyuse++; } keyparts++; } sel /= ( double )table->quick_rows[key] / ( double ) table->stat_records(); used_range_selectivity= true ; } }

Julien Fritsch added a comment - 2023-12-05 09:26

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Julien Fritsch added a comment - 2023-12-05 09:26 Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

JiraAutomate added a comment - 2023-12-05 11:49

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

JiraAutomate added a comment - 2023-12-05 11:49 Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

MariaDB Server

Selectivity of equality condition in ref access not discounted if range access on same index involved a non-equality condition