[MDEV-24740] Selectivity for equi-join predicates not involed in ref access is not taken into account for join cardinality estimation Created: 2021-01-30  Updated: 2023-11-06

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Varun Gupta (Inactive) Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File formula.png    

 Description   

Dataset:

CREATE TABLE t0(a INT);
INSERT INTO t0 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE t1(a INT, b INT);
INSERT INTO t1 SELECT a, a FROM t0; 
INSERT INTO t1 SELECT a, a FROM t0; 
 
CREATE TABLE t2(a INT, b INT);
INSERT INTO t2 SELECT a, a FROM t0; 
INSERT INTO t2 SELECT a, a FROM t0;

Run ANALYZE for the tables, so the EITS are collected

ANALYZE TABLE t1 PERSISTENT FOR ALL;
ANALYZE TABLE t2 PERSISTENT FOR ALL

Just setting optimizer_use_condition_selectivity to 3, so we use EITS as there are no indexed columns

set optimizer_use_condition_selectivity=3;

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1,t2 WHERE t1.a=t2.a;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 20   |   100.00 |                                                 |
|    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL | 20   |   100.00 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
2 rows in set, 1 warning (0.006 sec)

So here we have an equi-join non-indexed condition t1.a = t2.a. The filtered for table t2 shows 100% that means the estimates are such that we would expect all rows to match this condition.

But for such predicates the selectivity is calculated in table_multi_eq_cond_selectivity as:

      while ((fi++))
      {
        double curr_eq_fld_sel;
        Field *fld= fi.get_curr_field();
        if (!(fld->table->map & ~(table_bit | rem_tables)))
          continue;
        curr_eq_fld_sel= get_column_avg_frequency(fld) /
                         fld->table->stat_records();
        if (curr_eq_fld_sel < 1.0)
          set_if_bigger(eq_fld_sel, curr_eq_fld_sel);
      }

here fi is an iterator over an Item_equal object (t1.a=t2.a).

So lets say our join order is t1,t2

for table t2 when the function table_multi_eq_cond_selectivity is invoked, then we state the the selectivity for the predicate (t1.a=t2.a) is

get_column_avg_frequency(t1.a) / #rows(t1)

In this case: get_column_avg_frequency(t1.a)= (Number of non-NULL values of t1.a / Distinct values for t1.a) which is 2 here
so the selectivity would be 2/20 = 0.1
so the filtered should show 10% here for table t2



 Comments   
Comment by Varun Gupta (Inactive) [ 2021-02-02 ]

The branch for the changes is 10.6-selectivity

Comment by Varun Gupta (Inactive) [ 2021-02-05 ]

Also another imporatant takeaway from the optimizer call was that the function which calculates selectivity for the non-indexes columns should be used only when the ORDER BY LIMIT optimization is enabled.

Generated at Thu Feb 08 09:32:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.