Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24740

Selectivity for equi-join predicates not involed in ref access is not taken into account for join cardinality estimation

    XMLWordPrintable

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.2, 10.3, 10.4, 10.5, 10.6
    • 10.6
    • Optimizer
    • None

    Description

      Dataset:

      CREATE TABLE t0(a INT);
      INSERT INTO t0 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      CREATE TABLE t1(a INT, b INT);
      INSERT INTO t1 SELECT a, a FROM t0; 
      INSERT INTO t1 SELECT a, a FROM t0; 
       
      CREATE TABLE t2(a INT, b INT);
      INSERT INTO t2 SELECT a, a FROM t0; 
      INSERT INTO t2 SELECT a, a FROM t0;
      

      Run ANALYZE for the tables, so the EITS are collected

      ANALYZE TABLE t1 PERSISTENT FOR ALL;
      ANALYZE TABLE t2 PERSISTENT FOR ALL
      

      Just setting optimizer_use_condition_selectivity to 3, so we use EITS as there are no indexed columns

      set optimizer_use_condition_selectivity=3;
      

      MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1,t2 WHERE t1.a=t2.a;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 20   |   100.00 |                                                 |
      |    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL | 20   |   100.00 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
      2 rows in set, 1 warning (0.006 sec)
      

      So here we have an equi-join non-indexed condition t1.a = t2.a. The filtered for table t2 shows 100% that means the estimates are such that we would expect all rows to match this condition.

      But for such predicates the selectivity is calculated in table_multi_eq_cond_selectivity as:

            while ((fi++))
            {
              double curr_eq_fld_sel;
              Field *fld= fi.get_curr_field();
              if (!(fld->table->map & ~(table_bit | rem_tables)))
                continue;
              curr_eq_fld_sel= get_column_avg_frequency(fld) /
                               fld->table->stat_records();
              if (curr_eq_fld_sel < 1.0)
                set_if_bigger(eq_fld_sel, curr_eq_fld_sel);
            }
      

      here fi is an iterator over an Item_equal object (t1.a=t2.a).

      So lets say our join order is t1,t2

      for table t2 when the function table_multi_eq_cond_selectivity is invoked, then we state the the selectivity for the predicate (t1.a=t2.a) is

      get_column_avg_frequency(t1.a) / #rows(t1)

      In this case: get_column_avg_frequency(t1.a)= (Number of non-NULL values of t1.a / Distinct values for t1.a) which is 2 here
      so the selectivity would be 2/20 = 0.1
      so the filtered should show 10% here for table t2

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            varun Varun Gupta (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.