MDEV-30812: Improve output cardinality estimates for hash join


    Description

      Current state

      Currently, the hash join access method produces worst-case estimates for its output cardinality: the output cardinality is the product of the incoming record count and the number of rows we are going to read from the joined table.

      Example:

      create table t1 (a int);
      insert into t1 select seq from seq_1_to_10;
      create table t2 (a int);
      insert into t2 select seq from seq_1_to_20;
      

      set join_cache_level=6;
      explain select * from t1, t2 where t1.a=t2.a;
      +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
      | id   | select_type | table | type     | possible_keys | key       | key_len | ref     | rows | Extra                                            |
      +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
      |    1 | SIMPLE      | t1    | ALL      | NULL          | NULL      | NULL    | NULL    | 10   | Using where                                      |
      |    1 | SIMPLE      | t2    | hash_ALL | NULL          | #hash#$hj | 5       | j8.t1.a | 20   | Using where; Using join buffer (flat, BNLH join) |
      +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
      

      select * from information_schema.optimizer_trace:
      ...
                        {
                          "plan_prefix": "t1",
                          "table": "t2",
                          "rows_for_plan": 200,
                          "cost_for_plan": 0.03089156
                        }
      ...
      

      The output cardinality is 200 = 20*10.
      This is the maximum possible cardinality: one gets it only if all rows of t1 and t2 share a single value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many distinct values, the actual output cardinality will be much lower.

      MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

      On the 10% multiplier

      When computing costs, the optimizer multiplies the cross-product output cardinality by 0.1. Note that the output cardinality itself is NOT multiplied by that factor.

      Code before 11.0:

          double join_sel= 0.1;
          ...
          best_time= COST_ADD(tmp,
                              COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                        rnd_records));
      

      Code after 11.0:

      #define HASH_FANOUT 0.1
       
      // best_access_path():
       
            cmp_time= (record_count * row_copy_cost +
                       rnd_records * record_count * HASH_FANOUT *
                       ((idx - join->const_tables) * row_copy_cost +
                        WHERE_COST_THD(thd)));
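
      The asymmetry is easy to see with the numbers from the example. The following sketch is purely illustrative (only HASH_FANOUT comes from the code above; the other names are hypothetical, not the actual optimizer variables):

      #include <cstdio>

      #define HASH_FANOUT 0.1   // the 10% multiplier from the code above

      int main()
      {
        double record_count= 10.0;  // rows arriving from t1
        double rnd_records= 20.0;   // rows read from t2

        // The cost side scales the number of compared pairs down by HASH_FANOUT...
        double costed_pairs= record_count * rnd_records * HASH_FANOUT;   // 20

        // ...but the cardinality passed on to the next join step does not shrink:
        double rows_for_plan= record_count * rnd_records;                // 200

        std::printf("costed pairs: %g, rows_for_plan: %g\n",
                    costed_pairs, rows_for_plan);
        return 0;
      }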
      

      Improved output cardinality estimates

      If we have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality.
      The expected source of the column cardinality estimates is EITS (Engine-Independent Table Statistics).

      Basic formula

      select * from t1, t2 where t1.col1=t2.col2
      

      The formula for equi-join output cardinality size is:

        t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
      

      Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).
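
      A hedged sketch of this estimate applied to the tables from the example at the top (the function name is hypothetical, not the server's code):

      #include <algorithm>
      #include <cstdio>

      // Hypothetical sketch of the proposed estimate: on average, each of the
      // t1 rows matches t2_rows / n_distinct rows of t2 (and vice versa), so
      // dividing the cross product by the larger NDV gives the expected size.
      static double equijoin_output_rows(double t1_rows, double nd_t1_col1,
                                         double t2_rows, double nd_t2_col2)
      {
        double max_nd= std::max(nd_t1_col1, nd_t2_col2);
        return t1_rows * t2_rows / max_nd;
      }

      int main()
      {
        // The example tables: t1.a has 10 distinct values, t2.a has 20.
        // Estimate: 10 * 20 / MAX(10, 20) = 10, which matches the real result
        // of the example join, instead of the worst-case 200.
        std::printf("estimated rows: %g\n", equijoin_output_rows(10, 10, 20, 20));
        return 0;
      }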

      Multiple columns

      What about equi-join on multiple columns?

      select * from t1, t2 where t1.col1=t2.col2 and t1.col3=t2.col4
      

      We don't have n_distinct({t1.col1, t1.col3}); we only have individual per-column estimates.

      Assuming column independence and using

      n_distinct({t1.col1, t1.col3}) =  n_distinct(t1.col1) * n_distinct(t1.col3)
      

      is very optimistic: n_distinct for the tuple may come out much higher than it really is, and the join output cardinality estimate correspondingly much lower than the real value.

      The suggestion is to make the pessimistic assumption instead:

      n_distinct({t1.col1, t1.col3}) = MAX(n_distinct(t1.col1), n_distinct(t1.col3))
      

      and perform computations based on that.
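
      Under that pessimistic assumption, the divisor for the whole equi-join becomes the single largest per-column n_distinct seen on either side. A hypothetical sketch (all names and NDV numbers are made up for illustration):

      #include <algorithm>
      #include <cstdio>
      #include <vector>

      // Hypothetical sketch: with n_distinct of a column tuple estimated as the
      // MAX of its per-column values, the divisor for the whole equi-join is the
      // largest n_distinct among all join columns on either side.
      static double multi_col_equijoin_rows(double t1_rows, double t2_rows,
                                            const std::vector<double> &nd_t1_cols,
                                            const std::vector<double> &nd_t2_cols)
      {
        double nd_t1= 1.0, nd_t2= 1.0;
        for (double nd : nd_t1_cols)
          nd_t1= std::max(nd_t1, nd);     // n_distinct({t1.col1, t1.col3})
        for (double nd : nd_t2_cols)
          nd_t2= std::max(nd_t2, nd);     // n_distinct({t2.col2, t2.col4})
        return t1_rows * t2_rows / std::max(nd_t1, nd_t2);
      }

      int main()
      {
        // Made-up NDVs: n_distinct(t1.col1)=50, n_distinct(t1.col3)=7,
        //               n_distinct(t2.col2)=40, n_distinct(t2.col4)=3.
        // Estimate: 1000 * 2000 / MAX(MAX(50,7), MAX(40,3)) = 2000000 / 50 = 40000.
        std::printf("estimated rows: %g\n",
                    multi_col_equijoin_rows(1000, 2000, {50, 7}, {40, 3}));
        return 0;
      }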

      Concerns about impact on query plan choice

      After this change, query plans with hash join will

      • have much lower cost than before
      • have much lower rows_for_plan than before

      Could it be that the cost of hash join will now be much lower than reality? The current hash join cost computation doesn't account for copying rows to/from the join buffer; that code can be copied from 11.0 (see the sketch below).
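
      A hedged sketch of what such accounting could look like, loosely modeled on the 11.0 cmp_time formula quoted above (names and constants here are illustrative, not the actual implementation):

      #include <cstdio>

      // Hedged sketch, not the actual 11.0 code: charge a row copy for filling
      // the join buffer, plus a copy and a WHERE check for each candidate pair,
      // mirroring the structure of the cmp_time formula quoted above.
      static double join_buffer_cost(double record_count,  // rows buffered from prefix
                                     double rnd_records,   // rows read from t2
                                     double fanout,        // e.g. HASH_FANOUT = 0.1
                                     double row_copy_cost,
                                     double where_cost)
      {
        return record_count * row_copy_cost +          // write prefix rows to buffer
               rnd_records * record_count * fanout *
               (row_copy_cost + where_cost);           // copy out + evaluate WHERE
      }

      int main()
      {
        // Arbitrary illustrative constants.
        std::printf("cost: %g\n", join_buffer_cost(10, 20, 0.1, 0.05, 0.032));
        return 0;
      }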

      Use of hash join rows_out estimates for other access methods

      (Suggestion by Monty)
      In 11.0, best_access_path() has this property: regardless of which access method is chosen, rows_out is set to the smallest row estimate among all considered access methods.
      Should this be ported as part of this MDEV?

      INTERFACE CHANGE:
      Adds a new optimizer switch hash_join_cardinality

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}

            In many cases this vastly over-estimates the output cardinality.

            The output cardinality is the same in MariaDB 11.0.

            h2. Better output cardinality estimates

            If we have cardinality estimates for the joined columns (t1.a and t2.a in the above example), we can produce a more precise and much lower estimate of the output cardinality size.
            h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of estimates is the EITS statistics.
            psergei Sergei Petrunia made changes -
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of estimates is the EITS statistics.
            h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On 10%.
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of estimates is the EITS statistics.
            psergei Sergei Petrunia made changes -
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On 10%.
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of estimates is the EITS statistics.
            h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of estimates is the EITS statistics.
            psergei Sergei Petrunia made changes -
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of estimates is the EITS statistics.
            h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of column cardinality estimate is the EITS statistics.

            h3. Basic formula
            {code}
            select * from t1, t2 where t1.col1=t2.col2
            {code}
            The formula for equi-join output cardinality size is:
            {code}
              t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
            {code}
            Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).

            h3. Multiple columns
            psergei Sergei Petrunia made changes -
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of column cardinality estimate is the EITS statistics.

            h3. Basic formula
            {code}
            select * from t1, t2 where t1.col1=t2.col2
            {code}
            The formula for equi-join output cardinality size is:
            {code}
              t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
            {code}
            Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).

            h3. Multiple columns
            h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of column cardinality estimate is the EITS statistics.

            h3. Basic formula
            {code}
            select * from t1, t2 where t1.col1=t2.col2
            {code}
            The formula for equi-join output cardinality size is:
            {code}
              t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
            {code}
            Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).

            h3. Multiple columns
            What about equi-join on multiple columns?

            select * from t1, t2 where t1.col1=t2.col2 and t1.col3=col4

            We don't have n_distinct({t1.col1, t1.col3}), we have only individual per-column estimates.

            Assuming column independence and using

            {code}
            n_distinct({t1.col1, t1.col3}) = n_distinct(t1.col1) * n_distinct(t1.col3)
            {code}

            is very optimistic: n_distinct({ ... }) may be much higher than the reality, and join
            output cardinality estimate will be much lower than in reality.

            The suggestion is to make the pessimistic assumption

            {code}
            n_distinct({t1.col1, t1.col3}) = MAX(n_distinct(t1.col1), n_distinct(t1.col3))
            {code}

            and perform computations based on that.
            psergei Sergei Petrunia made changes -
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of column cardinality estimate is the EITS statistics.

            h3. Basic formula
            {code}
            select * from t1, t2 where t1.col1=t2.col2
            {code}
            The formula for equi-join output cardinality size is:
            {code}
              t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
            {code}
            Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).

            h3. Multiple columns
            What about equi-join on multiple columns?

            select * from t1, t2 where t1.col1=t2.col2 and t1.col3=col4

            We don't have n_distinct({t1.col1, t1.col3}), we have only individual per-column estimates.

            Assuming column independence and using

            {code}
            n_distinct({t1.col1, t1.col3}) = n_distinct(t1.col1) * n_distinct(t1.col3)
            {code}

            is very optimistic: n_distinct({ ... }) may be much higher than the reality, and join
            output cardinality estimate will be much lower than in reality.

            The suggestion is to make the pessimistic assumption

            {code}
            n_distinct({t1.col1, t1.col3}) = MAX(n_distinct(t1.col1), n_distinct(t1.col3))
            {code}

            and perform computations based on that.
            h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of column cardinality estimate is the EITS statistics.

            h3. Basic formula
            {code}
            select * from t1, t2 where t1.col1=t2.col2
            {code}
            The formula for equi-join output cardinality size is:
            {code}
              t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
            {code}
            Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).

            h3. Multiple columns
            What about equi-join on multiple columns?

            {code}
            select * from t1, t2 where t1.col1=t2.col2 and t1.col3=col4
            {code}

            We don't have {{n_distinct(\{t1.col1, t1.col3\})}}, we have only individual per-column estimates.

            Assuming column independence and using

            {code}
            n_distinct({t1.col1, t1.col3}) = n_distinct(t1.col1) * n_distinct(t1.col3)
            {code}

            is very optimistic: n_distinct for the tuple may be much higher than the reality, and join output cardinality estimate will be much lower than in reality.

            The suggestion is to make the pessimistic assumption instead:

            {code}
            n_distinct({t1.col1, t1.col3}) = MAX(n_distinct(t1.col1), n_distinct(t1.col3))
            {code}

            and perform computations based on that.
            psergei Sergei Petrunia made changes -
            Description h2. Current state
            Currently, hash join access method produces the worst-case estimates for its output cardinality. The output cardinality is a product of incoming record count and the number of rows we're going to read from the joined table.

            Example:
            {code:sql}
            create table t1 (a int);
            insert into t1 select seq from seq_1_to_10;
            create table t2 (a int);
            insert into t2 select seq from seq_1_to_20;
            {code}

            {code}
            set join_cache_level=6;
            explain select * from t1, t2 where t1.a=t2.a;
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
            | 1 | SIMPLE | t2 | hash_ALL | NULL | #hash#$hj | 5 | j8.t1.a | 20 | Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+-------+----------+---------------+-----------+---------+---------+------+--------------------------------------------------+
            {code}

            {code}
            select * from information_schema.optimizer_trace :
            ...
                              {
                                "plan_prefix": "t1",
                                "table": "t2",
                                "rows_for_plan": 200,
                                "cost_for_plan": 0.03089156
                              }
            ...
            {code}
            The output cardinality is 200 = 20*10.
            This is the maximum possible cardinality. One will get it if all rows of t1 and t2 have the same value $VAL=t1.a=t2.a, which is an edge case. If t1.a and/or t2.a have many different values the output cardinality will be much lower.

            MariaDB 11.0 doesn't change this computation. It produces the same rows_for_plan.

            h3. On the 10% multiplier
            When computing costs, the optimizer multiples the cross-product output cardinality by 0.1. Note that the output cardinality is NOT multiplied by that factor.

            Code before 11.0:
            {code:cpp}
                double join_sel= 0.1;
                ...
                best_time= COST_ADD(tmp,
                                    COST_MULT((record_count*join_sel) / TIME_FOR_COMPARE,
                                              rnd_records));
            {code}

            code after 11.0:
            {code:cpp}
            #define HASH_FANOUT 0.1

            // best_access_path():

                  cmp_time= (record_count * row_copy_cost +
                             rnd_records * record_count * HASH_FANOUT *
                             ((idx - join->const_tables) * row_copy_cost +
                              WHERE_COST_THD(thd)));
            {code}

            h2. Improved output cardinality estimates

            If we have have cardinality estimates for the join columns, we can produce a more precise and much lower estimate of the output cardinality size.
            The expected source of column cardinality estimate is the EITS statistics.

            h3. Basic formula
            {code}
            select * from t1, t2 where t1.col1=t2.col2
            {code}
            The formula for equi-join output cardinality size is:
            {code}
              t1_rows * t2_rows * (1 / MAX(n_distinct(t1.col1), n_distinct(t2.col2)))
            {code}
            Note that regular ref access on INDEX(t2.col2) would just use n_distinct(t2.col2) instead of the MAX(...).

            h3. Multiple columns
            What about equi-join on multiple columns?

            {code}
            select * from t1, t2 where t1.col1=t2.col2 and t1.col3=col4
            {code}

            We don't have {{n_distinct(\{t1.col1, t1.col3\})}}, we have only individual per-column estimates.

            Assuming column independence and using

            {code}
            n_distinct({t1.col1, t1.col3}) = n_distinct(t1.col1) * n_distinct(t1.col3)
            {code}

            is very optimistic: n_distinct for the tuple may be much higher than the reality, and join output cardinality estimate will be much lower than in reality.

            The suggestion is to make the pessimistic assumption instead:

            {code}
            n_distinct({t1.col1, t1.col3}) = MAX(n_distinct(t1.col1), n_distinct(t1.col3))
            {code}

            and perform computations based on that.
            h3. Concerns about impact on query plan choice

After this change, query plans with hash join will:
            * have much lower cost than before
            * have much lower rows_for_plan than before

Could it be that now the cost of hash join will be much lower than the reality? The current hash join cost computation doesn't account for copying rows to/from the join buffer. This code can be copied from 11.0.

            h3. Use of hash join rows_out estimates for other access methods
            (Suggestion by Monty)
In 11.0, best_access_path() has this property: regardless of which access method is chosen, {{rows_out}} ends up holding the smallest row estimate among all considered access methods.
            Should this be ported as part of this MDEV?
INTERFACE CHANGE:
Adds a new optimizer switch, {{hash_join_cardinality}}.