Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-446

Reuse quick_condition_rows in fanout computations

Details

    • Task
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • 10.0.2
    • None
    • None

    Description

      Currently:

      • range optimizer produces quick_condition_rows
      • they are used by best_access_path(), but only if it considers
        full table scan acccess (see matching_candidates_in_table() call)

      We need to make it so that quick_condition_rows is used for any access
      method.

      === Details ===
      Currently, quick_condition_rows is only used for full table scans. This makes the code simple.
      When we start using quick_condition_rows with arbitrary acess method, we will be faced with this problem:

      1. quick_condition_rows estimate was made from a certain part of WHERE clause (the part of WHERE that was used to construct a range)
      2. access method that we're using also was made from a certain part of WHERE clause.

      question: are these two correlated? When does the estimate of the fanout of an access method take into account quick_condition_rows?

      We will adopt this answer:

      • Don't take quick_condition_rows into account if we're using a quick select for table access,
        or if we're using ref(const) (in this case we would construct a quick select and take its estimates into account).
      • Do adopt it otherwise.

      THere are more complex cases like
      t.keypart1=const AND t.keypart2 BETWEEN 'foo' AND 'bar' AND t.keypart2=othertbl.col

      Here:
      quick_condition_rows is made from "t.keypart1=const AND t.keypart2 BETWEEN ..."
      ref access will be on "t.keypart1=const AND t.keypart2=othertbl.col"

      when part of WHERE will be appled,

      • t.keypart1=const – will not filter anything out (because it's guaranteed to be true
        by ref access)
      • t.keypart2 BETWEEN ... – will filter something out.

      Available estimates:
      E1. rec_per_key(t.keypart1=...)
      E2. rec_per_key("t.keypart1=... AND t.keypart2=...")
      E3. records_in_range("t.keypart1=const AND t.keypart2 BETWEEN ...")

      Note, that the estimates will not necessary be in sync. For instance, we saw real-world cases where E3 > E1, even if it is clear that E3 < E1 for any dataset.

      How to compute the fanout of the WHERE predicates not covered by an access method:

      • if an access method covers all WHERE predicates, then the WHERE clause doesn't have any selectivity on its own.
      • if WHERE's selectivity hasn't been "absorbed" by the employed access method, then it has selecivity on its own.

      We assume that:

      • ref(non-const) access's selecitivity is orthogonal to quick_condition_rows selecitivity.
      • full table scan has no selectivity => WHERE condition will apply quick_condition_rows selecitivity.
      • That's it, for now. More complex analysis is possible but will be done outside of this task.
        For any type of access other than the above two, assume quick_condition_rows doesn't have any additional selectivity - selectivity(WHERE)=1.

      == Implementation ==

      • All code should be in best_access_path().
      • QUESTIONABLE:: The result should be recorded in POSITION::records_read.
        (Yes, there will be cases where POSITION::records_read < 1)

      APPARENT: POSITION::prefix_record_count is not what is needed.

      • The result should be visible in EXPLAIN EXTENDED, in "filtered" column.

      == Implementation step #1 ==

      • Take the problematic query from mdev-402
      • Remove the scalar subquery predicate.
      • Convert both IN-subqueries into a JOIN ( I don't care if it's not equivalent)
      • Use STRAIGHT_JOIN hints, if necessary to produce the join order of
        nation,supplier,partsupp,part.
      • Make it so that for table part, in EXPLAIN:
        = access is eq_ref
        = rows==1 (it's eq_ref)
        = "filtered shows" 0.0118 * 100 = 1.18 (percent).
        This may require introducing extra members in POSITION (investigate).

      Attachments

        Issue Links

          Activity

            Traced in GDB the produced selectivity estimate for the following query (as suggested by Sergey):

            explain extended
            select sql_calc_found_rows
            s_name, s_address
            from nation straight_join supplier straight_join part straight_join partsupp
            where s_suppkey = ps_suppkey
            and ps_partkey = p_partkey
            and p_name like 'forest%'
            and s_nationkey = n_nationkey
            and n_name = 'CANADA'
            order by s_name
            limit 10;

            There is an index on table nation(n_name).

            The plan for DBT3 scale 10 is:

            ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

            1 SIMPLE nation ref PRIMARY,i_n_name i_n_name 26 const 1 100.00 Using where; Using index; Using temporary; Using filesort
            1 SIMPLE supplier ref PRIMARY,i_s_nationkey i_s_nationkey 5 dbt3.nation.n_nationkey 1070 100.00  
            1 SIMPLE partsupp ref PRIMARY,i_ps_partkey,i_ps_suppkey i_ps_suppkey 4 dbt3.supplier.s_suppkey 39 100.00 Using index
            1 SIMPLE part eq_ref PRIMARY,i_p_name PRIMARY 4 dbt3.partsupp.ps_partkey 0 0.00 Using where

            ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

            Notice the "0" % filtered rows for part. This is due to JOIN_TAB::records_read being an integer instead of double.
            The actual estimate that can be seen in GDB is: 0.0242. If one stops in the end of best_access_path, the estimates are:

            (gdb) p records
            $20 = 1
            (gdb) p s->table->quick_condition_rows
            $21 = 48334
            (gdb) p s->table->file->stats.records
            $22 = 1996869

            The value of 's->table->quick_condition_rows' is what records_in_range() gives us for the predicate "p_name like 'forest%'" on index i_p_name.
            The actual value is 24037.

            The numbers for scale 1 are:

            (gdb) p s->table->quick_condition_rows
            $73 = 2387
            (gdb) p s->table->alias->Ptr
            $74 = 0x7f73120 "part"
            (gdb) p s->table->file->stats.records
            $75 = 200252
            (gdb) p pos->where_part_selectivity
            $76 = 0.011919980824161557

            This is 1.19% selectivity, which was the expected result.

            Various problems in EXPLAIN prevent it from showing the correct result (not only rounding).

            timour Timour Katchaounov (Inactive) added a comment - Traced in GDB the produced selectivity estimate for the following query (as suggested by Sergey): explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier straight_join part straight_join partsupp where s_suppkey = ps_suppkey and ps_partkey = p_partkey and p_name like 'forest%' and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name limit 10; There is an index on table nation(n_name). The plan for DBT3 scale 10 is: ----- ----------- -------- ------ --------------------------------- ------------- ------- ------------------------ ---- -------- ---------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra ----- ----------- -------- ------ --------------------------------- ------------- ------- ------------------------ ---- -------- ---------------------------------------------------------- 1 SIMPLE nation ref PRIMARY,i_n_name i_n_name 26 const 1 100.00 Using where; Using index; Using temporary; Using filesort 1 SIMPLE supplier ref PRIMARY,i_s_nationkey i_s_nationkey 5 dbt3.nation.n_nationkey 1070 100.00   1 SIMPLE partsupp ref PRIMARY,i_ps_partkey,i_ps_suppkey i_ps_suppkey 4 dbt3.supplier.s_suppkey 39 100.00 Using index 1 SIMPLE part eq_ref PRIMARY,i_p_name PRIMARY 4 dbt3.partsupp.ps_partkey 0 0.00 Using where ----- ----------- -------- ------ --------------------------------- ------------- ------- ------------------------ ---- -------- ---------------------------------------------------------- Notice the "0" % filtered rows for part. This is due to JOIN_TAB::records_read being an integer instead of double. The actual estimate that can be seen in GDB is: 0.0242. If one stops in the end of best_access_path, the estimates are: (gdb) p records $20 = 1 (gdb) p s->table->quick_condition_rows $21 = 48334 (gdb) p s->table->file->stats.records $22 = 1996869 The value of 's->table->quick_condition_rows' is what records_in_range() gives us for the predicate "p_name like 'forest%'" on index i_p_name. The actual value is 24037. The numbers for scale 1 are: (gdb) p s->table->quick_condition_rows $73 = 2387 (gdb) p s->table->alias->Ptr $74 = 0x7f73120 "part" (gdb) p s->table->file->stats.records $75 = 200252 (gdb) p pos->where_part_selectivity $76 = 0.011919980824161557 This is 1.19% selectivity, which was the expected result. Various problems in EXPLAIN prevent it from showing the correct result (not only rounding).

            This task is no longer needed because MWL#253 provides a generic solution for predicate selectivity.

            timour Timour Katchaounov (Inactive) added a comment - This task is no longer needed because MWL#253 provides a generic solution for predicate selectivity.

            People

              timour Timour Katchaounov (Inactive)
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.