Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18478

ANALYZE for statement should show selectivity of pushed index condition

    XMLWordPrintable

Details

    Description

      (filing his is a followup for the optimizer call on January, 29th)

      The EXPLAIN output has:

      • rows - expected # of total rows enumerated
      • filtered - expected condition selectivity

      Additional members from ANALYZE:

      • r_rows - shows how many rows were returned to the SQL layer (that is, after pushed index condition was applied (the value will be lower))
      • r_filtered - shows the selectivity of the condition that is applied at the SQL layer (that is, the selectivity of pushed index condition is not counted, the value will be higher)

      Considerations for fixing this:

      • ANALYZE FORMAT=JSON could show #rows enumerated, and both selectivities
      • For tabular ANALYZE, we are limited to two numbers.
      • rows vs r_rows and filtered vs r_filtered should be apples-to-apples comparisons.

      Implementation

      Current handler statistics/counters

      There are multiple ways different statistics about actions inside handler make it to ANALYZE output:

      Option 1: ha_handler_stats.h:class ha_handler_stats

      This was introduced recently to produce r_engine_stats.
      It seems, there is no issue if this structure had another member which handler_index_cond_check would increment...

      Option 2: handler->tracker approach

      This approach is used by a lot of ANALYZE FORMAT=JSON counters.

      Read access time is counted using this member of class handler:

        /* ANALYZE time tracker, if present */
        Exec_time_tracker *tracker;
      

      It points to a member of class Explain_table_access:

        Table_access_tracker tracker;
      

      This works, but do we want to add another member like 'handler::icp_tracker' ?

      Option 3: Rowid Filter's approach

      Storage engine code calls handler_rowid_filter_check() which calls Range_rowid_filter::check() through

      h->pushed_rowid_filter->check(...) 
      

      call. Range_rowid_filter::check increments Rowid_filter::tracker.

      The key part is that class handler has

        Rowid_filter *pushed_rowid_filter;
      

      which is the context and has the counter.

      For ICP, we have instead:

        Item *pushed_idx_cond;
        uint pushed_idx_cond_keyno;  /* The index which the above condition is for */
      

      messing with cleaning/restoring pushed_idx_cond might be hard.

      Which approach to use

      *Option1* seems the best choice...

      Output

      TODO.
      Considerations:

      One should be able to compare rows and r_rows and it should be apples-to-apples comparison.
      rows is number-of-rows-to-enumerate: before rowid filtering, before ICP.
      so, r_rows should be the same.

      There is also filtered which is... not as clearly defined? Should it be:

      • %of rows left after any kind of filtering?
      • %of rows left after checking the attached_condition?
        (TODO: check what happens when we have an extra pushed-index-condition which selectivity is known to the optimizer).
        (TODO: check what happens when using a rowid filter. Is its selectivity included in filtered% or not?)

      In any case, we need something like r_icp_filtered.
      The question is, should r_filtered include r_icp_filtered / 100% as a multiplier or not.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.