Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18478

ANALYZE for statement should show selectivity of pushed index condition

Details

    Description

      (filing his is a followup for the optimizer call on January, 29th)

      The EXPLAIN output has:

      • rows - expected # of total rows enumerated
      • filtered - expected condition selectivity

      Additional members from ANALYZE:

      • r_rows - shows how many rows were returned to the SQL layer (that is, after pushed index condition was applied (the value will be lower))
      • r_filtered - shows the selectivity of the condition that is applied at the SQL layer (that is, the selectivity of pushed index condition is not counted, the value will be higher)

      Considerations for fixing this:

      • ANALYZE FORMAT=JSON could show #rows enumerated, and both selectivities
      • For tabular ANALYZE, we are limited to two numbers.
      • rows vs r_rows and filtered vs r_filtered should be apples-to-apples comparisons.

      Implementation

      Current handler statistics/counters

      There are multiple ways different statistics about actions inside handler make it to ANALYZE output:

      Option 1: ha_handler_stats.h:class ha_handler_stats

      This was introduced recently to produce r_engine_stats.
      It seems, there is no issue if this structure had another member which handler_index_cond_check would increment...

      Option 2: handler->tracker approach

      This approach is used by a lot of ANALYZE FORMAT=JSON counters.

      Read access time is counted using this member of class handler:

        /* ANALYZE time tracker, if present */
        Exec_time_tracker *tracker;
      

      It points to a member of class Explain_table_access:

        Table_access_tracker tracker;
      

      This works, but do we want to add another member like 'handler::icp_tracker' ?

      Option 3: Rowid Filter's approach

      Storage engine code calls handler_rowid_filter_check() which calls Range_rowid_filter::check() through

      h->pushed_rowid_filter->check(...) 
      

      call. Range_rowid_filter::check increments Rowid_filter::tracker.

      The key part is that class handler has

        Rowid_filter *pushed_rowid_filter;
      

      which is the context and has the counter.

      For ICP, we have instead:

        Item *pushed_idx_cond;
        uint pushed_idx_cond_keyno;  /* The index which the above condition is for */
      

      messing with cleaning/restoring pushed_idx_cond might be hard.

      Which approach to use

      *Option1* seems the best choice...

      Output

      TODO.
      Considerations:

      One should be able to compare rows and r_rows and it should be apples-to-apples comparison.
      rows is number-of-rows-to-enumerate: before rowid filtering, before ICP.
      so, r_rows should be the same.

      There is also filtered which is... not as clearly defined? Should it be:

      • %of rows left after any kind of filtering?
      • %of rows left after checking the attached_condition?
        (TODO: check what happens when we have an extra pushed-index-condition which selectivity is known to the optimizer).
        (TODO: check what happens when using a rowid filter. Is its selectivity included in filtered% or not?)

      In any case, we need something like r_icp_filtered.
      The question is, should r_filtered include r_icp_filtered / 100% as a multiplier or not.

      Attachments

        Issue Links

          Activity

            Based on the testing input,produced a new variant. It is in bb-11.4-MDEV-18478-v4. This makes the docs above out-of-date.

            psergei Sergei Petrunia added a comment - Based on the testing input,produced a new variant. It is in bb-11.4- MDEV-18478 -v4 . This makes the docs above out-of-date.
            psergei Sergei Petrunia added a comment - (Docs at https://mariadb.com/kb/en/analyze-interpreting-execution-stats-for-index-based-access-methods/ are now up to date)

            Testing done. Ok to push

            lstartseva Lena Startseva added a comment - Testing done. Ok to push

            Looking at the latest fix:

            commit 51cd6301add23ec3f73b5ce2cfe8c44cfa806b5b (HEAD -> bb-11.4-MDEV-18478-v4-MDEV-12404, origin/bb-11.4-MDEV-18478-v4-MDEV-12404)
            Author: Dave Gosselin <dave.gosselin@mariadb.com>
            Date:   Mon Apr 1 21:44:36 2024 -0400
             
                We rely on handler_stats pointing to a valid active_handler_stats instance,
             for ICP accounting, which is created during ha_handler_stats_reset().  Always
             invoke that method during table init to ensure that the handler, regardless of 
            implementation, has the pointer set correctly
            

            The fix is a consequence of choices made earlier in this MDEV: ICP counters are in handler_stats structure.

            psergei Sergei Petrunia added a comment - Looking at the latest fix: commit 51cd6301add23ec3f73b5ce2cfe8c44cfa806b5b (HEAD -> bb-11.4-MDEV-18478-v4-MDEV-12404, origin/bb-11.4-MDEV-18478-v4-MDEV-12404) Author: Dave Gosselin <dave.gosselin@mariadb.com> Date: Mon Apr 1 21:44:36 2024 -0400   We rely on handler_stats pointing to a valid active_handler_stats instance, for ICP accounting, which is created during ha_handler_stats_reset(). Always invoke that method during table init to ensure that the handler, regardless of implementation, has the pointer set correctly The fix is a consequence of choices made earlier in this MDEV: ICP counters are in handler_stats structure.
            psergei Sergei Petrunia added a comment - - edited

            All the issues with handler_stats were there because we were trying to get the following scenario to work:

            • There is a server that runs without log_slow_verbosity=engine.
            • A regular SELECT (not ANALYZE) starts to execute. The execution is done without enabling handler_stats feature.
            • SHOW ANALYZE FORMAT=JSON is invoked for the running SELECT.
              • It cannot display ICP statistics, because ICP statistics cannot be collected when handler_stats is not enabled.
              • Note that r_engine_stats are not displayed, either.

            Attached mdev18478-show-analyze.diff with a demo of this.

            There were patches made that tried to rectify this by adding optimizer_track_stats which would have the same effect as having log_slow_verbosity=engine setting.
            Then, Monty requested on the call that we print the engine stats into slow query log if they are available. But then optimizer_track_stats=1 would have the almost same effect as having log_slow_verbosity=engine.

            This seems to go outside of the scope of this MDEV. Getting this MDEV pushed with a limitation that SHOW ANALYZE FORMAT=JSON might not show ICP selectivity under circumstances described above.

            psergei Sergei Petrunia added a comment - - edited All the issues with handler_stats were there because we were trying to get the following scenario to work: There is a server that runs without log_slow_verbosity=engine . A regular SELECT (not ANALYZE) starts to execute. The execution is done without enabling handler_stats feature. SHOW ANALYZE FORMAT=JSON is invoked for the running SELECT. It cannot display ICP statistics, because ICP statistics cannot be collected when handler_stats is not enabled. Note that r_engine_stats are not displayed, either. Attached mdev18478-show-analyze.diff with a demo of this. There were patches made that tried to rectify this by adding optimizer_track_stats which would have the same effect as having log_slow_verbosity=engine setting. Then, Monty requested on the call that we print the engine stats into slow query log if they are available. But then optimizer_track_stats=1 would have the almost same effect as having log_slow_verbosity=engine . This seems to go outside of the scope of this MDEV. Getting this MDEV pushed with a limitation that SHOW ANALYZE FORMAT=JSON might not show ICP selectivity under circumstances described above.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.