[MDEV-18478] ANALYZE for statement should show selectivity of pushed index condition - Jira

Details

Type: New Feature
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 11.5.1
Component/s: Optimizer
Labels:

Description

(filing his is a followup for the optimizer call on January, 29th)

The EXPLAIN output has:

rows - expected # of total rows enumerated
filtered - expected condition selectivity

Additional members from ANALYZE:

r_rows - shows how many rows were returned to the SQL layer (that is, after pushed index condition was applied (the value will be lower))
r_filtered - shows the selectivity of the condition that is applied at the SQL layer (that is, the selectivity of pushed index condition is not counted, the value will be higher)

Considerations for fixing this:

ANALYZE FORMAT=JSON could show #rows enumerated, and both selectivities
For tabular ANALYZE, we are limited to two numbers.
rows vs r_rows and filtered vs r_filtered should be apples-to-apples comparisons.

Implementation

Current handler statistics/counters

There are multiple ways different statistics about actions inside handler make it to ANALYZE output:

Option 1: ha_handler_stats.h:class ha_handler_stats

This was introduced recently to produce r_engine_stats.
It seems, there is no issue if this structure had another member which handler_index_cond_check would increment...

Option 2: handler->tracker approach

This approach is used by a lot of ANALYZE FORMAT=JSON counters.

Read access time is counted using this member of class handler:

  /* ANALYZE time tracker, if present */

  Exec_time_tracker *tracker;

It points to a member of class Explain_table_access:

  Table_access_tracker tracker;

This works, but do we want to add another member like 'handler::icp_tracker' ?

Option 3: Rowid Filter's approach

Storage engine code calls handler_rowid_filter_check() which calls Range_rowid_filter::check() through

h->pushed_rowid_filter->check(...)

call. Range_rowid_filter::check increments Rowid_filter::tracker.

The key part is that class handler has

  Rowid_filter *pushed_rowid_filter;

which is the context and has the counter.

For ICP, we have instead:

  Item *pushed_idx_cond;

  uint pushed_idx_cond_keyno;  /* The index which the above condition is for */

messing with cleaning/restoring pushed_idx_cond might be hard.

Which approach to use

*Option1* seems the best choice...

Output

TODO.
Considerations:

One should be able to compare rows and r_rows and it should be apples-to-apples comparison.
rows is number-of-rows-to-enumerate: before rowid filtering, before ICP.
so, r_rows should be the same.

There is also filtered which is... not as clearly defined? Should it be:

%of rows left after any kind of filtering?
%of rows left after checking the attached_condition?
(TODO: check what happens when we have an extra pushed-index-condition which selectivity is known to the optimizer).
(TODO: check what happens when using a rowid filter. Is its selectivity included in filtered% or not?)

In any case, we need something like r_icp_filtered.
The question is, should r_filtered include r_icp_filtered / 100% as a multiplier or not.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mdev18478-show-analyze.diff
2024-04-24 08:22
6 kB
Sergei Petrunia

Issue Links

relates to

MDEV-33054 ANALYZE FORMAT=JSON: rowid_filter.r_lookups should be average, not total

Stalled

Activity

Ascending order - Click to sort in descending order

View 23 older comments

Sergei Petrunia added a comment - 2024-02-26 09:28

Based on the testing input,produced a new variant. It is in bb-11.4-MDEV-18478-v4. This makes the docs above out-of-date.

Sergei Petrunia added a comment - 2024-02-26 09:28 Based on the testing input,produced a new variant. It is in bb-11.4- MDEV-18478 -v4 . This makes the docs above out-of-date.

Sergei Petrunia added a comment - 2024-03-01 12:42

(Docs at https://mariadb.com/kb/en/analyze-interpreting-execution-stats-for-index-based-access-methods/ are now up to date)

Sergei Petrunia added a comment - 2024-03-01 12:42 (Docs at https://mariadb.com/kb/en/analyze-interpreting-execution-stats-for-index-based-access-methods/ are now up to date)

Lena Startseva added a comment - 2024-04-16 13:34

Testing done. Ok to push

Lena Startseva added a comment - 2024-04-16 13:34 Testing done. Ok to push

Sergei Petrunia added a comment - 2024-04-16 17:04

Looking at the latest fix:

commit 51cd6301add23ec3f73b5ce2cfe8c44cfa806b5b (HEAD -> bb-11.4-MDEV-18478-v4-MDEV-12404, origin/bb-11.4-MDEV-18478-v4-MDEV-12404)

Author: Dave Gosselin <dave.gosselin@mariadb.com>

Date:   Mon Apr 1 21:44:36 2024 -0400

    We rely on handler_stats pointing to a valid active_handler_stats instance,

 for ICP accounting, which is created during ha_handler_stats_reset().  Always

 invoke that method during table init to ensure that the handler, regardless of

implementation, has the pointer set correctly

The fix is a consequence of choices made earlier in this MDEV: ICP counters are in handler_stats structure.

Sergei Petrunia added a comment - 2024-04-16 17:04 Looking at the latest fix: commit 51cd6301add23ec3f73b5ce2cfe8c44cfa806b5b (HEAD -> bb-11.4-MDEV-18478-v4-MDEV-12404, origin/bb-11.4-MDEV-18478-v4-MDEV-12404) Author: Dave Gosselin <dave.gosselin@mariadb.com> Date: Mon Apr 1 21:44:36 2024 -0400 We rely on handler_stats pointing to a valid active_handler_stats instance, for ICP accounting, which is created during ha_handler_stats_reset(). Always invoke that method during table init to ensure that the handler, regardless of implementation, has the pointer set correctly The fix is a consequence of choices made earlier in this MDEV: ICP counters are in handler_stats structure.

Sergei Petrunia added a comment - 2024-04-24 08:15 - edited

All the issues with handler_stats were there because we were trying to get the following scenario to work:

There is a server that runs without log_slow_verbosity=engine.
A regular SELECT (not ANALYZE) starts to execute. The execution is done without enabling handler_stats feature.
SHOW ANALYZE FORMAT=JSON is invoked for the running SELECT.
- It cannot display ICP statistics, because ICP statistics cannot be collected when handler_stats is not enabled.
- Note that r_engine_stats are not displayed, either.

Attached mdev18478-show-analyze.diff with a demo of this.

There were patches made that tried to rectify this by adding optimizer_track_stats which would have the same effect as having log_slow_verbosity=engine setting.
Then, Monty requested on the call that we print the engine stats into slow query log if they are available. But then optimizer_track_stats=1 would have the almost same effect as having log_slow_verbosity=engine.

This seems to go outside of the scope of this MDEV. Getting this MDEV pushed with a limitation that SHOW ANALYZE FORMAT=JSON might not show ICP selectivity under circumstances described above.

Sergei Petrunia added a comment - 2024-04-24 08:15 - edited All the issues with handler_stats were there because we were trying to get the following scenario to work: There is a server that runs without log_slow_verbosity=engine . A regular SELECT (not ANALYZE) starts to execute. The execution is done without enabling handler_stats feature. SHOW ANALYZE FORMAT=JSON is invoked for the running SELECT. It cannot display ICP statistics, because ICP statistics cannot be collected when handler_stats is not enabled. Note that r_engine_stats are not displayed, either. Attached mdev18478-show-analyze.diff with a demo of this. There were patches made that tried to rectify this by adding optimizer_track_stats which would have the same effect as having log_slow_verbosity=engine setting. Then, Monty requested on the call that we print the engine stats into slow query log if they are available. But then optimizer_track_stats=1 would have the almost same effect as having log_slow_verbosity=engine . This seems to go outside of the scope of this MDEV. Getting this MDEV pushed with a limitation that SHOW ANALYZE FORMAT=JSON might not show ICP selectivity under circumstances described above.

MariaDB Server

ANALYZE for statement should show selectivity of pushed index condition

Details

Description

Implementation

Current handler statistics/counters

Option 1: ha_handler_stats.h:class ha_handler_stats

Option 2: handler->tracker approach

Option 3: Rowid Filter's approach

Which approach to use

Output

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration