Details
-
New Feature
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
Description
(filing his is a followup for the optimizer call on January, 29th)
The EXPLAIN output has:
- rows - expected # of total rows enumerated
- filtered - expected condition selectivity
Additional members from ANALYZE:
- r_rows - shows how many rows were returned to the SQL layer (that is, after pushed index condition was applied (the value will be lower))
- r_filtered - shows the selectivity of the condition that is applied at the SQL layer (that is, the selectivity of pushed index condition is not counted, the value will be higher)
Considerations for fixing this:
- ANALYZE FORMAT=JSON could show #rows enumerated, and both selectivities
- For tabular ANALYZE, we are limited to two numbers.
- rows vs r_rows and filtered vs r_filtered should be apples-to-apples comparisons.
Implementation
Current handler statistics/counters
There are multiple ways different statistics about actions inside handler make it to ANALYZE output:
Option 1: ha_handler_stats.h:class ha_handler_stats
This was introduced recently to produce r_engine_stats.
It seems, there is no issue if this structure had another member which handler_index_cond_check would increment...
Option 2: handler->tracker approach
This approach is used by a lot of ANALYZE FORMAT=JSON counters.
Read access time is counted using this member of class handler:
/* ANALYZE time tracker, if present */
|
Exec_time_tracker *tracker;
|
It points to a member of class Explain_table_access:
Table_access_tracker tracker;
|
This works, but do we want to add another member like 'handler::icp_tracker' ?
Option 3: Rowid Filter's approach
Storage engine code calls handler_rowid_filter_check() which calls Range_rowid_filter::check() through
h->pushed_rowid_filter->check(...)
|
call. Range_rowid_filter::check increments Rowid_filter::tracker.
The key part is that class handler has
Rowid_filter *pushed_rowid_filter;
|
which is the context and has the counter.
For ICP, we have instead:
Item *pushed_idx_cond;
|
uint pushed_idx_cond_keyno; /* The index which the above condition is for */
|
messing with cleaning/restoring pushed_idx_cond might be hard.
Which approach to use
*Option1* seems the best choice...
Output
TODO.
Considerations:
One should be able to compare rows and r_rows and it should be apples-to-apples comparison.
rows is number-of-rows-to-enumerate: before rowid filtering, before ICP.
so, r_rows should be the same.
There is also filtered which is... not as clearly defined? Should it be:
- %of rows left after any kind of filtering?
- %of rows left after checking the attached_condition?
(TODO: check what happens when we have an extra pushed-index-condition which selectivity is known to the optimizer).
(TODO: check what happens when using a rowid filter. Is its selectivity included in filtered% or not?)
In any case, we need something like r_icp_filtered.
The question is, should r_filtered include r_icp_filtered / 100% as a multiplier or not.
Attachments
Issue Links
- relates to
-
MDEV-33054 ANALYZE FORMAT=JSON: rowid_filter.r_lookups should be average, not total
- Stalled