[MDEV-18144] ANALYZE for statement support for PK filters Created: 2019-01-05  Updated: 2019-02-24  Resolved: 2019-02-24

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.4.3

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Galina Shalygina (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-16188 Use in-memory PK filters built from r... Closed

 Description   

This task is about adding ANALYZE [FORMAT=JSON] support for PK filtering
(MDEV-16188).

User interface

Examples of PK filtering for EXPLAIN and EXPLAIN FORMAT=JSON:

id     1
select_type    SIMPLE
table  lineitem
type   range|filter
possible_keys  PRIMARY,i_l_shipdate,i_l_receiptdate,i_l_orderkey,i_l_orderkey_quantity
key    i_l_receiptdate|i_l_shipdate
key_len        4|4
ref    NULL
rows   6 (0%)
Extra  Using index condition; Using where; Using filter

The filter shows rows="6 (0%)"

in ANALYZE, there will be "r_rows" column.
It should match the rows column, that is, display "nnn (mmm%)" where mmm% - is
the observed selectivity of the filter. If the filter was discared, we should
indicate that in some way (exactly what to print is to be determined).

With FORMAT=JSON we get more date:

+      "rowid_filter": {
+        "range": {
+          "key": "i_l_shipdate",
+          "used_key_parts": ["l_shipDATE"]
+        },
+        "rows": 8,
+        "selectivity_pct": 0.1332
+      },

ANALYZE FORMAT=JSON output should include:

  • r_rows for the quick select that was used to populate the filter.
  • if the filter was discarded (because it was too big), this should be indicated.
  • The total time to populate (i.e. build) the filter should be included as well.
  • selectivity_pct should get "r_selectivity_pct" which will show the observed filter selectivity.
  • (I am not sure if we should include the time it took to make checks in the filter).

Implementation

class Explain_rowid_filter should get a pointer to a "Rowid filter tracker
object" (which will exist for all queries, both ANALYZE or not). (just like
e.g. Explain_update has a pointer to Filesort_tracker object).

Filesort_tracker is a good example of a complex tracker object.
There should be a Rowid_filter_tracker object with information about the query
execution. It should have:

  • a member to track the time to populate the filter.
  • a member to track the time/#rows for table access while populating the
    filter.
  • A member which indicates whether filter construction finished or was aborted.
  • (Optionally) A member which indicates how big the buffer size was.
  • Two counters: one counts how many times the filter was invoked, the other
    counts how many times the filter has filter out the record. Dividing one by
    another produces r_selectivity_pct.

Tracking the time to build the filter

See Explain_select::time_tracker and its use in JOIN::exec.

Tracking table access and r_rows for building the filter.

At the moment, storage engine objects (like ha_innobase) have "tracker" objects
inside them (see handler::tracker).

The question is: does MDEV-16188 use the same handler object to build the
filter (phase #1) and to perform the "primary" table access by the main query
(phase #2) ?

If yes: we will need to switch handler::tracker objects after building the
filter has been finished (as we want time/#rows for phase #1 and phase #2 to be
accounted separately)

If no: then, we'll need to set the secondary handler object's h->tracker to
point to our tracker.

Printing

void Explain_rowid_filter::print_explain_json(Explain_query *query, 
                                              Json_writer *writer,
                                              bool is_analyze)

has "is_analyze" argument. This controls whether analyze data should be
printed.



 Comments   
Comment by Igor Babaev [ 2019-02-24 ]

The patches for this task were pushed into 10.4

Generated at Thu Feb 08 08:41:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.