[MDEV-6109] EXPLAIN JSON Created: 2014-04-15  Updated: 2014-12-04  Resolved: 2014-12-02

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.1.2

Type: Task Priority: Critical
Reporter: Sergei Golubchik Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: optimizer

Issue Links:
Relates
relates to MDEV-6995 EXPLAIN JSON and ORDER BY, GROUP BY, etc Open
relates to MDEV-7248 EXPLAIN FORMAT=JSON: Better output Open
relates to MDEV-7264 Assertion `0' failed in subselect_eng... Closed
relates to MDEV-7265 Assertion `0' failed in Explain_table... Closed
relates to MDEV-7266 Assertion `!element_started' failed i... Closed
relates to MDEV-7267 Server crashes in Item_field::print o... Closed

 Description   

Support EXPLAIN FORMAT=JSON like MySQL 5.6 does.

Difference from MySQL's EXPLAIN FORMAT=JSON

We don't want to copy MySQL 5.6:
From the user point of view:

  • 5.6 output format is not documented and unstable (even MySQL Workbench fails to parse it in some cases)
    • I don't expect that any 3rd party is able to parse it, other than relying that it's a well-formed JSON document.
  • 5.6 output format doesn't allow to display some info we want to display
  • 5.6's output format has legacy of tabular output format in many places
  • MariaDB has more optimizer features, so we will have to produce output that 5.6 never produces

Currently supported

  • Basic SELECTs
  • Table access methods
    • full table/index scans
    • range access
    • index_merge access
    • I_S read optimizations
  • "Using where" (attached_condition)
  • rows and 'filtered' columns
  • Index Condition Pushdown ('index_condition')
  • UNIONs (incomplete)
  • Item-based subqueries (incomplete)
  • Basic join buffering
  • Single-table UPDATE/DELETE
  • Derived tables
  • Non-merged semi-joins (JTBMs)
  • [Merged] Semi-joins: SJ-Materialization
  • [Merged] Semi-joins: FirstMatch, DuplicateElimination, LooseScan

Not yet supported (important)

  • ORDER BY/GROUP BY/ DISTINCT handling – as agreed won't be ready before 10.1.2

Not yet supported (less important)

  • Subquery cache feature
  • Advanced join buffering (display MRR as a special kind of scan)


 Comments   
Comment by Sergei Petrunia [ 2014-05-15 ]

MySQL Workbench parses output of EXPLAIN FORMAT=JSON, see plugins/wb.query.analysis/explain_renderer.py.

The code there makes some assumptions about the form of JSON structures. It is not immediately apparent what exactly the assumptions are.

Comment by Sergei Petrunia [ 2014-05-21 ]

MySQL Workbench 6.1.4 choked on every EXPLAIN FORMAT=JSON output for query with subqueries that I've tried (using the latest 5.6).

Last Friday, they have released Workbench 6.1.6 which is able to process subqueries. However, it still makes some strong assumptions about the the data format.
Another thing I noticed - some of the input (e.g. index condition pushdown) is ignored.

Comment by Sergei Petrunia [ 2014-05-21 ]

Tasks

1. Figure out the expected JSON form (decide whether we try to follow it or not)
2. Add code to print the query plan in JSON format.
3. Add code to pretty-print the WHERE conditions.
4. Check the parts of the optimizer that are printed poorly in tabular form
4.1 Using temporary/Using filesort optimization
4.2 Access to I_S tables
5. Design FORMAT=JSON form for parts of query plan that MariaDB has but MySQL doesn't
6. Check out if mysql has any testcases that we could use

Comment by Sergei Petrunia [ 2014-05-23 ]

It looks like in MySQL 5.6, EXPLAIN FORMAT=JSON relies on optimizer_trace.

The JSON is produced here:

(gdb) wher
#0 opt_explain_json_namespace::table_base_ctx::format_body
#1 0x00000000009617b5 in opt_explain_json_namespace::join_tab_ctx::format_body
#2 0x000000000095f4b8 in opt_explain_json_namespace::context::format
#3 0x000000000095cb9e in opt_explain_json_namespace::join_ctx::format_nested_loop
#4 0x000000000095ca11 in opt_explain_json_namespace::join_ctx::format_body
#5 0x000000000095f4b8 in opt_explain_json_namespace::context::format
#6 0x000000000095ed1f in Explain_format_JSON::end_context
#7 0x0000000000954204 in Explain::send
#8 0x0000000000958850 in explain_query_specification
#9 0x00000000008126d8 in JOIN::explain

Check the function arguments:

bool table_base_ctx::format_body(Opt_trace_context *json, Opt_trace_object *obj)

Comment by Sergei Petrunia [ 2014-05-27 ]

https://mariadb.com/kb/en/explain-formatjson-in-mysql/

Comment by Sergei Petrunia [ 2014-08-12 ]

Debugging a query...the following happens:

1. Query plan is created
2. [if this is EXPLAIN, we print EXPLAIN]
3. lex->unit.cleanup() is called. It destroys JOIN objects and other execution structures.
4. THD::cleanup_after_query() calls free_items() which calls Item::delete_self() for allocated Item objects.
5. log_slow_statement() saves EXPLAIN and calls delete_explain_query().

#2 happens before #3 and #4. This means, we can save everything in EXPLAIN data structures.

However, #5 happens after #3 and #4, which means some details about the query plans are not accessible when printing EXPLAIN into slow query log.

Comment by Sergei Petrunia [ 2014-08-12 ]

Possible solutions:
1. Rely on the fact that EXPLAIN FORMAT=JSON is only printed on step#2, and not on step #5. This means, SHOW EXPLAIN FORMAT=JSON or log_slow_verbosity=explain_json will not be possible.
2. At step#1, save all information we might need. The possible disadvantage is extra overhead.
3. Have additional "final" saving at step#4. This will make the code more complex. (TODO: and whats the advantage over solution#2?)

Comment by Sergei Petrunia [ 2014-08-12 ]

Looking at MySQL:

create table t3 (a1 int, a2 int, b1 int, b2 int, key(a1,a2), key(b1,b2)); 
insert into t3 select a,a,a,a from t1;
 
explain select * from t3 where (a1=1 and a2=1) or (b1=1 and b2=1);

Tabular EXPLAIN shows:
key= "a1,b1"
Extra= "Using union(a1,b1), ..."

EXPLAIN FORMAT=JSON shows:
"key": "union(a1,b1)"

So, "key" in tabular and json output are not the same.

Comment by Sergei Petrunia [ 2014-08-12 ]

Another interesting column is JSON's used_key_parts. It seems to be present only when a single index is used.
for index_merge, one has to decode key_length.

Comment by Sergei Petrunia [ 2014-08-14 ]

Came up with some reasonable JSON output for access to single table.
Next things are:

  • join buffering
  • subqueries
  • ORDER/GROUP BY

Looking at MySQL:
== Join buffering ==

  • join buffering is handled in a semi-tabular form, "table" has

    "using_join_buffer": "Block Nested Loop".

  • join condition and table-read condition are printed together.

== Subqueries ==

  • subqueries are listed inside select_list_subqueries, attached_subqueries, etc...
  • attached_subqueries[] is attached to a "table{}". It is possible to get the same subquery listed twice.
  • if subquery is a part of a WHERE, it is printed there in full.

== ORDER/GROUP BY ==
it looks like this:

    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "nested_loop": [

it's possible to get it with using_filesort=false, which means we're scanning an index that produces the desired ordering. ordering_operation always embeds the nested_loop node.

Comment by Sergei Petrunia [ 2014-11-28 ]

How MySQL-5.6 displays semi-join subquery plans:

FirstMatch:

  "table" : { .... ,"first_match" : "other_table", ... }

Duplicate Elimination:

  "duplicates_removal": {
    "using_temporary_table": true,
    "nested_loop": [ ... tables are here ] 
  }

SJ-Materialization looks barely distinguishable from derived table:

    "table": {
      "table_name": "<subquery2>",
      "access_type": "ALL",
      "materialized_from_subquery": {
        "using_temporary_table": true,
        "query_block": {
           ... suquery goes here 
        }
      }

LooseScan

The table being scanned has:

 "loosescan": true,

Comment by roberto spadim [ 2014-12-01 ]

hi guys, i started reading this mdev (thank you implementing this feature with mariadb!), i don't know if it's possible, but optimizer explain why it choise one index instead of another? check MDEV-7239 (an idea about how to report this) and MDEV-7125 (a problem where optimizer select the 'wrong index' (slower) instead the 'right index' (faster)), i don't know what is the optimizer "decision tree" (or something like it), but is it possible to report this "index choise" to dba/developer? it's very interesting to understand how to better use optimizer
i'm watching this mdev, if you need users to test, i'm here

Comment by Sergei Petrunia [ 2014-12-02 ]

I think this should be resolved outside of this MDEV. Within the scope of this MDEV, we limit ourselves to printing basically EXPLAIN or ANALYZE-stmt output in JSON form, with more details. Will reply on MDEV-7239.

Comment by roberto spadim [ 2014-12-02 ]

ok no problem i marked as "is blocked by mdev-6109" at mdev-7239

Comment by Sergei Petrunia [ 2014-12-02 ]

Pushed into 10.1 tree

Generated at Thu Feb 08 07:09:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.