[MDEV-6109] EXPLAIN JSON - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.1.2
Component/s: Optimizer
Labels:
- optimizer

Description

Support EXPLAIN FORMAT=JSON like MySQL 5.6 does.

Difference from MySQL's EXPLAIN FORMAT=JSON

We don't want to copy MySQL 5.6:
From the user point of view:

5.6 output format is not documented and unstable (even MySQL Workbench fails to parse it in some cases)
- I don't expect that any 3rd party is able to parse it, other than relying that it's a well-formed JSON document.
5.6 output format doesn't allow to display some info we want to display
5.6's output format has legacy of tabular output format in many places
MariaDB has more optimizer features, so we will have to produce output that 5.6 never produces

Currently supported

Basic SELECTs
Table access methods
- full table/index scans
- range access
- index_merge access
- I_S read optimizations
"Using where" (attached_condition)
rows and 'filtered' columns
Index Condition Pushdown ('index_condition')
UNIONs (incomplete)
Item-based subqueries (incomplete)
Basic join buffering
Single-table UPDATE/DELETE
Derived tables
Non-merged semi-joins (JTBMs)
[Merged] Semi-joins: SJ-Materialization
[Merged] Semi-joins: FirstMatch, DuplicateElimination, LooseScan

Not yet supported (important)

ORDER BY/GROUP BY/ DISTINCT handling – as agreed won't be ready before 10.1.2

Not yet supported (less important)

Subquery cache feature
Advanced join buffering (display MRR as a special kind of scan)

Attachments

Issue Links

relates to

MDEV-6995 EXPLAIN JSON and ORDER BY, GROUP BY, etc

Open

MDEV-7248 EXPLAIN FORMAT=JSON: Better output

Open

MDEV-7264 Assertion `0' failed in subselect_engine::get_identifier() on EXPLAIN FORMAT=JSON

Closed

MDEV-7265 Assertion `0' failed in Explain_table_access::tag_to_json(Json_writer*, explain_extra_tag)

Closed

MDEV-7266 Assertion `!element_started' failed in Json_writer& Json_writer::add_member(const char*)

Closed

MDEV-7267 Server crashes in Item_field::print on ANALYZE FORMAT=JSON

Closed

links to

MySQL Bug: "EXPLAIN FORMAT=json output not documented."

(1 relates to, 1 links to)

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia added a comment - 2014-05-15 13:51

MySQL Workbench parses output of EXPLAIN FORMAT=JSON, see plugins/wb.query.analysis/explain_renderer.py.

The code there makes some assumptions about the form of JSON structures. It is not immediately apparent what exactly the assumptions are.

Sergei Petrunia added a comment - 2014-05-15 13:51 MySQL Workbench parses output of EXPLAIN FORMAT=JSON, see plugins/wb.query.analysis/explain_renderer.py. The code there makes some assumptions about the form of JSON structures. It is not immediately apparent what exactly the assumptions are.

Sergei Petrunia added a comment - 2014-05-21 10:43

MySQL Workbench 6.1.4 choked on every EXPLAIN FORMAT=JSON output for query with subqueries that I've tried (using the latest 5.6).

Last Friday, they have released Workbench 6.1.6 which is able to process subqueries. However, it still makes some strong assumptions about the the data format.
Another thing I noticed - some of the input (e.g. index condition pushdown) is ignored.

Sergei Petrunia added a comment - 2014-05-21 10:43 MySQL Workbench 6.1.4 choked on every EXPLAIN FORMAT=JSON output for query with subqueries that I've tried (using the latest 5.6). Last Friday, they have released Workbench 6.1.6 which is able to process subqueries. However, it still makes some strong assumptions about the the data format. Another thing I noticed - some of the input (e.g. index condition pushdown) is ignored.

Sergei Petrunia added a comment - 2014-05-21 11:03

Tasks

1. Figure out the expected JSON form (decide whether we try to follow it or not)
2. Add code to print the query plan in JSON format.
3. Add code to pretty-print the WHERE conditions.
4. Check the parts of the optimizer that are printed poorly in tabular form
4.1 Using temporary/Using filesort optimization
4.2 Access to I_S tables
5. Design FORMAT=JSON form for parts of query plan that MariaDB has but MySQL doesn't
6. Check out if mysql has any testcases that we could use

Sergei Petrunia added a comment - 2014-05-21 11:03 Tasks 1. Figure out the expected JSON form (decide whether we try to follow it or not) 2. Add code to print the query plan in JSON format. 3. Add code to pretty-print the WHERE conditions. 4. Check the parts of the optimizer that are printed poorly in tabular form 4.1 Using temporary/Using filesort optimization 4.2 Access to I_S tables 5. Design FORMAT=JSON form for parts of query plan that MariaDB has but MySQL doesn't 6. Check out if mysql has any testcases that we could use

Sergei Petrunia added a comment - 2014-05-23 19:00

It looks like in MySQL 5.6, EXPLAIN FORMAT=JSON relies on optimizer_trace.

The JSON is produced here:

(gdb) wher
#0 opt_explain_json_namespace::table_base_ctx::format_body
#1 0x00000000009617b5 in opt_explain_json_namespace::join_tab_ctx::format_body
#2 0x000000000095f4b8 in opt_explain_json_namespace::context::format
#3 0x000000000095cb9e in opt_explain_json_namespace::join_ctx::format_nested_loop
#4 0x000000000095ca11 in opt_explain_json_namespace::join_ctx::format_body
#5 0x000000000095f4b8 in opt_explain_json_namespace::context::format
#6 0x000000000095ed1f in Explain_format_JSON::end_context
#7 0x0000000000954204 in Explain::send
#8 0x0000000000958850 in explain_query_specification
#9 0x00000000008126d8 in JOIN::explain

Check the function arguments:

bool table_base_ctx::format_body(Opt_trace_context *json, Opt_trace_object *obj)

Sergei Petrunia added a comment - 2014-05-23 19:00 It looks like in MySQL 5.6, EXPLAIN FORMAT=JSON relies on optimizer_trace. The JSON is produced here: (gdb) wher #0 opt_explain_json_namespace::table_base_ctx::format_body #1 0x00000000009617b5 in opt_explain_json_namespace::join_tab_ctx::format_body #2 0x000000000095f4b8 in opt_explain_json_namespace::context::format #3 0x000000000095cb9e in opt_explain_json_namespace::join_ctx::format_nested_loop #4 0x000000000095ca11 in opt_explain_json_namespace::join_ctx::format_body #5 0x000000000095f4b8 in opt_explain_json_namespace::context::format #6 0x000000000095ed1f in Explain_format_JSON::end_context #7 0x0000000000954204 in Explain::send #8 0x0000000000958850 in explain_query_specification #9 0x00000000008126d8 in JOIN::explain Check the function arguments: bool table_base_ctx::format_body(Opt_trace_context *json, Opt_trace_object *obj)

Sergei Petrunia added a comment - 2014-05-27 16:23

https://mariadb.com/kb/en/explain-formatjson-in-mysql/

Sergei Petrunia added a comment - 2014-05-27 16:23 https://mariadb.com/kb/en/explain-formatjson-in-mysql/

Sergei Petrunia added a comment - 2014-08-12 15:29

Debugging a query...the following happens:

1. Query plan is created
2. [if this is EXPLAIN, we print EXPLAIN]
3. lex->unit.cleanup() is called. It destroys JOIN objects and other execution structures.
4. THD::cleanup_after_query() calls free_items() which calls Item::delete_self() for allocated Item objects.
5. log_slow_statement() saves EXPLAIN and calls delete_explain_query().

#2 happens before #3 and #4. This means, we can save everything in EXPLAIN data structures.

However, #5 happens after #3 and #4, which means some details about the query plans are not accessible when printing EXPLAIN into slow query log.

Sergei Petrunia added a comment - 2014-08-12 15:29 Debugging a query...the following happens: 1. Query plan is created 2. [if this is EXPLAIN, we print EXPLAIN] 3. lex->unit.cleanup() is called. It destroys JOIN objects and other execution structures. 4. THD::cleanup_after_query() calls free_items() which calls Item::delete_self() for allocated Item objects. 5. log_slow_statement() saves EXPLAIN and calls delete_explain_query(). #2 happens before #3 and #4. This means, we can save everything in EXPLAIN data structures. However, #5 happens after #3 and #4, which means some details about the query plans are not accessible when printing EXPLAIN into slow query log.

Sergei Petrunia added a comment - 2014-08-12 15:40

Possible solutions:
1. Rely on the fact that EXPLAIN FORMAT=JSON is only printed on step#2, and not on step #5. This means, SHOW EXPLAIN FORMAT=JSON or log_slow_verbosity=explain_json will not be possible.
2. At step#1, save all information we might need. The possible disadvantage is extra overhead.
3. Have additional "final" saving at step#4. This will make the code more complex. (TODO: and whats the advantage over solution#2?)

Sergei Petrunia added a comment - 2014-08-12 15:40 Possible solutions: 1. Rely on the fact that EXPLAIN FORMAT=JSON is only printed on step#2, and not on step #5. This means, SHOW EXPLAIN FORMAT=JSON or log_slow_verbosity=explain_json will not be possible. 2. At step#1, save all information we might need. The possible disadvantage is extra overhead. 3. Have additional "final" saving at step#4. This will make the code more complex. (TODO: and whats the advantage over solution#2?)

Sergei Petrunia added a comment - 2014-08-12 17:26

Looking at MySQL:

create table t3 (a1 int, a2 int, b1 int, b2 int, key(a1,a2), key(b1,b2));

insert into t3 select a,a,a,a from t1;

explain select * from t3 where (a1=1 and a2=1) or (b1=1 and b2=1);

Tabular EXPLAIN shows:
key= "a1,b1"
Extra= "Using union(a1,b1), ..."

EXPLAIN FORMAT=JSON shows:
"key": "union(a1,b1)"

So, "key" in tabular and json output are not the same.

Sergei Petrunia added a comment - 2014-08-12 17:26 Looking at MySQL: create table t3 (a1 int, a2 int, b1 int, b2 int, key(a1,a2), key(b1,b2)); insert into t3 select a,a,a,a from t1; explain select * from t3 where (a1=1 and a2=1) or (b1=1 and b2=1); Tabular EXPLAIN shows: key= "a1,b1" Extra= "Using union(a1,b1), ..." EXPLAIN FORMAT=JSON shows: "key": "union(a1,b1)" So, "key" in tabular and json output are not the same.

Sergei Petrunia added a comment - 2014-08-12 17:27

Another interesting column is JSON's used_key_parts. It seems to be present only when a single index is used.
for index_merge, one has to decode key_length.

Sergei Petrunia added a comment - 2014-08-12 17:27 Another interesting column is JSON's used_key_parts. It seems to be present only when a single index is used. for index_merge, one has to decode key_length.

Sergei Petrunia added a comment - 2014-08-14 01:33

Came up with some reasonable JSON output for access to single table.
Next things are:

join buffering
subqueries
ORDER/GROUP BY

Looking at MySQL:
== Join buffering ==

join buffering is handled in a semi-tabular form, "table" has
"using_join_buffer": "Block Nested Loop".
join condition and table-read condition are printed together.

== Subqueries ==

subqueries are listed inside select_list_subqueries, attached_subqueries, etc...

attached_subqueries[] is attached to a "table{}". It is possible to get the same subquery listed twice.
if subquery is a part of a WHERE, it is printed there in full.

== ORDER/GROUP BY ==
it looks like this:

    "ordering_operation": {

      "using_temporary_table": true,

      "using_filesort": true,

      "nested_loop": [

it's possible to get it with using_filesort=false, which means we're scanning an index that produces the desired ordering. ordering_operation always embeds the nested_loop node.

Sergei Petrunia added a comment - 2014-08-14 01:33 Came up with some reasonable JSON output for access to single table. Next things are: join buffering subqueries ORDER/GROUP BY Looking at MySQL: == Join buffering == join buffering is handled in a semi-tabular form, "table" has "using_join_buffer": "Block Nested Loop". join condition and table-read condition are printed together. == Subqueries == subqueries are listed inside select_list_subqueries, attached_subqueries, etc... attached_subqueries[] is attached to a "table{}". It is possible to get the same subquery listed twice. if subquery is a part of a WHERE, it is printed there in full. == ORDER/GROUP BY == it looks like this: "ordering_operation": { "using_temporary_table": true, "using_filesort": true, "nested_loop": [ it's possible to get it with using_filesort=false, which means we're scanning an index that produces the desired ordering. ordering_operation always embeds the nested_loop node.

Sergei Petrunia added a comment - 2014-11-28 21:57

How MySQL-5.6 displays semi-join subquery plans:

FirstMatch:

  "table" : { .... ,"first_match" : "other_table", ... }

Duplicate Elimination:

  "duplicates_removal": {

    "using_temporary_table": true,

    "nested_loop": [ ... tables are here ]

SJ-Materialization looks barely distinguishable from derived table:

    "table": {

      "table_name": "<subquery2>",

      "access_type": "ALL",

      "materialized_from_subquery": {

        "using_temporary_table": true,

        "query_block": {

           ... suquery goes here

LooseScan

The table being scanned has:

 "loosescan": true,

Sergei Petrunia added a comment - 2014-11-28 21:57 How MySQL-5.6 displays semi-join subquery plans: FirstMatch: "table" : { .... ,"first_match" : "other_table", ... } Duplicate Elimination: "duplicates_removal": { "using_temporary_table": true, "nested_loop": [ ... tables are here ] } SJ-Materialization looks barely distinguishable from derived table: "table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "query_block": { ... suquery goes here } } LooseScan The table being scanned has: "loosescan": true,

roberto spadim added a comment - 2014-12-01 15:28

hi guys, i started reading this mdev (thank you implementing this feature with mariadb!), i don't know if it's possible, but optimizer explain why it choise one index instead of another? check ~~MDEV-7239~~ (an idea about how to report this) and ~~MDEV-7125~~ (a problem where optimizer select the 'wrong index' (slower) instead the 'right index' (faster)), i don't know what is the optimizer "decision tree" (or something like it), but is it possible to report this "index choise" to dba/developer? it's very interesting to understand how to better use optimizer
i'm watching this mdev, if you need users to test, i'm here

roberto spadim added a comment - 2014-12-01 15:28 hi guys, i started reading this mdev (thank you implementing this feature with mariadb!), i don't know if it's possible, but optimizer explain why it choise one index instead of another? check MDEV-7239 (an idea about how to report this) and MDEV-7125 (a problem where optimizer select the 'wrong index' (slower) instead the 'right index' (faster)), i don't know what is the optimizer "decision tree" (or something like it), but is it possible to report this "index choise" to dba/developer? it's very interesting to understand how to better use optimizer i'm watching this mdev, if you need users to test, i'm here

Sergei Petrunia added a comment - 2014-12-02 00:36 - edited

I think this should be resolved outside of this MDEV. Within the scope of this MDEV, we limit ourselves to printing basically EXPLAIN or ANALYZE-stmt output in JSON form, with more details. Will reply on ~~MDEV-7239~~.

Sergei Petrunia added a comment - 2014-12-02 00:36 - edited I think this should be resolved outside of this MDEV. Within the scope of this MDEV, we limit ourselves to printing basically EXPLAIN or ANALYZE-stmt output in JSON form, with more details. Will reply on MDEV-7239 .

roberto spadim added a comment - 2014-12-02 00:50 - edited

ok no problem i marked as "is blocked by mdev-6109" at mdev-7239

roberto spadim added a comment - 2014-12-02 00:50 - edited ok no problem i marked as "is blocked by mdev-6109" at mdev-7239

Sergei Petrunia added a comment - 2014-12-02 11:40

Pushed into 10.1 tree

Sergei Petrunia added a comment - 2014-12-02 11:40 Pushed into 10.1 tree

People

Assignee:: Sergei Petrunia

Reporter:: Sergei Golubchik

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2014-04-15 20:05

Updated:: 2014-12-04 19:08

Resolved:: 2014-12-02 11:39

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server