Details

    Description

      Support EXPLAIN FORMAT=JSON like MySQL 5.6 does.

      Difference from MySQL's EXPLAIN FORMAT=JSON

      We don't want to copy MySQL 5.6:
      From the user point of view:

      • 5.6 output format is not documented and unstable (even MySQL Workbench fails to parse it in some cases)
        • I don't expect that any 3rd party is able to parse it, other than relying that it's a well-formed JSON document.
      • 5.6 output format doesn't allow to display some info we want to display
      • 5.6's output format has legacy of tabular output format in many places
      • MariaDB has more optimizer features, so we will have to produce output that 5.6 never produces

      Currently supported

      • Basic SELECTs
      • Table access methods
        • full table/index scans
        • range access
        • index_merge access
        • I_S read optimizations
      • "Using where" (attached_condition)
      • rows and 'filtered' columns
      • Index Condition Pushdown ('index_condition')
      • UNIONs (incomplete)
      • Item-based subqueries (incomplete)
      • Basic join buffering
      • Single-table UPDATE/DELETE
      • Derived tables
      • Non-merged semi-joins (JTBMs)
      • [Merged] Semi-joins: SJ-Materialization
      • [Merged] Semi-joins: FirstMatch, DuplicateElimination, LooseScan

      Not yet supported (important)

      • ORDER BY/GROUP BY/ DISTINCT handling – as agreed won't be ready before 10.1.2

      Not yet supported (less important)

      • Subquery cache feature
      • Advanced join buffering (display MRR as a special kind of scan)

      Attachments

        Issue Links

          Activity

            MySQL Workbench parses output of EXPLAIN FORMAT=JSON, see plugins/wb.query.analysis/explain_renderer.py.

            The code there makes some assumptions about the form of JSON structures. It is not immediately apparent what exactly the assumptions are.

            psergei Sergei Petrunia added a comment - MySQL Workbench parses output of EXPLAIN FORMAT=JSON, see plugins/wb.query.analysis/explain_renderer.py. The code there makes some assumptions about the form of JSON structures. It is not immediately apparent what exactly the assumptions are.

            MySQL Workbench 6.1.4 choked on every EXPLAIN FORMAT=JSON output for query with subqueries that I've tried (using the latest 5.6).

            Last Friday, they have released Workbench 6.1.6 which is able to process subqueries. However, it still makes some strong assumptions about the the data format.
            Another thing I noticed - some of the input (e.g. index condition pushdown) is ignored.

            psergei Sergei Petrunia added a comment - MySQL Workbench 6.1.4 choked on every EXPLAIN FORMAT=JSON output for query with subqueries that I've tried (using the latest 5.6). Last Friday, they have released Workbench 6.1.6 which is able to process subqueries. However, it still makes some strong assumptions about the the data format. Another thing I noticed - some of the input (e.g. index condition pushdown) is ignored.

            Tasks

            1. Figure out the expected JSON form (decide whether we try to follow it or not)
            2. Add code to print the query plan in JSON format.
            3. Add code to pretty-print the WHERE conditions.
            4. Check the parts of the optimizer that are printed poorly in tabular form
            4.1 Using temporary/Using filesort optimization
            4.2 Access to I_S tables
            5. Design FORMAT=JSON form for parts of query plan that MariaDB has but MySQL doesn't
            6. Check out if mysql has any testcases that we could use

            psergei Sergei Petrunia added a comment - Tasks 1. Figure out the expected JSON form (decide whether we try to follow it or not) 2. Add code to print the query plan in JSON format. 3. Add code to pretty-print the WHERE conditions. 4. Check the parts of the optimizer that are printed poorly in tabular form 4.1 Using temporary/Using filesort optimization 4.2 Access to I_S tables 5. Design FORMAT=JSON form for parts of query plan that MariaDB has but MySQL doesn't 6. Check out if mysql has any testcases that we could use

            It looks like in MySQL 5.6, EXPLAIN FORMAT=JSON relies on optimizer_trace.

            The JSON is produced here:

            (gdb) wher
            #0 opt_explain_json_namespace::table_base_ctx::format_body
            #1 0x00000000009617b5 in opt_explain_json_namespace::join_tab_ctx::format_body
            #2 0x000000000095f4b8 in opt_explain_json_namespace::context::format
            #3 0x000000000095cb9e in opt_explain_json_namespace::join_ctx::format_nested_loop
            #4 0x000000000095ca11 in opt_explain_json_namespace::join_ctx::format_body
            #5 0x000000000095f4b8 in opt_explain_json_namespace::context::format
            #6 0x000000000095ed1f in Explain_format_JSON::end_context
            #7 0x0000000000954204 in Explain::send
            #8 0x0000000000958850 in explain_query_specification
            #9 0x00000000008126d8 in JOIN::explain

            Check the function arguments:

            bool table_base_ctx::format_body(Opt_trace_context *json, Opt_trace_object *obj)

            psergei Sergei Petrunia added a comment - It looks like in MySQL 5.6, EXPLAIN FORMAT=JSON relies on optimizer_trace. The JSON is produced here: (gdb) wher #0 opt_explain_json_namespace::table_base_ctx::format_body #1 0x00000000009617b5 in opt_explain_json_namespace::join_tab_ctx::format_body #2 0x000000000095f4b8 in opt_explain_json_namespace::context::format #3 0x000000000095cb9e in opt_explain_json_namespace::join_ctx::format_nested_loop #4 0x000000000095ca11 in opt_explain_json_namespace::join_ctx::format_body #5 0x000000000095f4b8 in opt_explain_json_namespace::context::format #6 0x000000000095ed1f in Explain_format_JSON::end_context #7 0x0000000000954204 in Explain::send #8 0x0000000000958850 in explain_query_specification #9 0x00000000008126d8 in JOIN::explain Check the function arguments: bool table_base_ctx::format_body(Opt_trace_context *json, Opt_trace_object *obj)
            psergei Sergei Petrunia added a comment - https://mariadb.com/kb/en/explain-formatjson-in-mysql/

            Debugging a query...the following happens:

            1. Query plan is created
            2. [if this is EXPLAIN, we print EXPLAIN]
            3. lex->unit.cleanup() is called. It destroys JOIN objects and other execution structures.
            4. THD::cleanup_after_query() calls free_items() which calls Item::delete_self() for allocated Item objects.
            5. log_slow_statement() saves EXPLAIN and calls delete_explain_query().

            #2 happens before #3 and #4. This means, we can save everything in EXPLAIN data structures.

            However, #5 happens after #3 and #4, which means some details about the query plans are not accessible when printing EXPLAIN into slow query log.

            psergei Sergei Petrunia added a comment - Debugging a query...the following happens: 1. Query plan is created 2. [if this is EXPLAIN, we print EXPLAIN] 3. lex->unit.cleanup() is called. It destroys JOIN objects and other execution structures. 4. THD::cleanup_after_query() calls free_items() which calls Item::delete_self() for allocated Item objects. 5. log_slow_statement() saves EXPLAIN and calls delete_explain_query(). #2 happens before #3 and #4. This means, we can save everything in EXPLAIN data structures. However, #5 happens after #3 and #4, which means some details about the query plans are not accessible when printing EXPLAIN into slow query log.

            Possible solutions:
            1. Rely on the fact that EXPLAIN FORMAT=JSON is only printed on step#2, and not on step #5. This means, SHOW EXPLAIN FORMAT=JSON or log_slow_verbosity=explain_json will not be possible.
            2. At step#1, save all information we might need. The possible disadvantage is extra overhead.
            3. Have additional "final" saving at step#4. This will make the code more complex. (TODO: and whats the advantage over solution#2?)

            psergei Sergei Petrunia added a comment - Possible solutions: 1. Rely on the fact that EXPLAIN FORMAT=JSON is only printed on step#2, and not on step #5. This means, SHOW EXPLAIN FORMAT=JSON or log_slow_verbosity=explain_json will not be possible. 2. At step#1, save all information we might need. The possible disadvantage is extra overhead. 3. Have additional "final" saving at step#4. This will make the code more complex. (TODO: and whats the advantage over solution#2?)

            Looking at MySQL:

            create table t3 (a1 int, a2 int, b1 int, b2 int, key(a1,a2), key(b1,b2)); 
            insert into t3 select a,a,a,a from t1;
             
            explain select * from t3 where (a1=1 and a2=1) or (b1=1 and b2=1);

            Tabular EXPLAIN shows:
            key= "a1,b1"
            Extra= "Using union(a1,b1), ..."

            EXPLAIN FORMAT=JSON shows:
            "key": "union(a1,b1)"

            So, "key" in tabular and json output are not the same.

            psergei Sergei Petrunia added a comment - Looking at MySQL: create table t3 (a1 int, a2 int, b1 int, b2 int, key(a1,a2), key(b1,b2)); insert into t3 select a,a,a,a from t1;   explain select * from t3 where (a1=1 and a2=1) or (b1=1 and b2=1); Tabular EXPLAIN shows: key= "a1,b1" Extra= "Using union(a1,b1), ..." EXPLAIN FORMAT=JSON shows: "key": "union(a1,b1)" So, "key" in tabular and json output are not the same.

            Another interesting column is JSON's used_key_parts. It seems to be present only when a single index is used.
            for index_merge, one has to decode key_length.

            psergei Sergei Petrunia added a comment - Another interesting column is JSON's used_key_parts. It seems to be present only when a single index is used. for index_merge, one has to decode key_length.

            Came up with some reasonable JSON output for access to single table.
            Next things are:

            • join buffering
            • subqueries
            • ORDER/GROUP BY

            Looking at MySQL:
            == Join buffering ==

            • join buffering is handled in a semi-tabular form, "table" has

              "using_join_buffer": "Block Nested Loop".

            • join condition and table-read condition are printed together.

            == Subqueries ==

            • subqueries are listed inside select_list_subqueries, attached_subqueries, etc...
            • attached_subqueries[] is attached to a "table{}". It is possible to get the same subquery listed twice.
            • if subquery is a part of a WHERE, it is printed there in full.

            == ORDER/GROUP BY ==
            it looks like this:

                "ordering_operation": {
                  "using_temporary_table": true,
                  "using_filesort": true,
                  "nested_loop": [

            it's possible to get it with using_filesort=false, which means we're scanning an index that produces the desired ordering. ordering_operation always embeds the nested_loop node.

            psergei Sergei Petrunia added a comment - Came up with some reasonable JSON output for access to single table. Next things are: join buffering subqueries ORDER/GROUP BY Looking at MySQL: == Join buffering == join buffering is handled in a semi-tabular form, "table" has "using_join_buffer": "Block Nested Loop". join condition and table-read condition are printed together. == Subqueries == subqueries are listed inside select_list_subqueries, attached_subqueries, etc... attached_subqueries[] is attached to a "table{}". It is possible to get the same subquery listed twice. if subquery is a part of a WHERE, it is printed there in full. == ORDER/GROUP BY == it looks like this: "ordering_operation": { "using_temporary_table": true, "using_filesort": true, "nested_loop": [ it's possible to get it with using_filesort=false, which means we're scanning an index that produces the desired ordering. ordering_operation always embeds the nested_loop node.

            How MySQL-5.6 displays semi-join subquery plans:

            FirstMatch:

              "table" : { .... ,"first_match" : "other_table", ... }

            Duplicate Elimination:

              "duplicates_removal": {
                "using_temporary_table": true,
                "nested_loop": [ ... tables are here ] 
              }

            SJ-Materialization looks barely distinguishable from derived table:

                "table": {
                  "table_name": "<subquery2>",
                  "access_type": "ALL",
                  "materialized_from_subquery": {
                    "using_temporary_table": true,
                    "query_block": {
                       ... suquery goes here 
                    }
                  }

            LooseScan

            The table being scanned has:

             "loosescan": true,

            psergei Sergei Petrunia added a comment - How MySQL-5.6 displays semi-join subquery plans: FirstMatch: "table" : { .... ,"first_match" : "other_table", ... } Duplicate Elimination: "duplicates_removal": { "using_temporary_table": true, "nested_loop": [ ... tables are here ] } SJ-Materialization looks barely distinguishable from derived table: "table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "query_block": { ... suquery goes here } } LooseScan The table being scanned has: "loosescan": true,

            hi guys, i started reading this mdev (thank you implementing this feature with mariadb!), i don't know if it's possible, but optimizer explain why it choise one index instead of another? check MDEV-7239 (an idea about how to report this) and MDEV-7125 (a problem where optimizer select the 'wrong index' (slower) instead the 'right index' (faster)), i don't know what is the optimizer "decision tree" (or something like it), but is it possible to report this "index choise" to dba/developer? it's very interesting to understand how to better use optimizer
            i'm watching this mdev, if you need users to test, i'm here

            rspadim roberto spadim added a comment - hi guys, i started reading this mdev (thank you implementing this feature with mariadb!), i don't know if it's possible, but optimizer explain why it choise one index instead of another? check MDEV-7239 (an idea about how to report this) and MDEV-7125 (a problem where optimizer select the 'wrong index' (slower) instead the 'right index' (faster)), i don't know what is the optimizer "decision tree" (or something like it), but is it possible to report this "index choise" to dba/developer? it's very interesting to understand how to better use optimizer i'm watching this mdev, if you need users to test, i'm here
            psergei Sergei Petrunia added a comment - - edited

            I think this should be resolved outside of this MDEV. Within the scope of this MDEV, we limit ourselves to printing basically EXPLAIN or ANALYZE-stmt output in JSON form, with more details. Will reply on MDEV-7239.

            psergei Sergei Petrunia added a comment - - edited I think this should be resolved outside of this MDEV. Within the scope of this MDEV, we limit ourselves to printing basically EXPLAIN or ANALYZE-stmt output in JSON form, with more details. Will reply on MDEV-7239 .
            rspadim roberto spadim added a comment - - edited

            ok no problem i marked as "is blocked by mdev-6109" at mdev-7239

            rspadim roberto spadim added a comment - - edited ok no problem i marked as "is blocked by mdev-6109" at mdev-7239

            Pushed into 10.1 tree

            psergei Sergei Petrunia added a comment - Pushed into 10.1 tree

            People

              psergei Sergei Petrunia
              serg Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.