Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31577

Make ANALYZE FORMAT=JSON print innodb stats

Details

    Description

      MDEV-31558 adds InnoDB engine stats into the slow query log.

      It would be nice to include it into ANALYZE FORMAT=JSON output, too.

      User interface: Extra info is printed in ANALYZE FORMAT=JSON output.

      It is printed on per-table basis. table.r_engine_stats will is an object with member counters.

      In current patch, it looks like this:

              "table": {
                "table_name": "t1",
                ...
                "r_engine_stats": {
                  "pages_accessed": integer,
                  "pages_updated" : integer,
                  "pages_read_count" : integer,
                  "pages_read_time_ms" : double_val,
                  "old_rows_read" : integer,
                },
                ...
      

      Only non-zero members are printed.

      mysql-test/include/analyze-format.inc is adjusted to mask the entire contents of r_engine_stats.

      Attachments

        Issue Links

          Activity

            bb-10.11-mdev31577

            psergei Sergei Petrunia added a comment - bb-10.11-mdev31577

            Points about the patch:

            • "JSON_EXT" name. Is it ok, any other suggestions?
            • Current patch adds handler* pointers to the Explain data structures. This means the output of analyze format=json can only be printed as long as the tables are open. Are we really fine with this?
            • Test coverage for SHOW ANALYZE FORMAT=JSON_EXT.
            • ??
            psergei Sergei Petrunia added a comment - Points about the patch: "JSON_EXT" name. Is it ok, any other suggestions? Current patch adds handler* pointers to the Explain data structures. This means the output of analyze format=json can only be printed as long as the tables are open. Are we really fine with this? Test coverage for SHOW ANALYZE FORMAT=JSON_EXT. ??

            I agree that JSON_EXT is a wrong value and FORMAT is a wrong place.

            Why should it be conditional at all? It's JSON. In tabular format you cannot easily add new values and columns, it might break existing applications. But JSON doesn't have a fixed schema and is almost infinitely extendable. Applications simply ignore keys they aren't interested in.

            So I'd suggest to print new information in FORMAT=JSON and not make it conditional

            serg Sergei Golubchik added a comment - I agree that JSON_EXT is a wrong value and FORMAT is a wrong place. Why should it be conditional at all? It's JSON. In tabular format you cannot easily add new values and columns, it might break existing applications. But JSON doesn't have a fixed schema and is almost infinitely extendable. Applications simply ignore keys they aren't interested in. So I'd suggest to print new information in FORMAT=JSON and not make it conditional

            Note that PostreSQL has similar behavior: their EXPLAIN ANALYZE doesn't print number of pages fetched by default, one needs to use "EXPLAIN BUFFERS" syntax to get it.

            psergei Sergei Petrunia added a comment - Note that PostreSQL has similar behavior: their EXPLAIN ANALYZE doesn't print number of pages fetched by default, one needs to use "EXPLAIN BUFFERS" syntax to get it.
            psergei Sergei Petrunia added a comment - - edited

            I agree that using the FORMAT field to control what is printed doesnt' look like a good idea.

            But should engine stats always be printed or one needs some syntax to indicate they want them?

            Arguments for printing always:

            • Fewer options to remember.
            • We can ask somebody for ANALYZE FORMAT=JSON output and get everything.

            Arguments for printing only when requested:

            • The output is smaller without r_engine_stats. Most of the time, the fields in r_engine_stats are not needed (this is why current patch doesn't print fields with zero values).
            • The contents of r_engine_stats are non-deterministic (ie. depend on the current state of InnoDB, which may vary). Which means the mtr scripts must mask them.
              So far, masking worked as follows: we used include/analyze-format.inc which has

              --replace_regex /("(r_[a-z_]*_time(_in_progress)?_ms|r_buffer_size)": )[^, \n]*/\1"REPLACED"/
              

              One can't extend this approach to remove the contents of r_engine_stats:
              EDIT: Wrong, one can!

              "r_engine_stats": {  
                <potentially some stuff here. Spans multiple lines. > 
              }
              

              A workaround is to always print all members of r_engine_stats (regardless if they are zero or not) and overwrite their contents on a per-member basis.
              (Note: removing r_engine_stats with JSON_REMOVE function is hard:

              Wildcards or range in JSON path not allowed in argument 2 to function 'json_remove'
              

              )

            psergei Sergei Petrunia added a comment - - edited I agree that using the FORMAT field to control what is printed doesnt' look like a good idea. But should engine stats always be printed or one needs some syntax to indicate they want them? Arguments for printing always: Fewer options to remember. We can ask somebody for ANALYZE FORMAT=JSON output and get everything. Arguments for printing only when requested: The output is smaller without r_engine_stats . Most of the time, the fields in r_engine_stats are not needed (this is why current patch doesn't print fields with zero values). The contents of r_engine_stats are non-deterministic (ie. depend on the current state of InnoDB, which may vary). Which means the mtr scripts must mask them. So far, masking worked as follows: we used include/analyze-format.inc which has --replace_regex /("(r_[a-z_]*_time(_in_progress)?_ms|r_buffer_size)": )[^, \n]*/\1"REPLACED"/ One can't extend this approach to remove the contents of r_engine_stats: EDIT : Wrong, one can! "r_engine_stats": { <potentially some stuff here. Spans multiple lines. > } A workaround is to always print all members of r_engine_stats (regardless if they are zero or not) and overwrite their contents on a per-member basis. (Note: removing r_engine_stats with JSON_REMOVE function is hard: Wildcards or range in JSON path not allowed in argument 2 to function 'json_remove' )

            Takeaways from discussion with Monty:

            • Let's have ANALYZE FORMAT=JSON always print the new fields.
            • mtr needs to ignore the new fields. This can be done "on the mtr level".
            • If we will need Explain data structures to be able to print JSON output after the query tables were closed, we can add a step where we copy the needed data into Explain data structures before we close the tables.
            psergei Sergei Petrunia added a comment - Takeaways from discussion with Monty: Let's have ANALYZE FORMAT=JSON always print the new fields. mtr needs to ignore the new fields. This can be done "on the mtr level". If we will need Explain data structures to be able to print JSON output after the query tables were closed, we can add a step where we copy the needed data into Explain data structures before we close the tables.

            New patch:

            commit b45af5d582331711772210d4b7fd60aa423e7255 (HEAD -> bb-10.11-mdev31577-v2, origin/bb-10.11-mdev31577-v2)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Thu Jul 6 10:41:46 2023 +0300
             
                MDEV-31577: Make ANALYZE FORMAT=JSON print innodb stats
            

            psergei Sergei Petrunia added a comment - New patch: commit b45af5d582331711772210d4b7fd60aa423e7255 (HEAD -> bb-10.11-mdev31577-v2, origin/bb-10.11-mdev31577-v2) Author: Sergei Petrunia <sergey@mariadb.com> Date: Thu Jul 6 10:41:46 2023 +0300   MDEV-31577: Make ANALYZE FORMAT=JSON print innodb stats
            oleg.smirnov Oleg Smirnov added a comment -

            1. Are you sure "analyze_format_json_ext.test" is a good name for the test since it has been decided not to introduce new FORMAT specifier (JSON_EXT)? The name of the test seems to be confusing.

            2. The only info I have managed to see in the "r_engine_stats" section is "pages_accessed", and it's only available for SELECTs. For an INSERT the section is empty:

            analyze format=json insert into t1 select seq, seq, seq, seq from seq_1_to_10000;
            ANALYZE
            {
              "query_optimization": {
                "r_total_time_ms": 0.101752963
              },
              "query_block": {
                "select_id": 1,
                "r_loops": 1,
                "r_total_time_ms": 752.7264409,
                "nested_loop": [
                  {
                    "table": {
                      "table_name": "seq_1_to_10000",
                      "access_type": "index",
                      "key": "PRIMARY",
                      "key_length": "8",
                      "used_key_parts": ["seq"],
                      "r_loops": 1,
                      "rows": 10000,
                      "r_rows": 10000,
                      "r_table_time_ms": 4.345646607,
                      "r_other_time_ms": 748.376163,
                      "r_engine_stats": {},
                      "filtered": 100,
                      "r_filtered": 100,
                      "using_index": true
                    }
                  }
                ]
              }
            }
            

            and for an UPDATE it is missing:

            ANALYZE FORMAT=json update t1 set pk = pk-1 where pk < 120000;
            ANALYZE
            {
              "query_optimization": {
                "r_total_time_ms": 0.109662098
              },
              "query_block": {
                "select_id": 1,
                "r_total_time_ms": 644.3883589,
                "table": {
                  "update": 1,
                  "table_name": "t1",
                  "access_type": "ALL",
                  "rows": 10000,
                  "r_rows": 10000,
                  "r_filtered": 100,
                  "r_total_time_ms": 604.924738,
                  "attached_condition": "t1.pk < 120000"
                }
              }
            }
            

            The first question is why "pages_updated", "pages_read_count", "pages_read_time_ms", "old_rows_read" are not displayed, and the second is why there is no "r_engine_stats" for INSERTs and UPDATEs?

            oleg.smirnov Oleg Smirnov added a comment - 1. Are you sure "analyze_format_json_ext.test" is a good name for the test since it has been decided not to introduce new FORMAT specifier (JSON_EXT)? The name of the test seems to be confusing. 2. The only info I have managed to see in the "r_engine_stats" section is "pages_accessed", and it's only available for SELECTs. For an INSERT the section is empty: analyze format=json insert into t1 select seq, seq, seq, seq from seq_1_to_10000; ANALYZE { "query_optimization": { "r_total_time_ms": 0.101752963 }, "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 752.7264409, "nested_loop": [ { "table": { "table_name": "seq_1_to_10000", "access_type": "index", "key": "PRIMARY", "key_length": "8", "used_key_parts": ["seq"], "r_loops": 1, "rows": 10000, "r_rows": 10000, "r_table_time_ms": 4.345646607, "r_other_time_ms": 748.376163, "r_engine_stats": {}, "filtered": 100, "r_filtered": 100, "using_index": true } } ] } } and for an UPDATE it is missing: ANALYZE FORMAT=json update t1 set pk = pk-1 where pk < 120000; ANALYZE { "query_optimization": { "r_total_time_ms": 0.109662098 }, "query_block": { "select_id": 1, "r_total_time_ms": 644.3883589, "table": { "update": 1, "table_name": "t1", "access_type": "ALL", "rows": 10000, "r_rows": 10000, "r_filtered": 100, "r_total_time_ms": 604.924738, "attached_condition": "t1.pk < 120000" } } } The first question is why "pages_updated", "pages_read_count", "pages_read_time_ms", "old_rows_read" are not displayed, and the second is why there is no "r_engine_stats" for INSERTs and UPDATEs?

            oleg.smirnov, thanks for the input.

            Re INSERT ... SELECT: currently, EXPLAIN (and ANALYZE) output only describes the SELECT part. In the statement you've provided, the SELECT part uses sequence storage engine so it doesn't show r_engine_stats.

            Should EXPLAIN or ANALYZE output show something about the table we're inserting to? I don't see what EXPLAIN would show. ANALYZE FORMAT=JSON could show the time spent writing to the table, and now with MDEV-31558 it could show r_engine_statistics. I think we need to do this but it is outside of the scope of this MDEV.

            As for UPDATEs/DELETEs, indeed it is an omission in the patch. The engine statistics are counted but not displayed.

            psergei Sergei Petrunia added a comment - oleg.smirnov , thanks for the input. Re INSERT ... SELECT : currently, EXPLAIN (and ANALYZE) output only describes the SELECT part. In the statement you've provided, the SELECT part uses sequence storage engine so it doesn't show r_engine_stats. Should EXPLAIN or ANALYZE output show something about the table we're inserting to? I don't see what EXPLAIN would show. ANALYZE FORMAT=JSON could show the time spent writing to the table, and now with MDEV-31558 it could show r_engine_statistics. I think we need to do this but it is outside of the scope of this MDEV. As for UPDATEs/DELETEs, indeed it is an omission in the patch. The engine statistics are counted but not displayed.

            ralf.gebhardt,

            Is this change only for InnoDB or for any engine?

            The patch for this MDEV will work with any Storage Engine that provides the statistics data.

            However, the patch for MDEV-31558 has added production of statistics data only into

            • InnoDB ,
            • Partitioning Storage Engine (so that statistics from all partitions are summed together).

            So, currently one can get the stats from [partitioned] innodb. Other storage engines can add support for this.

            psergei Sergei Petrunia added a comment - ralf.gebhardt , Is this change only for InnoDB or for any engine? The patch for this MDEV will work with any Storage Engine that provides the statistics data. However, the patch for MDEV-31558 has added production of statistics data only into InnoDB , Partitioning Storage Engine (so that statistics from all partitions are summed together). So, currently one can get the stats from [partitioned] innodb. Other storage engines can add support for this.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.