[MDEV-31577] Make ANALYZE FORMAT=JSON print innodb stats Created: 2023-06-29  Updated: 2023-09-29  Resolved: 2023-07-22

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Fix Version/s: 10.8.8, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: analyze-stmt

Issue Links:
Problem/Incident
causes MDEV-31764 ASAN use-after-poison in trace_engine... Closed
Relates
relates to MDEV-32286 ANALYZE displays a huge number of Inn... Confirmed
relates to MDEV-31558 Add InnoDB engine information to the ... Closed

 Description   

MDEV-31558 adds InnoDB engine stats into the slow query log.

It would be nice to include it into ANALYZE FORMAT=JSON output, too.

User interface: Extra info is printed in ANALYZE FORMAT=JSON output.

It is printed on per-table basis. table.r_engine_stats will is an object with member counters.

In current patch, it looks like this:

        "table": {
          "table_name": "t1",
          ...
          "r_engine_stats": {
            "pages_accessed": integer,
            "pages_updated" : integer,
            "pages_read_count" : integer,
            "pages_read_time_ms" : double_val,
            "old_rows_read" : integer,
          },
          ...

Only non-zero members are printed.

mysql-test/include/analyze-format.inc is adjusted to mask the entire contents of r_engine_stats.



 Comments   
Comment by Sergei Petrunia [ 2023-06-29 ]

bb-10.11-mdev31577

Comment by Sergei Petrunia [ 2023-06-29 ]

Points about the patch:

  • "JSON_EXT" name. Is it ok, any other suggestions?
  • Current patch adds handler* pointers to the Explain data structures. This means the output of analyze format=json can only be printed as long as the tables are open. Are we really fine with this?
  • Test coverage for SHOW ANALYZE FORMAT=JSON_EXT.
  • ??
Comment by Sergei Golubchik [ 2023-06-30 ]

I agree that JSON_EXT is a wrong value and FORMAT is a wrong place.

Why should it be conditional at all? It's JSON. In tabular format you cannot easily add new values and columns, it might break existing applications. But JSON doesn't have a fixed schema and is almost infinitely extendable. Applications simply ignore keys they aren't interested in.

So I'd suggest to print new information in FORMAT=JSON and not make it conditional

Comment by Sergei Petrunia [ 2023-07-03 ]

Note that PostreSQL has similar behavior: their EXPLAIN ANALYZE doesn't print number of pages fetched by default, one needs to use "EXPLAIN BUFFERS" syntax to get it.

Comment by Sergei Petrunia [ 2023-07-03 ]

I agree that using the FORMAT field to control what is printed doesnt' look like a good idea.

But should engine stats always be printed or one needs some syntax to indicate they want them?

Arguments for printing always:

  • Fewer options to remember.
  • We can ask somebody for ANALYZE FORMAT=JSON output and get everything.

Arguments for printing only when requested:

  • The output is smaller without r_engine_stats. Most of the time, the fields in r_engine_stats are not needed (this is why current patch doesn't print fields with zero values).
  • The contents of r_engine_stats are non-deterministic (ie. depend on the current state of InnoDB, which may vary). Which means the mtr scripts must mask them.
    So far, masking worked as follows: we used include/analyze-format.inc which has

    --replace_regex /("(r_[a-z_]*_time(_in_progress)?_ms|r_buffer_size)": )[^, \n]*/\1"REPLACED"/
    

    One can't extend this approach to remove the contents of r_engine_stats:
    EDIT: Wrong, one can!

    "r_engine_stats": {  
      <potentially some stuff here. Spans multiple lines. > 
    }
    

    A workaround is to always print all members of r_engine_stats (regardless if they are zero or not) and overwrite their contents on a per-member basis.
    (Note: removing r_engine_stats with JSON_REMOVE function is hard:

    Wildcards or range in JSON path not allowed in argument 2 to function 'json_remove'
    

    )

Comment by Sergei Petrunia [ 2023-07-04 ]

Takeaways from discussion with Monty:

  • Let's have ANALYZE FORMAT=JSON always print the new fields.
  • mtr needs to ignore the new fields. This can be done "on the mtr level".
  • If we will need Explain data structures to be able to print JSON output after the query tables were closed, we can add a step where we copy the needed data into Explain data structures before we close the tables.
Comment by Sergei Petrunia [ 2023-07-06 ]

New patch:

commit b45af5d582331711772210d4b7fd60aa423e7255 (HEAD -> bb-10.11-mdev31577-v2, origin/bb-10.11-mdev31577-v2)
Author: Sergei Petrunia <sergey@mariadb.com>
Date:   Thu Jul 6 10:41:46 2023 +0300
 
    MDEV-31577: Make ANALYZE FORMAT=JSON print innodb stats

Comment by Oleg Smirnov [ 2023-07-08 ]

1. Are you sure "analyze_format_json_ext.test" is a good name for the test since it has been decided not to introduce new FORMAT specifier (JSON_EXT)? The name of the test seems to be confusing.

2. The only info I have managed to see in the "r_engine_stats" section is "pages_accessed", and it's only available for SELECTs. For an INSERT the section is empty:

analyze format=json insert into t1 select seq, seq, seq, seq from seq_1_to_10000;
ANALYZE
{
  "query_optimization": {
    "r_total_time_ms": 0.101752963
  },
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 752.7264409,
    "nested_loop": [
      {
        "table": {
          "table_name": "seq_1_to_10000",
          "access_type": "index",
          "key": "PRIMARY",
          "key_length": "8",
          "used_key_parts": ["seq"],
          "r_loops": 1,
          "rows": 10000,
          "r_rows": 10000,
          "r_table_time_ms": 4.345646607,
          "r_other_time_ms": 748.376163,
          "r_engine_stats": {},
          "filtered": 100,
          "r_filtered": 100,
          "using_index": true
        }
      }
    ]
  }
}

and for an UPDATE it is missing:

ANALYZE FORMAT=json update t1 set pk = pk-1 where pk < 120000;
ANALYZE
{
  "query_optimization": {
    "r_total_time_ms": 0.109662098
  },
  "query_block": {
    "select_id": 1,
    "r_total_time_ms": 644.3883589,
    "table": {
      "update": 1,
      "table_name": "t1",
      "access_type": "ALL",
      "rows": 10000,
      "r_rows": 10000,
      "r_filtered": 100,
      "r_total_time_ms": 604.924738,
      "attached_condition": "t1.pk < 120000"
    }
  }
}

The first question is why "pages_updated", "pages_read_count", "pages_read_time_ms", "old_rows_read" are not displayed, and the second is why there is no "r_engine_stats" for INSERTs and UPDATEs?

Comment by Sergei Petrunia [ 2023-07-20 ]

oleg.smirnov, thanks for the input.

Re INSERT ... SELECT: currently, EXPLAIN (and ANALYZE) output only describes the SELECT part. In the statement you've provided, the SELECT part uses sequence storage engine so it doesn't show r_engine_stats.

Should EXPLAIN or ANALYZE output show something about the table we're inserting to? I don't see what EXPLAIN would show. ANALYZE FORMAT=JSON could show the time spent writing to the table, and now with MDEV-31558 it could show r_engine_statistics. I think we need to do this but it is outside of the scope of this MDEV.

As for UPDATEs/DELETEs, indeed it is an omission in the patch. The engine statistics are counted but not displayed.

Comment by Sergei Petrunia [ 2023-08-22 ]

ralf.gebhardt,

Is this change only for InnoDB or for any engine?

The patch for this MDEV will work with any Storage Engine that provides the statistics data.

However, the patch for MDEV-31558 has added production of statistics data only into

  • InnoDB ,
  • Partitioning Storage Engine (so that statistics from all partitions are summed together).

So, currently one can get the stats from [partitioned] innodb. Other storage engines can add support for this.

Generated at Thu Feb 08 10:24:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.