MariaDB Server
MDEV-7648

Extra data in ANALYZE FORMAT=JSON $stmt

Details

    Description

      ANALYZE FORMAT=JSON $stmt can (should) print more data about execution.

      Requested things

      • execution time of various parts of the statement
      • Real IO done by accesses to different tables
      • etc etc

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -

            Studying how to get execution time.

            When the server is started with performance_schema=ON (and no other changes from the default), it actually does count time spent accessing the tables.

            Proof: put a breakpoint in start_table_io_wait_v1, see this to execute:

                if (pfs_table->m_io_timed)
                {
                  timer_start= get_timer_raw_value_and_function(wait_timer, & state->m_timer);
                  state->m_timer_start= timer_start;

            Put a breakpoint in end_table_io_wait_v1, see this:

              if (flags & STATE_FLAG_TIMED)
              {
                timer_end= state->m_timer();
                wait_time= timer_end - state->m_timer_start;
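
            The two breakpoints above bracket a table access: the start hook records the raw timer value, and the end hook reads the timer again and accumulates the delta. The same pattern, reduced to a self-contained sketch (the struct and function names here are illustrative, and std::chrono stands in for the P_S timer):

```cpp
#include <cassert>
#include <chrono>

// Illustrative stand-in for a PFS_single_stat-style counter
// (field names mirror P_S; the rest is a sketch, not the real code).
struct single_stat {
  unsigned long long m_count = 0;
  unsigned long long m_sum = 0;   // accumulated wait time, in timer ticks
};

// Like start_table_io_wait_v1: read the raw timer and remember the value.
static unsigned long long io_wait_start() {
  return (unsigned long long)
      std::chrono::steady_clock::now().time_since_epoch().count();
}

// Like end_table_io_wait_v1: read the timer again, accumulate the delta.
static void io_wait_end(single_stat *stat, unsigned long long timer_start) {
  unsigned long long timer_end = (unsigned long long)
      std::chrono::steady_clock::now().time_since_epoch().count();
  stat->m_sum += timer_end - timer_start;
  stat->m_count++;
}
```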

            psergei Sergei Petrunia added a comment -

            Looking at P_S code...

            It has The Interface (names starting with PSI_xxx) and The Implementation (names starting with PFS_xxx). There are provisions for having two implementations, but only the _v1 implementation exists; the _v2 implementation is a dummy stub.

            P_S data structures:

            TABLE_SHARE <-> PFS_table_share
            TABLE  <-> PFS_table
            THD <-> PFS_thread

            The important part: end_table_io_wait_v1 collects its statistics in a PFS_table. A PFS_table is like a TABLE: it is not shared with other threads; a self-join will have two PFS_table objects. That is, the timing data it collects is really per-table.

            psergei Sergei Petrunia added a comment -

            P_S being on doesn't imply that table reads will be timed. One can manually disable timing:

            update setup_instruments set timed='no' where name='wait/io/table/sql/handler';

            or:

            insert into setup_objects values ('TABLE','test','ten','YES','NO');

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia added a comment - - edited

            PFS_table has:

            PFS_table_stat m_table_stat;

            PFS_table_stat has:

              /**
                Statistics, per index.
                Each index stat is in [0, MAX_INDEXES-1],
                stats when using no index are in [MAX_INDEXES].
              */
              PFS_table_io_stat m_index_stat[MAX_INDEXES + 1];

            then

            struct PFS_table_io_stat
            {
              bool m_has_data;
              /** FETCH statistics */
              PFS_single_stat m_fetch;
              /** INSERT statistics */
              PFS_single_stat m_insert;
              /** UPDATE statistics */
              PFS_single_stat m_update;
              /** DELETE statistics */
              PFS_single_stat m_delete;

            struct PFS_single_stat
            {
              /** Count of values. */
              ulonglong m_count;
              /** Sum of values. */
              ulonglong m_sum;

            We need this:

            pfs_table->m_table_stat.m_index_stat[ 0 ... MAX_INDEXES].m_fetch.m_sum  /  pfs_table->... .m_count

            (We need it for all indexes so that things like index_merge and DS-MRR work; hmm, index_merge actually uses different pfs_table objects.)
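
            Summing m_fetch over all slots of m_index_stat (including the no-index slot at MAX_INDEXES) gives the total per-table fetch time. A sketch of that aggregation, using cut-down copies of the structs quoted above (the MAX_INDEXES value here is arbitrary):

```cpp
#include <cassert>

static const int MAX_INDEXES = 64;   // arbitrary value for this sketch

// Cut-down copies of the structs quoted above.
struct PFS_single_stat { unsigned long long m_count = 0, m_sum = 0; };
struct PFS_table_io_stat {
  bool m_has_data = false;
  PFS_single_stat m_fetch, m_insert, m_update, m_delete;
};

// Total fetch time/count over all indexes plus the no-index slot
// (m_index_stat[MAX_INDEXES]).
static void total_fetch(const PFS_table_io_stat stats[MAX_INDEXES + 1],
                        unsigned long long *sum, unsigned long long *count) {
  *sum = 0;
  *count = 0;
  for (int i = 0; i <= MAX_INDEXES; i++) {
    if (!stats[i].m_has_data)
      continue;
    *sum += stats[i].m_fetch.m_sum;
    *count += stats[i].m_fetch.m_count;
  }
}
```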

            psergei Sergei Petrunia added a comment - - edited

            Looking at what units are used in PFS_single_stat ...

            select * from table_io_waits_summary_by_table where object_name='ten';

            returns data in picoseconds.

            The data is converted using a "normalizer":

            int table_tiws_by_table::rnd_init(bool scan)
            {
              m_normalizer= time_normalizer::get(wait_timer);
              return 0;
            }

            wait_timer is a global variable (can be set through P_S.setup_timers)

            enum_timer_name wait_timer= TIMER_NAME_CYCLE;

            the normalizer is used here

            (gdb) wher
              #0  PFS_stat_row::set (this=0x7fff90005090, normalizer=0x55555748eb70, stat=0x7ffff01549d0) at /home/psergey/dev-git/10.1/storage/perfschema/table_helper.h:223
              #1  0x0000555555ecf42d in PFS_table_io_stat_row::set (this=0x7fff90005018, normalizer=0x55555748eb70, stat=0x7ffff01549c8) at /home/psergey/dev-git/10.1/storage/perfschema/table_helper.h:299
              #2  0x0000555555ecfa31 in table_tiws_by_table::make_row (this=0x7fff90004e60, share=0x7ffff54389c0) at /home/psergey/dev-git/10.1/storage/perfschema/table_tiws_by_table.cc:171
              #3  0x0000555555ecf866 in table_tiws_by_table::rnd_next (this=0x7fff90004e60) at /home/psergey/dev-git/10.1/storage/perfschema/table_tiws_by_table.cc:127
              #4  0x0000555555ea18e2 in ha_perfschema::rnd_next (this=0x7fff90013a68, buf=0x7fff90013f88 "") at /home/psergey/dev-git/10.1/storage/perfschema/ha_perfschema.cc:359

            like so:

                  m_count= stat->m_count;
                 ...
                  m_sum= normalizer->wait_to_pico(stat->m_sum);
                  m_avg= normalizer->wait_to_pico(stat->m_sum / m_count);

            The resulting data is in picoseconds.
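
            The conversion the normalizer performs can be sketched as follows; ticks_per_sec is an assumed parameter standing in for the timer frequency that the real time_normalizer looks up:

```cpp
#include <cassert>

// Sketch of what the normalizer does: the timer counts ticks at some
// frequency (ticks per second); a tick count converts to picoseconds as
//   pico = ticks * 10^12 / ticks_per_sec.
// (The real time_normalizer precomputes a per-timer factor; this naive
// version can overflow for very large tick counts.)
static unsigned long long wait_to_pico(unsigned long long ticks,
                                       unsigned long long ticks_per_sec) {
  return ticks * 1000000000000ULL / ticks_per_sec;
}
```

            For example, 3 ticks of a 3 GHz cycle counter is one nanosecond, i.e. 1000 picoseconds.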


            Tracking table read times will require start/stop points.

            The choice of starting point is obvious:

            • In JOIN::save_explain(), remember the current counter values. This is where the tracking starts

            As for ending, there are options:

            • Do it somewhere in JOIN::cleanup(full=true). This allows saving the data from index_merge's extra handlers before they are removed.
            • Do it when producing the ANALYZE output. This will work (tables are closed after ANALYZE is printed), and it is less intrusive (the ANALYZE code is in one place).
            • Install into PFS_table a callback, which the table will call when it is closed. This is useful for index_merge/DS-MRR's extra handlers, and also compatible with log_slow_query. However, it is not compatible with the current location of the code that prints ANALYZE: tables are still open when the ANALYZE output is produced. Moving the code that prints ANALYZE later is difficult, because 1) other parts of the query might be freed, and 2) it goes against the general structure of the code.
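
            Whichever end point is chosen, the mechanics are the same snapshot-and-diff over the cumulative counters; a hypothetical sketch (names are illustrative):

```cpp
#include <cassert>

// The cumulative counter as P_S keeps it for a table (sketch).
struct single_stat { unsigned long long m_count = 0, m_sum = 0; };

// Snapshot of the counters, taken at the start point
// (e.g. in JOIN::save_explain()).
struct stat_snapshot { unsigned long long count, sum; };

static stat_snapshot take_snapshot(const single_stat &s) {
  stat_snapshot snap = { s.m_count, s.m_sum };
  return snap;
}

// At whichever end point is chosen, the per-statement figures are the
// difference between the current counters and the snapshot.
static single_stat stat_diff(const single_stat &now,
                             const stat_snapshot &start) {
  single_stat d;
  d.m_count = now.m_count - start.count;
  d.m_sum = now.m_sum - start.sum;
  return d;
}
```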
            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia added a comment - - edited

            Hit an interesting property in P_S: http://bugs.mysql.com/bug.php?id=76100. It doesn't directly affect this task.


            OK, so we can now collect the time that is spent accessing the tables.
            The next step is "execution time of various parts of the statement". We want to track the time spent in various subqueries, i.e. SELECTs. The P_S and profiling systems have calls like this:

            void JOIN::exec_inner()
            {
              ...
              THD_STAGE_INFO(thd, stage_executing);

            It's tempting to re-use those to collect the time spent in SELECTs... but we can't do that, because

            • stage_executing doesn't tell which select is being run.
            • there is no THD_STAGE_INFO() call at the end of JOIN::exec_inner
              • when a child select changes the current stage, nobody will care to set it back?
            • the stages are not timed by default.

            select * from performance_schema.setup_instruments where name like 'stage%';

            shows that stage tracking is not enabled by default.

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia added a comment - - edited

            Examples of output with r_time_ms, the time spent accessing the table:
            https://gist.github.com/spetrunia/ae497fb33e2b5ff2b2c6
            https://gist.github.com/spetrunia/aa4c22b748eb74906950

            ANALYZE in PostgreSQL:
            https://gist.github.com/spetrunia/cab742456f9b0ebb74cf

            PostgreSQL also supports a kind of ANALYZE FORMAT=JSON (their syntax is explain (analyze, format json) select):
            https://gist.github.com/spetrunia/e0e5b706f4691aa81006


            Switched from using P_S to using our own timing code.
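
            Such self-contained timing code can be sketched roughly as below; the class and method names are hypothetical, not the actual tracker added to the server:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical minimal version of a timing tracker that is independent
// of performance_schema (names are illustrative, not the server's code).
class Time_tracker {
  unsigned long long time_ns = 0;   // accumulated time (ns on typical platforms)
  unsigned long long count = 0;     // number of timed intervals
  unsigned long long start_ns = 0;
public:
  void start_tracking() {
    start_ns = (unsigned long long)
        std::chrono::steady_clock::now().time_since_epoch().count();
  }
  void stop_tracking() {
    unsigned long long now = (unsigned long long)
        std::chrono::steady_clock::now().time_since_epoch().count();
    time_ns += now - start_ns;
    count++;
  }
  double get_time_ms() const { return (double)time_ns / 1e6; }
  unsigned long long get_count() const { return count; }
};
```

            The tracker is bracketed around each table read, which is what makes an r_time_ms-style figure possible without P_S being enabled.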

            psergei Sergei Petrunia added a comment -
            Affects Version/s 10.1 [ 16100 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.1 [ 16100 ]
            psergei Sergei Petrunia made changes -
            Labels analyze-stmt
            psergei Sergei Petrunia made changes -
            Component/s Optimizer [ 10200 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 59848 ] MariaDB v3 [ 63640 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.1 [ 16100 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.1 [ 16100 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.3 [ 22126 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 63640 ] MariaDB v4 [ 130321 ]

            People

              Assignee: Sergei Petrunia
              Reporter: Sergei Petrunia
              Votes: 0
              Watchers: 3

