Sergei Petrunia
added a comment - Studying how to get execution time.
When the server is started with performance_schema=ON (and no other changes from the defaults), it does count time spent accessing the tables.
Proof: put a breakpoint in start_table_io_wait_v1 and watch this code execute:
if (pfs_table->m_io_timed)
{
timer_start= get_timer_raw_value_and_function(wait_timer, & state->m_timer);
state->m_timer_start= timer_start;
Then put a breakpoint in end_table_io_wait_v1 and watch this execute:
if (flags & STATE_FLAG_TIMED)
{
timer_end= state->m_timer();
wait_time= timer_end - state->m_timer_start;
Sergei Petrunia
added a comment - Looking at P_S code...
It has an interface (names starting with PSI_xxx) and an implementation (names starting with PFS_xxx). There are provisions for having two implementations, but only the _v1 implementation exists; _v2 is a dummy stub.
P_S data structures:
TABLE_SHARE <-> PFS_table_share
TABLE <-> PFS_table
THD <-> PFS_thread
The important part: end_table_io_wait_v1 collects its statistics in a PFS_table. A PFS_table is like a TABLE: it is not shared with other threads, a self-join will have two PFS_table objects. That is, the timing data it collects is really per-table.
Sergei Petrunia
added a comment - P_S being on doesn't imply that table reads will be timed. One can manually disable timing:
update setup_instruments set timed='no' where name='wait/io/table/sql/handler';
or:
insert into setup_objects values ('TABLE','test','ten','YES','NO');
Sergei Petrunia
added a comment (edited) - PFS_table has:
PFS_table_stat m_table_stat;
PFS_table_stat has:
/**
Statistics, per index.
Each index stat is in [0, MAX_INDEXES-1],
stats when using no index are in [MAX_INDEXES].
*/
PFS_table_io_stat m_index_stat[MAX_INDEXES + 1];
and then:
struct PFS_table_io_stat
{
bool m_has_data;
/** FETCH statistics */
PFS_single_stat m_fetch;
/** INSERT statistics */
PFS_single_stat m_insert;
/** UPDATE statistics */
PFS_single_stat m_update;
/** DELETE statistics */
PFS_single_stat m_delete;
and PFS_single_stat is:
struct PFS_single_stat
{
/** Count of values. */
ulonglong m_count;
/** Sum of values. */
ulonglong m_sum;
We need this:
pfs_table->m_table_stat.m_index_stat[ 0 ... MAX_INDEXES].m_fetch.m_sum / pfs_table->... .m_count
(We need it for all indexes, so that things like index_merge and DS-MRR work. Hmm, index_merge actually uses different pfs_table objects.)
Sergei Petrunia
added a comment (edited) - Looking at what units are used in PFS_single_stat ...
select * from table_io_waits_summary_by_table where object_name='ten';
returns data in picoseconds.
The data is converted using a "normalizer":
int table_tiws_by_table::rnd_init(bool scan)
{
m_normalizer= time_normalizer::get(wait_timer);
return 0;
}
wait_timer is a global variable (it can be set through P_S.setup_timers):
enum_timer_name wait_timer= TIMER_NAME_CYCLE;
The normalizer is used here:
(gdb) wher
#0 PFS_stat_row::set (this=0x7fff90005090, normalizer=0x55555748eb70, stat=0x7ffff01549d0) at /home/psergey/dev-git/10.1/storage/perfschema/table_helper.h:223
#1 0x0000555555ecf42d in PFS_table_io_stat_row::set (this=0x7fff90005018, normalizer=0x55555748eb70, stat=0x7ffff01549c8) at /home/psergey/dev-git/10.1/storage/perfschema/table_helper.h:299
#2 0x0000555555ecfa31 in table_tiws_by_table::make_row (this=0x7fff90004e60, share=0x7ffff54389c0) at /home/psergey/dev-git/10.1/storage/perfschema/table_tiws_by_table.cc:171
#3 0x0000555555ecf866 in table_tiws_by_table::rnd_next (this=0x7fff90004e60) at /home/psergey/dev-git/10.1/storage/perfschema/table_tiws_by_table.cc:127
#4 0x0000555555ea18e2 in ha_perfschema::rnd_next (this=0x7fff90013a68, buf=0x7fff90013f88 "") at /home/psergey/dev-git/10.1/storage/perfschema/ha_perfschema.cc:359
like so:
m_count= stat->m_count;
...
m_sum= normalizer->wait_to_pico(stat->m_sum);
m_avg= normalizer->wait_to_pico(stat->m_sum / m_count);
The resulting data is in picoseconds.
Sergei Petrunia
added a comment - Tracking table read times will require start/stop points.
The choice of starting point is obvious: in JOIN::save_explain(), remember the current counter values. This is where the tracking starts.
As for the ending point, there are options:
- Do it somewhere in JOIN::cleanup(full=true). This allows saving the data from index_merge's extra handlers before they are removed.
- Do it when producing the ANALYZE output. This will work (tables are closed after ANALYZE is printed) and is less intrusive (the ANALYZE code is in one place).
- Install into PFS_table a callback which the table will call when it is closed. This is useful for index_merge/DS-MRR's extra handlers, and also compatible with log_slow_query. However, it is not compatible with the current location of the code that prints ANALYZE: tables are still open when the ANALYZE output is produced. Moving the ANALYZE-printing code later is difficult, because (1) other parts of the query might already be freed and (2) it goes against the general structure of the code.
Sergei Petrunia
added a comment (edited) - Hit an interesting property in P_S: http://bugs.mysql.com/bug.php?id=76100. It doesn't directly affect this task.
Sergei Petrunia
added a comment - Ok so we now can collect the time that is spent accessing the tables.
The next step is "execution time of various parts of the statement". We want to track the time spent in various subqueries, i.e. SELECTs. The P_S and profiling systems have calls like this:
void JOIN::exec_inner()
{
...
THD_STAGE_INFO(thd, stage_executing);
It's tempting to re-use those to collect the time spent in SELECTs... but we can't, because:
- stage_executing doesn't tell which SELECT is being run.
- there is no THD_STAGE_INFO() call at the end of JOIN::exec_inner.
- when a child SELECT changes the current stage, will anybody care to set it back?
- the stages are not timed by default:
select * from performance_schema.setup_instruments where name like 'stage%';
shows that stage tracking is not enabled by default.
Sergei Petrunia
added a comment (edited) - Examples of output with r_time_ms, the time spent accessing the table:
https://gist.github.com/spetrunia/ae497fb33e2b5ff2b2c6
https://gist.github.com/spetrunia/aa4c22b748eb74906950
ANALYZE in PostgreSQL:
https://gist.github.com/spetrunia/cab742456f9b0ebb74cf
PostgreSQL also supports a kind of ANALYZE FORMAT=JSON (their syntax is explain (analyze, format json) select):
https://gist.github.com/spetrunia/e0e5b706f4691aa81006