Details
- 
    Task 
- 
    Status: Stalled (View Workflow)
- 
    Major 
- 
    Resolution: Unresolved
Description
(Filing this based on repeated desire to have it that we had when analyzing customer cases)
SHOW PROFILE (https://mariadb.com/kb/en/show-profile/) shows CPU statistics.
The idea is to make ANALYZE FORMAT=JSON check if profiling is enabled.
If yes, print available profiling information in its output. It should include:
- CPU time, user and system (1),
- context switches, voluntary and not (2).
- Number of CPU instructions executed (3).
SHOW PROFILE doesn't seem to show #3, but the CPU/OS are collecting it and so it should be available somewhere? Need to check this.
Looking at SHOW PROFILE
I ran a basic full-join-of-small-tables query.
The query takes
- 23 seconds alone
- 82 seconds when the server is loaded.
Listing notable differences in outputs:
SHOW PROFILE;
|  | Status                 | Duration  | | 
|  +------------------------+-----------+ | 
|  | 
| -| Sending data           | 23.087177 | | 
| +| Sending data | 82.211776 | | 
show profile cpu;
|  +------------------------+-----------+-----------+------------+ | 
|  | Status                 | Duration  | CPU_user  | CPU_system | | 
|  +------------------------+-----------+-----------+------------+ | 
|  | 
| -| Sending data           | 23.087177 | 23.094124 |   0.000000 | | 
| +| Sending data | 82.211776 | 999.999999 | 0.020042 | | 
What is 999.99999 ? 
The extra workload was 20 more threads running the same query.  If one counts 21*20=400... 
show profile context switches;
|   +------------------------+-----------+-------------------+---------------------+ | 
|   | Status                 | Duration  | Context_voluntary | Context_involuntary | | 
|   +------------------------+-----------+-------------------+---------------------+ | 
|  | 
| - | Sending data           | 23.087177 |               256 |                  92 | | 
| + | Sending data | 82.211776 | 277 | 37519 | | 
High number of involuntary switches gives a clue. 
I'm not sure what is the cause of voluntary switches. 
What I do not see here is the number that shows that of 82 seconds the thread has been on-CPU for a much smaller time.
Reading the code
sql_profile.cc calls this
|   getrusage(RUSAGE_SELF, &rusage);
 | 
why doesn't it call RUSAGE_THREAD ?
trying to change:
| diff --git a/sql/sql_profile.cc b/sql/sql_profile.cc | 
| index 863f03698a4..9820889e4aa 100644 | 
| --- a/sql/sql_profile.cc | 
| +++ b/sql/sql_profile.cc | 
| @@ -249,7 +249,7 @@ void PROF_MEASUREMENT::collect() | 
|  { | 
|    time_usecs= my_interval_timer() / 1e3;  /* ns to us */ | 
|  #ifdef HAVE_GETRUSAGE | 
| -  getrusage(RUSAGE_SELF, &rusage); | 
| +  getrusage(RUSAGE_THREAD, &rusage); | 
|  #elif defined(_WIN32) | 
|    FILETIME ftDummy; | 
| // NOTE: Get{Process|Thread}Times has a granularity of the clock interval, | 
gives
SHOW PROFILE;
|  | Status                 | Duration  | | 
|  +------------------------+-----------+ | 
| -| Sending data           | 22.408736 | | 
| +| Sending data | 81.323032 | | 
show profile cpu;
|  +------------------------+-----------+-----------+------------+ | 
|  | Status                 | Duration  | CPU_user  | CPU_system | | 
|  +------------------------+-----------+-----------+------------+ | 
| -| Sending data           | 22.408736 | 22.405303 |   0.001506 | | 
| +| Sending data | 81.323032 | 65.096885 | 0.000000 | | 
show profile context switches;
|   +------------------------+-----------+-------------------+---------------------+ | 
|   | Status                 | Duration  | Context_voluntary | Context_involuntary | | 
|   +------------------------+-----------+-------------------+---------------------+ | 
| - | Sending data           | 22.408736 |                 0 |                  94 | | 
| + | Sending data | 81.323032 | 0 | 1417 | | 
Where to get CPU instructions counts
https://man7.org/linux/man-pages/man2/perf_event_open.2.html ? 
What is PERF_COUNT_HW_INSTRUCTIONS  ? Is it per-thread or not?