Details
-
Task
-
Status: Stalled (View Workflow)
-
Major
-
Resolution: Unresolved
Description
(Filing this based on repeated desire to have it that we had when analyzing customer cases)
SHOW PROFILE (https://mariadb.com/kb/en/show-profile/) shows CPU statistics.
The idea is to make ANALYZE FORMAT=JSON check if profiling is enabled.
If yes, print available profiling information in its output. It should include:
- CPU time, user and system (1),
- context switches, voluntary and not (2).
- Number of CPU instructions executed (3).
SHOW PROFILE doesn't seem to show #3, but the CPU/OS are collecting it and so it should be available somewhere? Need to check this.
Looking at SHOW PROFILE
I ran a basic full-join-of-small-tables query.
The query takes
- 23 seconds alone
- 82 seconds when the server is loaded.
Listing notable differences in outputs:
SHOW PROFILE;
| Status | Duration |
|
+------------------------+-----------+
|
|
-| Sending data | 23.087177 |
|
+| Sending data | 82.211776 | |
show profile cpu;
+------------------------+-----------+-----------+------------+
|
| Status | Duration | CPU_user | CPU_system |
|
+------------------------+-----------+-----------+------------+
|
|
-| Sending data | 23.087177 | 23.094124 | 0.000000 |
|
+| Sending data | 82.211776 | 999.999999 | 0.020042 | |
What is 999.99999 ?
The extra workload was 20 more threads running the same query. If one counts 21*20=400...
show profile context switches;
+------------------------+-----------+-------------------+---------------------+
|
| Status | Duration | Context_voluntary | Context_involuntary |
|
+------------------------+-----------+-------------------+---------------------+
|
|
- | Sending data | 23.087177 | 256 | 92 |
|
+ | Sending data | 82.211776 | 277 | 37519 | |
High number of involuntary switches gives a clue.
I'm not sure what is the cause of voluntary switches.
What I do not see here is the number that shows that of 82 seconds the thread has been on-CPU for a much smaller time.
Reading the code
sql_profile.cc calls this
getrusage(RUSAGE_SELF, &rusage);
|
why doesn't it call RUSAGE_THREAD ?
trying to change:
diff --git a/sql/sql_profile.cc b/sql/sql_profile.cc
|
index 863f03698a4..9820889e4aa 100644
|
--- a/sql/sql_profile.cc
|
+++ b/sql/sql_profile.cc
|
@@ -249,7 +249,7 @@ void PROF_MEASUREMENT::collect()
|
{
|
time_usecs= my_interval_timer() / 1e3; /* ns to us */
|
#ifdef HAVE_GETRUSAGE
|
- getrusage(RUSAGE_SELF, &rusage);
|
+ getrusage(RUSAGE_THREAD, &rusage);
|
#elif defined(_WIN32)
|
FILETIME ftDummy;
|
// NOTE: Get{Process|Thread}Times has a granularity of the clock interval, |
gives
SHOW PROFILE;
| Status | Duration |
|
+------------------------+-----------+
|
-| Sending data | 22.408736 |
|
+| Sending data | 81.323032 | |
show profile cpu;
+------------------------+-----------+-----------+------------+
|
| Status | Duration | CPU_user | CPU_system |
|
+------------------------+-----------+-----------+------------+
|
-| Sending data | 22.408736 | 22.405303 | 0.001506 |
|
+| Sending data | 81.323032 | 65.096885 | 0.000000 | |
show profile context switches;
+------------------------+-----------+-------------------+---------------------+
|
| Status | Duration | Context_voluntary | Context_involuntary |
|
+------------------------+-----------+-------------------+---------------------+
|
- | Sending data | 22.408736 | 0 | 94 |
|
+ | Sending data | 81.323032 | 0 | 1417 | |
Where to get CPU instructions counts
https://man7.org/linux/man-pages/man2/perf_event_open.2.html ?
What is PERF_COUNT_HW_INSTRUCTIONS ? Is it per-thread or not?