Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34899

Make ANALYZE FORMAT=JSON print CPU statistics

    XMLWordPrintable

Details

    Description

      (Filing this based on repeated desire to have it that we had when analyzing customer cases)

      SHOW PROFILE (https://mariadb.com/kb/en/show-profile/) shows CPU statistics.

      The idea is to make ANALYZE FORMAT=JSON check if profiling is enabled.
      If yes, print available profiling information in its output. It should include:

      • CPU time, user and system (1),
      • context switches, voluntary and not (2).
      • Number of CPU instructions executed (3).

      SHOW PROFILE doesn't seem to show #3, but the CPU/OS are collecting it and so it should be available somewhere? Need to check this.

      Looking at SHOW PROFILE

      I ran a basic full-join-of-small-tables query.

      The query takes

      • 23 seconds alone
      • 82 seconds when the server is loaded.

      Listing notable differences in outputs:

      SHOW PROFILE;

       | Status                 | Duration  |
       +------------------------+-----------+
       
      -| Sending data           | 23.087177 |
      +| Sending data           | 82.211776 |
      

      show profile cpu;

       +------------------------+-----------+-----------+------------+
       | Status                 | Duration  | CPU_user  | CPU_system |
       +------------------------+-----------+-----------+------------+
       
      -| Sending data           | 23.087177 | 23.094124 |   0.000000 |
      +| Sending data           | 82.211776 | 999.999999 |   0.020042 |
      

      What is 999.99999 ?
      The extra workload was 20 more threads running the same query. If one counts 21*20=400...

      show profile context switches;

        +------------------------+-----------+-------------------+---------------------+
        | Status                 | Duration  | Context_voluntary | Context_involuntary |
        +------------------------+-----------+-------------------+---------------------+
       
      - | Sending data           | 23.087177 |               256 |                  92 |
      + | Sending data           | 82.211776 |               277 |               37519 |
      

      High number of involuntary switches gives a clue.
      I'm not sure what is the cause of voluntary switches.

      What I do not see here is the number that shows that of 82 seconds the thread has been on-CPU for a much smaller time.

      Reading the code

      sql_profile.cc calls this

        getrusage(RUSAGE_SELF, &rusage);
      

      why doesn't it call RUSAGE_THREAD ?

      trying to change:

      diff --git a/sql/sql_profile.cc b/sql/sql_profile.cc
      index 863f03698a4..9820889e4aa 100644
      --- a/sql/sql_profile.cc
      +++ b/sql/sql_profile.cc
      @@ -249,7 +249,7 @@ void PROF_MEASUREMENT::collect()
       {
         time_usecs= my_interval_timer() / 1e3;  /* ns to us */
       #ifdef HAVE_GETRUSAGE
      -  getrusage(RUSAGE_SELF, &rusage);
      +  getrusage(RUSAGE_THREAD, &rusage);
       #elif defined(_WIN32)
         FILETIME ftDummy;
         // NOTE: Get{Process|Thread}Times has a granularity of the clock interval,
      

      gives
      SHOW PROFILE;

       | Status                 | Duration  |
       +------------------------+-----------+
      -| Sending data           | 22.408736 |
      +| Sending data           | 81.323032 |
      

      show profile cpu;

       +------------------------+-----------+-----------+------------+
       | Status                 | Duration  | CPU_user  | CPU_system |
       +------------------------+-----------+-----------+------------+
      -| Sending data           | 22.408736 | 22.405303 |   0.001506 |
      +| Sending data           | 81.323032 | 65.096885 |   0.000000 |
      

      show profile context switches;

        +------------------------+-----------+-------------------+---------------------+
        | Status                 | Duration  | Context_voluntary | Context_involuntary |
        +------------------------+-----------+-------------------+---------------------+
      - | Sending data           | 22.408736 |                 0 |                  94 |
      + | Sending data           | 81.323032 |                 0 |                1417 |
      

      Where to get CPU instructions counts

      https://man7.org/linux/man-pages/man2/perf_event_open.2.html ?
      What is PERF_COUNT_HW_INSTRUCTIONS ? Is it per-thread or not?

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.