Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34125

ANALYZE FORMAT=JSON: r_engine_stats.pages_read_time_ms has wrong scale

Details

    Description

      Discovered this in MDEV-33446,
      https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972

                "r_table_time_ms": 1 425 235.95,
                "r_other_time_ms": 397318.6493,
                "r_engine_stats": {
                  "pages_accessed": 1850995,
                  "pages_read_count": 35272,
                  "pages_read_time_ms": 14 657 261.27
                },
      

      ^ an ANALYZE output where the value of r_engine_stats.pages_read_time_ms doesn't make much sense: it is larger than the query's total r_total_time_ms.

      The cause

      Code from MDEV-31577:
      trace_engine_stats():

            writer->add_member("pages_read_time_ms").
              add_double(hs->pages_read_time / 1000.0);
      

      it's divided by 1000, because according the definition it's microseconds:

      class ha_handler_stats
      {
      ...
        ulonglong pages_read_time;             /* Time reading pages, in microsec. */
      

      On the other hand, it is printed into the slow query log like this: MYSQL_QUERY_LOG::write():

            double tracker_frequency= timer_tracker_frequency();
            sprintf(query_time_buff, "%.4f", 
                    1000.0 * ulonglong2double(stats->pages_read_time)/
                    tracker_frequency);
      

      Attachments

        Issue Links

          Activity

            Gosselin Dave Gosselin added a comment -

            OK to push psergei

            Gosselin Dave Gosselin added a comment - OK to push psergei

            Note to self: the patch shows valrind/asan warnings:

            main.analyze_engine_stats2               [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2024-05-24 15:44:00
            line
            ==11329== Thread 10:
            ==11329== Conditional jump or move depends on uninitialised value(s)
            ==11329==    at 0xFA06FE: ha_handler_stats::has_stats() (ha_handler_stats.h:58)
            ==11329==    by 0xF86200: MYSQL_QUERY_LOG::write(THD*, long, char const*, unsigned long, unsigned long long, unsigned long long, bool, char const*, unsigned long) (log.cc:3273)
            ==11329==    by 0xF7F944: Log_to_file_event_handler::log_slow(THD*, my_hrtime_t, char const*, unsigned long, unsigned long long, unsigned long long, bool, char const*, unsigned long) (log.cc:1076)
            ==11329==    by 0xF80615: LOGGER::slow_log_print(THD*, char const*, unsigned long, unsigned long long) (log.cc:1350)
            ==11329==    by 0xF91F6B: slow_log_print(THD*, char const*, unsigned int, unsigned long long) (log.cc:6950)
            ==11329==    by 0xA84E94: log_slow_statement(THD*) (sql_parse.cc:2585)
            ==11329==    by 0xA848B0: dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) (sql_parse.cc:2453)
            ==11329==    by 0xA813FB: do_command(THD*, bool) (sql_parse.cc:1409)
            ==11329==    by 0xC5160A: do_handle_one_connection(CONNECT*, bool) (sql_connect.cc:1415)
            ==11329==    by 0xC5136C: handle_one_connection (sql_connect.cc:1317)
            ==11329==    by 0x119DDBC: pfs_spawn_thread (pfs.cc:2201)
            ==11329==    by 0x66CC6DA: start_thread (pthread_create.c:463)
            ==11329==    by 0x754861E: clone (clone.S:95)
            

            But these are not related to this MDEV, I can observe them with just 10.6 + this MDEV's testcase.

            psergei Sergei Petrunia added a comment - Note to self: the patch shows valrind/asan warnings: main.analyze_engine_stats2 [ fail ] Found warnings/errors in server log file! Test ended at 2024-05-24 15:44:00 line ==11329== Thread 10: ==11329== Conditional jump or move depends on uninitialised value(s) ==11329== at 0xFA06FE: ha_handler_stats::has_stats() (ha_handler_stats.h:58) ==11329== by 0xF86200: MYSQL_QUERY_LOG::write(THD*, long, char const*, unsigned long, unsigned long long, unsigned long long, bool, char const*, unsigned long) (log.cc:3273) ==11329== by 0xF7F944: Log_to_file_event_handler::log_slow(THD*, my_hrtime_t, char const*, unsigned long, unsigned long long, unsigned long long, bool, char const*, unsigned long) (log.cc:1076) ==11329== by 0xF80615: LOGGER::slow_log_print(THD*, char const*, unsigned long, unsigned long long) (log.cc:1350) ==11329== by 0xF91F6B: slow_log_print(THD*, char const*, unsigned int, unsigned long long) (log.cc:6950) ==11329== by 0xA84E94: log_slow_statement(THD*) (sql_parse.cc:2585) ==11329== by 0xA848B0: dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) (sql_parse.cc:2453) ==11329== by 0xA813FB: do_command(THD*, bool) (sql_parse.cc:1409) ==11329== by 0xC5160A: do_handle_one_connection(CONNECT*, bool) (sql_connect.cc:1415) ==11329== by 0xC5136C: handle_one_connection (sql_connect.cc:1317) ==11329== by 0x119DDBC: pfs_spawn_thread (pfs.cc:2201) ==11329== by 0x66CC6DA: start_thread (pthread_create.c:463) ==11329== by 0x754861E: clone (clone.S:95) But these are not related to this MDEV, I can observe them with just 10.6 + this MDEV's testcase.

            Filed the above as MDEV-34251, it's a separate problem from this one.

            psergei Sergei Petrunia added a comment - Filed the above as MDEV-34251 , it's a separate problem from this one.

            AddressSanitizer does not check for uninitialized values. MemorySanitizer (MDEV-20377) does.

            marko Marko Mäkelä added a comment - AddressSanitizer does not check for uninitialized values. MemorySanitizer ( MDEV-20377 ) does.

            Trying to see which of the binaries were affected.

            made this commit with just testcases:
            https://github.com/MariaDB/server/commit/0241c0e95295dc1e42e0236be9ca4f3538b1a7fb

            Buildbot run
            https://buildbot.mariadb.org/#/grid?branch=bb-10.6-mdev34125-which-platforms

            Shows the expected failure on arm64-windows[-packages]:

            https://buildbot.mariadb.org/#/builders/234/builds/30743/steps/9/logs/stdio
            https://buildbot.mariadb.org/#/builders/239/builds/22188/steps/10/logs/stdio

            -  OK: pages_read_time is same in slow log and ANALYZE
            -
            +  FAIL: 172.8252 not equal to  1728.252
            

            but does NOT show failure on nearly any other build:
            for example, amd64-centos-stream9:

            https://buildbot.mariadb.org/#/builders/495/builds/10913/steps/6/logs/stdio
            the test is run and passes:

            main.analyze_engine_stats2               w6 [ pass ]   1551
            

            psergei Sergei Petrunia added a comment - Trying to see which of the binaries were affected. made this commit with just testcases: https://github.com/MariaDB/server/commit/0241c0e95295dc1e42e0236be9ca4f3538b1a7fb Buildbot run https://buildbot.mariadb.org/#/grid?branch=bb-10.6-mdev34125-which-platforms Shows the expected failure on arm64-windows [-packages] : https://buildbot.mariadb.org/#/builders/234/builds/30743/steps/9/logs/stdio https://buildbot.mariadb.org/#/builders/239/builds/22188/steps/10/logs/stdio - OK: pages_read_time is same in slow log and ANALYZE - + FAIL: 172.8252 not equal to 1728.252 but does NOT show failure on nearly any other build: for example, amd64-centos-stream9: https://buildbot.mariadb.org/#/builders/495/builds/10913/steps/6/logs/stdio the test is run and passes: main.analyze_engine_stats2 w6 [ pass ] 1551

            I was checking why this was not reproducible on many platforms on 10.6 and found MDEV-34538.

            On 10.6, we never used sys_timer_info.cycles.frequency, we always used sys_timer_info.microseconds.frequency. In this case, the code before the fix happened to produce the correct number.

            psergei Sergei Petrunia added a comment - I was checking why this was not reproducible on many platforms on 10.6 and found MDEV-34538 . On 10.6, we never used sys_timer_info.cycles.frequency , we always used sys_timer_info.microseconds.frequency . In this case, the code before the fix happened to produce the correct number.

            For the release notes:
            On 10.6, the bug affects only arm64-windows platform.
            Starting from 10.11, it affects nearly any platform.

            The effect of the bug is that in ANALYZE FORMAT=JSON output, pages_read_time_ms is printed using wrong units. Typically the printed value is much larger than real value in milliseconds.

            psergei Sergei Petrunia added a comment - For the release notes: On 10.6, the bug affects only arm64-windows platform. Starting from 10.11, it affects nearly any platform. The effect of the bug is that in ANALYZE FORMAT=JSON output, pages_read_time_ms is printed using wrong units. Typically the printed value is much larger than real value in milliseconds.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.