Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39368

Add mtr option to test Optimizer Context Replay

    XMLWordPrintable

Details

    • Q3/2026 Server Maintenance

    Description

      Current state

      The code is here:

      commit d04f00eb9c097d5ecc373c47927d9e486ff18985 (HEAD -> bb-12.3-MDEV-39368-test-replay, origin/bb-12.3-MDEV-39368-test-replay)
      Author: Sergei Petrunia <sergey@mariadb.com>
      Date:   Tue Apr 21 13:49:37 2026 +0300
       
          Enable replay for any EXPLAIN query, do not require "FORMAT=JSON".
      

      Usage

      Now mysql-test-run has new arguments:

      --replay-server
      --replay-server-manual
      

      You should use one,

      • --replay-server will start the replay server automatically.
      • --replay-server-manual will tell you the arguments to start the replay server with (like --manual-gdb does). It is useful for debugging.

      How it works

      The replay server

      Mtr will start a replay server (also called "extra server" sometimes, this will change)
      The Replay server log files are in mysql-test dir:

      var/log/extra_server_1.err  
      var/log/extra_server_1.log
      

      The replay server data directory is:

      var/extra_server_1
      

      Mtr will log each replay session in this file, separated with ### REPLAY SESSION ###

      var/log/replay_queries.log
      

      Testing

      When mysql-test-run sees that it is about to run an EXPLAIN query, it will do the following:

      On the server being tested:

      set optimizer_record_context=1;
      EXPLAIN ... ;      -- Mark this ORIGINAL-EXPLAIN
      select context from information_schema.optimizer_context;
      

      Then, split the context into queries (atm this is done in mysqltest.cc, a bit poorly) and run the queries against the replay server.
      It should capture the output of the EXPLAIN command and emit it into mysql-test-run output. (INSTEAD of the output we got at ORIGINAL-EXPLAIN. This way, if replay produced a different explain, we will get mtr test failure.

      Basic test

      ./mtr replay_server_test
      

      passes

      ./mtr --replay-server replay_server_test
      

      currently fails with:

      --- /home/psergey/dev-git/11.8-skip-records-in-range/mysql-test/main/replay_server_test.result  2026-04-21 13:45:55.276870266 +0300
      +++ /home/psergey/dev-git/11.8-skip-records-in-range/mysql-test/main/replay_server_test.reject  2026-04-21 14:02:57.099108450 +0300
      @@ -1,4 +1,11 @@
      -REPLAY_SERVER_SOCKET is NOT set
      +REPLAY_SERVER_SOCKET is set: /home/psergey/dev-git/11.8-skip-records-in-range/mysql-test/var/extra_server_1/mysqld.sock
      +Checking socket: /home/psergey/dev-git/11.8-skip-records-in-range/mysql-test/var/extra_server_1/mysqld.sock
      +Socket exists: YES
      +Is socket: YES
      +Replay server PID: 553865
      +Process running: YES
      +Socket directory: /home/psergey/dev-git/11.8-skip-records-in-range/mysql-test/var/extra_server_1
      +Files in socket dir: mysqld.sock, mysqld.pid, data
       CREATE TABLE t1 (a INT, b INT, KEY(a));
       INSERT INTO t1 VALUES (1,1), (2,2), (3,3);
       analyze table t1;
       
      Result length mismatch
      

      but this just show that the replay server is used.

      Example catch

      ./mtr --replay-server sargable_date_cond 
      

      gives

      ...
      @@ -1723,7 +1715,7 @@
       {
         "query_block": {
           "select_id": 1,
      -    "cost": 0.003808422,
      +    "cost": 0.001617224,
           "nested_loop": [
             {
               "table": {
      @@ -1735,7 +1727,7 @@
                 "used_key_parts": ["a"],
                 "loops": 1,
                 "rows": 2,
      -          "cost": 0.003808422,
      +          "cost": 0.001617224,
                 "filtered": 100,
                 "index_condition": "t1.a between '2010-01-01 00:00:00' and '2010-12-31 23:59:59'",
                 "attached_condition": "t1.c < (subquery#2)"
      
      


      Older things (out of date)


      Ideas

      Modify mtr to do testing of optimizer context replay as follows:
      mtr has a special argument --verify-optimizer-context-replay.

      One can run mtr with this argument and run testcases with it.
      Then, if the current query is $query= EXPLAIN [FORMAT=JSON] ... then:
      Modify the statement to be

      set statement optimizer_record_context=1 for $query; 
      

      process its output as usual.
      After the command, also run

       select * from information_schema.optimizer_context;
      

      .
      This gives enough information to run the replay on another server. It should produce the same output.

      We will need another, clean server to run the replay.
      mtr has the code to create data directory and start the server, etc.
      I'm not sure how hard it is to achieve this.

      Implementation

      Now mysql-test-run has new arguments:

      --replay-server
      --replay-server-manual
      

      You should use one,

      • --replay-server will start the replay server automatically.
      • --replay-server-manual will tell you the arguments to start the replay server with (like --manual-gdb does). It is useful for debugging.

      How it works

      When mysql-test-run sees that it is about to run an EXPLAIN query, it will do the following:

      On the server being tested:

      set optimizer_record_context=1;
      EXPLAIN FORMAT=JSON ... ;      -- Mark this ORIGINAL-EXPLAIN
      select context from information_schema.optimizer_context;
      

      Then, split the context into queries (atm this is done in mysqltest.cc, a bit poorly) and run the queries against the replay server.
      It should capture the output of the EXPLAIN command and emit it into mysql-test-run output. (INSTEAD of the output we got at ORIGINAL-EXPLAIN. This way, if replay produced a different explain, we will get mtr test failure.

      The code

      it is here. Caution, it is AI-generated.

      commit 824f409df6454ae26ae9a0f10e989a02957737b5 (HEAD -> bb-12.3-MDEV-39368-test-replay, origin/bb-12.3-MDEV-39368-test-replay, bb-12.3-MDEV-38805-dev-sprint-work-2)
      Author: Sergei Petrunia <sergey@mariadb.com>
      Date:   Sun Apr 19 19:35:52 2026 +0300
       
          Add --replay-server-manual.
       
      commit fb426c9ff0c032d82020e7414f1f2e7a87cd69ff
      Author: Sergei Petrunia <sergey@mariadb.com>
      Date:   Sun Apr 19 13:44:03 2026 +0300
       
          Trace replay: add --extra-server support, part 1.
      
      

      Problem 1 (FIXED)

      The basic example is:

      ./mtr --replay-server-manual --verbose replay_server_test
      

      and it crashes the replay server here:

      Thread 15 "one_connection" received signal SIGABRT, Aborted.
      [Switching to Thread 0x7ffff40c46c0 (LWP 431724)]
      Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
      __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
      warning: 44     ./nptl/pthread_kill.c: No such file or directory
      (gdb) wher
      #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
      #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
      #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
      #3  0x00007ffff704527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
      #4  0x00007ffff70288ff in __GI_abort () at ./stdlib/abort.c:79
      #5  0x00007ffff702881b in __assert_fail_base (fmt=0x7ffff71d01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55555732d3fb "*freq > 0", 
          file=file@entry=0x55555732d070 "/home/psergey/dev-git/11.8-skip-records-in-range/sql/opt_context_store_replay.cc", line=line@entry=1621, 
          function=function@entry=0x55555732d408 "void Optimizer_context_replay::infuse_table_stats(TABLE*)") at ./assert/assert.c:96
      #6  0x00007ffff703b517 in __assert_fail (assertion=0x55555732d3fb "*freq > 0", 
          file=0x55555732d070 "/home/psergey/dev-git/11.8-skip-records-in-range/sql/opt_context_store_replay.cc", line=1621, 
          function=0x55555732d408 "void Optimizer_context_replay::infuse_table_stats(TABLE*)") at ./assert/assert.c:105
      #7  0x000055555654f9f1 in Optimizer_context_replay::infuse_table_stats (this=0x7fffa4017ba0, table=0x7fffa402b5d8)
          at /home/psergey/dev-git/11.8-skip-records-in-range/sql/opt_context_store_replay.cc:1621
      #8  0x00005555562fa87a in set_statistics_for_table (thd=0x7fffa4000dc8, table=0x7fffa402b5d8) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_statistics.cc:4192
      #9  0x00005555562520bf in make_join_statistics (join=0x7fffa401a268, 
          tables_list=@0x7fffa4018470: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffa401aa30, last = 0x7fffa401aa30, elements = 1}, <No data fields>}, 
          keyuse_array=0x7fffa401a5c8) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_select.cc:5722
      #10 0x0000555556247791 in JOIN::optimize_inner (this=0x7fffa401a268) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_select.cc:2763
      #11 0x0000555556244ac0 in JOIN::optimize (this=0x7fffa401a268) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_select.cc:2016
      #12 0x000055555625134e in mysql_select (thd=0x7fffa4000dc8, tables=0x7fffa40188a8, 
          fields=@0x7fffa4018510: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffa4018858, last = 0x7fffa401acc0, elements = 2}, <No data fields>}, 
          conds=0x7fffa40191b8, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2164525828, result=0x7fffa401a190, unit=0x7fffa4005408, 
          select_lex=0x7fffa4018258) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_select.cc:5425
      #13 0x000055555629c9b3 in mysql_explain_union (thd=0x7fffa4000dc8, unit=0x7fffa4005408, result=0x7fffa401a190)
          at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_select.cc:31761
      #14 0x00005555561db90a in execute_sqlcom_select (thd=0x7fffa4000dc8, all_tables=0x7fffa40188a8) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_parse.cc:6165
      #15 0x00005555561d3611 in mysql_execute_command (thd=0x7fffa4000dc8, is_called_from_prepared_stmt=false)
          at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_parse.cc:3970
      #16 0x00005555561e0fb5 in mysql_parse (thd=0x7fffa4000dc8, rawbuf=0x7fffa4018190 "EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE a = 1", length=48, parser_state=0x7ffff40c32b0)
          at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_parse.cc:7954
      #17 0x00005555561ccb98 in dispatch_command (command=COM_QUERY, thd=0x7fffa4000dc8, packet=0x7fffa400c0d9 "", packet_length=49, blocking=true)
          at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_parse.cc:1896
      #18 0x00005555561cb4c7 in do_command (thd=0x7fffa4000dc8, blocking=true) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_parse.cc:1432
      #19 0x00005555563ea31d in do_handle_one_connection (connect=0x555559472ec8, put_in_cache=true) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_connect.cc:1503
      #20 0x00005555563ea092 in handle_one_connection (arg=0x555559472ec8) at /home/psergey/dev-git/11.8-skip-records-in-range/sql/sql_connect.cc:1415
      #21 0x00005555569f9db2 in pfs_spawn_thread (arg=0x555559474ea8) at /home/psergey/dev-git/11.8-skip-records-in-range/storage/perfschema/pfs.cc:2198
      #22 0x00007ffff709caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
      #23 0x00007ffff7129c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
      

      Attachments

        1. replay-server-run2.txt
          3.64 MB
        2. replay-server-run1.txt
          490 kB
        3. out-apr24.txt
          2.89 MB
        4. context-replay-apr26.log
          1.12 MB

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Sergei Golubchik Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.