Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28777

binlog.binlog_truncate_multi_engine failed in bb with Lost connection to server during query

Details

    Description

      https://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/8853

      10.6 4179f93d28035ea2798cb1c16feeaaef

      binlog.binlog_truncate_multi_engine 'innodb,row' w4 [ fail ]
              Test ended at 2022-06-06 16:16:56
       
      CURRENT_TEST: binlog.binlog_truncate_multi_engine
      mysqltest: In included file "/usr/share/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc": 
      included from /usr/share/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test at line 47:
      At line 42: query 'if (`SELECT $case = "B"`)
      {' failed: <Unknown> (2013): Lost connection to server during query
       
      The result from queries just before the failure was:
      < snip >
      #
      # Case "B" : "one engine has committed its transaction branch"
      #
      RESET MASTER;
      FLUSH LOGS;
      SET GLOBAL max_binlog_size= 4096;
      connect con1,localhost,root,,;
      List of binary logs before rotation
      show binary logs;
      Log_name	File_size
      master-bin.000001	#
      master-bin.000002	#
      INSERT INTO t1 VALUES (1, REPEAT("x", 1));
      INSERT INTO t2 VALUES (1, REPEAT("x", 1));
      SET GLOBAL debug_dbug="d,enable_log_write_upto_crash";
      BEGIN;
      INSERT INTO t2 VALUES (2, REPEAT("x", 4100));
      INSERT INTO t1 VALUES (2, REPEAT("x", 4100));
      COMMIT;
      connection default;
      

      Has happened a few times, always on kvm-rpm-centos74-amd64-debug.

      Attachments

        Activity

          Binlog does

          void
          MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
          {
          ...
                mysql_mutex_lock(&current->thd->LOCK_thd_data);
                run_commit_ordered(current->thd, current->all);
                mysql_mutex_unlock(&current->thd->LOCK_thd_data);
          

          Then in run_commit_ordered()

          void
          TC_LOG::run_commit_ordered(THD *thd, bool all)
          {
          ...    DEBUG_SYNC(thd, "commit_after_run_commit_ordered");
            }
          }
          

          When the server is being shut down it runs

          static my_bool kill_thread_phase_1(THD *thd, int *n_threads_awaiting_ack)
          {
            if (thd->slave_thread || thd->is_binlog_dump_thread() ||
                (shutdown_wait_for_slaves &&
                ...
          

          which does

          bool THD::is_binlog_dump_thread()
          {
            mysql_mutex_lock(&LOCK_thd_data);
          

          but this mutex is already taken by the group commit leader, which is forever waiting on the "commit_after_run_commit_ordered" sync point. mysqltest gives up waiting for the server to shut down and kills it with SIGABRT.

          serg Sergei Golubchik added a comment - Binlog does void MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader) { ... mysql_mutex_lock(&current->thd->LOCK_thd_data); run_commit_ordered(current->thd, current->all); mysql_mutex_unlock(&current->thd->LOCK_thd_data); Then in run_commit_ordered() void TC_LOG::run_commit_ordered(THD *thd, bool all) { ... DEBUG_SYNC(thd, "commit_after_run_commit_ordered" ); } } When the server is being shut down it runs static my_bool kill_thread_phase_1(THD *thd, int *n_threads_awaiting_ack) { if (thd->slave_thread || thd->is_binlog_dump_thread() || (shutdown_wait_for_slaves && ... which does bool THD::is_binlog_dump_thread() { mysql_mutex_lock(&LOCK_thd_data); but this mutex is already taken by the group commit leader, which is forever waiting on the "commit_after_run_commit_ordered" sync point. mysqltest gives up waiting for the server to shut down and kills it with SIGABRT.
          Elkin Andrei Elkin added a comment -

          Interestingly S@rg's comments in there talk about a different already fixed by him `2ab52cc0e5c3` issue in the case C.

          This time it is B.
          Could you please study the commit which is hopefully instructive.

          Elkin Andrei Elkin added a comment - Interestingly S@rg's comments in there talk about a different already fixed by him `2ab52cc0e5c3` issue in the case C. This time it is B. Could you please study the commit which is hopefully instructive.

          Thanks, the fix makes sense to me.

          angelique.sklavounos Angelique Sklavounos (Inactive) added a comment - Thanks, the fix makes sense to me.

          People

            Elkin Andrei Elkin
            angelique.sklavounos Angelique Sklavounos (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.