  1. MariaDB Server
  2. MDEV-31448

Killing a replica thread awaiting its GCO can hang/crash a parallel replica

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
    • Fix Version/s: 10.4.31
    • Component/s: Replication

    Description

      If a replica worker thread that is waiting for its GCO (group commit orderer) to start is killed while transactions from the previous GCO have already started their commit phase, and those actively committing transactions then fail with an error, the parallel replica will hang (non-debug builds) or crash on an assertion (debug builds). This is because the killed thread performs GCO cleanup on the previous GCO before that GCO has finished, leading to the same outcome as MDEV-30780.

      For example, consider a replica with three worker threads and three transactions, T1, T2, and T3, grouped into two GCOs: GCO1 = {T1, T2} and GCO2 = {T3}. Suppose T1 is executing, T2 is ready and queued for group commit, and T3 is waiting for its GCO to start. If T3 is killed at this point, it performs GCO cleanup on GCO1 even though T1 and T2 are still active. The following MTR test shows this:

      --source include/master-slave.inc
      --source include/have_innodb.inc
      --source include/have_debug.inc
      --source include/have_binlog_format_row.inc
       
      --echo #
      --echo # Initialize test data
      --connection master
      create table t1 (a int) engine=innodb;
      create table t2 (a int) engine=innodb;
      insert into t1 values (1);
      --source include/save_master_gtid.inc
       
      --connection slave
      --source include/sync_with_master_gtid.inc
      --source include/stop_slave.inc
      --let $save_innodb_lock_wait_timeout= `SELECT @@global.innodb_lock_wait_timeout`
      --let $save_transaction_retries= `SELECT @@global.slave_transaction_retries`
      set @@global.slave_parallel_threads= 3;
      set @@global.slave_parallel_mode= CONSERVATIVE;
      set @@global.innodb_lock_wait_timeout= 2;
      set @@global.slave_transaction_retries= 0;
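      # Hold a row lock on the replica so that T1's update blocks on it
      # (with retries disabled, T1 then fails on the lock wait timeout)
      BEGIN;
      SELECT * FROM t1 WHERE a=1 FOR UPDATE;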
       
      --connection master
      SET @old_dbug= @@SESSION.debug_dbug;
      SET @@SESSION.debug_dbug="+d,binlog_force_commit_id";
       
      # GCO 1
      SET @commit_id= 10000;
      # T1
      update t1 set a=2 where a=1;
      # T2
      insert into t2 values (1);
       
      # GCO 2
      SET @commit_id= 10001;
      # T3
      insert into t1 values (3);
       
      --connection slave
      --source include/start_slave.inc
       
      # T1: blocked on the row lock while applying its update
      --let $wait_condition= SELECT count(*)=1 FROM information_schema.processlist WHERE state LIKE 'Update_rows_log_event::find_row(-1)' and command LIKE 'Slave_worker';
      --source include/wait_condition.inc
      # T2: queued for group commit behind T1
      --let $wait_condition= SELECT count(*)=1 FROM information_schema.processlist WHERE state LIKE 'Waiting for prior transaction to commit%' and command LIKE 'Slave_worker';
      --source include/wait_condition.inc
      # T3: waiting for its GCO to start
      --let $wait_condition= SELECT count(*)=1 FROM information_schema.processlist WHERE state LIKE 'Waiting for prior transaction to start commit%' and command LIKE 'Slave_worker';
      --source include/wait_condition.inc
       
      --let $t3_tid= `SELECT ID FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE LIKE 'Waiting for prior transaction to start commit%' AND COMMAND LIKE 'Slave_worker'`
      --eval kill $t3_tid
       
      --echo #
      --echo # Cleanup
      --connection master
      DROP TABLE t1, t2;
      --source include/save_master_gtid.inc
       
      --connection slave
      --source include/sync_with_master_gtid.inc
       
      --source include/rpl_end.inc
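
      While the test sits at its wait conditions, the three worker states can also be checked by hand on the replica. The query below is only a sketch for manual inspection and is not part of the reported test; it reuses the information_schema.processlist columns and the 'Slave_worker' command value that the wait conditions already rely on.

      ```sql
      -- Run on the replica after START SLAVE, before T3 is killed.
      -- Going by the test's wait conditions, one worker should show each state:
      --   T1: Update_rows_log_event::find_row(-1)
      --   T2: Waiting for prior transaction to commit
      --   T3: Waiting for prior transaction to start commit
      SELECT ID, COMMAND, STATE
        FROM information_schema.processlist
       WHERE COMMAND LIKE 'Slave_worker'
       ORDER BY ID;
      ```

      The ID shown for the T3 row is the value the test passes to kill.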
      

            People

              knielsen Kristian Nielsen
              bnestere Brandon Nesterenko