MariaDB Server / MDEV-35465

Async replication stops working on Galera async replica node when parallel replication is enabled

Details

    Description

      Test case (configuration files for the Galera nodes and the master server are attached):

      GALERA_BASE=/home/ramesh/rpl/GAL_MD291024-mariadb-10.11.10-linux-x86_64-opt
      RPL_BASE=/home/ramesh/rpl/MD291024-mariadb-10.11.10-linux-x86_64-opt
      DATADIR=/home/ramesh/rpl
       
      $RPL_BASE/bin/mariadb-admin -uroot --socket=/home/ramesh/rpl/data/socket.sock shutdown
      $GALERA_BASE/bin/mariadb-admin -uroot --socket=$DATADIR/node2/mysql.sock shutdown
      $GALERA_BASE/bin/mariadb-admin -uroot --socket=$DATADIR/node1/mysql.sock shutdown
       
      rm -Rf $DATADIR/node* $DATADIR/data
       
      $GALERA_BASE/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=$GALERA_BASE --datadir=$DATADIR/node1
      $GALERA_BASE/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=$GALERA_BASE --datadir=$DATADIR/node2
      $RPL_BASE/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=$RPL_BASE --datadir=$DATADIR/data
       
      $GALERA_BASE/bin/mariadbd --defaults-file=$DATADIR/n1.cnf --wsrep-new-cluster > $DATADIR/node1/node1.err 2>&1 & 
       
      sleep 2
       
      $GALERA_BASE/bin/mariadb-admin  -uroot -S$DATADIR/node1/mysql.sock ping
       
      $GALERA_BASE/bin/mariadbd --defaults-file=$DATADIR/n2.cnf > $DATADIR/node2/node2.err 2>&1 &
       
      sleep 5
       
      $RPL_BASE/bin/mariadbd --defaults-file=$DATADIR/my.cnf  > $DATADIR/data/mysql.err 2>&1 & 
       
      sleep 2
       
      $RPL_BASE/bin/mariadb-admin  -uroot -S/home/ramesh/rpl/data/socket.sock ping
       
       
      $RPL_BASE/bin/mysql -uroot --socket=/home/ramesh/rpl/data/socket.sock
       
      set sql_log_bin=0;
      delete from mysql.user where user='';
      flush privileges;
      set sql_log_bin=1;
      create user repl@'%' identified by 'repl';
      grant all on *.* to  repl@'%';
      flush privileges;
      \q
       
      $GALERA_BASE/bin/mysql -uroot -S$DATADIR/node1/mysql.sock
       
      CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=4040, MASTER_USER='repl', MASTER_PASSWORD='repl', MASTER_USE_GTID=slave_pos; START SLAVE; 
      SHOW SLAVE STATUS \G
       
      sysbench /usr/share/sysbench/oltp_insert.lua --mysql-db=test --mysql-user=root  --db-driver=mysql --mysql-socket=/home/ramesh/rpl/data/socket.sock --threads=20 --tables=20 --table-size=100000 prepare
       
      sysbench /usr/share/sysbench/oltp_write_only.lua --table-size=50000 --tables=16 --mysql-db=test --mysql-user=root --threads=16 --db-driver=mysql --mysql-socket=/home/ramesh/rpl/data/socket.sock --time=1000 run
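
The fixed `sleep 2` and `sleep 5` pauses in the test case assume each server is accepting connections by then. A minimal sketch of a polling alternative follows. `wait_until` is a hypothetical helper that is not part of the original report, and the 30-attempt limit in the usage comment is an arbitrary assumption.

```shell
#!/bin/sh
# wait_until TRIES CMD...: run CMD about once per second until it succeeds,
# giving up after TRIES attempts.
# Hypothetical helper for illustration; not part of the original test case.
wait_until() {
    tries=$1
    shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Success of the probe command ends the wait immediately.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Intended use in place of "sleep 2" followed by a single ping
# (GALERA_BASE and DATADIR as defined at the top of the test case):
#   wait_until 30 "$GALERA_BASE/bin/mariadb-admin" -uroot -S"$DATADIR/node1/mysql.sock" ping
```

This relies on `mariadb-admin ping` exiting non-zero while the server is not yet reachable.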
      

      Error info

      2024-11-20 13:31:52 33 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
      2024-11-20 13:31:52 33 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MariaDB server acts as a replica and has its hostname changed. Please use '--log-basename=#' or '--relay-log=galapq-relay-bin' to avoid this problem.
      2024-11-20 13:31:52 33 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='4040', master_log_file='', master_log_pos='4'.
      2024-11-20 13:31:52 33 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
      2024-11-20 13:31:52 34 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@127.0.0.1:4040' in log '' at position 4
      2024-11-20 13:31:52 34 [Note] Slave I/O thread: connected to master 'repl@127.0.0.1:4040',replication starts at GTID position ''
      2024-11-20 13:31:52 35 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './galapq-relay-bin.000001' position: 4; GTID position ''
      2024-11-20 13:33:46 37 [Warning] WSREP: Parallel slave worker failed at wsrep_before_command() hook
      2024-11-20 13:33:46 37 [Warning] Slave: Connection was killed Error_code: 1927
      2024-11-20 13:33:46 37 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2024-11-20 13:33:46 37 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 779747145; GTID position '10-111-6438'
      2024-11-20 13:33:46 47 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 10-111-6440, Internal MariaDB error code: 1964
      2024-11-20 13:33:46 47 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2024-11-20 13:33:46 47 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 779747145; GTID position '10-111-6438'
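
To see at a glance which errors the replica threads reported, the `Error_code:` values can be tallied from the error log. The sketch below assumes the log line format shown above; `slave_error_codes` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# slave_error_codes LOGFILE: print "CODE COUNT" for every "Error_code: N"
# found in LOGFILE, sorted by code.
# Hypothetical helper; assumes the error log format quoted in this report.
slave_error_codes() {
    # Extract the numeric code from each matching line, then count per code.
    sed -n 's/.*Error_code: \([0-9][0-9]*\).*/\1/p' "$1" |
        awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' |
        sort -n
}

# Intended use against the attached log:
#   slave_error_codes node1.err
```

In the excerpt above the reported codes are 1927 (connection was killed), 1213 (deadlock) and 1964 (commit failed because an earlier commit it depends on failed).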
      

      Attachments

        1. my.cnf (0.5 kB)
        2. n1.cnf (1 kB)
        3. n2.cnf (1.0 kB)
        4. node1.err (81 kB)

        Activity

          Record change info

          2024-11-20 12:05:18 34 [Warning] WSREP: Parallel slave worker failed at wsrep_before_command() hook
          2024-11-20 12:05:18 34 [Warning] Slave: Record has changed since last read in table 'sbtest5' Error_code: 1020
          2024-11-20 12:05:18 34 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
          2024-11-20 12:05:18 34 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 77176634; GTID position '10-111-987'
          2024-11-20 12:05:18 25 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 10-111-989, Internal MariaDB error code: 1964
          2024-11-20 12:05:18 25 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-11-20 12:05:18 25 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 77176634; GTID position '10-111-987'
          2024-11-20 12:05:18 24 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-11-20 12:05:18 24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 77176634; GTID position '10-111-987'
          

          ramesh Ramesh Sivaraman added a comment - Record change info (error log shown above)

          ramesh Ramesh Sivaraman added a comment - janlindstrom As discussed, could not reproduce the issue with slave-parallel-threads=1
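
The observation above suggests a quick check for whether a setup is affected: rerun the workload with a single parallel worker on the replica node. The following is only a sketch. `set_parallel_threads_sql` is a hypothetical helper, and the value used in the failing run comes from the attached n1.cnf, which is not reproduced here. It assumes that `slave_parallel_threads` can only be changed while the replica threads are stopped, hence the surrounding `STOP SLAVE` / `START SLAVE`.

```shell
#!/bin/sh
# set_parallel_threads_sql N: print the statements that restart the replica
# threads with slave_parallel_threads set to N.
# Hypothetical helper for illustration; not part of the original report.
set_parallel_threads_sql() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: set_parallel_threads_sql N" >&2
            return 1
            ;;
    esac
    printf 'STOP SLAVE;\nSET GLOBAL slave_parallel_threads=%s;\nSTART SLAVE;\n' "$1"
}

# Intended use on the Galera node acting as async replica
# (GALERA_BASE and DATADIR as defined in the test case):
#   set_parallel_threads_sql 1 | "$GALERA_BASE/bin/mysql" -uroot -S"$DATADIR/node1/mysql.sock"
```

Printing the statements instead of executing them keeps the helper easy to inspect before it is piped to the client.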

          janlindstrom Jan Lindström added a comment - Some of the reported errors come from retry_event_group(). So there has already been an error. Maybe the wsrep error state is just not reset properly.
          sysprg Julius Goryavsky added a comment - The fix has been merged into the main branch: https://github.com/MariaDB/server/commit/a2575a0703406f633659d9b9c8e0ff9750c888bf

          People

            teemu.ollakka Teemu Ollakka
            ramesh Ramesh Sivaraman