Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28461

semisync-slave server recovery fails to rollback prepared transaction

Details

    Description

      • Test config
      • Primary(master) : 192.168.254.24
      • Primary(master) : 192.168.254.25
      • MaxScale : 192.168.254.26
      • test table schema and query.
      • max client sessions : 5000 concurrent user
      • workload generator and test project file : Jmeter [^sysbench_insert_only_10.6.jmx]

      MariaDB [test]> show create table au_test\G
      *************************** 1. row ***************************
             Table: au_test
      Create Table: CREATE TABLE `au_test` (
        `id` bigint(20) NOT NULL AUTO_INCREMENT,
        `b` varchar(10) NOT NULL COMMENT '@@hostname',
        `c` datetime DEFAULT NULL,
        `d` datetime DEFAULT NULL,
        `e` bigint(20) DEFAULT NULL,
        `f` varchar(500) DEFAULT NULL,
        `g` varchar(100) DEFAULT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=457948 DEFAULT CHARSET=latin1;
       
      insert into au_test (b,c,d,e,f,g) values (@@hostname,now(), now(), last_insert_id(), (select variable_value from information_schema.session_status where variable_name = 'Rpl_semi_sync_slave_status'), cast(@@gtid_binlog_state as varchar(100)));
      

      Once master joins as a master once again after `kill -9 $(pidof mariadbd)` against master node, slave status changes to `running` instead of 'slave, running'

      Here is the output of `maxctrl list status`:

       
      {"timestamp":"2022-05-03 14:26:04.907","command":"maxctrl list servers","status":true,"output":"
      ┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
      │ Server  │ Address        │ Port │ Connections │ State           │ GTID        │
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ primary │ 192.168.254.2433064195        │ Master, Running │ 0-1-7533546
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ replica │ 192.168.254.2533064072        │ Slave, Running  │ 0-2-7496224
      └─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
       
      _this is the time when master server was forcefully killed:
      ----------------------------------------------------------------
      [root@primary my.cnf.d]# date
      Tue May  3 14:26:04 KST 2022
      [root@primary my.cnf.d]# kill -9 $(pidof mariadbd)
      ----------------------------------------------------------------_
       
      {"timestamp":"2022-05-03 14:26:07.547","command":"maxctrl list servers","status":true,"output":"
      ┌─────────┬────────────────┬──────┬─────────────┬────────────────┬─────────────┐
      │ Server  │ Address        │ Port │ Connections │ State          │ GTID        │
      ├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
      │ primary │ 192.168.254.2433060           │ Down           │ 0-1-7533546
      ├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
      │ replica │ 192.168.254.2533064303        │ Slave, Running │ 0-2-7496224
      └─────────┴────────────────┴──────┴─────────────┴────────────────┴─────────────┘
       
       
      {"timestamp":"2022-05-03 14:26:10.969","command":"maxctrl list servers","status":true,"output":"
      ┌─────────┬────────────────┬──────┬─────────────┬────────────────┬─────────────┐
      │ Server  │ Address        │ Port │ Connections │ State          │ GTID        │
      ├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
      │ primary │ 192.168.254.2433060           │ Down           │ 0-1-7533546
      ├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
      │ replica │ 192.168.254.2533064344        │ Slave, Running │ 0-2-7496224
      └─────────┴────────────────┴──────┴─────────────┴────────────────┴─────────────┘
       
       
      {"timestamp":"2022-05-03 14:26:13.606","command":"maxctrl list servers","status":true,"output":" => master was successfully rejoin as master again.
      ┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
      │ Server  │ Address        │ Port │ Connections │ State           │ GTID        │
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ primary │ 192.168.254.2433062275        │ Master, Running │ 0-1-7534828
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ replica │ 192.168.254.2533064344        │ Slave, Running  │ 0-2-7496224
      └─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
       
       
      *{"timestamp":"2022-05-03 14:26:15.677","command":"maxctrl list servers","status":true,"output":"  => At this point  Slave got changed into Running state due to Duplicate key error.
      ┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
      │ Server  │ Address        │ Port │ Connections │ State           │ GTID        │
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ primary │ 192.168.254.2433063069        │ Master, Running │ 0-1-7534828
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ replica │ 192.168.254.2533060           │ Running         │ 0-1-7496290
      └─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
      *
       
      {"timestamp":"2022-05-03 14:26:26.720","command":"maxctrl list servers","status":true,"output":"
      ┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
      │ Server  │ Address        │ Port │ Connections │ State           │ GTID        │
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ primary │ 192.168.254.2433062927        │ Master, Running │ 0-1-7534828
      ├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
      │ replica │ 192.168.254.2533060           │ Running         │ 0-1-7496290
      └─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
      

      I've run this test for quite some time, but still hitting this error quite a lot.

      -slave server error log

      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 816
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1577
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 643
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 981
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 812
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1398
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 803
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 857
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 833
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 894
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1788
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 846
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 879
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1118
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 490
      2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1330
      2022-05-03 14:25:50 0 [Note] Successfully truncated binlog file:./mariadb-bin.000007 from previous file size 1358 to pos:360 to remove transactions starting from GTID 0-2-7496224
      2022-05-03 14:25:50 0 [Note] Server socket created on IP: '0.0.0.0'.
      2022-05-03 14:25:50 0 [Warning] 'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode.
      2022-05-03 14:25:50 0 [Note] Reading of all Master_info entries succeeded
      2022-05-03 14:25:50 0 [Note] Added new Master_info '' to hash table
      2022-05-03 14:25:50 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.6.7-3-MariaDB-enterprise-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Enterprise Server
      2022-05-03 14:25:53 7 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
      2022-05-03 14:25:53 7 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='192.168.254.24', master_port='3306', master_log_file='', master_log_pos='4'.
      2022-05-03 14:25:53 7 [Note] Previous Using_Gtid=No. New Using_Gtid=Current_Pos
      2022-05-03 14:25:53 10 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.254.24:3306' in log '' at position 4
      2022-05-03 14:25:53 11 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mariadb-relay-bin.000001' position: 4; GTID position '0-2-7496223'
      2022-05-03 14:25:53 10 [Note] Slave I/O thread: connected to master 'repl@192.168.254.24:3306',replication starts at GTID position '0-2-7496223'
      2022-05-03 14:26:04 10 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
      2022-05-03 14:26:04 10 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mariadb-bin.000021' at position 41641390; GTID position '0-1-7534857'
      2022-05-03 14:26:04 10 [ERROR] Slave I/O: error reconnecting to master 'repl@192.168.254.24:3306' - retry-time: 60  maximum-retries: 100000  message: Can't connect to server on '192.168.254.24' (111 "Connection refused"), Internal MariaDB error code: 2003
      2022-05-03 14:26:13 11 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000009, pos: 904), semi-sync up to file , position 0.
      2022-05-03 14:26:13 11 [Note] Semi-sync replication switched OFF.
      2022-05-03 14:26:13 11 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.au_test; Duplicate entry '417427' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mariadb-bin.000021, end_log_pos 22320629, Gtid 0-1-7496291, Internal MariaDB error code: 1062
      2022-05-03 14:26:13 11 [Warning] Slave: Duplicate entry '417427' for key 'PRIMARY' Error_code: 1062
      2022-05-03 14:26:13 11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000021' position 22320161; GTID position '0-1-7496290'
      2022-05-03 14:26:13 11 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000021' at position 22320161; GTID position '0-1-7496290', master: 192.168.254.24:3306
      2022-05-03 14:27:04 10 [Note] Slave: connected to master 'repl@192.168.254.24:3306',replication resumed in log 'mariadb-bin.000021' at position 41641390
      2022-05-03 14:31:14 2854 [Warning] Aborted connection 2854 to db: 'test' user: 'allen' host: '192.168.254.26' (Got timeout reading communication packets)
      2022-05-03 14:31:14 2876 [Warning] Aborted connection 2876 to db: 'test' user: 'allen' host: '192.168.254.26' (Got timeout reading communication packets)
      

      nothing much in master's error log.

      2022-05-03 14:26:10 0 [Note] Server socket created on IP: '0.0.0.0'.
      2022-05-03 14:26:10 0 [Warning] 'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode.
      2022-05-03 14:26:10 0 [Note] Reading of all Master_info entries succeeded
      2022-05-03 14:26:10 0 [Note] Added new Master_info '' to hash table
      2022-05-03 14:26:10 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.6.7-3-MariaDB-enterprise-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Enterprise Server
      2022-05-03 14:26:14 802 [Note] InnoDB: Number of pools: 2
      2022-05-03 14:26:21 1851 [Note] InnoDB: Number of pools: 3
      2022-05-03 14:26:32 354 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000022, pos: 859), semi-sync up to file , position 0.
      2022-05-03 14:26:32 354 [Note] Semi-sync replication switched OFF.
      2022-05-03 14:27:04 1991 [Note] Start binlog_dump to slave_server(2), pos(mariadb-bin.000021, 41641390), using_gtid(1), gtid('0-1-7534857')
      2022-05-03 14:27:04 1991 [Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(./mariadb-bin.000022, 4)
      2022-05-03 14:27:04 1 [Note] Semi-sync replication switched ON with slave (server_id: 2) at (mariadb-bin.000022, 972296)
      

      rep_master_semi.7z - master(192.168.254.24) server's semi_sync status and show slave status
      rep_slave_semi.7z - slave(192.168.254.25) server's semi_sync status and show slave status

      Attachments

        1. maxscale.cnf
          3 kB
          Allen Lee
        2. primary-server.cnf
          2 kB
          Allen Lee
        3. rep_master_semi.7z
          31 kB
          Allen Lee
        4. rep_slave_semi.7z
          31 kB
          Allen Lee
        5. replica-server.cnf
          2 kB
          Allen Lee
        6. sysbench_insert_only_10.6.jmx
          31 kB
          Allen Lee

        Activity

          Elkin Andrei Elkin added a comment -

          allen.lee@mariadb.com, howdy.

          I examined the error log files to find in the primary's mariadb.err some strange looking lines suggesting that
          the restarted KILL-9 old master may still be executing some local transactions:

          2022-04-25 19:21:05 0 [Note] Successfully truncated binlog file:./mariadb-bin.000008 from previous file size 47372610 to pos:47371145 to remove transactions starting from GTID 0-1-1437105
          2022-04-25 19:21:05 0 [Note] Server socket created on IP: '0.0.0.0'.
          2022-04-25 19:21:05 0 [Warning] 'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode.
          2022-04-25 19:21:05 0 [Note] /usr/sbin/mariadbd: ready for connections.
          Version: '10.6.7-3-MariaDB-enterprise-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Enterprise Server
          2022-04-25 19:21:05 5 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
          2022-04-25 19:21:05 5 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='192.168.254.25', master_port='3306', master_log_file='', master_log_pos='4'.
          2022-04-25 19:21:05 5 [Note] Previous Using_Gtid=No. New Using_Gtid=Current_Pos
          2022-04-25 19:21:05 6 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.254.25:3306' in log '' at position 4
          2022-04-25 19:21:05 6 [Note] Slave I/O thread: connected to master 'repl@192.168.254.25:3306',replication starts at GTID position '0-1-1437104'
          2022-04-25 19:21:05 7 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mariadb-relay-bin.000001' position: 4; GTID position '0-1-1437104'
           
          2022-04-25 19:21:16 23 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000010, pos: 876), semi-sync up to file , position 0.
           
          2022-04-25 19:21:16 23 [Note] Semi-sync replication switched OFF.
          2022-04-25 19:21:16 20 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.au_test; Duplicate entry '340208' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mariadb-bin.000001, end_log_pos 166317905, Gtid 0-2-1437108, Internal MariaDB error code: 1062
           
          2022-04-25 19:21:16 20 [Warning] Slave: Duplicate entry '340208' for key 'PRIMARY' Error_code: 1062
          

          With empty lines emphasized

          [Warning] Timeout waiting for reply of binlog

          can be only on a master-role server. It could have been that the timeout warning followed with that transaction commit which ensues the following duplicate error by the slave applier.
          The warning's file: mariadb-bin.000010, pos: 876 simply suggests where to find out what was doing that transaction on the slave.

          Could you please investigate that?

          Cheers,

          Andrei

          Elkin Andrei Elkin added a comment - allen.lee@mariadb.com , howdy. I examined the error log files to find in the primary's mariadb.err some strange looking lines suggesting that the restarted KILL-9 old master may still be executing some local transactions: 2022-04-25 19:21:05 0 [Note] Successfully truncated binlog file:./mariadb-bin.000008 from previous file size 47372610 to pos:47371145 to remove transactions starting from GTID 0-1-1437105 2022-04-25 19:21:05 0 [Note] Server socket created on IP: '0.0.0.0'. 2022-04-25 19:21:05 0 [Warning] 'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode. 2022-04-25 19:21:05 0 [Note] /usr/sbin/mariadbd: ready for connections. Version: '10.6.7-3-MariaDB-enterprise-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Enterprise Server 2022-04-25 19:21:05 5 [Note] Master connection name: '' Master_info_file: 'master.info' Relay_info_file: 'relay-log.info' 2022-04-25 19:21:05 5 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='192.168.254.25', master_port='3306', master_log_file='', master_log_pos='4'. 2022-04-25 19:21:05 5 [Note] Previous Using_Gtid=No. New Using_Gtid=Current_Pos 2022-04-25 19:21:05 6 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.254.25:3306' in log '' at position 4 2022-04-25 19:21:05 6 [Note] Slave I/O thread: connected to master 'repl@192.168.254.25:3306',replication starts at GTID position '0-1-1437104' 2022-04-25 19:21:05 7 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mariadb-relay-bin.000001' position: 4; GTID position '0-1-1437104'   2022-04-25 19:21:16 23 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000010, pos: 876), semi-sync up to file , position 0.   2022-04-25 19:21:16 23 [Note] Semi-sync replication switched OFF. 2022-04-25 19:21:16 20 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.au_test; Duplicate entry '340208' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mariadb-bin.000001, end_log_pos 166317905, Gtid 0-2-1437108, Internal MariaDB error code: 1062   2022-04-25 19:21:16 20 [Warning] Slave: Duplicate entry '340208' for key 'PRIMARY' Error_code: 1062 With empty lines emphasized [Warning] Timeout waiting for reply of binlog can be only on a master-role server. It could have been that the timeout warning followed with that transaction commit which ensues the following duplicate error by the slave applier. The warning's file: mariadb-bin.000010, pos: 876 simply suggests where to find out what was doing that transaction on the slave. Could you please investigate that? Cheers, Andrei
          Elkin Andrei Elkin added a comment -

          allen.lee@mariadb.com,

          to '2022-05-03 14:0x', you must mean '14:2x', right?
          Because this slave server session

          2022-05-03 13:59:15 6 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.254.25:3306' in log '' at position 4
          2022-05-03 13:59:15 7 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mariadb-relay-bin.000001' position: 4; GTID position '0-1-5986869'
          2022-05-03 13:59:15 6 [Note] Slave I/O thread: connected to master 'repl@192.168.254.25:3306',replication starts at GTID position '0-1-5986869'
          2022-05-03 13:59:15 6 [ERROR] Error reading packet from server: Error: connecting slave requested to start from GTID 0-1-5986869, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions (server_errno=1236)
          2022-05-03 13:59:15 6 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-1-5986869, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions', Internal MariaDB error code: 1236
          2022-05-03 13:59:15 6 [Note] Slave I/O thread exiting, read up to log 'FIRST', position 4; GTID position 0-1-5986869, master 192.168.254.25:3306
          2022-05-03 14:23:11 7 [Note] Error reading relay log event: slave SQL thread was killed
          2022-05-03 14:23:11 7 [Note] Slave SQL thread exiting, replication stopped in log 'FIRST' at position 4; GTID position '0-2-5437670', master: 192.168.254.25:3306
          
          

          does not any duplicate key error.
          That session ends with

          2022-05-03 14:23:41 0 [Note] /usr/sbin/mariadbd: Shutdown complete
          
          

          The next one hits the dup error having the same timeout pattern:

           2022-05-03 14:23:42 6 [Note] Slave I/O thread: connected to master 'repl@192.168.254.25:3306',replication starts at GTID position '0-2-7451561'
          2022-05-03 14:24:26 6 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
          2022-05-03 14:24:26 6 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mariadb-bin.000003' at position 16955153; GTID position '0-2-7485404'
          2022-05-03 14:24:26 6 [ERROR] Slave I/O: error reconnecting to master 'repl@192.168.254.25:3306' - retry-time: 60  maximum-retries: 100000  message: Can't connect to server on '192.168.254.25' (111 "Connection refused"), Internal MariaDB error code: 2003
           
          2022-05-03 14:24:28 7 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000021, pos: 859), semi-sync up to file , position 0.
           
          2022-05-03 14:24:28 7 [Note] Semi-sync replication switched OFF.
          2022-05-03 14:25:26 6 [Note] Slave: connected to master 'repl@192.168.254.25:3306',replication resumed in log 'mariadb-bin.000003' at position 16955153
          2022-05-03 14:25:29 6 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
          2022-05-03 14:25:29 6 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mariadb-bin.000006' at position 5516244; GTID position '0-2-7496274'
          2022-05-03 14:25:29 6 [ERROR] Slave I/O: error reconnecting to master 'repl@192.168.254.25:3306' - retry-time: 60  maximum-retries: 100000  message: Can't connect to server on '192.168.254.25' (111 "Connection refused"), Internal MariaDB error code: 2003
          2022-05-03 14:25:48 7 [Note] Error reading relay log event: slave SQL thread was killed
          2022-05-03 14:25:48 7 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000006' at position 5516244; GTID position '0-2-7496274', master: 192.168.254.25:3306
          2022-05-03 14:25:48 6 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
          
          

          Elkin Andrei Elkin added a comment - allen.lee@mariadb.com , to '2022-05-03 14:0x', you must mean '14:2x', right? Because this slave server session 2022-05-03 13:59:15 6 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.254.25:3306' in log '' at position 4 2022-05-03 13:59:15 7 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mariadb-relay-bin.000001' position: 4; GTID position '0-1-5986869' 2022-05-03 13:59:15 6 [Note] Slave I/O thread: connected to master 'repl@192.168.254.25:3306',replication starts at GTID position '0-1-5986869' 2022-05-03 13:59:15 6 [ERROR] Error reading packet from server: Error: connecting slave requested to start from GTID 0-1-5986869, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions (server_errno=1236) 2022-05-03 13:59:15 6 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-1-5986869, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions', Internal MariaDB error code: 1236 2022-05-03 13:59:15 6 [Note] Slave I/O thread exiting, read up to log 'FIRST', position 4; GTID position 0-1-5986869, master 192.168.254.25:3306 2022-05-03 14:23:11 7 [Note] Error reading relay log event: slave SQL thread was killed 2022-05-03 14:23:11 7 [Note] Slave SQL thread exiting, replication stopped in log 'FIRST' at position 4; GTID position '0-2-5437670', master: 192.168.254.25:3306 does not any duplicate key error. That session ends with 2022-05-03 14:23:41 0 [Note] /usr/sbin/mariadbd: Shutdown complete The next one hits the dup error having the same timeout pattern : 2022-05-03 14:23:42 6 [Note] Slave I/O thread: connected to master 'repl@192.168.254.25:3306',replication starts at GTID position '0-2-7451561' 2022-05-03 14:24:26 6 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013) 2022-05-03 14:24:26 6 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mariadb-bin.000003' at position 16955153; GTID position '0-2-7485404' 2022-05-03 14:24:26 6 [ERROR] Slave I/O: error reconnecting to master 'repl@192.168.254.25:3306' - retry-time: 60 maximum-retries: 100000 message: Can't connect to server on '192.168.254.25' (111 "Connection refused"), Internal MariaDB error code: 2003   2022-05-03 14:24:28 7 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000021, pos: 859), semi-sync up to file , position 0.   2022-05-03 14:24:28 7 [Note] Semi-sync replication switched OFF. 2022-05-03 14:25:26 6 [Note] Slave: connected to master 'repl@192.168.254.25:3306',replication resumed in log 'mariadb-bin.000003' at position 16955153 2022-05-03 14:25:29 6 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013) 2022-05-03 14:25:29 6 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mariadb-bin.000006' at position 5516244; GTID position '0-2-7496274' 2022-05-03 14:25:29 6 [ERROR] Slave I/O: error reconnecting to master 'repl@192.168.254.25:3306' - retry-time: 60 maximum-retries: 100000 message: Can't connect to server on '192.168.254.25' (111 "Connection refused"), Internal MariaDB error code: 2003 2022-05-03 14:25:48 7 [Note] Error reading relay log event: slave SQL thread was killed 2022-05-03 14:25:48 7 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000006' at position 5516244; GTID position '0-2-7496274', master: 192.168.254.25:3306 2022-05-03 14:25:48 6 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
          Elkin Andrei Elkin added a comment -

          There's a way to reproduce a dup key error without MXS and it's still at analysis.

          Elkin Andrei Elkin added a comment - There's a way to reproduce a dup key error without MXS and it's still at analysis.

          Looks okay (barring any new changes needed from the review).

          angelique.sklavounos Angelique Sklavounos (Inactive) added a comment - Looks okay (barring any new changes needed from the review).

          People

            Elkin Andrei Elkin
            allen.lee@mariadb.com Allen Lee (Inactive)
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.