Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33909

Slave DB can not sync after Lock wait timeout exceeded transaction on Slave

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • XA

    Description

      Hi team,
      After upgrade mariadb 10.3.16 to 10.6.15 version. We have some slave instances with over 1.2 TB size usually occur: 'Lock wait timeout exceeded' event - Error code 1205. Replicate can not sync after event occured.
      We see 'slave-transacsion-retry-error' parameter already set 1205 to list. So can you help me check it?

      Attachments

        Activity

          You didn't mention it in your description, but the Label "XA" suggests you are using XA transactions (XA PREPARE ...) ?

          If so, XA is known to be broken from 10.5, see MDEV-32020 and similar bugs.
          I'd suggest to use stick to 10.4 for now in this case.

          knielsen Kristian Nielsen added a comment - You didn't mention it in your description, but the Label "XA" suggests you are using XA transactions (XA PREPARE ...) ? If so, XA is known to be broken from 10.5, see MDEV-32020 and similar bugs. I'd suggest to use stick to 10.4 for now in this case.

          using XA transactions (XA PREPARE ...) ==> Yes, right. We are using and during replicate problem, It sometimes occur 'XA PREPARED' state.

          Tony Tran Tran Van Thanh added a comment - using XA transactions (XA PREPARE ...) ==> Yes, right. We are using and during replicate problem, It sometimes occur 'XA PREPARED' state.
          Tony Tran Tran Van Thanh added a comment - - edited

          As you can see the logfile:
          ======
          [(none)] > show slave status\G;

          Slave_IO_State: Waiting for master to send event
          Master_Host: xxxxxxxxxxx
          Master_User: repl
          Master_Port: xxxx
          Connect_Retry: 5
          Master_Log_File: mysql-bin.0029xx
          Read_Master_Log_Pos: 468143153
          Relay_Log_File: mysql-relay.00xx
          Relay_Log_Pos: 28394209
          Relay_Master_Log_File: mysql-bin.00xxx
          Slave_IO_Running: Yes
          Slave_SQL_Running: No
          Replicate_Do_DB:
          Replicate_Ignore_DB:
          Replicate_Do_Table:
          Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
          Replicate_Wild_Ignore_Table:
          Last_Errno: 1205
          Last_Error: Lock wait timeout exceeded; try restarting transaction
          Skip_Counter: 0
          Exec_Master_Log_Pos: 70791548
          Relay_Log_Space: 593076835
          Until_Condition: None
          Until_Log_File:
          Until_Log_Pos: 0
          Master_SSL_Allowed: No

          ==========

          2024-04-12 3:09:29 61450 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
          2024-04-12 3:09:29 61450 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 32-3200-25580091948, Internal MariaDB error code: 1205
          2024-04-12 3:09:29 61450 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
          2024-04-12 3:09:29 61450 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
          548; GTID position '32-3200-25580091947'
          2024-04-12 3:09:29 61448 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-04-12 3:09:29 61448 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-04-12 3:09:29 61448 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
          548; GTID position '32-3200-25580091947'
          2024-04-12 3:09:29 61451 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-04-12 3:09:29 61451 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-04-12 3:09:29 61451 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
          548; GTID position '32-3200-25580091947'
          2024-04-12 3:09:29 61445 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-04-12 3:09:29 61445 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
          2024-04-12 3:09:29 61445 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
          548; GTID position '32-3200-25580091947'

          =======

          [(none)] > xa recover format = 'SQL';
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

          formatID gtrid_length bqual_length data

          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

          1096044365 35 23 X'3138322e3139332e32312e3132302e746d313731323838393338393634383135363830',X'3138322e3139332e32312e3132302e746d353335393130',1096044365

          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
          1 row in set (0.000 sec)

          [(none)] > show global variables like 'slave_transaction_retry_errors';
          ---------------------------------------------------------------------------------------------------------------------------+

          Variable_name Value

          ---------------------------------------------------------------------------------------------------------------------------+

          slave_transaction_retry_errors 1158,1159,1160,1161,1205,1213,1429,2013,12701,1158,1159,1160,1161,1205,1213,1429,2013,12701

          ---------------------------------------------------------------------------------------------------------------------------+
          1 row in set (0.001 sec)

          ===

          [(none)] > show slave status\G;

          Slave_IO_State: Queueing master event to the relay log
          Master_Host: xxxxx
          Master_User: repl
          Master_Port: 3xxx
          Connect_Retry: 5
          Master_Log_File: mysql-bin.002980
          Read_Master_Log_Pos: 322741798
          Relay_Log_File: mysql-relay.000002
          Relay_Log_Pos: 3526
          Relay_Master_Log_File: mysql-bin.002980
          Slave_IO_Running: Yes
          Slave_SQL_Running: No
          Replicate_Do_DB:
          Replicate_Ignore_DB:
          Replicate_Do_Table:
          Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
          Replicate_Wild_Ignore_Table:
          Last_Errno: 1397
          Last_Error: Error 'XAER_NOTA: Unknown XID' on query. Default database: 'efl_drive_db008'. Query: 'XA COMMIT X'3138322e3139332e32312e3132312e746d313731323838393338393539363138393937',X'3138322e3139332e32312e3132312e746d353431313539',1096044365'
          Skip_Counter: 0
          Exec_Master_Log_Pos: 70799936
          Relay_Log_Space: 99798356
          Until_Condition: None
          Until_Log_File:
          Until_Log_Pos: 0
          Master_SSL_Allowed: No

          Tony Tran Tran Van Thanh added a comment - - edited As you can see the logfile: ====== [(none)] > show slave status\G; Slave_IO_State: Waiting for master to send event Master_Host: xxxxxxxxxxx Master_User: repl Master_Port: xxxx Connect_Retry: 5 Master_Log_File: mysql-bin.0029xx Read_Master_Log_Pos: 468143153 Relay_Log_File: mysql-relay.00xx Relay_Log_Pos: 28394209 Relay_Master_Log_File: mysql-bin.00xxx Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1205 Last_Error: Lock wait timeout exceeded; try restarting transaction Skip_Counter: 0 Exec_Master_Log_Pos: 70791548 Relay_Log_Space: 593076835 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No ========== 2024-04-12 3:09:29 61450 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable. 2024-04-12 3:09:29 61450 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 32-3200-25580091948, Internal MariaDB error code: 1205 2024-04-12 3:09:29 61450 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205 2024-04-12 3:09:29 61450 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791 548; GTID position '32-3200-25580091947' 2024-04-12 3:09:29 61448 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 2024-04-12 3:09:29 61448 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 2024-04-12 3:09:29 61448 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791 548; GTID position '32-3200-25580091947' 2024-04-12 3:09:29 61451 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 2024-04-12 3:09:29 61451 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 2024-04-12 3:09:29 61451 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791 548; GTID position '32-3200-25580091947' 2024-04-12 3:09:29 61445 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 2024-04-12 3:09:29 61445 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 2024-04-12 3:09:29 61445 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791 548; GTID position '32-3200-25580091947' ======= [(none)] > xa recover format = 'SQL'; ----------- ------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------+ formatID gtrid_length bqual_length data ----------- ------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------+ 1096044365 35 23 X'3138322e3139332e32312e3132302e746d313731323838393338393634383135363830',X'3138322e3139332e32312e3132302e746d353335393130',1096044365 ----------- ------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.000 sec) [(none)] > show global variables like 'slave_transaction_retry_errors'; ------------------------------- --------------------------------------------------------------------------------------------+ Variable_name Value ------------------------------- --------------------------------------------------------------------------------------------+ slave_transaction_retry_errors 1158,1159,1160,1161,1205,1213,1429,2013,12701,1158,1159,1160,1161,1205,1213,1429,2013,12701 ------------------------------- --------------------------------------------------------------------------------------------+ 1 row in set (0.001 sec) === [(none)] > show slave status\G; Slave_IO_State: Queueing master event to the relay log Master_Host: xxxxx Master_User: repl Master_Port: 3xxx Connect_Retry: 5 Master_Log_File: mysql-bin.002980 Read_Master_Log_Pos: 322741798 Relay_Log_File: mysql-relay.000002 Relay_Log_Pos: 3526 Relay_Master_Log_File: mysql-bin.002980 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1397 Last_Error: Error 'XAER_NOTA: Unknown XID' on query. Default database: 'efl_drive_db008'. Query: 'XA COMMIT X'3138322e3139332e32312e3132312e746d313731323838393338393539363138393937',X'3138322e3139332e32312e3132312e746d353431313539',1096044365' Skip_Counter: 0 Exec_Master_Log_Pos: 70799936 Relay_Log_Space: 99798356 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No

          People

            Unassigned Unassigned
            Tony Tran Tran Van Thanh
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.