Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
Hi team,
After upgrade mariadb 10.3.16 to 10.6.15 version. We have some slave instances with over 1.2 TB size usually occur: 'Lock wait timeout exceeded' event - Error code 1205. Replicate can not sync after event occured.
We see 'slave-transacsion-retry-error' parameter already set 1205 to list. So can you help me check it?
Attachments
Activity
using XA transactions (XA PREPARE ...) ==> Yes, right. We are using and during replicate problem, It sometimes occur 'XA PREPARED' state.
As you can see the logfile:
======
[(none)] > show slave status\G;
Slave_IO_State: Waiting for master to send event
Master_Host: xxxxxxxxxxx
Master_User: repl
Master_Port: xxxx
Connect_Retry: 5
Master_Log_File: mysql-bin.0029xx
Read_Master_Log_Pos: 468143153
Relay_Log_File: mysql-relay.00xx
Relay_Log_Pos: 28394209
Relay_Master_Log_File: mysql-bin.00xxx
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1205
Last_Error: Lock wait timeout exceeded; try restarting transaction
Skip_Counter: 0
Exec_Master_Log_Pos: 70791548
Relay_Log_Space: 593076835
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
==========
2024-04-12 3:09:29 61450 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
2024-04-12 3:09:29 61450 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 32-3200-25580091948, Internal MariaDB error code: 1205
2024-04-12 3:09:29 61450 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
2024-04-12 3:09:29 61450 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
548; GTID position '32-3200-25580091947'
2024-04-12 3:09:29 61448 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2024-04-12 3:09:29 61448 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2024-04-12 3:09:29 61448 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
548; GTID position '32-3200-25580091947'
2024-04-12 3:09:29 61451 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2024-04-12 3:09:29 61451 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2024-04-12 3:09:29 61451 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
548; GTID position '32-3200-25580091947'
2024-04-12 3:09:29 61445 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2024-04-12 3:09:29 61445 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2024-04-12 3:09:29 61445 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.002980' position 70791
548; GTID position '32-3200-25580091947'
=======
[(none)] > xa recover format = 'SQL';
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
formatID | gtrid_length | bqual_length | data |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1096044365 | 35 | 23 | X'3138322e3139332e32312e3132302e746d313731323838393338393634383135363830',X'3138322e3139332e32312e3132302e746d353335393130',1096044365 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)
[(none)] > show global variables like 'slave_transaction_retry_errors';
---------------------------------------------------------------------------------------------------------------------------+
Variable_name | Value |
---------------------------------------------------------------------------------------------------------------------------+
slave_transaction_retry_errors | 1158,1159,1160,1161,1205,1213,1429,2013,12701,1158,1159,1160,1161,1205,1213,1429,2013,12701 |
---------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)
===
[(none)] > show slave status\G;
Slave_IO_State: Queueing master event to the relay log
Master_Host: xxxxx
Master_User: repl
Master_Port: 3xxx
Connect_Retry: 5
Master_Log_File: mysql-bin.002980
Read_Master_Log_Pos: 322741798
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 3526
Relay_Master_Log_File: mysql-bin.002980
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1397
Last_Error: Error 'XAER_NOTA: Unknown XID' on query. Default database: 'efl_drive_db008'. Query: 'XA COMMIT X'3138322e3139332e32312e3132312e746d313731323838393338393539363138393937',X'3138322e3139332e32312e3132312e746d353431313539',1096044365'
Skip_Counter: 0
Exec_Master_Log_Pos: 70799936
Relay_Log_Space: 99798356
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
You didn't mention it in your description, but the Label "XA" suggests you are using XA transactions (XA PREPARE ...) ?
If so, XA is known to be broken from 10.5, see MDEV-32020 and similar bugs.
I'd suggest to use stick to 10.4 for now in this case.