[MDEV-30423] Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions
Created: 2023-01-17  Updated: 2023-10-27  Resolved: 2023-01-24

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6.11
Fix Version/s: 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug
Priority: Blocker
Reporter: Pandikrishnan Gurusamy
Assignee: Andrei Elkin
Resolution: Fixed
Votes: 0
Labels: None

Attachments: perf_mon_20230112_033302.log
Issue Links:
Blocks
Issue split
split to MDEV-30459 XID_cache_element can be modified aft... Open
Relates
relates to MDEV-21953 deadlock between BACKUP STAGE BLOCK_C... Closed

 Description   

We are seeing deadlocks on the slave SQL thread during backup, which cause the slave SQL thread to get stuck. The affected version is 10.6.11.

show processlist;

==================================================

Id | User | Host | db | Command | Time | State | Info | Progress
3994 | system user | | bankfrm | Slave_worker | 47515 | Waiting for prior transaction to commit | XA COMMIT ... | 0.000
3996 | system user | | NULL | Slave_worker | 47515 | Waiting for prior transaction to commit | NULL | 0.000
3995 | system user | | NULL | Slave_worker | 47515 | Waiting for prior transaction to commit | NULL | 0.000
3997 | system user | | NULL | Slave_worker | 47515 | Waiting for prior transaction to commit | NULL | 0.000
3991 | system user | | NULL | Slave_SQL | 44523 | Waiting for room in worker thread event queue | NULL | 0.000
5114 | ...... | 10.93.99.158:52012 | NULL | Query | 0 | Optimizing | SELECT Event_schema, Event_name FROM information_schema.EVENTS WHERE Status = 'ENABLED' | 0.000
715112 | ..oper | localhost | NULL | Query | 47515 | Waiting for backup lock | BACKUP STAGE BLOCK_COMMIT | 0.000
724545 | ....frm | 10.93.97.49:44948 | ....frm | Query | 2291 | Waiting for backup lock | XA ROLLBACK ... | 0.000
751381 | ....frm | 10.93.97.50:46208 | ....frm | Query | 1310 | Waiting for backup lock | XA ROLLBACK ... | 0.000
752581 | myoper | localhost | NULL | Query | 0 | starting | show processlist | 0.000

==================================================

Show replica status\G

Connection_name:
Slave_SQL_State: Slave has read all relay log; waiting for more updates
Slave_IO_State: Waiting for master to send event
Master_Host: 10.93.99.101
Master_User: ......
Master_Port: 6603
Connect_Retry: 10
Master_Log_File: bin_log.001019
Read_Master_Log_Pos: 30985295
Relay_Log_File: relay_log.000131
Relay_Log_Pos: 2570503
Relay_Master_Log_File: bin_log.001019
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2570206
Relay_Log_Space: 31275705
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 162
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-2-6301093730
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Slave_DDL_Groups: 46
Slave_Non_Transactional_Groups: 19
Slave_Transactional_Groups: 23249343
Retried_transactions: 1
Max_relay_log_size: 268435456
Executed_log_entries: 86984839
Slave_received_heartbeats: 78647
Slave_heartbeat_period: 5.000
Gtid_Slave_Pos: 1-2-6301025975

==================================================

------------------------------------------

WhoLocksWho

------------------------------------------

Thread 715112 IS LOCKED BY Thread 715112
Thread 715112 IS LOCKED BY Thread 3994
Thread 715112 IS LOCKED BY Thread 3993
Thread 3993 IS LOCKED BY Thread 715112
Thread 3993 IS LOCKED BY Thread 3994
Thread 3993 IS LOCKED BY Thread 3993

------------------------------------------
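
For readers who want to check the WhoLocksWho output mechanically, here is a minimal sketch that treats each "IS LOCKED BY" line as an edge in a wait-for graph and looks for a cycle. The function names (parse_wait_for, find_cycle) are hypothetical, and dropping the self-referencing lines is an assumption made for illustration; this is not the server's own deadlock detection.

```python
import re

# Lines copied verbatim from the WhoLocksWho output above.
WHO_LOCKS_WHO = """\
Thread 715112 IS LOCKED BY Thread 715112
Thread 715112 IS LOCKED BY Thread 3994
Thread 715112 IS LOCKED BY Thread 3993
Thread 3993 IS LOCKED BY Thread 715112
Thread 3993 IS LOCKED BY Thread 3994
Thread 3993 IS LOCKED BY Thread 3993
"""

EDGE_RE = re.compile(r"Thread (\d+) IS LOCKED BY Thread (\d+)")


def parse_wait_for(text):
    """Return {waiter: {blockers}}; self-edges are dropped (assumption)."""
    graph = {}
    for waiter, blocker in EDGE_RE.findall(text):
        if waiter != blocker:
            graph.setdefault(int(waiter), set()).add(int(blocker))
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of thread ids, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: slice out the loop and close it.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(parse_wait_for(WHO_LOCKS_WHO)))
```

On the lines above, the sketch reports that thread 3993 waits on thread 715112 (the BACKUP STAGE BLOCK_COMMIT connection), which in turn waits on 3993.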



 Comments   
Comment by Andrei Elkin [ 2023-01-19 ]

Howdy Brandon.

The patch is pushed as 012c8120399 (HEAD -> bb-10.5-andrei), having passed
only regression tests so far.
Please review it as soon as you can, while I watch the BB processing.

Cheers,

Andrei

Comment by Brandon Nesterenko [ 2023-01-23 ]

Patch is approved. Discussion of findings took place via Slack.

Thanks, Andrei!
