[MDEV-29492] WSREP: Failed to apply write set (deadlock with mysqldump) Created: 2022-09-07  Updated: 2022-10-30

Status: Open
Project: MariaDB Server
Component/s: Galera, wsrep
Affects Version/s: 10.4.22
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Mark Reibert Assignee: Seppo Jaakola
Resolution: Unresolved Votes: 0
Labels: None
Environment:

MariaDB 10.4.22 / Galera 26.4.9 on Ubuntu 20.04



 Description   

Three-node Galera cluster running in master/slave mode. Intermittently on the slaves we observe wsrep fail to apply a writeset due to a deadlock at the moment a mysqldump backup kicks off. It is important to note we run mysqldump --single-transaction, hence the global read lock obtained by mysqldump exists only briefly. Restarting the node always results in a SST.

Here are the logs from one such failure:

2022-09-01  3:00:01 10 [Note] WSREP: MDL conflict db=zabbix table=history ticket=6 solved by abort
2022-09-01  3:00:01 11109709 [Note] WSREP: MDL conflict db=zabbix table=history ticket=6 solved by abort
2022-09-01  3:00:01 11109709 [Note] WSREP: MDL conflict db=zabbix table=history ticket=6 solved by abort
2022-09-01  3:00:01 11109709 [Note] WSREP: MDL conflict db=zabbix table=history ticket=6 solved by abort
2022-09-02  3:00:01 11198269 [Note] WSREP: MDL conflict db=zabbix table=history ticket=4 solved by abort
2022-09-02  3:00:01 10 [Note] WSREP: MDL conflict db=zabbix table=item_discovery ticket=6 solved by abort
2022-09-02  3:00:01 41 [Note] WSREP: MDL conflict db=zabbix table=item_discovery ticket=6 solved by abort
2022-09-02  3:00:01 11198269 [Note] WSREP: MDL conflict db=zabbix table=item_discovery ticket=6 solved by abort
2022-09-02  3:00:01 11198269 [Note] WSREP: MDL conflict db=zabbix table=item_discovery ticket=6 solved by abort
2022-09-02  3:00:01 11198269 [Note] WSREP: MDL conflict db=zabbix table=item_discovery ticket=6 solved by abort
2022-09-03  3:00:01 11286828 [Note] WSREP: MDL conflict db=zabbix table=history ticket=4 solved by abort
2022-09-03  3:00:01 11286828 [Note] WSREP: MDL conflict db=zabbix table=history ticket=4 solved by abort
2022-09-03  3:00:01 18 [Note] WSREP: MDL conflict db=zabbix table=history ticket=4 solved by abort
2022-09-03  3:00:01 50 [Note] WSREP: MDL conflict db=zabbix table=history ticket=6 solved by abort
2022-09-03  3:00:01 11286828 [Note] WSREP: MDL conflict db=zabbix table=history ticket=4 solved by abort
2022-09-03  3:00:01 11286828 [Note] WSREP: MDL conflict db=zabbix table=trends_uint ticket=4 solved by abort
2022-09-03  3:00:01 41 [Note] WSREP: MDL conflict db=zabbix table=trends ticket=6 solved by abort
2022-09-03  3:00:01 41 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1213, fatal: 0 wsrep = (exec_mode: 2 conflict_state: 0 seqno: 8250807862)
2022-09-03  3:00:01 41 [ERROR] Slave SQL: Error executing row event: 'Deadlock found when trying to get lock; try restarting transaction', Internal MariaDB error code: 1213
2022-09-03  3:00:01 41 [Warning] WSREP: Event 7 Write_rows_v1 apply failed: 1213, seqno 8250807862
2022-09-03  3:00:01 41 [ERROR] WSREP: Failed to apply write set: gtid: 810a5d0c-231c-11e9-b756-dbb7899d7dd0:8250807862 server_id: 9fc16e18-78a4-11ec-8921-daf10d0b3600 client_id: 71633335 trx_id: 47048095835 flags: 3 (start_transaction | commit)
2022-09-03  3:00:01 41 [Note] WSREP: Closing send monitor...
2022-09-03  3:00:01 41 [Note] WSREP: Closed send monitor.
2022-09-03  3:00:01 41 [Note] WSREP: gcomm: terminating thread
2022-09-03  3:00:01 41 [Note] WSREP: gcomm: joining thread
2022-09-03  3:00:01 41 [Note] WSREP: gcomm: closing backend
2022-09-03  3:00:01 41 [Note] WSREP: view(view_id(NON_PRIM,60f1bcc9-8702,254) memb {
60f1bcc9-8702,0
} joined {
} left {
} partitioned {
9fc16e18-8921,0
f6d62ce9-8fe8,0
})
2022-09-03  3:00:01 41 [Note] WSREP: PC protocol downgrade 1 -> 0
2022-09-03  3:00:01 41 [Note] WSREP: view((empty))
2022-09-03  3:00:01 41 [Note] WSREP: gcomm: closed
2022-09-03  3:00:01 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2022-09-03  3:00:01 0 [Note] WSREP: Flow-control interval: [2000000, 2000000]
2022-09-03  3:00:01 0 [Note] WSREP: Received NON-PRIMARY.
2022-09-03  3:00:01 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 8250807872)
2022-09-03  3:00:01 0 [Note] WSREP: New SELF-LEAVE.
2022-09-03  3:00:01 0 [Note] WSREP: Flow-control interval: [2000000, 2000000]
2022-09-03  3:00:01 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2022-09-03  3:00:01 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8250807872)
2022-09-03  3:00:01 0 [Note] WSREP: RECV thread exiting 0: Success

and so on, with other message indicating the node is leaving the cluster.



 Comments   
Comment by Mark Reibert [ 2022-09-07 ]

Is there a chance this is related to MDEV-26803 or MDEV-26818? I ask because of the common 1213 error. Note I am running 10.4.22 so do not have the fix for MDEV-26803, although I do have cert.optimistic_pa = no.

Generated at Thu Feb 08 10:09:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.