Details
Bug
Status: Open
Major
Resolution: Unresolved
10.6.5, 10.3.34, 10.8.5
None
3-node Galera multi-master cluster.
MariaDB 10.6.5 and Galera 26.4.9
Description
We have been hitting a random, intermittent issue with our 3-node Galera multi-master cluster since 10.2.
MDEV-24915 seems to have resolved many of the locking issues. However, the problem still occurs in 10.6.
When this BF lock issue occurs, the affected node blocks the whole cluster, and none of our clients can read or write.
The only way out is to kill the affected node and let it rejoin via IST.
2022-03-17 20:08:27 6 [Note] InnoDB: WSREP: BF lock wait long for trx:0x1a62643 query: UPDATE queue
SET status='working',
host_uid='node-5',
working_timestamp=UNIX_TIMESTAMP(),
id=LAST_INSERT_ID(id)
WHERE status='queued' AND (consumer_type=0 OR (7 & consumer_type = consumer_type)) AND (job_type_id IN (9, 10, 12, 15, 16, 18, 19, 20, 32, 24, 25, 28, 29, 30, 31, 33)) AND ((job_type_id IN (13) AND volume IN (4)) OR (job_type_id NOT IN (13))) AND ((job_type_id IN (14, 15, 26, 31) AND volume IN (4)) OR (job_type_id NOT IN (14, 15, 26, 31))) ORDER BY priority, id ASC LIMIT 1
2022-03-17 20:09:17 2 [Note] InnoDB: WSREP: BF lock wait long for trx:0x1a62644 query: UPDATE queue
SET status='working',
host_uid='node-1',
working_timestamp=UNIX_TIMESTAMP(),
id=LAST_INSERT_ID(id)
WHERE status='queued' AND (consumer_type=0 OR (7 & consumer_type = consumer_type)) AND (job_type_id IN (9, 10, 12, 15, 16, 18, 19, 20, 32, 24, 25, 28, 29, 30, 31, 33)) AND ((job_type_id IN (13) AND volume IN (1)) OR (job_type_id NOT IN (13))) AND ((job_type_id IN (14, 15, 26, 31) AND volume IN (1)) OR (job_type_id NOT IN (14, 15, 26, 31))) ORDER BY priority, id ASC LIMIT 1
These messages keep repeating in the error log.
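To quantify how often the messages repeat and which transactions are involved, the error log can be scanned for the "BF lock wait long" header lines. The following is only a rough sketch: the regular expression is derived from the two log lines quoted above and may not match other MariaDB versions or log formats, and the function name `count_bf_waits` is made up for illustration.

```python
import re
from collections import Counter

# Pattern based solely on the two log lines quoted in this report (assumption:
# other versions may format the note differently).
BF_WAIT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<thread>\d+) \[Note\] "
    r"InnoDB: WSREP: BF lock wait long for trx:(?P<trx>0x[0-9a-fA-F]+)"
)


def count_bf_waits(lines):
    """Return a Counter mapping transaction id to the number of BF lock wait notes."""
    counts = Counter()
    for line in lines:
        match = BF_WAIT.match(line)
        if match:
            counts[match.group("trx")] += 1
    return counts


# Header lines copied from the excerpt above; continuation lines of the
# query do not match the pattern and are skipped.
sample = [
    "2022-03-17 20:08:27 6 [Note] InnoDB: WSREP: BF lock wait long for trx:0x1a62643 query: UPDATE queue",
    "SET status='working',",
    "2022-03-17 20:09:17 2 [Note] InnoDB: WSREP: BF lock wait long for trx:0x1a62644 query: UPDATE queue",
]

for trx, n in count_bf_waits(sample).items():
    print(trx, n)
```

For a real log, an open file object can be passed in place of `sample`, since the function only iterates over lines. A count that keeps growing for the same transaction id would suggest one stuck transaction rather than many short waits.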
Is this a known issue?