Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 10.6.11
Environment: Production
Description
Hi,
We are currently running a 3-node Galera setup (primary) with binlog replication to another 3-node Galera cluster (DR).
MariaDB - 10.6.11
Galera - wsrep_provider_version : 26.4.13 (rfe497aeb)
On one of the nodes (node 3), mysqld.log showed the messages below, and queries started to back up on every node in the cluster. The cluster could not commit any transactions until we stopped the bad node (node 3). After that, the cluster resumed and the other nodes returned to normal. Once the cluster was healthy again, I started MariaDB on the node that had the issue; it rejoined the cluster, and everything was fine.
To sum up, this one node seemed to have a problem that caused the entire cluster to freeze. Here is the log from this node at the time of the incident.
2024-02-08 13:51:16 8 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d778f query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704891!▒▒e
2024-02-08 13:51:16 2 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d7792 query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704888!▒▒e
2024-02-08 13:51:16 9 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d7790 query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704962!▒▒e
2024-02-08 13:51:16 7 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d7791 query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704931!▒▒e
2024-02-08 13:52:06 8 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d778f query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704891!▒▒e
2024-02-08 13:52:06 2 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d7792 query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704888!▒▒e
2024-02-08 13:52:06 7 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d7791 query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704931!▒▒e
2024-02-08 13:52:06 9 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d7790 query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704962!▒▒e
2024-02-08 13:52:56 8 [Note] InnoDB: WSREP: BF lock wait long for trx:0x20d778f query: update TF_MS_MAPPING set REQUEST_STATUS=6,ACV_STATUS='COM',MAS_RETRY_STATUS=MAS_RETRY_STATUS+1,MAS_RETRY_DATETIME=sysdate(),ATF_MSG='IAM014 Transaction has completed successfully' where SERIAL_NO=3494704891!▒▒e
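In the excerpt above, the same four transactions (0x20d778f, 0x20d7790, 0x20d7791, 0x20d7792) are reported again at 50-second intervals. For a longer log, grouping the "BF lock wait long" notes by transaction ID makes that pattern easier to see. The following is a minimal Python sketch, assuming the log line format shown above; the regular expression and the function name `summarize_bf_waits` are illustrative helpers and not part of MariaDB or Galera.

```python
import re
from collections import defaultdict

# Matches the "BF lock wait long" notes in the format quoted above.
# Group 1: timestamp, group 2: thread id, group 3: transaction id.
BF_WAIT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\d+)\s+\[Note\] InnoDB: WSREP: "
    r"BF lock wait long for trx:(0x[0-9a-fA-F]+)"
)


def summarize_bf_waits(lines):
    """Return {trx_id: [(timestamp, thread_id), ...]} for BF lock wait notes.

    Lines that do not match the assumed format are ignored.
    """
    waits = defaultdict(list)
    for line in lines:
        match = BF_WAIT.match(line)
        if match:
            timestamp, thread_id, trx_id = match.groups()
            waits[trx_id].append((timestamp, int(thread_id)))
    return dict(waits)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python bf_waits.py /var/log/mysqld.log
    with open(sys.argv[1], errors="replace") as log:
        summary = summarize_bf_waits(log)
    for trx_id, hits in sorted(summary.items()):
        first, last = hits[0][0], hits[-1][0]
        print(f"{trx_id}: {len(hits)} reports, first {first}, last {last}")
```

A transaction ID that keeps reappearing with a growing report count is one that never got past its lock wait, which matches the stall described here.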
Is this related to the Galera cluster or to the update statements?
When we run the cluster on a single node, we don't get the above messages.
Kindly advise.