Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
11.4.8
-
None
-
None
Description
Our MariaDB 11.4.8 + Galera 26.4.24 cluster hangs indefinitely when parallel appliers deadlock on a WITHOUT OVERLAPS unique key.
The deadlock locks up the entire cluster because applier (brute-force, BF) transactions are not permitted to be rolled back, so neither side of the deadlock can be chosen as a victim.
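For readers unfamiliar with the constraint type involved, here is a minimal sketch of a table with a WITHOUT OVERLAPS unique key. It is not one of the attached tests and is not claimed to reproduce the hang; the table name `bookings`, the period name `booked`, and all column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: one row per room booking over an application-time period.
CREATE TABLE bookings (
  id        INT  NOT NULL PRIMARY KEY,
  room_id   INT  NOT NULL,
  date_from DATE NOT NULL,
  date_to   DATE NOT NULL,
  PERIOD FOR booked (date_from, date_to),
  -- No two rows for the same room_id may have overlapping periods.
  UNIQUE (room_id, booked WITHOUT OVERLAPS)
) ENGINE=InnoDB;

-- Adjacent periods for the same room do not overlap (the end date is exclusive).
INSERT INTO bookings VALUES (1, 101, '2025-01-01', '2025-01-10');
INSERT INTO bookings VALUES (2, 101, '2025-01-10', '2025-01-20');

-- This period overlaps row 1, so the insert is rejected with a duplicate-key error.
INSERT INTO bookings VALUES (3, 101, '2025-01-05', '2025-01-08');
```

Enforcing such a key means checking neighbouring periods rather than a single exact index entry, which is presumably why concurrent appliers writing nearby rows can end up waiting on each other.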
This took a long time to narrow down because our test cluster has higher latency between nodes than our production cluster, which made the bug impossible to reproduce in that environment. Fortunately, the MariaDB test suite runs multiple instances on the same machine, which is the best case for latency, and we were able to use it to reproduce the bug consistently.
I've attached two tests I wrote while dissecting this bug. The single-server case isn't too bad, but the Galera case is very problematic when we trigger it.
Based on the nature of the bug, I believe I've mitigated it in our own usage by setting wsrep_slave_threads = 1, which leaves a single applier thread so that replicated write sets are applied serially.
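As a sketch of how that mitigation might be applied at runtime, assuming a session with sufficient privileges on each node (the variable is global and dynamic in MariaDB):

```sql
-- Check the current number of applier threads on this node.
SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';

-- Use a single applier thread so replicated write sets are applied serially.
-- This affects only the node it is run on, so repeat it on every node.
SET GLOBAL wsrep_slave_threads = 1;
```

A value set this way does not survive a server restart; to make it permanent, the same `wsrep_slave_threads = 1` line would also need to go into each node's server configuration file. The trade-off is lower apply throughput, since parallel applying is disabled.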
Even so, it seems like a notable issue that valid queries can deadlock the entire cluster.
In addition to 11.4.8, I've also verified this still affects git HEAD.