[MDEV-28452] wsrep_ready: OFF after MDL BF-BF conflict Created: 2022-05-02 Updated: 2023-11-17
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.6.7, 10.7.3 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Rick Tuk | Assignee: | Seppo Jaakola |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | None |
| Environment: | Ubuntu 20.04 LTS |
| Description |
We are running a two-node cluster plus an arbitrator. After an MDL BF-BF conflict on the second node, Galera sets wsrep_ready to OFF. Logs:
SHOW CREATE TABLE for the tables mentioned in the logs:
This happened in both our test environment (MariaDB 10.7.3) and our acceptance environment (MariaDB 10.6.7).
| Comments |
| Comment by Karl Levik [ 2022-09-02 ] |
I'm seeing the same thing. I ran a couple of ALTER TABLE statements that succeeded on the node where I ran them but got stuck on the other two nodes in the cluster. I only noticed this later, when I ran "SHOW PROCESSLIST;". wsrep_ready had gone to OFF, and the nodes had stopped working. The .err files on the two broken nodes contained messages including "WSREP: MDL BF-BF conflict", whereas the good node had messages such as "WSREP: MDL conflict db=name_of_database table=name_of_table ticket=3 solved by abort". Only by running "kill -9" on the mariadbd process and then restarting it, which triggered an SST, was I able to get the two broken nodes back to a working state.
| Comment by Luke Cousins [ 2022-11-21 ] |
We're hitting this problem roughly once a week. How can we collect more debug information to share with you to help get it fixed? I think this is the same as MDEV-28180.
| Comment by Uwe Beierlein [ 2022-12-13 ] |
We hit this problem as soon as we alter a table or add an index.
| Comment by Kin [ 2023-11-10 ] |
This issue is not related to resources; it happens only when the cluster is running more than one node. I hope there will be a fix for this.
| Comment by Kin [ 2023-11-17 ] |
Update: We have given each node 1 CPU. According to MariaDB, we should be able to set wsrep_slave_threads to twice the number of CPUs, but that also resulted in MDL conflicts. When we set wsrep_slave_threads to 1, however, the issue went away: we ran the same script several times and could no longer reproduce it. It is not clear to me whether this is a CPU resource issue or some kind of race condition between the slave threads.