[MDEV-18120] MariaDB deadlocked galera cluster Created: 2019-01-02  Updated: 2019-12-12  Resolved: 2019-12-12

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.0.37-galera, 10.2.20
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Dima Kovalenko Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Ubuntu 18, DigitalOcean, 4 node cluster, MaxScale



 Description   

I got a notification from DigitalOcean that one of the nodes in my cluster was on a blade with faulty hardware. They decided to hot-migrate my VM to a different blade.

After the migration completed, the node never rejoined the cluster and instead sat idle. After a day of being idle, the whole cluster went into a sync deadlock.

Reads were still allowed; however, writes would get stuck in the "Committing" state and never finish. Rebooting the offending node did not fix the issue, because the remaining nodes were still stuck in the sync deadlock. I had to reboot the whole cluster to get reads and writes working again.

1) I cannot reproduce this issue, because I do not have the ability to change network/VM settings on the back end the way the DigitalOcean admins can.
2) There are no stack traces or errors in the logs, so I cannot provide more information.
3) Running "service mysql stop" did not kill the wsrep process; I had to kill -9 it myself.
4) I have a large DB (300GB). The nodes got so far out of sync that after the reboot, the whole DB had to be rsynced over to the new node. This took twice as long as when I initially added the nodes.



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-12-12 ]

Support for 10.0-galera has ended.

Generated at Thu Feb 08 08:41:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.