[MDEV-26316] quorum is lost on full or bigger resync using wsrep on galera multi-master Created: 2021-08-06 Updated: 2021-12-09 Resolved: 2021-12-09 |
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, wsrep |
| Affects Version/s: | 10.5.11, 10.5 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jaroslav | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None |
| Attachments: |
| Description |
|
We unexpectedly hit a disaster scenario on our multi-master Galera setup running version 10.5. During recovery we were able to bring the first node up without issues. It always started as primary with quorum, was safe to bootstrap, and allowed another node to join the cluster.
To prevent any data mismatch and make the restore faster, we recreated the disk so that node-1 was blank and would join the cluster as a new node. This initially worked and a full sync started, but after roughly 19-25 GB of data had been synced, the whole node crashed. The logs showed that quorum was lost and neither node knew which one was now the master. The sync stopped, and the nodes never rejoined to form any kind of cluster. The error log showing where this was thrown is attached.
| Comments |
| Comment by Jaroslav [ 2021-08-11 ] |
This can be closed (deleted). It was caused by our liveness probe kicking in too soon.