[MDEV-22845] wsrep_sst_mariabackup fails with locking timeout at the end Created: 2020-06-09 Updated: 2020-07-08 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Galera SST, mariabackup |
| Affects Version/s: | 10.3.14 |
| Fix Version/s: | 10.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Nicolai Langfeldt | Assignee: | Vladislav Lesin |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | galera, innodb, lock | ||
| Environment: |
CentOS 7.6.1810, packages from http://yum.mariadb.org/10.3/centos, server has ample CPU (load is always low) and SSD disks |
||
| Description |
|
We have been running a Galera cluster for well over a year. After user training and switching to SST via mariabackup we have had a very stable environment.

Overnight yesterday one of the nodes fell out of the cluster. We could not find any explanation for that; the only thing we found was that an SST started and ran through several loops during the night. The other nodes (one MariaDB, one garb) were happy and working well. In the morning it had stopped trying to SST, and not understanding what went wrong, we restarted it and looked around. We finally looked in the mariabackup.backup.log file and found this at the end: {{

I experimented with running "FLUSH TABLES WITH READ LOCK" on the running node, and the command either completed in 1-6 seconds, hung indefinitely, or failed at once with "Lock wait timeout exceeded; try restarting transaction". As far as we could gather from the mariabackup documentation, mariabackup will attempt to acquire the lock and fail if it is not immediately able to get it.

My colleague approached this from a different direction and made one change to our /etc/my.cnf.d/galera.cnf file: inserting "innodb_lock_wait_timeout=100", before I could propose adding {--ftwrl-wait-timeout=#} in the script. After this the SST completed without issue and we have lived happily ever after.

We think that the possible need to wait for the lock should perhaps be part of the Galera documentation. Also, not waiting at all is maybe a bit harsh? |
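
For anyone hitting the same problem, the workaround described above might look roughly like the fragment below. This is only a sketch: the report names the file and the setting but not the option group, so placing it under [mysqld] is an assumption. The commented-out alternative is the untested {--ftwrl-wait-timeout=#} idea; putting it in a [mariabackup] group rather than in the SST script, and the value 100, are both assumptions made for illustration.

```ini
# /etc/my.cnf.d/galera.cnf (fragment)

# Workaround from the report. Assumption: the line belongs in the
# [mysqld] group; the report only says it was inserted into this file.
[mysqld]
innodb_lock_wait_timeout=100

# Untested alternative considered in the report. Assumptions: the option
# is supplied through a [mariabackup] group instead of editing the SST
# script, and 100 is an illustrative value, not a recommendation.
#[mariabackup]
#ftwrl-wait-timeout=100
```

Only the first setting was actually tried on our cluster; the second is listed so that it can be evaluated separately.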