After upgrading to MariaDB-10.0.22-Galera we seem to trigger a condition which causes a complete lockup of the database. However, neither the error log nor the InnoDB engine status seems to recognize it as such.
Setup: Three CentOS 6 x86_64 servers, two with MariaDB-10.0.22-Galera and one with the Galera Arbitrator daemon (garbd). The first of the MariaDB servers (Server1) is used for all application queries, the second one (Server2) for running incremental backups (every 5 minutes).
When we run a performance test on the application (and thus create a simple query load on Server1), the database enters a locked state after some time.
The conditions we have been able to isolate:
- Servers are running MariaDB-10.0.22-Galera (does not occur with 10.0.21)
- Servers are running in cluster mode
- Application queries run against Server1
- Backup queries run against Server2
The queries used by the backup software on Server2 are attached (backup.txt). The processlist of Server1 after occurrence of the issue is attached (processlist.txt), along with a gdb backtrace of all threads on Server1 (backtrace.txt). Server2 has an empty processlist at this time, and backups can still continue to run from Server2 while Server1 is in this locked state. The locked state does not time out; the only option for recovery is a mysqld restart.
Unfortunately I have not yet been able to create a test case that reproduces the issue on an isolated system. I can, however, trigger the issue at will by running the application performance test in our setup, so I can gather additional information if needed.
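For anyone wanting to gather the same diagnostics, here is a minimal Python sketch of how the backtrace and a processlist summary could be collected. It is not the exact procedure used for the attachments. The function names, the output file name and the assumption that the processlist was saved as tab-separated batch output from the mysql client are illustrative only.

```python
import subprocess
from collections import Counter


def gdb_backtrace_command(pid):
    """Build a gdb invocation that dumps a backtrace of every thread."""
    # -p attaches to the running process, -batch exits after the commands,
    # -ex runs the given gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtrace(pid, out_path="backtrace.txt"):
    """Run gdb against mysqld and write the output to a file (needs root)."""
    with open(out_path, "w") as fh:
        subprocess.run(gdb_backtrace_command(pid), stdout=fh,
                       stderr=subprocess.STDOUT, check=False)


def count_states(processlist_text):
    """Count threads per State in tab-separated SHOW PROCESSLIST output.

    Assumes the first non-empty line is the header row containing 'State'.
    """
    lines = [ln for ln in processlist_text.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    state_idx = lines[0].split("\t").index("State")
    counts = Counter()
    for ln in lines[1:]:
        fields = ln.split("\t")
        state = fields[state_idx] if len(fields) > state_idx else ""
        counts[state or "(none)"] += 1
    return counts
```

Calling `count_states(open("processlist.txt").read())` on a saved processlist gives a quick view of which thread states dominate once the lockup has occurred, which may help when comparing runs.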
Any help to find and resolve this issue would be greatly appreciated.