[MDEV-9224] Database lockup on flush in galera Created: 2015-12-01 Updated: 2015-12-24 Resolved: 2015-12-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Storage Engine - XtraDB |
| Affects Version/s: | 10.0.22-galera |
| Fix Version/s: | 10.1.11, 10.0.23-galera |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Thijs Houtenbos | Assignee: | Nirbhay Choubey (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | galera, upstream, xtradb | ||
| Attachments: |
|
| Sprint: | 10.0.23 |
| Description |
|
After upgrading to MariaDB-10.0.22-Galera, we seem to trigger a condition that causes a complete lockup of the database. However, neither the error log nor the InnoDB engine status seems to recognize it as such. Setup: three CentOS 6 x86_64 servers, two running MariaDB-10.0.22-Galera and one running the Galera arbitrator daemon (garbd). The first MariaDB server (Server1) handles all application queries, and the second (Server2) runs incremental backups every 5 minutes. When we run a performance test on the application (and thus put a simple query load on Server1), the database enters a locked state after some time. The conditions we have been able to isolate:
The queries used by the backup software on Server2 are attached (backup.txt). Also attached are the processlist of Server1 after the issue occurs (processlist.txt) and a gdb backtrace of all threads on Server1 (backtrace.txt). At this time Server2 has an empty processlist, and backups can still run from Server2 while Server1 is in the locked state. The locked state does not time out; the only way to recover is to restart mysqld. Unfortunately, I have not yet been able to create a test case that reproduces the issue on an isolated system. I can, however, trigger the issue at will by running the application performance test in our setup, so I can gather additional information. Any help finding and resolving this issue would be greatly appreciated. |
| Comments |
| Comment by Nirbhay Choubey (Inactive) [ 2015-12-22 ] |
|
seppo Can you please review the following patch? |
| Comment by Brad Jorgensen [ 2015-12-22 ] |
|
I think I'm experiencing the same issue when running innobackupex against my Galera cluster. What I know so far is that the server locks up when xtrabackup runs its FTWRL (FLUSH TABLES WITH READ LOCK). I found that it is only a problem when there is more than one node in the cluster, so for now I have to shut down all but one node to run a backup. All of our application traffic is currently directed to one node, but a few monitoring queries write to a single table on every node in the cluster. For us, the problem occurs regardless of which node the backup runs on. I posted a message to the mailing list with more information on my problem. |
| Comment by Nirbhay Choubey (Inactive) [ 2015-12-24 ] |
|
https://github.com/MariaDB/server/commit/fe4047dc39090f626408d91999dd4a8f0869ab13 |