[MDEV-9224] Database lockup on flush in galera - Jira

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.0.22-galera
Fix Version/s: 10.1.11, 10.0.23-galera
Component/s: Galera, Storage Engine - XtraDB
Labels:
- galera
- upstream
- xtradb

Sprint: 10.0.23

Description

After upgrading to MariaDB-10.0.22-Galera, we seem to trigger a condition that causes a complete lockup of the database. However, neither the error log nor the InnoDB engine status seems to recognize it as such.

Setup: Three CentOS 6 x86_64 servers, two with MariaDB-10.0.22-Galera and one with the Galera Arbitrator daemon (garbd). The first of the MariaDB servers (Server1) is used for all application queries, the second one (Server2) for running incremental backups (every 5 minutes).

When we run a performance test on the application (and thus create a simple query load on Server1), the database enters a locked state after some time.

The conditions we have been able to isolate:

- Servers are running MariaDB-10.0.22-Galera (does not occur with 10.0.21)
- Servers are running in cluster mode
- Application queries run against Server1
- Backup queries run against Server2

The queries used by the backup software on Server2 are attached (backup.txt). The processlist of Server1 after the issue occurs is attached (processlist.txt), along with a gdb backtrace of all threads on Server1 (backtrace.txt). Server2 has an empty processlist at this time, and backups can still run from Server2 while Server1 is in this locked state. The locked state does not time out; the only option for recovery is a mysqld restart.

Unfortunately, I have not yet been able to create a test case that reproduces the issue on an isolated system. I can, however, trigger the issue at will by running the application performance test in our setup, so I can gather additional information.
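
When looking at a processlist dump like the attached one, it can help to see how many threads are stuck in each state. The following is only a minimal sketch, assuming the dump is the tab-separated batch output of `SHOW FULL PROCESSLIST` with a header row containing a `State` column; the format of the attached processlist.txt is not confirmed, and the function name `count_thread_states` is hypothetical.

```python
import csv
import io
from collections import Counter


def count_thread_states(processlist_text):
    """Count threads per State in tab-separated SHOW FULL PROCESSLIST output.

    Assumption: the first line is a header row that includes a "State" column.
    """
    reader = csv.DictReader(
        io.StringIO(processlist_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # query text in Info may contain quote characters
    )
    # Threads with an empty or missing State are grouped under "(none)".
    return Counter((row.get("State") or "(none)") for row in reader)


def print_state_summary(processlist_text):
    """Print the states ordered from most to least common."""
    for state, count in count_thread_states(processlist_text).most_common():
        print(f"{count:6d}  {state}")
```

A large number of threads sharing one wait state would point at the lock they are all queued behind, which can then be matched against the gdb backtrace.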

Any help to find and resolve this issue would be greatly appreciated.

Attachments

- mysql_repeat.sh (0.4 kB, 2015-12-17 15:32, Nirbhay Choubey)
- Server1 backtrace.txt (1.01 MB, 2015-12-01 15:08, Thijs Houtenbos)
- Server1 processlist.txt (141 kB, 2015-12-01 15:08, Thijs Houtenbos)
- Server2 backup.txt (26 kB, 2015-12-01 15:08, Thijs Houtenbos)

Activity

Nirbhay Choubey (Inactive) added a comment - 2015-12-22 05:42

seppo, can you please review the following patch?
http://lists.askmonty.org/pipermail/commits/2015-December/008781.html

Brad Jorgensen added a comment - 2015-12-22 22:50 - edited

I think I'm experiencing the same issue when running innobackupex against my Galera cluster. What I know so far is that the server locks up when xtrabackup runs its FLUSH TABLES WITH READ LOCK (FTWRL). I found that it is only a problem with more than one node in the cluster, so for now I have to shut down all but one node to run a backup. All of our application traffic is currently directed to one node; however, a few monitoring queries write to a single table on every node in the cluster. For us, the problem arises regardless of which node the backup runs on.

I wrote a message to the mailing list that has some more information on my problem:
https://lists.launchpad.net/maria-discuss/msg03178.html
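
The trigger described in this comment could be exercised with a loop like the one below. This is a rough sketch under stated assumptions, not the attached mysql_repeat.sh and not a confirmed reproduction: it assumes that repeatedly taking and releasing the global read lock on one node, while another node carries write load, is enough to hit the lockup. The host and user names are hypothetical, and credentials are assumed to come from a client option file.

```python
import subprocess
import time

# One backup-style cycle: take the global read lock, then release it.
FTWRL_CYCLE = ("FLUSH TABLES WITH READ LOCK", "UNLOCK TABLES")


def build_mysql_command(host, user, statements):
    """Return the argv for a mysql client call that runs all statements
    in a single session (the lock and unlock must share one connection)."""
    sql = "; ".join(statements) + ";"
    return ["mysql", "--host", host, "--user", user, "--execute", sql]


def run_cycles(host, user, cycles, pause_seconds=1.0, runner=subprocess.run):
    """Run the FTWRL cycle repeatedly against one node.

    A cycle that does not finish within 60 seconds raises
    subprocess.TimeoutExpired, which is treated here as a sign of a hang.
    """
    for _ in range(cycles):
        runner(build_mysql_command(host, user, FTWRL_CYCLE), check=True, timeout=60)
        time.sleep(pause_seconds)


if __name__ == "__main__":
    # Hypothetical node and account names; adjust to the cluster under test.
    run_cycles("server2", "backup", cycles=100)
```

The `runner` parameter exists only so the loop can be checked without a database; in normal use it stays at `subprocess.run`.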

Nirbhay Choubey (Inactive) added a comment - 2015-12-24 00:02

https://github.com/MariaDB/server/commit/fe4047dc39090f626408d91999dd4a8f0869ab13
https://github.com/MariaDB/server/commit/89a264809d660fb5a4e7d43e9324b1f529a3a1d7

People

Assignee: Nirbhay Choubey (Inactive)
Reporter: Thijs Houtenbos
Votes: 1
Watchers: 5

Dates

Created: 2015-12-01 15:09
Updated: 2015-12-24 11:37
Resolved: 2015-12-24 00:02
