[MDEV-13333] Deadlock failure that does not occur elsewhere Created: 2017-07-16 Updated: 2020-08-25 Resolved: 2018-08-06 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Storage Engine - InnoDB |
| Affects Version/s: | 10.2.2, 10.3.0, 10.1.32 |
| Fix Version/s: | 10.1.36, 10.2.18, 10.3.10 |
| Type: | Bug | Priority: | Major |
| Reporter: | Kern Sibbald | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | regression | ||
| Environment: |
MariaDB installed using repo on your site to Ubuntu 16.04. |
||
| Description |
|
MariaDB gets the following deadlock error:
localhost-dir: sql_create.c:837-5 Fill File table Query failed: INSERT INTO File
This happens when running the Bacula 9.0.1 regression script named three-pool-virtual-test. I am running all instances of MariaDB and MySQL out of the box; I have changed no parameters. This appears to be a false deadlock detection. Note that it is 100% reproducible. |
| Comments |
| Comment by Elena Stepanova [ 2017-07-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Please describe step by step what one needs to do with this tool in order to reproduce the problem, assuming that one has cloned it from github and never had an installation before. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Your request sounds reasonable to me. I will prepare everything you will need, test it, and comment/document it. It will take a couple of days. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The instructions for repeating the problem are simpler than I thought. I have created a file named mariadb-bug and uploaded it to this issue. It is a Linux shell script that runs as non-root and downloads the current Bacula (including some minor modifications I made this morning to make your task easier) into a new subdirectory named "bacula". It then sets up a config file, builds Bacula, and attempts to run the test script that fails on MariaDB 10.2.7. If you have a new installation of MariaDB, your life will be easier if, prior to running the script, you create a MariaDB database and user both named regress. The regress user should have full permissions for the regress database, and it helps if you also give yourself full permissions to access and modify the regress database. Otherwise the script will tell you how to correct it. Of course, you can run everything as root and none of the minor privilege problems will occur. If you run the script tests/three-pool-virtual-test with the environment variable REGRESS_DEBUG=1, you will see all the normal Bacula output plus some debug information, e.g.: REGRESS_DEBUG=1 tests/three-pool-virtual-test | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
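The database preparation described in the comment above might look like the following sketch. The `regress` database and user names come from the comment; the `'localhost'` host part, the absence of a password, the `root` account in the usage lines, and the helper name `regress_setup_sql` are assumptions made for illustration.

```sh
#!/bin/sh
# Print the SQL that prepares a fresh MariaDB for the Bacula regression run.
# Assumptions (not from the report): host part 'localhost', no password.
regress_setup_sql() {
    cat <<'EOF'
CREATE DATABASE IF NOT EXISTS regress;
CREATE USER IF NOT EXISTS 'regress'@'localhost';
GRANT ALL PRIVILEGES ON regress.* TO 'regress'@'localhost';
EOF
}

# Usage (as an account that is allowed to create users; root is assumed here):
#   regress_setup_sql | mysql -u root
# Then, from the directory that contains tests/:
#   REGRESS_DEBUG=1 tests/three-pool-virtual-test
```

The function only prints the statements, so they can be reviewed before being piped to the client.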
| Comment by Alice Sherepa [ 2017-07-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I tried to build Bacula with MariaDB 10.2.7 on a Docker image of Ubuntu 16.04, but could not make it work so far.
Then I tried to apply this patch: https://bugzilla.redhat.com/show_bug.cgi?id=1467706, but without success for now. I will try again later. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, I saw that RedHat ran into that problem. It did not happen on the version I pulled from your binary repo. It appears to be a new difference introduced in 10.2.7; prior MariaDB versions were compatible with MySQL. I suggest commenting out that line and, judging from the problems RedHat had, you will need to either change the name of your library back to agree with the MySQL library name, or simply link the MySQL library name to yours. I am not sure why I did not have those problems – do you have several versions of 10.2.7? Comment out the "reconnect" variable: // mdb->m_instance.reconnect = 1; /* so connection does not timeout */ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
By the way, thanks for pointing me to the RedHat patch. I hadn't seen it. I will apply it here and if it makes both MySQL and MariaDB work, it will be a nice solution. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-07-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I used these instructions to install MariaDB 10.2.7 (https://downloads.mariadb.org/mariadb/repositories/#mirror=dotsrc&distro=Ubuntu&distro_release=xenial--ubuntu_xenial&version=10.2).
Then I added the packages libmariadb-dev and libacl1-dev and got that error when I tried make setup. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The simplest way to change and then test is to edit the original Bacula file that was downloaded. However, that takes time because it rebuilds all of Bacula. The way I do it is: (from the regress directory) This is faster, but the change is in the regress/build subtree and will be lost or overridden on the next "make setup". I think you are close to making it work. By the way, the commands I used to load MariaDB were the same as yours, but I only specified arch=amd64 and I also did | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I ran tests/three-pool-virtual-test and got this output and didn't find any sign of deadlocks
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Your results look consistent with the problem. The test failed, and if you run the test with REGRESS_DEBUG=1 tests/three-pool-virtual-test and capture the output, you will find in that output that a Bacula backup job failed because of what MariaDB says is a deadlock, i.e. you will see the message that I posted in the original bug submission. On all other systems the test runs and reports that it succeeded. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
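Capturing the debug output and searching it for the failure, as suggested above, could be sketched like this. The search patterns are taken from the error text quoted in the description; the helper name `find_deadlock_lines` and the file name `regress.out` are hypothetical.

```sh
#!/bin/sh
# Print, with line numbers, the lines of a captured regression log that
# mention a deadlock or a failed catalog query. Other failures may be
# worded differently, so an empty result does not prove a clean run.
find_deadlock_lines() {
    # $1: path to the captured output
    grep -n -i -E 'deadlock|Query failed' "$1"
}

# Usage:
#   REGRESS_DEBUG=1 tests/three-pool-virtual-test > regress.out 2>&1
#   find_deadlock_lines regress.out || echo "no deadlock messages found"
```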
| Comment by Alice Sherepa [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Please find the output from REGRESS_DEBUG=1 tests/three-pool-virtual-test attached; still no deadlocks there. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The first output you showed as a non-attachment is identical to the errors I have been seeing. There seem to be 4 uploads, but only 2 of them can be accessed, and neither represents a failure. Can you explain the difference between your first execution of Bacula, where the error shows up, and the executions that correspond to the two outputs that I could examine? What is surprising is that your first job produces the following: !!!!! three-pool-virtual-test failed!!! 07:29:09 00:00:12 12s !!!!! Status: zombie=0 backup=2 restore=0 diff=0 verify=0 !!! Bad termination status !!! Status: backup=2 restore=0 diff=0 verify=0 Test owner of bacula-127.0.0.1 is my-name@domain.com This is Bacula failing during a backup. I just tried rebuilding the source and re-running the test, and surprisingly the test succeeds. I really do not understand what was going on, because previously it would always fail. Now it runs. I will try a few more tests to see if I can come up with something, and then get back to you. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
To try to get back to the starting point, I removed MariaDB. Since then I have been unable to re-install it. There are multiple different errors that show up, and now, with the new upstartd, trying to resolve problems starting daemons is complicated. At this point, I give up. I will attempt to reinstall MySQL, and stay with it until some distribution has worked out the problems with Bacula running with MariaDB 10.2.7. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry, I attached the same file 4 times by mistake, then removed 3 of them. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
After removing mariadb-server and mariadb-client, I was unable to re-install them. No matter what I did it failed, and I am reasonably familiar with coaxing apt-get and dpkg along when there are problems. I purged mariadb-server and mariadb-client and deleted the database directory, then reinstalled MySQL, which installed and runs perfectly fine. It is the following version: mysql Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using EditLine wrapper. I still find it very odd that your very first job failed with what looks identical to the failure I saw. All the other jobs you posted did not fail, and much to my amazement, MariaDB stopped failing here too. Was there a change in your Ubuntu package in the past couple of days? During my testing I very likely did an upgrade, which might have pulled a newer (or different) version of MariaDB. In any case, I would suggest that someone other than me install MariaDB on Ubuntu 16.04, purge it, remove the /var/lib/mysql directory, and then attempt to reinstall it. Here it failed, but it could be particular to my site. Thanks for your quick response to my ticket. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It looks like I finally got those deadlocks. I will investigate more and update later. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kern Sibbald [ 2017-07-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you. That is good news, because it confirms there is a problem. It is not so good for you because, at least for me, intermittent problems are difficult to resolve. If you resolve it, I'll be happy to try again to re-install the fixed version. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-07-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
mdev13333.test | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-07-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
svoj, I'm not sure whether InnoDB or locking is to blame. Deadlock detection is normally an InnoDB thing, but strangely the deadlock does not show up in SHOW ENGINE INNODB STATUS, and MySQL 5.7, which has InnoDB 5.7, does not exhibit this behavior. So I think maybe it's the server locking changes that have made the difference. Could you please take the first look at it, and if it turns out to be InnoDB's fault, reassign it to jplindst? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
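One way to check whether InnoDB itself recorded a deadlock is to look for the LATEST DETECTED DEADLOCK section in the InnoDB status output. The following is a minimal sketch; the helper name `has_innodb_deadlock_section` and the `root` account in the usage lines are assumptions.

```sh
#!/bin/sh
# Succeed if the InnoDB status text on stdin contains the section that
# InnoDB prints once it has detected at least one deadlock since startup.
has_innodb_deadlock_section() {
    grep -q 'LATEST DETECTED DEADLOCK'
}

# Usage (account name is an assumption):
#   if mysql -u root -e 'SHOW ENGINE INNODB STATUS\G' | has_innodb_deadlock_section
#   then echo "InnoDB recorded a deadlock"
#   else echo "no deadlock recorded by InnoDB"
#   fi
```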
| Comment by azurit [ 2018-05-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Kern Sibbald were you able to find any workaround for this? See | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2018-07-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The problem is reproducible on MariaDB 10.1-10.3. Simplified test case (please use --repeat=N):
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2018-08-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
For 10.2, did you use innodb_lock_schedule_algorithm = FCFS? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2018-08-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Repeatable using MariaDB 10.1.35, and confirmed that it is an InnoDB deadlock. Not repeatable with MySQL 5.6.38 with either test case using --repeat=2000; both were release builds. In MariaDB it does not matter what the innodb-lock-schedule-algorithm value is. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
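A repeated run like the one described above might be launched with the MariaDB test runner roughly as follows. This is only a sketch: the `--repeat` and `--mysqld` options belong to mysql-test-run, but the test name `main.mdev13333` (that is, the attached mdev13333.test copied into the main suite) and the helper name `mtr_repeat_cmd` are assumptions.

```sh
#!/bin/sh
# Build (but do not run) an mtr command line for a repeated test run.
mtr_repeat_cmd() {
    # $1: test name, $2: repeat count, $3: lock scheduling algorithm (fcfs or vats)
    printf './mtr --repeat=%s --mysqld=--innodb-lock-schedule-algorithm=%s %s\n' \
        "$2" "$3" "$1"
}

# Usage, from the mysql-test directory of a build tree:
#   eval "$(mtr_repeat_cmd main.mdev13333 2000 fcfs)"
#   eval "$(mtr_repeat_cmd main.mdev13333 2000 vats)"
```

Printing the command first makes it easy to compare runs with different algorithm values, which according to the comment above should both reproduce the deadlock on an affected MariaDB build.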
| Comment by Jan Lindström (Inactive) [ 2018-08-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
OK, found the reason: with bad luck we execute wsrep code when we definitely should not. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by azurit [ 2018-08-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by azurit [ 2018-08-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
What's the release date of 10.1.36? I can see only 10.1.35 in the road map and it was supposed to be released on 2018-07-27. Thank you. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2018-08-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've just checked, 10.1.36 is on the roadmap with the release date 2018-09-14 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I believe that this may have been caused by my code clean-up in 10.1. I got confused by the many WSREP-related predicates in the code. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-02-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This bug was independently introduced in MariaDB 10.2.2 when applying changes from MySQL 5.7.9. In MariaDB 10.1, the logic was only broken between MariaDB 10.1.32 and 10.1.35 (inclusive). |
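The version information in this issue can be turned into a quick check. The 10.1 range (10.1.32 to 10.1.35) and the 10.2.2 starting point are stated in the comment above; the upper bounds for 10.2 and 10.3 are inferred from the Fix Version/s field (10.2.18, 10.3.10), assuming every release in between is affected. The helper name `is_affected_version` is hypothetical.

```sh
#!/bin/sh
# Return 0 if a three-part MariaDB version (e.g. 10.2.7 or 10.2.7-MariaDB)
# falls in a range this issue marks as affected, non-zero otherwise.
# Note: plain sh has no local variables, so v/major/rest/minor/patch are global.
is_affected_version() {
    v=${1%%-*}            # drop any suffix such as -MariaDB
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    [ "$major" = 10 ] || return 1
    case $minor in
        1) [ "$patch" -ge 32 ] && [ "$patch" -le 35 ] ;;  # stated in the issue
        2) [ "$patch" -ge 2 ] && [ "$patch" -le 17 ] ;;   # upper bound inferred from fix 10.2.18
        3) [ "$patch" -le 9 ] ;;                          # upper bound inferred from fix 10.3.10
        *) return 1 ;;
    esac
}

# Usage:
#   is_affected_version 10.2.7 && echo "affected by MDEV-13333"
```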