MariaDB installed on Ubuntu 16.04 using the repository on your site.
Description
MariaDB reports the following deadlock error:
localhost-dir: sql_create.c:837-5 Fill File table Query failed: INSERT INTO File
(FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat,
batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to get
lock;
This happens when running the Bacula 9.0.1 regression script three-pool-virtual-test.
It does not occur with any version of MySQL, nor with the MariaDB 10.0 packaged by Ubuntu. The code has been stable for many years.
I am running all MariaDB and MySQL instances out of the box; I have not changed any parameters.
This appears to be a false deadlock detection. Note that it is 100% reproducible.
Elena Stepanova
added a comment - Please describe step by step what one needs to do with this tool in order to reproduce the problem, assuming that one has cloned it from github and never had an installation before.
Kern Sibbald
added a comment - Your request sounds reasonable to me. I will prepare everything you will need, test it, and comment/document it. It will take a couple of days.
Kern Sibbald
added a comment - The instructions for repeating the problem are simpler than I thought.
I have created a file named mariadb-bug and uploaded it to this issue. It is a Linux shell script that runs as a non-root user and downloads the current Bacula (including some minor modifications I made this morning to make your task easier) into a new subdirectory named "bacula". It then sets up a config file, builds Bacula, and attempts to run the test script that fails on MariaDB 10.2.7.
If you have a new installation of MariaDB, your life will be easier if, before running the script, you create a MariaDB database and a user, both named regress. The regress user should have full permissions on the regress database, and you should also give yourself full permissions to access and modify the regress database. Otherwise, the script will tell you how to correct it.
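Here is a minimal sketch of that database setup. It assumes that both the regress account and your own account connect from localhost without a password, so check the regress configuration for the credentials it actually expects. The function name regress_setup_sql is hypothetical. It only prints the SQL, so you can review it before feeding it to the server:

```shell
# Hypothetical helper: print SQL that creates the regress database, a regress
# user, and an account for yourself, both with full rights on that database.
# Assumption: passwordless accounts on localhost; adapt to your setup.
regress_setup_sql() {
    me=${1:-$(id -un)}   # your own account; defaults to your login name
    cat <<EOF
CREATE DATABASE IF NOT EXISTS regress;
CREATE USER IF NOT EXISTS 'regress'@'localhost';
GRANT ALL PRIVILEGES ON regress.* TO 'regress'@'localhost';
CREATE USER IF NOT EXISTS '$me'@'localhost';
GRANT ALL PRIVILEGES ON regress.* TO '$me'@'localhost';
EOF
}
```

For example, run regress_setup_sql | sudo mysql if root can log in through the Unix socket, or regress_setup_sql | mysql -u root -p otherwise.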
Of course, you can run everything as root and none of the minor privilege problems will occur.
If you run the script tests/three-pool-virtual-test with the environment variable REGRESS_DEBUG=1, you will see all the normal Bacula output plus some debug information. For example:
REGRESS_DEBUG=1 tests/three-pool-virtual-test
Alice Sherepa
added a comment - I tried to build Bacula with MariaDB 10.2.7 in an Ubuntu 16.04 Docker image, but so far I cannot make it work.
I got an error when building:
/bacula/regress/build/libtool --silent --tag=CXX --mode=link /usr/bin/g++ -o libbaccats.la cats_null.lo -export-dynamic -rpath /bacula/regress/bin -release 9.0.2
mysql.c: In member function 'virtual bool BDB_MYSQL::bdb_open_database(JCR*)':
mysql.c:261:20: error: 'MYSQL {aka struct st_mysql}' has no member named 'reconnect'
mdb->m_instance.reconnect = 1; /* so connection does not timeout */
^
Then I tried to apply this patch (https://bugzilla.redhat.com/show_bug.cgi?id=1467706), but without success so far; I will try again later.
Kern Sibbald
added a comment - Yes, I saw that RedHat ran into that problem. It did not happen with the version I pulled from your binary repo. It appears to be a new difference introduced in 10.2.7, since prior MariaDB versions were compatible with MySQL. I suggest commenting out that line. Judging from the problems RedHat had, you will also need either to rename your library back to match the MySQL library name, or simply to link the MySQL library name to yours. I am not sure why I did not have those problems. Do you have several versions of 10.2.7?
Comment out the reconnect variable:
// mdb->m_instance.reconnect = 1; /* so connection does not timeout */
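If you would rather script that edit than make it by hand, here is a sketch using sed. It assumes you run it on the original mysql.c of the downloaded Bacula tree (possibly bacula/src/cats/mysql.c, though that path is an assumption). A copy under regress/build would be overwritten by the next make setup. The name comment_out_reconnect is hypothetical:

```shell
# Hypothetical helper: comment out the reconnect assignment in mysql.c,
# saving the previous contents as FILE.bak. A line that is already
# commented out no longer matches, so running it twice is harmless.
comment_out_reconnect() {
    sed -i.bak \
        's|^\([[:space:]]*\)\(mdb->m_instance\.reconnect = 1;\)|\1// \2|' \
        "$1"
}
```

This disables the automatic reconnect rather than porting it. Both the MySQL and MariaDB client libraries also accept mysql_options() with MYSQL_OPT_RECONNECT, which would be the way to keep that behavior.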
Kern Sibbald
added a comment - By the way, thanks for pointing me to the RedHat patch. I hadn't seen it. I will apply it here and if it makes both MySQL and MariaDB work, it will be a nice solution.
Alice Sherepa
added a comment - I used these instructions to install MariaDB 10.2.7 (https://downloads.mariadb.org/mariadb/repositories/#mirror=dotsrc&distro=Ubuntu&distro_release=xenial--ubuntu_xenial&version=10.2):
sudo apt-get install software-properties-common
sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
sudo add-apt-repository 'deb [arch=amd64,i386,ppc64el] http://mirrors.dotsrc.org/mariadb/repo/10.2/ubuntu xenial main'
sudo apt update
sudo apt install mariadb-server
Then I added the packages libmariadb-dev and libacl1-dev, and got that error when I ran make setup.
When I change mysql.c and then run make setup, the error appears again and the file is back to how it was before the change, as if it were copied from somewhere else.
Kern Sibbald
added a comment - The simplest way to make a change and then test it is:
edit the original Bacula file that was downloaded
cd regress
make setup
tests/...
However, that takes time because it rebuilds all of Bacula. The way I do it is:
(from the regress directory)
cd build/src/cats
(edit mysql.c)
make
make install
cd (back to regress)
tests/...
This is faster but the change is in the regress/build subtree and will be lost or overridden on the next "make setup".
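The faster loop can be wrapped in a small function. This is only a sketch: rebuild_cats is a hypothetical name, and it must be called from the regress directory. Because the cd happens in a subshell, you are still in regress afterwards:

```shell
# Hypothetical helper: rebuild and reinstall only the catalog code after
# editing build/src/cats/mysql.c. Call it from the regress directory.
# The subshell keeps the caller's working directory unchanged.
rebuild_cats() {
    (cd build/src/cats && make && make install)
}
```

For example: rebuild_cats && tests/three-pool-virtual-test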
I think you are close to making it work.
By the way, the commands I used to install MariaDB were the same as yours, except that I only specified arch=amd64. I also ran:
sudo apt-get install mariadb-server mariadb-client
Kern Sibbald
added a comment - Your results look consistent with the problem. The test failed, and if you run the test with:
REGRESS_DEBUG=1 tests/three-pool-virtual-test
and capture the output, you will find that a Bacula backup job failed because of what MariaDB reports as a deadlock. That is, you will see the message I posted in the original bug submission. On all other systems, the test runs and reports that it succeeded.
If the MariaDB server is not really hitting a deadlock, then it is reporting some new error that we have never seen before. Something is going wrong either in MariaDB or in our code. Since our code runs fine on prior MariaDB versions, on PostgreSQL, and on MySQL, for the moment I am assuming that the problem is on the MariaDB side. In addition, we simply print the message that MariaDB gives us: "Deadlock found when trying to get lock;"
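To capture the output and check it for that message, possibly over several runs, here is a minimal sketch. It must be run from the regress directory. The function run_until_deadlock and the log file names are hypothetical:

```shell
# Hypothetical helper: run a regress test up to N times with REGRESS_DEBUG=1,
# saving each run's output to its own log, and stop at the first run whose
# log contains the MariaDB deadlock message. Returns 1 if no run hit it.
run_until_deadlock() {
    runs=${1:-5}
    script=${2:-tests/three-pool-virtual-test}
    i=1
    while [ "$i" -le "$runs" ]; do
        log="three-pool-run-$i.log"
        REGRESS_DEBUG=1 "$script" >"$log" 2>&1
        if grep -q "Deadlock found when trying to get lock" "$log"; then
            echo "run $i: deadlock (see $log)"
            return 0
        fi
        echo "run $i: no deadlock"
        i=$((i + 1))
    done
    return 1
}
```

For example, run_until_deadlock 10 repeats the test up to ten times and keeps every log for inspection.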
Kern Sibbald
added a comment - The first output you showed as a non-attachment is identical to the errors I have been seeing. There seem to be 4 uploads, but only 2 of them can be accessed, and neither represents a failure.
Can you explain the difference between your first execution of Bacula where the error shows up and the executions that correspond to the two outputs that I could examine?
What is surprising is that your first job produces the following:
=== Start three-pool-virtual-test at 07:28:56 ===
!!!!! three-pool-virtual-test failed!!! 07:29:09 00:00:12 12s !!!!!
Status: zombie=0 backup=2 restore=0 diff=0 verify=0
!!! Bad termination status !!!
Status: backup=2 restore=0 diff=0 verify=0
Test owner of bacula-127.0.0.1 is my-name@domain.com
This is Bacula failing during a backup.
I just tried rebuilding the source and re-running the test, and surprisingly the test succeeds. I really do not understand what was going on, because previously it would always fail. Now it runs.
I will try a few more tests to see if I can come up with something, and then get back to you.
Kern Sibbald
added a comment - To try to get back to the starting point, I removed MariaDB. Since then I have been unable to re-install it. Multiple different errors show up, and with the new upstartd, resolving problems with starting daemons is complicated.
At this point, I give up.
I will attempt to reinstall MySQL, and stay with it until some distribution has worked out the problems with Bacula running with MariaDB 10.2.7.
Alice Sherepa
added a comment - Sorry, I attached the same file 4 times by mistake, then removed 3 of them.
I ran the same test twice: the first time without REGRESS_DEBUG=1, and the second time with REGRESS_DEBUG=1, sending the output to a file. That is all.
When I built Bacula, I needed these additional packages: libmariadb-dev, libacl1-dev, openssl.
Which version of MySQL will you use? Please let us know if you succeed in building and running the tests with it.
Kern Sibbald
added a comment - After removing mariadb-server and mariadb-client, I was unable to re-install them. No matter what I did, it failed, and I am reasonably familiar with coaxing apt-get and dpkg along when there are problems.
I purged the mariadb-server and mariadb-client and deleted the database directory, then reinstalled MySQL, which installed and runs perfectly fine. It is the following version:
mysql Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using EditLine wrapper
I still find it very odd that your very first job failed with what looks identical to the failure I saw. None of the other jobs you posted failed, and much to my amazement, MariaDB stopped failing here too. Was there a change in your Ubuntu package in the past couple of days? During my testing I very likely did an upgrade, which might have pulled in a newer (or different) version of MariaDB.
In any case, I would suggest that someone other than me install MariaDB on Ubuntu 16.04, then purge it, remove the /var/lib/mysql directory, and attempt to reinstall it. It failed here, but that could be particular to my site.
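The purge-and-reinstall sequence suggested above might look like the following sketch. It assumes the same package names used earlier in this thread. Removing /var/lib/mysql destroys every database on the machine, so the hypothetical reinstall_mariadb function only prints the commands unless RUN is set to an empty string:

```shell
# Hypothetical helper: purge MariaDB, delete its data directory, reinstall.
# WARNING: rm -rf /var/lib/mysql deletes all databases on this host.
# Dry run by default (commands are only echoed); call it as
#   RUN= reinstall_mariadb
# to actually execute them.
reinstall_mariadb() {
    run=${RUN-echo}
    $run sudo apt-get purge -y mariadb-server mariadb-client
    $run sudo rm -rf /var/lib/mysql
    $run sudo apt-get install -y mariadb-server mariadb-client
}
```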
Thanks for your quick response to my ticket.
Kern Sibbald
added a comment - Thank you. That is good news, because it means there is a problem. It is not so good for you because, at least for me, intermittent problems are difficult to resolve. If you resolve it, I'll be happy to try again to re-install the fixed version.
Alice Sherepa
added a comment - Please find the attached test case mdev13333.test, which reproduces the problem (it is only for reproducing/debugging, not for the regression suite).
On MariaDB 10.2 it returns the error "At line 75: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction", but there are no errors on 10.1.
Note that the test is non-deterministic, so it may need the --repeat=N option.
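Here is one possible way to run it, sketched under some assumptions: you are in the mysql-test directory of a built MariaDB 10.2 source tree, where the main suite's tests live in t/, and run_mdev13333 is a hypothetical wrapper. The mysql-test-run option --repeat=N reruns the test N times:

```shell
# Hypothetical wrapper: install the attached test into the main suite and
# run it repeatedly, since the deadlock does not show up on every run.
# Assumes the current directory is mysql-test/ with the main suite in t/.
run_mdev13333() {
    testfile=$1
    repeats=${2:-100}
    cp "$testfile" t/mdev13333.test || return 1
    ./mtr mdev13333 --repeat="$repeats"
}
```

For example: run_mdev13333 ~/Downloads/mdev13333.test 500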
Elena Stepanova
added a comment - svoj, I'm not sure whether InnoDB or locking is to blame. Deadlock detection is normally an InnoDB thing, but strangely the deadlock does not show up in SHOW ENGINE INNODB STATUS, and MySQL 5.7, which has InnoDB 5.7, does not exhibit this behavior. So I think the server locking changes may have made the difference. Could you please take a first look at it, and if it turns out to be InnoDB's fault, reassign it to jplindst?
Jan Lindström (Inactive)
added a comment - Repeatable using MariaDB 10.1.35, and confirmed to be an InnoDB deadlock. Not repeatable with MySQL 5.6.38 with either test case using --repeat=2000, both with release builds. In MariaDB, the innodb-lock-schedule-algorithm value does not matter.
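To check that observation, the same test could be run once per scheduling algorithm by passing the server option through mtr's --mysqld option. This is a sketch that assumes you are in the mysql-test directory with mdev13333.test already in t/. It uses the two values MariaDB 10.1/10.2 accept for innodb_lock_schedule_algorithm (fcfs and vats). The name compare_lock_schedulers is hypothetical:

```shell
# Hypothetical helper: run the test case under each InnoDB lock scheduling
# algorithm and report whether mtr passed. The option is set at server
# startup through mtr's --mysqld pass-through; output goes to mtr-ALGO.log.
compare_lock_schedulers() {
    repeats=${1:-100}
    for algo in fcfs vats; do
        if ./mtr mdev13333 --repeat="$repeats" \
            --mysqld=--innodb-lock-schedule-algorithm="$algo" >"mtr-$algo.log" 2>&1
        then
            echo "$algo: passed"
        else
            echo "$algo: failed (see mtr-$algo.log)"
        fi
    done
}
```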
azurit
added a comment - What's the release date of 10.1.36? I can see only 10.1.35 in the road map and it was supposed to be released on 2018-07-27. Thank you.
Marko Mäkelä
added a comment - I believe that this may have been caused by my code clean-up in 10.1. I got confused by the many WSREP-related predicates in the code.
Marko Mäkelä
added a comment - This bug was independently introduced in MariaDB 10.2.2 when applying changes from MySQL 5.7.9.
In MariaDB 10.1, the logic was only broken between MariaDB 10.1.32 and 10.1.35 (inclusive).