Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.2, 10.3.0, 10.1.32
Environment: MariaDB installed on Ubuntu 16.04 from the repository on your site.
Description
MariaDB gets the following deadlock error:
localhost-dir: sql_create.c:837-5 Fill File table Query failed: INSERT INTO File
(FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat,
batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to get
lock;
This happens when running the Bacula 9.0.1 regression script three-pool-virtual-test.
It does not occur on any version of MySQL, nor with the MariaDB 10.0 package shipped by Ubuntu. The code has been stable for many years.
I am running all instances of MariaDB and MySQL out of the box; I have not changed any parameters.
This appears to be a false deadlock detection. Note that it is 100% reproducible.
Attachments
- mariadb-bug (1 kB)
- mdev13333.test (2 kB)
- output_b (6.44 MB)
Issue Links
- is duplicated by
  - MDEV-16067 Invalid deadlock detection (Closed)
  - MDEV-16709 InnoDB: Error: trx already had an AUTO-INC lock (Closed)
- relates to
  - MDEV-17561 Deadlocks happen often when updating different rows (Closed)
  - MDEV-11080 InnoDB: Failing assertion: table->n_waiting_or_granted_auto_inc_locks > 0 (Closed)
Activity
Your request sounds reasonable to me. I will prepare everything you will need, test it, and comment/document it. It will take a couple of days.
The instructions for repeating the problem are simpler than I thought.
I have created a file named mariadb-bug and uploaded it to this issue. It is a Linux shell script that runs as non-root and downloads the current Bacula (including some minor modifications I made this morning to make your task easier) into a new subdirectory named "bacula". It then sets up a config file, builds Bacula, and attempts to run the test script that fails on MariaDB 10.2.7.
If you have a new installation of MariaDB, your life will be easier if, before running the script, you create a MariaDB database and a user both named regress. The regress user should have full permissions on the regress database, and it helps if you also give yourself full permissions to access and modify the regress database. Otherwise the script will tell you how to correct it.
Of course, you can run everything as root and none of the minor privilege problems will occur.
If you run the script tests/three-pool-virtual-test with the environment variable REGRESS_DEBUG=1, you will see all the normal Bacula output plus some debug information. For example:
REGRESS_DEBUG=1 tests/three-pool-virtual-test
I tried to build Bacula with MariaDB 10.2.7 on an Ubuntu 16.04 Docker image but cannot get it to work so far; I got this error when building:
/bacula/regress/build/libtool --silent --tag=CXX --mode=link /usr/bin/g++ -o libbaccats.la cats_null.lo -export-dynamic -rpath /bacula/regress/bin -release 9.0.2
mysql.c: In member function 'virtual bool BDB_MYSQL::bdb_open_database(JCR*)':
mysql.c:261:20: error: 'MYSQL {aka struct st_mysql}' has no member named 'reconnect'
mdb->m_instance.reconnect = 1; /* so connection does not timeout */
^
Then I tried to apply this patch, https://bugzilla.redhat.com/show_bug.cgi?id=1467706, but without success so far; I will try again later.
Yes, I saw that RedHat ran into that problem. It did not happen with the version I pulled from your binary repo. It appears to be a new incompatibility introduced in 10.2.7; prior MariaDB versions were compatible with MySQL here. I suggest commenting out that line. Judging from the problems RedHat had, you will also need either to change the name of your library back to match the MySQL library name, or simply to link the MySQL library name to yours. I am not sure why I did not have those problems; do you have several versions of 10.2.7?
Comment out the "reconnect" assignment:
// mdb->m_instance.reconnect = 1; /* so connection does not timeout */
By the way, thanks for pointing me to the RedHat patch. I hadn't seen it. I will apply it here and if it makes both MySQL and MariaDB work, it will be a nice solution.
I used these instructions to install MariaDB 10.2.7 (https://downloads.mariadb.org/mariadb/repositories/#mirror=dotsrc&distro=Ubuntu&distro_release=xenial--ubuntu_xenial&version=10.2):
sudo apt-get install software-properties-common
sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
sudo add-apt-repository 'deb [arch=amd64,i386,ppc64el] http://mirrors.dotsrc.org/mariadb/repo/10.2/ubuntu xenial main'
sudo apt update
sudo apt install mariadb-server
Then I added the packages libmariadb-dev and libacl1-dev and got that error when I ran make setup.
When I change the file mysql.c and then run make setup, the error appears again and the file is back to how it was before my change, as if it were copied from somewhere else.
The simplest way to make a change and then test it is:
edit the original Bacula file that was downloaded
cd regress
make setup
tests/...
However, that takes time because it rebuilds all of Bacula. The way I do it is:
(from the regress directory)
cd build/src/cats
(edit mysql.c)
make
make install
cd (back to regress)
tests/...
This is faster but the change is in the regress/build subtree and will be lost or overridden on the next "make setup".
I think you are close to making it work.
By the way, the commands I used to install MariaDB were the same as yours, except that I specified only arch=amd64, and I also ran:
sudo apt-get install mariadb-server mariadb-client
I ran tests/three-pool-virtual-test, got this output, and didn't find any sign of deadlocks:
root@366e219453f5:/bacula/regress# tests/three-pool-virtual-test
=== Start three-pool-virtual-test at 07:28:56 ===
!!!!! three-pool-virtual-test failed!!! 07:29:09 00:00:12 12s !!!!!
Status: zombie=0 backup=2 restore=0 diff=0 verify=0
!!! Bad termination status !!!
Status: backup=2 restore=0 diff=0 verify=0
Test owner of bacula-127.0.0.1 is my-name@domain.com
MariaDB [(none)]> show processlist;
+-----+-------------+-----------+---------+---------+------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+-----------+---------+---------+------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 98 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
| 105 | regress | localhost | regress | Query | 0 | query end | INSERT INTO Job (Job,Name,Type,Level,JobStatus,SchedTime,JobTDate,ClientId,Comment) VALUES ('threepo | 0.000 |
+-----+-------------+-----------+---------+---------+------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
7 rows in set (0.00 sec)
MariaDB [(none)]> show global status like '%wait%';
+---------------------------------------+-------+
| Variable_name | Value |
+---------------------------------------+-------+
| Binlog_group_commit_trigger_lock_wait | 0 |
| Innodb_buffer_pool_wait_free | 0 |
| Innodb_log_waits | 0 |
| Innodb_row_lock_current_waits | 0 |
| Innodb_row_lock_waits | 0 |
| Master_gtid_wait_count | 0 |
| Master_gtid_wait_time | 0 |
| Master_gtid_wait_timeouts | 0 |
| Table_locks_waited | 0 |
| Tc_log_page_waits | 0 |
+---------------------------------------+-------+
10 rows in set (0.00 sec)
Your results look consistent with the problem: the test failed. If you run the test with:
REGRESS_DEBUG=1 tests/three-pool-virtual-test
and capture the output, you will find that a Bacula backup job failed because of what MariaDB says is a deadlock, i.e. you will see the message that I posted in the original bug submission. On all other systems the test runs and reports that it succeeded.
If the MariaDB server is not really hitting a deadlock, then some new error is being reported that we have never seen before. Something is going wrong either in MariaDB or in our code. Since our code runs fine on prior MariaDB versions, on PostgreSQL, and on MySQL, for the moment I am assuming that the problem is on the MariaDB side. Moreover, we simply print the message that MariaDB gives us: "Deadlock found when trying to get lock;"
Please find the output from REGRESS_DEBUG=1 tests/three-pool-virtual-test attached; there are still no deadlocks in it.
output_b
The first output you showed as a non-attachment is identical to the errors I have been seeing. There seem to be 4 uploads, but only 2 of them can be accessed, and neither represents a failure.
Can you explain the difference between your first execution of Bacula where the error shows up and the executions that correspond to the two outputs that I could examine?
What is surprising is that your first job produces the following:
=== Start three-pool-virtual-test at 07:28:56 ===
!!!!! three-pool-virtual-test failed!!! 07:29:09 00:00:12 12s !!!!!
Status: zombie=0 backup=2 restore=0 diff=0 verify=0
!!! Bad termination status !!!
Status: backup=2 restore=0 diff=0 verify=0
Test owner of bacula-127.0.0.1 is my-name@domain.com
That is Bacula failing during a backup.
I just tried rebuilding the source and re-running the test, and surprisingly the test succeeds. I really do not understand what was going on, because previously it would always fail. Now it runs.
I will try a few more tests to see if I can come up with something, and then get back to you.
To try to get back to the starting point, I removed MariaDB. Since then I have been unable to reinstall it. Multiple different errors show up, and now, with the new upstartd, resolving problems starting daemons is complicated.
At this point, I give up.
I will attempt to reinstall MySQL, and stay with it until some distribution has worked out the problems with Bacula running with MariaDB 10.2.7.
Sorry, I attached the same file 4 times by mistake, then removed 3 of them.
I ran the same test twice: the first time without REGRESS_DEBUG=1, the second time with REGRESS_DEBUG=1, sending the output to a file. That is all.
When I built Bacula, I needed additional packages: libmariadb-dev, libacl1-dev, openssl.
Which version of MySQL will you use? Please let me know if you manage to build and run the tests with it.
After removing mariadb-server and mariadb-client, I was unable to re-install them. No matter what I did it failed, and I am reasonably familiar with coaching apt-get and dpkgs along when there are problems.
I purged the mariadb-server and mariadb-client and deleted the database directory, then reinstalled MySQL, which installed and runs perfectly fine. It is the following version:
mysql Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using EditLine wrapper
I still find it very odd that your very first job failed with what looks identical to the failure I saw. None of the other jobs you posted failed, and much to my amazement, MariaDB stopped failing here too. Was there a change in your Ubuntu package in the past couple of days? During my testing I very likely did an upgrade, which might have pulled in a newer (or different) version of MariaDB.
In any case, I would suggest that someone other than me installs MariaDB on Ubuntu 16.04, then purges it, removes the /var/lib/mysql directory, and attempts to reinstall it. Here it failed, but that could be particular to my site.
Thanks for your quick response to my ticket.
It looks like I finally got those deadlocks; I will investigate further and update later.
Thank you. That is good news, because it confirms there is a problem. It is not so good for you, because, at least for me, intermittent problems are difficult to resolve. If you resolve it, I'll be happy to try again to reinstall the fixed version.
mdev13333.test: Please find the test case mdev13333 attached to reproduce the problem (it is only for reproducing/debugging, not for the regression suite).
On MariaDB 10.2 it returns the error "At line 75: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction", but there are no errors on 10.1.
Note that the test is non-deterministic, so it may need the --repeat=N option.
svoj,
I'm not sure whether InnoDB or the server-level locking is to blame. Deadlock detection is normally an InnoDB thing, but strangely it does not show up in SHOW ENGINE INNODB STATUS, and MySQL 5.7, which has InnoDB 5.7, does not exhibit this behavior. So I think it may be the server locking changes that made the difference. Could you please take a first look at it, and if it turns out to be InnoDB's fault, reassign it to jplindst?
Kern Sibbald were you able to find any workaround for this? See MDEV-16067 .
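Until a fixed server is available, the error text itself ("try restarting transaction") points at the usual client-side mitigation: when error 1213 comes back, re-run the whole transaction, since InnoDB rolls back the transaction it picks as the deadlock victim. The following is a minimal sketch in C, not Bacula's actual code. It assumes a hypothetical callback type `txn_fn` that runs the whole transaction once and returns 0 or the server error number (in a real client that number would come from `mysql_errno()`); `run_with_deadlock_retry`, `fake_txn` and `struct fake_txn_state` are illustrative names, not Bacula or MariaDB APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Server error number for "Deadlock found when trying to get lock;
   try restarting transaction". */
#define ER_LOCK_DEADLOCK 1213

/* Hypothetical callback: runs the whole transaction once (e.g. BEGIN,
   the INSERT ... SELECT, COMMIT) and returns 0 on success or the server
   error number on failure. */
typedef int (*txn_fn)(void *ctx);

/* Re-run the transaction while it fails with a deadlock, at most
   max_attempts times. Returns the last result (0 on success) and
   stores the number of attempts made in *attempts. */
static int run_with_deadlock_retry(txn_fn txn, void *ctx,
                                   int max_attempts, int *attempts)
{
    int err = ER_LOCK_DEADLOCK;

    assert(max_attempts >= 1);
    *attempts = 0;
    while (*attempts < max_attempts && err == ER_LOCK_DEADLOCK) {
        (*attempts)++;
        err = txn(ctx);
        if (err == ER_LOCK_DEADLOCK)
            fprintf(stderr, "attempt %d: deadlock (1213)\n", *attempts);
    }
    /* 0, a non-deadlock error, or 1213 if every attempt deadlocked */
    return err;
}

/* Hypothetical stand-in for a real transaction, for illustration only:
   reports a deadlock on the first fail_times calls, then succeeds. */
struct fake_txn_state { int calls; int fail_times; };

static int fake_txn(void *ctx)
{
    struct fake_txn_state *s = ctx;
    s->calls++;
    return s->calls <= s->fail_times ? ER_LOCK_DEADLOCK : 0;
}
```

For example, with a stand-in that deadlocks twice, `run_with_deadlock_retry(fake_txn, &s, 5, &attempts)` returns 0 with `attempts` equal to 3. Retrying only masks a spurious deadlock like the one in this issue; it does not replace the server-side fix.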
Problem is reproducible on MariaDB 10.1-10.3
Simplified test case (please use --repeat=N):
--source include/have_innodb.inc

CREATE TABLE tr (i2 int, i1 int);
INSERT INTO tr VALUES(1,1);

CREATE TABLE t3 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t3 VALUES (1697,2,'/b','g');

CREATE TABLE t4 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t4 VALUES (97,3,'/b','u');

CREATE TABLE t5 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t5 VALUES (97,1,'/b','u');

CREATE TABLE tt (
  i3 int AUTO_INCREMENT PRIMARY KEY,
  j1 int, j2 int, i2 int, i1 int) ENGINE=InnoDB;

--connect (con1,localhost,root,,test)
--connect (con2,localhost,root,,test)
--connect (con3,localhost,root,,test)

--connection con1
--send
INSERT INTO tt (j1, j2, i2, i1) SELECT t4.j1, t4.j2, tr.i2, tr.i1 FROM t4 JOIN tr;

--connection con2
--send
INSERT INTO tt (j1, j2, i2, i1) SELECT t5.j1, t5.j2, tr.i2, tr.i1 FROM t5 JOIN tr;

--connection con3
INSERT INTO tt (j1, j2, i2, i1) SELECT t3.j1, t3.j2, tr.i2, tr.i1 FROM t3 JOIN tr;
--disconnect con3

--connection con2
--reap
--disconnect con2

--connection con1
--reap
--disconnect con1

--connection default
DROP TABLE tr,tt,t3,t4,t5;
MariaDB Version 10.1.35-MariaDB-debug (commit 36ea82617c1506532e863cb241296acc8b657243)
CREATE TABLE tr (i2 int, i1 int);
INSERT INTO tr VALUES(1,1);
CREATE TABLE t3 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t3 VALUES (1697,2,'/b','g');
CREATE TABLE t4 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t4 VALUES (97,3,'/b','u');
CREATE TABLE t5 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t5 VALUES (97,1,'/b','u');
CREATE TABLE tt (
i3 int AUTO_INCREMENT PRIMARY KEY,
j1 int, j2 int, i2 int, i1 int) ENGINE=InnoDB;
INSERT INTO tt (j1, j2, i2, i1) SELECT t4.j1, t4.j2, tr.i2, tr.i1 FROM t4 JOIN tr;
INSERT INTO tt (j1, j2, i2, i1) SELECT t5.j1, t5.j2, tr.i2, tr.i1 FROM t5 JOIN tr;
INSERT INTO tt (j1, j2, i2, i1) SELECT t3.j1, t3.j2, tr.i2, tr.i1 FROM t3 JOIN tr;
main.1_my 'innodb_plugin' [ fail ]
Test ended at 2018-07-02 20:11:15
CURRENT_TEST: main.1_my
mysqltest: At line 36: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction
##############################
CREATE TABLE tr (i2 int, i1 int);
INSERT INTO tr VALUES(1,1);
CREATE TABLE t3 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t3 VALUES (1697,2,'/b','g');
CREATE TABLE t4 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t4 VALUES (97,3,'/b','u');
CREATE TABLE t5 (j1 int, j2 int, t2 varchar(5), n1 varchar(5));
INSERT INTO t5 VALUES (97,1,'/b','u');
CREATE TABLE tt (
i3 int AUTO_INCREMENT PRIMARY KEY,
j1 int, j2 int, i2 int, i1 int) ENGINE=InnoDB;
INSERT INTO tt (j1, j2, i2, i1) SELECT t4.j1, t4.j2, tr.i2, tr.i1 FROM t4 JOIN tr;
INSERT INTO tt (j1, j2, i2, i1) SELECT t5.j1, t5.j2, tr.i2, tr.i1 FROM t5 JOIN tr;
INSERT INTO tt (j1, j2, i2, i1) SELECT t3.j1, t3.j2, tr.i2, tr.i1 FROM t3 JOIN tr;
main.1_my 'xtradb' [ fail ]
Test ended at 2018-07-02 20:11:17
CURRENT_TEST: main.1_my
mysqltest: At line 40: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction
For 10.2, did you use innodb_lock_schedule_algorithm = FCFS?
Repeatable using MariaDB 10.1.35, and I confirmed it is an InnoDB deadlock. Not repeatable with MySQL 5.6.38 with either test case using --repeat=2000; both were release builds. In MariaDB it does not matter what the innodb-lock-schedule-algorithm value is.
OK, I found the reason: by bad luck we execute wsrep code when we definitely should not.
What's the release date of 10.1.36? I can see only 10.1.35 in the road map and it was supposed to be released on 2018-07-27. Thank you.
I've just checked, 10.1.36 is on the roadmap with the release date 2018-09-14
I believe that this may have been caused by my code clean-up in 10.1. I got confused by the many WSREP-related predicates in the code.
This bug was independently introduced in MariaDB 10.2.2 when applying changes from MySQL 5.7.9.
In MariaDB 10.1, the logic was only broken between MariaDB 10.1.32 and 10.1.35 (inclusive).
Please describe step by step what one needs to do with this tool in order to reproduce the problem, assuming that one has cloned it from github and never had an installation before.