[MDEV-5914] Parallel replication deadlock due to InnoDB lock conflicts Created: 2014-03-20  Updated: 2014-07-11  Resolved: 2014-07-11

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.9
Fix Version/s: 10.0.13

Type: Bug Priority: Critical
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: parallelslave, replication

Issue Links:
Relates
relates to MDEV-6063 overlocking next-record in REPEATABLE... Closed
relates to MDEV-5863 "Lock wait timeout exceeded", lost ev... Closed
relates to MDEV-5941 Slave SQL: Error 'Lock wait timeout e... Closed

 Description   

The fundamental assumption with parallel replication is the following: If two
transactions T1 and T2 commit in parallel on the master (they are
group-committed together), then they can be executed in parallel on the slave
without any risk of conflicting locks. The slave will commit them in the same
order (eg. T1 first T2 second), so it is critical that T2 will not take any
locks that would block T1, or a deadlock occurs.

Unfortunately, this assumption turns out to be invalid.

Consider this table and two transactions T1, T2:

CREATE TABLE t1 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
T1: INSERT INTO t1 VALUES (7, NULL);
T2: DELETE FROM t1 WHERE b <= 3;

If T1 runs first and then T2, there is no blocking, and they can group commit
together. But if T2 runs first, then it takes a gap lock on the index on b
which blocks the insert of a row with B=NULL.

Thus, the bug is when they run in T1,T2 order on the master, they group commit,
slave tries to run them in parallel. T2 happens to take the gap lock first, T1
waits for T2 to commit, then T2 waits for T1 to commit -> deadlock.

Another example of this is with an UPDATE and a DELETE:

UPDATE t1 SET secondary=NULL WHERE primary=1
DELETE t1 WHERE secondary <= 3

Two possible solutions are being considered:

1. Run the slaves in READ COMMITTED mode. This however means that binlog may
not be serialised correctly if there are multiple multi-source master
connections and/or users doing direct updates on the slave, which happen to
run in parallel with conflicting gap locks.

2. Modify InnoDB locking so that two transactions that run in parallel due to
group commit on the master will not wait for the gap lock of each other, but
will still use the gap locks normally with respect to other
transactions. This however requires a rather risky modification of InnoDB
locking that needs to be fully assessed for correctness

This bug is one of the problems reported in MDEV-5863.

Here is a test case. It may need to be run multiple times to trigger the error:

--source include/have_innodb.inc
--source include/have_debug_sync.inc
--source include/have_debug_sync.inc
--source include/master-slave.inc
 
--connection slave
--source include/stop_slave.inc
 
--connection master
CREATE TABLE t1 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
 
# Create a group commit with INSERT and DELETE, in that order.
--connect (con1,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
SET debug_sync='commit_after_release_LOCK_prepare_ordered SIGNAL master_queued1 WAIT_FOR master_cont1';
send INSERT INTO t1 VALUES (7, NULL);
--connection master
SET debug_sync='now WAIT_FOR master_queued1';
 
--connect (con2,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
SET debug_sync='commit_after_release_LOCK_prepare_ordered SIGNAL master_queued2';
send DELETE FROM t1 WHERE b <= 3;
 
--connection master
SET debug_sync='now WAIT_FOR master_queued2';
SET debug_sync='now SIGNAL master_cont1';
 
--connection con1
REAP;
--connection con2
REAP;
SET debug_sync='RESET';
--save_master_pos
 
--connection slave
SET @old_parallel= @@GLOBAL.slave_parallel_threads;
SET GLOBAL slave_parallel_threads=16;
--source include/start_slave.inc
--sync_with_master
 
SELECT * FROM t1 ORDER BY a;
 
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=@old_parallel;
--source include/start_slave.inc
 
--connection master
DROP TABLE t1;
--source include/rpl_end.inc



 Comments   
Comment by Kristian Nielsen [ 2014-03-21 ]

I have pushed to 10.0 a patch that partially solves this. It runs the parallel slaves
in READ COMMITTED mode.

However, I will keep this bug open, as Jan has a potentially better patch
that more specifically relaxes gap locks just for the transactions run in
parallel. That may be the right long-term solution.

Comment by Kristian Nielsen [ 2014-04-11 ]

See also the proposed solution to MDEV-5941

Comment by Sergei Golubchik [ 2014-06-10 ]

Was there any progress with the Jan's patch?

Comment by Kristian Nielsen [ 2014-06-10 ]

Yes, the last of the patch series that I gave to you for review, it reverts my temporary solution and instead includes a modified version of Jan's patch for a better solution.

Comment by Kristian Nielsen [ 2014-06-27 ]

In fact, the temporary patch with READ COMMITTED is incorrect and not safe.
If we have transactions like:

T1: INSERT INTO t1 VALUES (1);
T2: INSERT INTO t2 SELECT * FROM t1;

Then it is not safe to use READ COMMITTED. Because once T1 starts to commit,
T2 is allowed to run. But T2 cannot see the changes of T1 until T1 has
finished committing. And in READ COMMITTED, T2 will not wait on the row lock
set by t1.

This can be solved with the Jan's patch. The transactions will not be
replicated with READ COMMITTED. Instead, we will only relax locks between
transactions that group committed together on the master. But T1 and T2 in the
example above conflict with each other on the inserted row, so they would not
be able to group commit together.

Comment by Patryk Pomykalski [ 2014-06-27 ]

Isn't read committed always replicated in row format?

Comment by Kristian Nielsen [ 2014-06-27 ]

> Isn't read committed always replicated in row format?

That's not the issue.

The issue is when transactions run in repeatable read on the master in statement mode. On the slave, we want to avoid deadlocks between transactions that we already know ran in parallel on the master and thus are known to not conflict with each other. We can do that by relaxing lock waits between only those transactions.

There is a temporary patch that does this on the slave by using READ COMMITTED. This is incorrect, and will be replaced with a proper solution. Patch is just waiting for review.

Comment by Kristian Nielsen [ 2014-07-11 ]

Pushed to 10.0.13.

Generated at Thu Feb 08 07:07:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.