[MDEV-5914] Parallel replication deadlock due to InnoDB lock conflicts - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.0.9
Fix Version/s: 10.0.13
Component/s: None
Labels:
- parallelslave
- replication

Description

The fundamental assumption with parallel replication is the following: If two
transactions T1 and T2 commit in parallel on the master (they are
group-committed together), then they can be executed in parallel on the slave
without any risk of conflicting locks. The slave will commit them in the same
order (eg. T1 first T2 second), so it is critical that T2 will not take any
locks that would block T1, or a deadlock occurs.

Unfortunately, this assumption turns out to be invalid.

Consider this table and two transactions T1, T2:

CREATE TABLE t1 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);

T1: INSERT INTO t1 VALUES (7, NULL);

T2: DELETE FROM t1 WHERE b <= 3;

If T1 runs first and then T2, there is no blocking, and they can group commit
together. But if T2 runs first, then it takes a gap lock on the index on b
which blocks the insert of a row with B=NULL.

Thus, the bug is when they run in T1,T2 order on the master, they group commit,
slave tries to run them in parallel. T2 happens to take the gap lock first, T1
waits for T2 to commit, then T2 waits for T1 to commit -> deadlock.

Another example of this is with an UPDATE and a DELETE:

UPDATE t1 SET secondary=NULL WHERE primary=1

DELETE t1 WHERE secondary <= 3

Two possible solutions are being considered:

1. Run the slaves in READ COMMITTED mode. This however means that binlog may
not be serialised correctly if there are multiple multi-source master
connections and/or users doing direct updates on the slave, which happen to
run in parallel with conflicting gap locks.

2. Modify InnoDB locking so that two transactions that run in parallel due to
group commit on the master will not wait for the gap lock of each other, but
will still use the gap locks normally with respect to other
transactions. This however requires a rather risky modification of InnoDB
locking that needs to be fully assessed for correctness

This bug is one of the problems reported in ~~MDEV-5863~~.

Here is a test case. It may need to be run multiple times to trigger the error:

--source include/have_innodb.inc

--source include/have_debug_sync.inc

--source include/have_debug_sync.inc

--source include/master-slave.inc

--connection slave

--source include/stop_slave.inc

--connection master

CREATE TABLE t1 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);

# Create a group commit with INSERT and DELETE, in that order.

--connect (con1,127.0.0.1,root,,test,$SERVER_MYPORT_1,)

SET debug_sync='commit_after_release_LOCK_prepare_ordered SIGNAL master_queued1 WAIT_FOR master_cont1';

send INSERT INTO t1 VALUES (7, NULL);

--connection master

SET debug_sync='now WAIT_FOR master_queued1';

--connect (con2,127.0.0.1,root,,test,$SERVER_MYPORT_1,)

SET debug_sync='commit_after_release_LOCK_prepare_ordered SIGNAL master_queued2';

send DELETE FROM t1 WHERE b <= 3;

--connection master

SET debug_sync='now WAIT_FOR master_queued2';

SET debug_sync='now SIGNAL master_cont1';

--connection con1

REAP;

--connection con2

REAP;

SET debug_sync='RESET';

--save_master_pos

--connection slave

SET @old_parallel= @@GLOBAL.slave_parallel_threads;

SET GLOBAL slave_parallel_threads=16;

--source include/start_slave.inc

--sync_with_master

SELECT * FROM t1 ORDER BY a;

--source include/stop_slave.inc

SET GLOBAL slave_parallel_threads=@old_parallel;

--source include/start_slave.inc

--connection master

DROP TABLE t1;

--source include/rpl_end.inc

Attachments

Issue Links

relates to

MDEV-6063 overlocking next-record in REPEATABLE-READ with delete from where x <= num

Closed

MDEV-5863 "Lock wait timeout exceeded", lost events, Assertion `!table || !table->in_use || ta ble->in_use == _current_thd()' failure upon parallel replicating

Closed

MDEV-5941 Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR

Closed

Activity

Ascending order - Click to sort in descending order

Kristian Nielsen added a comment - 2014-03-21 14:48

I have pushed to 10.0 a patch that partially solves this. It runs the parallel slaves
in READ COMMITTED mode.

However, I will keep this bug open, as Jan has a potentially better patch
that more specifically relaxes gap locks just for the transactions run in
parallel. That may be the right long-term solution.

Kristian Nielsen added a comment - 2014-03-21 14:48 I have pushed to 10.0 a patch that partially solves this. It runs the parallel slaves in READ COMMITTED mode. However, I will keep this bug open, as Jan has a potentially better patch that more specifically relaxes gap locks just for the transactions run in parallel. That may be the right long-term solution.

Kristian Nielsen added a comment - 2014-04-11 13:35

See also the proposed solution to ~~MDEV-5941~~

Kristian Nielsen added a comment - 2014-04-11 13:35 See also the proposed solution to MDEV-5941

Sergei Golubchik added a comment - 2014-06-10 22:07

Was there any progress with the Jan's patch?

Sergei Golubchik added a comment - 2014-06-10 22:07 Was there any progress with the Jan's patch?

Kristian Nielsen added a comment - 2014-06-10 22:14

Yes, the last of the patch series that I gave to you for review, it reverts my temporary solution and instead includes a modified version of Jan's patch for a better solution.

Kristian Nielsen added a comment - 2014-06-10 22:14 Yes, the last of the patch series that I gave to you for review, it reverts my temporary solution and instead includes a modified version of Jan's patch for a better solution.

Kristian Nielsen added a comment - 2014-06-27 14:47

In fact, the temporary patch with READ COMMITTED is incorrect and not safe.
If we have transactions like:

T1: INSERT INTO t1 VALUES (1);
T2: INSERT INTO t2 SELECT * FROM t1;

Then it is not safe to use READ COMMITTED. Because once T1 starts to commit,
T2 is allowed to run. But T2 cannot see the changes of T1 until T1 has
finished committing. And in READ COMMITTED, T2 will not wait on the row lock
set by t1.

This can be solved with the Jan's patch. The transactions will not be
replicated with READ COMMITTED. Instead, we will only relax locks between
transactions that group committed together on the master. But T1 and T2 in the
example above conflict with each other on the inserted row, so they would not
be able to group commit together.

Kristian Nielsen added a comment - 2014-06-27 14:47 In fact, the temporary patch with READ COMMITTED is incorrect and not safe. If we have transactions like: T1: INSERT INTO t1 VALUES (1); T2: INSERT INTO t2 SELECT * FROM t1; Then it is not safe to use READ COMMITTED. Because once T1 starts to commit, T2 is allowed to run. But T2 cannot see the changes of T1 until T1 has finished committing. And in READ COMMITTED, T2 will not wait on the row lock set by t1. This can be solved with the Jan's patch. The transactions will not be replicated with READ COMMITTED. Instead, we will only relax locks between transactions that group committed together on the master. But T1 and T2 in the example above conflict with each other on the inserted row, so they would not be able to group commit together.

Patryk Pomykalski added a comment - 2014-06-27 15:01

Isn't read committed always replicated in row format?

Patryk Pomykalski added a comment - 2014-06-27 15:01 Isn't read committed always replicated in row format?

Kristian Nielsen added a comment - 2014-06-27 16:12

> Isn't read committed always replicated in row format?

That's not the issue.

The issue is when transactions run in repeatable read on the master in statement mode. On the slave, we want to avoid deadlocks between transactions that we already know ran in parallel on the master and thus are known to not conflict with each other. We can do that by relaxing lock waits between only those transactions.

There is a temporary patch that does this on the slave by using READ COMMITTED. This is incorrect, and will be replaced with a proper solution. Patch is just waiting for review.

Kristian Nielsen added a comment - 2014-06-27 16:12 > Isn't read committed always replicated in row format? That's not the issue. The issue is when transactions run in repeatable read on the master in statement mode. On the slave, we want to avoid deadlocks between transactions that we already know ran in parallel on the master and thus are known to not conflict with each other. We can do that by relaxing lock waits between only those transactions. There is a temporary patch that does this on the slave by using READ COMMITTED. This is incorrect, and will be replaced with a proper solution. Patch is just waiting for review.

Kristian Nielsen added a comment - 2014-07-11 13:09

Pushed to 10.0.13.

Kristian Nielsen added a comment - 2014-07-11 13:09 Pushed to 10.0.13.

People

Assignee:: Kristian Nielsen

Reporter:: Kristian Nielsen

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2014-03-20 16:15

Updated:: 2014-07-11 13:09

Resolved:: 2014-07-11 13:09

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server