[MDEV-5914] Parallel replication deadlock due to InnoDB lock conflicts Created: 2014-03-20 Updated: 2014-07-11 Resolved: 2014-07-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.9 |
| Fix Version/s: | 10.0.13 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Kristian Nielsen | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | parallelslave, replication |
| Issue Links: |
|
| Description |
|
The fundamental assumption with parallel replication is the following: If two
Unfortunately, this assumption turns out to be invalid. Consider this table and two transactions T1, T2:
If T1 runs first and then T2, there is no blocking, and they can group commit
Thus, the bug is when they run in T1,T2 order on the master, they group commit,
Another example of this is with an UPDATE and a DELETE:
Two possible solutions are being considered:
1. Run the slaves in READ COMMITTED mode. This however means that binlog may
2. Modify InnoDB locking so that two transactions that run in parallel due to
This bug is one of the problems reported in
Here is a test case. It may need to be run multiple times to trigger the error:
|
| Comments |
| Comment by Kristian Nielsen [ 2014-03-21 ] |
|
I have pushed to 10.0 a patch that partially solves this. It runs the parallel slaves
However, I will keep this bug open, as Jan has a potentially better patch |
| Comment by Kristian Nielsen [ 2014-04-11 ] |
|
See also the proposed solution to |
| Comment by Sergei Golubchik [ 2014-06-10 ] |
|
Was there any progress with Jan's patch? |
| Comment by Kristian Nielsen [ 2014-06-10 ] |
|
Yes. The last patch in the series that I gave you for review reverts my temporary solution and instead includes a modified version of Jan's patch as a better solution. |
| Comment by Kristian Nielsen [ 2014-06-27 ] |
|
In fact, the temporary patch with READ COMMITTED is incorrect and not safe.
T1: INSERT INTO t1 VALUES (1);
Then it is not safe to use READ COMMITTED. Because once T1 starts to commit,
This can be solved with Jan's patch. The transactions will not be |
| Comment by Patryk Pomykalski [ 2014-06-27 ] |
|
Isn't read committed always replicated in row format? |
| Comment by Kristian Nielsen [ 2014-06-27 ] |
|
> Isn't read committed always replicated in row format?
That's not the issue. The issue arises when transactions run in REPEATABLE READ on the master in statement mode.
On the slave, we want to avoid deadlocks between transactions that we already know ran in parallel on the master and thus are known not to conflict with each other. We can do that by relaxing lock waits between only those transactions.
There is a temporary patch that does this on the slave by using READ COMMITTED. This is incorrect, and it will be replaced with a proper solution. The patch is just waiting for review. |
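The idea of relaxing lock waits only between transactions known to have run in parallel on the master can be illustrated with a small toy model. This is a sketch under stated assumptions, not the InnoDB or replication code from the patch: the `SlaveTrx` class, its `commit_group` field, and the `must_wait` function are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaveTrx:
    """Hypothetical model of a transaction being applied on the slave."""
    name: str
    # Hypothetical id of the master group commit this transaction belonged to.
    # None means the transaction is not known to have group committed with others.
    commit_group: Optional[int]


def must_wait(requester: SlaveTrx, holder: SlaveTrx) -> bool:
    """Decide whether `requester` blocks on a lock held by `holder`.

    Toy rule: transactions from the same master group commit are already
    known not to conflict, so the wait between them is relaxed. Every other
    pair keeps ordinary lock-wait behaviour.
    """
    same_group = (
        requester.commit_group is not None
        and requester.commit_group == holder.commit_group
    )
    return not same_group


if __name__ == "__main__":
    t1 = SlaveTrx("T1", commit_group=7)
    t2 = SlaveTrx("T2", commit_group=7)
    t3 = SlaveTrx("T3", commit_group=8)
    for req, hold in [(t1, t2), (t3, t1)]:
        print(req.name, "waits for", hold.name, "->", must_wait(req, hold))
```

The point of the model is that the relaxation is scoped: unlike switching the whole slave to READ COMMITTED, only pairs from the same group commit skip the wait, while unrelated transactions still block on each other as usual.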
| Comment by Kristian Nielsen [ 2014-07-11 ] |
|
Pushed to 10.0.13. |