[MDEV-5941] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR
| Created: | 2014-03-24 | Updated: | 2014-07-11 | Resolved: | 2014-07-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.10 |
| Fix Version/s: | 10.0.13 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Elena Stepanova | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | parallelslave, replication |
| Attachments: |
|
| Issue Links: |
|
| Description |
|
The timeout error was encountered while running a concurrent test (12 threads on master) with SBR replication using GTID, and slave-parallel-threads=10.
The test mainly executes DML, but also involves DDL and FLUSH LOGS. The slave is not restarted during the test (neither the server nor the logical slave). I encountered the same error several more times later, so it is not a one-off, but it is sporadic and not easily reproducible. Also, I could not reproduce it by feeding the same master binary logs to a clean slave, so it seems to be a true race condition. The following logs are attached:
Server command lines (although semisync plugins are there, they were not available at the time of server startup, so the error of not finding them can be seen in the slave error log):
|
| Comments |
| Comment by Kristian Nielsen [ 2014-03-27 ] |
|
It seems to be these two statements that conflict with one another; these
Here is the content of the table after replaying the binlogs up to this point:
| Comment by Kristian Nielsen [ 2014-03-27 ] |
|
The problem is probably caused by different execution plan on slave compared

The first update can apparently use two different execution plans. One that

When using the index, the first UPDATE does not conflict with the second,

However, when the index is not used, there is conflict with the second UPDATE,

This can be seen by adding IGNORE INDEX (col_int_key) to the first UPDATE; in

Unfortunately, this is a very serious issue, and I am not sure how it could be

I am not sure that this requires using unsafe statements to trigger (like the
| Comment by Elena Stepanova [ 2014-03-27 ] |
|
As a side note, these UPDATEs are probably not truly unsafe statements; they look more like statements that are currently marked unsafe because it is easier to do so. Their ORDER BY clauses include, among other fields, pk, so the LIMIT should be deterministic and hence safe.
| Comment by Jonas Oreland [ 2014-04-04 ] |
|
Hi Kristian,

May I propose a solution to this problem (that I think might be useful also for a different scenario).

Add counters so that the DBA can see if this is a frequently occurring scenario.

(one can also imagine having more heuristics for how to handle the parallelism assuming

---

Another scenario where a framework like above would be useful is to allow for

Way of guessing could be

What do you think?

/Jonas
| Comment by Kristian Nielsen [ 2014-04-08 ] |
|
Hi Jonas,

> May I propose a solution to this problem (that I think might be useful also for a different scenario).

The main problem is the need for a timeout. This timeout needs to be small

However, we could actually detect the deadlock immediately without

I think this is an interesting solution worth pursuing. I believe this is also

> Another scenario where a framework like above would be useful is to allow for

Yes, I agree. Since we already know the desired commit order, we can just

It is an interesting idea for sure. Maybe the time to try it out is getting

- Kristian.
| Comment by Kristian Nielsen [ 2014-04-11 ] |
|
So here is how I think this bug should be fixed:

1. First,

2. Second, when InnoDB is about to let transaction T1 wait for a lock owned by

This should solve this and other similar deadlocks due to the enforced commit
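A minimal sketch of the kind of check step 2 points at, written in Python purely for illustration. The actual fix lives in the server and InnoDB, and none of the names below are real MariaDB or InnoDB identifiers. It assumes each replicated transaction carries its position in the master's commit order, and that the storage engine reports a lock wait before blocking. What happens once the conflict is detected (here, the lock holder is selected to be rolled back and retried) is an assumption, since the comment text is cut off at that point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    """Hypothetical stand-in for a transaction applied by a parallel slave worker."""
    name: str
    commit_seq: int  # position in the master's commit order; lower must commit first


def must_commit_before(first: Txn, second: Txn) -> bool:
    """True if the enforced commit order requires `first` to commit before `second`."""
    return first.commit_seq < second.commit_seq


def on_lock_wait(waiter: Txn, holder: Txn) -> Optional[Txn]:
    """Hypothetical hook, called when `waiter` is about to block on a lock held by `holder`.

    If `waiter` has to commit before `holder`, then `holder` can never commit
    (and so never release the lock) while `waiter` is blocked: the lock wait
    plus the enforced commit order form a cycle. That cycle is known at the
    moment the wait starts, so no timeout is needed to detect it.

    Returns the transaction to roll back and retry (an assumed resolution),
    or None when ordinary waiting cannot deadlock against the commit order.
    """
    if must_commit_before(waiter, holder):
        return holder
    return None


# Usage: T1 precedes T2 in the master's commit order.
t1 = Txn("T1", commit_seq=1)
t2 = Txn("T2", commit_seq=2)

victim = on_lock_wait(waiter=t1, holder=t2)   # T1 blocked by the later T2: conflict
no_victim = on_lock_wait(waiter=t2, holder=t1)  # T2 waiting for the earlier T1: safe
```

In this sketch the later transaction is always the one chosen as victim, which keeps the earlier transaction (the one the commit order is waiting on) making progress.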
| Comment by Kristian Nielsen [ 2014-06-10 ] |
|
Temporarily reassigned to Serg for review.
| Comment by Kristian Nielsen [ 2014-07-11 ] |
|
Pushed to 10.0.13. |