[MDEV-10863] parallel replication tries to continue from wrong position after stopping and restarting slave threads Created: 2016-09-21 Updated: 2020-08-25 Resolved: 2016-11-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.0.26 |
| Fix Version/s: | 10.0.29, 10.1.20 |
| Type: | Bug | Priority: | Major |
| Reporter: | Hartmut Holzgraefe | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | parallelslave, replication | ||
| Environment: |
linux |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file. How to reproduce: Master config:
Slave config:
Create replication user on the master:
Start slave:
On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)
Then run several instances of it in parallel:
Now on the slave run this script to repeatedly stop and restart the SQL thread:
This usually fails with a duplicate key error after a small number of iterations already |
| Comments |
| Comment by Hartmut Holzgraefe [ 2016-09-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We assumed that this might already have been fixed with http://lists.askmonty.org/pipermail/commits/2016-January/008828.html but it turns out that it isn't ... | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-09-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Full gtid_% variables on slave after running into duplicate key error with attached logs:
Already replicated tables and their sizes:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-09-22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The correct position to continue from in hartmut-3510-relay-bin.000038 would have been 81695:
which matches the information for domain 2 in gtid_current_pos / gtid_slave_pos which shows GTID 2-1-22 as last applied for this domain. I don't fully understand the code behind this yet, but my blind educated guess is that the last applied transaction GTID 2-1-22 is actually being searched for first, and only then 2-1-23, and as the beginning event of transaction 2-1-22 was in the previous, already purged, binlog hartmut-3510-relay-bin.000037, it can't be found in the current binlog and so the position is never set correctly, instead just remains at the beginning of .000038, and replication wrongfully continues from there. (Or at least mysqlbinlog doesn't show any trace of GTID 2-1-22, not sure whether its output is complete in that respect though) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is a summary I wrote elsewhere last week. Unfortunately recent tests have shown that the proposed cause in the first paragraph can't be the complete story. More about this in the next comment ... — Original Text — When a Transactions is being spread across a relay log border Now, as far as I can tell, we have two ways to solve this: 1) Change relay log copying logic so that transactions will It will require a) some extra logic in the IO thread to be 2) Change repositioning logic. Right now it simply repositions So by extending the repositioning logic to not simply rewind to Then take the position of this event as the new start position. 3) Change GTID filtering logic to completely ignore binlog/relaylog | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-11-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Further testing has shown that shown that the "how to reproduce" instructions from above only lead to duplicate key errors on sufficiently capable hardware. E.g. I can reproduce this very quickly when replicating from my desktop (6 cores) to my laptop (4 cores), or when replicating between my two public web servers (8 cores each – or actually 4 real ones plus hyper threading), When trying the same steps, but using two Virtualbox VMs as master and slave on my 6 core machine, and giving each VM 3 virtual cores, the test can run for hours though without failing at all. So this doesn't only seem to be about relay log splitting at the "wrong" place, but there also seems to be some race condition that needs to be hit ... | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2016-11-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ouch, as Hartmut says this is the following code in handle_slave_sql():
So it always restarts at the beginning of a relay log, which can easily be Probably proposed solution (3) should be the right fix. It would be nice to | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-11-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Kristian: do you have an idea though why this only seems to be reproducible on sufficiently powerful hardware (e.g. only when replicating between real machines, not between Virtualbox VMs?) If it were only about relay log being split in the middle of an event group I'd expect it to happen regardless of "machine power", and just being related to event group sizes vs. max relaylog size? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2016-11-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It's hard to say why you could not reproduce, your test setup reproduced Here is a mysql-test-run testcase that seems to reproduce it reliably.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2016-11-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fix: http://lists.askmonty.org/pipermail/commits/2016-November/010061.html Lixun, do you want to review the patch before I push it into 10.0?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-11-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Looks good, I ran my test setup for almost two days without any replication breaks | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Lixun Peng [ 2016-11-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
knielsen I'm not sure if this logic always be right under parallels replication. Do you think keep the previous relay log file is a better way? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2016-11-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
> I checked your patch, the main idea is ignore the first part without a The way the code works is, it scans over the earlier GTIDs in the relay log But the code starts scanning at the beginning of a relay log, which can be We can be sure that any GTID from an earlier relay log was already applied, An alternative might be to keep track of the position of the first GTID (or > I'm not sure if this logic always be right under parallels replication. Yes. This code which handles restarting the SQL thread is really I think the better overall approach would be that the SQL thread was able to (And it would still be necessary to handle partiel event groups at the start > Do you think keep the previous relay log file is a better way? I mean, if I don't see how this will help. There might still be another earlier Thanks,
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Lixun Peng [ 2016-11-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Kritian, I think you can merge your code into the master branch. Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2016-11-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
> I think you can merge your code into the master branch. Ok, will do.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kristian Nielsen [ 2016-11-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Pushed to 10.0. Thanks to Hartmut for an excellent bug report. Detailed analysis, and a | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2016-11-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Actually most of it was originally done by Jean-François, the only part really done by me was the PHP generator script that made hitting the situation described by J-F much more likely ... |