[MDEV-6462] Slave replicating using GTID doesn't recover correctly when master crashes in the middle of transaction Created: 2014-07-19 Updated: 2014-09-03 Resolved: 2014-09-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.12 |
| Fix Version/s: | 10.0.14 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Pavel Ivanov | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | gtid | ||
| Attachments: |
|
| Description |
|
When master crashes in the middle of writing a multi-statement transaction to binlog so that slave has already received some events from this transaction, then IO thread reconnects to the restarted master and thinks that it will re-download the same binlog events. But master will actually either not send anything (if no new transactions has executed), or will send completely different events from new transactions which will result in completely different data on the slave compared to data on the master. I'd bet the root cause of the problem is in how IO thread reconnects when GTID-based replication is turned on, and in these few lines of code starting at sql/slave.cc:5310:
In the scenario I described above SQL thread actually must roll back the active transaction. In the attachment is the patch that allows to emulate this scenario. Apply it, run rpl_gtid_crash test and look at the results of last two SELECTs – they will be different on master and slave. I will look into a way to fix this problem myself, but will appreciate any help. I'll attach a patch if I manage to find a fix before anyone on MariaDB side. |
| Comments |
| Comment by Pavel Ivanov [ 2014-07-21 ] | |||||||
|
Attaching a proposed fix together with the revised and expanded test case. The fix seem to work well, but please check if it's appropriate enough for non-standard use cases, e.g. for parallel slave. | |||||||
| Comment by Pavel Ivanov [ 2014-07-24 ] | |||||||
|
Attaching the fix that is modified a little bit after internal review. | |||||||
| Comment by Kristian Nielsen [ 2014-08-19 ] | |||||||
|
Ok, I'm back from vacation and got to this issue. There are actually more bugs related to when a master crashes in the middle of I suspect that this is poorly tested, and there may be other bugs lurking I looked at MySQL 5.6, they took a different approach. During binlog recovery What do you think of this approach? It might make things more robust that (Even with this approach, I think something like your suggested patch could | |||||||
| Comment by Kristian Nielsen [ 2014-08-19 ] | |||||||
|
BTW, I think it's not actually possible for the slave to see any of the However, the bug would still be possible to trigger in real life. Because a | |||||||
| Comment by Kristian Nielsen [ 2014-08-19 ] | |||||||
|
The other bug I mentioned is | |||||||
| Comment by Pavel Ivanov [ 2014-08-19 ] | |||||||
|
When I looked at the code I had an impression that LOCK_log is held only when binlog dumping threads sleep waiting for new events to arrive. But if the dumping thread is actively reading binlog file without stopping then it can easily jump from the previous event group to the next incomplete event group if that one has already partially written (probably that event group should be big so that it's not fully cached in the IO_CACHE during writing). So it's still possible for the slave to receive partial event group before master has crashed. Did I miss something? So I'd claim that binlog truncation won't solve this bug completely, but could be a useful addition anyway. | |||||||
| Comment by Kristian Nielsen [ 2014-08-19 ] | |||||||
|
Hm, interesting... Well, I see this code in mysql_binlog_send():
And this in Log_event::read_log_event():
So it looks like LOCK_log is held while reading each event to be sent to the However, if my understanding is correct, that's actually something that I And in any case, I agree that both approaches are good simultaneously - | |||||||
| Comment by Pavel Ivanov [ 2014-08-19 ] | |||||||
|
Ah, I didn't notice such indirect locking. Looks weird though as to why it's necessary at all... | |||||||
| Comment by Kristian Nielsen [ 2014-08-26 ] | |||||||
|
Hm, I think I found an issue with the patch. If I understand the patch correctly, it works like this: At I/O thread reconnect, it looks at the file and position of the initial fake rotate event sent from the master. If these are different from the position of the GTID of a partial event group prior to the reconnect, a warning is output and a fake format description event is created to make the SQL thread roll back the partial event group. But the position of that fake rotate event will always be 4, so it will never be equal to the prior GTID position - if my reading of the code is correct. So it seems the effect of the patch is to always roll back and re-queue a partial event group in case of reconnect? In fact, the test case rpl.rpl_gtid_reconnect fails because of this: rpl.rpl_gtid_reconnect 'mix,xtradb' [ fail ] Found warnings/errors in server log file! In fact, for InnoDB, always rolling back and re-queueing seems like a reasonable approach. It should also work for most DDL (not sure about CREATE ... SELECT in row-based mode). But it can cause replication failure for MyISAM, I think. (Not the first time things would have been easier if a transactional storage engine could be assumed...) | |||||||
| Comment by Kristian Nielsen [ 2014-08-26 ] | |||||||
|
An easy fix might be to compare only the file name, not the position? In case of crash, the file name has changed. In case of normal reconnect (no master crash), the file name will be the same. | |||||||
| Comment by Pavel Ivanov [ 2014-08-26 ] | |||||||
|
Hm... When I wrote that I indeed was worried that first rotate event will have position 4 and won't point to the GTID event. But I think I was convinced somehow that it's not true and it points to GTID event. I don't remember now what made me think so. If you are sure that it always has position 4 then I guess we don't have other choice but only check the file name. But this check won't be as reliable. E.g. when master crashes, then its IP:port is routed to a new master and that server has different transaction at the same file, but different position. Can we do something better than just file name check? | |||||||
| Comment by Kristian Nielsen [ 2014-08-27 ] | |||||||
|
We can detect silent change of master server by checking if the server_id of Maybe something like this: 1. If during a reconnect we see that the server_id of the master changes, we 2. On a reconnect where the server_id of the master is unchanged, it seems I think this should be safe for all transactional events. I suppose (1) could | |||||||
| Comment by Pavel Ivanov [ 2014-08-27 ] | |||||||
|
Yes, the approach of checking server_id and then file name sounds reasonable to me. | |||||||
| Comment by Kristian Nielsen [ 2014-09-02 ] | |||||||
|
I committed a patch for this that implements what is discussed above: http://lists.askmonty.org/pipermail/commits/2014-September/006478.html Monty, do you think you could review the patch? | |||||||
| Comment by Michael Widenius [ 2014-09-02 ] | |||||||
|
Have now review it and it looks ok. | |||||||
| Comment by Michael Widenius [ 2014-09-02 ] | |||||||
|
Have now review it and it looks ok. | |||||||
| Comment by Michael Widenius [ 2014-09-02 ] | |||||||
|
ok to push | |||||||
| Comment by Kristian Nielsen [ 2014-09-03 ] | |||||||
|
Pushed to 10.0.14. |