[MDEV-4725] GTID strict mode does not allow slave to continue replicating after crash which happened during writing event group Created: 2013-06-27 Updated: 2013-11-21 Resolved: 2013-11-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 10.0.7 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Elena Stepanova | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
If a slave crashes while writing an event group to its binary log, specifically after writing the GTID event but before finishing the Xid event, and gtid_strict_mode is enabled, the slave cannot resume replication after restart. It aborts with the error "An attempt was made to binlog GTID X-Y-Z which would create an out-of-order sequence number with existing GTID X-Y-Z, and gtid strict mode is enabled".

Note: This is a follow-up on the failure that we discussed earlier on IRC. I can now positively confirm that the failure I observed also happened upon a slave crash. As you suggested, I was able to reproduce it with "crash_before_writing_xid" (see the test case below). In my case the picture was slightly different: the slave crashed one step earlier, right after writing the GTID event and before writing anything else. I added a debug crash point there and it causes the same problem, so I suppose crash_before_writing_xid will do just as well.

Test case:
bzr version-info
|
| Comments |
| Comment by Elena Stepanova [ 2013-06-28 ] |
|
Forgot to quote your analysis from IRC: "the bug seems to be the following: We crash in the middle of an event group. Then during crash recovery, we scan the binlog to collect all logged GTIDs in the crashed binlog, to put in GTID_LIST in the next log. But the code does not correctly handle that a partial event group should not be used" |
| Comment by Kristian Nielsen [ 2013-11-21 ] |
|
Pushed to 10.0-base. Make sure that we only recover binlog state from fully written event groups in the binlog, not from any partial group written at the end of the log just before crashing. |
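The idea behind the fix can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the server code: the `Event` type, the event kinds "GTID", "QUERY" and "XID", and the functions `recover_binlog_state` and `strict_mode_allows` are all hypothetical names, and every event group is assumed to start with a GTID event and end with an Xid event. The `skip_partial=False` path mimics the behaviour described in the analysis above, where a GTID from a partial group ends up in the recovered state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# (domain_id, server_id, seq_no); a simplified stand-in for a GTID.
Gtid = Tuple[int, int, int]


@dataclass
class Event:
    kind: str                    # "GTID", "QUERY" or "XID" (hypothetical kinds)
    gtid: Optional[Gtid] = None  # only set on "GTID" events


def recover_binlog_state(events: Iterable[Event],
                         skip_partial: bool = True) -> Dict[int, Gtid]:
    """Scan a crashed binlog and return the last GTID seen per domain.

    With skip_partial=True, a GTID is recorded only once the Xid event
    that closes its group has been seen. With skip_partial=False the
    GTID is recorded as soon as it is read, which models the bug.
    """
    state: Dict[int, Gtid] = {}
    pending: Optional[Gtid] = None
    for ev in events:
        if ev.kind == "GTID":
            pending = ev.gtid
            if not skip_partial and pending is not None:
                state[pending[0]] = pending
        elif ev.kind == "XID" and pending is not None:
            # The group is complete, so its GTID counts as logged.
            state[pending[0]] = pending
            pending = None
    return state


def strict_mode_allows(state: Dict[int, Gtid], gtid: Gtid) -> bool:
    """Model of the strict-mode check: within a domain, the new sequence
    number must be greater than the last one recorded."""
    last = state.get(gtid[0])
    return last is None or gtid[2] > last[2]


# One complete group (0-1-1), then a crash right after the GTID event of 0-1-2.
crashed_log = [
    Event("GTID", (0, 1, 1)), Event("QUERY"), Event("XID"),
    Event("GTID", (0, 1, 2)), Event("QUERY"),
]

fixed_state = recover_binlog_state(crashed_log, skip_partial=True)
buggy_state = recover_binlog_state(crashed_log, skip_partial=False)
```

In this model the slave re-applies 0-1-2 after restart. Against `fixed_state` the strict-mode check passes, while against `buggy_state` the same GTID is already recorded and the check fails, which corresponds to the out-of-order error in the description.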