Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Can result in data loss
Description
Hi,
From discussing a customer case with Elkin we found that there is a missing feature in replication. With the GTID replication feature _gtid_slave_pos_, the slave itself is now crash-safe. But what about the slaves of that slave? And what if the binary log is used for point-in-time recovery? For example, Zmanda uses this point-in-time recovery for incremental backups.
Imagine a slave with sync_binlog=0 and innodb_flush_log_at_trx_commit=0 that suffers a power outage. It is possible that gtid_slave_pos says position 12 while the binlog only says seqno 10. When the server is restarted, the slave will log in to the master and resume the stream from seqno 12. This means transaction 11 is never written to the slave's binlog.
The other way around is also possible: the binlog may say seqno 12 while gtid_slave_pos says position 10. This means seqno 11 ends up written to the binlog twice.
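To make the divergence concrete, here is a minimal inspection sketch using only the existing gtid_slave_pos and gtid_binlog_pos system variables. The example values in the comments are illustrative assumptions, not output from an actual case; this only shows how one might observe the mismatch after an unclean restart, not a recovery mechanism.
{code:sql}
-- Minimal sketch: on a slave that writes its own binlog (log_slave_updates),
-- compare the GTID position recorded in the storage engine with the one in
-- the binlog after an unclean restart:
SELECT @@GLOBAL.gtid_slave_pos  AS engine_pos,   -- e.g. 0-1-12
       @@GLOBAL.gtid_binlog_pos AS binlog_pos;   -- e.g. 0-1-10
-- If engine_pos is ahead of binlog_pos, the missing seqnos were committed to
-- the storage engine but never written to the binlog (lost for downstream
-- slaves and for point-in-time recovery).
-- If binlog_pos is ahead of engine_pos, those seqnos will be fetched and
-- written to the binlog a second time when replication resumes.
{code}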
This could be avoided with two features.
Feature 1:
If the storage engine's GTID position is higher than the binlog's, start replicating from the binlog GTID position, but do not execute the events (only write them to the binlog, like replication into a blackhole storage engine) until the position in gtid_slave_pos is reached.
Feature 2:
If the binlog's GTID position is higher than the storage engine's, execute that number of transactions in the storage engine but do not write them to the binlog.
It may also be useful to make these two features user-initiatable, for example in the same way that sql_slave_skip_counter works.
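For illustration only, a user-initiated variant could look like the sketch below, modelled on sql_slave_skip_counter. The variable names sql_slave_binlog_only_counter and sql_slave_skip_binlog_counter do not exist in MariaDB; they are purely hypothetical placeholders for what such knobs might look like.
{code:sql}
STOP SLAVE;
-- Feature 1 (hypothetical variable): write the next 2 events to the binlog
-- without applying them to the storage engine.
SET GLOBAL sql_slave_binlog_only_counter = 2;
-- Feature 2 (hypothetical variable): apply the next 2 events to the storage
-- engine without writing them to the binlog.
-- SET GLOBAL sql_slave_skip_binlog_counter = 2;
START SLAVE;
{code}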
Thank you!
Attachments
Issue Links
- is part of MDEV-34705 Improving performance of binary logging by removing the need of syncing it (In Progress)