Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The current semi-sync replication logic can leave slaves out of sync with their master if the master crashes while configured with rpl_semi_sync_master_wait_point=AFTER_SYNC. That is, there may be transactions which were written to the binary log but not committed in the storage engine. MDEV-21117 added logic to truncate the binary log when these transactions are rolled back during recovery. This truncation can only be used when it is known ahead of time (i.e. before starting the server back up) that the server will recover as a slave (specified via --init-rpl-role=SLAVE, MDEV-33465). This is fairly complicated for users to understand, and still allows them to end up with inconsistent servers if they aren't configured to handle such situations (or can't dynamically change configuration options).
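To make the failure mode concrete, here is a minimal toy model of the two recovery paths, not the server's actual recovery code. The function `recover`, the GTID strings, and the list/set representation of the binary log and storage engine are all hypothetical, and the model assumes that default recovery commits every transaction it finds in the binary log.

```python
def recover(binlog, engine_committed, as_slave):
    """Toy crash recovery; returns (binlog, engine_committed) after restart.

    binlog: GTIDs in binary-log order (hypothetical representation).
    engine_committed: GTIDs committed in the storage engine before the crash.
    """
    if not as_slave:
        # Assumed default: every transaction found in the binlog is committed.
        return list(binlog), set(engine_committed) | set(binlog)
    # Recovering as a slave: roll back the uncommitted tail and truncate the
    # binlog at the first transaction the engine never committed.
    for i, gtid in enumerate(binlog):
        if gtid not in engine_committed:
            return list(binlog[:i]), set(engine_committed)
    return list(binlog), set(engine_committed)


# Crash under AFTER_SYNC: 0-1-3 reached the binlog, but was neither
# committed in the engine nor acknowledged by the slave.
old_master_binlog = ["0-1-1", "0-1-2", "0-1-3"]
old_master_engine = {"0-1-1", "0-1-2"}
promoted_slave_binlog = ["0-1-1", "0-1-2"]

rec_master = recover(old_master_binlog, old_master_engine, as_slave=False)
rec_slave = recover(old_master_binlog, old_master_engine, as_slave=True)

print(rec_master[0] == promoted_slave_binlog)  # False: old master has an extra transaction
print(rec_slave[0] == promoted_slave_binlog)   # True: truncation restores consistency
```

In this model the outcome depends entirely on the `as_slave` flag, which mirrors the requirement that the recovery role be known before the server is restarted.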
Alternatively, we can add another value for rpl_semi_sync_master_wait_point, AFTER_PREPARE, which would send a transaction to the slave before writing it to the binary log (but still with the correct GTID). This would remove the need to truncate the binary log altogether, and would establish a clear rule: if a semi-sync master fails, it must re-join the topology as a slave.
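The difference in ordering can be sketched with a small model that enumerates crash points. This is only an illustration under stated assumptions: the step names, the `ORDER` table, and both helper functions are hypothetical, and the AFTER_PREPARE ordering (in particular, waiting for the slave's acknowledgement before the binary log write) is one reading of the proposal, not a settled design.

```python
# Per-transaction step order on the master for each wait point (toy model).
# "slave_ack" means the slave has acknowledged receipt of the transaction.
ORDER = {
    "AFTER_SYNC": ["engine_prepare", "binlog_write", "slave_ack", "engine_commit"],
    # Hypothetical ordering for the proposed value:
    "AFTER_PREPARE": ["engine_prepare", "slave_ack", "binlog_write", "engine_commit"],
}


def crash_states(wait_point):
    """Return (last_step, in_master_binlog, acked_by_slave) for a crash after each step."""
    steps = ORDER[wait_point]
    states = []
    for i, step in enumerate(steps):
        done = set(steps[: i + 1])
        states.append((step, "binlog_write" in done, "slave_ack" in done))
    return states


def needs_truncation(wait_point):
    """True if some crash point leaves a binlogged transaction the slave never acknowledged."""
    return any(in_binlog and not acked
               for _, in_binlog, acked in crash_states(wait_point))


print(needs_truncation("AFTER_SYNC"))     # True
print(needs_truncation("AFTER_PREPARE"))  # False
```

Under the assumed AFTER_PREPARE ordering, a crash can leave the master behind the slave but never ahead of it, which is why the failed master would always re-join as a slave.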
Issue Links
- relates to
  - MDEV-11855 Make semisync crash safe with the cluster (Open)
  - MDEV-37604 Semi-sync Replication Slave ACK at Commit (Open)
  - MDEV-21117 refine the server binlog-based recovery for semisync (Closed)
  - MDEV-33465 an option to enable semisync recovery (Closed)
  - MDEV-34878 Semisync recovery mode (Needs Feedback)