[MDEV-11855] Make semisync crash safe with the cluster Created: 2017-01-20 Updated: 2021-09-29 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | VAROQUI Stephane | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | semisync, upstream | ||
| Issue Links: |
|
| Description |
|
Semisync states, and is documented as guaranteeing, that the slave is up to date with the master within some predefined delay. This is not physically true: the ACK is done after SYNC or after COMMIT, so extra transactions can be in the binlog even though the client has never seen them. What is true is that no transaction is committed until it has reached a slave, but a crash can leave the old master in a state where it needs to be restored from the cluster.

PERSISTENT ACK

The sync would be prepared with the group commit and stored inside an InnoDB system table, with the expected GTID inside a master system table, and the ACK would confirm that the system table entry was received before commit. During crash recovery, all transactions missing the acknowledgement would be rolled back.

PUSH MODEL

The semisync master plugin would inject into a Spider table linked to a relay log system table on every node of the cluster and, based on the number of successes, would assign the "in sync" status to the replication. This loses the "first ACK wins" behaviour but brings back true crash-safe capabilities.

The semisync slave plugin would select which queue to apply: in sync mode it reads the relay log from the system table, in async mode from the binlog.

A remote failure would commit anyway, as there is no reason for the local Spider system table not to succeed. Only Spider monitoring would tell us that remote slaves are down and that replication should be switched to unsynced. Coming back to the sync state would reset the optimistic Spider table status to make another attempt, by resetting the state of the slave inside the local Spider table.

mysql_sandbox5012-bin.000001 648803 Gtid 5054 648841 BEGIN GTID 0-5054-4185

Crash 1: followed by crash recovery, this would roll back an unfinished transaction on master restart.
Crash 2, in AFTER SYNC: the client has received "trx OK" but the slave may not have received the XID commit.
Crash in 2 |
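The two decisions proposed above can be illustrated with a small sketch. This is not MariaDB code: `PreparedTrx`, `recover`, `replication_status` and the `required_successes` threshold are hypothetical names introduced for illustration only, the GTID `0-5054-4184` is made up next to the `0-5054-4185` from the binlog excerpt, and the sketch assumes that a persisted ACK can be read back per GTID after a crash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreparedTrx:
    """A transaction found in prepared state during crash recovery (hypothetical model)."""
    gtid: str    # for example "0-5054-4185"
    acked: bool  # True if a slave ACK was persisted before the crash


def recover(prepared):
    """PERSISTENT ACK idea: commit acknowledged transactions, roll back the rest.

    Returns a (commit, rollback) pair of GTID lists.
    """
    commit = [t.gtid for t in prepared if t.acked]
    rollback = [t.gtid for t in prepared if not t.acked]
    return commit, rollback


def replication_status(push_results, required_successes=1):
    """PUSH MODEL idea: derive the replication status from the number of
    nodes whose relay log table accepted the pushed events.

    push_results maps a node name to True (push succeeded) or False.
    required_successes is an assumed threshold, not taken from the ticket.
    """
    successes = sum(1 for ok in push_results.values() if ok)
    return "sync" if successes >= required_successes else "async"


if __name__ == "__main__":
    prepared = [
        PreparedTrx("0-5054-4184", acked=True),
        PreparedTrx("0-5054-4185", acked=False),
    ]
    print(recover(prepared))
    print(replication_status({"node2": True, "node3": False}))
```

With this model, a master that crashes between writing the GTID event and receiving the ACK would roll the transaction back on restart, instead of keeping a binlog entry that no slave has.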
| Comments |
| Comment by Valerii Kravchuk [ 2020-04-28 ] |
|
See also this recent upstream MySQL bug report: |