MariaDB Server / MDEV-11855

Make semisync crash safe with the cluster




      Semisync states, and its documentation says, that the slave is up to date with the master within some predefined delay. This is not physically true: because the ACK is done after SYNC or after COMMIT, the master's binlog can contain extra transactions, even though the client has never seen them. What is true is that no transaction is committed until it has reached a slave, but a crash can leave the old master in a state where it needs to be restored from the cluster.


      The sync would be prepared with the group commit: the expected GTID would be stored in a master system table inside InnoDB, and the ACK would acknowledge that the system table entry was received before commit. During crash recovery, all transactions missing the acknowledgement would be rolled back.
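The recovery rule described above can be sketched as follows. This is only an illustration of the proposal, not server code: `PreparedTrx`, its `acked` flag, and `recover` are hypothetical names, and committing the acknowledged transactions is an assumption (the description only says that unacknowledged ones are rolled back).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreparedTrx:
    """Hypothetical row of the proposed master system table."""
    gtid: str    # expected GTID, e.g. "0-5054-4185"
    acked: bool  # True if a slave acknowledged the entry before the crash


def recover(prepared):
    """Split prepared transactions into (commit, rollback) GTID lists.

    Assumption: acknowledged transactions are committed during recovery;
    transactions missing the acknowledgement are rolled back.
    """
    commit = [t.gtid for t in prepared if t.acked]
    rollback = [t.gtid for t in prepared if not t.acked]
    return commit, rollback
```

With this rule, the old master never keeps a transaction that no slave has confirmed, which is the property the issue asks for.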

      Inter-storage-engine 2PC can be used to push the binlog to the slaves. Luckily, the Spider engine can do this in 2PC with all slaves or with some preselected slaves, and report back the failure of the transaction via its 2PC monitoring feature.

      The semisync master plugin would inject into a Spider table linked to a relay log system table on every node of the cluster and, based on the number of successes, assign the in-sync status to the replication. This loses the "first ACK wins" behaviour but brings back true crash-safe capabilities.
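A minimal sketch of how the in-sync status could be derived from the number of successful remote writes. The function name, the per-node result map, and the `required_successes` threshold are assumptions made for illustration; the description does not say how many successes are needed.

```python
def replication_in_sync(write_results, required_successes):
    """Return True if enough nodes accepted the relay log write.

    write_results: dict mapping a node name to True (write succeeded)
                   or False (write failed); hypothetical structure.
    required_successes: assumed threshold, not specified in the issue.
    """
    successes = sum(1 for ok in write_results.values() if ok)
    return successes >= required_successes
```

Requiring all nodes gives the strictest guarantee, while a lower threshold trades safety for availability.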

      The semisync slave plugin would select which queue to apply: in sync mode it reads the relay log from the system table, and in async mode from the binlog.

      A remote failure would commit anyway, as there is no reason for the local Spider system table not to succeed. Only Spider monitoring would tell us that remote slaves are down and that replication should be switched to unsynced. Coming back to the sync state would optimistically reset the Spider table status, making another attempt by resetting the state of the slave inside the local Spider table.
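The switch between sync and unsynced states can be sketched as a small state tracker. This is a simplified model under stated assumptions: the class, its method names, and the "ok"/"down" status values are hypothetical, and a single slave reported down is assumed to be enough to leave sync mode.

```python
class SlaveSyncTracker:
    """Toy model of the slave state kept in the local Spider table."""

    def __init__(self, slaves):
        # Every slave starts optimistically marked as reachable.
        self.status = {name: "ok" for name in slaves}
        self.mode = "sync"

    def monitoring_reports_down(self, name):
        # Only monitoring can tell that a remote slave is down;
        # the local commit itself still succeeds.
        self.status[name] = "down"
        self.mode = "unsync"

    def try_resync(self):
        # Optimistic reset: mark every slave reachable again and
        # make another attempt at sync mode.
        for name in self.status:
            self.status[name] = "ok"
        self.mode = "sync"
```

If the slave is still unreachable after the reset, the next monitoring report would move the tracker back to unsynced.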

      mysql_sandbox5012-bin.000001 648803 Gtid 5054 648841 BEGIN GTID 0-5054-4185
      mysql_sandbox5012-bin.000001 648841 Table_map 5054 648889 table_id: 60 (test.test119)
      mysql_sandbox5012-bin.000001 648889 Write_rows_v1 5054 648931 table_id: 60 flags: STMT_END_F
      AFTER SYNC ACK (crash 1)
      mysql_sandbox5012-bin.000001 648931 Xid 5054 648958 COMMIT /* xid=1400239 */
      (crash 2)

      Crash 1 followed by crash recovery would roll back the unfinished transaction on master restart.
      How does the slave handle such a case?

      Crash 2 in AFTER SYNC: the client has received OK for the transaction, but the slave may not have received the Xid COMMIT event.
      If we elect a slave here, we may miss the transaction.

      Crash in 2
      may leave the master with an extra transaction; the elected slave will miss one transaction.
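To make the "extra transaction" case concrete, here is a small sketch that compares the GTIDs in the old master's binlog with those applied on the elected slave. The function name is hypothetical, and GTID `0-5054-4184` is an invented predecessor of the `0-5054-4185` transaction shown in the binlog listing.

```python
def extra_on_old_master(master_gtids, slave_gtids):
    """Return the GTIDs present on the old master but missing on the slave.

    Both arguments are lists of GTID strings in binlog order.
    """
    seen = set(slave_gtids)
    return [g for g in master_gtids if g not in seen]


# Old master crashed after writing the Xid event for 0-5054-4185,
# but the elected slave only received the previous transaction.
master = ["0-5054-4184", "0-5054-4185"]
slave = ["0-5054-4184"]
missing = extra_on_old_master(master, slave)
```

Any GTID returned here is a transaction the old master would have to discard, or be restored from the cluster, before rejoining.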





              Elkin Andrei Elkin
              stephane@skysql.com VAROQUI Stephane