[MDEV-11855] Make semisync crash safe with the cluster

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Replication
Labels:
- semisync
- upstream

Description

Semisync states, and its documentation says, that the slave is up to date with the master within some predefined delay. This is not physically true: because the ACK is done after SYNC or after COMMIT, extra transactions can be in the master's binlog even though the client has never seen them. What is true is that no transaction is committed from the client's point of view until it reaches a slave, but a crash can leave the old master in a state where it needs to be restored from the cluster.

PERSISTENT ACK

The sync would be prepared with the group commit: the expected GTID is stored inside an InnoDB system table on the master, and the ACK acknowledges receipt of that system table entry before commit. During crash recovery, all transactions missing the acknowledgement would be rolled back.
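The recovery rule above can be sketched as a toy model. This is only an illustration under stated assumptions, not MariaDB code: `PreparedTrx`, `recover`, and the in-memory set of acknowledged GTIDs are hypothetical stand-ins for the proposed system table, and GTID `0-5054-4184` is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreparedTrx:
    """A transaction that was prepared (in group commit) before the crash."""
    gtid: str


def recover(prepared, acked_gtids):
    """Split prepared transactions into (commit, rollback) GTID lists.

    A transaction is committed only if its GTID was recorded as
    acknowledged before the crash; every other one is rolled back.
    """
    commit, rollback = [], []
    for trx in prepared:
        (commit if trx.gtid in acked_gtids else rollback).append(trx.gtid)
    return commit, rollback


# Hypothetical state found at restart: two prepared transactions,
# only one of which has a persisted ACK.
prepared = [PreparedTrx("0-5054-4184"), PreparedTrx("0-5054-4185")]
acked = {"0-5054-4184"}
commit, rollback = recover(prepared, acked)
print("commit:", commit)
print("rollback:", rollback)
```

The point of the sketch is that the decision depends only on persisted state, so it can be taken at restart without contacting the slaves.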

PUSH MODEL
Inter-storage-engine 2PC can be used to push the binlog to the slaves. Luckily, the Spider engine can do this in 2PC with all slaves or with some preselected slaves, and report back the failure of the transaction via its 2PC monitoring feature.

The semisync master plugin would insert into a Spider table linked to a relay log system table on every node of the cluster and, based on the number of successes, assign the in-sync status to the replication. This loses the "first ACK wins" behavior but brings back true crash-safe capabilities.

The semisync slave plugin would select which queue to apply: in sync mode it reads the relay log from the system table, in async mode from the binlog.

A remote failure would commit anyway, as there is no reason for the local Spider system table not to succeed. Only Spider monitoring would tell us that remote slaves are down and that replication should be switched to unsynced. Coming back to the sync state would reset the optimistic Spider table status to make another attempt, by resetting the state of the slave inside the local Spider table.
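The sync/unsync switching described above could look roughly like the following toy model. Everything here is an assumption made for illustration: the class `SpiderSlaveStatus`, its methods, the slave names, and the `required_acks` threshold are hypothetical and do not correspond to any existing Spider or semisync API.

```python
class SpiderSlaveStatus:
    """Toy stand-in for the per-slave state kept in the local Spider table."""

    def __init__(self, slaves):
        # Optimistic start: every slave is assumed reachable.
        self.reachable = {name: True for name in slaves}

    def mark_down(self, slave):
        """Called when monitoring reports that a remote slave is down."""
        self.reachable[slave] = False

    def reset(self):
        """Go back to the optimistic state to attempt sync mode again."""
        for name in self.reachable:
            self.reachable[name] = True

    def mode(self, required_acks=1):
        """Return 'sync' if enough slaves are reachable, else 'async'."""
        up = sum(self.reachable.values())
        return "sync" if up >= required_acks else "async"


status = SpiderSlaveStatus(["slave1", "slave2"])
print(status.mode(required_acks=2))
status.mark_down("slave2")
print(status.mode(required_acks=2))
status.reset()
print(status.mode(required_acks=2))
```

In this sketch the local write never fails; only the monitoring callback changes the replication mode, which mirrors the behavior proposed in the text.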

mysql_sandbox5012-bin.000001 648803 Gtid 5054 648841 BEGIN GTID 0-5054-4185
mysql_sandbox5012-bin.000001 648841 Table_map 5054 648889 table_id: 60 (test.test119)
mysql_sandbox5012-bin.000001 648889 Write_rows_v1 5054 648931 table_id: 60 flags: STMT_END_F
AFTER SYNC ACK (crash 1)
mysql_sandbox5012-bin.000001 648931 Xid 5054 648958 COMMIT /* xid=1400239 */
(crash 2)
AFTER COMMIT ACK

Crash 1, followed by crash recovery, would roll back the unfinished transaction on master restart.
How does the slave manage such a case?
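As a rough illustration of why the two crash points differ on the master, the sketch below decides the restart action from which events of the transaction survived in the binlog. The event names and end positions are taken from the listing above; the function `recovery_action` and the notion of a "durable end position" are assumptions for this example, not the server's actual recovery code.

```python
# (event type, end_log_pos) for transaction GTID 0-5054-4185, from the listing.
EVENTS = [
    ("Gtid", 648841),
    ("Table_map", 648889),
    ("Write_rows_v1", 648931),
    ("Xid", 648958),
]


def recovery_action(events, durable_end_pos):
    """Decide what a restart does with the transaction.

    durable_end_pos is the (hypothetical) last binlog position that
    survived the crash. Without the Xid event the transaction is
    unfinished and is rolled back; with it, the transaction is kept.
    """
    survived = [name for name, end_pos in events if end_pos <= durable_end_pos]
    return "commit" if "Xid" in survived else "rollback"


print(recovery_action(EVENTS, 648931))  # crash 1: Xid not written
print(recovery_action(EVENTS, 648958))  # crash 2: Xid written
```

The master-side outcome is only half of the question: whether a slave holds the same events at each crash point is exactly what the cases below are about.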

Crash 2 with AFTER SYNC: the client has received OK for the transaction, but the slave may not have received the Xid COMMIT event.
If a slave is elected here, we may miss the transaction.

Crash 2 with AFTER COMMIT ACK:
may leave the master with an extra transaction; the elected slave will miss one transaction.

Issue Links

is blocked by: MDEV-21117 refine the server binlog-based recovery for semisync (Closed)

relates to: MDEV-19140 Full synchronous replication (Open)

People

Assignee: Andrei Elkin

Reporter: VAROQUI Stephane

Votes: 2

Watchers: 11

Dates

Created: 2017-01-20 14:52

Updated: 2024-07-08 17:57

MariaDB Server