[MDEV-21117] refine the server binlog-based recovery for semisync - Jira

Details

Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.6.2
Component/s: Replication
Labels: None

Description

When run after a master server crash, --tc-heuristic-recover=rollback produces an inconsistent server state: the binlog still contains transactions that the option rolled back.
A server recovered this way cannot safely be used for replication. For example, when such a recovered ex-master is demoted to a slave, its binlog state needs further correction to subtract the rolled-back transactions from its binlog status. Otherwise the "new" slave might claim those transactions as locally present in the (GTID) master-slave connection protocol. At the same time, the actual "new" master may never have seen those transactions, because due to the crash they never arrived at it while it was still a slave.
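To make the GTID problem concrete, here is a minimal Python sketch under a simplified model: the binlog position is taken to be the last GTID (domain-server-seq_no) per replication domain, in the spirit of the server's gtid_binlog_pos variable. The names Gtid, binlog_state, corrected_state and format_pos are hypothetical illustrations, not server APIs. The example ex-master's binlog still ends at 0-1-102 even though 0-1-101 and 0-1-102 were rolled back; subtracting them yields the position the demoted server should actually report.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Gtid(NamedTuple):
    """A MariaDB-style GTID: domain_id-server_id-seq_no (hypothetical model)."""
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


def binlog_state(gtids: Iterable[Gtid]) -> Dict[int, Gtid]:
    """Last GTID per replication domain, in binlog order (simplified model)."""
    state: Dict[int, Gtid] = {}
    for gtid in gtids:
        state[gtid.domain_id] = gtid
    return state


def corrected_state(gtids: Iterable[Gtid], rolled_back: Set[Gtid]) -> Dict[int, Gtid]:
    """Recompute the state, subtracting transactions rolled back at recovery."""
    return binlog_state(g for g in gtids if g not in rolled_back)


def format_pos(state: Dict[int, Gtid]) -> str:
    """Render the state as a comma-separated GTID list, ordered by domain."""
    return ",".join(str(state[d]) for d in sorted(state))


# Example ex-master binlog at crash time: 0-1-101 and 0-1-102 were only
# prepared, never reached the slave, and were rolled back at recovery.
binlog = [Gtid(0, 1, 100), Gtid(0, 1, 101), Gtid(0, 1, 102)]
rolled_back = {Gtid(0, 1, 101), Gtid(0, 1, 102)}

print(format_pos(binlog_state(binlog)))                  # 0-1-102 (stale claim)
print(format_pos(corrected_state(binlog, rolled_back)))  # 0-1-100
```

If the demoted server kept advertising 0-1-102, it would claim transactions that the new master, which only ever received up to 0-1-100, never had.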

This issue should be fixed by refining the recovery procedure to truncate the binlog, cutting off the prepared transactions that were rolled back. The method was pioneered by FB (Facebook); see
https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
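The truncation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's implementation: each binlog transaction is tagged with its engine state after crash recovery ("committed" or only "prepared"), and engine commits are assumed to follow binlog order. The binlog is cut at the first transaction that is only prepared; that transaction and everything after it are rolled back and removed, so the binlog and the engine agree. BinlogTrx and plan_truncation are hypothetical names; the real server works on binlog files and XIDs, not on a Python list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BinlogTrx:
    """One transaction in the binlog (hypothetical model)."""
    gtid: str          # e.g. "0-1-101"
    engine_state: str  # "committed" or "prepared" after engine crash recovery


def plan_truncation(binlog: List[BinlogTrx]) -> Tuple[List[BinlogTrx], List[BinlogTrx]]:
    """Split the binlog into (kept, cut) at the first transaction that the
    engine has only prepared. Transactions in `cut` are rolled back in the
    engine and removed from the binlog."""
    for trx in binlog:
        if trx.engine_state not in ("committed", "prepared"):
            raise ValueError(f"unknown engine state for {trx.gtid}: {trx.engine_state}")
    cut_at = next(
        (i for i, trx in enumerate(binlog) if trx.engine_state == "prepared"),
        len(binlog),  # nothing in doubt: keep the whole binlog
    )
    kept, cut = binlog[:cut_at], binlog[cut_at:]
    # Assumption of this sketch: engine commits follow binlog order, so a
    # committed transaction past the truncation point signals inconsistency.
    if any(trx.engine_state == "committed" for trx in cut):
        raise RuntimeError("committed transaction found after truncation point")
    return kept, cut


binlog = [
    BinlogTrx("0-1-100", "committed"),
    BinlogTrx("0-1-101", "prepared"),  # in this example, never received by the slave
    BinlogTrx("0-1-102", "prepared"),
]
kept, cut = plan_truncation(binlog)
print("keep:", [t.gtid for t in kept])      # keep: ['0-1-100']
print("roll back:", [t.gtid for t in cut])  # roll back: ['0-1-101', '0-1-102']
```

By contrast, the default binlog-based crash recovery commits prepared transactions that it finds in the binlog; the sketch shows the opposite choice, which keeps the binlog consistent with what a semisync slave may have received.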

Once a transaction reaches the binary logs, it should roll forward.

Attachments


recovery_design.txt (6 kB, 2021-01-28 14:23)

Issue Links

blocks

MDEV-11855 Make semisync crash safe with the cluster

Open

causes

MDEV-33465 an option to enable semisync recovery

Closed

includes

MDEV-26652 xa transactions binlogged in wrong order

Closed

is duplicated by

MDEV-20996 Maxscale auto-failover with semi-sync replication is not providing a true HA solution

Closed

relates to

MDEV-21168 Active XA transactions stop slave from working after backup was restored.

Closed

MDEV-25395 server recovery hits replication event checksum error

Stalled

MDEV-33424 when both rpl_semi_sync_MASTER,SLAVE_enabled set the server should recover as master

Closed

MDEV-18959 Engine transaction recovery through persistent binlog

Stalled

links to

MySQL WL#5493: Binlog crash-safe when master crashed


Activity


Andrei Elkin created issue - 2019-11-21 16:28

Andrei Elkin made changes - 2019-11-21 16:28

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-20996~~ [ ~~MDEV-20996~~ ]

Andrei Elkin made changes - 2019-11-21 16:28

Link

This issue relates to MENT-203 [ MENT-203 ]

Andrei Elkin made changes - 2019-11-22 13:47

Description

Changed the wording "those transactions as locally present in the master-slave gtid connection protocol" to "those transactions as locally present at the (gtid) master-slave connection protocol"; the rest of the description was unchanged.

Geoff Montee (Inactive) made changes - 2019-11-25 19:09

Remote Link

This issue links to "MySQL WL#5493: Binlog crash-safe when master crashed (Web Link)" [ 29310 ]

Sujatha Sivakumar (Inactive) made changes - 2019-11-26 08:48

Status

Open [ 1 ]

In Progress [ 3 ]

Max Mether made changes - 2019-11-28 10:27

Link

This issue relates to ~~MDEV-21168~~ [ ~~MDEV-21168~~ ]

Julien Fritsch made changes - 2020-02-20 15:09

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Golubchik made changes - 2020-02-20 16:02

Priority

Critical [ 2 ]

Blocker [ 1 ]

Julien Fritsch made changes - 2020-02-26 14:18

Labels

need_feedback

Chris Calender (Inactive) made changes - 2020-03-03 18:50

Labels

need_feedback

Sergei Golubchik made changes - 2020-03-23 18:14

Fix Version/s

10.5 [ 23123 ]

Sergei Golubchik made changes - 2020-03-23 18:14

Affects Version/s

10.5 [ 23123 ]

Sujatha Sivakumar (Inactive) made changes - 2020-04-01 17:51

Assignee	Sujatha Sivakumar [ sujatha.sivakumar ]	Andrei Elkin [ elkin ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Andrei Elkin made changes - 2020-04-14 10:12

Assignee	Andrei Elkin [ elkin ]	Sujatha Sivakumar [ sujatha.sivakumar ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Marko Mäkelä made changes - 2020-04-23 10:08

Link

This issue is blocked by ~~MDEV-22351~~ [ ~~MDEV-22351~~ ]

Sujatha Sivakumar (Inactive) made changes - 2020-04-23 15:34

Assignee	Sujatha Sivakumar [ sujatha.sivakumar ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2020-05-04 11:53

Priority

Blocker [ 1 ]

Critical [ 2 ]

Sergei Golubchik made changes - 2020-05-11 18:19

Link

This issue is blocked by ~~MDEV-22351~~ [ ~~MDEV-22351~~ ]

Sergei Golubchik made changes - 2020-05-11 18:21

Assignee	Sergei Golubchik [ serg ]	Andrei Elkin [ elkin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Andrei Elkin made changes - 2020-08-11 12:05

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Andrei Elkin made changes - 2020-10-23 11:50

Assignee	Andrei Elkin [ elkin ]	Sergei Golubchik [ serg ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Julien Fritsch made changes - 2020-11-06 15:58

Fix Version/s

10.1 [ 16100 ]

Ralf Gebhardt made changes - 2020-11-12 15:23

Fix Version/s

10.5 [ 23123 ]

Julien Fritsch made changes - 2020-11-17 13:51

Link

This issue blocks MENT-203 [ MENT-203 ]

Julien Fritsch made changes - 2020-11-17 13:51

Link

This issue relates to MENT-203 [ MENT-203 ]

Sergei Golubchik made changes - 2020-12-18 16:58

Assignee	Sergei Golubchik [ serg ]	Andrei Elkin [ elkin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Andrei Elkin made changes - 2021-01-19 13:10

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Andrei Elkin made changes - 2021-01-22 13:19

Link

This issue relates to MDEV-24654 [ MDEV-24654 ]

Andrei Elkin made changes - 2021-01-22 16:53

Assignee	Andrei Elkin [ elkin ]	Sergei Golubchik [ serg ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Andrei Elkin made changes - 2021-01-22 17:24

Link

This issue is blocked by MDEV-24654 [ MDEV-24654 ]

Andrei Elkin made changes - 2021-01-22 17:24

Link

This issue relates to MDEV-24654 [ MDEV-24654 ]

Andrei Elkin made changes - 2021-01-23 20:09

Attachment

recovery_design.txt [ 55817 ]

Andrei Elkin made changes - 2021-01-23 21:39

Summary

--tc-heuristic-recover=rollback is not replication safe

recovery for --rpl-semi-sync-slave-enabled server

Andrei Elkin made changes - 2021-01-24 13:31

Attachment

recovery_design.txt [ 55817 ]

Andrei Elkin made changes - 2021-01-24 13:31

Attachment

recovery_design.txt [ 55820 ]

Sergei Golubchik made changes - 2021-01-28 13:56

Link

This issue is blocked by MDEV-24654 [ MDEV-24654 ]

Andrei Elkin made changes - 2021-01-28 14:23

Attachment

recovery_design.txt [ 55892 ]

Andrei Elkin made changes - 2021-01-28 14:23

Attachment

recovery_design.txt [ 55820 ]

Sergei Golubchik made changes - 2021-02-26 13:15

Assignee	Sergei Golubchik [ serg ]	Sujatha Sivakumar [ sujatha.sivakumar ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Sujatha Sivakumar (Inactive) made changes - 2021-03-01 14:45

Assignee

Sujatha Sivakumar [ sujatha.sivakumar ]

Andrei Elkin [ elkin ]

Ralf Gebhardt made changes - 2021-04-08 13:36

Link

This issue is duplicated by ~~MDEV-20996~~ [ ~~MDEV-20996~~ ]

Ralf Gebhardt made changes - 2021-04-08 13:37

Link

This issue relates to ~~MDEV-20996~~ [ ~~MDEV-20996~~ ]

Andrei Elkin made changes - 2021-04-12 19:32

Link

This issue relates to MDEV-25395 [ MDEV-25395 ]

Andrei Elkin made changes - 2021-04-13 09:43

Assignee	Andrei Elkin [ elkin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-04-23 13:30

Summary

recovery for --rpl-semi-sync-slave-enabled server

refine the server binlog-based recovery for semisync

Sergei Golubchik made changes - 2021-04-25 14:19

Assignee	Sergei Golubchik [ serg ]	Andrei Elkin [ elkin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Julien Fritsch made changes - 2021-04-26 07:45

Link

This issue blocks MDEV-11855 [ MDEV-11855 ]

Julien Fritsch made changes - 2021-04-27 16:25

Affects Version/s	10.2 [ 14601 ]
Affects Version/s	10.1 [ 16100 ]
Affects Version/s	10.3 [ 22126 ]
Affects Version/s	10.4 [ 22408 ]
Issue Type	Bug [ 1 ]	Task [ 3 ]

Julien Fritsch made changes - 2021-04-27 18:46

Issue Type

Task [ 3 ]

Bug [ 1 ]

Julien Fritsch made changes - 2021-04-27 18:47

Affects Version/s		10.1 [ 16100 ]
Affects Version/s		10.2 [ 14601 ]
Affects Version/s		10.3 [ 22126 ]
Affects Version/s		10.4 [ 22408 ]
Affects Version/s		10.5 [ 23123 ]

Sergei Golubchik made changes - 2021-04-28 13:58

Affects Version/s	10.2 [ 14601 ]
Affects Version/s	10.1 [ 16100 ]
Affects Version/s	10.3 [ 22126 ]
Affects Version/s	10.4 [ 22408 ]
Affects Version/s	10.5 [ 23123 ]
Issue Type	Bug [ 1 ]	Task [ 3 ]

Sergei Golubchik made changes - 2021-04-28 13:58

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Fix Version/s	10.5 [ 23123 ]

Julien Fritsch made changes - 2021-04-29 09:19

Link

This issue blocks MENT-1187 [ MENT-1187 ]

Chris Calender (Inactive) made changes - 2021-05-06 00:54

Comment

[ A comment with security level 'Developers' was removed. ]

Andrei Elkin made changes - 2021-05-26 14:40

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Andrei Elkin made changes - 2021-05-26 14:43

Assignee	Andrei Elkin [ elkin ]	Sergei Golubchik [ serg ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-06-02 20:49

Assignee	Sergei Golubchik [ serg ]	Andrei Elkin [ elkin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Andrei Elkin made changes - 2021-06-03 12:16

Assignee	Andrei Elkin [ elkin ]	Sergei Golubchik [ serg ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Golubchik made changes - 2021-06-09 15:19

Assignee	Sergei Golubchik [ serg ]	Andrei Elkin [ elkin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Andrei Elkin made changes - 2021-06-11 17:32

Fix Version/s		10.6.2 [ 25800 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Andrei Elkin made changes - 2021-06-11 18:24

Assignee

Andrei Elkin [ elkin ]

Ian Gilfillan [ greenman ]

Andrei Elkin made changes - 2021-06-11 18:24

Labels

need_feedback

Julien Fritsch made changes - 2021-06-28 08:47

Labels

need_feedback

Sergei Golubchik made changes - 2021-09-25 20:24

Link

This issue relates to ~~MDEV-26652~~ [ ~~MDEV-26652~~ ]

Sergei Golubchik made changes - 2021-10-17 08:21

Link

This issue includes ~~MDEV-26652~~ [ ~~MDEV-26652~~ ]

Sergei Golubchik made changes - 2021-10-17 08:22

Link

This issue relates to ~~MDEV-26652~~ [ ~~MDEV-26652~~ ]

Sergei Golubchik made changes - 2021-12-06 21:24

Workflow

MariaDB v3 [ 101333 ]

MariaDB v4 [ 134141 ]

Michael Widenius made changes - 2024-02-07 15:49

Description

Updated the FB reference link from https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/ to https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/; the rest of the description was unchanged.

Michael Widenius made changes - 2024-02-07 15:53

Description

Appended the sentence "Once a transaction reaches the binary logs it should roll forward."; the rest of the description was unchanged.

Michael Widenius made changes - 2024-02-07 16:05

Assignee	Ian Gilfillan [ greenman ]	Andrei Elkin [ elkin ]
Resolution	Fixed [ 1 ]
Status	Closed [ 6 ]	Stalled [ 10000 ]

Michael Widenius made changes - 2024-02-07 16:05

Assignee

Andrei Elkin [ elkin ]

Brandon Nesterenko [ JIRAUSER48702 ]

Sergei Golubchik made changes - 2024-02-07 18:10

Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Michael Widenius made changes - 2024-02-11 12:44

Link

This issue relates to ~~MDEV-33424~~ [ ~~MDEV-33424~~ ]

Michael Widenius made changes - 2024-02-11 12:52

Resolution	Fixed [ 1 ]
Status	Closed [ 6 ]	Stalled [ 10000 ]

Sergei Golubchik made changes - 2024-02-15 11:13

Link

This issue causes ~~MDEV-33465~~ [ ~~MDEV-33465~~ ]

Sergei Golubchik made changes - 2024-03-27 20:38

Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Jira Automation (IT) made changes - 2024-07-04 03:19

Zendesk Related Tickets

125800 172110 134539

Dave Gosselin made changes - 2024-07-26 14:14

Link

This issue relates to MDEV-18959 [ MDEV-18959 ]

People

Assignee:: Brandon Nesterenko

Reporter:: Andrei Elkin

Votes:: 3

Watchers:: 24

Dates

Created:: 2019-11-21 16:28

Updated:: 2025-03-13 11:40

Resolved:: 2024-03-27 20:38

MariaDB Server