  1. MariaDB Server
  2. MDEV-21117

refine the server binlog-based recovery for semisync

Details

    Description

      When run after a master server crash, --tc-heuristic-recover=rollback leaves the server in an inconsistent state: the binlog still contains the transactions that the option rolled back.
      A server recovered this way is not safe to use for replication. For example, when the recovered
      ex-master is demoted to a slave, its binlog state needs a further correction to subtract
      the rolled-back transactions. Otherwise the "new" slave might claim
      those transactions as locally present when it connects to the master over the GTID protocol, even though the actual "new" master may never have seen them (because, due to the crash, they never reached it while it was still a slave).
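
      The divergence described above can be illustrated with a small sketch. It is a toy model, not server code: it assumes MariaDB-style GTIDs written as domain-server-sequence, tracks only the highest sequence number per domain, and uses hypothetical helper names (parse_gtid, gtid_state, claimed_but_unknown) that do not exist in the server.

```python
def parse_gtid(gtid):
    """Split a 'domain-server-seq' string into three integers."""
    domain, server, seq = (int(part) for part in gtid.split("-"))
    return domain, server, seq


def gtid_state(binlog_gtids):
    """Highest sequence number seen per replication domain (simplified)."""
    state = {}
    for gtid in binlog_gtids:
        domain, _server, seq = parse_gtid(gtid)
        state[domain] = max(state.get(domain, 0), seq)
    return state


def claimed_but_unknown(ex_master_binlog, new_master_binlog):
    """GTIDs the demoted ex-master still has in its binlog that the
    new master never received."""
    known = gtid_state(new_master_binlog)
    unknown = []
    for gtid in ex_master_binlog:
        domain, _server, seq = parse_gtid(gtid)
        if seq > known.get(domain, 0):
            unknown.append(gtid)
    return unknown


# The ex-master binlogged 0-1-3 before crashing; the transaction was
# rolled back in the engine and never reached the (then) slave.
ex_master = ["0-1-1", "0-1-2", "0-1-3"]
new_master = ["0-1-1", "0-1-2"]
stale = claimed_but_unknown(ex_master, new_master)
```

      In this model `stale` holds the GTID that the demoted server would wrongly report as already applied, which is the state the recovery has to correct.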

      This issue should be fixed by refining the recovery procedure to truncate the binlog, cutting off the prepared transactions that were rolled back. The method is known to have been pioneered by FB:
      https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
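
      A minimal sketch of the truncation idea follows. It is an illustration under stated assumptions, not the server's actual recovery algorithm: the binlog is modelled as an ordered list of transactions with byte offsets, the storage engine is assumed to report the XIDs that are prepared but not committed, and the names BinlogTxn, truncation_offset and recover are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinlogTxn:
    start_offset: int  # byte offset where the transaction begins
    end_offset: int    # byte offset just past its last event
    gtid: str
    xid: int


def truncation_offset(binlog, prepared_xids):
    """Offset of the first binlogged transaction that the engine holds
    only in the prepared state, or None if there is nothing to cut."""
    for txn in binlog:
        if txn.xid in prepared_xids:
            return txn.start_offset
    return None


def recover(binlog, prepared_xids):
    """Return (kept, dropped): transactions left in the binlog after
    truncation and those cut off together with their engine rollback."""
    cut = truncation_offset(binlog, prepared_xids)
    if cut is None:
        return list(binlog), []
    kept = [txn for txn in binlog if txn.end_offset <= cut]
    dropped = [txn for txn in binlog if txn.start_offset >= cut]
    return kept, dropped


binlog = [
    BinlogTxn(4, 100, "0-1-1", 11),
    BinlogTxn(100, 220, "0-1-2", 12),
    BinlogTxn(220, 300, "0-1-3", 13),  # prepared in the engine at crash time
]
kept, dropped = recover(binlog, prepared_xids={13})
```

      After this step the binlog and the engine agree: every transaction still in the binlog is committed, and the rolled-back one no longer appears in the server's GTID state.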

      Once a transaction reaches the binary logs it should roll forward.

          Activity

            Elkin Andrei Elkin created issue -
            Elkin Andrei Elkin made changes -
            Field Original Value New Value
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Description Reworded "in the master-slave gtid connection protocol" to "at the (gtid) master-slave connection protocol"; the text is otherwise identical to the Description above.
            GeoffMontee Geoff Montee (Inactive) made changes -
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            maxmether Max Mether made changes -
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]
            julien.fritsch Julien Fritsch made changes -
            Labels need_feedback
            ccalender Chris Calender (Inactive) made changes -
            Labels need_feedback
            serg Sergei Golubchik made changes -
            Fix Version/s 10.5 [ 23123 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.5 [ 23123 ]
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Assignee Sujatha Sivakumar [ sujatha.sivakumar ] Andrei Elkin [ elkin ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sujatha Sivakumar [ sujatha.sivakumar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Assignee Sujatha Sivakumar [ sujatha.sivakumar ] Sergei Golubchik [ serg ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Priority Blocker [ 1 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Andrei Elkin [ elkin ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            Elkin Andrei Elkin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.1 [ 16100 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5 [ 23123 ]
            julien.fritsch Julien Fritsch made changes -
            julien.fritsch Julien Fritsch made changes -
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Andrei Elkin [ elkin ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            Elkin Andrei Elkin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Attachment recovery_design.txt [ 55817 ]
            Elkin Andrei Elkin made changes -
            Summary --tc-heuristic-recover=rollback is not replication safe recovery for --rpl-semi-sync-slave-enabled server
            Elkin Andrei Elkin made changes -
            Attachment recovery_design.txt [ 55817 ]
            Elkin Andrei Elkin made changes -
            Attachment recovery_design.txt [ 55820 ]
            serg Sergei Golubchik made changes -
            Elkin Andrei Elkin made changes -
            Attachment recovery_design.txt [ 55892 ]
            Elkin Andrei Elkin made changes -
            Attachment recovery_design.txt [ 55820 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Sujatha Sivakumar [ sujatha.sivakumar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Assignee Sujatha Sivakumar [ sujatha.sivakumar ] Andrei Elkin [ elkin ]
            ralf.gebhardt Ralf Gebhardt made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sergei Golubchik [ serg ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Summary recovery for --rpl-semi-sync-slave-enabled server refine the server binlog-based recovery for semisync
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Andrei Elkin [ elkin ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            julien.fritsch Julien Fritsch made changes -
            julien.fritsch Julien Fritsch made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] Bug [ 1 ]
            julien.fritsch Julien Fritsch made changes -
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            julien.fritsch Julien Fritsch made changes -
            ccalender Chris Calender (Inactive) made changes -
            Comment [ A comment with security level 'Developers' was removed. ]
            Elkin Andrei Elkin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Andrei Elkin [ elkin ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sergei Golubchik [ serg ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Andrei Elkin [ elkin ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            Elkin Andrei Elkin made changes -
            Fix Version/s 10.6.2 [ 25800 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Ian Gilfillan [ greenman ]
            Elkin Andrei Elkin made changes -
            Labels need_feedback
            julien.fritsch Julien Fritsch made changes -
            Labels need_feedback
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 101333 ] MariaDB v4 [ 134141 ]
            monty Michael Widenius made changes -
            Description Replaced the reference URL https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/ with https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/; the text is otherwise unchanged.
            monty Michael Widenius made changes -
            Description Appended the sentence "Once a transaction reaches the binary logs it should roll forward."; the text is otherwise unchanged.
            monty Michael Widenius made changes -
            Assignee Ian Gilfillan [ greenman ] Andrei Elkin [ elkin ]
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            monty Michael Widenius made changes -
            Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
            serg Sergei Golubchik made changes -
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            monty Michael Widenius made changes -
            monty Michael Widenius made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 125800 172110 134539
            Gosselin Dave Gosselin made changes -

            People

              bnestere Brandon Nesterenko
              Elkin Andrei Elkin
              Votes: 3
              Watchers: 24
