  1. MariaDB Server
  2. MDEV-33465

an option to enable semisync recovery

Details

    Description

      MDEV-21117 has implemented correct binlog crash recovery in semisync replication. The server decides how to recover based on the role it has in the replication topology. In MDEV-21117 it deduces the role based on rpl_semi_sync_slave_enabled and rpl_semi_sync_master_enabled variables.

      This leaves an important use case open. Typically, not all slaves are semisync. Suppose a topology mixes semisync and async slaves, the master crashes, and after a failover it comes back up as an async slave. It will not have rpl_semi_sync_slave_enabled set, so it cannot deduce its replication role and will not recover correctly.

      We need to introduce a dedicated option that tells the server its initial¹ role in the replication topology. Perhaps the server can also automatically configure other settings, as appropriate for the specified role.

      ¹) initial — because the role can change dynamically, but such a change has no impact on crash recovery.
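
      The gap described above can be illustrated with a small sketch. It is a simplified model, not the server's code: the function name, the precedence of the two checks, and the return values are assumptions made for illustration. Only the two variable names come from the description.

```python
from typing import Optional


def deduce_recovery_role(semi_sync_master_enabled: bool,
                         semi_sync_slave_enabled: bool) -> Optional[str]:
    """Assumed, simplified model of deducing the recovery role
    from the two semisync variables alone."""
    if semi_sync_slave_enabled:
        return "SLAVE"
    if semi_sync_master_enabled:
        return "MASTER"
    return None  # no semisync variable set: the role cannot be deduced


# A crashed master that is restarted after failover as an async slave
# has neither variable set, so the deduction yields nothing.
role_after_failover = deduce_recovery_role(
    semi_sync_master_enabled=False,
    semi_sync_slave_enabled=False,
)
print(role_after_failover)
```

      In this model the restarted server has no way to learn that it should run slave-side recovery, which is the case a dedicated option is meant to cover.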

          Activity

            serg Sergei Golubchik added a comment -

            There is already such an option, init-rpl-role. It can be empty, MASTER, or SLAVE. Let's reuse it and enable slave-side recovery on --init-rpl-role=SLAVE.

            bnestere Brandon Nesterenko added a comment -

            Hey Andrei!

            I've drafted a patch for this, which is ready for your review: PR 3099

            monty Michael Widenius added a comment -

            As far as I have understood the issue, it is about master-side recovery and not slave-side recovery.
            Currently, having rpl_semi_sync_master_enabled set causes binary logs to be truncated on the master side in the case where the slave has received the binary log event but the master dies before getting the semi-sync acknowledgement.

            Truncation of the binary log should only happen when the server was a semi-sync master before the crash and has since been changed to a slave.
            Truncation should not happen if the master continues to be a master or if semi-sync was not enabled when the server crashed.

            I do not understand how setting just --init-rpl-role=SLAVE can tell the server whether it can truncate the last event(s) from the binary log.
            The server cannot know the original value of rpl_semi_sync_master_enabled, because the value at restart may not be the same as the value at the time of the crash. It also does not know whether it was a master before the restart.
            Because of this it may be better to have a dedicated variable that defines whether binary logs should be truncated.

            There is another, bigger problem:

            • Truncation of the binary log should not be done automatically, because on an automatic master restart the server cannot know by itself whether it should continue as a master or as a slave. This decision needs to be made by other means, such as MaxScale or a human, who has to decide which server should be the new master. If the original master restarts fast, it is better to continue with it as the master. If the master was down for a long time or is corrupted, it is better to assign one of the slaves as the new master.

            Here is a suggestion of how to solve this:

            • Add option --init-rpl-role=UNKNOWN
            • When the server starts and notices that UNKNOWN is used, it should leave the binary log untruncated, neither commit nor roll back any of the binary-logged transactions, and wait for MaxScale (or a human) to connect and examine the state of the server. MaxScale can then examine the state of replication and decide for each server whether it should be a master or a slave by executing SET @@global.init_rpl_role=MASTER/SLAVE.
            • We still need a separate variable or command to specify whether the binary log should be truncated on the slave. Truncation should only be done in the case where the old master is now a slave and its binary log is one transaction before the new master and that transaction is not committed.
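
            The UNKNOWN proposal above can be sketched as a toy state machine. Everything here is hypothetical: the class, its methods, and the truncate_binlog flag stand in for the "separate variable or command" the comment asks for. The sketch also assumes that transactions which are not truncated get committed once the role is resolved.

```python
class ServerAfterCrash:
    """Toy model of the proposed --init-rpl-role=UNKNOWN startup behaviour."""

    def __init__(self, pending_transactions):
        self.role = "UNKNOWN"
        # Binlogged transactions that are neither committed nor rolled back yet.
        self.pending = list(pending_transactions)
        self.committed = []
        self.truncated = []

    def is_blocked(self):
        # While the role is unresolved, the server should refuse new writes.
        return self.role == "UNKNOWN"

    def resolve(self, role, truncate_binlog=False):
        """Called by the monitor (or a human) once the topology is known."""
        if role not in ("MASTER", "SLAVE"):
            raise ValueError("role must be MASTER or SLAVE")
        if truncate_binlog and role != "SLAVE":
            raise ValueError("truncation only applies to an old master that is now a slave")
        if truncate_binlog:
            self.truncated, self.pending = self.pending, []
        else:
            # Assumption: anything not truncated is committed on resolution.
            self.committed.extend(self.pending)
            self.pending = []
        self.role = role


demoted = ServerAfterCrash(["0-1-100"])
blocked_before = demoted.is_blocked()
demoted.resolve("SLAVE", truncate_binlog=True)

kept = ServerAfterCrash(["0-1-100"])
kept.resolve("MASTER")
```

            The point of the model is that the commit-or-truncate decision is deferred until an external party supplies both the role and the truncation choice.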
            markus makela markus makela added a comment -

            There was an idea that MaxScale would compare the GTIDs between the master and the slaves, see whether a rollback has happened, and do a forced failover in that case. This would help avoid some of these problems, but it will not work if the master crashes and recovers faster than the monitoring in MaxScale can detect.

            If the aforementioned happens, the crash and subsequent restart of the server is not detected in time and the old Master status in MaxScale remains in effect. This will cause writes to be sent to the server as soon as it comes back up and since it rolled back the transactions, duplicate GTIDs with different contents may be created.

            This means that the only real solution to this is for the server to implement the --init-rpl-role=UNKNOWN in a way so that a crash of a server will prevent new writes until someone (MaxScale in this case) tells it whether to commit or rollback the transactions. In the aforementioned case, new connections would get some sort of an error until the monitor in MaxScale detects the situation and then correctly sets the server role.

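
            The duplicate-GTID scenario described above can be traced with a short sketch. It models a binary log as a list of (GTID, payload) pairs, with GTIDs as (domain_id, server_id, sequence) tuples. The helper name, the payloads, and the concrete numbers are made up for illustration.

```python
def next_gtid(binlog, domain_id=0, server_id=1):
    """Return the GTID the server would assign to its next transaction."""
    last_seq = binlog[-1][0][2] if binlog else 0
    return (domain_id, server_id, last_seq + 1)


# The slave received 0-1-100, but the master crashed before the ack arrived.
master = [((0, 1, 99), "INSERT a"), ((0, 1, 100), "INSERT b")]
slave = list(master)

# The master restarts quickly and recovery truncates the unacknowledged event.
master = master[:-1]

# The monitor has not noticed the crash and still routes writes to the old master.
master.append((next_gtid(master), "INSERT c"))

# GTIDs present on both servers but carrying different contents.
slave_by_gtid = dict(slave)
conflicts = [gtid for gtid, payload in master
             if gtid in slave_by_gtid and slave_by_gtid[gtid] != payload]
print(conflicts)
```

            Here the old master reuses sequence number 100 for a different transaction, so the two servers disagree about what 0-1-100 contains.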

            serg Sergei Golubchik added a comment -

            https://github.com/MariaDB/server/commit/945093dd7d9470c41ec1d61db39ab63b94acf593 is ok to push.

            "as on automatic master restart, the server cannot automatically know if it should continue as a master or as a slave"

            Of course it can: if it was started with --init-rpl-role=SLAVE, it means it was started as a slave. The server knows what it has been told. What you wanted to say is that whoever starts the server cannot always know whether it will be a master or a slave. This is true, they cannot always know that, and when they do not, they will not set --init-rpl-role=SLAVE, to avoid slave-side recovery (that is, truncation).

            There can be different solutions to this; delaying recovery is one of them. But let's keep this MDEV-33465 about the original issue, where users set both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled and do not want any binlog truncation to happen. To solve that immediate problem, moving the decision to a different option is enough.
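
            As a rough sketch of "moving the decision to a different option": the function below is hypothetical and only models the rule stated in this thread, namely that slave-side recovery (truncation) is enabled by --init-rpl-role=SLAVE. The two semisync flags are accepted only to show that, in this model, they no longer influence the outcome.

```python
def truncate_on_recovery(init_rpl_role: str,
                         semi_sync_master_enabled: bool,
                         semi_sync_slave_enabled: bool) -> bool:
    """Assumed model: only the explicitly configured role enables truncation.

    The semisync flags are deliberately ignored in this sketch.
    """
    return init_rpl_role.upper() == "SLAVE"


# Both semisync variables set, role left empty: no truncation is requested.
both_enabled_no_role = truncate_on_recovery("", True, True)

# Old master restarted as an async slave with the role stated explicitly.
async_slave_with_role = truncate_on_recovery("SLAVE", False, False)
```

            Under this model, users who enable both semisync variables keep their binary log intact unless they explicitly declare the server a slave.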

            bnestere Brandon Nesterenko added a comment -

            Pushed into 10.6 as eb4458e9935.

            Merge conflicts observed in 11.3 and 11.4. The 11.3 merge conflict seems like it might go away naturally in the next merge-up, though the 11.4 one will still be needed.

            11.3-MDEV-33465-mergefix 11.4-MDEV-33465-mergefix

            People

              bnestere Brandon Nesterenko
              serg Sergei Golubchik