  1. MariaDB Server
  2. MDEV-37135

Data Loss if a Primary is Reverted And Made a Slave

Details

    • Can result in data loss

    Description

      Transactions will be lost if a master is reverted to a prior snapshot of itself and then made a slave of one of its former slaves. That is, the master is restored to some past state while its slaves still hold newer transactions that originated from this same server, and it is then set up to replicate from one of those slaves. In GTID terms (D-S-N, where D=domain_id, S=server_id, N=seq_no), the master at state D-S-N is reverted to some past state D-S-P where P < N, but S is the same. --replicate-same-server-id and --log-slave-updates currently can't both be enabled, so we have two options:

      1. If --replicate-same-server-id is off, all transactions between D-S-P and D-S-N will be dropped when they are replicated to the newly demoted slave (formerly the master, server S, which logged D-S-P through D-S-N).
      2. If --replicate-same-server-id is on, --log-slave-updates must be off, and then the newly-demoted slave (server S) won't re-binlog these transactions, and its binary logs will have a hole.
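The two options above can be illustrated with a toy model. This is a sketch of the behavior as described in this report, not MariaDB's actual implementation; the names `Gtid` and `replicate` are hypothetical and exist only for this example.

```python
from typing import List, NamedTuple, Tuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def replicate(
    events: List[Gtid],
    own_server_id: int,
    replicate_same_server_id: bool,
    log_slave_updates: bool,
) -> Tuple[List[Gtid], List[Gtid]]:
    """Toy model of a replica receiving events.

    Returns (applied, binlogged). Hypothetical helper for illustration only.
    """
    if replicate_same_server_id and log_slave_updates:
        # Mirrors the restriction stated in the report.
        raise ValueError("the two options cannot both be enabled")

    applied: List[Gtid] = []
    binlogged: List[Gtid] = []
    for ev in events:
        if ev.server_id == own_server_id and not replicate_same_server_id:
            # Option 1: events carrying the replica's own server_id are dropped.
            continue
        applied.append(ev)
        if log_slave_updates:
            binlogged.append(ev)
        # Option 2: without log_slave_updates the event is applied but never
        # written to the replica's own binary log, leaving a hole there.
    return applied, binlogged
```

With server 1 reverted and now replicating events 0-1-4 and 0-1-5 (its own) plus 0-2-6, the first configuration applies only 0-2-6, while the second applies all three but binlogs none of them.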

      A workaround for this is to temporarily change the server_id of the demoting server while it replicates the transactions that it originally executed after the point of the restored snapshot. The following steps will ensure this:

      1. Ensure the newly promoted primary has replicated all transactions from its former primary (which is demoting to a replica).
      2. Run `SELECT @@global.gtid_binlog_state` on the newly promoted primary. This will show a list of GTIDs (possibly several in the same domain). Each GTID in this list has a unique <domain_id, server_id> combination, and the GTID itself shows the last transaction executed by that server_id. Find the GTIDs whose server_id matches the @@global.server_id variable of the demoting-to-replica server (i.e. the server which is having a snapshot restored on it). These GTIDs will be referred to later as <last_gtids_from_old_master>.
      3. After restoring the snapshot on the former primary (the newly demoting replica), temporarily change its server_id to some value that is unique within the cluster.
      4. Ensure replication is configured to use MASTER_USE_GTID=Slave_pos (rather than Current_pos or No).
      5. When ready to start replication on the newly demoted replica, use the command START REPLICA UNTIL master_gtid_pos="<last_gtids_from_old_master>", where last_gtids_from_old_master comes from step 2. This will catch the server up to the state it was in before the snapshot was restored.
      6. Wait for replication to automatically stop (due to the UNTIL condition being satisfied).
      7. Restore the server_id of the newly-demoted replica to its original value (i.e. before step 3).
      8. Start replication as normal.
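The steps above can be sketched as a small helper that picks <last_gtids_from_old_master> out of a gtid_binlog_state value and prints the statements to run on the demoted server. This is only a sketch under stated assumptions: the function names and the temporary server_id are hypothetical, the replication connection is assumed to be configured already and the replica stopped, and the generated SQL should be reviewed before use.

```python
from typing import List, Tuple


def last_gtids_from_old_master(binlog_state: str, old_server_id: int) -> str:
    """Step 2: keep only the GTIDs whose server_id is the demoting server's.

    `binlog_state` is the comma-separated D-S-N list returned by
    SELECT @@global.gtid_binlog_state on the newly promoted primary.
    """
    picked = []
    for item in binlog_state.split(","):
        item = item.strip()
        if not item:
            continue
        domain_id, server_id, seq_no = (int(part) for part in item.split("-"))
        if server_id == old_server_id:
            picked.append(f"{domain_id}-{server_id}-{seq_no}")
    return ",".join(picked)


def workaround_statements(
    binlog_state: str, old_server_id: int, temp_server_id: int
) -> Tuple[List[str], List[str]]:
    """Return (catch_up, resume) statement lists for the demoted server.

    Run `catch_up`, wait for replication to stop by itself (step 6),
    then run `resume`. `temp_server_id` is a hypothetical value that the
    caller must guarantee is unused elsewhere in the cluster.
    """
    if temp_server_id == old_server_id:
        raise ValueError("temporary server_id must differ from the original")
    until_pos = last_gtids_from_old_master(binlog_state, old_server_id)
    if not until_pos:
        raise ValueError("no GTID from the old primary found in binlog state")

    catch_up = [
        f"SET GLOBAL server_id = {temp_server_id};",             # step 3
        "CHANGE MASTER TO MASTER_USE_GTID = slave_pos;",         # step 4
        f"START REPLICA UNTIL master_gtid_pos = '{until_pos}';",  # step 5
    ]
    resume = [
        f"SET GLOBAL server_id = {old_server_id};",  # step 7
        "START REPLICA;",                            # step 8
    ]
    return catch_up, resume


if __name__ == "__main__":
    # Example values only; substitute the real binlog state and server ids.
    first, second = workaround_statements("0-1-100,0-2-150", 1, 9001)
    print("\n".join(first))
    print("-- wait for the UNTIL condition to stop replication --")
    print("\n".join(second))
```

Because gtid_binlog_state holds at most one GTID per <domain_id, server_id> pair, filtering on a single server_id leaves at most one GTID per domain in the UNTIL position.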

            People

              bnestere Brandon Nesterenko