MariaDB Server / MDEV-33342

Add a replication MTR test cloning the slave with mariadb-backup

Details

    Description

      The test should start as a usual replication test:

      • run some statements on the master
      • make sure they are propagated to the slave side

      Then, the test should emulate cloning data from the current slave to a new virgin slave.

      The slave side should perform the following sequence:

      • mariadb-backup --backup
      • shutdown the server
      • drop the entire data directory
      • mariadb-backup --prepare
      • mariadb-backup --copy-back
      • restart the server
      • configure replication using CHANGE MASTER to set user, host, port
      • configure replication using xtrabackup_slave_info, to set log file name and log position
      • restart replication using START SLAVE
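
      The sequence above can be sketched as a small helper that only builds the commands and SQL statements; it does not run them. This is a minimal sketch, not the test from the patch. It assumes that xtrabackup_slave_info holds a non-GTID statement of the form CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...; and that the backup is taken with --slave-info. The function names, paths, host, port and user are hypothetical, and connection options for mariadb-backup are omitted.

```python
import re
from typing import List, Tuple


def backup_commands(backup_dir: str, datadir: str) -> List[List[str]]:
    """Return the three mariadb-backup invocations, in order.

    Shutting the server down, emptying the data directory and restarting
    the server happen between these commands and are left to the caller.
    """
    target = f"--target-dir={backup_dir}"
    return [
        # Taken while the old slave is running; --slave-info stores the
        # replication coordinates alongside the backup.
        ["mariadb-backup", "--backup", "--slave-info", target],
        # Run once the server is down and the data directory is empty.
        ["mariadb-backup", "--prepare", target],
        ["mariadb-backup", "--copy-back", target, f"--datadir={datadir}"],
    ]


# Assumed file format (non-GTID):
#   CHANGE MASTER TO MASTER_LOG_FILE='name', MASTER_LOG_POS=number;
_SLAVE_INFO_RE = re.compile(
    r"MASTER_LOG_FILE\s*=\s*'([^']+)'\s*,\s*MASTER_LOG_POS\s*=\s*(\d+)",
    re.IGNORECASE,
)


def parse_slave_info(text: str) -> Tuple[str, int]:
    """Extract (log file name, log position) from the slave info text."""
    match = _SLAVE_INFO_RE.search(text)
    if match is None:
        raise ValueError("no MASTER_LOG_FILE/MASTER_LOG_POS found")
    return match.group(1), int(match.group(2))


def replication_statements(
    host: str, port: int, user: str, log_file: str, log_pos: int
) -> List[str]:
    """Return the SQL for the last three steps of the sequence."""
    return [
        f"CHANGE MASTER TO MASTER_HOST='{host}', "
        f"MASTER_PORT={port}, MASTER_USER='{user}'",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos}",
        "START SLAVE",
    ]
```

      For example, parse_slave_info() applied to the file contents yields the pair that replication_statements() places into the second CHANGE MASTER statement.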

      In parallel with these steps on the slave, the master should run some more transactions at different points in time:

      • while the "old" slave is running
      • while the slave is shut down
      • while the "new" slave is running

      At the end, the test should check that all master transactions are reflected on the "new" slave.
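
      The final check can be illustrated with a small comparison that reports which phase lost rows. This is only a sketch under the assumption that each master transaction inserts one row with a known integer id; the phase labels and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def missing_by_phase(
    master: Dict[str, List[int]], slave: Iterable[int]
) -> Dict[str, List[int]]:
    """Map each phase to the master row ids that are absent on the slave.

    Phases with nothing missing are left out, so an empty result means
    that every master transaction reached the new slave.
    """
    seen = set(slave)
    result: Dict[str, List[int]] = {}
    for phase, rows in master.items():
        lost = [row for row in rows if row not in seen]
        if lost:
            result[phase] = lost
    return result
```

      An empty dictionary corresponds to the expected outcome of the test; a non-empty one also shows whether the loss happened before, during or after the cloning.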

          Activity

            knielsen Kristian Nielsen added a comment

            See also this test case; it does almost exactly what you describe, except that it copies the master to a new slave:

            mysql-test/suite/mariabackup/slave_provision_nolock.test

            (in case you were not aware of that test).

            Hope this helps,

            - Kristian.

            bar Alexander Barkov added a comment (edited)

            Hello knielsen, thank you for the information. I will check this test.

            The reason why we need a new test:
            A user has complained that cloning a slave to a new virgin slave with mariadb-backup loses one transaction when not using GTID. This has already happened to the user multiple times.

            The new test being added in this task found no problems in a simple scenario:

            • A small number of small transactions
            • No log rotation
            • --sync_slave_with_master after each master transaction

            Btw, the patch is here:
            https://github.com/MariaDB/server/commit/8fbad587311044a542f5d5c4d0ee4ffd7362f70b

            Now we need to investigate the problem further, under more complex conditions.

            knielsen Kristian Nielsen added a comment

            Sure Bar, I agree that another test is needed for mariabackup --slave-info. That is not covered by the existing test at all.

            slave_provision_nolock.test was effective at catching a problem with the starting slave position from mariabackup --no-lock of the master server; it generates some DML load while the backup is running. But that was a problem with how InnoDB internally saves a copy of the binlog position. mariabackup --slave-info uses SHOW SLAVE STATUS in combination with some table locking and/or stopping the SQL thread, depending on the options given. So I am not sure whether a similar test with --slave-info would be able to catch a problem. The original test needs to be run multiple times to catch the original bug, as it was a non-deterministic race.

            If the problem is always the loss of one transaction (as opposed to multiple transactions, or duplicating a transaction leading to e.g. a duplicate key error), that sounds like the replication position is obtained one transaction later than the state of InnoDB in the backup. But that is strange, as the InnoDB commit happens first, followed by the update of the position. If the problem is a race where the position is obtained in between those two steps, it would be one transaction behind, not ahead. If the problem is missing table locking between backing up the InnoDB state and SHOW SLAVE STATUS, we would expect the position to be ahead by a varying number of transactions.

            Hopefully you will find a way to reproduce it. I'd suggest (if not done already) carefully collecting exact information about which locking options are used for the backup, such as --no-lock and --safe-slave-backup.
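
            The ahead/behind reasoning in the comment above can be made concrete with a toy model. This is a sketch under simplifying assumptions, not a model of the server: transactions are a plain list, the backed-up data contains the first backup_state of them, and the new slave resumes from index recorded_pos. All names are hypothetical.

```python
from typing import List, Tuple


def replay(
    events: List[str], backup_state: int, recorded_pos: int
) -> Tuple[List[str], List[str]]:
    """Return (lost, duplicated) transactions on the cloned slave.

    backup_state: number of transactions contained in the backed-up data.
    recorded_pos: number of transactions the saved position claims were
                  applied; the new slave resumes with events[recorded_pos:].
    """
    # Position ahead of the data: the gap is never applied.
    lost = events[backup_state:recorded_pos]
    # Position behind the data: the overlap is applied a second time.
    duplicated = events[recorded_pos:backup_state]
    return lost, duplicated
```

            With four transactions t1..t4, replay(events, 2, 3) loses t3 (position one ahead), while replay(events, 3, 2) applies t3 twice (position one behind), which is the case that would show up as e.g. a duplicate key error.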

            bar Alexander Barkov added a comment

            Thank you for your feedback, knielsen.

            People

              bar Alexander Barkov