[MDEV-33342] Add a replication MTR test cloning the slave with mariadb-backup

Details

Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Fix Version/s: 11.3.2, 11.4.1, 10.5.25, 10.11.8, 10.6.18, 11.0.6, 11.1.5, 11.2.4
Component/s: mariabackup, Replication
Labels: None

Description

The test should start as a usual replication test:

- run some statements on the master;
- make sure they are propagated to the slave.

Then, the test should emulate cloning data from the current slave to a new virgin slave.

The sequence on the slave should be:

1. mariadb-backup --backup
2. Shut down the server.
3. Drop the entire data directory.
4. mariadb-backup --prepare
5. mariadb-backup --copy-back
6. Restart the server.
7. Configure replication using CHANGE MASTER to set the master user, host and port.
8. Configure replication using the contents of xtrabackup_slave_info to set the master log file name and log position.
9. Restart replication using START SLAVE.
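The last configuration steps amount to merging the connection settings with the binlog coordinates saved by the backup into one CHANGE MASTER statement. The following is a minimal Python sketch, not the MTR test from the patch. It assumes that, for a non-GTID slave, xtrabackup_slave_info contains a line such as CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=1234. The functions parse_slave_info and build_change_master, and the host/port/user values, are hypothetical.

```python
import re

# Assumed (not confirmed by this issue) format of the non-GTID line that
# mariadb-backup --slave-info writes to xtrabackup_slave_info, e.g.:
#   CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=1234
_SLAVE_INFO_RE = re.compile(
    r"MASTER_LOG_FILE\s*=\s*'(?P<file>[^']+)'\s*,\s*MASTER_LOG_POS\s*=\s*(?P<pos>\d+)",
    re.IGNORECASE,
)


def parse_slave_info(text: str) -> tuple[str, int]:
    """Return (log_file, log_pos) found in xtrabackup_slave_info contents."""
    match = _SLAVE_INFO_RE.search(text)
    if match is None:
        raise ValueError("no MASTER_LOG_FILE/MASTER_LOG_POS found")
    return match.group("file"), int(match.group("pos"))


def build_change_master(host: str, port: int, user: str, slave_info: str) -> str:
    """Merge connection settings with the coordinates saved by the backup.

    Values are interpolated without escaping: test-only illustration.
    """
    log_file, log_pos = parse_slave_info(slave_info)
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_USER='{user}', MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos}"
    )


if __name__ == "__main__":
    info = "CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=1234"
    print(build_change_master("127.0.0.1", 16000, "root", info))
```

Keeping the two inputs separate mirrors the task description: user, host and port come from the test configuration, while only the log file name and position come from the backup.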

In parallel with these steps on the slave, the master should run some more transactions at different points in time:

- while the "old" slave is running;
- while the slave is shut down;
- while the "new" slave is running.

At the end, the test should check that all master transactions are reflected on the "new" slave.
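The invariant the test checks can be illustrated with a toy model in Python. This is a hypothetical sketch, not the actual test. Transactions are strings, the binlog position is a list index rather than a byte offset, and the backup is assumed to capture data and coordinates consistently. Master, Slave and run_clone_scenario are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    binlog: list[str] = field(default_factory=list)

    def commit(self, trx: str) -> None:
        self.binlog.append(trx)


@dataclass
class Slave:
    applied: list[str] = field(default_factory=list)
    pos: int = 0  # index of the next master binlog entry to apply

    def catch_up(self, master: Master) -> None:
        # Apply every master transaction from the current position onwards.
        while self.pos < len(master.binlog):
            self.applied.append(master.binlog[self.pos])
            self.pos += 1


def run_clone_scenario() -> tuple[list[str], list[str]]:
    master, old_slave = Master(), Slave()
    master.commit("trx1")
    old_slave.catch_up(master)

    # Backup: data and coordinates captured together (assumed consistent).
    backup = Slave(applied=list(old_slave.applied), pos=old_slave.pos)

    master.commit("trx2")           # while the "old" slave is running
    old_slave.catch_up(master)      # the old slave moves on; the backup does not

    master.commit("trx3")           # while the slave is shut down

    # prepare, copy-back, restart, CHANGE MASTER, START SLAVE
    new_slave = Slave(applied=list(backup.applied), pos=backup.pos)
    new_slave.catch_up(master)

    master.commit("trx4")           # while the "new" slave is running
    new_slave.catch_up(master)
    return master.binlog, new_slave.applied
```

The final comparison of master.binlog against new_slave.applied corresponds to the end-of-test check. It only holds because the restored data and the restored position agree.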

Issue Links

relates to: MDEV-33355 Add a Galera-2-node-to-MariaDB replication MTR test cloning the slave with mariadb-backup (Closed)

Activity

Kristian Nielsen added a comment - 2024-01-31 10:48

See also this test case, it does almost exactly what you describe, except that it copies the master to a new slave:

mysql-test/suite/mariabackup/slave_provision_nolock.test

(in case you were not aware of that test).

Hope this helps,

- Kristian.

Alexander Barkov added a comment - 2024-02-01 09:18 - edited

Hello knielsen, thank you for the information. I will check this test.

The reason we need a new test: a user reported that cloning a slave to a new virgin slave with mariadb-backup loses one transaction when GTID is not used. This has already happened to the user multiple times.

The new test added in this task found no problems in a simple scenario:

- a small number of small transactions;
- no log rotation;
- --sync_slave_with_master after each master transaction.

Btw, the patch is here:
https://github.com/MariaDB/server/commit/8fbad587311044a542f5d5c4d0ee4ffd7362f70b

Now we need to investigate the problem further, under more complex conditions.

Kristian Nielsen added a comment - 2024-02-01 10:16

Sure Bar, agree that another test is needed for mariabackup --slave-info. That's not covered by the existing test at all.

The slave_provision_nolock.test was effective at catching a problem with the starting slave position obtained from mariabackup --no-lock run against the master server; it generates some DML load while the backup is running. But that was a problem with how InnoDB internally saves a copy of the binlog position. mariabackup --slave-info uses SHOW SLAVE STATUS in combination with some table locking and/or stopping the SQL thread, depending on the options given. So I am not sure whether a similar test with --slave-info would be able to catch a problem. The original test needed to be run multiple times to catch the original bug, as it was a non-deterministic race.

If the problem is always the loss of exactly one transaction (as opposed to multiple transactions, or a duplicated transaction leading to e.g. a duplicate key error), that sounds like the replication position recorded in the backup is one transaction ahead of the InnoDB state in the backup. But that is strange, as the InnoDB commit happens first, followed by the update of the position. If the problem were a race where the position is read in between those two steps, the position would be one transaction behind, not ahead. If the problem were missing table locking between backing up the InnoDB state and running SHOW SLAVE STATUS, we would expect the position to be ahead by a varying number of transactions.
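The reasoning above can be checked with a toy model in Python. The restored data holds the first innodb_count transactions, and replication restarts at binlog index recorded_pos. Positions are counted in transactions, not byte offsets, and restore_outcome is a hypothetical name used only for illustration.

```python
def restore_outcome(binlog: list[str], innodb_count: int, recorded_pos: int) -> dict[str, list[str]]:
    """Toy model of restoring a cloned slave.

    The backup contains the first `innodb_count` transactions, and replication
    restarts at index `recorded_pos` of the master binlog.
    """
    restored = binlog[:innodb_count]
    replayed = binlog[recorded_pos:]
    # Replayed transactions already in the backup are applied twice
    # (in practice, e.g. a duplicate key error).
    duplicated = [t for t in replayed if t in restored]
    # Transactions neither in the backup nor replayed are silently lost.
    lost = [t for t in binlog if t not in restored and t not in replayed]
    return {"duplicated": duplicated, "lost": lost}


if __name__ == "__main__":
    binlog = ["t1", "t2", "t3"]
    for pos, label in [(2, "consistent"), (1, "one behind"), (3, "one ahead")]:
        print(label, restore_outcome(binlog, 2, pos))
```

With InnoDB holding two transactions, a position one behind (the commit-then-update race) yields a duplicated transaction. Only a position one ahead yields exactly one lost transaction, which is the symptom the user reported.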

Hopefully you will find a way to reproduce it. I'd suggest (if not done already) carefully collecting exact information about which locking options etc. are used for the backup, such as --no-lock and --safe-slave-backup.

Alexander Barkov added a comment - 2024-02-01 10:21

Thank you for your feedback, knielsen.
