[MDEV-33342] Add a replication MTR test cloning the slave with mariadb-backup Created: 2024-01-31 Updated: 2024-02-06 Resolved: 2024-02-01 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | mariabackup, Replication |
| Fix Version/s: | 11.3.2, 11.4.1, 10.5.25, 10.6.18, 10.11.8, 11.0.6, 11.1.5, 11.2.4 |
| Type: | Task | Priority: | Critical |
| Reporter: | Alexander Barkov | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
The test should start as a usual replication test:
Then the test should emulate cloning data from the current slave to a new virgin slave. The following sequence should be performed on the slave:
In parallel with these steps on the slave, the master should run some more transactions at different points in time:
At the end, the test should check that all master transactions are reflected on the "new" slave. |
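The final check can be illustrated with a toy Python model. It is not MTR code and not the test from the patch. The function names are hypothetical, and the model assumes that a replication position can be reduced to the number of master transactions already applied (real positions are GTIDs or binlog file/offset pairs). The new slave starts from the backed-up data and replays the master's transactions from the recorded position. A position that is ahead of the data therefore loses transactions, and a position that is behind applies some of them twice.

```python
# Toy model only: transactions are the integers in master_log, and a
# "position" is assumed to be the count of master transactions already applied.

def clone_and_catch_up(master_log, backup_data, position):
    """Hypothetical helper: restore a backup on a new slave, then replicate."""
    new_slave = list(backup_data)            # data copied by the backup
    new_slave.extend(master_log[position:])  # replication resumes at `position`
    return new_slave


def check_new_slave(master_log, new_slave):
    """Return (missing, duplicated) transactions compared with the master."""
    missing = sorted(set(master_log) - set(new_slave))
    duplicated = sorted({t for t in new_slave if new_slave.count(t) > 1})
    return missing, duplicated


master_log = [1, 2, 3, 4, 5]
backup_data = [1, 2, 3]  # the backup captured three committed transactions

# Recorded position matches the backed-up data.
print(check_new_slave(master_log, clone_and_catch_up(master_log, backup_data, 3)))  # ([], [])
# Position one transaction ahead of the data: transaction 4 is lost.
print(check_new_slave(master_log, clone_and_catch_up(master_log, backup_data, 4)))  # ([4], [])
# Position one transaction behind the data: transaction 3 is applied twice.
print(check_new_slave(master_log, clone_and_catch_up(master_log, backup_data, 2)))  # ([], [3])
```

In a real test, the "applied twice" case would typically surface as a duplicate key error on the new slave rather than as a silent duplicate.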
| Comments |
| Comment by Kristian Nielsen [ 2024-01-31 ] |
|
See also this test case, it does almost exactly what you describe, except that it copies the master to a new slave: mysql-test/suite/mariabackup/slave_provision_nolock.test (in case you were not aware of that test). Hope this helps, - Kristian. |
| Comment by Alexander Barkov [ 2024-02-01 ] |
|
Hello knielsen, thank you for the information. I will check this test. The reason we need a new test: The new test being added in this task found no problems in a simple scenario:
Btw, the patch is here: Now we need to investigate the problem further, under more complex conditions. |
| Comment by Kristian Nielsen [ 2024-02-01 ] |
|
Sure Bar, I agree that another test is needed for mariabackup --slave-info. That's not covered by the existing test at all.

The slave_provision_nolock.test was effective at catching a problem with the starting slave position from mariabackup --no-lock of the master server; it generates some DML load while the backup is running. But that was a problem with how InnoDB internally saves a copy of the binlog position. mariabackup --slave-info uses SHOW SLAVE STATUS in combination with some table locking and/or stopping the SQL thread, depending on the options given. So I'm not sure whether a similar test with --slave-info would be able to catch a problem. The original test needs to be run multiple times to catch the original bug, as it was a non-deterministic race.

If the problem is always the loss of one transaction (as opposed to multiple lost transactions, or a duplicated transaction leading to e.g. a duplicate key error), that sounds like the replication position is obtained one transaction later than the state of InnoDB in the backup. But that is strange, as the InnoDB commit happens first, followed by the update of the position. If the problem is a race where the position is obtained in between those two steps, it would be one transaction behind, not ahead. If the problem is missing table locking between backing up the InnoDB state and SHOW SLAVE STATUS, we would expect that the position could be ahead by a varying number of transactions.

Hopefully you will find a way to reproduce it. I'd suggest (if not done already) carefully getting exact information about what kind of locking options etc. are used for the backup, such as --no-lock, --safe-slave-backup. |
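The ordering argument in the comment above can be checked with a small Python model. This is a sketch under stated assumptions, not MariaDB code: it assumes each replicated transaction is exactly two events (InnoDB commit, then position update), that a read at index i sees the first i events, and the function names are hypothetical. Reading both values at the same index models the commit/position race; reading the position later models missing locking between copying InnoDB and running SHOW SLAVE STATUS.

```python
# Toy model only: each transaction commits in InnoDB first and then updates
# the replication position, as described in the comment.

def read_state(n_trx, innodb_read_at, position_read_at):
    """Return (transactions in the InnoDB copy, recorded position).

    A read at index i sees the first i events of the slave's history.
    """
    events = []
    for _ in range(n_trx):
        events += ["commit", "position_update"]
    innodb = events[:innodb_read_at].count("commit")
    position = events[:position_read_at].count("position_update")
    return innodb, position


def outcome(innodb, position):
    """What a new slave started at `position` on top of this data would see."""
    if position < innodb:
        return f"{innodb - position} transaction(s) duplicated"
    if position > innodb:
        return f"{position - innodb} transaction(s) lost"
    return "consistent"


# Race between InnoDB commit and position update (both read at the same point):
# the position can only lag, and by at most one transaction.
for i in range(7):
    print(i, outcome(*read_state(3, i, i)))

# No locking between the InnoDB copy and SHOW SLAVE STATUS: the position is
# read later and can be several transactions ahead.
print(outcome(*read_state(3, 2, 6)))  # 2 transaction(s) lost
```

Under these assumptions, a consistent loss of exactly one transaction fits neither pattern cleanly, which matches the comment's point that the observed behavior is strange.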
| Comment by Alexander Barkov [ 2024-02-01 ] |
|
Thank you for your feedback, knielsen. |