There are a lot of problems with mariabackup if DDL statements are executed in parallel with the backup.
Some of them show up even if --lock-ddl-per-table is used.
A fundamental problem with --lock-ddl-per-table is that only those tables that exist at the start of the backup are MDL-locked. Another problem is that mariabackup will deadlock during the backup's FLUSH TABLES WITH READ LOCK if concurrent DDL statements run. To resolve this deadlock, we either KILL the user's DDL query, or the user has the option to omit FTWRL with the unsafe --no-lock option (MDEV-15636).
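To make the deadlock concrete, here is a rough mtr-style sketch of the wait cycle. It is only an illustration: the open transaction standing in for mariabackup's per-table MDL lock, and all table and connection names, are assumptions, not how mariabackup actually does it.

CREATE TABLE t1 (a INT) ENGINE=InnoDB;

connect (backup_mdl,localhost,root,,test);
# Stand-in for mariabackup's per-table lock under --lock-ddl-per-table:
# an open transaction that has read t1 holds a shared MDL lock on it.
BEGIN;
SELECT * FROM t1;

connect (ddl,localhost,root,,test);
# The user's DDL takes the global intention-exclusive lock, then blocks
# waiting for an exclusive MDL lock on t1.
send ALTER TABLE t1 ADD COLUMN b INT;

connect (backup_ftwrl,localhost,root,,test);
# FTWRL's global read lock conflicts with the pending ALTER, which in turn
# waits on backup_mdl: nobody can proceed. This sketch hangs by design;
# mariabackup resolves it by KILLing the DDL, or the user avoids FTWRL
# with --no-lock.
FLUSH TABLES WITH READ LOCK;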
So, these are the problems we currently have with backup in the presence of InnoDB DDL:

- Tables created during backup are not there after prepare (with or without --lock-ddl-per-table).
- Tables dropped during backup might show up after prepare (without --lock-ddl-per-table), or the backup fails if a file that existed at the start of the backup is not found during copy.
- If --lock-ddl-per-table is used, acquiring the MDL lock will fail if the table is concurrently being dropped or renamed.
- Tables that are renamed during backup do not show up after prepare, if the rename happens after the table was copied.
- Tables that are recreated (dropped and created under the same name) during backup, after their tablespace was copied, break prepare due to a different tablespace id.
- Backup fails with "ALTER TABLE or OPTIMIZE TABLE was executed during backup".
- A concurrent multi-table RENAME is prone to a race condition: for example, table t1 can be missed from the backup if RENAME TABLE t1 TO t2, t3 TO t1 runs concurrently (see the sketch after this list).
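One possible interleaving behind the RENAME race, written out as a commented timeline (the copy order is hypothetical; the point is only that the tablespace enumeration happens once, before the rename):

# Files at backup start: t1.ibd, t3.ibd. Backup enumerates tablespaces once.
#
# 1. backup copies t3.ibd                   (old t3 data is in the backup)
# 2. concurrently: RENAME TABLE t1 TO t2, t3 TO t1;
#    on disk: t1.ibd -> t2.ibd, then t3.ibd -> t1.ibd
# 3. backup copies t1.ibd                   (but this is old t3's data again)
#
# t2.ibd was not in the original enumeration, so the data of the original
# t1 (now living in t2.ibd) is never copied: t1's rows are lost.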
A possible fix for some of those problems would be to:

- Tolerate missing files during the InnoDB copy phase; missing files can happen when DROP or RENAME runs in parallel. Also tolerate MDL lock acquisition failures with --lock-ddl-per-table.
- At the end of the backup, under the protection of FTWRL, during the stage when .frm and MyISAM files are copied, rescan the data directory looking for InnoDB tablespaces, and copy those that are not already in the backup. This will pick up tables from a parallel CREATE or RENAME.
- Possibly remove orphan (frm-less) .ibd files in prepare; there can be some left over due to RENAME or DROP.
- Handle recreated (DROP/CREATE) tablespaces, which have changed tablespace ids at the end of the backup. We might need to copy them to the backup a second time. This is tricky for streaming backup, since the xbstream format does not support multiple copies of the same file and does not have any "delete" command. One way to work around it is to give the second copy a special extension, e.g. ".ibd.new", so that prepare would know it can replace the old copy with the new one (see the sketch after this list).
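For illustration, prepare-side handling under the proposed ".ibd.new" convention could amount to letting the second copy replace the stale one, sketched here with mtr file commands. The convention itself is only the proposal above, nothing that exists today.

# Backup target dir after a DROP/CREATE of test.t1 during backup:
#   test/t1.ibd       first copy, stale tablespace id
#   test/t1.ibd.new   second copy, taken at the end of backup under FTWRL
# Prepare would let the second copy replace the first:
--remove_file $targetdir/test/t1.ibd
--move_file $targetdir/test/t1.ibd.new $targetdir/test/t1.ibd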
Maybe some of the above can be taken care of by InnoDB recovery; I'm not really sure how this would work. marko, any idea on that?
"ALTER TABLE or OPTIMIZE TABLE was executed during backup" must be taken care of by InnoDB, however.
Vladislav Vaintroub added a comment (edited):

I agree, yet I'd first get an easy version, which already works in bb-10.2-wlad-release, then optimize for performance once the other stuff is tested and works.

Either way, we need to collect and classify DDL changes at the end of the backup. Changing the mechanism from rescan/reload to collecting redo log events should not be the hardest thing to do (once everything else works well).
Matthias Leich added a comment:

I have created MDEV-16863 (Extend the RQG infrastructure for backup testing), because there are serious doubts whether existing and future MTR-based tests will ever be sufficient for covering mariabackup in the presence of concurrent DDL.
Vladislav Vaintroub added a comment (edited):

mleich, well, there is no doubt that RQG is good for this kind of testing. However, some deterministic "parallel" DDL testing is already possible with DBUG_EXECUTE_IF logic, so one can tell mariabackup to execute a specific query at predefined stages of the backup, after or before a specific table is copied, along these lines for example:
--let before_copy_test_t1=BEGIN NOT ATOMIC DROP TABLE test.t1;CREATE TABLE test.t1 ENGINE=INNODB SELECT UUID() from test.seq_1_to_100; END
--let after_copy_test_t2=BEGIN NOT ATOMIC DROP TABLE test.t2;CREATE TABLE test.t2 ENGINE=INNODB SELECT UUID() from test.seq_1_to_1000; END
--let after_copy_test_t3=ALTER TABLE test.t3 ADD INDEX index_a(a),ALGORITHM=COPY
echo # xtrabackup backup;
--disable_result_log
exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir --close-files --dbug=+d,mariabackup_events;
This is an actual test, from
https://github.com/MariaDB/server/blob/bb-10.2-wlad-release/mysql-test/suite/mariabackup/recreate_table_during_backup.test#L11

What it does: at certain points in mariabackup, it "SQL executes" the contents of an environment variable from within mariabackup itself. For the cases above, the environment variable name in question is {before_copy,after_copy}_$dbname_$tablename.

It is not "random" like RQG, and it relies on a debug compilation, but for testing base cases in a deterministic fashion it works OK.
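Putting the pieces together, a minimal self-contained test along those lines might look as follows. This is only a sketch: the table, the hook payload, and the target-dir setup are assumptions; the hook naming and the mariabackup_events dbug point come from the quoted test.

--source include/have_debug.inc

CREATE TABLE test.t1 (a INT) ENGINE=INNODB;

# Execute a DDL statement from inside mariabackup, right after test/t1 has
# been copied; hook names follow {before_copy,after_copy}_$dbname_$tablename.
--let after_copy_test_t1=DROP TABLE test.t1

echo # xtrabackup backup;
--let $targetdir=$MYSQLTEST_VARDIR/tmp/backup
--disable_result_log
exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir --close-files --dbug=+d,mariabackup_events;
--enable_result_log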
Matthias Leich added a comment:

What you present is a very good example of the MTR-based, example-driven tests which we must have for any feature sensitive to concurrent SQL.
The approach via RQG is mostly just a brute-force safety net for bugs which are not caught by the existing example-driven tests.
I do not vote for making the completion of MDEV-16791 depend on the completion of MDEV-16863.