  1. MariaDB Server
  2. MDEV-16791

mariabackup : allow consistent backup, in presence of concurrent DDL, also without --lock-ddl-per-table

Details

    Description

      There are many problems with mariabackup if DDL statements are executed in parallel with the backup.
      Some of them show up even if --lock-ddl-per-table is used.

      A fundamental problem with lock-ddl-per-table is that only those tables that exist at the start of backup are MDL-locked. Another problem is that mariabackup will deadlock during the backup's "FLUSH TABLES WITH READ LOCK" (FTWRL) if concurrent DDL statements run. To resolve this deadlock, we either KILL the user's DDL query, or the user can omit FTWRL with the unsafe --no-lock option (MDEV-15636).

      These are the problems we currently have with backup in the presence of InnoDB DDL:

      1. Tables created during backup are missing after prepare (with or without lock-ddl-per-table).
      2. Tables dropped during backup might show up after prepare (without lock-ddl-per-table), or the backup fails if a file that existed at the start of backup is not found during copy.
      3. If lock-ddl-per-table is used, acquiring the MDL lock will fail if the table is concurrently being dropped or renamed.
      4. Tables that are renamed during backup do not show up after prepare if the rename happens after the table was copied.
      5. Tables that are recreated (dropped, then created under the same name) during backup, after the tablespace copy, break prepare due to a different tablespace id.
      6. Backup fails with "ALTER TABLE or OPTIMIZE TABLE was executed during backup".
      7. Concurrent multi-rename is prone to a race condition; for example, table t1 can be missed from the backup if RENAME TABLE t1 TO t2, t3 TO t1 runs concurrently.

      A possible fix for some of those problems would be to:

      • Tolerate missing files during the InnoDB copy - missing files can happen when DROP or RENAME runs in parallel. Also allow MDL lock failures in --lock-ddl-per-table.
      • At the end of backup, under protection of FTWRL, during the stage when frm and MyISAM files are copied, rescan the data directory looking for InnoDB tablespaces, and copy those that are not already in the backup. This will pick up tables from a parallel CREATE or RENAME.
      • Remove orphan (frm-less) .ibd files in prepare, if needed - some can be left over due to RENAME or DROP.
      • Handle recreated (DROP/CREATE) tablespaces, whose tablespace ids have changed by the end of backup. We might need to copy them to the backup a second time. This is tricky for streaming backup, since the xbstream format does not support multiple copies of the same file and does not have any "delete" command. One way to work around it is to give the second copy a special extension, e.g. "ibd.new", so that "prepare" would know it can replace the old copy with the new one.
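
      To make the prepare-side part of the bullets above concrete, here is a minimal sketch. It is not mariabackup code. It assumes the second copy of a recreated tablespace was stored as "<table>.ibd.new" (the naming idea from the last bullet), and that each table has exactly one .ibd and one .frm file (so partitioned tables are not handled). The function name finalize_tablespaces is hypothetical.

```python
from pathlib import Path
from typing import List


def finalize_tablespaces(db_dir: Path) -> List[str]:
    """Promote '*.ibd.new' copies, then delete .ibd files that have no .frm.

    Illustrative only: real prepare logic would also have to deal with
    partitions, encryption, and the InnoDB system tables.
    """
    actions = []
    # 1. A recreated tablespace was copied twice; the ".new" copy wins.
    for new_copy in sorted(db_dir.glob("*.ibd.new")):
        target = new_copy.with_suffix("")  # strips only the trailing ".new"
        new_copy.replace(target)           # overwrites the stale first copy
        actions.append(f"replaced {target.name}")
    # 2. An .ibd without a matching .frm is an orphan left by RENAME or DROP.
    for ibd in sorted(db_dir.glob("*.ibd")):
        if not ibd.with_suffix(".frm").exists():
            ibd.unlink()
            actions.append(f"removed orphan {ibd.name}")
    return actions
```

      Promoting the ".new" copies before the orphan scan means a recreated table is judged by its final file, not by the stale one.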

      Maybe some of the above can be taken care of by InnoDB recovery; I'm not really sure how this would work. marko, any idea on that?
      "ALTER TABLE or OPTIMIZE TABLE was executed during backup" must be taken care of by InnoDB, however.

          Activity

            wlad Vladislav Vaintroub added a comment (edited)

            I agree, yet I'd first get an easy version, which already works in bb-10.2-wlad-release, and then optimize for performance once the other stuff is tested and works.

            Either way, we need to collect and classify DDL changes at the end of backup. Changing the mechanism from rescan/reload to collecting redo log events should not be the hardest thing to do (once everything else works well).
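
            As a rough illustration of what "collect and classify DDL changes" could mean for the rescan approach, the sketch below compares two snapshots of the data directory, each mapping a table name to its tablespace id. It is an assumption-laden sketch, not the code in bb-10.2-wlad-release; the names classify_ddl and DdlChanges are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class DdlChanges(NamedTuple):
    created: List[str]
    dropped: List[str]
    renamed: List[Tuple[str, str]]  # (old name, new name)
    recreated: List[str]            # same name, new tablespace id


def classify_ddl(start: Dict[str, int], end: Dict[str, int]) -> DdlChanges:
    """Compare two {table name: tablespace id} snapshots."""
    start_by_id = {space_id: name for name, space_id in start.items()}
    end_by_id = {space_id: name for name, space_id in end.items()}

    # Same tablespace id under a different name: the table was renamed.
    renamed = sorted(
        (old, end_by_id[sid]) for sid, old in start_by_id.items()
        if sid in end_by_id and end_by_id[sid] != old
    )
    # Names whose tablespace id exists only in the end snapshot.
    new_names = {n for sid, n in end_by_id.items() if sid not in start_by_id}
    # Names whose tablespace id exists only in the start snapshot.
    gone_names = {n for sid, n in start_by_id.items() if sid not in end_by_id}

    return DdlChanges(
        created=sorted(new_names - gone_names),
        dropped=sorted(gone_names - new_names),
        renamed=renamed,
        recreated=sorted(new_names & gone_names),
    )


# The multi-rename from the description: RENAME TABLE t1 TO t2, t3 TO t1
changes = classify_ddl({"test/t1": 5, "test/t3": 7},
                       {"test/t2": 5, "test/t1": 7})
print(changes.renamed)
```

            Keying on the tablespace id rather than the file name is what lets the multi-rename case come out as two renames instead of a missed t1.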


            mleich Matthias Leich added a comment

            I have created
                MDEV-16863 Extend the RQG infrastructure for backup testing
            because there are serious doubts whether existing and future MTR-based tests will ever be sufficient
            for covering mariabackup in the presence of concurrent DDL.

            wlad Vladislav Vaintroub added a comment (edited)

            mleich
            Well, there is no doubt that RQG is good for this kind of testing; however, some deterministic "parallel" DDL is already possible with DBUG_EXECUTE_IF
            logic, so one can execute a specific query at predefined stages of the backup, before or after a specific table is copied, along these lines for example:

            --let before_copy_test_t1=BEGIN NOT ATOMIC DROP TABLE test.t1;CREATE TABLE test.t1 ENGINE=INNODB SELECT UUID() from test.seq_1_to_100; END
            --let after_copy_test_t2=BEGIN NOT ATOMIC  DROP TABLE test.t2;CREATE TABLE test.t2 ENGINE=INNODB SELECT UUID() from test.seq_1_to_1000; END
            --let after_copy_test_t3=ALTER TABLE test.t3 ADD INDEX index_a(a),ALGORITHM=COPY
            echo # xtrabackup backup;
            --disable_result_log
            exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir --close-files --dbug=+d,mariabackup_events;
            

            This is an actual test, from
            https://github.com/MariaDB/server/blob/bb-10.2-wlad-release/mysql-test/suite/mariabackup/recreate_table_during_backup.test#L11

            What it does: at certain points, mariabackup itself executes the contents of an environment variable as SQL. For the cases above, the environment variable name in question is

            {before_copy,after_copy}_$dbname_$tablename

            It is not "random" like RQG, and it relies on a debug build, but for testing base cases in a deterministic fashion it works OK.
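
            For readers unfamiliar with the mechanism, the lookup described above can be modelled roughly as follows. This is a Python sketch of the idea only; the real hook lives inside mariabackup and is enabled by the mariabackup_events debug keyword. The function names hook_variable and run_hook, and the injectable env and execute parameters, are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional


def hook_variable(stage: str, db: str, table: str) -> str:
    """Build the environment variable name for a copy stage and table."""
    if stage not in ("before_copy", "after_copy"):
        raise ValueError(f"unknown stage: {stage}")
    return f"{stage}_{db}_{table}"


def run_hook(stage: str, db: str, table: str,
             execute: Callable[[str], None],
             env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Run the SQL stored in the matching variable, if one is set."""
    sql = env.get(hook_variable(stage, db, table))
    if sql:
        execute(sql)  # in the real tool: send the query to the server
    return sql


# Mimics "--let after_copy_test_t3=..." from the test above.
env = {"after_copy_test_t3":
       "ALTER TABLE test.t3 ADD INDEX index_a(a),ALGORITHM=COPY"}
run_hook("after_copy", "test", "t3", execute=print, env=env)
```

            Because the variable name encodes both the stage and the table, each test case can pin its DDL to one exact point in the copy sequence.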


            mleich Matthias Leich added a comment

            What you present is a very good example of the example-based MTR tests that we must have for any feature sensitive to concurrent SQL.
            The approach via RQG is mostly just a brute-force safety net for bugs that are not caught by the existing example-based tests.
            I do not vote for making the completion of MDEV-16791 depend on the completion of MDEV-16863.

            valerii Valerii Kravchuk added a comment

            Is there any chance to have this fixed/implemented in versions 10.1.x?

            People

              Assignee: Unassigned
              Reporter: wlad Vladislav Vaintroub