Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16791

mariabackup : allow consistent backup, in presence of concurrent DDL, also without --lock-ddl-per-table

Details

    Description

      There is a lot of problems with mariabackup, if DDL statements are executed in parallel to backup.
      Some of them show up even if --lock-ddl-per-table is used.

      A fundamental problem with lock-ddl-per-table is that only those tables, that exist at the start of backup, are MDL-locked. Another problem is that mariabackup will deadlock during backup's "FLUSH TABLE WITH READ LOCKS", if concurrent DDL statements run. To resolve this deadlock, we either KILL user's DDL query, or user has the option to omit FTWRL with unsafe --no-lock option (MDEV-15636)

      So, there are problems which we currently have with backup in presence of Innodb DDL.

      1. Tables created during backup are not there after prepare (with/without lock-ddl-per-table)
      2. Tables dropped during backup might show up after prepare (without lock-ddl-per-table), or backup fails if file that was at the start of backup, is not found during copy.
      3. If lock-ddl-per-table is used, acquiring MDL lock will fail, if the table is concurrently being dropped or renamed.
      4. Tables that are renamed during backup, do not show up after prepare, if rename happens after table was copied.
      5. Tables that are recreated (dropped, and created under the same name) during backup,after tablespace copy, break prepare due to different tablespace id.
      6. Backup fails with "ALTER TABLE or OPTIMIZE TABLE was executed during backup"
      7. Concurrent multi-rename is prone to race condition, for example table t1 can be missed from backup, if RENAME TABLE t1 to t2, t3 to t1 runs concurrently.

      A possible fix to some of those problems would be to

      • Tolerate missing files during innodb copy - missing files can happen when DROP or RENAME runs in parallel. Also allow MDL lock failures in --lock-ddl-per-table.
      • At the end of backup, under protection of FTWRL, during the stage when frm and MyISAM files are copied, rescan data directory, looking for Innodb tablespaces, and copy those that are not already in backup. This will pick up tables from parallel CREATE or RENAME.
      • We might need to remove orphan (frm-less) .ibd files in prepare - there can be some left due to RENAME, or DROP.
      • Recreated (DROP/CREATE) tablespaces, that have changed tablespace ids at the end of backup. We might need to copy them to backup second time. This is be tricky for streaming backup, since xbstream format does not support multiple copies of the same file , and does not have any "delete" command. One way to workaround it, it to give second copy special extension, e.g "ibd.new" , so that "prepare" would know it can replace old copy with the new one.

      Maybe some of the above can be taken care by Innodb recovery, I'm not really sure how this would work. marko, any idea on that?
      "ALTER TABLE or OPTIMIZE TABLE was executed during backup" must be taken care by Innodb however.

      Attachments

        Issue Links

          Activity

            I think we should meet the lessened risk from xtrabackup implementation (ie the point and ways where it can fail) and then work to lower risk even more.

            manjot Manjot Singh (Inactive) added a comment - I think we should meet the lessened risk from xtrabackup implementation (ie the point and ways where it can fail) and then work to lower risk even more.
            wlad Vladislav Vaintroub added a comment - - edited

            manjot, not getting it. What's lessened risk? This is xtrabackup logic that is there currently.
            Percona's real solution is backup locks, and this is not anything we get into 10.2

            Our lock-ddl-per-table is already a solid improvement compared to xtrabackup - it does not force unsafe no-lock, and it does not read potentially huge rows with SELECT * from TABLE LIMIT 1, and it locks tables with fulltext correctly. also, we do not get "table changed since transaction start, please retry" when locking MDL

            wlad Vladislav Vaintroub added a comment - - edited manjot , not getting it. What's lessened risk? This is xtrabackup logic that is there currently. Percona's real solution is backup locks, and this is not anything we get into 10.2 Our lock-ddl-per-table is already a solid improvement compared to xtrabackup - it does not force unsafe no-lock, and it does not read potentially huge rows with SELECT * from TABLE LIMIT 1, and it locks tables with fulltext correctly. also, we do not get "table changed since transaction start, please retry" when locking MDL

            I believe that an extra step of copying files at the end of the backup can be avoided. As I noted in MDEV-16809, it would only be necessary if redo logging was disabled during ALTER TABLE, and the copied redo log contains MLOG_INDEX_LOAD records, informing that some changes to data pages may have been missed. With MDEV-16809, the full redo logging can be enabled by SET GLOBAL innodb_log_optimize_ddl=OFF. In this case, no MLOG_INDEX_LOAD record will be emitted.

            File creation can be detected by observing a MLOG_INIT_FILE_PAGE2 record for page number 0. There also is a preceding MLOG_FILE_CREATE2 record, as well as a MLOG_FILE_NAME record:

            ./mtr --mysqld=--debug=d,ib_log innodb.log_file_name

            recv_group_scan_log_recs: ib_log: scan 1633817: multi-log rec MLOG_FILE_CREATE2 len 23 page 4:0
            recv_group_scan_log_recs: ib_log: scan 1633817: multi-log rec MLOG_FILE_NAME len 19 page 4:0
            recv_group_scan_log_recs: ib_log: scan 1633817: multi-log end total_len 43 n=3
            recv_group_scan_log_recs: ib_log: scan 1633860: multi-log rec MLOG_INIT_FILE_PAGE2 len 3 page 4:1
            recv_group_scan_log_recs: ib_log: scan 1633860: multi-log rec MLOG_IBUF_BITMAP_INIT len 3 page 4:1
            recv_group_scan_log_recs: ib_log: scan 1633860: multi-log rec MLOG_FILE_NAME len 19 page 4:0
            recv_group_scan_log_recs: ib_log: scan 1633860: multi-log end total_len 26 n=4
            recv_group_scan_log_recs: ib_log: scan 1633886: multi-log rec MLOG_INIT_FILE_PAGE2 len 3 page 4:0
            recv_group_scan_log_recs: ib_log: scan 1633886: multi-log rec MLOG_2BYTES len 6 page 4:0
            recv_group_scan_log_recs: ib_log: scan 1633886: multi-log end total_len 388 n=54
            

            In the above 2 mini-transactions, the file is created and the first two pages (0 and 1) are initialized. The redo log does contain all the necessary information for creating and initializing the file.

            Normal InnoDB crash recovery does not apply any file deletion or creation operations; it only applies rename operations (MLOG_FILE_RENAME2). (Before MDEV-14717 (MariaDB 10.3.3), RENAME TABLE is not crash-safe within InnoDB. But, this does not affect backup; mariabackup --backup would fail anyway if the server crashed during the backup.

            File deletion is reflected by MLOG_FILE_DELETE records. For normal recovery, these records are merely informational (indicating that recovery can safely ignore any preceding log records for the deleted file). (Starting with MySQL 5.7.4 and MariaDB 10.2.2, InnoDB does not silently ignore redo log records for files that are missing.)

            I will try an alternative approach where mariabackup --backup or mariabackup --prepare will create, delete and rename files based on the redo log records. An extra file-copying step would only be necessary for handling MLOG_INDEX_LOAD records (if their creation is not prevented by MDEV-16809 SET GLOBAL innodb_log_optimize_ddl=OFF).

            marko Marko Mäkelä added a comment - I believe that an extra step of copying files at the end of the backup can be avoided. As I noted in MDEV-16809 , it would only be necessary if redo logging was disabled during ALTER TABLE , and the copied redo log contains MLOG_INDEX_LOAD  records, informing that some changes to data pages may have been missed. With MDEV-16809 , the full redo logging can be enabled by SET GLOBAL innodb_log_optimize_ddl=OFF . In this case, no MLOG_INDEX_LOAD record will be emitted. File creation can be detected by observing a MLOG_INIT_FILE_PAGE2 record for page number 0. There also is a preceding MLOG_FILE_CREATE2 record, as well as a MLOG_FILE_NAME record: ./mtr --mysqld=--debug=d,ib_log innodb.log_file_name recv_group_scan_log_recs: ib_log: scan 1633817: multi-log rec MLOG_FILE_CREATE2 len 23 page 4:0 recv_group_scan_log_recs: ib_log: scan 1633817: multi-log rec MLOG_FILE_NAME len 19 page 4:0 recv_group_scan_log_recs: ib_log: scan 1633817: multi-log end total_len 43 n=3 recv_group_scan_log_recs: ib_log: scan 1633860: multi-log rec MLOG_INIT_FILE_PAGE2 len 3 page 4:1 recv_group_scan_log_recs: ib_log: scan 1633860: multi-log rec MLOG_IBUF_BITMAP_INIT len 3 page 4:1 recv_group_scan_log_recs: ib_log: scan 1633860: multi-log rec MLOG_FILE_NAME len 19 page 4:0 recv_group_scan_log_recs: ib_log: scan 1633860: multi-log end total_len 26 n=4 recv_group_scan_log_recs: ib_log: scan 1633886: multi-log rec MLOG_INIT_FILE_PAGE2 len 3 page 4:0 recv_group_scan_log_recs: ib_log: scan 1633886: multi-log rec MLOG_2BYTES len 6 page 4:0 … recv_group_scan_log_recs: ib_log: scan 1633886: multi-log end total_len 388 n=54 In the above 2 mini-transactions, the file is created and the first two pages (0 and 1) are initialized. The redo log does contain all the necessary information for creating and initializing the file. Normal InnoDB crash recovery does not apply any file deletion or creation operations; it only applies rename operations ( MLOG_FILE_RENAME2 ). (Before MDEV-14717 (MariaDB 10.3.3), RENAME TABLE is not crash-safe within InnoDB. But, this does not affect backup; mariabackup --backup would fail anyway if the server crashed during the backup. File deletion is reflected by MLOG_FILE_DELETE records. For normal recovery, these records are merely informational (indicating that recovery can safely ignore any preceding log records for the deleted file). (Starting with MySQL 5.7.4 and MariaDB 10.2.2, InnoDB does not silently ignore redo log records for files that are missing.) I will try an alternative approach where mariabackup --backup or mariabackup --prepare will create, delete and rename files based on the redo log records. An extra file-copying step would only be necessary for handling MLOG_INDEX_LOAD records (if their creation is not prevented by MDEV-16809 SET GLOBAL innodb_log_optimize_ddl=OFF ).

            One more note: Operations on .frm files are not directly reflected by the InnoDB redo log. Therefore, we probably should process all the MLOG_FILE_ records at the final phase of mariabackup --backup already, and extend the processing to the corresponding .frm files as well. That is:

            • MLOG_FILE_CREATE2 must be handled by copying the .frm file (and creating a stub for .ibd file)
            • MLOG_FILE_RENAME2 must be handled by renaming the .frm file (or maybe safer: copy-or-delete both the ‘old’ and ‘new’ file names from the data directory to the backup)
            • MLOG_FILE_DELETE2 must be handled by deleting the .frm and .ibd files from the backup

            I believe that in order for the .frm file copying or renaming to be safe, this part may have to be protected by FLUSH TABLES WITH READ LOCK or similar, to prevent the .frm files from being concurrently created or renamed at the server data directory, for example if there would be a subsequent table-rebuilding ALTER TABLE at the very end of the backup.

            marko Marko Mäkelä added a comment - One more note: Operations on .frm files are not directly reflected by the InnoDB redo log. Therefore, we probably should process all the MLOG_FILE_ records at the final phase of mariabackup --backup already, and extend the processing to the corresponding .frm files as well. That is: MLOG_FILE_CREATE2 must be handled by copying the .frm file (and creating a stub for .ibd file) MLOG_FILE_RENAME2 must be handled by renaming the .frm file (or maybe safer: copy-or-delete both the ‘old’ and ‘new’ file names from the data directory to the backup) MLOG_FILE_DELETE2 must be handled by deleting the .frm and .ibd files from the backup I believe that in order for the .frm file copying or renaming to be safe, this part may have to be protected by FLUSH TABLES WITH READ LOCK or similar, to prevent the .frm files from being concurrently created or renamed at the server data directory, for example if there would be a subsequent table-rebuilding ALTER TABLE at the very end of the backup.

            There is no need to do anything to .frm at the final phase of the backup, or later.
            .frms are copied under FTWRL already (we assume normal case, without --no-lock, which does not give any guanrantees).

            There is no absolute need to process log records during backup, it can be replaced by rescanning the data directory, and reloading tablespaces. It is also done under FTWRL protections, in normal case. We would know which files are created, recreated, dropped, renamed (and their ultimate name, not all possible intermediate ones that we do not need).

            Basically, we need, at the end of backup

            • list of new tablespaces (.ibd files)
            • map of renamed tablespaces (name of the file at the start of backup => name of the file at the end of backup)
            • list of dropped tablespaces
            • list of recreates (i.e dropped, then created under the same name)

            How do I handle this currently, is in function that is incorrectly named "copy_tablespaces_created_during_backup" https://github.com/MariaDB/server/commit/27a52d90eb0dde52c585011dbfffe8f8d3435021#diff-e63469103825200ae5c25b83f957c7bdR4284

            Basically, backup maintains space_id and space names, for everything that was copied during backup.
            There is reload of tablespaces at the end of backup, and some analysis of what has changed between the start and end of backup.

            There is also "fixup DDL" phase at the start of "prepare".

            wlad Vladislav Vaintroub added a comment - There is no need to do anything to .frm at the final phase of the backup, or later. .frms are copied under FTWRL already (we assume normal case, without --no-lock, which does not give any guanrantees). There is no absolute need to process log records during backup, it can be replaced by rescanning the data directory, and reloading tablespaces. It is also done under FTWRL protections, in normal case. We would know which files are created, recreated, dropped, renamed (and their ultimate name, not all possible intermediate ones that we do not need). Basically, we need, at the end of backup list of new tablespaces (.ibd files) map of renamed tablespaces (name of the file at the start of backup => name of the file at the end of backup) list of dropped tablespaces list of recreates (i.e dropped, then created under the same name) How do I handle this currently, is in function that is incorrectly named "copy_tablespaces_created_during_backup" https://github.com/MariaDB/server/commit/27a52d90eb0dde52c585011dbfffe8f8d3435021#diff-e63469103825200ae5c25b83f957c7bdR4284 Basically, backup maintains space_id and space names, for everything that was copied during backup. There is reload of tablespaces at the end of backup, and some analysis of what has changed between the start and end of backup. There is also "fixup DDL" phase at the start of "prepare".

            The redo log provides an efficient way of obtaining a mapping from tablespace IDs to file names. We could extend 0001-MDEV-16791-Trace-file-operations-during-backup.patch that I created today. File creations are easiest to handle by copying the 4 first pages (FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift bytes). If we do not utilize the redo log, detecting file renames could require reading tablespace ID (bytes 34..37) from every data file at the end of the backup. I would not want to do that while the server is blocked by FLUSH TABLES WITH READ LOCK. There could be millions of files in the worst case.

            marko Marko Mäkelä added a comment - The redo log provides an efficient way of obtaining a mapping from tablespace IDs to file names. We could extend 0001-MDEV-16791-Trace-file-operations-during-backup.patch that I created today. File creations are easiest to handle by copying the 4 first pages ( FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift bytes). If we do not utilize the redo log, detecting file renames could require reading tablespace ID (bytes 34..37) from every data file at the end of the backup. I would not want to do that while the server is blocked by FLUSH TABLES WITH READ LOCK . There could be millions of files in the worst case.
            wlad Vladislav Vaintroub added a comment - - edited

            I agree, yet I'd first get an easy version, which already works in bb-10.2-wlad-release. Then optimize for performance, once other stuff is tested and works.

            Either way, we need to collect and classify DDL changes at the end of backup. changing the mechanism from rescan/reload to collecting redo log events should not be the hardest thing to do (once everything else works well).

            wlad Vladislav Vaintroub added a comment - - edited I agree, yet I'd first get an easy version, which already works in bb-10.2-wlad-release. Then optimize for performance, once other stuff is tested and works. Either way, we need to collect and classify DDL changes at the end of backup. changing the mechanism from rescan/reload to collecting redo log events should not be the hardest thing to do (once everything else works well).

            I have created
                MDEV-16863 Extend the RQG infrastructure for backup testing
            because there are serious doubts if existing and future MTR based tests will be ever sufficient
            for covering mariabackup in presence of concurrent DDL .
            

            mleich Matthias Leich added a comment - I have created MDEV-16863 Extend the RQG infrastructure for backup testing because there are serious doubts if existing and future MTR based tests will be ever sufficient for covering mariabackup in presence of concurrent DDL .
            wlad Vladislav Vaintroub added a comment - - edited

            mleich
            Well, there is no doubt that RQG is good for this kind of testing, however some of deterministic "parallel" DDL is already possible, ´with DBUG_EXECUTE_IF
            logic. so one can tell to execute a specific query predefined stages of the backup, after or before a specific table is copied, along these lines for example.

            --let before_copy_test_t1=BEGIN NOT ATOMIC DROP TABLE test.t1;CREATE TABLE test.t1 ENGINE=INNODB SELECT UUID() from test.seq_1_to_100; END
            --let after_copy_test_t2=BEGIN NOT ATOMIC  DROP TABLE test.t2;CREATE TABLE test.t2 ENGINE=INNODB SELECT UUID() from test.seq_1_to_1000; END
            --let after_copy_test_t3=ALTER TABLE test.t3 ADD INDEX index_a(a),ALGORITHM=COPY
            echo # xtrabackup backup;
            --disable_result_log
            exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir --close-files --dbug=+d,mariabackup_events;
            

            This is an actual test, from
            https://github.com/MariaDB/server/blob/bb-10.2-wlad-release/mysql-test/suite/mariabackup/recreate_table_during_backup.test#L11

            What it does, on certain points in mariabackup, it "SQL execute"s an environment variable from mariabackup itself for the cases above, the env.variable name in question is

             
            {before_copy,after_copy}_$dbname_$tablename.
            

            it is not "random" as RQG, and it relies on debug compilation, but for testing base cases in deterministic fashion it works ok.

            wlad Vladislav Vaintroub added a comment - - edited mleich Well, there is no doubt that RQG is good for this kind of testing, however some of deterministic "parallel" DDL is already possible, ´with DBUG_EXECUTE_IF logic. so one can tell to execute a specific query predefined stages of the backup, after or before a specific table is copied, along these lines for example. --let before_copy_test_t1=BEGIN NOT ATOMIC DROP TABLE test.t1;CREATE TABLE test.t1 ENGINE=INNODB SELECT UUID() from test.seq_1_to_100; END --let after_copy_test_t2=BEGIN NOT ATOMIC DROP TABLE test.t2;CREATE TABLE test.t2 ENGINE=INNODB SELECT UUID() from test.seq_1_to_1000; END --let after_copy_test_t3=ALTER TABLE test.t3 ADD INDEX index_a(a),ALGORITHM=COPY echo # xtrabackup backup; --disable_result_log exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir --close-files --dbug=+d,mariabackup_events; This is an actual test, from https://github.com/MariaDB/server/blob/bb-10.2-wlad-release/mysql-test/suite/mariabackup/recreate_table_during_backup.test#L11 What it does, on certain points in mariabackup, it "SQL execute"s an environment variable from mariabackup itself for the cases above, the env.variable name in question is {before_copy,after_copy}_$dbname_$tablename. it is not "random" as RQG, and it relies on debug compilation, but for testing base cases in deterministic fashion it works ok.

            What you present is a very good example of MTR based exemplaric tests which we must have for any feature sensitive to concurrent SQL.
            The approach via RQG is mostly just a brute force safety net for bugs which are not caught by the existing exemplaric tests.
            I do not vote for making the completion of MDEV-16791 depending on completion of MDEV-16863.

            mleich Matthias Leich added a comment - What you present is a very good example of MTR based exemplaric tests which we must have for any feature sensitive to concurrent SQL. The approach via RQG is mostly just a brute force safety net for bugs which are not caught by the existing exemplaric tests. I do not vote for making the completion of MDEV-16791 depending on completion of MDEV-16863 .

            Is there any chance to have this fixed/implemented in versions 10.1.x?

            valerii Valerii Kravchuk added a comment - Is there any chance to have this fixed/implemented in versions 10.1.x?

            People

              Unassigned Unassigned
              wlad Vladislav Vaintroub
              Votes:
              3 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.