[MDEV-16791] mariabackup : allow consistent backup, in presence of concurrent DDL, also without --lock-ddl-per-table Created: 2018-07-20 Updated: 2020-08-25 Resolved: 2018-08-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Backup |
| Fix Version/s: | 10.2.18, 10.3.10 |
| Type: | Task | Priority: | Major |
| Reporter: | Vladislav Vaintroub | Assignee: | Unassigned |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | None | ||
| Description |
|
There are a lot of problems with mariabackup if DDL statements are executed in parallel with the backup. A fundamental problem with lock-ddl-per-table is that only those tables that exist at the start of the backup are MDL-locked. Another problem is that mariabackup will deadlock during the backup's "FLUSH TABLES WITH READ LOCK" if concurrent DDL statements run. To resolve this deadlock, we either KILL the user's DDL query, or the user has the option to omit FTWRL with the unsafe --no-lock option ( So, there are problems which we currently have with backup in the presence of InnoDB DDL.
A possible fix to some of those problems would be to
Maybe some of the above can be taken care of by InnoDB recovery; I'm not really sure how this would work. marko, any idea on that? |
| Comments |
| Comment by Manjot Singh (Inactive) [ 2018-07-20 ] | ||||||||||||
|
I think we should meet the lessened risk from the xtrabackup implementation (i.e. the points and ways where it can fail) and then work to lower the risk even more. | ||||||||||||
| Comment by Vladislav Vaintroub [ 2018-07-20 ] | ||||||||||||
|
manjot, I'm not getting it. What's the lessened risk? What is there currently is the xtrabackup logic. Our lock-ddl-per-table is already a solid improvement compared to xtrabackup: it does not force the unsafe --no-lock, it does not read potentially huge rows with SELECT * from TABLE LIMIT 1, and it locks tables with fulltext correctly. Also, we do not get "table changed since transaction start, please retry" when locking MDL. | ||||||||||||
| Comment by Marko Mäkelä [ 2018-07-25 ] | ||||||||||||
|
I believe that an extra step of copying files at the end of the backup can be avoided. As I noted in File creation can be detected by observing an MLOG_INIT_FILE_PAGE2 record for page number 0. There is also a preceding MLOG_FILE_CREATE2 record, as well as an MLOG_FILE_NAME record:
In the above 2 mini-transactions, the file is created and the first two pages (0 and 1) are initialized. The redo log does contain all the necessary information for creating and initializing the file.
Normal InnoDB crash recovery does not apply any file deletion or creation operations; it only applies rename operations (MLOG_FILE_RENAME2). (Before
File deletion is reflected by MLOG_FILE_DELETE records. For normal recovery, these records are merely informational (indicating that recovery can safely ignore any preceding log records for the deleted file). (Starting with MySQL 5.7.4 and MariaDB 10.2.2, InnoDB does not silently ignore redo log records for files that are missing.)
I will try an alternative approach where mariabackup --backup or mariabackup --prepare will create, delete and rename files based on the redo log records. An extra file-copying step would only be necessary for handling MLOG_INDEX_LOAD records (if their creation is not prevented by | ||||||||||||
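The idea in this comment, deriving the final set of data files from the file-operation records in the redo log, can be sketched roughly as follows. This is a toy model in Python, not mariabackup code: the record type names mirror the MLOG_* types mentioned above, but the tuple layout, the `replay_file_ops` helper and the sample file names are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical, simplified record: (type, space_id, name, new_name).
# Real redo log records are binary and carry more information.
Record = Tuple[str, int, str, str]


def replay_file_ops(initial: Dict[int, str],
                    records: Iterable[Record]) -> Dict[int, str]:
    """Apply create/rename/delete records in log order and return the
    resulting space_id -> file name mapping."""
    files = dict(initial)
    for rec_type, space_id, name, new_name in records:
        if rec_type == "MLOG_FILE_CREATE2":
            files[space_id] = name
        elif rec_type == "MLOG_FILE_RENAME2":
            # Only rename files that are known to this model.
            if space_id in files:
                files[space_id] = new_name
        elif rec_type == "MLOG_FILE_DELETE":
            files.pop(space_id, None)
        # All other record types are ignored in this sketch.
    return files


# Assumed example data: two files known at the start, then three operations.
initial = {5: "test/t1.ibd", 6: "test/t2.ibd"}
log = [
    ("MLOG_FILE_CREATE2", 7, "test/t3.ibd", ""),
    ("MLOG_FILE_RENAME2", 5, "test/t1.ibd", "test/t1_old.ibd"),
    ("MLOG_FILE_DELETE", 6, "test/t2.ibd", ""),
]
final_files = replay_file_ops(initial, log)
```

Because the records are applied strictly in log order, the mapping ends up holding only the ultimate name of each surviving file, which is the information the backup would need.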
| Comment by Marko Mäkelä [ 2018-07-25 ] | ||||||||||||
|
One more note: Operations on .frm files are not directly reflected by the InnoDB redo log. Therefore, we probably should process all the MLOG_FILE_* records at the final phase of mariabackup --backup already, and extend the processing to the corresponding .frm files as well. That is:
I believe that in order for the .frm file copying or renaming to be safe, this part may have to be protected by FLUSH TABLES WITH READ LOCK or similar, to prevent the .frm files from being concurrently created or renamed in the server data directory, for example if there were a subsequent table-rebuilding ALTER TABLE at the very end of the backup. | ||||||||||||
| Comment by Vladislav Vaintroub [ 2018-07-25 ] | ||||||||||||
|
There is no need to do anything to .frm files at the final phase of the backup, or later. There is no absolute need to process log records during backup; that can be replaced by rescanning the data directory and reloading tablespaces. This is also done under FTWRL protection, in the normal case. We would then know which files were created, recreated, dropped, or renamed (and their ultimate names, not all the possible intermediate ones that we do not need). Basically, we need, at the end of backup
How I handle this currently is in a function that is incorrectly named "copy_tablespaces_created_during_backup": https://github.com/MariaDB/server/commit/27a52d90eb0dde52c585011dbfffe8f8d3435021#diff-e63469103825200ae5c25b83f957c7bdR4284 Basically, backup maintains space_id and space names for everything that was copied during backup. There is also a "fixup DDL" phase at the start of "prepare". | ||||||||||||
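The rescan-and-compare approach described in this comment can be illustrated with a small sketch. It is written in Python rather than the C++ of mariabackup, and the `classify_ddl` function, the `DdlChanges` type and the sample table names are hypothetical; the only thing taken from the comment is the idea of comparing the space_id and name of everything copied against a rescan at the end of backup.

```python
from typing import Dict, List, NamedTuple, Tuple


class DdlChanges(NamedTuple):
    created: Dict[int, str]               # seen only in the final rescan
    dropped: Dict[int, str]               # copied, but gone at the rescan
    renamed: List[Tuple[int, str, str]]   # (space_id, old name, new name)


def classify_ddl(copied: Dict[int, str],
                 rescanned: Dict[int, str]) -> DdlChanges:
    """Compare space_id -> name maps taken during copy and at the end."""
    created = {sid: name for sid, name in rescanned.items()
               if sid not in copied}
    dropped = {sid: name for sid, name in copied.items()
               if sid not in rescanned}
    renamed = [(sid, copied[sid], rescanned[sid])
               for sid in sorted(copied)
               if sid in rescanned and copied[sid] != rescanned[sid]]
    return DdlChanges(created, dropped, renamed)


# Assumed example: t1 renamed, t2 dropped, t5 created during the backup.
copied = {5: "test/t1", 6: "test/t2", 8: "test/t4"}
rescanned = {5: "test/t1_renamed", 8: "test/t4", 9: "test/t5"}
changes = classify_ddl(copied, rescanned)
```

In this toy model, a table that keeps its name but gets a new space_id shows up as one dropped entry plus one created entry, and only the final name of a renamed table is visible, not any intermediate names.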
| Comment by Marko Mäkelä [ 2018-07-25 ] | ||||||||||||
|
The redo log provides an efficient way of obtaining a mapping from tablespace IDs to file names. We could extend 0001-MDEV-16791-Trace-file-operations-during-backup.patch | ||||||||||||
| Comment by Vladislav Vaintroub [ 2018-07-25 ] | ||||||||||||
|
I agree, yet I'd first get an easy version, which already works in bb-10.2-wlad-release, and then optimize for performance once the other stuff is tested and works. Either way, we need to collect and classify DDL changes at the end of backup. Changing the mechanism from rescan/reload to collecting redo log events should not be the hardest thing to do (once everything else works well). | ||||||||||||
| Comment by Matthias Leich [ 2018-07-31 ] | ||||||||||||
|
| ||||||||||||
| Comment by Vladislav Vaintroub [ 2018-07-31 ] | ||||||||||||
|
mleich
This is an actual test, from What it does: at certain points in mariabackup, it executes, from within mariabackup itself, the SQL held in an environment variable
It is not "random" like RQG, and it relies on a debug build, but for testing base cases in a deterministic fashion it works OK. | ||||||||||||
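The testing technique described here, running SQL taken from an environment variable at fixed points in the backup, might look roughly like the sketch below. It is in Python for brevity and is not the actual mariabackup debug code: the `run_debug_hook` helper, the hook names `BEFORE_COPY_HOOK` and `AFTER_COPY_HOOK`, and the sample statement are all assumptions for illustration.

```python
import os
from typing import Callable, List


def run_debug_hook(point: str, execute_sql: Callable[[str], None]) -> bool:
    """If an environment variable named after `point` is set, pass its
    value to `execute_sql` and return True; otherwise do nothing."""
    statement = os.environ.get(point)
    if not statement:
        return False
    execute_sql(statement)
    return True


# Stand-in for a real server connection: just record the statements.
executed: List[str] = []

# Hypothetical hook names; the test sets one and leaves the other unset.
os.environ["BEFORE_COPY_HOOK"] = "CREATE TABLE test.t2 (i INT) ENGINE=InnoDB"
os.environ.pop("AFTER_COPY_HOOK", None)

fired = run_debug_hook("BEFORE_COPY_HOOK", executed.append)
skipped = run_debug_hook("AFTER_COPY_HOOK", executed.append)
```

Since the statement runs at a known point rather than at a random time, a test built this way exercises one specific interleaving of DDL and backup on every run.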
| Comment by Matthias Leich [ 2018-07-31 ] | ||||||||||||
|
What you present is a very good example of the MTR-based exemplary tests which we must have for any feature sensitive to concurrent SQL. | ||||||||||||
| Comment by Valerii Kravchuk [ 2019-02-11 ] | ||||||||||||
|
Is there any chance to have this fixed/implemented in versions 10.1.x? |