[MDEV-29050] mariabackup issues error messages during InnoDB tablespaces export on partial backup preparing Created: 2022-07-06 Updated: 2023-05-03 Resolved: 2023-03-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | mariabackup |
| Affects Version/s: | 10.4.20, 10.6.8 |
| Fix Version/s: | 10.11.3, 10.4.29, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Rob Schwyzer | Assignee: | Vladislav Lesin |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Environment: |
COLO running CentOS Linux release 7.9.2009 (Core) |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
User upgraded from MariaDB Community Server 10.2 to 10.4. User had set up automation to check the output of mariabackup --prepare to ensure no errors occurred which could imperil the health of the backup. The basic procedure is to take a partial backup using tables-exclude or similar. Prepare is then performed with export. In general, mariabackup does not seem to explicitly track which tables were excluded between the backup and prepare steps. In 10.2, during prepare export we see-
To be clear, the backup stage for the above log snippet used tables-exclude='Slap2' successfully. The good news for 10.2 is this is flagged as a Warning, recognizing that prepare export simply does not know if this is a problem or not and it may be legitimate/intended behavior. 10.4 (possibly 10.3; most explicitly, possibly 10.3.5) changes this for the worse-
So now in 10.4, it is referencing the same table which 10.2 identified, but instead of throwing a WARNING, 10.4 is putting an ERROR into its output. While it seems both 10.2 and 10.4 return 0, indicating a successful process exit, users practicing due diligence are rightly going to see an ERROR entry and need to figure out what to do about it. Between the 10.2 and 10.4 behaviors, the 10.2 behavior of flagging this as a WARNING is preferable because it differentiates the issue from breaking errors, especially given this is just expected partial backup operation. And while the specific concern being expressed here is how this makes life difficult for automation, it is also worth pointing out that 10.4's output makes it less clear than 10.2's what is going on and what may be the cause, so it is likely to make even experienced DBAs who have not yet encountered this situation panic and worry that their partial backup is broken. In an ideal case though, no errors or warnings should be thrown for table files which are missing due to exclusions made during mariabackup backup. According to our KB documentation for --tables-exclude-
In short, mariabackup currently advertises retaining this information for it to use later. This is the behavior we would like to see mariabackup prepare export provide. The problematic behavior can be reproduced via the following steps on MariaDB Community Server 10.6.8:
In the output for the last mariabackup command you should see [ERROR] InnoDB: Operating system error number 2 in a file operation.. You can adapt the above procedure for Community Server 10.2 by adjusting the mariabackup user GRANT to-
The process is otherwise the same and should output a WARNING instead of an ERROR. A .zip is attached with log output from backup and prepare commands run from the above procedure for MariaDB versions 10.2.27, 10.2.43, 10.4.20, and 10.6.8. |
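For the automation use case described above, a minimal Python sketch of a log check that separates ERROR entries from WARNING entries might look like the following. The function name, the severity regex, and the second sample line are illustrative assumptions; only the first sample line is quoted from this report, and real mariabackup output may carry timestamps or other prefixes.

```python
import re

# Assumption: severity is printed as a bracketed tag such as [ERROR] or
# [Warning]; matching is case-insensitive to cover both spellings.
SEVERITY = re.compile(r"\[(ERROR|WARNING|NOTE)\]", re.IGNORECASE)


def classify_prepare_log(lines):
    """Group log lines by severity tag, keeping only errors and warnings."""
    found = {"error": [], "warning": []}
    for line in lines:
        match = SEVERITY.search(line)
        if match is None:
            continue  # line carries no severity tag
        level = match.group(1).lower()
        if level in found:
            found[level].append(line.rstrip("\n"))
    return found


sample = [
    # Quoted from the report above.
    "[ERROR] InnoDB: Operating system error number 2 in a file operation.",
    # Hypothetical line, included only to show that notes are ignored.
    "[Note] InnoDB: illustrative line, not from a real log",
]
result = classify_prepare_log(sample)
```

A check like this can only tell severities apart. It cannot know whether an ERROR refers to a deliberately excluded table, which is why the severity chosen by mariabackup matters for automation.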
| Comments |
| Comment by Thejaka Kanewala [ 2022-07-07 ] |
|
In an attempt to fix this issue, I created this PR: https://github.com/MariaDB/server/pull/2183. Appreciate your review comments. Thanks |
| Comment by Thejaka Kanewala [ 2022-07-07 ] |
|
Further, I have a feeling the issue might have an impact on incremental 'backup restores' and restores with rollback_xa. I have yet to test this in an MTR test; in the meantime I would like to know Maria's opinion on this. Thanks |
| Comment by Thejaka Kanewala [ 2022-08-04 ] |
|
Addressed the review comments for https://github.com/MariaDB/server/pull/2183. Thank you |
| Comment by Marko Mäkelä [ 2022-08-24 ] |
|
I am only guessing here, because I did not investigate this in detail. Possibly the undo log format change of I posted a comment to thejaka’s pull request. I think that it may break the export logic. If the problem is that undo log records are referring to excluded tables, that should be addressed at a lower level. If my guess about |
| Comment by Thejaka Kanewala [ 2022-09-01 ] |
|
Addressed the above review feedback from Marko and revised the PR. The revised PR is in https://github.com/MariaDB/server/pull/2183. The revised fix has the following: 1. Revert the code changes made in the previous submission. Appreciate your feedback on the revised PR. |
| Comment by Marko Mäkelä [ 2022-09-09 ] |
|
thejaka, I think that it would be better to adjust the low-level code that triggers the error than to duplicate some logic. I think that a better approach would be to check the latest 10.6 branch and identify the code that causes the trouble, and add some conditions of srv_operation as needed. Once we have something that works for 10.6, we can see how to backport that to earlier major versions. The data dictionary interface was heavily refactored in 10.6 to support atomic DDL. |
| Comment by Marko Mäkelä [ 2022-09-14 ] |
|
I can reproduce the following with some changes to an existing test:
(The rr record command in the test should not be part of the final modification to the test.)
For some reason, my breakpoints would not work in rr, so I cannot determine the call stacks for those messages. When it comes to the server version that I tested, this appears to be a cosmetic error to me: the non-excluded data files ought to be exported just fine. I think that the correct fix would be to suppress those messages if the file names match the exclude patterns. Side note: In mariabackup --export, there should be no need to create an InnoDB temporary tablespace. |
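The suppression idea above (skip the message when the missing file belongs to an excluded table) can be illustrated outside the server with a small Python sketch. It assumes that exclude patterns are regular expressions matched against `db.table` names and that a data file path has the form `./db/table.ibd`. The helper names are hypothetical, and this is not mariabackup's actual code.

```python
import re
from pathlib import PurePosixPath


def table_name_from_path(path):
    """Turn './db/table.ibd' into 'db.table' (assumed naming scheme)."""
    p = PurePosixPath(path)
    return f"{p.parent.name}.{p.stem}"


def is_excluded(path, exclude_patterns):
    """Return True if the table behind this data file matches any pattern."""
    name = table_name_from_path(path)
    return any(re.search(pattern, name) for pattern in exclude_patterns)


# Hypothetical exclude list and file names, for illustration only.
patterns = [r"^test[.]t1$"]
suppress_t1 = is_excluded("./test/t1.ibd", patterns)
suppress_t2 = is_excluded("./test/t2.ibd", patterns)
```

With a check of this kind, a missing `t1.ibd` would be treated as expected, while a missing `t2.ibd` would still be reported.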
| Comment by Marko Mäkelä [ 2022-09-14 ] |
|
10.4 is a bit more spammy:
|
| Comment by Vladislav Lesin [ 2023-03-20 ] |
|
The difference between 10.3 and 10.6 is in how the "strict" variable is calculated in fil_ibd_open(). 10.3:
10.6:
We could partially repeat the 10.6 logic in 10.[345] to fix the bug. |