[MDEV-22929] MariaBackup option to report and/or continue when corruption is encountered Created: 2020-06-17  Updated: 2021-04-13  Resolved: 2020-12-01

Status: Closed
Project: MariaDB Server
Component/s: mariabackup
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9, 10.6.0

Type: Task Priority: Critical
Reporter: Juan Assignee: Vladislav Lesin
Resolution: Fixed Votes: 2
Labels: corruption, mariabackup

Issue Links:
Relates
relates to MDEV-21681 Log sequence numbers do not match dur... Closed
relates to MDEV-23971 add the ability to fix corrupted page... Closed
relates to MDEV-24479 Document mariabackup option from MDEV... Closed
relates to MDEV-21109 Table corruption not detected with CH... Closed

 Description   

Currently, Mariabackup aborts when it detects any InnoDB corruption. It needs an option to complete the backup and flag or log the corruption, rather than leaving the entire server with no backup.

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?
-----------------
From Julien - Here is an additional explanation of why it is important to do this in 10.6 ralf.gebhardt@mariadb.com.

-------------------
From Vlad Lesin - Here is a detailed description of the feature from the commit message:

The new option --log-innodb-page-corruption is introduced.

When this option is set, the backup is not interrupted if a corrupted InnoDB
page is detected. Instead, all corrupted pages found are logged to the
innodb_corrupted_pages file in the backup directory, and mariabackup finishes
with an error.

For incremental backups, corrupted pages are also copied to the .delta file,
because the LSN check cannot be done for such pages during backup.
innodb_corrupted_pages is also created in the incremental backup
directory.

During --prepare, the corrupted pages list is read from the file just after
the redo log is applied, and each page in the list is checked to see whether it
is allocated in its tablespace. If it is not allocated, it is zeroed out,
flushed to the tablespace and removed from the list. If all pages are removed
from the list, --prepare finishes successfully and the
innodb_corrupted_pages file is removed from the backup directory. Otherwise,
--prepare finishes with an error message, and innodb_corrupted_pages contains
the list of pages that were detected as corrupted during backup and are
allocated in their tablespaces. This means the backup directory contains
corrupted InnoDB pages, and the backup cannot be considered consistent.
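The --prepare pass over the corrupted pages list can be sketched like this. It is a hypothetical Python model: prepare_corrupted_pages is an illustrative name, and is_allocated and zero_and_flush stand in for the real tablespace allocation check and page write, which are not shown in this issue.

```python
# Hypothetical model of the corrupted-page pass in --prepare.
# is_allocated and zero_and_flush stand in for the real tablespace operations.

def prepare_corrupted_pages(corrupted, is_allocated, zero_and_flush):
    """Return (success, remaining) where remaining lists still-corrupted pages."""
    remaining = {}
    for space, page_nos in corrupted.items():
        still_corrupted = set()
        for page_no in sorted(page_nos):
            if is_allocated(space, page_no):
                # The page holds live data, so the backup cannot be trusted.
                still_corrupted.add(page_no)
            else:
                # A free page's contents do not matter, so zero it out.
                zero_and_flush(space, page_no)
        if still_corrupted:
            remaining[space] = still_corrupted
    # success == True: innodb_corrupted_pages can be deleted.
    # Otherwise the file would keep only `remaining` and --prepare fails.
    return not remaining, remaining
```

The design point is that only allocated pages make the backup inconsistent; free pages are simply reset to zeros.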

For incremental --prepare, corrupted pages from .delta files are applied
to the base backup, innodb_corrupted_pages is read from both the base and
incremental directories, and the same actions are performed on the corrupted
pages list as for a full --prepare. The innodb_corrupted_pages file is
modified or removed only in the base directory.
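Reading the list from both directories can be sketched as a union of the two per-tablespace page sets. This minimal Python sketch assumes that a plain union is what "read from both" amounts to; merge_corrupted_pages and the dict-of-sets representation are hypothetical. The merged map would then go through the same allocation check as a full --prepare, with the result written back only to the base directory.

```python
# Hypothetical: combine the corrupted-page lists from the base and
# incremental backup directories before the --prepare corrupted-page pass.

def merge_corrupted_pages(base, incremental):
    """Union of two {tablespace: set(page numbers)} maps; inputs are not modified."""
    merged = {space: set(pages) for space, pages in base.items()}
    for space, pages in incremental.items():
        merged.setdefault(space, set()).update(pages)
    return merged
```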

If DDL is executed during the backup, it is also processed at the end of the
backup so that innodb_corrupted_pages contains the correct tablespace names.
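One way to get correct names despite concurrent DDL is to record corrupted pages by numeric space id while copying and resolve names only at the end. The Python sketch below assumes exactly that; resolve_names, the id-keyed map, and the placeholder used for a tablespace that no longer exists are hypothetical illustrations, not a description of how mariabackup handles it.

```python
# Hypothetical: corrupted pages are recorded by space id while copying, and
# names are resolved only at the end of the backup, so a tablespace renamed
# by concurrent DDL is reported under its final name.

def resolve_names(corrupted_by_id, final_names):
    """Map {space_id: pages} to {name: pages} using names known at backup end."""
    resolved = {}
    for space_id, pages in corrupted_by_id.items():
        # Illustrative choice only: label tablespaces that no longer exist.
        name = final_names.get(space_id, f"<dropped space {space_id}>")
        resolved[name] = set(pages)
    return resolved
```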



 Comments   
Comment by Vladislav Lesin [ 2020-08-20 ]

MDEV-20607 is only for InnoDB initialization. It does not touch the page consistency verification code, so no, this issue cannot be implemented just by reverting MDEV-20607.

Comment by Vladislav Lesin [ 2020-08-20 ]

I would not use mariabackup as a corruption detection tool. The backup tool must not be used for unintended purposes. We have other tools for that, for example CHECK TABLE and innochecksum. It's better to add this functionality to one of those tools. See also this comment: https://jira.mariadb.org/browse/MDEV-21109?focusedCommentId=160912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-160912

Comment by Vladislav Lesin [ 2020-08-20 ]

I have some doubts about the necessity of implementing such an option. This option can potentially lead to inconsistency in restored data. We already have a similarly dangerous option, --no-lock, which I would like to remove, because our customers use it and then ask us to find the problem in the backup, even though the documentation says the option does not guarantee data consistency. I think that if we implement it, it will add work for support and engineering, as it will be harder to diagnose issues.

Comment by Sergei Golubchik [ 2020-10-03 ]

Not that it matters much, but the "corruption_found" flag seems a bit redundant. The mere presence of the "backup_corrupted" file with the list of corrupted tables already means that corruption was found, doesn't it?

Comment by Vladislav Lesin [ 2020-10-08 ]

If we decide to implement it, then we could also check during --prepare whether a corrupted page is allocated in its tablespace, and if it is not allocated, zero it out and stop treating it as a corrupted page. In MDEV-21109 there are non-allocated pages in the tablespace that do not pass validation during backup because they contain a wrong page id and/or page number, but there must not be non-zeroed non-allocated pages in tablespaces.

Comment by Vladislav Lesin [ 2020-10-28 ]

According to our discussion in Slack, this issue and MDEV-23971 should be merged, as they have the same root cause and solve the same problem.

So we introduce a new option, --log-innodb-pages-corruption. When this option is used, mariabackup does not stop the backup process if InnoDB page corruption is detected. It continues the backup and logs corrupted pages in the "backup_corrupted" file in the backup destination directory. After the backup is taken, mariabackup finishes execution with an error and an error message in the backup log.

During the --prepare phase, mariabackup checks each page in the list in the "backup_corrupted" file. If the page is not allocated in the tablespace, it is zeroed out, flushed to the data file, and removed from the corrupted pages list, and a corresponding message is logged to the backup log (stdout). If all pages from the list were restored successfully in this way, the "backup_corrupted" file is deleted and "mariabackup --prepare" returns success. Otherwise, the "backup_corrupted" file will contain the list of pages that were not restored, and "mariabackup --prepare" will finish with an error and an error message in the backup log.

Comment by Vladislav Lesin [ 2020-11-22 ]

I pushed the bb-10.2-MDEV-22929-log_corrupted_pages branch for testing. There will be conflicts when merging it into 10.[2345]. The conflicts are resolved in the 10.[345]-MDEV-22929-log_corrupted_pages branches.
wlad, could you please review it?

Comment by Vladislav Lesin [ 2020-11-24 ]

Testing looks good to me: https://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-MDEV-22929-log_corrupted_pages

Comment by Vladislav Vaintroub [ 2020-11-24 ]

Looks fine.

Comment by Ian Gilfillan [ 2020-12-23 ]

This needs to be documented - created MDEV-24479

Generated at Thu Feb 08 09:18:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.