[MDEV-22929] MariaBackup option to report and/or continue when corruption is encountered - Jira

Juan created issue - 2020-06-17 21:12

Juan made changes - 2020-06-17 21:12

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-21109~~ [ ~~MDEV-21109~~ ]

Juan made changes - 2020-06-19 16:20

Description

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding an innodb_force_recovery=1 option to mariabackup, for instance?

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?

Julien Fritsch made changes - 2020-06-24 15:08

Assignee

Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]

Nick (Inactive) made changes - 2020-07-14 21:57

Description

Currently Mariabackup aborts when it detects any InnoDB corruption. Needs an option to complete the backup and flag or log the corruption rather than leaving the entire server with no backup.

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?

Nick (Inactive) made changes - 2020-07-14 21:58

Summary

Currently Mariabackup aborts when it detects any InnoDB corruption. Needs an option to complete the backup and flag or log the corruption rather than leaving the entire server with no backup.

MariaBackup option to report and/or continue when corruption is encountered

Julien Fritsch made changes - 2020-07-16 15:21

Description

Currently Mariabackup aborts when it detects any InnoDB corruption. Needs an option to complete the backup and flag or log the corruption rather than leaving the entire server with no backup.

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?
-----------------
From Julien - Here is an additional [explanation|https://jira.mariadb.org/browse/MDEV-21109?focusedCommentId=160067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-160067] of why this would be important to do in 10.6 [~ralf.gebhardt@mariadb.com].

Nick (Inactive) made changes - 2020-08-19 15:46

Description

Currently Mariabackup aborts when it detects any InnoDB corruption. Needs an option to complete the backup and flag or log the corruption rather than leaving the entire server with no backup.

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?
-----------------
From Julien - Here is an additional [explanation|https://jira.mariadb.org/browse/MDEV-21109?focusedCommentId=160067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-160067] of why this would be important to do in 10.6 [~ralf.gebhardt@mariadb.com].

--------------
From Nick, Additional Thoughts: I believe we could achieve this simply by reverting or modifying https://jira.mariadb.org/browse/MDEV-20607. That change made backup crash when an error occurs, instead of reporting errors in the log and then printing *Completed: Ok*. My thought is that it would be better to continue the backup (or have a flag to allow this) and report *Completed - Errors encountered, check log* instead. This would allow customers to use backup as a corruption detection tool and allow for partial backups in the case of corrupted tables.

Vladislav Lesin added a comment - 2020-08-20 14:04

~~MDEV-20607~~ only concerns InnoDB initialization. It does not touch the page consistency verification code, so no, this issue cannot be implemented just by reverting ~~MDEV-20607~~.

Vladislav Lesin added a comment - 2020-08-20 14:06

I would not use mariabackup as a corruption detection tool. A backup tool should not be used for unintended purposes. We have other tools for that, for example CHECK TABLE and innochecksum. It would be better to add this functionality to one of those tools. See also this comment: https://jira.mariadb.org/browse/MDEV-21109?focusedCommentId=160912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-160912

Vladislav Lesin added a comment - 2020-08-20 14:07

I have some doubts about whether such an option is necessary. It can potentially lead to inconsistency in the restored data. We already have a dangerous option of this kind, --no-lock, which I would like to remove: customers use it and then ask us to find the problem in the backup, even though the documentation says the option does not guarantee data consistency. I think that if we implement this, it will add work for support and engineering, as issues will be harder to diagnose.

Vladislav Lesin made changes - 2020-08-20 14:43

Link

This issue relates to TODO-2507 [ TODO-2507 ]

Nick (Inactive) made changes - 2020-08-20 16:03

Description

Currently Mariabackup aborts when it detects any InnoDB corruption. Needs an option to complete the backup and flag or log the corruption rather than leaving the entire server with no backup.

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?
-----------------
From Julien - Here is an additional [explanation|https://jira.mariadb.org/browse/MDEV-21109?focusedCommentId=160067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-160067] of why this would be important to do in 10.6 [~ralf.gebhardt@mariadb.com].

Sergei Golubchik added a comment - 2020-10-03 18:17

Not that it matters much, but the "corruption_found" flag seems a bit redundant. The sheer presence of the "backup_corrupted" file with the list of corrupted tables already means that the corruption was found, doesn't it?
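
The point about the file's presence acting as the flag can be sketched in a few lines of Python. This is only an illustration of the idea, not mariabackup behaviour: the helper name `corruption_found` is hypothetical, and "backup_corrupted" is the working file name used in this comment (the commit message quoted in the description later calls the file innodb_corrupted_pages).

```python
import tempfile
from pathlib import Path


def corruption_found(backup_dir, marker="backup_corrupted") -> bool:
    """Treat the mere existence of the marker file as the corruption flag."""
    return (Path(backup_dir) / marker).is_file()


# Small demonstration in a throwaway directory standing in for a backup dir.
with tempfile.TemporaryDirectory() as d:
    before = corruption_found(d)                    # no marker file yet
    (Path(d) / "backup_corrupted").write_text("")   # simulate a logged corruption
    after = corruption_found(d)
```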

Vladislav Lesin added a comment - 2020-10-08 06:01 - edited

If we decide to implement it, then during --prepare we could also check whether a corrupted page is allocated in its tablespace; if it is not allocated, zero it out and do not treat it as a corrupted page. In ~~MDEV-21109~~ there are non-allocated pages in the tablespace that do not pass validation during backup because they contain a wrong page id and/or page number, but a tablespace must not contain non-allocated pages that are not zeroed.

Vladislav Lesin made changes - 2020-10-16 08:36

Link

This issue relates to ~~MDEV-23971~~ [ ~~MDEV-23971~~ ]

Ralf Gebhardt made changes - 2020-10-27 12:36

Assignee

Ralf Gebhardt [ ralf.gebhardt@mariadb.com ]

Vladislav Lesin [ vlad.lesin ]

Vladislav Lesin added a comment - 2020-10-28 16:06

According to our discussion in Slack, this and ~~MDEV-23971~~ should be joined, as they have the same source and solve the same issue.

So we introduce a new option, --log-innodb-pages-corruption. When this option is used, mariabackup does not stop the backup process if InnoDB page corruption is detected. It continues the backup and logs the corrupted pages in the "backup_corrupted" file in the backup destination directory. After the backup is taken, mariabackup finishes with an error and an error message in the backup log.

During the --prepare phase, mariabackup checks each page from the list in the "backup_corrupted" file. If the page is not allocated in the tablespace, it is zeroed out, flushed to the data file, and removed from the corrupted pages list, and a corresponding message is written to the backup log (stdout). If all pages from the list are restored in this manner, the "backup_corrupted" file is deleted and "mariabackup --prepare" returns success. Otherwise the "backup_corrupted" file will contain the list of pages that were not restored, and "mariabackup --prepare" will finish with an error and an error message in the backup log.
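
As a rough illustration of the backup-phase behaviour proposed in this comment, the Python sketch below models "keep going, record the corrupted pages, finish with an error". It is not mariabackup code: `copy_pages`, `BackupResult`, and the `is_valid` checker are hypothetical names, and copying a page even when it fails validation is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BackupResult:
    copied: list = field(default_factory=list)     # (tablespace, page_no) pairs copied
    corrupted: list = field(default_factory=list)  # pairs that failed validation

    @property
    def exit_code(self) -> int:
        # Any logged corruption makes the backup finish with an error status.
        return 1 if self.corrupted else 0


def copy_pages(pages, is_valid, log_corruption: bool) -> BackupResult:
    """Copy pages, aborting on corruption unless log_corruption is set.

    pages:    iterable of (tablespace, page_no, data) tuples
    is_valid: hypothetical callable data -> bool standing in for page validation
    """
    result = BackupResult()
    for tablespace, page_no, data in pages:
        if not is_valid(data):
            if not log_corruption:
                # Behaviour without the option: the whole backup stops here.
                raise RuntimeError(f"corrupted page {page_no} in {tablespace}")
            result.corrupted.append((tablespace, page_no))
        # Assumption: the page is copied either way so the backup can complete.
        result.copied.append((tablespace, page_no))
    return result
```

With `log_corruption=True` the caller gets the full list of corrupted pages plus a non-zero `exit_code`; with `log_corruption=False` the first corrupted page ends the run.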

Vladislav Lesin made changes - 2020-10-30 13:27

Fix Version/s		10.2 [ 14601 ]
Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.4 [ 22408 ]
Fix Version/s		10.5 [ 23123 ]
Fix Version/s		10.6 [ 24028 ]

Vladislav Lesin made changes - 2020-10-30 13:27

Status

Open [ 1 ]

In Progress [ 3 ]

Ralf Gebhardt made changes - 2020-11-09 20:46

Priority

Major [ 3 ]

Critical [ 2 ]

Vladislav Lesin added a comment - 2020-11-22 17:36 - edited

I pushed the bb-10.2-MDEV-22929-log_corrupted_pages branch for testing. There will be conflicts when merging it to 10.[2345]. The conflicts are resolved in the branches 10.[345]-MDEV-22929-log_corrupted_pages.
wlad, could you please review it?

Vladislav Lesin made changes - 2020-11-22 17:37

Assignee	Vladislav Lesin [ vlad.lesin ]	Vladislav Vaintroub [ wlad ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Vladislav Lesin made changes - 2020-11-23 10:15

Description

Currently Mariabackup aborts when it detects any InnoDB corruption. Needs an option to complete the backup and flag or log the corruption rather than leaving the entire server with no backup.

In situations where Mariabackup detects corruption while taking a backup, it currently aborts where InnoDB would assert, making backing up a corrupted server impossible.

This is obviously not practical when corruption in one table prevents making backups of the entire server.

Would it be possible to address this need by adding a force option like innodb_force_recovery=1 to mariabackup, for instance?
-----------------
From Julien - Here is an additional [explanation|https://jira.mariadb.org/browse/MDEV-21109?focusedCommentId=160067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-160067] of why this would be important to do in 10.6 [~ralf.gebhardt@mariadb.com].

-------------------
From Vlad Lesin - Here is a detailed description of the feature from the commit message:

The new option --log-innodb-page-corruption is introduced.

When this option is set, the backup is not interrupted if a corrupted InnoDB
page is detected. Instead, it logs all corrupted pages found in the
innodb_corrupted_pages file in the backup directory and finishes with an error.

For incremental backup, corrupted pages are also copied to the .delta file,
because we can't do an LSN check for such pages during backup.
innodb_corrupted_pages will also be created in the incremental backup
directory.

During --prepare, the corrupted pages list is read from the file just after
the redo log is applied, and each page from the list is checked to see whether
it is allocated in its tablespace. If it is not allocated, it is zeroed out,
flushed to the tablespace, and removed from the list. If all pages are removed
from the list, --prepare finishes successfully and the
innodb_corrupted_pages file is removed from the backup directory. Otherwise
--prepare finishes with an error message, and innodb_corrupted_pages contains
the list of pages that were detected as corrupted during backup and are
allocated in their tablespaces. This means the backup directory contains
corrupted InnoDB pages, and the backup cannot be considered consistent.

For incremental --prepare, corrupted pages from .delta files are applied
to the base backup, innodb_corrupted_pages is read from both the base and
incremental directories, and the same action is performed on the corrupted
pages list as for full --prepare. The innodb_corrupted_pages file is
modified or removed only in the base directory.

If DDL happens during backup, it is also processed at the end of backup
to have correct tablespace names in innodb_corrupted_pages.
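
The --prepare handling of the corrupted pages list described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the actual implementation: `prepare_corrupted_pages` and the `is_allocated` callback are hypothetical, tablespaces are represented as in-memory bytearrays, and `PAGE_SIZE` is deliberately tiny. The on-disk format of innodb_corrupted_pages is not given here, so the list is passed in already parsed.

```python
PAGE_SIZE = 16  # toy page size for the sketch; real InnoDB pages are much larger


def prepare_corrupted_pages(corrupted, is_allocated, datafiles):
    """Return the pages that are still considered corrupted after the check.

    corrupted:    list of (tablespace, page_no) pairs from the corrupted pages list
    is_allocated: hypothetical callable (tablespace, page_no) -> bool
    datafiles:    dict mapping tablespace name to a bytearray holding its pages
    """
    remaining = []
    for tablespace, page_no in corrupted:
        if is_allocated(tablespace, page_no):
            # An allocated page holds real data, so the corruption stands.
            remaining.append((tablespace, page_no))
            continue
        # A non-allocated page is zeroed out and dropped from the list.
        start = page_no * PAGE_SIZE
        datafiles[tablespace][start:start + PAGE_SIZE] = bytes(PAGE_SIZE)
    return remaining
```

In this model, an empty return value corresponds to --prepare finishing successfully and the list file being removed; a non-empty one corresponds to --prepare finishing with an error and the file keeping the remaining pages.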

Vladislav Lesin added a comment - 2020-11-24 11:45

Testing looks good to me: https://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-MDEV-22929-log_corrupted_pages

Vladislav Vaintroub made changes - 2020-11-24 11:55

Assignee	Vladislav Vaintroub [ wlad ]	Vladislav Lesin [ vlad.lesin ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Vladislav Vaintroub added a comment - 2020-11-24 11:55

Looks fine.

Vladislav Lesin made changes - 2020-12-01 05:16

Fix Version/s		10.2.37 [ 25112 ]
Fix Version/s		10.3.28 [ 25111 ]
Fix Version/s		10.4.18 [ 25110 ]
Fix Version/s		10.5.9 [ 25109 ]
Fix Version/s		10.6.0 [ 24431 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Fix Version/s	10.5 [ 23123 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Vladislav Lesin made changes - 2020-12-04 05:51

Link

This issue relates to TODO-2679 [ TODO-2679 ]

Ian Gilfillan made changes - 2020-12-23 13:46

Link

This issue relates to ~~MDEV-24479~~ [ ~~MDEV-24479~~ ]

Ian Gilfillan added a comment - 2020-12-23 13:49

This needs to be documented - created ~~MDEV-24479~~

Marko Mäkelä made changes - 2021-04-13 16:28

Link

This issue relates to ~~MDEV-21681~~ [ ~~MDEV-21681~~ ]

Sergei Golubchik made changes - 2021-12-06 21:24

Workflow

MariaDB v3 [ 110171 ]

MariaDB v4 [ 134292 ]

Jira Automation (IT) made changes - 2024-07-04 03:05

Zendesk Related Tickets

110821 185032 126984
