[MDEV-21109] Table corruption not detected with CHECK TABLE or innochecksum, only with mariabackup - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.3.17, 10.3.21, 10.3.23
Fix Version/s: N/A
Component/s: mariabackup
Labels:
- corruption

Description

We observed that taking a backup with mariabackup (10.3.17) keeps failing with the error shown below.

Backup command used:

/var/lib/mysql/bin/mariabackup --defaults-file=/etc/my.cnf --user=**** --password=******* --backup --skip-encrypted-backup --compress --ftwrl-wait-timeout=5 --ftwrl-wait-threshold=300 --ftwrl-wait-query-type=all --target-dir=/tmp/backup/full_backup_2019

mariabackup output:

[00] 2019-11-16 06:29:24 Connecting to MySQL server host: localhost, user: xxx, password: xxx, port: 3306, socket: /var/lib/mysql/mysql.sock

[00] 2019-11-16 06:29:24 Using server version 10.3.17-MariaDB-log

/var/lib/mysql/bin/mariabackup based on MariaDB server 10.3.17-MariaDB Linux (x86_64)

....

....

[01] 2019-11-16 06:29:27 Compressing ./foo/bar.ibd to /tmp/backup/full_backup_2019/foo/bar.ibd.qp

[01] 2019-11-16 06:29:27         ...done

[01] 2019-11-16 06:29:27 Compressing ./foo/foobar.ibd to /tmp/backup/full_backup_2019/foo/foobar.ibd.qp

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...

[01] 2019-11-16 06:29:28 Database page corruption detected at page 5, retrying...

[00] 2019-11-16 06:29:28 >> log scanned up to (30851334774)

[01] 2019-11-16 06:29:28 Error: failed to read page after 10 retries. File ./foo/foobar.ibd seems to be corrupted.

2019-11-16 6:29:28 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):

.......

.......

InnoDB: End of page dump

2019-11-16 6:29:28 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3454859770, calculated checksums for field1: crc32 3454859770, innodb 3654618756, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3454859770, calculated checksums for field2: crc32 3454859770, innodb 4252287317, none 3735928559, page LSN 7 450897511, low 4 bytes of LSN at page end 450897511, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 1962

2019-11-16 6:29:28 0 [Note] InnoDB: Page may be an index page where index id is 5185

[01] 2019-11-16 06:29:28 mariabackup: xtrabackup_copy_datafile() failed.

[00] FATAL ERROR: 2019-11-16 06:29:28 failed to copy datafile.
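
The page dump summary in the log reports a stored page number of 7 for the page that mariabackup read at position 5 of foobar.ibd, while the stored and calculated crc32 checksums agree. A minimal sketch of that kind of page-ID check follows, assuming uncompressed, unencrypted 16 KiB pages and the usual InnoDB page header offsets (page number at byte 4, LSN at byte 16, page type at byte 24, space id at byte 34). The function names `read_page_header` and `page_id_mismatches` are hypothetical; this is not how mariabackup or innochecksum implement the check.

```python
import struct

PAGE_SIZE = 16384        # assumption: default innodb_page_size
FIL_PAGE_LSN = 16        # 8 bytes: LSN of the last modification
FIL_PAGE_TYPE = 24       # 2 bytes: page type (17855 == INDEX)
FIL_PAGE_SPACE_ID = 34   # 4 bytes: tablespace id


def read_page_header(page: bytes) -> dict:
    """Decode the header and trailer fields that the log line above prints."""
    # Bytes 0..3: checksum "field1"; bytes 4..7: page number stored in the page.
    checksum_field1, page_no = struct.unpack_from(">II", page, 0)
    (lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
    (page_type,) = struct.unpack_from(">H", page, FIL_PAGE_TYPE)
    (space_id,) = struct.unpack_from(">I", page, FIL_PAGE_SPACE_ID)
    # Last 8 bytes: checksum "field2", then the low 4 bytes of the LSN.
    checksum_field2, lsn_low32 = struct.unpack_from(">II", page, len(page) - 8)
    return {
        "checksum_field1": checksum_field1,
        "page_no": page_no,
        "lsn": lsn,
        "page_type": page_type,
        "space_id": space_id,
        "checksum_field2": checksum_field2,
        "lsn_low32": lsn_low32,
    }


def page_id_mismatches(path: str, page_size: int = PAGE_SIZE):
    """Yield (file_position, stored_page_no) for every non-zero page whose
    stored page number differs from its position in the file."""
    zero_page = bytes(page_size)
    with open(path, "rb") as f:
        position = 0
        while True:
            page = f.read(page_size)
            if len(page) < page_size:
                break  # end of file (or a truncated last page)
            if page != zero_page:  # all-zero pages carry no page number
                stored = read_page_header(page)["page_no"]
                if stored != position:
                    yield position, stored
            position += 1
```

Run against a copy of a data file taken while the server is shut down, `list(page_id_mismatches("foobar.ibd"))` would list positions such as the one in this report, where a page with a valid checksum sits at the wrong offset.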

We tried to fix the table reported as corrupted with the following, but the same issue still occurs:

- SET OLD_ALTER_TABLE=1
- ALTER TABLE table_name ENGINE=InnoDB
- ALTER TABLE table_name FORCE
- Take a mysqldump of the table and restore it to the database

Attachments

- break_down_rate.ddl, 2020-05-20 23:52, 0.8 kB, Juan
- break_down_rate.ibd, 2020-05-20 23:52, 112 kB, Juan
- corrupt20200921.tgz.enc, 2020-09-22 00:42, 10 kB, Juan
- cs0093460_20200807_core.21139.gza, 2020-08-14 21:28, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzb, 2020-08-14 21:28, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzc, 2020-08-14 21:28, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzd, 2020-08-14 21:28, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gze, 2020-08-14 21:28, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzf, 2020-08-14 21:27, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzg, 2020-08-14 21:28, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzh, 2020-08-14 21:29, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzi, 2020-08-14 21:26, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzj, 2020-08-14 21:27, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzk, 2020-08-14 21:27, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzl, 2020-08-14 21:27, 10.00 MB, Juan
- cs0093460_20200807_core.21139.gzm, 2020-08-14 21:26, 8.14 MB, Juan
- cs0093460_20200807_error_mariadb_x01bofiddb1a.log, 2020-08-14 21:29, 78 kB, Juan
- cs0093460_20200807_lib.tar.gza, 2020-08-14 21:33, 10.00 MB, Juan
- cs0093460_20200807_lib.tar.gzb, 2020-08-14 21:34, 4.03 MB, Juan
- cs0093460_20200917_error_mariadb_x01bstredb1a.log, 2020-09-22 00:33, 14 kB, Juan
- cs0093460_20200917_full_backup_20200916_054536.log, 2020-09-22 00:32, 30 kB, Juan
- cs0093460_20200917_full_backup_20200917_120913.log, 2020-09-22 00:30, 29 kB, Juan
- mdev-21109-rr.sh, 2020-08-25 09:32, 4 kB, Vladislav Lesin
- show_global_status_x07gisiddb3a.log, 2020-10-29 00:44, 62 kB, Allen Lee
- show_global_variables_x07gisiddb3a.log, 2020-10-29 00:44, 475 kB, Allen Lee
- t33914-202006252311.log, 2020-06-26 03:22, 104 kB, Juan
- t33914-202006252313.tgz.enc.a, 2020-06-26 03:27, 9.00 MB, Juan
- t33914-202006252313.tgz.enc.b, 2020-06-26 03:28, 9.00 MB, Juan
- t33914-202006252313.tgz.enc.c, 2020-06-26 03:28, 9.00 MB, Juan
- t33914-202006252313.tgz.enc.d, 2020-06-26 03:28, 8.09 MB, Juan

Issue Links

is duplicated by

- MDEV-24260 mariabackup and innochecksum detects page faults but all ok in application (Closed)

relates to

- MDEV-19871 Add page id matching check in innochecksum tool (Closed)
- MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered (Closed)
- MDEV-23971 add the ability to fix corrupted pages on --prepare (Closed)
- MDEV-29938 InnoDB: Assertion failure in btr0pcur.cc line 532 (Open)
- MDEV-25361 innochecksum must not report errors for freed pages (Closed)

(1 relates to)

Activity

View 7 older comments

Vladislav Lesin added a comment - 2020-10-05 08:14

marko, yes, the "corruption" can be detected by innochecksum with the MDEV-19871 fix. And yes, we can zero out non-allocated pages with both innochecksum and "mariabackup --prepare". But I have the following concerns:

1. innochecksum is a tool that is supposed to check tablespaces, not modify them. We have an option to rewrite the checksum algorithm, but it modifies only checksums. Should we turn a checksum-checking tool into a corruption-fixing tool?

2. "mariabackup --prepare" could fix such pages. But, as I understand it, there should not be any non-zero non-allocated pages in the first place, and we need to understand where such pages come from. Once we understand that, the code to fix such pages will be useless. It would be one more rarely used option that we have to maintain until some major version.

Marko Mäkelä added a comment - 2021-09-14 16:38

One change that we should be able to implement rather easily in mariabackup --prepare, if it has not already been done, is what MDEV-25361 did to innochecksum: ignore corrupted pages that are marked as freed in the data files. Is that fix still missing, now that MDEV-22929 has been implemented in backup? Does anyone get bogus page corruption alarms for pages that are actually marked as free?

I think that we need an executable test case that does not involve any pre-corrupted .ibd files but only depends on a newly reinitialized server, to demonstrate what actually is not working as expected.

While waiting for a test case, I think that we should fix reproducible backup bugs, such as MDEV-18200 and MDEV-26326.

The MariaDB Server 10.6 release fixed the last known InnoDB violations of the write-ahead logging protocol that I was aware of: MDEV-24626 and MDEV-25506. With that release, most remaining backup problems should be bugs in the backup code, not in the server.

Valerii Kravchuk added a comment - 2021-09-14 17:24

Isn't this bug report about CHECK TABLE (and innochecksum) behavior, that is, a request to find the problems that affect mariabackup at earlier stages? OK, in the new 10.6 version mariabackup should work in a more robust way, but what about 10.2 - 10.5?

Marko Mäkelä added a comment - 2021-09-17 09:10

In the original customer case that prompted us to file MDEV-19871 (and later this ticket), we found that a data file had an incorrect page written to it. This was never reproducible outside that customer’s environment. We added some internal checks to detect an incorrect page ID when writing to a file, and those checks have never failed. I can only suspect that something was wrong in the underlying layer (a corrupted file system causing a block to be mapped to two files, or an incorrectly working block device).

valerii, it is not clear to me whether there are any "problems that affect mariabackup at earlier stages" anymore. I believe that my comment in this ticket from 2020-06-10 was actually addressed in MDEV-22929 and MDEV-25361:

The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free.

What is the exact problem that we want to detect and where?
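
For readers following the bitmap argument quoted in the comment above, here is a minimal sketch of how the "free" bit for a page could be read from page 0 of a tablespace. It assumes uncompressed 16 KiB pages and the commonly documented layout: a 38-byte page header, a 112-byte file space header, then 40-byte extent descriptors whose 16-byte bitmap starts 24 bytes into each descriptor, with two bits per page and the lower bit of each pair meaning "free". Under those assumptions the first bitmap starts at 38 + 112 + 24 = 174 = 0xae, which agrees with the offset quoted above. The function name `is_page_free` is hypothetical, and this is not the server's own code.

```python
PAGE_SIZE = 16384                  # assumption: default innodb_page_size
FIL_PAGE_DATA = 38                 # size of the common page header
FSP_HEADER_SIZE = 112              # file space header on page 0
XDES_ARR_OFFSET = FIL_PAGE_DATA + FSP_HEADER_SIZE  # first extent descriptor
XDES_SIZE = 40                     # descriptor size for 16 KiB pages
XDES_BITMAP = 24                   # bitmap offset within a descriptor
XDES_BITS_PER_PAGE = 2             # one "free" bit and one "clean" bit
XDES_FREE_BIT = 0                  # lower bit of each pair: page is free
PAGES_PER_EXTENT = 64              # 1 MiB extents of 16 KiB pages


def is_page_free(page0: bytes, page_no: int) -> bool:
    """Return True if the extent descriptor bitmap on page 0 marks
    page_no as free. Only pages described by page 0 are supported."""
    if not 0 <= page_no < PAGE_SIZE:
        # Later pages are described by further descriptor pages.
        raise ValueError("page_no is not described by page 0")
    extent, offset_in_extent = divmod(page_no, PAGES_PER_EXTENT)
    descriptor = XDES_ARR_OFFSET + extent * XDES_SIZE
    bit = offset_in_extent * XDES_BITS_PER_PAGE + XDES_FREE_BIT
    byte = page0[descriptor + XDES_BITMAP + bit // 8]
    # Bits are numbered from the least significant bit of each byte.
    return bool((byte >> (bit % 8)) & 1)
```

With a bitmap like the one described in the comment, `is_page_free(page0, 4)` would be false and `is_page_free(page0, 5)` would be true, which is the condition under which a tool could skip a page instead of reporting it as corrupted.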

Marko Mäkelä added a comment - 2021-10-26 09:01

I was hoping that this could be closed as a duplicate of MDEV-22929. But valerii stated that the problem was repeatable with 10.3.28.

However, we were not able to repeat this. Hence, I am closing this as "Cannot Reproduce". We can reopen and fix this if someone provides something for repeating this bug.
