Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.3.17, 10.3.21, 10.3.23
Description
While taking a backup with mariabackup (10.3.17), the backup keeps failing with the following error.
Backup command used:
/var/lib/mysql/bin/mariabackup --defaults-file=/etc/my.cnf --user=**** --password=******* --backup --skip-encrypted-backup --compress --ftwrl-wait-timeout=5 --ftwrl-wait-threshold=300 --ftwrl-wait-query-type=all --target-dir=/tmp/backup/full_backup_2019
mariabackup output:
[00] 2019-11-16 06:29:24 Connecting to MySQL server host: localhost, user: xxx, password: xxx, port: 3306, socket: /var/lib/mysql/mysql.sock
[00] 2019-11-16 06:29:24 Using server version 10.3.17-MariaDB-log
/var/lib/mysql/bin/mariabackup based on MariaDB server 10.3.17-MariaDB Linux (x86_64)
....
....
[01] 2019-11-16 06:29:27 Compressing ./foo/bar.ibd to /tmp/backup/full_backup_2019/foo/bar.ibd.qp
[01] 2019-11-16 06:29:27 ...done
[01] 2019-11-16 06:29:27 Compressing ./foo/foobar.ibd to /tmp/backup/full_backup_2019/foo/foobar.ibd.qp
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
[01] 2019-11-16 06:29:28 Database page corruption detected at page 5, retrying...
[00] 2019-11-16 06:29:28 >> log scanned up to (30851334774)
[01] 2019-11-16 06:29:28 Error: failed to read page after 10 retries. File ./foo/foobar.ibd seems to be corrupted.
2019-11-16 6:29:28 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
.......
.......
InnoDB: End of page dump
2019-11-16 6:29:28 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3454859770, calculated checksums for field1: crc32 3454859770, innodb 3654618756, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3454859770, calculated checksums for field2: crc32 3454859770, innodb 4252287317, none 3735928559, page LSN 7 450897511, low 4 bytes of LSN at page end 450897511, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 1962
2019-11-16 6:29:28 0 [Note] InnoDB: Page may be an index page where index id is 5185
[01] 2019-11-16 06:29:28 mariabackup: xtrabackup_copy_datafile() failed.
[00] FATAL ERROR: 2019-11-16 06:29:28 failed to copy datafile.
We tried to fix the table that was reported as corrupted with each of the following, but the same issue still occurs:
SET old_alter_table=1;
ALTER TABLE table_name ENGINE=InnoDB;
ALTER TABLE table_name FORCE;
Take a mysqldump of the table and restore it to the database
Issue Links
- is duplicated by
  - MDEV-24260 mariabackup and innochecksum detects page faults but all ok in application (Closed)
- relates to
  - MDEV-19871 Add page id matching check in innochecksum tool (Closed)
  - MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered (Closed)
  - MDEV-23971 add the ability to fix corrupted pages on --prepare (Closed)
  - MDEV-29938 InnoDB: Assertion failure in btr0pcur.cc line 532 (Open)
  - MDEV-25361 innochecksum must not report errors for freed pages (Closed)
In break_down_rate.ddl, the PAGE_LEVEL of both pages 3 (the PRIMARY KEY root page) and 4 (the first secondary index root page, if all indexes were created before inserting data) is 0. These are the 2 bytes at byte offsets 0xc040 and 0x10040.
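The offsets quoted above can be reproduced with a little arithmetic. The following is only a sketch: it assumes the default 16384-byte page size, a 38-byte FIL page header, and PAGE_LEVEL at byte 26 of the index page header. The function names are made up for illustration.

```python
import struct

PAGE_SIZE = 16384    # assumption: default innodb_page_size
FIL_PAGE_DATA = 38   # assumption: size of the FIL header that precedes the page header
PAGE_LEVEL = 26      # assumption: offset of PAGE_LEVEL inside the index page header


def page_level_offset(page_no, page_size=PAGE_SIZE):
    """Byte offset of the 2-byte PAGE_LEVEL field of a page within the data file."""
    return page_no * page_size + FIL_PAGE_DATA + PAGE_LEVEL


def read_page_level(data, page_no, page_size=PAGE_SIZE):
    """Read PAGE_LEVEL (big-endian, unsigned 16-bit) from the raw file contents."""
    (level,) = struct.unpack_from(">H", data, page_level_offset(page_no, page_size))
    return level
```

With `data = open(path, "rb").read()`, a result of 0 from `read_page_level(data, 3)` would mean that the root page is also a leaf page, so the index tree consists of that single page.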
This suggests that page 5 could be unused garbage, and CHECK TABLE does not read such pages. It only reads pages that are reachable from the index tree root pages.
Furthermore, CHECK TABLE (more specifically, btr_validate_index()) does not seem to fetch any BLOB pages. They would be reachable from the clustered index (PRIMARY KEY) leaf pages.
Because page 5 (starting at 0x14000) does contain the B-tree page identifiers "infimum" and "supremum", and because FIL_PAGE_TYPE is FIL_PAGE_INDEX, it should not be a BLOB page. So, CHECK TABLE seems to be working as designed.
Both index root pages 3 and 4 are empty: the bytes 0x000d right before the "infimum" point straight to the "supremum". So, there cannot be any references to BLOB pages.
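The "infimum points straight to supremum" observation can be checked mechanically. This is a rough sketch, assuming a COMPACT or DYNAMIC row format in which the 2 bytes immediately before a record hold a signed offset, relative to that record, of the next record. The helper name is hypothetical.

```python
import struct


def infimum_points_to_supremum(page):
    """Return True if the infimum record's next-record pointer lands on 'supremum'."""
    pos = page.find(b"infimum\x00")
    if pos < 2:
        return False  # not an index page in the assumed format
    # Assumption: the next-record pointer is the signed 16-bit value
    # stored right before the record, relative to the record itself.
    (rel,) = struct.unpack_from(">h", page, pos - 2)
    target = pos + rel
    if target < 0:
        return False
    return page[target:target + 8] == b"supremum"
```

A page for which this returns True holds no user records, so it cannot reference any BLOB pages.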
The file appears to contain two ‘extra’ pages: the allegedly corrupted page 5 and an all-zero page 6.
One last thing that we can check is the page allocation bitmap in page 0, which should describe pages 0 to 16383. The FSP_SIZE at offset 0x2e is 7, which does match the file size (page 6 is the last page). It is normal to have a few extra empty pages at the end of the data file.
The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free.
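The bitmap reading can be sketched as follows. The offsets 0x2e and 0xae are taken from the analysis above. The layout of 2 bits per page, with the lower bit meaning "free" and bits counted from the least significant bit of each byte, is an assumption about the extent descriptor format. Only the first extent (pages 0 to 63) is covered, and the function names are illustrative.

```python
import struct

FSP_SIZE_OFFSET = 0x2E     # from the analysis above
XDES_BITMAP_OFFSET = 0xAE  # from the analysis above ("seems to start")
BITS_PER_PAGE = 2          # assumption: one "free" bit and one "clean" bit per page
FREE_BIT = 0               # assumption: the lower of the two bits means "free"
PAGES_IN_FIRST_EXTENT = 64 # assumption: 1 MiB extents of 16 KiB pages


def fsp_size(page0):
    """Tablespace size in pages, as recorded in page 0."""
    (size,) = struct.unpack_from(">I", page0, FSP_SIZE_OFFSET)
    return size


def is_page_free(page0, page_no):
    """Test the 'free' bit of a page in the first extent descriptor."""
    if not 0 <= page_no < PAGES_IN_FIRST_EXTENT:
        raise ValueError("this sketch only covers the first extent")
    bit = page_no * BITS_PER_PAGE + FREE_BIT
    byte = page0[XDES_BITMAP_OFFSET + bit // 8]
    return bool((byte >> (bit % 8)) & 1)


def allocated_pages(page0):
    """Pages below FSP_SIZE (within the first extent) whose 'free' bit is clear."""
    limit = min(fsp_size(page0), PAGES_IN_FIRST_EXTENT)
    return [n for n in range(limit) if not is_page_free(page0, n)]
```

Under these assumptions, a file like the one described would yield `[0, 1, 2, 3, 4]` from `allocated_pages()`, with page 5 reported as free.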
This is a border case, but apart from failing to read and check BLOB pages, I do not see any wrongdoing by CHECK TABLE.
I would tend to suspect a problem in Mariabackup. The page at byte offset 0x14000 (page 5) looks unrelated to the rest of the data file. vlad.lesin, did you notice that the tablespace ID at byte offset 0x14022 is 0x431 instead of 0x4ec?
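A quick way to look for such foreign pages in a data file is to compare the tablespace ID stored in each page header with the one in page 0. This is a minimal sketch, assuming 16384-byte uncompressed, unencrypted pages with the ID stored as a big-endian 32-bit value at byte 34 (0x22) of each page. `foreign_pages` is a hypothetical helper, not part of any MariaDB tool.

```python
import struct

PAGE_SIZE = 16384       # assumption: default innodb_page_size
FIL_PAGE_SPACE_ID = 34  # 0x22, matching the offset 0x14022 quoted for page 5


def foreign_pages(data, page_size=PAGE_SIZE):
    """List (page_no, space_id) for pages whose space ID differs from page 0's."""
    (expected,) = struct.unpack_from(">I", data, FIL_PAGE_SPACE_ID)
    mismatches = []
    for start in range(0, len(data) - page_size + 1, page_size):
        page = data[start:start + page_size]
        if not any(page):
            continue  # all-zero pages carry no ID and are skipped
        (space_id,) = struct.unpack_from(">I", page, FIL_PAGE_SPACE_ID)
        if space_id != expected:
            mismatches.append((start // page_size, space_id))
    return mismatches
```

For a file like the one discussed here, where page 0 carries 0x4ec, the expected result would be a single entry for page 5 with ID 0x431.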
I remember diagnosing a similar issue with the same support customer about a year ago, but we failed to reach a conclusion then. On July 1, 2019, we added a debug check to InnoDB in MDEV-19871 to specifically catch such a tablespace ID mismatch. (The commit message uses the wrong MDEV number and misleadingly mentions the innochecksum tool, which was not changed.) That check has never failed in our internal testing since then. Based on that, I firmly believe that a bug must exist in the interaction between mariabackup --backup and the server process. It remains to be seen whether that is a design bug (which could be fixed by implementing server-side backup, as noted in MDEV-14992) or something that could be improved in the code.
I would request that a repeatable test case for this be created. Because the failure is likely to be nondeterministic, I would suggest the following approach with https://rr-project.org/:
I am proposing frequent server restarts, so that the rr replay traces would be smaller. Note that rr replay traces are not portable between computers in practice. For analysis, remote access to the system would be needed.
This approach was successfully used when fixing crash recovery bugs in code that was rewritten in MariaDB 10.5, for example MDEV-22139.