Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Incomplete
-
10.3.17, 10.3.21, 10.3.23
Description
It is observed that, while taking the backup using mariabackup(10.3.17) it's keep failing with following error:
Backup command used:
/var/lib/mysql/bin/mariabackup --defaults-file=/etc/my.cnf --user=**** --password=******* --backup --skip-encrypted-backup --compress --ftwrl-wait-timeout=5 --ftwrl-wait-threshold=300 --ftwrl-wait-query-type=all --target-dir=/tmp/backup/full_backup_2019
|
|
mariabackup output:
[00] 2019-11-16 06:29:24 Connecting to MySQL server host: localhost, user: xxx, password: xxx, port: 3306, socket: /var/lib/mysql/mysql.sock
|
[00] 2019-11-16 06:29:24 Using server version 10.3.17-MariaDB-log
|
/var/lib/mysql/bin/mariabackup based on MariaDB server 10.3.17-MariaDB Linux (x86_64)
|
....
|
....
|
[01] 2019-11-16 06:29:27 Compressing ./foo/bar.ibd to /tmp/backup/full_backup_2019/foo/bar.ibd.qp
|
[01] 2019-11-16 06:29:27 ...done
|
[01] 2019-11-16 06:29:27 Compressing ./foo/foobar.ibd to /tmp/backup/full_backup_2019/foo/foobar.ibd.qp
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
|
[01] 2019-11-16 06:29:28 Database page corruption detected at page 5, retrying...
|
[00] 2019-11-16 06:29:28 >> log scanned up to (30851334774)
|
[01] 2019-11-16 06:29:28 Error: failed to read page after 10 retries. File ./foo/foobar.ibd seems to be corrupted.
|
2019-11-16 6:29:28 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
|
.......
|
.......
|
InnoDB: End of page dump
|
2019-11-16 6:29:28 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3454859770, calculated checksums for field1: crc32 3454859770, innodb 3654618756, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3454859770, calculated checksums for field2: crc32 3454859770, innodb 4252287317, none 3735928559, page LSN 7 450897511, low 4 bytes of LSN at page end 450897511, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 1962
|
2019-11-16 6:29:28 0 [Note] InnoDB: Page may be an index page where index id is 5185
|
[01] 2019-11-16 06:29:28 mariabackup: xtrabackup_copy_datafile() failed.
|
[00] FATAL ERROR: 2019-11-16 06:29:28 failed to copy datafile.
|
|
Did tried, fixing the table (pointed as corrupted) with following, but the same issue is occurring.
set OLD_ALTER_TABLE=1
Alter table table_name engine=InnoDB
Alter table table_name FORCE
Take mysqldump of table and restore it to Database
Attachments
Issue Links
- is duplicated by
-
MDEV-24260 mariabackup and innochecksum detects page faults but all ok in application
-
- Closed
-
- relates to
-
MDEV-19871 Add page id matching check in innochecksum tool
-
- Closed
-
-
MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered
-
- Closed
-
-
MDEV-23971 add the ability to fix corrupted pages on --prepare
-
- Closed
-
-
MDEV-29938 InnoDB: Assertion failure in btr0pcur.cc line 532
-
- Open
-
-
MDEV-25361 innochecksum must not report errors for freed pages
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue relates to |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Assignee | Vladislav Lesin [ vlad.lesin ] | |
Labels | corruption | |
Priority | Major [ 3 ] | Critical [ 2 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Description |
It is observed that, while taking the backup using mariabackup(10.3.17) it's keep failing with following error:
Backup command used: ---- /var/lib/mysql/bin/mariabackup --defaults-file=/etc/my.cnf --user=**** --password=******* --backup --skip-encrypted-backup --compress --ftwrl-wait-timeout=5 --ftwrl-wait-threshold=300 --ftwrl-wait-query-type=all --target-dir=/tmp/backup/full_backup_2019 ---- mariabackup output: ---- [00] 2019-11-16 06:29:24 Connecting to MySQL server host: localhost, user: xxx, password: xxx, port: 3306, socket: /var/lib/mysql/mysql.sock [00] 2019-11-16 06:29:24 Using server version 10.3.17-MariaDB-log /var/lib/mysql/bin/mariabackup based on MariaDB server 10.3.17-MariaDB Linux (x86_64) .... .... [01] 2019-11-16 06:29:27 Compressing ./foo/bar.ibd to /tmp/backup/full_backup_2019/foo/bar.ibd.qp [01] 2019-11-16 06:29:27 ...done [01] 2019-11-16 06:29:27 Compressing ./foo/foobar.ibd to /tmp/backup/full_backup_2019/foo/foobar.ibd.qp [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:28 Database page corruption detected at page 5, retrying... [00] 2019-11-16 06:29:28 >> log scanned up to (30851334774) [01] 2019-11-16 06:29:28 Error: failed to read page after 10 retries. File ./foo/foobar.ibd seems to be corrupted. 2019-11-16 6:29:28 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes): ....... ....... InnoDB: End of page dump 2019-11-16 6:29:28 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3454859770, calculated checksums for field1: crc32 3454859770, innodb 3654618756, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3454859770, calculated checksums for field2: crc32 3454859770, innodb 4252287317, none 3735928559, page LSN 7 450897511, low 4 bytes of LSN at page end 450897511, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 1962 2019-11-16 6:29:28 0 [Note] InnoDB: Page may be an index page where index id is 5185 [01] 2019-11-16 06:29:28 mariabackup: xtrabackup_copy_datafile() failed. [00] FATAL ERROR: 2019-11-16 06:29:28 failed to copy datafile. ---- Did tried, fixing the table (pointed as corrupted) with following, but the same issue is occurring. ---- set OLD_ALTER_TABLE=1 Alter table table_name engine=InnoDB Alter table table_name FORCE Take mysqldump of table and restore it to Database ---- |
It is observed that, while taking the backup using mariabackup(10.3.17) it's keep failing with following error:
Backup command used: ---- {code} /var/lib/mysql/bin/mariabackup --defaults-file=/etc/my.cnf --user=**** --password=******* --backup --skip-encrypted-backup --compress --ftwrl-wait-timeout=5 --ftwrl-wait-threshold=300 --ftwrl-wait-query-type=all --target-dir=/tmp/backup/full_backup_2019 {code} ---- mariabackup output: ---- {code} [00] 2019-11-16 06:29:24 Connecting to MySQL server host: localhost, user: xxx, password: xxx, port: 3306, socket: /var/lib/mysql/mysql.sock [00] 2019-11-16 06:29:24 Using server version 10.3.17-MariaDB-log /var/lib/mysql/bin/mariabackup based on MariaDB server 10.3.17-MariaDB Linux (x86_64) .... .... [01] 2019-11-16 06:29:27 Compressing ./foo/bar.ibd to /tmp/backup/full_backup_2019/foo/bar.ibd.qp [01] 2019-11-16 06:29:27 ...done [01] 2019-11-16 06:29:27 Compressing ./foo/foobar.ibd to /tmp/backup/full_backup_2019/foo/foobar.ibd.qp [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying... [01] 2019-11-16 06:29:28 Database page corruption detected at page 5, retrying... [00] 2019-11-16 06:29:28 >> log scanned up to (30851334774) [01] 2019-11-16 06:29:28 Error: failed to read page after 10 retries. File ./foo/foobar.ibd seems to be corrupted. 2019-11-16 6:29:28 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes): ....... ....... InnoDB: End of page dump 2019-11-16 6:29:28 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3454859770, calculated checksums for field1: crc32 3454859770, innodb 3654618756, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3454859770, calculated checksums for field2: crc32 3454859770, innodb 4252287317, none 3735928559, page LSN 7 450897511, low 4 bytes of LSN at page end 450897511, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 1962 2019-11-16 6:29:28 0 [Note] InnoDB: Page may be an index page where index id is 5185 [01] 2019-11-16 06:29:28 mariabackup: xtrabackup_copy_datafile() failed. [00] FATAL ERROR: 2019-11-16 06:29:28 failed to copy datafile. {code} ---- Did tried, fixing the table (pointed as corrupted) with following, but the same issue is occurring. ---- set OLD_ALTER_TABLE=1 Alter table table_name engine=InnoDB Alter table table_name FORCE Take mysqldump of table and restore it to Database ---- |
Link | This issue relates to TODO-2089 [ TODO-2089 ] |
Labels | corruption | corruption need_feedback |
Link | This issue relates to TODO-2183 [ TODO-2183 ] |
Labels | corruption need_feedback | corruption |
Labels | corruption | corruption need_feedback |
Affects Version/s | 10.3.21 [ 24014 ] |
Link | This issue relates to TODO-2341 [ TODO-2341 ] |
Attachment | break_down_rate.ddl [ 51826 ] | |
Attachment | break_down_rate.ibd [ 51827 ] |
Labels | corruption need_feedback | corruption |
Summary | Mariabackup failed with table corruption | Table corruption not detected with CHECK TABLE or innochecksum, only with mariabackup |
Assignee | Vladislav Lesin [ vlad.lesin ] | Marko Mäkelä [ marko ] |
Link |
This issue relates to |
Assignee | Marko Mäkelä [ marko ] | Vladislav Lesin [ vlad.lesin ] |
Comment |
[ [~marko] I have a couple of questions.
> I remember diagnosing a similar issue with the same support customer about a year ago, but we failed to reach a conclusion then. On July 1, 2019, we added a debug check to InnoDB in What interaction do you mean? The backup was never restored, mariabackup does not change data files during copying. The only problem I can assume is that mariabackup does some checks during copying, which must not be done. At the other hand, it uses innodb code to read pages. But I am checking this version. > On July 1, 2019, we added a debug check to InnoDB in Yes, we did it. The following code was added in fil_io(): {code:java} ut_ad(!req_type.is_write() || page_id.space() == SRV_LOG_SPACE_FIRST_ID || !fil_is_user_tablespace_id(page_id.space()) || offset == page_id.page_no() * page_size.physical()); {code} This code does not check the content of the page, it just checks the correctness of passed to fil_io() parameters. I have added the following check: {code:java} if (req_type.is_write()⏎ && page_id.space() != SRV_LOG_SPACE_FIRST_ID⏎ && fil_is_user_tablespace_id(page_id.space())⏎ && (offset != page_id.page_no() * page_size.physical()⏎ || mach_read_from_4(static_cast<const byte *>(buf) + FIL_PAGE_OFFSET)⏎ != page_id.page_no()⏎ || mach_read_from_4(static_cast<const byte *>(buf)⏎ + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID) != page_id.space())) {⏎ {code} in my custom build. If this check is correct from your perspective, should not we add it in trunk to catch the bug during testing? ] |
Link |
This issue relates to |
Link | This issue relates to TODO-2438 [ TODO-2438 ] |
Attachment | t33914-202006252215.tgz.enc [ 52375 ] |
Attachment | t33914-202006252215.tgz.enc [ 52375 ] |
Attachment | t33914-202006252311.log [ 52376 ] |
Attachment | t33914-202006252313.tgz.enc.a [ 52377 ] | |
Attachment | t33914-202006252313.tgz.enc.b [ 52378 ] | |
Attachment | t33914-202006252313.tgz.enc.c [ 52379 ] | |
Attachment | t33914-202006252313.tgz.enc.d [ 52380 ] |
Comment | [ A comment with security level 'Developers' was removed. ] |
Labels | corruption | corruption need_feedback |
Comment | [ A comment with security level 'Developers' was removed. ] |
Attachment | cs0093460_20200807_core.21139.gzm [ 53271 ] | |
Attachment | cs0093460_20200807_core.21139.gzl [ 53272 ] | |
Attachment | cs0093460_20200807_core.21139.gzk [ 53273 ] | |
Attachment | cs0093460_20200807_core.21139.gzj [ 53274 ] | |
Attachment | cs0093460_20200807_core.21139.gzi [ 53275 ] | |
Attachment | cs0093460_20200807_core.21139.gzh [ 53276 ] | |
Attachment | cs0093460_20200807_core.21139.gzg [ 53277 ] | |
Attachment | cs0093460_20200807_core.21139.gzf [ 53278 ] | |
Attachment | cs0093460_20200807_core.21139.gze [ 53279 ] | |
Attachment | cs0093460_20200807_core.21139.gzd [ 53280 ] | |
Attachment | cs0093460_20200807_core.21139.gzc [ 53281 ] | |
Attachment | cs0093460_20200807_core.21139.gzb [ 53282 ] | |
Attachment | cs0093460_20200807_core.21139.gza [ 53283 ] |
Attachment | cs0093460_20200807_error_mariadb_x01bofiddb1a.log [ 53284 ] |
Attachment | cs0093460_20200807_lib.tar.gzb [ 53285 ] | |
Attachment | cs0093460_20200807_lib.tar.gza [ 53286 ] |
Link | This issue relates to TODO-2506 [ TODO-2506 ] |
Link | This issue relates to TODO-2507 [ TODO-2507 ] |
Attachment | mdev-21109-rr.sh [ 53625 ] |
Attachment | mdev-21109-rr.sh [ 53625 ] |
Attachment | mdev-21109-rr.sh [ 53628 ] |
Attachment | cs0093460_20200917_full_backup_20200917_120913.log [ 53896 ] |
Attachment | cs0093460_20200917_full_backup_20200916_054536.log [ 53897 ] |
Attachment | cs0093460_20200917_error_mariadb_x01bstredb1a.log [ 53898 ] |
Attachment | stre_run_status.frm [ 53899 ] |
Attachment | stre_run_status.ibd [ 53900 ] |
Attachment | stre_run_status.frm [ 53899 ] |
Attachment | stre_run_status.ibd [ 53900 ] |
Attachment | corrupt20200921.tgz.enc [ 53901 ] |
Labels | corruption need_feedback | corruption |
Link |
This issue relates to |
Link |
This issue relates to |
Affects Version/s | 10.3.23 [ 24222 ] |
Attachment | show_global_status_x07gisiddb3a.log [ 54539 ] | |
Attachment | show_global_variables_x07gisiddb3a.log [ 54540 ] |
Link |
This issue is blocked by |
Link | This issue relates to TODO-2679 [ TODO-2679 ] |
Link |
This issue is duplicated by |
Link |
This issue is blocked by |
Labels | corruption | corruption need_feedback |
Assignee | Vladislav Lesin [ vlad.lesin ] | Chris Calender [ ccalender ] |
Labels | corruption need_feedback | corruption |
Assignee | Chris Calender [ ccalender ] | Vladislav Lesin [ vlad.lesin ] |
Assignee | Vladislav Lesin [ vlad.lesin ] | Sergei Golubchik [ serg ] |
Assignee | Sergei Golubchik [ serg ] | Marko Mäkelä [ marko ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Labels | corruption | ServiceNow corruption |
Labels | ServiceNow corruption | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption |
Labels | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption | corruption |
Labels | corruption | corruption need_feedback |
Labels | corruption need_feedback | corruption |
Labels | corruption | corruption need_feedback |
Labels | corruption need_feedback | corruption |
Labels | corruption | corruption need_feedback |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Cannot Reproduce [ 5 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Resolution | Cannot Reproduce [ 5 ] | |
Status | Closed [ 6 ] | Stalled [ 10000 ] |
Workflow | MariaDB v3 [ 101312 ] | MariaDB v4 [ 143620 ] |
Status | Stalled [ 10000 ] | Open [ 1 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Labels | corruption need_feedback | corruption |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Resolution | Incomplete [ 4 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
Link | This issue relates to MDEV-29938 [ MDEV-29938 ] |
Zendesk Related Tickets | 201658 110821 112315 173466 179943 123943 187534 176114 119382 185032 126984 132703 167577 161518 | |
Zendesk active tickets | 201658 |
In break_down_rate.ddl
, the PAGE_LEVEL of both pages 3 (the PRIMARY KEY root page) and 4 (the first secondary index root page, if all indexes were created before inserting data) is 0. These are the 2 bytes at byte offsets 0xc040 and 0x10040.
This suggests that page 5 coudd be unused garbage, and CHECK TABLE does not read such pages. It is only reading pages that are reachable from the index tree root pages.
Furthermore, CHECK TABLE (more specifically, btr_validate_index()) does not seem to fetch any BLOB pages. They would be reachable from clustered index (PRIMARY KEY) leaf pages.
Because page 5 (starting at 0x14000) does contain the B-tree page identifiers "infimum" and "supremum", and because FIL_PAGE_TYPE is FIL_PAGE_INDEX, it should not be a BLOB page. So, CHECK TABLE seems to be working as designed.
Both index root pages 3 and 4 are empty: the bytes 0x000d right before the "infimum" point straight to the "supremum". So, there cannot be any references to BLOB pages.
The file appears to contain two ‘extra’ pages: the allegedly corrupted page 5 and an all-zero page 6.
One last thing that we can check is the page allocation bitmap in page 0, which should describe pages 0 to 16383. The FSP_SIZE at offset 0x2e is 7, which does match the file size (page 6 is the last page). It is normal to have a few extra empty pages at the end of the data file.
The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free.
This is a border case, but apart from failing to read and check BLOB pages, I do not see any wrongdoing by CHECK TABLE.
I would tend to suspect a problem in Mariabackup. The page at byte offset 0x14000 (page 5) looks unrelated to the rest of the data file. vlad.lesin, did you notice that the tablespace ID at byte offset 0x14022 is 0x431 instead of 0x4ec?
I remember diagnosing a similar issue with the same support customer about a year ago, but we failed to reach a conclusion then. On July 1, 2019, we added a debug check to InnoDB in
MDEV-19871to specifically catch such tablespace ID mismatch. (The commit message uses the wrong MDEV number and is misleadingly mentioning the innochecksum tool, which was not changed.). That check has never failed in our internal testing since then. Based on that, I firmly believe that a bug must exist in the interaction between mariabackup --backup and the server process. It remains to be seen whether that is a design bug (which could be fixed by implementing server-side backup, as noted in MDEV-14992) or something that could be improved in the code.I would request that a repeatable test case for this be created. Because the failure is likely to be nondeterministic, I would suggest the following approach with https://rr-project.org/:
I am proposing frequent server restarts, so that the rr replay traces would be smaller. Note that rr replay traces are not portable between computers in practice. For analysis, remote access to the system would be needed.
This approach was successfully used when fixing crash recovery bugs in code that was rewritten in MariaDB 10.5, for example
MDEV-22139.