[MDEV-27931] A page in innodb_checksum_algorithm=innodb is wrongly claimed to be corrupted Created: 2022-02-24 Updated: 2022-03-29 Resolved: 2022-03-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | mariabackup, Storage Engine - InnoDB |
| Affects Version/s: | 10.2.0, 10.3.0, 10.4.0, 10.5.0, 10.6.0, 10.5.15, 10.6.7, 10.5 |
| Fix Version/s: | 10.2.44, 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3, 10.9.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Harvey Cooper | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Hi, all MariaDB developers. My team recently performed a successful MariaDB server upgrade, from 10.0.17 to 10.6.7.
After the upgrade, we verified that MariaDB 10.6.7 handled the prepared data well: there was nothing in the error log, and the business-level sanity checks passed. We have not run CHECK TABLE, because we set innodb_file_per_table=1 in my.cnf, so ibdata1 (around 200G in size) should mostly contain undo log data, and I thought running CHECK TABLE would not help either. But when I run the MariaDB 10.6.7 innochecksum on ibdata1, it reports an error for page 1. All data upgraded from 10.0.17 to 10.6.7 is reported as corrupted. The command I executed to back up the upgraded database: Logs
Since I added the --log-innodb-page-corruption CLI option, I expected page corruption to be logged without halting the backup procedure, and I expected the corruption to be gone after preparing the backup, since it occurs in unallocated system tablespace. (This needs confirmation.) |
| Comments |
| Comment by Marko Mäkelä [ 2022-02-24 ] | ||||||||||||||||||||||||||||||||||||
|
This bug is about page checksum validation. I would need a copy of a corrupted page to see what the cause of the problem might be. Does innochecksum report checksum errors? If you rewrite the file to use innodb_checksum_algorithm=crc32 (using innochecksum), will backup then work? | ||||||||||||||||||||||||||||||||||||
| Comment by Harvey Cooper [ 2022-02-24 ] | ||||||||||||||||||||||||||||||||||||
Can I share it with you personally (for example, by email)? It contains some sensitive data.
When I run the 10.6.7 innochecksum, it reports an error:
And there is another log: When I run innochecksum -p 3994534 ibdata1, it reports that the given page is OK.
| ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-02-24 ] | ||||||||||||||||||||||||||||||||||||
|
hcooper, thank you. You can extract the page contents with something like this:
The other page should be in a different data file if innochecksum did not complain about that. Unfortunately, the mariadb-backup error message does not identify the file name. You can send them to my email. Page 1 is the change buffer bitmap page and it should not contain any user data. Note: If you used innodb_checksum_algorithm=crc32 in the past on a big-endian processor on MariaDB Server 10.0 or 10.1 or MySQL 5.6, then this bug should occur due to | ||||||||||||||||||||||||||||||||||||
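For readers following along, copying a single page out of a data file can be sketched as below. This is a minimal sketch, not the command from the comment above: it assumes the default 16384-byte page size, and the helper name extract_page is hypothetical.

```python
PAGE_SIZE = 16384  # assumption: default innodb_page_size


def extract_page(data_file, page_no, out_file, page_size=PAGE_SIZE):
    """Copy page number page_no of data_file into out_file; return its bytes.

    extract_page is a hypothetical helper, not part of any MariaDB tool.
    """
    with open(data_file, "rb") as src:
        # Pages are stored back to back, so page N starts at N * page_size.
        src.seek(page_no * page_size)
        page = src.read(page_size)
    if len(page) != page_size:
        raise ValueError("page %d is not fully present in %s" % (page_no, data_file))
    with open(out_file, "wb") as dst:
        dst.write(page)
    return page
```

Under these assumptions, extract_page("ibdata1", 1, "page1.bin") would copy the change buffer bitmap page of the system tablespace into page1.bin.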
| Comment by Harvey Cooper [ 2022-02-25 ] | ||||||||||||||||||||||||||||||||||||
The page1.bin and the detailed logs have been sent to your email.
I can confirm that our servers have always used Intel CPUs, so they have always been little-endian. | ||||||||||||||||||||||||||||||||||||
| Comment by Harvey Cooper [ 2022-02-28 ] | ||||||||||||||||||||||||||||||||||||
|
Hi, marko, have you received my email yet? | ||||||||||||||||||||||||||||||||||||
| Comment by Harvey Cooper [ 2022-03-02 ] | ||||||||||||||||||||||||||||||||||||
|
Any update for this bug? | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-03-25 ] | ||||||||||||||||||||||||||||||||||||
|
hcooper, sorry, I had overlooked this report. I don’t think that the page1.bin The first 10.6 commit with which the page is claimed invalid implemented The hexdump from mariabackup seems to be for page 0x3cf3a6 in the system tablespace. It looks like an undo log page. That page was reported invalid also by the 10.5 innochecksum but not the 10.2 innochecksum. I will not attach that page to this report because it contains confidential data. I created a dummy tablespace by doing the following:
Which values of innodb_checksum_algorithm have you used in the past? In any case, I would recommend you to rebuild the data files from a logical dump, so that you would benefit from the more secure innodb_checksum_algorithm=full_crc32 format. If the data files were originally created before MySQL 5.1.48 or MariaDB 5.1.48, you could hit | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-03-28 ] | ||||||||||||||||||||||||||||||||||||
|
The page page1.bin My intention was that the file could be converted to the slightly better innodb_checksum_algorithm=crc32 format by the following:
Alas, in
Converting the system tablespace to the full_crc32 format will require the database to be rebuilt. I will check the other page shortly. That one is being rejected due to some earlier change somewhere between 10.2 and 10.5. | ||||||||||||||||||||||||||||||||||||
| Comment by Harvey Cooper [ 2022-03-28 ] | ||||||||||||||||||||||||||||||||||||
|
@marko, thank you very much! In the meantime, we worked around this problem by using mydumper to dump the whole database in parallel and then loading the dump into MariaDB 10.6. | ||||||||||||||||||||||||||||||||||||
| Comment by Harvey Cooper [ 2022-03-28 ] | ||||||||||||||||||||||||||||||||||||
We have never set innodb_checksum_algorithm explicitly. According to https://mariadb.com/kb/en/innodb-system-variables/#innodb_checksum_algorithm, the default should be innodb_checksum_algorithm=innodb in MariaDB 10.0.x. | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-03-28 ] | ||||||||||||||||||||||||||||||||||||
|
I tested the other page dump again. It is necessary to add one all-zero page before that page, so that the page size will be correctly detected as 16384 bytes. Such a 32768-byte file would be accepted by the 10.2, 10.3, and 10.4 versions of innochecksum by default, but no longer by the 10.5 version.
The 10.5 innochecksum would accept the file when invoked as follows:
Starting with 10.5, the non-strict innodb_checksum_algorithm=crc32 validation is rejecting this page that appears to use innodb_checksum_algorithm=innodb. | ||||||||||||||||||||||||||||||||||||
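The 32768-byte test file described above (one all-zero page followed by the dumped page) can be built along these lines. This is a minimal sketch assuming a 16384-byte page dump; the helper name pad_with_zero_page and any file names passed to it are hypothetical.

```python
import os

PAGE_SIZE = 16384  # assumption: default innodb_page_size


def pad_with_zero_page(page_dump, out_file, page_size=PAGE_SIZE):
    """Write one all-zero page followed by the dumped page; return the file size.

    pad_with_zero_page is a hypothetical helper, not part of any MariaDB tool.
    """
    with open(page_dump, "rb") as src:
        page = src.read()
    if len(page) != page_size:
        raise ValueError("expected a dump of exactly %d bytes" % page_size)
    with open(out_file, "wb") as dst:
        dst.write(bytes(page_size))  # the leading all-zero page
        dst.write(page)              # the page under test
    return os.path.getsize(out_file)
```

For a 16384-byte dump, the resulting file is 32768 bytes long, matching the size mentioned above.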
| Comment by Marko Mäkelä [ 2022-03-28 ] | ||||||||||||||||||||||||||||||||||||
|
The implementation of
Due to this change, buf_page_is_corrupted() in innochecksum would invoke buf_calc_page_crc32(). Coincidentally, buf_calc_page_crc32() will return crc32=687675747 for this page, and that value happens to be the value of checksum_field2. Therefore, the following check will wrongly determine that the page is corrupted:
I can repeat the wrong outcome also in the 10.4 innochecksum when applying the above patch. The wrong claim of corrupted page in mariadb-backup --backup should be due to the same problem, but I have no easy way to check it. | ||||||||||||||||||||||||||||||||||||
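To see which stored checksum field of a page dump coincides with the CRC-32C value, something like the following sketch may help. It is not the server code. It assumes the legacy (non-full_crc32) page layout as I understand it: checksum_field1 in the first 4 bytes of the page, checksum_field2 in the first 4 bytes of the 8-byte page trailer, and a page CRC formed by XORing the CRC-32C of bytes 4..25 with the CRC-32C of bytes 38 up to the trailer. The function names are hypothetical.

```python
import struct

# Offsets assumed from the legacy InnoDB page layout (not taken from this report).
FIL_PAGE_SPACE_OR_CHKSUM = 0     # checksum_field1
FIL_PAGE_OFFSET = 4
FIL_PAGE_FILE_FLUSH_LSN = 26
FIL_PAGE_DATA = 38
FIL_PAGE_END_LSN_OLD_CHKSUM = 8  # trailer size; checksum_field2 is its first 4 bytes


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def stored_checksums(page):
    """Return (checksum_field1, checksum_field2) as stored in the page."""
    field1 = struct.unpack_from(">I", page, FIL_PAGE_SPACE_OR_CHKSUM)[0]
    field2 = struct.unpack_from(">I", page, len(page) - FIL_PAGE_END_LSN_OLD_CHKSUM)[0]
    return field1, field2


def page_crc32(page):
    """Approximate the page CRC described in the lead-in (an assumption)."""
    c1 = crc32c(page[FIL_PAGE_OFFSET:FIL_PAGE_FILE_FLUSH_LSN])
    c2 = crc32c(page[FIL_PAGE_DATA:len(page) - FIL_PAGE_END_LSN_OLD_CHKSUM])
    return c1 ^ c2


def describe(page):
    """Report the stored fields and whether each one equals the computed CRC."""
    field1, field2 = stored_checksums(page)
    crc = page_crc32(page)
    return {
        "field1": field1,
        "field2": field2,
        "crc32": crc,
        "field1_is_crc32": field1 == crc,
        "field2_is_crc32": field2 == crc,
    }
```

A page for which only field2_is_crc32 is true would be the kind of coincidence described above: the second field happens to match the CRC even though the page was written with innodb_checksum_algorithm=innodb.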
| Comment by Marko Mäkelä [ 2022-03-28 ] | ||||||||||||||||||||||||||||||||||||
|
The incorrect page validation should affect any MariaDB Server version since 10.0. After testing a fix on 10.4 and 10.5, I posted a fix for 10.2, so that this can be fixed in the last scheduled release before end-of-life. |