[MDEV-23653] Server crash: InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350] - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.4.12, 10.4.13
Fix Version/s: N/A
Component/s: Server, Storage Engine - InnoDB
Labels: None
Environment: debian

Description

We have one slave running 10.4.13 which all of a sudden crashed with:

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]. You may have to recover from a backup.

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):

Sep  2 13:58:20 db2120 mysqld[1095]:  len 16384; hex 58252d810000c0c60000c0c50000c0c7000014f062d2a6a045bf00000000000000000000134800323b8c80c9000000003b47

 -cut-

Sep  2 13:58:20 db2120 mysqld[1095]: InnoDB: End of page dump

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Uncompressed page, stored checksum in field1 1478831489, calculated checksums for field1: crc32 1725978419, innodb 1478831489,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 1725978419, calculated checksums for field2: crc32 1725978419, innodb 1725978419, none 3735928559,  page LSN 5360 1657972384, low 4 bytes of LSN at page end 1657972384, page number (if stored to page already) 49350, space id (if created with >= MySQL-4.1.1 and stored already) 4936

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page may be an index page where index id is 19838

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Index 19838 is `PRIMARY` in table `metawiki`.`content`

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Index 19838 is `PRIMARY` in table `metawiki`.`content`

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]. You may have to recover from a backup.

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
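The first bytes of the page dump are the 38-byte page header, and decoding them by hand reproduces the values InnoDB prints in its checksum note. The following is a minimal sketch added for illustration, not something from the original report. It assumes the usual InnoDB header layout (big-endian checksum, page number, previous page, next page, LSN, page type, 8 further bytes, space id). The hex string is the header portion of the dump line above, split per field for readability, and the names `HEADER_HEX`, `FIL_HEADER` and `decode_fil_header` are invented here.

```python
import struct

# First 38 bytes of the page dump quoted in the log, split by field.
HEADER_HEX = (
    "58252d81"          # stored checksum ("field1" in the log)
    "0000c0c6"          # page number
    "0000c0c5"          # previous page
    "0000c0c7"          # next page
    "000014f062d2a6a0"  # page LSN
    "45bf"              # page type
    "0000000000000000"  # 8 bytes at offset 26 (all zero in this dump)
    "00001348"          # space id
)

# Big-endian, no padding: 4 x uint32, uint64, uint16, uint64, uint32 = 38 bytes.
FIL_HEADER = struct.Struct(">IIIIQHQI")


def decode_fil_header(hex_prefix: str) -> dict:
    """Decode the leading 38 bytes of a hex page dump into named fields."""
    raw = bytes.fromhex(hex_prefix)[:FIL_HEADER.size]
    (checksum, page_no, prev_page, next_page,
     lsn, page_type, _unused, space_id) = FIL_HEADER.unpack(raw)
    return {
        "checksum": checksum,
        "page_no": page_no,
        "prev_page": prev_page,
        "next_page": next_page,
        "lsn_high": lsn >> 32,         # the log prints the LSN as two 32-bit halves
        "lsn_low": lsn & 0xFFFFFFFF,
        "page_type": page_type,
        "space_id": space_id,
    }


header = decode_fil_header(HEADER_HEX)
for name, value in header.items():
    print(f"{name:10} {value}")
```

The decoded page number, space id, LSN halves and page type can then be compared one by one with the numbers in the error message and the checksum note.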

There are no HW errors on that host or any other trace of OS corruption, so this looks entirely related to InnoDB.
Could it be the same as https://jira.mariadb.org/browse/MDEV-21165 and/or https://jira.mariadb.org/browse/MDEV-19978?

We do use

innodb_checksum_algorithm | crc32 |
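The checksum note in the log can be read as a small table of stored versus calculated values. The sketch below is an illustration added here: it copies the numbers from that log line and lists which algorithm each stored field matches. It does not reimplement InnoDB's validation rules, and the names `STORED`, `CALCULATED` and `matching_algorithms` are made up for this example.

```python
# Values copied from the "[Note] InnoDB: Uncompressed page, stored checksum ..." log line.
STORED = {"field1": 1478831489, "field2": 1725978419}
CALCULATED = {
    "field1": {"crc32": 1725978419, "innodb": 1478831489, "none": 3735928559},
    "field2": {"crc32": 1725978419, "innodb": 1725978419, "none": 3735928559},
}


def matching_algorithms(field: str) -> list:
    """Return the algorithms whose calculated value equals the stored one."""
    stored = STORED[field]
    return sorted(name for name, value in CALCULATED[field].items() if value == stored)


for field in STORED:
    print(field, matching_algorithms(field))
```

With these numbers, the value stored in field1 equals only the calculated `innodb` checksum, while the value stored in field2 equals the calculated `crc32` value (and the `innodb` value reported for field2). The `none` constant 3735928559 is 0xDEADBEEF.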

It is hard to provide steps to reproduce this crash, as it happened all of a sudden.
We've assumed the data is corrupted.

Attachments

Issue Links

relates to: MDEV-19978 Page read from tablespace is corrupted (Closed)

Activity

View 12 older comments

Marko Mäkelä added a comment - 2021-03-22 10:00

marostegui, is any further information available on this? Are you still seeing these failures? Was any hardware fault found?

Manuel Arostegui added a comment - 2021-03-22 10:30

We've seen errors like the ones described in the comments above (https://jira.mariadb.org/browse/MDEV-23653?focusedCommentId=167372&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167372), but we've not seen mysql crashing because of them anymore.
We are checking the hosts' tables after upgrading them from 10.1 to 10.4, and rebuilding the tables that get flagged with index corruption.

Manuel Arostegui added a comment - 2021-04-26 15:37

Not sure how much feedback I can give on this

Marko Mäkelä added a comment - 2021-07-23 08:34

marostegui, I am closing this ticket because this does not appear to be reproducible, and we never encountered anything like that in our internal testing. I double-checked that no complete page dump of a corrupted page was provided in this ticket. Hence, there is not enough information to diagnose any bug in the page checksum calculation.

In 10.4, I would recommend

SET GLOBAL innodb_checksum_algorithm=full_crc32;

to get faster and more secure page checksums. The setting is the default starting with the 10.5 release. It only affects tables that have been created or rebuilt while the setting was in effect. Old-format files will be treated as crc32 (or if you use strict_full_crc32, as strict_crc32).
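The rule in the paragraph above (the setting only changes the format of files created or rebuilt under it, and old-format files fall back to `crc32` or `strict_crc32`) can be written as a small decision function. This is a sketch for illustration only. The function names are hypothetical, and the branch for files already in the new format, which are assumed to keep using `full_crc32` whatever the setting, is an assumption that goes beyond what the comment itself states.

```python
FULL_CRC32_SETTINGS = {"full_crc32", "strict_full_crc32"}

# Fallback applied to old-format files, as described in the comment above.
OLD_FORMAT_FALLBACK = {
    "full_crc32": "crc32",
    "strict_full_crc32": "strict_crc32",
}


def new_file_gets_full_crc32(setting: str) -> bool:
    """A file created or rebuilt under these settings gets the new format."""
    return setting in FULL_CRC32_SETTINGS


def effective_algorithm(setting: str, file_is_full_crc32: bool) -> str:
    """Which checksum rule applies to a page of a given file."""
    if file_is_full_crc32:
        # Assumption: the file format, not the current setting, decides here.
        return "full_crc32"
    return OLD_FORMAT_FALLBACK.get(setting, setting)


# A table not rebuilt since the setting was changed still validates as crc32.
print(effective_algorithm("full_crc32", file_is_full_crc32=False))
# After a rebuild under the new setting the file is in the new format.
print(effective_algorithm("full_crc32", file_is_full_crc32=True))
```

Under this reading, switching the global setting on a running 10.4 server changes nothing for existing tables until they are rebuilt.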

One thing worth noting is that the definition of innodb_checksum_algorithm=crc32 was slightly changed in MDEV-17958 so that we would no longer accept an incorrect variant of a CRC-32C algorithm that was implemented in MySQL 5.6, MariaDB 10.0, 10.1 on processors using big-endian byte order. That incorrect variant was introduced in MySQL 5.7 and MariaDB 10.2.

If you were using innodb_encrypt_tables, there were some improvements to the way the checksums are calculated, in MDEV-18529 and some related tickets.

I would assume that you are using unencrypted tables on commodity AMD64 hardware.

I am afraid that this will remain a mystery. Page corruption could remain dormant for a long time, until the page is actually accessed. Maybe some very rarely accessed BLOB contained a corrupted page?

Index corruption (such as MDEV-22373) occurs at a much higher level and cannot be blamed on hardware as easily as page corruption.

Manuel Arostegui added a comment - 2021-07-23 11:44

Thank you Marko
