MariaDB Server, MDEV-23653

Server crash: InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]

Details

    Description

      We have one slave running 10.4.13 which all of a sudden crashed with:

      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]. You may have to recover from a backup.
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
      Sep  2 13:58:20 db2120 mysqld[1095]:  len 16384; hex 58252d810000c0c60000c0c50000c0c7000014f062d2a6a045bf00000000000000000000134800323b8c80c9000000003b47
       
       -cut-
       
       
       
      Sep  2 13:58:20 db2120 mysqld[1095]: InnoDB: End of page dump
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Uncompressed page, stored checksum in field1 1478831489, calculated checksums for field1: crc32 1725978419, innodb 1478831489,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 1725978419, calculated checksums for field2: crc32 1725978419, innodb 1725978419, none 3735928559,  page LSN 5360 1657972384, low 4 bytes of LSN at page end 1657972384, page number (if stored to page already) 49350, space id (if created with >= MySQL-4.1.1 and stored already) 4936
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page may be an index page where index id is 19838
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Index 19838 is `PRIMARY` in table `metawiki`.`content`
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Index 19838 is `PRIMARY` in table `metawiki`.`content`
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]. You may have to recover from a backup.
      Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
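The first bytes of the truncated hex dump above can be cross-checked against the values InnoDB printed in the log. The following is a minimal sketch, assuming the classic 38-byte InnoDB page header layout (big-endian: checksum, page number, previous page, next page, LSN, page type, an 8-byte field at offset 26, space id). The names `HEADER_HEX`, `FIELDS` and `decode_fil_header` are made up for this illustration and are not part of any MariaDB tool.

```python
import struct

# Leading 38 bytes of the page dump quoted in the report, split by field
# for readability. The field layout is an assumption (classic FIL header).
HEADER_HEX = (
    "58252d81"          # checksum stored in field1
    "0000c0c6"          # page number
    "0000c0c5"          # previous page
    "0000c0c7"          # next page
    "000014f062d2a6a0"  # page LSN
    "45bf"              # page type
    "0000000000000000"  # 8-byte field at offset 26
    "00001348"          # space id
)

FIELDS = ("checksum", "page_no", "prev_page", "next_page",
          "lsn", "page_type", "offset_26", "space_id")


def decode_fil_header(hex_dump):
    """Decode the first 38 bytes (76 hex digits) of a page dump."""
    raw = bytes.fromhex(hex_dump[:76])
    # ">" selects big-endian with no padding: 4+4+4+4+8+2+8+4 = 38 bytes.
    return dict(zip(FIELDS, struct.unpack(">IIIIQHQI", raw)))


header = decode_fil_header(HEADER_HEX)
print(header)
# The log prints the LSN as two 32-bit halves (high, low).
print(header["lsn"] >> 32, header["lsn"] & 0xFFFFFFFF)
```

Under that assumed layout, the decoded page number, space id, page type and LSN agree with the values in the log, which suggests the header itself was read intact. It says nothing about the remaining bytes of the page, which were cut from the report.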
      
      

      There are no HW errors on that host or any other trace of OS corruption, so this looks entirely related to InnoDB.
      Could it be the same as https://jira.mariadb.org/browse/MDEV-21165 and/or https://jira.mariadb.org/browse/MDEV-19978?

      We do use

      innodb_checksum_algorithm | crc32 |
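With that setting in mind, the checksum note from the log can be tabulated to see which of the printed candidate values equals each stored field. This is only a sketch that compares the numbers InnoDB logged; it does not reproduce the server's actual validation logic, and `NOTE` and `matching_algorithms` are hypothetical names introduced here.

```python
import re

# Checksum portion of the "[Note] InnoDB: Uncompressed page" line, copied
# from the log in this report.
NOTE = (
    "stored checksum in field1 1478831489, calculated checksums for field1: "
    "crc32 1725978419, innodb 1478831489,  page type 17855 == INDEX.none 3735928559, "
    "stored checksum in field2 1725978419, calculated checksums for field2: "
    "crc32 1725978419, innodb 1725978419, none 3735928559"
)


def matching_algorithms(note, field):
    """Return (stored value, names of logged candidates equal to it)."""
    stored = int(re.search(rf"stored checksum in {field} (\d+)", note).group(1))
    # Text between this field's label and the next "stored checksum" (or the end).
    section = re.search(
        rf"calculated checksums for {field}:(.*?)(?=stored checksum|$)", note, re.S
    ).group(1)
    calculated = {
        name: int(value)
        for name, value in re.findall(r"(crc32|innodb|none) (\d+)", section)
    }
    return stored, sorted(name for name, value in calculated.items() if value == stored)


for field in ("field1", "field2"):
    print(field, matching_algorithms(NOTE, field))
```

Going by the logged numbers alone, the value stored in field1 equals the candidate labelled `innodb` rather than `crc32`, while the value stored in field2 equals both candidates printed for it. Whether that points to a mixed-format page or to something else cannot be decided from this note.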
      

      It is hard to provide a way to reproduce this crash, as it happened without warning.
      We have assumed that the data is corrupted.

          Activity

            marostegui, is any further information available on this? Are you still seeing these failures? Was any hardware fault found?

            marko Marko Mäkelä added the comment above.

            We've seen errors like the ones described in the comments above (https://jira.mariadb.org/browse/MDEV-23653?focusedCommentId=167372&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167372), but we have not seen mysql crash because of them since.
            We are checking the tables on each host after upgrading it from 10.1 to 10.4, and rebuilding the tables that get flagged with index corruption.

            marostegui Manuel Arostegui added the comment above.

            Not sure how much feedback I can give on this

            marostegui Manuel Arostegui added the comment above.

            marostegui, I am closing this ticket because this does not appear to be reproducible, and we never encountered anything like that in our internal testing. I double-checked that no complete page dump of a corrupted page was provided in this ticket. Hence, there is not enough information to diagnose any bug in the page checksum calculation.

            In 10.4, I would recommend

            SET GLOBAL innodb_checksum_algorithm=full_crc32;
            

            to get faster and more secure page checksums. The setting is the default starting with the 10.5 release. It only affects tables that have been created or rebuilt while the setting was in effect. Old-format files will be treated as crc32 (or if you use strict_full_crc32, as strict_crc32).
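Because the setting only applies to tables created or rebuilt while it is in effect, existing tables would need a rebuild to pick up the new format. The sketch below merely generates the statements for a list of tables. It assumes `ALTER TABLE ... FORCE` as the rebuild method, which is one option and not something this ticket prescribes; the helper names and the example table list are hypothetical.

```python
def quote_ident(name):
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def rebuild_statements(tables):
    """Build the SQL to enable full_crc32 and rebuild each (schema, table)."""
    statements = ["SET GLOBAL innodb_checksum_algorithm=full_crc32;"]
    for schema, table in tables:
        # ALTER TABLE ... FORCE is assumed here as the way to rebuild a table.
        statements.append(
            f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} FORCE;"
        )
    return statements


# Example input: the table named in this report.
for statement in rebuild_statements([("metawiki", "content")]):
    print(statement)
```

The statements are only printed, not executed, so they can be reviewed first. A rebuild reads every page of the table, so it may itself fail on a table that contains a corrupted page.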

            One thing worth noting is that the definition of innodb_checksum_algorithm=crc32 was slightly changed in MDEV-17958 so that we would no longer accept an incorrect variant of a CRC-32C algorithm that was implemented in MySQL 5.6, MariaDB 10.0, 10.1 on processors using big-endian byte order. Acceptance of that incorrect variant was introduced in MySQL 5.7 and MariaDB 10.2.

            If you were using innodb_encrypt_tables, there were some improvements to how the checksums are calculated, in MDEV-18529 and some related tickets.

            I would assume that you are using unencrypted tables on commodity AMD64 hardware.

            I am afraid that this will remain a mystery. Page corruption could remain dormant for a long time, until the page is actually accessed. Maybe some very rarely accessed BLOB contained a corrupted page?

            Index corruption (such as MDEV-22373) occurs at a much higher level and cannot be blamed on hardware as easily as page corruption.

            marko Marko Mäkelä added the comment above.

            Thank you Marko

            marostegui Manuel Arostegui added the comment above.

            People

              marko Marko Mäkelä
              marostegui Manuel Arostegui