Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32639

InnoDB recovery fails due to corrupted page despite innodb_doublewrite=ON

    XMLWordPrintable

Details

    Description

      I observed a test failure in a CI system:

      10.6 5b53342a6a59ae1141b2a46467954fc3836f8e80

      CURRENT_TEST: innodb_fts.crash_recovery
      $ /home/buildbot/aarch64-fedora-38/build/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/home/buildbot/aarch64-fedora-38/build/mysql-test/var/3/my.cnf --log-output=file --disable-gdb --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --enable-plugin-innodb-sys-tables --core-file --loose-debug-sync-timeout=300
      2023-10-26 12:32:21 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32262)
      2023-10-26 12:32:21 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      2023-10-26 12:32:21 0 [Note] Starting MariaDB 10.6.16-MariaDB-log source revision 5b53342a6a59ae1141b2a46467954fc3836f8e80 as process 115945
      2023-10-26 12:32:22 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
      2023-10-26 12:32:22 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=44053,44053
      2023-10-26 12:32:22 0 [Note] InnoDB: Trying to recover page [page id: space=0, page number=54] from the doublewrite buffer.
      2023-10-26 12:32:22 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=54]. You may have to recover from a backup.
      2023-10-26 12:32:22 0 [Note] InnoDB: Page dump (16384 bytes):
      2023-10-26 12:32:22 0 [Note] InnoDB: 0000000000000036ffffffffffffffff0000000000010c4b0006000000000000
      2023-10-26 12:32:22 0 [Note] InnoDB: 0000000000000000000000000000000000010000013700780000013700780000
      2023-10-26 12:32:22 0 [Note] InnoDB: 000000000002123200000137ffffffffffffffffffffffffffffffffffffffff
      2023-10-26 12:32:22 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-10-26 12:32:22 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-10-26 12:32:22 0 [Note] InnoDB: ffffffffffffffff000000000000000000000000000000000000000000000000
      2023-10-26 12:32:22 0 [Note] InnoDB: 0000000000000000000000000000000000000000000000000000000000000000
      2023-10-26 12:32:22 0 [Note] InnoDB: 000000000000000000000000000000000000000000000000000039e36d240a52
      2023-10-26 12:32:22 0 [Note] InnoDB: End of page dump
      

      I downloaded https://ci.mariadb.org/39864/logs/aarch64-fedora-38/var.tar.gz and checked the contents. At the start of the page we can see FIL_PAGE_LSN 0x10c4b and a FIL_PAGE_TYPE of FIL_PAGE_TYPE_SYS. At the end, right before the 32-bit checksum, we see the 32 least significant bits of an LSN 0x39e3 (not 0x10c4b). It looks like the server had been killed in the middle of the page write. This is very well possible in the innodb_fts.crash_recovery,release test variant; it kills the server at random intervals. I assume that this was an undo log page that had been freed and later reallocated.

      The problem here is that there is no copy of the corrupted page in the doublewrite buffer of the InnoDB system tablespace ibdata1 (pages 64 to 191).

      I think that to reproduce this problem, I must add some fault injection so that we would occasionally write less than a full page. Upon the completion of such an incomplete write, we would crash. This should hopefully improve the chances of reproducing the corruption.

      This failure may occur in any crash recovery test:

      innodb.innodb-alter-tempfile '64k,innodb' w50 [ fail ]
              Test ended at 2023-10-25 09:00:32
       
      CURRENT_TEST: innodb.innodb-alter-tempfile
      mysqltest: At line 45: query 'show create table t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
      2023-10-25  9:00:32 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=52247,52247
      2023-10-25  9:00:32 0 [Note] InnoDB: Trying to recover page [page id: space=0, page number=279] from the doublewrite buffer.
      2023-10-25  9:00:32 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=279]. You may have to recover from a backup.
      

      Attachments

        Activity

          People

            marko Marko Mäkelä
            marko Marko Mäkelä
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.