Details
Description
I observed a test failure in a CI system:
10.6 5b53342a6a59ae1141b2a46467954fc3836f8e80 |
CURRENT_TEST: innodb_fts.crash_recovery
|
$ /home/buildbot/aarch64-fedora-38/build/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/home/buildbot/aarch64-fedora-38/build/mysql-test/var/3/my.cnf --log-output=file --disable-gdb --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --enable-plugin-innodb-sys-tables --core-file --loose-debug-sync-timeout=300
|
2023-10-26 12:32:21 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32262)
|
2023-10-26 12:32:21 0 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000)
|
2023-10-26 12:32:21 0 [Note] Starting MariaDB 10.6.16-MariaDB-log source revision 5b53342a6a59ae1141b2a46467954fc3836f8e80 as process 115945
|
…
|
2023-10-26 12:32:22 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
|
2023-10-26 12:32:22 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=44053,44053
|
2023-10-26 12:32:22 0 [Note] InnoDB: Trying to recover page [page id: space=0, page number=54] from the doublewrite buffer.
|
2023-10-26 12:32:22 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=54]. You may have to recover from a backup.
|
2023-10-26 12:32:22 0 [Note] InnoDB: Page dump (16384 bytes):
|
2023-10-26 12:32:22 0 [Note] InnoDB: 0000000000000036ffffffffffffffff0000000000010c4b0006000000000000
|
2023-10-26 12:32:22 0 [Note] InnoDB: 0000000000000000000000000000000000010000013700780000013700780000
|
2023-10-26 12:32:22 0 [Note] InnoDB: 000000000002123200000137ffffffffffffffffffffffffffffffffffffffff
|
2023-10-26 12:32:22 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
|
…
|
2023-10-26 12:32:22 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
|
2023-10-26 12:32:22 0 [Note] InnoDB: ffffffffffffffff000000000000000000000000000000000000000000000000
|
2023-10-26 12:32:22 0 [Note] InnoDB: 0000000000000000000000000000000000000000000000000000000000000000
|
…
|
2023-10-26 12:32:22 0 [Note] InnoDB: 000000000000000000000000000000000000000000000000000039e36d240a52
|
2023-10-26 12:32:22 0 [Note] InnoDB: End of page dump
|
I downloaded https://ci.mariadb.org/39864/logs/aarch64-fedora-38/var.tar.gz and checked the contents. At the start of the page we can see FIL_PAGE_LSN 0x10c4b and a FIL_PAGE_TYPE of FIL_PAGE_TYPE_SYS. At the end, right before the 32-bit checksum, we see the 32 least significant bits of an LSN 0x39e3 (not 0x10c4b). It looks like the server had been killed in the middle of the page write. This is very well possible in the innodb_fts.crash_recovery,release test variant; it kills the server at random intervals. I assume that this was an undo log page that had been freed and later reallocated.
The problem here is that there is no copy of the corrupted page in the doublewrite buffer of the InnoDB system tablespace ibdata1 (pages 64 to 191).
I think that to reproduce this problem, I must add some fault injection so that we would occasionally write less than a full page. Upon the completion of such an incomplete write, we would crash. This should hopefully improve the chances of reproducing the corruption.
This failure may occur in any crash recovery test:
innodb.innodb-alter-tempfile '64k,innodb' w50 [ fail ]
|
Test ended at 2023-10-25 09:00:32
|
|
CURRENT_TEST: innodb.innodb-alter-tempfile
|
mysqltest: At line 45: query 'show create table t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
|
…
|
2023-10-25 9:00:32 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=52247,52247
|
2023-10-25 9:00:32 0 [Note] InnoDB: Trying to recover page [page id: space=0, page number=279] from the doublewrite buffer.
|
2023-10-25 9:00:32 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=279]. You may have to recover from a backup.
|