[MDEV-29999] innodb_undo_log_truncate=ON is not crash safe Created: 2022-11-11  Updated: 2023-05-31  Resolved: 2022-11-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.11.2, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: corruption, recovery

Attachments: File utr4k.tar.xz    
Issue Links:
Blocks
blocks MDEV-29986 Set innodb_undo_tablespaces=3 by default Closed
Relates
relates to MDEV-31373 innodb_undo_log_truncate=ON recovery ... Closed
relates to MDEV-14795 InnoDB system tablespace cannot be sh... Closed

 Description   

Undo log truncation is still buggy, even after MDEV-26450 was fixed.

while ./mtr --rr=-h --parallel=auto --mysqld=--innodb-purge-threads=1 innodb.undo_truncate_recover,4k,strict_full_crc32{,,,,,,,,,,}; do :; done

10.6 67f4ba240e5fa5f95ab4f1688bdb18f61e7d4d7f

innodb.undo_truncate_recover '2,4k,innodb,strict_full_crc32' w4 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2022-11-11 09:51:35
line
2022-11-11  9:51:34 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=2, page number=74]
2022-11-11  9:51:34 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
2022-11-11  9:51:34 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=2, page number=74]; set innodb_force_recovery to ignore
2022-11-11  9:51:34 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1505] with error Data structure corruption
2022-11-11  9:51:34 0 [ERROR] InnoDB: Failed to read page 74 from file './/undo002': Page read from tablespace is corrupted.
2022-11-11  9:51:34 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-11-11  9:51:34 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

Crash recovery will read an older version of the page. The checkpoint LSN was equal to the commit LSN of mtr.commit_shrink(space) in trx_purge_truncate_history(). So, the error is only on the recovery side.

Recovery invoked recv_sys_t::trim() on page 640, but in fact all pages in the undo log tablespace will be reinitialized, and hence all redo log records for the tablespace before the truncation LSN need to be discarded.

This failure has also occurred on our CI systems a couple of times:

10.10 d371e35257c45895318c4efbed20e5bfdcc7cee9

innodb.undo_truncate_recover '2,4k,innodb,strict_full_crc32' w4 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2022-06-18 12:14:25
line
2022-06-18 12:14:25 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=2, page number=74]
2022-06-18 12:14:25 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
2022-06-18 12:14:25 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1243] with error Data structure corruption
2022-06-18 12:14:25 0 [ERROR] InnoDB: Failed to read page 74 from file './/undo002': Page read from tablespace is corrupted.
2022-06-18 12:14:25 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-06-18 12:14:25 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
^ Found warnings in /buildbot/amd64-ubuntu-2004-msan/build/mysql-test/var/4/log/mysqld.1.err
ok
 
 - saving '/buildbot/amd64-ubuntu-2004-msan/build/mysql-test/var/4/log/innodb.undo_truncate_recover-2,4k,innodb,strict_full_crc32/' to '/buildbot/amd64-ubuntu-2004-msan/build/mysql-test/var/log/innodb.undo_truncate_recover-2,4k,innodb,strict_full_crc32/'

We can thank MDEV-18976 and https://rr-project.org for finding this bug. Before it, the failure looked like this:

10.6 660cfe4782380bbb10ab844014b82f4d5c2fee50

innodb.undo_truncate_recover '2,4k,innodb,strict_crc32' w2 [ fail ]
        Test ended at 2021-12-12 18:54:02
 
CURRENT_TEST: innodb.undo_truncate_recover
 
 
Server [mysqld.1 - pid: 191888, winpid: 191888, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
$ /buildbot/amd64-ubuntu-1804-msan/build/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/buildbot/amd64-ubuntu-1804-msan/build/mysql-test/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=4K --skip-innodb-read-only-compressed --innodb-checksum-algorithm=strict_crc32 --innodb-undo-tablespaces=2 --innodb-purge-threads=1 --core-file --loose-debug-sync-timeout=300
2021-12-12 18:53:30 0 [Note] /buildbot/amd64-ubuntu-1804-msan/build/sql/mariadbd (server 10.6.6-MariaDB-debug-log) starting as process 190177 ...
2021-12-12 18:54:01 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=20546035,20621018
2021-12-12 18:54:01 0 [Note] InnoDB: Opened 2 undo tablespaces
2021-12-12 18:54:01 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2021-12-12 18:54:01 0 [Note] InnoDB: Starting final batch to recover 1816 pages from redo log.
mariadbd: /buildbot/amd64-ubuntu-1804-msan/build/storage/innobase/log/log0recv.cc:275: log_phys_t::apply_status log_phys_t::apply(const buf_block_t &, uint16_t &) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.



 Comments   
Comment by Marko Mäkelä [ 2022-11-11 ]

Test case:

tar xJf utr4k.tar.gz
mariadbd --innodb-page-size=4k --innodb-log-file-size=10m --datadir $(pwd)/data

Comment by Marko Mäkelä [ 2022-11-11 ]

I do not think that this is feasible to fix in 10.3 or 10.4. Recovery as well as the undo log truncation was rewritten in 10.5.

Generated at Thu Feb 08 10:12:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.