[MDEV-32971] Assertion !recv_sys.is_corrupt_fs() failed on recovery Created: 2023-12-08  Updated: 2023-12-08  Resolved: 2023-12-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.5, 10.6, 10.7
Fix Version/s: 10.5.24, 10.6.17

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: debug, not-10.8

Attachments: File data.tar.xz    
Issue Links:
Relates
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed

 Description   

An encryption test failed on a mandatory builder:

10.5 c8346c0bacfdbe3fd61c67becff9934e75e08ed3

encryption.innodb-redo-nokeys 'ctr,innodb' w12 [ fail ]
...
2023-12-08  7:46:08 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=53205,1435867
2023-12-08  7:46:08 0 [ERROR] InnoDB: Recovery cannot access file ./test/t1.ibd (tablespace 5)
2023-12-08  7:46:08 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
mariadbd: /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/log/log0recv.cc:3510: dberr_t recv_recovery_from_checkpoint_start(lsn_t): Assertion `!recv_sys.found_corrupt_fs' failed.

This is reproducible also when attempting to start up 10.6 on a copy of the data directory (data.tar.xz).
The code path that outputs the error message when processing a FILE_MODIFY record will also set recv_sys.found_corrupt_fs. This is triggered by a fil_ibd_load() return value FIL_LOAD_INVALID because of missing encryption information:

	if (crypt_data && !crypt_data->is_key_found()) {
		crypt_data->~fil_space_crypt_t();
		ut_free(crypt_data);
		return FIL_LOAD_INVALID;
	}

In 10.6 the code is different:

	if (crypt_data && !fil_crypt_check(crypt_data, filename)) {
		return FIL_LOAD_INVALID;
	}

All we need to do is to relax the too strict debug assertion:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index d52a62cec99..05120871b0a 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -3507,7 +3507,7 @@ recv_recovery_from_checkpoint_start(lsn_t flush_lsn)
 	recv_group_scan_log_recs(checkpoint_lsn, &contiguous_lsn, false);
 	/* The first scan should not have stored or applied any records. */
 	ut_ad(recv_sys.pages.empty());
-	ut_ad(!recv_sys.found_corrupt_fs);
+	ut_ad(!recv_sys.found_corrupt_fs || !srv_force_recovery);
 
 	if (srv_read_only_mode && recv_needed_recovery) {
 		mysql_mutex_unlock(&log_sys.mutex);

In this way, the server startup will fail gracefully, just like it is expected by the test:

10.5 c8346c0bacfdbe3fd61c67becff9934e75e08ed3 with patch

2023-12-08 10:47:50 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=53205,1435867
2023-12-08 10:47:50 0 [ERROR] InnoDB: Recovery cannot access file ./test/t1.ibd (tablespace 5)
2023-12-08 10:47:50 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-12-08 10:47:50 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 1435867 between the checkpoint 53205 and the end 1501184.
2023-12-08 10:47:50 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
2023-12-08 10:47:50 0 [Note] InnoDB: Starting shutdown...
2023-12-08 10:47:51 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

In 10.8, the log format was changed and the logic around FILE_CHECKPOINT was rewritten in MDEV-14425. We would still invoke fil_name_process() on the FILE_MODIFY records, and recv_sys.found_corrupt_fs could be set for a similar reason, but there is no assertion failure in recv_recovery_from_checkpoint_start() after the initial call to recv_scan_log(). In our CI systems, I only found this type of failure on the 10.5 and 10.6 branches.


Generated at Thu Feb 08 10:35:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.