[MDEV-31098] InnoDB Recovery doesn't display encryption message when no encryption configuration passed Created: 2023-04-20  Updated: 2023-10-13  Resolved: 2023-10-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: recovery

Attachments: File data.tar.xz     Text File encryption_keys.txt    
Issue Links:
Problem/Incident
is caused by MDEV-24626 Remove synchronous write of page0 and... Closed

 Description   

An attempt to recover the attached data.tar.xz fails:

tar xJf data.tar.xz
mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data

10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6

2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.

A crude attempt at fixing this as follows caused a crash.

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index f79eb015673..0f614068afd 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -2496,23 +2496,16 @@ fil_ibd_load(
 	}
 
 	os_offset_t	size;
-	bool		deferred_space = false;
 
 	/* Read and validate the first page of the tablespace.
 	Assign a tablespace name based on the tablespace type. */
 	switch (file.validate_for_recovery()) {
 		os_offset_t	minimum_size;
 	case DB_SUCCESS:
-		deferred_space = file.m_defer;
-
-		if (deferred_space) {
-			goto tablespace_check;
-		}
-
-		if (file.space_id() != space_id) {
+		if (!file.m_defer && file.space_id() != space_id) {
 			return(FIL_LOAD_ID_CHANGED);
 		}
-tablespace_check:
+
 		/* Get and test the file size. */
 		size = os_file_get_size(file.handle());
 
@@ -2528,9 +2521,7 @@ fil_ibd_load(
 			ib::error() << "Could not measure the size of"
 				" single-table tablespace file '"
 				<< file.filepath() << "'";
-		} else if (deferred_space) {
-			return FIL_LOAD_DEFER;
-		} else if (size < minimum_size) {
+		} else if (!file.m_defer && size < minimum_size) {
 			ib::error() << "The size of tablespace file '"
 				<< file.filepath() << "' is only " << size
 				<< ", should be at least " << minimum_size

The problem is that the FIL_LOAD_DEFER return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The FIL_PAGE_LSN in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a WRITE) that recv_sys_t::recover_deferred() would see for the page carries start_lsn = 79742990 and lsn = 79743157, which is even later. The deferred recovery fails because skip_read was not set for the recv_sys.pages entry.

In case the problem is actually on the 'write' side (which I doubt), you can access rr replay of the previously killed server as follows:

ssh sdp
rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1

It starts straight after bootstrap, at LSN 41114.



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2023-04-25 ]

If we start the 10.6 server with the following command:

./sql/mysqld --basedir=$PWD --datadir=$PWD/data --socket=$PWD/data/mysql.sock --log-error=$PWD/data/mysql.log --plugin-dir=$PWD/plugin/file_key_management --file-key-management --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$PWD/../encryption_keys.txt --file-key-management-encryption-algorithm=aes_cbc --innodb_page_size=32k

There is no failure in plain 10.6. This issue is related to MDEV-29911 Patch.

Comment by Marko Mäkelä [ 2023-04-26 ]

Thank you for debugging this and fixing the MDEV-29911 code. As we discussed, it would be good to clarify the error message, to say that the page cannot be decrypted.

Comment by Thirunarayanan Balathandayuthapani [ 2023-09-26 ]

Patch is in bb-10.6-MDEV-31851

Comment by Marko Mäkelä [ 2023-09-28 ]

Thank you, this is a simple fix. I’d suggest to modify the test encryption.innodb-redo-nokeys to ensure that the error message will be reported.

Comment by Marko Mäkelä [ 2023-10-11 ]

I understand that due to the somewhat nondeterministic nature of page I/O, it may be hard to write a test that would invoke source include/search_pattern_in_file.inc to ensure that the error message was output.

Can you still try to do that? Please also address my minor comments about the added function fil_crypt_check().

Generated at Thu Feb 08 10:21:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.