[MDEV-33274] The test encryption.innodb-redo-nokeys often fails Created: 2024-01-18  Updated: 2024-02-08  Resolved: 2024-02-08

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB, Tests
Affects Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 11.3.2, 10.5.25, 10.6.18, 10.11.8, 11.0.6, 11.1.5, 11.2.4

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Debarun Banerjee
Resolution: Fixed Votes: 0
Labels: recovery

Issue Links:
Relates
relates to MDEV-12353 Efficient InnoDB redo log record format Closed

 Description   

The test encryption.innodb-redo-nokeys often fails in various ways on 10.5 and later versions. On 10.4 I found a shutdown hang as well as SAFEMALLOC warnings on memory leaks, but nothing like this. Here are a few examples:

10.11 d15260990d94258bca9eb2bae94159f6

encryption.innodb-redo-nokeys 'cbc,innodb' w1 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2022-10-27 15:10:25
line
2022-10-27 15:10:21 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT(72379) at 1561912
2022-10-27 15:10:21 0 [ERROR] InnoDB: Log scan aborted at LSN 1561912

10.5 8a763c014ede4adad9c84852269b5af61845c0d9

encryption.innodb-redo-nokeys 'cbc,innodb' w21 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2024-01-12 16:33:53
line
2024-01-12 16:33:49 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 1566925 between the checkpoint 53172 and the end 1632256.

10.6 d06b6de3050180ec2f96ef00963d1beab8e1b47a

encryption.innodb-redo-nokeys 'cbc,innodb' w3 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2024-01-11 04:20:08
line
2024-01-11  4:20:06 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 1108630 between the checkpoint 52669 and the end 1174016.

10.6 0f510d8b5a0089814edbadeebd68a1b83f8db9eb

encryption.innodb-redo-nokeys 'cbc,innodb' w11 [ fail ]
        Test ended at 2023-12-14 12:09:14
 
CURRENT_TEST: encryption.innodb-redo-nokeys
--- /home/buildbot/amd64-ubuntu-2204-clang14-asan/build/mysql-test/suite/encryption/r/innodb-redo-nokeys.result	2023-12-14 11:52:21.000000000 +0000
+++ /home/buildbot/amd64-ubuntu-2204-clang14-asan/build/mysql-test/suite/encryption/r/innodb-redo-nokeys.reject	2023-12-14 12:09:13.318204770 +0000
@@ -40,6 +40,6 @@
 WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 ENGINE	SUPPORT	COMMENT	TRANSACTIONS	XA	SAVEPOINTS
-FOUND 1 /\[ERROR\] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err
+NOT FOUND /\[ERROR\] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err
 # restart: --file-key-management-filename=MYSQL_TEST_DIR/std_data/keys2.txt
 drop table t1,t2,t3,t4,t5;
 
mysqltest: Result length mismatch

The redo logging and recovery was rewritten in MDEV-12353 and later refined in MDEV-14425.

These failures need to be analyzed to determine the reason of the failure. Possibly we will need to improve the control of background writes, page flushing and the checkpoint activity, like we have done in the test innodb.doublewrite (MDEV-23855, MDEV-32242, MDEV-32968).



 Comments   
Comment by Debarun Banerjee [ 2024-01-30 ]

10.11: Could reproduce all the symptoms. The intermittent behaviour is caused by the page flush and intermittent checkpoint.

1. [ERROR] InnoDB: Missing FILE_CHECKPOINT(72379) at 1561912

2. [ERROR] InnoDB: Log scan aborted at LSN 1561912

  • We set recv_sys.set_corrupt_fs(); when not able to load the tablespace due to missing encryption key.
  • When it happens while trying to search for the checkpoint we ignore the corrupt fs and do set_corrupt_log
  • This is inaccurate and could be corrected by checking for corrupt fs error in callers.

3. -FOUND 1 /[ERROR] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err
+NOT FOUND /[ERROR] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err

  • Based on the flush and checkpoint we could return error for any one of t1-t5 encrypted table
  • The test needs to be corrected.

All 3 cases can be repeated by forcing intermediate flush and checkpoint.

diff --git a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
index 550979680c5..646aede1947 100644
--- a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
+++ b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
@@ -32,6 +32,9 @@ while ($i)
   insert into t1(c,b) values (repeat('secret1',20), repeat('secret2',6000));
   dec $i;
 }
+
+SET GLOBAL innodb_buf_flush_list_now = 1;
+SET GLOBAL innodb_log_checkpoint_now = 1;
 --enable_query_log
 
 insert into t2 select * from t1;

Fix (10.11):

diff --git a/mysql-test/suite/encryption/r/innodb-redo-nokeys.result b/mysql-test/suite/encryption/r/innodb-redo-nokeys.result
index b7d6eb50e70..ea75e4a873e 100644
--- a/mysql-test/suite/encryption/r/innodb-redo-nokeys.result
+++ b/mysql-test/suite/encryption/r/innodb-redo-nokeys.result
@@ -2,15 +2,10 @@ call mtr.add_suppression("mariadbd.*: File ");
 call mtr.add_suppression("Plugin 'file_key_management' ");
 call mtr.add_suppression("InnoDB: Recovery cannot access file");
 call mtr.add_suppression("InnoDB: Plugin initialization aborted");
-call mtr.add_suppression("Plugin 'InnoDB' init function returned error\\.");
 call mtr.add_suppression("Plugin 'InnoDB' registration as a STORAGE ENGINE failed.");
 call mtr.add_suppression("InnoDB: (Unable to apply log to|Discarding log for) corrupted page ");
 call mtr.add_suppression("InnoDB: Cannot apply log to \\[page id: space=[1-9][0-9]*, page number=0\\] of corrupted file '.*test.t[1-5]\\.ibd'");
 call mtr.add_suppression("InnoDB: Failed to read page .* from file '.*'");
-call mtr.add_suppression("InnoDB: OPT_PAGE_CHECKSUM mismatch");
-call mtr.add_suppression("InnoDB: Missing FILE_CHECKPOINT");
-call mtr.add_suppression("InnoDB: Log scan aborted at LSN");
-call mtr.add_suppression("InnoDB: Set innodb_force_recovery=1 to ignore corruption");
 call mtr.add_suppression("InnoDB: Encryption key is not found for");
 # restart: --file-key-management-filename=MYSQL_TEST_DIR/std_data/keys2.txt
 SET GLOBAL innodb_file_per_table = ON;
@@ -42,6 +37,6 @@ SELECT * FROM INFORMATION_SCHEMA.ENGINES
 WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 ENGINE SUPPORT COMMENT TRANSACTIONS    XA      SAVEPOINTS
-FOUND 1 /\[ERROR\] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err
+FOUND 1 /\[ERROR\] InnoDB: Encryption key is not found for .*test.t[1-5].ibd/ in mysqld.1.err
 # restart: --file-key-management-filename=MYSQL_TEST_DIR/std_data/keys2.txt
 drop table t1,t2,t3,t4,t5;
diff --git a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
index 52c12761f69..550979680c5 100644
--- a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
+++ b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
@@ -7,15 +7,10 @@ call mtr.add_suppression("mariadbd.*: File ");
 call mtr.add_suppression("Plugin 'file_key_management' ");
 call mtr.add_suppression("InnoDB: Recovery cannot access file");
 call mtr.add_suppression("InnoDB: Plugin initialization aborted");
-call mtr.add_suppression("Plugin 'InnoDB' init function returned error\\.");
 call mtr.add_suppression("Plugin 'InnoDB' registration as a STORAGE ENGINE failed.");
 call mtr.add_suppression("InnoDB: (Unable to apply log to|Discarding log for) corrupted page ");
 call mtr.add_suppression("InnoDB: Cannot apply log to \\[page id: space=[1-9][0-9]*, page number=0\\] of corrupted file '.*test.t[1-5]\\.ibd'");
 call mtr.add_suppression("InnoDB: Failed to read page .* from file '.*'");
-call mtr.add_suppression("InnoDB: OPT_PAGE_CHECKSUM mismatch");
-call mtr.add_suppression("InnoDB: Missing FILE_CHECKPOINT");
-call mtr.add_suppression("InnoDB: Log scan aborted at LSN");
-call mtr.add_suppression("InnoDB: Set innodb_force_recovery=1 to ignore corruption");
 call mtr.add_suppression("InnoDB: Encryption key is not found for");
 
 -- let $restart_parameters=--file-key-management-filename=$MYSQL_TEST_DIR/std_data/keys2.txt
@@ -77,7 +72,7 @@ WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 
 let SEARCH_FILE = $MYSQLTEST_VARDIR/log/mysqld.1.err;
-let SEARCH_PATTERN = \[ERROR\] InnoDB: Encryption key is not found for .*test.t1.ibd;
+let SEARCH_PATTERN = \[ERROR\] InnoDB: Encryption key is not found for .*test.t[1-5].ibd;
 --source include/search_pattern_in_file.inc
 
 #
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index e72f842f9a9..21917800c66 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -4122,7 +4122,7 @@ static bool recv_scan_log(bool last_phase)
           ut_ad(!end || end == recv_sys.lsn);
           mysql_mutex_unlock(&recv_sys.mutex);
 
-          if (!end)
+          if (!recv_sys.is_corrupt_fs() && !end)
           {
             recv_sys.set_corrupt_log();
             sql_print_error("InnoDB: Missing FILE_CHECKPOINT(" LSN_PF
@@ -4602,6 +4602,9 @@ dberr_t recv_recovery_from_checkpoint_start()
                                        LSN_PF, recv_sys.lsn);
                         goto err_exit;
                }
+               if (recv_sys.is_corrupt_fs()) {
+                       goto err_exit;
+               }
                ut_ad(recv_sys.file_checkpoint);
                if (rewind) {
                        recv_sys.lsn = log_sys.next_checkpoint_lsn;
@@ -4640,9 +4643,9 @@ dberr_t recv_recovery_from_checkpoint_start()
 
                        do {
                                rescan = recv_scan_log(false);
-                               ut_ad(!recv_sys.is_corrupt_fs());
 
-                               if (recv_sys.is_corrupt_log()) {
+                               if (recv_sys.is_corrupt_log() ||
+                                   recv_sys.is_corrupt_fs()) {
                                        goto err_exit;
                                }

Comment by Debarun Banerjee [ 2024-01-30 ]

10.6: Two issues from 10.11 are relevant. The reason is same.

1. [ERROR] InnoDB: Missing FILE_CHECKPOINT(72379) at 1561912
2. -FOUND 1 /[ERROR] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err
+NOT FOUND /[ERROR] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err

Can be repeated with following change in test in debug mode.

diff --git a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
index 366298026a2..cc3077b2040 100644
--- a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
+++ b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
@@ -28,6 +28,7 @@ create table t4(a int not null primary key auto_increment, c char(200), b blob,
 
 begin;
 --disable_query_log
+SET GLOBAL innodb_buf_flush_list_now = 1;
 --let $i = 20
 begin;
 while ($i)
@@ -41,6 +42,7 @@ insert into t2 select * from t1;
 insert into t3 select * from t1;
 insert into t4 select * from t1;
 commit;
+sleep 2;
 
 --source ../../suite/innodb/include/no_checkpoint_start.inc
 #

Fix 10.6

diff --git a/mysql-test/suite/encryption/r/innodb-redo-nokeys.result b/mysql-test/suite/encryption/r/innodb-redo-nokeys.result
index ce24c7c7033..9eaf8fb9430 100644
--- a/mysql-test/suite/encryption/r/innodb-redo-nokeys.result
+++ b/mysql-test/suite/encryption/r/innodb-redo-nokeys.result
@@ -40,6 +40,6 @@ SELECT * FROM INFORMATION_SCHEMA.ENGINES
 WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 ENGINE	SUPPORT	COMMENT	TRANSACTIONS	XA	SAVEPOINTS
-FOUND 1 /\[ERROR\] InnoDB: Encryption key is not found for .*test.t1.ibd/ in mysqld.1.err
+FOUND 1 /\[ERROR\] InnoDB: Encryption key is not found for .*test.t[1-5].ibd/ in mysqld.1.err
 # restart: --file-key-management-filename=MYSQL_TEST_DIR/std_data/keys2.txt
 drop table t1,t2,t3,t4,t5;
diff --git a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
index 7323a9148c3..366298026a2 100644
--- a/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
+++ b/mysql-test/suite/encryption/t/innodb-redo-nokeys.test
@@ -75,7 +75,7 @@ WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 
 let SEARCH_FILE = $MYSQLTEST_VARDIR/log/mysqld.1.err;
-let SEARCH_PATTERN = \[ERROR\] InnoDB: Encryption key is not found for .*test.t1.ibd;
+let SEARCH_PATTERN = \[ERROR\] InnoDB: Encryption key is not found for .*test.t[1-5].ibd;
 --source include/search_pattern_in_file.inc
 
 #
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 1d26b7a195e..502e01e1adb 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -4531,6 +4531,15 @@ recv_recovery_from_checkpoint_start(lsn_t flush_lsn)
 		return(DB_ERROR);
 	}
 
+	/* If we fail to open a tablespace while looking for FILE_CHECKPOINT, we
+	set the corruption flag. Specifically, if encryption key is missing, we
+	would not be able to open an encrypted tablespace and the flag could be
+	set. */
+	if (recv_sys.is_corrupt_fs()) {
+		mysql_mutex_unlock(&log_sys.mutex);
+		return DB_ERROR;
+	}
+
 	if (recv_sys.mlog_checkpoint_lsn == 0) {
 		lsn_t scan_lsn = log_sys.log.scanned_lsn;
 		if (!srv_read_only_mode && scan_lsn != checkpoint_lsn) {

Comment by Debarun Banerjee [ 2024-01-30 ]

Only one issue is relevant for 10.5. The reason is same.

1. [ERROR] InnoDB: Missing FILE_CHECKPOINT(72379) at 1561912
Can be repeated with same debug change as 10.6 in the test.

Fix 10.5

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index ffe35e9dbc7..620e9c9209d 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -3538,6 +3538,15 @@ recv_recovery_from_checkpoint_start(lsn_t flush_lsn)
 		return(DB_ERROR);
 	}
 
+	/* If we fail to open a tablespace while looking for FILE_CHECKPOINT, we
+	set the corruption flag. Specifically, if encryption key is missing, we
+	would not be able to open an encrypted tablespace and the flag could be
+	set. */
+	if (recv_sys.found_corrupt_fs) {
+		mysql_mutex_unlock(&log_sys.mutex);
+		return DB_ERROR;
+	}
+
 	if (recv_sys.mlog_checkpoint_lsn == 0) {
 		lsn_t scan_lsn = log_sys.log.scanned_lsn;
 		if (!srv_read_only_mode && scan_lsn != checkpoint_lsn) {

Comment by Debarun Banerjee [ 2024-01-30 ]

Created 3 pull requests for 10.5, 10.6 & 10.11 as the patches vary.

10.5: https://github.com/MariaDB/server/pull/3031
10.6: https://github.com/MariaDB/server/pull/3032
10.11: https://github.com/MariaDB/server/pull/3033

Comment by Debarun Banerjee [ 2024-01-30 ]

Hi,
I have created the review request with the patches as discussed. Please review.

Comment by Marko Mäkelä [ 2024-02-02 ]

Great work!

Unrelated to your improvements, I noticed that the 10.6 and 10.11 versions of the test are being skipped very easily on my system, and also on buildbot. It could be useful to ensure that a log checkpoint is executed right before the no_checkpoint_start.inc. We can’t use the debug setting SET GLOBAL innodb_buf_flush_list_now = 1; to achieve that, because this test is targeting all builds.

Comment by Debarun Banerjee [ 2024-02-08 ]

The fix is pushed to 10.5
commit 5e7047067e4c99f61d398c9dbee1fd74ef0b1632 (HEAD -> 10.5, origin/10.5)
Author: mariadb-DebarunBanerjee <debarun.banerjee@mariadb.com>
Date: Tue Feb 6 15:50:10 2024 +0530

MDEV-33274 The test encryption.innodb-redo-nokeys often fails

The merge to 10.6 and 10.11 is non-trivial. We need to use the following patches while merging.

10.6: https://github.com/MariaDB/server/pull/3032
10.11: https://github.com/MariaDB/server/pull/3033

Generated at Thu Feb 08 10:37:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.