[MDEV-31827] InnoDB multi-batch recovery stops prematurely Created: 2023-08-02  Updated: 2023-08-08  Resolved: 2023-08-02

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.9, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, need_rr, recovery, regression

Attachments: File MDEV-31827-data.tar.xz    
Issue Links:
Problem/Incident
is caused by MDEV-29911 InnoDB recovery and mariadb-backup --... Closed

 Description   

In the final testing of MDEV-31791, I observed the following failure:

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w4 [ 89 fail ]  Found warnings/errors in server log file!
        Test ended at 2023-08-02 12:21:21
line
2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

A larger excerpt of the server error log indicates that we actually scanned more of the log, but for some reason failed to fully apply the final recovery batch and stopped prematurely at LSN 418951162 instead of 421869561:

2023-08-02 12:21:18 0 [Note] InnoDB: Multi-batch recovery needed at LSN 418950921
2023-08-02 12:21:18 0 [Note] InnoDB: End of log at LSN=421869561
2023-08-02 12:21:18 0 [Note] InnoDB: To recover: LSN 418950921/421869561; 106 pages
2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
2023-08-02 12:21:18 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 15548 row operations to undo
2023-08-02 12:21:18 0 [Note] InnoDB: Trx id counter is 652795
2023-08-02 12:21:18 0 [Note] InnoDB: Opened 3 undo tablespaces
2023-08-02 12:21:18 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
2023-08-02 12:21:18 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
...
2023-08-02 12:21:19 0 [Note] InnoDB: log sequence number 418951162; transaction id 652796

This might be a regression due to MDEV-29911.



 Comments   
Comment by Marko Mäkelä [ 2023-08-02 ]

Here is another occurrence that looks a little different (single-batch recovery):

innodb.recovery_memory 'innodb,release'  w29 [ 13 fail ]
        Test ended at 2023-08-02 13:44:18
 
CURRENT_TEST: innodb.recovery_memory
mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
2023-08-02 13:44:17 0 [Note] InnoDB: Completed initialization of buffer pool
2023-08-02 13:44:17 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2023-08-02 13:44:17 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=62853214
2023-08-02 13:44:17 0 [Note] InnoDB: End of log at LSN=62856487
2023-08-02 13:44:17 0 [Note] InnoDB: Ignoring a doublewrite copy of page [page id: space=2, page number=36] with future log sequence number 62857383
2023-08-02 13:44:17 0 [ERROR] InnoDB: Page [page id: space=0, page number=7] log sequence number 62864128 is in the future! Current system log sequence number 62856588.

Another run, with one more patch and some additional locking to rule out a problem of misbehaving environment (MDEV-28976):

./mtr --mysqld=--external-locking=1 --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index f12fe1807eb..f9e0cb8284f 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -533,7 +533,7 @@ static void buf_page_check_lsn(bool check_lsn, const byte* read_buf)
 			const uint32_t page_no = mach_read_from_4(
 				read_buf + FIL_PAGE_OFFSET);
 
-			ib::error() << "Page " << page_id_t(space_id, page_no)
+			ib::fatal() << "Page " << page_id_t(space_id, page_no)
 				<< " log sequence number " << page_lsn
 				<< " is in the future! Current system"
 				<< " log sequence number "

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w21 [ 64 fail ]
        Test ended at 2023-08-02 14:12:27
 
CURRENT_TEST: innodb.recovery_memory
mysqltest: At line 42: query 'CREATE TABLE t1(f1 INT NOT NULL)ENGINE=InnoDB' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
2023-08-02 14:12:26 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2023-08-02 14:12:26 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=324689424
2023-08-02 14:12:26 0 [Note] InnoDB: End of log at LSN=324689897
2023-08-02 14:12:26 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 251
2023-08-02 14:12:26 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption

Comment by Marko Mäkelä [ 2023-08-02 ]

One more failure:

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w35 [ 15 fail ]  Found warnings/errors in server log file!
        Test ended at 2023-08-02 14:26:09
line
2023-08-02 14:26:05 0 [ERROR] InnoDB: Expected tablespace id 58 but found 0 in the file ./test/t1.ibd

This might be MDEV-31826. To prevent that, I applied a temporary fix:

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index a4a5dcf1aa3..f1304418870 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -1756,12 +1756,16 @@ pfs_os_file_t fil_delete_tablespace(uint32_t id)
 {
   ut_ad(!is_system_tablespace(id));
   pfs_os_file_t handle= OS_FILE_CLOSED;
+  mtr_t mtr;
+  mtr.start();
+  mysql_mutex_lock(&fil_system.mutex);
+  if (fil_space_t *space= fil_space_get_by_id(id))
+    mtr.log_file_op(FILE_DELETE, id, space->chain.start->name);
+  mysql_mutex_unlock(&fil_system.mutex);
+
   if (fil_space_t *space= fil_space_t::check_pending_operations(id))
   {
     /* Before deleting the file(s), persistently write a log record. */
-    mtr_t mtr;
-    mtr.start();
-    mtr.log_file_op(FILE_DELETE, id, space->chain.start->name);
     mtr.commit_file(*space, nullptr, &handle);
     fil_space_free_low(space);
   }

Comment by Marko Mäkelä [ 2023-08-02 ]

Here is one more failure, suggesting some kind of redo log corruption (premature end of the log when parsing):

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w85 [ 3 fail ]
        Test ended at 2023-08-02 14:46:41
 
CURRENT_TEST: innodb.recovery_memory
mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
2023-08-02 14:46:40 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2023-08-02 14:46:40 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT(10032912) at 10680209
2023-08-02 14:46:40 0 [ERROR] InnoDB: Log scan aborted at LSN 10680209
2023-08-02 14:46:40 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error

I believe that in order to make any progress with this, we need rr replay traces of failures. On my system, everything appears to be fine if the databases reside in /dev/shm and the memory-mapped log ("fake PMEM") is being used. It might merely have something to do with timing.

Comment by Marko Mäkelä [ 2023-08-02 ]

With the memory-mapped log ("fake PMEM" on /dev/shm), the test appears rock solid to me.

I can’t tell if the rather frequent failures on Linux tmpfs when using normal pwrite() calls for the ib_logfile0 are due to timing differences or due to a bug in my kernel:

Linux version 6.3.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.3.0-5) 12.3.0, GNU ld (GNU Binutils for Debian) 2.40.50.20230625) #1 SMP PREEMPT_DYNAMIC Debian 6.3.11-1 (2023-07-01)

I ran one test round on ext4fs on NVMe storage and finally reproduced the bug in the Description, this time with a core dump thanks to my modification of buf_page_check_lsn():

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

2023-08-02 15:41:14 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
2023-08-02 15:41:14 0 [Note] InnoDB: Completed initialization of buffer pool
2023-08-02 15:41:14 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
2023-08-02 15:41:14 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=271813827
2023-08-02 15:41:14 0 [Note] InnoDB: Multi-batch recovery needed at LSN 272313244
2023-08-02 15:41:14 0 [Note] InnoDB: End of log at LSN=276035576
2023-08-02 15:41:15 0 [ERROR] [FATAL] InnoDB: Page [page id: space=1, page number=312] log sequence number 274108480 is in the future! Current system log sequence number 272313346.
#7  0x00005634b7132be7 in ib::fatal::~fatal (this=this@entry=0x7ffc8c806ba0, __in_chrg=<optimized out>) at /mariadb/11/storage/innobase/ut/ut0ut.cc:527
#8  0x00005634b6fb44de in buf_page_check_lsn (read_buf=read_buf@entry=0x7fdb3a9c8000 "", check_lsn=<optimized out>) at /mariadb/11/storage/innobase/buf/buf0buf.cc:536
#9  0x00005634b6fb2c91 in buf_page_check_lsn (read_buf=0x7fdb3a9c8000 "", check_lsn=true) at /mariadb/11/storage/innobase/buf/buf0buf.cc:522
#10 buf_page_is_corrupted (check_lsn=check_lsn@entry=true, read_buf=read_buf@entry=0x7fdb3a9c8000 "", fsp_flags=<optimized out>) at /mariadb/11/storage/innobase/buf/buf0buf.cc:611
#11 0x00005634b6fb3170 in buf_page_check_corrupt (node=@0x5634b924d870: {space = 0x5634b924d740, name = 0x5634b924d8c0 ".//undo001", handle = {m_file = 9}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 1216, init_size = 0, max_size = 4294967295, being_extended = {m = std::atomic<bool> = { false }}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}, bpage=0x7fdb3a9c3000) at /mariadb/11/storage/innobase/buf/buf0buf.cc:3160
#12 buf_page_t::read_complete (this=this@entry=0x7fdb3a9c3000, node=@0x5634b924d870: {space = 0x5634b924d740, name = 0x5634b924d8c0 ".//undo001", handle = {m_file = 9}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 1216, init_size = 0, max_size = 4294967295, being_extended = {m = std::atomic<bool> = { false }}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at /mariadb/11/storage/innobase/buf/buf0buf.cc:3251
#13 0x00005634b70009d7 in buf_read_page_low (page_id={m_id = 4294967608}, zip_size=zip_size@entry=0, chain=@0x5634b9246058: {first = 0x7fdb3a9c3000}, space=space@entry=0x5634b924d740, block=@0x7ffc8c807028: 0x0, sync=sync@entry=true) at /mariadb/11/storage/innobase/buf/buf0rea.cc:263
#14 0x00005634b7000dca in buf_read_page (page_id=page_id@entry={m_id = 4294967608}, zip_size=zip_size@entry=0, chain=@0x5634b9246058: {first = 0x7fdb3a9c3000}) at /mariadb/11/storage/innobase/buf/buf0rea.cc:436
#15 0x00005634b6fb74d8 in buf_page_get_low (page_id={m_id = 4294967608}, zip_size=0, rw_latch=2, guess=0x0, mode=10, mtr=0x7ffc8c807340, err=0x0) at /mariadb/11/storage/innobase/buf/buf0buf.cc:2363
#16 0x00005634b6fb2f55 in buf_page_get_gen (page_id=<optimized out>, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>) at /mariadb/11/storage/innobase/buf/buf0buf.cc:2656
#17 0x00005634b712c568 in trx_undo_mem_create_at_db_start (rseg=rseg@entry=0x5634b73df940 <trx_sys+17152>, id=id@entry=0, page_no=312) at /mariadb/11/storage/innobase/trx/trx0undo.cc:1017
#18 0x00005634b7123817 in trx_undo_lists_init (rseg_header=0x7fdb3aac5620, rseg=<optimized out>) at /mariadb/11/storage/innobase/trx/trx0rseg.cc:424
#19 trx_rseg_mem_restore (mtr=0x7ffc8c807800, rseg=<optimized out>) at /mariadb/11/storage/innobase/trx/trx0rseg.cc:495
#20 trx_rseg_array_init () at /mariadb/11/storage/innobase/trx/trx0rseg.cc:611
#21 0x00005634b7128570 in trx_lists_init_at_db_start () at /mariadb/11/storage/innobase/trx/trx0trx.cc:721
#22 0x00005634b710cdc0 in srv_start (create_new_db=false) at /mariadb/11/storage/innobase/srv/srv0start.cc:1561

The function srv_start() would invoke recv_sys.apply(true) to apply the last batch of log records a little later.

Comment by Marko Mäkelä [ 2023-08-02 ]

The LSN check in buf_page_check_lsn() is guarded both by the function parameter check_lsn and by the global variable recv_lsn_checks_on. That variable would be set in recv_recovery_from_checkpoint_start(), after parsing the log.

In MDEV-29911, the parsing of the log was changed, and recv_recovery_from_checkpoint_start() may no longer assign to log_sys.lsn the latest LSN. In the core dump that I am analyzing, I see log_sys.write_lsn=recv_sys.lsn=272313244, which corresponds to the LSN at which we noticed that multi-batch recovery will be needed. The end of the log was parsed at 276035576.

I am yet to evaluate whether this also affects the 10.6 version of MDEV-29911, or only 10.9 and later versions.

Comment by Marko Mäkelä [ 2023-08-02 ]

The recovery fails nondeterministically. An attempt to recover a copy of the failed data directory happened to works just fine, at least under rr record:

2023-08-02 16:43:13 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
2023-08-02 16:43:13 0 [Note] InnoDB: Completed initialization of buffer pool
2023-08-02 16:43:13 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2023-08-02 16:43:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=271813827
2023-08-02 16:43:13 0 [Note] InnoDB: Multi-batch recovery needed at LSN 272340327
2023-08-02 16:43:13 0 [Note] InnoDB: End of log at LSN=276035576
2023-08-02 16:43:13 0 [Note] InnoDB: To recover: LSN 272340327/276035576; 278 pages
2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 273245127/276035576; 54 pages
2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 274181976/276035576; 55 pages
2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 275118954/276035576; 54 pages
2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 276008436/276035576; 59 pages
2023-08-02 16:43:14 0 [Note] InnoDB: To recover: 7 pages
2023-08-02 16:43:14 0 [Warning] InnoDB: Cannot change innodb_undo_tablespaces=0 because previous shutdown was not with innodb_fast_shutdown=0
2023-08-02 16:43:14 0 [Note] InnoDB: log sequence number 276035576; transaction id 535920

In the rr replay of that execution, I see that recv_sys.lsn (which would be assigned to the log_sys.lsn before recv_lsn_checks_on is set) was last updated to the correct value here:

#0  recv_sys_t::parse<recv_buf, true> (
    this=this@entry=0x55dd04a86c40 <recv_sys>, 
    l=@0x7ffffe498a68: {ptr = 0x7f29b0eab3f8 ""}, 
    if_exists=if_exists@entry=true)
    at /mariadb/11/storage/innobase/include/log0log.h:313
#1  0x000055dd04733452 in recv_sys_t::parse_mtr<true> (if_exists=true)
    at /mariadb/11/storage/innobase/log/log0recv.cc:2996
#2  recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=true)
    at /mariadb/11/storage/innobase/log/log0recv.cc:3007
#3  0x000055dd04724f98 in recv_scan_log (last_phase=last_phase@entry=true)
    at /mariadb/11/storage/innobase/log/log0recv.cc:4009
#4  0x000055dd04725d07 in recv_recovery_from_checkpoint_start ()
    at /mariadb/11/storage/innobase/log/log0recv.cc:4506
#5  0x000055dd047ba383 in srv_start (create_new_db=false)
    at /mariadb/11/storage/innobase/srv/srv0start.cc:1513

It looks like we were reading the log wrong, hitting premature EOF while re-reading it. In the core dump, I saw recv_sys.lsn=272313244 and recv_sys.scanned_lsn=276035576. I will rerun the test with an additional assertion, hoping to catch this earlier.

Comment by Marko Mäkelä [ 2023-08-02 ]

The attached MDEV-31827-data.tar.xz triggers an assertion failure of the following patch:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 6f58be6ed35..07887e082c9 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -4662,6 +4662,7 @@ dberr_t recv_recovery_from_checkpoint_start()
 	}
 
 	mysql_mutex_lock(&recv_sys.mutex);
+	ut_a(recv_sys.scanned_lsn == recv_sys.lsn);
 	recv_sys.apply_log_recs = true;
 	recv_no_ibuf_operations = false;
 	ut_d(recv_no_log_write = srv_operation == SRV_OPERATION_RESTORE

If the datadir is located in /dev/shm and therefore the "fake PMEM" is being used, the dataset will recover fine as far as InnoDB is concerned:

10.9 8760fe11b89450d56269ed062b36b4238d45707f

2023-08-02 17:31:00 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
2023-08-02 17:31:00 0 [Note] InnoDB: Completed initialization of buffer pool
2023-08-02 17:31:00 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
2023-08-02 17:31:00 0 [Note] InnoDB: Opened 3 undo tablespaces
2023-08-02 17:31:00 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2023-08-02 17:31:00 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=72811914
2023-08-02 17:31:00 0 [Note] InnoDB: Multi-batch recovery needed at LSN 73324548
2023-08-02 17:31:00 0 [Note] InnoDB: End of log at LSN=73844162
2023-08-02 17:31:00 0 [Note] InnoDB: To recover: LSN 73324548/73844162; 328 pages
2023-08-02 17:31:00 0 [Note] InnoDB: To recover: LSN 73808457/73844162; 315 pages
2023-08-02 17:31:00 0 [ERROR] InnoDB: The change buffer is corrupted or has been removed on upgrade to MariaDB 11.0 or later
2023-08-02 17:31:00 0 [Note] InnoDB: Opened 3 undo tablespaces
2023-08-02 17:31:00 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2023-08-02 17:31:00 0 [Note] InnoDB: To recover: 5 pages
2023-08-02 17:31:00 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.

Comment by Marko Mäkelä [ 2023-08-02 ]

This patch did the trick:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 6f58be6ed35..d8d976f5e67 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -4147,7 +4147,8 @@ static bool recv_scan_log(bool last_phase)
     if (recv_sys.is_corrupt_log())
       break;
 
-    if (recv_sys.offset < log_sys.get_block_size())
+    if (recv_sys.offset < log_sys.get_block_size() &&
+        recv_sys.lsn == recv_sys.scanned_lsn)
       goto got_eof;
 
     if (recv_sys.offset > buf_size / 4 ||

The failure scenario was that we did premature goto got_eof when the very first mini-transaction at the "rewind LSN" wraps around the circular ib_logfile0 to byte offset 12288. The mini-transaction started 2048-4 bytes before the end of ib_logfile0. 2048 bytes were read from the end of the file into the parsing buffer, and the recv_sys.offset was 4.

After applying that patch (and retaining all above mentioned assertions), while testing on /run/user/$UID (tmpfs with pwrite() and pread() based ib_logfile0 access) I only got a timing-related failure that does not affect correctness:

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w79 [ 41 fail ]  Found warnings/errors in server log file!
        Test ended at 2023-08-02 18:21:00
line
2023-08-02 18:20:55 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync): 0. 480 OS file reads, 430 OS file writes, 0 OS fsyncs.

I restarted the test with innodb_log_file_size=5m (half the mtr default size), which should improve the failure rate. Got the same non-failure again:

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w34 [ 20 fail ]  Found warnings/errors in server log file!
        Test ended at 2023-08-02 18:40:52
line
2023-08-02 18:40:48 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync): 0. 595 OS file reads, 432 OS file writes, 0 OS fsyncs.

I temporarily suppressed that warning (it should eventually be investigated and fixed), hoping to get 100×100 successful test runs with regular file I/O on RAM disk. We should have the results in 30 minutes.

Comment by Marko Mäkelä [ 2023-08-02 ]

The following test with the patch and the suppressed warning was successful, on /run/user/$UID:

./mtr --mysqld=--innodb-log-file-size=5m --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w11 [ 100 pass ]  14533
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 158268.156 of 1651 seconds executing testcases
 
Completed: All 10000 tests were successful.

Generated at Thu Feb 08 10:26:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.