[MDEV-31791] Crash recovery in the test innodb.recovery_memory occasionally fails Created: 2023-07-28  Updated: 2023-08-02  Resolved: 2023-08-02

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, recovery, rr-profile-analyzed

Issue Links:
Problem/Incident
is caused by MDEV-14425 Change the InnoDB redo log format to ... Closed
Relates
relates to MDEV-31354 SIGSEGV in log_sort_flush_list() in I... Closed

 Description   

While trying to reproduce MDEV-31354, I observed that in the test innodb.recovery_memory, InnoDB occasionally fails to recover, because it is reading garbage, typically from an undo tablespace page. This occurs also with innodb_use_native_aio=0.

We should try to reproduce this with rr record so that the root cause can be analyzed.



 Comments   
Comment by Matthias Leich [ 2023-07-31 ]

pluto:/data/Server_bin/bb-11.2-MDEV-14795A_asan_Og/mariadb-test$ _RR_TRACE_DIR=MDEV-31791/195/log/mysqld.1.rr/ rr replay --mark-stdio

Comment by Marko Mäkelä [ 2023-08-01 ]

Thank you, mleich. The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page:

{0x34, 0x1, 0x34, 0x19, 0x6}

I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage:

#0  recv_ring::get_buf (decrypt_buf=0x7ffc690f8200 "", start=<optimized out>, buf=0x7efcc4ebd001 "8\001X\204r", this=0x7ffc690fcfe0) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2290
#1  recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55ecf371b700 <recv_sys>, l=@0x7ffc690fcfe0: {<recv_buf> = {ptr = 0x7efcc4ebd001 "8\001X\204r"}, <No data fields>}, if_exists=if_exists@entry=false)
    at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826
#2  0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199
#3  0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014

It may be notable that the parsing buffer is log_sys.buf+0x3001, that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF.

I will rerun the test somewhere else than /dev/shm, to see if the MDEV-14425 "fake PMEM" could be involved here.

Comment by Marko Mäkelä [ 2023-08-01 ]

The "fake PMEM" is not the culprit (and we did test buffer wrap-around in MDEV-14425), because I can reproduce this also as follows:

mkdir /run/user/$UID/var
rm -fr var
ln -sf /run/user/$UID/var var
./mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}

11.2 e81fa345020ec6a067583db6a7019d6404b26f93

innodb.recovery_memory 'innodb,release'  w65 [ 14 fail ]
        Test ended at 2023-08-01 17:22:19
 
CURRENT_TEST: innodb.recovery_memory
mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

Comment by Marko Mäkelä [ 2023-08-02 ]

In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000:

bytes description
0x34, 0x1, 0x0, 0x75, 0xb2 WRITE, page 1:0
0x35, 0x1, 0x58, 0x84, 0x39, 0xb1 WRITE, page 1:0x58
0xc3, 0x8, 0x4, 0xff MEMSET, same page
0xd3, 0x2, 0x6, 0xb MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff MEMSET, same page
0xd3, 0x2, 0x6, 0xb MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff MEMSET, same page
0xd3, 0x2, 0x6, 0xb MEMMOVE, same page
0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2 WRITE, same page
0xc4, 0x0, 0x80, 0x0, 0xff MEMSET, same page
0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe WRITE, page 1:0
0x34, 0x1, 0x0, 0x3d, 0x37 WRITE, page 1:0
0x12, 0x1, 0x34 INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00)
0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34 WRITE, page 1:0x58
0x34, 0x1, 0x34, 0x19, 0x6 WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record)
0xb3, 0x2a, 0x4, 0xb2 … WRITE, same page …

This bug is something that must have been introduced by MDEV-14425.

Comment by Marko Mäkelä [ 2023-08-02 ]

I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed:

diff --git a/mysql-test/suite/innodb/t/recovery_memory.opt b/mysql-test/suite/innodb/t/recovery_memory.opt
index 7207fd27a42..87c20c9abad 100644
--- a/mysql-test/suite/innodb/t/recovery_memory.opt
+++ b/mysql-test/suite/innodb/t/recovery_memory.opt
@@ -1 +1 @@
---innodb_buffer_pool_size=1073741824
+--innodb_buffer_pool_size=1073741824 --innodb-log-file-size=5m
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 66b763fdf2e..19b25a0e0f4 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -2303,8 +2303,6 @@ struct recv_ring : public recv_buf
   const byte *copy_if_needed(const byte *iv, byte *tmp, recv_ring start,
                              size_t len)
   {
-    if (!len)
-      return ptr;
     const size_t s(*this - start);
     ut_ad(s + len <= srv_page_size);
     if (!log_sys.is_encrypted())
@@ -2658,6 +2656,8 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)
         store_freed_or_init_rec(id, (b & 0x70) == FREE_PAGE);
         if (UNIV_UNLIKELY(rlen != 0))
           goto record_corrupted;
+      copy_if_needed:
+        cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);
         break;
       case EXTENDED:
         if (UNIV_UNLIKELY(!rlen))
@@ -2689,10 +2689,7 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)
         break;
       case OPTION:
         if (rlen == 5 && *l == OPT_PAGE_CHECKSUM)
-        {
-          cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);
-          break;
-        }
+          goto copy_if_needed;
         /* fall through */
       case RESERVED:
         continue;

Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.

Comment by Marko Mäkelä [ 2023-08-02 ]

The above patch (with the fix of MDEV-31354 additionally applied) worked for me.

./mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w60 [ 100 pass ]  14458
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 157475.278 of 1635 seconds executing testcases
 
Completed: All 10000 tests were successful.

I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.

Comment by Marko Mäkelä [ 2023-08-02 ]

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w73 [ 100 pass ]  22498
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 166775.724 of 1760 seconds executing testcases
 
Completed: All 10000 tests were successful.

Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of MDEV-31354 in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as MDEV-31827) was encountered:

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

innodb.recovery_memory 'innodb,release'  w4 [ 89 fail ]  Found warnings/errors in server log file!
        Test ended at 2023-08-02 12:21:21
line
2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

Generated at Thu Feb 08 10:26:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.