[MDEV-31791] Crash recovery in the test innodb.recovery_memory occasionally fails - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1
Component/s: Storage Engine - InnoDB
Labels:

Description

While trying to reproduce ~~MDEV-31354~~, I observed that in the test innodb.recovery_memory, InnoDB occasionally fails to recover, because it is reading garbage, typically from an undo tablespace page. This occurs also with innodb_use_native_aio=0.

We should try to reproduce this with rr record so that the root cause can be analyzed.

Attachments

Issue Links

is caused by

MDEV-14425 Change the InnoDB redo log format to reduce write amplification

Closed

relates to

MDEV-31354 SIGSEGV in log_sort_flush_list() in InnoDB crash recovery

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä created issue - 2023-07-28 09:30

Marko Mäkelä made changes - 2023-07-28 09:30

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-31354~~ [ ~~MDEV-31354~~ ]

Matthias Leich added a comment - 2023-07-31 13:22

pluto:/data/Server_bin/bb-11.2-MDEV-14795A_asan_Og/mariadb-test$ _RR_TRACE_DIR=MDEV-31791/195/log/mysqld.1.rr/ rr replay --mark-stdio

Matthias Leich added a comment - 2023-07-31 13:22 pluto:/data/Server_bin/bb-11.2-MDEV-14795A_asan_Og/mariadb-test$ _RR_TRACE_DIR=MDEV-31791/195/log/mysqld.1.rr/ rr replay --mark-stdio

Matthias Leich made changes - 2023-08-01 09:15

Labels

corruption need_rr recovery

corruption recovery

Marko Mäkelä added a comment - 2023-08-01 14:11

Thank you, mleich. The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page:

{0x34, 0x1, 0x34, 0x19, 0x6}

I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage:

#0  recv_ring::get_buf (decrypt_buf=0x7ffc690f8200 "", start=<optimized out>, buf=0x7efcc4ebd001 "8\001X\204r", this=0x7ffc690fcfe0) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2290

#1  recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55ecf371b700 <recv_sys>, l=@0x7ffc690fcfe0: {<recv_buf> = {ptr = 0x7efcc4ebd001 "8\001X\204r"}, <No data fields>}, if_exists=if_exists@entry=false)

    at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826

#2  0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199

#3  0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014

It may be notable that the parsing buffer is log_sys.buf+0x3001, that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF.

I will rerun the test somewhere else than /dev/shm, to see if the ~~MDEV-14425~~ "fake PMEM" could be involved here.

Marko Mäkelä added a comment - 2023-08-01 14:11 Thank you, mleich . The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page: {0x34, 0x1, 0x34, 0x19, 0x6} I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage: #0 recv_ring::get_buf (decrypt_buf=0x7ffc690f8200 "", start=<optimized out>, buf=0x7efcc4ebd001 "8\001X\204r", this=0x7ffc690fcfe0) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2290 #1 recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55ecf371b700 <recv_sys>, l=@0x7ffc690fcfe0: {<recv_buf> = {ptr = 0x7efcc4ebd001 "8\001X\204r"}, <No data fields>}, if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826 #2 0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199 #3 0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014 It may be notable that the parsing buffer is log_sys.buf+0x3001 , that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF . I will rerun the test somewhere else than /dev/shm , to see if the MDEV-14425 "fake PMEM" could be involved here.

Marko Mäkelä added a comment - 2023-08-01 14:28

The "fake PMEM" is not the culprit (and we did test buffer wrap-around in ~~MDEV-14425~~), because I can reproduce this also as follows:

mkdir /run/user/$UID/var

rm -fr var

ln -sf /run/user/$UID/var var

./mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}

11.2 e81fa345020ec6a067583db6a7019d6404b26f93
innodb.recovery_memory 'innodb,release' w65 [ 14 fail ]
Test ended at 2023-08-01 17:22:19

CURRENT_TEST: innodb.recovery_memory
mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

Marko Mäkelä added a comment - 2023-08-01 14:28 The "fake PMEM" is not the culprit (and we did test buffer wrap-around in MDEV-14425 ), because I can reproduce this also as follows: mkdir /run/user/ $UID /var rm -fr var ln -sf /run/user/ $UID /var var . /mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,} 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 innodb.recovery_memory 'innodb,release' w65 [ 14 fail ] Test ended at 2023-08-01 17:22:19 CURRENT_TEST: innodb.recovery_memory mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

Marko Mäkelä added a comment - 2023-08-02 06:50

In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000:

bytes	description
0x34, 0x1, 0x0, 0x75, 0xb2	WRITE, page 1:0
0x35, 0x1, 0x58, 0x84, 0x39, 0xb1	WRITE, page 1:0x58
0xc3, 0x8, 0x4, 0xff	MEMSET, same page
0xd3, 0x2, 0x6, 0xb	MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff	MEMSET, same page
0xd3, 0x2, 0x6, 0xb	MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff	MEMSET, same page
0xd3, 0x2, 0x6, 0xb	MEMMOVE, same page
0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2	WRITE, same page
0xc4, 0x0, 0x80, 0x0, 0xff	MEMSET, same page
0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe	WRITE, page 1:0
0x34, 0x1, 0x0, 0x3d, 0x37	WRITE, page 1:0
0x12, 0x1, 0x34	INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00)
0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34	WRITE, page 1:0x58
0x34, 0x1, 0x34, 0x19, 0x6	WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record)
0xb3, 0x2a, 0x4, 0xb2 …	WRITE, same page …

This bug is something that must have been introduced by ~~MDEV-14425~~.

Marko Mäkelä added a comment - 2023-08-02 06:50 In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000 : bytes description 0x34, 0x1, 0x0, 0x75, 0xb2 WRITE, page 1:0 0x35, 0x1, 0x58, 0x84, 0x39, 0xb1 WRITE, page 1:0x58 0xc3, 0x8, 0x4, 0xff MEMSET, same page 0xd3, 0x2, 0x6, 0xb MEMMOVE, same page 0xc3, 0x4, 0x4, 0xff MEMSET, same page 0xd3, 0x2, 0x6, 0xb MEMMOVE, same page 0xc3, 0x4, 0x4, 0xff MEMSET, same page 0xd3, 0x2, 0x6, 0xb MEMMOVE, same page 0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2 WRITE, same page 0xc4, 0x0, 0x80, 0x0, 0xff MEMSET, same page 0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe WRITE, page 1:0 0x34, 0x1, 0x0, 0x3d, 0x37 WRITE, page 1:0 0x12, 0x1, 0x34 INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00) 0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34 WRITE, page 1:0x58 0x34, 0x1, 0x34, 0x19, 0x6 WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record) 0xb3, 0x2a, 0x4, 0xb2 … WRITE, same page … This bug is something that must have been introduced by MDEV-14425 .

Marko Mäkelä made changes - 2023-08-02 06:50

Link

This issue is caused by ~~MDEV-14425~~ [ ~~MDEV-14425~~ ]

Marko Mäkelä made changes - 2023-08-02 06:50

Fix Version/s		10.9 [ 26905 ]
Fix Version/s		10.10 [ 27530 ]
Fix Version/s		10.11 [ 27614 ]
Fix Version/s		11.0 [ 28320 ]
Fix Version/s		11.1 [ 28549 ]
Affects Version/s		10.8 [ 26121 ]
Affects Version/s		10.9 [ 26905 ]
Affects Version/s		10.10 [ 27530 ]
Affects Version/s		10.11 [ 27614 ]
Affects Version/s		11.0 [ 28320 ]
Affects Version/s		11.1 [ 28549 ]
Assignee	Matthias Leich [ mleich ]	Marko Mäkelä [ marko ]
Labels	corruption recovery	corruption recovery rr-profile-analyzed
Priority	Critical [ 2 ]	Blocker [ 1 ]

Marko Mäkelä added a comment - 2023-08-02 07:41

I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed:

diff --git a/mysql-test/suite/innodb/t/recovery_memory.opt b/mysql-test/suite/innodb/t/recovery_memory.opt

index 7207fd27a42..87c20c9abad 100644

--- a/mysql-test/suite/innodb/t/recovery_memory.opt

+++ b/mysql-test/suite/innodb/t/recovery_memory.opt

@@ -1 +1 @@

---innodb_buffer_pool_size=1073741824

+--innodb_buffer_pool_size=1073741824 --innodb-log-file-size=5m

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

index 66b763fdf2e..19b25a0e0f4 100644

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -2303,8 +2303,6 @@ struct recv_ring : public recv_buf

   const byte *copy_if_needed(const byte *iv, byte *tmp, recv_ring start,

                              size_t len)

-    if (!len)

-      return ptr;

     const size_t s(*this - start);

     ut_ad(s + len <= srv_page_size);

     if (!log_sys.is_encrypted())

@@ -2658,6 +2656,8 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)

         store_freed_or_init_rec(id, (b & 0x70) == FREE_PAGE);

         if (UNIV_UNLIKELY(rlen != 0))

           goto record_corrupted;

+      copy_if_needed:

+        cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);

         break;

       case EXTENDED:

         if (UNIV_UNLIKELY(!rlen))

@@ -2689,10 +2689,7 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)

         break;

       case OPTION:

         if (rlen == 5 && *l == OPT_PAGE_CHECKSUM)

-        {

-          cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);

-          break;

-        }

+          goto copy_if_needed;

         /* fall through */

       case RESERVED:

         continue;

Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.

Marko Mäkelä added a comment - 2023-08-02 07:41 I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed: diff --git a/mysql-test/suite/innodb/t/recovery_memory.opt b/mysql-test/suite/innodb/t/recovery_memory.opt index 7207fd27a42..87c20c9abad 100644 --- a/mysql-test/suite/innodb/t/recovery_memory.opt +++ b/mysql-test/suite/innodb/t/recovery_memory.opt @@ -1 +1 @@ ---innodb_buffer_pool_size=1073741824 +--innodb_buffer_pool_size=1073741824 --innodb-log-file-size=5m diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 66b763fdf2e..19b25a0e0f4 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2303,8 +2303,6 @@ struct recv_ring : public recv_buf const byte *copy_if_needed(const byte *iv, byte *tmp, recv_ring start, size_t len) { - if (!len) - return ptr; const size_t s(*this - start); ut_ad(s + len <= srv_page_size); if (!log_sys.is_encrypted()) @@ -2658,6 +2656,8 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists) store_freed_or_init_rec(id, (b & 0x70) == FREE_PAGE); if (UNIV_UNLIKELY(rlen != 0)) goto record_corrupted; + copy_if_needed: + cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen); break; case EXTENDED: if (UNIV_UNLIKELY(!rlen)) @@ -2689,10 +2689,7 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists) break; case OPTION: if (rlen == 5 && *l == OPT_PAGE_CHECKSUM) - { - cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen); - break; - } + goto copy_if_needed; /* fall through */ case RESERVED: continue; Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.

Marko Mäkelä added a comment - 2023-08-02 08:14

The above patch (with the fix of ~~MDEV-31354~~ additionally applied) worked for me.

./mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
…
innodb.recovery_memory 'innodb,release' w60 [ 100 pass ] 14458
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 157475.278 of 1635 seconds executing testcases

Completed: All 10000 tests were successful.

I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.

Marko Mäkelä added a comment - 2023-08-02 08:14 The above patch (with the fix of MDEV-31354 additionally applied) worked for me. . /mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,} 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches … innodb.recovery_memory 'innodb,release' w60 [ 100 pass ] 14458 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 157475.278 of 1635 seconds executing testcases Completed: All 10000 tests were successful. I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.

Marko Mäkelä added a comment - 2023-08-02 09:45

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
…
innodb.recovery_memory 'innodb,release' w73 [ 100 pass ] 22498
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 166775.724 of 1760 seconds executing testcases

Completed: All 10000 tests were successful.

Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of ~~MDEV-31354~~ in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as ~~MDEV-31827~~) was encountered:

11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
innodb.recovery_memory 'innodb,release' w4 [ 89 fail ] Found warnings/errors in server log file!
Test ended at 2023-08-02 12:21:21
line
2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

Marko Mäkelä added a comment - 2023-08-02 09:45 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches … innodb.recovery_memory 'innodb,release' w73 [ 100 pass ] 22498 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 166775.724 of 1760 seconds executing testcases Completed: All 10000 tests were successful. Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of MDEV-31354 in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as MDEV-31827 ) was encountered: 11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches innodb.recovery_memory 'innodb,release' w4 [ 89 fail ] Found warnings/errors in server log file! Test ended at 2023-08-02 12:21:21 line 2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt! 2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369. 2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. 2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369. 2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

Marko Mäkelä made changes - 2023-08-02 10:28

issue.field.resolutiondate

2023-08-02 10:28:57.0

2023-08-02 10:28:57.101

Marko Mäkelä made changes - 2023-08-02 10:29

Fix Version/s		10.9.8 [ 29015 ]
Fix Version/s		10.10.6 [ 29017 ]
Fix Version/s		10.11.5 [ 29019 ]
Fix Version/s		11.0.3 [ 28920 ]
Fix Version/s		11.1.2 [ 28921 ]
Fix Version/s		11.2.1 [ 29034 ]
Fix Version/s	10.9 [ 26905 ]
Fix Version/s	10.10 [ 27530 ]
Fix Version/s	10.11 [ 27614 ]
Fix Version/s	11.0 [ 28320 ]
Fix Version/s	11.1 [ 28549 ]
Fix Version/s	11.2 [ 28603 ]
Resolution		Fixed [ 1 ]
Status	Open [ 1 ]	Closed [ 6 ]

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2023-07-28 09:30

Updated:: 2023-08-02 10:28

Resolved:: 2023-08-02 10:28

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server