While trying to reproduce MDEV-31354, I observed that in the test innodb.recovery_memory, InnoDB occasionally fails to recover, because it is reading garbage, typically from an undo tablespace page. This occurs also with innodb_use_native_aio=0.
We should try to reproduce this with rr record so that the root cause can be analyzed.
Attachments
Issue Links
is caused by
MDEV-14425Change the InnoDB redo log format to reduce write amplification
Closed
relates to
MDEV-31354SIGSEGV in log_sort_flush_list() in InnoDB crash recovery
Thank you, mleich. The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page:
{0x34, 0x1, 0x34, 0x19, 0x6}
I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage:
at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826
#2 0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199
#3 0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014
It may be notable that the parsing buffer is log_sys.buf+0x3001, that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF.
I will rerun the test somewhere else than /dev/shm, to see if the MDEV-14425 "fake PMEM" could be involved here.
Marko Mäkelä
added a comment - Thank you, mleich . The trace is of a debug-instrumented server. It fails on the second log_phys_t::apply() on page 1:0x34, which is a WRITE to the page:
{0x34, 0x1, 0x34, 0x19, 0x6}
I think that the problem is that the first record on which log_phys_t::apply() is invoked is corrupted. I would expect it to be INIT_PAGE (0x12 0x01 0x34), but I am seeing 0x00 0x40 0x00, and log_phys_t::apply() would treat the first 0x00 as an end-of-record marker, making the first record a no-op. Because no INIT_PAGE record was applied, a page number mismatch in the page frame would trip an debug assertion while trying to apply the second record. It looks like this record was copied due to an error already during the parsing stage:
#0 recv_ring::get_buf (decrypt_buf=0x7ffc690f8200 "", start=<optimized out>, buf=0x7efcc4ebd001 "8\001X\204r", this=0x7ffc690fcfe0) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2290
#1 recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55ecf371b700 <recv_sys>, l=@0x7ffc690fcfe0: {<recv_buf> = {ptr = 0x7efcc4ebd001 "8\001X\204r"}, <No data fields>}, if_exists=if_exists@entry=false)
at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2826
#2 0x000055ecf185bbf2 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:2199
#3 0x000055ecf1810a60 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-14795A/storage/innobase/log/log0recv.cc:4014
It may be notable that the parsing buffer is log_sys.buf+0x3001 , that is, one byte after the start of the log record area. I did not yet figure out how the bytes got transformed into 0x00 0x40 0x40, but I can say that innodb_encrypt_log=OFF .
I will rerun the test somewhere else than /dev/shm , to see if the MDEV-14425 "fake PMEM" could be involved here.
mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
Marko Mäkelä
added a comment - The "fake PMEM" is not the culprit (and we did test buffer wrap-around in MDEV-14425 ), because I can reproduce this also as follows:
mkdir /run/user/ $UID /var
rm -fr var
ln -sf /run/user/ $UID /var var
. /mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}
11.2 e81fa345020ec6a067583db6a7019d6404b26f93
innodb.recovery_memory 'innodb,release' w65 [ 14 fail ]
Test ended at 2023-08-01 17:22:19
CURRENT_TEST: innodb.recovery_memory
mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000:
bytes
description
0x34, 0x1, 0x0, 0x75, 0xb2
WRITE, page 1:0
0x35, 0x1, 0x58, 0x84, 0x39, 0xb1
WRITE, page 1:0x58
0xc3, 0x8, 0x4, 0xff
MEMSET, same page
0xd3, 0x2, 0x6, 0xb
MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff
MEMSET, same page
0xd3, 0x2, 0x6, 0xb
MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff
MEMSET, same page
0xd3, 0x2, 0x6, 0xb
MEMMOVE, same page
0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2
WRITE, same page
0xc4, 0x0, 0x80, 0x0, 0xff
MEMSET, same page
0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe
WRITE, page 1:0
0x34, 0x1, 0x0, 0x3d, 0x37
WRITE, page 1:0
0x12, 0x1, 0x34
INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00)
0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34
WRITE, page 1:0x58
0x34, 0x1, 0x34, 0x19, 0x6
WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record)
0xb3, 0x2a, 0x4, 0xb2 …
WRITE, same page …
This bug is something that must have been introduced by MDEV-14425.
Marko Mäkelä
added a comment - In the rr replay trace that I am analyzing, the suspicious bytes 0x00 0x40 0x40 are being added for the page as a result of parsing a mini-transaction that wraps around the end of the circular log file. The mini-transaction looks like this, starting at the memory-mapped log_sys.buf+0x9fffc5 and wrapping around 0xa00000 :
bytes
description
0x34, 0x1, 0x0, 0x75, 0xb2
WRITE, page 1:0
0x35, 0x1, 0x58, 0x84, 0x39, 0xb1
WRITE, page 1:0x58
0xc3, 0x8, 0x4, 0xff
MEMSET, same page
0xd3, 0x2, 0x6, 0xb
MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff
MEMSET, same page
0xd3, 0x2, 0x6, 0xb
MEMMOVE, same page
0xc3, 0x4, 0x4, 0xff
MEMSET, same page
0xd3, 0x2, 0x6, 0xb
MEMMOVE, same page
0xb5, 0x0, 0x5, 0xd6, 0x69, 0xd2
WRITE, same page
0xc4, 0x0, 0x80, 0x0, 0xff
MEMSET, same page
0x35, 0x1, 0x0, 0x80, 0x3b, 0xfe
WRITE, page 1:0
0x34, 0x1, 0x0, 0x3d, 0x37
WRITE, page 1:0
0x12, 0x1, 0x34
INIT_PAGE 1:0x34 (the buffer wraps around between 0x12, 0x1 and 0x34; this is wrongly buffered as 0x00, 0x40, 0x00)
0x38, 0x1, 0x58, 0x84, 0x72, 0x0, 0x0, 0x0, 0x34
WRITE, page 1:0x58
0x34, 0x1, 0x34, 0x19, 0x6
WRITE, page 1:0x34, byte 0x6 at offset 0x19 (a debug assertion fails while applying this record)
0xb3, 0x2a, 0x4, 0xb2 …
WRITE, same page …
This bug is something that must have been introduced by MDEV-14425 .
I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed:
Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.
Marko Mäkelä
added a comment - I was able to reproduce the scenario of the rr replay trace using a variant of the patch below that would assert cl == l.ptr when an INIT_PAGE record was parsed:
diff --git a/mysql-test/suite/innodb/t/recovery_memory.opt b/mysql-test/suite/innodb/t/recovery_memory.opt
index 7207fd27a42..87c20c9abad 100644
--- a/mysql-test/suite/innodb/t/recovery_memory.opt
+++ b/mysql-test/suite/innodb/t/recovery_memory.opt
@@ -1 +1 @@
---innodb_buffer_pool_size=1073741824
+--innodb_buffer_pool_size=1073741824 --innodb-log-file-size=5m
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 66b763fdf2e..19b25a0e0f4 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -2303,8 +2303,6 @@ struct recv_ring : public recv_buf
const byte *copy_if_needed(const byte *iv, byte *tmp, recv_ring start,
size_t len)
{
- if (!len)
- return ptr;
const size_t s(*this - start);
ut_ad(s + len <= srv_page_size);
if (!log_sys.is_encrypted())
@@ -2658,6 +2656,8 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)
store_freed_or_init_rec(id, (b & 0x70) == FREE_PAGE);
if (UNIV_UNLIKELY(rlen != 0))
goto record_corrupted;
+ copy_if_needed:
+ cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);
break;
case EXTENDED:
if (UNIV_UNLIKELY(!rlen))
@@ -2689,10 +2689,7 @@ recv_sys_t::parse_mtr_result recv_sys_t::parse(source &l, bool if_exists)
break;
case OPTION:
if (rlen == 5 && *l == OPT_PAGE_CHECKSUM)
- {
- cl= l.copy_if_needed(iv, decrypt_buf, recs, rlen);
- break;
- }
+ goto copy_if_needed;
/* fall through */
case RESERVED:
continue;
Also for me, the last byte of a 3-byte INIT_PAGE record would start at byte offset 12288.
Spent 157475.278 of 1635 seconds executing testcases
Completed: All 10000 tests were successful.
I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.
Marko Mäkelä
added a comment - The above patch (with the fix of MDEV-31354 additionally applied) worked for me.
. /mtr --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}
11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
…
innodb.recovery_memory 'innodb,release' w60 [ 100 pass ] 14458
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 157475.278 of 1635 seconds executing testcases
Completed: All 10000 tests were successful.
I will rerun the test once more without the change for the mysql-test/suite/innodb/t/recovery_memory.opt file.
Spent 166775.724 of 1760 seconds executing testcases
Completed: All 10000 tests were successful.
Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of MDEV-31354 in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as MDEV-31827) was encountered:
11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
innodb.recovery_memory 'innodb,release' w4 [ 89 fail ] Found warnings/errors in server log file!
Test ended at 2023-08-02 12:21:21
line
2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Marko Mäkelä
added a comment -
11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
…
innodb.recovery_memory 'innodb,release' w73 [ 100 pass ] 22498
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 166775.724 of 1760 seconds executing testcases
Completed: All 10000 tests were successful.
Yesterday, I reproduced a failure of the test with InnoDB: Buffered log writes (block size=512 bytes) instead of the "fake PMEM" (memory-mapped log). That failure might have been explained by the absence of the fix of MDEV-31354 in the branch. I conducted one more run of that, while there was some significant concurrent load on the system during part of the run. Surprisingly, near the end of the run, a new type of failure (filed as MDEV-31827 ) was encountered:
11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches
innodb.recovery_memory 'innodb,release' w4 [ 89 fail ] Found warnings/errors in server log file!
Test ended at 2023-08-02 12:21:21
line
2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
pluto:/data/Server_bin/bb-11.2-MDEV-14795A_asan_Og/mariadb-test$ _RR_TRACE_DIR=MDEV-31791/195/log/mysqld.1.rr/ rr replay --mark-stdio