[MDEV-22108] Crash recovery fails with [ERROR] InnoDB: Malformed log record
Created: 2020-04-01  Updated: 2020-04-06  Resolved: 2020-04-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.2
Fix Version/s: 10.5.3

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: File 000196.log     File c00000.yy    
Issue Links:
Problem/Incident
is caused by MDEV-12353 Efficient InnoDB redo log record format Closed

 Description   

Workflow of RQG test
 
1. Create DB server and start it
2. One session starts to run some SQL stream against this server
3. At some arbitrary point in time, unrelated to the state of the ongoing step 2 (waiting for a result, sending a statement, ...), the server process is killed with SIGKILL (not SIGTERM).
4. Make a copy of the killed server's data directory, logs, etc.
5. Try to restart the killed server.
 
Step 5 fails with:
2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
...
2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
<some data>
2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
...
2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
 
RQG
git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00
 
MariaDB
origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
build with debug
 
perl rqg.pl \
--duration=120 \
--queries=10000000 \
--threads=1 \
--no_mask \
--seed=random \
--gendata=conf/engines/many_indexes.zz \
--rpl_mode=none \
--engine=InnoDB \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--log-output=none \
--mysqld=--slave_net_timeout=60 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--net_write_timeout=60 \
--mysqld=--connect_timeout=60 \
--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--innodb_page_size=32K \
--mysqld=--wait_timeout=28800 \
--mysqld=--innodb-lock-wait-timeout=50 \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose_innodb_use_native_aio=1 \
--mysqld=--log-bin \
--mysqld=--net_read_timeout=30 \
--reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
--validators=None \
--grammar=c00000.yy \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<path to MariaDB binaries> \
--script_debug=_nix
Several replay attempts might be required.



 Comments   
Comment by Marko Mäkelä [ 2020-04-02 ]

Does the following patch fix it?

diff --git a/storage/innobase/include/mtr0log.h b/storage/innobase/include/mtr0log.h
index d07fa069dfd..ee823caf6cb 100644
--- a/storage/innobase/include/mtr0log.h
+++ b/storage/innobase/include/mtr0log.h
@@ -427,7 +427,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
     if (oend + len > &log_ptr[16])
     {
       len+= oend - log_ptr - 15;
-      if (len >= MIN_3BYTE)
+      if (len >= MIN_3BYTE - 1)
         len+= 2;
       else if (len >= MIN_2BYTE)
         len++;
@@ -448,7 +448,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
   else if (len >= 3 && end + len > &log_ptr[16])
   {
     len+= end - log_ptr - 15;
-    if (len >= MIN_3BYTE)
+    if (len >= MIN_3BYTE - 1)
       len+= 2;
     else if (len >= MIN_2BYTE)
       len++;

The record with the unfortunate value len == MIN_3BYTE - 1 was written by btr_page_reorganize_low().

This bug is only possible with innodb_page_size=32k or innodb_page_size=64k. With smaller page sizes, redo log records can never be that long, because the length of a single record is constrained so that log_phys_t::alloc_size(length) must not exceed innodb_page_size.
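The page-size restriction follows from simple arithmetic. Assuming MIN_2BYTE = 2^7 and MIN_3BYTE = MIN_2BYTE + 2^14 (modelled on mtr0log.h in 10.5, not stated in this report), the problematic length value alone already exceeds a 16 KiB page, while the next encoding boundary lies far beyond the largest supported page size:

```latex
\begin{aligned}
\texttt{MIN\_3BYTE} - 1 &= 2^{7} + 2^{14} - 1 = 16511 > 16384 = 16\,\text{KiB} \\
\texttt{MIN\_4BYTE} &= \texttt{MIN\_3BYTE} + 2^{21} = 2113664 > 65536 = 64\,\text{KiB}
\end{aligned}
```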

Comment by Matthias Leich [ 2020-04-02 ]

Yes
 
Results of testing on origin/10.5 HEAD 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00 with the patch applied:
1. The test mentioned in the description (threads=1, page size 32K, grammar c00000.yy) usually reproduces the problem quite quickly.
     With the patch, it no longer seems able (~350 RQG runs) to reproduce a failed crash recovery showing the following line in the server error log:
    "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore"
2. Test battery covering a broad range of functionality (~1800 RQG runs):
     None of the failing tests showed "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore"
For both 1. and 2.:
     Some of the RQG tests failed, but in all cases because of other, already known issues.

Generated at Thu Feb 08 09:12:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.