[MDEV-18038] Assertion failure in innodb.undo_truncate_recover: "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i" Created: 2018-12-19  Updated: 2022-11-11  Resolved: 2022-11-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.12
Fix Version/s: 10.3.16

Type: Bug Priority: Major
Reporter: Eugene Kosov (Inactive) Assignee: Marko Mäkelä
Resolution: Cannot Reproduce Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-14192 mariabackup.incremental_backup failed... Closed
relates to MDEV-18370 InnoDB: Failing assertion: lsn % OS_F... Closed
relates to MDEV-13080 [ERROR] InnoDB: Missing MLOG_CHECKPOI... Closed

 Description   

This was tested at least on commit b26736cdb1105f5c500c0a6b51954ac4a83665b0 of 10.3:

mtr -mem -force -max-test-fail=9999 -suite=innodb -par=5 innodb.undo_truncate_recover{,,,} -repeat=100

There are actually two failures here. The first is "Missing MLOG_CHECKPOINT at 24666925 between the checkpoint 23868993 and the end 24666925", similar to https://jira.mariadb.org/browse/MDEV-13080

The second one is a crash:

#4  __GI_raise (sig=sig@entry=6) at raise.c:50
#5  __GI_abort () at abort.c:79
#6  __assert_fail_base (fmt=0x7ffa583e9858 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x16a4106 "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i", file=0x16a3080 "/work/mariadb/storage/innobase/log/log0log.cc", line=839, function=<optimized out>) at assert.c:92
#7  __GI___assert_fail (assertion=0x16a4106 "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i", file=0x16a3080 "/work/mariadb/storage/innobase/log/log0log.cc", line=839, function=0x16a3fd4 "void log_write_buf(byte *, ulint, ulint, lsn_t, ulint)") at assert.c:101
#8  log_write_buf (buf=0x7ffa484a00ca "\200", len=717312, pad_len=0, start_lsn=23930880, new_data_offset=352) at log0log.cc:835
#9  log_write_up_to (lsn=24648005, flush_to_disk=true) at log0log.cc:1104
#10 trx_purge_initiate_truncate (limit=..., undo_trunc=0x1c28cb0 <purge_sys+624>) at trx0purge.cc:1033
#11 trx_purge_truncate_history () at trx0purge.cc:1109
#12 trx_purge (n_purge_threads=4, truncate=true) at trx0purge.cc:1623
#13 srv_do_purge (n_total_purged=0x7ffa3d7f9e48) at srv0srv.cc:2595
#14 srv_purge_coordinator_thread (arg=0x0) at srv0srv.cc:2720
#15 start_thread (arg=<optimized out>) at pthread_create.c:486
#16 clone () at clone.S:95
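
For readers unfamiliar with the failing assertion, here is a minimal standalone sketch of what it appears to check. It is not the InnoDB source. It assumes 512-byte log blocks whose first 4 bytes hold a big-endian block number with the most significant bit reserved as a flush flag, and it assumes the check runs once per block in the buffer. The names block_hdr_no, blocks_are_consecutive and make_blocks are hypothetical. With the values in frame #8 (len=717312, i.e. 1401 blocks, and pad_len=0) the first two conditions are false for every block, so the failure suggests that some block in the buffer did not carry the number of the first block plus its index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (not taken from the report): 512-byte blocks, block number
// in the first 4 bytes, big-endian, top bit used as a flush flag.
constexpr std::size_t kBlockSize = 512;
constexpr std::uint32_t kFlushBitMask = 0x80000000U;

// Hypothetical stand-in for log_block_get_hdr_no(): read the block number
// and mask out the assumed flush flag.
std::uint32_t block_hdr_no(const unsigned char* block) {
    const std::uint32_t raw = (std::uint32_t(block[0]) << 24)
                            | (std::uint32_t(block[1]) << 16)
                            | (std::uint32_t(block[2]) << 8)
                            |  std::uint32_t(block[3]);
    return raw & ~kFlushBitMask;
}

// Mirrors the asserted condition: the check passes if the whole buffer is
// padding, or if every non-padding block i is numbered (first block) + i.
bool blocks_are_consecutive(const unsigned char* buf, std::size_t len,
                            std::size_t pad_len) {
    assert(len % kBlockSize == 0);
    if (pad_len >= len) {
        return true;  // nothing but padding to check
    }
    for (std::size_t i = 0; i < len / kBlockSize; i++) {
        if (i * kBlockSize >= len - pad_len) {
            continue;  // this block lies in the padding area
        }
        if (block_hdr_no(buf + i * kBlockSize) != block_hdr_no(buf) + i) {
            return false;  // block number out of sequence
        }
    }
    return true;
}

// Helper for experiments: build n zero-filled blocks numbered first, first+1, ...
std::vector<unsigned char> make_blocks(std::uint32_t first, std::size_t n) {
    std::vector<unsigned char> buf(n * kBlockSize, 0);
    for (std::size_t i = 0; i < n; i++) {
        const std::uint32_t no = first + static_cast<std::uint32_t>(i);
        unsigned char* b = buf.data() + i * kBlockSize;
        b[0] = static_cast<unsigned char>((no >> 24) & 0xFF);
        b[1] = static_cast<unsigned char>((no >> 16) & 0xFF);
        b[2] = static_cast<unsigned char>((no >> 8) & 0xFF);
        b[3] = static_cast<unsigned char>(no & 0xFF);
    }
    return buf;
}
```

Under these assumptions, a buffer whose block headers were modified while the writer was iterating over it would fail the check, which would be consistent with the concurrency suspicion in this report.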

Both failures happen rarely, and only with the bash brace-expansion trick {,,,}, which repeats the test name four times so that several copies of the test run in parallel. I suppose it's a concurrency issue.

Also, I think versions other than 10.3 are affected as well, but I haven't checked.



 Comments   
Comment by Marko Mäkelä [ 2019-04-23 ]

This looks very similar to MDEV-14192 and MDEV-18370.

Comment by Marko Mäkelä [ 2022-11-11 ]

The last failure on buildbot was after MariaDB 10.3.14 and before MariaDB 10.3.15:

10.3 323e6cd74ce76c7811835bed640a2934

innodb.undo_truncate_recover '16k,2,innodb' w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2019-04-19 00:26:03
line
mysqld: /home/buildbot/buildbot/build/mariadb-10.3.15/storage/innobase/log/log0log.cc:822: void log_write_buf(byte*, ulint, ulint, lsn_t, ulint): Assertion `pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i' failed.
Attempting backtrace. You can use the following information to find out

MDEV-14192 was fixed in 10.3.16, so let us guess that this bug was fixed at the same time.
