[MDEV-18370] InnoDB: Failing assertion: lsn % OS_FILE_LOG_BLOCK_SIZE == LOG_BLOCK_HDR_SIZE in log0log.cc with innodb_scrub_log=ON and high values of innodb_scrub_log_speed Created: 2019-01-24  Updated: 2020-08-07  Resolved: 2020-08-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.5.2

Type: Bug Priority: Major
Reporter: Alice Sherepa Assignee: Marko Mäkelä
Resolution: Won't Fix Votes: 0
Labels: None

Attachments: File my.cnf    
Issue Links:
Blocks
is blocked by MDEV-21870 Deprecate and ignore innodb_scrub_log... Closed
Relates
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed
relates to MDEV-18038 Assertion failure in innodb.undo_trun... Closed
relates to MDEV-20474 Assertion `!recv_no_log_write' failed... Closed
relates to MDEV-20475 Assertion `flushed_lsn == log_get_lsn... Closed
relates to MDEV-21870 Deprecate and ignore innodb_scrub_log... Closed

 Description   

10.2 dc91372d

2019-08-17 20:43:39 0x7fed6c9f6700  InnoDB: Assertion failure in file /data/src/10.2/storage/innobase/log/log0log.cc line 2238
InnoDB: Failing assertion: lsn % OS_FILE_LOG_BLOCK_SIZE == LOG_BLOCK_HDR_SIZE
 
#5  0x00007fed6d93742a in __GI_abort () at abort.c:89
#6  0x000055d1c6fc5752 in ut_dbg_assertion_failed (expr=0x55d1c7501290 "lsn % OS_FILE_LOG_BLOCK_SIZE == LOG_BLOCK_HDR_SIZE", file=0x55d1c7500470 "/data/src/10.2/storage/innobase/log/log0log.cc", line=2238) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:60
#7  0x000055d1c6e4d818 in log_pad_current_log_block () at /data/src/10.2/storage/innobase/log/log0log.cc:2238
#8  0x000055d1c6e4d883 in log_scrub () at /data/src/10.2/storage/innobase/log/log0log.cc:2254
#9  0x000055d1c6e4d934 in log_scrub_thread () at /data/src/10.2/storage/innobase/log/log0log.cc:2281
#10 0x00007fed6f4a34a4 in start_thread (arg=0x7fed6c9f6700) at pthread_create.c:456
#11 0x00007fed6d9ebd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

To reproduce, bootstrap a clean datadir with all defaults and start the server with

--innodb_scrub_log=ON --innodb_scrub_log_speed=50000

Do it in a loop if it doesn't fail right away.
It happens pretty much every time for me on my local HDD and SSD disks, sporadically on an (apparently faster or otherwise different) SSD disk on a lab machine, and almost never in shm. So, don't run it in shm.

Happens both on debug and non-debug builds.

Couldn't reproduce on 10.1 and 10.5.



 Comments   
Comment by Marko Mäkelä [ 2019-02-01 ]

This report duplicates MDEV-14192.

Comment by Marko Mäkelä [ 2019-04-08 ]

This is not necessarily related to MDEV-14192, which seems like a Mariabackup specific bug.

It is worth noting that the mentioned code is only running when using the non-default setting innodb_scrub_log=ON.

alice, can you please try to create a reproducible test case for this? If this only occurs on 10.4, I wonder if this is related to the changed format of innodb_encrypt_log=ON in 10.4 (MDEV-12041).

Comment by Marko Mäkelä [ 2019-04-08 ]

In my.cnf I do not see innodb_encrypt_log; there is only innodb_scrub_log=ON and innodb_scrub_log_speed = 48799.

Comment by Elena Stepanova [ 2019-08-17 ]

marko: I've updated the description, but it's pretty much as alice initially described: just start the server with the options you listed on a clean datadir, and repeat if it doesn't fail right away. One additional note – don't do it in memory.
50000 and 48799 are not special unique values, apparently various high values can cause the problem. In tests, it's been seen around 14,000 and higher, but it might be specific to the environment.

Update: It did eventually happen in memory on a lab machine, but it took over 100 attempts.

Update2:
For the machine where tests are run, innodb_scrub_log_speed=719 also seems to be a dark magic value. After limiting the random range from 1..50000 to 1..1000, I got only 5 failures overnight (instead of hundreds), and all of them happened with innodb_scrub_log_speed=719.

Update3:
Got it once with innodb_scrub_log_speed=236 which is actually lower than the default 256.

Comment by Marko Mäkelä [ 2019-09-30 ]

Do we actually need this feature? With a completely new redo log file format in MDEV-14425, it should be easier to free unused parts of the redo log.

I have not studied the log scrubbing code, but if the page scrubbing code (which does not work, according to MDEV-8139) does not give much hope. That code is modifying data pages without writing redo log. I have been cleaning up some of that as part of MDEV-12353. I think that we should deprecate&remove that feature as part of implementing MDEV-15528.

Comment by Elena Stepanova [ 2019-09-30 ]

According to JIRA, MDEV-15528 is targeted for 10.4, I don't think we can remove anything there.

Generally, I personally don't have any objections regarding removing the feature, but we don't really have much of experience/procedure for it, I don't know how it needs to be done.

Comment by Marko Mäkelä [ 2020-08-07 ]

MDEV-21870 in MariaDB 10.5.2 deprecated the options related to redo log scrubbing. I do not think that it was ever working correctly.

A correct implementation of this would be to punch holes or write sequences of zero bytes to the redo log blocks as they are logically freed upon completing a checkpoint. Proper synchronization between checkpoint, log append, and a dedicated log scrubbing thread could be difficult to achieve.

Generated at Thu Feb 08 08:43:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.