Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18370

InnoDB: Failing assertion: lsn % OS_FILE_LOG_BLOCK_SIZE == LOG_BLOCK_HDR_SIZE in log0log.cc with innodb_scrub_log=ON and high values of innodb_scrub_log_speed

Details

    Description

      10.2 dc91372d

      2019-08-17 20:43:39 0x7fed6c9f6700  InnoDB: Assertion failure in file /data/src/10.2/storage/innobase/log/log0log.cc line 2238
      InnoDB: Failing assertion: lsn % OS_FILE_LOG_BLOCK_SIZE == LOG_BLOCK_HDR_SIZE
       
      #5  0x00007fed6d93742a in __GI_abort () at abort.c:89
      #6  0x000055d1c6fc5752 in ut_dbg_assertion_failed (expr=0x55d1c7501290 "lsn % OS_FILE_LOG_BLOCK_SIZE == LOG_BLOCK_HDR_SIZE", file=0x55d1c7500470 "/data/src/10.2/storage/innobase/log/log0log.cc", line=2238) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:60
      #7  0x000055d1c6e4d818 in log_pad_current_log_block () at /data/src/10.2/storage/innobase/log/log0log.cc:2238
      #8  0x000055d1c6e4d883 in log_scrub () at /data/src/10.2/storage/innobase/log/log0log.cc:2254
      #9  0x000055d1c6e4d934 in log_scrub_thread () at /data/src/10.2/storage/innobase/log/log0log.cc:2281
      #10 0x00007fed6f4a34a4 in start_thread (arg=0x7fed6c9f6700) at pthread_create.c:456
      #11 0x00007fed6d9ebd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
      

      To reproduce, bootstrap a clean datadir with all defaults and start the server with

      --innodb_scrub_log=ON --innodb_scrub_log_speed=50000
      

      Do it in a loop if it doesn't fail right away.
      It happens pretty much every time for me on my local HDD and SSD disks, sporadically on an (apparently faster or otherwise different) SSD disk on a lab machine, and almost never in shm. So, don't run it in shm.

      Happens both on debug and non-debug builds.

      Couldn't reproduce on 10.1 and 10.5.

      Attachments

        Issue Links

          Activity

            This report duplicates MDEV-14192.

            marko Marko Mäkelä added a comment - This report duplicates MDEV-14192 .

            This is not necessarily related to MDEV-14192, which seems like a Mariabackup specific bug.

            It is worth noting that the mentioned code is only running when using the non-default setting innodb_scrub_log=ON.

            alice, can you please try to create a reproducible test case for this? If this only occurs on 10.4, I wonder if this is related to the changed format of innodb_encrypt_log=ON in 10.4 (MDEV-12041).

            marko Marko Mäkelä added a comment - This is not necessarily related to MDEV-14192 , which seems like a Mariabackup specific bug. It is worth noting that the mentioned code is only running when using the non-default setting innodb_scrub_log=ON . alice , can you please try to create a reproducible test case for this? If this only occurs on 10.4, I wonder if this is related to the changed format of innodb_encrypt_log=ON in 10.4 ( MDEV-12041 ).

            In my.cnf I do not see innodb_encrypt_log; there is only innodb_scrub_log=ON and innodb_scrub_log_speed = 48799.

            marko Marko Mäkelä added a comment - In my.cnf I do not see innodb_encrypt_log ; there is only innodb_scrub_log=ON and innodb_scrub_log_speed = 48799 .
            elenst Elena Stepanova added a comment - - edited

            marko: I've updated the description, but it's pretty much as alice initially described: just start the server with the options you listed on a clean datadir, and repeat if it doesn't fail right away. One additional note – don't do it in memory.
            50000 and 48799 are not special unique values, apparently various high values can cause the problem. In tests, it's been seen around 14,000 and higher, but it might be specific to the environment.

            Update: It did eventually happen in memory on a lab machine, but it took over 100 attempts.

            Update2:
            For the machine where tests are run, innodb_scrub_log_speed=719 also seems to be a dark magic value. After limiting the random range from 1..50000 to 1..1000, I got only 5 failures overnight (instead of hundreds), and all of them happened with innodb_scrub_log_speed=719.

            Update3:
            Got it once with innodb_scrub_log_speed=236 which is actually lower than the default 256.

            elenst Elena Stepanova added a comment - - edited marko : I've updated the description, but it's pretty much as alice initially described: just start the server with the options you listed on a clean datadir, and repeat if it doesn't fail right away. One additional note – don't do it in memory. 50000 and 48799 are not special unique values, apparently various high values can cause the problem. In tests, it's been seen around 14,000 and higher, but it might be specific to the environment. Update: It did eventually happen in memory on a lab machine, but it took over 100 attempts. Update2: For the machine where tests are run, innodb_scrub_log_speed=719 also seems to be a dark magic value. After limiting the random range from 1..50000 to 1..1000, I got only 5 failures overnight (instead of hundreds), and all of them happened with innodb_scrub_log_speed=719 . Update3: Got it once with innodb_scrub_log_speed=236 which is actually lower than the default 256.

            Do we actually need this feature? With a completely new redo log file format in MDEV-14425, it should be easier to free unused parts of the redo log.

            I have not studied the log scrubbing code, but if the page scrubbing code (which does not work, according to MDEV-8139) does not give much hope. That code is modifying data pages without writing redo log. I have been cleaning up some of that as part of MDEV-12353. I think that we should deprecate&remove that feature as part of implementing MDEV-15528.

            marko Marko Mäkelä added a comment - Do we actually need this feature? With a completely new redo log file format in MDEV-14425 , it should be easier to free unused parts of the redo log. I have not studied the log scrubbing code, but if the page scrubbing code (which does not work, according to MDEV-8139 ) does not give much hope. That code is modifying data pages without writing redo log. I have been cleaning up some of that as part of MDEV-12353 . I think that we should deprecate&remove that feature as part of implementing MDEV-15528 .

            According to JIRA, MDEV-15528 is targeted for 10.4, I don't think we can remove anything there.

            Generally, I personally don't have any objections regarding removing the feature, but we don't really have much of experience/procedure for it, I don't know how it needs to be done.

            elenst Elena Stepanova added a comment - According to JIRA, MDEV-15528 is targeted for 10.4, I don't think we can remove anything there. Generally, I personally don't have any objections regarding removing the feature, but we don't really have much of experience/procedure for it, I don't know how it needs to be done.

            MDEV-21870 in MariaDB 10.5.2 deprecated the options related to redo log scrubbing. I do not think that it was ever working correctly.

            A correct implementation of this would be to punch holes or write sequences of zero bytes to the redo log blocks as they are logically freed upon completing a checkpoint. Proper synchronization between checkpoint, log append, and a dedicated log scrubbing thread could be difficult to achieve.

            marko Marko Mäkelä added a comment - MDEV-21870 in MariaDB 10.5.2 deprecated the options related to redo log scrubbing. I do not think that it was ever working correctly. A correct implementation of this would be to punch holes or write sequences of zero bytes to the redo log blocks as they are logically freed upon completing a checkpoint. Proper synchronization between checkpoint, log append, and a dedicated log scrubbing thread could be difficult to achieve.

            People

              marko Marko Mäkelä
              alice Alice Sherepa
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.