MariaDB Server / MDEV-18038

Assertion failure in innodb.undo_truncate_recover: "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i"


    Details

      Description

      This was tested on at least commit b26736cdb1105f5c500c0a6b51954ac4a83665b0 of 10.3.

      mtr -mem -force -max-test-fail=9999 -suite=innodb -par=5 innodb.undo_truncate_recover{,,,} -repeat=100

      There are actually two failures here. One is "Missing MLOG_CHECKPOINT at 24666925 between the checkpoint 23868993 and the end 24666925", similar to https://jira.mariadb.org/browse/MDEV-13080

      The second one is a crash:

      #4  __GI_raise (sig=sig@entry=6) at raise.c:50
      #5  __GI_abort () at abort.c:79
      #6  __assert_fail_base (fmt=0x7ffa583e9858 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x16a4106 "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i", file=0x16a3080 "/work/mariadb/storage/innobase/log/log0log.cc", line=839, function=<optimized out>) at assert.c:92
      #7  __GI___assert_fail (assertion=0x16a4106 "pad_len >= len || i * 512U >= len - pad_len || log_block_get_hdr_no( buf + i * 512U) == log_block_get_hdr_no(buf) + i", file=0x16a3080 "/work/mariadb/storage/innobase/log/log0log.cc", line=839, function=0x16a3fd4 "void log_write_buf(byte *, ulint, ulint, lsn_t, ulint)") at assert.c:101
      #8  log_write_buf (buf=0x7ffa484a00ca "\200", len=717312, pad_len=0, start_lsn=23930880, new_data_offset=352) at log0log.cc:835
      #9  log_write_up_to (lsn=24648005, flush_to_disk=true) at log0log.cc:1104
      #10 trx_purge_initiate_truncate (limit=..., undo_trunc=0x1c28cb0 <purge_sys+624>) at trx0purge.cc:1033
      #11 trx_purge_truncate_history () at trx0purge.cc:1109
      #12 trx_purge (n_purge_threads=4, truncate=true) at trx0purge.cc:1623
      #13 srv_do_purge (n_total_purged=0x7ffa3d7f9e48) at srv0srv.cc:2595
      #14 srv_purge_coordinator_thread (arg=0x0) at srv0srv.cc:2720
      #15 start_thread (arg=<optimized out>) at pthread_create.c:486
      #16 clone () at clone.S:95
      

      Both failures happen rarely, and only with the bash brace-expansion trick {,,,}, which repeats the test name four times so that the copies of the test run in parallel. I suppose it's a concurrency issue.
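
      To make the {,,,} trick concrete, here is a minimal sketch of what bash does with that word before mtr ever sees it. The array name `expanded` is only illustrative and is not part of the original command.

```bash
#!/usr/bin/env bash
# Brace expansion: three commas give four empty alternatives, so the
# single word expands into four identical copies of the test name.
# "expanded" is a hypothetical variable used only for this illustration.
expanded=(innodb.undo_truncate_recover{,,,})

# Number of arguments the test runner would receive for this word.
echo "${#expanded[@]}"   # prints 4

# One test name per line, exactly as they would appear on the command line.
printf '%s\n' "${expanded[@]}"
```

      With four identical test names on the command line and a parallelism of 5, the copies can be scheduled on different workers at the same time, which is presumably what exposes the race.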

      Also, I suspect that 10.3 is not the only affected version, but I haven't checked.

              People

              Assignee:
              vlad.lesin Vladislav Lesin
              Reporter:
              kevg Eugene Kosov