  1. MariaDB Server
  2. MDEV-654

LP:1000495 - Assertion `share->now_transactional' failed in flush_log_for_bitmap on concurrent workload with Aria tables

Details

    Description

      #6 0x00007f14992ba235 in __assert_fail () from /lib64/libc.so.6
      #7 0x0000000000a98d23 in flush_log_for_bitmap (page=0x7f1490ff5038 "", page_no=0,
      data_ptr=0x7f14840853f8 "\376\376\t\003\f") at ma_bitmap.c:2905
      #8 0x0000000000a28890 in pagecache_fwrite (pagecache=0x1b34280,
      filedesc=0x7f1490d32e58, buffer=0x7f1490ff5038 "", pageno=0,
      type=PAGECACHE_PLAIN_PAGE, flags=36) at ma_pagecache.c:658
      #9 0x0000000000a32a6b in flush_cached_blocks (pagecache=0x1b34280, file=0x7f1484085e70,
      cache=0x7f1499289b20, end=0x7f1499289b28, type=FLUSH_KEEP,
      first_errno=0x7f1499289afc) at ma_pagecache.c:4489
      #10 0x0000000000a33344 in flush_pagecache_blocks_int (pagecache=0x1b34280,
      file=0x7f1484085e70, type=FLUSH_KEEP, filter=0xa93f77 <filter_flush_bitmap_pages>,
      filter_arg=0x7f1484085fa0) at ma_pagecache.c:4782
      #11 0x0000000000a3370b in flush_pagecache_blocks_with_filter (pagecache=0x1b34280,
      file=0x7f1484085e70, type=FLUSH_KEEP, filter=0xa93f77 <filter_flush_bitmap_pages>,
      filter_arg=0x7f1484085fa0) at ma_pagecache.c:4905
      #12 0x0000000000a941af in _ma_bitmap_flush_all (share=0x7f14840853f8) at ma_bitmap.c:535
      #13 0x0000000000a3af85 in collect_tables (str=0x7f149928dd80,
      checkpoint_start_log_horizon=4295421793) at ma_checkpoint.c:1084
      #14 0x0000000000a39761 in really_execute_checkpoint () at ma_checkpoint.c:198
      #15 0x0000000000a39553 in ma_checkpoint_execute (level=CHECKPOINT_MEDIUM,
      no_wait=1 '\001') at ma_checkpoint.c:132
      #16 0x0000000000a3a19e in ma_checkpoint_background (arg=0x1) at ma_checkpoint.c:621
      #17 0x00007f1499f75a4f in start_thread () from /lib64/libpthread.so.0
      #18 0x00007f149935f82d in clone () from /lib64/libc.so.6

      bzr version-info
      revision-id: <email address hidden>
      date: 2012-05-15 08:31:07 +0300
      revno: 3523

      Also reproducible on maria/5.5 revno 3407.

      The only test case I have for now is the RQG grammar below. It reproduces the problem on relatively decent machines (2x4 cores or 4x2 cores), and it takes from 1 to 30 minutes to get the assertion.
      Coredump, server datadir with Aria logs and stack traces can be found on hasky.

      The test was run with aria-checkpoint-interval=1 and aria-checkpoint-log-activity=0, but this is not essential; these settings just make the failure occur somewhat faster.

      RQG grammar (test.yy):

      query1:
              SELECT alias1 . _field_indexed AS field1 FROM A AS alias1, B;              
       
      query:
           query1 ; CREATE TABLE _tmptable[invariant] AS query1 LIMIT 0; DELETE FROM _tmptable[invariant] ; INSERT INTO _tmptable[invariant] query1 ; DELETE FROM _tmptable[invariant] ; DROP TABLE _tmptable[invariant] ;
       

      Run as:

      perl runall.pl \
      --duration=3600 \
      --queries=100M \
      --threads=4 \
      --rows=1,200 \
      --engine=Aria  \
      --mysqld=--aria-checkpoint-interval=1 \
      --mysqld=--aria-checkpoint-log-activity=0  \
      --grammar=test.yy \
      --vardir=<your vardir> \
      --basedir=<your basedir>

          Activity

            ratzpo Rasmus Johansson (Inactive) added a comment:

            Launchpad bug id: 1000495

            elenst Elena Stepanova added a comment:

            Re: Assertion `share->now_transactional' failed in flush_log_for_bitmap on concurrent workload with Aria tables
            I've set importance to 'Medium' because it's a debug assertion with relatively low probability; but it affects testing, so it would be good to fix it.

            monty Michael Widenius added a comment:

            I couldn't repeat this as the runall options are for the old rangen which is not supported.

            I have however pushed a likely fix for this in 10.3, that we can backport after Elena has time to test if this fixes things.

            monty Michael Widenius added a comment:

            The problem was that the bitmap needs to be flushed before logging
            of redo entries is disabled, because a background checkpoint that
            writes the bitmap to disk may generate redo entries.
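
            The ordering issue described in the last comment can be illustrated with a small toy model. This is only a sketch and not the actual Aria code or the pushed fix: toy_share, toy_flush_bitmap and both toy_disable_logging_* functions are hypothetical names, and the model assumes that flushing a dirty bitmap is only legal while share->now_transactional is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Aria share; not the real MARIA_SHARE. */
typedef struct {
  bool now_transactional; /* true while redo logging is enabled */
  bool bitmap_dirty;      /* true while the bitmap has unflushed changes */
  int redo_entries;       /* redo entries written so far (toy counter) */
} toy_share;

/*
 * Toy model of a bitmap flush, as done by the background checkpoint.
 * Assumption: flushing a dirty bitmap may generate a redo entry, so it
 * requires logging to be enabled. Returns false at the point where the
 * real debug build would hit the now_transactional assertion.
 */
static bool toy_flush_bitmap(toy_share *s) {
  if (!s->bitmap_dirty)
    return true;            /* nothing to write, nothing to log */
  if (!s->now_transactional)
    return false;           /* dirty bitmap but logging already disabled */
  s->redo_entries++;
  s->bitmap_dirty = false;
  return true;
}

/* Problematic order: logging is disabled while the bitmap is still dirty. */
static void toy_disable_logging_unsafe(toy_share *s) {
  s->now_transactional = false;
}

/* Order described in the comment: flush the bitmap first, then disable. */
static bool toy_disable_logging_safe(toy_share *s) {
  bool ok = toy_flush_bitmap(s);
  s->now_transactional = false;
  return ok;
}
```

            In this model, a checkpoint flush that runs after toy_disable_logging_unsafe() fails, while one that runs after toy_disable_logging_safe() finds a clean bitmap and has nothing to log.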

            People

              monty Michael Widenius
              elenst Elena Stepanova