[MDEV-654] LP:1000495 - Assertion `share->now_transactional' failed in flush_log_for_bitmap on concurrent workload with Aria tables
Created: 2012-05-17  Updated: 2018-06-12  Resolved: 2018-05-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Aria
Affects Version/s: 5.3.9, 10.3
Fix Version/s: 5.5.61, 10.0.36, 10.1.34, 10.2.15, 10.3.7

Type: Bug
Priority: Minor
Reporter: Elena Stepanova
Assignee: Michael Widenius
Resolution: Fixed
Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug1000495.xml    
Issue Links:
Relates
relates to MDEV-4312 aria: background thread crashes in ma... Closed
relates to MDEV-15256 Crash in flush_cached_blocks Confirmed

 Description   

#6 0x00007f14992ba235 in __assert_fail () from /lib64/libc.so.6
#7 0x0000000000a98d23 in flush_log_for_bitmap (page=0x7f1490ff5038 "", page_no=0,
data_ptr=0x7f14840853f8 "\376\376\t\003\f") at ma_bitmap.c:2905
#8 0x0000000000a28890 in pagecache_fwrite (pagecache=0x1b34280,
filedesc=0x7f1490d32e58, buffer=0x7f1490ff5038 "", pageno=0,
type=PAGECACHE_PLAIN_PAGE, flags=36) at ma_pagecache.c:658
#9 0x0000000000a32a6b in flush_cached_blocks (pagecache=0x1b34280, file=0x7f1484085e70,
cache=0x7f1499289b20, end=0x7f1499289b28, type=FLUSH_KEEP,
first_errno=0x7f1499289afc) at ma_pagecache.c:4489
#10 0x0000000000a33344 in flush_pagecache_blocks_int (pagecache=0x1b34280,
file=0x7f1484085e70, type=FLUSH_KEEP, filter=0xa93f77 <filter_flush_bitmap_pages>,
filter_arg=0x7f1484085fa0) at ma_pagecache.c:4782
#11 0x0000000000a3370b in flush_pagecache_blocks_with_filter (pagecache=0x1b34280,
file=0x7f1484085e70, type=FLUSH_KEEP, filter=0xa93f77 <filter_flush_bitmap_pages>,
filter_arg=0x7f1484085fa0) at ma_pagecache.c:4905
#12 0x0000000000a941af in _ma_bitmap_flush_all (share=0x7f14840853f8) at ma_bitmap.c:535
#13 0x0000000000a3af85 in collect_tables (str=0x7f149928dd80,
checkpoint_start_log_horizon=4295421793) at ma_checkpoint.c:1084
#14 0x0000000000a39761 in really_execute_checkpoint () at ma_checkpoint.c:198
#15 0x0000000000a39553 in ma_checkpoint_execute (level=CHECKPOINT_MEDIUM,
no_wait=1 '\001') at ma_checkpoint.c:132
#16 0x0000000000a3a19e in ma_checkpoint_background (arg=0x1) at ma_checkpoint.c:621
#17 0x00007f1499f75a4f in start_thread () from /lib64/libpthread.so.0
#18 0x00007f149935f82d in clone () from /lib64/libc.so.6

bzr version-info
revision-id: <email address hidden>
date: 2012-05-15 08:31:07 +0300
revno: 3523

Also reproducible on maria/5.5 revno 3407.

The only test case I have for now is the RQG grammar below. It reproduces the problem on reasonably powerful machines (2x4 cores or 4x2 cores), and it takes anywhere from 1 to 30 minutes to hit the assertion.
Coredump, server datadir with Aria logs and stack traces can be found on hasky.

The test was run with aria-checkpoint-interval=1 and aria-checkpoint-log-activity=0, but these settings are not essential; they just make the failure appear somewhat faster.

RQG grammar (test.yy):

query1:
        SELECT alias1 . _field_indexed AS field1 FROM A AS alias1, B;              
 
query:
     query1 ; CREATE TABLE _tmptable[invariant] AS query1 LIMIT 0; DELETE FROM _tmptable[invariant] ; INSERT INTO _tmptable[invariant] query1 ; DELETE FROM _tmptable[invariant] ; DROP TABLE _tmptable[invariant] ;
 

Run as:

perl runall.pl \
--duration=3600 \
--queries=100M \
--threads=4 \
--rows=1,200 \
--engine=Aria  \
--mysqld=--aria-checkpoint-interval=1 \
--mysqld=--aria-checkpoint-log-activity=0  \
--grammar=test.yy \
--vardir=<your vardir> \
--basedir=<your basedir>



 Comments   
Comment by Rasmus Johansson (Inactive) [ 2012-05-17 ]

Launchpad bug id: 1000495

Comment by Elena Stepanova [ 2012-05-17 ]

Re: Assertion `share->now_transactional' failed in flush_log_for_bitmap on concurrent workload with Aria tables
I've set importance to 'Medium' because it's a debug assertion with relatively low probability; but it affects testing, so it would be good to fix it.

Comment by Michael Widenius [ 2018-05-14 ]

I couldn't repeat this, as the runall options are for the old randgen, which is no longer supported.

However, I have pushed a likely fix for this to 10.3; we can backport it once Elena has had time to test whether it fixes the problem.

Comment by Michael Widenius [ 2018-05-15 ]

The problem was that the bitmap needs to be flushed before logging of
redo entries is disabled, because writing the bitmap to disk from the
background checkpoint may generate redo entries.
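
The ordering issue described above can be illustrated with a small toy model. This is only a sketch, not the Aria code: the Share class, bitmap_flush_all, disable_logging_buggy, disable_logging_fixed and checkpoint_after are hypothetical names made up for illustration, and the real race between the checkpoint thread and a client thread is reduced here to one fixed, sequential interleaving. Only flush_log_for_bitmap and the now_transactional flag are taken from the stack trace.

```python
class Share:
    """Toy stand-in for an Aria table share (hypothetical, not the real struct)."""

    def __init__(self):
        self.now_transactional = True   # redo logging is enabled
        self.bitmap_dirty = True        # bitmap page has unflushed changes
        self.redo_log = []              # redo entries written so far


def flush_log_for_bitmap(share):
    # Models the check from the stack trace: writing a bitmap page may
    # produce redo entries, so logging must still be enabled here.
    if not share.now_transactional:
        raise AssertionError("share->now_transactional")
    share.redo_log.append("bitmap redo")


def bitmap_flush_all(share):
    # Used by the simulated checkpoint and by the fixed disable path.
    if share.bitmap_dirty:
        flush_log_for_bitmap(share)
        share.bitmap_dirty = False


def disable_logging_buggy(share):
    # Assumed pre-fix order: logging is switched off while the bitmap is dirty.
    share.now_transactional = False


def disable_logging_fixed(share):
    # Order described in the comment: flush the bitmap first, then disable.
    bitmap_flush_all(share)
    share.now_transactional = False


def checkpoint_after(disable):
    """Run `disable`, then a simulated checkpoint; report what happened."""
    share = Share()
    disable(share)
    try:
        bitmap_flush_all(share)
    except AssertionError:
        return "assertion"
    return "ok"
```

In this model, checkpoint_after(disable_logging_buggy) runs into the assertion because the checkpoint finds a dirty bitmap after logging was turned off, while checkpoint_after(disable_logging_fixed) completes because the bitmap is already clean by the time the checkpoint runs.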
