[MDEV-26017] Assertion `stat.flush_list_bytes <= curr_pool_size' failed in buf_pool_t::insert_into_flush_list Created: 2021-06-24  Updated: 2021-07-03  Resolved: 2021-06-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6
Fix Version/s: 10.5.12, 10.6.3

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile-analyzed

Issue Links:
Problem/Incident
is caused by MDEV-25113 Reduce effect of parallel background ... Closed

 Description   

origin/10.6, 10.6 82fe83a34c26b88338f7dae638566f6f3a6e84ac 2021-06-24T16:07:27+03:00
 
mysqld: /data/Server/10.6T/storage/innobase/buf/buf0flu.cc:197: void buf_pool_t::insert_into_flush_list(buf_block_t*, lsn_t): Assertion `stat.flush_list_bytes <= curr_pool_size' failed.
...
# 2021-06-24T16:45:53 [2904612] | [rr 2905820 495513]Query (0x0): [rr 2905820 495515](null)[rr 2905820 495517]
# 2021-06-24T16:45:53 [2904612] | Connection ID (thread ID): 0
# 2021-06-24T16:45:53 [2904612] | [rr 2905820 495519]Status: NOT_KILLED
...
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007feba5e54859 in __GI_abort () at abort.c:79
#2  0x00007feba5e54729 in __assert_fail_base (fmt=0x7feba5fea588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x559ee67ba8d0 "stat.flush_list_bytes <= curr_pool_size", 
    file=0x559ee67ba670 "/data/Server/10.6T/storage/innobase/buf/buf0flu.cc", line=197, function=<optimized out>) at assert.c:92
#3  0x00007feba5e65f36 in __GI___assert_fail (assertion=0x559ee67ba8d0 "stat.flush_list_bytes <= curr_pool_size", file=0x559ee67ba670 "/data/Server/10.6T/storage/innobase/buf/buf0flu.cc", line=197, 
    function=0x559ee67ba718 "void buf_pool_t::insert_into_flush_list(buf_block_t*, lsn_t)") at assert.c:101
#4  0x0000559ee60c1f4e in buf_pool_t::insert_into_flush_list (this=0x559ee6d81f00 <buf_pool>, block=0x7feb97b6c810, lsn=53448836) at /data/Server/10.6T/storage/innobase/buf/buf0flu.cc:197
#5  0x0000559ee5ed05d3 in buf_flush_note_modification (block=0x7feb97b6c810, start_lsn=53448836, end_lsn=53448862) at /data/Server/10.6T/storage/innobase/include/buf0flu.ic:62
#6  0x0000559ee5ed0ffc in ReleaseBlocks::operator() (this=0x7feb68aaff50, slot=0x7feb68ab0040) at /data/Server/10.6T/storage/innobase/mtr/mtr0mtr.cc:348
#7  0x0000559ee5ed36de in CIterate<ReleaseBlocks const>::operator() (this=0x7feb68aaff50, block=0x7feb68ab0010) at /data/Server/10.6T/storage/innobase/mtr/mtr0mtr.cc:61
#8  0x0000559ee5ed28cf in mtr_buf_t::for_each_block_in_reverse<CIterate<ReleaseBlocks const> > (this=0x7feb68aaffe8, functor=...) at /data/Server/10.6T/storage/innobase/include/dyn0buf.h:379
#9  0x0000559ee5ecc0a3 in mtr_t::commit (this=0x7feb68aaffd0) at /data/Server/10.6T/storage/innobase/mtr/mtr0mtr.cc:444
#10 0x0000559ee5ffcd0f in trx_purge_truncate_rseg_history (rseg=..., limit=...) at /data/Server/10.6T/storage/innobase/trx/trx0purge.cc:493
#11 0x0000559ee5ffd107 in trx_purge_truncate_history () at /data/Server/10.6T/storage/innobase/trx/trx0purge.cc:562
#12 0x0000559ee5fff973 in trx_purge (n_tasks=4, truncate=true) at /data/Server/10.6T/storage/innobase/trx/trx0purge.cc:1275
#13 0x0000559ee5fea236 in srv_do_purge (n_total_purged=0x7feb68ab0cb8) at /data/Server/10.6T/storage/innobase/srv/srv0srv.cc:1845
#14 0x0000559ee5fea736 in purge_coordinator_callback_low () at /data/Server/10.6T/storage/innobase/srv/srv0srv.cc:1932
#15 0x0000559ee5fea817 in purge_coordinator_callback () at /data/Server/10.6T/storage/innobase/srv/srv0srv.cc:1961
#16 0x0000559ee61c54e8 in tpool::task_group::execute (this=0x559ee77ca1e0 <purge_coordinator_task_group>, t=0x559ee77ca280 <purge_coordinator_task>) at /data/Server/10.6T/tpool/task_group.cc:55
#17 0x0000559ee61c5864 in tpool::task::execute (this=0x559ee77ca280 <purge_coordinator_task>) at /data/Server/10.6T/tpool/task.cc:47
#18 0x0000559ee61be1ff in tpool::thread_pool_generic::worker_main (this=0x559ee8ffdea0, thread_var=0x559ee900d8c0) at /data/Server/10.6T/tpool/tpool_generic.cc:550
#19 0x0000559ee61c52f0 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
    @0x7feb70049468: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x559ee61be166 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, 
    __t=@0x7feb70049460: 0x559ee8ffdea0) at /usr/include/c++/9/bits/invoke.h:73
#20 0x0000559ee61c51d2 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
    @0x7feb70049468: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x559ee61be166 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>)
    at /usr/include/c++/9/bits/invoke.h:95
#21 0x0000559ee61c50f1 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7feb70049458)
    at /usr/include/c++/9/thread:244
#22 0x0000559ee61c508c in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x7feb70049458) at /usr/include/c++/9/thread:251
#23 0x0000559ee61c506c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x7feb70049450)
    at /usr/include/c++/9/thread:195
#24 0x00007feba6261d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x00007feba637d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#26 0x00007feba5f51293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
pluto:/data/Results/1624552691/TBR-1133/dev/shm/vardir/1624552691/143/1/rr
 
RQG
===
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=240 \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
--duration=300 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--innodb_adaptive_hash_index=on \
--mysqld=--log-bin \
--mysqld=--loose-max-statement-time=30 \
--threads=9 \
--mysqld=--innodb-use-native-aio=0 \
--rr=Extended \
--rr_options="--wait" \
--mysqld=--innodb_page_size=64K \
--mysqld=--innodb-buffer-pool-size=24M \
--no_mask \
.... <some local settings> 
--script_debug=_nix_
 
[ 'TBR-1133-MDEV-26017', 'mysqld: .{1,200}buf0flu.cc:.{1,10}: void buf_pool_t::insert_into_flush_list\(buf_block_t\*, lsn_t\): Assertion \`stat.flush_list_bytes <= curr_pool_size\' failed.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],



 Comments   
Comment by Matthias Leich [ 2021-06-25 ]

The automatic RQG test simplifier is currently running.

Comment by Marko Mäkelä [ 2021-06-26 ]

I believe that this problem was caused by MDEV-25113. Since that change, the function buf_flush_relocate_on_flush_list() may have to evict clean blocks (lsn == 1) from buf_pool.flush_list. In the execution trace, I noticed that UT_LIST_REMOVE(buf_pool.flush_list, bpage) in that function was invoked 40 times before the crash, and for 39 of those invocations, dpage was not added back to buf_pool.flush_list. Whenever we remove a block from buf_pool.flush_list, we must subtract the block size from buf_pool.stat.flush_list_bytes; otherwise the counter keeps growing until it exceeds curr_pool_size and the assertion fires.

This problem only affects ROW_FORMAT=COMPRESSED tables or buffer pool resizing.

Comment by Marko Mäkelä [ 2021-07-03 ]

The test innodb.innodb_buffer_pool_resize could still occasionally fail on a ROW_FORMAT=COMPRESSED table. Fixed.

Generated at Thu Feb 08 09:42:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.