The test encryption.innochecksum very often hangs, timing out in 900 seconds (15 minutes). I found it while working on MDEV-29445, but I see that the test fails also in other branches. The following (refining something that was modified in MDEV-33053) would fix the hang for me:
While the test encryption.innochecksum occasionally hangs on 10.5 as well, it cannot be caused by MDEV-26827, which is 10.6 only.
Marko Mäkelä
added a comment - While the test encryption.innochecksum occasionally hangs on 10.5 as well, it cannot be caused by MDEV-26827 , which is 10.6 only.
I am done with the review and have given my comments. I would need to re-check once after the comments are addressed.
It would be good if the stack trace for the hang is updated here.
Debarun Banerjee
added a comment - I am done with the review and have given my comments. I would need to re-check once after the comments are addressed.
It would be good if the stack trace for the hang is updated here.
#8 0x000002aa044fb59a in row_purge_reposition_pcur (mode=mode@entry=BTR_MODIFY_LEAF, node=node@entry=0x2aa06c2aaf8, mtr=mtr@entry=0x3ff2b7fd358) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:83
No locals.
#9 0x000002aa044fbd30 in row_purge_reset_trx_id (node=node@entry=0x2aa06c2aaf8, mtr=mtr@entry=0x3ff2b7fd358) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:604
No locals.
#10 0x000002aa044fcc9c in row_purge_record_func (node=node@entry=0x2aa06c2aaf8, undo_rec=0x3ff2b7fd378 "", undo_rec@entry=0x3ff8097d078 "", thr=thr@entry=0x3ff2b7fda08, updated_extern=<optimized out>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:1094
#13 0x000002aa044c5292 in que_thr_step (thr=0x2aa06c2aa58) at /home/buildbot/s390x-rhel-9/build/storage/innobase/que/que0que.cc:588
node = 0x2aa06c2aaf8
old_thr = 0x2aa06c2aa58
trx = 0x3ff80d17200
type = <optimized out>
node = <optimized out>
old_thr = <optimized out>
trx = <optimized out>
type = <optimized out>
#14 que_run_threads_low (thr=<optimized out>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/que/que0que.cc:644
next_thr = <optimized out>
trx = <optimized out>
#15 que_run_threads (thr=0x2aa06c2aa58) at /home/buildbot/s390x-rhel-9/build/storage/innobase/que/que0que.cc:664
loop = <optimized out>
trx = 0x3ff80d17200
#16 0x000002aa03df2dd0 in trx_purge (n_tasks=n_tasks@entry=4, history_size=<optimized out>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/trx/trx0purge.cc:1484
no_throttle = <optimized out>
thd = 0x2aa06fc52a8
head = {trx_no = <optimized out>, undo_no = <optimized out>}
n_pages = <optimized out>
thr = <optimized out>
#17 0x000002aa0451c67c in purge_coordinator_state::do_purge (this=this@entry=0x2aa05980320 <purge_state>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1425
n_pages_handled = <optimized out>
n_threads = <optimized out>
#18 0x000002aa0451c29a in purge_coordinator_callback () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1503
ctx = 0x3ff1c000b60
thd = 0x2aa06fc52a8
#19 0x000002aa045ff348 in tpool::task_group::execute (this=0x2aa05980628 <purge_coordinator_task_group>, t=0x2aa05980338 <purge_coordinator_task>) at /home/buildbot/s390x-rhel-9/build/tpool/task_group.cc:73
#20 0x000002aa045fd904 in tpool::thread_pool_generic::worker_main (this=0x2aa06a5ece0, thread_var=0x2aa06a5fe20) at /home/buildbot/s390x-rhel-9/build/tpool/tpool_generic.cc:583
task = 0x2aa05980338 <purge_coordinator_task>
#21 0x000003ff876f298e in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#22 0x000003ff8731e2e6 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#23 0x000003ff872afdbe in thread_start () from /lib64/libc.so.6
No symbol table info available.
…
Thread 1 (Thread 0x3ff881ebca0 (LWP 31001)):
#0 0x000003ff87320126 in __pthread_kill_implementation () from /lib64/libc.so.6
No symbol table info available.
#1 0x000002aa0417fae6 in handle_fatal_signal (sig=<optimized out>) at /home/buildbot/s390x-rhel-9/build/sql/signal_handler.cc:357
#3 0x000003ff8731ab26 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
No symbol table info available.
#4 0x000003ff8731d774 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#5 0x000003ff876eb8f8 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
No symbol table info available.
#6 0x000002aa045ff682 in tpool::waitable_task::wait (this=this@entry=0x2aa05980338 <purge_coordinator_task>, lk=@0x3ffd1776330: {_M_device = 0x2aa05980358 <purge_coordinator_task+32>, _M_owns = true}) at /home/buildbot/s390x-rhel-9/build/tpool/task.cc:63
No locals.
#7 0x000002aa045ff794 in tpool::waitable_task::disable (this=this@entry=0x2aa05980338 <purge_coordinator_task>) at /home/buildbot/s390x-rhel-9/build/tpool/task.cc:80
#9 srv_purge_shutdown () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1580
history_size = 6
#10 0x000002aa0451d13c in innodb_preshutdown () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0start.cc:1993
first_time = false
#11 innodb_preshutdown () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0start.cc:1979
first_time = false
#12 0x000002aa0417fe42 in plugin_pre_shutdown (plugin=<optimized out>) at /home/buildbot/s390x-rhel-9/build/sql/handler.cc:1113
hton = <optimized out>
#13 0x000002aa03f1f624 in plugin_foreach_with_mask (thd=thd@entry=0x0, func=func@entry=0x2aa0417fe20 <plugin_pre_shutdown(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=10, arg=arg@entry=0x0) at /home/buildbot/s390x-rhel-9/build/sql/sql_plugin.cc:2559
idx = 8
total = 1146
plugin = <optimized out>
plugins = 0x3ffd17765e0
res = 0 '\000'
#14 0x000002aa04183cac in ha_pre_shutdown () at /home/buildbot/s390x-rhel-9/build/sql/handler.cc:1120
No locals.
#15 0x000002aa03e0add8 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/buildbot/s390x-rhel-9/build/sql/mysqld.cc:6001
please_close_stdin = <optimized out>
ho_error = <optimized out>
new_thread_stack_size = <optimized out>
user = <optimized out>
#16 0x000003ff872b08b2 in __libc_start_call_main () from /lib64/libc.so.6
No symbol table info available.
#17 0x000003ff872b0990 in __libc_start_main_impl () from /lib64/libc.so.6
No symbol table info available.
#18 0x000002aa03dfd1c0 in _start () at /home/buildbot/s390x-rhel-9/build/sql/my_decimal.h:147
No symbol table info available.
I only included InnoDB threads above. We can see that both buf_LRU_get_free_block() and buf_flush_page_cleaner() are in an indefinite (untimed) wait. The caller of buf_LRU_get_free_block() is a purge worker task, which Thread 1 is waiting for. At one point during the development of MDEV-29445 I was able to reproduce this scenario deterministically on my local system.
Marko Mäkelä
added a comment - Here is an example of a hang :
bb-11.5-bar-MDEV-25829 1e0e0e27b862fc0907f9c9599e467edfc4c9b3c4
encryption.innochecksum '8k,ctr,innodb,release,strict_full_crc32' w6 [ fail ] Found warnings/errors in server log file!
Test ended at 2024-03-14 01:04:32
line
Attempting backtrace. You can use the following information to find out
^ Found warnings in /home/buildbot/s390x-rhel-9/build/mysql-test/var/6/log/mysqld.1.err
…
Thread 6 (Thread 0x3ff7bfff840 (LWP 31032)):
#0 0x000003ff8731ab26 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
No symbol table info available.
#1 0x000003ff8731d774 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#2 0x000002aa04576862 in buf_flush_page_cleaner () at /home/buildbot/s390x-rhel-9/build/storage/innobase/buf/buf0flu.cc:2460
soft_lsn_limit = <optimized out>
oldest_lsn = <optimized out>
pct_lwm = <optimized out>
n_flushed = <optimized out>
n = <optimized out>
dirty_blocks = <optimized out>
dirty_pct = <optimized out>
last_pages = <optimized out>
abstime = {tv_sec = 1710378205, tv_nsec = 429323000}
lsn_limit = 0
last_activity_count = <optimized out>
#3 0x000003ff876f298e in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#4 0x000003ff8731e2e6 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#5 0x000003ff872afdbe in thread_start () from /lib64/libc.so.6
No symbol table info available.
…
Thread 3 (Thread 0x3ff2b7fe840 (LWP 31199)):
#0 0x000003ff8731ab26 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
No symbol table info available.
#1 0x000003ff8731d774 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#2 0x000002aa0457a1aa in buf_LRU_get_free_block (get=get@entry=have_mutex) at /home/buildbot/s390x-rhel-9/build/storage/innobase/buf/buf0lru.cc:481
wake = true
n_flush = 0
retry = <optimized out>
n_iterations = 0
flush_failures = 0
block = <optimized out>
#3 0x000002aa0457af94 in buf_read_page (page_id=page_id@entry={m_id = 51539607582}, zip_size=zip_size@entry=0, chain=@0x2aa06b52a10: {first = 0x0}) at /home/buildbot/s390x-rhel-9/build/storage/innobase/buf/buf0rea.cc:429
space = 0x3ff541c7068
block = 0x0
err = <optimized out>
#4 0x000002aa03df85f2 in buf_page_get_gen (page_id=page_id@entry={m_id = 51539607582}, zip_size=zip_size@entry=0, rw_latch=4396028032552, rw_latch@entry=729795592, guess=0x0, guess@entry=0x3ff2b7fcb68, mode=4395911718976, mode@entry=10, mtr=0x3ff2b7fd358, err=0x3ff2b7fcb0c) at /home/buildbot/s390x-rhel-9/build/storage/innobase/buf/buf0buf.cc:2677
local_err = <optimized out>
loop = <optimized out>
retries = <optimized out>
chain = @0x2aa0707af60: {first = 0x38800000001}
hash_lock = @0x2aa04fd91c0: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 26054432}, <No data fields>}}}
block = 0x0
state = <optimized out>
#5 0x000002aa04555d66 in btr_cur_t::search_leaf (this=this@entry=0x2aa06c2ab80, tuple=tuple@entry=0x3ff1c028368, mode=mode@entry=PAGE_CUR_LE, latch_mode=729795360, mtr=0xaa, mtr@entry=0x3ff2b7fd358) at /home/buildbot/s390x-rhel-9/build/storage/innobase/btr/btr0cur.cc:1142
same_key_root = <optimized out>
search_loop = <optimized out>
guess = 0x3ff2b7fcb68
lock_intention = <optimized out>
detected_same_key_root = true
heap = 0x0
offsets_ = {300, 2, 32774, 4, 8, 2, 1718, 59160, 0, 12, 0, 0, 0, 1023, 32919, 49152, 0, 0, 0, 8192, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 34, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1023, 34722, 46448, 0, 1023, 11135, 51968, 240, 29806, 60917, 2424, 0, 0, 0, 8162, 0, 1023, 34610, 56812, 0, 1023, 11135, 52000, 0, 0, 32, 30286, 0, 0, 0, 170, 0, 1023, 11135, 52232, 0, 1023, 34811, 45312, 0, 1023, 11135, 52072, 0, 682, 1277, 37312, 0, 1023, 11135, 53252, 0, 682, 1087, 27786, 0, 1023, 11135, 52072, 0, 0, 0, 34, 0, 1023, 7168, 2256, 0, 0, 256, 64562, 0, 0, 0, 5, 0, 0, 0, 22, 0, 1023, 11135, 52600, 0, 1023, 11135, 55241, 0, 1023, 11135, 55256, 0 <repeats 11 times>, 1, 0, 682, 1730, 43849, 0, 1023, 11135, 52176, 57034, 19424, 30044, 27368, 0, 1023, 11135, 52896, 0, 1023, 11135, 52888, 0, 1023, 11135, 52896, 0, 0, 0, 22, 0, 1023, 11135, 52600, 0, 1023, 11135, 52264, 0, 682, 1277, 37312, 0, 0, 0, 14, 0, 682, 986, 63592, 0, 1023, 11135, 52264, 0, 682, 1277, 37312, 0...}
offsets = 0x20764e
offsets2_ = {300, 0, 0, 0, 0, 11263, 44297, 21140, 0, 0, 0, 34, 0, 0, 0, 16, 0, 1023, 32919, 49152, 0, 1023, 11135, 52552, 0, 682, 1277, 37312, 0, 682, 1718, 59160, 0, 682, 1110, 33984, 0, 1023, 11135, 52552, 0, 0, 0, 8192, 0, 682, 1284, 14896, 0, 0, 0, 8192, 0, 1023, 11135, 52848, 0, 1023, 0, 0, 0, 682, 1718, 59496, 0, 1023, 11135, 59456, 0, 0, 0, 0, 0, 1, 0, 57, 0, 1, 0, 57, 0, 1, 0, 57, 0, 682, 1718, 60024, 0, 1023, 11135, 52688, 0, 682, 1277, 37312, 0, 0, 0, 0, 0, 0, 0, 107, 0, 1023, 32913, 36352, 0, 0, 0, 8192, 0, 1023, 11135, 52912, 0, 0, 0, 0, 0, 1023, 11135, 52752, 0, 7, 0, 6, 0, 1023, 33532, 30208, 0, 0, 0, 0, 0, 682, 1630, 21248, 0, 1023, 34440, 4608, 0, 0, 0, 0, 0, 1023, 11135, 52928, 0, 1023, 11135, 52928, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1023, 11135, 52896, 0, 1023, 11135, 52896, 45057, 3072, 11823, 29797, 57034, 19424, 45691, 62183, 25600, 1023, 11135, 55816, 0, 682, 1284, 2324, 0 <repeats 13 times>, 1023, 32768...}
offsets2 = 0x22
latch_by_caller = <optimized out>
info = <optimized out>
ahi_enabled = <optimized out>
savepoint = 0
node_ptr_max_size = 4396027131248
compress_limit = 0
rw_latch = (RW_NO_LATCH | unknown: 0x2b7fcc00)
err = DB_SUCCESS
zip_size = <optimized out>
page_id = {m_id = 51539607582}
page_mode = 3992258936
height = 2929251422656
block_savepoint = 2
block = <optimized out>
not_first_access = <optimized out>
page_level = <optimized out>
#6 0x000002aa04500e18 in btr_pcur_open (mtr=0x3ff2b7fd358, cursor=0x2aa06c2ab80, latch_mode=<optimized out>, mode=PAGE_CUR_LE, tuple=0x3ff1c028368) at /home/buildbot/s390x-rhel-9/build/storage/innobase/include/btr0pcur.h:431
No locals.
#7 row_search_on_row_ref (pcur=pcur@entry=0x2aa06c2ab80, mode=<optimized out>, table=<optimized out>, ref=0x3ff1c028368, mtr=mtr@entry=0x3ff2b7fd358) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0row.cc:1211
index = 0x3ff541c7290
#8 0x000002aa044fb59a in row_purge_reposition_pcur (mode=mode@entry=BTR_MODIFY_LEAF, node=node@entry=0x2aa06c2aaf8, mtr=mtr@entry=0x3ff2b7fd358) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:83
No locals.
#9 0x000002aa044fbd30 in row_purge_reset_trx_id (node=node@entry=0x2aa06c2aaf8, mtr=mtr@entry=0x3ff2b7fd358) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:604
No locals.
#10 0x000002aa044fcc9c in row_purge_record_func (node=node@entry=0x2aa06c2aaf8, undo_rec=0x3ff2b7fd378 "", undo_rec@entry=0x3ff8097d078 "", thr=thr@entry=0x3ff2b7fda08, updated_extern=<optimized out>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:1094
mtr = {m_last = 0x0, m_last_offset = 0, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_latch_ex = 0, m_trim_pages = 0, m_crc = 2067013353, m_memo = {<small_vector_base> = {BeginX = 0x3ff2b7fd378, Size = 2, Capacity = 16}, small = {{object = 0x3ff541c7390, type = MTR_MEMO_S_LOCK}, {object = 0x3ff8081a160, type = MTR_MEMO_PAGE_S_FIX}, {object = 0x3ff80a1a5c0, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x2aa04fd91c0, type = 0}, {object = 0x2aa0449d592 <mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long)+210>, type = 1023}, {object = 0x6, type = 1023}, {object = 0x15df, type = 682}, {object = 0x3ff2b7fda08, type = 0}, {object = 0x6, type = 1023}, {object = 0x1, type = 15758446}, {object = 0x3ff2b7fd5b7, type = 0}, {object = 0x3ff2b7fd448, type = 0}, {object = 0x3ff2b7fe840, type = 682}, {object = 0x3ff2b7fd9f8, type = 682}, {object = 0x3ff2b7fd400, type = 682}, {object = 0x10000003c, type = 682}}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x3ff2b7fd4a0, prev = 0x3ff2b7fd4a0}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x3ff2b7fd480, prev = 0x3ff2b7fd480}, m_data = "F\f\035\212E\006\000\262\000\200\303\000\006\000\000<", '\000' <repeats 15 times>, "\002\000\000\003\377+\177\324\340\000\000\000\000\000\000\000\020\000\000\003\377\200a\210`\000\000\000\001\000\000\000\002\000\000\003\377+\177test\000\377+\177\324\320\000\000\000\000\000\000\000\000\000\000\002\252\006\245\354\340\000\000\000\000\000\000\000\001\336\312K\340u]\326\350\000\000\000\000\000\000\001\347\000\000\000\000\000#\342\333\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\000\336\312K\340\034\243Z\305\000\000\002\252\006e\377H\000\000\000\000\000\000\000\200\000\000\002\252\006\245\356\000\001\312K\340\034\243F\305\000\000\000\000\000\000\000\000\000\000\003\377+\177\325\220"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}
purged = true
#11 0x000002aa044fd2da in row_purge (thr=<optimized out>, undo_rec=<optimized out>, node=0x2aa06c2aaf8) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:1138
purged = <optimized out>
updated_extern = false
#12 row_purge_step (thr=thr@entry=0x2aa06c2aa58) at /home/buildbot/s390x-rhel-9/build/storage/innobase/row/row0purge.cc:1201
purge_rec = {undo_rec = <optimized out>, roll_ptr = <optimized out>}
node = 0x2aa06c2aaf8
#13 0x000002aa044c5292 in que_thr_step (thr=0x2aa06c2aa58) at /home/buildbot/s390x-rhel-9/build/storage/innobase/que/que0que.cc:588
node = 0x2aa06c2aaf8
old_thr = 0x2aa06c2aa58
trx = 0x3ff80d17200
type = <optimized out>
node = <optimized out>
old_thr = <optimized out>
trx = <optimized out>
type = <optimized out>
#14 que_run_threads_low (thr=<optimized out>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/que/que0que.cc:644
next_thr = <optimized out>
trx = <optimized out>
#15 que_run_threads (thr=0x2aa06c2aa58) at /home/buildbot/s390x-rhel-9/build/storage/innobase/que/que0que.cc:664
loop = <optimized out>
trx = 0x3ff80d17200
#16 0x000002aa03df2dd0 in trx_purge (n_tasks=n_tasks@entry=4, history_size=<optimized out>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/trx/trx0purge.cc:1484
no_throttle = <optimized out>
thd = 0x2aa06fc52a8
head = {trx_no = <optimized out>, undo_no = <optimized out>}
n_pages = <optimized out>
thr = <optimized out>
#17 0x000002aa0451c67c in purge_coordinator_state::do_purge (this=this@entry=0x2aa05980320 <purge_state>) at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1425
n_pages_handled = <optimized out>
n_threads = <optimized out>
#18 0x000002aa0451c29a in purge_coordinator_callback () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1503
ctx = 0x3ff1c000b60
thd = 0x2aa06fc52a8
#19 0x000002aa045ff348 in tpool::task_group::execute (this=0x2aa05980628 <purge_coordinator_task_group>, t=0x2aa05980338 <purge_coordinator_task>) at /home/buildbot/s390x-rhel-9/build/tpool/task_group.cc:73
lk = {_M_device = <optimized out>, _M_owns = false}
#20 0x000002aa045fd904 in tpool::thread_pool_generic::worker_main (this=0x2aa06a5ece0, thread_var=0x2aa06a5fe20) at /home/buildbot/s390x-rhel-9/build/tpool/tpool_generic.cc:583
task = 0x2aa05980338 <purge_coordinator_task>
#21 0x000003ff876f298e in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#22 0x000003ff8731e2e6 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#23 0x000003ff872afdbe in thread_start () from /lib64/libc.so.6
No symbol table info available.
…
Thread 1 (Thread 0x3ff881ebca0 (LWP 31001)):
#0 0x000003ff87320126 in __pthread_kill_implementation () from /lib64/libc.so.6
No symbol table info available.
#1 0x000002aa0417fae6 in handle_fatal_signal (sig=<optimized out>) at /home/buildbot/s390x-rhel-9/build/sql/signal_handler.cc:357
curr_time = 1710378264
tm = {tm_sec = 24, tm_min = 4, tm_hour = 1, tm_mday = 14, tm_mon = 2, tm_year = 124, tm_wday = 4, tm_yday = 73, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x2aa066b4ef0 "UTC"}
thd = 0x0
print_invalid_query_pointer = 89
#2 <signal handler called>
No symbol table info available.
#3 0x000003ff8731ab26 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
No symbol table info available.
#4 0x000003ff8731d774 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
No symbol table info available.
#5 0x000003ff876eb8f8 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
No symbol table info available.
#6 0x000002aa045ff682 in tpool::waitable_task::wait (this=this@entry=0x2aa05980338 <purge_coordinator_task>, lk=@0x3ffd1776330: {_M_device = 0x2aa05980358 <purge_coordinator_task+32>, _M_owns = true}) at /home/buildbot/s390x-rhel-9/build/tpool/task.cc:63
No locals.
#7 0x000002aa045ff794 in tpool::waitable_task::disable (this=this@entry=0x2aa05980338 <purge_coordinator_task>) at /home/buildbot/s390x-rhel-9/build/tpool/task.cc:80
lk = {_M_device = 0x2aa05980358 <purge_coordinator_task+32>, _M_owns = true}
#8 0x000002aa0451bfb2 in srv_shutdown_purge_tasks () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1515
lk = {_M_device = 0xdeca4be0719c7ee8, _M_owns = false}
lk = {_M_device = <optimized out>, _M_owns = <optimized out>}
#9 srv_purge_shutdown () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0srv.cc:1580
history_size = 6
#10 0x000002aa0451d13c in innodb_preshutdown () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0start.cc:1993
first_time = false
#11 innodb_preshutdown () at /home/buildbot/s390x-rhel-9/build/storage/innobase/srv/srv0start.cc:1979
first_time = false
#12 0x000002aa0417fe42 in plugin_pre_shutdown (plugin=<optimized out>) at /home/buildbot/s390x-rhel-9/build/sql/handler.cc:1113
hton = <optimized out>
#13 0x000002aa03f1f624 in plugin_foreach_with_mask (thd=thd@entry=0x0, func=func@entry=0x2aa0417fe20 <plugin_pre_shutdown(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=10, arg=arg@entry=0x0) at /home/buildbot/s390x-rhel-9/build/sql/sql_plugin.cc:2559
idx = 8
total = 1146
plugin = <optimized out>
plugins = 0x3ffd17765e0
res = 0 '\000'
#14 0x000002aa04183cac in ha_pre_shutdown () at /home/buildbot/s390x-rhel-9/build/sql/handler.cc:1120
No locals.
#15 0x000002aa03e0add8 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/buildbot/s390x-rhel-9/build/sql/mysqld.cc:6001
please_close_stdin = <optimized out>
ho_error = <optimized out>
new_thread_stack_size = <optimized out>
user = <optimized out>
#16 0x000003ff872b08b2 in __libc_start_call_main () from /lib64/libc.so.6
No symbol table info available.
#17 0x000003ff872b0990 in __libc_start_main_impl () from /lib64/libc.so.6
No symbol table info available.
#18 0x000002aa03dfd1c0 in _start () at /home/buildbot/s390x-rhel-9/build/sql/my_decimal.h:147
No symbol table info available.
I only included InnoDB threads above. We can see that both buf_LRU_get_free_block() and buf_flush_page_cleaner() are in an indefinite (untimed) wait. The caller of buf_LRU_get_free_block() is a purge worker task, which Thread 1 is waiting for. At one point during the development of MDEV-29445 I was able to reproduce this scenario deterministically on my local system.
debarun made a valid point that we’d better use a timed wait in buf_LRU_get_free_block(), so that we can test and set buf_pool.LRU_warned and possibly display a warning message if the buf_flush_page_cleaner() thread appears to be hung. I do not currently think that the other review comments mandate any code changes.
Marko Mäkelä
added a comment - debarun made a valid point that we’d better use a timed wait in buf_LRU_get_free_block() , so that we can test and set buf_pool.LRU_warned and possibly display a warning message if the buf_flush_page_cleaner() thread appears to be hung. I do not currently think that the other review comments mandate any code changes.
By design, InnoDB has always hung when permanently running out of buffer pool, for example when several threads are waiting to allocate a block, and all of the buffer pool is buffer-fixed by the active threads. This hang here occurs when the buffer pool is only temporarily running out and the situation can be rescued by writing out some dirty pages or evicting some clean pages.
Marko Mäkelä
added a comment - By design, InnoDB has always hung when permanently running out of buffer pool, for example when several threads are waiting to allocate a block, and all of the buffer pool is buffer-fixed by the active threads. This hang here occurs when the buffer pool is only temporarily running out and the situation can be rescued by writing out some dirty pages or evicting some clean pages.
The parameter innodb_lru_flush_size became useless as part of this fix. debarun, maybe we should file a follow-up, make that parameter a data member of buf_pool and consider a patch like this:
Marko Mäkelä
added a comment - The parameter innodb_lru_flush_size became useless as part of this fix. debarun , maybe we should file a follow-up, make that parameter a data member of buf_pool and consider a patch like this:
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 46129a037b0..4f1b40e309c 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2054,7 +2054,7 @@ static void buf_flush_sync_for_checkpoint(lsn_t lsn)
/* We intend to only evict pages keeping maximum flush bandwidth for
flush list pages advancing checkpoint. However, if the LRU tail is full
of dirty pages, we might need some flushing. */
- std::ignore= buf_flush_LRU(srv_io_capacity);
+ std::ignore= buf_flush_LRU(innodb_lru_flush_size);
mysql_mutex_unlock(&buf_pool.mutex);
mysql_mutex_lock(&buf_pool.flush_list_mutex);
People
Marko Mäkelä
Marko Mäkelä
Votes:
1Vote for this issue
Watchers:
8Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
While the test encryption.innochecksum occasionally hangs on 10.5 as well, it cannot be caused by
MDEV-26827, which is 10.6 only.