[MDEV-31899] Server crashes when thread_stack is set to the higher value Created: 2023-08-11  Updated: 2023-11-03

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Ramesh Sivaraman Assignee: Ramesh Sivaraman
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File full_bt.log    

 Description   

# mysqld options required for replay: --thread-stack=1125899906842624
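For scale, the replay value is exactly 2^50 bytes (1 PiB) requested per thread stack, far more memory than a typical test machine has. A quick check in a POSIX shell, assuming the shell uses 64-bit arithmetic:

```shell
#!/bin/sh
# Value passed with --thread-stack in the replay options above.
thread_stack=1125899906842624

# The same number computed as a bit shift and as 1024^5 (one pebibyte).
pow2_50=$((1 << 50))
one_pib=$((1024 * 1024 * 1024 * 1024 * 1024))

echo "thread_stack: $thread_stack"
echo "2^50:         $pow2_50"
echo "1 PiB:        $one_pib bytes"
```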

Leads to

10.6.15 0be4781428a4044b13b085965820a995652bb0e9 (Debug)

mariadbd: /test/10.6_dbg/storage/innobase/include/fsp0sysspace.h:283: bool is_predefined_tablespace(ulint): Assertion `srv_sys_space.space_id() == TRX_SYS_SPACE' failed.

10.6.15 0be4781428a4044b13b085965820a995652bb0e9 (Debug)

Core was generated by `/test/MD100823-mariadb-10.6.15-linux-x86_64-dbg/bin/mariadbd --no-defaults --th'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x14e42a82f700 (LWP 3999194))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000014e441ebd859 in __GI_abort () at abort.c:79
#2  0x000014e441ebd729 in __assert_fail_base (fmt=0x14e442053588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x563be16ddab0 "srv_sys_space.space_id() == TRX_SYS_SPACE", file=0x563be16ddae0 "/test/10.6_dbg/storage/innobase/include/fsp0sysspace.h", line=283, function=<optimized out>) at assert.c:92
#3  0x000014e441ecefd6 in __GI___assert_fail (assertion=assertion@entry=0x563be16ddab0 "srv_sys_space.space_id() == TRX_SYS_SPACE", file=file@entry=0x563be16ddae0 "/test/10.6_dbg/storage/innobase/include/fsp0sysspace.h", line=line@entry=283, function=function@entry=0x563be16ddb18 "bool is_predefined_tablespace(ulint)") at assert.c:101
#4  0x0000563be0faa79b in is_predefined_tablespace (id=2) at /test/10.6_dbg/storage/innobase/include/fsp0space.h:117
#5  mtr_t::do_write (this=this@entry=0x14e42a82e670) at /test/10.6_dbg/storage/innobase/mtr/mtr0mtr.cc:829
#6  0x0000563be0fad784 in mtr_t::commit (this=this@entry=0x14e42a82e670) at /test/10.6_dbg/storage/innobase/mtr/mtr0mtr.cc:147
#7  0x0000563be107e0b1 in row_purge_reset_trx_id (node=node@entry=0x563be350d318, mtr=mtr@entry=0x14e42a82e670) at /test/10.6_dbg/storage/innobase/row/row0purge.cc:741
#8  0x0000563be107ff4b in row_purge_record_func (node=node@entry=0x563be350d318, undo_rec=undo_rec@entry=0x563be351bf08 "", thr=thr@entry=0x563be350d278, updated_extern=<optimized out>) at /test/10.6_dbg/storage/innobase/row/row0purge.cc:1232
#9  0x0000563be10822df in row_purge (thr=<optimized out>, undo_rec=<optimized out>, node=<optimized out>) at /test/10.6_dbg/storage/innobase/row/row0purge.cc:1276
#10 row_purge_step (thr=thr@entry=0x563be350d278) at /test/10.6_dbg/storage/innobase/row/row0purge.cc:1339
#11 0x0000563be0ffc3e7 in que_thr_step (thr=0x563be350d278) at /test/10.6_dbg/storage/innobase/que/que0que.cc:588
#12 que_run_threads_low (thr=0x563be350d278) at /test/10.6_dbg/storage/innobase/que/que0que.cc:644
#13 que_run_threads (thr=thr@entry=0x563be350d278) at /test/10.6_dbg/storage/innobase/que/que0que.cc:664
#14 0x0000563be10cc40d in srv_task_execute () at /test/10.6_dbg/storage/innobase/srv/srv0srv.cc:1598
#15 purge_worker_callback () at /test/10.6_dbg/storage/innobase/srv/srv0srv.cc:1853
#16 0x0000563be12aee34 in tpool::task_group::execute (this=0x563be2604bc0 <purge_task_group>, t=t@entry=0x563be25d1de0 <purge_worker_task>) at /test/10.6_dbg/tpool/task_group.cc:55
#17 0x0000563be12aeebd in tpool::task::execute (this=0x563be25d1de0 <purge_worker_task>) at /test/10.6_dbg/tpool/task.cc:32
#18 0x0000563be12ace7b in tpool::thread_pool_generic::worker_main (this=0x563be347cae0, thread_var=0x563be347cf10) at /test/10.6_dbg/tpool/tpool_generic.cc:580
#19 0x0000563be12ae0c8 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:89
#20 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#21 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#22 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#23 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#24 0x000014e4422b4de4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x000014e4423ce609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#26 0x000014e441fba133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Full backtrace attached (full_bt.log).



 Comments   
Comment by Marko Mäkelä [ 2023-08-11 ]

I think that this must be caused by a rogue write to srv_sys_space.m_space_id somewhere. That field is supposed to always be 0.

Please provide detailed output of a GDB session where this can be reproduced. I think that we need the following GDB commands:

watch -l srv_sys_space.m_space_id
run

Each time the watchpoint is hit, execute backtrace and continue.
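The session described above can be scripted so that GDB prints the backtrace and continues by itself on every hit. This is a minimal sketch, assuming GDB is installed and the binary has debug symbols; the command-file path is an arbitrary choice for illustration:

```shell
#!/bin/sh
# Where to write the GDB command file (arbitrary, illustrative path).
gdb_script="${TMPDIR:-/tmp}/mdev31899-watch.gdb"

# Watch the location of srv_sys_space.m_space_id. On every write, GDB
# prints a backtrace and resumes. If the server later aborts, the
# trailing backtrace shows where the assertion fired.
cat > "$gdb_script" <<'EOF'
set pagination off
watch -l srv_sys_space.m_space_id
commands
  backtrace
  continue
end
run
backtrace
EOF

echo "GDB command file written to $gdb_script"
```

It could then be used as `gdb -batch -x "$gdb_script" --args /path/to/mariadbd --no-defaults --thread-stack=1125899906842624 ... 2>&1 | tee gdb-watch.log`, where `/path/to/mariadbd` and `...` are placeholders for the actual binary and the remaining options of the reproducing invocation.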

Comment by Marko Mäkelä [ 2023-08-11 ]

Can you provide a self-contained test case, including the full build and invocation parameters and a copy of a data directory?

Comment by Marko Mäkelä [ 2023-08-11 ]

I tried to invoke the provided executable. I can’t reproduce the described failure under GDB or under rr record. Under rr record, I was able to reproduce some memory leaks, which I believe are caused by MDEV-31886.

The following seems to reproduce the reported assertion failure, but I would have to analyze it inside a debugger.

rm -fr /dev/shm/marko
/data/ramesh/10.6/mariadb-10.6.15-linux-x86_64/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=/data/ramesh/10.6/mariadb-10.6.15-linux-x86_64 --datadir=/dev/shm/marko
/data/ramesh/10.6/mariadb-10.6.15-linux-x86_64/bin/mariadbd  --no-defaults --thread-stack=1125899906842624 --basedir=/data/ramesh/10.6/mariadb-10.6.15-linux-x86_64 --tmpdir=/data/ramesh/10.6/mariadb-10.6.15-linux-x86_64/data --datadir=/dev/shm/marko --socket /dev/shm/marko/mariadb.sock
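When running the commands above without a debugger, the shell's exit status is a quick way to tell whether mariadbd died from the assertion. Shells such as bash and dash report death by signal N as 128+N, and the backtrace shows SIGABRT arriving as signal 6, so an assertion failure should yield 134. The helper name describe_status is hypothetical, a sketch rather than part of any MariaDB tooling:

```shell
#!/bin/sh
# Hypothetical helper: describe an exit status as reported in $?.
# Assumes the bash/dash convention that death by signal N yields 128+N.
describe_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "clean exit"
  elif [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    if [ "$sig" -eq 6 ]; then
      echo "killed by signal 6 (SIGABRT, e.g. a failed assertion)"
    else
      echo "killed by signal $sig"
    fi
  else
    echo "exited with status $status"
  fi
}
```

For example, `describe_status $?` can be run immediately after the mariadbd invocation returns, before any other command overwrites `$?`.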

Generated at Thu Feb 08 10:27:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.