[MDEV-21903] FTS optimize thread aborts during shutdown Created: 2020-03-10  Updated: 2020-03-13  Resolved: 2020-03-13

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB
Affects Version/s: 10.5.2
Fix Version/s: 10.5.2

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File backtrace.txt    
Issue Links:
Problem/Incident
is caused by MDEV-16264 Implement a common work queue for Inn... Closed

 Description   

origin/10.5 b753ac066bc26acda9deb707a31c112f1bbf9ec2 2020-03-10T15:09:17+05:30
 
Server error log
...
2020-03-10 13:01:56 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
2020-03-10 13:01:56 0 [Note] Event Scheduler: Purging the queue. 0 events
2020-03-10 13:02:08 0 [Note] InnoDB: FTS optimize thread exiting.
200310 13:02:08 [ERROR] mysqld got signal 11 ;
...
stack_bottom = 0x0 thread_stack 0x49000
/home/mleich/10.5/bld_debug/sql/mysqld(my_print_stacktrace+0x40)[0x56258aecfac5]
 /home/mleich/10.5/bld_debug/sql/mysqld(handle_fatal_signal+0x3ae)[0x56258a5c88a1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fe4c18d5890]
/home/mleich/10.5/bld_debug/sql/mysqld(+0x13fa851)[0x56258add7851]
/home/mleich/10.5/bld_debug/sql/mysqld(+0x1401020)[0x56258adde020]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZN5tpool10task_group7executeEPNS_4taskE+0x9c)[0x56258ae32868]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZN5tpool4task7executeEv+0x30)[0x56258ae32c1e]
2020-03-10 13:02:10 0 [Note] InnoDB: Starting shutdown...
/home/mleich/10.5/bld_debug/sql/mysqld(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x98)[0x56258ae2c24e]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZSt13__invoke_implIvMN5tpool19thread_pool_genericEFvPNS0_11worker_dataEEPS1_JS3_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2_+0x80)[0x56258ae2f6bd]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZSt8__invokeIMN5tpool19thread_pool_genericEFvPNS0_11worker_dataEEJPS1_S3_EENSt15__invoke_resultIT_JDpT0_EE4typeEOS8_DpOS9_+0x57)[0x56258ae2e9a9]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZNSt6thread8_InvokerISt5tupleIJMN5tpool19thread_pool_genericEFvPNS2_11worker_dataEEPS3_S5_EEE9_M_invokeIJLm0ELm1ELm2EEEEDTcl8__invokespcl10_S_declvalIXT_EEEEESt12_Index_tupleIJXspT_EEE+0x5f)[0x56258ae3253d]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZNSt6thread8_InvokerISt5tupleIJMN5tpool19thread_pool_genericEFvPNS2_11worker_dataEEPS3_S5_EEEclEv+0x1d)[0x56258ae324d7]
/home/mleich/10.5/bld_debug/sql/mysqld(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJMN5tpool19thread_pool_genericEFvPNS3_11worker_dataEEPS4_S6_EEEEE6_M_runEv+0x1c)[0x56258ae324b6]
2020-03-10 13:02:10 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/vardir/1583844040/157/1/data/ib_buffer_pool
2020-03-10 13:02:10 0 [Note] InnoDB: Restricted to 1014 pages due to innodb_buf_pool_dump_pct=25
2020-03-10 13:02:10 0 [Note] InnoDB: Buffer pool(s) dump completed at 200310 13:02:10
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f)[0x7fe4c13f366f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fe4c18ca6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fe4c0ab088f]
 
Thread 1 (Thread 0x7fe2477f6700 (LWP 10117)):
#0  __pthread_kill (threadid=<optimized out>, signo=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000056258aecfbb7 in my_write_core (sig=11) at /home/mleich/10.5/mysys/stacktrace.c:518
#2  0x000056258a5c8c2b in handle_fatal_signal (sig=11) at /home/mleich/10.5/sql/signal_handler.cc:343
#3  <signal handler called>
#4  0x000056258add7851 in ib_vector_size (vec=0x0) at /home/mleich/10.5/storage/innobase/include/ut0vec.ic:118
#5  0x000056258adde020 in fts_optimize_callback () at /home/mleich/10.5/storage/innobase/fts/fts0opt.cc:2926
#6  0x000056258ae32868 in tpool::task_group::execute (this=0x56258c660e80 <_ZL10task_group>, t=0x56258c660f20 <task>) at /home/mleich/10.5/tpool/task_group.cc:55
#7  0x000056258ae32c1e in tpool::task::execute (this=0x56258c660f20 <task>) at /home/mleich/10.5/tpool/task.cc:47
#8  0x000056258ae2c24e in tpool::thread_pool_generic::worker_main (this=0x56258ca3ee40, thread_var=0x56258ca765d0) at /home/mleich/10.5/tpool/tpool_generic.cc:518
#9  0x000056258ae2f6bd in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x7fe36c65d678: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x56258ae2c1b6 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x7fe36c65d670: 0x56258ca3ee40, __args#0=@0x7fe36c65d668: 0x56258ca765d0) at /usr/include/c++/7/bits/invoke.h:73
#10 0x000056258ae2e9a9 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x7fe36c65d678: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x56258ae2c1b6 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __args#0=@0x7fe36c65d670: 0x56258ca3ee40, __args#1=@0x7fe36c65d668: 0x56258ca765d0) at /usr/include/c++/7/bits/invoke.h:95
#11 0x000056258ae3253d in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7fe36c65d668) at /usr/include/c++/7/thread:234
#12 0x000056258ae324d7 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x7fe36c65d668) at /usr/include/c++/7/thread:243
#13 0x000056258ae324b6 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x7fe36c65d660) at /usr/include/c++/7/thread:186
#14 0x00007fe4c13f366f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x00007fe4c18ca6db in start_thread (arg=0x7fe2477f6700) at pthread_create.c:463
#16 0x00007fe4c0ab088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
Some remarkable characteristic of the shutdown crashes observed is that "fts_optimize_callback ()" is usually around.
 
/home/mleich/RQG_mleich/rqg.pl \
--gendata=conf/engines/innodb/full_text_search.zz \
--short_column_names \
--grammar=conf/engines/innodb/full_text_search.yy \
--redefine=conf/mariadb/alter_table.yy \
--redefine=conf/mariadb/instant_add.yy \
--redefine=conf/mariadb/modules/alter_table_columns.yy \
--redefine=conf/mariadb/sp.yy \
--redefine=conf/mariadb/bulk_insert.yy \
--redefine=conf/mariadb/modules/foreign_keys.yy \
--redefine=conf/mariadb/modules/locks.yy \
--redefine=conf/mariadb/modules/sql_mode.yy \
--redefine=conf/mariadb/redefine_temporary_tables.yy \
--redefine=conf/mariadb/versioning.yy \
--redefine=conf/mariadb/sequences.yy \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--duration=300 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-max-statement-time=30 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--mysqld=--innodb-buffer-pool-size=256M \
--threads=33 \
--mysqld=--innodb_page_size=64K \
--no_mask \
... some setting being specific to the box used.
I will add more information soon.



 Comments   
Comment by Matthias Leich [ 2020-03-12 ]

Result of RQG testing on origin/bb-10.5-thiru commit d9761927fc9609d1fa2b40bd3dc7a7ff85e202b0
1. No replay of the bug
2. Compared to actual 10.5 no additional problems + no significant change in the frequency of known problems

Comment by Marko Mäkelä [ 2020-03-13 ]

The regression was introduced in MDEV-16264. fts_optimize_callback() should free fts_optimize_wq and make it as NULL when it receives a FTS_MSG_STOP message.

Generated at Thu Feb 08 09:10:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.