Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11026

Make InnoDB number of IO write/read threads dynamic

Details

    Description

      Currently static variables and require server shutdown and restart to be able to change

      • innodb_write_io_threads = n
      • innodb_read_io_threads = n

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich added a comment - - edited

            origin/bb-10.10-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a 2022-06-17T17:33:15+03:00
             
            1. Frequent seen on bb-10.10-MDEV-11026 trees and no more on other trees since mid 2021
                SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741 in fts_is_sync_needed
                pluto:/data/results/1655481418/TBR-931$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
            2. Rare seen on 10.8 till 10.10 development trees since begin 2022-06
                SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217 in create_table_info_t::check_table_options()
                pluto:/data/results/1655481418/TBR-1402$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
            3. Intentional kill of DB server, restart with success, OPTIMIZE TABLE fails with SEGV
                (gdb) bt
            #0  0x0000559749c95810 in dict_table_t::is_readable (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1847
            #1  0x000055974a2aa4ea in dict_table_t::is_accessible (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1854
            #2  0x000055974a3b8eee in fts_optimize_table_bk (slot=0x61a0003dffc8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2409
            #3  0x000055974a3bb733 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2831
            #4  0x000055974a45d9d4 in tpool::task_group::execute (this=0x55974cf416e0 <task_group>, t=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task_group.cc:68
            #5  0x000055974a45e2e7 in tpool::task::execute (this=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task.cc:32
            #6  0x000055974a44a897 in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001f280) at /data/Server/bb-10.10-MDEV-11026B/tpool/tpool_generic.cc:588
            #7  0x000055974a45d3ed in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
                @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, 
                __t=@0x60400000c5a0: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73
            #8  0x000055974a45d198 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
                @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>)
                at /usr/include/c++/9/bits/invoke.h:95
            #9  0x000055974a45d029 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x60400000c598)
                at /usr/include/c++/9/thread:244
            #10 0x000055974a45cf73 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x60400000c598)
                at /usr/include/c++/9/thread:251
            #11 0x000055974a45ced8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x60400000c590)
                at /usr/include/c++/9/thread:195
            #12 0x00007f55f5452de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #13 0x00007f55f556d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #14 0x00007f55f5140293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (gdb)
               pluto:/data/results/1655481418/no_pattern$ gdb -c ./1/data/core /data/Server_bin/bb-10.10-MDEV-11026B_asan/bin/mysqld
            4. Intentional kill of DB server, restart attempt fails   (ASAN failure first time observed)
                 SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B  /storage/innobase/include/dict0mem.h:1847 in dict_table_t::is_readable() const
                pluto:/data/results/1655481418/TBR-1554$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-2
                
                
            
            

            mleich Matthias Leich added a comment - - edited origin/bb-10.10-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a 2022-06-17T17:33:15+03:00   1. Frequent seen on bb-10.10-MDEV-11026 trees and no more on other trees since mid 2021 SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741 in fts_is_sync_needed pluto:/data/results/1655481418/TBR-931$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1 2. Rare seen on 10.8 till 10.10 development trees since begin 2022-06 SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217 in create_table_info_t::check_table_options() pluto:/data/results/1655481418/TBR-1402$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1 3. Intentional kill of DB server, restart with success, OPTIMIZE TABLE fails with SEGV (gdb) bt #0 0x0000559749c95810 in dict_table_t::is_readable (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1847 #1 0x000055974a2aa4ea in dict_table_t::is_accessible (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1854 #2 0x000055974a3b8eee in fts_optimize_table_bk (slot=0x61a0003dffc8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2409 #3 0x000055974a3bb733 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2831 #4 0x000055974a45d9d4 in tpool::task_group::execute (this=0x55974cf416e0 <task_group>, t=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task_group.cc:68 #5 0x000055974a45e2e7 in tpool::task::execute (this=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task.cc:32 #6 0x000055974a44a897 in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001f280) at /data/Server/bb-10.10-MDEV-11026B/tpool/tpool_generic.cc:588 #7 0x000055974a45d3ed in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f= @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x60400000c5a0: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73 #8 0x000055974a45d198 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn= @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95 #9 0x000055974a45d029 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x60400000c598) at /usr/include/c++/9/thread:244 #10 0x000055974a45cf73 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x60400000c598) at /usr/include/c++/9/thread:251 #11 0x000055974a45ced8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x60400000c590) at /usr/include/c++/9/thread:195 #12 0x00007f55f5452de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #13 0x00007f55f556d609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #14 0x00007f55f5140293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) pluto:/data/results/1655481418/no_pattern$ gdb -c ./1/data/core /data/Server_bin/bb-10.10-MDEV-11026B_asan/bin/mysqld 4. Intentional kill of DB server, restart attempt fails (ASAN failure first time observed) SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B /storage/innobase/include/dict0mem.h:1847 in dict_table_t::is_readable() const pluto:/data/results/1655481418/TBR-1554$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-2

            mleich, thank you. I checked each rr replay trace as well as the core dump.

            First, let me mention failure 2, which is definitely not an InnoDB problem. Can you please file a separate bug for it, tentatively for the partitioning storage engine? The options was freed in:

            bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a

            #5  0x0000564287694428 in free_root (root=0x619001434150, MyFlags=0) at /data/Server/bb-10.10-MDEV-11026B/mysys/my_alloc.c:501
            #6  0x0000564285f9c0ec in closefrm (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table.cc:4557
            #7  0x00005642862b9912 in intern_close_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:220
            #8  0x00005642862b9b79 in tc_remove_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:258
            #9  0x00005642862baf4d in tc_release_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:449
            #10 0x0000564285aa2470 in close_thread_table (thd=0x62b00038e218, table_ptr=0x62b00038e308) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_base.cc:999
            #11 0x0000564285c84dde in handle_alter_part_error (lpt=0x7f9edddb33e0, action_completed=false, drop_partition=false, frm_install=false, reopen=true)
                at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7113
            #12 0x0000564285c89461 in fast_alter_partition_table (thd=0x62b00038e218, table=0x619001433e98, alter_info=0x7f9edddb63f0, alter_ctx=0x7f9edddb5080, create_info=0x7f9edddb6520, table_list=0x6290012e83b0)
                at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7819
            #13 0x0000564285eeae27 in mysql_alter_table (thd=0x62b00038e218, new_db=0x62b000392cd8, new_name=0x62b0003930e8, create_info=0x7f9edddb6520, table_list=0x6290012e83b0, alter_info=0x7f9edddb63f0, order_num=0, 
                order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_table.cc:10409
            #14 0x00005642860806b4 in Sql_cmd_alter_table::execute (this=0x6290012e8b50, thd=0x62b00038e218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_alter.cc:547
            #15 0x0000564285c40758 in mysql_execute_command (thd=0x62b00038e218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996
            #16 0x0000564285c4cc44 in mysql_parse (thd=0x62b00038e218, rawbuf=0x6290012e8238 "ALTER TABLE l REBUILD PARTITION p0,p1,p2,p3 /* E_R Thread31 QNO 219 CON_ID 43 */", length=80, parser_state=0x7f9edddb7b10)
                at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036
            

            A subsequent operation is attempting to read the freed memory:

            bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a

            #6  0x00007f9f0a6d108b in __asan_report_load4 () from /lib/x86_64-linux-gnu/libasan.so.5
            #7  0x0000564286d307a7 in create_table_info_t::check_table_options (this=0x7f9edf2ba2c0) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217
            #8  0x0000564286d34637 in create_table_info_t::prepare_create_table (this=0x7f9edf2ba2c0, name=0x619001cf8038 "test/l#P#p0", strict=false)
                at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11988
            #9  0x0000564286d757f3 in ha_innobase::create (this=0x61d0010ea0b8, name=0x619001cf8038 "test/l#P#p0", form=0x6190014f7d98, create_info=0x7f9edf2ba9c0, file_per_table=true, trx=0x7f9efdc5c240)
                at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13162
            #10 0x0000564286d412ba in ha_innobase::truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13930
            #11 0x00005642864889ae in handler::ha_truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/sql/handler.cc:5070
            #12 0x0000564286c1ab9b in ha_partition::truncate_partition (this=0x61d0010e96b8, alter_info=0x62b000354c20, binlog_stmt=0x7f9edf2baed0) at /data/Server/bb-10.10-MDEV-11026B/sql/ha_partition.cc:4913
            #13 0x000056428606e4a6 in Sql_cmd_alter_table_truncate_partition::execute (this=0x6290011f8ac8, thd=0x62b00034f218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition_admin.cc:959
            #14 0x0000564285c40758 in mysql_execute_command (thd=0x62b00034f218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996
            #15 0x0000564285c4cc44 in mysql_parse (thd=0x62b00034f218, rawbuf=0x6290011f8238 "ALTER TABLE l TRUNCATE PARTITION p0 /* E_R Thread24 QNO 232 CON_ID 38 */", length=72, parser_state=0x7f9edf2bbb10)
                at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036
            

            I expect this bug to affect older releases. While the TRUNCATE in InnoDB was refactored in 10.6, it has been similar ever since MDEV-13564 in MariaDB Server 10.2.19.

            The remaining failures are related to the InnoDB FULLTEXT INDEX implementation, of which thiru is the subsystem owner (and should file separate bug(s) for these).

            In Failure 1, the table was not dropped, but the cached metadata was freed in innobase_reload_table() during ALTER TABLE. There is a race condition between two InnoDB threads that handle FULLTEXT INDEX:

            (rr) reverse-continue
            Continuing.
            [Switching to Thread 3047615.3048245]
            Thread 6 hit Hardware watchpoint 2: -location slot->table
             
            Old value = (dict_table_t *) 0x0
            New value = (dict_table_t *) 0x6180000ab908
            0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679
            2679				slot->table = NULL;
            (rr) bt
            #0  0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679
            #1  0x000055ff2377a904 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2863
            …
            (rr) thread apply 3 backtrace
             
            Thread 3 (Thread 3047615.3048290):
            #0  0x0000000070000002 in syscall_traced ()
            …
            #22 0x000000006f6e31ab in __asan_report_load8 () from /lib/x86_64-linux-gnu/libasan.so.5
            #23 0x000055ff23779e0b in fts_is_sync_needed () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741
            #24 0x000055ff2377a5d1 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2816
            

            That is, during the time ASAN already noticed that slot->table is pointing to freed memory, another thread is assigning slot->table to nullptr.

            Failures 3 and 4 seem to be due to the same problem: the slot passed to fts_optimize_table_bk() points to freed memory. With the rr trace, it should be easy to debug this in more detail:

            rr replay /data/results/1655481418/TBR-1554/1/rr/latest-trace
            continue
            watch -l *(char*)0x0c307fff8628
            reverse-continue
            reverse-continue
            backtrace
            

            The memory was freed during ha_innobase::delete_table(), which was executed as part of ddl_recovery.log processing. I think that this bug should affect 10.6 already. One possible fix to this failure might be to not schedule any FTS tasks before DDL recovery has been completed.

            marko Marko Mäkelä added a comment - mleich , thank you. I checked each rr replay trace as well as the core dump. First, let me mention failure 2, which is definitely not an InnoDB problem. Can you please file a separate bug for it, tentatively for the partitioning storage engine? The options was freed in: bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a #5 0x0000564287694428 in free_root (root=0x619001434150, MyFlags=0) at /data/Server/bb-10.10-MDEV-11026B/mysys/my_alloc.c:501 #6 0x0000564285f9c0ec in closefrm (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table.cc:4557 #7 0x00005642862b9912 in intern_close_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:220 #8 0x00005642862b9b79 in tc_remove_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:258 #9 0x00005642862baf4d in tc_release_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:449 #10 0x0000564285aa2470 in close_thread_table (thd=0x62b00038e218, table_ptr=0x62b00038e308) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_base.cc:999 #11 0x0000564285c84dde in handle_alter_part_error (lpt=0x7f9edddb33e0, action_completed=false, drop_partition=false, frm_install=false, reopen=true) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7113 #12 0x0000564285c89461 in fast_alter_partition_table (thd=0x62b00038e218, table=0x619001433e98, alter_info=0x7f9edddb63f0, alter_ctx=0x7f9edddb5080, create_info=0x7f9edddb6520, table_list=0x6290012e83b0) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7819 #13 0x0000564285eeae27 in mysql_alter_table (thd=0x62b00038e218, new_db=0x62b000392cd8, new_name=0x62b0003930e8, create_info=0x7f9edddb6520, table_list=0x6290012e83b0, alter_info=0x7f9edddb63f0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_table.cc:10409 #14 0x00005642860806b4 in Sql_cmd_alter_table::execute (this=0x6290012e8b50, thd=0x62b00038e218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_alter.cc:547 #15 0x0000564285c40758 in mysql_execute_command (thd=0x62b00038e218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996 #16 0x0000564285c4cc44 in mysql_parse (thd=0x62b00038e218, rawbuf=0x6290012e8238 "ALTER TABLE l REBUILD PARTITION p0,p1,p2,p3 /* E_R Thread31 QNO 219 CON_ID 43 */", length=80, parser_state=0x7f9edddb7b10) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036 A subsequent operation is attempting to read the freed memory: bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a #6 0x00007f9f0a6d108b in __asan_report_load4 () from /lib/x86_64-linux-gnu/libasan.so.5 #7 0x0000564286d307a7 in create_table_info_t::check_table_options (this=0x7f9edf2ba2c0) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217 #8 0x0000564286d34637 in create_table_info_t::prepare_create_table (this=0x7f9edf2ba2c0, name=0x619001cf8038 "test/l#P#p0", strict=false) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11988 #9 0x0000564286d757f3 in ha_innobase::create (this=0x61d0010ea0b8, name=0x619001cf8038 "test/l#P#p0", form=0x6190014f7d98, create_info=0x7f9edf2ba9c0, file_per_table=true, trx=0x7f9efdc5c240) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13162 #10 0x0000564286d412ba in ha_innobase::truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13930 #11 0x00005642864889ae in handler::ha_truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/sql/handler.cc:5070 #12 0x0000564286c1ab9b in ha_partition::truncate_partition (this=0x61d0010e96b8, alter_info=0x62b000354c20, binlog_stmt=0x7f9edf2baed0) at /data/Server/bb-10.10-MDEV-11026B/sql/ha_partition.cc:4913 #13 0x000056428606e4a6 in Sql_cmd_alter_table_truncate_partition::execute (this=0x6290011f8ac8, thd=0x62b00034f218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition_admin.cc:959 #14 0x0000564285c40758 in mysql_execute_command (thd=0x62b00034f218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996 #15 0x0000564285c4cc44 in mysql_parse (thd=0x62b00034f218, rawbuf=0x6290011f8238 "ALTER TABLE l TRUNCATE PARTITION p0 /* E_R Thread24 QNO 232 CON_ID 38 */", length=72, parser_state=0x7f9edf2bbb10) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036 I expect this bug to affect older releases. While the TRUNCATE in InnoDB was refactored in 10.6, it has been similar ever since MDEV-13564 in MariaDB Server 10.2.19. The remaining failures are related to the InnoDB FULLTEXT INDEX implementation, of which thiru is the subsystem owner (and should file separate bug(s) for these). In Failure 1, the table was not dropped, but the cached metadata was freed in innobase_reload_table() during ALTER TABLE . There is a race condition between two InnoDB threads that handle FULLTEXT INDEX : (rr) reverse-continue Continuing. [Switching to Thread 3047615.3048245] Thread 6 hit Hardware watchpoint 2: -location slot->table   Old value = (dict_table_t *) 0x0 New value = (dict_table_t *) 0x6180000ab908 0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679 2679 slot->table = NULL; (rr) bt #0 0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679 #1 0x000055ff2377a904 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2863 … (rr) thread apply 3 backtrace   Thread 3 (Thread 3047615.3048290): #0 0x0000000070000002 in syscall_traced () … #22 0x000000006f6e31ab in __asan_report_load8 () from /lib/x86_64-linux-gnu/libasan.so.5 #23 0x000055ff23779e0b in fts_is_sync_needed () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741 #24 0x000055ff2377a5d1 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2816 That is, during the time ASAN already noticed that slot->table is pointing to freed memory, another thread is assigning slot->table to nullptr . Failures 3 and 4 seem to be due to the same problem: the slot passed to fts_optimize_table_bk() points to freed memory. With the rr trace, it should be easy to debug this in more detail: rr replay /data/results/1655481418/TBR-1554/1/rr/latest-trace continue watch -l *(char*)0x0c307fff8628 reverse-continue reverse-continue backtrace The memory was freed during ha_innobase::delete_table() , which was executed as part of ddl_recovery.log processing. I think that this bug should affect 10.6 already. One possible fix to this failure might be to not schedule any FTS tasks before DDL recovery has been completed.

            Failure 2 is is now reported as https://jira.mariadb.org/browse/MDEV-28937

            mleich Matthias Leich added a comment - Failure 2 is is now reported as https://jira.mariadb.org/browse/MDEV-28937

            I realized that at least my earlier comment about Failure 1 shows multiple instances of fts_optimize_callback executing concurrently, which is very wrong. That should have the same underlying cause as MDEV-28954. mleich, did you see those FTS failures after that bug was fixed?

            marko Marko Mäkelä added a comment - I realized that at least my earlier comment about Failure 1 shows multiple instances of fts_optimize_callback executing concurrently, which is very wrong. That should have the same underlying cause as MDEV-28954 . mleich , did you see those FTS failures after that bug was fixed?

            I have not seen these FTS related failures again.
            origin/bb-10.10-MDEV-11026 c8e3bcf79b3931dad9ce16fecc714d39d2247f89 2022-06-27T12:00:31+02:00
            performed well in RQG testing.
            

            mleich Matthias Leich added a comment - I have not seen these FTS related failures again. origin/bb-10.10-MDEV-11026 c8e3bcf79b3931dad9ce16fecc714d39d2247f89 2022-06-27T12:00:31+02:00 performed well in RQG testing.

            People

              wlad Vladislav Vaintroub
              jplindst Jan Lindström (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.