[MDEV-11026] Make InnoDB number of IO write/read threads dynamic - Jira

Marko Mäkelä added a comment - 2018-01-22 15:09

Instead of doing this, I would prefer a thorough analysis of how the page flushing currently works in InnoDB and how it could be improved. Do we really need so many threads being idle, or would it suffice to have one I/O handler thread that collects completed requests?

For pending reads, perhaps it would be more efficient for buf_page_get_gen() to submit a read request and then wait for the completion, in that same thread? Currently we are sleeping and retrying while the actual work happens in other threads.

For pending writes, perhaps the thread that goes through buf_pool->flush_list should asynchronously submit the write requests, and pause for collecting completion notifications if the queue of pending requests grows long enough.

Do we really need so many mostly idle threads? And do we need these parameters at all?

Marko Mäkelä added a comment - 2018-01-22 15:09 Instead of doing this, I would prefer a thorough analysis of how the page flushing currently works in InnoDB and how it could be improved. Do we really need so many threads being idle, or would it suffice to have one I/O handler thread that collects completed requests? For pending reads, perhaps it would be more efficient for buf_page_get_gen() to submit a read request and then wait for the completion, in that same thread? Currently we are sleeping and retrying while the actual work happens in other threads. For pending writes, perhaps the thread that goes through buf_pool->flush_list should asynchronously submit the write requests, and pause for collecting completion notifications if the queue of pending requests grows long enough. Do we really need so many mostly idle threads? And do we need these parameters at all?

Marko Mäkelä added a comment - 2022-05-10 12:44

A test failure needs to be addressed:

sys_vars.innodb_read_io_threads_basic 'innodb' w4 [ fail ]

        Test ended at 2022-04-08 00:09:29

CURRENT_TEST: sys_vars.innodb_read_io_threads_basic

--- /home/buildbot/aarch64-ubuntu-2004/build/mysql-test/suite/sys_vars/r/innodb_read_io_threads_basic.result	2022-04-05 17:21:24.000000000 +0000

+++ /home/buildbot/aarch64-ubuntu-2004/build/mysql-test/suite/sys_vars/r/innodb_read_io_threads_basic.reject	2022-04-08 00:09:28.924974005 +0000

@@ -17,6 +17,8 @@

 INNODB_READ_IO_THREADS	2

 select @@innodb_read_io_threads into @n;

 set global innodb_read_io_threads = 1;

+Warnings:

+Warning	1105	Could not reserve max. number of concurrent ios.Increase the /proc/sys/fs/aio-max-nr to fix.

 select @@innodb_read_io_threads;

 @@innodb_read_io_threads

@@ -27,3 +29,5 @@

 set session innodb_read_io_threads=1;

 ERROR HY000: Variable 'innodb_read_io_threads' is a GLOBAL variable and should be set with SET GLOBAL

 set global innodb_read_io_threads=@n;

+Warnings:

+Warning	1105	Could not reserve max. number of concurrent ios.Increase the /proc/sys/fs/aio-max-nr to fix.

mysqltest: Result length mismatch

Furthermore, I can observe many compilation failures, due to an unfortunately chosen base revision. It would be better to rebase this on something more recent, to get wider regression test coverage. Once rebased, this will have to be stress tested.

Marko Mäkelä added a comment - 2022-05-10 12:44 A test failure needs to be addressed: sys_vars.innodb_read_io_threads_basic 'innodb' w4 [ fail ] Test ended at 2022-04-08 00:09:29 CURRENT_TEST: sys_vars.innodb_read_io_threads_basic --- /home/buildbot/aarch64-ubuntu-2004/build/mysql-test/suite/sys_vars/r/innodb_read_io_threads_basic.result 2022-04-05 17:21:24.000000000 +0000 +++ /home/buildbot/aarch64-ubuntu-2004/build/mysql-test/suite/sys_vars/r/innodb_read_io_threads_basic.reject 2022-04-08 00:09:28.924974005 +0000 @@ -17,6 +17,8 @@ INNODB_READ_IO_THREADS 2 select @@innodb_read_io_threads into @n; set global innodb_read_io_threads = 1; +Warnings: +Warning 1105 Could not reserve max. number of concurrent ios.Increase the /proc/sys/fs/aio-max-nr to fix. select @@innodb_read_io_threads; @@innodb_read_io_threads 1 @@ -27,3 +29,5 @@ set session innodb_read_io_threads=1; ERROR HY000: Variable 'innodb_read_io_threads' is a GLOBAL variable and should be set with SET GLOBAL set global innodb_read_io_threads=@n; +Warnings: +Warning 1105 Could not reserve max. number of concurrent ios.Increase the /proc/sys/fs/aio-max-nr to fix. mysqltest: Result length mismatch Furthermore, I can observe many compilation failures, due to an unfortunately chosen base revision. It would be better to rebase this on something more recent, to get wider regression test coverage. Once rebased, this will have to be stress tested.

Vladislav Vaintroub added a comment - 2022-05-12 06:00

rebased and eliminated a warning in test

Vladislav Vaintroub added a comment - 2022-05-12 06:00 rebased and eliminated a warning in test

Marko Mäkelä added a comment - 2022-05-24 13:50

I think that this needs some stress testing.

Marko Mäkelä added a comment - 2022-05-24 13:50 I think that this needs some stress testing.

Matthias Leich added a comment - 2022-05-31 11:52

Preliminary result of testing:

There is some frequent and up till today never seen failure

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 235512][rr 3571015 235516]==3571015==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f0200cdb558 at pc 0x55f472e6ca52 bp 0x7f01dec42ab0 sp 0x7f01dec42aa0

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 235519][rr 3571015 235521]READ of size 8 at 0x7f0200cdb558 thread T45

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249109]    #0 0x55f472e6ca51 in tpool::task_group::execute(tpool::task*) /data/Server/bb-10.10-MDEV-11026/tpool/task_group.cc:56

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249115]    #1 0x55f472e6d2f6 in tpool::task::execute() /data/Server/bb-10.10-MDEV-11026/tpool/task.cc:32

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249297]    #2 0x55f472e59938 in tpool::thread_pool_generic::worker_main(tpool::worker_data*) /data/Server/bb-10.10-MDEV-11026/tpool/tpool_generic.cc:588

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249311]    #3 0x55f472e6c48e in void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b748e)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249319]    #4 0x55f472e6c239 in std::__invoke_result<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>::type std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b7239)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249327]    #5 0x55f472e6c0ca in void std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b70ca)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249333]    #6 0x55f472e6c014 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator()() (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b7014)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249339]    #7 0x55f472e6bf79 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run() (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b6f79)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249341]    #8 0x7f020cf2cde3  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd6de3)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249347]    #9 0x7f020d047608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249349]    #10 0x7f020cc1a292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249351]

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249361]0x7f0200cdb558 is located 150872 bytes inside of 155648-byte region [0x7f0200cb6800,0x7f0200cdc800)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249367]freed by thread T12 here:

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249373]    #0 0x7f020d7728df in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x1108df)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249697]    #1 0x55f4728ac38d in __gnu_cxx::new_allocator<tpool::aiocb>::deallocate(tpool::aiocb*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249843]    #2 0x55f4728ab47d in std::allocator_traits<std::allocator<tpool::aiocb> >::deallocate(std::allocator<tpool::aiocb>&, tpool::aiocb*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249849]    #3 0x55f4728a903b in std::_Vector_base<tpool::aiocb, std::allocator<tpool::aiocb> >::_M_deallocate(tpool::aiocb*, unsigned long) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x2af403b)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249855]    #4 0x55f4728aa48f in std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::_M_default_append(unsigned long) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x2af548f)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249857]    #5 0x55f4728a8644 in std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::resize(unsigned long) /usr/include/c++/9/bits/stl_vector.h:937

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249859]    #6 0x55f4728a6e69 in tpool::cache<tpool::aiocb>::resize(unsigned long) /data/Server/bb-10.10-MDEV-11026/tpool/tpool_structs.h:146

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249861]    #7 0x55f4728a57a2 in io_slots::resize(int, int, std::unique_lock<std::mutex>&) /data/Server/bb-10.10-MDEV-11026/storage/innobase/os/os0file.cc:142

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 249863]    #8 0x55f4728a1637 in os_aio_resize(unsigned long, unsigned long) /data/Server/bb-10.10-MDEV-11026/storage/innobase/os/os0file.cc:3807

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 250193]    #9 0x55f4726860ba in innodb_update_io_thread_count /data/Server/bb-10.10-MDEV-11026/storage/innobase/handler/ha_innodb.cc:19306

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 250195]    #10 0x55f472686355 in innodb_write_io_threads_update /data/Server/bb-10.10-MDEV-11026/storage/innobase/handler/ha_innodb.cc:19329

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 250358]    #11 0x55f4715ec933 in sys_var_pluginvar::global_update(THD*, set_var*) /data/Server/bb-10.10-MDEV-11026/sql/sql_plugin.cc:3622

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 250551]    #12 0x55f4712f280f in sys_var::update(THD*, set_var*) /data/Server/bb-10.10-MDEV-11026/sql/set_var.cc:207

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 250641]    #13 0x55f4712f71cb in set_var::update(THD*) /data/Server/bb-10.10-MDEV-11026/sql/set_var.cc:863

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 250643]    #14 0x55f4712f6637 in sql_set_variables(THD*, List<set_var_base>*, bool) /data/Server/bb-10.10-MDEV-11026/sql/set_var.cc:745

...

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 253313]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026/tpool/task_group.cc:56 in tpool::task_group::execute(tpool::task*)

# 2022-05-24T17:31:22 [3562669] | [rr 3571015 253747]Shadow bytes around the buggy address:

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c0193650: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c0193660: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c0193670: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c0193680: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c0193690: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] | =>0x0fe0c01936a0: fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c01936b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c01936c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c01936d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c01936e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

# 2022-05-24T17:31:22 [3562669] |   0x0fe0c01936f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

I am trying to simplify the test.

Matthias Leich added a comment - 2022-05-31 11:52 Preliminary result of testing: There is some frequent and up till today never seen failure # 2022-05-24T17:31:22 [3562669] | [rr 3571015 235512][rr 3571015 235516]==3571015==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f0200cdb558 at pc 0x55f472e6ca52 bp 0x7f01dec42ab0 sp 0x7f01dec42aa0 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 235519][rr 3571015 235521]READ of size 8 at 0x7f0200cdb558 thread T45 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249109] #0 0x55f472e6ca51 in tpool::task_group::execute(tpool::task*) /data/Server/bb-10.10-MDEV-11026/tpool/task_group.cc:56 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249115] #1 0x55f472e6d2f6 in tpool::task::execute() /data/Server/bb-10.10-MDEV-11026/tpool/task.cc:32 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249297] #2 0x55f472e59938 in tpool::thread_pool_generic::worker_main(tpool::worker_data*) /data/Server/bb-10.10-MDEV-11026/tpool/tpool_generic.cc:588 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249311] #3 0x55f472e6c48e in void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b748e) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249319] #4 0x55f472e6c239 in std::__invoke_result<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>::type std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b7239) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249327] #5 0x55f472e6c0ca in void std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b70ca) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249333] #6 0x55f472e6c014 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator()() (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b7014) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249339] #7 0x55f472e6bf79 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run() (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x30b6f79) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249341] #8 0x7f020cf2cde3 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd6de3) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249347] #9 0x7f020d047608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249349] #10 0x7f020cc1a292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249351] # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249361]0x7f0200cdb558 is located 150872 bytes inside of 155648-byte region [0x7f0200cb6800,0x7f0200cdc800) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249367]freed by thread T12 here: # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249373] #0 0x7f020d7728df in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x1108df) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249697] #1 0x55f4728ac38d in __gnu_cxx::new_allocator<tpool::aiocb>::deallocate(tpool::aiocb*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249843] #2 0x55f4728ab47d in std::allocator_traits<std::allocator<tpool::aiocb> >::deallocate(std::allocator<tpool::aiocb>&, tpool::aiocb*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249849] #3 0x55f4728a903b in std::_Vector_base<tpool::aiocb, std::allocator<tpool::aiocb> >::_M_deallocate(tpool::aiocb*, unsigned long) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x2af403b) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249855] #4 0x55f4728aa48f in std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::_M_default_append(unsigned long) (/data/Server_bin/bb-10.10-MDEV-11026_asan/bin/mariadbd+0x2af548f) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249857] #5 0x55f4728a8644 in std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::resize(unsigned long) /usr/include/c++/9/bits/stl_vector.h:937 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249859] #6 0x55f4728a6e69 in tpool::cache<tpool::aiocb>::resize(unsigned long) /data/Server/bb-10.10-MDEV-11026/tpool/tpool_structs.h:146 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249861] #7 0x55f4728a57a2 in io_slots::resize(int, int, std::unique_lock<std::mutex>&) /data/Server/bb-10.10-MDEV-11026/storage/innobase/os/os0file.cc:142 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 249863] #8 0x55f4728a1637 in os_aio_resize(unsigned long, unsigned long) /data/Server/bb-10.10-MDEV-11026/storage/innobase/os/os0file.cc:3807 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 250193] #9 0x55f4726860ba in innodb_update_io_thread_count /data/Server/bb-10.10-MDEV-11026/storage/innobase/handler/ha_innodb.cc:19306 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 250195] #10 0x55f472686355 in innodb_write_io_threads_update /data/Server/bb-10.10-MDEV-11026/storage/innobase/handler/ha_innodb.cc:19329 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 250358] #11 0x55f4715ec933 in sys_var_pluginvar::global_update(THD*, set_var*) /data/Server/bb-10.10-MDEV-11026/sql/sql_plugin.cc:3622 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 250551] #12 0x55f4712f280f in sys_var::update(THD*, set_var*) /data/Server/bb-10.10-MDEV-11026/sql/set_var.cc:207 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 250641] #13 0x55f4712f71cb in set_var::update(THD*) /data/Server/bb-10.10-MDEV-11026/sql/set_var.cc:863 # 2022-05-24T17:31:22 [3562669] | [rr 3571015 250643] #14 0x55f4712f6637 in sql_set_variables(THD*, List<set_var_base>*, bool) /data/Server/bb-10.10-MDEV-11026/sql/set_var.cc:745 ... # 2022-05-24T17:31:22 [3562669] | [rr 3571015 253313]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026/tpool/task_group.cc:56 in tpool::task_group::execute(tpool::task*) # 2022-05-24T17:31:22 [3562669] | [rr 3571015 253747]Shadow bytes around the buggy address: # 2022-05-24T17:31:22 [3562669] | 0x0fe0c0193650: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c0193660: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c0193670: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c0193680: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c0193690: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | =>0x0fe0c01936a0: fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c01936b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c01936c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c01936d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c01936e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd # 2022-05-24T17:31:22 [3562669] | 0x0fe0c01936f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd I am trying to simplify the test.

Vladislav Vaintroub added a comment - 2022-05-31 21:59

mleich, addressed in fad5f3a6caf34d9

Vladislav Vaintroub added a comment - 2022-05-31 21:59 mleich , addressed in fad5f3a6caf34d9

Matthias Leich added a comment - 2022-06-17 11:03

origin/bb-10.10-MDEV-11026 2acc2601910d5292f1ea46e06e5affeb56dc8010 2022-06-10T14:05:00+03:00

suffers from some extreme amount of asserts hit.

Examples:

# 2022-06-16T23:03:01 [743652] | 2022-06-16 23:01:41 0x7f1aa8a46700  InnoDB: Assertion failure in file /data/Server/bb-10.10-MDEV-11026A/storage/innobase/include/ut0vec.inl line 130

# 2022-06-16T23:03:01 [743652] | InnoDB: Failing assertion: n < vec->used

# 2022-06-17T00:17:36 [1902287] | [rr 1905190 340489]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026A/storage/innobase/fts/fts0opt.cc:2772 in fts_is_sync_needed

# mysqld: /data/Server/bb-10.10-MDEV-11026A/storage/innobase/mem/mem0mem.cc:227: void

mem_heap_validate(const mem_heap_t*): Assertion `size == heap->total_size' failed.

# SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026A/storage/innobase/include/dict0mem.h:1847 in dict_table_t::is_readable() const

I propose to merge the latest improvement of MariaDB 10.10 into that source tree

or to pick one of the failures and than I will test again.

Matthias Leich added a comment - 2022-06-17 11:03 origin/bb-10.10-MDEV-11026 2acc2601910d5292f1ea46e06e5affeb56dc8010 2022-06-10T14:05:00+03:00 suffers from some extreme amount of asserts hit. Examples: # 2022-06-16T23:03:01 [743652] | 2022-06-16 23:01:41 0x7f1aa8a46700 InnoDB: Assertion failure in file /data/Server/bb-10.10-MDEV-11026A/storage/innobase/include/ut0vec.inl line 130 # 2022-06-16T23:03:01 [743652] | InnoDB: Failing assertion: n < vec->used # 2022-06-17T00:17:36 [1902287] | [rr 1905190 340489]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026A/storage/innobase/fts/fts0opt.cc:2772 in fts_is_sync_needed # mysqld: /data/Server/bb-10.10-MDEV-11026A/storage/innobase/mem/mem0mem.cc:227: void mem_heap_validate(const mem_heap_t*): Assertion `size == heap->total_size' failed. # SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026A/storage/innobase/include/dict0mem.h:1847 in dict_table_t::is_readable() const I propose to merge the latest improvement of MariaDB 10.10 into that source tree or to pick one of the failures and than I will test again.

Marko Mäkelä added a comment - 2022-06-17 14:45

mleich, I have updated the branch with the latest 10.10.

Marko Mäkelä added a comment - 2022-06-17 14:45 mleich , I have updated the branch with the latest 10.10.

Matthias Leich added a comment - 2022-06-20 16:08 - edited

origin/bb-10.10-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a 2022-06-17T17:33:15+03:00

1. Frequent seen on bb-10.10-MDEV-11026 trees and no more on other trees since mid 2021

    SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741 in fts_is_sync_needed

    pluto:/data/results/1655481418/TBR-931$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1

2. Rare seen on 10.8 till 10.10 development trees since begin 2022-06

    SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217 in create_table_info_t::check_table_options()

    pluto:/data/results/1655481418/TBR-1402$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1

3. Intentional kill of DB server, restart with success, OPTIMIZE TABLE fails with SEGV

    (gdb) bt

#0  0x0000559749c95810 in dict_table_t::is_readable (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1847

#1  0x000055974a2aa4ea in dict_table_t::is_accessible (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1854

#2  0x000055974a3b8eee in fts_optimize_table_bk (slot=0x61a0003dffc8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2409

#3  0x000055974a3bb733 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2831

#4  0x000055974a45d9d4 in tpool::task_group::execute (this=0x55974cf416e0 <task_group>, t=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task_group.cc:68

#5  0x000055974a45e2e7 in tpool::task::execute (this=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task.cc:32

#6  0x000055974a44a897 in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001f280) at /data/Server/bb-10.10-MDEV-11026B/tpool/tpool_generic.cc:588

#7  0x000055974a45d3ed in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=

    @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>,

    __t=@0x60400000c5a0: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73

#8  0x000055974a45d198 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=

    @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>)

    at /usr/include/c++/9/bits/invoke.h:95

#9  0x000055974a45d029 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x60400000c598)

    at /usr/include/c++/9/thread:244

#10 0x000055974a45cf73 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x60400000c598)

    at /usr/include/c++/9/thread:251

#11 0x000055974a45ced8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x60400000c590)

    at /usr/include/c++/9/thread:195

#12 0x00007f55f5452de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6

#13 0x00007f55f556d609 in start_thread (arg=<optimized out>) at pthread_create.c:477

#14 0x00007f55f5140293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb)

   pluto:/data/results/1655481418/no_pattern$ gdb -c ./1/data/core /data/Server_bin/bb-10.10-MDEV-11026B_asan/bin/mysqld

4. Intentional kill of DB server, restart attempt fails   (ASAN failure first time observed)

     SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B  /storage/innobase/include/dict0mem.h:1847 in dict_table_t::is_readable() const

    pluto:/data/results/1655481418/TBR-1554$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-2

Matthias Leich added a comment - 2022-06-20 16:08 - edited origin/bb-10.10-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a 2022-06-17T17:33:15+03:00 1. Frequent seen on bb-10.10-MDEV-11026 trees and no more on other trees since mid 2021 SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741 in fts_is_sync_needed pluto:/data/results/1655481418/TBR-931$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1 2. Rare seen on 10.8 till 10.10 development trees since begin 2022-06 SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217 in create_table_info_t::check_table_options() pluto:/data/results/1655481418/TBR-1402$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1 3. Intentional kill of DB server, restart with success, OPTIMIZE TABLE fails with SEGV (gdb) bt #0 0x0000559749c95810 in dict_table_t::is_readable (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1847 #1 0x000055974a2aa4ea in dict_table_t::is_accessible (this=0xbebebebebebebebe) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/include/dict0mem.h:1854 #2 0x000055974a3b8eee in fts_optimize_table_bk (slot=0x61a0003dffc8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2409 #3 0x000055974a3bb733 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2831 #4 0x000055974a45d9d4 in tpool::task_group::execute (this=0x55974cf416e0 <task_group>, t=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task_group.cc:68 #5 0x000055974a45e2e7 in tpool::task::execute (this=0x55974cf417a0 <task>) at /data/Server/bb-10.10-MDEV-11026B/tpool/task.cc:32 #6 0x000055974a44a897 in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001f280) at /data/Server/bb-10.10-MDEV-11026B/tpool/tpool_generic.cc:588 #7 0x000055974a45d3ed in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f= @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x60400000c5a0: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73 #8 0x000055974a45d198 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn= @0x60400000c5a8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x55974a44a6b4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95 #9 0x000055974a45d029 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x60400000c598) at /usr/include/c++/9/thread:244 #10 0x000055974a45cf73 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x60400000c598) at /usr/include/c++/9/thread:251 #11 0x000055974a45ced8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x60400000c590) at /usr/include/c++/9/thread:195 #12 0x00007f55f5452de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #13 0x00007f55f556d609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #14 0x00007f55f5140293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) pluto:/data/results/1655481418/no_pattern$ gdb -c ./1/data/core /data/Server_bin/bb-10.10-MDEV-11026B_asan/bin/mysqld 4. Intentional kill of DB server, restart attempt fails (ASAN failure first time observed) SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/bb-10.10-MDEV-11026B /storage/innobase/include/dict0mem.h:1847 in dict_table_t::is_readable() const pluto:/data/results/1655481418/TBR-1554$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-2

Marko Mäkelä added a comment - 2022-06-21 06:36

mleich, thank you. I checked each rr replay trace as well as the core dump.

First, let me mention failure 2, which is definitely not an InnoDB problem. Can you please file a separate bug for it, tentatively for the partitioning storage engine? The options was freed in:

bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a
#5 0x0000564287694428 in free_root (root=0x619001434150, MyFlags=0) at /data/Server/bb-10.10-MDEV-11026B/mysys/my_alloc.c:501
#6 0x0000564285f9c0ec in closefrm (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table.cc:4557
#7 0x00005642862b9912 in intern_close_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:220
#8 0x00005642862b9b79 in tc_remove_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:258
#9 0x00005642862baf4d in tc_release_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:449
#10 0x0000564285aa2470 in close_thread_table (thd=0x62b00038e218, table_ptr=0x62b00038e308) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_base.cc:999
#11 0x0000564285c84dde in handle_alter_part_error (lpt=0x7f9edddb33e0, action_completed=false, drop_partition=false, frm_install=false, reopen=true)
at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7113
#12 0x0000564285c89461 in fast_alter_partition_table (thd=0x62b00038e218, table=0x619001433e98, alter_info=0x7f9edddb63f0, alter_ctx=0x7f9edddb5080, create_info=0x7f9edddb6520, table_list=0x6290012e83b0)
at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7819
#13 0x0000564285eeae27 in mysql_alter_table (thd=0x62b00038e218, new_db=0x62b000392cd8, new_name=0x62b0003930e8, create_info=0x7f9edddb6520, table_list=0x6290012e83b0, alter_info=0x7f9edddb63f0, order_num=0,
order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_table.cc:10409
#14 0x00005642860806b4 in Sql_cmd_alter_table::execute (this=0x6290012e8b50, thd=0x62b00038e218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_alter.cc:547
#15 0x0000564285c40758 in mysql_execute_command (thd=0x62b00038e218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996
#16 0x0000564285c4cc44 in mysql_parse (thd=0x62b00038e218, rawbuf=0x6290012e8238 "ALTER TABLE l REBUILD PARTITION p0,p1,p2,p3 /* E_R Thread31 QNO 219 CON_ID 43 */", length=80, parser_state=0x7f9edddb7b10)
at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036

A subsequent operation is attempting to read the freed memory:

bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a
#6 0x00007f9f0a6d108b in __asan_report_load4 () from /lib/x86_64-linux-gnu/libasan.so.5
#7 0x0000564286d307a7 in create_table_info_t::check_table_options (this=0x7f9edf2ba2c0) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217
#8 0x0000564286d34637 in create_table_info_t::prepare_create_table (this=0x7f9edf2ba2c0, name=0x619001cf8038 "test/l#P#p0", strict=false)
at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11988
#9 0x0000564286d757f3 in ha_innobase::create (this=0x61d0010ea0b8, name=0x619001cf8038 "test/l#P#p0", form=0x6190014f7d98, create_info=0x7f9edf2ba9c0, file_per_table=true, trx=0x7f9efdc5c240)
at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13162
#10 0x0000564286d412ba in ha_innobase::truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13930
#11 0x00005642864889ae in handler::ha_truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/sql/handler.cc:5070
#12 0x0000564286c1ab9b in ha_partition::truncate_partition (this=0x61d0010e96b8, alter_info=0x62b000354c20, binlog_stmt=0x7f9edf2baed0) at /data/Server/bb-10.10-MDEV-11026B/sql/ha_partition.cc:4913
#13 0x000056428606e4a6 in Sql_cmd_alter_table_truncate_partition::execute (this=0x6290011f8ac8, thd=0x62b00034f218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition_admin.cc:959
#14 0x0000564285c40758 in mysql_execute_command (thd=0x62b00034f218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996
#15 0x0000564285c4cc44 in mysql_parse (thd=0x62b00034f218, rawbuf=0x6290011f8238 "ALTER TABLE l TRUNCATE PARTITION p0 /* E_R Thread24 QNO 232 CON_ID 38 */", length=72, parser_state=0x7f9edf2bbb10)
at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036

I expect this bug to affect older releases. While the TRUNCATE in InnoDB was refactored in 10.6, it has been similar ever since ~~MDEV-13564~~ in MariaDB Server 10.2.19.

The remaining failures are related to the InnoDB FULLTEXT INDEX implementation, of which thiru is the subsystem owner (and should file separate bug(s) for these).

In Failure 1, the table was not dropped, but the cached metadata was freed in innobase_reload_table() during ALTER TABLE. There is a race condition between two InnoDB threads that handle FULLTEXT INDEX:

(rr) reverse-continue

Continuing.

[Switching to Thread 3047615.3048245]

Thread 6 hit Hardware watchpoint 2: -location slot->table

Old value = (dict_table_t *) 0x0

New value = (dict_table_t *) 0x6180000ab908

0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679

2679				slot->table = NULL;

(rr) bt

#0  0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679

#1  0x000055ff2377a904 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2863

…

(rr) thread apply 3 backtrace

Thread 3 (Thread 3047615.3048290):

#0  0x0000000070000002 in syscall_traced ()

…

#22 0x000000006f6e31ab in __asan_report_load8 () from /lib/x86_64-linux-gnu/libasan.so.5

#23 0x000055ff23779e0b in fts_is_sync_needed () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741

#24 0x000055ff2377a5d1 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2816

That is, during the time ASAN already noticed that slot->table is pointing to freed memory, another thread is assigning slot->table to nullptr.

Failures 3 and 4 seem to be due to the same problem: the slot passed to fts_optimize_table_bk() points to freed memory. With the rr trace, it should be easy to debug this in more detail:

rr replay /data/results/1655481418/TBR-1554/1/rr/latest-trace

continue

watch -l *(char*)0x0c307fff8628

reverse-continue

reverse-continue

backtrace

The memory was freed during ha_innobase::delete_table(), which was executed as part of ddl_recovery.log processing. I think that this bug should affect 10.6 already. One possible fix to this failure might be to not schedule any FTS tasks before DDL recovery has been completed.

Marko Mäkelä added a comment - 2022-06-21 06:36 mleich , thank you. I checked each rr replay trace as well as the core dump. First, let me mention failure 2, which is definitely not an InnoDB problem. Can you please file a separate bug for it, tentatively for the partitioning storage engine? The options was freed in: bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a #5 0x0000564287694428 in free_root (root=0x619001434150, MyFlags=0) at /data/Server/bb-10.10-MDEV-11026B/mysys/my_alloc.c:501 #6 0x0000564285f9c0ec in closefrm (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table.cc:4557 #7 0x00005642862b9912 in intern_close_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:220 #8 0x00005642862b9b79 in tc_remove_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:258 #9 0x00005642862baf4d in tc_release_table (table=0x619001433e98) at /data/Server/bb-10.10-MDEV-11026B/sql/table_cache.cc:449 #10 0x0000564285aa2470 in close_thread_table (thd=0x62b00038e218, table_ptr=0x62b00038e308) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_base.cc:999 #11 0x0000564285c84dde in handle_alter_part_error (lpt=0x7f9edddb33e0, action_completed=false, drop_partition=false, frm_install=false, reopen=true) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7113 #12 0x0000564285c89461 in fast_alter_partition_table (thd=0x62b00038e218, table=0x619001433e98, alter_info=0x7f9edddb63f0, alter_ctx=0x7f9edddb5080, create_info=0x7f9edddb6520, table_list=0x6290012e83b0) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition.cc:7819 #13 0x0000564285eeae27 in mysql_alter_table (thd=0x62b00038e218, new_db=0x62b000392cd8, new_name=0x62b0003930e8, create_info=0x7f9edddb6520, table_list=0x6290012e83b0, alter_info=0x7f9edddb63f0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_table.cc:10409 #14 0x00005642860806b4 in Sql_cmd_alter_table::execute (this=0x6290012e8b50, thd=0x62b00038e218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_alter.cc:547 #15 0x0000564285c40758 in mysql_execute_command (thd=0x62b00038e218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996 #16 0x0000564285c4cc44 in mysql_parse (thd=0x62b00038e218, rawbuf=0x6290012e8238 "ALTER TABLE l REBUILD PARTITION p0,p1,p2,p3 /* E_R Thread31 QNO 219 CON_ID 43 */", length=80, parser_state=0x7f9edddb7b10) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036 A subsequent operation is attempting to read the freed memory: bb-10.6-MDEV-11026 cfb6106e511cae168aaba4fa2964d2c110ed834a #6 0x00007f9f0a6d108b in __asan_report_load4 () from /lib/x86_64-linux-gnu/libasan.so.5 #7 0x0000564286d307a7 in create_table_info_t::check_table_options (this=0x7f9edf2ba2c0) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11217 #8 0x0000564286d34637 in create_table_info_t::prepare_create_table (this=0x7f9edf2ba2c0, name=0x619001cf8038 "test/l#P#p0", strict=false) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:11988 #9 0x0000564286d757f3 in ha_innobase::create (this=0x61d0010ea0b8, name=0x619001cf8038 "test/l#P#p0", form=0x6190014f7d98, create_info=0x7f9edf2ba9c0, file_per_table=true, trx=0x7f9efdc5c240) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13162 #10 0x0000564286d412ba in ha_innobase::truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/handler/ha_innodb.cc:13930 #11 0x00005642864889ae in handler::ha_truncate (this=0x61d0010ea0b8) at /data/Server/bb-10.10-MDEV-11026B/sql/handler.cc:5070 #12 0x0000564286c1ab9b in ha_partition::truncate_partition (this=0x61d0010e96b8, alter_info=0x62b000354c20, binlog_stmt=0x7f9edf2baed0) at /data/Server/bb-10.10-MDEV-11026B/sql/ha_partition.cc:4913 #13 0x000056428606e4a6 in Sql_cmd_alter_table_truncate_partition::execute (this=0x6290011f8ac8, thd=0x62b00034f218) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_partition_admin.cc:959 #14 0x0000564285c40758 in mysql_execute_command (thd=0x62b00034f218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:5996 #15 0x0000564285c4cc44 in mysql_parse (thd=0x62b00034f218, rawbuf=0x6290011f8238 "ALTER TABLE l TRUNCATE PARTITION p0 /* E_R Thread24 QNO 232 CON_ID 38 */", length=72, parser_state=0x7f9edf2bbb10) at /data/Server/bb-10.10-MDEV-11026B/sql/sql_parse.cc:8036 I expect this bug to affect older releases. While the TRUNCATE in InnoDB was refactored in 10.6, it has been similar ever since MDEV-13564 in MariaDB Server 10.2.19. The remaining failures are related to the InnoDB FULLTEXT INDEX implementation, of which thiru is the subsystem owner (and should file separate bug(s) for these). In Failure 1, the table was not dropped, but the cached metadata was freed in innobase_reload_table() during ALTER TABLE . There is a race condition between two InnoDB threads that handle FULLTEXT INDEX : (rr) reverse-continue Continuing. [Switching to Thread 3047615.3048245] Thread 6 hit Hardware watchpoint 2: -location slot->table Old value = (dict_table_t *) 0x0 New value = (dict_table_t *) 0x6180000ab908 0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679 2679 slot->table = NULL; (rr) bt #0 0x000055ff2377997d in fts_optimize_del_table (remove=0x6100000184e8) at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2679 #1 0x000055ff2377a904 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2863 … (rr) thread apply 3 backtrace Thread 3 (Thread 3047615.3048290): #0 0x0000000070000002 in syscall_traced () … #22 0x000000006f6e31ab in __asan_report_load8 () from /lib/x86_64-linux-gnu/libasan.so.5 #23 0x000055ff23779e0b in fts_is_sync_needed () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2741 #24 0x000055ff2377a5d1 in fts_optimize_callback () at /data/Server/bb-10.10-MDEV-11026B/storage/innobase/fts/fts0opt.cc:2816 That is, during the time ASAN already noticed that slot->table is pointing to freed memory, another thread is assigning slot->table to nullptr . Failures 3 and 4 seem to be due to the same problem: the slot passed to fts_optimize_table_bk() points to freed memory. With the rr trace, it should be easy to debug this in more detail: rr replay /data/results/1655481418/TBR-1554/1/rr/latest-trace continue watch -l *(char*)0x0c307fff8628 reverse-continue reverse-continue backtrace The memory was freed during ha_innobase::delete_table() , which was executed as part of ddl_recovery.log processing. I think that this bug should affect 10.6 already. One possible fix to this failure might be to not schedule any FTS tasks before DDL recovery has been completed.

Matthias Leich added a comment - 2022-06-23 16:50

Failure 2 is is now reported as https://jira.mariadb.org/browse/MDEV-28937

Matthias Leich added a comment - 2022-06-23 16:50 Failure 2 is is now reported as https://jira.mariadb.org/browse/MDEV-28937

Marko Mäkelä added a comment - 2022-06-28 13:06

I realized that at least my earlier comment about Failure 1 shows multiple instances of fts_optimize_callback executing concurrently, which is very wrong. That should have the same underlying cause as ~~MDEV-28954~~. mleich, did you see those FTS failures after that bug was fixed?

Marko Mäkelä added a comment - 2022-06-28 13:06 I realized that at least my earlier comment about Failure 1 shows multiple instances of fts_optimize_callback executing concurrently, which is very wrong. That should have the same underlying cause as MDEV-28954 . mleich , did you see those FTS failures after that bug was fixed?

Matthias Leich added a comment - 2022-06-28 17:19

I have not seen these FTS related failures again.

origin/bb-10.10-MDEV-11026 c8e3bcf79b3931dad9ce16fecc714d39d2247f89 2022-06-27T12:00:31+02:00

performed well in RQG testing.

Matthias Leich added a comment - 2022-06-28 17:19 I have not seen these FTS related failures again. origin/bb-10.10-MDEV-11026 c8e3bcf79b3931dad9ce16fecc714d39d2247f89 2022-06-27T12:00:31+02:00 performed well in RQG testing.

MariaDB Server

Make InnoDB number of IO write/read threads dynamic

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration