[MDEV-26445] innodb_undo_log_truncate is unnecessarily slow - Jira

martina342 created issue - 2021-08-20 08:24

martina342 made changes - 2021-08-20 08:26

Field	Original Value	New Value
Description	Even after the version 10.5.9 which should have fixed it, it still crashes for me --Thread 139751148128000 has waited at btr0pcur.cc line 510 for 435.00 seconds the semaphore: S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221 a writer (thread id 139751159285504) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file btr0pcur.cc line 257 2021-08-20 7:33:31 0 [Note] InnoDB: A semaphore wait: --Thread 139751139219200 has waited at btr0pcur.cc line 510 for 254.00 seconds the semaphore: S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221 a writer (thread id 139751159285504) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file btr0pcur.cc line 257 2021-08-20 7:33:31 0 [Note] InnoDB: A semaphore wait: --Thread 139751134918400 has waited at btr0pcur.cc line 510 for 511.00 seconds the semaphore: S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221 a writer (thread id 139751159285504) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file btr0pcur.cc line 257 InnoDB: Pending reads 0, writes 0 2021-08-20 7:33:31 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. 210820 7:33:31 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.11-MariaDB-1 key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=346 max_threads=402 thread_count=330 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)max_threads = 1016043 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 Can't start addr2line /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x555b4482793e] /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x555b44333145] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f40b1317140] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f40b0e60ce1] /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f40b0e4a537] /usr/sbin/mariadbd(+0x62fefd)[0x555b44017efd] /usr/sbin/mariadbd(+0x626b02)[0x555b4400eb02] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x555b447c91fb] /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x555b447ca592] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x555b447c8fbf] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f40b11feed0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f40b130bea7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f40b0f22def] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /home/mysql Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 1030947 1030947 processes Max open files 60000 60000 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 1030947 1030947 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core Then after this crash, it has corrupted undo file and mariadb can't start : 2021-08-20 10:07:23 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 884310016 outside the bounds of the file: .//undo002 210820 10:07:23 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.11-MariaDB-1 key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=402 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)max_threads = 1016043 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 ??:0(my_print_stacktrace)[0x55a09b42993e] ??:0(handle_fatal_signal)[0x55a09af35145] sigaction.c:0(__restore_rt)[0x7f9b7a0e0140] linux/raise.c:51(__GI_raise)[0x7f9b79c29ce1] stdlib/abort.c:81(__GI_abort)[0x7f9b79c13537] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac19efd] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d0f2] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d137] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34b247] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34c494] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b33adb0] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b302398] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2f873a] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2fee6e] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac13662] ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const))[0x55a09b213139] ??:0(ha_initialize_handlerton(st_plugin_int))[0x55a09af380de] ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain, char const, st_plugin_int, st_mysql_sys_var))[0x55a09ad4db5a] ??:0(plugin_init(int, char, int))[0x55a09ad4ec47] ??:0(unireg_abort)[0x55a09ac7a457] ??:0(mysqld_main(int, char*))[0x55a09ac800e0] csu/libc-start.c:308(__libc_start_main)[0x7f9b79c14d0a] ??:0(_start)[0x55a09ac7446a] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /home/mysql Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 1030947 1030947 processes Max open files 60000 60000 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 1030947 1030947 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core	Even after the version 10.5.9 which should have fixed it, it still crashes for me {noformat} --Thread 139751148128000 has waited at btr0pcur.cc line 510 for 435.00 seconds the semaphore: S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221 a writer (thread id 139751159285504) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file btr0pcur.cc line 257 2021-08-20 7:33:31 0 [Note] InnoDB: A semaphore wait: --Thread 139751139219200 has waited at btr0pcur.cc line 510 for 254.00 seconds the semaphore: S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221 a writer (thread id 139751159285504) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file btr0pcur.cc line 257 2021-08-20 7:33:31 0 [Note] InnoDB: A semaphore wait: --Thread 139751134918400 has waited at btr0pcur.cc line 510 for 511.00 seconds the semaphore: S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221 a writer (thread id 139751159285504) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file btr0pcur.cc line 257 InnoDB: Pending reads 0, writes 0 2021-08-20 7:33:31 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. 210820 7:33:31 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.11-MariaDB-1 key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=346 max_threads=402 thread_count=330 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)max_threads = 1016043 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 Can't start addr2line /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x555b4482793e] /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x555b44333145] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f40b1317140] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f40b0e60ce1] /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f40b0e4a537] /usr/sbin/mariadbd(+0x62fefd)[0x555b44017efd] /usr/sbin/mariadbd(+0x626b02)[0x555b4400eb02] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x555b447c91fb] /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x555b447ca592] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x555b447c8fbf] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f40b11feed0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f40b130bea7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f40b0f22def] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /home/mysql Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 1030947 1030947 processes Max open files 60000 60000 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 1030947 1030947 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core {noformat} Then after this crash, it has corrupted undo file and mariadb can't start : {noformat} 2021-08-20 10:07:23 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 884310016 outside the bounds of the file: .//undo002 210820 10:07:23 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.11-MariaDB-1 key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=402 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)max_threads = 1016043 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 ??:0(my_print_stacktrace)[0x55a09b42993e] ??:0(handle_fatal_signal)[0x55a09af35145] sigaction.c:0(__restore_rt)[0x7f9b7a0e0140] linux/raise.c:51(__GI_raise)[0x7f9b79c29ce1] stdlib/abort.c:81(__GI_abort)[0x7f9b79c13537] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac19efd] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d0f2] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d137] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34b247] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34c494] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b33adb0] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b302398] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2f873a] ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2fee6e] ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac13662] ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const))[0x55a09b213139] ??:0(ha_initialize_handlerton(st_plugin_int))[0x55a09af380de] ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain, char const, st_plugin_int, st_mysql_sys_var))[0x55a09ad4db5a] ??:0(plugin_init(int, char, int))[0x55a09ad4ec47] ??:0(unireg_abort)[0x55a09ac7a457] ??:0(mysqld_main(int, char*))[0x55a09ac800e0] csu/libc-start.c:308(__libc_start_main)[0x7f9b79c14d0a] ??:0(_start)[0x55a09ac7446a] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /home/mysql Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 1030947 1030947 processes Max open files 60000 60000 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 1030947 1030947 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core {noformat}

Marko Mäkelä added a comment - 2021-08-20 08:40

Can you please produce a stack trace of all threads at the time of the hang (from the core dump)? A link to the instructions was included in the output that you posted: https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/

Marko Mäkelä added a comment - 2021-08-20 08:40 Can you please produce a stack trace of all threads at the time of the hang (from the core dump)? A link to the instructions was included in the output that you posted: https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/

Marko Mäkelä made changes - 2021-08-20 08:40

Component/s		Storage Engine - InnoDB [ 10129 ]
Labels		need_feedback

martina342 added a comment - 2021-08-20 08:58 - edited

the mariadb binary is stripped and now the database is corrupted, i don't even know if i can fix it, i setup innodb_force_recovery = 6 to start but I don't know if I can clear the content of the undo logs, but I don't want to corrupt again the database.

EDIT: force recovery is read only, can you please tell me how to cancel undo logs ?
I tried to set innodb_undo_tablespaces = 0 but then it fails with
InnoDB: File ./ibdata1: 'aio write' returned OS error 209. Cannot continue operation

martina342 added a comment - 2021-08-20 08:58 - edited the mariadb binary is stripped and now the database is corrupted, i don't even know if i can fix it, i setup innodb_force_recovery = 6 to start but I don't know if I can clear the content of the undo logs, but I don't want to corrupt again the database. EDIT: force recovery is read only, can you please tell me how to cancel undo logs ? I tried to set innodb_undo_tablespaces = 0 but then it fails with InnoDB: File ./ibdata1: 'aio write' returned OS error 209. Cannot continue operation

martina342 added a comment - 2021-08-22 17:17 - edited

i installed mariadb 10.5 from the mariadb repo, and installed mariadb-server-core-10.5-dbgsym, so I followed the steps for the stack trace, and it crashed again (i noticed i need to add a variable "core_file" afterward)

--Thread 139462800680704 has waited at btr0pcur.cc line 510 for 556.00 seconds the semaphore:

S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227

a writer (thread id 139461606029056) has reserved it in mode  exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time write locked in file row0ins.cc line 2633

2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:

--Thread 139461610944256 has waited at trx0trx.cc line 881 for 606.00 seconds the semaphore:

Mutex at 0x55d0b144f300, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:404, lock var 2

2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:

--Thread 139461603878656 has waited at btr0pcur.cc line 510 for 266.00 seconds the semaphore:

S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227

a writer (thread id 139461606029056) has reserved it in mode  exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time write locked in file row0ins.cc line 2633

2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:

--Thread 139462802523904 has waited at btr0pcur.cc line 510 for 370.00 seconds the semaphore:

S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227

a writer (thread id 139461606029056) has reserved it in mode  exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time write locked in file row0ins.cc line 2633

2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:

--Thread 139462801295104 has waited at btr0pcur.cc line 510 for 534.00 seconds the semaphore:

S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227

a writer (thread id 139461606029056) has reserved it in mode  exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time write locked in file row0ins.cc line 2633

2021-08-21 22:49:09 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

210821 22:49:09 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=144

max_threads=402

thread_count=74

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55d00358949e]

/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55d002fe7305]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7efd7f5a2140]

2021-08-21 22:49:18 0 [Note] InnoDB: Truncated .//undo001

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7efd7f0ebce1]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7efd7f0d5537]

/usr/sbin/mariadbd(+0x63724f)[0x55d002ca624f]

/usr/sbin/mariadbd(+0x62c07d)[0x55d002c9b07d]

/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x55d00351f60b]

/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x55d0035204c2]

/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x55d00351f3cf]

/lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7efd7f489ed0]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7efd7f596ea7]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7efd7f1addef]

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

Writing a core file...

Working directory at /home/mysql

Resource Limits:

Limit                     Soft Limit           Hard Limit           Units

Max cpu time              unlimited            unlimited            seconds

Max file size             unlimited            unlimited            bytes

Max data size             unlimited            unlimited            bytes

Max stack size            8388608              unlimited            bytes

Max core file size        0                    unlimited            bytes

Max resident set          unlimited            unlimited            bytes

Max processes             1031116              1031116              processes

Max open files            32768                32768                files

Max locked memory         65536                65536                bytes

Max address space         unlimited            unlimited            bytes

Max file locks            unlimited            unlimited            locks

Max pending signals       1031116              1031116              signals

Max msgqueue size         819200               819200               bytes

Max nice priority         0                    0

Max realtime priority     0                    0

Max realtime timeout      unlimited            unlimited            us

Core pattern: core

martina342 added a comment - 2021-08-22 17:17 - edited i installed mariadb 10.5 from the mariadb repo, and installed mariadb-server-core-10.5-dbgsym, so I followed the steps for the stack trace, and it crashed again (i noticed i need to add a variable "core_file" afterward) --Thread 139462800680704 has waited at btr0pcur.cc line 510 for 556.00 seconds the semaphore: S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227 a writer (thread id 139461606029056) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file row0ins.cc line 2633 2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait: --Thread 139461610944256 has waited at trx0trx.cc line 881 for 606.00 seconds the semaphore: Mutex at 0x55d0b144f300, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:404, lock var 2 2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait: --Thread 139461603878656 has waited at btr0pcur.cc line 510 for 266.00 seconds the semaphore: S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227 a writer (thread id 139461606029056) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file row0ins.cc line 2633 2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait: --Thread 139462802523904 has waited at btr0pcur.cc line 510 for 370.00 seconds the semaphore: S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227 a writer (thread id 139461606029056) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file row0ins.cc line 2633 2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait: --Thread 139462801295104 has waited at btr0pcur.cc line 510 for 534.00 seconds the semaphore: S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227 a writer (thread id 139461606029056) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time write locked in file row0ins.cc line 2633 2021-08-21 22:49:09 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. 210821 22:49:09 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=144 max_threads=402 thread_count=74 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55d00358949e] /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55d002fe7305] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7efd7f5a2140] 2021-08-21 22:49:18 0 [Note] InnoDB: Truncated .//undo001 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7efd7f0ebce1] /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7efd7f0d5537] /usr/sbin/mariadbd(+0x63724f)[0x55d002ca624f] /usr/sbin/mariadbd(+0x62c07d)[0x55d002c9b07d] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x55d00351f60b] /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x55d0035204c2] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x55d00351f3cf] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7efd7f489ed0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7efd7f596ea7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7efd7f1addef] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /home/mysql Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 1031116 1031116 processes Max open files 32768 32768 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 1031116 1031116 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core

martina342 added a comment - 2021-08-24 09:03

here is a full stack

GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git

Copyright (C) 2021 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.

Type "show copying" and "show warranty" for details.

This GDB was configured as "x86_64-linux-gnu".

Type "show configuration" for configuration details.

For bug reporting instructions, please see:

<https://www.gnu.org/software/gdb/bugs/>.

Find the GDB manual and other documentation resources online at:

    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".

Type "apropos word" to search for commands related to "word"...

Reading symbols from /usr/sbin/mysqld...

Reading symbols from /usr/lib/debug/.build-id/67/9570f0b96cbccab866cbf085e309ed3a7e8ddc.debug...

warning: Can't open file /[aio] (deleted) during file-backed mapping note processing

[New LWP 669786]

[New LWP 258999]

[New LWP 259018]

[New LWP 258654]

[New LWP 293279]

[New LWP 259058]

[New LWP 258658]

[New LWP 293242]

[New LWP 258667]

[New LWP 293230]

[New LWP 293236]

[New LWP 258951]

[New LWP 293306]

[New LWP 293209]

[New LWP 258653]

[New LWP 293313]

[New LWP 259016]

[New LWP 259001]

[New LWP 293225]

[New LWP 293285]

[New LWP 293576]

[New LWP 258924]

[New LWP 259008]

[New LWP 293316]

[New LWP 259059]

[New LWP 293589]

[New LWP 293312]

[New LWP 293318]

[New LWP 293202]

[New LWP 293595]

[New LWP 258990]

[New LWP 259078]

[New LWP 293577]

[New LWP 293254]

[New LWP 259068]

[New LWP 293237]

[New LWP 294894]

[New LWP 258652]

[New LWP 293314]

[New LWP 259084]

[New LWP 293571]

[New LWP 293317]

[New LWP 293266]

[New LWP 293275]

[New LWP 293327]

[New LWP 259024]

[New LWP 259082]

[New LWP 294705]

[New LWP 293584]

[New LWP 293310]

[New LWP 293597]

[New LWP 259014]

[New LWP 294934]

[New LWP 293601]

[New LWP 294688]

[New LWP 294928]

[New LWP 293220]

[New LWP 258689]

[New LWP 293610]

[New LWP 293246]

[New LWP 669785]

[New LWP 258655]

[New LWP 293235]

[New LWP 259031]

[New LWP 294694]

[New LWP 293288]

[New LWP 293241]

[New LWP 428265]

[New LWP 293694]

[New LWP 293693]

[New LWP 293311]

[New LWP 294929]

[New LWP 294908]

[New LWP 293585]

[New LWP 294893]

[New LWP 293249]

[New LWP 294920]

[New LWP 294935]

[New LWP 669125]

[New LWP 293196]

[New LWP 293244]

[New LWP 293692]

[New LWP 294689]

[New LWP 309945]

[New LWP 294907]

[New LWP 293217]

[New LWP 293570]

[New LWP 670639]

[New LWP 258668]

[New LWP 293583]

[New LWP 293336]

[New LWP 601081]

[New LWP 294901]

[New LWP 294693]

[New LWP 294778]

[New LWP 294775]

[New LWP 309770]

[New LWP 294690]

[New LWP 331566]

[New LWP 293320]

[New LWP 294927]

[New LWP 304551]

[New LWP 293272]

[New LWP 298162]

[New LWP 293263]

[New LWP 294687]

[New LWP 294909]

[New LWP 294916]

[New LWP 294917]

[New LWP 258958]

[New LWP 293578]

[New LWP 259034]

[New LWP 304560]

[New LWP 258956]

[New LWP 294910]

[New LWP 294777]

[New LWP 294773]

[New LWP 304541]

[New LWP 670720]

[New LWP 293574]

[New LWP 298152]

[New LWP 295013]

[New LWP 294937]

[New LWP 298153]

[New LWP 335788]

[New LWP 293300]

[New LWP 293324]

[New LWP 294895]

[New LWP 293228]

[New LWP 294932]

[New LWP 294921]

[New LWP 293581]

[New LWP 670858]

[New LWP 293287]

[New LWP 294933]

[New LWP 258962]

[New LWP 295014]

[New LWP 300352]

[New LWP 294919]

[New LWP 294940]

[New LWP 670831]

[New LWP 294939]

[New LWP 383425]

[New LWP 295046]

[New LWP 601054]

[New LWP 304540]

[New LWP 393894]

[New LWP 293582]

[New LWP 294691]

[New LWP 294692]

[New LWP 294903]

[New LWP 309767]

[New LWP 294923]

[New LWP 428228]

[New LWP 304561]

[New LWP 259010]

[New LWP 294897]

[New LWP 309946]

[New LWP 294931]

[New LWP 298160]

[New LWP 259041]

[New LWP 294942]

[New LWP 428142]

[New LWP 383429]

[New LWP 304552]

[New LWP 298154]

[New LWP 670807]

[New LWP 293296]

[New LWP 293259]

[New LWP 293596]

[New LWP 294914]

[New LWP 309773]

[New LWP 294936]

[New LWP 670658]

[New LWP 670833]

[New LWP 670638]

[New LWP 294902]

[New LWP 259022]

[New LWP 670772]

[New LWP 294938]

[New LWP 670870]

[New LWP 293226]

[New LWP 293319]

[New LWP 304557]

[New LWP 670709]

[New LWP 670124]

[New LWP 383436]

[New LWP 670640]

[New LWP 293245]

[New LWP 670842]

[New LWP 310227]

[New LWP 670834]

[New LWP 309768]

[New LWP 670732]

[New LWP 294912]

[New LWP 670871]

[New LWP 259029]

[New LWP 428184]

[New LWP 670781]

[New LWP 670662]

[New LWP 298361]

[New LWP 293308]

[New LWP 304542]

[New LWP 293944]

[New LWP 670645]

[New LWP 295012]

[New LWP 309766]

[New LWP 670131]

[New LWP 258926]

[New LWP 304547]

[New LWP 294915]

[New LWP 670736]

[New LWP 293307]

[New LWP 259036]

[New LWP 298163]

[New LWP 259020]

[New LWP 601080]

[New LWP 428153]

[New LWP 670796]

[New LWP 670791]

[New LWP 428224]

[New LWP 293572]

[New LWP 601079]

[New LWP 670712]

[New LWP 294896]

[New LWP 298151]

[New LWP 670714]

[New LWP 259053]

[New LWP 670766]

[New LWP 298156]

[New LWP 259064]

[New LWP 258943]

[New LWP 670671]

[New LWP 601084]

[New LWP 294941]

[New LWP 304553]

[New LWP 670866]

[New LWP 670837]

[New LWP 670702]

[New LWP 428145]

[New LWP 304545]

[New LWP 294922]

[New LWP 304556]

[New LWP 298158]

[New LWP 670865]

[New LWP 670637]

[New LWP 309774]

[New LWP 670667]

[New LWP 428245]

[New LWP 670856]

[New LWP 670765]

[New LWP 670703]

[New LWP 294926]

[New LWP 259044]

[New LWP 309771]

[New LWP 259073]

[New LWP 293298]

[New LWP 298155]

[New LWP 670643]

[New LWP 298159]

[New LWP 670701]

[New LWP 298164]

[New LWP 670771]

[New LWP 670769]

[New LWP 304555]

[New LWP 258940]

[New LWP 259087]

[New LWP 309775]

[New LWP 293221]

[New LWP 334045]

[New LWP 670770]

[New LWP 670706]

[New LWP 670809]

[New LWP 383427]

[New LWP 259060]

[New LWP 298150]

[New LWP 304549]

[New LWP 259091]

[New LWP 670803]

[New LWP 670838]

[New LWP 670729]

[New LWP 669783]

[New LWP 293255]

[New LWP 670839]

[New LWP 670852]

[New LWP 294913]

[New LWP 293580]

[New LWP 293261]

[New LWP 259086]

[New LWP 304544]

[New LWP 309765]

[New LWP 298161]

[New LWP 304550]

[New LWP 293599]

[New LWP 293262]

[New LWP 298973]

[New LWP 293579]

[New LWP 293277]

[New LWP 293199]

[New LWP 304546]

[New LWP 304559]

[New LWP 293304]

[New LWP 293315]

[New LWP 294955]

[New LWP 294918]

[New LWP 293321]

[New LWP 309769]

[New LWP 293322]

[New LWP 383433]

[New LWP 397906]

[New LWP 383435]

[New LWP 293329]

[New LWP 670704]

[New LWP 670705]

[New LWP 293573]

[New LWP 670741]

[New LWP 670764]

[New LWP 293575]

[New LWP 670832]

[New LWP 670811]

[New LWP 293586]

[New LWP 670681]

[New LWP 294774]

[New LWP 294776]

[New LWP 294930]

[New LWP 294966]

[New LWP 294968]

[New LWP 304537]

[New LWP 309948]

[New LWP 428180]

[New LWP 428253]

[New LWP 393900]

[New LWP 601082]

[New LWP 670648]

[New LWP 670787]

[New LWP 670849]

[New LWP 670872]

[New LWP 259023]

[New LWP 309772]

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

--Type <RET> for more, q to quit, c to continue without paging--

Core was generated by `/usr/sbin/mariadbd'.

Program terminated with signal SIGABRT, Aborted.

#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56

56      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.

[Current thread is 1 (Thread 0x7f7455ffb700 (LWP 669786))]

martina342 added a comment - 2021-08-24 09:03 here is a full stack GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/mysqld... Reading symbols from /usr/lib/debug/.build-id/67/9570f0b96cbccab866cbf085e309ed3a7e8ddc.debug... warning: Can't open file /[aio] (deleted) during file-backed mapping note processing [New LWP 669786] [New LWP 258999] [New LWP 259018] [New LWP 258654] [New LWP 293279] [New LWP 259058] [New LWP 258658] [New LWP 293242] [New LWP 258667] [New LWP 293230] [New LWP 293236] [New LWP 258951] [New LWP 293306] [New LWP 293209] [New LWP 258653] [New LWP 293313] [New LWP 259016] [New LWP 259001] [New LWP 293225] [New LWP 293285] [New LWP 293576] [New LWP 258924] [New LWP 259008] [New LWP 293316] [New LWP 259059] [New LWP 293589] [New LWP 293312] [New LWP 293318] [New LWP 293202] [New LWP 293595] [New LWP 258990] [New LWP 259078] [New LWP 293577] [New LWP 293254] [New LWP 259068] [New LWP 293237] [New LWP 294894] [New LWP 258652] [New LWP 293314] [New LWP 259084] [New LWP 293571] [New LWP 293317] [New LWP 293266] [New LWP 293275] [New LWP 293327] [New LWP 259024] [New LWP 259082] [New LWP 294705] [New LWP 293584] [New LWP 293310] [New LWP 293597] [New LWP 259014] [New LWP 294934] [New LWP 293601] [New LWP 294688] [New LWP 294928] [New LWP 293220] [New LWP 258689] [New LWP 293610] [New LWP 293246] [New LWP 669785] [New LWP 258655] [New LWP 293235] [New LWP 259031] [New LWP 294694] [New LWP 293288] [New LWP 293241] [New LWP 428265] [New LWP 293694] [New LWP 293693] [New LWP 293311] [New LWP 294929] [New LWP 294908] [New LWP 293585] [New LWP 294893] [New LWP 293249] [New LWP 294920] [New LWP 294935] [New LWP 669125] [New LWP 293196] [New LWP 293244] [New LWP 293692] [New LWP 294689] [New LWP 309945] [New LWP 294907] [New LWP 293217] [New LWP 293570] [New LWP 670639] [New LWP 258668] [New LWP 293583] [New LWP 293336] [New LWP 601081] [New LWP 294901] [New LWP 294693] [New LWP 294778] [New LWP 294775] [New LWP 309770] [New LWP 294690] [New LWP 331566] [New LWP 293320] [New LWP 294927] [New LWP 304551] [New LWP 293272] [New LWP 298162] [New LWP 293263] [New LWP 294687] [New LWP 294909] [New LWP 294916] [New LWP 294917] [New LWP 258958] [New LWP 293578] [New LWP 259034] [New LWP 304560] [New LWP 258956] [New LWP 294910] [New LWP 294777] [New LWP 294773] [New LWP 304541] [New LWP 670720] [New LWP 293574] [New LWP 298152] [New LWP 295013] [New LWP 294937] [New LWP 298153] [New LWP 335788] [New LWP 293300] [New LWP 293324] [New LWP 294895] [New LWP 293228] [New LWP 294932] [New LWP 294921] [New LWP 293581] [New LWP 670858] [New LWP 293287] [New LWP 294933] [New LWP 258962] [New LWP 295014] [New LWP 300352] [New LWP 294919] [New LWP 294940] [New LWP 670831] [New LWP 294939] [New LWP 383425] [New LWP 295046] [New LWP 601054] [New LWP 304540] [New LWP 393894] [New LWP 293582] [New LWP 294691] [New LWP 294692] [New LWP 294903] [New LWP 309767] [New LWP 294923] [New LWP 428228] [New LWP 304561] [New LWP 259010] [New LWP 294897] [New LWP 309946] [New LWP 294931] [New LWP 298160] [New LWP 259041] [New LWP 294942] [New LWP 428142] [New LWP 383429] [New LWP 304552] [New LWP 298154] [New LWP 670807] [New LWP 293296] [New LWP 293259] [New LWP 293596] [New LWP 294914] [New LWP 309773] [New LWP 294936] [New LWP 670658] [New LWP 670833] [New LWP 670638] [New LWP 294902] [New LWP 259022] [New LWP 670772] [New LWP 294938] [New LWP 670870] [New LWP 293226] [New LWP 293319] [New LWP 304557] [New LWP 670709] [New LWP 670124] [New LWP 383436] [New LWP 670640] [New LWP 293245] [New LWP 670842] [New LWP 310227] [New LWP 670834] [New LWP 309768] [New LWP 670732] [New LWP 294912] [New LWP 670871] [New LWP 259029] [New LWP 428184] [New LWP 670781] [New LWP 670662] [New LWP 298361] [New LWP 293308] [New LWP 304542] [New LWP 293944] [New LWP 670645] [New LWP 295012] [New LWP 309766] [New LWP 670131] [New LWP 258926] [New LWP 304547] [New LWP 294915] [New LWP 670736] [New LWP 293307] [New LWP 259036] [New LWP 298163] [New LWP 259020] [New LWP 601080] [New LWP 428153] [New LWP 670796] [New LWP 670791] [New LWP 428224] [New LWP 293572] [New LWP 601079] [New LWP 670712] [New LWP 294896] [New LWP 298151] [New LWP 670714] [New LWP 259053] [New LWP 670766] [New LWP 298156] [New LWP 259064] [New LWP 258943] [New LWP 670671] [New LWP 601084] [New LWP 294941] [New LWP 304553] [New LWP 670866] [New LWP 670837] [New LWP 670702] [New LWP 428145] [New LWP 304545] [New LWP 294922] [New LWP 304556] [New LWP 298158] [New LWP 670865] [New LWP 670637] [New LWP 309774] [New LWP 670667] [New LWP 428245] [New LWP 670856] [New LWP 670765] [New LWP 670703] [New LWP 294926] [New LWP 259044] [New LWP 309771] [New LWP 259073] [New LWP 293298] [New LWP 298155] [New LWP 670643] [New LWP 298159] [New LWP 670701] [New LWP 298164] [New LWP 670771] [New LWP 670769] [New LWP 304555] [New LWP 258940] [New LWP 259087] [New LWP 309775] [New LWP 293221] [New LWP 334045] [New LWP 670770] [New LWP 670706] [New LWP 670809] [New LWP 383427] [New LWP 259060] [New LWP 298150] [New LWP 304549] [New LWP 259091] [New LWP 670803] [New LWP 670838] [New LWP 670729] [New LWP 669783] [New LWP 293255] [New LWP 670839] [New LWP 670852] [New LWP 294913] [New LWP 293580] [New LWP 293261] [New LWP 259086] [New LWP 304544] [New LWP 309765] [New LWP 298161] [New LWP 304550] [New LWP 293599] [New LWP 293262] [New LWP 298973] [New LWP 293579] [New LWP 293277] [New LWP 293199] [New LWP 304546] [New LWP 304559] [New LWP 293304] [New LWP 293315] [New LWP 294955] [New LWP 294918] [New LWP 293321] [New LWP 309769] [New LWP 293322] [New LWP 383433] [New LWP 397906] [New LWP 383435] [New LWP 293329] [New LWP 670704] [New LWP 670705] [New LWP 293573] [New LWP 670741] [New LWP 670764] [New LWP 293575] [New LWP 670832] [New LWP 670811] [New LWP 293586] [New LWP 670681] [New LWP 294774] [New LWP 294776] [New LWP 294930] [New LWP 294966] [New LWP 294968] [New LWP 304537] [New LWP 309948] [New LWP 428180] [New LWP 428253] [New LWP 393900] [New LWP 601082] [New LWP 670648] [New LWP 670787] [New LWP 670849] [New LWP 670872] [New LWP 259023] [New LWP 309772] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". --Type <RET> for more, q to quit, c to continue without paging-- Core was generated by `/usr/sbin/mariadbd'. Program terminated with signal SIGABRT, Aborted. #0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56 56 ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory. [Current thread is 1 (Thread 0x7f7455ffb700 (LWP 669786))]

martina342 made changes - 2021-08-24 09:03

Labels

need_feedback

martina342 added a comment - 2021-08-24 09:07

the database is again now corrupted ....

2021-08-24 11:04:50 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 4633837568 outside the bounds of the file: .//undo002

210824 11:04:50 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=0

max_threads=402

thread_count=0

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x5625bef0e49e]

/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x5625be96c305]

sigaction.c:0(__restore_rt)[0x7fc93b1b9140]

linux/raise.c:51(__GI_raise)[0x7fc93ad02ce1]

stdlib/abort.c:81(__GI_abort)[0x7fc93acec537]

/usr/sbin/mariadbd(+0x63724f)[0x5625be62b24f]

/usr/sbin/mariadbd(+0x64bd1b)[0x5625be63fd1b]

ut/ut0ut.cc:576(ib::fatal::~fatal() [clone .cold])[0x5625be63fd61]

fil/fil0fil.cc:3224(fil_report_invalid_page_access(char const*, unsigned long, unsigned long, bool))[0x5625bedf4aac]

include/fil0fil.h:535(fil_space_t::acquire())[0x5625bede2054]

buf/buf0rea.cc:344(buf_read_page(page_id_t, unsigned long))[0x5625bed9e9ca]

buf/buf0buf.cc:2960(buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*, bool))[0x5625bed8c7a6]

trx/trx0undo.cc:854(trx_undo_mem_create_at_db_start(trx_rseg_t*, unsigned long, unsigned int, unsigned long&))[0x5625bed97914]

trx/trx0rseg.cc:431(trx_rseg_array_init())[0x5625be622daf]

trx/trx0trx.cc:740(trx_lists_init_at_db_start())[0x5625bec79319]

srv/srv0start.cc:1488(srv_start(bool))[0x5625be96ee9e]

handler/ha_innodb.cc:4025(innodb_init(void*))[0x5625be76ccca]

sql/handler.cc:645(ha_initialize_handlerton(st_plugin_int*))[0x5625be76df4d]

sql/sql_plugin.cc:1459(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x5625be68bf9f]

sql/sql_plugin.cc:1751(plugin_init(int*, char**, int))[0x5625be691919]

csu/libc-start.c:308(__libc_start_main)[0x7fc93acedd0a]

/usr/sbin/mariadbd(_start+0x2a)[0x5625be68662a]

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

Writing a core file...

Working directory at /home/mysql

Resource Limits:

Limit                     Soft Limit           Hard Limit           Units

Max cpu time              unlimited            unlimited            seconds

Max file size             unlimited            unlimited            bytes

Max data size             unlimited            unlimited            bytes

Max stack size            8388608              unlimited            bytes

Max core file size        unlimited            unlimited            bytes

Max resident set          unlimited            unlimited            bytes

Max processes             1031116              1031116              processes

Max open files            32768                32768                files

Max locked memory         65536                65536                bytes

Max address space         unlimited            unlimited            bytes

Max file locks            unlimited            unlimited            locks

Max pending signals       1031116              1031116              signals

Max msgqueue size         819200               819200               bytes

Max nice priority         0                    0

Max realtime priority     0                    0

Max realtime timeout      unlimited            unlimited            us

Core pattern: core

martina342 added a comment - 2021-08-24 09:07 the database is again now corrupted .... 2021-08-24 11:04:50 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 4633837568 outside the bounds of the file: .//undo002 210824 11:04:50 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=402 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x5625bef0e49e] /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x5625be96c305] sigaction.c:0(__restore_rt)[0x7fc93b1b9140] linux/raise.c:51(__GI_raise)[0x7fc93ad02ce1] stdlib/abort.c:81(__GI_abort)[0x7fc93acec537] /usr/sbin/mariadbd(+0x63724f)[0x5625be62b24f] /usr/sbin/mariadbd(+0x64bd1b)[0x5625be63fd1b] ut/ut0ut.cc:576(ib::fatal::~fatal() [clone .cold])[0x5625be63fd61] fil/fil0fil.cc:3224(fil_report_invalid_page_access(char const*, unsigned long, unsigned long, bool))[0x5625bedf4aac] include/fil0fil.h:535(fil_space_t::acquire())[0x5625bede2054] buf/buf0rea.cc:344(buf_read_page(page_id_t, unsigned long))[0x5625bed9e9ca] buf/buf0buf.cc:2960(buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*, bool))[0x5625bed8c7a6] trx/trx0undo.cc:854(trx_undo_mem_create_at_db_start(trx_rseg_t*, unsigned long, unsigned int, unsigned long&))[0x5625bed97914] trx/trx0rseg.cc:431(trx_rseg_array_init())[0x5625be622daf] trx/trx0trx.cc:740(trx_lists_init_at_db_start())[0x5625bec79319] srv/srv0start.cc:1488(srv_start(bool))[0x5625be96ee9e] handler/ha_innodb.cc:4025(innodb_init(void*))[0x5625be76ccca] sql/handler.cc:645(ha_initialize_handlerton(st_plugin_int*))[0x5625be76df4d] sql/sql_plugin.cc:1459(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x5625be68bf9f] sql/sql_plugin.cc:1751(plugin_init(int*, char**, int))[0x5625be691919] csu/libc-start.c:308(__libc_start_main)[0x7fc93acedd0a] /usr/sbin/mariadbd(_start+0x2a)[0x5625be68662a] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file... Working directory at /home/mysql Resource Limits: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size unlimited unlimited bytes Max resident set unlimited unlimited bytes Max processes 1031116 1031116 processes Max open files 32768 32768 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 1031116 1031116 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Core pattern: core

martina342 made changes - 2021-08-25 08:11

Attachment

crash.log [ 58683 ]

martina342 added a comment - 2021-08-25 08:11

here is a crash with really full stack trace (attached)

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041117320960 has waited at dict0dict.cc line 1083 for 39.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041117013760 has waited at dict0dict.cc line 1083 for 39.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041116706560 has waited at dict0dict.cc line 1083 for 38.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041116399360 has waited at dict0dict.cc line 1083 for 36.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041116092160 has waited at dict0dict.cc line 1083 for 36.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041115784960 has waited at dict0dict.cc line 1083 for 32.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041115477760 has waited at dict0dict.cc line 1083 for 30.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041115170560 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041114863360 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041114556160 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041114248960 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041113941760 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041113634560 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

2021-08-25  8:04:11 0 [Note] InnoDB: A semaphore wait:

--Thread 140041113327360 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:

Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2

InnoDB: Pending reads 0, writes 0

2021-08-25  8:04:11 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

210825  8:04:11 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=347

max_threads=402

thread_count=347

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55c5e7c0549e]

/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55c5e7663305]

sigaction.c:0(__restore_rt)[0x7f84317a0140]

linux/raise.c:51(__GI_raise)[0x7f84312e9ce1]

stdlib/abort.c:81(__GI_abort)[0x7f84312d3537]

/usr/sbin/mariadbd(+0x63724f)[0x55c5e732224f]

/usr/sbin/mariadbd(+0x62c07d)[0x55c5e731707d]

ut/ut0ut.cc:576(ib::fatal::~fatal() [clone .cold])[0x55c5e7b9b60b]

srv/srv0srv.cc:1337(srv_monitor_task.cold)[0x55c5e7b9c4c2]

bits/atomic_base.h:404(std::__atomic_base<bool>::store(bool, std::memory_order))[0x55c5e7b9b3cf]

2021-08-25  8:04:14 0 [Note] InnoDB: Truncated .//undo002

??:0(std::error_code::default_error_condition() const)[0x7f8431687ed0]

nptl/pthread_create.c:478(start_thread)[0x7f8431794ea7]

x86_64/clone.S:97(__GI___clone)[0x7f84313abdef]

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

crash.log

martina342 added a comment - 2021-08-25 08:11 here is a crash with really full stack trace (attached) 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041117320960 has waited at dict0dict.cc line 1083 for 39.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041117013760 has waited at dict0dict.cc line 1083 for 39.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041116706560 has waited at dict0dict.cc line 1083 for 38.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041116399360 has waited at dict0dict.cc line 1083 for 36.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041116092160 has waited at dict0dict.cc line 1083 for 36.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041115784960 has waited at dict0dict.cc line 1083 for 32.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041115477760 has waited at dict0dict.cc line 1083 for 30.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041115170560 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041114863360 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041114556160 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041114248960 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041113941760 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041113634560 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait: --Thread 140041113327360 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore: Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2 InnoDB: Pending reads 0, writes 0 2021-08-25 8:04:11 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. 210825 8:04:11 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=347 max_threads=402 thread_count=347 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55c5e7c0549e] /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55c5e7663305] sigaction.c:0(__restore_rt)[0x7f84317a0140] linux/raise.c:51(__GI_raise)[0x7f84312e9ce1] stdlib/abort.c:81(__GI_abort)[0x7f84312d3537] /usr/sbin/mariadbd(+0x63724f)[0x55c5e732224f] /usr/sbin/mariadbd(+0x62c07d)[0x55c5e731707d] ut/ut0ut.cc:576(ib::fatal::~fatal() [clone .cold])[0x55c5e7b9b60b] srv/srv0srv.cc:1337(srv_monitor_task.cold)[0x55c5e7b9c4c2] bits/atomic_base.h:404(std::__atomic_base<bool>::store(bool, std::memory_order))[0x55c5e7b9b3cf] 2021-08-25 8:04:14 0 [Note] InnoDB: Truncated .//undo002 ??:0(std::error_code::default_error_condition() const)[0x7f8431687ed0] nptl/pthread_create.c:478(start_thread)[0x7f8431794ea7] x86_64/clone.S:97(__GI___clone)[0x7f84313abdef] The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. crash.log

martina342 added a comment - 2021-08-25 08:15

the full log at the time of crash log.txt

martina342 added a comment - 2021-08-25 08:15 the full log at the time of crash log.txt

martina342 made changes - 2021-08-25 08:16

Attachment

log.txt [ 58684 ]

martina342 added a comment - 2021-08-25 16:42 - edited

@Marko Mäkelä, do you have all the informations needed or do you need more stack ?

martina342 added a comment - 2021-08-25 16:42 - edited @Marko Mäkelä, do you have all the informations needed or do you need more stack ?

Marko Mäkelä added a comment - 2021-08-31 14:57

martina342, the InnoDB "semaphore wait" output is insufficient for analyzing hangs.

https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ uses the terms "backtrace" or "bt" to refer to stack traces.

Marko Mäkelä added a comment - 2021-08-31 14:57 martina342 , the InnoDB "semaphore wait" output is insufficient for analyzing hangs. https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ uses the terms "backtrace" or "bt" to refer to stack traces.

martina342 added a comment - 2021-08-31 15:00

can you please check crash.log ? It is the "backtrace" (bt)

martina342 added a comment - 2021-08-31 15:00 can you please check crash.log ? It is the "backtrace" (bt)

Marko Mäkelä added a comment - 2021-08-31 15:15

Sorry, indeed, crash.log it is. I see page_id=... in many stack traces. The following command should prevent such hiding:

set print frame-arguments all

I see lock=0x7f83dc1c2578 being passed to pfs_rw_lock_s_lock_func() in many threads. Based on the stack traces, that is an index page. Thread 44 is waiting for the exclusive page latch, which will cause any subsequent shared latch requests to wait until the exclusive latch has been granted and released. That purge thread is also by design holding dict_sys.latch in shared mode (this was just fixed in ~~MDEV-24258~~). That dict_sys.latch which in turn will block some other thread that is waiting for dict_sys.mutex. But, I cannot quickly determine which thread is blocking Thread 44. It might even be impossible based on the available information.

These stack traces are definitely a step to the right direction to analyzing this, but it would be very helpful to see some more data from the core dump. If you still have the core dump from which crash.log was generated, then please provide the following:

print dict_sys.latch

print dict_sys.mutex

If you have to generate a new core dump first, then I would also need the stack traces for that core dump.

Theoretically, I could go through all 361 stack traces and try to match them with the source code, to see which threads could possibly be holding the exclusive latch, but I would like to save that effort.

Marko Mäkelä added a comment - 2021-08-31 15:15 Sorry, indeed, crash.log it is. I see page_id=... in many stack traces. The following command should prevent such hiding: set print frame-arguments all I see lock=0x7f83dc1c2578 being passed to pfs_rw_lock_s_lock_func() in many threads. Based on the stack traces, that is an index page. Thread 44 is waiting for the exclusive page latch, which will cause any subsequent shared latch requests to wait until the exclusive latch has been granted and released. That purge thread is also by design holding dict_sys.latch in shared mode (this was just fixed in MDEV-24258 ). That dict_sys.latch which in turn will block some other thread that is waiting for dict_sys.mutex . But, I cannot quickly determine which thread is blocking Thread 44. It might even be impossible based on the available information. These stack traces are definitely a step to the right direction to analyzing this, but it would be very helpful to see some more data from the core dump. If you still have the core dump from which crash.log was generated, then please provide the following: print dict_sys.latch print dict_sys.mutex If you have to generate a new core dump first, then I would also need the stack traces for that core dump. Theoretically, I could go through all 361 stack traces and try to match them with the source code, to see which threads could possibly be holding the exclusive latch, but I would like to save that effort.

martina342 made changes - 2021-08-31 15:24

Attachment

crash2.log [ 58802 ]

martina342 added a comment - 2021-08-31 15:25 - edited

I have a new crash log from a new core dump

crash2.log

(gdb) print dict_sys.latch

$1 = {<ilist_node<void>> = {next = 0x55f6b7f887a0, prev = 0x55f6b7f8a850}, lock_word = {m = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}, waiters = {

    m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, sx_recursive = 0, writer_thread = 0, event = 0x55f6b7f88e08, wait_ex_event = 0x55f6b7f88e98,

  cfile_name = 0x55f61011a268 "./storage/innobase/dict/dict0dict.cc", last_x_file_name = 0x55f61010fdb8 "./storage/innobase/srv/srv0srv.cc", cline = 1036, is_block_lock = 0, last_x_line = 1538, count_os_wait = 4,

  pfs_psi = 0x0}

(gdb) print dict_sys.mutex

$2 = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x55f6b7f88d58, m_policy = {

      m_filename = 0x55f61011a268 "./storage/innobase/dict/dict0dict.cc", m_line = 1027, m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_DICT_SYS}}, m_ptr = 0x0}

martina342 added a comment - 2021-08-31 15:25 - edited I have a new crash log from a new core dump crash2.log (gdb) print dict_sys.latch $1 = {<ilist_node<void>> = {next = 0x55f6b7f887a0, prev = 0x55f6b7f8a850}, lock_word = {m = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}, waiters = { m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, sx_recursive = 0, writer_thread = 0, event = 0x55f6b7f88e08, wait_ex_event = 0x55f6b7f88e98, cfile_name = 0x55f61011a268 "./storage/innobase/dict/dict0dict.cc", last_x_file_name = 0x55f61010fdb8 "./storage/innobase/srv/srv0srv.cc", cline = 1036, is_block_lock = 0, last_x_line = 1538, count_os_wait = 4, pfs_psi = 0x0} (gdb) print dict_sys.mutex $2 = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x55f6b7f88d58, m_policy = { m_filename = 0x55f61011a268 "./storage/innobase/dict/dict0dict.cc", m_line = 1027, m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_DICT_SYS}}, m_ptr = 0x0}

Alice Sherepa made changes - 2021-08-31 16:28

Link

This issue relates to MDEV-24911 [ MDEV-24911 ]

Marko Mäkelä added a comment - 2021-09-01 15:09

Is crash2.log from MariaDB Server 10.5.12? The line number in srv_monitor_task() suggests so. I only see 3 InnoDB threads being executed: the purge coordinator in Thread 63 (busy, but not waiting for anything), the page cleaner (idle) in Thread 42, and the crashing srv_monitor_task() in Thread 1. That looks like a false alarm; maybe a larger value of innodb_fatal_semaphore_wait_threshold could help avoid that.

In any case, the stack traces are completely different from crash.log , because there is no os_event::wait to be seen.

Marko Mäkelä added a comment - 2021-09-01 15:09 Is crash2.log from MariaDB Server 10.5.12? The line number in srv_monitor_task() suggests so. I only see 3 InnoDB threads being executed: the purge coordinator in Thread 63 (busy, but not waiting for anything), the page cleaner (idle) in Thread 42, and the crashing srv_monitor_task() in Thread 1. That looks like a false alarm; maybe a larger value of innodb_fatal_semaphore_wait_threshold could help avoid that. In any case, the stack traces are completely different from crash.log , because there is no os_event::wait to be seen.

martina342 added a comment - 2021-09-01 15:15

yes it's 10.5.12

martina342 added a comment - 2021-09-01 15:15 yes it's 10.5.12

Marko Mäkelä added a comment - 2021-09-01 15:25

martina342 Good, I would tend to believe that some fixes in 10.5.12 (such as ~~MDEV-25801~~) could have removed the original slowness that you reported.

I do not have any idea what could trigger a false alarm in the InnoDB watchdog. The watchdog has remained largely unchanged for several years, and the last time I remember fixing a false alarm was some 10 years ago. In 10.6, I removed the "sync array" and rewrote the watchdog to cover only the dict_sys.mutex or dict_sys.latch.

You might want to ensure that the InnoDB buffer pool and redo log are configured to be large enough for your workload.

Marko Mäkelä added a comment - 2021-09-01 15:25 martina342 Good, I would tend to believe that some fixes in 10.5.12 (such as MDEV-25801 ) could have removed the original slowness that you reported. I do not have any idea what could trigger a false alarm in the InnoDB watchdog. The watchdog has remained largely unchanged for several years, and the last time I remember fixing a false alarm was some 10 years ago. In 10.6, I removed the "sync array" and rewrote the watchdog to cover only the dict_sys.mutex or dict_sys.latch . You might want to ensure that the InnoDB buffer pool and redo log are configured to be large enough for your workload.

martina342 added a comment - 2021-09-01 15:38

i'm not sure if the crash2 was the first crash after the semaphore lock. Maybe it was the crash once the data was corrupted by the first crash and I tried to restart mariadb.

I will send you another crash with the correct settings if I have another crash that I'm sure is the one after lock

martina342 added a comment - 2021-09-01 15:38 i'm not sure if the crash2 was the first crash after the semaphore lock. Maybe it was the crash once the data was corrupted by the first crash and I tried to restart mariadb. I will send you another crash with the correct settings if I have another crash that I'm sure is the one after lock

martina342 made changes - 2021-09-04 19:03

Attachment

crash3.log [ 58849 ]

martina342 added a comment - 2021-09-04 19:03

Hello

it crashed again.

2021-09-04 20:39:02 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 1000 seconds. We intentionally crash the server because it appears to be hung.

210904 20:39:02 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=197

max_threads=402

thread_count=197

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x559dbb56749e]

/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x559dbafc5305]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f4c57031140]

2021-09-04 20:39:11 0 [Note] InnoDB: Truncated .//undo001

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f4c56b7ace1]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f4c56b64537]

/usr/sbin/mariadbd(+0x63724f)[0x559dbac8424f]

/usr/sbin/mariadbd(+0x62c07d)[0x559dbac7907d]

/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x559dbb4fd60b]

/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x559dbb4fe4c2]

2021-09-04 20:39:18 0 [Note] InnoDB: Truncating .//undo002

/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x559dbb4fd3cf]

/lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f4c56f18ed0]

2021-09-04 20:39:23 0 [Note] InnoDB: Truncated .//undo002

/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f4c57025ea7]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4c56c3cdef]

(gdb) print dict_sys.latch

$1 = {<ilist_node<void>> = {next = 0x559e6324d840, prev = 0x559e6324f8c0}, lock_word = {m = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}, waiters = {

    m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, sx_recursive = 0, writer_thread = 0, event = 0x559e6324de78, wait_ex_event = 0x559e6324df08,

  cfile_name = 0x559dbb7b6268 "./storage/innobase/dict/dict0dict.cc", last_x_file_name = 0x559dbb7abdb8 "./storage/innobase/srv/srv0srv.cc", cline = 1036, is_block_lock = 0, last_x_line = 1538, count_os_wait = 3872,

  pfs_psi = 0x0}

(gdb) print dict_sys.mutex

$2 = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x559e6324ddc8, m_policy = {

      m_filename = 0x559dbb7b6268 "./storage/innobase/dict/dict0dict.cc", m_line = 1027, m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_DICT_SYS}}, m_ptr = 0x0}

the backtrace is attached on crash3.log

martina342 added a comment - 2021-09-04 19:03 Hello it crashed again. 2021-09-04 20:39:02 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 1000 seconds. We intentionally crash the server because it appears to be hung. 210904 20:39:02 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=197 max_threads=402 thread_count=197 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x559dbb56749e] /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x559dbafc5305] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f4c57031140] 2021-09-04 20:39:11 0 [Note] InnoDB: Truncated .//undo001 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f4c56b7ace1] /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f4c56b64537] /usr/sbin/mariadbd(+0x63724f)[0x559dbac8424f] /usr/sbin/mariadbd(+0x62c07d)[0x559dbac7907d] /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x559dbb4fd60b] /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x559dbb4fe4c2] 2021-09-04 20:39:18 0 [Note] InnoDB: Truncating .//undo002 /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x559dbb4fd3cf] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f4c56f18ed0] 2021-09-04 20:39:23 0 [Note] InnoDB: Truncated .//undo002 /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f4c57025ea7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4c56c3cdef] (gdb) print dict_sys.latch $1 = {<ilist_node<void>> = {next = 0x559e6324d840, prev = 0x559e6324f8c0}, lock_word = {m = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}, waiters = { m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, sx_recursive = 0, writer_thread = 0, event = 0x559e6324de78, wait_ex_event = 0x559e6324df08, cfile_name = 0x559dbb7b6268 "./storage/innobase/dict/dict0dict.cc", last_x_file_name = 0x559dbb7abdb8 "./storage/innobase/srv/srv0srv.cc", cline = 1036, is_block_lock = 0, last_x_line = 1538, count_os_wait = 3872, pfs_psi = 0x0} (gdb) print dict_sys.mutex $2 = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x559e6324ddc8, m_policy = { m_filename = 0x559dbb7b6268 "./storage/innobase/dict/dict0dict.cc", m_line = 1027, m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_DICT_SYS}}, m_ptr = 0x0} the backtrace is attached on crash3.log

martina342 made changes - 2021-09-06 08:37

Attachment

semaphore without crash.txt [ 58852 ]

martina342 added a comment - 2021-09-06 08:37

hello,

today I got logs of semaphore wait, but which didn't resultet in a crash.
See attached semaphore without crash.txt

martina342 added a comment - 2021-09-06 08:37 hello, today I got logs of semaphore wait, but which didn't resultet in a crash. See attached semaphore without crash.txt

martina342 added a comment - 2021-09-06 09:22 - edited

update : it's waiting for semaphore right now again, with all requests frozen on the server.
They are stuck in "statistics" or "sending data", they are very easy requests.

I have a wordpress website and each visitor generate this request stuck on "statistics" :
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'

I have tens of this request on my process list.
This table have 3 853 rows for 20mb

edit: it unstucked itself after 15 minutes, all requests were processed

martina342 added a comment - 2021-09-06 09:22 - edited update : it's waiting for semaphore right now again, with all requests frozen on the server. They are stuck in "statistics" or "sending data", they are very easy requests. I have a wordpress website and each visitor generate this request stuck on "statistics" : SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes' I have tens of this request on my process list. This table have 3 853 rows for 20mb edit: it unstucked itself after 15 minutes, all requests were processed

Marko Mäkelä added a comment - 2021-09-06 15:42

semaphore without crash.txt does not include enough information for analysis. For crash3.log it does not look like dict_sys is really locked. I only see 3 threads whose stack frames contain the string storage/innobase. None of them are waiting for anything resource inside InnoDB. The monitor task is killing the server, and the 2 other threads are actively executing something.

I think that your system is misconfigured in some way, because such output is not normal. It could be useful to check with sudo perf record or http://poormansprofiler.org or similar tools what is going on. But, I cannot provide free support via this bug reporting system.

Marko Mäkelä added a comment - 2021-09-06 15:42 semaphore without crash.txt does not include enough information for analysis. For crash3.log it does not look like dict_sys is really locked. I only see 3 threads whose stack frames contain the string storage/innobase . None of them are waiting for anything resource inside InnoDB. The monitor task is killing the server, and the 2 other threads are actively executing something. I think that your system is misconfigured in some way, because such output is not normal. It could be useful to check with sudo perf record or http://poormansprofiler.org or similar tools what is going on. But, I cannot provide free support via this bug reporting system.

martina342 made changes - 2021-09-09 11:42

Attachment

stackfreeze [ 58896 ]

martina342 made changes - 2021-09-09 12:03

Attachment

stackfreeze2 [ 58898 ]

martina342 made changes - 2021-09-09 12:11

Attachment

stackfreeze3 [ 58899 ]

martina342 added a comment - 2021-09-09 12:12 - edited

i upgraded to mariadb 10.6, but it's still happening. It's still freezing, but don't crash anymore
the command "top" doesn't show a lot of cpu usage (it show some cpu usage for mariadb)
iotop show zero io for mariadb, and almost zero read / write or io % for other processes

the datadir of mariadb doesn't show any modification since than the time where it started freezing.
I followed this http://poormansprofiler.org/ and executed " gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mariadbd)"
here is the file generated stackfreeze

i don't see anything special on the syslog
mdstat doesn't show any problem with the raid
smartctl doesn't show any error for any of the 2 nvme drives

all requests are stuck. Like
"SHOW STATUS" is stuck on "Filling schema table"
"SHOW ENGINE INNODB STATUS" is stuck on "starting"
others are stuck on "update" or "statistics"

testing the disk doesn't show error :
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.30623 s, 466 MB/s

I have a undo002 file that is 35gb large and at 13h01 it started truncating it, and at 13h25 mariadb stopped doing anything.
My ib_logfile0 is 37gb large.

Here is my mariadb configuration :

Import all .cnf files from configuration directory
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mariadb.conf.d/

[mysql]

[mysqld]
datadir = /home/mysql
tmpdir = /home/mysqltmp/
default-time-zone = Europe/Paris
wait_timeout = 43200
max_allowed_packet = 64M
query_cache_size = 0
query_cache_type = 0
innodb_buffer_pool_size = 150G
innodb_log_file_size = 37G
innodb_buffer_pool_instances = 64
innodb_undo_log_truncate = ON
skip-name-resolve = 1
tmp_table_size = 256M
max_heap_table_size = 256M
max_connections = 400
net_read_timeout = 250600
net_write_timeout = 250600
join_buffer_size = 1MB
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
innodb-defragment = 1
innodb_online_alter_log_max_size= 50000000000
table_definition_cache = 1000
innodb_undo_tablespaces = 2
table_open_cache = 30000
log_error = /var/log/mysql/error.log
innodb_fatal_semaphore_wait_threshold = 2000

[mariadb]
core_file

I notice a file ddl_recovery.log was created at 13h21

i tried to kill some of the oldest request from processlist, but they stay on processlist with the "Killed" command

i modified the gdb command to add frame arguments :
gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "thread apply all bt" --batch -p $(pidof mariadbd)
here is the output
stackfreeze2

When i executed the command, it showed this "29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory." . I don't know if it's relevant.

in my opinion, there is something wrong in mariadb, not my server configuration. And if I have some mariadb configuration not appropriate for my workload, it shouldn't completely freeze it

here is a third stack generated, in case you would like to compare with the previous one and see if mariadb did anything in between
stackfreeze3

I have set innodb_fatal_semaphore_wait_threshold = 2000
but now, 45 minutes after it started freezeing (it's still freezeing), it didn't crashed or generated logs about semaphore wait

martina342 added a comment - 2021-09-09 12:12 - edited i upgraded to mariadb 10.6, but it's still happening. It's still freezing, but don't crash anymore the command "top" doesn't show a lot of cpu usage (it show some cpu usage for mariadb) iotop show zero io for mariadb, and almost zero read / write or io % for other processes the datadir of mariadb doesn't show any modification since than the time where it started freezing. I followed this http://poormansprofiler.org/ and executed " gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mariadbd)" here is the file generated stackfreeze i don't see anything special on the syslog mdstat doesn't show any problem with the raid smartctl doesn't show any error for any of the 2 nvme drives all requests are stuck. Like "SHOW STATUS" is stuck on "Filling schema table" "SHOW ENGINE INNODB STATUS" is stuck on "starting" others are stuck on "update" or "statistics" testing the disk doesn't show error : dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.30623 s, 466 MB/s I have a undo002 file that is 35gb large and at 13h01 it started truncating it, and at 13h25 mariadb stopped doing anything. My ib_logfile0 is 37gb large. Here is my mariadb configuration : Import all .cnf files from configuration directory !includedir /etc/mysql/conf.d/ !includedir /etc/mysql/mariadb.conf.d/ [mysql] [mysqld] datadir = /home/mysql tmpdir = /home/mysqltmp/ default-time-zone = Europe/Paris wait_timeout = 43200 max_allowed_packet = 64M query_cache_size = 0 query_cache_type = 0 innodb_buffer_pool_size = 150G innodb_log_file_size = 37G innodb_buffer_pool_instances = 64 innodb_undo_log_truncate = ON skip-name-resolve = 1 tmp_table_size = 256M max_heap_table_size = 256M max_connections = 400 net_read_timeout = 250600 net_write_timeout = 250600 join_buffer_size = 1MB sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION innodb-defragment = 1 innodb_online_alter_log_max_size= 50000000000 table_definition_cache = 1000 innodb_undo_tablespaces = 2 table_open_cache = 30000 log_error = /var/log/mysql/error.log innodb_fatal_semaphore_wait_threshold = 2000 [mariadb] core_file I notice a file ddl_recovery.log was created at 13h21 i tried to kill some of the oldest request from processlist, but they stay on processlist with the "Killed" command i modified the gdb command to add frame arguments : gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "thread apply all bt" --batch -p $(pidof mariadbd) here is the output stackfreeze2 When i executed the command, it showed this "29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory." . I don't know if it's relevant. in my opinion, there is something wrong in mariadb, not my server configuration. And if I have some mariadb configuration not appropriate for my workload, it shouldn't completely freeze it here is a third stack generated, in case you would like to compare with the previous one and see if mariadb did anything in between stackfreeze3 I have set innodb_fatal_semaphore_wait_threshold = 2000 but now, 45 minutes after it started freezeing (it's still freezeing), it didn't crashed or generated logs about semaphore wait

martina342 made changes - 2021-09-09 13:02

Attachment

showenginecensored.txt [ 58901 ]

martina342 added a comment - 2021-09-09 13:03 - edited

update : it partially unfroze for a moment and let through some commands at 14h51.
"SHOW ENGINE INNODB STATUS;" worked after waiting for more than an hour

here is the output (censored)
showenginecensored.txt

on the mariadb data folder, the file ib_logfile0 have been updated at 14h51, no other file have modification time changed since 13h23

the processlist still shows a lot of request stuck since the beginning of the freeze. I don't know why some request worked

edit: the mariadb log show some sort aborted, that was the requests on "Creating sort index" that i killed :
2021-09-09 14:51:41 63364 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 63364, query: SELECT ***
2021-09-09 14:51:41 58831 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58831, query: SELECT ***
2021-09-09 14:51:41 55718 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 55718, query: SELECT ***
2021-09-09 14:51:41 58784 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58784, query: SELECT ***
2021-09-09 14:51:41 45088 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 45088, query: SELECT ***
2021-09-09 14:51:41 58798 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58798, query: SELECT ***

not all requests that i killed earlier are gone, i still see some of them with the "killed" command

martina342 added a comment - 2021-09-09 13:03 - edited update : it partially unfroze for a moment and let through some commands at 14h51. "SHOW ENGINE INNODB STATUS;" worked after waiting for more than an hour here is the output (censored) showenginecensored.txt on the mariadb data folder, the file ib_logfile0 have been updated at 14h51, no other file have modification time changed since 13h23 the processlist still shows a lot of request stuck since the beginning of the freeze. I don't know why some request worked edit: the mariadb log show some sort aborted, that was the requests on "Creating sort index" that i killed : 2021-09-09 14:51:41 63364 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 63364, query: SELECT *** 2021-09-09 14:51:41 58831 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58831, query: SELECT *** 2021-09-09 14:51:41 55718 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 55718, query: SELECT *** 2021-09-09 14:51:41 58784 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58784, query: SELECT *** 2021-09-09 14:51:41 45088 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 45088, query: SELECT *** 2021-09-09 14:51:41 58798 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58798, query: SELECT *** not all requests that i killed earlier are gone, i still see some of them with the "killed" command

martina342 added a comment - 2021-09-13 14:37 - edited

i cleared completely the mariadb server (removed mysql data folder, recreate from zero) and I am importing from mysqldump's .sql files, and it still freezes :

Id      User    Host    db      Command Time    State   Info    Progress

12407   root    localhost       db  Query   535     Update  INSERT INTO `import` VALUES (4339807,1992787),(4339807,71207374),(4339808,27759    0.000

22303   root    localhost       sitewordpress   Query   371     creating table  CREATE TABLE `wbk_appointments` (\n  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,\n  `name` varchar(  0.000

22332   root    localhost       NULL    Killed  347     Filling schema table    SHOW STATUS     0.000

22493   root    localhost       NULL    Killed  187     Filling schema table    show global status like '%dirty%'       0.000

22568   root    localhost       bdd_principale  Sleep   114             NULL    0.000

22684   root    localhost       NULL    Query   0       starting        show processlist        0.000

gdb output : stillstuckmariadb.txt

it unstuck itself after 15 minutes. The server was doing nothing else, wasn't overloaded

martina342 added a comment - 2021-09-13 14:37 - edited i cleared completely the mariadb server (removed mysql data folder, recreate from zero) and I am importing from mysqldump's .sql files, and it still freezes : Id User Host db Command Time State Info Progress 12407 root localhost db Query 535 Update INSERT INTO `import` VALUES (4339807,1992787),(4339807,71207374),(4339808,27759 0.000 22303 root localhost sitewordpress Query 371 creating table CREATE TABLE `wbk_appointments` (\n `id` int(10) unsigned NOT NULL AUTO_INCREMENT,\n `name` varchar( 0.000 22332 root localhost NULL Killed 347 Filling schema table SHOW STATUS 0.000 22493 root localhost NULL Killed 187 Filling schema table show global status like '%dirty%' 0.000 22568 root localhost bdd_principale Sleep 114 NULL 0.000 22684 root localhost NULL Query 0 starting show processlist 0.000 gdb output : stillstuckmariadb.txt it unstuck itself after 15 minutes. The server was doing nothing else, wasn't overloaded

martina342 made changes - 2021-09-13 14:37

Attachment

stillstuckmariadb.txt [ 58945 ]

martina342 made changes - 2021-09-13 16:21

Attachment

show status.txt [ 58948 ]

martina342 made changes - 2021-09-13 16:21

Attachment

stillstuckmariadb3 [ 58949 ]

martina342 added a comment - 2021-09-13 16:21

freeze again, one simple insert stuck again. But this time, "SHOW ENGINE INNODB STATUS;" or "show status" works

Id User Host db Command Time State Info Progress
12407 root localhost bdd_principale Query 4466 Update INSERT INTO `**` VALUES (2510****RE 0.000
23056 root localhost bdd_principale Query 5802 Waiting for table metadata lock SELECT eff*****, 0.000
25326 root localhost NULL Sleep 50 NULL 0.000
29037 root localhost NULL Query 0 starting show processlist 0.000

the second query waiting for table metadata is because it's on the same table. The insert is stuck for more than 1 hour.

2021-09-13 18:16:57 0x7fe7d01f6700 INNODB MONITOR OUTPUT

=====================================

Per second averages calculated from the last 48 seconds

-----------------

BACKGROUND THREAD

-----------------

srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 17624 srv_idle

srv_master_thread log flush and writes: 17624

----------

SEMAPHORES

----------

------------

TRANSACTIONS

------------

Trx id counter 90242

Purge done for trx's n:o < 70633 undo n:o < 0 state: running

History list length 9692

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 90241, ACTIVE 4416 sec inserting

mysql tables in use 1, locked 1

1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 342

MariaDB thread id 12407, OS thread handle 140633604970240, query id 83758 localhost root Update

INSERT INTO `****` VALUES (2510******

--------

FILE I/O

--------

Pending flushes (fsync) log: 0; buffer pool: 0

206 OS file reads, 1454819 OS file writes, 89726 OS fsyncs

0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s

-------------------------------------

INSERT BUFFER AND ADAPTIVE HASH INDEX

-------------------------------------

Ibuf: size 1, free list len 0, seg size 2, 0 merges

merged operations:

 insert 0, delete mark 0, delete 0

discarded operations:

 insert 0, delete mark 0, delete 0

0.00 hash searches/s, 0.00 non-hash searches/s

---

LOG

---

Log sequence number 60340282849

Log flushed up to   60340282849

Pages flushed up to 28160947137

Last checkpoint at  28160947137

0 pending log flushes, 0 pending chkp writes

50192 log i/o's done, 0.00 log i/o's/second

----------------------

BUFFER POOL AND MEMORY

----------------------

Total large memory allocated 161094828032

Dictionary memory allocated 956706280

Buffer pool size   9716400

Free buffers       6346865

Database pages     3369535

Old database pages 1243812

Modified db pages  2030057

Percent of dirty pages(LRU & free pages): 20.893

Max dirty pages percent: 5.000

Pending reads 0

Pending writes: LRU 0, flush list 1025

Pages made young 33628, not young 77

0.00 youngs/s, 0.00 non-youngs/s

Pages read 187, created 3498534, written 1403598

0.00 reads/s, 0.00 creates/s, 0.00 writes/s

No buffer pool page gets since the last printout

Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s

LRU len: 3369535, unzip_LRU len: 0

I/O sum[0]:cur[0], unzip sum[0]:cur[0]

--------------

ROW OPERATIONS

--------------

0 read views open inside InnoDB

Process ID=0, Main thread ID=0, state: sleeping

Number of rows inserted 127004126, updated 0, deleted 0, read 0

0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

Number of system rows inserted 0, updated 0, deleted 0, read 0

0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

----------------------------

END OF INNODB MONITOR OUTPUT

output of show status :
show status.txt

gdb output :
stillstuckmariadb3

Marko, do you need more infos ?

martina342 added a comment - 2021-09-13 16:21 freeze again, one simple insert stuck again. But this time, "SHOW ENGINE INNODB STATUS;" or "show status" works Id User Host db Command Time State Info Progress 12407 root localhost bdd_principale Query 4466 Update INSERT INTO `** ` VALUES (2510 ****RE 0.000 23056 root localhost bdd_principale Query 5802 Waiting for table metadata lock SELECT eff*****, 0.000 25326 root localhost NULL Sleep 50 NULL 0.000 29037 root localhost NULL Query 0 starting show processlist 0.000 the second query waiting for table metadata is because it's on the same table. The insert is stuck for more than 1 hour. 2021-09-13 18:16:57 0x7fe7d01f6700 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 48 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 17624 srv_idle srv_master_thread log flush and writes: 17624 ---------- SEMAPHORES ---------- ------------ TRANSACTIONS ------------ Trx id counter 90242 Purge done for trx's n:o < 70633 undo n:o < 0 state: running History list length 9692 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 90241, ACTIVE 4416 sec inserting mysql tables in use 1, locked 1 1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 342 MariaDB thread id 12407, OS thread handle 140633604970240, query id 83758 localhost root Update INSERT INTO `****` VALUES (2510****** -------- FILE I/O -------- Pending flushes (fsync) log: 0; buffer pool: 0 206 OS file reads, 1454819 OS file writes, 89726 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 0, seg size 2, 0 merges merged operations: insert 0, delete mark 0, delete 0 discarded operations: insert 0, delete mark 0, delete 0 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 60340282849 Log flushed up to 60340282849 Pages flushed up to 28160947137 Last checkpoint at 28160947137 0 pending log flushes, 0 pending chkp writes 50192 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total large memory allocated 161094828032 Dictionary memory allocated 956706280 Buffer pool size 9716400 Free buffers 6346865 Database pages 3369535 Old database pages 1243812 Modified db pages 2030057 Percent of dirty pages(LRU & free pages): 20.893 Max dirty pages percent: 5.000 Pending reads 0 Pending writes: LRU 0, flush list 1025 Pages made young 33628, not young 77 0.00 youngs/s, 0.00 non-youngs/s Pages read 187, created 3498534, written 1403598 0.00 reads/s, 0.00 creates/s, 0.00 writes/s No buffer pool page gets since the last printout Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s LRU len: 3369535, unzip_LRU len: 0 I/O sum[0]:cur[0], unzip sum[0]:cur[0] -------------- ROW OPERATIONS -------------- 0 read views open inside InnoDB Process ID=0, Main thread ID=0, state: sleeping Number of rows inserted 127004126, updated 0, deleted 0, read 0 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s Number of system rows inserted 0, updated 0, deleted 0, read 0 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT output of show status : show status.txt gdb output : stillstuckmariadb3 Marko, do you need more infos ?

Marko Mäkelä added a comment - 2021-09-13 18:01

martina342, to which version did you upgrade? Unfortunately, stillstuckmariadb3 is showing ?? for all InnoDB functions. I do not think that a metadata lock wait counts as a server hang. You can try setting lock_wait_timeout to a smaller value if you expect frequent conflicts with DDL operations or statements like FLUSH TABLES WITH READ LOCK.

Marko Mäkelä added a comment - 2021-09-13 18:01 martina342 , to which version did you upgrade? Unfortunately, stillstuckmariadb3 is showing ?? for all InnoDB functions. I do not think that a metadata lock wait counts as a server hang. You can try setting lock_wait_timeout to a smaller value if you expect frequent conflicts with DDL operations or statements like FLUSH TABLES WITH READ LOCK .

martina342 added a comment - 2021-09-13 19:01 - edited

it is version 10.6.4-MariaDB-1:10.6.4+maria~bullseye mariadb.org binary distribution

did you saw that I got a freeze during 15 minutes, and any request like "SHOW status" was freezing with "Filling schema table"

"stillstuckmariadb3 is showing ?? for all InnoDB functions."
what do you recommand to make it show the function name instead of "??" ?

modifying the variable "lock_wait_timeout" would be only a temporary workaround, right ?

I don't know if that was about "Waiting for table metadata lock", which is not important here as it is probably the dump locking the table.

My problem is not that, but the INSERT which was stuck (i killed it after more than 2 hours), while other imports for the same table took much less time, and mariadb was not doing anyting

martina342 added a comment - 2021-09-13 19:01 - edited it is version 10.6.4-MariaDB-1:10.6.4+maria~bullseye mariadb.org binary distribution did you saw that I got a freeze during 15 minutes, and any request like "SHOW status" was freezing with "Filling schema table" "stillstuckmariadb3 is showing ?? for all InnoDB functions." what do you recommand to make it show the function name instead of "??" ? modifying the variable "lock_wait_timeout" would be only a temporary workaround, right ? I don't know if that was about "Waiting for table metadata lock", which is not important here as it is probably the dump locking the table. My problem is not that, but the INSERT which was stuck (i killed it after more than 2 hours), while other imports for the same table took much less time, and mariadb was not doing anyting

martina342 added a comment - 2021-09-14 13:14

i started over again by removing the mariadb data folder and reimporting again the mysqldump backups.
This time it froze earlier, with another table

Id      User    Host    db      Command Time    State   Info    Progress

3793    root    localhost       bdd_principale  Query   15033   Update  INSERT INTO `***` VALUES ('*****    0.000

16167   root    localhost       *****  Query   37960   Sending data    SELECT /*!40001 SQL_NO_CACHE */ `id`,*****    0.000

44841   root    localhost       NULL    Sleep   9064            NULL    0.000

53986   root    localhost       NULL    Query   0       starting        show processlist        0.000

the query "SELECT /*!40001 SQL_NO_CACHE" that kicked in is the automatic backup, but it's stuck too.

The table at which is stuck at, the frm file is about 100MB, but it got stuck at 68MB

I ran mariadb using perf record. It generated a 8GB file perf.data.
Here is the result (the beginning) for "perf report".

Samples: 217M of event 'cycles', Event count (approx.): 156165128856310

Overhead  Command   Shared Object        Symbol

  65.66%  mariadbd  mariadbd             [.] 0x0000000000de75e3                                                                                                                                                                                             ◆

  30.53%  mariadbd  mariadbd             [.] 0x0000000000de75e7                                                                                                                                                                                             ▒

   0.91%  mariadbd  mariadbd             [.] 0x0000000000de75ee                                                                                                                                                                                             ▒

   0.28%  mariadbd  mariadbd             [.] MYSQLparse                                                                                                                                                                                                     ▒

   0.10%  mariadbd  mariadbd             [.] 0x0000000000de75de                                                                                                                                                                                             ▒

   0.08%  mariadbd  mariadbd             [.] Lex_input_stream::get_text                                                                                                                                                                                     ▒

   0.07%  mariadbd  mariadbd             [.] 0x0000000000d1c195                                                                                                                                                                                             ▒

   0.06%  mariadbd  mariadbd             [.] Lex_input_stream::lex_one_token                                                                                                                                                                                ▒

   0.04%  mariadbd  mariadbd             [.] my_numchars_mb                                                                                                                                                                                                 ▒

   0.03%  mariadbd  [kernel.kallsyms]    [k] asm_sysvec_apic_timer_interrupt                                                                                                                                                                                ▒

   0.03%  mariadbd  libc-2.31.so         [.] __memmove_avx_unaligned_erms                                                                                                                                                                                   ▒

   0.03%  mariadbd  mariadbd             [.] 0x0000000000ddb73e                                                                                                                                                                                             ▒

   0.03%  mariadbd  [kernel.kallsyms]    [k] ktime_get_update_offsets_now                                                                                                                                                                                   ▒

   0.02%  mariadbd  mariadbd             [.] fill_record                                                                                                                                                                                                    ▒

   0.02%  mariadbd  mariadbd             [.] 0x0000000000f24360                                                                                                                                                                                             ▒

   0.02%  mariadbd  mariadbd             [.] 0x0000000000ddb707                                                                                                                                                                                             ▒

   0.02%  mariadbd  mariadbd             [.] setup_fields                                                                                                                                                                                                   ▒

   0.02%  mariadbd  [kernel.kallsyms]    [k] psi_group_change                                                                                                                                                                                               ▒

   0.02%  mariadbd  mariadbd             [.] 0x0000000000de75eb                                                                                                                                                                                             ▒

   0.02%  mariadbd  mariadbd             [.] mysql_insert                                                                                                                                                                                                   ▒

   0.02%  mariadbd  [kernel.kallsyms]    [k] native_write_msr                                                                                                                                                                                               ▒

   0.02%  mariadbd  libc-2.31.so         [.] _int_malloc                                                                                                                                                                                                    ▒

   0.02%  mariadbd  [kernel.kallsyms]    [k] irq_enter_rcu                                                                                                                                                                                                  ▒

   0.02%  mariadbd  mariadbd             [.] alloc_root                                                                                                                                                                                                     ▒

   0.02%  mariadbd  libc-2.31.so         [.] __memcmp_avx2_movbe                                                                                                                                                                                            ▒

   0.02%  mariadbd  [kernel.kallsyms]    [k] acct_account_cputime                                                                                                                                                                                           ▒

   0.02%  mariadbd  mariadbd             [.] Query_arena::free_items                                                                                                                                                                                        ▒

   0.02%  mariadbd  mariadbd             [.] 0x0000000000cf8c70                                                                                                                                                                                             ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000d1c57b                                                                                                                                                                                             ▒

   0.01%  mariadbd  mariadbd             [.] get_charset                                                                                                                                                                                                    ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000ddb784                                                                                                                                                                                             ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] hrtimer_interrupt                                                                                                                                                                                              ▒

   0.01%  mariadbd  mariadbd             [.] Lex_input_stream::unescape                                                                                                                                                                                     ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000f29083                                                                                                                                                                                             ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] cpufreq_this_cpu_can_update                                                                                                                                                                                    ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] pvclock_gtod_notify                                                                                                                                                                                            ▒

   0.01%  mariadbd  libc-2.31.so         [.] _int_free                                                                                                                                                                                                      ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave                                                                                                                                                                                         ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000f29067                                                                                                                                                                                             ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000f2904b                                                                                                                                                                                             ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] __update_load_avg_se                                                                                                                                                                                           ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000ddb7d4                                                                                                                                                                                             ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] perf_event_task_tick                                                                                                                                                                                           ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000f24363                                                                                                                                                                                             ▒

   0.01%  mariadbd  mariadbd             [.] THD::make_string_literal                                                                                                                                                                                       ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000d1c573                                                                                                                                                                                             ▒

   0.01%  mariadbd  mariadbd             [.] 0x0000000000d1b13a                                                                                                                                                                                             ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] native_read_msr                                                                                                                                                                                                ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] account_user_time                                                                                                                                                                                              ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] update_curr                                                                                                                                                                                                    ▒

   0.01%  mariadbd  mariadbd             [.] my_convert                                                                                                                                                                                                     ▒

   0.01%  mariadbd  [kernel.kallsyms]    [k] __sysvec_apic_timer_interrupt

i suppose I should compile a debug version of mariadb to show real names instead of "??" or "0x..." ?

martina342 added a comment - 2021-09-14 13:14 i started over again by removing the mariadb data folder and reimporting again the mysqldump backups. This time it froze earlier, with another table Id User Host db Command Time State Info Progress 3793 root localhost bdd_principale Query 15033 Update INSERT INTO `***` VALUES ('***** 0.000 16167 root localhost ***** Query 37960 Sending data SELECT /*!40001 SQL_NO_CACHE */ `id`,***** 0.000 44841 root localhost NULL Sleep 9064 NULL 0.000 53986 root localhost NULL Query 0 starting show processlist 0.000 the query "SELECT /*!40001 SQL_NO_CACHE" that kicked in is the automatic backup, but it's stuck too. The table at which is stuck at, the frm file is about 100MB, but it got stuck at 68MB I ran mariadb using perf record. It generated a 8GB file perf.data. Here is the result (the beginning) for "perf report". Samples: 217M of event 'cycles', Event count (approx.): 156165128856310 Overhead Command Shared Object Symbol 65.66% mariadbd mariadbd [.] 0x0000000000de75e3 ◆ 30.53% mariadbd mariadbd [.] 0x0000000000de75e7 ▒ 0.91% mariadbd mariadbd [.] 0x0000000000de75ee ▒ 0.28% mariadbd mariadbd [.] MYSQLparse ▒ 0.10% mariadbd mariadbd [.] 0x0000000000de75de ▒ 0.08% mariadbd mariadbd [.] Lex_input_stream::get_text ▒ 0.07% mariadbd mariadbd [.] 0x0000000000d1c195 ▒ 0.06% mariadbd mariadbd [.] Lex_input_stream::lex_one_token ▒ 0.04% mariadbd mariadbd [.] my_numchars_mb ▒ 0.03% mariadbd [kernel.kallsyms] [k] asm_sysvec_apic_timer_interrupt ▒ 0.03% mariadbd libc-2.31.so [.] __memmove_avx_unaligned_erms ▒ 0.03% mariadbd mariadbd [.] 0x0000000000ddb73e ▒ 0.03% mariadbd [kernel.kallsyms] [k] ktime_get_update_offsets_now ▒ 0.02% mariadbd mariadbd [.] fill_record ▒ 0.02% mariadbd mariadbd [.] 0x0000000000f24360 ▒ 0.02% mariadbd mariadbd [.] 0x0000000000ddb707 ▒ 0.02% mariadbd mariadbd [.] setup_fields ▒ 0.02% mariadbd [kernel.kallsyms] [k] psi_group_change ▒ 0.02% mariadbd mariadbd [.] 0x0000000000de75eb ▒ 0.02% mariadbd mariadbd [.] mysql_insert ▒ 0.02% mariadbd [kernel.kallsyms] [k] native_write_msr ▒ 0.02% mariadbd libc-2.31.so [.] _int_malloc ▒ 0.02% mariadbd [kernel.kallsyms] [k] irq_enter_rcu ▒ 0.02% mariadbd mariadbd [.] alloc_root ▒ 0.02% mariadbd libc-2.31.so [.] __memcmp_avx2_movbe ▒ 0.02% mariadbd [kernel.kallsyms] [k] acct_account_cputime ▒ 0.02% mariadbd mariadbd [.] Query_arena::free_items ▒ 0.02% mariadbd mariadbd [.] 0x0000000000cf8c70 ▒ 0.01% mariadbd mariadbd [.] 0x0000000000d1c57b ▒ 0.01% mariadbd mariadbd [.] get_charset ▒ 0.01% mariadbd mariadbd [.] 0x0000000000ddb784 ▒ 0.01% mariadbd [kernel.kallsyms] [k] hrtimer_interrupt ▒ 0.01% mariadbd mariadbd [.] Lex_input_stream::unescape ▒ 0.01% mariadbd mariadbd [.] 0x0000000000f29083 ▒ 0.01% mariadbd [kernel.kallsyms] [k] cpufreq_this_cpu_can_update ▒ 0.01% mariadbd [kernel.kallsyms] [k] pvclock_gtod_notify ▒ 0.01% mariadbd libc-2.31.so [.] _int_free ▒ 0.01% mariadbd [kernel.kallsyms] [k] _raw_spin_lock_irqsave ▒ 0.01% mariadbd mariadbd [.] 0x0000000000f29067 ▒ 0.01% mariadbd mariadbd [.] 0x0000000000f2904b ▒ 0.01% mariadbd [kernel.kallsyms] [k] __update_load_avg_se ▒ 0.01% mariadbd mariadbd [.] 0x0000000000ddb7d4 ▒ 0.01% mariadbd [kernel.kallsyms] [k] perf_event_task_tick ▒ 0.01% mariadbd mariadbd [.] 0x0000000000f24363 ▒ 0.01% mariadbd mariadbd [.] THD::make_string_literal ▒ 0.01% mariadbd mariadbd [.] 0x0000000000d1c573 ▒ 0.01% mariadbd mariadbd [.] 0x0000000000d1b13a ▒ 0.01% mariadbd [kernel.kallsyms] [k] native_read_msr ▒ 0.01% mariadbd [kernel.kallsyms] [k] account_user_time ▒ 0.01% mariadbd [kernel.kallsyms] [k] update_curr ▒ 0.01% mariadbd mariadbd [.] my_convert ▒ 0.01% mariadbd [kernel.kallsyms] [k] __sysvec_apic_timer_interrupt i suppose I should compile a debug version of mariadb to show real names instead of "??" or "0x..." ?

martina342 added a comment - 2021-09-14 19:26

i compiled a debug version to test, but it's importing like 20 times slower. It will take a week to import

martina342 added a comment - 2021-09-14 19:26 i compiled a debug version to test, but it's importing like 20 times slower. It will take a week to import

Marko Mäkelä added a comment - 2021-09-15 06:38

martina342, I just realized that stillstuckmariadb3 is missing the function parameters as well. This would suggest that separately available "dbgsym" packages were not installed.

I do not know why the non-stripped executable would be missing InnoDB symbols. That is not a new problem, and it seems to be affecting every system except Microsoft Windows. I suspect that it has something to do with the way how the intermediate target libinnobase.a is being built and linked with the final executables, but I have not investigated it. I think that the numeric addresses from perf could be due to the same problem.

I think that you will have to install additional packages, with names something like the following:

mariadb-server-10.6-dbgsym_10.6.4+maria~bullseye_amd64.deb
mariadb-server-core-10.6-dbgsym_10.6.4+maria~bullseye_amd64.deb

After installing those, we should see fully resolved stack traces, with function parameter values as well.

Note: The debugging information for GDB and similar tools independent of additional runtime checks that would be enabled in special debug builds. I do not think that it would be useful to track this problem using a debug build, because the timing characteristics could become completely different.

Marko Mäkelä added a comment - 2021-09-15 06:38 martina342 , I just realized that stillstuckmariadb3 is missing the function parameters as well. This would suggest that separately available "dbgsym" packages were not installed. I do not know why the non-stripped executable would be missing InnoDB symbols. That is not a new problem, and it seems to be affecting every system except Microsoft Windows. I suspect that it has something to do with the way how the intermediate target libinnobase.a is being built and linked with the final executables, but I have not investigated it. I think that the numeric addresses from perf could be due to the same problem. I think that you will have to install additional packages, with names something like the following: mariadb-server-10.6-dbgsym_10.6.4+maria~bullseye_amd64.deb mariadb-server-core-10.6-dbgsym_10.6.4+maria~bullseye_amd64.deb After installing those, we should see fully resolved stack traces, with function parameter values as well. Note: The debugging information for GDB and similar tools independent of additional runtime checks that would be enabled in special debug builds. I do not think that it would be useful to track this problem using a debug build, because the timing characteristics could become completely different.

Daniel Black added a comment - 2021-09-15 07:52 - edited

martina342 thanks for helping out marko. I've been getting ready to release the below functionality as a service, its still really early days so any problems let me know.

This uses a build of the 10.6, exactly what appears in a release with the debug symbols installed in the container. This is from about 4 days ago currently (ref: https://quay.io/repository/mariadb-foundation/mariadb-debug?tab=tags).

I usually use podman, but docker works as well. Same arguments on command line:

Steps:

can you copy your cnf file, without datadir, tmpdir or log_error to a file in a directory my.cnf.d

Run the latest 10.6 version with the same configuration on your current data (or a copy if you want).

run image with debug info

docker run  --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined  -v ./my.cnf.d:/etc/mysql/conf.d:z -v /home/mysql:/var/lib/mysql -p 127.0.0.1:3306:3306 -d --name mariadb    quay.io/mariadb-foundation/mariadb-debug:10.6-amd64

Note: seccomp is docker only.

install gdb into container
docker exec -ti mariadb sh -c "apt-get update && apt-get install -y gdb"

When you are at a point and want to take a snapshot:

take snapshot

docker exec -ti mariadb gosu mysql  gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" --eval-command="print dict_sys.latch" --eval-command="print dict_sys.mutex" -p 1  | tee  mariadb_bt_all_threads.txt

Test this now before applying workload to check all permissions have been relaxed enough to examine workload. Look at logs of the container with:

container logs
docker logs mariadb

Apply workload

To include some metadata like which exact build it was generated from in the same file:

capture metadata
docker exec -ti mariadb cat /manifest.txt \| tee -a mariadb_bt_all_threads.txt

In my test with Marko mariadb_bt_all_threads.txt was enough of a backtrace.

Daniel Black added a comment - 2021-09-15 07:52 - edited martina342 thanks for helping out marko . I've been getting ready to release the below functionality as a service, its still really early days so any problems let me know. This uses a build of the 10.6, exactly what appears in a release with the debug symbols installed in the container. This is from about 4 days ago currently (ref: https://quay.io/repository/mariadb-foundation/mariadb-debug?tab=tags ). I usually use podman, but docker works as well. Same arguments on command line: Steps: can you copy your cnf file, without datadir, tmpdir or log_error to a file in a directory my.cnf.d Run the latest 10.6 version with the same configuration on your current data (or a copy if you want). run image with debug info docker run --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined -v ./my.cnf.d:/etc/mysql/conf.d:z -v /home/mysql:/var/lib/mysql -p 127.0.0.1:3306:3306 -d --name mariadb quay.io/mariadb-foundation/mariadb-debug:10.6-amd64 Note: seccomp is docker only. install gdb into container docker exec -ti mariadb sh -c "apt-get update && apt-get install -y gdb" When you are at a point and want to take a snapshot: take snapshot docker exec -ti mariadb gosu mysql gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" --eval-command="print dict_sys.latch" --eval-command="print dict_sys.mutex" -p 1 | tee mariadb_bt_all_threads.txt Test this now before applying workload to check all permissions have been relaxed enough to examine workload. Look at logs of the container with: container logs docker logs mariadb Apply workload To include some metadata like which exact build it was generated from in the same file: capture metadata docker exec -ti mariadb cat /manifest.txt | tee -a mariadb_bt_all_threads.txt In my test with Marko mariadb_bt_all_threads.txt was enough of a backtrace.

martina342 made changes - 2021-09-15 10:58

Attachment

hopeenoughdata.txt [ 58996 ]

martina342 added a comment - 2021-09-15 11:06 - edited

thanks both of you.

Before your message, Daniel, I already had started over again, but with the package mariadb-server-10.6-dbgsym installed , so I let it run until it froze, instead of stopping it.

So it looks like it's blocked again

Id      User    Host    db      Command Time    State   Info    Progress

2717    root    localhost       bdd_principale  Query   3625    Update  INSERT INTO `i***********998    0.000

13685   root    localhost       bdd_principale  Sleep   30              NULL    0.000

13717   root    localhost       NULL    Query   0       starting        show processlist        0.000

I ran this :
gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "thread apply all bt full" --batch -p $(pidof mariadbd)

Here is the output :
hopeenoughdata.txt

And this command :
gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "print dict_sys.latch" --batch -p $(pidof mariadbd)
[New LWP 2201079]
[New LWP 2201080]
[New LWP 2201092]
[New LWP 2201098]
[New LWP 2201100]
[New LWP 2201104]
[New LWP 2201136]
[New LWP 2202646]
[New LWP 2209615]
[New LWP 2212111]
[New LWP 2228121]
[New LWP 2229924]
[New LWP 2253944]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f2c857913ff in _GI__poll (fds=0x560ff3d83bf8, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
No symbol "dict_sys" in current context.
[Inferior 1 (process 2201077) detached]

And here is the perf record

  40.05%  mariadbd  mariadbd             [.] 0x0000000000de75e3

  15.19%  mariadbd  mariadbd             [.] 0x0000000000de75e7

   8.74%  mariadbd  mariadbd             [.] MYSQLparse

   1.80%  mariadbd  mariadbd             [.] Lex_input_stream::lex_one_token

   1.71%  mariadbd  mariadbd             [.] 0x0000000000d1c195

   1.46%  mariadbd  mariadbd             [.] Lex_input_stream::get_text

   0.55%  mariadbd  mariadbd             [.] setup_fields

   0.54%  mariadbd  mariadbd             [.] 0x0000000000ddb7d4

   0.53%  mariadbd  mariadbd             [.] my_numchars_mb

   0.48%  mariadbd  mariadbd             [.] fill_record

   0.48%  mariadbd  mariadbd             [.] Query_arena::free_items

   0.45%  mariadbd  libc-2.31.so         [.] __memmove_avx_unaligned_erms

   0.44%  mariadbd  libc-2.31.so         [.] __memcmp_avx2_movbe

   0.42%  mariadbd  mariadbd             [.] alloc_root

   0.38%  mariadbd  mariadbd             [.] 0x0000000000de75ee

   0.37%  mariadbd  mariadbd             [.] 0x0000000000cf8c70

   0.37%  mariadbd  libc-2.31.so         [.] _int_malloc

   0.35%  mariadbd  mariadbd             [.] THD::make_string_literal

   0.34%  mariadbd  mariadbd             [.] 0x0000000000ddb707

   0.33%  mariadbd  mariadbd             [.] 0x0000000000ddb73e

   0.33%  mariadbd  mariadbd             [.] Item::Item

   0.31%  mariadbd  libc-2.31.so         [.] _int_free

   0.30%  mariadbd  mariadbd             [.] 0x0000000000ddb844

   0.28%  mariadbd  mariadbd             [.] get_charset

   0.27%  mariadbd  mariadbd             [.] Lex_input_stream::add_digest_token

   0.27%  mariadbd  mariadbd             [.] 0x0000000000ddbb81

   0.26%  mariadbd  mariadbd             [.] 0x0000000000d1c57b

   0.22%  mariadbd  mariadbd             [.] 0x0000000000ddb784

   0.22%  mariadbd  mariadbd             [.] Lex_input_stream::lex_token

   0.21%  mariadbd  mariadbd             [.] 0x0000000000f24360

   0.21%  mariadbd  mariadbd             [.] 0x0000000000d1c573

   0.21%  mariadbd  mariadbd             [.] mysql_insert

   0.20%  mariadbd  mariadbd             [.] 0x0000000000d1b1ce

   0.18%  mariadbd  mariadbd             [.] 0x0000000000cd9192

   0.18%  mariadbd  mariadbd             [.] 0x0000000000d1b13a

   0.18%  mariadbd  mariadbd             [.] 0x0000000000f2904b

   0.18%  mariadbd  mariadbd             [.] 0x0000000000f29083

   0.18%  mariadbd  mariadbd             [.] 0x0000000000d1b1ef

   0.17%  mariadbd  mariadbd             [.] 0x0000000000f29067

   0.17%  mariadbd  mariadbd             [.] Item::set_name

   0.16%  mariadbd  mariadbd             [.] Item::cleanup

   0.16%  mariadbd  libc-2.31.so         [.] malloc

   0.16%  mariadbd  mariadbd             [.] Field_blob::store

   0.15%  mariadbd  mariadbd             [.] 0x0000000000f24363

   0.14%  mariadbd  mariadbd             [.] 0x0000000000ddb8ff

   0.14%  mariadbd  mariadbd             [.] 0x0000000000d1c259

   0.14%  mariadbd  libc-2.31.so         [.] __memset_avx2_unaligned_erms

   0.14%  mariadbd  mariadbd             [.] Item_string::save_in_field

   0.13%  mariadbd  mariadbd             [.] 0x0000000000ddb8f9

   0.13%  mariadbd  mariadbd             [.] my_convert

   0.13%  mariadbd  mariadbd             [.] 0x0000000000dc55cb

   0.12%  mariadbd  libc-2.31.so         [.] __memmove_avx_unaligned

I will follow the steps for the docker test

edit: I have an error with the first command :

docker run  --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined  -v ./my.cnf.d:/etc/mysql/conf.d:z -v /home/mysql:/var/lib/mysql -p 127.0.0.1:3306:3306 -d --name mariadb    quay.io/mariadb-foundation/mariadb-debug:10.6-amd64

Unable to find image 'quay.io/mariadb-foundation/mariadb-debug:10.6-amd64' locally

10.6-amd64: Pulling from mariadb-foundation/mariadb-debug

5297cdb712ed: Pull complete

834628376a35: Pull complete

1a4e5185f03f: Pull complete

Digest: sha256:d86994bf9f22c7bd097445a0d232835ac15e5847acab190f7602501437582fca

Status: Downloaded newer image for quay.io/mariadb-foundation/mariadb-debug:10.6-amd64

docker: Error response from daemon: create ./my.cnf.d: "./my.cnf.d" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.

See 'docker run --help'.

the folder exists :

ls ./my.cnf.d

my.cnf

martina342 added a comment - 2021-09-15 11:06 - edited thanks both of you. Before your message, Daniel, I already had started over again, but with the package mariadb-server-10.6-dbgsym installed , so I let it run until it froze, instead of stopping it. So it looks like it's blocked again Id User Host db Command Time State Info Progress 2717 root localhost bdd_principale Query 3625 Update INSERT INTO `i***********998 0.000 13685 root localhost bdd_principale Sleep 30 NULL 0.000 13717 root localhost NULL Query 0 starting show processlist 0.000 I ran this : gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "thread apply all bt full" --batch -p $(pidof mariadbd) Here is the output : hopeenoughdata.txt And this command : gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "print dict_sys.latch" --batch -p $(pidof mariadbd) [New LWP 2201079] [New LWP 2201080] [New LWP 2201092] [New LWP 2201098] [New LWP 2201100] [New LWP 2201104] [New LWP 2201136] [New LWP 2202646] [New LWP 2209615] [New LWP 2212111] [New LWP 2228121] [New LWP 2229924] [New LWP 2253944] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 0x00007f2c857913ff in _ GI __poll (fds=0x560ff3d83bf8, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory. No symbol "dict_sys" in current context. [Inferior 1 (process 2201077) detached] And here is the perf record 40.05% mariadbd mariadbd [.] 0x0000000000de75e3 15.19% mariadbd mariadbd [.] 0x0000000000de75e7 8.74% mariadbd mariadbd [.] MYSQLparse 1.80% mariadbd mariadbd [.] Lex_input_stream::lex_one_token 1.71% mariadbd mariadbd [.] 0x0000000000d1c195 1.46% mariadbd mariadbd [.] Lex_input_stream::get_text 0.55% mariadbd mariadbd [.] setup_fields 0.54% mariadbd mariadbd [.] 0x0000000000ddb7d4 0.53% mariadbd mariadbd [.] my_numchars_mb 0.48% mariadbd mariadbd [.] fill_record 0.48% mariadbd mariadbd [.] Query_arena::free_items 0.45% mariadbd libc-2.31.so [.] __memmove_avx_unaligned_erms 0.44% mariadbd libc-2.31.so [.] __memcmp_avx2_movbe 0.42% mariadbd mariadbd [.] alloc_root 0.38% mariadbd mariadbd [.] 0x0000000000de75ee 0.37% mariadbd mariadbd [.] 0x0000000000cf8c70 0.37% mariadbd libc-2.31.so [.] _int_malloc 0.35% mariadbd mariadbd [.] THD::make_string_literal 0.34% mariadbd mariadbd [.] 0x0000000000ddb707 0.33% mariadbd mariadbd [.] 0x0000000000ddb73e 0.33% mariadbd mariadbd [.] Item::Item 0.31% mariadbd libc-2.31.so [.] _int_free 0.30% mariadbd mariadbd [.] 0x0000000000ddb844 0.28% mariadbd mariadbd [.] get_charset 0.27% mariadbd mariadbd [.] Lex_input_stream::add_digest_token 0.27% mariadbd mariadbd [.] 0x0000000000ddbb81 0.26% mariadbd mariadbd [.] 0x0000000000d1c57b 0.22% mariadbd mariadbd [.] 0x0000000000ddb784 0.22% mariadbd mariadbd [.] Lex_input_stream::lex_token 0.21% mariadbd mariadbd [.] 0x0000000000f24360 0.21% mariadbd mariadbd [.] 0x0000000000d1c573 0.21% mariadbd mariadbd [.] mysql_insert 0.20% mariadbd mariadbd [.] 0x0000000000d1b1ce 0.18% mariadbd mariadbd [.] 0x0000000000cd9192 0.18% mariadbd mariadbd [.] 0x0000000000d1b13a 0.18% mariadbd mariadbd [.] 0x0000000000f2904b 0.18% mariadbd mariadbd [.] 0x0000000000f29083 0.18% mariadbd mariadbd [.] 0x0000000000d1b1ef 0.17% mariadbd mariadbd [.] 0x0000000000f29067 0.17% mariadbd mariadbd [.] Item::set_name 0.16% mariadbd mariadbd [.] Item::cleanup 0.16% mariadbd libc-2.31.so [.] malloc 0.16% mariadbd mariadbd [.] Field_blob::store 0.15% mariadbd mariadbd [.] 0x0000000000f24363 0.14% mariadbd mariadbd [.] 0x0000000000ddb8ff 0.14% mariadbd mariadbd [.] 0x0000000000d1c259 0.14% mariadbd libc-2.31.so [.] __memset_avx2_unaligned_erms 0.14% mariadbd mariadbd [.] Item_string::save_in_field 0.13% mariadbd mariadbd [.] 0x0000000000ddb8f9 0.13% mariadbd mariadbd [.] my_convert 0.13% mariadbd mariadbd [.] 0x0000000000dc55cb 0.12% mariadbd libc-2.31.so [.] __memmove_avx_unaligned I will follow the steps for the docker test edit: I have an error with the first command : docker run --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined -v ./my.cnf.d:/etc/mysql/conf.d:z -v /home/mysql:/var/lib/mysql -p 127.0 . 0.1 : 3306 : 3306 -d --name mariadb quay.io/mariadb-foundation/mariadb-debug: 10.6 -amd64 Unable to find image 'quay.io/mariadb-foundation/mariadb-debug:10.6-amd64' locally 10.6 -amd64: Pulling from mariadb-foundation/mariadb-debug 5297cdb712ed: Pull complete 834628376a35: Pull complete 1a4e5185f03f: Pull complete Digest: sha256:d86994bf9f22c7bd097445a0d232835ac15e5847acab190f7602501437582fca Status: Downloaded newer image for quay.io/mariadb-foundation/mariadb-debug: 10.6 -amd64 docker: Error response from daemon: create ./my.cnf.d: "./my.cnf.d" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path. See 'docker run --help' . the folder exists : ls ./my.cnf.d my.cnf

martina342 made changes - 2021-09-15 19:49

Attachment

mariadb_bt_all_threads.txt [ 59005 ]

martina342 made changes - 2021-09-15 19:50

Attachment

mariadb_bt_all_threads.txt [ 59005 ]

martina342 made changes - 2021-09-15 19:57

Attachment

mariadb_bt_all_threads.txt [ 59006 ]

Marko Mäkelä added a comment - 2021-09-15 19:59

martina342, it looks like the interesting command thread apply all backtrace was not being executed at all. It is not promising either that the symbol dict_sys.latch was not recognized by the debugger. Hopefully it was just because the current stack frame was in a shared library and not the main mariadbd executable. So, that GDB invocation did not get us any further yet.

I do not know if perf report is supposed to be helped by the dbgsym packages.

If this involves Docker, I understood from earlier discussions with danblack that some extra effort may be needed to make GDB work. He is our Docker expert.

I am sorry that this is so tricky, but for hangs or some forms of performance trouble there really is no substitute for having stack traces of all threads. For some hangs, we would want to see several samples to distinguish genuine deadlocks from mere slowness.

Marko Mäkelä added a comment - 2021-09-15 19:59 martina342 , it looks like the interesting command thread apply all backtrace was not being executed at all. It is not promising either that the symbol dict_sys.latch was not recognized by the debugger. Hopefully it was just because the current stack frame was in a shared library and not the main mariadbd executable. So, that GDB invocation did not get us any further yet. I do not know if perf report is supposed to be helped by the dbgsym packages. If this involves Docker, I understood from earlier discussions with danblack that some extra effort may be needed to make GDB work. He is our Docker expert. I am sorry that this is so tricky, but for hangs or some forms of performance trouble there really is no substitute for having stack traces of all threads. For some hangs, we would want to see several samples to distinguish genuine deadlocks from mere slowness.

Marko Mäkelä added a comment - 2021-09-15 20:11

Sorry, I overlooked the attachment hopeenoughdata.txt again. It seems that we are still missing function call parameter names and values and local variables for most stack frames. I only see them for some library functions, such as the pthread functions.

There is no occurrence of innobase in the output, but there are ?? function names, some of which must belong to InnoDB threads. One is Thread 9 apparently executing an INSERT statement. One of Thread 4 or Thread 3 should be the buf_flush_page_cleaner(), but I cannot know which one, and I have no idea what the other one could be. Maybe it’d be an Aria checkpointer thread, because there are no Aria symbols either. Thread 13 could be running some InnoDB code. We would really need to know what those threads are doing.

One thing that you might try is to set innodb_stats_persistent=OFF or innodb_stats_auto_recalc=OFF. If this hang is related to the automatic recalculation of persistent statistics (~~MDEV-15020~~), it should be fixed in the upcoming 10.6.5 release. But, I’d rather expect Thread 13 to run the InnoDB purge coordinator task than the statistics updates.

Marko Mäkelä added a comment - 2021-09-15 20:11 Sorry, I overlooked the attachment hopeenoughdata.txt again. It seems that we are still missing function call parameter names and values and local variables for most stack frames. I only see them for some library functions, such as the pthread functions. There is no occurrence of innobase in the output, but there are ?? function names, some of which must belong to InnoDB threads. One is Thread 9 apparently executing an INSERT statement. One of Thread 4 or Thread 3 should be the buf_flush_page_cleaner() , but I cannot know which one, and I have no idea what the other one could be. Maybe it’d be an Aria checkpointer thread, because there are no Aria symbols either. Thread 13 could be running some InnoDB code. We would really need to know what those threads are doing. One thing that you might try is to set innodb_stats_persistent=OFF or innodb_stats_auto_recalc=OFF . If this hang is related to the automatic recalculation of persistent statistics ( MDEV-15020 ), it should be fixed in the upcoming 10.6.5 release. But, I’d rather expect Thread 13 to run the InnoDB purge coordinator task than the statistics updates.

martina342 added a comment - 2021-09-15 20:40 - edited

i replaced ./my.cnf.d with the fullpath to fix the error I mentioned.
I also have problems to access mariadb from the server to docker as only 127.0.0.1 is allowed, so I had to adjust it.

Here is the log file using docker, without any workload mariadb_bt_all_threads.txt

without workload :

docker logs mariadb

2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.

2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'

2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.

Warning: skipping '!includedir /etc/mysql/conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 2

Warning: skipping '!includedir /etc/mysql/mariadb.conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 3

2021-09-15 19:56:15 0 [Note] mariadbd (server 10.6.5-MariaDB-1:10.6.5+maria~focal as 10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4) starting as process 1 ...

2021-09-15 19:56:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2021-09-15 19:56:15 0 [Note] InnoDB: Number of pools: 1

2021-09-15 19:56:15 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2021-09-15 19:56:15 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)

2021-09-15 19:56:15 0 [Note] InnoDB: Using Linux native AIO

2021-09-15 19:56:15 0 [Note] InnoDB: Initializing buffer pool, total size = 161061273600, chunk size = 134217728

2021-09-15 19:56:15 0 [Note] InnoDB: Completed initialization of buffer pool

2021-09-15 19:56:15 0 [Note] InnoDB: Opened 2 undo tablespaces

2021-09-15 19:56:15 0 [Note] InnoDB: Opened 2 undo tablespaces

2021-09-15 19:56:15 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active.

2021-09-15 19:56:15 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2021-09-15 19:56:15 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2021-09-15 19:56:15 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2021-09-15 19:56:15 0 [Note] InnoDB: 10.6.5 started; log sequence number 42906; transaction id 14

2021-09-15 19:56:15 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool

2021-09-15 19:56:15 0 [Note] Plugin 'FEEDBACK' is disabled.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 19:56:15 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.

2021-09-15 19:56:15 0 [Note] Server socket created on IP: '0.0.0.0'.

2021-09-15 19:56:15 0 [Note] Server socket created on IP: '::'.

2021-09-15 19:56:15 0 [Note] InnoDB: Buffer pool(s) load completed at 210915 19:56:15

2021-09-15 19:56:15 0 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode.

2021-09-15 19:56:15 0 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode.

2021-09-15 19:56:15 0 [Note] mariadbd: ready for connections.

Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

i'm currently importing data on the container to generate logs with workload, but if the bug is indeed the one fixed on 10.6.5, it should not reproduce anymore

martina342 added a comment - 2021-09-15 20:40 - edited i replaced ./my.cnf.d with the fullpath to fix the error I mentioned. I also have problems to access mariadb from the server to docker as only 127.0.0.1 is allowed, so I had to adjust it. Here is the log file using docker, without any workload mariadb_bt_all_threads.txt without workload : docker logs mariadb 2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started. 2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' 2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started. Warning: skipping '!includedir /etc/mysql/conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 2 Warning: skipping '!includedir /etc/mysql/mariadb.conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 3 2021-09-15 19:56:15 0 [Note] mariadbd (server 10.6.5-MariaDB-1:10.6.5+maria~focal as 10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4) starting as process 1 ... 2021-09-15 19:56:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-09-15 19:56:15 0 [Note] InnoDB: Number of pools: 1 2021-09-15 19:56:15 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-09-15 19:56:15 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts) 2021-09-15 19:56:15 0 [Note] InnoDB: Using Linux native AIO 2021-09-15 19:56:15 0 [Note] InnoDB: Initializing buffer pool, total size = 161061273600, chunk size = 134217728 2021-09-15 19:56:15 0 [Note] InnoDB: Completed initialization of buffer pool 2021-09-15 19:56:15 0 [Note] InnoDB: Opened 2 undo tablespaces 2021-09-15 19:56:15 0 [Note] InnoDB: Opened 2 undo tablespaces 2021-09-15 19:56:15 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active. 2021-09-15 19:56:15 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-09-15 19:56:15 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-09-15 19:56:15 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2021-09-15 19:56:15 0 [Note] InnoDB: 10.6.5 started; log sequence number 42906; transaction id 14 2021-09-15 19:56:15 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2021-09-15 19:56:15 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 19:56:15 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work. 2021-09-15 19:56:15 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-09-15 19:56:15 0 [Note] Server socket created on IP: '::'. 2021-09-15 19:56:15 0 [Note] InnoDB: Buffer pool(s) load completed at 210915 19:56:15 2021-09-15 19:56:15 0 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode. 2021-09-15 19:56:15 0 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode. 2021-09-15 19:56:15 0 [Note] mariadbd: ready for connections. Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution i'm currently importing data on the container to generate logs with workload, but if the bug is indeed the one fixed on 10.6.5, it should not reproduce anymore

martina342 made changes - 2021-09-16 08:31

Attachment

mariadb_bt_all_threads_workload_slowed_down.txt [ 59013 ]

martina342 added a comment - 2021-09-16 08:39 - edited

Hello,
the current import from mysqldump with the docker version of mariadb (10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4 mariadb.org binary distribution) seems to work without complete freeze.
Almost all insert take less than a second. I have a QPS of 5.8, and i'm doing a show processlist every second.
However, I noticed at some point an insert that took more than 15 minutes. I took a stacktrace during it (i censored some informations). The table that was inserting, the ibd file is 332mb "only", not all inserts on this table took this long, even on this table most of the time it took less than a second :
mariadb_bt_all_threads_workload_slowed_down.txt

Here are the mariadb logs :

2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.

2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'

2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.

Warning: skipping '!includedir /etc/mysql/conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 2

Warning: skipping '!includedir /etc/mysql/mariadb.conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 3

2021-09-15 20:09:06 0 [Note] mariadbd (server 10.6.5-MariaDB-1:10.6.5+maria~focal as 10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4) starting as process 1 ...

2021-09-15 20:09:06 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2021-09-15 20:09:06 0 [Note] InnoDB: Number of pools: 1

2021-09-15 20:09:06 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2021-09-15 20:09:06 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)

2021-09-15 20:09:06 0 [Note] InnoDB: Using Linux native AIO

2021-09-15 20:09:06 0 [Note] InnoDB: Initializing buffer pool, total size = 161061273600, chunk size = 134217728

2021-09-15 20:09:06 0 [Note] InnoDB: Completed initialization of buffer pool

2021-09-15 20:09:06 0 [Note] InnoDB: Opened 2 undo tablespaces

2021-09-15 20:09:06 0 [Note] InnoDB: Opened 2 undo tablespaces

2021-09-15 20:09:06 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active.

2021-09-15 20:09:06 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2021-09-15 20:09:06 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2021-09-15 20:09:06 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2021-09-15 20:09:06 0 [Note] InnoDB: 10.6.5 started; log sequence number 42918; transaction id 14

2021-09-15 20:09:06 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool

2021-09-15 20:09:06 0 [Note] Plugin 'FEEDBACK' is disabled.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.

2021-09-15 20:09:06 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.

2021-09-15 20:09:06 0 [Note] InnoDB: Buffer pool(s) load completed at 210915 20:09:06

2021-09-15 20:09:06 0 [Note] Server socket created on IP: '0.0.0.0'.

2021-09-15 20:09:06 0 [Note] Server socket created on IP: '::'.

2021-09-15 20:09:06 0 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode.

2021-09-15 20:09:06 0 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode.

2021-09-15 20:09:06 0 [Note] mariadbd: ready for connections.

Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

2021-09-15 20:09:39 3 [Warning] Aborted connection 3 to db: 'unconnected' user: 'unauthenticated' host: '172.17.0.1' (This connection closed normally without authentication)

2021-09-15 20:19:07 21 [Warning] Aborted connection 21 to db: 'unconnected' user: 'unauthenticated' host: '172.17.0.1' (This connection closed normally without authentication)

2021-09-15 20:22:39 24 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode.

2021-09-15 20:22:39 24 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode.

2021-09-15 20:23:33 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:23:53 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:23:54 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:24:02 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:24:38 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:24:38 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:24:39 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:24:39 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:25:16 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:25:16 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:25:17 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:25:51 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:26:30 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:26:30 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:26:31 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:26:31 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:27:01 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:31:02 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:31:02 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:31:02 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:31:27 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:34:46 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:34:48 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:35:30 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:35:31 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:35:31 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:36:00 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:36:01 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:36:01 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:36:09 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:42:19 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:42:19 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:42:20 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:42:20 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:42:51 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:42:51 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:42:51 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:42:51 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:43:25 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:45:03 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:45:04 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:48:17 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:48:43 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:48:43 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:48:43 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:48:57 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:49:25 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:49:25 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:49:25 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:49:25 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:49:53 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:49:54 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:49:54 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:49:55 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:50:23 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:52:35 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:52:36 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:53:00 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:55:56 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:55:56 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:56:32 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:56:33 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:56:33 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:56:33 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:57:07 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:57:08 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:57:08 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:57:08 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:58:48 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 20:58:48 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 20:58:49 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 20:58:49 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 20:59:18 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:00:16 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:00:17 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:00:17 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:00:46 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:00:46 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:00:47 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:00:47 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:01:17 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:01:17 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:01:18 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:01:18 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:01:46 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:01:46 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:01:47 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:01:47 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:02:15 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:08:44 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:08:45 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:08:59 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:08:59 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:12:09 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:12:36 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:28:04 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:28:05 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:30:25 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:30:25 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:34:08 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:34:34 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:34:34 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:34:35 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:34:35 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:35:07 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:35:07 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:35:08 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:35:08 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:35:38 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:35:38 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:35:39 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:35:39 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:36:08 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:36:08 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:36:09 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:36:09 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:36:38 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:36:40 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:36:40 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:36:40 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:37:10 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:37:11 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:37:11 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:37:12 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:37:42 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:37:43 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:37:43 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:37:44 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:38:16 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:38:16 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:38:16 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:38:17 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:38:46 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:38:46 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:38:46 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:38:47 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:39:21 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:39:22 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:39:22 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:39:22 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:39:59 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:39:59 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:40:00 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:40:00 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:40:37 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:40:37 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:40:38 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:40:38 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:41:14 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:41:14 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:41:15 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:41:15 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:41:51 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:41:52 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:41:52 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:41:52 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:42:26 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:42:27 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:42:27 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:42:27 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:43:03 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:43:03 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:43:04 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:43:04 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:43:40 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:43:40 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:43:41 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:43:59 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:44:34 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:44:34 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 21:44:35 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:44:35 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:45:13 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 21:45:13 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 21:45:13 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 21:45:14 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 23:00:57 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 23:01:01 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 23:01:02 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 23:01:02 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 23:01:41 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 23:35:25 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 23:35:26 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 23:37:30 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 23:38:05 0 [Note] InnoDB: Truncating .//undo001

2021-09-15 23:45:39 0 [Note] InnoDB: Truncated .//undo001

2021-09-15 23:45:40 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 23:45:40 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 23:46:19 0 [Note] InnoDB: Truncating .//undo002

2021-09-15 23:51:57 0 [Note] InnoDB: Truncated .//undo002

2021-09-15 23:51:58 0 [Note] InnoDB: Truncating .//undo001

2021-09-16  0:02:12 0 [Note] InnoDB: Truncated .//undo001

Notice how it often truncate undo logs, but sometime it takes longer and most of the time it's instaneous :
2021-09-15 21:08:59 0 [Note] InnoDB: Truncating .//undo002
2021-09-15 21:12:09 0 [Note] InnoDB: Truncated .//undo002
2021-09-15 21:12:36 0 [Note] InnoDB: Truncating .//undo002
2021-09-15 21:28:04 0 [Note] InnoDB: Truncated .//undo002

This matches with the moment i took the stacktrace.

So if everything continue to do the same behaviour, it's only randomly stuck for some minutes, instead of blocking completely.

So I think there is still something to check. I guess I need to compile mariadb until the next version is released ? Do you have an ETA for it ?

martina342 added a comment - 2021-09-16 08:39 - edited Hello, the current import from mysqldump with the docker version of mariadb (10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4 mariadb.org binary distribution) seems to work without complete freeze. Almost all insert take less than a second. I have a QPS of 5.8, and i'm doing a show processlist every second. However, I noticed at some point an insert that took more than 15 minutes. I took a stacktrace during it (i censored some informations). The table that was inserting, the ibd file is 332mb "only", not all inserts on this table took this long, even on this table most of the time it took less than a second : mariadb_bt_all_threads_workload_slowed_down.txt Here are the mariadb logs : 2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started. 2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' 2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started. Warning: skipping '!includedir /etc/mysql/conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 2 Warning: skipping '!includedir /etc/mysql/mariadb.conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 3 2021-09-15 20:09:06 0 [Note] mariadbd (server 10.6.5-MariaDB-1:10.6.5+maria~focal as 10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4) starting as process 1 ... 2021-09-15 20:09:06 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2021-09-15 20:09:06 0 [Note] InnoDB: Number of pools: 1 2021-09-15 20:09:06 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021-09-15 20:09:06 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts) 2021-09-15 20:09:06 0 [Note] InnoDB: Using Linux native AIO 2021-09-15 20:09:06 0 [Note] InnoDB: Initializing buffer pool, total size = 161061273600, chunk size = 134217728 2021-09-15 20:09:06 0 [Note] InnoDB: Completed initialization of buffer pool 2021-09-15 20:09:06 0 [Note] InnoDB: Opened 2 undo tablespaces 2021-09-15 20:09:06 0 [Note] InnoDB: Opened 2 undo tablespaces 2021-09-15 20:09:06 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active. 2021-09-15 20:09:06 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021-09-15 20:09:06 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021-09-15 20:09:06 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2021-09-15 20:09:06 0 [Note] InnoDB: 10.6.5 started; log sequence number 42918; transaction id 14 2021-09-15 20:09:06 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2021-09-15 20:09:06 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files. 2021-09-15 20:09:06 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work. 2021-09-15 20:09:06 0 [Note] InnoDB: Buffer pool(s) load completed at 210915 20:09:06 2021-09-15 20:09:06 0 [Note] Server socket created on IP: '0.0.0.0'. 2021-09-15 20:09:06 0 [Note] Server socket created on IP: '::'. 2021-09-15 20:09:06 0 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode. 2021-09-15 20:09:06 0 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode. 2021-09-15 20:09:06 0 [Note] mariadbd: ready for connections. Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution 2021-09-15 20:09:39 3 [Warning] Aborted connection 3 to db: 'unconnected' user: 'unauthenticated' host: '172.17.0.1' (This connection closed normally without authentication) 2021-09-15 20:19:07 21 [Warning] Aborted connection 21 to db: 'unconnected' user: 'unauthenticated' host: '172.17.0.1' (This connection closed normally without authentication) 2021-09-15 20:22:39 24 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode. 2021-09-15 20:22:39 24 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode. 2021-09-15 20:23:33 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:23:53 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:23:54 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:24:02 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:24:38 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:24:38 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:24:39 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:24:39 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:25:16 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:25:16 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:25:17 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:25:51 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:26:30 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:26:30 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:26:31 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:26:31 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:27:01 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:31:02 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:31:02 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:31:02 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:31:27 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:34:46 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:34:48 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:35:30 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:35:31 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:35:31 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:36:00 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:36:01 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:36:01 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:36:09 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:42:19 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:42:19 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:42:20 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:42:20 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:42:51 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:42:51 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:42:51 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:42:51 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:43:25 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:45:03 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:45:04 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:48:17 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:48:43 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:48:43 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:48:43 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:48:57 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:49:25 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:49:25 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:49:25 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:49:25 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:49:53 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:49:54 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:49:54 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:49:55 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:50:23 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:52:35 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:52:36 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:53:00 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:55:56 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:55:56 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:56:32 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:56:33 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:56:33 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:56:33 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:57:07 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:57:08 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:57:08 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:57:08 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:58:48 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 20:58:48 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 20:58:49 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 20:58:49 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 20:59:18 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:00:16 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:00:17 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:00:17 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:00:46 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:00:46 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:00:47 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:00:47 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:01:17 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:01:17 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:01:18 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:01:18 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:01:46 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:01:46 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:01:47 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:01:47 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:02:15 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:08:44 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:08:45 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:08:59 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:08:59 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:12:09 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:12:36 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:28:04 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:28:05 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:30:25 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:30:25 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:34:08 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:34:34 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:34:34 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:34:35 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:34:35 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:35:07 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:35:07 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:35:08 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:35:08 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:35:38 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:35:38 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:35:39 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:35:39 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:36:08 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:36:08 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:36:09 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:36:09 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:36:38 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:36:40 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:36:40 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:36:40 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:37:10 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:37:11 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:37:11 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:37:12 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:37:42 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:37:43 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:37:43 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:37:44 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:38:16 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:38:16 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:38:16 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:38:17 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:38:46 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:38:46 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:38:46 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:38:47 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:39:21 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:39:22 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:39:22 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:39:22 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:39:59 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:39:59 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:40:00 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:40:00 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:40:37 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:40:37 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:40:38 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:40:38 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:41:14 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:41:14 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:41:15 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:41:15 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:41:51 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:41:52 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:41:52 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:41:52 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:42:26 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:42:27 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:42:27 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:42:27 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:43:03 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:43:03 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:43:04 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:43:04 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:43:40 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:43:40 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:43:41 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:43:59 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:44:34 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:44:34 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 21:44:35 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:44:35 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:45:13 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:45:13 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:45:13 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 21:45:14 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 23:00:57 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 23:01:01 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 23:01:02 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 23:01:02 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 23:01:41 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 23:35:25 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 23:35:26 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 23:37:30 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 23:38:05 0 [Note] InnoDB: Truncating .//undo001 2021-09-15 23:45:39 0 [Note] InnoDB: Truncated .//undo001 2021-09-15 23:45:40 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 23:45:40 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 23:46:19 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 23:51:57 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 23:51:58 0 [Note] InnoDB: Truncating .//undo001 2021-09-16 0:02:12 0 [Note] InnoDB: Truncated .//undo001 Notice how it often truncate undo logs, but sometime it takes longer and most of the time it's instaneous : 2021-09-15 21:08:59 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:12:09 0 [Note] InnoDB: Truncated .//undo002 2021-09-15 21:12:36 0 [Note] InnoDB: Truncating .//undo002 2021-09-15 21:28:04 0 [Note] InnoDB: Truncated .//undo002 This matches with the moment i took the stacktrace. So if everything continue to do the same behaviour, it's only randomly stuck for some minutes, instead of blocking completely. So I think there is still something to check. I guess I need to compile mariadb until the next version is released ? Do you have an ETA for it ?

Marko Mäkelä added a comment - 2021-09-16 08:49

In mariadb_bt_all_threads_workload_slowed_down.txt I see that undo tablespace truncation is being executed. (Please be aware of ~~MDEV-26450~~.) There previously was a 10.5 performance regression about that (~~MDEV-25801~~), but this is the fixed code.

I also see that the INSERT is waiting for buffer pool LRU eviction. Based on this single snapshot during the workload, I would guess that the data load is being slowed down by ‘eviction flushing’ (writing out least recently used pages so that they can be evicted, to make room for new pages). If it is not possible to make the buffer pool or the redo log bigger, there is nothing that we can do about this. But, let’s wait for the stack trace for the actual hang, to see if there is some other problem. In this stack trace dump, everything still looks normal to me.

How big are the buffer pool and the redo log, and how big would the final table be? Also, what type of storage are you using?

Marko Mäkelä added a comment - 2021-09-16 08:49 In mariadb_bt_all_threads_workload_slowed_down.txt I see that undo tablespace truncation is being executed. (Please be aware of MDEV-26450 .) There previously was a 10.5 performance regression about that ( MDEV-25801 ), but this is the fixed code. I also see that the INSERT is waiting for buffer pool LRU eviction. Based on this single snapshot during the workload, I would guess that the data load is being slowed down by ‘eviction flushing’ (writing out least recently used pages so that they can be evicted, to make room for new pages). If it is not possible to make the buffer pool or the redo log bigger, there is nothing that we can do about this. But, let’s wait for the stack trace for the actual hang, to see if there is some other problem. In this stack trace dump, everything still looks normal to me. How big are the buffer pool and the redo log, and how big would the final table be? Also, what type of storage are you using?

Marko Mäkelä added a comment - 2021-09-16 08:58

martina342, I checked the new attachment and composed my message before you posted the update.

It looks like the problem is related to undo tablespace truncation. I think that we must test its performance as part of fixing ~~MDEV-26450~~. Until that bug has been fully analyzed and fixed, I would not recommend enabling undo tablespace truncation. It is possible that this correctness bug affects all MariaDB releases since 10.2.

I do not have an estimate when we could fix that, and in my opinion it is not feasible to tweak the performance before the code has been confirmed to be correct. It is covered by a simple kill+restart test in our regression test suite, but that test never fails.

I try to treat any corruption bugs with high priority. This one might miss the next round of point releases. We aim to have releases about 4 times per year.

Marko Mäkelä added a comment - 2021-09-16 08:58 martina342 , I checked the new attachment and composed my message before you posted the update. It looks like the problem is related to undo tablespace truncation. I think that we must test its performance as part of fixing MDEV-26450 . Until that bug has been fully analyzed and fixed, I would not recommend enabling undo tablespace truncation. It is possible that this correctness bug affects all MariaDB releases since 10.2. I do not have an estimate when we could fix that, and in my opinion it is not feasible to tweak the performance before the code has been confirmed to be correct. It is covered by a simple kill+restart test in our regression test suite, but that test never fails. I try to treat any corruption bugs with high priority. This one might miss the next round of point releases. We aim to have releases about 4 times per year.

Marko Mäkelä made changes - 2021-09-16 08:58

Link

This issue is blocked by ~~MDEV-26450~~ [ ~~MDEV-26450~~ ]

martina342 added a comment - 2021-09-16 09:05

this are the settings :

wait_timeout 					= 43200

max_allowed_packet				= 64M

query_cache_size 				= 0

query_cache_type 				= 0

innodb_buffer_pool_size 		= 150G

innodb_log_file_size 			= 37G

innodb_buffer_pool_instances	= 64

innodb_undo_log_truncate 		= ON

skip-name-resolve 				= 1

tmp_table_size 					= 256M

max_heap_table_size 			= 256M

max_connections			 		= 400

net_read_timeout 				= 250600

net_write_timeout 				= 250600

join_buffer_size 				= 1MB

sql_mode						= ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

innodb-defragment				= 1

innodb_online_alter_log_max_size= 50000000000

table_definition_cache			= 1000

innodb_undo_tablespaces			= 2

table_open_cache				= 30000

innodb_fatal_semaphore_wait_threshold = 2000

[mariadb]

core_file

I can't make the innodb_buffer_pool_size larger (it's already 150G), i have innodb_log_file_size at 37G, which was suggested by mysqltuner.

The table that was inserting during the dump is 317Mb large only.

I'm using only innodb tables.

https://jira.mariadb.org/browse/MDEV-26450 : this is what happened to me on version 10.5 when the semaphore wait was killing the server and forced me to restore backups. But it also happened on 10.6 during normal shutdown. So i will disable undo truncate until it's fixed.

martina342 added a comment - 2021-09-16 09:05 this are the settings : wait_timeout = 43200 max_allowed_packet = 64M query_cache_size = 0 query_cache_type = 0 innodb_buffer_pool_size = 150G innodb_log_file_size = 37G innodb_buffer_pool_instances = 64 innodb_undo_log_truncate = ON skip-name-resolve = 1 tmp_table_size = 256M max_heap_table_size = 256M max_connections = 400 net_read_timeout = 250600 net_write_timeout = 250600 join_buffer_size = 1MB sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION innodb-defragment = 1 innodb_online_alter_log_max_size= 50000000000 table_definition_cache = 1000 innodb_undo_tablespaces = 2 table_open_cache = 30000 innodb_fatal_semaphore_wait_threshold = 2000 [mariadb] core_file I can't make the innodb_buffer_pool_size larger (it's already 150G), i have innodb_log_file_size at 37G, which was suggested by mysqltuner. The table that was inserting during the dump is 317Mb large only. I'm using only innodb tables. https://jira.mariadb.org/browse/MDEV-26450 : this is what happened to me on version 10.5 when the semaphore wait was killing the server and forced me to restore backups. But it also happened on 10.6 during normal shutdown. So i will disable undo truncate until it's fixed.

Daniel Black added a comment - 2021-09-16 09:39

The slowness in the container may be as a result of what docker its storage, overlayfs, as a guess. Its good enough for a few package updates but for the datadir it should be in its own volume. "docker volume create mariadbdata", and use that volume "-v mariadbdata:/var/lib/mysql" when the container is started. This should be a direct binding of a directory without the complexities of userspace filesystems.

release schedule: https://jira.mariadb.org/projects/MDEV?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page

Daniel Black added a comment - 2021-09-16 09:39 The slowness in the container may be as a result of what docker its storage, overlayfs, as a guess. Its good enough for a few package updates but for the datadir it should be in its own volume. "docker volume create mariadbdata", and use that volume "-v mariadbdata:/var/lib/mysql" when the container is started. This should be a direct binding of a directory without the complexities of userspace filesystems. release schedule: https://jira.mariadb.org/projects/MDEV?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page

Marko Mäkelä added a comment - 2021-09-16 12:52

martina342, I see. If both the buffer pool and the data were smaller (so that there wasn’t too much LRU page eviction going on), then I might suggest using a larger redo log file size, so that log checkpoints would be executed less often.

I see that you are not configuring innodb_io_capacity. The default value could be way to small for your storage. (Which type is it? SATA HDD? SSD? NVMe? What is the read and write speed?)

Possibly there still is some room of improvement for our page flushing (~~MDEV-26055~~).

In any case, please let me know if the performance is more acceptable when undo tablespace truncation is disabled. That will be helpful when we test the fix of ~~MDEV-26450~~.

Marko Mäkelä added a comment - 2021-09-16 12:52 martina342 , I see. If both the buffer pool and the data were smaller (so that there wasn’t too much LRU page eviction going on), then I might suggest using a larger redo log file size, so that log checkpoints would be executed less often. I see that you are not configuring innodb_io_capacity . The default value could be way to small for your storage. (Which type is it? SATA HDD? SSD? NVMe? What is the read and write speed?) Possibly there still is some room of improvement for our page flushing ( MDEV-26055 ). In any case, please let me know if the performance is more acceptable when undo tablespace truncation is disabled. That will be helpful when we test the fix of MDEV-26450 .

Marko Mäkelä made changes - 2021-09-16 13:00

Assignee

Marko Mäkelä [ marko ]

martina342 added a comment - 2021-09-16 13:14

i'm using a raid 1 nvme (soft) which is apparently rated at this :
Random Read (4kb) : 430 Kiops
Random Write (4kb) : 40 Kiops
Seq. Read : 3000 Mb/s
Seq. Write : 1900 Mb/s

i can't test right now for real world benchmark as i'm still importing backup.

thanks for the hint of innodb_io_capacity, apparently i was bottlenecking my server due to that variable.

Globally the performance is acceptable, except when it seem to freeze like it happened, which seems to happen rarely, so hard to monitor

martina342 added a comment - 2021-09-16 13:14 i'm using a raid 1 nvme (soft) which is apparently rated at this : Random Read (4kb) : 430 Kiops Random Write (4kb) : 40 Kiops Seq. Read : 3000 Mb/s Seq. Write : 1900 Mb/s i can't test right now for real world benchmark as i'm still importing backup. thanks for the hint of innodb_io_capacity, apparently i was bottlenecking my server due to that variable. Globally the performance is acceptable, except when it seem to freeze like it happened, which seems to happen rarely, so hard to monitor

Marko Mäkelä added a comment - 2021-09-16 14:07

On 10.6, you could try to adapt your data load for ~~MDEV-515~~, to avoid unnecessary row-level undo logging when inserting into empty tables. (And insert into each table only from one transaction.)

SET unique_checks=0, foreign_key_checks=0;

This is admittedly a hack, to be compatible with mysqldump files. And you’d better be sure that there are no violations of those checks.

Marko Mäkelä added a comment - 2021-09-16 14:07 On 10.6, you could try to adapt your data load for MDEV-515 , to avoid unnecessary row-level undo logging when inserting into empty tables. (And insert into each table only from one transaction.) SET unique_checks=0, foreign_key_checks=0; This is admittedly a hack, to be compatible with mysqldump files. And you’d better be sure that there are no violations of those checks.

martina342 added a comment - 2021-09-17 08:55

here is real world benchmark using fio :

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64

fio-3.25

Starting 1 process

test: Laying out IO file (1 file / 4096MiB)

Jobs: 1 (f=1): [m(1)][100.0%][r=276MiB/s,w=91.7MiB/s][r=70.6k,w=23.5k IOPS][eta 00m:00s]

test: (groupid=0, jobs=1): err= 0: pid=3064140: Fri Sep 17 10:39:57 2021

  read: IOPS=74.9k, BW=293MiB/s (307MB/s)(3070MiB/10487msec)

   bw (  KiB/s): min=278040, max=326000, per=100.00%, avg=305779.20, stdev=15118.69, samples=20

   iops        : min=69510, max=81500, avg=76444.80, stdev=3779.67, samples=20

  write: IOPS=25.0k, BW=97.8MiB/s (103MB/s)(1026MiB/10487msec); 0 zone resets

   bw (  KiB/s): min=91712, max=108600, per=100.00%, avg=102215.60, stdev=5362.53, samples=20

   iops        : min=22928, max=27148, avg=25553.90, stdev=1340.58, samples=20

  cpu          : usr=11.85%, sys=55.60%, ctx=309103, majf=0, minf=134

  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%

     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0

     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):

   READ: bw=293MiB/s (307MB/s), 293MiB/s-293MiB/s (307MB/s-307MB/s), io=3070MiB (3219MB), run=10487-10487msec

  WRITE: bw=97.8MiB/s (103MB/s), 97.8MiB/s-97.8MiB/s (103MB/s-103MB/s), io=1026MiB (1076MB), run=10487-10487msec

Disk stats (read/write):

    md2: ios=779713/260682, merge=0/0, ticks=366080/197908, in_queue=563988, util=99.13%, aggrios=392960/262799, aggrmerge=0/11, aggrticks=185182/181367, aggrin_queue=366550, aggrutil=98.77%

  nvme1n1: ios=388833/262799, merge=0/11, ticks=187345/181869, in_queue=369215, util=98.69%

  nvme0n1: ios=397087/262799, merge=0/11, ticks=183019/180865, in_queue=363885, util=98.77%

so i guess i could set innodb_io_capacity to at least 15000 ?

Before disabling unto truncation, I've saw another request that took 1 hour instead of being instantaneous.

I've disabled unto truncate, but could't stay to monitor if some other requests were taking too long, but it finished, so I suppose it made it works better.

martina342 added a comment - 2021-09-17 08:55 here is real world benchmark using fio : fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64 fio-3.25 Starting 1 process test: Laying out IO file (1 file / 4096MiB) Jobs: 1 (f=1): [m(1)][100.0%][r=276MiB/s,w=91.7MiB/s][r=70.6k,w=23.5k IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=3064140: Fri Sep 17 10:39:57 2021 read: IOPS=74.9k, BW=293MiB/s (307MB/s)(3070MiB/10487msec) bw ( KiB/s): min=278040, max=326000, per=100.00%, avg=305779.20, stdev=15118.69, samples=20 iops : min=69510, max=81500, avg=76444.80, stdev=3779.67, samples=20 write: IOPS=25.0k, BW=97.8MiB/s (103MB/s)(1026MiB/10487msec); 0 zone resets bw ( KiB/s): min=91712, max=108600, per=100.00%, avg=102215.60, stdev=5362.53, samples=20 iops : min=22928, max=27148, avg=25553.90, stdev=1340.58, samples=20 cpu : usr=11.85%, sys=55.60%, ctx=309103, majf=0, minf=134 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: bw=293MiB/s (307MB/s), 293MiB/s-293MiB/s (307MB/s-307MB/s), io=3070MiB (3219MB), run=10487-10487msec WRITE: bw=97.8MiB/s (103MB/s), 97.8MiB/s-97.8MiB/s (103MB/s-103MB/s), io=1026MiB (1076MB), run=10487-10487msec Disk stats (read/write): md2: ios=779713/260682, merge=0/0, ticks=366080/197908, in_queue=563988, util=99.13%, aggrios=392960/262799, aggrmerge=0/11, aggrticks=185182/181367, aggrin_queue=366550, aggrutil=98.77% nvme1n1: ios=388833/262799, merge=0/11, ticks=187345/181869, in_queue=369215, util=98.69% nvme0n1: ios=397087/262799, merge=0/11, ticks=183019/180865, in_queue=363885, util=98.77% so i guess i could set innodb_io_capacity to at least 15000 ? Before disabling unto truncation, I've saw another request that took 1 hour instead of being instantaneous. I've disabled unto truncate, but could't stay to monitor if some other requests were taking too long, but it finished, so I suppose it made it works better.

martina342 added a comment - 2021-09-17 09:07 - edited

DAMN

i've disabled unto truncation on runtime ... and in the my.cnf, but it corrupted the data anyway :

Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

2021-09-17 09:09:27 0x7f058b00b700  InnoDB: Assertion failure in file /buildbot/amd64-ubuntu-2004-deb-autobake/build/storage/innobase/trx/trx0purge.cc line 123

InnoDB: Failing assertion: purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no()

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mariadbd startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/

InnoDB: about forcing recovery.

210917  9:09:27 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.6.5-MariaDB-1:10.6.5+maria~focal

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=0

max_threads=402

thread_count=0

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016262 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x55df3500f828

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f058b00aca8 thread_stack 0x49000

mariadbd(my_print_stacktrace+0x32)[0x55df33442c32]

please tell me how to fix that without reimporting everything, my client can't wait any more. It took 1.5 days to reimport the data

It was working, and once i restarted mariadb, it crashed

apprently it crashed but doesn't want to stop :

docker exec mariadb ps aux

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

mysql          1  4.6 16.3 165041080 43171172 ?  Ssl  09:09   2:19 mariadbd

mysql         55  0.8  1.5 160747856 4036876 ?   S    09:09   0:23 mariadbd

root         122  0.0  0.0  10896  4248 ?        Ss   09:38   0:00 mysqladmin shutdown

root         269  0.0  0.0   5904  2784 ?        Rs   09:59   0:00 ps aux

and the logs ends with this and nothing else after :

2021-09-17  9:09:53 0 [Note] InnoDB: Buffer pool(s) load completed at 210917  9:09:53

2021-09-17  9:38:44 0 [Note] mariadbd (initiated by: root[root] @ localhost []): Normal shutdown

2021-09-17  9:38:44 0 [Note] InnoDB: FTS optimize thread exiting.

martina342 added a comment - 2021-09-17 09:07 - edited DAMN i've disabled unto truncation on runtime ... and in the my.cnf, but it corrupted the data anyway : Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution 2021-09-17 09:09:27 0x7f058b00b700 InnoDB: Assertion failure in file /buildbot/amd64-ubuntu-2004-deb-autobake/build/storage/innobase/trx/trx0purge.cc line 123 InnoDB: Failing assertion: purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no() InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mariadbd startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/ InnoDB: about forcing recovery. 210917 9:09:27 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.6.5-MariaDB-1:10.6.5+maria~focal key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=402 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016262 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x55df3500f828 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f058b00aca8 thread_stack 0x49000 mariadbd(my_print_stacktrace+0x32)[0x55df33442c32] please tell me how to fix that without reimporting everything, my client can't wait any more. It took 1.5 days to reimport the data It was working, and once i restarted mariadb, it crashed apprently it crashed but doesn't want to stop : docker exec mariadb ps aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND mysql 1 4.6 16.3 165041080 43171172 ? Ssl 09:09 2:19 mariadbd mysql 55 0.8 1.5 160747856 4036876 ? S 09:09 0:23 mariadbd root 122 0.0 0.0 10896 4248 ? Ss 09:38 0:00 mysqladmin shutdown root 269 0.0 0.0 5904 2784 ? Rs 09:59 0:00 ps aux and the logs ends with this and nothing else after : 2021-09-17 9:09:53 0 [Note] InnoDB: Buffer pool(s) load completed at 210917 9:09:53 2021-09-17 9:38:44 0 [Note] mariadbd (initiated by: root[root] @ localhost []): Normal shutdown 2021-09-17 9:38:44 0 [Note] InnoDB: FTS optimize thread exiting.

Marko Mäkelä added a comment - 2021-09-17 12:10

martina342, if the undo logs were already corrupted due to ~~MDEV-26450~~ in some way (you did not restart from the scratch after disabling that), then I am sorry but some (most?) forms of corruption in the InnoDB system and undo tablespaces are not self-healing or easy to fix.

I do not think that we currently have any option to discard and re-create the undo logs. There is only the option innodb_force_recovery=5 that would ignore all undo logs and treat all reads as READ UNCOMMITTED. Technically, it should not be hard to implement such an option, if the user accepts the resulting inconsistency and the potential corruption of secondary indexes for any partially completed row operations. So far, no support customer has requested that.

Marko Mäkelä added a comment - 2021-09-17 12:10 martina342 , if the undo logs were already corrupted due to MDEV-26450 in some way (you did not restart from the scratch after disabling that), then I am sorry but some (most?) forms of corruption in the InnoDB system and undo tablespaces are not self-healing or easy to fix. I do not think that we currently have any option to discard and re-create the undo logs. There is only the option innodb_force_recovery=5 that would ignore all undo logs and treat all reads as READ UNCOMMITTED . Technically, it should not be hard to implement such an option, if the user accepts the resulting inconsistency and the potential corruption of secondary indexes for any partially completed row operations. So far, no support customer has requested that.

martina342 added a comment - 2021-09-19 11:41 - edited

Hello
I started over again, this time with undo truncation disabled from the beginning, and I didn't noticed any freeze or corrumption using the own compiled 10.6.5

martina342 added a comment - 2021-09-19 11:41 - edited Hello I started over again, this time with undo truncation disabled from the beginning, and I didn't noticed any freeze or corrumption using the own compiled 10.6.5

Marko Mäkelä added a comment - 2021-09-21 12:31

martina342, thank you. Meanwhile, I think I nailed down ~~MDEV-26450~~.

I think that in 10.5 and 10.6, it should be possible to avoid the buf_flush_list_space() loop in trx_purge_truncate_history() and simply let undo tablespace truncation discard any pages that had not been written before the file was shrunk. This thanks to ~~MDEV-15528~~. We will handle that as part of this ticket, not as part of the correctness issue, which is ~~MDEV-26450~~.

Marko Mäkelä added a comment - 2021-09-21 12:31 martina342 , thank you. Meanwhile, I think I nailed down MDEV-26450 . I think that in 10.5 and 10.6, it should be possible to avoid the buf_flush_list_space() loop in trx_purge_truncate_history() and simply let undo tablespace truncation discard any pages that had not been written before the file was shrunk. This thanks to MDEV-15528 . We will handle that as part of this ticket, not as part of the correctness issue, which is MDEV-26450 .

Marko Mäkelä made changes - 2021-09-21 15:19

Link

This issue blocks MENT-1304 [ MENT-1304 ]

Marko Mäkelä made changes - 2021-09-21 15:19

Fix Version/s		10.5 [ 23123 ]
Fix Version/s		10.6 [ 24028 ]
Affects Version/s		10.6 [ 24028 ]
Labels		performance
Summary	[ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.	innodb_undo_log_truncate is unnecessarily slow

Marko Mäkelä made changes - 2021-09-21 15:19

Status

Open [ 1 ]

In Progress [ 3 ]

Marko Mäkelä made changes - 2021-09-21 15:22

Link

This issue relates to ~~MDEV-21952~~ [ ~~MDEV-21952~~ ]

Marko Mäkelä made changes - 2021-09-22 06:50

Link

This issue relates to ~~MDEV-26662~~ [ ~~MDEV-26662~~ ]

Marko Mäkelä added a comment - 2021-09-22 13:26

I implemented a fix that avoids writing any pages of the being-truncated undo tablespace. It does not depend on ~~MDEV-15528~~, but rather on ~~MDEV-25113~~ and ~~MDEV-15058~~. A simple 5-minute Sysbench oltp_update_index 8×100,000 rows (on RAM disk) shows the following difference of average throughput, 95th percentile and maximum latency between innodb_undo_tablespaces=0 and innodb_undo_tablespaces=2 innodb_undo_log_truncate=ON:

connections	baseline/tps	baseline 95%/ms	baseline max/ms	truncate/tps	truncate 95%/ms	truncate max/ms
10	95285.41	0.19	77.87	97687.20	0.18	38.54
20	157405.77	0.18	195.21	159989.50	0.19	407.60
40	164189.49	0.42	336.01	166069.04	0.70	333.03
80	163296.38	1.32	773.72	161948.80	2.07	716.47
160	146380.27	5.00	161.18	143285.16	5.88	136.79

Note: Each test run was on the same 10.5-based server code, just changing the 2 configuration parameters. The latency is increasing starting with 40 concurrent connections. However, based on the server error log, the last time the undo logs were truncated was 14 minutes before the end of the test, which must have been while running at 40 concurrent connections. At the start (10 connections), the undo log was being truncated several times per second. So, the latency increase ought to be due to using 2 dedicated undo tablespaces, not due to undo tablespace truncation. The maximum latency is random by nature, but the average throughput and the 95th percentile latency should be reasonably stable.

Marko Mäkelä added a comment - 2021-09-22 13:26 I implemented a fix that avoids writing any pages of the being-truncated undo tablespace . It does not depend on MDEV-15528 , but rather on MDEV-25113 and MDEV-15058 . A simple 5-minute Sysbench oltp_update_index 8×100,000 rows (on RAM disk) shows the following difference of average throughput, 95th percentile and maximum latency between innodb_undo_tablespaces=0 and innodb_undo_tablespaces=2 innodb_undo_log_truncate=ON : connections baseline/tps baseline 95%/ms baseline max/ms truncate/tps truncate 95%/ms truncate max/ms 10 95285.41 0.19 77.87 97687.20 0.18 38.54 20 157405.77 0.18 195.21 159989.50 0.19 407.60 40 164189.49 0.42 336.01 166069.04 0.70 333.03 80 163296.38 1.32 773.72 161948.80 2.07 716.47 160 146380.27 5.00 161.18 143285.16 5.88 136.79 Note: Each test run was on the same 10.5-based server code, just changing the 2 configuration parameters. The latency is increasing starting with 40 concurrent connections. However, based on the server error log, the last time the undo logs were truncated was 14 minutes before the end of the test, which must have been while running at 40 concurrent connections. At the start (10 connections), the undo log was being truncated several times per second. So, the latency increase ought to be due to using 2 dedicated undo tablespaces, not due to undo tablespace truncation. The maximum latency is random by nature, but the average throughput and the 95th percentile latency should be reasonably stable.

Marko Mäkelä added a comment - 2021-09-23 08:25

It occurred to me that it is better to not hold any mutex while truncating the file. It suffices to only hold the tablespace latch. That should reduce the latency, by only blocking page allocation operations in the being-truncated tablespace. During the truncation, the undo tablespace should not be accessed at all.

Marko Mäkelä added a comment - 2021-09-23 08:25 It occurred to me that it is better to not hold any mutex while truncating the file . It suffices to only hold the tablespace latch. That should reduce the latency, by only blocking page allocation operations in the being-truncated tablespace. During the truncation, the undo tablespace should not be accessed at all.

Marko Mäkelä added a comment - 2021-09-23 09:03

I conducted another test on the 10.6-based branch, this time using NVMe instead of RAM disk, and 50GiB redo log and buffer pool. Yesterday, on the 10.5-based branch, I think that the buffer pool was 50GiB.

connections	throughput/tps	95% latency/ms	max latency/ms
10	105850.83	0.12	19.65
20	161065.65	0.17	23.45
40	172238.07	0.44	38.84
80	169323.66	0.86	108.88

I did not test at 160 connections. We can see that the latency and throughput are much better now, despite the fact that the test runs on real storage instead of RAM disk. Some of it should be explained by the trick to truncate the file while only holding the fil_space_t::latch.

Admittedly, the storage is fast (peak write speed 2300 MB/s, only about 100 MB/s in this benchmark, and the maximum write rate I have seen in my tests is around 800 MB/s). I suppose that it was writing out almost no data pages. During the benchmark, the LSN only grew to 42,006,626,705 (not even to fill the 50 GiB log). So, there should have been no need to create a log checkpoint.

The last undo truncation took place almost 12 minutes before the end of the benchmark. That was while we were serving 20 concurrent connections. I think that to have more chances of undo tablespace truncation, on a write-heavy system, innodb_undo_tablespaces needs to be larger than 2.

Marko Mäkelä added a comment - 2021-09-23 09:03 I conducted another test on the 10.6-based branch, this time using NVMe instead of RAM disk, and 50GiB redo log and buffer pool. Yesterday, on the 10.5-based branch, I think that the buffer pool was 50GiB. connections throughput/tps 95% latency/ms max latency/ms 10 105850.83 0.12 19.65 20 161065.65 0.17 23.45 40 172238.07 0.44 38.84 80 169323.66 0.86 108.88 I did not test at 160 connections. We can see that the latency and throughput are much better now, despite the fact that the test runs on real storage instead of RAM disk. Some of it should be explained by the trick to truncate the file while only holding the fil_space_t::latch . Admittedly, the storage is fast (peak write speed 2300 MB/s, only about 100 MB/s in this benchmark, and the maximum write rate I have seen in my tests is around 800 MB/s). I suppose that it was writing out almost no data pages. During the benchmark, the LSN only grew to 42,006,626,705 (not even to fill the 50 GiB log). So, there should have been no need to create a log checkpoint. The last undo truncation took place almost 12 minutes before the end of the benchmark. That was while we were serving 20 concurrent connections. I think that to have more chances of undo tablespace truncation, on a write-heavy system, innodb_undo_tablespaces needs to be larger than 2.

Matthias Leich added a comment - 2021-09-23 15:20

origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b 2021-09-23T11:21:51+03:00

which should contain fixes for MDEV-26450 and MDEV-26445 behaved well in RQG testing.

Matthias Leich added a comment - 2021-09-23 15:20 origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b 2021-09-23T11:21:51+03:00 which should contain fixes for MDEV-26450 and MDEV-26445 behaved well in RQG testing.

Marko Mäkelä made changes - 2021-09-24 15:02

issue.field.resolutiondate

2021-09-24 15:02:51.0

2021-09-24 15:02:51.532

Marko Mäkelä made changes - 2021-09-24 15:02

Fix Version/s		10.5.13 [ 26026 ]
Fix Version/s		10.6.5 [ 26034 ]
Fix Version/s		10.7.1 [ 26120 ]
Fix Version/s	10.5 [ 23123 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	In Progress [ 3 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2021-10-25 10:02

Link

This issue includes ~~MDEV-21952~~ [ ~~MDEV-21952~~ ]

Sergei Golubchik made changes - 2021-10-25 10:02

Link

This issue relates to ~~MDEV-21952~~ [ ~~MDEV-21952~~ ]

Sergei Golubchik made changes - 2021-12-06 21:53

Workflow

MariaDB v3 [ 124481 ]

MariaDB v4 [ 159625 ]

Alice Sherepa made changes - 2022-01-18 12:38

Link

This issue relates to ~~MDEV-27530~~ [ ~~MDEV-27530~~ ]

Marko Mäkelä made changes - 2023-11-10 07:49

Link

This issue causes ~~MDEV-32757~~ [ ~~MDEV-32757~~ ]

Jira Automation (IT) made changes - 2024-07-04 02:22

Zendesk Related Tickets

131727

MariaDB Server

innodb_undo_log_truncate is unnecessarily slow

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration