|
Can you please produce a stack trace of all threads at the time of the hang (from the core dump)? A link to the instructions was included in the output that you posted: https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
|
|
The mariadb binary is stripped, and now the database is corrupted; I don't even know if I can fix it. I set innodb_force_recovery = 6 so the server would start, but I don't know whether I can clear the contents of the undo logs, and I don't want to corrupt the database again.
EDIT: force recovery is read-only. Can you please tell me how to discard the undo logs?
I tried to set innodb_undo_tablespaces = 0, but then startup fails with:
InnoDB: File ./ibdata1: 'aio write' returned OS error 209. Cannot continue operation
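(For context, a minimal sketch of where this setting lives; the section name is the standard one and the value is the level from the post. Per the MariaDB KB, levels >= 4 make InnoDB read-only, so the undo logs cannot be "cancelled" in place; the usual path is to dump all data at this setting, then recreate the instance and reload the dump.)

```ini
# my.cnf sketch: innodb_force_recovery takes levels 1-6;
# levels >= 4 prevent writes, so use this only to dump data out
[mysqld]
innodb_force_recovery = 6
```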
|
|
I installed MariaDB 10.5 from the MariaDB repository along with mariadb-server-core-10.5-dbgsym, then followed the steps for producing the stack trace, and it crashed again (I noticed afterward that I need to add the "core_file" variable).
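(For anyone following along: the core_file option goes into the server configuration; a sketch, with the file path as an assumption. Note that the first crash log below reports "Max core file size 0", which also blocks core dumps and has to be raised outside MariaDB, e.g. with LimitCORE=infinity in the systemd unit.)

```ini
# /etc/mysql/mariadb.conf.d/50-server.cnf  (path is an assumption)
[mysqld]
core_file
```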
--Thread 139462800680704 has waited at btr0pcur.cc line 510 for 556.00 seconds the semaphore:
|
S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227
|
a writer (thread id 139461606029056) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time write locked in file row0ins.cc line 2633
|
2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:
|
--Thread 139461610944256 has waited at trx0trx.cc line 881 for 606.00 seconds the semaphore:
|
Mutex at 0x55d0b144f300, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:404, lock var 2
|
|
2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:
|
--Thread 139461603878656 has waited at btr0pcur.cc line 510 for 266.00 seconds the semaphore:
|
S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227
|
a writer (thread id 139461606029056) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time write locked in file row0ins.cc line 2633
|
2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:
|
--Thread 139462802523904 has waited at btr0pcur.cc line 510 for 370.00 seconds the semaphore:
|
S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227
|
a writer (thread id 139461606029056) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time write locked in file row0ins.cc line 2633
|
2021-08-21 22:49:09 0 [Note] InnoDB: A semaphore wait:
|
--Thread 139462801295104 has waited at btr0pcur.cc line 510 for 534.00 seconds the semaphore:
|
S-lock on RW-latch at 0x7ee83c0cdc78 created in file buf0buf.cc line 1227
|
a writer (thread id 139461606029056) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time write locked in file row0ins.cc line 2633
|
|
2021-08-21 22:49:09 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
|
210821 22:49:09 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=144
|
max_threads=402
|
thread_count=74
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x49000
|
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55d00358949e]
|
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55d002fe7305]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7efd7f5a2140]
|
2021-08-21 22:49:18 0 [Note] InnoDB: Truncated .//undo001
|
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7efd7f0ebce1]
|
/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7efd7f0d5537]
|
/usr/sbin/mariadbd(+0x63724f)[0x55d002ca624f]
|
/usr/sbin/mariadbd(+0x62c07d)[0x55d002c9b07d]
|
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x55d00351f60b]
|
/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x55d0035204c2]
|
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x55d00351f3cf]
|
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7efd7f489ed0]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7efd7f596ea7]
|
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7efd7f1addef]
|
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
|
information that should help you find out what is causing the crash.
|
Writing a core file...
|
Working directory at /home/mysql
|
Resource Limits:
|
Limit Soft Limit Hard Limit Units
|
Max cpu time unlimited unlimited seconds
|
Max file size unlimited unlimited bytes
|
Max data size unlimited unlimited bytes
|
Max stack size 8388608 unlimited bytes
|
Max core file size 0 unlimited bytes
|
Max resident set unlimited unlimited bytes
|
Max processes 1031116 1031116 processes
|
Max open files 32768 32768 files
|
Max locked memory 65536 65536 bytes
|
Max address space unlimited unlimited bytes
|
Max file locks unlimited unlimited locks
|
Max pending signals 1031116 1031116 signals
|
Max msgqueue size 819200 819200 bytes
|
Max nice priority 0 0
|
Max realtime priority 0 0
|
Max realtime timeout unlimited unlimited us
|
Core pattern: core
|
|
|
|
Here is the full stack trace:
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
|
Copyright (C) 2021 Free Software Foundation, Inc.
|
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
|
This is free software: you are free to change and redistribute it.
|
There is NO WARRANTY, to the extent permitted by law.
|
Type "show copying" and "show warranty" for details.
|
This GDB was configured as "x86_64-linux-gnu".
|
Type "show configuration" for configuration details.
|
For bug reporting instructions, please see:
|
<https://www.gnu.org/software/gdb/bugs/>.
|
Find the GDB manual and other documentation resources online at:
|
<http://www.gnu.org/software/gdb/documentation/>.
|
|
For help, type "help".
|
Type "apropos word" to search for commands related to "word"...
|
Reading symbols from /usr/sbin/mysqld...
|
Reading symbols from /usr/lib/debug/.build-id/67/9570f0b96cbccab866cbf085e309ed3a7e8ddc.debug...
|
|
warning: Can't open file /[aio] (deleted) during file-backed mapping note processing
|
[New LWP 669786]
|
[New LWP 258999]
|
[New LWP 259018]
|
[New LWP 258654]
|
[New LWP 293279]
|
[New LWP 259058]
|
[New LWP 258658]
|
[New LWP 293242]
|
[New LWP 258667]
|
[New LWP 293230]
|
[New LWP 293236]
|
[New LWP 258951]
|
[New LWP 293306]
|
[New LWP 293209]
|
[New LWP 258653]
|
[New LWP 293313]
|
[New LWP 259016]
|
[New LWP 259001]
|
[New LWP 293225]
|
[New LWP 293285]
|
[New LWP 293576]
|
[New LWP 258924]
|
[New LWP 259008]
|
[New LWP 293316]
|
[New LWP 259059]
|
[New LWP 293589]
|
[New LWP 293312]
|
[New LWP 293318]
|
[New LWP 293202]
|
[New LWP 293595]
|
[New LWP 258990]
|
[New LWP 259078]
|
[New LWP 293577]
|
[New LWP 293254]
|
[New LWP 259068]
|
[New LWP 293237]
|
[New LWP 294894]
|
[New LWP 258652]
|
[New LWP 293314]
|
[New LWP 259084]
|
[New LWP 293571]
|
[New LWP 293317]
|
[New LWP 293266]
|
[New LWP 293275]
|
[New LWP 293327]
|
[New LWP 259024]
|
[New LWP 259082]
|
[New LWP 294705]
|
[New LWP 293584]
|
[New LWP 293310]
|
[New LWP 293597]
|
[New LWP 259014]
|
[New LWP 294934]
|
[New LWP 293601]
|
[New LWP 294688]
|
[New LWP 294928]
|
[New LWP 293220]
|
[New LWP 258689]
|
[New LWP 293610]
|
[New LWP 293246]
|
[New LWP 669785]
|
[New LWP 258655]
|
[New LWP 293235]
|
[New LWP 259031]
|
[New LWP 294694]
|
[New LWP 293288]
|
[New LWP 293241]
|
[New LWP 428265]
|
[New LWP 293694]
|
[New LWP 293693]
|
[New LWP 293311]
|
[New LWP 294929]
|
[New LWP 294908]
|
[New LWP 293585]
|
[New LWP 294893]
|
[New LWP 293249]
|
[New LWP 294920]
|
[New LWP 294935]
|
[New LWP 669125]
|
[New LWP 293196]
|
[New LWP 293244]
|
[New LWP 293692]
|
[New LWP 294689]
|
[New LWP 309945]
|
[New LWP 294907]
|
[New LWP 293217]
|
[New LWP 293570]
|
[New LWP 670639]
|
[New LWP 258668]
|
[New LWP 293583]
|
[New LWP 293336]
|
[New LWP 601081]
|
[New LWP 294901]
|
[New LWP 294693]
|
[New LWP 294778]
|
[New LWP 294775]
|
[New LWP 309770]
|
[New LWP 294690]
|
[New LWP 331566]
|
[New LWP 293320]
|
[New LWP 294927]
|
[New LWP 304551]
|
[New LWP 293272]
|
[New LWP 298162]
|
[New LWP 293263]
|
[New LWP 294687]
|
[New LWP 294909]
|
[New LWP 294916]
|
[New LWP 294917]
|
[New LWP 258958]
|
[New LWP 293578]
|
[New LWP 259034]
|
[New LWP 304560]
|
[New LWP 258956]
|
[New LWP 294910]
|
[New LWP 294777]
|
[New LWP 294773]
|
[New LWP 304541]
|
[New LWP 670720]
|
[New LWP 293574]
|
[New LWP 298152]
|
[New LWP 295013]
|
[New LWP 294937]
|
[New LWP 298153]
|
[New LWP 335788]
|
[New LWP 293300]
|
[New LWP 293324]
|
[New LWP 294895]
|
[New LWP 293228]
|
[New LWP 294932]
|
[New LWP 294921]
|
[New LWP 293581]
|
[New LWP 670858]
|
[New LWP 293287]
|
[New LWP 294933]
|
[New LWP 258962]
|
[New LWP 295014]
|
[New LWP 300352]
|
[New LWP 294919]
|
[New LWP 294940]
|
[New LWP 670831]
|
[New LWP 294939]
|
[New LWP 383425]
|
[New LWP 295046]
|
[New LWP 601054]
|
[New LWP 304540]
|
[New LWP 393894]
|
[New LWP 293582]
|
[New LWP 294691]
|
[New LWP 294692]
|
[New LWP 294903]
|
[New LWP 309767]
|
[New LWP 294923]
|
[New LWP 428228]
|
[New LWP 304561]
|
[New LWP 259010]
|
[New LWP 294897]
|
[New LWP 309946]
|
[New LWP 294931]
|
[New LWP 298160]
|
[New LWP 259041]
|
[New LWP 294942]
|
[New LWP 428142]
|
[New LWP 383429]
|
[New LWP 304552]
|
[New LWP 298154]
|
[New LWP 670807]
|
[New LWP 293296]
|
[New LWP 293259]
|
[New LWP 293596]
|
[New LWP 294914]
|
[New LWP 309773]
|
[New LWP 294936]
|
[New LWP 670658]
|
[New LWP 670833]
|
[New LWP 670638]
|
[New LWP 294902]
|
[New LWP 259022]
|
[New LWP 670772]
|
[New LWP 294938]
|
[New LWP 670870]
|
[New LWP 293226]
|
[New LWP 293319]
|
[New LWP 304557]
|
[New LWP 670709]
|
[New LWP 670124]
|
[New LWP 383436]
|
[New LWP 670640]
|
[New LWP 293245]
|
[New LWP 670842]
|
[New LWP 310227]
|
[New LWP 670834]
|
[New LWP 309768]
|
[New LWP 670732]
|
[New LWP 294912]
|
[New LWP 670871]
|
[New LWP 259029]
|
[New LWP 428184]
|
[New LWP 670781]
|
[New LWP 670662]
|
[New LWP 298361]
|
[New LWP 293308]
|
[New LWP 304542]
|
[New LWP 293944]
|
[New LWP 670645]
|
[New LWP 295012]
|
[New LWP 309766]
|
[New LWP 670131]
|
[New LWP 258926]
|
[New LWP 304547]
|
[New LWP 294915]
|
[New LWP 670736]
|
[New LWP 293307]
|
[New LWP 259036]
|
[New LWP 298163]
|
[New LWP 259020]
|
[New LWP 601080]
|
[New LWP 428153]
|
[New LWP 670796]
|
[New LWP 670791]
|
[New LWP 428224]
|
[New LWP 293572]
|
[New LWP 601079]
|
[New LWP 670712]
|
[New LWP 294896]
|
[New LWP 298151]
|
[New LWP 670714]
|
[New LWP 259053]
|
[New LWP 670766]
|
[New LWP 298156]
|
[New LWP 259064]
|
[New LWP 258943]
|
[New LWP 670671]
|
[New LWP 601084]
|
[New LWP 294941]
|
[New LWP 304553]
|
[New LWP 670866]
|
[New LWP 670837]
|
[New LWP 670702]
|
[New LWP 428145]
|
[New LWP 304545]
|
[New LWP 294922]
|
[New LWP 304556]
|
[New LWP 298158]
|
[New LWP 670865]
|
[New LWP 670637]
|
[New LWP 309774]
|
[New LWP 670667]
|
[New LWP 428245]
|
[New LWP 670856]
|
[New LWP 670765]
|
[New LWP 670703]
|
[New LWP 294926]
|
[New LWP 259044]
|
[New LWP 309771]
|
[New LWP 259073]
|
[New LWP 293298]
|
[New LWP 298155]
|
[New LWP 670643]
|
[New LWP 298159]
|
[New LWP 670701]
|
[New LWP 298164]
|
[New LWP 670771]
|
[New LWP 670769]
|
[New LWP 304555]
|
[New LWP 258940]
|
[New LWP 259087]
|
[New LWP 309775]
|
[New LWP 293221]
|
[New LWP 334045]
|
[New LWP 670770]
|
[New LWP 670706]
|
[New LWP 670809]
|
[New LWP 383427]
|
[New LWP 259060]
|
[New LWP 298150]
|
[New LWP 304549]
|
[New LWP 259091]
|
[New LWP 670803]
|
[New LWP 670838]
|
[New LWP 670729]
|
[New LWP 669783]
|
[New LWP 293255]
|
[New LWP 670839]
|
[New LWP 670852]
|
[New LWP 294913]
|
[New LWP 293580]
|
[New LWP 293261]
|
[New LWP 259086]
|
[New LWP 304544]
|
[New LWP 309765]
|
[New LWP 298161]
|
[New LWP 304550]
|
[New LWP 293599]
|
[New LWP 293262]
|
[New LWP 298973]
|
[New LWP 293579]
|
[New LWP 293277]
|
[New LWP 293199]
|
[New LWP 304546]
|
[New LWP 304559]
|
[New LWP 293304]
|
[New LWP 293315]
|
[New LWP 294955]
|
[New LWP 294918]
|
[New LWP 293321]
|
[New LWP 309769]
|
[New LWP 293322]
|
[New LWP 383433]
|
[New LWP 397906]
|
[New LWP 383435]
|
[New LWP 293329]
|
[New LWP 670704]
|
[New LWP 670705]
|
[New LWP 293573]
|
[New LWP 670741]
|
[New LWP 670764]
|
[New LWP 293575]
|
[New LWP 670832]
|
[New LWP 670811]
|
[New LWP 293586]
|
[New LWP 670681]
|
[New LWP 294774]
|
[New LWP 294776]
|
[New LWP 294930]
|
[New LWP 294966]
|
[New LWP 294968]
|
[New LWP 304537]
|
[New LWP 309948]
|
[New LWP 428180]
|
[New LWP 428253]
|
[New LWP 393900]
|
[New LWP 601082]
|
[New LWP 670648]
|
[New LWP 670787]
|
[New LWP 670849]
|
[New LWP 670872]
|
[New LWP 259023]
|
[New LWP 309772]
|
[Thread debugging using libthread_db enabled]
|
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
|
|
Core was generated by `/usr/sbin/mariadbd'.
|
Program terminated with signal SIGABRT, Aborted.
|
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
|
56 ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
|
[Current thread is 1 (Thread 0x7f7455ffb700 (LWP 669786))]
|
|
|
|
The database is now corrupted again:
2021-08-24 11:04:50 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 4633837568 outside the bounds of the file: .//undo002
|
210824 11:04:50 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=0
|
max_threads=402
|
thread_count=0
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x49000
|
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x5625bef0e49e]
|
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x5625be96c305]
|
sigaction.c:0(__restore_rt)[0x7fc93b1b9140]
|
linux/raise.c:51(__GI_raise)[0x7fc93ad02ce1]
|
stdlib/abort.c:81(__GI_abort)[0x7fc93acec537]
|
/usr/sbin/mariadbd(+0x63724f)[0x5625be62b24f]
|
/usr/sbin/mariadbd(+0x64bd1b)[0x5625be63fd1b]
|
ut/ut0ut.cc:576(ib::fatal::~fatal() [clone .cold])[0x5625be63fd61]
|
fil/fil0fil.cc:3224(fil_report_invalid_page_access(char const*, unsigned long, unsigned long, bool))[0x5625bedf4aac]
|
include/fil0fil.h:535(fil_space_t::acquire())[0x5625bede2054]
|
buf/buf0rea.cc:344(buf_read_page(page_id_t, unsigned long))[0x5625bed9e9ca]
|
buf/buf0buf.cc:2960(buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*, bool))[0x5625bed8c7a6]
|
trx/trx0undo.cc:854(trx_undo_mem_create_at_db_start(trx_rseg_t*, unsigned long, unsigned int, unsigned long&))[0x5625bed97914]
|
trx/trx0rseg.cc:431(trx_rseg_array_init())[0x5625be622daf]
|
trx/trx0trx.cc:740(trx_lists_init_at_db_start())[0x5625bec79319]
|
srv/srv0start.cc:1488(srv_start(bool))[0x5625be96ee9e]
|
handler/ha_innodb.cc:4025(innodb_init(void*))[0x5625be76ccca]
|
sql/handler.cc:645(ha_initialize_handlerton(st_plugin_int*))[0x5625be76df4d]
|
sql/sql_plugin.cc:1459(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x5625be68bf9f]
|
sql/sql_plugin.cc:1751(plugin_init(int*, char**, int))[0x5625be691919]
|
csu/libc-start.c:308(__libc_start_main)[0x7fc93acedd0a]
|
/usr/sbin/mariadbd(_start+0x2a)[0x5625be68662a]
|
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
|
information that should help you find out what is causing the crash.
|
Writing a core file...
|
Working directory at /home/mysql
|
Resource Limits:
|
Limit Soft Limit Hard Limit Units
|
Max cpu time unlimited unlimited seconds
|
Max file size unlimited unlimited bytes
|
Max data size unlimited unlimited bytes
|
Max stack size 8388608 unlimited bytes
|
Max core file size unlimited unlimited bytes
|
Max resident set unlimited unlimited bytes
|
Max processes 1031116 1031116 processes
|
Max open files 32768 32768 files
|
Max locked memory 65536 65536 bytes
|
Max address space unlimited unlimited bytes
|
Max file locks unlimited unlimited locks
|
Max pending signals 1031116 1031116 signals
|
Max msgqueue size 819200 819200 bytes
|
Max nice priority 0 0
|
Max realtime priority 0 0
|
Max realtime timeout unlimited unlimited us
|
Core pattern: core
|
|
|
|
|
Here is a crash with a really full stack trace (attached):
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041117320960 has waited at dict0dict.cc line 1083 for 39.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041117013760 has waited at dict0dict.cc line 1083 for 39.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041116706560 has waited at dict0dict.cc line 1083 for 38.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041116399360 has waited at dict0dict.cc line 1083 for 36.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041116092160 has waited at dict0dict.cc line 1083 for 36.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041115784960 has waited at dict0dict.cc line 1083 for 32.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041115477760 has waited at dict0dict.cc line 1083 for 30.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041115170560 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041114863360 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041114556160 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041114248960 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041113941760 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041113634560 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
2021-08-25 8:04:11 0 [Note] InnoDB: A semaphore wait:
|
--Thread 140041113327360 has waited at dict0dict.cc line 1083 for 10.00 seconds the semaphore:
|
Mutex at 0x55c5e83dd040, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1027, lock var 2
|
|
InnoDB: Pending reads 0, writes 0
|
2021-08-25 8:04:11 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
|
210825 8:04:11 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=347
|
max_threads=402
|
thread_count=347
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x49000
|
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55c5e7c0549e]
|
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55c5e7663305]
|
sigaction.c:0(__restore_rt)[0x7f84317a0140]
|
linux/raise.c:51(__GI_raise)[0x7f84312e9ce1]
|
stdlib/abort.c:81(__GI_abort)[0x7f84312d3537]
|
/usr/sbin/mariadbd(+0x63724f)[0x55c5e732224f]
|
/usr/sbin/mariadbd(+0x62c07d)[0x55c5e731707d]
|
ut/ut0ut.cc:576(ib::fatal::~fatal() [clone .cold])[0x55c5e7b9b60b]
|
srv/srv0srv.cc:1337(srv_monitor_task.cold)[0x55c5e7b9c4c2]
|
bits/atomic_base.h:404(std::__atomic_base<bool>::store(bool, std::memory_order))[0x55c5e7b9b3cf]
|
2021-08-25 8:04:14 0 [Note] InnoDB: Truncated .//undo002
|
??:0(std::error_code::default_error_condition() const)[0x7f8431687ed0]
|
nptl/pthread_create.c:478(start_thread)[0x7f8431794ea7]
|
x86_64/clone.S:97(__GI___clone)[0x7f84313abdef]
|
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
|
information that should help you find out what is causing the crash.
|
crash.log
|
|
The full log at the time of the crash: log.txt
|
|
@Marko Mäkelä, do you have all the information you need, or do you need more stack traces?
|
|
martina342, the InnoDB "semaphore wait" output is insufficient for analyzing hangs.
https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ uses the terms "backtrace" or "bt" to refer to stack traces.
|
|
Can you please check crash.log? It is the "backtrace" (bt).
|
|
Sorry, indeed, crash.log it is. I see page_id=... in many stack traces. The following command should prevent such hiding:
set print frame-arguments all
|
I see lock=0x7f83dc1c2578 being passed to pfs_rw_lock_s_lock_func() in many threads. Based on the stack traces, that is an index page. Thread 44 is waiting for the exclusive page latch, which will cause any subsequent shared latch requests to wait until the exclusive latch has been granted and released. That purge thread is also, by design, holding dict_sys.latch in shared mode (this was just fixed in MDEV-24258), which in turn will block any other thread that is waiting for dict_sys.mutex. But I cannot quickly determine which thread is blocking Thread 44. It might even be impossible based on the available information.
These stack traces are definitely a step in the right direction for analyzing this, but it would be very helpful to see some more data from the core dump. If you still have the core dump from which crash.log was generated, then please provide the output of the following:
print dict_sys.latch
|
print dict_sys.mutex
|
If you have to generate a new core dump first, then I would also need the stack traces for that core dump.
Theoretically, I could go through all 361 stack traces and try to match them with the source code, to see which threads could possibly be holding the exclusive latch, but I would like to save that effort.
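(For reproducing this non-interactively: the requested values can be printed from the existing core in one gdb batch run; a sketch, with the binary and core paths as assumptions:)

```
# gdb-cmds.txt -- run as: gdb /usr/sbin/mariadbd /path/to/core -batch -x gdb-cmds.txt
set print frame-arguments all
print dict_sys.latch
print dict_sys.mutex
thread apply all bt full
```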
|
|
I have a new crash log from a new core dump
crash2.log
(gdb) print dict_sys.latch
|
$1 = {<ilist_node<void>> = {next = 0x55f6b7f887a0, prev = 0x55f6b7f8a850}, lock_word = {m = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}, waiters = {
|
m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, sx_recursive = 0, writer_thread = 0, event = 0x55f6b7f88e08, wait_ex_event = 0x55f6b7f88e98,
|
cfile_name = 0x55f61011a268 "./storage/innobase/dict/dict0dict.cc", last_x_file_name = 0x55f61010fdb8 "./storage/innobase/srv/srv0srv.cc", cline = 1036, is_block_lock = 0, last_x_line = 1538, count_os_wait = 4,
|
pfs_psi = 0x0}
|
(gdb) print dict_sys.mutex
|
$2 = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x55f6b7f88d58, m_policy = {
|
m_filename = 0x55f61011a268 "./storage/innobase/dict/dict0dict.cc", m_line = 1027, m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_DICT_SYS}}, m_ptr = 0x0}
|
|
|
|
Is crash2.log from MariaDB Server 10.5.12? The line number in srv_monitor_task() suggests so. I only see 3 InnoDB threads being executed: the purge coordinator in Thread 63 (busy, but not waiting for anything), the page cleaner (idle) in Thread 42, and the crashing srv_monitor_task() in Thread 1. That looks like a false alarm; maybe a larger value of innodb_fatal_semaphore_wait_threshold could help avoid that.
In any case, the stack traces are completely different from crash.log, because there is no os_event::wait to be seen.
|
|
Yes, it's 10.5.12.
|
|
martina342 Good, I would tend to believe that some fixes in 10.5.12 (such as MDEV-25801) could have removed the original slowness that you reported.
I do not have any idea what could trigger a false alarm in the InnoDB watchdog. The watchdog has remained largely unchanged for several years, and the last time I remember fixing a false alarm was some 10 years ago. In 10.6, I removed the "sync array" and rewrote the watchdog to cover only the dict_sys.mutex or dict_sys.latch.
You might want to ensure that the InnoDB buffer pool and redo log are configured to be large enough for your workload.
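(A sketch of those two knobs in my.cnf; the values are placeholders to adapt to the workload, not recommendations:)

```ini
[mysqld]
# should be large enough to hold the hot working set;
# a common starting point on a dedicated server is 50-70% of RAM
innodb_buffer_pool_size = 8G
# large enough that checkpoint flushing is not constantly forced
innodb_log_file_size = 2G
```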
|
|
I'm not sure whether crash2 was the first crash after the semaphore lock. It may have been the crash that occurred once the data had already been corrupted by the first crash and I tried to restart MariaDB.
I will send you another crash with the correct settings if I get another one that I'm sure happened right after the lock.
|
|
Hello, it crashed again:
2021-09-04 20:39:02 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 1000 seconds. We intentionally crash the server because it appears to be hung.
|
210904 20:39:02 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.5.12-MariaDB-1:10.5.12+maria~bullseye
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=197
|
max_threads=402
|
thread_count=197
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016036 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x49000
|
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x559dbb56749e]
|
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x559dbafc5305]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f4c57031140]
|
2021-09-04 20:39:11 0 [Note] InnoDB: Truncated .//undo001
|
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f4c56b7ace1]
|
/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f4c56b64537]
|
/usr/sbin/mariadbd(+0x63724f)[0x559dbac8424f]
|
/usr/sbin/mariadbd(+0x62c07d)[0x559dbac7907d]
|
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x559dbb4fd60b]
|
/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x559dbb4fe4c2]
|
2021-09-04 20:39:18 0 [Note] InnoDB: Truncating .//undo002
|
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x559dbb4fd3cf]
|
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f4c56f18ed0]
|
2021-09-04 20:39:23 0 [Note] InnoDB: Truncated .//undo002
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f4c57025ea7]
|
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4c56c3cdef]
|
|
(gdb) print dict_sys.latch
|
$1 = {<ilist_node<void>> = {next = 0x559e6324d840, prev = 0x559e6324f8c0}, lock_word = {m = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}, waiters = {
|
m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, sx_recursive = 0, writer_thread = 0, event = 0x559e6324de78, wait_ex_event = 0x559e6324df08,
|
cfile_name = 0x559dbb7b6268 "./storage/innobase/dict/dict0dict.cc", last_x_file_name = 0x559dbb7abdb8 "./storage/innobase/srv/srv0srv.cc", cline = 1036, is_block_lock = 0, last_x_line = 1538, count_os_wait = 3872,
|
pfs_psi = 0x0}
|
|
(gdb) print dict_sys.mutex
|
$2 = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x559e6324ddc8, m_policy = {
|
m_filename = 0x559dbb7b6268 "./storage/innobase/dict/dict0dict.cc", m_line = 1027, m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_DICT_SYS}}, m_ptr = 0x0}
|
|
|
The backtrace is attached as crash3.log.
|
|
Hello,
today I got logs of a semaphore wait which didn't result in a crash.
See the attached semaphore without crash.txt.
|
|
update : it's waiting for semaphore right now again, with all requests frozen on the server.
They are stuck in "statistics" or "sending data", they are very easy requests.
I have a wordpress website and each visitor generate this request stuck on "statistics" :
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
I have tens of this request on my process list.
This table have 3 853 rows for 20mb
edit: it unstucked itself after 15 minutes, all requests were processed
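A process list snapshot like the one described can be filtered mechanically instead of eyeballing it. The sketch below is purely illustrative: `find_stuck` and `STUCK_STATES` are made-up names, and the rows are hard-coded tuples shaped like `SHOW PROCESSLIST` columns (Id, User, Host, db, Command, Time, State, Info); in a real script they would come from a client library query.

```python
# Hypothetical helper: flag processlist rows that have sat in a suspicious
# state longer than a threshold. States chosen to match the ones reported
# in this thread ("statistics", "Sending data", "Filling schema table").
STUCK_STATES = {"statistics", "Sending data", "Filling schema table"}

def find_stuck(rows, threshold_seconds=60):
    """Return (id, time, state, info) for queries stuck past the threshold."""
    stuck = []
    for pid, user, host, db, command, time_s, state, info in rows:
        if command == "Query" and state in STUCK_STATES and time_s >= threshold_seconds:
            stuck.append((pid, time_s, state, info))
    return stuck

rows = [
    (101, "web", "localhost", "wp", "Query", 900, "statistics",
     "SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'"),
    (102, "web", "localhost", "wp", "Query", 2, "starting", "show processlist"),
]
print(find_stuck(rows))  # only thread 101 qualifies
```

Run every few seconds, something like this would have timestamped exactly when the freeze began and which statements were affected.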
|
|
semaphore without crash.txt does not include enough information for analysis. As for crash3.log, it does not look like dict_sys is really locked. I only see 3 threads whose stack frames contain the string storage/innobase. None of them are waiting for any resource inside InnoDB. The monitor task is killing the server, and the 2 other threads are actively executing something.
I think that your system is misconfigured in some way, because such output is not normal. It could be useful to check with sudo perf record, http://poormansprofiler.org, or similar tools what is going on. But I cannot provide free support via this bug reporting system.
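The "poor man's profiler" mentioned above boils down to sampling `gdb ... "thread apply all bt"` a few times and counting how often each call chain repeats. This is a minimal sketch of just the aggregation step; `collapse_stacks` is a made-up name, and the sample input is a toy imitation of gdb backtrace output, not real server output.

```python
# Collapse gdb "thread apply all bt" output into counts of identical
# call chains: the chains that dominate are where threads are stuck.
from collections import Counter
import re

def collapse_stacks(gdb_output):
    counts = Counter()
    frames = []
    for line in gdb_output.splitlines():
        if line.startswith("Thread "):
            if frames:
                counts[";".join(frames)] += 1
            frames = []
        # frame lines look like: "#0  0x... in func (...) from lib"
        m = re.match(r"#\d+\s+(?:0x[0-9a-f]+ in )?([^ (]+)", line)
        if m:
            frames.append(m.group(1))
    if frames:
        counts[";".join(frames)] += 1
    return counts

sample = """Thread 3 (LWP 11):
#0  0xdeadbeef in poll () from /lib/libc.so.6
#1  0x00000001 in handle_connections ()
Thread 2 (LWP 12):
#0  0xdeadbeef in poll () from /lib/libc.so.6
#1  0x00000001 in handle_connections ()
"""
print(collapse_stacks(sample).most_common(1))
```

The original poormansprofiler.org script does the same thing with awk; the point is only that identical stacks across many threads (or across repeated samples) identify the blocking point.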
|
|
I upgraded to mariadb 10.6, but it's still happening. It still freezes, but it doesn't crash anymore.
The command "top" doesn't show a lot of CPU usage (it shows some CPU usage for mariadbd).
iotop shows zero I/O for mariadbd, and almost zero read/write or I/O % for the other processes.
The mariadb datadir doesn't show any modification since the time it started freezing.
I followed http://poormansprofiler.org/ and executed: gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mariadbd)
Here is the generated file: stackfreeze
I don't see anything special in the syslog.
mdstat doesn't show any problem with the RAID.
smartctl doesn't show any error for either of the 2 NVMe drives.
All requests are stuck, for example:
"SHOW STATUS" is stuck on "Filling schema table"
"SHOW ENGINE INNODB STATUS" is stuck on "starting"
others are stuck on "update" or "statistics"
Testing the disk doesn't show errors:
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.30623 s, 466 MB/s
I have an undo002 file that is 35 GB large; at 13h01 it started truncating it, and at 13h25 mariadb stopped doing anything.
My ib_logfile0 is 37 GB large.
Here is my mariadb configuration:
- Import all .cnf files from configuration directory
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mariadb.conf.d/
[mysql]
[mysqld]
datadir = /home/mysql
tmpdir = /home/mysqltmp/
default-time-zone = Europe/Paris
wait_timeout = 43200
max_allowed_packet = 64M
query_cache_size = 0
query_cache_type = 0
innodb_buffer_pool_size = 150G
innodb_log_file_size = 37G
innodb_buffer_pool_instances = 64
innodb_undo_log_truncate = ON
skip-name-resolve = 1
tmp_table_size = 256M
max_heap_table_size = 256M
max_connections = 400
net_read_timeout = 250600
net_write_timeout = 250600
join_buffer_size = 1MB
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
innodb-defragment = 1
innodb_online_alter_log_max_size= 50000000000
table_definition_cache = 1000
innodb_undo_tablespaces = 2
table_open_cache = 30000
log_error = /var/log/mysql/error.log
innodb_fatal_semaphore_wait_threshold = 2000
[mariadb]
core_file
I noticed that a file ddl_recovery.log was created at 13h21.
I tried to kill some of the oldest requests from the process list, but they stay in the process list with the "Killed" command.
I modified the gdb command to add frame arguments:
gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "thread apply all bt" --batch -p $(pidof mariadbd)
Here is the output:
stackfreeze2
When I executed the command, it showed "29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory." I don't know if that is relevant.
In my opinion, there is something wrong in mariadb, not in my server configuration. And even if some of my mariadb configuration is not appropriate for my workload, it shouldn't completely freeze the server.
Here is a third stack trace, in case you would like to compare it with the previous ones and see whether mariadb did anything in between:
stackfreeze3
I have set innodb_fatal_semaphore_wait_threshold = 2000,
but now, 45 minutes after it started freezing (it is still frozen), it hasn't crashed or generated any semaphore wait logs.
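When semaphore wait logs do appear (as in the earlier 10.5 crash, quoted at the top of this report), the wait times can be pulled out mechanically to see which wait is approaching the fatal threshold. A minimal sketch; `parse_waits` is a made-up name, and the regex follows the log format quoted earlier ("--Thread N has waited at file line L for S seconds"):

```python
# Extract (thread id, "file:line", seconds) tuples from InnoDB
# semaphore-wait log lines.
import re

WAIT_RE = re.compile(
    r"--Thread (\d+) has waited at (\S+) line (\d+) for ([\d.]+) seconds")

def parse_waits(log_text):
    return [(int(tid), f"{src}:{line}", float(secs))
            for tid, src, line, secs in WAIT_RE.findall(log_text)]

# Toy input imitating the lines quoted earlier in this report.
log = ("--Thread 139462800680704 has waited at btr0pcur.cc line 510 "
       "for 556.00 seconds the semaphore:\n"
       "--Thread 139461610944256 has waited at trx0trx.cc line 881 "
       "for 606.00 seconds the semaphore:\n")
longest = max(parse_waits(log), key=lambda w: w[2])
print(longest)  # the 606-second wait on trx0trx.cc line 881
```

Comparing the longest wait against innodb_fatal_semaphore_wait_threshold shows how close the server is to killing itself.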
|
|
update: it partially unfroze for a moment and let some commands through at 14h51.
"SHOW ENGINE INNODB STATUS;" worked after waiting for more than an hour.
Here is the output (censored):
showenginecensored.txt
In the mariadb data folder, the file ib_logfile0 has been updated at 14h51; no other file's modification time has changed since 13h23.
The process list still shows a lot of requests stuck since the beginning of the freeze. I don't know why some requests worked.
edit: the mariadb log shows some aborted sorts; those were the requests in "Creating sort index" that I killed:
2021-09-09 14:51:41 63364 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 63364, query: SELECT ***
2021-09-09 14:51:41 58831 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58831, query: SELECT ***
2021-09-09 14:51:41 55718 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 55718, query: SELECT ***
2021-09-09 14:51:41 58784 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58784, query: SELECT ***
2021-09-09 14:51:41 45088 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 45088, query: SELECT ***
2021-09-09 14:51:41 58798 [Warning] Sort aborted, host: localhost, user: sitewordpress, thread: 58798, query: SELECT ***
Not all of the requests that I killed earlier are gone; I still see some of them with the "Killed" command.
|
|
I completely cleared the mariadb server (removed the mysql data folder and recreated it from zero) and am now importing from mysqldump .sql files, and it still freezes:
Id User Host db Command Time State Info Progress
|
12407 root localhost db Query 535 Update INSERT INTO `import` VALUES (4339807,1992787),(4339807,71207374),(4339808,27759 0.000
|
22303 root localhost sitewordpress Query 371 creating table CREATE TABLE `wbk_appointments` (\n `id` int(10) unsigned NOT NULL AUTO_INCREMENT,\n `name` varchar( 0.000
|
22332 root localhost NULL Killed 347 Filling schema table SHOW STATUS 0.000
|
22493 root localhost NULL Killed 187 Filling schema table show global status like '%dirty%' 0.000
|
22568 root localhost bdd_principale Sleep 114 NULL 0.000
|
22684 root localhost NULL Query 0 starting show processlist 0.000
|
gdb output: stillstuckmariadb.txt
It unstuck itself after 15 minutes. The server was doing nothing else and was not overloaded.
|
|
It froze again; one simple INSERT is stuck again. But this time, "SHOW ENGINE INNODB STATUS;" and "SHOW STATUS" work:
Id User Host db Command Time State Info Progress
12407 root localhost bdd_principale Query 4466 Update INSERT INTO `**` VALUES (2510****RE 0.000
23056 root localhost bdd_principale Query 5802 Waiting for table metadata lock SELECT eff*****, 0.000
25326 root localhost NULL Sleep 50 NULL 0.000
29037 root localhost NULL Query 0 starting show processlist 0.000
The second query is waiting for a table metadata lock because it is on the same table. The INSERT has been stuck for more than an hour.
2021-09-13 18:16:57 0x7fe7d01f6700 INNODB MONITOR OUTPUT
|
=====================================
|
Per second averages calculated from the last 48 seconds
|
-----------------
|
BACKGROUND THREAD
|
-----------------
|
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 17624 srv_idle
|
srv_master_thread log flush and writes: 17624
|
----------
|
SEMAPHORES
|
----------
|
------------
|
TRANSACTIONS
|
------------
|
Trx id counter 90242
|
Purge done for trx's n:o < 70633 undo n:o < 0 state: running
|
History list length 9692
|
LIST OF TRANSACTIONS FOR EACH SESSION:
|
---TRANSACTION 90241, ACTIVE 4416 sec inserting
|
mysql tables in use 1, locked 1
|
1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 342
|
MariaDB thread id 12407, OS thread handle 140633604970240, query id 83758 localhost root Update
|
INSERT INTO `****` VALUES (2510******
|
--------
|
FILE I/O
|
--------
|
Pending flushes (fsync) log: 0; buffer pool: 0
|
206 OS file reads, 1454819 OS file writes, 89726 OS fsyncs
|
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
|
-------------------------------------
|
INSERT BUFFER AND ADAPTIVE HASH INDEX
|
-------------------------------------
|
Ibuf: size 1, free list len 0, seg size 2, 0 merges
|
merged operations:
|
insert 0, delete mark 0, delete 0
|
discarded operations:
|
insert 0, delete mark 0, delete 0
|
0.00 hash searches/s, 0.00 non-hash searches/s
|
---
|
LOG
|
---
|
Log sequence number 60340282849
|
Log flushed up to 60340282849
|
Pages flushed up to 28160947137
|
Last checkpoint at 28160947137
|
0 pending log flushes, 0 pending chkp writes
|
50192 log i/o's done, 0.00 log i/o's/second
|
----------------------
|
BUFFER POOL AND MEMORY
|
----------------------
|
Total large memory allocated 161094828032
|
Dictionary memory allocated 956706280
|
Buffer pool size 9716400
|
Free buffers 6346865
|
Database pages 3369535
|
Old database pages 1243812
|
Modified db pages 2030057
|
Percent of dirty pages(LRU & free pages): 20.893
|
Max dirty pages percent: 5.000
|
Pending reads 0
|
Pending writes: LRU 0, flush list 1025
|
Pages made young 33628, not young 77
|
0.00 youngs/s, 0.00 non-youngs/s
|
Pages read 187, created 3498534, written 1403598
|
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
|
No buffer pool page gets since the last printout
|
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
|
LRU len: 3369535, unzip_LRU len: 0
|
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
|
--------------
|
ROW OPERATIONS
|
--------------
|
0 read views open inside InnoDB
|
Process ID=0, Main thread ID=0, state: sleeping
|
Number of rows inserted 127004126, updated 0, deleted 0, read 0
|
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
|
Number of system rows inserted 0, updated 0, deleted 0, read 0
|
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
|
----------------------------
|
END OF INNODB MONITOR OUTPUT
|
Output of SHOW STATUS:
show status.txt
gdb output:
stillstuckmariadb3
Marko, do you need more info?
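One detail in the INNODB MONITOR output above can be quantified: the LOG section shows "Log flushed up to 60340282849" but "Last checkpoint at 28160947137". The difference is the checkpoint age, i.e. the amount of redo log that crash recovery would have to process, and a gap this large is plausibly what is driving the heavy flushing (note the 1025 pending flush-list writes and the dirty-page percentage above its 5.000 limit). A quick back-of-the-envelope check:

```python
# Checkpoint age from the INNODB STATUS numbers quoted above.
lsn = 60_340_282_849         # "Log flushed up to"
checkpoint = 28_160_947_137  # "Last checkpoint at"
age_bytes = lsn - checkpoint
print(f"checkpoint age: {age_bytes} bytes (~{age_bytes / 2**30:.1f} GiB)")
```

That is roughly 30 GiB of un-checkpointed redo against the configured innodb_log_file_size of 37G, so the server would be under pressure to flush a large backlog of dirty pages; whether that fully explains the stall is a hypothesis, not a confirmed diagnosis.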
|
|
martina342, to which version did you upgrade? Unfortunately, stillstuckmariadb3 is showing ?? for all InnoDB functions. I do not think that a metadata lock wait counts as a server hang. You can try setting lock_wait_timeout to a smaller value if you expect frequent conflicts with DDL operations or statements like FLUSH TABLES WITH READ LOCK.
|
|
It is version 10.6.4-MariaDB-1:10.6.4+maria~bullseye, mariadb.org binary distribution.
Did you see that I got a 15-minute freeze, during which any request like "SHOW STATUS" froze in "Filling schema table"?
"stillstuckmariadb3 is showing ?? for all InnoDB functions."
What do you recommend to make it show the function names instead of "??"?
Modifying the variable "lock_wait_timeout" would only be a temporary workaround, right?
I don't know if that was about "Waiting for table metadata lock", which is not important here, as it is probably the dump locking the table.
My problem is not that, but the INSERT that was stuck (I killed it after more than 2 hours) while other imports into the same table took much less time, and mariadb was not doing anything.
|
|
I started over again by removing the mariadb data folder and reimporting the mysqldump backups.
This time it froze earlier, on another table:
Id User Host db Command Time State Info Progress
|
3793 root localhost bdd_principale Query 15033 Update INSERT INTO `***` VALUES ('***** 0.000
|
16167 root localhost ***** Query 37960 Sending data SELECT /*!40001 SQL_NO_CACHE */ `id`,***** 0.000
|
44841 root localhost NULL Sleep 9064 NULL 0.000
|
53986 root localhost NULL Query 0 starting show processlist 0.000
|
The query "SELECT /*!40001 SQL_NO_CACHE" that kicked in is the automatic backup, but it is stuck too.
For the table it is stuck on, the frm file is about 100 MB, but it got stuck at 68 MB.
I ran mariadb using perf record. It generated an 8 GB perf.data file.
Here is the beginning of the output of "perf report":
Samples: 217M of event 'cycles', Event count (approx.): 156165128856310
|
Overhead Command Shared Object Symbol
|
65.66% mariadbd mariadbd [.] 0x0000000000de75e3 â—†
|
30.53% mariadbd mariadbd [.] 0x0000000000de75e7 â–’
|
0.91% mariadbd mariadbd [.] 0x0000000000de75ee â–’
|
0.28% mariadbd mariadbd [.] MYSQLparse â–’
|
0.10% mariadbd mariadbd [.] 0x0000000000de75de â–’
|
0.08% mariadbd mariadbd [.] Lex_input_stream::get_text â–’
|
0.07% mariadbd mariadbd [.] 0x0000000000d1c195 â–’
|
0.06% mariadbd mariadbd [.] Lex_input_stream::lex_one_token â–’
|
0.04% mariadbd mariadbd [.] my_numchars_mb â–’
|
0.03% mariadbd [kernel.kallsyms] [k] asm_sysvec_apic_timer_interrupt â–’
|
0.03% mariadbd libc-2.31.so [.] __memmove_avx_unaligned_erms â–’
|
0.03% mariadbd mariadbd [.] 0x0000000000ddb73e â–’
|
0.03% mariadbd [kernel.kallsyms] [k] ktime_get_update_offsets_now â–’
|
0.02% mariadbd mariadbd [.] fill_record â–’
|
0.02% mariadbd mariadbd [.] 0x0000000000f24360 â–’
|
0.02% mariadbd mariadbd [.] 0x0000000000ddb707 â–’
|
0.02% mariadbd mariadbd [.] setup_fields â–’
|
0.02% mariadbd [kernel.kallsyms] [k] psi_group_change â–’
|
0.02% mariadbd mariadbd [.] 0x0000000000de75eb â–’
|
0.02% mariadbd mariadbd [.] mysql_insert â–’
|
0.02% mariadbd [kernel.kallsyms] [k] native_write_msr â–’
|
0.02% mariadbd libc-2.31.so [.] _int_malloc â–’
|
0.02% mariadbd [kernel.kallsyms] [k] irq_enter_rcu â–’
|
0.02% mariadbd mariadbd [.] alloc_root â–’
|
0.02% mariadbd libc-2.31.so [.] __memcmp_avx2_movbe â–’
|
0.02% mariadbd [kernel.kallsyms] [k] acct_account_cputime â–’
|
0.02% mariadbd mariadbd [.] Query_arena::free_items â–’
|
0.02% mariadbd mariadbd [.] 0x0000000000cf8c70 â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000d1c57b â–’
|
0.01% mariadbd mariadbd [.] get_charset â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000ddb784 â–’
|
0.01% mariadbd [kernel.kallsyms] [k] hrtimer_interrupt â–’
|
0.01% mariadbd mariadbd [.] Lex_input_stream::unescape â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000f29083 â–’
|
0.01% mariadbd [kernel.kallsyms] [k] cpufreq_this_cpu_can_update â–’
|
0.01% mariadbd [kernel.kallsyms] [k] pvclock_gtod_notify â–’
|
0.01% mariadbd libc-2.31.so [.] _int_free â–’
|
0.01% mariadbd [kernel.kallsyms] [k] _raw_spin_lock_irqsave â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000f29067 â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000f2904b â–’
|
0.01% mariadbd [kernel.kallsyms] [k] __update_load_avg_se â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000ddb7d4 â–’
|
0.01% mariadbd [kernel.kallsyms] [k] perf_event_task_tick â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000f24363 â–’
|
0.01% mariadbd mariadbd [.] THD::make_string_literal â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000d1c573 â–’
|
0.01% mariadbd mariadbd [.] 0x0000000000d1b13a â–’
|
0.01% mariadbd [kernel.kallsyms] [k] native_read_msr â–’
|
0.01% mariadbd [kernel.kallsyms] [k] account_user_time â–’
|
0.01% mariadbd [kernel.kallsyms] [k] update_curr â–’
|
0.01% mariadbd mariadbd [.] my_convert â–’
|
0.01% mariadbd [kernel.kallsyms] [k] __sysvec_apic_timer_interrupt
|
I suppose I should compile a debug version of mariadb to show real names instead of "??" or "0x..."?
|
|
I compiled a debug version to test, but it imports about 20 times slower. It will take a week to import.
|
|
martina342, I just realized that stillstuckmariadb3 is missing the function parameters as well. This would suggest that separately available "dbgsym" packages were not installed.
I do not know why the non-stripped executable would be missing InnoDB symbols. That is not a new problem, and it seems to be affecting every system except Microsoft Windows. I suspect that it has something to do with the way how the intermediate target libinnobase.a is being built and linked with the final executables, but I have not investigated it. I think that the numeric addresses from perf could be due to the same problem.
I think that you will have to install additional packages, with names something like the following:
mariadb-server-10.6-dbgsym_10.6.4+maria~bullseye_amd64.deb
mariadb-server-core-10.6-dbgsym_10.6.4+maria~bullseye_amd64.deb
After installing those, we should see fully resolved stack traces, with function parameter values as well.
Note: The debugging information for GDB and similar tools is independent of the additional runtime checks that would be enabled in special debug builds. I do not think that it would be useful to track this problem using a debug build, because the timing characteristics could become completely different.
|
|
martina342, thanks for helping out, marko. I have been getting ready to release the functionality below as a service; it is still really early days, so let me know about any problems.
This uses a build of 10.6, exactly what appears in a release, with the debug symbols installed in the container. It is currently from about 4 days ago (ref: https://quay.io/repository/mariadb-foundation/mariadb-debug?tab=tags).
I usually use podman, but docker works as well, with the same arguments on the command line.
Steps:
Copy your cnf file, without datadir, tmpdir or log_error, to a file in a directory my.cnf.d.
Run the latest 10.6 version with the same configuration on your current data (or on a copy if you want).
|
run image with debug info
|
docker run --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined -v ./my.cnf.d:/etc/mysql/conf.d:z -v /home/mysql:/var/lib/mysql -p 127.0.0.1:3306:3306 -d --name mariadb quay.io/mariadb-foundation/mariadb-debug:10.6-amd64
|
Note: seccomp is docker only.
|
install gdb into container
|
docker exec -ti mariadb sh -c "apt-get update && apt-get install -y gdb"
|
When you are at a point and want to take a snapshot:
|
take snapshot
|
docker exec -ti mariadb gosu mysql gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" --eval-command="print dict_sys.latch" --eval-command="print dict_sys.mutex" -p 1 | tee mariadb_bt_all_threads.txt
|
Test this now, before applying the workload, to check that permissions have been relaxed enough to examine the workload. Look at the logs of the container with:
|
container logs
|
docker logs mariadb
|
Apply workload
To include some metadata like which exact build it was generated from in the same file:
|
capture metadata
|
docker exec -ti mariadb cat /manifest.txt | tee -a mariadb_bt_all_threads.txt
|
In my test with Marko, mariadb_bt_all_threads.txt was enough of a backtrace.
|
|
Thanks, both of you.
Before your message, Daniel, I had already started over again, but with the package mariadb-server-10.6-dbgsym installed, so I let it run until it froze instead of stopping it.
So it looks like it is blocked again:
Id User Host db Command Time State Info Progress
|
2717 root localhost bdd_principale Query 3625 Update INSERT INTO `i***********998 0.000
|
13685 root localhost bdd_principale Sleep 30 NULL 0.000
|
13717 root localhost NULL Query 0 starting show processlist 0.000
|
I ran this :
gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "thread apply all bt full" --batch -p $(pidof mariadbd)
Here is the output :
hopeenoughdata.txt
And this command :
gdb -ex "set pagination 0" -ex "set print frame-arguments all" -ex "print dict_sys.latch" --batch -p $(pidof mariadbd)
[New LWP 2201079]
[New LWP 2201080]
[New LWP 2201092]
[New LWP 2201098]
[New LWP 2201100]
[New LWP 2201104]
[New LWP 2201136]
[New LWP 2202646]
[New LWP 2209615]
[New LWP 2212111]
[New LWP 2228121]
[New LWP 2229924]
[New LWP 2253944]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f2c857913ff in __GI___poll (fds=0x560ff3d83bf8, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
No symbol "dict_sys" in current context.
[Inferior 1 (process 2201077) detached]
And here is the perf record output:
40.05% mariadbd mariadbd [.] 0x0000000000de75e3
|
15.19% mariadbd mariadbd [.] 0x0000000000de75e7
|
8.74% mariadbd mariadbd [.] MYSQLparse
|
1.80% mariadbd mariadbd [.] Lex_input_stream::lex_one_token
|
1.71% mariadbd mariadbd [.] 0x0000000000d1c195
|
1.46% mariadbd mariadbd [.] Lex_input_stream::get_text
|
0.55% mariadbd mariadbd [.] setup_fields
|
0.54% mariadbd mariadbd [.] 0x0000000000ddb7d4
|
0.53% mariadbd mariadbd [.] my_numchars_mb
|
0.48% mariadbd mariadbd [.] fill_record
|
0.48% mariadbd mariadbd [.] Query_arena::free_items
|
0.45% mariadbd libc-2.31.so [.] __memmove_avx_unaligned_erms
|
0.44% mariadbd libc-2.31.so [.] __memcmp_avx2_movbe
|
0.42% mariadbd mariadbd [.] alloc_root
|
0.38% mariadbd mariadbd [.] 0x0000000000de75ee
|
0.37% mariadbd mariadbd [.] 0x0000000000cf8c70
|
0.37% mariadbd libc-2.31.so [.] _int_malloc
|
0.35% mariadbd mariadbd [.] THD::make_string_literal
|
0.34% mariadbd mariadbd [.] 0x0000000000ddb707
|
0.33% mariadbd mariadbd [.] 0x0000000000ddb73e
|
0.33% mariadbd mariadbd [.] Item::Item
|
0.31% mariadbd libc-2.31.so [.] _int_free
|
0.30% mariadbd mariadbd [.] 0x0000000000ddb844
|
0.28% mariadbd mariadbd [.] get_charset
|
0.27% mariadbd mariadbd [.] Lex_input_stream::add_digest_token
|
0.27% mariadbd mariadbd [.] 0x0000000000ddbb81
|
0.26% mariadbd mariadbd [.] 0x0000000000d1c57b
|
0.22% mariadbd mariadbd [.] 0x0000000000ddb784
|
0.22% mariadbd mariadbd [.] Lex_input_stream::lex_token
|
0.21% mariadbd mariadbd [.] 0x0000000000f24360
|
0.21% mariadbd mariadbd [.] 0x0000000000d1c573
|
0.21% mariadbd mariadbd [.] mysql_insert
|
0.20% mariadbd mariadbd [.] 0x0000000000d1b1ce
|
0.18% mariadbd mariadbd [.] 0x0000000000cd9192
|
0.18% mariadbd mariadbd [.] 0x0000000000d1b13a
|
0.18% mariadbd mariadbd [.] 0x0000000000f2904b
|
0.18% mariadbd mariadbd [.] 0x0000000000f29083
|
0.18% mariadbd mariadbd [.] 0x0000000000d1b1ef
|
0.17% mariadbd mariadbd [.] 0x0000000000f29067
|
0.17% mariadbd mariadbd [.] Item::set_name
|
0.16% mariadbd mariadbd [.] Item::cleanup
|
0.16% mariadbd libc-2.31.so [.] malloc
|
0.16% mariadbd mariadbd [.] Field_blob::store
|
0.15% mariadbd mariadbd [.] 0x0000000000f24363
|
0.14% mariadbd mariadbd [.] 0x0000000000ddb8ff
|
0.14% mariadbd mariadbd [.] 0x0000000000d1c259
|
0.14% mariadbd libc-2.31.so [.] __memset_avx2_unaligned_erms
|
0.14% mariadbd mariadbd [.] Item_string::save_in_field
|
0.13% mariadbd mariadbd [.] 0x0000000000ddb8f9
|
0.13% mariadbd mariadbd [.] my_convert
|
0.13% mariadbd mariadbd [.] 0x0000000000dc55cb
|
0.12% mariadbd libc-2.31.so [.] __memmove_avx_unaligned
|
|
I will follow the steps for the docker test.
edit: I got an error with the first command:
|
docker run --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined -v ./my.cnf.d:/etc/mysql/conf.d:z -v /home/mysql:/var/lib/mysql -p 127.0.0.1:3306:3306 -d --name mariadb quay.io/mariadb-foundation/mariadb-debug:10.6-amd64
|
Unable to find image 'quay.io/mariadb-foundation/mariadb-debug:10.6-amd64' locally
|
10.6-amd64: Pulling from mariadb-foundation/mariadb-debug
|
5297cdb712ed: Pull complete
|
834628376a35: Pull complete
|
1a4e5185f03f: Pull complete
|
Digest: sha256:d86994bf9f22c7bd097445a0d232835ac15e5847acab190f7602501437582fca
|
Status: Downloaded newer image for quay.io/mariadb-foundation/mariadb-debug:10.6-amd64
|
docker: Error response from daemon: create ./my.cnf.d: "./my.cnf.d" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
|
See 'docker run --help'.
|
The folder does exist:
|
|
martina342, it looks like the interesting command thread apply all backtrace was not being executed at all. It is not promising either that the symbol dict_sys.latch was not recognized by the debugger. Hopefully it was just because the current stack frame was in a shared library and not the main mariadbd executable. So, that GDB invocation did not get us any further yet.
I do not know if perf report is supposed to be helped by the dbgsym packages.
If this involves Docker, I understood from earlier discussions with danblack that some extra effort may be needed to make GDB work. He is our Docker expert.
I am sorry that this is so tricky, but for hangs or some forms of performance trouble there really is no substitute for having stack traces of all threads. For some hangs, we would want to see several samples to distinguish genuine deadlocks from mere slowness.
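The "several samples" idea can be sketched concretely: take two backtrace snapshots some time apart, and report the threads whose call chains did not move; those are the candidates for a genuine deadlock rather than mere slowness. Everything below is illustrative: `unchanged` is a made-up helper, and the thread maps are hand-written stand-ins for parsed gdb output (thread id mapped to its call chain).

```python
# Compare two stack-sample snapshots; threads with identical call chains
# in both samples have made no visible progress between samples.
def unchanged(stacks_a, stacks_b):
    return sorted(t for t, frames in stacks_a.items()
                  if stacks_b.get(t) == frames)

stacks_a = {9: ("poll", "handle_connections"),
            13: ("futex_wait", "purge_coordinator")}
stacks_b = {9: ("recv", "handle_connections"),
            13: ("futex_wait", "purge_coordinator")}
print(unchanged(stacks_a, stacks_b))  # thread 13 never moved
```

A thread parked in the same wait across samples taken minutes apart is worth inspecting first; a thread whose top frame changes is merely slow, not deadlocked.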
|
|
Sorry, I overlooked the attachment hopeenoughdata.txt again. It seems that we are still missing function call parameter names and values and local variables for most stack frames. I only see them for some library functions, such as the pthread functions.
There is no occurrence of innobase in the output, but there are ?? function names, some of which must belong to InnoDB threads. One is Thread 9 apparently executing an INSERT statement. One of Thread 4 or Thread 3 should be the buf_flush_page_cleaner(), but I cannot know which one, and I have no idea what the other one could be. Maybe it’d be an Aria checkpointer thread, because there are no Aria symbols either. Thread 13 could be running some InnoDB code. We would really need to know what those threads are doing.
One thing that you might try is to set innodb_stats_persistent=OFF or innodb_stats_auto_recalc=OFF. If this hang is related to the automatic recalculation of persistent statistics (MDEV-15020), it should be fixed in the upcoming 10.6.5 release. But, I’d rather expect Thread 13 to run the InnoDB purge coordinator task than the statistics updates.
|
|
I replaced ./my.cnf.d with the full path to fix the error I mentioned.
I also had problems accessing mariadb on the server from docker, as only 127.0.0.1 was allowed, so I had to adjust that.
Here is the log file using docker, without any workload: mariadb_bt_all_threads.txt
without workload :
docker logs mariadb
|
2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.
|
2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
|
2021-09-15 19:56:14+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.
|
Warning: skipping '!includedir /etc/mysql/conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 2
|
Warning: skipping '!includedir /etc/mysql/mariadb.conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 3
|
2021-09-15 19:56:15 0 [Note] mariadbd (server 10.6.5-MariaDB-1:10.6.5+maria~focal as 10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4) starting as process 1 ...
|
2021-09-15 19:56:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
|
2021-09-15 19:56:15 0 [Note] InnoDB: Number of pools: 1
|
2021-09-15 19:56:15 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
|
2021-09-15 19:56:15 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
|
2021-09-15 19:56:15 0 [Note] InnoDB: Using Linux native AIO
|
2021-09-15 19:56:15 0 [Note] InnoDB: Initializing buffer pool, total size = 161061273600, chunk size = 134217728
|
2021-09-15 19:56:15 0 [Note] InnoDB: Completed initialization of buffer pool
|
2021-09-15 19:56:15 0 [Note] InnoDB: Opened 2 undo tablespaces
|
2021-09-15 19:56:15 0 [Note] InnoDB: Opened 2 undo tablespaces
|
2021-09-15 19:56:15 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active.
|
2021-09-15 19:56:15 0 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2021-09-15 19:56:15 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2021-09-15 19:56:15 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
|
2021-09-15 19:56:15 0 [Note] InnoDB: 10.6.5 started; log sequence number 42906; transaction id 14
|
2021-09-15 19:56:15 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
|
2021-09-15 19:56:15 0 [Note] Plugin 'FEEDBACK' is disabled.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 19:56:15 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
|
2021-09-15 19:56:15 0 [Note] Server socket created on IP: '0.0.0.0'.
|
2021-09-15 19:56:15 0 [Note] Server socket created on IP: '::'.
|
2021-09-15 19:56:15 0 [Note] InnoDB: Buffer pool(s) load completed at 210915 19:56:15
|
2021-09-15 19:56:15 0 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode.
|
2021-09-15 19:56:15 0 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode.
|
2021-09-15 19:56:15 0 [Note] mariadbd: ready for connections.
|
Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
|
|
I am currently importing data in the container to generate logs under workload, but if the bug is indeed the one fixed in 10.6.5, it should not reproduce anymore.
|
|
Hello,
the current import from mysqldump with the docker version of mariadb (10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4, mariadb.org binary distribution) seems to work without a complete freeze.
Almost all inserts take less than a second. I have a QPS of 5.8, and I am doing a SHOW PROCESSLIST every second.
However, at some point I noticed an insert that took more than 15 minutes. I took a stack trace during it (I censored some information). For the table being inserted into, the ibd file is "only" 332 MB; not all inserts into this table took this long, and even on this table it usually took less than a second:
mariadb_bt_all_threads_workload_slowed_down.txt
Here are the mariadb logs:
2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.
|
2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
|
2021-09-15 20:09:05+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.6.5+maria~focal started.
|
Warning: skipping '!includedir /etc/mysql/conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 2
|
Warning: skipping '!includedir /etc/mysql/mariadb.conf.d/' directive as maximum includerecursion level was reached in file /etc/mysql/conf.d/my.cnf at line 3
|
2021-09-15 20:09:06 0 [Note] mariadbd (server 10.6.5-MariaDB-1:10.6.5+maria~focal as 10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4) starting as process 1 ...
|
2021-09-15 20:09:06 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
|
2021-09-15 20:09:06 0 [Note] InnoDB: Number of pools: 1
|
2021-09-15 20:09:06 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
|
2021-09-15 20:09:06 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
|
2021-09-15 20:09:06 0 [Note] InnoDB: Using Linux native AIO
|
2021-09-15 20:09:06 0 [Note] InnoDB: Initializing buffer pool, total size = 161061273600, chunk size = 134217728
|
2021-09-15 20:09:06 0 [Note] InnoDB: Completed initialization of buffer pool
|
2021-09-15 20:09:06 0 [Note] InnoDB: Opened 2 undo tablespaces
|
2021-09-15 20:09:06 0 [Note] InnoDB: Opened 2 undo tablespaces
|
2021-09-15 20:09:06 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active.
|
2021-09-15 20:09:06 0 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2021-09-15 20:09:06 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2021-09-15 20:09:06 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
|
2021-09-15 20:09:06 0 [Note] InnoDB: 10.6.5 started; log sequence number 42918; transaction id 14
|
2021-09-15 20:09:06 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
|
2021-09-15 20:09:06 0 [Note] Plugin 'FEEDBACK' is disabled.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
|
2021-09-15 20:09:06 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
|
2021-09-15 20:09:06 0 [Note] InnoDB: Buffer pool(s) load completed at 210915 20:09:06
|
2021-09-15 20:09:06 0 [Note] Server socket created on IP: '0.0.0.0'.
|
2021-09-15 20:09:06 0 [Note] Server socket created on IP: '::'.
|
2021-09-15 20:09:06 0 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode.
|
2021-09-15 20:09:06 0 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode.
|
2021-09-15 20:09:06 0 [Note] mariadbd: ready for connections.
|
Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
|
2021-09-15 20:09:39 3 [Warning] Aborted connection 3 to db: 'unconnected' user: 'unauthenticated' host: '172.17.0.1' (This connection closed normally without authentication)
|
2021-09-15 20:19:07 21 [Warning] Aborted connection 21 to db: 'unconnected' user: 'unauthenticated' host: '172.17.0.1' (This connection closed normally without authentication)
|
2021-09-15 20:22:39 24 [Warning] 'user' entry '@ns3' ignored in --skip-name-resolve mode.
|
2021-09-15 20:22:39 24 [Warning] 'proxies_priv' entry '@% root@ns3' ignored in --skip-name-resolve mode.
|
2021-09-15 20:23:33 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:23:53 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:23:54 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:24:02 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:24:38 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:24:38 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:24:39 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:24:39 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:25:16 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:25:16 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:25:17 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:25:51 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:26:30 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:26:30 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:26:31 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:26:31 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:27:01 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:31:02 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:31:02 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:31:02 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:31:27 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:34:46 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:34:48 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:35:30 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:35:31 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:35:31 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:36:00 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:36:01 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:36:01 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:36:09 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:42:19 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:42:19 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:42:20 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:42:20 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:42:51 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:42:51 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:42:51 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:42:51 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:43:25 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:45:03 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:45:04 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:48:17 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:48:43 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:48:43 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:48:43 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:48:57 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:49:25 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:49:25 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:49:25 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:49:25 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:49:53 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:49:54 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:49:54 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:49:55 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:50:23 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:52:35 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:52:36 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:53:00 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:55:56 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:55:56 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:56:32 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:56:33 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:56:33 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:56:33 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:57:07 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:57:08 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:57:08 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:57:08 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:58:48 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 20:58:48 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 20:58:49 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 20:58:49 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 20:59:18 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:00:16 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:00:17 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:00:17 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:00:46 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:00:46 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:00:47 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:00:47 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:01:17 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:01:17 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:01:18 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:01:18 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:01:46 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:01:46 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:01:47 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:01:47 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:02:15 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:08:44 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:08:45 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:08:59 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:08:59 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:12:09 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:12:36 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:28:04 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:28:05 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:30:25 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:30:25 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:34:08 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:34:34 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:34:34 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:34:35 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:34:35 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:35:07 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:35:07 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:35:08 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:35:08 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:35:38 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:35:38 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:35:39 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:35:39 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:36:08 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:36:08 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:36:09 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:36:09 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:36:38 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:36:40 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:36:40 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:36:40 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:37:10 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:37:11 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:37:11 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:37:12 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:37:42 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:37:43 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:37:43 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:37:44 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:38:16 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:38:16 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:38:16 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:38:17 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:38:46 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:38:46 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:38:46 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:38:47 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:39:21 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:39:22 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:39:22 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:39:22 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:39:59 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:39:59 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:40:00 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:40:00 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:40:37 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:40:37 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:40:38 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:40:38 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:41:14 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:41:14 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:41:15 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:41:15 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:41:51 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:41:52 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:41:52 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:41:52 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:42:26 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:42:27 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:42:27 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:42:27 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:43:03 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:43:03 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:43:04 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:43:04 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:43:40 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:43:40 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:43:41 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:43:59 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:44:34 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:44:34 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 21:44:35 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:44:35 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:45:13 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 21:45:13 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 21:45:13 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 21:45:14 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 23:00:57 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 23:01:01 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 23:01:02 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 23:01:02 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 23:01:41 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 23:35:25 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 23:35:26 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 23:37:30 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 23:38:05 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-15 23:45:39 0 [Note] InnoDB: Truncated .//undo001
|
2021-09-15 23:45:40 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 23:45:40 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 23:46:19 0 [Note] InnoDB: Truncating .//undo002
|
2021-09-15 23:51:57 0 [Note] InnoDB: Truncated .//undo002
|
2021-09-15 23:51:58 0 [Note] InnoDB: Truncating .//undo001
|
2021-09-16 0:02:12 0 [Note] InnoDB: Truncated .//undo001
|
|
Notice how often it truncates the undo logs; most of the time it is instantaneous, but sometimes it takes much longer:
2021-09-15 21:08:59 0 [Note] InnoDB: Truncating .//undo002
2021-09-15 21:12:09 0 [Note] InnoDB: Truncated .//undo002
2021-09-15 21:12:36 0 [Note] InnoDB: Truncating .//undo002
2021-09-15 21:28:04 0 [Note] InnoDB: Truncated .//undo002
This matches the moment I took the stack trace.
So if everything continues to behave the same way, it's only randomly stuck for some minutes, instead of blocking completely.
So I think there is still something to check. I guess I need to compile MariaDB myself until the next version is released? Do you have an ETA for it?
|
|
In mariadb_bt_all_threads_workload_slowed_down.txt I see that undo tablespace truncation is being executed. (Please be aware of MDEV-26450.) There previously was a 10.5 performance regression about that (MDEV-25801), but this is the fixed code.
I also see that the INSERT is waiting for buffer pool LRU eviction. Based on this single snapshot during the workload, I would guess that the data load is being slowed down by ‘eviction flushing’ (writing out least recently used pages so that they can be evicted, to make room for new pages). If it is not possible to make the buffer pool or the redo log bigger, there is nothing that we can do about this. But, let’s wait for the stack trace for the actual hang, to see if there is some other problem. In this stack trace dump, everything still looks normal to me.
How big are the buffer pool and the redo log, and how big would the final table be? Also, what type of storage are you using?
|
|
martina342, I checked the new attachment and composed my message before you posted the update.
It looks like the problem is related to undo tablespace truncation. I think that we must test its performance as part of fixing MDEV-26450. Until that bug has been fully analyzed and fixed, I would not recommend enabling undo tablespace truncation. It is possible that this correctness bug affects all MariaDB releases since 10.2.
I do not have an estimate when we could fix that, and in my opinion it is not feasible to tweak the performance before the code has been confirmed to be correct. It is covered by a simple kill+restart test in our regression test suite, but that test never fails.
I try to treat any corruption bugs with high priority. This one might miss the next round of point releases. We aim to have releases about 4 times per year.
|
|
These are the settings:
wait_timeout = 43200
|
max_allowed_packet = 64M
|
query_cache_size = 0
|
query_cache_type = 0
|
innodb_buffer_pool_size = 150G
|
innodb_log_file_size = 37G
|
innodb_buffer_pool_instances = 64
|
innodb_undo_log_truncate = ON
|
skip-name-resolve = 1
|
tmp_table_size = 256M
|
max_heap_table_size = 256M
|
max_connections = 400
|
net_read_timeout = 250600
|
net_write_timeout = 250600
|
join_buffer_size = 1MB
|
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
|
innodb-defragment = 1
|
innodb_online_alter_log_max_size= 50000000000
|
table_definition_cache = 1000
|
innodb_undo_tablespaces = 2
|
table_open_cache = 30000
|
innodb_fatal_semaphore_wait_threshold = 2000
|
|
[mariadb]
|
core_file
|
I can't make innodb_buffer_pool_size larger (it's already 150G); I have innodb_log_file_size at 37G, which was suggested by mysqltuner.
The table that was being inserted into during the dump is only 317 MB.
I'm using only InnoDB tables.
https://jira.mariadb.org/browse/MDEV-26450 : this is what happened to me on version 10.5, when the semaphore wait was killing the server and forced me to restore backups. But it also happened on 10.6 during a normal shutdown. So I will disable undo truncation until it's fixed.
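For reference, a sketch of how undo tablespace truncation can be disabled both at runtime and across restarts (innodb_undo_log_truncate is a dynamic variable; the client invocation assumes sufficient privileges):

```shell
# Disable undo tablespace truncation for the running server ...
mariadb -e "SET GLOBAL innodb_undo_log_truncate = OFF;"

# ... and persist it across restarts in the server config:
# [mariadb]
# innodb_undo_log_truncate = OFF
```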
|
|
The slowness in the container may be a result of Docker's default storage driver, overlayfs, as a guess. It's good enough for a few package updates, but the datadir should be in its own volume: "docker volume create mariadbdata", then use that volume with "-v mariadbdata:/var/lib/mysql" when the container is started. This gives a direct binding of a directory, without the complexities of userspace filesystems.
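A sketch of the suggested setup (the container name, image tag, and root password below are placeholders):

```shell
# Create a named volume so the datadir bypasses the overlayfs layer ...
docker volume create mariadbdata

# ... and mount it over /var/lib/mysql when starting the container.
docker run -d --name mariadb \
  -v mariadbdata:/var/lib/mysql \
  -e MARIADB_ROOT_PASSWORD=example \
  mariadb:10.6
```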
release schedule: https://jira.mariadb.org/projects/MDEV?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page
|
|
martina342, I see. If both the buffer pool and the data were smaller (so that there wasn’t too much LRU page eviction going on), then I might suggest using a larger redo log file size, so that log checkpoints would be executed less often.
I see that you are not configuring innodb_io_capacity. The default value could be way too small for your storage. (Which type is it? SATA HDD? SSD? NVMe? What are the read and write speeds?)
Possibly there is still some room for improvement in our page flushing (MDEV-26055).
In any case, please let me know if the performance is more acceptable when undo tablespace truncation is disabled. That will be helpful when we test the fix of MDEV-26450.
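As an illustration, raising the background I/O rate could look like this (the numbers are hypothetical, not a recommendation; both variables are dynamic, so no restart is needed):

```shell
# Raise the background flushing rate to match fast NVMe storage;
# tune the values to the measured write IOPS of the device.
mariadb -e "SET GLOBAL innodb_io_capacity = 10000;"
mariadb -e "SET GLOBAL innodb_io_capacity_max = 20000;"
```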
|
|
I'm using a software RAID 1 of NVMe drives, which is apparently rated at:
Random Read (4kb): 430 Kiops
Random Write (4kb): 40 Kiops
Seq. Read: 3000 Mb/s
Seq. Write: 1900 Mb/s
I can't run a real-world benchmark right now, as I'm still importing the backup.
Thanks for the hint about innodb_io_capacity; apparently I was bottlenecking my server because of that variable.
Globally the performance is acceptable, except when it seems to freeze like it did before, which happens rarely and so is hard to monitor.
|
|
On 10.6, you could try to adapt your data load for MDEV-515, to avoid unnecessary row-level undo logging when inserting into empty tables. (And insert into each table only from one transaction.)
SET unique_checks=0, foreign_key_checks=0;
|
This is admittedly a hack, to be compatible with mysqldump files. And you’d better be sure that there are no violations of those checks.
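One way to apply this to an existing dump file without editing it (dump.sql and mydb below are placeholder names):

```shell
# Prepend the session settings so the whole import runs with the
# checks disabled; only safe if the dump contains no violations
# of unique or foreign key constraints.
( echo "SET unique_checks=0, foreign_key_checks=0;"; cat dump.sql ) | mariadb mydb
```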
|
|
here is real world benchmark using fio :
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
|
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
|
fio-3.25
|
Starting 1 process
|
test: Laying out IO file (1 file / 4096MiB)
|
Jobs: 1 (f=1): [m(1)][100.0%][r=276MiB/s,w=91.7MiB/s][r=70.6k,w=23.5k IOPS][eta 00m:00s]
|
test: (groupid=0, jobs=1): err= 0: pid=3064140: Fri Sep 17 10:39:57 2021
|
read: IOPS=74.9k, BW=293MiB/s (307MB/s)(3070MiB/10487msec)
|
bw ( KiB/s): min=278040, max=326000, per=100.00%, avg=305779.20, stdev=15118.69, samples=20
|
iops : min=69510, max=81500, avg=76444.80, stdev=3779.67, samples=20
|
write: IOPS=25.0k, BW=97.8MiB/s (103MB/s)(1026MiB/10487msec); 0 zone resets
|
bw ( KiB/s): min=91712, max=108600, per=100.00%, avg=102215.60, stdev=5362.53, samples=20
|
iops : min=22928, max=27148, avg=25553.90, stdev=1340.58, samples=20
|
cpu : usr=11.85%, sys=55.60%, ctx=309103, majf=0, minf=134
|
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
|
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
|
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
|
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
|
latency : target=0, window=0, percentile=100.00%, depth=64
|
|
Run status group 0 (all jobs):
|
READ: bw=293MiB/s (307MB/s), 293MiB/s-293MiB/s (307MB/s-307MB/s), io=3070MiB (3219MB), run=10487-10487msec
|
WRITE: bw=97.8MiB/s (103MB/s), 97.8MiB/s-97.8MiB/s (103MB/s-103MB/s), io=1026MiB (1076MB), run=10487-10487msec
|
|
Disk stats (read/write):
|
md2: ios=779713/260682, merge=0/0, ticks=366080/197908, in_queue=563988, util=99.13%, aggrios=392960/262799, aggrmerge=0/11, aggrticks=185182/181367, aggrin_queue=366550, aggrutil=98.77%
|
nvme1n1: ios=388833/262799, merge=0/11, ticks=187345/181869, in_queue=369215, util=98.69%
|
nvme0n1: ios=397087/262799, merge=0/11, ticks=183019/180865, in_queue=363885, util=98.77%
|
So I guess I could set innodb_io_capacity to at least 15000?
Before disabling undo truncation, I saw another request that took 1 hour instead of being instantaneous.
I've disabled undo truncation, but couldn't stay to monitor whether some other requests were taking too long; the import finished, though, so I suppose it made things work better.
|
|
DAMN
I've disabled undo truncation at runtime ... and in the my.cnf, but it corrupted the data anyway:
Version: '10.6.5-MariaDB-1:10.6.5+maria~focal' as '10.6.5-MariaDB-b873651e3cae854f887c4c4af39206def5992be4' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
|
2021-09-17 09:09:27 0x7f058b00b700 InnoDB: Assertion failure in file /buildbot/amd64-ubuntu-2004-deb-autobake/build/storage/innobase/trx/trx0purge.cc line 123
|
InnoDB: Failing assertion: purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no()
|
InnoDB: We intentionally generate a memory trap.
|
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
|
InnoDB: If you get repeated assertion failures or crashes, even
|
InnoDB: immediately after the mariadbd startup, there may be
|
InnoDB: corruption in the InnoDB tablespace. Please refer to
|
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
|
InnoDB: about forcing recovery.
|
210917 9:09:27 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.6.5-MariaDB-1:10.6.5+maria~focal
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=0
|
max_threads=402
|
thread_count=0
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016262 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x55df3500f828
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f058b00aca8 thread_stack 0x49000
|
mariadbd(my_print_stacktrace+0x32)[0x55df33442c32]
|
Please tell me how to fix that without reimporting everything; my client can't wait any more, and it took 1.5 days to reimport the data.
It was working, and once I restarted MariaDB, it crashed.
Apparently it crashed but doesn't want to stop:
docker exec mariadb ps aux
|
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
mysql 1 4.6 16.3 165041080 43171172 ? Ssl 09:09 2:19 mariadbd
|
mysql 55 0.8 1.5 160747856 4036876 ? S 09:09 0:23 mariadbd
|
root 122 0.0 0.0 10896 4248 ? Ss 09:38 0:00 mysqladmin shutdown
|
root 269 0.0 0.0 5904 2784 ? Rs 09:59 0:00 ps aux
|
|
and the log ends with this, with nothing after it:
2021-09-17 9:09:53 0 [Note] InnoDB: Buffer pool(s) load completed at 210917 9:09:53
|
2021-09-17 9:38:44 0 [Note] mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
|
2021-09-17 9:38:44 0 [Note] InnoDB: FTS optimize thread exiting.
|
|
|
martina342, if the undo logs were already corrupted in some way due to MDEV-26450 (you did not restart from scratch after disabling that), then I am sorry, but some (most?) forms of corruption in the InnoDB system and undo tablespaces are not self-healing or easy to fix.
I do not think that we currently have any option to discard and re-create the undo logs. There is only the option innodb_force_recovery=5 that would ignore all undo logs and treat all reads as READ UNCOMMITTED. Technically, it should not be hard to implement such an option, if the user accepts the resulting inconsistency and the potential corruption of secondary indexes for any partially completed row operations. So far, no support customer has requested that.
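For completeness, the usual salvage path with innodb_force_recovery is a dump-and-reload; a sketch (the file name is a placeholder, and a filesystem-level copy of the datadir should be taken first):

```shell
# Start with all undo logs ignored (reads behave as READ UNCOMMITTED),
# dump what is readable, then re-initialize the instance and reload
# the dump into it.
mariadbd --innodb-force-recovery=5 &
mariadb-dump --all-databases > salvage.sql
```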
|
|
Hello
I started over again, this time with undo truncation disabled from the beginning, and I didn't notice any freeze or corruption using my own compiled 10.6.5.
|
|
martina342, thank you. Meanwhile, I think I nailed down MDEV-26450.
I think that in 10.5 and 10.6, it should be possible to avoid the buf_flush_list_space() loop in trx_purge_truncate_history() and simply let undo tablespace truncation discard any pages that had not been written before the file was shrunk. This is thanks to MDEV-15528. We will handle that as part of this ticket, not as part of the correctness issue, which is MDEV-26450.
|
|
I implemented a fix that avoids writing any pages of the being-truncated undo tablespace. It does not depend on MDEV-15528, but rather on MDEV-25113 and MDEV-15058. A simple 5-minute Sysbench oltp_update_index 8×100,000 rows (on RAM disk) shows the following difference of average throughput, 95th percentile and maximum latency between innodb_undo_tablespaces=0 and innodb_undo_tablespaces=2 innodb_undo_log_truncate=ON:
| connections | baseline/tps | baseline 95%/ms | baseline max/ms | truncate/tps | truncate 95%/ms | truncate max/ms |
| 10  | 95285.41  | 0.19 | 77.87  | 97687.20  | 0.18 | 38.54  |
| 20  | 157405.77 | 0.18 | 195.21 | 159989.50 | 0.19 | 407.60 |
| 40  | 164189.49 | 0.42 | 336.01 | 166069.04 | 0.70 | 333.03 |
| 80  | 163296.38 | 1.32 | 773.72 | 161948.80 | 2.07 | 716.47 |
| 160 | 146380.27 | 5.00 | 161.18 | 143285.16 | 5.88 | 136.79 |
Note: Each test run was on the same 10.5-based server code, just changing the 2 configuration parameters. The latency is increasing starting with 40 concurrent connections. However, based on the server error log, the last time the undo logs were truncated was 14 minutes before the end of the test, which must have been while running at 40 concurrent connections. At the start (10 connections), the undo log was being truncated several times per second. So, the latency increase ought to be due to using 2 dedicated undo tablespaces, not due to undo tablespace truncation. The maximum latency is random by nature, but the average throughput and the 95th percentile latency should be reasonably stable.
|
|
It occurred to me that it is better to not hold any mutex while truncating the file. It suffices to only hold the tablespace latch. That should reduce the latency, by only blocking page allocation operations in the being-truncated tablespace. During the truncation, the undo tablespace should not be accessed at all.
|
|
I conducted another test on the 10.6-based branch, this time using NVMe instead of RAM disk, and 50GiB redo log and buffer pool. Yesterday, on the 10.5-based branch, I think that the buffer pool was 50GiB.
| connections | throughput/tps | 95% latency/ms | max latency/ms |
| 10 | 105850.83 | 0.12 | 19.65  |
| 20 | 161065.65 | 0.17 | 23.45  |
| 40 | 172238.07 | 0.44 | 38.84  |
| 80 | 169323.66 | 0.86 | 108.88 |
I did not test at 160 connections. We can see that the latency and throughput are much better now, despite the fact that the test runs on real storage instead of RAM disk. Some of it should be explained by the trick to truncate the file while only holding the fil_space_t::latch.
Admittedly, the storage is fast (peak write speed 2300 MB/s, only about 100 MB/s in this benchmark, and the maximum write rate I have seen in my tests is around 800 MB/s). I suppose that it was writing out almost no data pages. During the benchmark, the LSN only grew to 42,006,626,705 (not even to fill the 50 GiB log). So, there should have been no need to create a log checkpoint.
The last undo truncation took place almost 12 minutes before the end of the benchmark. That was while we were serving 20 concurrent connections. I think that to have more chances of undo tablespace truncation, on a write-heavy system, innodb_undo_tablespaces needs to be larger than 2.
|
|
origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b 2021-09-23T11:21:51+03:00
|
which should contain fixes for MDEV-26450 and MDEV-26445 behaved well in RQG testing.
|
|