Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Versions: 10.6, 10.7(EOL)
- Environment: Debian GNU/Linux Sid (unstable), Linux version 5.14.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-10) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.6-2 (2021-09-19)
Description
After a kernel upgrade, I started to experience random hangs in InnoDB related to io_uring. Yesterday, with an older kernel
Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)
everything ran fine, but today several tests failed or hung when running the following on an nproc=40 machine:
nice ./mtr --parallel=60 --big-test --force
Here is an example with the names of hanging tests:
10.6 d95361107c07b6e8257a7a82c41b18af64ab8d89
Completed: Failed 7/5108 tests, 99.86% were successful.

Failing test(s): innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb innodb_gis.rtree_debug
I suspect that the culprit is the newer kernel:
Linux version 5.14.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-10) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.6-2 (2021-09-19)
If I disable io_uring in the MariaDB server build, everything works just fine using the libaio interface (io_submit, io_getevents).
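As a rough illustration of the kind of kernel gate such a fallback could use (a hypothetical helper sketch in Python, not the actual server logic, which is C++; MDEV-36182, linked below, implements the real libaio fallback):

```python
def use_io_uring(kernel_release: str) -> bool:
    """Hypothetical sketch: avoid io_uring on 5.14 kernels, where these
    hangs were observed, and fall back to the libaio interface instead."""
    major, minor = (int(x) for x in kernel_release.split(".")[:2])
    return (major, minor) != (5, 14)

use_io_uring("5.10.0-8-amd64")   # True: io_uring was fine on the older kernel
use_io_uring("5.14.0-1-amd64")   # False: prefer libaio here
```

A real gate would also have to bound the check once a fixed stable release is known, rather than blacklisting the whole 5.14 series forever.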
Here is one example of a hang:
10.6 d95361107c07b6e8257a7a82c41b18af64ab8d89
innodb_zip.innochecksum_2 '4k,innodb' w40 [ fail ] Found warnings/errors in server log file!
Test ended at 2021-09-24 14:43:48
line
Attempting backtrace. You can use the following information to find out
…
Thread 7 (Thread 0x7eff55043640 (LWP 954970)):
#0 0x00007eff56cb7bd2 in __GI___sigtimedwait (set=0x7eff55042ca8, info=0x7eff55042c20, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:36
#1 0x000055c55ef1883b in my_sigwait (set=0x7eff55042ca8, sig=0x7eff55042c1c, code=<optimized out>) at /mariadb/10.6/include/my_pthread.h:195
#2 signal_hand (arg=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:3113
#3 0x00007eff57075eae in start_thread (arg=0x7eff55043640) at pthread_create.c:463
#4 0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (LWP 961402):
#0 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 5 (Thread 0x7eff5638b640 (LWP 954807)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007eff57721cb0 in ?? () from /usr/lib/x86_64-linux-gnu/liburing.so.1
#2 0x00007eff57721963 in __io_uring_get_cqe () from /usr/lib/x86_64-linux-gnu/liburing.so.1
#3 0x000055c55f872361 in io_uring_wait_cqe_nr (ring=0x55c562837c98, cqe_ptr=0x7eff5638ad10, wait_nr=1) at /usr/include/liburing.h:494
#4 io_uring_wait_cqe (ring=0x55c562837c98, cqe_ptr=0x7eff5638ad10) at /usr/include/liburing.h:514
#5 (anonymous namespace)::aio_uring::thread_routine (aio=0x55c562837c90) at /mariadb/10.6/tpool/aio_liburing.cc:137
#6 0x00007eff56f2e8c4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007eff57075eae in start_thread (arg=0x7eff5638b640) at pthread_create.c:463
#8 0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (LWP 961385):
#0 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 3 (Thread 0x7eff5781f640 (LWP 954792)):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7eff5781ed10, clockid=1468132416, expected=0, futex_word=0x55c560ae27fc <COND_timer+44>) at ../sysdeps/nptl/futex-internal.h:323
#1 __pthread_cond_wait_common (abstime=0x7eff5781ed10, clockid=1468132416, mutex=0x55c560ae2748 <LOCK_timer+40>, cond=0x55c560ae27d0 <COND_timer>) at pthread_cond_wait.c:520
#2 __pthread_cond_timedwait (cond=cond@entry=0x55c560ae27d0 <COND_timer>, mutex=mutex@entry=0x55c560ae2748 <LOCK_timer+40>, abstime=abstime@entry=0x7eff5781ed10) at pthread_cond_wait.c:656
#3 0x000055c55f8dbacf in safe_cond_timedwait (cond=0x55c560ae27d0 <COND_timer>, mp=0x55c560ae2720 <LOCK_timer>, abstime=abstime@entry=0x7eff5781ed10, file=0x55c55f96aaa9 "/mariadb/10.6/include/mysql/psi/mysql_thread.h", line=line@entry=1088) at /mariadb/10.6/mysys/thr_mutex.c:546
#4 0x000055c55f8dc26f in inline_mysql_cond_timedwait (that=0x55c560ae27fc <COND_timer+44>, mutex=0x189, abstime=0x7eff5781ed10, src_file=<optimized out>, src_line=<optimized out>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1088
#5 timer_handler (arg=<optimized out>) at /mariadb/10.6/mysys/thr_timer.c:321
#6 0x00007eff57075eae in start_thread (arg=0x7eff5781f640) at pthread_create.c:463
#7 0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7efec9ffb640 (LWP 954925)):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7efec9ffac78, clockid=-905991216, expected=0, futex_word=0x55c5627dc54c) at ../sysdeps/nptl/futex-internal.h:323
#1 __pthread_cond_wait_common (abstime=0x7efec9ffac78, clockid=-905991216, mutex=0x55c5627cd990, cond=0x55c5627dc520) at pthread_cond_wait.c:520
#2 __pthread_cond_clockwait (abstime=0x7efec9ffac78, clockid=-905991216, mutex=0x55c5627cd990, cond=0x55c5627dc520) at pthread_cond_wait.c:677
#3 __pthread_cond_clockwait (cond=0x55c5627dc520, mutex=0x55c5627cd990, clockid=-905991216, abstime=0x7efec9ffac78) at pthread_cond_wait.c:665
#4 0x000055c55f86efaf in std::__condvar::wait_until (this=0x55c5627dc520, __m=<error reading variable>, __clock=1, __abs_time=@0x7efec9ffac78: {tv_sec = 24713, tv_nsec = 284075513}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:169
#5 std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __atime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:201
#6 std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __atime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:111
#7 std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __rtime=@0x55c5627cd9b8: {__r = 60000}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:163
#8 tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c5627cd880, lk=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, thread_data=thread_data@entry=0x55c5627dc520) at /mariadb/10.6/tpool/tpool_generic.cc:449
#9 0x000055c55f86f14e in tpool::thread_pool_generic::get_task (this=this@entry=0x55c5627cd880, thread_var=thread_var@entry=0x55c5627dc520, t=t@entry=0x7efec9ffad18) at /mariadb/10.6/tpool/tpool_generic.cc:502
#10 0x000055c55f86f444 in tpool::thread_pool_generic::worker_main (this=0x55c5627cd880, thread_var=0x55c5627dc520) at /mariadb/10.6/tpool/tpool_generic.cc:547
#11 0x00007eff56f2e8c4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007eff57075eae in start_thread (arg=0x7efec9ffb640) at pthread_create.c:463
#13 0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7eff5699a840 (LWP 954716)):
#0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
#1 0x000055c55f25a83a in handle_fatal_signal (sig=6) at /mariadb/10.6/sql/signal_handler.cc:345
#2 <signal handler called>
#3 0x00007eff57079849 in __pthread_mutex_unlock_usercnt (mutex=mutex@entry=0x55c5601cf528 <buf_pool+40>, decr=decr@entry=0) at pthread_mutex_unlock.c:58
#4 0x00007eff5707c1a8 in __pthread_cond_wait_common (abstime=0x7fff59490d98, clockid=0, mutex=0x55c5601cf528 <buf_pool+40>, cond=0x55c5601cf5f0 <buf_pool+240>) at pthread_cond_wait.c:421
#5 __pthread_cond_timedwait (cond=cond@entry=0x55c5601cf5f0 <buf_pool+240>, mutex=mutex@entry=0x55c5601cf528 <buf_pool+40>, abstime=abstime@entry=0x7fff59490d98) at pthread_cond_wait.c:656
#6 0x000055c55f8dbacf in safe_cond_timedwait (cond=0x55c5601cf5f0 <buf_pool+240>, mp=0x55c5601cf500 <buf_pool>, abstime=0x7fff59490d98, file=0x55c55fbd74a8 "/mariadb/10.6/storage/innobase/buf/buf0flu.cc", line=2435) at /mariadb/10.6/mysys/thr_mutex.c:546
#7 0x000055c55f796517 in buf_flush_buffer_pool () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2434
#8 0x000055c55f5a1121 in logs_empty_and_mark_files_at_shutdown () at /mariadb/10.6/storage/innobase/log/log0log.cc:1139
#9 0x000055c55f6d79a0 in innodb_shutdown () at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1970
#10 0x000055c55f517318 in innobase_end () at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4273
#11 0x000055c55f25b48f in ha_finalize_handlerton (plugin=0x55c5627abcb0) at /mariadb/10.6/sql/handler.cc:595
#12 0x000055c55f00b6df in plugin_deinitialize (plugin=0x55c5627abcb0, ref_check=true) at /mariadb/10.6/sql/sql_plugin.cc:1266
#13 0x000055c55f009f9a in reap_plugins () at /mariadb/10.6/sql/sql_plugin.cc:1342
#14 0x000055c55f00bba3 in plugin_shutdown () at /mariadb/10.6/sql/sql_plugin.cc:2050
#15 0x000055c55ef179ed in clean_up (print_message=true) at /mariadb/10.6/sql/mysqld.cc:1923
#16 0x000055c55ef196a2 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:5844
#17 0x00007eff56ca1e4a in __libc_start_main (main=0x55c55ef16a40 <main(int, char**)>, argc=26, argv=0x7fff59491718, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff59491708) at ../csu/libc-start.c:314
#18 0x000055c55ef1697a in _start () at /mariadb/10.6/sql/my_decimal.h:147
This looks like a shutdown hang, and the problem could be the hung Thread 5. There are also two suspicious threads with a null address; I do not remember seeing such stack traces earlier.
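The shape of that hang can be sketched with a toy analogy (illustrative Python, not the server code): the shutdown path in buf_flush_buffer_pool() keeps waiting for a completion that only the io_uring reaper thread (Thread 5, stuck in io_uring_wait_cqe) can deliver, so if the kernel never posts a CQE the wait never ends. Here a short timeout stands in for the indefinite wait:

```python
import threading

# Stands in for the "all writes completed" condition that the io_uring
# reaper thread would signal once the kernel posts a completion (CQE).
io_complete = threading.Event()

def shutdown_flush(timeout: float) -> bool:
    """Analogue of the waiting shutdown path: block until the completion
    event arrives, or give up after `timeout` seconds (the real code
    waits indefinitely, which is the observed hang)."""
    return io_complete.wait(timeout)

# With the kernel bug, the reaper never wakes, so nothing sets the event:
hung = not shutdown_flush(timeout=0.1)   # hung == True
```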
Attachments
Issue Links
- causes
  - MDEV-36182 Use libaio when running kernel 5.14 (Confirmed)
- relates to
  - MDEV-24883 add io_uring support for tpool (Closed)
  - MDEV-26555 main.innodb_ext_key fatal assertion (Closed)
- links to
Activity
danblack, thank you. I think that stress.ddl_innodb alone should be a reliable test of the hangs. It always runs for 160 seconds for me. It appears that the intention would be to extend the run-time by a factor of 5 when --big-test is specified, but that does not seem to work in my environment.
Quite right: 5.10 works correctly. The same holds for MDEV-26555's main.innodb_ext_key test.
10.6-d95361107c07b6e8257a7a82c41b18af64ab8d89, kernel 5.10.23-200.fc33.x86_64
$ mysql-test/mtr --mem --repeat=20 --parallel=auto innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb
|
Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl --mem --repeat=20 --parallel=auto innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb
|
vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
|
Checking leftover processes...
|
Removing old var directory...
|
Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
|
- symlinking 'var' to '/dev/shm/var_auto_Iu3y'
|
Checking supported features...
|
MariaDB Version 10.6.5-MariaDB
|
- SSL connections supported
|
- binaries built with wsrep patch
|
Collecting tests...
|
Installing system database...
|
Using parallel: 6
|
|
==============================================================================
|
|
TEST WORKER RESULT TIME (ms) or COMMENT
|
--------------------------------------------------------------------------
|
|
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
|
worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
|
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
|
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
|
worker[6] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
|
worker[5] Using MTR_BUILD_THREAD 304, with reserved ports 16080..16099
|
innodb_zip.innochecksum_2 'innodb' [ skipped ] Requires debug build
|
gcol.innodb_virtual_basic 'innodb' w2 [ pass ] 877
|
innodb_gis.kill_server 'innodb' w5 [ pass ] 1574
|
gcol.innodb_virtual_basic 'innodb' w2 [ 2 pass ] 805
|
innodb.innodb-64k 'innodb' w1 [ pass ] 1713
|
main.index_merge_innodb 'innodb' w4 [ pass ] 2475
|
gcol.innodb_virtual_basic 'innodb' w2 [ 3 pass ] 873
|
innodb_gis.kill_server 'innodb' w5 [ 2 pass ] 1644
|
gcol.innodb_virtual_basic 'innodb' w2 [ 4 pass ] 872
|
innodb.innodb-64k 'innodb' w1 [ 2 pass ] 1704
|
gcol.innodb_virtual_basic 'innodb' w2 [ 5 pass ] 971
|
main.index_merge_innodb 'innodb' w4 [ 2 pass ] 2441
|
innodb_gis.kill_server 'innodb' w5 [ 3 pass ] 1619
|
gcol.innodb_virtual_basic 'innodb' w2 [ 6 pass ] 886
|
innodb.innodb-64k 'innodb' w1 [ 3 pass ] 1711
|
gcol.innodb_virtual_basic 'innodb' w2 [ 7 pass ] 905
|
innodb_gis.kill_server 'innodb' w5 [ 4 pass ] 1663
|
main.index_merge_innodb 'innodb' w4 [ 3 pass ] 2357
|
innodb.innodb-64k 'innodb' w1 [ 4 pass ] 1696
|
gcol.innodb_virtual_basic 'innodb' w2 [ 8 pass ] 924
|
gcol.innodb_virtual_basic 'innodb' w2 [ 9 pass ] 856
|
innodb_gis.kill_server 'innodb' w5 [ 5 pass ] 1786
|
innodb.innodb-64k 'innodb' w1 [ 5 pass ] 1686
|
gcol.innodb_virtual_basic 'innodb' w2 [ 10 pass ] 1107
|
main.index_merge_innodb 'innodb' w4 [ 4 pass ] 2515
|
innodb_gis.kill_server 'innodb' w5 [ 6 pass ] 1727
|
gcol.innodb_virtual_basic 'innodb' w2 [ 11 pass ] 847
|
innodb.innodb-64k 'innodb' w1 [ 6 pass ] 1733
|
gcol.innodb_virtual_basic 'innodb' w2 [ 12 pass ] 980
|
innodb_gis.kill_server 'innodb' w5 [ 7 pass ] 1634
|
main.index_merge_innodb 'innodb' w4 [ 5 pass ] 2506
|
gcol.innodb_virtual_basic 'innodb' w2 [ 13 pass ] 868
|
innodb.innodb-64k 'innodb' w1 [ 7 pass ] 1700
|
gcol.innodb_virtual_basic 'innodb' w2 [ 14 pass ] 900
|
innodb_gis.kill_server 'innodb' w5 [ 8 pass ] 1589
|
gcol.innodb_virtual_basic 'innodb' w2 [ 15 pass ] 840
|
innodb.innodb-64k 'innodb' w1 [ 8 pass ] 1707
|
main.index_merge_innodb 'innodb' w4 [ 6 pass ] 2654
|
innodb_gis.kill_server 'innodb' w5 [ 9 pass ] 1642
|
gcol.innodb_virtual_basic 'innodb' w2 [ 16 pass ] 1090
|
innodb.innodb-64k 'innodb' w1 [ 9 pass ] 1684
|
gcol.innodb_virtual_basic 'innodb' w2 [ 17 pass ] 978
|
innodb_gis.kill_server 'innodb' w5 [ 10 pass ] 1643
|
main.index_merge_innodb 'innodb' w4 [ 7 pass ] 2413
|
gcol.innodb_virtual_basic 'innodb' w2 [ 18 pass ] 840
|
innodb.innodb-64k 'innodb' w1 [ 10 pass ] 1684
|
gcol.innodb_virtual_basic 'innodb' w2 [ 19 pass ] 941
|
innodb_gis.kill_server 'innodb' w5 [ 11 pass ] 1754
|
gcol.innodb_virtual_basic 'innodb' w2 [ 20 pass ] 839
|
innodb.innodb-64k 'innodb' w1 [ 11 pass ] 1681
|
main.index_merge_innodb 'innodb' w4 [ 8 pass ] 2550
|
innodb_gis.kill_server 'innodb' w5 [ 12 pass ] 1661
|
innodb.innodb-64k 'innodb' w1 [ 12 pass ] 1676
|
innodb_gis.kill_server 'innodb' w5 [ 13 pass ] 1482
|
main.index_merge_innodb 'innodb' w4 [ 9 pass ] 2166
|
innodb.innodb-64k 'innodb' w1 [ 13 pass ] 1665
|
innodb_gis.kill_server 'innodb' w5 [ 14 pass ] 1493
|
main.index_merge_innodb 'innodb' w4 [ 10 pass ] 2291
|
innodb.innodb-64k 'innodb' w1 [ 14 pass ] 1662
|
innodb_gis.kill_server 'innodb' w5 [ 15 pass ] 1644
|
innodb.innodb-64k 'innodb' w1 [ 15 pass ] 1689
|
innodb_gis.kill_server 'innodb' w5 [ 16 pass ] 1501
|
main.index_merge_innodb 'innodb' w4 [ 11 pass ] 2330
|
innodb.innodb-64k 'innodb' w1 [ 16 pass ] 1694
|
innodb_gis.kill_server 'innodb' w5 [ 17 pass ] 1521
|
main.index_merge_innodb 'innodb' w4 [ 12 pass ] 2558
|
innodb.innodb-64k 'innodb' w1 [ 17 pass ] 1678
|
innodb_gis.kill_server 'innodb' w5 [ 18 pass ] 1625
|
innodb_gis.kill_server 'innodb' w5 [ 19 pass ] 1714
|
innodb.innodb-64k 'innodb' w1 [ 18 pass ] 1708
|
main.index_merge_innodb 'innodb' w4 [ 13 pass ] 2610
|
innodb_gis.kill_server 'innodb' w5 [ 20 pass ] 1677
|
innodb.innodb-64k 'innodb' w1 [ 19 pass ] 1732
|
main.index_merge_innodb 'innodb' w4 [ 14 pass ] 2524
|
innodb.innodb-64k 'innodb' w1 [ 20 pass ] 1665
|
main.index_merge_innodb 'innodb' w4 [ 15 pass ] 2228
|
main.index_merge_innodb 'innodb' w4 [ 16 pass ] 2142
|
main.index_merge_innodb 'innodb' w4 [ 17 pass ] 2145
|
main.index_merge_innodb 'innodb' w4 [ 18 pass ] 2482
|
main.index_merge_innodb 'innodb' w4 [ 19 pass ] 2258
|
main.index_merge_innodb 'innodb' w4 [ 20 pass ] 2117
|
stress.ddl_innodb 'innodb' w6 [ pass ] 159829
|
stress.ddl_innodb 'innodb' w6 [ 2 pass ] 160236
|
stress.ddl_innodb 'innodb' w6 [ 3 pass ] 160477
|
stress.ddl_innodb 'innodb' w6 [ 4 pass ] 159516
|
stress.ddl_innodb 'innodb' w6 [ 5 pass ] 159392
|
stress.ddl_innodb 'innodb' w6 [ 6 pass ] 159883
|
stress.ddl_innodb 'innodb' w6 [ 7 pass ] 159986
|
|
So is 5.15-rc2:
5.15.0-0.rc2.20210923git58e2cf5d7946.21.fc36.x86_64
$ mysql-test/mtr --mem --repeat=20 --parallel=auto stress.ddl_innodb{,,,,,,,,}
|
Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl --mem --repeat=20 --parallel=auto stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
|
vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
|
Checking leftover processes...
|
Removing old var directory...
|
Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
|
- symlinking 'var' to '/dev/shm/var_auto_1oiy'
|
Checking supported features...
|
MariaDB Version 10.6.5-MariaDB
|
- SSL connections supported
|
- binaries built with wsrep patch
|
Collecting tests...
|
Installing system database...
|
Using parallel: 8
|
|
==============================================================================
|
|
TEST WORKER RESULT TIME (ms) or COMMENT
|
--------------------------------------------------------------------------
|
|
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
|
worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
|
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
|
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
|
worker[5] Using MTR_BUILD_THREAD 304, with reserved ports 16080..16099
|
worker[7] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
|
worker[6] Using MTR_BUILD_THREAD 306, with reserved ports 16120..16139
|
worker[8] Using MTR_BUILD_THREAD 307, with reserved ports 16140..16159
|
stress.ddl_innodb 'innodb' w6 [ pass ] 159402
|
stress.ddl_innodb 'innodb' w2 [ pass ] 159780
|
stress.ddl_innodb 'innodb' w8 [ pass ] 159673
|
stress.ddl_innodb 'innodb' w1 [ pass ] 161240
|
stress.ddl_innodb 'innodb' w5 [ pass ] 160823
|
stress.ddl_innodb 'innodb' w4 [ pass ] 160948
|
stress.ddl_innodb 'innodb' w7 [ pass ] 161745
|
stress.ddl_innodb 'innodb' w3 [ pass ] 161834
|
stress.ddl_innodb 'innodb' w6 [ 2 pass ] 161652
|
stress.ddl_innodb 'innodb' w2 [ 2 pass ] 162261
|
stress.ddl_innodb 'innodb' w7 [ 2 pass ] 162398
|
stress.ddl_innodb 'innodb' w1 [ 2 pass ] 162923
|
stress.ddl_innodb 'innodb' w8 [ 2 pass ] 163672
|
stress.ddl_innodb 'innodb' w4 [ 2 pass ] 164979
|
stress.ddl_innodb 'innodb' w5 [ 2 pass ] 165931
|
stress.ddl_innodb 'innodb' w3 [ 2 pass ] 164470
|
stress.ddl_innodb 'innodb' w6 [ 3 pass ] 160845
|
stress.ddl_innodb 'innodb' w2 [ 3 pass ] 161123
|
stress.ddl_innodb 'innodb' w1 [ 3 pass ] 161309
|
stress.ddl_innodb 'innodb' w7 [ 3 pass ] 162090
|
stress.ddl_innodb 'innodb' w5 [ 3 pass ] 159452
|
stress.ddl_innodb 'innodb' w8 [ 3 pass ] 161963
|
stress.ddl_innodb 'innodb' w4 [ 3 pass ] 161161
|
stress.ddl_innodb 'innodb' w3 [ 3 pass ] 163132
|
stress.ddl_innodb 'innodb' w6 [ 4 pass ] 160392
|
stress.ddl_innodb 'innodb' w2 [ 4 pass ] 159988
|
stress.ddl_innodb 'innodb' w7 [ 4 pass ] 160013
|
stress.ddl_innodb 'innodb' w8 [ 4 pass ] 160109
|
stress.ddl_innodb 'innodb' w1 [ 4 pass ] 161107
|
stress.ddl_innodb 'innodb' w4 [ 4 pass ] 161043
|
stress.ddl_innodb 'innodb' w5 [ 4 pass ] 162480
|
stress.ddl_innodb 'innodb' w2 [ 5 pass ] 160167
|
stress.ddl_innodb 'innodb' w6 [ 5 pass ] 161083
|
stress.ddl_innodb 'innodb' w8 [ 5 pass ] 160605
|
stress.ddl_innodb 'innodb' w1 [ 5 pass ] 160305
|
stress.ddl_innodb 'innodb' w7 [ 5 pass ] 162074
|
stress.ddl_innodb 'innodb' w4 [ 5 pass ] 159965
|
stress.ddl_innodb 'innodb' w5 [ 5 pass ] 161298
|
stress.ddl_innodb 'innodb' w6 [ 6 pass ] 159764
|
stress.ddl_innodb 'innodb' w2 [ 6 pass ] 160929
|
stress.ddl_innodb 'innodb' w1 [ 6 pass ] 160120
|
stress.ddl_innodb 'innodb' w7 [ 6 pass ] 159705
|
stress.ddl_innodb 'innodb' w4 [ 6 pass ] 160425
|
stress.ddl_innodb 'innodb' w8 [ 6 pass ] 162438
|
stress.ddl_innodb 'innodb' w5 [ 6 pass ] 160353
|
A slightly newer Debian Sid kernel seemed to fare somewhat better on today's 10.6, but a number of tests (including stress.ddl_innodb) would still hang. The kernel identifies itself as follows:
Linux version 5.14.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-11) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.6-3 (2021-09-28)
I tested with Ubuntu Hirsute Hippo, Linux 5.11.0-37-generic and 10.6 (I verified that liburing 0.7-3ubuntu1 was in ldd). I could not reproduce any hangs:
jan@jan-HP-ZBook-15u-G5:~/mysql/10.6/mysql-test$ ./mtr --mem --repeat=20 --parallel=auto stress.ddl_innodb{,,,,,,,,}
Logging: ./mtr --mem --repeat=20 --parallel=auto stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
|
vardir: /home/jan/mysql/10.6/mysql-test/var
|
Removing old var directory...
|
Creating var directory '/home/jan/mysql/10.6/mysql-test/var'...
|
- symlinking 'var' to '/dev/shm/var_auto_PGur'
|
Checking supported features...
|
MariaDB Version 10.6.5-MariaDB
|
- SSL connections supported
|
- binaries built with wsrep patch
|
Collecting tests...
|
Installing system database...
|
Using parallel: 8
|
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
|
worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
|
|
==============================================================================
|
|
TEST WORKER RESULT TIME (ms) or COMMENT
|
--------------------------------------------------------------------------
|
|
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
|
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
|
worker[5] Using MTR_BUILD_THREAD 304, with reserved ports 16080..16099
|
worker[6] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
|
worker[7] Using MTR_BUILD_THREAD 306, with reserved ports 16120..16139
|
worker[8] Using MTR_BUILD_THREAD 307, with reserved ports 16140..16159
|
stress.ddl_innodb 'innodb' w7 [ pass ] 159881
|
stress.ddl_innodb 'innodb' w4 [ pass ] 160407
|
stress.ddl_innodb 'innodb' w2 [ pass ] 160677
|
stress.ddl_innodb 'innodb' w8 [ pass ] 160427
|
stress.ddl_innodb 'innodb' w3 [ pass ] 160237
|
stress.ddl_innodb 'innodb' w1 [ pass ] 160600
|
stress.ddl_innodb 'innodb' w6 [ pass ] 161037
|
stress.ddl_innodb 'innodb' w5 [ pass ] 161692
|
stress.ddl_innodb 'innodb' w7 [ 2 pass ] 159778
|
stress.ddl_innodb 'innodb' w3 [ 2 pass ] 159928
|
stress.ddl_innodb 'innodb' w2 [ 2 pass ] 160536
|
stress.ddl_innodb 'innodb' w4 [ 2 pass ] 160463
|
stress.ddl_innodb 'innodb' w1 [ 2 pass ] 160456
|
stress.ddl_innodb 'innodb' w6 [ 2 pass ] 159589
|
stress.ddl_innodb 'innodb' w8 [ 2 pass ] 160849
|
stress.ddl_innodb 'innodb' w5 [ 2 pass ] 160403
|
stress.ddl_innodb 'innodb' w7 [ 3 pass ] 159267
|
stress.ddl_innodb 'innodb' w2 [ 3 pass ] 159626
|
stress.ddl_innodb 'innodb' w3 [ 3 pass ] 160433
|
stress.ddl_innodb 'innodb' w1 [ 3 pass ] 160003
|
stress.ddl_innodb 'innodb' w6 [ 3 pass ] 160023
|
stress.ddl_innodb 'innodb' w4 [ 3 pass ] 160553
|
stress.ddl_innodb 'innodb' w8 [ 3 pass ] 160566
|
stress.ddl_innodb 'innodb' w5 [ 3 pass ] 160179
|
stress.ddl_innodb 'innodb' w7 [ 4 pass ] 160268
|
stress.ddl_innodb 'innodb' w4 [ 4 pass ] 159403
|
stress.ddl_innodb 'innodb' w6 [ 4 pass ] 159874
|
stress.ddl_innodb 'innodb' w3 [ 4 pass ] 160124
|
stress.ddl_innodb 'innodb' w1 [ 4 pass ] 160159
|
stress.ddl_innodb 'innodb' w2 [ 4 pass ] 160950
|
stress.ddl_innodb 'innodb' w5 [ 4 pass ] 159365
|
stress.ddl_innodb 'innodb' w8 [ 4 pass ] 161081
|
stress.ddl_innodb 'innodb' w7 [ 5 pass ] 159959
|
stress.ddl_innodb 'innodb' w4 [ 5 pass ] 160458
|
stress.ddl_innodb 'innodb' w6 [ 5 pass ] 160472
|
stress.ddl_innodb 'innodb' w1 [ 5 pass ] 160225
|
stress.ddl_innodb 'innodb' w5 [ 5 pass ] 159611
|
stress.ddl_innodb 'innodb' w2 [ 5 pass ] 160540
|
stress.ddl_innodb 'innodb' w3 [ 5 pass ] 160703
|
stress.ddl_innodb 'innodb' w8 [ 5 pass ] 160291
|
stress.ddl_innodb 'innodb' w7 [ 6 pass ] 159318
|
stress.ddl_innodb 'innodb' w6 [ 6 pass ] 159939
|
stress.ddl_innodb 'innodb' w5 [ 6 pass ] 159320
|
stress.ddl_innodb 'innodb' w4 [ 6 pass ] 160414
|
stress.ddl_innodb 'innodb' w1 [ 6 pass ] 159962
|
stress.ddl_innodb 'innodb' w3 [ 6 pass ] 159862
|
stress.ddl_innodb 'innodb' w2 [ 6 pass ] 160644
|
stress.ddl_innodb 'innodb' w8 [ 6 pass ] 160513
|
stress.ddl_innodb 'innodb' w7 [ 7 pass ] 161259
|
stress.ddl_innodb 'innodb' w6 [ 7 pass ] 159565
|
stress.ddl_innodb 'innodb' w1 [ 7 pass ] 159960
|
stress.ddl_innodb 'innodb' w5 [ 7 pass ] 160242
|
stress.ddl_innodb 'innodb' w4 [ 7 pass ] 161071
|
stress.ddl_innodb 'innodb' w3 [ 7 pass ] 160279
|
stress.ddl_innodb 'innodb' w2 [ 7 pass ] 160361
|
stress.ddl_innodb 'innodb' w8 [ 7 pass ] 159958
|
stress.ddl_innodb 'innodb' w7 [ 8 pass ] 159839
|
stress.ddl_innodb 'innodb' w6 [ 8 pass ] 159673
|
stress.ddl_innodb 'innodb' w1 [ 8 pass ] 160580
|
stress.ddl_innodb 'innodb' w5 [ 8 pass ] 160134
|
stress.ddl_innodb 'innodb' w4 [ 8 pass ] 159952
|
stress.ddl_innodb 'innodb' w3 [ 8 pass ] 160099
|
stress.ddl_innodb 'innodb' w8 [ 8 pass ] 159512
|
stress.ddl_innodb 'innodb' w2 [ 8 pass ] 161755
|
stress.ddl_innodb 'innodb' w7 [ 9 pass ] 159864
|
stress.ddl_innodb 'innodb' w6 [ 9 pass ] 160150
|
stress.ddl_innodb 'innodb' w4 [ 9 pass ] 159496
|
stress.ddl_innodb 'innodb' w5 [ 9 pass ] 160073
|
stress.ddl_innodb 'innodb' w8 [ 9 pass ] 159168
|
stress.ddl_innodb 'innodb' w1 [ 9 pass ] 160736
|
stress.ddl_innodb 'innodb' w3 [ 9 pass ] 160313
|
stress.ddl_innodb 'innodb' w2 [ 9 pass ] 159952
|
stress.ddl_innodb 'innodb' w4 [ 10 pass ] 160047
|
stress.ddl_innodb 'innodb' w7 [ 10 pass ] 161330
|
stress.ddl_innodb 'innodb' w5 [ 10 pass ] 159767
|
stress.ddl_innodb 'innodb' w8 [ 10 pass ] 160343
|
stress.ddl_innodb 'innodb' w3 [ 10 pass ] 159819
|
stress.ddl_innodb 'innodb' w6 [ 10 pass ] 162003
|
stress.ddl_innodb 'innodb' w1 [ 10 pass ] 161497
|
stress.ddl_innodb 'innodb' w2 [ 10 pass ] 162471
|
stress.ddl_innodb 'innodb' w5 [ 11 pass ] 159816
|
stress.ddl_innodb 'innodb' w4 [ 11 pass ] 160653
|
stress.ddl_innodb 'innodb' w7 [ 11 pass ] 160286
|
stress.ddl_innodb 'innodb' w6 [ 11 pass ] 159677
|
stress.ddl_innodb 'innodb' w3 [ 11 pass ] 160502
|
stress.ddl_innodb 'innodb' w1 [ 11 pass ] 159874
|
stress.ddl_innodb 'innodb' w8 [ 11 pass ] 162143
|
stress.ddl_innodb 'innodb' w2 [ 11 pass ] 161426
|
stress.ddl_innodb 'innodb' w5 [ 12 pass ] 160648
|
stress.ddl_innodb 'innodb' w4 [ 12 pass ] 161110
|
stress.ddl_innodb 'innodb' w7 [ 12 pass ] 161525
|
stress.ddl_innodb 'innodb' w3 [ 12 pass ] 160173
|
stress.ddl_innodb 'innodb' w6 [ 12 pass ] 161809
|
stress.ddl_innodb 'innodb' w8 [ 12 pass ] 160643
|
stress.ddl_innodb 'innodb' w1 [ 12 pass ] 161858
|
stress.ddl_innodb 'innodb' w2 [ 12 pass ] 161846
|
stress.ddl_innodb 'innodb' w5 [ 13 pass ] 160195
|
stress.ddl_innodb 'innodb' w4 [ 13 pass ] 160647
|
stress.ddl_innodb 'innodb' w7 [ 13 pass ] 159628
|
stress.ddl_innodb 'innodb' w8 [ 13 pass ] 159318
|
stress.ddl_innodb 'innodb' w3 [ 13 pass ] 161039
|
stress.ddl_innodb 'innodb' w6 [ 13 pass ] 160146
|
stress.ddl_innodb 'innodb' w1 [ 13 pass ] 162043
|
stress.ddl_innodb 'innodb' w2 [ 13 pass ] 160847
|
stress.ddl_innodb 'innodb' w5 [ 14 pass ] 159926
|
stress.ddl_innodb 'innodb' w4 [ 14 pass ] 159241
|
stress.ddl_innodb 'innodb' w7 [ 14 pass ] 160624
|
stress.ddl_innodb 'innodb' w8 [ 14 pass ] 159922
|
stress.ddl_innodb 'innodb' w3 [ 14 pass ] 160271
|
stress.ddl_innodb 'innodb' w6 [ 14 pass ] 160511
|
stress.ddl_innodb 'innodb' w1 [ 14 pass ] 161051
|
stress.ddl_innodb 'innodb' w2 [ 14 pass ] 160573
|
stress.ddl_innodb 'innodb' w5 [ 15 pass ] 160873
|
stress.ddl_innodb 'innodb' w7 [ 15 pass ] 159317
|
stress.ddl_innodb 'innodb' w4 [ 15 pass ] 161373
|
stress.ddl_innodb 'innodb' w8 [ 15 pass ] 159308
|
stress.ddl_innodb 'innodb' w3 [ 15 pass ] 160378
|
stress.ddl_innodb 'innodb' w6 [ 15 pass ] 160849
|
stress.ddl_innodb 'innodb' w1 [ 15 pass ] 160462
|
stress.ddl_innodb 'innodb' w2 [ 15 pass ] 160818
|
stress.ddl_innodb 'innodb' w7 [ 16 pass ] 159588
|
stress.ddl_innodb 'innodb' w5 [ 16 pass ] 160404
|
stress.ddl_innodb 'innodb' w4 [ 16 pass ] 160226
|
stress.ddl_innodb 'innodb' w8 [ 16 pass ] 160469
|
stress.ddl_innodb 'innodb' w3 [ 16 pass ] 160032
|
stress.ddl_innodb 'innodb' w6 [ 16 pass ] 159791
|
stress.ddl_innodb 'innodb' w1 [ 16 pass ] 159943
|
stress.ddl_innodb 'innodb' w2 [ 16 pass ] 160588
|
stress.ddl_innodb 'innodb' w7 [ 17 pass ] 159251
|
stress.ddl_innodb 'innodb' w5 [ 17 pass ] 160105
|
stress.ddl_innodb 'innodb' w8 [ 17 pass ] 160939
|
stress.ddl_innodb 'innodb' w4 [ 17 pass ] 161903
|
stress.ddl_innodb 'innodb' w3 [ 17 pass ] 160733
|
stress.ddl_innodb 'innodb' w6 [ 17 pass ] 159743
|
stress.ddl_innodb 'innodb' w1 [ 17 pass ] 160614
|
stress.ddl_innodb 'innodb' w2 [ 17 pass ] 161258
|
stress.ddl_innodb 'innodb' w7 [ 18 pass ] 160573
|
stress.ddl_innodb 'innodb' w8 [ 18 pass ] 159535
|
stress.ddl_innodb 'innodb' w5 [ 18 pass ] 160978
|
stress.ddl_innodb 'innodb' w4 [ 18 pass ] 159197
|
stress.ddl_innodb 'innodb' w3 [ 18 pass ] 159562
|
stress.ddl_innodb 'innodb' w6 [ 18 pass ] 161541
|
stress.ddl_innodb 'innodb' w1 [ 18 pass ] 161174
|
stress.ddl_innodb 'innodb' w2 [ 18 pass ] 162119
|
stress.ddl_innodb 'innodb' w7 [ 19 pass ] 161389
|
stress.ddl_innodb 'innodb' w5 [ 19 pass ] 160521
|
stress.ddl_innodb 'innodb' w8 [ 19 pass ] 160843
|
stress.ddl_innodb 'innodb' w4 [ 19 pass ] 160808
|
stress.ddl_innodb 'innodb' w3 [ 19 pass ] 160351
|
stress.ddl_innodb 'innodb' w6 [ 19 pass ] 162188
|
stress.ddl_innodb 'innodb' w1 [ 19 pass ] 160643
|
stress.ddl_innodb 'innodb' w2 [ 19 pass ] 161393
|
stress.ddl_innodb 'innodb' w7 [ 20 pass ] 160334
|
stress.ddl_innodb 'innodb' w5 [ 20 pass ] 159858
|
stress.ddl_innodb 'innodb' w8 [ 20 pass ] 160552
|
stress.ddl_innodb 'innodb' w4 [ 20 pass ] 160110
|
stress.ddl_innodb 'innodb' w3 [ 20 pass ] 160265
|
stress.ddl_innodb 'innodb' w6 [ 20 pass ] 159800
|
stress.ddl_innodb 'innodb' w1 [ 20 pass ] 159459
|
stress.ddl_innodb 'innodb' w2 [ 20 pass ] 159573
|
stress.ddl_innodb 'innodb' w7 [ pass ] 159908
|
stress.ddl_innodb 'innodb' w7 [ 2 pass ] 159908
|
stress.ddl_innodb 'innodb' w7 [ 3 pass ] 159753
|
stress.ddl_innodb 'innodb' w7 [ 4 pass ] 159958
|
stress.ddl_innodb 'innodb' w7 [ 5 pass ] 159503
|
stress.ddl_innodb 'innodb' w7 [ 6 pass ] 159597
|
stress.ddl_innodb 'innodb' w7 [ 7 pass ] 159679
|
stress.ddl_innodb 'innodb' w7 [ 8 pass ] 160126
|
stress.ddl_innodb 'innodb' w7 [ 9 pass ] 159365
|
stress.ddl_innodb 'innodb' w7 [ 10 pass ] 160122
|
stress.ddl_innodb 'innodb' w7 [ 11 pass ] 160423
|
stress.ddl_innodb 'innodb' w7 [ 12 pass ] 159771
|
stress.ddl_innodb 'innodb' w7 [ 13 pass ] 159654
|
stress.ddl_innodb 'innodb' w7 [ 14 pass ] 160061
|
stress.ddl_innodb 'innodb' w7 [ 15 pass ] 159866
|
stress.ddl_innodb 'innodb' w7 [ 16 pass ] 159555
|
stress.ddl_innodb 'innodb' w7 [ 17 pass ] 160426
|
stress.ddl_innodb 'innodb' w7 [ 18 pass ] 159447
|
stress.ddl_innodb 'innodb' w7 [ 19 pass ] 159955
|
stress.ddl_innodb 'innodb' w7 [ 20 pass ] 160084
|
--------------------------------------------------------------------------
|
The servers were restarted 0 times
|
Spent 28861.196 of 6420 seconds executing testcases
|
|
Completed: All 180 tests were successful.
|
The Debian Sid (unstable) package linux-image-5.14.0-2-amd64 is still affected by this. According to /proc/version the build is 2 days old:
Linux version 5.14.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-11) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.9-2 (2021-10-03)
|
- The Linux 5.14 stable kernel appears to have io_uring patches backported: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.14.y
- Fedora is following the stable branch: https://koji.fedoraproject.org/koji/packageinfo?packageID=8
- No further action is needed in the server code.
- Maybe something is needed in the release notes. We need to watch Debian Sid in case we release a 10.5+ before 5.15 lands there, or Debian users may get the instability they requested.
Tested Fedora 5.14.9-200.fc34.x86_64 again. It failed. According to https://gitlab.com/cki-project/kernel-ark/-/commits/fedora-5.14/ it is following the Linux 5.14 LTS stable branch. I'll send something to the kernel upstream for their testing.
Pushed a test image to quay.io/danielgblack/mariadb-test:uring based on ubuntu:hirsute (and its distribution's liburing).
quay.io/danielgblack/mariadb-test:uring, 5.14.9-200.fc34.x86_64 |
$ podman run --rm --privileged=true quay.io/danielgblack/mariadb-test:uring
|
Logging: ./mtr --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
|
vardir: /usr/share/mysql/mysql-test/var
|
Removing old var directory...
|
Creating var directory '/usr/share/mysql/mysql-test/var'...
|
Checking supported features...
|
MariaDB Version 10.6.5-MariaDB-1:10.6.0
|
- SSL connections supported
|
- binaries built with wsrep patch
|
Collecting tests...
|
Installing system database...
|
|
==============================================================================
|
|
TEST WORKER RESULT TIME (ms) or COMMENT
|
--------------------------------------------------------------------------
|
|
worker[4] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
|
worker[1] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
|
worker[2] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
|
worker[3] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
|
stress.ddl_innodb 'innodb' w1 [ pass ] 174276
|
stress.ddl_innodb 'innodb' w4 [ pass ] 175766
|
stress.ddl_innodb 'innodb' w3 [ pass ] 176009
|
stress.ddl_innodb 'innodb' w2 [ pass ] 184728
|
--------------------------------------------------------------------------
|
The servers were restarted 0 times
|
Spent 710.779 of 201 seconds executing testcases
|
|
Completed: All 4 tests were successful.
|
--privileged is needed for io_uring to be enabled in the container.
Checked logs of running instance:
$ podman exec -ti xenodochial_buck bash
|
root@a1c4180e2dd9:/usr/share/mysql/mysql-test# grep uring var/*/log/mysqld.*.err
|
var/1/log/mysqld.1.err:2021-10-07 5:42:46 0 [Note] InnoDB: Using liburing
|
var/2/log/mysqld.1.err:2021-10-07 5:42:46 0 [Note] InnoDB: Using liburing
|
var/3/log/mysqld.1.err:2021-10-07 5:42:46 0 [Note] InnoDB: Using liburing
|
var/4/log/mysqld.1.err:2021-10-07 5:42:46 0 [Note] InnoDB: Using liburing
|
Same running kernel:
from 10.6 build |
$ mysql-test/mtr --mem --repeat=20 --parallel=auto stress.ddl_innodb{,,,,,,,,}
|
|
stress.ddl_innodb 'innodb' w4 [ pass ] 333060
|
stress.ddl_innodb 'innodb' w2 [ pass ] 335619
|
stress.ddl_innodb 'innodb' w3 [ fail ]
|
Test ended at 2021-10-07 17:06:44
|
|
Version: '10.6.5-MariaDB-log' socket: '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var/tmp/3/mysqld.1.sock' port: 16040 Source distribution
|
2021-10-07 17:06:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
|
211007 17:06:43 [ERROR] mysqld got signal 6 ;
|
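As an aside on the mtr invocations quoted here: the {,,,,,,,,} suffix is plain shell brace expansion. Eight commas give nine empty alternatives, so the test name is passed nine times, once per worker. A quick way to see this (the test name here is just an example string):

```shell
# Brace expansion repeats the word once per alternative:
# 8 commas = 9 empty alternatives, hence 9 copies of the test name.
bash -c 'set -- stress.ddl_innodb{,,,,,,,,}; echo $#'   # → 9
```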
podman run -v testvardir:/vartest:Z quay.io/danielgblack/mariadb-test:uring --rm --privileged=true --vardir=/vartest |
|
$ podman run -v testvardir:/vartest:Z --rm --privileged=true --entrypoint=bash mariadb_test:uring
|
root@92e943fbdf74:/usr/share/mysql/mysql-test# chown mysql: /vartest
|
|
|
root@54f4f12adac4:/vartest# mount | grep var
|
/dev/mapper/fedora_localhost--live-home on /vartest type ext4 (rw,nosuid,nodev,relatime,seclabel)
|
|
..
|
$ podman run -v testvardir:/vartest:Z quay.io/danielgblack/mariadb-test:uring --rm --privileged=true --vardir=/vartest
|
Logging: ./mtr --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb --vardir=/vartest
|
vardir: /vartest
|
Checking leftover processes...
|
Removing old var directory...
|
Couldn't remove directory '/vartest': Permission denied at /usr/share/perl/5.32/File/Find.pm line 511.
|
Creating var directory '/vartest'...
|
Checking supported features...
|
MariaDB Version 10.6.5-MariaDB-1:10.6.0
|
- SSL connections supported
|
- binaries built with wsrep patch
|
Collecting tests...
|
Installing system database...
|
|
==============================================================================
|
|
TEST WORKER RESULT TIME (ms) or COMMENT
|
--------------------------------------------------------------------------
|
|
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
|
worker[2] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
|
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
|
worker[1] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
|
stress.ddl_innodb 'innodb' w3 [ pass ] 294106
|
stress.ddl_innodb 'innodb' w2 [ pass ] 295536
|
stress.ddl_innodb 'innodb' w1 [ pass ] 305118
|
stress.ddl_innodb 'innodb' w4 [ pass ] 306769
|
--------------------------------------------------------------------------
|
The servers were restarted 0 times
|
Spent 1201.529 of 323 seconds executing testcases
|
|
Completed: All 4 tests were successful.
|
|
So I'm a little at a loss to explain the container behavior difference at the moment.
This has not been fixed in the linux-image-5.14.0-3-amd64 package of Debian Sid (unstable), even though the changelog mentions a potentially related change:
- io_uring: allow conditional reschedule for intensive iterators
|
This kernel identifies itself in /proc/version as follows:
Linux version 5.14.0-3-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-11) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.12-1 (2021-10-14)
|
Here is an excerpt from the thread apply all backtrace output of GDB, which I had attached to one of the hung mariadbd processes while running mtr. One of them was running the test stress.ddl_innodb.
Thread 27 (Thread 0x7faad3fff640 (LWP 55250) "iou-wrk-50544"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 26 (Thread 0x7faaf77ff640 (LWP 53108) "iou-wrk-50083"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 25 (Thread 0x7faafc155640 (LWP 51470) "iou-wrk-50949"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 24 (Thread 0x7faafc10a640 (LWP 51399) "iou-wrk-50950"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
Impish tests:
Based on buildbot failures, 5.13.0-19-generic appears faulty.
My testing on a local VM shows linux-image-generic 5.13.0.20.31 appears to be passing.
Also tested the following mainline kernels without error:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15-rc6/
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/
tests included: |
dan@impish:/usr/share/mysql/mysql-test$ ./mtr --parallel=auto --vardir=/tmp/var --repeat=10 main.innodb_ext_key{,,,,,,,,,,,,,,,}
|
./mtr --vardir=/tmp/var --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
|
./mtr --vardir=/tmp/var --parallel=4 encryption.innochecksum{,,,,,}
|
Using build-43438 from hasky (10.6.5).
Also tried the Debian kernel (running on Ubuntu):
Preparing to unpack linux-image-5.14.0-3-amd64_5.14.12-1_amd64.deb ...
Unpacking linux-image-5.14.0-3-amd64 (5.14.12-1) .
dan@impish:/usr/share/mysql/mysql-test$ ./mtr --vardir=/tmp/var --parallel=4 encryption.innochecksum{,,,,,}
|
....
|
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ] 11666
|
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w2 [ pass ] 7296
|
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w3 [ pass ] 6679
|
--------------------------------------------------------------------------
|
The servers were restarted 14 times
|
Spent 637.134 of 196 seconds executing testcases
|
|
Completed: All 72 tests were successful.
|
|
dan@impish:/usr/share/mysql/mysql-test$ uname -a
|
Linux impish 5.14.0-3-amd64 #1 SMP Debian 5.14.12-1 (2021-10-14) x86_64 x86_64 x86_64 GNU/Linux
|
|
dan@impish:/usr/share/mysql/mysql-test$ ./mtr --vardir=/tmp/var --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
|
Logging: ./mtr --vardir=/tmp/var --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
|
vardir: /tmp/var
|
Checking leftover processes...
|
Removing old var directory...
|
Creating var directory '/tmp/var'...
|
Checking supported features...
|
MariaDB Version 10.6.5-MariaDB-1:10.6.5+maria~impish
|
- SSL connections supported
|
- binaries built with wsrep patch
|
Collecting tests...
|
Installing system database...
|
|
==============================================================================
|
|
TEST WORKER RESULT TIME (ms) or COMMENT
|
--------------------------------------------------------------------------
|
|
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
|
worker[4] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
|
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
|
worker[2] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
|
stress.ddl_innodb 'innodb' w4 [ pass ] 240353
|
stress.ddl_innodb 'innodb' w1 [ pass ] 250223
|
stress.ddl_innodb 'innodb' w2 [ fail ]
|
Test ended at 2021-10-21 09:08:46
|
|
CURRENT_TEST: stress.ddl_innodb
|
|
2021-10-21 9:08:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
|
211021 9:08:43 [ERROR] mysqld got signal 6 ;
|
(This failure mode was known, but we have now explicitly attributed it to the kernel.)
Retested the 5.14.14 mainline and impish 5.13.0-20 kernels; both are failing. Reported the bug upstream per the link.
Retested 5.15-rc6, and it still appears good.
Don’t try this at home like I did, but I installed the 5.15-rc6 kernel image and modules from https://kernel.ubuntu.com/~kernel-ppa/mainline/ on my Debian Sid system. I had to convert the Ubuntu .deb packages first, recompressing the zstd archive members with gzip so that Debian’s dpkg could unpack them:
for i in linux-*5.15*.deb
do ar x "$i" &&
   zstd -dc < data.tar.zst | gzip >| data.tar.gz &&
   zstd -dc < control.tar.zst | gzip >| control.tar.gz &&
   ar q ubuntu-"$i" debian-binary control.tar.gz data.tar.gz
done
|
sudo dpkg -i ubuntu-linux-*5.15*.deb
sudo ln -s perf_5.14 /usr/bin/perf_5.15
With this kernel, the problem appears to be fixed. Even perf appears to work with the above hack.
My joy based on testing with /dev/shm was premature. A Sysbench run on NVMe (using O_DIRECT and ext4fs) hung after a while:
[ 100s ] thds: 16 tps: 132882.45 qps: 132882.45 (r/w/o: 0.00/132882.45/0.00) lat (ms,99%): 0.21 err/s: 0.00 reconn/s: 0.00
|
[ 105s ] thds: 16 tps: 130555.92 qps: 130555.92 (r/w/o: 0.00/130555.92/0.00) lat (ms,99%): 0.23 err/s: 0.00 reconn/s: 0.00
|
[ 110s ] thds: 16 tps: 93427.95 qps: 93427.95 (r/w/o: 0.00/93427.95/0.00) lat (ms,99%): 0.29 err/s: 0.00 reconn/s: 0.00
|
[ 115s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 120s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
According to sudo perf top -p $(pgrep mariadbd), most activity takes place in the kernel address space. I attached GDB to the apparently hung process and got lots of threads like this:
(gdb) thr a a bt
|
|
Thread 42 (Thread 0x7fb793fff640 (LWP 528551) "iou-wrk-527007"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 41 (Thread 0x7fb8cabaf640 (LWP 528540) "iou-wrk-526936"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 40 (Thread 0x7fb9dc0a0640 (LWP 528524) "iou-wrk-526959"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 39 (Thread 0x7fb8c0555640 (LWP 528521) "iou-wrk-527463"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 38 (Thread 0x7fb8c04bf640 (LWP 528513) "iou-wrk-527465"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 37 (Thread 0x7fb8c07f8640 (LWP 528512) "iou-wrk-527454"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 36 (Thread 0x7fb8c050a640 (LWP 528509) "iou-wrk-527464"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 35 (Thread 0x7fb8c0636640 (LWP 528506) "iou-wrk-527460"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 34 (Thread 0x7fb8c07ad640 (LWP 528502) "iou-wrk-527455"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 33 (Thread 0x7fb8c05a0640 (LWP 528500) "iou-wrk-527462"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 32 (Thread 0x7fb9dc055640 (LWP 528436) "iou-wrk-527452"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 31 (Thread 0x7fb8c8090640 (LWP 528433) "iou-wrk-527453"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 30 (Thread 0x7fb8c06cc640 (LWP 528409) "iou-wrk-527458"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 29 (Thread 0x7fb8c0474640 (LWP 528391) "iou-wrk-527466"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 28 (Thread 0x7fb8c0762640 (LWP 528364) "iou-wrk-527456"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 27 (Thread 0x7fb8c05eb640 (LWP 528323) "iou-wrk-527461"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 26 (Thread 0x7fb8c0717640 (LWP 528185) "iou-wrk-527457"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 25 (Thread 0x7fb8c0681640 (LWP 527848) "iou-wrk-527459"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
I will reboot into the Debian 5.10 kernel, which worked for me in the past.
With this latest kernel of the stable Debian release, I got no hangs during some extensive Sysbench runs:
Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-5 (2021-09-23)
|
One of the runs was 20 minutes of oltp_update_index using 8 tables of 2 million rows each. The data size grew from about 4GiB to 33GiB due to purge lag. I had intentionally configured an unreasonably small redo log, to ensure frequent checkpoints and page writes:
innodb_buffer_pool_size=4G
|
innodb_log_file_size=256M
|
After the setup and the 20 minutes of workload, the filesystem was tortured for a further 30 minutes by a shutdown with innodb_fast_shutdown=0:
2021-10-22 10:12:10 0 [Note] InnoDB: Buffer pool(s) load completed at 211022 10:12:10
|
2021-10-22 10:32:10 0 [Note] /dev/shm/10.6/sql/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
|
2021-10-22 10:32:10 0 [Note] InnoDB: FTS optimize thread exiting.
|
2021-10-22 10:32:10 0 [Note] InnoDB: to purge 90296033 transactions
|
…
|
2021-10-22 11:01:40 0 [Note] InnoDB: to purge 277733 transactions
|
2021-10-22 11:01:43 0 [Note] InnoDB: Starting shutdown...
|
…
|
2021-10-22 11:01:49 0 [Note] InnoDB: Shutdown completed; log sequence number 38731820479; transaction id 213562752
|
2021-10-22 11:01:49 0 [Note] /dev/shm/10.6/sql/mysqld: Shutdown complete
|
I think that this confirms my earlier claim that the Debian linux-image-5.10.0-8-amd64 is unaffected by the bug.
While I currently hope that the problem is in the Linux kernel and not in our use of the io_uring interface, I think that we should implement a simple work-around to avoid hangs until the root cause of the problem has been diagnosed and fixed.
The simple work-around would be as follows: If we are running on a ‘too new’ kernel (anything newer than 5.10 or possibly 5.11) and the executable has been configured with io_uring rather than libaio, we will fall back to innodb_use_native_aio=off.
To work around this problem, we will set innodb_use_native_aio=OFF by default if the executable is configured for io_uring and the Linux kernel version is between 5.11 and 5.15. If the parameter is explicitly set, a warning message will be issued:
10.6 1193a793c40b806c6f1f007bbd87f4d9a73e686d |
2021-10-25 13:08:10 0 [Warning] innodb_use_native_aio may cause hangs with this kernel 5.14.0-3-amd64; see https://jira.mariadb.org/browse/MDEV-26674
|
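For illustration only (not the actual server patch; the function name may_hang and the exact 5.11..5.15 bounds are assumptions based on the description above), the version gate amounts to a check like this on the uname -r release string:

```shell
# Hypothetical sketch: report whether a kernel release string falls in the
# suspect 5.11..5.15 range where the innodb_use_native_aio=OFF fallback
# described above would apply.
may_hang() {
  case "$1" in
    5.1[1-5]|5.1[1-5].*|5.1[1-5]-*) echo 1 ;;
    *) echo 0 ;;
  esac
}
may_hang 5.14.0-3-amd64   # → 1 (affected)
may_hang 5.10.0-8-amd64   # → 0 (too old to be affected)
may_hang 5.16.0           # → 0 (assumed fixed)
```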
I updated my Debian Sid kernel to 5.15-rc7 from https://kernel.ubuntu.com/~kernel-ppa/mainline/ and unfortunately the hang is still there, albeit a little harder to reproduce. When I attach GDB during a hang, I see far fewer iou-wrk- threads, so presumably things have improved. Due to the Jira size limit, I will trim the stack traces of the threads that are waiting for a buffer page latch while executing UPDATE:
10.6 58fe6b47d4e8580e370a094d8f5768d053aa52c1 |
Thread 26 (Thread 0x7f30c89be640 (LWP 42170) "iou-wrk-39600"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 25 (Thread 0x7f25f6ffd640 (LWP 40227) "mysqld"):
|
…
|
#7 std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (__rtime=..., __rtime=..., __lock=..., this=0x55c7cd5e4600) at /usr/include/c++/11/condition_variable:163
|
#8 tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c7cd5d64e0, lk=..., thread_data=thread_data@entry=0x55c7cd5e4600) at /mariadb/10.6/tpool/tpool_generic.cc:449
|
#9 0x000055c7cac69be0 in tpool::thread_pool_generic::get_task (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4600, t=0x7f25f6ffccc8) at /mariadb/10.6/tpool/tpool_generic.cc:502
|
#10 0x000055c7cac6ad5d in tpool::thread_pool_generic::worker_main (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4600) at /mariadb/10.6/tpool/tpool_generic.cc:547
|
#11 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
|
#12 0x00007f30cba0feae in start_thread (arg=0x7f25f6ffd640) at pthread_create.c:463
|
#13 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 24 (Thread 0x7f2572ffd640 (LWP 40012) "iou-wrk-39671"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 23 (Thread 0x7f268c30c640 (LWP 39721) "mysqld"):
|
…
|
#7 mtr_t::page_lock (this=this@entry=0x7f268c30a0d0, block=block@entry=0x7f307d8094c0, rw_latch=rw_latch@entry=1) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:1175
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24c0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 22 (Thread 0x7f268c357640 (LWP 39720) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24cc000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 21 (Thread 0x7f268c3a2640 (LWP 39719) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24c8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 20 (Thread 0x7f268c3ed640 (LWP 39718) "mysqld"):
|
…
|
#17 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24d4000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 19 (Thread 0x7f268c438640 (LWP 39717) "mysqld"):
|
…
|
#16 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24d0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 18 (Thread 0x7f268c483640 (LWP 39716) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24dc000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 17 (Thread 0x7f268c4ce640 (LWP 39715) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24d8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 16 (Thread 0x7f26a004b640 (LWP 39714) "mysqld"):
|
…
|
#17 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24f4000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 15 (Thread 0x7f26a0096640 (LWP 39713) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24e4000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 14 (Thread 0x7f26a00e1640 (LWP 39712) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24e0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 13 (Thread 0x7f26a012c640 (LWP 39711) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24ec000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 12 (Thread 0x7f26a0177640 (LWP 39710) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24e8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 11 (Thread 0x7f26a01c2640 (LWP 39709) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24f0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 10 (Thread 0x7f26a020d640 (LWP 39708) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24fc000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 9 (Thread 0x7f30c805b640 (LWP 39707) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24f8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 8 (Thread 0x7f30c80a6640 (LWP 39687) "mysqld"):
|
…
|
#19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f2504000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
|
…
|
|
Thread 7 (Thread 0x7f2572ffd640 (LWP 39671) "mysqld"):
|
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7f2572ffcc20, clockid=1929366416, expected=0, futex_word=0x55c7cd5e40a8) at ../sysdeps/nptl/futex-internal.h:323
|
#1 __pthread_cond_wait_common (abstime=0x7f2572ffcc20, clockid=1929366416, mutex=0x55c7cd5d65f0, cond=0x55c7cd5e4080) at pthread_cond_wait.c:520
|
#2 __pthread_cond_clockwait (abstime=0x7f2572ffcc20, clockid=1929366416, mutex=0x55c7cd5d65f0, cond=0x55c7cd5e4080) at pthread_cond_wait.c:677
|
#3 __pthread_cond_clockwait (cond=0x55c7cd5e4080, mutex=0x55c7cd5d65f0, clockid=1929366416, abstime=0x7f2572ffcc20) at pthread_cond_wait.c:665
|
#4 0x000055c7cac69697 in std::__condvar::wait_until (__abs_time=..., __clock=1, __m=..., this=0x55c7cd5e4080) at /usr/include/c++/11/bits/std_mutex.h:123
|
#5 std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., __lock=..., __atime=..., this=0x55c7cd5e4080) at /usr/include/c++/11/condition_variable:201
|
#6 std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x55c7cd5e4080) at /usr/include/c++/11/condition_variable:111
|
#7 std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (__rtime=..., __rtime=..., __lock=..., this=0x55c7cd5e4080) at /usr/include/c++/11/condition_variable:163
|
#8 tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c7cd5d64e0, lk=..., thread_data=thread_data@entry=0x55c7cd5e4080) at /mariadb/10.6/tpool/tpool_generic.cc:449
|
#9 0x000055c7cac69be0 in tpool::thread_pool_generic::get_task (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4080, t=0x7f2572ffccc8) at /mariadb/10.6/tpool/tpool_generic.cc:502
|
#10 0x000055c7cac6ad5d in tpool::thread_pool_generic::worker_main (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4080) at /mariadb/10.6/tpool/tpool_generic.cc:547
|
#11 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
|
#12 0x00007f30cba0feae in start_thread (arg=0x7f2572ffd640) at pthread_create.c:463
|
#13 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 6 (Thread 0x7f30c80f1640 (LWP 39606) "mysqld"):
|
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55c7cb4d15a8 <COND_manager+40>) at ../sysdeps/nptl/futex-internal.h:186
|
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c7cb4d15c0 <LOCK_manager>, cond=0x55c7cb4d1580 <COND_manager>) at pthread_cond_wait.c:508
|
#2 __pthread_cond_wait (cond=0x55c7cb4d1580 <COND_manager>, mutex=0x55c7cb4d15c0 <LOCK_manager>) at pthread_cond_wait.c:638
|
#3 0x000055c7ca69279c in inline_mysql_cond_wait (mutex=0x55c7cb4d15c0 <LOCK_manager>, that=0x55c7cb4d1580 <COND_manager>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1072
|
#4 handle_manager (arg=<optimized out>) at /mariadb/10.6/sql/sql_manager.cc:103
|
#5 0x00007f30cba0feae in start_thread (arg=0x7f30c80f1640) at pthread_create.c:463
|
#6 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 5 (Thread 0x7f30c813c640 (LWP 39605) "mysqld"):
|
#0 0x00007f30cb50cbd2 in __GI___sigtimedwait (set=0x7f30c813bbc0, info=0x7f30c813bc40, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:36
|
#1 0x000055c7ca5dc863 in my_sigwait (code=<synthetic pointer>, sig=0x7f30c813bbbc, set=0x7f30c813bbc0) at /mariadb/10.6/include/my_pthread.h:195
|
#2 signal_hand (arg=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:3113
|
#3 0x00007f30cba0feae in start_thread (arg=0x7f30c813c640) at pthread_create.c:463
|
#4 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 4 (Thread 0x7f30c89be640 (LWP 39600) "mysqld"):
|
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55c7cbd4e3a8 <buf_dblwr+104>) at ../sysdeps/nptl/futex-internal.h:186
|
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c7cbd4e350 <buf_dblwr+16>, cond=0x55c7cbd4e380 <buf_dblwr+64>) at pthread_cond_wait.c:508
|
#2 __pthread_cond_wait (cond=0x55c7cbd4e380 <buf_dblwr+64>, mutex=0x55c7cbd4e350 <buf_dblwr+16>) at pthread_cond_wait.c:638
|
#3 0x000055c7cabf223c in buf_dblwr_t::flush_buffered_writes (this=0x55c7cbd4e340 <buf_dblwr>, size=64) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:562
|
#4 0x000055c7cabf27bb in buf_dblwr_t::add_to_batch (this=0x55c7cbd4e340 <buf_dblwr>, request=..., size=16384) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:756
|
#5 0x000055c7cabf69fc in buf_flush_page (bpage=bpage@entry=0x7f302d7a8140, lru=lru@entry=false, space=space@entry=0x55c7cd6831f0) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:932
|
#6 0x000055c7cabf7c74 in buf_do_flush_list_batch (lsn=19812648754, max_n=90000) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1452
|
#7 buf_flush_list (lsn=19812648754, max_n=90000) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1527
|
#8 buf_flush_list (max_n=90000, lsn=19812648754) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1507
|
#9 0x000055c7cabf83f6 in buf_flush_sync_for_checkpoint (lsn=19812648754) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1922
|
#10 buf_flush_page_cleaner () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2191
|
#11 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
|
#12 0x00007f30cba0feae in start_thread (arg=0x7f30c89be640) at pthread_create.c:463
|
#13 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 3 (Thread 0x7f30c9423640 (LWP 39598) "mysqld"):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00007f30cbf76cb0 in ?? () from /usr/lib/x86_64-linux-gnu/liburing.so.1
|
#2 0x00007f30cbf76963 in __io_uring_get_cqe () from /usr/lib/x86_64-linux-gnu/liburing.so.1
|
#3 0x000055c7cac6c8c4 in io_uring_wait_cqe_nr (wait_nr=1, cqe_ptr=0x7f30c9422cb8, ring=0x55c7cd61d448) at /usr/include/liburing.h:494
|
#4 io_uring_wait_cqe (cqe_ptr=0x7f30c9422cb8, ring=0x55c7cd61d448) at /usr/include/liburing.h:514
|
#5 (anonymous namespace)::aio_uring::thread_routine (aio=0x55c7cd61d440) at /mariadb/10.6/tpool/aio_liburing.cc:137
|
#6 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
|
#7 0x00007f30cba0feae in start_thread (arg=0x7f30c9423640) at pthread_create.c:463
|
#8 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 2 (Thread 0x7f30cc075640 (LWP 39597) "mysqld"):
|
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7f30cc074cd0, clockid=-871936960, expected=0, futex_word=0x55c7cbd584ac <COND_timer+44>) at ../sysdeps/nptl/futex-internal.h:323
|
#1 __pthread_cond_wait_common (abstime=0x7f30cc074cd0, clockid=-871936960, mutex=0x55c7cbd584c0 <LOCK_timer>, cond=0x55c7cbd58480 <COND_timer>) at pthread_cond_wait.c:520
|
#2 __pthread_cond_timedwait (cond=cond@entry=0x55c7cbd58480 <COND_timer>, mutex=mutex@entry=0x55c7cbd584c0 <LOCK_timer>, abstime=abstime@entry=0x7f30cc074cd0) at pthread_cond_wait.c:656
|
#3 0x000055c7cacc1b51 in inline_mysql_cond_timedwait (abstime=0x7f30cc074cd0, mutex=0x55c7cbd584c0 <LOCK_timer>, that=0x55c7cbd58480 <COND_timer>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1088
|
#4 timer_handler (arg=<optimized out>) at /mariadb/10.6/mysys/thr_timer.c:321
|
#5 0x00007f30cba0feae in start_thread (arg=0x7f30cc075640) at pthread_create.c:463
|
#6 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thread 1 (Thread 0x7f30cb1f0f40 (LWP 39594) "mysqld"):
|
#0 0x00007f30cb5c4543 in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffb81911c0) at ../sysdeps/unix/sysv/linux/select.c:41
|
#1 0x000055c7cacbce8f in my_sleep (m_seconds=<optimized out>) at /mariadb/10.6/mysys/my_sleep.c:29
|
#2 0x000055c7ca5de6ca in close_connections () at /mariadb/10.6/sql/mysqld.cc:1758
|
#3 0x000055c7ca5e5994 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:5842
|
#4 0x00007f30cb4f6e4a in __libc_start_main (main=0x55c7ca5aa810 <main(int, char**)>, argc=32, argv=0x7fffb8191aa8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb8191a98) at ../csu/libc-start.c:314
|
#5 0x000055c7ca5dc0aa in _start () at /mariadb/10.6/sql/my_decimal.h:147
|
I think that previously with 5.15-rc6, I got it to hang already right before the end of the 2-minute warmup. Now it survived that and the subsequent innodb_fast_shutdown=0 shutdown (which took about 3 minutes). It hung after 7 minutes into the 20-minute run:
[ 400s ] thds: 16 tps: 132745.91 qps: 132745.91 (r/w/o: 0.00/132745.91/0.00) lat (ms,99%): 0.21 err/s: 0.00 reconn/s: 0.00
|
[ 405s ] thds: 16 tps: 52828.95 qps: 52828.95 (r/w/o: 0.00/52828.95/0.00) lat (ms,99%): 0.22 err/s: 0.00 reconn/s: 0.00
|
[ 410s ] thds: 16 tps: 114629.30 qps: 114629.30 (r/w/o: 0.00/114629.30/0.00) lat (ms,99%): 0.26 err/s: 0.00 reconn/s: 0.00
|
[ 415s ] thds: 16 tps: 132201.16 qps: 132201.16 (r/w/o: 0.00/132201.16/0.00) lat (ms,99%): 0.22 err/s: 0.00 reconn/s: 0.00
|
[ 420s ] thds: 16 tps: 68733.43 qps: 68733.43 (r/w/o: 0.00/68733.43/0.00) lat (ms,99%): 0.23 err/s: 0.00 reconn/s: 0.00
|
[ 425s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 430s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 435s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 440s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 445s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 450s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 455s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 460s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 465s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 470s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 475s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
[ 480s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
|
This was observed on ext4 file system on fast NVMe storage, using innodb_flush_log_at_trx_commit=0 and innodb_log_file_size=256M (yes, tiny to force very frequent checkpoints, a.k.a. MDEV-23855 scenario) and innodb_buffer_pool_size=40G. At the time of the hang, the data directory had grown to 18GiB. The payload is only about 4GiB (8 tables, 2M rows each), but the system tablespace would grow due to purge lag caused by the oltp_update_index test.
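The InnoDB settings mentioned above can be collected in one place; this is a hypothetical my.cnf sketch reflecting the stated values, not the exact file used in the test:

```ini
# Hypothetical my.cnf fragment matching the test configuration described above
[mysqld]
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=256M      # tiny on purpose, to force very frequent checkpoints
innodb_buffer_pool_size=40G
```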
For what it is worth, had 5.15-rc7 and 5.14.14 worked for me, I would have revised my previous tweak as follows:
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 72db9df0346..83c163adf23 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -19416,9 +19416,15 @@ static bool innodb_use_native_aio_default()
 #ifdef HAVE_URING
   utsname &u= uname_for_io_uring;
   if (!uname(&u) && u.release[0] == '5' && u.release[1] == '.' &&
-      u.release[2] == '1' && u.release[3] > '0' && u.release[3] < '6')
+      u.release[2] == '1' && u.release[3] > '0' && u.release[3] < '5' &&
+      u.release[4] == '.')
   {
-    io_uring_may_be_unsafe= u.release;
+    /* io_uring() in some Linux kernels 5.11 to 5.14 may hang. */
+    if (u.release[3] == '4')
+      if (const char *s= strstr(u.version, "5.14."))
+        if ((s[5] > '1' && s[6] >= '0') || (s[5] == '1' && s[6] >= '4'))
+          return true; /* 5.14.14 and later should be fine */
+    io_uring_may_be_unsafe= u.version;
     return false; /* working around io_uring hangs (MDEV-26674) */
   }
 #endif
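The strstr() probe in the tweak above is needed because, on a distribution kernel, the uname release field (e.g. "5.14.0-4-amd64") hides the stable point release, which only shows up in the version field (e.g. "#1 SMP Debian 5.14.16-1 ..."). A minimal standalone sketch of that probe, with an assumed function name and signature (this is not the MariaDB code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does the uname version string report a 5.14
   stable point release of at least min_patch?  Mirrors the idea of the
   strstr(u.version, "5.14.") check in the patch above. */
static bool linux_5_14_at_least(const char *uname_version, int min_patch)
{
  const char *s = strstr(uname_version, "5.14.");
  /* atoi() stops at the first non-digit, so "16-1 (2021-11-03)" yields 16 */
  return s != NULL && atoi(s + 5) >= min_patch;
}
```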
Because of the failure of 5.15-rc7, I did not test 5.14.14.
The Linux kernel 5.14.16 was recently packaged for Debian Sid (unstable):
Linux version 5.14.0-4-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-12) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.16-1 (2021-11-03)
|
With this kernel, I did not observe any hang during a few Sysbench runs with a total run time between 15 and 30 minutes. The page write rate was about 200 megabytes per second throughout the run. Yes, log checkpoints were very frequent due to tiny innodb_log_file_size, so that io_uring would be invoked more often. (We only invoke it for page writes, not log writes yet.)
I see that 5.14.17 and 5.15.1 were released yesterday. The next step should be to ensure that this hang is not repeatable with 5.15.0, and then update MariaDB so that we would only set innodb_use_native_aio=OFF by default on 5.11, 5.12, 5.13, and 5.14.0 through 5.14.15.
5.15.1 still hung in about 5 seconds when using 64 concurrent client connections and not using numactl to restrict MariaDB from using both CPU sockets (NUMA nodes).
In the mariadbd process I got lots of threads like this:
Thread 83 (Thread 0x7f2452ffd640 (LWP 16478) "iou-wrk-15476"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 82 (Thread 0x7f2563fff640 (LWP 16467) "iou-wrk-16299"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 81 (Thread 0x7f2562ffd640 (LWP 16466) "iou-wrk-16307"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 80 (Thread 0x7f2563fff640 (LWP 16465) "iou-wrk-16299"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 79 (Thread 0x7f2452ffd640 (LWP 16463) "iou-wrk-15476"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 78 (Thread 0x7f251affd640 (LWP 16461) "iou-wrk-15591"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 77 (Thread 0x7f25a8714640 (LWP 16456) "iou-wrk-15384"):
|
#0 0x0000000000000000 in ?? ()
|
I am not sure if I used numactl yesterday to bind the process to 1 CPU, on 5.14.16. I will reboot to that kernel and try to repeat with the exact same test.
With Mariabench-MDEV-26674-io_uring I see no hang during the total test time of 25 minutes. If I change innodb-use-native-aio=0 to 1 in that script, then the same executable will hang within 5 seconds of starting the workload on the 5.14.16 kernel. Again, there was at least 1 iou-wrk- thread with the strange stack trace.
On a repeated attempt, the Linux kernel 5.14.16 hung already during sysbench prepare:
Creating table 'sbtest1'...
|
Inserting 2000000 records into 'sbtest1'
|
Creating a secondary index on 'sbtest1'...
|
Creating table 'sbtest2'...
|
Inserting 2000000 records into 'sbtest2'
|
Creating a secondary index on 'sbtest2'...
|
Creating table 'sbtest3'...
|
Inserting 2000000 records into 'sbtest3'
|
Creating a secondary index on 'sbtest3'...
|
Creating table 'sbtest4'...
|
Inserting 2000000 records into 'sbtest4'
|
I stopped debugging here:
(gdb) thr a a bt
|
|
Thread 27 (Thread 0x7faad786f640 (LWP 28519) "iou-wrk-28359"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
Good news: I compiled a mainline kernel tree
commit 5833291ab6de9c3e2374336b51c814e515e8f3a5 (origin/master, origin/HEAD)
|
Merge: ca2ef2d9f2aa e0217c5ba10d
|
Author: Linus Torvalds
|
Date: Thu Nov 11 15:10:18 2021 -0800
|
|
Merge tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
|
with this patch by Jens Axboe, and I was unable to reproduce any hangs during some 2 hours of testing. I ensured with sudo perf top that some io_uring functions were executed during the test.
With Debian’s linux-image-5.15.0-1-amd64 that is based on the upstream Linux kernel version 5.15.3, my test script passed twice.
/proc/version |
Linux version 5.15.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.3-1 (2021-11-18)
|
During the slow shutdown, the NVMe was constantly writing between 520 and 580 MB/s via ext4fs.
2021-11-23 13:44:48 0 [Note] /dev/shm/10.6/sql/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
|
2021-11-23 13:44:48 0 [Note] InnoDB: FTS optimize thread exiting.
|
2021-11-23 13:44:48 0 [Note] InnoDB: to purge 48555142 transactions
|
…
|
2021-11-23 14:04:48 0 [Note] InnoDB: to purge 1532842 transactions
|
2021-11-23 14:05:03 0 [Note] InnoDB: to purge 930142 transactions
|
2021-11-23 14:05:18 0 [Note] InnoDB: to purge 496042 transactions
|
2021-11-23 14:05:33 0 [Note] InnoDB: to purge 120442 transactions
|
2021-11-23 14:05:37 0 [Note] InnoDB: Starting shutdown...
|
2021-11-23 14:05:37 0 [Note] InnoDB: Dumping buffer pool(s) to /mariadb/sbtest/ib_buffer_pool
|
2021-11-23 14:05:37 0 [Note] InnoDB: Buffer pool(s) dump completed at 211123 14:05:37
|
2021-11-23 14:05:39 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
|
2021-11-23 14:05:39 0 [Note] InnoDB: Shutdown completed; log sequence number 19549445557; transaction id 104928440
|
2021-11-23 14:05:39 0 [Note] /dev/shm/10.6/sql/mysqld: Shutdown complete
|
I am raising the priority to Blocker, so that we will not forget to adjust our work-around (changing the default value of innodb_use_native_aio) before the next releases. What we already know is that the bug has been fixed in Linux 5.15.3, and we could already adjust the code accordingly. danblack reproduced a hang with 5.14.20.
Ubuntu kernel bug reports are being tracked per https://lore.kernel.org/linux-block/96d6241f-7bf0-cefe-947e-ee03d83fb828@samba.org/T/#mb0d0ce280dc0e39bf7e78b3fb07b305672104dc5
Final commit to allow 5.15.3+ added to 10.6
danblack, thank you. For reference, I am quoting the linked discussion:
5.14 is end-of-life and not getting any more releases (the front page of kernel.org should show that.)
In the discussion it was confirmed that 5.11 and 5.13 (which are maintained in Ubuntu but not listed as supported by https://www.kernel.org upstream) are affected and that fixes would have to be ported there.
Should someone want to use io_uring on a Linux kernel between 5.11 and 5.14, they can do that by specifying innodb_use_native_aio=1 in the configuration.
The work-around in MariaDB Server 10.6.5 is only changing the default value from 1 to 0 if the kernel version is between 5.11 and 5.15, and the revised work-around in MariaDB Server 10.6.6 is restricted to kernel versions between 5.11.0 and 5.15.2.
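Concretely, the opt-in described above is a one-line server setting; a hypothetical configuration fragment:

```ini
# Hypothetical fragment: force io_uring back on, overriding the
# blocklisted-kernel default of innodb_use_native_aio=0
[mysqld]
innodb_use_native_aio=1
```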
danblack, I am sorry to inform you, but I can reproduce io_uring related hangs on the above mentioned 5.15.3 kernel when running
./mtr --suite=galera
|
(single-threaded) on my system in a 10.6-based branch.
To provide something reproducible, I built git checkout mariadb-10.6.5 (the latest release) and started a test on the Debian linux-image-5.15.0-1-amd64 on my NUMA box (dual Intel Xeon E5-2630):
/proc/version |
Linux version 5.15.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.3-1 (2021-11-18)
|
sudo apt install galera-4
|
./mtr --mysqld=--loose-innodb-use-native-aio=1 --suite=galera
|
However, this would cause test failures, because the mariadb-10.6.5 release "banned" all kernels 5.11 through 5.15:
mariadb-10.6.5 |
2021-11-26 8:40:48 0 [Warning] innodb_use_native_aio may cause hangs with this kernel 5.15.0-1-amd64; see https://jira.mariadb.org/browse/MDEV-26674
|
^ Found warnings in /dev/shm/10.6/mysql-test/var/log/mysqld.1.err
|
To avoid recompiling any server code, we may tweak the test harness:
diff --git a/mysql-test/include/mtr_warnings.sql b/mysql-test/include/mtr_warnings.sql
index 0584bb934e2..d807f4e1c12 100644
--- a/mysql-test/include/mtr_warnings.sql
+++ b/mysql-test/include/mtr_warnings.sql
@@ -96,6 +96,7 @@ INSERT INTO global_suppressions VALUES
 ("Event Scheduler"),
 ("Failed to open log"),
 ("Failed to open the existing master info file"),
+("innodb_use_native_aio may cause hangs"),
 ("Forcing shutdown of [0-9]* plugins"),

 /*
When not running on a blocklisted kernel, the steps are simpler:
sudo apt install galera-4
|
./mtr --suite=galera
|
Unfortunately (and possibly necessarily for reproducing the hang), these tests will run the test client on only one CPU core, and the server processes will be idle most of the time. The full run should take at least 20 minutes if it succeeds. Each individual test will typically complete in 2 to 25 seconds, but the test galera.galera_restart_on_unknown_option will consume 75 seconds. You can obtain the current test name by running
pgrep mariadbd|while read;do grep ^CURRENT_TEST: /proc/"$REPLY"/fd/1|tail -1;done
|
Yesterday, I reproduced the hang using a 10.6.5-based branch, both with Debug and RelWithDebInfo builds; the latter did not seem to run any faster than Debug.
While the test is running, you can check in top whether any mariadbd processes are actually consuming CPU. Typically, 2 processes should be active. When the processes disappear from top, run pgrep mariadbd and attach GDB to one of the reported processes. If you see the following type of output, I would assume that it is the kernel bug.
It seems that I finally got lucky and got a hang during the execution of galera.galera_truncate. On the first process reported by pgrep mariadbd, GDB shows several threads like this:
mariadb-10.6.5 |
Thread 18 (Thread 0x7f6d084bf640 (LWP 458667) "iou-wrk-457320"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 17 (Thread 0x7f6d08474640 (LWP 458645) "iou-wrk-458511"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 16 (Thread 0x7f6d0850a640 (LWP 458553) "iou-wrk-456971"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
In the second process, I saw no such calls.
The gnome-terminal title (anything xterm-compatible should work) includes some progress reporting. Below I am including some of the test harness output:
mtr: spent 0:13:21 on 247 tests. 0:06:45 (125 tests) left, 0 failed |
galera.galera_suspend_slave 'innodb' [ pass ] 25399
|
galera.galera_sync_wait_show 'innodb' [ pass ] 2039
|
galera.galera_toi_alter_auto_increment 'innodb' [ pass ] 4065
|
galera.galera_toi_ddl_locking 'innodb' [ pass ] 6046
|
galera.galera_toi_ddl_nonconflicting 'innodb' [ pass ] 2039
|
galera.galera_toi_ddl_sequential 'innodb' [ pass ] 2032
|
galera.galera_toi_drop_database 'innodb' [ pass ] 16646
|
galera.galera_toi_ftwrl 'innodb' [ pass ] 2029
|
galera.galera_toi_lock_exclusive 'innodb' [ pass ] 3029
|
galera.galera_toi_lock_shared 'innodb' [ pass ] 2030
|
galera.galera_toi_truncate 'innodb' [ skipped ] Test requires Galera debug library with debug_sync functionality
|
galera.galera_transaction_read_only 'innodb' [ pass ] 2036
|
galera.galera_transaction_replay 'innodb' [ skipped ] Test requires Galera debug library with debug_sync functionality
|
After I killed one of the mariadbd processes after the test had been hung for several minutes, I got:
mtr: spent 0:20:54 on 248 tests. 0:10:27 (124 tests) left, 1 failed |
galera.galera_truncate 'innodb' [ fail ]
|
Test ended at 2021-11-26 09:05:48
|
|
CURRENT_TEST: galera.galera_truncate
|
mysqltest: At line 19: query 'TRUNCATE TABLE t1' failed: <Unknown> (2013): Lost connection to server during query
|
followed by an excerpt of the server error log.
I also tried the following, but it is not a promising strategy:
./mtr --mysqld=--loose-innodb-use-native-aio=1 --repeat=100 galera.galera_truncate
|
mtr: spent 0:00:08 on 1 tests. 0:00:00 (0 tests) left, 0 failed |
galera.galera_truncate 'innodb' [ pass ] 2073
|
galera.galera_truncate 'innodb' [ 2 pass ] 2069
|
…
|
galera.galera_truncate 'innodb' [ 44 pass ] 2070
|
galera.galera_truncate 'innodb' [ 45 pass ] 2068
|
^Cmysql-test-run: *** ERROR: Got ^C signal
|
The following logs prove that the 5.10 kernel is also affected:
http://buildbot.askmonty.org/buildbot/builders/kvm-deb-bullseye-amd64/builds/418/steps/mtr-galera/logs/kernel
[ 0.000000] Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)
|
galera.galera_truncate_temporary 'innodb' w2 [ fail ] timeout after 900 seconds
|
Test ended at 2021-11-29 07:48:26
|
|
Test case timeout after 900 seconds
|
…
|
Thread 13 (Thread 0x7f3875492700 (LWP 14483)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00007f38828b3cb0 in ?? () from /lib/x86_64-linux-gnu/liburing.so.1
|
#2 0x00007f38828b3963 in __io_uring_get_cqe () from /lib/x86_64-linux-gnu/liburing.so.1
|
#3 0x000055ae066dbd0a in io_uring_wait_cqe_nr (wait_nr=1, cqe_ptr=0x7f3875491d98, ring=0x55ae091d9668) at /usr/include/liburing.h:494
|
#4 io_uring_wait_cqe (cqe_ptr=0x7f3875491d98, ring=0x55ae091d9668) at /usr/include/liburing.h:514
|
#5 (anonymous namespace)::aio_uring::thread_routine (aio=0x55ae091d9660) at ./tpool/aio_liburing.cc:137
|
#6 0x00007f38822dbed0 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
|
#7 0x00007f38823e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
|
#8 0x00007f3881fffdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
Among the stack traces, you can see several threads that are waiting for a page latch, because a write completion callback was apparently never called.
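A simplified sketch of the handshake visible in those stack traces (the names below are assumptions for illustration, not MariaDB code): a flushing thread blocks on a condition variable until the asynchronous-write completion callback signals it. If the kernel never delivers the completion, the callback never runs and the wait below blocks forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool batch_done = false;

static void write_completion_callback(void)   /* run by the AIO layer */
{
  pthread_mutex_lock(&mtx);
  batch_done = true;
  pthread_cond_signal(&cond);
  pthread_mutex_unlock(&mtx);
}

static void wait_for_batch(void)              /* run by the flushing thread */
{
  pthread_mutex_lock(&mtx);
  while (!batch_done)                         /* never wakes up if the  */
    pthread_cond_wait(&cond, &mtx);           /* completion is lost     */
  pthread_mutex_unlock(&mtx);
}

static void *completion_thread(void *arg)     /* demonstration helper */
{
  (void) arg;
  write_completion_callback();
  return NULL;
}
```

When the completion does arrive, the waiter wakes up normally; the hangs reported here correspond to the case where it never does.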
Maybe we should disable io_uring also on the 5.10 kernel by default? I wonder if 5.4 is safe. As far as I remember, Ubuntu 20.10 has it but does not include a liburing package. So, we might not have any io_uring tests running on a 5.4 kernel in our CI systems.
Thankfully Ubuntu 20.10 is out of support. And containers are still focal based, so there is no io_uring support there for the arbitrary kernels they run on. So on the distribution side, I think we're safe (for supported versions), but we still need a 5.15.x correction of the AIO defaults (so we may as well include 5.10 at the same time) and the kernel 5.16 fix in the sid/fedora/other bleeding-edge distros before our next release.
I have been trying to reproduce the hang all day, mostly on the 5.15.5 kernel, but for a short while also on 5.15.3, but I have failed. The failure that I got on the 5.15.3 kernel was not with MariaDB 10.6.5, and the test suite included additional tests.
It is very unfortunate that mtr may by design silently SIGKILL hung servers. In the error log for the test galera.galera_restart_on_unknown_option, I did not see any server shutdown messages at all, not even at the end of the test.
For some unrelated debugging, I have been applying the following patch.
diff --git a/mysql-test/lib/My/SafeProcess/safe_process.cc b/mysql-test/lib/My/SafeProcess/safe_process.cc
index 4d0d1e2a3a0..abc167a4300 100644
--- a/mysql-test/lib/My/SafeProcess/safe_process.cc
+++ b/mysql-test/lib/My/SafeProcess/safe_process.cc
@@ -144,7 +144,7 @@ static int kill_child(bool was_killed)
   message("Killing child: %d", child_pid);
   // Terminate whole process group
   if (! was_killed)
-    kill(-child_pid, SIGKILL);
+    kill(-child_pid, SIGABRT);

   pid_t ret_pid= waitpid(child_pid, &status, 0);
   if (ret_pid == child_pid)
The patch will make at least several rpl tests noisy, because some tests apparently like to clean up processes by SIGKILL, instead of initiating and waiting for a proper shutdown. I have not run galera tests with that.
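The patched kill_child() relies on two facts: kill() with a negative pid delivers the signal to the whole process group, and SIGABRT (unlike SIGKILL) lets each server dump core and run its fatal-signal handler, which is where the extra noise comes from. A sketch with a hypothetical helper name (not MariaDB code):

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: abort a child's whole process group and report
   the signal that terminated the child (0 if it exited normally). */
static int abort_process_group(pid_t child_pid)
{
  int status;
  kill(-child_pid, SIGABRT);            /* negative pid = process group */
  if (waitpid(child_pid, &status, 0) != child_pid)
    return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```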
This hang seems to be very sporadic. Just today, it looks like I got it again in the test encryption.debug_key_management while not actively looking for it.
Thread 21 (Thread 0x7fdc297fa640 (LWP 1066815) "iou-wrk-1064648"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 20 (Thread 0x7fdc4ce26640 (LWP 1066353) "iou-wrk-1064424"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 19 (Thread 0x7fdc4ce26640 (LWP 1064736) "iou-wrk-1064424"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 18 (Thread 0x7fdc45ffb640 (LWP 1064711) "iou-wrk-1064437"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 17 (Thread 0x7fdc467fc640 (LWP 1064670) "iou-wrk-1064436"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 16 (Thread 0x7fdc457fa640 (LWP 1064668) "iou-wrk-1064438"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 15 (Thread 0x7fdc45ffb640 (LWP 1064666) "iou-wrk-1064437"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
This is the 5.15.5 kernel that failed to reproduce anything yesterday:
/proc/version |
Linux version 5.15.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.5-1 (2021-11-26)
|
The server is not completely hanging; the test encryption.debug_key_management is polling for data page rewrite. That progress would be blocked by the hanging page writes.
I attached and detached GDB several times to the same process during several minutes, and those strange threads remained there.
Eventually, the above mentioned test would time out like this:
encryption.debug_key_management 'innodb' w27 [ fail ] timeout after 900 seconds
|
Test ended at 2021-12-01 14:51:34
|
|
Test case timeout after 900 seconds
|
I just ran 17×100 rounds of the test on mariadb-10.6.5, without success. Same story with 1×100 rounds (without --parallel). Each iteration of the test would complete in about half a second. I suspect that sufficient other concurrent load is needed to reproduce the hang. I will now keep trying with a debug build and
while ./mtr --parallel=auto --suite=encryption; do :; done
|
I suspect that the remaining sporadic hangs (mostly observed on persistent storage, not RAM disk) are due to InnoDB itself. I was confused by the dummy stack frames like this:
Thread 92 (Thread 0x7f55997fa640 (LWP 3003034) "iou-wrk-3002573"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 91 (Thread 0x7f5680ff9640 (LWP 3002588) "iou-wrk-2998748"):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
or like this when diagnosing a core dump:
Thread 8 (LWP 2772238):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 7 (LWP 2771665):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
|
Thread 6 (LWP 2770649):
|
#0 0x0000000000000000 in ?? ()
|
Backtrace stopped: Cannot access memory at address 0x0
|
I was able to repeat a shutdown hang with the following invocation on NVMe (512-byte physical blocks) on a 10.8-based tree:
./mtr --parallel=auto --repeat=10 innodb.undo_truncate_recover,4k{,,,,,,,,,,,,,,}
|
The hang would repeat also in a cmake -DCMAKE_DISABLE_FIND_PACKAGE_URING=1 build (when invoking pwrite() in simulated_aio threads). I will post another update after I have analyzed and fixed this hang, and attempted another io_uring run.
I reported the 10.6 hang of innodb.undo_truncate_recover as MDEV-27414.
On our CI systems, I have noticed checkpoint-related hangs in 10.5+ innodb.innodb-wl5522* tests, mostly on Microsoft Windows (where we do not use a RAM disk). We do not even have an io_uring interface in 10.5. Edit: I reproduced this and filed it as MDEV-27416.
I think that the remaining hard-to-reproduce sporadic hangs that io_uring used to be blamed for are in our own code.
Our revised work-around of the Linux kernel bug will consider that anything between Linux 5.11.0 and 5.15.2 (inclusive) is potentially broken. The initial work-around that appeared in 10.6.5 disabled innodb_use_native_aio by default on any 5.15 kernel.
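The revised gate described above can be sketched in a more readable form than the character-by-character release-string comparison the server actually uses (function name and parsing here are assumptions): kernels in the inclusive range [5.11.0, 5.15.2] get innodb_use_native_aio=OFF by default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: is this kernel release in the range that
   MDEV-26674 considers affected, i.e. 5.11.0 through 5.15.2? */
static bool io_uring_may_hang(const char *release)
{
  int major = 0, minor = 0, patch = 0;
  if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
    return false;                       /* unparsable: do not blocklist */
  long v = 1000000L * major + 1000L * minor + patch;
  return v >= 5011000 && v <= 5015002;
}
```

Note that on a distribution kernel the release field may report "5.14.0-…" even when the stable point release is much newer, which is why the actual work-around also inspects the uname version field.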
I've gotten reports of users that sound similar to this with kernel 5.16, btrfs and io_uring (see https://github.com/NixOS/nixpkgs/issues/160516).
Should I open a separate issue for that?
ajs124, yes, please file a separate ticket. The closest matching open bug is MDEV-27449, reporting corruption on btrfs and Linux 5.15, but it was not stated whether the io_uring interface was being used (instead of libaio).
Repeatable for me as below. I've got an older kernel downloaded for regression testing.
10.6-d95361107c07b6e8257a7a82c41b18af64ab8d89, kernel 5.13.16-200.fc34.x86_64
stress.ddl_innodb 'innodb'
$ mysql-test/mtr --mem --repeat=20 --parallel=auto innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb
innodb_zip.innochecksum_2 '16k,innodb' [ skipped ] Requires debug build
innodb_zip.innochecksum_2 '4k,innodb' [ skipped ] Requires debug build
innodb_zip.innochecksum_2 '8k,innodb' [ skipped ] Requires debug build
gcol.innodb_virtual_basic 'innodb' w4 [ pass ] 870
innodb_gis.kill_server 'innodb' w8 [ pass ] 1610
innodb.innodb-64k 'innodb' w3 [ pass ] 1700
main.index_merge_innodb 'innodb' w6 [ pass ] 2303
innodb.innodb-64k 'innodb' w3 [ 2 pass ] 1645
main.index_merge_innodb 'innodb' w6 [ 2 pass ] 1980
innodb.innodb-64k 'innodb' w3 [ 3 pass ] 1640
main.index_merge_innodb 'innodb' w6 [ 3 pass ] 2009
innodb.innodb-64k 'innodb' w3 [ 4 pass ] 1635
innodb.innodb-64k 'innodb' w3 [ 5 pass ] 1676
innodb.innodb-64k 'innodb' w3 [ 6 pass ] 1816
innodb.innodb-64k 'innodb' w3 [ 7 pass ] 2015
innodb.innodb-64k 'innodb' w3 [ 8 pass ] 1675
innodb.innodb-64k 'innodb' w3 [ 9 pass ] 1657
innodb.innodb-64k 'innodb' w3 [ 10 pass ] 1646
innodb.innodb-64k 'innodb' w3 [ 11 pass ] 1658
innodb.innodb-64k 'innodb' w3 [ 12 pass ] 1662
innodb.innodb-64k 'innodb' w3 [ 13 pass ] 1631
innodb.innodb-64k 'innodb' w3 [ 14 pass ] 1656
innodb.innodb-64k 'innodb' w3 [ 15 pass ] 1662
innodb.innodb-64k 'innodb' w3 [ 16 pass ] 1663
innodb.innodb-64k 'innodb' w3 [ 17 pass ] 1657
innodb.innodb-64k 'innodb' w3 [ 18 pass ] 1633
innodb.innodb-64k 'innodb' w3 [ 19 pass ] 1632
innodb.innodb-64k 'innodb' w3 [ 20 pass ] 1620
main.index_merge_innodb 'innodb' w6 [ 4 pass ] 62158
main.index_merge_innodb 'innodb' w6 [ 5 pass ] 1766
main.index_merge_innodb 'innodb' w6 [ 6 pass ] 1825
main.index_merge_innodb 'innodb' w6 [ 7 pass ] 1761
main.index_merge_innodb 'innodb' w6 [ 8 pass ] 1772
main.index_merge_innodb 'innodb' w6 [ 9 pass ] 1811
main.index_merge_innodb 'innodb' w6 [ 10 pass ] 1827
main.index_merge_innodb 'innodb' w6 [ 11 pass ] 1769
stress.ddl_innodb 'innodb' w1 [ fail ]
Test ended at 2021-09-27 13:02:39
CURRENT_TEST: stress.ddl_innodb
mysqltest: In included file "./suite/stress/include/ddl1.inc":
included from /home/dan/repos/mariadb-server-10.6/mysql-test/suite/stress/t/ddl_innodb.test at line 37:
At line 201: query 'EXECUTE create_table' failed: <Unknown> (2013): Lost connection to server during query
Core was generated by `/home/dan/repos/build-mariadb-server-10.6/sql/mariadbd --defaults-group-suffix='.
Program terminated with signal SIGABRT, Aborted.
....
Thread 2 (Thread 0x7fc1933fd640 (LWP 125168)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1 0x00007fc19dc5ca9f in __sys_io_uring_enter2 (fd=8, to_submit=0, min_complete=1, flags=1, sig=0x0, sz=8) at syscall.c:64
No locals.
#2 0x00007fc19dc5c02c in _io_uring_get_cqe (ring=0x346e898, cqe_ptr=0x7fc1933fcc48, data=0x7fc1933fcbf8) at queue.c:122
flags = 1
nr_available = 0
ret = 0
need_enter = true
cq_overflow_flush = false
cqe = 0x0
err = 0
#3 0x00007fc19dc5befd in __io_uring_get_cqe (ring=0x346e898, cqe_ptr=0x7fc1933fcc48, submit=0, wait_nr=1, sigmask=0x0) at queue.c:150
data = {submit = 0, wait_nr = 1, get_flags = 0, sz = 8, arg = 0x0}
#4 0x0000000000d456c2 in io_uring_wait_cqe_nr (ring=0x346e898, cqe_ptr=0x7fc1933fcc48, wait_nr=1) at /usr/local/include/liburing.h:745
No locals.
#5 io_uring_wait_cqe (ring=0x346e898, cqe_ptr=0x7fc1933fcc48) at /usr/local/include/liburing.h:765
No locals.
#6 (anonymous namespace)::aio_uring::thread_routine (aio=0x346e890) at /home/dan/repos/mariadb-server-10.6/tpool/aio_liburing.cc:137
ret = <optimized out>
iocb = <optimized out>
res = <optimized out>
cqe = 0x7fc19dd37210
#7 0x00007fc19d4ddc84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#8 0x00007fc19d635299 in start_thread (arg=0x7fc1933fd640) at pthread_create.c:481
ret = <optimized out>
pd = 0x7fc1933fd640
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140469375850048, -4352454039086231479, 140724256649310, 140724256649311, 0, 140469375850048, 4331297784445341769, 4331267510241136713}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = 0
#9 0x00007fc19d31a353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
Thread 1 (Thread 0x7fc1637fe640 (LWP 125448)):
#0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
pd = <optimized out>
tid = 125448
pid = 125138
val = 0
#1 0x00000000008f8086 in handle_fatal_signal (sig=6) at /home/dan/repos/mariadb-server-10.6/sql/signal_handler.cc:345
tm = {tm_sec = 35, tm_min = 2, tm_hour = 13, tm_mday = 27, tm_mon = 8, tm_year = 121, tm_wday = 1, tm_yday = 269, tm_isdst = 0, tm_gmtoff = 36000, tm_zone = 0x3226d00 "AEST"}
print_invalid_query_pointer = <optimized out>
curr_time = 1632711755
thd = 0x0
#2 <signal handler called>
No locals.
#3 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
set = {__val = {552967, 0, 0, 34359738369, 140468574738416, 140469552401123, 18034086, 0, 0, 34359738369, 140468574738464, 140469552400814, 1, 4294967296, 54978712, 4349946008}}
pid = <optimized out>
tid = <optimized out>
ret = <optimized out>
#4 0x00007fc19d2408a4 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x7fc1637fda08, sa_sigaction = 0x7fc1637fda08}, sa_mask = {__val = {140469544983184, 140467576806272, 0, 14, 158, 18033935, 158, 140468574739008, 11587585731724782336, 0, 18446744073709551272, 11, 140469375847038, 140469375847039, 54610080, 140468574738968}}, sa_flags = -1658165304, sa_restorer = 0x7fc1637fd910}
sigs = {__val = {32, 23344272, 140469375847038, 140469375847039, 140468574738880, 18014776, 140468574738688, 10288849, 140468574738928, 10284766, 158, 140467375237072, 158, 289, 140467375237072, 140467375177856}}
#5 0x0000000000c86a0f in ib::fatal::~fatal (this=0x7fc1637fda60) at /home/dan/repos/mariadb-server-10.6/storage/innobase/ut/ut0ut.cc:519
No locals.
#6 0x0000000000c5e4f0 in srv_monitor_task () at /home/dan/repos/mariadb-server-10.6/storage/innobase/srv/srv0srv.cc:1312
waited = <optimized out>
start = <optimized out>
old_lsn = 51229423
new_lsn = <optimized out>
now = <optimized out>
threshold = <optimized out>
#7 0x0000000000d4468c in tpool::thread_pool_generic::timer_generic::run (this=0x35454d0) at /home/dan/repos/mariadb-server-10.6/tpool/tpool_generic.cc:313
expected = false
#8 0x0000000000d44f37 in tpool::task::execute (this=0x3545510) at /home/dan/repos/mariadb-server-10.6/tpool/task.cc:37
No locals.
#9 0x0000000000d42b16 in tpool::thread_pool_generic::worker_main (this=0x3405480, thread_var=0x34148a0) at /home/dan/repos/mariadb-server-10.6/tpool/tpool_generic.cc:549
task = 0x2
#10 0x00007fc19d4ddc84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#11 0x00007fc19d635299 in start_thread (arg=0x7fc1637fe640) at pthread_create.c:481
ret = <optimized out>
pd = 0x7fc1637fe640
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140468574742080, -4352454039086231479, 140469375847038, 140469375847039, 0, 140468574742080, 4330840937900871753, 4331267510241136713}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = 0
#12 0x00007fc19d31a353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.