MariaDB Server / MDEV-26674

io_uring related hangs on the Linux kernel

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 10.6, 10.7 (EOL)
    • Fix Version/s: 10.8.0, 10.6.6, 10.7.2

    Description

      After a kernel upgrade, I started to experience random hangs in InnoDB related to io_uring. Yesterday with an older kernel

      Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)
      

      everything ran fine, but today I got several tests failing or hanging when running the following on an nproc=40 machine:

      nice ./mtr --parallel=60 --big-test --force
      

      Here is an example with the names of hanging tests:

      10.6 d95361107c07b6e8257a7a82c41b18af64ab8d89

      Completed: Failed 7/5108 tests, 99.86% were successful.
       
      Failing test(s): innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb innodb_gis.rtree_debug
      

      I suspect that the culprit is the newer kernel:

      Linux version 5.14.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-10) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.6-2 (2021-09-19)
      

      If I disable io_uring for the MariaDB server build, then everything will work just fine using the libaio interface (io_submit, io_getevents).
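
      For reference, a minimal sketch of how such a build could be configured; the WITH_URING CMake switch is an assumption based on the 10.6 build system and should be checked against the actual CMakeLists:

      # Hedged sketch: configure the build without io_uring so that the server
      # falls back to the libaio code path (io_submit/io_getevents).
      # WITH_URING is assumed to be the relevant CMake option in 10.6.
      cmake /path/to/mariadb-server-10.6 -DWITH_URING=OFF
      make -j"$(nproc)"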

      Here is one example of a hang:

      10.6 d95361107c07b6e8257a7a82c41b18af64ab8d89

      innodb_zip.innochecksum_2 '4k,innodb'    w40 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2021-09-24 14:43:48
      line
      Attempting backtrace. You can use the following information to find out
      Thread 7 (Thread 0x7eff55043640 (LWP 954970)):
      #0  0x00007eff56cb7bd2 in __GI___sigtimedwait (set=0x7eff55042ca8, info=0x7eff55042c20, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:36
      #1  0x000055c55ef1883b in my_sigwait (set=0x7eff55042ca8, sig=0x7eff55042c1c, code=<optimized out>) at /mariadb/10.6/include/my_pthread.h:195
      #2  signal_hand (arg=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:3113
      #3  0x00007eff57075eae in start_thread (arg=0x7eff55043640) at pthread_create.c:463
      #4  0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 6 (LWP 961402):
      #0  0x0000000000000000 in ?? ()
      Backtrace stopped: Cannot access memory at address 0x0
       
      Thread 5 (Thread 0x7eff5638b640 (LWP 954807)):
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x00007eff57721cb0 in ?? () from /usr/lib/x86_64-linux-gnu/liburing.so.1
      #2  0x00007eff57721963 in __io_uring_get_cqe () from /usr/lib/x86_64-linux-gnu/liburing.so.1
      #3  0x000055c55f872361 in io_uring_wait_cqe_nr (ring=0x55c562837c98, cqe_ptr=0x7eff5638ad10, wait_nr=1) at /usr/include/liburing.h:494
      #4  io_uring_wait_cqe (ring=0x55c562837c98, cqe_ptr=0x7eff5638ad10) at /usr/include/liburing.h:514
      #5  (anonymous namespace)::aio_uring::thread_routine (aio=0x55c562837c90) at /mariadb/10.6/tpool/aio_liburing.cc:137
      #6  0x00007eff56f2e8c4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #7  0x00007eff57075eae in start_thread (arg=0x7eff5638b640) at pthread_create.c:463
      #8  0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 4 (LWP 961385):
      #0  0x0000000000000000 in ?? ()
      Backtrace stopped: Cannot access memory at address 0x0
       
      Thread 3 (Thread 0x7eff5781f640 (LWP 954792)):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7eff5781ed10, clockid=1468132416, expected=0, futex_word=0x55c560ae27fc <COND_timer+44>) at ../sysdeps/nptl/futex-internal.h:323
      #1  __pthread_cond_wait_common (abstime=0x7eff5781ed10, clockid=1468132416, mutex=0x55c560ae2748 <LOCK_timer+40>, cond=0x55c560ae27d0 <COND_timer>) at pthread_cond_wait.c:520
      #2  __pthread_cond_timedwait (cond=cond@entry=0x55c560ae27d0 <COND_timer>, mutex=mutex@entry=0x55c560ae2748 <LOCK_timer+40>, abstime=abstime@entry=0x7eff5781ed10) at pthread_cond_wait.c:656
      #3  0x000055c55f8dbacf in safe_cond_timedwait (cond=0x55c560ae27d0 <COND_timer>, mp=0x55c560ae2720 <LOCK_timer>, abstime=abstime@entry=0x7eff5781ed10, file=0x55c55f96aaa9 "/mariadb/10.6/include/mysql/psi/mysql_thread.h", line=line@entry=1088) at /mariadb/10.6/mysys/thr_mutex.c:546
      #4  0x000055c55f8dc26f in inline_mysql_cond_timedwait (that=0x55c560ae27fc <COND_timer+44>, mutex=0x189, abstime=0x7eff5781ed10, src_file=<optimized out>, src_line=<optimized out>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1088
      #5  timer_handler (arg=<optimized out>) at /mariadb/10.6/mysys/thr_timer.c:321
      #6  0x00007eff57075eae in start_thread (arg=0x7eff5781f640) at pthread_create.c:463
      #7  0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 2 (Thread 0x7efec9ffb640 (LWP 954925)):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7efec9ffac78, clockid=-905991216, expected=0, futex_word=0x55c5627dc54c) at ../sysdeps/nptl/futex-internal.h:323
      #1  __pthread_cond_wait_common (abstime=0x7efec9ffac78, clockid=-905991216, mutex=0x55c5627cd990, cond=0x55c5627dc520) at pthread_cond_wait.c:520
      #2  __pthread_cond_clockwait (abstime=0x7efec9ffac78, clockid=-905991216, mutex=0x55c5627cd990, cond=0x55c5627dc520) at pthread_cond_wait.c:677
      #3  __pthread_cond_clockwait (cond=0x55c5627dc520, mutex=0x55c5627cd990, clockid=-905991216, abstime=0x7efec9ffac78) at pthread_cond_wait.c:665
      #4  0x000055c55f86efaf in std::__condvar::wait_until (this=0x55c5627dc520, __m=<error reading variable>, __clock=1, __abs_time=@0x7efec9ffac78: {tv_sec = 24713, tv_nsec = 284075513}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:169
      #5  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __atime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:201
      #6  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __atime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:111
      #7  std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __rtime=@0x55c5627cd9b8: {__r = 60000}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:163
      #8  tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c5627cd880, lk=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, thread_data=thread_data@entry=0x55c5627dc520) at /mariadb/10.6/tpool/tpool_generic.cc:449
      #9  0x000055c55f86f14e in tpool::thread_pool_generic::get_task (this=this@entry=0x55c5627cd880, thread_var=thread_var@entry=0x55c5627dc520, t=t@entry=0x7efec9ffad18) at /mariadb/10.6/tpool/tpool_generic.cc:502
      #10 0x000055c55f86f444 in tpool::thread_pool_generic::worker_main (this=0x55c5627cd880, thread_var=0x55c5627dc520) at /mariadb/10.6/tpool/tpool_generic.cc:547
      #11 0x00007eff56f2e8c4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #12 0x00007eff57075eae in start_thread (arg=0x7efec9ffb640) at pthread_create.c:463
      #13 0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 1 (Thread 0x7eff5699a840 (LWP 954716)):
      #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
      #1  0x000055c55f25a83a in handle_fatal_signal (sig=6) at /mariadb/10.6/sql/signal_handler.cc:345
      #2  <signal handler called>
      #3  0x00007eff57079849 in __pthread_mutex_unlock_usercnt (mutex=mutex@entry=0x55c5601cf528 <buf_pool+40>, decr=decr@entry=0) at pthread_mutex_unlock.c:58
      #4  0x00007eff5707c1a8 in __pthread_cond_wait_common (abstime=0x7fff59490d98, clockid=0, mutex=0x55c5601cf528 <buf_pool+40>, cond=0x55c5601cf5f0 <buf_pool+240>) at pthread_cond_wait.c:421
      #5  __pthread_cond_timedwait (cond=cond@entry=0x55c5601cf5f0 <buf_pool+240>, mutex=mutex@entry=0x55c5601cf528 <buf_pool+40>, abstime=abstime@entry=0x7fff59490d98) at pthread_cond_wait.c:656
      #6  0x000055c55f8dbacf in safe_cond_timedwait (cond=0x55c5601cf5f0 <buf_pool+240>, mp=0x55c5601cf500 <buf_pool>, abstime=0x7fff59490d98, file=0x55c55fbd74a8 "/mariadb/10.6/storage/innobase/buf/buf0flu.cc", line=2435) at /mariadb/10.6/mysys/thr_mutex.c:546
      #7  0x000055c55f796517 in buf_flush_buffer_pool () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2434
      #8  0x000055c55f5a1121 in logs_empty_and_mark_files_at_shutdown () at /mariadb/10.6/storage/innobase/log/log0log.cc:1139
      #9  0x000055c55f6d79a0 in innodb_shutdown () at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1970
      #10 0x000055c55f517318 in innobase_end () at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4273
      #11 0x000055c55f25b48f in ha_finalize_handlerton (plugin=0x55c5627abcb0) at /mariadb/10.6/sql/handler.cc:595
      #12 0x000055c55f00b6df in plugin_deinitialize (plugin=0x55c5627abcb0, ref_check=true) at /mariadb/10.6/sql/sql_plugin.cc:1266
      #13 0x000055c55f009f9a in reap_plugins () at /mariadb/10.6/sql/sql_plugin.cc:1342
      #14 0x000055c55f00bba3 in plugin_shutdown () at /mariadb/10.6/sql/sql_plugin.cc:2050
      #15 0x000055c55ef179ed in clean_up (print_message=true) at /mariadb/10.6/sql/mysqld.cc:1923
      #16 0x000055c55ef196a2 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:5844
      #17 0x00007eff56ca1e4a in __libc_start_main (main=0x55c55ef16a40 <main(int, char**)>, argc=26, argv=0x7fff59491718, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff59491708) at ../csu/libc-start.c:314
      #18 0x000055c55ef1697a in _start () at /mariadb/10.6/sql/my_decimal.h:147
      

      This looks like a shutdown hang, and the problem could be the hung Thread 5. There are also two suspicious threads whose backtraces stop at a null address. I do not remember seeing such stack traces earlier.
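
      To illustrate where Thread 5 is waiting: below is a minimal sketch of an io_uring completion-wait loop, written against liburing's public API. It is an assumed illustration, not the actual code in tpool/aio_liburing.cc; it only shows the kind of loop in which the thread blocks until the kernel posts a completion. If a submitted request never completes, io_uring_wait_cqe() never returns and shutdown cannot proceed.

      /* Illustrative sketch only (assumed structure, not the MariaDB code):
         a thread that waits for io_uring completions and hands them back. */
      #include <liburing.h>
      #include <stdio.h>

      static void completion_loop(struct io_uring *ring)
      {
        for (;;)
        {
          struct io_uring_cqe *cqe;
          /* Blocks until the kernel posts a completion queue entry; this is
             the call in which Thread 5 is stuck in the trace above. */
          int err= io_uring_wait_cqe(ring, &cqe);
          if (err)
          {
            fprintf(stderr, "io_uring_wait_cqe: %d\n", err);
            break;
          }
          void *request= io_uring_cqe_get_data(cqe); /* the submitted request */
          int res= cqe->res;                         /* bytes done or -errno  */
          io_uring_cqe_seen(ring, cqe);              /* release the CQE       */
          /* ... dispatch (request, res) to the I/O completion callback ... */
          (void) request; (void) res;
        }
      }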

          Activity

            danblack Daniel Black added a comment -

            Repeatable for me as below. I've got an older kernel downloaded for regression testing.

            10.6-d95361107c07b6e8257a7a82c41b18af64ab8d89, kernel 5.13.16-200.fc34.x86_64

            $ mysql-test/mtr  --mem --repeat=20 --parallel=auto innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb
             
            innodb_zip.innochecksum_2 '16k,innodb'   [ skipped ]  Requires debug build
            innodb_zip.innochecksum_2 '4k,innodb'    [ skipped ]  Requires debug build
            innodb_zip.innochecksum_2 '8k,innodb'    [ skipped ]  Requires debug build
            gcol.innodb_virtual_basic 'innodb'       w4 [ pass ]    870
            innodb_gis.kill_server 'innodb'          w8 [ pass ]   1610
            innodb.innodb-64k 'innodb'               w3 [ pass ]   1700
            main.index_merge_innodb 'innodb'         w6 [ pass ]   2303
            innodb.innodb-64k 'innodb'               w3 [ 2 pass ]   1645
            main.index_merge_innodb 'innodb'         w6 [ 2 pass ]   1980
            innodb.innodb-64k 'innodb'               w3 [ 3 pass ]   1640
            main.index_merge_innodb 'innodb'         w6 [ 3 pass ]   2009
            innodb.innodb-64k 'innodb'               w3 [ 4 pass ]   1635
            innodb.innodb-64k 'innodb'               w3 [ 5 pass ]   1676
            innodb.innodb-64k 'innodb'               w3 [ 6 pass ]   1816
            innodb.innodb-64k 'innodb'               w3 [ 7 pass ]   2015
            innodb.innodb-64k 'innodb'               w3 [ 8 pass ]   1675
            innodb.innodb-64k 'innodb'               w3 [ 9 pass ]   1657
            innodb.innodb-64k 'innodb'               w3 [ 10 pass ]   1646
            innodb.innodb-64k 'innodb'               w3 [ 11 pass ]   1658
            innodb.innodb-64k 'innodb'               w3 [ 12 pass ]   1662
            innodb.innodb-64k 'innodb'               w3 [ 13 pass ]   1631
            innodb.innodb-64k 'innodb'               w3 [ 14 pass ]   1656
            innodb.innodb-64k 'innodb'               w3 [ 15 pass ]   1662
            innodb.innodb-64k 'innodb'               w3 [ 16 pass ]   1663
            innodb.innodb-64k 'innodb'               w3 [ 17 pass ]   1657
            innodb.innodb-64k 'innodb'               w3 [ 18 pass ]   1633
            innodb.innodb-64k 'innodb'               w3 [ 19 pass ]   1632
            innodb.innodb-64k 'innodb'               w3 [ 20 pass ]   1620
            main.index_merge_innodb 'innodb'         w6 [ 4 pass ]  62158
            main.index_merge_innodb 'innodb'         w6 [ 5 pass ]   1766
            main.index_merge_innodb 'innodb'         w6 [ 6 pass ]   1825
            main.index_merge_innodb 'innodb'         w6 [ 7 pass ]   1761
            main.index_merge_innodb 'innodb'         w6 [ 8 pass ]   1772
            main.index_merge_innodb 'innodb'         w6 [ 9 pass ]   1811
            main.index_merge_innodb 'innodb'         w6 [ 10 pass ]   1827
            main.index_merge_innodb 'innodb'         w6 [ 11 pass ]   1769
            stress.ddl_innodb 'innodb'               w1 [ fail ]
                    Test ended at 2021-09-27 13:02:39
             
            CURRENT_TEST: stress.ddl_innodb
            mysqltest: In included file "./suite/stress/include/ddl1.inc": 
            included from /home/dan/repos/mariadb-server-10.6/mysql-test/suite/stress/t/ddl_innodb.test at line 37:
            At line 201: query 'EXECUTE create_table' failed: <Unknown> (2013): Lost connection to server during query
             
            Core was generated by `/home/dan/repos/build-mariadb-server-10.6/sql/mariadbd --defaults-group-suffix='.
            Program terminated with signal SIGABRT, Aborted.
            ....
            Thread 2 (Thread 0x7fc1933fd640 (LWP 125168)):
            #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            No locals.
            #1  0x00007fc19dc5ca9f in __sys_io_uring_enter2 (fd=8, to_submit=0, min_complete=1, flags=1, sig=0x0, sz=8) at syscall.c:64
            No locals.
            #2  0x00007fc19dc5c02c in _io_uring_get_cqe (ring=0x346e898, cqe_ptr=0x7fc1933fcc48, data=0x7fc1933fcbf8) at queue.c:122
                    flags = 1
                    nr_available = 0
                    ret = 0
                    need_enter = true
                    cq_overflow_flush = false
                    cqe = 0x0
                    err = 0
            #3  0x00007fc19dc5befd in __io_uring_get_cqe (ring=0x346e898, cqe_ptr=0x7fc1933fcc48, submit=0, wait_nr=1, sigmask=0x0) at queue.c:150
                    data = {submit = 0, wait_nr = 1, get_flags = 0, sz = 8, arg = 0x0}
            #4  0x0000000000d456c2 in io_uring_wait_cqe_nr (ring=0x346e898, cqe_ptr=0x7fc1933fcc48, wait_nr=1) at /usr/local/include/liburing.h:745
            No locals.
            #5  io_uring_wait_cqe (ring=0x346e898, cqe_ptr=0x7fc1933fcc48) at /usr/local/include/liburing.h:765
            No locals.
            #6  (anonymous namespace)::aio_uring::thread_routine (aio=0x346e890) at /home/dan/repos/mariadb-server-10.6/tpool/aio_liburing.cc:137
                    ret = <optimized out>
                    iocb = <optimized out>
                    res = <optimized out>
                    cqe = 0x7fc19dd37210
            #7  0x00007fc19d4ddc84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
            No symbol table info available.
            #8  0x00007fc19d635299 in start_thread (arg=0x7fc1933fd640) at pthread_create.c:481
                    ret = <optimized out>
                    pd = 0x7fc1933fd640
                    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140469375850048, -4352454039086231479, 140724256649310, 140724256649311, 0, 140469375850048, 4331297784445341769, 4331267510241136713}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
                    not_first_call = 0
            #9  0x00007fc19d31a353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            No locals.
             
            Thread 1 (Thread 0x7fc1637fe640 (LWP 125448)):
            #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
                    pd = <optimized out>
                    tid = 125448
                    pid = 125138
                    val = 0
            #1  0x00000000008f8086 in handle_fatal_signal (sig=6) at /home/dan/repos/mariadb-server-10.6/sql/signal_handler.cc:345
                    tm = {tm_sec = 35, tm_min = 2, tm_hour = 13, tm_mday = 27, tm_mon = 8, tm_year = 121, tm_wday = 1, tm_yday = 269, tm_isdst = 0, tm_gmtoff = 36000, tm_zone = 0x3226d00 "AEST"}
                    print_invalid_query_pointer = <optimized out>
                    curr_time = 1632711755
                    thd = 0x0
            #2  <signal handler called>
            No locals.
            #3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
                    set = {__val = {552967, 0, 0, 34359738369, 140468574738416, 140469552401123, 18034086, 0, 0, 34359738369, 140468574738464, 140469552400814, 1, 4294967296, 54978712, 4349946008}}
                    pid = <optimized out>
                    tid = <optimized out>
                    ret = <optimized out>
            #4  0x00007fc19d2408a4 in __GI_abort () at abort.c:79
                    save_stage = 1
                    act = {__sigaction_handler = {sa_handler = 0x7fc1637fda08, sa_sigaction = 0x7fc1637fda08}, sa_mask = {__val = {140469544983184, 140467576806272, 0, 14, 158, 18033935, 158, 140468574739008, 11587585731724782336, 0, 18446744073709551272, 11, 140469375847038, 140469375847039, 54610080, 140468574738968}}, sa_flags = -1658165304, sa_restorer = 0x7fc1637fd910}
                    sigs = {__val = {32, 23344272, 140469375847038, 140469375847039, 140468574738880, 18014776, 140468574738688, 10288849, 140468574738928, 10284766, 158, 140467375237072, 158, 289, 140467375237072, 140467375177856}}
            #5  0x0000000000c86a0f in ib::fatal::~fatal (this=0x7fc1637fda60) at /home/dan/repos/mariadb-server-10.6/storage/innobase/ut/ut0ut.cc:519
            No locals.
            #6  0x0000000000c5e4f0 in srv_monitor_task () at /home/dan/repos/mariadb-server-10.6/storage/innobase/srv/srv0srv.cc:1312
                    waited = <optimized out>
                    start = <optimized out>
                    old_lsn = 51229423
                    new_lsn = <optimized out>
                    now = <optimized out>
                    threshold = <optimized out>
            #7  0x0000000000d4468c in tpool::thread_pool_generic::timer_generic::run (this=0x35454d0) at /home/dan/repos/mariadb-server-10.6/tpool/tpool_generic.cc:313
                    expected = false
            #8  0x0000000000d44f37 in tpool::task::execute (this=0x3545510) at /home/dan/repos/mariadb-server-10.6/tpool/task.cc:37
            No locals.
            #9  0x0000000000d42b16 in tpool::thread_pool_generic::worker_main (this=0x3405480, thread_var=0x34148a0) at /home/dan/repos/mariadb-server-10.6/tpool/tpool_generic.cc:549
                    task = 0x2
            #10 0x00007fc19d4ddc84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
            No symbol table info available.
            #11 0x00007fc19d635299 in start_thread (arg=0x7fc1637fe640) at pthread_create.c:481
                    ret = <optimized out>
                    pd = 0x7fc1637fe640
                    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140468574742080, -4352454039086231479, 140469375847038, 140469375847039, 0, 140468574742080, 4330840937900871753, 4331267510241136713}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
                    not_first_call = 0
            #12 0x00007fc19d31a353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            No locals.
            
            


            marko Marko Mäkelä added a comment -

            danblack, thank you. I think that stress.ddl_innodb alone should be a reliable test of the hangs. It always runs for 160 seconds for me. It appears that the intention is to extend the run time by a factor of 5 when --big-test is specified, but that does not seem to work in my environment.

            danblack Daniel Black added a comment (edited) -

            Quite right - 5.10 works correctly. Same with MDEV-26555's main.innodb_ext_key test.

            10.6-d95361107c07b6e8257a7a82c41b18af64ab8d89, kernel 5.10.23-200.fc33.x86_64

            $ mysql-test/mtr --mem --repeat=20 --parallel=auto innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb
            Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl  --mem --repeat=20 --parallel=auto innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb
            vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
            Checking leftover processes...
            Removing old var directory...
            Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
             - symlinking 'var' to '/dev/shm/var_auto_Iu3y'
            Checking supported features...
            MariaDB Version 10.6.5-MariaDB
             - SSL connections supported
             - binaries built with wsrep patch
            Collecting tests...
            Installing system database...
            Using parallel: 6
             
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
            worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
            worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
            worker[6] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
            worker[5] Using MTR_BUILD_THREAD 304, with reserved ports 16080..16099
            innodb_zip.innochecksum_2 'innodb'       [ skipped ]  Requires debug build
            gcol.innodb_virtual_basic 'innodb'       w2 [ pass ]    877
            innodb_gis.kill_server 'innodb'          w5 [ pass ]   1574
            gcol.innodb_virtual_basic 'innodb'       w2 [ 2 pass ]    805
            innodb.innodb-64k 'innodb'               w1 [ pass ]   1713
            main.index_merge_innodb 'innodb'         w4 [ pass ]   2475
            gcol.innodb_virtual_basic 'innodb'       w2 [ 3 pass ]    873
            innodb_gis.kill_server 'innodb'          w5 [ 2 pass ]   1644
            gcol.innodb_virtual_basic 'innodb'       w2 [ 4 pass ]    872
            innodb.innodb-64k 'innodb'               w1 [ 2 pass ]   1704
            gcol.innodb_virtual_basic 'innodb'       w2 [ 5 pass ]    971
            main.index_merge_innodb 'innodb'         w4 [ 2 pass ]   2441
            innodb_gis.kill_server 'innodb'          w5 [ 3 pass ]   1619
            gcol.innodb_virtual_basic 'innodb'       w2 [ 6 pass ]    886
            innodb.innodb-64k 'innodb'               w1 [ 3 pass ]   1711
            gcol.innodb_virtual_basic 'innodb'       w2 [ 7 pass ]    905
            innodb_gis.kill_server 'innodb'          w5 [ 4 pass ]   1663
            main.index_merge_innodb 'innodb'         w4 [ 3 pass ]   2357
            innodb.innodb-64k 'innodb'               w1 [ 4 pass ]   1696
            gcol.innodb_virtual_basic 'innodb'       w2 [ 8 pass ]    924
            gcol.innodb_virtual_basic 'innodb'       w2 [ 9 pass ]    856
            innodb_gis.kill_server 'innodb'          w5 [ 5 pass ]   1786
            innodb.innodb-64k 'innodb'               w1 [ 5 pass ]   1686
            gcol.innodb_virtual_basic 'innodb'       w2 [ 10 pass ]   1107
            main.index_merge_innodb 'innodb'         w4 [ 4 pass ]   2515
            innodb_gis.kill_server 'innodb'          w5 [ 6 pass ]   1727
            gcol.innodb_virtual_basic 'innodb'       w2 [ 11 pass ]    847
            innodb.innodb-64k 'innodb'               w1 [ 6 pass ]   1733
            gcol.innodb_virtual_basic 'innodb'       w2 [ 12 pass ]    980
            innodb_gis.kill_server 'innodb'          w5 [ 7 pass ]   1634
            main.index_merge_innodb 'innodb'         w4 [ 5 pass ]   2506
            gcol.innodb_virtual_basic 'innodb'       w2 [ 13 pass ]    868
            innodb.innodb-64k 'innodb'               w1 [ 7 pass ]   1700
            gcol.innodb_virtual_basic 'innodb'       w2 [ 14 pass ]    900
            innodb_gis.kill_server 'innodb'          w5 [ 8 pass ]   1589
            gcol.innodb_virtual_basic 'innodb'       w2 [ 15 pass ]    840
            innodb.innodb-64k 'innodb'               w1 [ 8 pass ]   1707
            main.index_merge_innodb 'innodb'         w4 [ 6 pass ]   2654
            innodb_gis.kill_server 'innodb'          w5 [ 9 pass ]   1642
            gcol.innodb_virtual_basic 'innodb'       w2 [ 16 pass ]   1090
            innodb.innodb-64k 'innodb'               w1 [ 9 pass ]   1684
            gcol.innodb_virtual_basic 'innodb'       w2 [ 17 pass ]    978
            innodb_gis.kill_server 'innodb'          w5 [ 10 pass ]   1643
            main.index_merge_innodb 'innodb'         w4 [ 7 pass ]   2413
            gcol.innodb_virtual_basic 'innodb'       w2 [ 18 pass ]    840
            innodb.innodb-64k 'innodb'               w1 [ 10 pass ]   1684
            gcol.innodb_virtual_basic 'innodb'       w2 [ 19 pass ]    941
            innodb_gis.kill_server 'innodb'          w5 [ 11 pass ]   1754
            gcol.innodb_virtual_basic 'innodb'       w2 [ 20 pass ]    839
            innodb.innodb-64k 'innodb'               w1 [ 11 pass ]   1681
            main.index_merge_innodb 'innodb'         w4 [ 8 pass ]   2550
            innodb_gis.kill_server 'innodb'          w5 [ 12 pass ]   1661
            innodb.innodb-64k 'innodb'               w1 [ 12 pass ]   1676
            innodb_gis.kill_server 'innodb'          w5 [ 13 pass ]   1482
            main.index_merge_innodb 'innodb'         w4 [ 9 pass ]   2166
            innodb.innodb-64k 'innodb'               w1 [ 13 pass ]   1665
            innodb_gis.kill_server 'innodb'          w5 [ 14 pass ]   1493
            main.index_merge_innodb 'innodb'         w4 [ 10 pass ]   2291
            innodb.innodb-64k 'innodb'               w1 [ 14 pass ]   1662
            innodb_gis.kill_server 'innodb'          w5 [ 15 pass ]   1644
            innodb.innodb-64k 'innodb'               w1 [ 15 pass ]   1689
            innodb_gis.kill_server 'innodb'          w5 [ 16 pass ]   1501
            main.index_merge_innodb 'innodb'         w4 [ 11 pass ]   2330
            innodb.innodb-64k 'innodb'               w1 [ 16 pass ]   1694
            innodb_gis.kill_server 'innodb'          w5 [ 17 pass ]   1521
            main.index_merge_innodb 'innodb'         w4 [ 12 pass ]   2558
            innodb.innodb-64k 'innodb'               w1 [ 17 pass ]   1678
            innodb_gis.kill_server 'innodb'          w5 [ 18 pass ]   1625
            innodb_gis.kill_server 'innodb'          w5 [ 19 pass ]   1714
            innodb.innodb-64k 'innodb'               w1 [ 18 pass ]   1708
            main.index_merge_innodb 'innodb'         w4 [ 13 pass ]   2610
            innodb_gis.kill_server 'innodb'          w5 [ 20 pass ]   1677
            innodb.innodb-64k 'innodb'               w1 [ 19 pass ]   1732
            main.index_merge_innodb 'innodb'         w4 [ 14 pass ]   2524
            innodb.innodb-64k 'innodb'               w1 [ 20 pass ]   1665
            main.index_merge_innodb 'innodb'         w4 [ 15 pass ]   2228
            main.index_merge_innodb 'innodb'         w4 [ 16 pass ]   2142
            main.index_merge_innodb 'innodb'         w4 [ 17 pass ]   2145
            main.index_merge_innodb 'innodb'         w4 [ 18 pass ]   2482
            main.index_merge_innodb 'innodb'         w4 [ 19 pass ]   2258
            main.index_merge_innodb 'innodb'         w4 [ 20 pass ]   2117
            stress.ddl_innodb 'innodb'               w6 [ pass ]  159829
            stress.ddl_innodb 'innodb'               w6 [ 2 pass ]  160236
            stress.ddl_innodb 'innodb'               w6 [ 3 pass ]  160477
            stress.ddl_innodb 'innodb'               w6 [ 4 pass ]  159516
            stress.ddl_innodb 'innodb'               w6 [ 5 pass ]  159392
            stress.ddl_innodb 'innodb'               w6 [ 6 pass ]  159883
            stress.ddl_innodb 'innodb'               w6 [ 7 pass ]  159986
            
            

            As is 5.15-rc2:

            5.15.0-0.rc2.20210923git58e2cf5d7946.21.fc36.x86_64

            $ mysql-test/mtr --mem --repeat=20 --parallel=auto stress.ddl_innodb{,,,,,,,,}
            Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl  --mem --repeat=20 --parallel=auto stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
            vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
            Checking leftover processes...
            Removing old var directory...
            Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
             - symlinking 'var' to '/dev/shm/var_auto_1oiy'
            Checking supported features...
            MariaDB Version 10.6.5-MariaDB
             - SSL connections supported
             - binaries built with wsrep patch
            Collecting tests...
            Installing system database...
            Using parallel: 8
             
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
            worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
            worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
            worker[5] Using MTR_BUILD_THREAD 304, with reserved ports 16080..16099
            worker[7] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
            worker[6] Using MTR_BUILD_THREAD 306, with reserved ports 16120..16139
            worker[8] Using MTR_BUILD_THREAD 307, with reserved ports 16140..16159
            stress.ddl_innodb 'innodb'               w6 [ pass ]  159402
            stress.ddl_innodb 'innodb'               w2 [ pass ]  159780
            stress.ddl_innodb 'innodb'               w8 [ pass ]  159673
            stress.ddl_innodb 'innodb'               w1 [ pass ]  161240
            stress.ddl_innodb 'innodb'               w5 [ pass ]  160823
            stress.ddl_innodb 'innodb'               w4 [ pass ]  160948
            stress.ddl_innodb 'innodb'               w7 [ pass ]  161745
            stress.ddl_innodb 'innodb'               w3 [ pass ]  161834
            stress.ddl_innodb 'innodb'               w6 [ 2 pass ]  161652
            stress.ddl_innodb 'innodb'               w2 [ 2 pass ]  162261
            stress.ddl_innodb 'innodb'               w7 [ 2 pass ]  162398
            stress.ddl_innodb 'innodb'               w1 [ 2 pass ]  162923
            stress.ddl_innodb 'innodb'               w8 [ 2 pass ]  163672
            stress.ddl_innodb 'innodb'               w4 [ 2 pass ]  164979
            stress.ddl_innodb 'innodb'               w5 [ 2 pass ]  165931
            stress.ddl_innodb 'innodb'               w3 [ 2 pass ]  164470
            stress.ddl_innodb 'innodb'               w6 [ 3 pass ]  160845
            stress.ddl_innodb 'innodb'               w2 [ 3 pass ]  161123
            stress.ddl_innodb 'innodb'               w1 [ 3 pass ]  161309
            stress.ddl_innodb 'innodb'               w7 [ 3 pass ]  162090
            stress.ddl_innodb 'innodb'               w5 [ 3 pass ]  159452
            stress.ddl_innodb 'innodb'               w8 [ 3 pass ]  161963
            stress.ddl_innodb 'innodb'               w4 [ 3 pass ]  161161
            stress.ddl_innodb 'innodb'               w3 [ 3 pass ]  163132
            stress.ddl_innodb 'innodb'               w6 [ 4 pass ]  160392
            stress.ddl_innodb 'innodb'               w2 [ 4 pass ]  159988
            stress.ddl_innodb 'innodb'               w7 [ 4 pass ]  160013
            stress.ddl_innodb 'innodb'               w8 [ 4 pass ]  160109
            stress.ddl_innodb 'innodb'               w1 [ 4 pass ]  161107
            stress.ddl_innodb 'innodb'               w4 [ 4 pass ]  161043
            stress.ddl_innodb 'innodb'               w5 [ 4 pass ]  162480
            stress.ddl_innodb 'innodb'               w2 [ 5 pass ]  160167
            stress.ddl_innodb 'innodb'               w6 [ 5 pass ]  161083
            stress.ddl_innodb 'innodb'               w8 [ 5 pass ]  160605
            stress.ddl_innodb 'innodb'               w1 [ 5 pass ]  160305
            stress.ddl_innodb 'innodb'               w7 [ 5 pass ]  162074
            stress.ddl_innodb 'innodb'               w4 [ 5 pass ]  159965
            stress.ddl_innodb 'innodb'               w5 [ 5 pass ]  161298
            stress.ddl_innodb 'innodb'               w6 [ 6 pass ]  159764
            stress.ddl_innodb 'innodb'               w2 [ 6 pass ]  160929
            stress.ddl_innodb 'innodb'               w1 [ 6 pass ]  160120
            stress.ddl_innodb 'innodb'               w7 [ 6 pass ]  159705
            stress.ddl_innodb 'innodb'               w4 [ 6 pass ]  160425
            stress.ddl_innodb 'innodb'               w8 [ 6 pass ]  162438
            stress.ddl_innodb 'innodb'               w5 [ 6 pass ]  160353
            


            marko Marko Mäkelä added a comment -

            A slightly newer Debian Sid kernel seemed to perform somewhat better on today's 10.6, but a number of tests (including stress.ddl_innodb) would still hang. The kernel identifies itself as follows:

            Linux version 5.14.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-11) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.6-3 (2021-09-28)
            


            I tested with Ubuntu Hirsute Hippo, Linux 5.11.0-37-generic, and 10.6 (I verified with ldd that liburing 0.7-3ubuntu1 was linked). I could not reproduce any hangs:

            jan@jan-HP-ZBook-15u-G5:~/mysql/10.6/mysql-test$ ./mtr --mem --repeat=20 --parallel=auto stress.ddl_innodb{,,,,,,,,}
            Logging: ./mtr  --mem --repeat=20 --parallel=auto stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
            vardir: /home/jan/mysql/10.6/mysql-test/var
            Removing old var directory...
            Creating var directory '/home/jan/mysql/10.6/mysql-test/var'...
             - symlinking 'var' to '/dev/shm/var_auto_PGur'
            Checking supported features...
            MariaDB Version 10.6.5-MariaDB
             - SSL connections supported
             - binaries built with wsrep patch
            Collecting tests...
            Installing system database...
            Using parallel: 8
            worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
             
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
            worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
            worker[5] Using MTR_BUILD_THREAD 304, with reserved ports 16080..16099
            worker[6] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
            worker[7] Using MTR_BUILD_THREAD 306, with reserved ports 16120..16139
            worker[8] Using MTR_BUILD_THREAD 307, with reserved ports 16140..16159
            stress.ddl_innodb 'innodb'               w7 [ pass ]  159881
            stress.ddl_innodb 'innodb'               w4 [ pass ]  160407
            stress.ddl_innodb 'innodb'               w2 [ pass ]  160677
            stress.ddl_innodb 'innodb'               w8 [ pass ]  160427
            stress.ddl_innodb 'innodb'               w3 [ pass ]  160237
            stress.ddl_innodb 'innodb'               w1 [ pass ]  160600
            stress.ddl_innodb 'innodb'               w6 [ pass ]  161037
            stress.ddl_innodb 'innodb'               w5 [ pass ]  161692
            stress.ddl_innodb 'innodb'               w7 [ 2 pass ]  159778
            stress.ddl_innodb 'innodb'               w3 [ 2 pass ]  159928
            stress.ddl_innodb 'innodb'               w2 [ 2 pass ]  160536
            stress.ddl_innodb 'innodb'               w4 [ 2 pass ]  160463
            stress.ddl_innodb 'innodb'               w1 [ 2 pass ]  160456
            stress.ddl_innodb 'innodb'               w6 [ 2 pass ]  159589
            stress.ddl_innodb 'innodb'               w8 [ 2 pass ]  160849
            stress.ddl_innodb 'innodb'               w5 [ 2 pass ]  160403
            stress.ddl_innodb 'innodb'               w7 [ 3 pass ]  159267
            stress.ddl_innodb 'innodb'               w2 [ 3 pass ]  159626
            stress.ddl_innodb 'innodb'               w3 [ 3 pass ]  160433
            stress.ddl_innodb 'innodb'               w1 [ 3 pass ]  160003
            stress.ddl_innodb 'innodb'               w6 [ 3 pass ]  160023
            stress.ddl_innodb 'innodb'               w4 [ 3 pass ]  160553
            stress.ddl_innodb 'innodb'               w8 [ 3 pass ]  160566
            stress.ddl_innodb 'innodb'               w5 [ 3 pass ]  160179
            stress.ddl_innodb 'innodb'               w7 [ 4 pass ]  160268
            stress.ddl_innodb 'innodb'               w4 [ 4 pass ]  159403
            stress.ddl_innodb 'innodb'               w6 [ 4 pass ]  159874
            stress.ddl_innodb 'innodb'               w3 [ 4 pass ]  160124
            stress.ddl_innodb 'innodb'               w1 [ 4 pass ]  160159
            stress.ddl_innodb 'innodb'               w2 [ 4 pass ]  160950
            stress.ddl_innodb 'innodb'               w5 [ 4 pass ]  159365
            stress.ddl_innodb 'innodb'               w8 [ 4 pass ]  161081
            stress.ddl_innodb 'innodb'               w7 [ 5 pass ]  159959
            stress.ddl_innodb 'innodb'               w4 [ 5 pass ]  160458
            stress.ddl_innodb 'innodb'               w6 [ 5 pass ]  160472
            stress.ddl_innodb 'innodb'               w1 [ 5 pass ]  160225
            stress.ddl_innodb 'innodb'               w5 [ 5 pass ]  159611
            stress.ddl_innodb 'innodb'               w2 [ 5 pass ]  160540
            stress.ddl_innodb 'innodb'               w3 [ 5 pass ]  160703
            stress.ddl_innodb 'innodb'               w8 [ 5 pass ]  160291
            stress.ddl_innodb 'innodb'               w7 [ 6 pass ]  159318
            stress.ddl_innodb 'innodb'               w6 [ 6 pass ]  159939
            stress.ddl_innodb 'innodb'               w5 [ 6 pass ]  159320
            stress.ddl_innodb 'innodb'               w4 [ 6 pass ]  160414
            stress.ddl_innodb 'innodb'               w1 [ 6 pass ]  159962
            stress.ddl_innodb 'innodb'               w3 [ 6 pass ]  159862
            stress.ddl_innodb 'innodb'               w2 [ 6 pass ]  160644
            stress.ddl_innodb 'innodb'               w8 [ 6 pass ]  160513
            stress.ddl_innodb 'innodb'               w7 [ 7 pass ]  161259
            stress.ddl_innodb 'innodb'               w6 [ 7 pass ]  159565
            stress.ddl_innodb 'innodb'               w1 [ 7 pass ]  159960
            stress.ddl_innodb 'innodb'               w5 [ 7 pass ]  160242
            stress.ddl_innodb 'innodb'               w4 [ 7 pass ]  161071
            stress.ddl_innodb 'innodb'               w3 [ 7 pass ]  160279
            stress.ddl_innodb 'innodb'               w2 [ 7 pass ]  160361
            stress.ddl_innodb 'innodb'               w8 [ 7 pass ]  159958
            stress.ddl_innodb 'innodb'               w7 [ 8 pass ]  159839
            stress.ddl_innodb 'innodb'               w6 [ 8 pass ]  159673
            stress.ddl_innodb 'innodb'               w1 [ 8 pass ]  160580
            stress.ddl_innodb 'innodb'               w5 [ 8 pass ]  160134
            stress.ddl_innodb 'innodb'               w4 [ 8 pass ]  159952
            stress.ddl_innodb 'innodb'               w3 [ 8 pass ]  160099
            stress.ddl_innodb 'innodb'               w8 [ 8 pass ]  159512
            stress.ddl_innodb 'innodb'               w2 [ 8 pass ]  161755
            stress.ddl_innodb 'innodb'               w7 [ 9 pass ]  159864
            stress.ddl_innodb 'innodb'               w6 [ 9 pass ]  160150
            stress.ddl_innodb 'innodb'               w4 [ 9 pass ]  159496
            stress.ddl_innodb 'innodb'               w5 [ 9 pass ]  160073
            stress.ddl_innodb 'innodb'               w8 [ 9 pass ]  159168
            stress.ddl_innodb 'innodb'               w1 [ 9 pass ]  160736
            stress.ddl_innodb 'innodb'               w3 [ 9 pass ]  160313
            stress.ddl_innodb 'innodb'               w2 [ 9 pass ]  159952
            stress.ddl_innodb 'innodb'               w4 [ 10 pass ]  160047
            stress.ddl_innodb 'innodb'               w7 [ 10 pass ]  161330
            stress.ddl_innodb 'innodb'               w5 [ 10 pass ]  159767
            stress.ddl_innodb 'innodb'               w8 [ 10 pass ]  160343
            stress.ddl_innodb 'innodb'               w3 [ 10 pass ]  159819
            stress.ddl_innodb 'innodb'               w6 [ 10 pass ]  162003
            stress.ddl_innodb 'innodb'               w1 [ 10 pass ]  161497
            stress.ddl_innodb 'innodb'               w2 [ 10 pass ]  162471
            stress.ddl_innodb 'innodb'               w5 [ 11 pass ]  159816
            stress.ddl_innodb 'innodb'               w4 [ 11 pass ]  160653
            stress.ddl_innodb 'innodb'               w7 [ 11 pass ]  160286
            stress.ddl_innodb 'innodb'               w6 [ 11 pass ]  159677
            stress.ddl_innodb 'innodb'               w3 [ 11 pass ]  160502
            stress.ddl_innodb 'innodb'               w1 [ 11 pass ]  159874
            stress.ddl_innodb 'innodb'               w8 [ 11 pass ]  162143
            stress.ddl_innodb 'innodb'               w2 [ 11 pass ]  161426
            stress.ddl_innodb 'innodb'               w5 [ 12 pass ]  160648
            stress.ddl_innodb 'innodb'               w4 [ 12 pass ]  161110
            stress.ddl_innodb 'innodb'               w7 [ 12 pass ]  161525
            stress.ddl_innodb 'innodb'               w3 [ 12 pass ]  160173
            stress.ddl_innodb 'innodb'               w6 [ 12 pass ]  161809
            stress.ddl_innodb 'innodb'               w8 [ 12 pass ]  160643
            stress.ddl_innodb 'innodb'               w1 [ 12 pass ]  161858
            stress.ddl_innodb 'innodb'               w2 [ 12 pass ]  161846
            stress.ddl_innodb 'innodb'               w5 [ 13 pass ]  160195
            stress.ddl_innodb 'innodb'               w4 [ 13 pass ]  160647
            stress.ddl_innodb 'innodb'               w7 [ 13 pass ]  159628
            stress.ddl_innodb 'innodb'               w8 [ 13 pass ]  159318
            stress.ddl_innodb 'innodb'               w3 [ 13 pass ]  161039
            stress.ddl_innodb 'innodb'               w6 [ 13 pass ]  160146
            stress.ddl_innodb 'innodb'               w1 [ 13 pass ]  162043
            stress.ddl_innodb 'innodb'               w2 [ 13 pass ]  160847
            stress.ddl_innodb 'innodb'               w5 [ 14 pass ]  159926
            stress.ddl_innodb 'innodb'               w4 [ 14 pass ]  159241
            stress.ddl_innodb 'innodb'               w7 [ 14 pass ]  160624
            stress.ddl_innodb 'innodb'               w8 [ 14 pass ]  159922
            stress.ddl_innodb 'innodb'               w3 [ 14 pass ]  160271
            stress.ddl_innodb 'innodb'               w6 [ 14 pass ]  160511
            stress.ddl_innodb 'innodb'               w1 [ 14 pass ]  161051
            stress.ddl_innodb 'innodb'               w2 [ 14 pass ]  160573
            stress.ddl_innodb 'innodb'               w5 [ 15 pass ]  160873
            stress.ddl_innodb 'innodb'               w7 [ 15 pass ]  159317
            stress.ddl_innodb 'innodb'               w4 [ 15 pass ]  161373
            stress.ddl_innodb 'innodb'               w8 [ 15 pass ]  159308
            stress.ddl_innodb 'innodb'               w3 [ 15 pass ]  160378
            stress.ddl_innodb 'innodb'               w6 [ 15 pass ]  160849
            stress.ddl_innodb 'innodb'               w1 [ 15 pass ]  160462
            stress.ddl_innodb 'innodb'               w2 [ 15 pass ]  160818
            stress.ddl_innodb 'innodb'               w7 [ 16 pass ]  159588
            stress.ddl_innodb 'innodb'               w5 [ 16 pass ]  160404
            stress.ddl_innodb 'innodb'               w4 [ 16 pass ]  160226
            stress.ddl_innodb 'innodb'               w8 [ 16 pass ]  160469
            stress.ddl_innodb 'innodb'               w3 [ 16 pass ]  160032
            stress.ddl_innodb 'innodb'               w6 [ 16 pass ]  159791
            stress.ddl_innodb 'innodb'               w1 [ 16 pass ]  159943
            stress.ddl_innodb 'innodb'               w2 [ 16 pass ]  160588
            stress.ddl_innodb 'innodb'               w7 [ 17 pass ]  159251
            stress.ddl_innodb 'innodb'               w5 [ 17 pass ]  160105
            stress.ddl_innodb 'innodb'               w8 [ 17 pass ]  160939
            stress.ddl_innodb 'innodb'               w4 [ 17 pass ]  161903
            stress.ddl_innodb 'innodb'               w3 [ 17 pass ]  160733
            stress.ddl_innodb 'innodb'               w6 [ 17 pass ]  159743
            stress.ddl_innodb 'innodb'               w1 [ 17 pass ]  160614
            stress.ddl_innodb 'innodb'               w2 [ 17 pass ]  161258
            stress.ddl_innodb 'innodb'               w7 [ 18 pass ]  160573
            stress.ddl_innodb 'innodb'               w8 [ 18 pass ]  159535
            stress.ddl_innodb 'innodb'               w5 [ 18 pass ]  160978
            stress.ddl_innodb 'innodb'               w4 [ 18 pass ]  159197
            stress.ddl_innodb 'innodb'               w3 [ 18 pass ]  159562
            stress.ddl_innodb 'innodb'               w6 [ 18 pass ]  161541
            stress.ddl_innodb 'innodb'               w1 [ 18 pass ]  161174
            stress.ddl_innodb 'innodb'               w2 [ 18 pass ]  162119
            stress.ddl_innodb 'innodb'               w7 [ 19 pass ]  161389
            stress.ddl_innodb 'innodb'               w5 [ 19 pass ]  160521
            stress.ddl_innodb 'innodb'               w8 [ 19 pass ]  160843
            stress.ddl_innodb 'innodb'               w4 [ 19 pass ]  160808
            stress.ddl_innodb 'innodb'               w3 [ 19 pass ]  160351
            stress.ddl_innodb 'innodb'               w6 [ 19 pass ]  162188
            stress.ddl_innodb 'innodb'               w1 [ 19 pass ]  160643
            stress.ddl_innodb 'innodb'               w2 [ 19 pass ]  161393
            stress.ddl_innodb 'innodb'               w7 [ 20 pass ]  160334
            stress.ddl_innodb 'innodb'               w5 [ 20 pass ]  159858
            stress.ddl_innodb 'innodb'               w8 [ 20 pass ]  160552
            stress.ddl_innodb 'innodb'               w4 [ 20 pass ]  160110
            stress.ddl_innodb 'innodb'               w3 [ 20 pass ]  160265
            stress.ddl_innodb 'innodb'               w6 [ 20 pass ]  159800
            stress.ddl_innodb 'innodb'               w1 [ 20 pass ]  159459
            stress.ddl_innodb 'innodb'               w2 [ 20 pass ]  159573
            stress.ddl_innodb 'innodb'               w7 [ pass ]  159908
            stress.ddl_innodb 'innodb'               w7 [ 2 pass ]  159908
            stress.ddl_innodb 'innodb'               w7 [ 3 pass ]  159753
            stress.ddl_innodb 'innodb'               w7 [ 4 pass ]  159958
            stress.ddl_innodb 'innodb'               w7 [ 5 pass ]  159503
            stress.ddl_innodb 'innodb'               w7 [ 6 pass ]  159597
            stress.ddl_innodb 'innodb'               w7 [ 7 pass ]  159679
            stress.ddl_innodb 'innodb'               w7 [ 8 pass ]  160126
            stress.ddl_innodb 'innodb'               w7 [ 9 pass ]  159365
            stress.ddl_innodb 'innodb'               w7 [ 10 pass ]  160122
            stress.ddl_innodb 'innodb'               w7 [ 11 pass ]  160423
            stress.ddl_innodb 'innodb'               w7 [ 12 pass ]  159771
            stress.ddl_innodb 'innodb'               w7 [ 13 pass ]  159654
            stress.ddl_innodb 'innodb'               w7 [ 14 pass ]  160061
            stress.ddl_innodb 'innodb'               w7 [ 15 pass ]  159866
            stress.ddl_innodb 'innodb'               w7 [ 16 pass ]  159555
            stress.ddl_innodb 'innodb'               w7 [ 17 pass ]  160426
            stress.ddl_innodb 'innodb'               w7 [ 18 pass ]  159447
            stress.ddl_innodb 'innodb'               w7 [ 19 pass ]  159955
            stress.ddl_innodb 'innodb'               w7 [ 20 pass ]  160084
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 28861.196 of 6420 seconds executing testcases
             
            Completed: All 180 tests were successful.
            

            jplindst Jan Lindström (Inactive) added a comment -

            The Debian Sid (unstable) package linux-image-5.14.0-2-amd64 is still affected by this. According to /proc/version the build is 2 days old:

            Linux version 5.14.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-11) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.9-2 (2021-10-03)
            

            marko Marko Mäkelä added a comment -
            danblack Daniel Black added a comment -

            Linux stable kernel 5.14 appears to have io_uring patches backported (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.14.y). Fedora (https://koji.fedoraproject.org/koji/packageinfo?packageID=8) is following the stable branch. No further action is needed in the server code; maybe something in the release notes. We need to watch Debian Sid if we release a 10.5+ before 5.15 lands there, or Debian users may get the instability they requested.
            danblack Daniel Black added a comment -

            Tested Fedora 5.14.9-200.fc34.x86_64 again. It failed. Judging from https://gitlab.com/cki-project/kernel-ark/-/commits/fedora-5.14/ it is following the Linux 5.14 stable branch. I'll get something to the kernel upstream for their testing.

            danblack Daniel Black added a comment -

            Pushed a test image to quay.io/danielgblack/mariadb-test:uring based on ubuntu:hirsute (and its distro liburing).

            quay.io/danielgblack/mariadb-test:uring, 5.14.9-200.fc34.x86_64

            $ podman run --rm --privileged=true quay.io/danielgblack/mariadb-test:uring
            Logging: ./mtr  --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
            vardir: /usr/share/mysql/mysql-test/var
            Removing old var directory...
            Creating var directory '/usr/share/mysql/mysql-test/var'...
            Checking supported features...
            MariaDB Version 10.6.5-MariaDB-1:10.6.0
             - SSL connections supported
             - binaries built with wsrep patch
            Collecting tests...
            Installing system database...
             
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[4] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            worker[1] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
            worker[2] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
            worker[3] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
            stress.ddl_innodb 'innodb'               w1 [ pass ]  174276
            stress.ddl_innodb 'innodb'               w4 [ pass ]  175766
            stress.ddl_innodb 'innodb'               w3 [ pass ]  176009
            stress.ddl_innodb 'innodb'               w2 [ pass ]  184728
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 710.779 of 201 seconds executing testcases
             
            Completed: All 4 tests were successful.
            

            --privileged is needed for io_uring to be enabled in the container.
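
            If full --privileged is undesirable, it might be enough to relax only the seccomp filter. This is an untested guess (the assumption being that the default seccomp profile is what blocks the io_uring system calls), sketched here only for reference:

            # hypothetical narrower alternative to --privileged; unverified assumption:
            # the default seccomp profile is what prevents io_uring setup in the container
            podman run --rm --security-opt seccomp=unconfined quay.io/danielgblack/mariadb-test:uring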

            Checked logs of running instance:

            $ podman exec -ti  xenodochial_buck bash
            root@a1c4180e2dd9:/usr/share/mysql/mysql-test# grep uring var/*/log/mysqld.*.err
            var/1/log/mysqld.1.err:2021-10-07  5:42:46 0 [Note] InnoDB: Using liburing
            var/2/log/mysqld.1.err:2021-10-07  5:42:46 0 [Note] InnoDB: Using liburing
            var/3/log/mysqld.1.err:2021-10-07  5:42:46 0 [Note] InnoDB: Using liburing
            var/4/log/mysqld.1.err:2021-10-07  5:42:46 0 [Note] InnoDB: Using liburing
            

            Same running kernel:

            from 10.6 build

            $  mysql-test/mtr  --mem  --repeat=20 --parallel=auto stress.ddl_innodb{,,,,,,,,}
             
            stress.ddl_innodb 'innodb'               w4 [ pass ]  333060
            stress.ddl_innodb 'innodb'               w2 [ pass ]  335619
            stress.ddl_innodb 'innodb'               w3 [ fail ]
                    Test ended at 2021-10-07 17:06:44
             
            Version: '10.6.5-MariaDB-log'  socket: '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var/tmp/3/mysqld.1.sock'  port: 16040  Source distribution
            2021-10-07 17:06:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
            211007 17:06:43 [ERROR] mysqld got signal 6 ;
            

            podman run -v testvardir:/vartest:Z quay.io/danielgblack/mariadb-test:uring --rm --privileged=true --vardir=/vartest

             
            $ podman run -v testvardir:/vartest:Z  --rm --privileged=true  --entrypoint=bash mariadb_test:uring
            root@92e943fbdf74:/usr/share/mysql/mysql-test# chown mysql: /vartest
             
             
            root@54f4f12adac4:/vartest# mount | grep var
            /dev/mapper/fedora_localhost--live-home on /vartest type ext4 (rw,nosuid,nodev,relatime,seclabel)
             
            ..
            $ podman run -v testvardir:/vartest:Z quay.io/danielgblack/mariadb-test:uring --rm --privileged=true    --vardir=/vartest
            Logging: ./mtr  --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb --vardir=/vartest
            vardir: /vartest
            Checking leftover processes...
            Removing old var directory...
            Couldn't remove directory '/vartest': Permission denied at /usr/share/perl/5.32/File/Find.pm line 511.
            Creating var directory '/vartest'...
            Checking supported features...
            MariaDB Version 10.6.5-MariaDB-1:10.6.0
             - SSL connections supported
             - binaries built with wsrep patch
            Collecting tests...
            Installing system database...
             
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
            worker[2] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
            worker[1] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
            stress.ddl_innodb 'innodb'               w3 [ pass ]  294106
            stress.ddl_innodb 'innodb'               w2 [ pass ]  295536
            stress.ddl_innodb 'innodb'               w1 [ pass ]  305118
            stress.ddl_innodb 'innodb'               w4 [ pass ]  306769
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 1201.529 of 323 seconds executing testcases
             
            Completed: All 4 tests were successful.
            
            

            So I'm a little at a loss to explain the container behavior difference at the moment.
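
            If someone wants to dig into that difference, one starting point (only a suggestion; none of these were confirmed to matter here, and the paths are placeholders) is to compare the I/O-relevant environment on the host and in the container:

            # compare the environment seen by the server on the host vs. in the container
            uname -r                                        # kernel is the same in both cases
            findmnt -T /path/to/vardir                      # filesystem type and mount options backing the datadir
            ulimit -l                                       # locked-memory limit; io_uring setup can be sensitive to it on some kernels
            grep uring /path/to/vardir/*/log/mysqld.*.err   # whether InnoDB actually used liburing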


            This has not been fixed in the linux-image-5.14.0-3-amd64 package of Debian Sid (unstable), even though the changelog mentions a potentially related change:

                - io_uring: allow conditional reschedule for intensive iterators
            

            This kernel identifies itself in /proc/version as follows:

            Linux version 5.14.0-3-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-11) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.12-1 (2021-10-14)
            

            Here is an excerpt from the thread apply all backtrace output of GDB that I attached to one of the hung mariadbd processes while running mtr. One of them was running the test stress.ddl_innodb. The iou-wrk-* threads are io_uring worker threads created by the kernel; they have no user-space stack, which is why GDB cannot produce a backtrace for them.

            Thread 27 (Thread 0x7faad3fff640 (LWP 55250) "iou-wrk-50544"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 26 (Thread 0x7faaf77ff640 (LWP 53108) "iou-wrk-50083"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 25 (Thread 0x7faafc155640 (LWP 51470) "iou-wrk-50949"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 24 (Thread 0x7faafc10a640 (LWP 51399) "iou-wrk-50950"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            marko Marko Mäkelä added a comment -
            danblack Daniel Black added a comment -

            Impish tests:

            Due to bb failures, 5.13.0-19-generic appears faulty.

            My testing on a local VM shows that linux-image-generic 5.13.0.20.31 appears to be passing.

            I also tested the following without error:
            mainline https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15-rc6/
            mainline https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/

            tests included:

            dan@impish:/usr/share/mysql/mysql-test$ ./mtr --parallel=auto  --vardir=/tmp/var   --repeat=10 main.innodb_ext_key{,,,,,,,,,,,,,,,}
             ./mtr --vardir=/tmp/var   --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb 
             ./mtr --vardir=/tmp/var   --parallel=4 encryption.innochecksum{,,,,,}
            

            using build-43438 from hasky 10.6.5

            danblack Daniel Black added a comment -

            I also tried the Debian kernel (running on Ubuntu):
            Preparing to unpack linux-image-5.14.0-3-amd64_5.14.12-1_amd64.deb ...
            Unpacking linux-image-5.14.0-3-amd64 (5.14.12-1) .

            dan@impish:/usr/share/mysql/mysql-test$ ./mtr --vardir=/tmp/var   --parallel=4 encryption.innochecksum{,,,,,}
            ....
            encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ]  11666
            encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w2 [ pass ]   7296
            encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w3 [ pass ]   6679
            --------------------------------------------------------------------------
            The servers were restarted 14 times
            Spent 637.134 of 196 seconds executing testcases
             
            Completed: All 72 tests were successful.
             
            dan@impish:/usr/share/mysql/mysql-test$ uname -a
            Linux impish 5.14.0-3-amd64 #1 SMP Debian 5.14.12-1 (2021-10-14) x86_64 x86_64 x86_64 GNU/Linux
             
            dan@impish:/usr/share/mysql/mysql-test$ ./mtr --vardir=/tmp/var   --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb 
            Logging: ./mtr  --vardir=/tmp/var --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
            vardir: /tmp/var
            Checking leftover processes...
            Removing old var directory...
            Creating var directory '/tmp/var'...
            Checking supported features...
            MariaDB Version 10.6.5-MariaDB-1:10.6.5+maria~impish
             - SSL connections supported
             - binaries built with wsrep patch
            Collecting tests...
            Installing system database...
             
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            worker[4] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
            worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
            worker[2] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
            stress.ddl_innodb 'innodb'               w4 [ pass ]  240353
            stress.ddl_innodb 'innodb'               w1 [ pass ]  250223
            stress.ddl_innodb 'innodb'               w2 [ fail ]
                    Test ended at 2021-10-21 09:08:46
             
            CURRENT_TEST: stress.ddl_innodb
             
            2021-10-21  9:08:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
            211021  9:08:43 [ERROR] mysqld got signal 6 ;
            

            (which was already known, but we have now explicitly narrowed it down to the kernel)

            danblack Daniel Black added a comment -

            Retested the 5.14.14 mainline and impish 5.13.0-20 kernels; both are failing. An upstream bug has been reported (see the link).

            Retesting 5.15-rc6 and it still appears good.


            Don’t try this at home like I did, but I installed the 5.15-rc6 kernel image and modules from https://kernel.ubuntu.com/~kernel-ppa/mainline/ in my Debian Sid system. I had to convert the .deb packages to Debian format (repacking the zstd-compressed archive members with gzip) first:

            # unpack each Ubuntu mainline .deb and repack its zstd-compressed members as gzip
            for i in linux-*5.15*.deb
            do ar x "$i" &&
               zstd -dc < data.tar.zst|gzip >| data.tar.gz &&
               zstd -dc < control.tar.zst|gzip >|control.tar.gz &&
               ar q ubuntu-"$i" debian-binary control.tar.gz data.tar.gz
            done
            sudo dpkg -i ubuntu-linux-*5.15*.deb
            # reuse the 5.14 perf binary under the name expected for the 5.15 kernel
            sudo ln -s perf_5.14 /usr/bin/perf_5.15
            

            With this kernel, the problem appears to be fixed. Even perf appears to work with the above hack.

            marko Marko Mäkelä added a comment -

            My joy based on testing with /dev/shm was premature. A Sysbench run on NVMe (using O_DIRECT and ext4fs) hung after a while:

            [ 100s ] thds: 16 tps: 132882.45 qps: 132882.45 (r/w/o: 0.00/132882.45/0.00) lat (ms,99%): 0.21 err/s: 0.00 reconn/s: 0.00
            [ 105s ] thds: 16 tps: 130555.92 qps: 130555.92 (r/w/o: 0.00/130555.92/0.00) lat (ms,99%): 0.23 err/s: 0.00 reconn/s: 0.00
            [ 110s ] thds: 16 tps: 93427.95 qps: 93427.95 (r/w/o: 0.00/93427.95/0.00) lat (ms,99%): 0.29 err/s: 0.00 reconn/s: 0.00
            [ 115s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 120s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            

            According to sudo perf top -p $(pgrep mariadbd), most activity takes place in the kernel address space. I attached GDB to the apparently hung process and got lots of threads like this:

            (gdb) thr a a bt
             
            Thread 42 (Thread 0x7fb793fff640 (LWP 528551) "iou-wrk-527007"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 41 (Thread 0x7fb8cabaf640 (LWP 528540) "iou-wrk-526936"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 40 (Thread 0x7fb9dc0a0640 (LWP 528524) "iou-wrk-526959"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 39 (Thread 0x7fb8c0555640 (LWP 528521) "iou-wrk-527463"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 38 (Thread 0x7fb8c04bf640 (LWP 528513) "iou-wrk-527465"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 37 (Thread 0x7fb8c07f8640 (LWP 528512) "iou-wrk-527454"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 36 (Thread 0x7fb8c050a640 (LWP 528509) "iou-wrk-527464"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 35 (Thread 0x7fb8c0636640 (LWP 528506) "iou-wrk-527460"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 34 (Thread 0x7fb8c07ad640 (LWP 528502) "iou-wrk-527455"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 33 (Thread 0x7fb8c05a0640 (LWP 528500) "iou-wrk-527462"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 32 (Thread 0x7fb9dc055640 (LWP 528436) "iou-wrk-527452"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 31 (Thread 0x7fb8c8090640 (LWP 528433) "iou-wrk-527453"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 30 (Thread 0x7fb8c06cc640 (LWP 528409) "iou-wrk-527458"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 29 (Thread 0x7fb8c0474640 (LWP 528391) "iou-wrk-527466"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 28 (Thread 0x7fb8c0762640 (LWP 528364) "iou-wrk-527456"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 27 (Thread 0x7fb8c05eb640 (LWP 528323) "iou-wrk-527461"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 26 (Thread 0x7fb8c0717640 (LWP 528185) "iou-wrk-527457"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 25 (Thread 0x7fb8c0681640 (LWP 527848) "iou-wrk-527459"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            I will reboot into the Debian 5.10 kernel, which worked for me in the past.
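
            As a side note, backtraces like the above can be captured non-interactively; a minimal sketch (replace $PID with the process id of the hung mariadbd):

            # dump all thread backtraces of a hung server without an interactive GDB session
            sudo gdb -p "$PID" -batch \
                 -ex 'set pagination off' \
                 -ex 'thread apply all backtrace' > mariadbd-backtraces.txt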

            marko Marko Mäkelä added a comment -

            With this latest kernel of the stable Debian release, I got no hangs during some extensive Sysbench runs:

            Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-5 (2021-09-23)
            

            One of the runs was 20 minutes of oltp_update_index using 8 tables with 2 million rows each. The data size grew from about 4GiB to 33GiB due to purge lag. I deliberately configured an unreasonably small redo log to ensure frequent checkpoints and page writes:

            innodb_buffer_pool_size=4G
            innodb_log_file_size=256M
            

            After the setup and the 20 minutes of workload, the file system was tortured for a further 30 minutes by a slow shutdown with innodb_fast_shutdown=0:

            2021-10-22 10:12:10 0 [Note] InnoDB: Buffer pool(s) load completed at 211022 10:12:10
            2021-10-22 10:32:10 0 [Note] /dev/shm/10.6/sql/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
            2021-10-22 10:32:10 0 [Note] InnoDB: FTS optimize thread exiting.
            2021-10-22 10:32:10 0 [Note] InnoDB: to purge 90296033 transactions
            2021-10-22 11:01:40 0 [Note] InnoDB: to purge 277733 transactions
            2021-10-22 11:01:43 0 [Note] InnoDB: Starting shutdown...
            2021-10-22 11:01:49 0 [Note] InnoDB: Shutdown completed; log sequence number 38731820479; transaction id 213562752
            2021-10-22 11:01:49 0 [Note] /dev/shm/10.6/sql/mysqld: Shutdown complete
            

            I think that this confirms my earlier claim that the Debian linux-image-5.10.0-8-amd64 is unaffected by the bug.

            marko Marko Mäkelä added a comment

            While I currently hope that the problem is in the Linux kernel and not in our use of the io_uring interface, I think that we should implement a simple work-around to avoid hangs until the root cause of the problem has been diagnosed and fixed.

            The simple work-around would be as follows: If we are running on a ‘too new’ kernel (anything newer than 5.10 or possibly 5.11) and the executable has been configured with io_uring rather than libaio, we will fall back to innodb_use_native_aio=off.
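
            Roughly, the check could look like the following. This is only a hedged sketch of the idea (the hypothetical function name and the sscanf-based parsing are mine); the actual implementation lives in innodb_use_native_aio_default() in ha_innodb.cc, parses utsname::release byte by byte, and ended up covering kernels 5.11 through 5.15:

            #include <sys/utsname.h>
            #include <cstdio>
             
            /* Hypothetical stand-in for innodb_use_native_aio_default();
               returns the default value of innodb_use_native_aio. */
            static bool native_aio_default_sketch()
            {
            #ifdef HAVE_URING
              utsname u;
              unsigned major, minor;
              /* Kernels 5.11 through 5.15 are suspected of losing io_uring completions. */
              if (!uname(&u) && sscanf(u.release, "%u.%u", &major, &minor) == 2 &&
                  major == 5 && minor >= 11 && minor <= 15)
                return false; /* fall back to simulated AIO (MDEV-26674) */
            #endif
              return true; /* keep native AIO (io_uring) enabled by default */
            }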

            marko Marko Mäkelä added a comment

            To work around this problem, we will set innodb_use_native_aio=OFF by default if the executable is configured for io_uring and the Linux kernel version is between 5.11 and 5.15. If the parameter is explicitly set, a warning message will be issued:

            10.6 1193a793c40b806c6f1f007bbd87f4d9a73e686d

            2021-10-25 13:08:10 0 [Warning] innodb_use_native_aio may cause hangs with this kernel 5.14.0-3-amd64; see https://jira.mariadb.org/browse/MDEV-26674
            

            marko Marko Mäkelä added a comment

            I updated my Debian Sid kernel to 5.15-rc7 from https://kernel.ubuntu.com/~kernel-ppa/mainline/ and unfortunately the hang is still there, albeit a little harder to reproduce. When I attach GDB during a hang, I see far fewer iou-wrk- threads, so presumably things have improved. Due to the Jira size limit, I will trim the stack traces of the threads that are waiting for a buffer page latch while executing UPDATE:

            10.6 58fe6b47d4e8580e370a094d8f5768d053aa52c1

            Thread 26 (Thread 0x7f30c89be640 (LWP 42170) "iou-wrk-39600"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 25 (Thread 0x7f25f6ffd640 (LWP 40227) "mysqld"):
            #7  std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (__rtime=..., __rtime=..., __lock=..., this=0x55c7cd5e4600) at /usr/include/c++/11/condition_variable:163
            #8  tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c7cd5d64e0, lk=..., thread_data=thread_data@entry=0x55c7cd5e4600) at /mariadb/10.6/tpool/tpool_generic.cc:449
            #9  0x000055c7cac69be0 in tpool::thread_pool_generic::get_task (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4600, t=0x7f25f6ffccc8) at /mariadb/10.6/tpool/tpool_generic.cc:502
            #10 0x000055c7cac6ad5d in tpool::thread_pool_generic::worker_main (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4600) at /mariadb/10.6/tpool/tpool_generic.cc:547
            #11 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
            #12 0x00007f30cba0feae in start_thread (arg=0x7f25f6ffd640) at pthread_create.c:463
            #13 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 24 (Thread 0x7f2572ffd640 (LWP 40012) "iou-wrk-39671"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 23 (Thread 0x7f268c30c640 (LWP 39721) "mysqld"):
            #7  mtr_t::page_lock (this=this@entry=0x7f268c30a0d0, block=block@entry=0x7f307d8094c0, rw_latch=rw_latch@entry=1) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:1175
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24c0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 22 (Thread 0x7f268c357640 (LWP 39720) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24cc000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 21 (Thread 0x7f268c3a2640 (LWP 39719) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24c8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 20 (Thread 0x7f268c3ed640 (LWP 39718) "mysqld"):
            #17 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24d4000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 19 (Thread 0x7f268c438640 (LWP 39717) "mysqld"):
            #16 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24d0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 18 (Thread 0x7f268c483640 (LWP 39716) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24dc000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 17 (Thread 0x7f268c4ce640 (LWP 39715) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24d8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 16 (Thread 0x7f26a004b640 (LWP 39714) "mysqld"):
            #17 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24f4000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 15 (Thread 0x7f26a0096640 (LWP 39713) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24e4000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 14 (Thread 0x7f26a00e1640 (LWP 39712) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24e0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 13 (Thread 0x7f26a012c640 (LWP 39711) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24ec000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 12 (Thread 0x7f26a0177640 (LWP 39710) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24e8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 11 (Thread 0x7f26a01c2640 (LWP 39709) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24f0000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 10 (Thread 0x7f26a020d640 (LWP 39708) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24fc000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 9 (Thread 0x7f30c805b640 (LWP 39707) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f24f8000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 8 (Thread 0x7f30c80a6640 (LWP 39687) "mysqld"):
            #19 0x000055c7ca75a21e in mysql_update (thd=thd@entry=0x7f2504000c58, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.6/sql/sql_update.cc:1086
             
            Thread 7 (Thread 0x7f2572ffd640 (LWP 39671) "mysqld"):
            #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f2572ffcc20, clockid=1929366416, expected=0, futex_word=0x55c7cd5e40a8) at ../sysdeps/nptl/futex-internal.h:323
            #1  __pthread_cond_wait_common (abstime=0x7f2572ffcc20, clockid=1929366416, mutex=0x55c7cd5d65f0, cond=0x55c7cd5e4080) at pthread_cond_wait.c:520
            #2  __pthread_cond_clockwait (abstime=0x7f2572ffcc20, clockid=1929366416, mutex=0x55c7cd5d65f0, cond=0x55c7cd5e4080) at pthread_cond_wait.c:677
            #3  __pthread_cond_clockwait (cond=0x55c7cd5e4080, mutex=0x55c7cd5d65f0, clockid=1929366416, abstime=0x7f2572ffcc20) at pthread_cond_wait.c:665
            #4  0x000055c7cac69697 in std::__condvar::wait_until (__abs_time=..., __clock=1, __m=..., this=0x55c7cd5e4080) at /usr/include/c++/11/bits/std_mutex.h:123
            #5  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., __lock=..., __atime=..., this=0x55c7cd5e4080) at /usr/include/c++/11/condition_variable:201
            #6  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x55c7cd5e4080) at /usr/include/c++/11/condition_variable:111
            #7  std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (__rtime=..., __rtime=..., __lock=..., this=0x55c7cd5e4080) at /usr/include/c++/11/condition_variable:163
            #8  tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c7cd5d64e0, lk=..., thread_data=thread_data@entry=0x55c7cd5e4080) at /mariadb/10.6/tpool/tpool_generic.cc:449
            #9  0x000055c7cac69be0 in tpool::thread_pool_generic::get_task (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4080, t=0x7f2572ffccc8) at /mariadb/10.6/tpool/tpool_generic.cc:502
            #10 0x000055c7cac6ad5d in tpool::thread_pool_generic::worker_main (this=0x55c7cd5d64e0, thread_var=0x55c7cd5e4080) at /mariadb/10.6/tpool/tpool_generic.cc:547
            #11 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
            #12 0x00007f30cba0feae in start_thread (arg=0x7f2572ffd640) at pthread_create.c:463
            #13 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 6 (Thread 0x7f30c80f1640 (LWP 39606) "mysqld"):
            #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x55c7cb4d15a8 <COND_manager+40>) at ../sysdeps/nptl/futex-internal.h:186
            #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c7cb4d15c0 <LOCK_manager>, cond=0x55c7cb4d1580 <COND_manager>) at pthread_cond_wait.c:508
            #2  __pthread_cond_wait (cond=0x55c7cb4d1580 <COND_manager>, mutex=0x55c7cb4d15c0 <LOCK_manager>) at pthread_cond_wait.c:638
            #3  0x000055c7ca69279c in inline_mysql_cond_wait (mutex=0x55c7cb4d15c0 <LOCK_manager>, that=0x55c7cb4d1580 <COND_manager>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1072
            #4  handle_manager (arg=<optimized out>) at /mariadb/10.6/sql/sql_manager.cc:103
            #5  0x00007f30cba0feae in start_thread (arg=0x7f30c80f1640) at pthread_create.c:463
            #6  0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 5 (Thread 0x7f30c813c640 (LWP 39605) "mysqld"):
            #0  0x00007f30cb50cbd2 in __GI___sigtimedwait (set=0x7f30c813bbc0, info=0x7f30c813bc40, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:36
            #1  0x000055c7ca5dc863 in my_sigwait (code=<synthetic pointer>, sig=0x7f30c813bbbc, set=0x7f30c813bbc0) at /mariadb/10.6/include/my_pthread.h:195
            #2  signal_hand (arg=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:3113
            #3  0x00007f30cba0feae in start_thread (arg=0x7f30c813c640) at pthread_create.c:463
            #4  0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 4 (Thread 0x7f30c89be640 (LWP 39600) "mysqld"):
            #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x55c7cbd4e3a8 <buf_dblwr+104>) at ../sysdeps/nptl/futex-internal.h:186
            #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c7cbd4e350 <buf_dblwr+16>, cond=0x55c7cbd4e380 <buf_dblwr+64>) at pthread_cond_wait.c:508
            #2  __pthread_cond_wait (cond=0x55c7cbd4e380 <buf_dblwr+64>, mutex=0x55c7cbd4e350 <buf_dblwr+16>) at pthread_cond_wait.c:638
            #3  0x000055c7cabf223c in buf_dblwr_t::flush_buffered_writes (this=0x55c7cbd4e340 <buf_dblwr>, size=64) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:562
            #4  0x000055c7cabf27bb in buf_dblwr_t::add_to_batch (this=0x55c7cbd4e340 <buf_dblwr>, request=..., size=16384) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:756
            #5  0x000055c7cabf69fc in buf_flush_page (bpage=bpage@entry=0x7f302d7a8140, lru=lru@entry=false, space=space@entry=0x55c7cd6831f0) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:932
            #6  0x000055c7cabf7c74 in buf_do_flush_list_batch (lsn=19812648754, max_n=90000) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1452
            #7  buf_flush_list (lsn=19812648754, max_n=90000) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1527
            #8  buf_flush_list (max_n=90000, lsn=19812648754) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1507
            #9  0x000055c7cabf83f6 in buf_flush_sync_for_checkpoint (lsn=19812648754) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1922
            #10 buf_flush_page_cleaner () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2191
            #11 0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
            #12 0x00007f30cba0feae in start_thread (arg=0x7f30c89be640) at pthread_create.c:463
            #13 0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 3 (Thread 0x7f30c9423640 (LWP 39598) "mysqld"):
            #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            #1  0x00007f30cbf76cb0 in ?? () from /usr/lib/x86_64-linux-gnu/liburing.so.1
            #2  0x00007f30cbf76963 in __io_uring_get_cqe () from /usr/lib/x86_64-linux-gnu/liburing.so.1
            #3  0x000055c7cac6c8c4 in io_uring_wait_cqe_nr (wait_nr=1, cqe_ptr=0x7f30c9422cb8, ring=0x55c7cd61d448) at /usr/include/liburing.h:494
            #4  io_uring_wait_cqe (cqe_ptr=0x7f30c9422cb8, ring=0x55c7cd61d448) at /usr/include/liburing.h:514
            #5  (anonymous namespace)::aio_uring::thread_routine (aio=0x55c7cd61d440) at /mariadb/10.6/tpool/aio_liburing.cc:137
            #6  0x00007f30cb8c8864 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
            #7  0x00007f30cba0feae in start_thread (arg=0x7f30c9423640) at pthread_create.c:463
            #8  0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 2 (Thread 0x7f30cc075640 (LWP 39597) "mysqld"):
            #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f30cc074cd0, clockid=-871936960, expected=0, futex_word=0x55c7cbd584ac <COND_timer+44>) at ../sysdeps/nptl/futex-internal.h:323
            #1  __pthread_cond_wait_common (abstime=0x7f30cc074cd0, clockid=-871936960, mutex=0x55c7cbd584c0 <LOCK_timer>, cond=0x55c7cbd58480 <COND_timer>) at pthread_cond_wait.c:520
            #2  __pthread_cond_timedwait (cond=cond@entry=0x55c7cbd58480 <COND_timer>, mutex=mutex@entry=0x55c7cbd584c0 <LOCK_timer>, abstime=abstime@entry=0x7f30cc074cd0) at pthread_cond_wait.c:656
            #3  0x000055c7cacc1b51 in inline_mysql_cond_timedwait (abstime=0x7f30cc074cd0, mutex=0x55c7cbd584c0 <LOCK_timer>, that=0x55c7cbd58480 <COND_timer>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1088
            #4  timer_handler (arg=<optimized out>) at /mariadb/10.6/mysys/thr_timer.c:321
            #5  0x00007f30cba0feae in start_thread (arg=0x7f30cc075640) at pthread_create.c:463
            #6  0x00007f30cb5cca5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
            Thread 1 (Thread 0x7f30cb1f0f40 (LWP 39594) "mysqld"):
            #0  0x00007f30cb5c4543 in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffb81911c0) at ../sysdeps/unix/sysv/linux/select.c:41
            #1  0x000055c7cacbce8f in my_sleep (m_seconds=<optimized out>) at /mariadb/10.6/mysys/my_sleep.c:29
            #2  0x000055c7ca5de6ca in close_connections () at /mariadb/10.6/sql/mysqld.cc:1758
            #3  0x000055c7ca5e5994 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:5842
            #4  0x00007f30cb4f6e4a in __libc_start_main (main=0x55c7ca5aa810 <main(int, char**)>, argc=32, argv=0x7fffb8191aa8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb8191a98) at ../csu/libc-start.c:314
            #5  0x000055c7ca5dc0aa in _start () at /mariadb/10.6/sql/my_decimal.h:147
            

            I think that previously, with 5.15-rc6, I got it to hang right before the end of the 2-minute warmup. Now it survived that and the subsequent innodb_fast_shutdown=0 shutdown (which took about 3 minutes), but it hung 7 minutes into the 20-minute run:

            [ 400s ] thds: 16 tps: 132745.91 qps: 132745.91 (r/w/o: 0.00/132745.91/0.00) lat (ms,99%): 0.21 err/s: 0.00 reconn/s: 0.00
            [ 405s ] thds: 16 tps: 52828.95 qps: 52828.95 (r/w/o: 0.00/52828.95/0.00) lat (ms,99%): 0.22 err/s: 0.00 reconn/s: 0.00
            [ 410s ] thds: 16 tps: 114629.30 qps: 114629.30 (r/w/o: 0.00/114629.30/0.00) lat (ms,99%): 0.26 err/s: 0.00 reconn/s: 0.00
            [ 415s ] thds: 16 tps: 132201.16 qps: 132201.16 (r/w/o: 0.00/132201.16/0.00) lat (ms,99%): 0.22 err/s: 0.00 reconn/s: 0.00
            [ 420s ] thds: 16 tps: 68733.43 qps: 68733.43 (r/w/o: 0.00/68733.43/0.00) lat (ms,99%): 0.23 err/s: 0.00 reconn/s: 0.00
            [ 425s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 430s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 435s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 440s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 445s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 450s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 455s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 460s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 465s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 470s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 475s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            [ 480s ] thds: 16 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
            

            This was observed on an ext4 file system on fast NVMe storage, using innodb_flush_log_at_trx_commit=0 and innodb_log_file_size=256M (yes, tiny, to force very frequent checkpoints, a.k.a. the MDEV-23855 scenario) and innodb_buffer_pool_size=40G. At the time of the hang, the data directory had grown to 18GiB. The payload is only about 4GiB (8 tables, 2M rows each), but the system tablespace grows due to the purge lag caused by the oltp_update_index test.
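
            For context, Thread 3 in the GDB output above is the io_uring completion thread (aio_uring::thread_routine in tpool/aio_liburing.cc). It blocks in io_uring_wait_cqe() until the kernel posts a completion, so if a completion is never delivered, it would stay blocked there. The following is only a simplified sketch of that pattern, not the actual tpool code:

            #include <liburing.h>
            #include <cstdio>
             
            /* Simplified sketch of an io_uring completion thread: block until the
               kernel posts a completion queue entry (CQE), hand the result to the
               caller, and mark the CQE as consumed. */
            static void completion_loop(io_uring *ring)
            {
              for (;;)
              {
                io_uring_cqe *cqe;
                /* io_uring_wait_cqe() returns 0 on success or a negative errno. */
                if (int err= io_uring_wait_cqe(ring, &cqe))
                {
                  fprintf(stderr, "io_uring_wait_cqe: %d\n", err);
                  break;
                }
                /* ... hand cqe->res and io_uring_cqe_get_data(cqe) to the
                   I/O completion callback ... */
                io_uring_cqe_seen(ring, cqe); /* consume the completion */
              }
            }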

            marko Marko Mäkelä added a comment

            For what it is worth, had 5.15-rc7 and 5.14.14 worked for me, I would have revised my previous tweak as follows:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 72db9df0346..83c163adf23 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -19416,9 +19416,15 @@ static bool innodb_use_native_aio_default()
             #ifdef HAVE_URING
               utsname &u= uname_for_io_uring;
               if (!uname(&u) && u.release[0] == '5' && u.release[1] == '.' &&
            -      u.release[2] == '1' && u.release[3] > '0' && u.release[3] < '6')
            +      u.release[2] == '1' && u.release[3] > '0' && u.release[3] < '5' &&
            +      u.release[4] == '.')
               {
            -    io_uring_may_be_unsafe= u.release;
            +    /* io_uring() in some Linux kernels 5.11 to 5.14 may hang. */
            +    if (u.release[3] == '4')
            +      if (const char *s= strstr(u.version, "5.14."))
            +        if ((s[5] > '1' && s[6] >= '0') || (s[5] == '1' && s[6] >= '4'))
            +          return true; /* 5.14.14 and later should be fine */
            +    io_uring_may_be_unsafe= u.version;
                 return false; /* working around io_uring hangs (MDEV-26674) */
               }
             #endif
            

            Because of the failure of 5.15-rc7, I did not test 5.14.14.

            marko Marko Mäkelä added a comment

            The Linux kernel 5.14.16 was recently packaged for Debian Sid (unstable):

            Linux version 5.14.0-4-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-12) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.16-1 (2021-11-03)
            

            With this kernel, I did not observe any hang during a few Sysbench runs with a total run time between 15 and 30 minutes. The page write rate was about 200 megabytes per second throughout the runs. Yes, log checkpoints were very frequent due to the tiny innodb_log_file_size, so io_uring would be invoked more often. (We only invoke it for page writes, not log writes yet.)

            I see that 5.14.17 and 5.15.1 were released yesterday. The next step should be to ensure that this hang is not repeatable with 5.15.0, and then update MariaDB so that we would only set innodb_use_native_aio=OFF by default on 5.11, 5.12, 5.13, and 5.14.0 through 5.14.15.

            marko Marko Mäkelä added a comment

            5.15.1 still hung within about 5 seconds when using 64 concurrent client connections and not using numactl to restrict MariaDB to a single CPU socket (NUMA node).
            In the mariadbd process I got lots of threads like this:

            Thread 83 (Thread 0x7f2452ffd640 (LWP 16478) "iou-wrk-15476"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 82 (Thread 0x7f2563fff640 (LWP 16467) "iou-wrk-16299"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 81 (Thread 0x7f2562ffd640 (LWP 16466) "iou-wrk-16307"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 80 (Thread 0x7f2563fff640 (LWP 16465) "iou-wrk-16299"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 79 (Thread 0x7f2452ffd640 (LWP 16463) "iou-wrk-15476"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 78 (Thread 0x7f251affd640 (LWP 16461) "iou-wrk-15591"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 77 (Thread 0x7f25a8714640 (LWP 16456) "iou-wrk-15384"):
            #0  0x0000000000000000 in ?? ()
            

            I am not sure whether I used numactl yesterday on 5.14.16 to bind the process to one CPU socket. I will reboot into that kernel and try to repeat the exact same test.

            marko Marko Mäkelä added a comment

            With Mariabench-MDEV-26674-io_uring I see no hang during the total test time of 25 minutes. If I change innodb-use-native-aio=0 to 1 in that script, the same executable hangs less than 5 seconds into the workload on the 5.14.16 kernel. Again, there was at least one iou-wrk- thread with the strange stack trace.

            marko Marko Mäkelä added a comment

            On a repeated run, the Linux kernel 5.14.16 hung already during sysbench prepare:

            Creating table 'sbtest1'...
            Inserting 2000000 records into 'sbtest1'
            Creating a secondary index on 'sbtest1'...
            Creating table 'sbtest2'...
            Inserting 2000000 records into 'sbtest2'
            Creating a secondary index on 'sbtest2'...
            Creating table 'sbtest3'...
            Inserting 2000000 records into 'sbtest3'
            Creating a secondary index on 'sbtest3'...
            Creating table 'sbtest4'...
            Inserting 2000000 records into 'sbtest4'
            

            I stopped debugging here:

            (gdb) thr a a bt
             
            Thread 27 (Thread 0x7faad786f640 (LWP 28519) "iou-wrk-28359"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            marko Marko Mäkelä added a comment

            Good news: I compiled a mainline kernel tree

            commit 5833291ab6de9c3e2374336b51c814e515e8f3a5 (origin/master, origin/HEAD)
            Merge: ca2ef2d9f2aa e0217c5ba10d
            Author: Linus Torvalds
            Date:   Thu Nov 11 15:10:18 2021 -0800
             
                Merge tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
            

            with this patch by Jens Axboe, and I was unable to reproduce any hangs during some 2 hours of testing. I verified with sudo perf top that io_uring functions were being executed during the test.


            marko Marko Mäkelä added a comment -

            With Debian’s linux-image-5.15.0-1-amd64, which is based on the upstream Linux kernel version 5.15.3, my test script passed twice.

            /proc/version

            Linux version 5.15.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.3-1 (2021-11-18)
            

            During the slow shutdown, the NVMe was constantly writing between 520 and 580 MB/s via ext4fs.

            2021-11-23 13:44:48 0 [Note] /dev/shm/10.6/sql/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
            2021-11-23 13:44:48 0 [Note] InnoDB: FTS optimize thread exiting.
            2021-11-23 13:44:48 0 [Note] InnoDB: to purge 48555142 transactions
            …
            2021-11-23 14:04:48 0 [Note] InnoDB: to purge 1532842 transactions
            2021-11-23 14:05:03 0 [Note] InnoDB: to purge 930142 transactions
            2021-11-23 14:05:18 0 [Note] InnoDB: to purge 496042 transactions
            2021-11-23 14:05:33 0 [Note] InnoDB: to purge 120442 transactions
            2021-11-23 14:05:37 0 [Note] InnoDB: Starting shutdown...
            2021-11-23 14:05:37 0 [Note] InnoDB: Dumping buffer pool(s) to /mariadb/sbtest/ib_buffer_pool
            2021-11-23 14:05:37 0 [Note] InnoDB: Buffer pool(s) dump completed at 211123 14:05:37
            2021-11-23 14:05:39 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
            2021-11-23 14:05:39 0 [Note] InnoDB: Shutdown completed; log sequence number 19549445557; transaction id 104928440
            2021-11-23 14:05:39 0 [Note] /dev/shm/10.6/sql/mysqld: Shutdown complete
            


            marko Marko Mäkelä added a comment -

            I am raising the priority to Blocker, so that we will not forget to adjust our work-around (changing the default value of innodb_use_native_aio) before the next releases. We already know that Linux 5.15.3 has been fixed, so we could already adjust the code accordingly. danblack reproduced a hang with 5.14.20.

            danblack Daniel Black added a comment -

            Ubuntu kernel bug reports are coming in per https://lore.kernel.org/linux-block/96d6241f-7bf0-cefe-947e-ee03d83fb828@samba.org/T/#mb0d0ce280dc0e39bf7e78b3fb07b305672104dc5

            The final commit to allow 5.15.3+ has been added to 10.6.

            marko Marko Mäkelä added a comment -

            danblack, thank you. For reference, I am quoting the linked discussion:

            5.14 is end-of-life and not getting any more releases (the front page of kernel.org should show that.)

            In the discussion it was confirmed that 5.11 and 5.13 (which are maintained in Ubuntu but not listed as supported by https://www.kernel.org upstream) are affected and that fixes would have to be ported there.

            Should someone want to use io_uring on a Linux kernel between 5.11 and 5.14, they can do that by specifying innodb_use_native_aio=1 in the configuration.
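
            For example (only a sketch; the drop-in option-file path below is illustrative and distribution-specific), the setting can be placed in a server option file:

            # create a drop-in option file that turns io_uring back on; adjust the path for your distribution
            printf '[mariadb]\ninnodb_use_native_aio = 1\n' | sudo tee /etc/mysql/mariadb.conf.d/99-native-aio.cnf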

            The work-around in MariaDB Server 10.6.5 only changes the default value from 1 to 0 if the kernel version is between 5.11 and 5.15, and the revised work-around in MariaDB Server 10.6.6 is restricted to kernel versions between 5.11.0 and 5.15.2.


            marko Marko Mäkelä added a comment -

            danblack, I am sorry to inform you, but I can reproduce io_uring related hangs on the above-mentioned 5.15.3 kernel when running

            ./mtr --suite=galera
            

            (single-threaded) on my system in a 10.6-based branch.


            marko Marko Mäkelä added a comment -

            To provide something reproducible, I built git checkout mariadb-10.6.5 (the latest release) and started a test on the Debian linux-image-5.15.0-1-amd64 on my NUMA box (dual Intel Xeon E5-2630):

            /proc/version

            Linux version 5.15.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.3-1 (2021-11-18)
            

            sudo apt install galera-4
            ./mtr --mysqld=--loose-innodb-use-native-aio=1 --suite=galera
            

            However, this would cause test failures, because the mariadb-10.6.5 release "banned" all kernels 5.11 through 5.15:

            mariadb-10.6.5

            2021-11-26  8:40:48 0 [Warning] innodb_use_native_aio may cause hangs with this kernel 5.15.0-1-amd64; see https://jira.mariadb.org/browse/MDEV-26674
            ^ Found warnings in /dev/shm/10.6/mysql-test/var/log/mysqld.1.err
            

            To avoid recompiling any server code, we may tweak the test harness:

            diff --git a/mysql-test/include/mtr_warnings.sql b/mysql-test/include/mtr_warnings.sql
            index 0584bb934e2..d807f4e1c12 100644
            --- a/mysql-test/include/mtr_warnings.sql
            +++ b/mysql-test/include/mtr_warnings.sql
            @@ -96,6 +96,7 @@ INSERT INTO global_suppressions VALUES
              ("Event Scheduler"),
              ("Failed to open log"),
              ("Failed to open the existing master info file"),
            + ("innodb_use_native_aio may cause hangs"),
              ("Forcing shutdown of [0-9]* plugins"),
             
              /*
            

            When not running on a blocklisted kernel, the steps are simpler:

            sudo apt install galera-4
            ./mtr --suite=galera
            

            Unfortunately (and possibly necessarily for reproducing the hang), these tests will run the test client on only one CPU core, and the server processes will be idle most of the time. The full run should take at least 20 minutes if it succeeds. Each individual test will typically complete in 2 to 25 seconds, but the test galera.galera_restart_on_unknown_option will consume 75 seconds. You can obtain the current test name by running

            pgrep mariadbd|while read;do grep ^CURRENT_TEST: /proc/"$REPLY"/fd/1|tail -1;done
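            # (this loop prints the most recent CURRENT_TEST: line from each mariadbd's standard output, fd 1)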
            

            Yesterday, I reproduced the hang using a 10.6.5-based branch, both with Debug and RelWithDebInfo builds; RelWithDebInfo did not seem to run any faster than Debug.

            While the test is running, you can check in top whether any mariadbd processes are actually consuming CPU. Typically, 2 processes should be active. When the processes disappear from the display, run pgrep mariadbd and attach GDB to one of them. If you see the following type of output, I would assume that it is the kernel bug.
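
            One non-interactive way to collect all thread stacks from the surviving processes (just a sketch; the output file name is arbitrary):

            # dump "thread apply all backtrace" of every running mariadbd into a per-PID file
            for pid in $(pgrep mariadbd); do
              gdb -p "$pid" -batch -ex 'thread apply all backtrace' > "mariadbd-$pid.backtrace" 2>&1
            done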

            It seems that I finally got lucky and got a hang during the execution of galera.galera_truncate. On the first process reported by pgrep mariadbd, GDB shows several threads like this:

            mariadb-10.6.5

            Thread 18 (Thread 0x7f6d084bf640 (LWP 458667) "iou-wrk-457320"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 17 (Thread 0x7f6d08474640 (LWP 458645) "iou-wrk-458511"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 16 (Thread 0x7f6d0850a640 (LWP 458553) "iou-wrk-456971"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            In the second process, I saw no such calls.

            The gnome-terminal title (anything xterm compatible should work) includes some progress reporting. Below I am including some of the test harness output:

            mtr: spent 0:13:21 on 247 tests. 0:06:45 (125 tests) left, 0 failed

            galera.galera_suspend_slave 'innodb'     [ pass ]  25399
            galera.galera_sync_wait_show 'innodb'    [ pass ]   2039
            galera.galera_toi_alter_auto_increment 'innodb' [ pass ]   4065
            galera.galera_toi_ddl_locking 'innodb'   [ pass ]   6046
            galera.galera_toi_ddl_nonconflicting 'innodb' [ pass ]   2039
            galera.galera_toi_ddl_sequential 'innodb' [ pass ]   2032
            galera.galera_toi_drop_database 'innodb' [ pass ]  16646
            galera.galera_toi_ftwrl 'innodb'         [ pass ]   2029
            galera.galera_toi_lock_exclusive 'innodb' [ pass ]   3029
            galera.galera_toi_lock_shared 'innodb'   [ pass ]   2030
            galera.galera_toi_truncate 'innodb'      [ skipped ]  Test requires Galera debug library with debug_sync functionality
            galera.galera_transaction_read_only 'innodb' [ pass ]   2036
            galera.galera_transaction_replay 'innodb' [ skipped ]  Test requires Galera debug library with debug_sync functionality
            

            After the test had been hanging for several minutes, I killed one of the mariadbd processes and got:

            mtr: spent 0:20:54 on 248 tests. 0:10:27 (124 tests) left, 1 failed

            galera.galera_truncate 'innodb'          [ fail ]
                    Test ended at 2021-11-26 09:05:48
             
            CURRENT_TEST: galera.galera_truncate
            mysqltest: At line 19: query 'TRUNCATE TABLE t1' failed: <Unknown> (2013): Lost connection to server during query
            

            followed by an excerpt of the server error log.

            I also tried the following, but it is not a promising strategy:

            ./mtr --mysqld=--loose-innodb-use-native-aio=1 --repeat=100 galera.galera_truncate
            

            mtr: spent 0:00:08 on 1 tests. 0:00:00 (0 tests) left, 0 failed

            galera.galera_truncate 'innodb'          [ pass ]   2073
            galera.galera_truncate 'innodb'          [ 2 pass ]   2069
            …
            galera.galera_truncate 'innodb'          [ 44 pass ]   2070
            galera.galera_truncate 'innodb'          [ 45 pass ]   2068
            ^Cmysql-test-run: *** ERROR: Got ^C signal
            


            marko Marko Mäkelä added a comment -

            The following logs prove that the 5.10 kernel is also affected:
            http://buildbot.askmonty.org/buildbot/builders/kvm-deb-bullseye-amd64/builds/418/steps/mtr-galera/logs/kernel

            [    0.000000] Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)
            

            http://buildbot.askmonty.org/buildbot/builders/kvm-deb-bullseye-amd64/builds/418/steps/mtr-galera/logs/stdio

            galera.galera_truncate_temporary 'innodb' w2 [ fail ]  timeout after 900 seconds
                    Test ended at 2021-11-29 07:48:26
             
            Test case timeout after 900 seconds
            …
            Thread 13 (Thread 0x7f3875492700 (LWP 14483)):
            #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            #1  0x00007f38828b3cb0 in ?? () from /lib/x86_64-linux-gnu/liburing.so.1
            #2  0x00007f38828b3963 in __io_uring_get_cqe () from /lib/x86_64-linux-gnu/liburing.so.1
            #3  0x000055ae066dbd0a in io_uring_wait_cqe_nr (wait_nr=1, cqe_ptr=0x7f3875491d98, ring=0x55ae091d9668) at /usr/include/liburing.h:494
            #4  io_uring_wait_cqe (cqe_ptr=0x7f3875491d98, ring=0x55ae091d9668) at /usr/include/liburing.h:514
            #5  (anonymous namespace)::aio_uring::thread_routine (aio=0x55ae091d9660) at ./tpool/aio_liburing.cc:137
            #6  0x00007f38822dbed0 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #7  0x00007f38823e8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #8  0x00007f3881fffdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            Among the stack traces, you can see several threads that are waiting for a page latch, because a write completion callback was apparently never called.

            Maybe we should disable io_uring also on the 5.10 kernel by default? I wonder if 5.4 is safe. As far as I remember, Ubuntu 20.10 has it but does not include a liburing package. So, we might not have any io_uring tests running on a 5.4 kernel in our CI systems.

            danblack Daniel Black added a comment -

            Thankfully Ubuntu 20.10 is out of support. And containers are still focal-based, so there is no uring support there for the arbitrary kernels they run on. So on the distribution side, I think we're safe (for supported versions), but we still need a 5.15.x correction of the aio defaults (so we may as well include 5.10 at the same time) and the kernel 5.16 fix in the sid/Fedora/other bleeding-edge distros before our next release.


            marko Marko Mäkelä added a comment -

            I have been trying to reproduce the hang all day, mostly on the 5.15.5 kernel and for a short while also on 5.15.3, but I have failed. The failure that I got on the 5.15.3 kernel was not with MariaDB 10.6.5, and the test suite included additional tests.

            It is very unfortunate that mtr may by design silently SIGKILL hung servers. In the error log for the test galera.galera_restart_on_unknown_option, I did not see any server shutdown messages at all, not even at the end of the test.

            For some unrelated debugging, I have been applying the following patch.

            diff --git a/mysql-test/lib/My/SafeProcess/safe_process.cc b/mysql-test/lib/My/SafeProcess/safe_process.cc
            index 4d0d1e2a3a0..abc167a4300 100644
            --- a/mysql-test/lib/My/SafeProcess/safe_process.cc
            +++ b/mysql-test/lib/My/SafeProcess/safe_process.cc
            @@ -144,7 +144,7 @@ static int kill_child(bool was_killed)
               message("Killing child: %d", child_pid);
               // Terminate whole process group
               if (! was_killed)
            -    kill(-child_pid, SIGKILL);
            +    kill(-child_pid, SIGABRT);
             
               pid_t ret_pid= waitpid(child_pid, &status, 0);
               if (ret_pid == child_pid)
            

            The patch will make several rpl tests noisy, because some tests apparently like to clean up processes with SIGKILL instead of initiating and waiting for a proper shutdown. I have not run the galera tests with that.


            marko Marko Mäkelä added a comment -

            This hang seems to be very sporadic. Just today, it looks like I got it again in the test encryption.debug_key_management while not actively looking for it.

            Thread 21 (Thread 0x7fdc297fa640 (LWP 1066815) "iou-wrk-1064648"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 20 (Thread 0x7fdc4ce26640 (LWP 1066353) "iou-wrk-1064424"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 19 (Thread 0x7fdc4ce26640 (LWP 1064736) "iou-wrk-1064424"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 18 (Thread 0x7fdc45ffb640 (LWP 1064711) "iou-wrk-1064437"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 17 (Thread 0x7fdc467fc640 (LWP 1064670) "iou-wrk-1064436"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 16 (Thread 0x7fdc457fa640 (LWP 1064668) "iou-wrk-1064438"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 15 (Thread 0x7fdc45ffb640 (LWP 1064666) "iou-wrk-1064437"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            This is the 5.15.5 kernel that failed to reproduce anything yesterday:

            /proc/version

            Linux version 5.15.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.5-1 (2021-11-26)
            

            The server is not completely hanging; the test encryption.debug_key_management is polling for data page rewrite. That progress would be blocked by the hanging page writes.

            I attached and detached GDB several times to the same process during several minutes, and those strange threads remained there.
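
            A quick way to spot such io_uring worker threads without attaching a debugger (only a sketch; it simply matches the iou-wrk- thread names shown above):

            # list "iou-wrk-*" kernel worker threads belonging to any running mariadbd
            for pid in $(pgrep mariadbd); do
              grep -H 'iou-wrk' /proc/"$pid"/task/*/comm 2>/dev/null
            done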


            marko Marko Mäkelä added a comment -

            Eventually, the above-mentioned test would time out like this:

            encryption.debug_key_management 'innodb' w27 [ fail ]  timeout after 900 seconds
                    Test ended at 2021-12-01 14:51:34
             
            Test case timeout after 900 seconds
            

            I just ran 17×100 rounds of the test on mariadb-10.6.5, without success. Same story with 1×100 rounds (without --parallel). Each iteration of the test would complete in about half a second. I suspect that sufficient other concurrent load is needed to reproduce the hang. I will now keep trying with a debug build and

            while ./mtr --parallel=auto --suite=encryption; do :; done
            


            marko Marko Mäkelä added a comment -

            I suspect that the remaining sporadic hangs (mostly observed on persistent storage, not on a RAM disk) are due to InnoDB itself. I was confused by dummy stack frames like this:

            Thread 92 (Thread 0x7f55997fa640 (LWP 3003034) "iou-wrk-3002573"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 91 (Thread 0x7f5680ff9640 (LWP 3002588) "iou-wrk-2998748"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            or like this when diagnosing a core dump:

            Thread 8 (LWP 2772238):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 7 (LWP 2771665):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 6 (LWP 2770649):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            I was able to repeat a shutdown hang with the following invocation on NVMe (512-byte physical blocks) on a 10.8-based tree; the brace expansion repeats the test name so that several instances of the test are scheduled concurrently:

            ./mtr --parallel=auto --repeat=10 innodb.undo_truncate_recover,4k{,,,,,,,,,,,,,,}
            

            The hang would also repeat in a cmake -DCMAKE_DISABLE_FIND_PACKAGE_URING=1 build (when pwrite() is invoked in simulated_aio threads). I will post another update after I have analyzed and fixed this hang and attempted another io_uring run.

            marko Marko Mäkelä added a comment - edited

            I reported the 10.6 hang of innodb.undo_truncate_recover as MDEV-27414.

            On our CI systems, I have noticed checkpoint-related hangs in 10.5+ innodb.innodb-wl5522* tests mostly on Microsoft Windows (where we do not use a RAM disk). We do not even have an io_uring interface in 10.5. Edit: I reproduced this and filed as MDEV-27416.

            I think that the remaining hard-to-reproduce sporadic hangs that we used to blame on io_uring are in our own code.

            Our revised work-around for the Linux kernel bug will consider anything between Linux 5.11.0 and 5.15.2 (inclusive) potentially broken. The initial work-around that appeared in 10.6.5 disabled innodb_use_native_aio by default on any 5.15 kernel.
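
            For illustration only (this is a sketch of the decision, not the actual server code), such a version-range check amounts to:

            # report whether `uname -r` falls inside the range treated as broken, 5.11.0 .. 5.15.2 inclusive
            v=$(uname -r | awk -F'[.-]' '{ printf "%d", $1 * 1000000 + $2 * 1000 + $3 }')
            if [ "$v" -ge 5011000 ] && [ "$v" -le 5015002 ]; then
              echo "innodb_use_native_aio would default to OFF on this kernel"
            fi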

            ajs124 ajs124 added a comment -

            I've gotten reports from users that sound similar to this, with kernel 5.16, btrfs and io_uring (see https://github.com/NixOS/nixpkgs/issues/160516).

            Should I open a separate issue for that?


            marko Marko Mäkelä added a comment -

            ajs124, yes, please file a separate ticket. The closest matching open bug is MDEV-27449, reporting corruption on btrfs and Linux 5.15, but it was not stated whether the io_uring interface was being used (instead of libaio).

