MariaDB Server / MDEV-26674

io_uring related hangs on the Linux kernel

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 10.6, 10.7 (EOL)
    • Fix Version/s: 10.8.0, 10.6.6, 10.7.2

    Description

      After a kernel upgrade, I started to experience random hangs in InnoDB related to io_uring. Yesterday, with an older kernel

      Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)
      

      everything ran fine, but today I got several tests failing or hanging when running the following on a nproc=40 machine:

      nice ./mtr --parallel=60 --big-test --force
      

      Here is an example with the names of hanging tests:

      10.6 d95361107c07b6e8257a7a82c41b18af64ab8d89

      Completed: Failed 7/5108 tests, 99.86% were successful.
       
      Failing test(s): innodb.innodb-64k innodb_zip.innochecksum_2 gcol.innodb_virtual_basic stress.ddl_innodb innodb_gis.kill_server main.index_merge_innodb innodb_gis.rtree_debug
      

      I suspect that the culprit is the newer kernel:

      Linux version 5.14.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.3.0-10) 10.3.0, GNU ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.14.6-2 (2021-09-19)
      

      If I disable io_uring for the MariaDB server build, then everything will work just fine using the libaio interface (io_submit, io_getevents).
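      For reference, here is a hedged, minimal sketch of that libaio fallback interface. This is not MariaDB's actual tpool code; the file name and page size below are made up for illustration (build with -laio):

      /* Minimal libaio sketch: submit one asynchronous read and wait for it.
         Not MariaDB code; only illustrates io_submit()/io_getevents(). */
      #define _GNU_SOURCE
      #include <libaio.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(void)
      {
        io_context_t ctx = 0;
        if (io_setup(32, &ctx))                 /* kernel AIO context, up to 32 in-flight requests */
          return EXIT_FAILURE;

        int fd = open("ibdata1", O_RDONLY | O_DIRECT);  /* hypothetical data file */
        if (fd < 0)
          return EXIT_FAILURE;

        void *buf;
        if (posix_memalign(&buf, 4096, 4096))   /* O_DIRECT requires an aligned buffer */
          return EXIT_FAILURE;

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, 4096, 0);   /* asynchronous read of the first page */
        if (io_submit(ctx, 1, cbs) != 1)        /* hand the request to the kernel */
          return EXIT_FAILURE;

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);     /* block until the completion arrives */
        printf("read completed: %ld\n", (long) ev.res);

        io_destroy(ctx);
        return EXIT_SUCCESS;
      }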

      Here is one example of a hang:

      10.6 d95361107c07b6e8257a7a82c41b18af64ab8d89

      innodb_zip.innochecksum_2 '4k,innodb'    w40 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2021-09-24 14:43:48
      line
      Attempting backtrace. You can use the following information to find out
      Thread 7 (Thread 0x7eff55043640 (LWP 954970)):
      #0  0x00007eff56cb7bd2 in __GI___sigtimedwait (set=0x7eff55042ca8, info=0x7eff55042c20, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:36
      #1  0x000055c55ef1883b in my_sigwait (set=0x7eff55042ca8, sig=0x7eff55042c1c, code=<optimized out>) at /mariadb/10.6/include/my_pthread.h:195
      #2  signal_hand (arg=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:3113
      #3  0x00007eff57075eae in start_thread (arg=0x7eff55043640) at pthread_create.c:463
      #4  0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 6 (LWP 961402):
      #0  0x0000000000000000 in ?? ()
      Backtrace stopped: Cannot access memory at address 0x0
       
      Thread 5 (Thread 0x7eff5638b640 (LWP 954807)):
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x00007eff57721cb0 in ?? () from /usr/lib/x86_64-linux-gnu/liburing.so.1
      #2  0x00007eff57721963 in __io_uring_get_cqe () from /usr/lib/x86_64-linux-gnu/liburing.so.1
      #3  0x000055c55f872361 in io_uring_wait_cqe_nr (ring=0x55c562837c98, cqe_ptr=0x7eff5638ad10, wait_nr=1) at /usr/include/liburing.h:494
      #4  io_uring_wait_cqe (ring=0x55c562837c98, cqe_ptr=0x7eff5638ad10) at /usr/include/liburing.h:514
      #5  (anonymous namespace)::aio_uring::thread_routine (aio=0x55c562837c90) at /mariadb/10.6/tpool/aio_liburing.cc:137
      #6  0x00007eff56f2e8c4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #7  0x00007eff57075eae in start_thread (arg=0x7eff5638b640) at pthread_create.c:463
      #8  0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 4 (LWP 961385):
      #0  0x0000000000000000 in ?? ()
      Backtrace stopped: Cannot access memory at address 0x0
       
      Thread 3 (Thread 0x7eff5781f640 (LWP 954792)):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7eff5781ed10, clockid=1468132416, expected=0, futex_word=0x55c560ae27fc <COND_timer+44>) at ../sysdeps/nptl/futex-internal.h:323
      #1  __pthread_cond_wait_common (abstime=0x7eff5781ed10, clockid=1468132416, mutex=0x55c560ae2748 <LOCK_timer+40>, cond=0x55c560ae27d0 <COND_timer>) at pthread_cond_wait.c:520
      #2  __pthread_cond_timedwait (cond=cond@entry=0x55c560ae27d0 <COND_timer>, mutex=mutex@entry=0x55c560ae2748 <LOCK_timer+40>, abstime=abstime@entry=0x7eff5781ed10) at pthread_cond_wait.c:656
      #3  0x000055c55f8dbacf in safe_cond_timedwait (cond=0x55c560ae27d0 <COND_timer>, mp=0x55c560ae2720 <LOCK_timer>, abstime=abstime@entry=0x7eff5781ed10, file=0x55c55f96aaa9 "/mariadb/10.6/include/mysql/psi/mysql_thread.h", line=line@entry=1088) at /mariadb/10.6/mysys/thr_mutex.c:546
      #4  0x000055c55f8dc26f in inline_mysql_cond_timedwait (that=0x55c560ae27fc <COND_timer+44>, mutex=0x189, abstime=0x7eff5781ed10, src_file=<optimized out>, src_line=<optimized out>) at /mariadb/10.6/include/mysql/psi/mysql_thread.h:1088
      #5  timer_handler (arg=<optimized out>) at /mariadb/10.6/mysys/thr_timer.c:321
      #6  0x00007eff57075eae in start_thread (arg=0x7eff5781f640) at pthread_create.c:463
      #7  0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 2 (Thread 0x7efec9ffb640 (LWP 954925)):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7efec9ffac78, clockid=-905991216, expected=0, futex_word=0x55c5627dc54c) at ../sysdeps/nptl/futex-internal.h:323
      #1  __pthread_cond_wait_common (abstime=0x7efec9ffac78, clockid=-905991216, mutex=0x55c5627cd990, cond=0x55c5627dc520) at pthread_cond_wait.c:520
      #2  __pthread_cond_clockwait (abstime=0x7efec9ffac78, clockid=-905991216, mutex=0x55c5627cd990, cond=0x55c5627dc520) at pthread_cond_wait.c:677
      #3  __pthread_cond_clockwait (cond=0x55c5627dc520, mutex=0x55c5627cd990, clockid=-905991216, abstime=0x7efec9ffac78) at pthread_cond_wait.c:665
      #4  0x000055c55f86efaf in std::__condvar::wait_until (this=0x55c5627dc520, __m=<error reading variable>, __clock=1, __abs_time=@0x7efec9ffac78: {tv_sec = 24713, tv_nsec = 284075513}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:169
      #5  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __atime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:201
      #6  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __atime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:111
      #7  std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x55c5627dc520, __lock=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, __rtime=@0x55c5627cd9b8: {__r = 60000}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:163
      #8  tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x55c5627cd880, lk=@0x7efec9ffacc8: {_M_device = 0x55c5627cd990, _M_owns = true}, thread_data=thread_data@entry=0x55c5627dc520) at /mariadb/10.6/tpool/tpool_generic.cc:449
      #9  0x000055c55f86f14e in tpool::thread_pool_generic::get_task (this=this@entry=0x55c5627cd880, thread_var=thread_var@entry=0x55c5627dc520, t=t@entry=0x7efec9ffad18) at /mariadb/10.6/tpool/tpool_generic.cc:502
      #10 0x000055c55f86f444 in tpool::thread_pool_generic::worker_main (this=0x55c5627cd880, thread_var=0x55c5627dc520) at /mariadb/10.6/tpool/tpool_generic.cc:547
      #11 0x00007eff56f2e8c4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #12 0x00007eff57075eae in start_thread (arg=0x7efec9ffb640) at pthread_create.c:463
      #13 0x00007eff56d77a5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 1 (Thread 0x7eff5699a840 (LWP 954716)):
      #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
      #1  0x000055c55f25a83a in handle_fatal_signal (sig=6) at /mariadb/10.6/sql/signal_handler.cc:345
      #2  <signal handler called>
      #3  0x00007eff57079849 in __pthread_mutex_unlock_usercnt (mutex=mutex@entry=0x55c5601cf528 <buf_pool+40>, decr=decr@entry=0) at pthread_mutex_unlock.c:58
      #4  0x00007eff5707c1a8 in __pthread_cond_wait_common (abstime=0x7fff59490d98, clockid=0, mutex=0x55c5601cf528 <buf_pool+40>, cond=0x55c5601cf5f0 <buf_pool+240>) at pthread_cond_wait.c:421
      #5  __pthread_cond_timedwait (cond=cond@entry=0x55c5601cf5f0 <buf_pool+240>, mutex=mutex@entry=0x55c5601cf528 <buf_pool+40>, abstime=abstime@entry=0x7fff59490d98) at pthread_cond_wait.c:656
      #6  0x000055c55f8dbacf in safe_cond_timedwait (cond=0x55c5601cf5f0 <buf_pool+240>, mp=0x55c5601cf500 <buf_pool>, abstime=0x7fff59490d98, file=0x55c55fbd74a8 "/mariadb/10.6/storage/innobase/buf/buf0flu.cc", line=2435) at /mariadb/10.6/mysys/thr_mutex.c:546
      #7  0x000055c55f796517 in buf_flush_buffer_pool () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2434
      #8  0x000055c55f5a1121 in logs_empty_and_mark_files_at_shutdown () at /mariadb/10.6/storage/innobase/log/log0log.cc:1139
      #9  0x000055c55f6d79a0 in innodb_shutdown () at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1970
      #10 0x000055c55f517318 in innobase_end () at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4273
      #11 0x000055c55f25b48f in ha_finalize_handlerton (plugin=0x55c5627abcb0) at /mariadb/10.6/sql/handler.cc:595
      #12 0x000055c55f00b6df in plugin_deinitialize (plugin=0x55c5627abcb0, ref_check=true) at /mariadb/10.6/sql/sql_plugin.cc:1266
      #13 0x000055c55f009f9a in reap_plugins () at /mariadb/10.6/sql/sql_plugin.cc:1342
      #14 0x000055c55f00bba3 in plugin_shutdown () at /mariadb/10.6/sql/sql_plugin.cc:2050
      #15 0x000055c55ef179ed in clean_up (print_message=true) at /mariadb/10.6/sql/mysqld.cc:1923
      #16 0x000055c55ef196a2 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/sql/mysqld.cc:5844
      #17 0x00007eff56ca1e4a in __libc_start_main (main=0x55c55ef16a40 <main(int, char**)>, argc=26, argv=0x7fff59491718, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff59491708) at ../csu/libc-start.c:314
      #18 0x000055c55ef1697a in _start () at /mariadb/10.6/sql/my_decimal.h:147
      

      This looks like a shutdown hang, and the problem could be the hung Thread 5. There are also two suspicious threads whose backtraces stop at a null address; I do not remember seeing such stack traces earlier.
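      For context, the hung Thread 5 sits in a liburing completion wait (tpool/aio_liburing.cc, frame #5 above). Below is a hedged, minimal sketch of that kind of wait, not the actual MariaDB code: io_uring_wait_cqe() blocks in the io_uring_enter() system call until the kernel posts a completion, so if a queued page write never completes, the wait, and with it the InnoDB shutdown, blocks forever (build with -luring):

      #include <liburing.h>
      #include <stdio.h>

      int main(void)
      {
        struct io_uring ring;
        if (io_uring_queue_init(8, &ring, 0))       /* ring with 8 submission queue entries */
          return 1;

        /* Queue a no-op request so that exactly one completion is guaranteed. */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_nop(sqe);
        io_uring_submit(&ring);

        /* This is the call the hung thread is stuck in: it blocks until the
           kernel posts a completion queue entry (CQE).  If the CQE for a
           submitted write never arrives, this waits forever. */
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) == 0)
        {
          printf("completion: res=%d\n", cqe->res); /* bytes transferred, or -errno */
          io_uring_cqe_seen(&ring, cqe);            /* mark the CQE as consumed */
        }

        io_uring_queue_exit(&ring);
        return 0;
      }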

    Activity

            marko Marko Mäkelä added a comment

            Eventually, the above-mentioned test would time out like this:

            encryption.debug_key_management 'innodb' w27 [ fail ]  timeout after 900 seconds
                    Test ended at 2021-12-01 14:51:34
             
            Test case timeout after 900 seconds
            

            I just ran 17×100 rounds of the test on mariadb-10.6.5 without reproducing the hang, and the same with 1×100 rounds (without --parallel). Each iteration of the test completes in about half a second. I suspect that sufficient other concurrent load is needed to reproduce the hang. I will now keep trying with a debug build and

            while ./mtr --parallel=auto --suite=encryption; do :; done
            


            marko Marko Mäkelä added a comment

            I suspect that the remaining sporadic hangs (mostly observed on persistent storage, not on a RAM disk) are due to InnoDB itself. I was confused by dummy stack frames like this:

            Thread 92 (Thread 0x7f55997fa640 (LWP 3003034) "iou-wrk-3002573"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 91 (Thread 0x7f5680ff9640 (LWP 3002588) "iou-wrk-2998748"):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            or like this when diagnosing a core dump:

            Thread 8 (LWP 2772238):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 7 (LWP 2771665):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
             
            Thread 6 (LWP 2770649):
            #0  0x0000000000000000 in ?? ()
            Backtrace stopped: Cannot access memory at address 0x0
            

            I was able to repeat a shutdown hang with the following invocation on NVMe (512-byte physical blocks) on a 10.8-based tree:

            ./mtr --parallel=auto --repeat=10 innodb.undo_truncate_recover,4k{,,,,,,,,,,,,,,}
            

            The hang also repeated in a cmake -DCMAKE_DISABLE_FIND_PACKAGE_URING=1 build (with pwrite() being invoked in simulated_aio threads). I will post another update after I have analyzed and fixed this hang and attempted another io_uring run.

            marko Marko Mäkelä added a comment (edited)

            I reported the 10.6 hang of innodb.undo_truncate_recover as MDEV-27414.

            On our CI systems, I have noticed checkpoint-related hangs in the 10.5+ innodb.innodb-wl5522* tests mostly on Microsoft Windows (where we do not use a RAM disk), and 10.5 does not even have an io_uring interface. Edit: I reproduced this and filed it as MDEV-27416.

            I think that the remaining hard-to-reproduce sporadic hangs that we used to blame on io_uring are in our own code.

            Our revised work-around of the Linux kernel bug will treat any kernel between Linux 5.11.0 and 5.15.2 (inclusive) as potentially broken. The initial work-around that appeared in 10.6.5 disabled innodb_use_native_aio by default on any 5.15 kernel.
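            A hedged sketch of such a version check (not the actual server code; the helper name and version encoding are made up for illustration) might look like this:

            #include <sys/utsname.h>
            #include <stdio.h>
            #include <stdbool.h>

            /* Return true if the running kernel falls in the range that the
               work-around treats as potentially broken: 5.11.0 .. 5.15.2.
               Assumes minor and patch numbers stay below 1000. */
            static bool io_uring_may_be_broken(void)
            {
              struct utsname u;
              unsigned major = 0, minor = 0, patch = 0;
              if (uname(&u) || sscanf(u.release, "%u.%u.%u", &major, &minor, &patch) < 2)
                return false;                     /* unknown version: assume it is fine */
              unsigned v = major * 1000000 + minor * 1000 + patch;
              return v >= 5011000 /* 5.11.0 */ && v <= 5015002 /* 5.15.2 */;
            }

            int main(void)
            {
              if (io_uring_may_be_broken())
                puts("affected kernel: default to innodb_use_native_aio=0");
              else
                puts("unaffected kernel: io_uring may be used");
              return 0;
            }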

            ajs124 added a comment

            I've gotten reports from users that sound similar to this, with kernel 5.16, btrfs, and io_uring (see https://github.com/NixOS/nixpkgs/issues/160516).

            Should I open a separate issue for that?


            marko Marko Mäkelä added a comment

            ajs124, yes, please file a separate ticket. The closest matching open bug is MDEV-27449, which reports corruption on btrfs and Linux 5.15, but it does not state whether the io_uring interface was being used (instead of libaio).


            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä
              Votes: 0
              Watchers: 6
