[MDEV-24883] add io_uring support for tpool Created: 2021-02-16  Updated: 2023-06-26  Resolved: 2021-03-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.6.0

Type: Task Priority: Major
Reporter: Eugene Kosov (Inactive) Assignee: Eugene Kosov (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MDEV-25760 Assertion failure on io_uring_cqe_get... Closed
causes MDEV-28441 liburing failure causes assertions Closed
Relates
relates to MDEV-24927 Install liburing-dev in the CI enviro... Closed
relates to MDEV-26569 10.6 mariadbd error: io_uring_queue_i... Closed
relates to MDEV-26674 io_uring related hangs on the Linux k... Closed
relates to MDEV-29610 uring tmpfs InnoDB: IO Error: 125 dur... Stalled
relates to MDEV-26555 main.innodb_ext_key fatal assertion Closed

 Description   

io_uring is fast and convenient. In contrast, Linux AIO has flaws: https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/

liburing is a user-space library that saves us from writing boilerplate code: https://github.com/axboe/liburing Let's use it.



 Comments   
Comment by Marko Mäkelä [ 2021-02-19 ]

Thank you. I tested this using an artificially small redo log, to get a MDEV-23855 ‘horror scenario’ with frequent ‘furious flushing’ that would impact throughput and latency. I tested with both innodb_flush_log_at_trx_commit=0 and innodb_flush_log_at_trx_commit=1, as well as innodb_flush_method=fsync and innodb_flush_method=O_DIRECT (MDEV-24854). The results were consistent on my NVMe drive (INTEL SSDPED1D960GAY, Optane 960 series): liburing was always slightly better than libaio.

On a SATA 3.0 HDD, the difference was more random and liburing could have been slightly slower at times. My 2-minute benchmark runs were probably simply too short to filter out randomness by averaging over a longer period of time. I can imagine that on rotational storage media, the performance depends on which sectors happen to be under the read/write head at the time of the request, in both dimensions (the spindle is rotating and the head is moving).

Comment by Otto Kekäläinen [ 2021-03-09 ]

I can see that `liburing-dev` is available in Debian unstable (and soon to be released Debian 11) and in Ubuntu since Groovy (20.10): https://tracker.debian.org/pkg/liburing

I see that kevg assigned this issue to me last month, but without any comments. What do you expect me to help with here?

Comment by Marko Mäkelä [ 2021-03-10 ]

otto, I see that you helped with PR#1773. If you do not expect any problems with adding the dependency in our packaging, I think that you can mark the review done and assign this back to kevg.

Comment by Otto Kekäläinen [ 2021-03-10 ]

Related PR: https://github.com/MariaDB/server/pull/1773

But it seems that the build neither obeys `-DIGNORE_AIO_CHECK=YES` nor uses uring yet. So I can test this only once there is some actual uring support in the 10.6 code base.

Comment by Marko Mäkelä [ 2021-03-12 ]

I think that there are 2 things that must be fixed before this is completed:

1. Startup fails with ENOMEM followed by SIGSEGV. This is repeatable if you set ulimit -l 0 before starting the server:

Mar 12 01:14:08 ubuntu-groovy-amd64 mariadbd[3838]: io_uring_queue_init() failed with errno 12
Mar 12 01:14:08 ubuntu-groovy-amd64 mariadbd[3838]: 210312  1:14:08 [ERROR] mysqld got signal 11 ;

2. Shutdown (after a successful startup) fails with SIGABRT:

Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
0x00007ffff765366f in __GI___poll (fds=fds@entry=0x555557593a60, nfds=nfds@entry=2, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) c
Continuing.
2021-03-12  9:11:20 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
io_uring_wait_cqe() returned -4
2021-03-12  9:11:20 0 [Note] Event Scheduler: Purging the queue. 0 events
2021-03-12  9:11:20 0 [Note] InnoDB: FTS optimize thread exiting.
[Thread 0x7ffff44fd640 (LWP 9238) exited]
[Thread 0x7ffff44b2640 (LWP 9291) exited]
 
Thread 4 "mysqld" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe1d1a640 (LWP 9226)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff756c864 in __GI_abort () at abort.c:79
#2  0x00005555563b70bf in (anonymous namespace)::aio_uring::thread_routine (aio=0x555557594850) at ./tpool/aio_liburing.cc:122
#3  0x00007ffff7957d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7a76590 in start_thread (arg=0x7fffe1d1a640) at pthread_create.c:463
#5  0x00007ffff765f223 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
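
A note on the trace above: liburing functions report errors as negative errno values, so the -4 printed for io_uring_wait_cqe() corresponds to -EINTR on Linux, that is, a wait interrupted by a signal (apparently the SIGUSR1 seen just before). Below is a minimal sketch of one way to tolerate this, assuming that simply retrying the wait is acceptable. The name `retry_on_eintr` and its parameters are hypothetical, not part of tpool or liburing, and this is not necessarily how the problem was actually fixed.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical helper, not MariaDB code: call `wait` repeatedly until it
// returns something other than -EINTR, or until `max_attempts` calls have
// been made. `wait` stands in for a call such as io_uring_wait_cqe(),
// which reports errors as negative errno values.
int retry_on_eintr(const std::function<int()> &wait, int max_attempts)
{
  assert(max_attempts > 0);
  int ret = -EINTR;
  for (int i = 0; i < max_attempts && ret == -EINTR; i++)
    ret = wait();
  // Either success (0), a real error, or -EINTR after the last attempt.
  return ret;
}
```

A caller would wrap the real wait in a lambda and treat only the final return value as fatal, so that a single interrupted wait during shutdown does not reach abort().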

Finally, the systemd configuration must set LimitMEMLOCK to 503282 or some smaller value that is determined to be sufficient, so that we will not have to fall back to simulated AIO. I think that the proper place to handle it is support-files/CMakeLists.txt.
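
To illustrate the limit in question: `ulimit -l` and systemd's LimitMEMLOCK both control RLIMIT_MEMLOCK, which a process can query with getrlimit(). The following is a rough sketch of a pre-flight check that could decide whether to attempt io_uring or fall back. The function names are hypothetical, and treating 503282 as a byte count is an assumption based on the value quoted above.

```cpp
#include <sys/resource.h>

// Hypothetical helper: does a soft limit cover the needed amount?
// RLIM_INFINITY means "no limit" and always suffices.
bool limit_covers(rlim_t soft_limit, rlim_t needed)
{
  return soft_limit == RLIM_INFINITY || soft_limit >= needed;
}

// Hypothetical helper: true if the current soft RLIMIT_MEMLOCK of this
// process is at least `needed_bytes`. Returns false if the limit cannot
// be read at all.
bool memlock_limit_at_least(rlim_t needed_bytes)
{
  struct rlimit lim;
  if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
    return false;
  return limit_covers(lim.rlim_cur, needed_bytes);
}
```

With `ulimit -l 0` in effect, `memlock_limit_at_least(503282)` would return false, which matches the ENOMEM startup failure described above.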

Comment by Marko Mäkelä [ 2021-03-18 ]

I pushed a follow-up adjustment that allows ./mtr --rr to work out of the box until emulation of the io_uring system calls has been implemented in rr. Note: the older system call io_setup() always returned an error under rr.
