I think that there are 2 things that must be fixed before this is completed:
Startup fails with ENOMEM followed by SIGSEGV. This is repeatable if you set ulimit -l 0 before starting the server:
Mar 12 01:14:08 ubuntu-groovy-amd64 mariadbd[3838]: io_uring_queue_init() failed with errno 12
Mar 12 01:14:08 ubuntu-groovy-amd64 mariadbd[3838]: 210312 1:14:08 [ERROR] mysqld got signal 11 ;
Shutdown (after a successful startup) fails with SIGABRT:
Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
0x00007ffff765366f in __GI___poll (fds=fds@entry=0x555557593a60, nfds=nfds@entry=2, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) c
Continuing.
2021-03-12 9:11:20 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
io_uring_wait_cqe() returned -4
2021-03-12 9:11:20 0 [Note] Event Scheduler: Purging the queue. 0 events
2021-03-12 9:11:20 0 [Note] InnoDB: FTS optimize thread exiting.
[Thread 0x7ffff44fd640 (LWP 9238) exited]
[Thread 0x7ffff44b2640 (LWP 9291) exited]

Thread 4 "mysqld" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe1d1a640 (LWP 9226)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007ffff756c864 in __GI_abort () at abort.c:79
#2 0x00005555563b70bf in (anonymous namespace)::aio_uring::thread_routine (aio=0x555557594850) at ./tpool/aio_liburing.cc:122
#3 0x00007ffff7957d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff7a76590 in start_thread (arg=0x7fffe1d1a640) at pthread_create.c:463
#5 0x00007ffff765f223 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
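The -4 in the log above is -EINTR on Linux, so the wait appears to have been interrupted by the SIGUSR1 and the completion thread then treated that as fatal. One possible way to tolerate this is sketched below; it is not the code in tpool/aio_liburing.cc, and retry_on_eintr is a hypothetical helper name introduced only for illustration.

```cpp
#include <cerrno>

// Hypothetical helper (not from the MariaDB source): call `wait` until it
// returns something other than -EINTR. The log above shows the wait call
// returning a negative errno value, and -4 on Linux is -EINTR.
template <typename Wait>
int retry_on_eintr(Wait &&wait)
{
  for (;;)
  {
    const int ret = wait();
    if (ret != -EINTR)
      return ret;  // 0 on success, or a real error for the caller to handle
    // Interrupted by a signal such as SIGUSR1: simply wait again.
  }
}
```

In a completion thread, the callable would presumably wrap the io_uring_wait_cqe() call, and only a result other than 0 after the retry loop would be treated as an error.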
Finally, the systemd configuration must set LimitMEMLOCK to 503282 or some smaller value that is determined to be sufficient, so that we will not have to fall back to simulated AIO. I think that the proper place to handle it is support-files/CMakeLists.txt.
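To check whether a given environment would hit the ENOMEM path, the effective limit can be queried from within the process. The following is only a rough sketch: memlock_limit_at_least is a hypothetical helper, and it assumes that the threshold is given in bytes, which is the unit that getrlimit() reports for RLIMIT_MEMLOCK.

```cpp
#include <sys/resource.h>
#include <cstdint>

// Hypothetical helper: returns true if the soft RLIMIT_MEMLOCK of the
// current process is unlimited or at least `needed_bytes`.
bool memlock_limit_at_least(std::uint64_t needed_bytes)
{
  struct rlimit lim;
  if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
    return false;  // the limit could not be queried; treat as insufficient
  return lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= needed_bytes;
}
```

With ulimit -l 0 in effect, memlock_limit_at_least(503282) would return false, which corresponds to the startup failure reported above.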
Thank you. I tested this using an artificially small redo log, to get an MDEV-23855 ‘horror scenario’ with frequent ‘furious flushing’ that would impact throughput and latency. I tested with both innodb_flush_log_at_trx_commit=0 and innodb_flush_log_at_trx_commit=1, as well as innodb_flush_method=fsync and innodb_flush_method=O_DIRECT (MDEV-24854). The results were consistent on my NVMe drive (INTEL SSDPED1D960GAY, Optane 960 series): liburing was always slightly better than libaio.
On a SATA 3.0 HDD, the difference was more random, and liburing could have been slightly slower at times. My 2-minute benchmark runs were probably simply too short to filter out randomness by averaging over a longer period of time. I can imagine that on rotational storage media, the performance depends on which sectors happen to be under the read/write head at the time of the request, in both dimensions (the spindle is rotating and the head is moving).