Details

    Description

      io_uring is fast and convenient. In contrast, Linux AIO has flaws: https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/

      liburing is a user-space library that spares us from writing boilerplate code: https://github.com/axboe/liburing Let's use it.

          Activity

            Thank you. I tested this using an artificially small redo log, to get an MDEV-23855 ‘horror scenario’ with frequent ‘furious flushing’ that would impact throughput and latency. I tested with both innodb_flush_log_at_trx_commit=0 and innodb_flush_log_at_trx_commit=1, as well as innodb_flush_method=fsync and innodb_flush_method=O_DIRECT (MDEV-24854). The results were consistent on my NVMe drive (INTEL SSDPED1D960GAY, Optane 960 series): liburing was always slightly better than libaio.

            On a SATA 3.0 HDD, the difference was more random and liburing could have been slightly slower at times. My 2-minute benchmark runs were probably simply too short to filter out randomness by averaging over a longer period of time. I can imagine that on rotational storage media, the performance depends on which sectors happen to be under the read/write head at the time of the request, in both dimensions (the spindle is rotating and the head is moving).

            marko (Marko Mäkelä) added the comment above.

            I can see that `liburing-dev` is available in Debian unstable (and the soon-to-be-released Debian 11) and in Ubuntu since Groovy (20.10): https://tracker.debian.org/pkg/liburing

            I see kevg assigned this issue to me last month but without any comments. What do you expect I should help out with here?

            otto (Otto Kekäläinen) added the comment above.

            otto, I see that you helped with PR#1773. If you do not expect any problems with adding the dependency in our packaging, I think that you can mark the review done and assign this back to kevg.

            marko (Marko Mäkelä) added the comment above.

            Related PR: https://github.com/MariaDB/server/pull/1773

            But it seems the build does not obey `-DIGNORE_AIO_CHECK=YES` and does not use uring yet. So I can test this only once there is some actual uring support in the 10.6 code base.

            otto (Otto Kekäläinen) added the comment above.

            I think that there are 2 things that must be fixed before this is completed:

            Startup fails with ENOMEM followed by SIGSEGV. This is repeatable if you set ulimit -l 0 before starting the server:

            Mar 12 01:14:08 ubuntu-groovy-amd64 mariadbd[3838]: io_uring_queue_init() failed with errno 12
            Mar 12 01:14:08 ubuntu-groovy-amd64 mariadbd[3838]: 210312  1:14:08 [ERROR] mysqld got signal 11 ;
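
The ENOMEM followed by SIGSEGV suggests that a failed queue initialization is not handled gracefully. As a minimal sketch of the desired behavior (fall back to simulated AIO instead of crashing), assuming a hypothetical `aio_backend` interface and factory callback that are not the actual tpool API:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical interface for illustration; not the real tpool classes.
struct aio_backend {
  virtual ~aio_backend() = default;
  virtual std::string name() const = 0;
};

struct simulated_aio final : aio_backend {
  std::string name() const override { return "simulated"; }
};

struct native_aio final : aio_backend {
  std::string name() const override { return "native"; }
};

// A factory returns nullptr when native AIO cannot be set up,
// for example when queue initialization fails with ENOMEM.
using aio_factory = std::function<std::unique_ptr<aio_backend>()>;

// Try the native backend first; on failure use simulated AIO, so the
// caller never receives a null pointer it might dereference.
std::unique_ptr<aio_backend> create_aio(const aio_factory &try_native) {
  std::unique_ptr<aio_backend> backend;
  if (try_native)
    backend = try_native();
  if (!backend)
    backend = std::make_unique<simulated_aio>();
  assert(backend != nullptr);
  return backend;
}
```

A caller would pass a factory that wraps the real initialization and returns an empty pointer on any error; `create_aio` then always yields a usable backend.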
            

            Shutdown (after a successful startup) fails with SIGABRT:

            Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
            0x00007ffff765366f in __GI___poll (fds=fds@entry=0x555557593a60, nfds=nfds@entry=2, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
            29	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
            (gdb) c
            Continuing.
            2021-03-12  9:11:20 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
            io_uring_wait_cqe() returned -4
            2021-03-12  9:11:20 0 [Note] Event Scheduler: Purging the queue. 0 events
            2021-03-12  9:11:20 0 [Note] InnoDB: FTS optimize thread exiting.
            [Thread 0x7ffff44fd640 (LWP 9238) exited]
            [Thread 0x7ffff44b2640 (LWP 9291) exited]
             
            Thread 4 "mysqld" received signal SIGABRT, Aborted.
            [Switching to Thread 0x7fffe1d1a640 (LWP 9226)]
            __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
            49	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
            (gdb) bt
            #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
            #1  0x00007ffff756c864 in __GI_abort () at abort.c:79
            #2  0x00005555563b70bf in (anonymous namespace)::aio_uring::thread_routine (aio=0x555557594850) at ./tpool/aio_liburing.cc:122
            #3  0x00007ffff7957d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #4  0x00007ffff7a76590 in start_thread (arg=0x7fffe1d1a640) at pthread_create.c:463
            #5  0x00007ffff765f223 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
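
The value -4 in the log above corresponds to -EINTR on Linux, and the gdb session shows a signal (SIGUSR1) being delivered shortly before. One plausible way to avoid aborting in that case is to retry the wait when it is interrupted. The sketch below is only an illustration under that assumption: `wait_once` is a hypothetical stand-in for a call such as io_uring_wait_cqe(), which returns 0 on success and a negative errno value on failure, and `max_attempts` is an invented safety bound.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Calls wait_once() until it returns something other than -EINTR,
// or until max_attempts calls have been made. Returns the last result.
int wait_retrying_on_eintr(const std::function<int()> &wait_once,
                           int max_attempts) {
  assert(max_attempts > 0);
  int ret = -EINTR;
  for (int attempt = 0; attempt < max_attempts && ret == -EINTR; ++attempt)
    ret = wait_once();
  return ret;
}
```

Any other negative result is returned unchanged, so genuine errors still reach the caller, which can decide whether aborting is warranted.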
            

            Finally, the systemd configuration must set LimitMEMLOCK to 503282 or some smaller value that is determined to be sufficient, so that we will not have to fall back to simulated AIO. I think that the proper place to handle it is support-files/CMakeLists.txt.
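
To illustrate how a process could check at startup whether its locked-memory limit is large enough, here is a small sketch using the POSIX getrlimit() call. The helper names are hypothetical, and treating 503282 as a byte count is an assumption (the comment above does not state the unit).

```cpp
#include <cassert>
#include <sys/resource.h>

// Pure helper: does the soft limit cover the amount we want to lock?
// required_bytes is assumed to be a finite byte count.
bool memlock_limit_sufficient(rlim_t soft_limit, rlim_t required_bytes) {
  assert(required_bytes != RLIM_INFINITY);
  return soft_limit == RLIM_INFINITY || soft_limit >= required_bytes;
}

// Queries the RLIMIT_MEMLOCK soft limit of the calling process.
// Returns false if the limit is too small or cannot be read.
bool can_lock_memory(rlim_t required_bytes) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
    return false;
  return memlock_limit_sufficient(lim.rlim_cur, required_bytes);
}
```

A server could call `can_lock_memory(503282)` before attempting native AIO and log a hint about LimitMEMLOCK when the check fails.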

            marko (Marko Mäkelä) added the comment above.

            I pushed a follow-up adjustment that allows ./mtr --rr to work out of the box until io_uring() system call emulation has been implemented in rr. Note: the older system call io_setup() always returned an error under rr.

            marko (Marko Mäkelä) added the comment above.

            People

              kevg Eugene Kosov (Inactive)
              kevg Eugene Kosov (Inactive)