MariaDB Server / MDEV-11378

AliSQL: [Perf] Issue#23 MERGE INNODB AIO REQUEST

Details

    Description

      Description:
      ------------
      The InnoDB engine supports both native AIO and simulated AIO on the Linux platform.
      Native AIO uses the io_submit() system call to issue I/O requests.
      However, when read-ahead is triggered, InnoDB submits the AIO requests one by one
      through io_submit(), which is somewhat inefficient.

      Solution:
      ---------
      We buffer the AIO requests and then submit them all with a single io_submit() call.
      For example, on linear read-ahead we buffer the I/O requests for the next 64 pages
      and finally io_submit() them all at once.
      

      https://github.com/alibaba/AliSQL/commit/4c9d1c72b9db5f7d2267906e0fa6d66948f5dc6c
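
      For illustration, a minimal standalone sketch of the batching idea follows. This is
      not the AliSQL patch itself; the file name "ibdata1", the 64-page window, and the
      offsets are placeholders. It prepares one iocb per page of the read-ahead window and
      hands the whole batch to the kernel in a single io_submit() call (build with:
      gcc -O2 batch_read.c -laio).

          #define _GNU_SOURCE           /* for O_DIRECT */
          #include <libaio.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>

          #define N_PAGES   64          /* read-ahead window, as in the patch description */
          #define PAGE_SIZE 16384       /* InnoDB default page size */

          int main(void)
          {
              io_context_t ctx = 0;
              struct iocb cbs[N_PAGES];
              struct iocb *cbp[N_PAGES];
              struct io_event events[N_PAGES];

              int fd = open("ibdata1", O_RDONLY | O_DIRECT);  /* placeholder file */
              if (fd < 0) { perror("open"); return 1; }
              if (io_setup(N_PAGES, &ctx) != 0) { fputs("io_setup failed\n", stderr); return 1; }

              for (int i = 0; i < N_PAGES; i++) {
                  void *buf;
                  /* O_DIRECT requires aligned buffers */
                  if (posix_memalign(&buf, 4096, PAGE_SIZE)) return 1;
                  io_prep_pread(&cbs[i], fd, buf, PAGE_SIZE, (long long) i * PAGE_SIZE);
                  cbp[i] = &cbs[i];
              }

              /* one system call submits the whole 64-page batch */
              int n = io_submit(ctx, N_PAGES, cbp);
              if (n < 0) { fprintf(stderr, "io_submit: %d\n", n); return 1; }

              /* wait for all completions; each event corresponds to one page */
              n = io_getevents(ctx, n, n, events, NULL);
              printf("completed %d page reads\n", n);

              io_destroy(ctx);
              close(fd);
              return 0;
          }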


          Activity

            marko Marko Mäkelä added a comment:

            I wonder whether combining requests would make any sense at all with modern storage devices, which should have deep work queues and could combine requests at the low level by themselves. I do not know for sure, but I could believe that even on an HDD the native command queue could implement the ‘elevator algorithm’ for optimizing head movements.

            One reason against combining read requests would seem to be that if we completed the reads of multiple pages at once, then we would be validating page checksums within only one execution thread. If we received a read completion callback for each individual page, then multiple checksums could be calculated in parallel and we could utilize the I/O capacity better. It was still nowhere near the maximum capacity of a fast NVMe when I tested MDEV-26547.
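
            As a toy model of that trade-off (this is not InnoDB code; fake_checksum() and the 4-thread layout are invented for illustration), compare one thread validating all 64 page checksums after a merged read with the same work spread over several completion threads (build with: gcc -pthread):

                #include <pthread.h>
                #include <stdint.h>
                #include <stdio.h>
                #include <string.h>

                #define N_PAGES   64
                #define PAGE_SIZE 16384
                #define N_THREADS 4

                static uint8_t pages[N_PAGES][PAGE_SIZE];

                /* stand-in for InnoDB's page checksum validation */
                static uint32_t fake_checksum(const uint8_t *p)
                {
                    uint32_t c = 0;
                    for (size_t i = 0; i < PAGE_SIZE; i++)
                        c = c * 31 + p[i];
                    return c;
                }

                /* models per-page completion callbacks landing on several threads */
                static void *completion_worker(void *arg)
                {
                    long id = (long) arg;
                    for (long i = id; i < N_PAGES; i += N_THREADS)
                        (void) fake_checksum(pages[i]);
                    return NULL;
                }

                int main(void)
                {
                    memset(pages, 0xAB, sizeof pages);

                    /* merged-read model: one completion, one thread checksums all pages */
                    for (int i = 0; i < N_PAGES; i++)
                        (void) fake_checksum(pages[i]);

                    /* per-page-completion model: validation spread over N_THREADS threads */
                    pthread_t th[N_THREADS];
                    for (long t = 0; t < N_THREADS; t++)
                        pthread_create(&th[t], NULL, completion_worker, (void *) t);
                    for (int t = 0; t < N_THREADS; t++)
                        pthread_join(th[t], NULL);

                    puts("done");
                    return 0;
                }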

            monty Michael Widenius added a comment:

            Modern storage devices do bigger internal reads, but only 'around' the requested page, not forward from the current page.
            For example, on an SSD with 128K internal reads, if you read a page starting at 64K, the device will read data from 0-128K.
            Newer SSDs based on persistent memory will read only exactly what you ask for. However, if you can read ahead the data you are likely to use, that will be faster than many independent reads even on these kinds of devices.
            The ONLY way to know is to run benchmarks on a set of devices:
            a modern hard disk, a modern SSD, and a persistent-memory device.
            (Hard disks will be used in the cloud for the foreseeable future simply because they are MUCH cheaper.)

            Another point: issuing one kernel request instead of 64 is still much better!
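
            A quick worked example of the alignment arithmetic behind that observation (the 128K internal read unit is just the figure from the comment; real devices vary):

                #include <stdio.h>

                #define INTERNAL_READ 131072UL   /* hypothetical 128K device read unit */

                int main(void)
                {
                    unsigned long offset = 65536;   /* a 16K page read starting at 64K */
                    unsigned long lo = offset / INTERNAL_READ * INTERNAL_READ;
                    unsigned long hi = lo + INTERNAL_READ;
                    /* prints: device fetches bytes 0..131072 */
                    printf("device fetches bytes %lu..%lu\n", lo, hi);
                    return 0;
                }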

            marko Marko Mäkelä added a comment:

            I agree that it could make sense to merge read I/O requests at least when initializing the buffer pool according to the ib_buffer_pool file. Each read request could comprise multiple adjacent pages (say, 64 pages or 1 megabyte per request). Multi-threaded processing would still be possible.
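
            A minimal sketch of such a merged read (not MariaDB code; the file name and starting offset are placeholders): scatter-gather preadv() lets one 1 MiB request land in 64 separate page buffers, each of which a different thread could then validate.

                #include <sys/uio.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <unistd.h>

                #define N_PAGES   64
                #define PAGE_SIZE 16384

                int main(void)
                {
                    int fd = open("ibdata1", O_RDONLY);   /* placeholder file */
                    if (fd < 0) { perror("open"); return 1; }

                    struct iovec iov[N_PAGES];
                    for (int i = 0; i < N_PAGES; i++) {
                        iov[i].iov_base = malloc(PAGE_SIZE);
                        iov[i].iov_len  = PAGE_SIZE;
                    }

                    /* one request covering 64 adjacent pages = 1 MiB from offset 0 */
                    ssize_t n = preadv(fd, iov, N_PAGES, 0);
                    if (n < 0) { perror("preadv"); return 1; }
                    printf("read %zd bytes in one request\n", n);

                    /* iov[i].iov_base now holds page i; checksum validation of the
                       pages could still be farmed out to multiple threads here */
                    close(fd);
                    return 0;
                }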

            wlad Vladislav Vaintroub added a comment:

            I guess this all needs some experimentation to prove whether the increased complexity here is justified. I'm not entirely convinced that submitting and processing 64x16K asynchronous I/O requests in sorted order would be much slower than submitting one 1 MB request and then processing 64x16K pages, be it on NVMe, SSD, or hard disk.

            marko Marko Mäkelä added a comment:

            Recent experience in MDEV-30986 suggests that a read-ahead of multiple adjacent pages in a single request could be well worth the added complexity.

            When it comes to page writes in buf_flush_page_cleaner(), possibly we could check whether buf_dblwr_t::flush_buffered_writes_completed() could submit a single scatter-gather write request when a write of up to 128 pages has completed. Similarly, when the doublewrite buffer is disabled or not needed (MDEV-19738), we might try to include multiple pages in a single write request.
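
            A sketch of the scatter-gather write idea (illustrative only, not the buf_flush_page_cleaner() implementation; the file name and batch size are placeholders): up to 128 page images that are adjacent on disk go out in one pwritev() request.

                #include <sys/uio.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <string.h>
                #include <unistd.h>

                #define N_PAGES   128        /* matches the doublewrite batch mentioned above */
                #define PAGE_SIZE 16384

                int main(void)
                {
                    int fd = open("pages.out", O_WRONLY | O_CREAT, 0644);  /* placeholder file */
                    if (fd < 0) { perror("open"); return 1; }

                    struct iovec iov[N_PAGES];
                    for (int i = 0; i < N_PAGES; i++) {
                        iov[i].iov_base = malloc(PAGE_SIZE);
                        memset(iov[i].iov_base, 0, PAGE_SIZE);   /* stand-in page image */
                        iov[i].iov_len = PAGE_SIZE;
                    }

                    /* one scatter-gather request writes all 128 adjacent pages */
                    ssize_t n = pwritev(fd, iov, N_PAGES, 0);
                    if (n < 0) { perror("pwritev"); return 1; }
                    printf("wrote %zd bytes in one request\n", n);

                    close(fd);
                    return 0;
                }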

            People

              marko Marko Mäkelä
              svoj Sergey Vojtovich
              Votes: 5
              Watchers: 11

