MariaDB Server / MDEV-11378

AliSQL: [Perf] Issue#23 MERGE INNODB AIO REQUEST

Details

    Description

      Description:
      ------------
      The InnoDB engine supports both native AIO and simulated AIO on the Linux platform.
      Native AIO uses the io_submit() system call to issue I/O requests.
      However, when read-ahead is triggered, InnoDB submits the AIO requests one by one
      through io_submit(), which is somewhat inefficient.

      Solution:
      ---------
      We buffer the AIO requests and then submit them all with a single io_submit() call.
      For example, on linear read-ahead we buffer the I/O requests for the next 64 pages
      and finally io_submit() them all at once.
      

      https://github.com/alibaba/AliSQL/commit/4c9d1c72b9db5f7d2267906e0fa6d66948f5dc6c
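
      For illustration, a minimal standalone sketch of the batching idea follows. This is
      not the AliSQL patch itself; the file name "ibdata1", the 64-page window, and the
      offsets are placeholders. It prepares one iocb per page of the read-ahead window and
      hands the whole batch to the kernel in a single io_submit() call (build with:
      gcc -O2 batch_read.c -laio).

          #define _GNU_SOURCE           /* for O_DIRECT */
          #include <libaio.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>

          #define N_PAGES   64          /* read-ahead window, as in the patch description */
          #define PAGE_SIZE 16384       /* InnoDB default page size */

          int main(void)
          {
              io_context_t ctx = 0;
              struct iocb cbs[N_PAGES];
              struct iocb *cbp[N_PAGES];
              struct io_event events[N_PAGES];

              int fd = open("ibdata1", O_RDONLY | O_DIRECT);  /* placeholder file */
              if (fd < 0) { perror("open"); return 1; }
              if (io_setup(N_PAGES, &ctx) != 0) { fputs("io_setup failed\n", stderr); return 1; }

              for (int i = 0; i < N_PAGES; i++) {
                  void *buf;
                  /* O_DIRECT requires aligned buffers */
                  if (posix_memalign(&buf, 4096, PAGE_SIZE)) return 1;
                  io_prep_pread(&cbs[i], fd, buf, PAGE_SIZE, (long long) i * PAGE_SIZE);
                  cbp[i] = &cbs[i];
              }

              /* one system call submits the whole 64-page batch */
              int n = io_submit(ctx, N_PAGES, cbp);
              if (n < 0) { fprintf(stderr, "io_submit: %d\n", n); return 1; }

              /* wait for all completions; each event corresponds to one page */
              n = io_getevents(ctx, n, n, events, NULL);
              printf("completed %d page reads\n", n);

              io_destroy(ctx);
              close(fd);
              return 0;
          }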


          Activity

            marko Marko Mäkelä added a comment:

            I wonder whether combining requests would make any sense at all with modern storage devices, which should have deep work queues and could combine requests at the low level by themselves. I do not know for sure, but I could believe that even on an HDD the native command queue could implement the ‘elevator algorithm’ for optimizing head movements.

            One reason against combining read requests would seem to be that if we completed the reads of multiple pages at once, then we would be validating page checksums within only one execution thread. If we received a read completion callback for each individual page, then multiple checksums could be calculated in parallel and we could utilize the I/O capacity better. It was still nowhere near the maximum capacity of a fast NVMe when I tested MDEV-26547.
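
            As a toy model of that trade-off (this is not InnoDB code; fake_checksum() and the 4-thread layout are invented for illustration), compare one thread validating all 64 page checksums after a merged read with the same work spread over several completion threads (build with: gcc -pthread):

                #include <pthread.h>
                #include <stdint.h>
                #include <stdio.h>
                #include <string.h>

                #define N_PAGES   64
                #define PAGE_SIZE 16384
                #define N_THREADS 4

                static uint8_t pages[N_PAGES][PAGE_SIZE];

                /* stand-in for InnoDB's page checksum validation */
                static uint32_t fake_checksum(const uint8_t *p)
                {
                    uint32_t c = 0;
                    for (size_t i = 0; i < PAGE_SIZE; i++)
                        c = c * 31 + p[i];
                    return c;
                }

                /* models per-page completion callbacks landing on several threads */
                static void *completion_worker(void *arg)
                {
                    long id = (long) arg;
                    for (long i = id; i < N_PAGES; i += N_THREADS)
                        (void) fake_checksum(pages[i]);
                    return NULL;
                }

                int main(void)
                {
                    memset(pages, 0xAB, sizeof pages);

                    /* merged-read model: one completion, one thread checksums all pages */
                    for (int i = 0; i < N_PAGES; i++)
                        (void) fake_checksum(pages[i]);

                    /* per-page-completion model: validation spread over N_THREADS threads */
                    pthread_t th[N_THREADS];
                    for (long t = 0; t < N_THREADS; t++)
                        pthread_create(&th[t], NULL, completion_worker, (void *) t);
                    for (int t = 0; t < N_THREADS; t++)
                        pthread_join(th[t], NULL);

                    puts("done");
                    return 0;
                }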

            monty Michael Widenius added a comment:

            Modern storage devices do bigger internal reads, but only 'around' the requested page, not forward from the current page.
            For example, on an SSD with 128K internal reads, if you read a page starting at 64K, the device will read data from 0-128K.
            Newer SSDs based on persistent memory will read only exactly what you ask for. However, if you can read ahead the data you are likely to use, that will be faster than many independent reads even on these kinds of devices.
            The ONLY way to know is to run benchmarks on a set of devices:
            a modern hard disk, a modern SSD, and a persistent-memory device.
            (Hard disks will be used in the cloud for the foreseeable future simply because they are MUCH cheaper.)

            Another point: issuing one kernel request instead of 64 is still much better!
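
            A quick worked example of the alignment arithmetic behind that observation (the 128K internal read unit is just the figure from the comment; real devices vary):

                #include <stdio.h>

                #define INTERNAL_READ 131072UL   /* hypothetical 128K device read unit */

                int main(void)
                {
                    unsigned long offset = 65536;   /* a 16K page read starting at 64K */
                    unsigned long lo = offset / INTERNAL_READ * INTERNAL_READ;
                    unsigned long hi = lo + INTERNAL_READ;
                    /* prints: device fetches bytes 0..131072 */
                    printf("device fetches bytes %lu..%lu\n", lo, hi);
                    return 0;
                }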

            marko Marko Mäkelä added a comment:

            I agree that it could make sense to merge read I/O requests at least when initializing the buffer pool according to the ib_buffer_pool file. Each read request could comprise multiple adjacent pages (say, 64 pages or 1 megabyte per request). Multi-threaded processing would still be possible.
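
            A minimal sketch of such a merged read (not MariaDB code; the file name and starting offset are placeholders): scatter-gather preadv() lets one 1 MiB request land in 64 separate page buffers, each of which a different thread could then validate.

                #include <sys/uio.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <unistd.h>

                #define N_PAGES   64
                #define PAGE_SIZE 16384

                int main(void)
                {
                    int fd = open("ibdata1", O_RDONLY);   /* placeholder file */
                    if (fd < 0) { perror("open"); return 1; }

                    struct iovec iov[N_PAGES];
                    for (int i = 0; i < N_PAGES; i++) {
                        iov[i].iov_base = malloc(PAGE_SIZE);
                        iov[i].iov_len  = PAGE_SIZE;
                    }

                    /* one request covering 64 adjacent pages = 1 MiB from offset 0 */
                    ssize_t n = preadv(fd, iov, N_PAGES, 0);
                    if (n < 0) { perror("preadv"); return 1; }
                    printf("read %zd bytes in one request\n", n);

                    /* iov[i].iov_base now holds page i; checksum validation of the
                       pages could still be farmed out to multiple threads here */
                    close(fd);
                    return 0;
                }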

            wlad Vladislav Vaintroub added a comment:

            I guess this all needs some experimentation to prove whether the increased complexity here is justified. I'm not entirely convinced that submitting and processing 64x16K asynchronous I/O requests in sorted order would be much slower than submitting one 1 MB request and then processing 64x16K pages, be it on NVMe, SSD, or hard disk.

            marko Marko Mäkelä added a comment:

            Recent experience in MDEV-30986 suggests that a read-ahead of multiple adjacent pages in a single request could be well worth the added complexity.

            When it comes to page writes in buf_flush_page_cleaner(), possibly we could check whether buf_dblwr_t::flush_buffered_writes_completed() could submit a single scatter-gather write request when a write of up to 128 pages has completed. Similarly, when the doublewrite buffer is disabled or not needed (MDEV-19738), we might try to include multiple pages in a single write request.
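
            A sketch of the scatter-gather write idea (illustrative only, not the buf_flush_page_cleaner() implementation; the file name and batch size are placeholders): up to 128 page images that are adjacent on disk go out in one pwritev() request.

                #include <sys/uio.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <string.h>
                #include <unistd.h>

                #define N_PAGES   128        /* matches the doublewrite batch mentioned above */
                #define PAGE_SIZE 16384

                int main(void)
                {
                    int fd = open("pages.out", O_WRONLY | O_CREAT, 0644);  /* placeholder file */
                    if (fd < 0) { perror("open"); return 1; }

                    struct iovec iov[N_PAGES];
                    for (int i = 0; i < N_PAGES; i++) {
                        iov[i].iov_base = malloc(PAGE_SIZE);
                        memset(iov[i].iov_base, 0, PAGE_SIZE);   /* stand-in page image */
                        iov[i].iov_len = PAGE_SIZE;
                    }

                    /* one scatter-gather request writes all 128 adjacent pages */
                    ssize_t n = pwritev(fd, iov, N_PAGES, 0);
                    if (n < 0) { perror("pwritev"); return 1; }
                    printf("wrote %zd bytes in one request\n", n);

                    close(fd);
                    return 0;
                }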

            People

              marko Marko Mäkelä
              svoj Sergey Vojtovich
              Votes: 5
              Watchers: 11

