Details
Type: Bug
Status: Stalled
Priority: Major
Resolution: Unresolved
Version: 10.5
Environment: Linux kernel 5.4.0
Description
Figure "NVMe worse than SATA.png" shows that, with the default configuration, TPC-C benchmark performance is even worse on a Samsung 980 PRO than on a Samsung 960 EVO.
Setting             | purple (default) | yellow
innodb_flush_method | fsync            | O_DIRECT_NO_FSYNC
innodb_doublewrite  | on               | off
After reducing how often fsync is called, performance returns to normal (yellow). I then debug the issue and find that the slowness is not at the application level. First, I use fio to benchmark the achievable limit, as shown in figure "fio-benchmark.png":
fio --filename=/dev/nvme2n1 --size=50g --ioengine=[sync/libaio] --iodepth=[1/32] --numjobs=16 --rw=randwrite --buffered=0 --direct=1 --fsync=[1/0] --bs=[4k/128k] --sync=[none/sync]
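The command above packs several alternatives into one line using `[a/b]` notation. The following is a minimal Python sketch that expands them into concrete fio runs. The helper name `expand` is hypothetical and not part of fio. It assumes every alternative is crossed with every other, giving 32 runs; the report does not say which combinations the figure actually includes. fio's documentation notes that an `iodepth` above 1 has no effect on synchronous engines such as `sync`, so some of these runs are redundant.

```python
import itertools
import shlex

# Fixed part of the fio command from the report.
# Note: writing to the raw device /dev/nvme2n1 overwrites its contents.
BASE = ("fio --filename=/dev/nvme2n1 --size=50g --numjobs=16 --rw=randwrite "
        "--buffered=0 --direct=1")

# The "[a/b]" alternatives from the report, one list per option.
SWEEP = {
    "ioengine": ["sync", "libaio"],
    "iodepth": ["1", "32"],
    "fsync": ["1", "0"],
    "bs": ["4k", "128k"],
    "sync": ["none", "sync"],
}


def expand(base, sweep):
    """Hypothetical helper: return one fio argv list per combination."""
    keys = list(sweep)
    commands = []
    for values in itertools.product(*(sweep[k] for k in keys)):
        extra = [f"--{k}={v}" for k, v in zip(keys, values)]
        commands.append(shlex.split(base) + extra)
    return commands


if __name__ == "__main__":
    # Print each run as a shell-ready command line instead of executing it.
    for argv in expand(BASE, SWEEP):
        print(shlex.join(argv))
```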
Then I use blktrace to debug further, on the run with these fio options:
--bs=4k --fsync=1 --ioengine=libaio --iodepth=32
==================== Device Overhead ====================

DEV       |  Q2G      G2I      Q2M      I2D      D2C
--------- | -------- -------- -------- -------- --------
(259, 12) |  0.0158%  0.0000%  0.0007%  0.0000%  92.6055%
--------- | -------- -------- -------- -------- --------
Overall   |  0.0158%  0.0000%  0.0007%  0.0000%  92.6055%

(D2C: time the I/O is “active” in the driver and on the device.)
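When comparing several blktrace runs, it helps to pull the phase percentages out of this report mechanically. The following is a minimal Python sketch that parses a "Device Overhead" block laid out like the one above and reports the dominant phase. The function names are hypothetical, and the parser assumes the simplified `label | values` layout shown here rather than every variant of btt's output. For this run it picks out D2C, which is consistent with the report's point that nearly all of the time is spent in the driver and on the device, not in the block layer's queueing stages.

```python
# The report excerpt from above, in the same simplified layout.
REPORT = """\
==================== Device Overhead ====================

DEV       |  Q2G      G2I      Q2M      I2D      D2C
--------- | -------- -------- -------- -------- --------
(259, 12) |  0.0158%  0.0000%  0.0007%  0.0000%  92.6055%
--------- | -------- -------- -------- -------- --------
Overall   |  0.0158%  0.0000%  0.0007%  0.0000%  92.6055%
"""


def parse_device_overhead(text):
    """Map each row label (device or 'Overall') to {phase: percent}."""
    phases = None
    rows = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip the banner and blank lines
        label, _, rest = line.partition("|")
        label = label.strip()
        cells = rest.split()
        if label == "DEV":
            phases = cells  # header row: Q2G, G2I, Q2M, I2D, D2C
        elif phases and cells and all(c.endswith("%") for c in cells):
            # Data rows only; the dashed separator rows are skipped.
            rows[label] = {p: float(c.rstrip("%")) for p, c in zip(phases, cells)}
    return rows


def dominant_phase(row):
    """Return the phase with the largest share of time."""
    return max(row, key=row.get)


if __name__ == "__main__":
    for dev, row in parse_device_overhead(REPORT).items():
        print(dev, dominant_phase(row), row[dominant_phase(row)])
```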
Using libaio with fsync dramatically hurts random-write performance. This is a well-known problem: fsync can make libaio fall back to synchronous I/O. However, figure "fio-benchmark.png" confirms that using O_SYNC works around my problem, yet innodb_flush_method surprisingly does not support O_SYNC (even though it does support O_DSYNC). Could another option for this parameter be added in a future version? Thanks!
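The O_SYNC workaround can be compared against write-plus-fsync outside of InnoDB with a small probe. The following is a minimal Python sketch, assuming a Linux/Unix host where `os.O_SYNC`, `os.O_DSYNC` and `os.pwrite` are available. The function names are hypothetical. It uses buffered, single-threaded 4k writes, so unlike the fio runs it does not use O_DIRECT or libaio. It only isolates the cost of the sync step itself. For meaningful timings, pass `dir=` to `TemporaryDirectory` so that the file lives on the device under test; a tmpfs-backed /tmp makes all three strategies look alike.

```python
import os
import tempfile
import time

BLOCK = 4096  # 4k writes, matching --bs=4k in the fio runs


def write_then_fsync(path, count):
    """Strategy A: plain writes, each followed by an explicit fsync()."""
    buf = os.urandom(BLOCK)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(count):
            os.pwrite(fd, buf, i * BLOCK)
            os.fsync(fd)  # one extra system call per write
    finally:
        os.close(fd)


def write_o_sync(path, count, flag=os.O_SYNC):
    """Strategy B: open with O_SYNC (or O_DSYNC) so each write returns
    only once it is durable; no separate fsync() is issued."""
    buf = os.urandom(BLOCK)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | flag, 0o644)
    try:
        for i in range(count):
            os.pwrite(fd, buf, i * BLOCK)
    finally:
        os.close(fd)


def time_it(fn, *args):
    """Wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "probe.bin")
        n = 200
        print("write+fsync:", time_it(write_then_fsync, path, n))
        print("O_SYNC     :", time_it(write_o_sync, path, n))
        print("O_DSYNC    :", time_it(write_o_sync, path, n, os.O_DSYNC))
```

Per POSIX, O_SYNC makes each write wait for file-integrity completion (data plus the metadata needed to retrieve it), while O_DSYNC waits only for data-integrity completion. This is why the two flags are timed separately.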
Another request: since "devices get extremely fast, interrupt-driven work is no longer as efficient as polling for completions — a common theme that underlies the architecture of performance-oriented I/O systems," why not plan to move to io_uring in a future version? When I observe the NVMe driver queues, I find that with libaio they are almost empty, and are sometimes even used in a serialized way.
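The report does not say how the NVMe queues were observed. One way to check queue occupancy, offered here as an assumption, is to sample `/sys/block/<dev>/inflight`, which the kernel exposes as two counters: reads and writes currently in flight. The following is a minimal Python sketch with hypothetical function names. If the mean in-flight write count stays far below the depth fio was asked for (`--iodepth` × `--numjobs`), the device is not being kept busy. That would match the "almost empty queues" observation.

```python
import os
import time


def parse_inflight(text):
    """/sys/block/<dev>/inflight holds two counters: reads and writes in flight."""
    reads, writes = (int(x) for x in text.split())
    return reads, writes


def sample_inflight(path, samples=100, interval=0.01):
    """Poll the inflight file and return the mean number of in-flight writes."""
    total = 0
    for _ in range(samples):
        with open(path) as f:
            total += parse_inflight(f.read())[1]
        time.sleep(interval)
    return total / samples


if __name__ == "__main__":
    dev = "nvme2n1"  # device from the fio runs; adjust for your system
    path = f"/sys/block/{dev}/inflight"
    if os.path.exists(path):
        print(f"mean in-flight writes on {dev}: {sample_inflight(path):.2f}")
```

Run it while fio (or the TPC-C workload) is active. A single snapshot can mislead, which is why the sketch averages over many samples.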
Issue Links
- relates to:
  - MDEV-29343 MariaDB 10.6.x slower mysqldump etc. (Closed)
  - MDEV-30136 Map innodb_flush_method to new settable Booleans innodb_{log,data}_file_{buffering,write_through} (Closed)
  - MDEV-24854 Change innodb_flush_method=O_DIRECT by default (Closed)