Details
- Bug
- Status: Stalled
- Major
- Resolution: Unresolved
- 10.5(EOL)
- Linux kernel 5.4.0
Description
Figure "NVMe worse than SATA.png" shows that TPC-C benchmark performance is even worse on the Samsung 980 PRO than on the Samsung 960 EVO when using the default configuration.
| Setting             | purple | yellow            |
| innodb_flush_method | fsync  | O_DIRECT_NO_FSYNC |
| innodb_doublewrite  | on     | off               |
After reducing the frequency of fsync calls, performance returns to normal (yellow). I then debugged the issue and found that the slowness does not originate at the application level. First, I used fio to benchmark the ideal limit, as shown in figure "fio-benchmark.png".
| fio --filename=/dev/nvme2n1 --size=50g --ioengine=[sync/libaio] --iodepth=[1/32] --numjobs=16 --rw=randwrite --buffered=0 --direct=1 --fsync=[1/0] --bs=[4k/128k] --sync=[none/sync] | 
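The bracketed values in the command above read most naturally as alternatives, one choice per run. As a rough sketch under that assumption, the following Python snippet expands the matrix into concrete command lines. It only prints the commands and does not run fio; the report does not say whether every combination was actually benchmarked, and the helper name `expand_commands` is hypothetical.

```python
from itertools import product

# Options that stay fixed in the command quoted above.
FIXED = (
    "fio --filename=/dev/nvme2n1 --size=50g --numjobs=16 "
    "--rw=randwrite --buffered=0 --direct=1"
)

# Bracketed alternatives from the command, read as "pick one per run"
# (an assumption about the author's notation).
VARIANTS = {
    "--ioengine": ["sync", "libaio"],
    "--iodepth": ["1", "32"],
    "--fsync": ["1", "0"],
    "--bs": ["4k", "128k"],
    "--sync": ["none", "sync"],
}


def expand_commands():
    """Return one fio command line per combination of the bracketed options."""
    keys = list(VARIANTS)
    commands = []
    for values in product(*(VARIANTS[k] for k in keys)):
        opts = " ".join(f"{k}={v}" for k, v in zip(keys, values))
        commands.append(f"{FIXED} {opts}")
    return commands


if __name__ == "__main__":
    # Print only: writing to a raw device such as /dev/nvme2n1 destroys its data.
    for cmd in expand_commands():
        print(cmd)
```

Some combinations are probably not meaningful in practice (for example, a queue depth of 32 with the synchronous engine), so the printed list is an upper bound on the runs behind "fio-benchmark.png", not a record of them.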
Then I used blktrace to debug further:
| --bs=4k --fsync=1 --ioengine=libaio --iodepth=32 | 
|  | 
| ==================== Device Overhead ==================== | 
|  | 
| DEV | Q2G G2I Q2M I2D D2C <------ time the I/O is “active” in the driver and on the device | 
| ---------- | --------- --------- --------- --------- --------- | 
|  (259, 12) |   0.0158%   0.0000%   0.0007%   0.0000%  92.6055% | 
| ---------- | --------- --------- --------- --------- --------- | 
|    Overall |   0.0158%   0.0000%   0.0007%   0.0000%  92.6055% | 
Using libaio with fsync dramatically damages random write performance. This is a well-known problem: fsync can make libaio fall back to synchronous I/O. However, figure "fio-benchmark.png" confirms that using O_SYNC works around my problem, yet innodb_flush_method surprisingly does not support O_SYNC (even though it supports O_DSYNC). Would it be possible to add another option for this parameter in a future version? Thanks!
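To make the two durability patterns in this request concrete, here is a minimal Python sketch that contrasts a write followed by an explicit fsync() with a file opened using O_SYNC. It is not InnoDB code and it measures nothing: it writes to a regular temporary file without O_DIRECT, and the function names are hypothetical. `os.O_SYNC` is assumed to be available, as it is on Linux.

```python
import os
import tempfile

BLOCK = b"\0" * 4096  # one 4 KiB block, matching the --bs=4k case


def write_then_fsync(path, count):
    """Write each block, then flush it with a separate fsync() call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            os.write(fd, BLOCK)
            os.fsync(fd)  # explicit flush after every write
    finally:
        os.close(fd)


def write_with_o_sync(path, count):
    """Open with O_SYNC so each write() returns only once the data is durable."""
    # os.O_SYNC exists on Linux and other Unix platforms, not on Windows.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(count):
            os.write(fd, BLOCK)  # no fsync() call needed
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "fsync.dat")
        b = os.path.join(tmp, "osync.dat")
        write_then_fsync(a, 8)
        write_with_o_sync(b, 8)
        print(os.path.getsize(a), os.path.getsize(b))
```

Both functions leave the same bytes on disk. The difference is that the second folds the flush into the write itself, which is the behaviour requested here as an additional innodb_flush_method option.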
Another request: "devices get extremely fast, interrupt-driven work is no longer as efficient as polling for completions, a common theme that underlies the architecture of performance-oriented I/O systems." So why not plan to move to io_uring in some future version? I observed the NVMe driver queues and found that with libaio the queues are almost empty, and are sometimes even used in a serialized way.
Issue Links
- relates to
  - MDEV-29343 MariaDB 10.6.x slower mysqldump etc. (Closed)
  - MDEV-30136 Map innodb_flush_method to new settable Booleans innodb_{log,data}_file_{buffering,write_through} (Closed)
  - MDEV-24854 Change innodb_flush_method=O_DIRECT by default (Closed)

