[MDEV-28909] Write performance not scale to NVMe SSD Created: 2022-06-20 Updated: 2023-10-05 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5 |
| Fix Version/s: | 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Tim He | Assignee: | Axel Schwenke |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | performance | ||
| Environment: |
Linux kernel 5.4.0 |
||
| Description |
|
Figure "NVMe worse than SATA.png" shows that TPC-C benchmark performance is even worse on the Samsung 980 PRO than on the Samsung 960 EVO when using the default configuration.
After reducing the frequency of fsync calls, the performance gets back to normal (yellow). I then debugged the issue and found that the slowness does not originate at the application level. First, I used fio to measure the ideal limit, as shown in figure "fio-benchmark.png".
Then I used blktrace to debug further:
Using libaio with fsync dramatically hurts random write performance. This is a well-known problem: fsync can make libaio fall back to synchronous I/O. However, figure "fio-benchmark.png" confirms that using O_SYNC works around my problem, while innodb_flush_method surprisingly does not support O_SYNC (even though it supports O_DSYNC). Would it be possible to add another option for this parameter in a future version? Thanks! Another request: since "devices get extremely fast, interrupt-driven work is no longer as efficient as polling for completions — a common theme that underlies the architecture of performance-oriented I/O systems", why not plan to move to io_uring in some future version? I observed the NVMe driver queues and found that with libaio the queues are almost empty, and are sometimes even used in a serialized way. |
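The two durability strategies compared above can be sketched in a few lines of Python. This is not the reporter's fio job: it uses plain synchronous writes rather than libaio, so it cannot reproduce the queue-depth effect seen in blktrace. The function name `time_writes`, the block size, and the write count are assumptions made for illustration only.

```python
import os
import tempfile
import time


def time_writes(path, use_o_sync, count=64, block_size=4096):
    """Return the seconds taken to durably write `count` blocks to `path`.

    use_o_sync=True  -> open with O_SYNC; each write() returns once durable.
    use_o_sync=False -> plain write() followed by an explicit fsync().
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if use_o_sync:
        flags |= os.O_SYNC
    block = b"\0" * block_size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if not use_o_sync:
                os.fsync(fd)  # one flush request per write
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # The temporary directory is a placeholder; it may live on tmpfs.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.dat")
        print("write+fsync:", time_writes(target, use_o_sync=False))
        print("O_SYNC     :", time_writes(target, use_o_sync=True))
```

To get meaningful numbers, `path` would have to point at a file on the device under test; timings taken on a tmpfs-backed temporary directory say nothing about the SSD.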
| Comments |
| Comment by Marko Mäkelä [ 2022-06-20 ] | |
|
Tim He, can you please state the MariaDB Server version that you are using? Is it older than 10.6, where What are your other InnoDB configuration parameters? Note: innodb_doublewrite=off is only safe if writes of innodb_page_size are known to be atomic. I have not seen any Linux documentation on this, but I might assume that on an SSD with a physical block size of 4096 bytes, innodb_page_size=4k could be safe to use with innodb_doublewrite=off. Also note: With 4096-byte physical block size, MariaDB Server 10.8 should yield better write performance than earlier versions. See also | |
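As a rough aid for the block-size precondition mentioned above, the following Python sketch reads the physical block size that Linux exposes in sysfs and compares it with a given InnoDB page size. The helper names and the `sysfs_root` parameter are hypothetical. A match is only the precondition speculated about in the comment; it does not prove that the device writes a page atomically.

```python
from pathlib import Path


def physical_block_size(device, sysfs_root="/sys/block"):
    """Read the physical block size in bytes that Linux reports for `device`."""
    path = Path(sysfs_root) / device / "queue" / "physical_block_size"
    return int(path.read_text().strip())


def page_matches_block(innodb_page_size, device, sysfs_root="/sys/block"):
    """True if one InnoDB page is exactly one physical block of `device`."""
    return innodb_page_size == physical_block_size(device, sysfs_root)
```

For example, `page_matches_block(4096, "nvme0n1")` would check the innodb_page_size=4k case, where `nvme0n1` is a placeholder device name.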
| Comment by Tim He [ 2022-06-22 ] | |
|
Version 10.5.13. Other parameters are unchanged. | |
| Comment by Marko Mäkelä [ 2022-08-26 ] | |
|
Tim He, were you able to test the performance of MariaDB Server 10.6 or 10.8? Also, did you test a setup where the ib_logfile0 and the InnoDB data files reside on separate devices? Which writes and fsync() are we talking about? I had written some notes about O_DIRECT and fdatasync() or fsync() in In 10.8, you may also want to check | |
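Whether the redo log and the data files share a filesystem can be checked quickly before such a test. The Python sketch below compares the `st_dev` value of two paths; the function name is an assumption. A result of `True` means both files are on the same filesystem. A result of `False` does not prove separate hardware, because two filesystems can be partitions of one physical SSD.

```python
import os


def on_same_device(path_a, path_b):
    """True if both paths live on the same filesystem device (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

A call such as `on_same_device("/var/lib/mysql/ib_logfile0", "/var/lib/mysql/ibdata1")` would cover the setup asked about; both paths are placeholders for the actual data directory layout.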
| Comment by Marko Mäkelä [ 2022-11-01 ] | |
|
I think that it is worth running some performance tests, beyond what already was done in The allowed innodb_flush_method values are as follows:
Starting with
I think that we should deprecate the confusing parameter innodb_flush_method and map the existing values to some combinations of new parameters, which could be changed with SET GLOBAL while the server is running:
I do not think that we need to continue supporting innodb_flush_method=nosync on the redo log. A similar effect can already be achieved with non-default values of innodb_flush_log_at_trx_commit. | |
| Comment by Marko Mäkelä [ 2023-01-05 ] | |
|
Running performance tests should be easier by using the 4 settable Boolean parameters that were introduced in |