[MDEV-25451] TPC-C in-memory performance degradation (dblwr + s.t. more) Created: 2021-04-19 Updated: 2021-09-16 Resolved: 2021-07-26
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.5.12, 10.6.3 |
| Type: | Task | Priority: | Major |
| Reporter: | Alexander Krizhanovsky (Inactive) | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | performance | ||
| Attachments: |
| Issue Links: |
| Epic Link: | adaptive flushing |
| Description |
I'm evaluating MariaDB 10.5.9 performance against 10.5.5, as described in https://www.percona.com/blog/2020/08/14/evaluating-performance-improvements-in-mariadb-10-5-5/ . I also tried the current 10.6 (as of 2ad61c678243dec2a808b1f67b5760a725163052) and noticed a significant performance degradation against 10.5.9 on an in-memory workload (see mariadb_all.png).

The data size produced with sysbench-tpcc is 104G and the buffer pool is 116G, so this is a pure in-memory workload (all buffer_LRU_batch* metrics are zero). The testbed machine has a slow NVMe drive (26 KIOPS for random writes and 320 KIOPS for random reads), 128GB RAM, and 40 hyperthreads. While the original tests are 3 hours long, 10.6 shows much slower performance right at the beginning of the test, so I reproduced the problem with 10-minute workloads. As the tpcc_mariadb-10.6_116bp_600.stat and tpcc_mariadb-10.5.9_116bp_600.stat sysbench statistics report, 10.6 reaches 3845 TPS versus 4965 TPS for 10.5.9. 10.6 and 10.5.9 also have quite different I/O and CPU profiles, shown in the .dstat files.

Since this is an in-memory workload, I suspected either the doublewrite buffer or checkpointing. I reran the tests with the doublewrite buffer switched off (_nodblwr.stat files): 10.6 is still slower than 10.5.9, but 10.6 is much more affected by switching off the doublewrite buffer than 10.5.9 is. So there is some problem with doublewrite, but there is definitely something more.

I collected some InnoDB LRU, flushing, checkpointing, and log metrics at 10-second intervals (*.metrics files), but I didn't find any significant differences except log_lsn_checkpoint_age (see checkpoint.png). There were more or less the same number of checkpoints, page flushes, and log writes. 10.6 was built with io_uring, but I checked with gdb that there were no aio_uring::submit_io() calls.
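[Editorial note] A drive-speed measurement of the kind quoted above (random-write IOPS at a given block size) can be sketched with a fio invocation along these lines. The flags are standard fio options, but the target file, size, and queue depth here are assumptions for illustration, not the reporter's exact job:

```shell
# Hypothetical fio job (illustrative values): 16KB random writes with
# direct I/O (bypassing the page cache) from 4 concurrent jobs, reported
# as a single aggregate IOPS figure.
fio --name=randwrite-16k \
    --filename=/mnt/nvme/fio.test --size=8G \
    --rw=randwrite --bs=16k --direct=1 \
    --ioengine=libaio --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```

The aggregate `write: IOPS=…` line in the output is the number to compare against the ~26 KIOPS figure above.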
| Comments |
| Comment by Marko Mäkelä [ 2021-04-19 ] |

Can you please check the newest 10.6 where the fix of
| Comment by Alexander Krizhanovsky (Inactive) [ 2021-04-19 ] |

The recent commit 8751aa7397b2e698fa0b46ec3e60abb9e2fd7e1b significantly improves 10.6 performance, but a small performance degradation is still observed on both the short 10-minute and the long 3-hour TPC-C tests. I'm attaching plots for the 3-hour test.
| Comment by Alexander Krizhanovsky (Inactive) [ 2021-04-20 ] |

I also collected results for the IO-bound TPC-C workloads. The test is exactly the same, except that the buffer pool is configured to 25GB and there are only 40 threads. The average TPS for 10.5.9 is 1337, while for 10.6 with the fix it is 1284. The plots are attached (I'll try to tune out the waves later).
| Comment by Alexander Krizhanovsky (Inactive) [ 2021-04-20 ] |

| Comment by Marko Mäkelä [ 2021-06-28 ] |

I would suspect that the initial dip of performance could be related to

It should be possible to avoid that overhead by making use of the (admittedly awkward) interface that was introduced in
| Comment by Alexander Krizhanovsky (Inactive) [ 2021-06-28 ] |

I tried 10.6 as of commit 891a927e804c5a3a582f6137c2f316ef7abb25ca, and unfortunately there is still a deep performance dip at the beginning of the TPC-C workload. I'm also attaching the sysbench output for the graph. The good point is that after the dip, MariaDB shows pretty stable performance; there are no more minor waves (the graph uses a 60-second average, and the sysbench output still contains very short performance dips, but previously we saw the dips even in the averaged graphs).
| Comment by Marko Mäkelä [ 2021-06-29 ] |

I think that the easiest way to confirm my hypothesis about
| Comment by Axel Schwenke [ 2021-07-26 ] |

TPCC runs in tpcc2.pdf
| Comment by Marko Mäkelä [ 2021-07-26 ] |

Thank you, axel. I believe that the reported regression has been fixed. To avoid the initial degradation of performance, the data load phase could be improved to take advantage of

There still appears to be a scalability regression when the server is running on multiple NUMA nodes. Both 10.6.3 and the upcoming 10.5.12 show similar performance. On a 2-socket Intel server, when the number of concurrent connections is between 1 and 2 times the number of threads per CPU socket, MariaDB 10.5.6 (before
| Comment by Alexander Krizhanovsky (Inactive) [ 2021-07-26 ] |

Hi axel, I noticed that you used a 40GB buffer pool, while I observed the degradation with relatively large buffer pool sizes, 100GB and more. Also, how slow is the SSD? The NVMe SSD on the original server was able to handle 64K random writes of 16KB issued by 4 threads (I verified this with the fio tool), so I wouldn't say that the drive was slow. However, Dimitry had issues reproducing the performance waves ( http://dimitrik.free.fr/blog/posts/mysql-80-innodb-checkpointing.html ) with fast enough drives. I guess the depth of the dip is determined by the ratio of the buffer pool size to the disk speed.

When the workload begins, InnoDB doesn't flush dirty pages at all (almost zero IO), but when we hit the maximum checkpoint age it goes into furious flushing, and that takes us into the dip. I described this behavior in https://jira.mariadb.org/browse/MDEV-25113

I'm wondering whether InnoDB could start flushing right at the start of the workload? Is it possible to reproduce the original Percona tests with a large buffer pool?
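[Editorial note] The "no flushing, then furious flushing" pattern described above can be sketched as a simple rate heuristic. This is only an illustration of the general adaptive-flushing shape, not InnoDB's actual page_cleaner code; the low-water mark, the 7/8 "async" point, and the linear ramp are assumptions chosen for clarity:

```python
def flush_rate(checkpoint_age: float, log_capacity: float,
               max_io: int = 1000, lwm: float = 0.1) -> int:
    """Illustrative flush-rate heuristic (NOT InnoDB's real code):
    no flushing while the checkpoint age is below a low-water mark,
    a linear ramp in between, and maximum ("furious") flushing once
    the checkpoint age approaches the redo log capacity."""
    fill = checkpoint_age / log_capacity
    if fill < lwm:
        return 0            # young checkpoint: no background flushing
    if fill >= 7 / 8:
        return max_io       # near max checkpoint age: furious flushing
    # linear ramp between the low-water mark and the 7/8 point
    return int(max_io * (fill - lwm) / (7 / 8 - lwm))
```

Under this shape, a large buffer pool keeps the workload in the zero-flushing branch for a long time and then jumps toward the furious branch, which matches the observed near-zero-IO phase followed by the dip; starting the ramp earlier (a lower `lwm`) is exactly the "start flushing right at the start" idea raised above.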
| Comment by Marko Mäkelä [ 2021-07-26 ] |

The primary purpose of axel’s benchmark was to test something else, on a different system. So the results are not entirely comparable, but I was thinking that it should be close enough to say that we have an explanation for the initial dip of performance.