I conducted some more analysis today.
Unlike yesterday, today I did not observe any regression for Sysbench oltp_update_index when using 1000 concurrent connections, a 4GiB data set, and a 2GiB buffer pool. I had rebooted my workstation in between, so /dev/shm may have been less full, or Firefox may not have consumed as much memory as it had by yesterday evening.
After I reduced the buffer pool size to 1GiB or 512MiB, I finally reproduced yesterday’s regression at 1000 concurrent connections. For up to 32 concurrent connections there was always an improvement (maximum latency was roughly halved), even with such a small buffer pool. The culprit for the regression appears to be increased contention on buf_pool.mutex.
This benchmark setup may not be representative, because I used very fast NVMe storage and innodb_flush_log_at_trx_commit=0. With a proper durability setting and slower storage (SATA SSD or HDD), I/O latency should dominate.
For the record, I used the following commands to collect profiling information while the workload was running:
sudo offcputime-bpfcc --stack-storage-size=1048576 -df -p $(pgrep -nx mysqld) 30 > out64.stacks
flamegraph.pl --color=io --title="Off-CPU Time Flame Graph" --countname=us < out64.stacks > out64.svg
See http://www.brendangregg.com/offcpuanalysis.html for more information.
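To put a rough number on how much of the blocked time involves a given mutex or function, the folded stacks can be summed directly before rendering them. The following is only a sketch: the function name offcpu_total is made up for illustration, it assumes the usual folded format (semicolon-separated frames followed by a value in microseconds as the last field), and the substring buf_pool is just an example, since the actual frame names depend on the symbols in the build.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcc or FlameGraph):
# sum the off-CPU time of every folded stack that contains a substring.
# Usage: offcpu_total SUBSTRING FILE
offcpu_total() {
    awk -v pat="$1" '
        # index() is a plain substring match, so regex metacharacters
        # in symbol names are not interpreted.
        index($0, pat) { matched += $NF }
        # The value is the last field; frame names may contain spaces.
        { total += $NF }
        END {
            share = (total > 0) ? 100 * matched / total : 0
            printf "%.0f us of %.0f us (%.1f%%)\n", matched, total, share
        }
    ' "$2"
}

# Example (assumes out64.stacks was produced by the offcputime command above):
# offcpu_total buf_pool out64.stacks
```

A substring match over whole stacks counts a stack once no matter how many of its frames match, which is what is wanted when estimating the share of wait time attributable to one lock.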
Your patch works well for me on Windows. At higher thread counts, the patch outperforms the baseline by a factor of 3.