Details
Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Description
This came up during the MDEV-15016 review.
I started to wonder whether multiple InnoDB buffer pools actually help with any workloads. Yes, it probably was a good idea to split the buffer pool mutex when Inaam Rana introduced multiple buffer pools in MySQL 5.5.5, but since then, there have been multiple fixes to reduce contention on the buffer pool mutex, such as Inaam's follow-up fix in MySQL 5.6.2 to use rw-locks instead of mutexes for the buf_pool->page_hash.
In MySQL 8.0.0, Shaohua Wang implemented one more thing that MariaDB should copy: MDEV-15053 Split buf_pool_t::mutex.
I think that we should seriously consider removing all code to support multiple buffer pools or page cleaners.
Should multiple buffer pools be needed in the future (for example, on NUMA machines), it should be designed better from the ground up. Currently the partitioning is arbitrary; buffer pool membership is basically determined by a hash of the page number.
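To illustrate what "determined by a hash of the page number" means in practice, here is a minimal sketch. The fold formula and the 64-page shift are assumptions based on how buf_pool_get() looked in MySQL 5.7-era code as far as I recall, not a copy of the MariaDB source; the function names are hypothetical.

```python
def fold_page_id(space_id: int, page_no: int) -> int:
    """Assumed fold: combine tablespace id and page number into one 32-bit value."""
    return ((space_id << 20) + space_id + page_no) & 0xFFFFFFFF


def buffer_pool_instance(space_id: int, page_no: int, n_instances: int,
                         extent_shift: int = 6) -> int:
    """Pick a buffer pool instance for a page.

    The low bits of the page number are dropped first (assumption: 64 pages
    per extent), so that all pages of one extent share an instance.
    """
    return fold_page_id(space_id, page_no >> extent_shift) % n_instances


if __name__ == "__main__":
    # Pages of one tablespace are spread over instances purely by position.
    for page_no in (0, 63, 64, 128, 192):
        print(page_no, buffer_pool_instance(1, page_no, 4))
```

Note that nothing in this mapping depends on which table, index, or connection touches the page, which is why the partitioning looks arbitrary from a workload point of view.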
The description of WL#6642: InnoDB: multiple page_cleaner threads seems to imply that it may have been a mistake to partition the buffer pool.
Note: partitioning or splitting mutexes often seems to be a good idea. But partitioning data structures or threads might not be.
axel, please test different workloads with innodb_buffer_pool_instances=1 and innodb_page_cleaners=1, and compare the performance to configurations that use multiple buffer pools (and page cleaners). If using a single buffer pool instance never seems to cause any regression, I think that we should simplify the code.
Issue Links
blocks
- MDEV-21962 Allocate buf_pool statically (Closed)
causes
- MDEV-22027 Assertion oldest_lsn >= log_sys.last_checkpoint_lsn failed in log_checkpoint() (Closed)
- MDEV-22114 Assertion `!is_owned()' failed. | handle_fatal_signal (sig=6) in MutexDebug (Closed)
- MDEV-23399 10.5 performance regression with IO-bound tpcc (Closed)
- MDEV-33966 sysbench performance regression with concurrent workloads (Stalled)
relates to
- MDEV-15053 Reduce buf_pool_t::mutex contention (Closed)
- MDEV-21212 buf_page_get_gen -> buf_pool->stat.n_page_gets++ is a cpu waste (0.5-1%) (Closed)
- MDEV-15016 multiple page cleaner threads use a lot of CPU on idle server (Closed)
- MDEV-15685 large pages - out of memory handling results in SIGBUS sql/sql_show.cc:6822(get_schema_views_record (Open)
- MDEV-16526 Overhaul the InnoDB page flushing (Closed)
So, I ran this benchmark, which I think resembles axel's "sweetspot" closely enough.
my.cnf
[mysqld]
#####non innodb options
max_connections = 300
table_open_cache = 600
query_cache_type = 0
#####innodb options
innodb_buffer_pool_size = 1G
innodb_log_buffer_size = 32M
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = 0
loose-innodb_adaptive_hash_index_partitions = 32
loose-innodb_adaptive_hash_index_parts = 32
#####SSD
innodb-flush-method = O_DIRECT
innodb_io_capacity = 4000
loose-innodb_flush_neighbors = 0
innodb_write_io_threads = 8
#####the variables for this test
innodb_buffer_pool_instances = 1
innodb_max_dirty_pages_pct = 99
skip-innodb_adaptive_hash_index
skip-innodb-stats-persistent
innodb-change-buffering=none
innodb_file_per_table = 0
script to run with sysbench 1.0
sysbench --test=/usr/share/sysbench/oltp_update_index.lua --tables=32 --table-size=1250000 --rand-seed=42 --rand-type=uniform --num-threads=32 --report-interval=2 --mysql-socket=/tmp/mysql.sock --time=300 --max-requests=0 --mysql-user=root --percentile=95 $1
where $1 is either "prepare" or "run" (you need to have a database called sbtest)
A note on the benchmark itself: it uses a very low buffer pool to data size ratio (I believe the data would be around 8-10GB if it were in files rather than in ibdata1, against a buffer pool of only 1GB), so it is designed to be IO-intensive. It uses about 2 CPUs out of the 56 on the benchmark machine (and the difference between 1 and 4 buffer pools was not obvious in "top").
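The 8-10GB guess can be sanity-checked with back-of-the-envelope arithmetic. The table count, row count, and buffer pool size come from the sysbench command line and my.cnf above; the 200 to 250 bytes per row (record plus secondary index entry plus page overhead) is an assumption, not a measurement.

```python
TABLES = 32                      # --tables=32
ROWS_PER_TABLE = 1_250_000       # --table-size=1250000
BUFFER_POOL_BYTES = 1 * 1024**3  # innodb_buffer_pool_size = 1G


def estimate(bytes_per_row: int):
    """Return (total rows, estimated data bytes, buffer pool / data ratio)."""
    rows = TABLES * ROWS_PER_TABLE
    data_bytes = rows * bytes_per_row
    return rows, data_bytes, BUFFER_POOL_BYTES / data_bytes


if __name__ == "__main__":
    # 200 and 250 bytes per row are assumed bounds, not measured values.
    for bytes_per_row in (200, 250):
        rows, data_bytes, ratio = estimate(bytes_per_row)
        print(f"{bytes_per_row} B/row: {rows} rows, "
              f"{data_bytes / 1e9:.1f} GB, buffer pool covers {ratio:.0%}")
```

Under these assumptions the buffer pool holds only a little over a tenth of the data, which is consistent with the workload being dominated by page reads and writes.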
For the benchmarks, I ran the server with innodb_buffer_pool_instances set to either 1 or 4.
4 buffer pools win against 1 buffer pool, with about 9000 tps against around 6000 tps (roughly 50% more). At least we can say that whatever Axel found is reproducible for this use case.
I attached the pt-pmp output, 1bp.txt (single buffer pool instance) and 4bp.txt (4 instances), made with 20 samples separated by a 10-second delay. (If someone knows a more modern tool for profiling contention, please tell.)
From that, I grepped for TTAS to find the lines with InnoDB mutexes (but please also take a look at anything else; maybe I missed something).
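For anyone who wants to repeat the TTAS filtering on the attachments, a minimal sketch follows. It assumes the usual aggregated pt-pmp layout of one line per distinct stack, "<count> <comma-separated frames>"; the function name and the sample stacks are hypothetical, not taken from 1bp.txt or 4bp.txt.

```python
from collections import Counter


def ttas_stacks(pmp_text: str, needle: str = "TTAS"):
    """Sum sample counts per stack for stacks that mention `needle`.

    Lines that do not start with an integer count are skipped.
    Returns (stack, count) pairs, most frequent first.
    """
    totals = Counter()
    for line in pmp_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, stack = int(parts[0]), parts[1]
        if needle in stack:
            totals[stack] += count
    return totals.most_common()


if __name__ == "__main__":
    # Hypothetical sample in the assumed pt-pmp format.
    sample = """\
         12 pthread_cond_wait,os_event::wait_low,TTASEventMutex::enter,buf_page_io_complete
          5 pthread_cond_wait,os_event::wait_low,TTASEventMutex::enter,buf_LRU_get_free_block
         40 poll,vio_io_wait,vio_read,do_command
    """
    for stack, count in ttas_stacks(sample):
        print(count, stack)
```

Running it once per attachment and comparing the top entries is one way to see which mutex waits disappear when going from 1 to 4 instances.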
I think the contention might be on the buffer pool mutex in buf_page_io_complete() (buf0buf.cc:6019); at least it appears rather often in 1bp.txt, in a couple of different call stacks. Here is the code in question:
https://github.com/MariaDB/server/blob/f3dac591747dfbd88bd8ae2855f9a0e64006ce75/storage/innobase/buf/buf0buf.cc#L6019