Details

Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Description
This came up during the MDEV-15016 review.
I started to wonder whether multiple InnoDB buffer pools actually help with any workloads. Yes, it probably was a good idea to split the buffer pool mutex when Inaam Rana introduced multiple buffer pools in MySQL 5.5.5, but since then, there have been multiple fixes to reduce contention on the buffer pool mutex, such as Inaam's follow-up fix in MySQL 5.6.2 to use rw-locks instead of mutexes for the buf_pool->page_hash.
In MySQL 8.0.0, Shaohua Wang implemented one more thing that MariaDB should copy: MDEV-15053 Split buf_pool_t::mutex.
I think that we should seriously consider removing all code to support multiple buffer pools or page cleaners.
Should multiple buffer pools be needed in the future (for example, on NUMA machines), the feature should be designed better from the ground up. Currently the partitioning is arbitrary: buffer pool membership is basically determined by a hash of the page number, as sketched below.
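For illustration, the instance mapping looks roughly like the following minimal sketch, modelled on InnoDB's buf_pool_get() in the MySQL 5.7 code base; the function name and the fold constant are simplified here and should be treated as illustrative:

```cpp
#include <cstdint>

// Sketch: the buffer pool instance of a page is an arbitrary hash
// ("fold") of its tablespace id and page number, taken modulo the
// instance count. The low 6 bits of the page number are ignored so
// that a 64-page read-ahead area maps to a single instance.
uint32_t buf_pool_index(uint32_t space_id, uint32_t page_no,
                        uint32_t n_instances)
{
  const uint32_t fold = (space_id << 20) + space_id + (page_no >> 6);
  return fold % n_instances;
}
```

Note that consecutive 64-page areas of the same tablespace land in different instances, so the partitioning does not follow any access-locality boundary.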
The description of WL#6642: InnoDB: multiple page_cleaner threads seems to imply that it may have been a mistake to partition the buffer pool.
Note: partitioning or splitting mutexes often seems to be a good idea. But partitioning data structures or threads might not be.
axel, please test different workloads with innodb_buffer_pool_instances=1 and innodb_page_cleaners=1, and compare the performance to configurations that use multiple buffer pools (and page cleaners). If using a single buffer pool instance never seems to cause any regression, I think that we should simplify the code.
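The configurations to compare would differ only in these two settings (a minimal my.cnf sketch; the multi-instance values below are illustrative examples, not the exact counts used in the benchmark runs):

```ini
# Baseline under test: single buffer pool, single page cleaner
innodb_buffer_pool_instances = 1
innodb_page_cleaners = 1

# Comparison runs: partitioned buffer pool (example values only)
#innodb_buffer_pool_instances = 8
#innodb_page_cleaners = 8
```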
Attachments

Issue Links

- blocks
  - MDEV-21962 Allocate buf_pool statically (Closed)
- causes
  - MDEV-22027 Assertion oldest_lsn >= log_sys.last_checkpoint_lsn failed in log_checkpoint() (Closed)
  - MDEV-22114 Assertion `!is_owned()' failed. | handle_fatal_signal (sig=6) in MutexDebug (Closed)
  - MDEV-23399 10.5 performance regression with IO-bound tpcc (Closed)
  - MDEV-33966 sysbench performance regression with concurrent workloads (Stalled)
- relates to
  - MDEV-15053 Reduce buf_pool_t::mutex contention (Closed)
  - MDEV-21212 buf_page_get_gen -> buf_pool->stat.n_page_gets++ is a cpu waste (0.5-1%) (Closed)
  - MDEV-15016 multiple page cleaner threads use a lot of CPU on idle server (Closed)
  - MDEV-15685 large pages - out of memory handling results in SIGBUS sql/sql_show.cc:6822(get_schema_views_record (Open)
  - MDEV-16526 Overhaul the InnoDB page flushing (Closed)
Activity

Field | Original Value | New Value |
---|---|---|
Link | This issue relates to | |
Link | This issue relates to | |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Attachment | MDEV-15058.ods [ 45112 ] | |
Attachment | MDEV-15058.pdf [ 45113 ] | |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Sprint | 10.3.5-1 [ 229 ] | |
Attachment | MDEV-15058-B.ods [ 45173 ] | |
Attachment | MDEV-15058-B.pdf [ 45174 ] | |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Attachment | MDEV-15058-thiru.ods [ 45238 ] | |
Attachment | MDEV-15058-thiru.pdf [ 45239 ] | |
Attachment | MDEV-15058-tpcc.ods [ 45240 ] | |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Sprint | 10.3.5-1 [ 229 ] | |
Attachment | MDEV-15058-singleBP.ods [ 45306 ] | |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Attachment | MDEV-15058-RAM-Intel.ods [ 45344 ] | |
Attachment | MDEV-15058-RAM-Intel.ods [ 45345 ] | |
Attachment | MDEV-15058-RAM-ARM.ods [ 45346 ] | |
Attachment | MDEV-15058-SSD-ARM.ods [ 45354 ] | |
Attachment | MDEV-15058-SSD-Intel.ods [ 45355 ] | |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Link | This issue relates to | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Assignee | Axel Schwenke [ axel ] | Marko Mäkelä [ marko ] |
Link | This issue relates to MDEV-15685 [ MDEV-15685 ] | |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link | This issue relates to | |
Link | This issue relates to | |
Link | This issue is blocked by | |
Link | This issue relates to | |
Link | This issue relates to | |
Assignee | Marko Mäkelä [ marko ] | Axel Schwenke [ axel ] |
Attachment | MDEV-15058-10.4.10.ods [ 50230 ] | |
Attachment | MDEV-15058-10.4vs10.5.ods [ 50231 ] | |
Attachment | MDEV-15058-10.5.ods [ 50232 ] | |
Attachment | MDEV-15058-10.5-dev.ods [ 50267 ] | |
Attachment | MDEV-15058-10.5-34dafb7e3a8.ods [ 50309 ] | |
Link | This issue is blocked by | |
Link | This issue relates to | |
Attachment | ramdisk-ro1.svg [ 50318 ] | |
Attachment | ramdisk-ro4.svg [ 50319 ] | |
Attachment | ramdisk-rw1.svg [ 50320 ] | |
Attachment | ramdisk-rw4.svg [ 50321 ] | |
Assignee | Axel Schwenke [ axel ] | Vladislav Vaintroub [ wlad ] |
Assignee | Vladislav Vaintroub [ wlad ] | Marko Mäkelä [ marko ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
issue.field.resolutiondate | 2020-02-12 12:59:27.0 | 2020-02-12 12:59:27.063 |
Fix Version/s | 10.5.1 [ 24029 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link | This issue is blocked by | |
Link | This issue relates to | |
Link | This issue blocks | |
Link | This issue is blocked by | |
Link | This issue relates to | |
Link | This issue causes | |
Link | This issue causes | |
Link | This issue causes | |
Workflow | MariaDB v3 [ 85121 ] | MariaDB v4 [ 133448 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 36714 ] | |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 36714 ] | |
Link | This issue causes MDEV-33966 [ MDEV-33966 ] | |
I ran a range of tests on two machines.
The MariaDB version used was 10.3.4 (built locally). The benchmark was sysbench in different variations and workloads (see the individual sheets).
The results are a mixed bag. On Intel, it looks like multiple BP partitions don't help performance, certainly not for the INSERT workload. Those are sheets 1 and 2.
Sheets 3 and 4 are also Intel: sysbench OLTP with a varying percentage of writes. Here it looks like we get small benefits for read-only, but the more writes we do and the more BP partitions we have, the worse things get.
Sheet 5 is ARM: sysbench OLTP read-only, read-write, and write-only. Here there is no clear verdict. It seems that 16 or 32 buffer pools do indeed give a benefit.
Sheet 6 is not about buffer pool partitions but AHI (adaptive hash index) partitions. This one is quite clear: increasing AHI partitions up to 32 is good for performance. I actually ran this test first and used 32 AHI partitions for the other tests.
About the attached files: LibreOffice messes up the conditional formatting of the cells, so I also attach the sheets as PDF. The cells in the "throughput per used core" tables are color-coded: red means "more than 1% slower than 1 partition", green means "more than 1% faster than 1 partition".
"Throughput per used core" is system throughput (qps) divided by min(benchmark threads, available hardware threads). On a perfectly scaling system it would give the same number regardless of benchmark thread count.