The InnoDB buffer pool was allocated in multiple chunks, because SET GLOBAL innodb_buffer_pool_size would extend the buffer pool one chunk at a time. This led to several limitations, such as the inability to shrink the buffer pool below innodb_buffer_pool_chunk_size.
It would be cleaner to:

- allocate a contiguous virtual address range for the maximum supported buffer pool size (a new parameter innodb_buffer_pool_size_max, which defaults to the initially specified innodb_buffer_pool_size)
- allow innodb_buffer_pool_size to be changed in increments of 1 megabyte
- define a fixed mapping between the virtual memory addresses of buffer page descriptors and page frames, to fix bugs like MDEV-34677 and MDEV-35485
- refactor the shrinking of the buffer pool to provide more meaningful progress output and to avoid hangs
The complicated logic of having multiple buffer pool chunks can be removed, and the parameter innodb_buffer_pool_chunk_size will be deprecated and ignored.
Marko Mäkelä added a comment:

The changes made many crash recovery tests hang in a Valgrind environment. I was able to reproduce the problem locally, and I applied a fixup that reduces the problem at least to some extent. The underlying issue is that the default Valgrind Memcheck tool uses an unfair scheduler: if a thread is waiting for other threads to do something, thread context switches must be enforced by suitable system calls.
Marko Mäkelä added a comment:

madvise(MADV_HUGEPAGE) is a means of enabling Transparent Huge Pages (THP). When the large_pages interface is used, we allocate explicit huge pages with mmap(). I think that if we were to experiment with madvise(MADV_HUGEPAGE), it should be tied to a configuration parameter that is disabled by default.
Vladislav Vaintroub added a comment (edited):

So, if I understand this correctly, one can't "reserve address space" on Linux for large pages, but can only allocate them immediately; among other things, MAP_NORESERVE is a no-op for them.

There is, however, a mention of madvise(MADV_HUGEPAGE), and it sounds like it could be used. It is less explicit, but if the internet and the Linux documentation do not lie, it sometimes works on some Linux kernels, so it could perhaps be used to "commit memory".
Marko Mäkelä added a comment:

HugeTLB pages are unavailable by default on Linux. You have to explicitly reserve physical memory before they can be used:

echo 4|sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
echo 1024|sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Based on my testing, madvise(MADV_DONTNEED) cannot shrink hugepage mappings and release them to the operating system.

Maybe Microsoft Windows can defer the allocation of page mappings until a TLB miss, but Linux appears to populate the page mappings immediately.
Vladislav Vaintroub added a comment:

marko I just think you can't reserve "large pages" on Linux, and that MAP_NORESERVE does not work for them. So large pages are not resizable, and one should not attempt to reserve address space for them.