MariaDB Server: MDEV-29445

Reorganise InnoDB buffer pool (and remove buffer pool chunks)

Details

    Description

      The InnoDB buffer pool had been allocated in multiple chunks, because SET GLOBAL innodb_buffer_pool_size would extend the buffer pool in chunks. This would lead to many limitations, such as the inability to shrink the buffer pool below innodb_buffer_pool_chunk_size.

      It would be cleaner to:

      • allocate a contiguous virtual address range for a maximum supported size of buffer pool (a new parameter innodb_buffer_pool_size_max, which defaults to the initially specified innodb_buffer_pool_size)
      • allow the innodb_buffer_pool_size to be changed in increments of 1 megabyte
      • define a fixed mapping between the virtual memory addresses of buffer page descriptors and page frames, to fix bugs like MDEV-34677 and MDEV-35485
      • refactor the shrinking of the buffer pool to provide more meaningful progress output and to avoid hangs
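The fixed mapping in the third bullet can be pictured as plain address arithmetic. The following is a minimal sketch under assumed conditions, not InnoDB's actual layout: it supposes that descriptors and page frames each form one contiguous array inside the reserved range. `PoolLayout`, `DESC_SIZE` and the example addresses are hypothetical; only the 16 KiB default page size is a real InnoDB value.

```python
from dataclasses import dataclass

PAGE_SIZE = 16 * 1024  # default innodb_page_size (16 KiB)
DESC_SIZE = 128        # hypothetical descriptor size, chosen only for this example


@dataclass(frozen=True)
class PoolLayout:
    """Assumed layout: one array of descriptors and one array of page frames."""
    desc_base: int   # address of descriptor 0
    frame_base: int  # address of page frame 0
    n_pages: int     # number of descriptor/frame pairs

    def frame_of(self, desc_addr: int) -> int:
        """Map a descriptor address to the address of its page frame."""
        index, rem = divmod(desc_addr - self.desc_base, DESC_SIZE)
        if rem != 0 or not 0 <= index < self.n_pages:
            raise ValueError("not a valid descriptor address")
        return self.frame_base + index * PAGE_SIZE

    def descriptor_of(self, addr: int) -> int:
        """Map any address inside a page frame back to its descriptor."""
        index = (addr - self.frame_base) // PAGE_SIZE
        if not 0 <= index < self.n_pages:
            raise ValueError("address is outside the page frames")
        return self.desc_base + index * DESC_SIZE


layout = PoolLayout(desc_base=0x10000000, frame_base=0x20000000, n_pages=8)
third_desc = layout.desc_base + 3 * DESC_SIZE
third_frame = layout.frame_of(third_desc)
```

With such a layout, both directions of the lookup are constant-time computations that need no per-chunk search.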

      The complicated logic of having multiple buffer pool chunks can be removed, and the parameter innodb_buffer_pool_chunk_size will be deprecated and ignored.
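Without chunks, a resize request only has to respect the 1 megabyte granularity and the reserved maximum. The sketch below shows one way such a request could be normalised. The issue does not say whether the server rounds, clamps or rejects an unaligned value, so rounding down and clamping are assumptions, and `normalize_pool_size` is a hypothetical helper name.

```python
MIB = 1 << 20  # the 1 megabyte increment named in the description


def normalize_pool_size(requested_bytes: int, size_max_bytes: int) -> int:
    """Clamp a requested size to the maximum and round down to a 1 MiB multiple.

    Assumption: rounding down and clamping; the real server may instead
    reject or warn about such values.
    """
    if requested_bytes <= 0 or size_max_bytes <= 0:
        raise ValueError("sizes must be positive")
    capped = min(requested_bytes, size_max_bytes)
    if capped < MIB:
        raise ValueError("smaller than one 1 MiB increment")
    return (capped // MIB) * MIB


# An unaligned request is rounded down; an oversized one is capped at the maximum.
unaligned = normalize_pool_size(128 * MIB + 5, 1024 * MIB)
oversized = normalize_pool_size(2048 * MIB, 1024 * MIB)
```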

          Activity

            wlad Vladislav Vaintroub added a comment -

            marko, I just think you can't reserve "large pages" on Linux, and that MAP_NORESERVE does not work for them. So large pages are not resizable, and one should not attempt to reserve them.

            marko Marko Mäkelä added a comment -

            HugeTLB pages are unavailable by default on Linux. You have to explicitly reserve physical memory for them before they can be used:

            echo 4|sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
            echo 1024|sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

            Based on my testing, madvise(MADV_DONTNEED) cannot shrink hugepage mappings and release such mappings to the operating system.

            Maybe Microsoft Windows can defer the allocation of page mappings until a TLB miss, but Linux appears to populate the page mappings immediately.
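To make the two `nr_hugepages` commands in this comment concrete, the sketch below works out how much physical memory they ask the kernel to set aside. The page size is taken from the sysfs directory name, which encodes it in kB. `pool_bytes` is a hypothetical helper, and the kernel may grant fewer pages than requested if it cannot find enough contiguous memory.

```python
import re


def pool_bytes(pool_dir: str, nr_hugepages: int) -> int:
    """Bytes reserved by nr_hugepages pages in a pool such as 'hugepages-2048kB'."""
    match = re.fullmatch(r"hugepages-(\d+)kB", pool_dir)
    if match is None:
        raise ValueError(f"unexpected pool directory name: {pool_dir!r}")
    page_size_bytes = int(match.group(1)) * 1024
    return page_size_bytes * nr_hugepages


# The values written by the two commands in the comment.
requests = {"hugepages-1048576kB": 4, "hugepages-2048kB": 1024}
total = sum(pool_bytes(name, count) for name, count in requests.items())
print(total // 2**30, "GiB")  # prints "6 GiB": 4 x 1 GiB plus 1024 x 2 MiB
```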
            wlad Vladislav Vaintroub added a comment - - edited

            So, if I understand this correctly, one can't "reserve address space" on Linux for large pages, but can only allocate them immediately; i.e., among other things, MAP_NORESERVE is a no-op.

            There is, however, a mention of madvise(MADV_HUGEPAGE), and this sounds like it could be used. It is less explicit, but if the internet and the Linux documentation do not lie, it sometimes works on some Linux systems, and thus could perhaps be used to "commit memory".

            marko Marko Mäkelä added a comment -

            madvise(MADV_HUGEPAGE) is something for enabling Transparent Huge Pages (THP). When the large_pages interface is being used, we are allocating explicit huge pages with mmap(). I think that if we were to experiment with madvise(MADV_HUGEPAGE), it should be tied to a configuration parameter that is disabled by default.
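Whether a `madvise(MADV_HUGEPAGE)` hint has any effect depends on the system-wide THP mode, which Linux exposes in `/sys/kernel/mm/transparent_hugepage/enabled` as text such as `always [madvise] never`, with the active mode in brackets. The sketch below parses that text and combines it with an opt-in switch of the kind suggested in this comment. The `option_enabled` flag and both function names are hypothetical; no such server parameter is defined in this issue.

```python
def thp_mode(enabled_text: str) -> str:
    """Return the bracketed (active) mode from text like 'always [madvise] never'."""
    for word in enabled_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError("no active mode is marked with brackets")


def should_advise_hugepage(option_enabled: bool, mode: str) -> bool:
    """Decide whether issuing the hint would change anything.

    In 'never' mode the hint is ignored; in 'always' mode eligible mappings
    may get huge pages without it; only in 'madvise' mode does it opt a
    region in. option_enabled stands for a hypothetical setting that is
    off by default.
    """
    return option_enabled and mode == "madvise"


mode = thp_mode("always [madvise] never\n")
```

On a real system the text would be read from the sysfs file; it is passed in as a string here so that the example does not depend on the host configuration.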

            marko Marko Mäkelä added a comment -

            The changes made many crash recovery tests hang in a Valgrind environment. I was able to reproduce the problem locally. I applied a fixup that reduces the problem at least to some extent. The underlying issue is that the default Valgrind Memcheck tool uses an unfair scheduler. If a thread is waiting for other threads to do something, thread context switches must be enforced by suitable system calls.
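The general pattern behind this remark is a wait loop that makes a blocking call between checks, rather than spinning, so that the scheduler gets a chance to run the thread being waited for. The following Python sketch is only an analogy for that pattern. It is not the fixup applied to the server, which this comment does not describe, and `wait_until` is a hypothetical helper.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, pause=0.001):
    """Poll predicate() until it is true or the timeout expires.

    The sleep between checks is a blocking call, which gives the scheduler
    a point at which it can switch to another thread.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(pause)
    return True


# A worker thread signals completion; the main thread waits without spinning.
done = threading.Event()
worker = threading.Thread(target=done.set)
worker.start()
finished = wait_until(done.is_set)
worker.join()
```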

            People

              marko Marko Mäkelä
              danblack Daniel Black