  MariaDB Server / MDEV-24051

Remapping .text and .data application segments to huge pages

Details

    Description

      Applications usually benefit from remapping their .text and .data ELF segments to huge pages. The speedup comes from a significant reduction in iTLB and dTLB misses. The approach itself isn't new; existing implementations include libhugetlbfs and the ones used at Google and Facebook.

      libhugetlbfs uses explicitly reserved huge pages (hugetlbfs), while Google and Facebook rely on transparent huge pages (THP). We decided to follow the libhugetlbfs approach because it depends less on the kernel's particular allocation/defragmentation algorithm and therefore gives more consistent results.
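A minimal sketch (C, Linux) of the two mechanisms contrasted above, applied to plain anonymous memory rather than to the executable's own segments. It assumes the default huge page size is 2 MB and, for the first function, that huge pages were reserved beforehand (e.g. via /proc/sys/vm/nr_hugepages). The function names are illustrative only; they are not taken from the patch or from libhugetlbfs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: the system's default huge page size is 2 MB. */
#define HUGE_2MB ((size_t)2 * 1024 * 1024)

/*
 * hugetlbfs-style: take pages from the pool reserved in advance.
 * Either the whole request is backed by huge pages or mmap fails
 * (typically with ENOMEM), so the outcome does not depend on how
 * fragmented memory happens to be at run time.
 */
void *alloc_reserved_huge(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * THP-style: ordinary anonymous memory plus a hint. The kernel may back
 * the range with huge pages at fault time or later (khugepaged), depending
 * on /sys/kernel/mm/transparent_hugepage settings and fragmentation, or it
 * may keep 4 KB pages. *advice_err receives the madvise errno (0 on success).
 */
void *alloc_thp_hint(size_t len, int *advice_err)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *advice_err = madvise(p, len, MADV_HUGEPAGE) == 0 ? 0 : errno;
    return p;
}
```

The first call either succeeds with huge pages or fails outright, which is what makes the reserved-pool route more predictable; the second practically always succeeds, but whether huge pages are actually used is left to the kernel.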

      We tried libhugetlbfs; however, it currently has four major drawbacks:
      1. A bug with position-independent executables (linked with the '--pie' option): https://github.com/libhugetlbfs/libhugetlbfs/issues/49
      2. It might unmap the heap segment, which immediately follows the data segment on popular operating systems (e.g. Linux).
      3. It can remap at most 3 ELF segments.
      4. It has no integration with the target application: it works silently during startup.

      So we provide a custom implementation, adapted to the MySQL code base:
      1. No issues with position-independent code or additional virtual memory randomization.
      2. Tested with the lld, gold and bfd linkers.
      3. Protects the heap segment from being unmapped; tested with the standard glibc allocator and jemalloc.
      4. Since it is now part of the mysqld code, any number of segments can be supported (the limit is currently 16).
      5. Integration with the MySQL code base: a configuration variable turns the functionality on, and the existing logging system is used for error and notification messages.
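Handling segments generically starts from the program headers of the running executable. Below is a minimal sketch, assuming glibc's dl_iterate_phdr(3), of how the PT_LOAD segments of the main program could be collected. MAX_SEGMENTS = 16 is a hypothetical constant mirroring the limit mentioned above, and the names are illustrative; this is not the code from sql/huge.cc.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap mirroring the "currently 16" limit above. */
#define MAX_SEGMENTS 16

struct load_segment {
    uintptr_t start;  /* runtime start address of the segment */
    uintptr_t end;    /* one past the last byte (p_memsz, so .bss is included) */
    int prot_flags;   /* ELF p_flags: PF_R, PF_W, PF_X */
};

struct segment_list {
    struct load_segment seg[MAX_SEGMENTS];
    size_t count;
};

/* dl_iterate_phdr visits the main program first; stop right after it. */
static int collect_main_segments(struct dl_phdr_info *info, size_t size, void *data)
{
    struct segment_list *list = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        if (list->count == MAX_SEGMENTS)
            break;
        struct load_segment *s = &list->seg[list->count++];
        /* dlpi_addr is the load bias: non-zero for PIE/ASLR binaries. */
        s->start = info->dlpi_addr + ph->p_vaddr;
        s->end = s->start + ph->p_memsz;
        s->prot_flags = (int)ph->p_flags;
    }
    return 1; /* non-zero return ends the iteration */
}

/* Fill `list` with the main executable's loadable segments; return how many. */
size_t list_main_load_segments(struct segment_list *list)
{
    list->count = 0;
    dl_iterate_phdr(collect_main_segments, list);
    return list->count;
}
```

Adding the load bias to p_vaddr is what keeps the computed addresses correct for position-independent executables with address randomization.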

      The performance gain is up to 9% in sysbench OLTP_PS.

      Restrictions:
      1. Currently works only with 2 MB huge pages.
      2. The binary must be linked in a specific way (additional alignment for ELF segments).
      3. Only Linux is supported (tested on kernels >= 3.10).
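The alignment requirement follows from simple arithmetic: with 2 MB pages, only the part of a segment that starts and ends on 2 MB boundaries can be backed by huge pages without disturbing neighbouring mappings. A small sketch of that computation follows; huge_page_range is a hypothetical helper. GNU ld and lld accept `-z max-page-size=` and `-z common-page-size=` to raise the alignment of loadable segments (e.g. `-Wl,-zmax-page-size=2097152`); whether the patch relies on exactly these options is an assumption here.

```c
#include <stdint.h>

/* 2 MB: the only huge page size the patch supports. */
#define HUGE_PAGE_SIZE ((uintptr_t)2 * 1024 * 1024)

/* Round down/up to a multiple of HUGE_PAGE_SIZE (a power of two). */
static uintptr_t align_down(uintptr_t x) { return x & ~(HUGE_PAGE_SIZE - 1); }
static uintptr_t align_up(uintptr_t x)   { return (x + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1); }

/*
 * For a segment occupying [start, end), compute the largest sub-range whose
 * boundaries are both 2 MB aligned. Returns its size, or 0 (with both outputs
 * set to 0) if the segment does not cover a whole aligned 2 MB block.
 */
uintptr_t huge_page_range(uintptr_t start, uintptr_t end,
                          uintptr_t *hp_start, uintptr_t *hp_end)
{
    uintptr_t lo = align_up(start);
    uintptr_t hi = align_down(end);
    if (hi <= lo) {
        *hp_start = *hp_end = 0;
        return 0;
    }
    *hp_start = lo;
    *hp_end = hi;
    return hi - lo;
}
```

For example, a 3 MB segment spanning [0x100000, 0x400000) yields only [0x200000, 0x400000), while a segment that already starts and ends on 2 MB boundaries is eligible in full.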

      For more information, refer to the documentation in sql/huge.cc (included in the patch).

      The patch was tested against commit 5d4599f9750140f92cfdbbe4d292ae1b8dd456f8 (v10.6.0).

      I submit this contribution under the New BSD License (in compliance with https://mariadb.org/easier-licensing-for-mariadb-contributors).


          Activity

            dmitriy.philimonov Dmitriy Philimonov added a comment -

            Happy New Year!

            We shared our experience of remapping code segments to huge pages in this article and published the code on GitHub. The published code differs significantly from the patch I shared a year ago: it is simpler and more robust. Since the ticket is still open, I think it could be useful for your project.

            P.S. For Russian speakers, there is also a blog post in Russian on Habr.
            danblack Daniel Black added a comment -

            Happy New Year, dmitriy.philimonov. FWIW, I was looking at the LD_PRELOAD path with mmap_ksm.c, intending to modify the appropriate flags for mmap, as a model both for this and for KSM (multiple MariaDB instances). I haven't quite got it working yet. I'll look into your code too.

            danblack Daniel Black added a comment -

            FYI, from glibc 2.35:

            • On Linux, a new tunable, glibc.malloc.hugetlb, can be used to
              either make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk
              or to use huge pages directly with mmap calls with the MAP_HUGETLB
              flags). The former can improve performance when Transparent Huge Pages
              is set to 'madvise' mode while the latter uses the system reserved
              huge pages.

            mdcallag Mark Callaghan added a comment -

            I suspect code bloat and other factors that lead to more TLB misses are a bigger problem for MySQL than for MariaDB. I suppose that is good news.
            https://smalldatum.blogspot.com/2024/10/how-to-run-mysql-with-text-in-huge-pages.html
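Whether text actually ended up in huge pages can be checked from the per-mapping counters in /proc/<pid>/smaps: Private_Hugetlb/Shared_Hugetlb for hugetlbfs pages (the approach the patch follows), FilePmdMapped for file-backed THP and AnonHugePages for anonymous THP, on kernels that report them. A minimal sketch of a helper that totals one such field follows; smaps_field_total_kb is a hypothetical name, and summing over all mappings (rather than only the mysqld text mapping) is a simplification.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum one field (e.g. "AnonHugePages", "FilePmdMapped", "Private_Hugetlb")
 * over all mappings listed in an smaps file such as /proc/self/smaps.
 * The kernel reports these values in kB. Returns -1 if the file can't be opened.
 */
long smaps_field_total_kb(const char *path, const char *field)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char line[512];
    size_t flen = strlen(field);
    long total = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Field lines look like "AnonHugePages:      2048 kB". */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long kb;
            if (sscanf(line + flen + 1, "%ld", &kb) == 1)
                total += kb;
        }
    }
    fclose(f);
    return total;
}
```

Calling it with "/proc/<mysqld pid>/smaps" and "Private_Hugetlb" before and after enabling the feature would show whether any memory is backed by hugetlbfs pages.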
            danblack Daniel Black added a comment -

            Yep, nice write-up, mdcallag. I hadn't considered that linker settings would directly affect loading into huge pages. Good to see the loader can take a hint.

            On related code concepts, to get back to MDEV-21145: linker scripts that group together global variables that are mostly read-only, that are copied for every new connection (into the session), or that are frequently read in code paths for branch conditions.


            People

              Assignee: Unassigned
              Reporter: Dmitriy Philimonov (dmitriy.philimonov)
              Votes: 0
              Watchers: 9

