Applications usually benefit from remapping the ELF segments that hold the .text and .data sections to huge pages. The speedup comes from a significant reduction in iTLB and dTLB misses. The approach is not new; existing implementations include:
- libhugetlbfs: https://github.com/libhugetlbfs/libhugetlbfs/blob/master/elflink.c ('remap_segments' function)
- Google: https://chromium.googlesource.com/chromium/src/+/refs/heads/master/chromeos/hugepage_text/hugepage_text.cc ('RemapHugetlbText*' functions)
- Facebook: https://github.com/facebook/hhvm/blob/master/hphp/runtime/base/program-functions.cpp ('HugifyText' function)
libhugetlbfs uses explicit (hugetlbfs) huge pages, whereas Google and Facebook rely on transparent huge pages. We follow the libhugetlbfs approach because it depends less on the kernel's particular allocation/defragmentation algorithm and therefore gives more consistent results.
We tried libhugetlbfs, but it currently has four major drawbacks:
1. A bug with position-independent executables (linked with the '--pie' option): https://github.com/libhugetlbfs/libhugetlbfs/issues/49
2. It may unmap the heap, which immediately follows the data segment on common operating systems (e.g. Linux).
3. It can remap at most 3 ELF segments.
4. It has no integration with the target application: it works silently during startup.
This patch therefore provides a custom implementation tailored to the MySQL code base:
1. No issues with position-independent code or additional virtual memory randomization.
2. Tested with the lld, gold and bfd linkers.
3. Keeps the heap from being unmapped; tested with the standard glibc allocator and jemalloc.
4. Since it is now part of the mysqld code, the limit on the number of segments can be set to any value (currently 16).
5. Integration with the MySQL code base: a configuration variable turns the feature on, and the existing logging system reports errors and notifications.
The performance increase is up to 9% in sysbench OLTP_PS.
Limitations:
1. Currently works with 2 MB huge pages only.
2. The binary must be linked in a specific way (extra alignment for ELF segments).
3. Only Linux is supported (tested with kernels >= 3.10).
For more information, refer to the documentation in sql/huge.cc (included in the patch).
The patch was tested against commit "5d4599f9750140f92cfdbbe4d292ae1b8dd456f8" (v10.6.0).
I submit this contribution under the New BSD License (in compliance with https://mariadb.org/easier-licensing-for-mariadb-contributors).