There is strong and persistent demand, voiced by users in various MariaDB Foundation ServerFest presentations and in other forums, for queries to use more than a single core.
The hope that auto-parallelism will magically appear when required has so far proved to be false.
An investigation of potential solutions identifies OpenMP as one of the better options, due to the following attributes:
- OpenMP 4.0 is fully supported for C, C++ and Fortran since GCC 4.9;
- support for all non-offloading features of OpenMP 4.5 has been available since Clang 3.9;
- starting with Visual Studio 2019 version 16.10 Preview 2, Visual Studio supports targeting LLVM's OpenMP runtime library with -openmp:llvm on the x86 and ARM64 architectures, in addition to x64.
These compilers are readily available on the supported operating systems. Because OpenMP is part of the compiler and the compiler runtime, it is not a significant additional dependency.
Even the 8+ year old OpenMP 4.0 specification provides a significant set of directives that will aid prototyping.
Parallel workloads can largely be introduced using #pragma omp directives, while a lightweight, standardised runtime component (#include <omp.h>) provides access to lower-level constructs. The richness of these directives allows parallel programming to be introduced without significant changes to the existing code, which makes the work easy to rebase once it proves mature.
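As a minimal sketch of this directive-first style, assuming a hypothetical Chunk type that merely stands in for a buffer pool chunk (it is not InnoDB's structure), the loop below initializes independent chunks with a single #pragma omp parallel for. The only <omp.h> call is guarded by the standard _OPENMP macro, so the same file still builds and runs serially when compiled without -fopenmp (GCC/Clang).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a buffer pool chunk; not the real InnoDB type.
struct Chunk {
  std::vector<unsigned char> frames;
  bool initialized = false;
};

// Each iteration touches a different chunk, so the loop can be split across
// threads with one directive. Without -fopenmp the pragma is ignored and the
// loop runs serially with identical results.
void init_chunks(std::vector<Chunk>& chunks, std::size_t chunk_bytes) {
  assert(chunk_bytes > 0);  // precondition: chunks must have a size
  const long n = static_cast<long>(chunks.size());
#pragma omp parallel for schedule(static)
  for (long i = 0; i < n; ++i) {
    chunks[i].frames.assign(chunk_bytes, 0);  // zero-fill this chunk only
    chunks[i].initialized = true;
  }
}

// Lower-level runtime query: the default number of threads a parallel
// region would use. Falls back to 1 in a serial (non-OpenMP) build.
int worker_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

Compilers that do not enable OpenMP ignore the unknown pragma (possibly with a warning), which is what lets the parallel path sit alongside the existing serial code with minimal divergence.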
OpenMP implementations use a thread-pool-based approach, consistent with the existing server codebase. Postgres has led the way in what can be achieved with parallel queries.
OpenMP also has NUMA-aware directives if required.
A significant number of training resources, including commercially available training, exist for OpenMP.
Using a number of scenarios (in subtasks), starting with standalone operations like filesort, we'll prototype and benchmark what is required to achieve parallel operation, the level of portability, and the performance gains that can be achieved.
- implement concurrent buffer pool chunk initialization;
- implement parallel for filesort - priority queue;
- implement parallel for filesort - merge;
- initialize rollback segments in parallel.
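The two filesort subtasks could follow a split-then-merge pattern. A hedged sketch, assuming plain integer sort keys and a hypothetical parallel_top_k function (this is not MariaDB's filesort code): each thread keeps a bounded priority queue holding the k smallest keys of its slice, and the per-thread candidates are then merged and reduced to the final k, as an ORDER BY ... LIMIT k would need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper: return the k smallest keys in ascending order.
std::vector<int> parallel_top_k(const std::vector<int>& keys, std::size_t k) {
  std::vector<int> result;  // shared: collects every thread's candidates
  if (k == 0) return result;
  const long n = static_cast<long>(keys.size());

#pragma omp parallel
  {
    // Private bounded max-heap: top() is the worst key currently kept.
    std::priority_queue<int> local;
#pragma omp for schedule(static) nowait
    for (long i = 0; i < n; ++i) {
      if (local.size() < k) {
        local.push(keys[i]);
      } else if (keys[i] < local.top()) {
        local.pop();  // evict the worst kept key
        local.push(keys[i]);
      }
    }
    // Merge step: one thread at a time appends its candidates.
#pragma omp critical
    while (!local.empty()) {
      result.push_back(local.top());
      local.pop();
    }
  }

  // At most (threads * k) candidates remain; pick the final k serially.
  const std::size_t keep = std::min(k, result.size());
  std::partial_sort(result.begin(),
                    result.begin() + static_cast<std::ptrdiff_t>(keep),
                    result.end());
  result.resize(keep);
  assert(result.size() <= k);
  return result;
}
```

The final merge is serial here; because it only sees at most threads * k keys, it stays cheap when k is small, which is the LIMIT case a priority queue suits. A larger sort would instead need a parallel merge of sorted runs.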