Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 10.6.17, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.4
    • Fix Version/s: 10.11.8, 11.0.6, 11.1.5, 11.2.4, 11.4.2
    • Component/s: Packaging, Server
    • Labels: None
    • Environment: registry.access.redhat.com/ubi8/ubi-minimal

    Description

      MariaDB-server-10.6.17-1.el8.src.rpm requires libpmem.so.1, which does not seem to be available in RHEL 8+ at all: https://pkgs.org/search/?q=libpmem

          Activity

            Cyprus Socialite Dmitrii Odintcov created issue -

            marko Marko Mäkelä added a comment:

            Based on MDEV-25124, the PMEM dependency in MariaDB Server 10.6 is rather useless: there hardly is any performance improvement.

            For 10.11 it is a different story, thanks to MDEV-14425 and MDEV-27774. However, Intel wound down its Optane business back in July 2022. This is the probable reason why libpmem is not available in RHEL.

            To fix this, we’d have to revise the build environment: remove cmake -DWITH_PMEM=ON and uninstall the libpmem packages, or add cmake -DCMAKE_DISABLE_FIND_PACKAGE_PMEM=1 so that the package cannot be found during the build.

            dbart Daniel Bartholomew added a comment:

            The libpmem and libpmem-devel packages are in the rhel-8-for-x86_64-appstream-rpms repository. So it is just a matter of enabling that repository if it is not enabled, e.g.:

                sudo subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms

            serg Sergei Golubchik added a comment:

            marko, pmem is only used in InnoDB. Do you want to stop building with it?
            serg Sergei Golubchik made changes -
            Assignee Marko Mäkelä [ marko ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6 [ 24028 ]
            danblack Daniel Black added a comment (edited):

            Let's remove it from RHEL 9+ (in buildbot), as it's deprecated there: https://access.redhat.com/solutions/7029076

            And keep it for RHEL 8.
            serg Sergei Golubchik made changes -
            Summary MariaDB Server on RHEL 8+ → pmpm in RHEL 8+
            serg Sergei Golubchik made changes -
            Summary pmpm in RHEL 8+ → libpmem in RHEL 8+
            serg Sergei Golubchik added a comment (edited):

            marko, what do you think about commits 770b96a7840 and 19b3f17f2dd? It's 10.11+ only, though.

            Make sure to use --ignore-space-change; I've changed the indentation of a few big blocks.
            serg Sergei Golubchik made changes -
            Assignee Marko Mäkelä [ marko ] → Sergei Golubchik [ serg ]
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] → In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] → Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] → In Review [ 10002 ]

            marko Marko Mäkelä added a comment:

            I don’t think that this should be complicated further with another level of indirection that would seem to introduce additional runtime overhead on systems where libpmem never was available, such as any 32-bit targets.

            As far as I understand, PMEM may in the future be replaced with CXL.mem, and something may support file systems and mount -o dax on such devices. Such a future may still be a few years ahead, or it may not arrive at all.

            An alternative could be to implement pmem_persist() ourselves; after all, it is just a wrapper for a few AMD64 instructions (to be precise, clflushopt or clwb followed by sfence if those instructions are available according to cpuid, and falling back to clflush). Yes, there is/was a libpmem available for ARMv8, POWER and RISC-V, but maybe no suitable hardware to run it on. For any ISA, it should be a fairly simple piece of code to force a cache write-back.

            It would seem to be easiest to just build RHEL packages without any dependency on libpmem. Should mount -o dax file systems on CXL.mem devices start to become practical in a few years, then it should be possible to replace pmem_persist() or libpmem with something applicable.
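The instruction selection that the comment above describes (prefer clwb or clflushopt when cpuid reports them, otherwise fall back to clflush) can be sketched as follows. This is a minimal illustration, not the code from libpmem or from the eventual MariaDB fix: the enum and function names are hypothetical, and it relies on the `<cpuid.h>` helpers shipped with GCC and Clang. On non-x86 targets it simply reports that no instruction was found.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* __get_cpuid, __get_cpuid_count (GCC/Clang) */
#endif

/* Hypothetical names, chosen for this sketch only. */
enum flush_insn { FLUSH_NONE, FLUSH_CLFLUSH, FLUSH_CLFLUSHOPT, FLUSH_CLWB };

/* Return the most suitable cache-line flush instruction reported by CPUID. */
static enum flush_insn detect_flush_insn(void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned eax, ebx, ecx, edx;
  /* Leaf 7, subleaf 0: EBX bit 24 = CLWB, EBX bit 23 = CLFLUSHOPT. */
  if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
  {
    if (ebx & (1u << 24))
      return FLUSH_CLWB;
    if (ebx & (1u << 23))
      return FLUSH_CLFLUSHOPT;
  }
  /* Leaf 1: EDX bit 19 = CLFLUSH. */
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19)))
    return FLUSH_CLFLUSH;
#endif
  return FLUSH_NONE;
}

/* Human-readable name, e.g. for a startup log message. */
static const char *flush_insn_name(enum flush_insn f)
{
  assert(f <= FLUSH_CLWB);
  switch (f)
  {
  case FLUSH_CLWB:       return "clwb";
  case FLUSH_CLFLUSHOPT: return "clflushopt";
  case FLUSH_CLFLUSH:    return "clflush";
  default:               return "none";
  }
}
```

Calling detect_flush_insn() once at startup and caching the result would keep CPUID out of the write path.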
            marko Marko Mäkelä made changes -
            Status In Review [ 10002 ] → Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] → Sergei Golubchik [ serg ]

            serg Sergei Golubchik added a comment:

            There is no indirection when libpmem is not installed: in this case have_pmem() returns false and InnoDB uses the non-pmem code.

            And these commits are exactly what makes the server independent of libpmem.

            Or let's just drop pmem support, if nobody uses or needs it.

            marko Marko Mäkelä added a comment:

            I would call replacing a compile-time check with a run-time one an indirection.

            It is easiest to not specify cmake -DWITH_PMEM=ON. I see that currently only the scripts for Debian-like builds specify that, which is fine as long as libpmem remains available in Debian or its derivatives, such as Ubuntu.

            I have understood that elsewhere, we just build what happens to be installed in the build environment, which seems to go against the idea of reproducible builds and SBOM. The dependency could be removed by removing the libpmem development package, or by specifying cmake -DCMAKE_DISABLE_FIND_PACKAGE_PMEM=1 when the CMakeLists.txt is being generated.

            I think that the PMEM code path makes debugging and testing much more convenient, thanks to the "fake PMEM" tweak, which allows us to use memory-mapped ib_logfile0 in /dev/shm. Instead of having to set breakpoints on log file reads or writes, it is possible to simply set hardware watchpoints on log_sys.buf in rr replay to find out when a particular log record was read or written. Bypassing the pread or pwrite system calls can also make regression tests run faster. Disabling PMEM on RHEL 9 would add some diversity on our CI systems: at least some 64-bit Linux targets would then be using the file system I/O calls for the redo log.

            serg Sergei Golubchik added a comment:

            Without run-time detection, InnoDB was using the pmem code path when no supporting hardware was available. Is that how it should have been? If yes, I can, of course, remove the run-time detection and always go the "kinda-pmem-even-if-without-hardware" path.

            I definitely can just disable pmem in packages, but it'd be better to avoid different feature sets in DEB and RPM. Let's disable it everywhere?

            marko Marko Mäkelä added a comment:

            The hardware detection always was part of the Linux kernel. Even if the hardware is available, but the file system is mounted without -o dax, the mmap operation with MAP_SYNC (and MAP_SHARED_VALIDATE) would fail. It is also possible to simulate PMEM with normal DRAM by using some special boot option. The clflush instruction (which is overkill; it unnecessarily evicts the flushed cache lines) is included in all implementations of the AMD64 ISA.

            The set of enabled features is up to the Linux distributors. If there was no problem on Debian and its derivatives, I would not touch them.
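The kernel-side detection mentioned above can be illustrated with a small sketch: try mmap() with MAP_SHARED_VALIDATE | MAP_SYNC, and fall back to a plain MAP_SHARED mapping when the kernel rejects it (typically with EOPNOTSUPP when the file is not on a DAX mount). The helper name map_log() and its is_pmem output parameter are hypothetical, and the fallback flag values are an assumption that matches the generic Linux definitions, needed only when older C library headers lack them. This is not how InnoDB itself is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: generic Linux values, used only if the headers lack them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical helper: map the first `size` bytes of `path` read-write.
   Sets *is_pmem to 1 only if the kernel accepted MAP_SYNC, else to 0.
   Returns NULL on failure. */
static void *map_log(const char *path, size_t size, int *is_pmem)
{
  assert(is_pmem != NULL && size > 0);
  *is_pmem = 0;

  int fd = open(path, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return NULL;
  if (ftruncate(fd, (off_t) size) != 0)
  {
    close(fd);
    return NULL;
  }

  /* MAP_SHARED_VALIDATE makes the kernel reject unsupported flags
     instead of silently ignoring them. */
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
  if (p != MAP_FAILED)
    *is_pmem = 1;
  else
    /* No DAX support for this file: use an ordinary shared mapping. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

  close(fd); /* the mapping stays valid after the descriptor is closed */
  return p == MAP_FAILED ? NULL : p;
}
```

On an ordinary file system the first mmap() is expected to fail and is_pmem stays 0, which is the point of the comment: no user-space hardware probing is required.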

            serg Sergei Golubchik added a comment:

            So, how shall we proceed? Distributions decide what features to enable in their packages, but ours should have a uniform feature set. Options:

            • disable pmem in our packages
            • provider plugin with run-time detection
            • provider plugin without run-time detection: always use the pmem_persist code path; pmem_persist is a no-op without a library
            • reimplement pmem_persist internally (I cannot do that, will reassign the issue back)
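To make the difference between the two provider options concrete, here is a rough sketch of a function-pointer based provider whose default implementation is a no-op. All names are hypothetical except have_pmem(), which is borrowed from the discussion above; its body here is an illustration rather than the code in the referenced commits, and the call counter exists only so that the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider interface with the same shape as pmem_persist(). */
typedef void (*persist_fn)(const void *addr, size_t len);

static size_t noop_calls; /* demonstration only: counts no-op invocations */

/* Default used while no real implementation has been registered. */
static void persist_noop(const void *addr, size_t len)
{
  (void) addr;
  (void) len;
  noop_calls++;
}

static persist_fn persist_impl = persist_noop;

/* A loaded provider would register its real implementation here;
   passing NULL restores the no-op default. */
static void register_persist(persist_fn fn)
{
  persist_impl = fn ? fn : persist_noop;
}

/* "With run-time detection": callers ask first and pick a code path. */
static int have_pmem(void)
{
  return persist_impl != persist_noop;
}

/* "Without run-time detection": callers always go through the pointer,
   which does nothing when no provider is present. */
static void log_persist(const void *addr, size_t len)
{
  assert(addr != NULL);
  persist_impl(addr, len);
}
```

Either way, every call goes through a pointer or a run-time check, which is the indirection that the compile-time WITH_PMEM switch avoided.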

            marko Marko Mäkelä added a comment:

            I think that we can reimplement pmem_persist() ourselves and remove the library dependency. The operation is a combination of flush and fence operations, where the flush is invoked on every cache line that is covered by the address range. I checked all implementations of pmem2_arch_init() in src/libpmem2/*/init.c of https://github.com/pmem/pmdk/ and found the following machine instructions:

            ISA                 flush                          fence
            ARMv8               dc cvac, %0                    dmb ishst
            ARMv8.2-A           dc cvap, %0                    dmb ishst
            AMD64               clflush %0                     none
            AMD64 (clflushopt)  clflushopt %0 (0x66 clflush)   _mm_sfence()
            AMD64 (clwb)        clwb %0 (0x66 xsaveopt)        _mm_sfence()
            POWER               dcbstps %0                     sync 4
            loongarch           none                           dbar 0
            RISC-V              none                           fence w,w
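The flush-then-fence pattern described above can be sketched for the baseline AMD64 row of the table. This is an illustration under stated assumptions, not the code that was committed: the function name my_persist() is hypothetical, a 64-byte cache line is assumed, and only the clflush intrinsic is used because it compiles without extra flags. The trailing _mm_sfence() is not required for plain clflush according to the table; it is kept so that the structure matches the clflushopt and clwb variants. On other architectures the loop body is empty and the fence is only a placeholder, with no persistence guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_clflush; also provides _mm_sfence */
#endif

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for pmem_persist(): flush every cache line that
   overlaps [addr, addr + len), then issue a store fence.
   Returns the number of cache lines visited (handy for testing). */
static size_t my_persist(const void *addr, size_t len)
{
  assert(addr != NULL);
  if (len == 0)
    return 0;

  /* Round the start down to a cache-line boundary. */
  uintptr_t p = (uintptr_t) addr & ~(uintptr_t) (CACHE_LINE - 1);
  const uintptr_t end = (uintptr_t) addr + len;
  size_t lines = 0;

  for (; p < end; p += CACHE_LINE, lines++)
  {
#if defined(__x86_64__)
    _mm_clflush((const void *) p);
#endif
  }

#if defined(__x86_64__)
  _mm_sfence();
#else
  /* Placeholder only: a real port would use the instructions in the table. */
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
  return lines;
}
```

A range that straddles a cache-line boundary touches both lines, so persisting 2 bytes starting at offset 63 of an aligned buffer visits 2 lines.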
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] → Marko Mäkelä [ marko ]
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] → Blocker [ 1 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] → In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] → Daniel Black [ danblack ]
            Status In Progress [ 3 ] → In Review [ 10002 ]

            marko Marko Mäkelä added a comment:

            I reviewed the instruction encodings for POWER on https://godbolt.org. I also ended up implementing this for Loongarch, even though our Debian packaging had not enabled libpmem for it previously.
            marko Marko Mäkelä made changes -
            Assignee Daniel Black [ danblack ] → Vladislav Vaintroub [ wlad ]

            marko Marko Mäkelä added a comment:

            For the 10.6 series, the solution could be simply to stop linking against libpmem. As noted in MDEV-25124, there were no measurable performance gains in 10.6 because of the ib_logfile0 format limitations that were lifted in MDEV-14425.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 10.6 [ 24028 ]
            Affects Version/s 10.11 [ 27614 ]
            Affects Version/s 11.0 [ 28320 ]
            Affects Version/s 11.1 [ 28549 ]
            Affects Version/s 11.2 [ 28603 ]
            Affects Version/s 11.4 [ 29301 ]
            wlad Vladislav Vaintroub made changes -
            Assignee Vladislav Vaintroub [ wlad ] → Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] → Stalled [ 10000 ]

            marko Marko Mäkelä added a comment:

            This change in MariaDB Server 10.11 and later removes the libpmem dependency along with the build parameter WITH_PMEM, and introduces a new Boolean parameter WITH_INNODB_PMEM, which defaults to ON on 64-bit x86, ARM and POWER.

            Because our CI coverage lacks RISC-V and Loongarch, we disable this code by default on those architectures.

            From MariaDB Server 10.6, MDEV-32791 will remove the libpmem dependency without any replacement.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2024-04-19 08:41:16.0 → 2024-04-19 08:41:16.22
            marko Marko Mäkelä made changes -
            Fix Version/s 10.11.8 [ 29630 ]
            Fix Version/s 11.0.6 [ 29628 ]
            Fix Version/s 11.1.5 [ 29629 ]
            Fix Version/s 11.2.4 [ 29631 ]
            Fix Version/s 11.4.2 [ 29633 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] → Closed [ 6 ]

            People

              Assignee: marko Marko Mäkelä
              Reporter: Cyprus Socialite Dmitrii Odintcov