Details

    Description

      The MariaDB code base contains quite a few implementations of CRC-32, some (or all?) of them using the CRC-32C polynomial. It would be good to create a uniform interface and remove any code duplication.

      • There is a crc32() function defined somewhere. Is it always the same, or does the bundled zlib use something else?
      • Mariabackup defines crc32_intel_pclmul(), which is enabled on AMD64 based on CPUID flags.
      • InnoDB selects its own implementations in ut_crc32_init(), most recently adding an interface to the ARMv8 (AArch64, ARM64) function crc32c_aarch64(). Apparently this function is not used anywhere else.
      • MDEV-9872 introduced POWER crc32_vpmsum() that was later (in MariaDB Server 10.3) replaced with a C-based implementation.
      • A recent update of MyRocks introduced a duplicated implementation of crc32_vpmsum(), noticed by me due to build breakage (MDEV-19830).

      At the very least, we should have a common CRC-32C implementation on all platforms and remove ut0crc32.cc from InnoDB code base. (Maybe it is not worth touching the code in the bundled zlib.)

      If other CRC-32 polynomials are needed, then we should define a common interface for those as well.
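To make the idea of a common interface concrete, here is a minimal sketch in C. The names my_crc32c(), my_crc32c_impl, crc32c_func_t and crc32c_sw() are hypothetical and chosen only for this illustration; they are not taken from the MariaDB source tree. The sketch assumes the usual CRC-32C convention (reflected Castagnoli polynomial 0x82F63B78, initial value and final XOR of all ones) and uses a slow bit-at-a-time loop as the portable fallback that every platform could share.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uniform signature: extend a running CRC-32C over len bytes. */
typedef uint32_t (*crc32c_func_t)(uint32_t crc, const void *buf, size_t len);

/* Portable bit-at-a-time CRC-32C (reflected Castagnoli polynomial
   0x82F63B78). Slow, but usable as a fallback on any platform. */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
  const unsigned char *p = (const unsigned char *) buf;
  crc = ~crc;                      /* undo the final XOR of a previous call */
  while (len--)
  {
    crc ^= *p++;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;                     /* apply the final XOR */
}

/* Assumption for this sketch: platform-specific startup code would
   overwrite this pointer once with an accelerated implementation. */
static crc32c_func_t my_crc32c_impl = crc32c_sw;

/* Single entry point that all callers (storage engines, tools) would use. */
uint32_t my_crc32c(uint32_t crc, const void *buf, size_t len)
{
  return my_crc32c_impl(crc, buf, len);
}
```

A caller starts with crc = 0 and may feed a buffer in several pieces, passing the previous return value as the next crc. With this convention the standard check string "123456789" yields 0xE3069283.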

          Activity

            A unified interface (with acceleration) for the zlib crc32() function will be introduced by MDEV-22641, using the function my_checksum().

            After that, what remains to be done (in this task) is unifying the interface to CRC-32C (using the Castagnoli polynomial), which is used by MyRocks (RocksDB) and InnoDB. The RocksDB implementation is superior to InnoDB’s, because it can make use of the pclmul instruction, which apparently can outperform the SSE4.2 crc32 instructions that InnoDB is using.

            The run-time check for the pclmul instruction should use the MDEV-22641 predicate crc32_pclmul_enabled().

            marko Marko Mäkelä added the comment above.
            wlad Vladislav Vaintroub added a comment - MyRocks uses both CRC32 instruction and pclmul , as described in https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf

            The following works at least starting with GCC 4.8.2 and clang 4.0.0. Note: I am only using this for illustration purposes; I am not suggesting to use any GCC-style built-in functions:

            __attribute__((target("sse4.2")))
            static unsigned crc32c_sse42(unsigned crc, unsigned data)
            {
              /* crc32qi consumes one byte: only the low 8 bits of data are used */
              return __builtin_ia32_crc32qi(crc, data);
            }
            __attribute__((target("pclmul")))
            static unsigned crc32c_pclmul(unsigned crc, unsigned data)
            {
              return 42; // FIXME: Use pclmul
            }
            unsigned crc32c(unsigned crc, unsigned data)
            {
              /* Placeholder conditions standing in for real run-time CPU feature checks */
              if (crc & 1) return crc32c_sse42(crc, data);
              if (crc & 2) return crc32c_pclmul(crc, data);
              return crc ^ data;
            }
            

            I think that using the target attribute is preferred to compiling the entire compilation unit with special flags. We do not want the compiler to accidentally use some SIMD instructions for unrelated parts of the code. This technique would also allow us to keep all code for different ISA dialects in the same compilation unit.
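As a slightly fuller sketch of the same technique, the following C code keeps a portable loop and an SSE4.2 loop in one compilation unit and chooses between them at run time. The function names crc32c_generic(), crc32c_sse42_buf() and crc32c_dispatch() and the macro HAVE_X86_CRC32C are hypothetical. As in the snippet above, the GCC-style built-ins (__builtin_ia32_crc32qi and __builtin_cpu_supports) are used only for illustration, and the code assumes GCC or clang; on other architectures it falls back to the portable loop.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
# define HAVE_X86_CRC32C 1   /* hypothetical macro for this sketch */
#endif

/* Portable raw CRC-32C update (no initial or final inversion). */
static uint32_t crc32c_generic(uint32_t crc, const unsigned char *buf,
                               size_t len)
{
  while (len--)
  {
    crc ^= *buf++;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

#ifdef HAVE_X86_CRC32C
/* Only this function is compiled with SSE4.2 enabled; the rest of the
   compilation unit keeps the default instruction set. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_sse42_buf(uint32_t crc, const unsigned char *buf,
                                 size_t len)
{
  while (len--)
    crc = __builtin_ia32_crc32qi(crc, *buf++);  /* one byte per step */
  return crc;
}
#endif

/* Run-time dispatch. A real implementation would resolve the choice once,
   for example into a function pointer, instead of testing on every call. */
uint32_t crc32c_dispatch(uint32_t crc, const void *buf, size_t len)
{
  const unsigned char *p = (const unsigned char *) buf;
  crc = ~crc;
#ifdef HAVE_X86_CRC32C
  if (__builtin_cpu_supports("sse4.2"))
    crc = crc32c_sse42_buf(crc, p, len);
  else
#endif
    crc = crc32c_generic(crc, p, len);
  return ~crc;
}
```

Both paths compute the same checksum, so the dispatcher returns 0xE3069283 for the check string "123456789" regardless of which branch is taken. A byte-at-a-time hardware loop is far from optimal; it only shows how the target attribute confines the special instructions to a single function.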

            Side note: defining

            __attribute__((target_clones("sse4.2,pclmul,default")))
            

            seems to be a dead end, because it is not supported on even the newest clang, and only supported starting with GCC 6.

            marko Marko Mäkelä added the comment above.

            Side note: The Galera library appears to include its own implementations:

            strings /usr/lib/galera/libgalera_smm.so|grep CRC-32C
            

            galera-4 (26.4.5-1)

            CRC-32C: using hardware acceleration.
            CRC-32C: using "slicing-by-8" algorithm.
            unexpected CRC-32C implementation.
            Using CRC-32C for message checksums.
            

            marko Marko Mäkelä added the comment above.

            After talking to marko, I decided to push this into 10.5, as the revised CRC-32C implementation (CRC32+PCLMULQDQ) also gives a nice speedup for InnoDB checksum calculation on x64.

            The speedup is roughly 2x in my benchmarks with aligned 16K pages.

            wlad Vladislav Vaintroub added the comment above.

            People

              wlad Vladislav Vaintroub
              marko Marko Mäkelä