Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22749

Implement portable PCLMUL accelerated crc32() with Intel intrinsics

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
    • 10.5.7
    • mariabackup, Server
    • cross-platform, gcc, clang, msvc, amd64

    Description

      The pclmul acceleration for the zlib crc32() function that was cleaned up in MDEV-22641 is not available on Windows, because the code is using GCC-style inline assembler. Intel intrinsics should be used.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment - - edited

            I would prefer to see something like this in the implementation. The approach was demonstrated in MDEV-20386. For GCC 4, we must work around broken header files.

            #include <wmmintrin.h>
             
            #ifdef __GNUC__
            __attribute__((target("sse4.2,pclmul")))
            #endif
            static unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size)
            {
                __m128i a,b,c;
                c=_mm_clmulepi64_si128 (a,b,1);
            }
            

            For GCC 4 (the oldest that we currently support is 4.8.2), some extra hoops are needed to replace the broken #include header:

            #include <string.h>
            #include <stdint.h>
             
            #if defined __GNUC__ && !defined __clang__ && __GNUC__ < 5
            # define _mm_clmulepi64_si128 __builtin_ia32_pclmulqdq128
            typedef long long __attribute((vector_size(16))) __m128i;
            #else
            # include <wmmintrin.h>
            #endif
             
            #ifdef __GNUC__
            __attribute__((target("sse4.2,pclmul")))
            #endif
            unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size)
            {
                __m128i a,b,c;
                c=_mm_clmulepi64_si128 (a,b,1);
                return 0;
            }
            

            I would find it risky to compile the entire compilation unit with something that enables many SIMD features that may not be available on the least common denominator (SSE2 for AMD64; Intel 80486 or possibly something newer for IA-32). We do not want the compiler to accidentally enable some exotic instructions for any ‘glue’ code that is supposed to run anywhere.

            marko Marko Mäkelä added a comment - - edited I would prefer to see something like this in the implementation. The approach was demonstrated in MDEV-20386 . For GCC 4, we must work around broken header files . #include <wmmintrin.h>   #ifdef __GNUC__ __attribute__((target( "sse4.2,pclmul" ))) #endif static unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size) { __m128i a,b,c; … c=_mm_clmulepi64_si128 (a,b,1); … } For GCC 4 (the oldest that we currently support is 4.8.2), some extra hoops are needed to replace the broken #include header: #include <string.h> #include <stdint.h>   #if defined __GNUC__ && !defined __clang__ && __GNUC__ < 5 # define _mm_clmulepi64_si128 __builtin_ia32_pclmulqdq128 typedef long long __attribute((vector_size(16))) __m128i; #else # include <wmmintrin.h> #endif   #ifdef __GNUC__ __attribute__((target( "sse4.2,pclmul" ))) #endif unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size) { __m128i a,b,c; c=_mm_clmulepi64_si128 (a,b,1); return 0; } I would find it risky to compile the entire compilation unit with something that enables many SIMD features that may not be available on the least common denominator (SSE2 for AMD64; Intel 80486 or possibly something newer for IA-32). We do not want the compiler to accidentally enable some exotic instructions for any ‘glue’ code that is supposed to run anywhere.

            I do not see why anew feature and a workaround for the said feature will help code to be better. It only adds more code, we will need compiler options for clang-cl, and there are already compiler options for ARM etc. So, if anything, I would prefer not to add both attributes and GCC intrinsics for the broken attributes, to reduce complexity, and increase portability.

            I can't imagine that compiler will accidentially add pclmul in cpuid test (crc32_pclmul_enabled), which is single function in question in this particular compilation unit

            wlad Vladislav Vaintroub added a comment - I do not see why anew feature and a workaround for the said feature will help code to be better. It only adds more code, we will need compiler options for clang-cl, and there are already compiler options for ARM etc. So, if anything, I would prefer not to add both attributes and GCC intrinsics for the broken attributes, to reduce complexity, and increase portability. I can't imagine that compiler will accidentially add pclmul in cpuid test (crc32_pclmul_enabled), which is single function in question in this particular compilation unit

            People

              wlad Vladislav Vaintroub
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.