[MDEV-22749] Implement portable PCLMUL accelerated crc32() with Intel intrinsics Created: 2020-05-29  Updated: 2021-04-13  Resolved: 2020-09-04

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Server
Affects Version/s: 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.5.7

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: performance
Environment:

cross-platform, gcc, clang, msvc, amd64


Issue Links:
Blocks
blocks MDEV-19935 Create unified CRC-32 interface Closed
is blocked by MDEV-22641 Provide SIMD optimized wrapper for zl... Closed
Problem/Incident
causes MDEV-23680 Assertion `data' failed in crcr32_cal... Closed
causes MDEV-24745 Fallback CRC-32C computation wrongly ... Closed

 Description   

The pclmul acceleration for the zlib crc32() function that was cleaned up in MDEV-22641 is not available on Windows, because the code is using GCC-style inline assembler. Intel intrinsics should be used.



 Comments   
Comment by Marko Mäkelä [ 2020-08-27 ]

I would prefer to see something like this in the implementation. The approach was demonstrated in MDEV-20386. For GCC 4, we must work around broken header files.

#include <wmmintrin.h>
 
#ifdef __GNUC__
__attribute__((target("sse4.2,pclmul")))
#endif
static unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size)
{
    __m128i a,b,c;
    c=_mm_clmulepi64_si128 (a,b,1);
}

For GCC 4 (the oldest that we currently support is 4.8.2), some extra hoops are needed to replace the broken #include header:

#include <string.h>
#include <stdint.h>
 
#if defined __GNUC__ && !defined __clang__ && __GNUC__ < 5
# define _mm_clmulepi64_si128 __builtin_ia32_pclmulqdq128
typedef long long __attribute((vector_size(16))) __m128i;
#else
# include <wmmintrin.h>
#endif
 
#ifdef __GNUC__
__attribute__((target("sse4.2,pclmul")))
#endif
unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size)
{
    __m128i a,b,c;
    c=_mm_clmulepi64_si128 (a,b,1);
    return 0;
}

I would find it risky to compile the entire compilation unit with something that enables many SIMD features that may not be available on the least common denominator (SSE2 for AMD64; Intel 80486 or possibly something newer for IA-32). We do not want the compiler to accidentally enable some exotic instructions for any ‘glue’ code that is supposed to run anywhere.

Comment by Vladislav Vaintroub [ 2020-09-04 ]

I do not see why anew feature and a workaround for the said feature will help code to be better. It only adds more code, we will need compiler options for clang-cl, and there are already compiler options for ARM etc. So, if anything, I would prefer not to add both attributes and GCC intrinsics for the broken attributes, to reduce complexity, and increase portability.

I can't imagine that compiler will accidentially add pclmul in cpuid test (crc32_pclmul_enabled), which is single function in question in this particular compilation unit

Generated at Thu Feb 08 09:17:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.