[MDEV-22749] Implement portable PCLMUL accelerated crc32() with Intel intrinsics - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Fix Version/s: 10.5.7
Component/s: mariabackup, Server
Labels:
- performance
Environment:
cross-platform, gcc, clang, msvc, amd64

Description

The pclmul acceleration for the zlib crc32() function that was cleaned up in ~~MDEV-22641~~ is not available on Windows, because the code is using GCC-style inline assembler. Intel intrinsics should be used.

Attachments

Issue Links

blocks

MDEV-19935 Create unified CRC-32 interface

Closed

MDEV-33817 Implement AVX512BW and VPCLMULQDQ based CRC-32 algorithms

Closed

causes

MDEV-23680 Assertion `data' failed in crcr32_calc_pclmulqdq

Closed

MDEV-24745 Fallback CRC-32C computation wrongly uses SSE4.1 instructions

Closed

is blocked by

MDEV-22641 Provide SIMD optimized wrapper for zlib crc32()

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2020-08-27 15:18 - edited

I would prefer to see something like this in the implementation. The approach was demonstrated in ~~MDEV-20386~~. For GCC 4, we must work around broken header files.

#include <wmmintrin.h>

#ifdef __GNUC__

__attribute__((target("sse4.2,pclmul")))

#endif

static unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size)

    __m128i a,b,c;

…

    c=_mm_clmulepi64_si128 (a,b,1);

…

For GCC 4 (the oldest that we currently support is 4.8.2), some extra hoops are needed to replace the broken #include header:

#include <string.h>

#include <stdint.h>

#if defined __GNUC__ && !defined __clang__ && __GNUC__ < 5

# define _mm_clmulepi64_si128 __builtin_ia32_pclmulqdq128

typedef long long __attribute((vector_size(16))) __m128i;

#else

# include <wmmintrin.h>

#endif

#ifdef __GNUC__

__attribute__((target("sse4.2,pclmul")))

#endif

unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size)

    __m128i a,b,c;

    c=_mm_clmulepi64_si128 (a,b,1);

    return 0;

I would find it risky to compile the entire compilation unit with something that enables many SIMD features that may not be available on the least common denominator (SSE2 for AMD64; Intel 80486 or possibly something newer for IA-32). We do not want the compiler to accidentally enable some exotic instructions for any ‘glue’ code that is supposed to run anywhere.

Marko Mäkelä added a comment - 2020-08-27 15:18 - edited I would prefer to see something like this in the implementation. The approach was demonstrated in MDEV-20386 . For GCC 4, we must work around broken header files . #include <wmmintrin.h> #ifdef __GNUC__ __attribute__((target( "sse4.2,pclmul" ))) #endif static unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size) { __m128i a,b,c; … c=_mm_clmulepi64_si128 (a,b,1); … } For GCC 4 (the oldest that we currently support is 4.8.2), some extra hoops are needed to replace the broken #include header: #include <string.h> #include <stdint.h> #if defined __GNUC__ && !defined __clang__ && __GNUC__ < 5 # define _mm_clmulepi64_si128 __builtin_ia32_pclmulqdq128 typedef long long __attribute((vector_size(16))) __m128i; #else # include <wmmintrin.h> #endif #ifdef __GNUC__ __attribute__((target( "sse4.2,pclmul" ))) #endif unsigned crc32c_pclmul(unsigned crc32c, void *data, size_t size) { __m128i a,b,c; c=_mm_clmulepi64_si128 (a,b,1); return 0; } I would find it risky to compile the entire compilation unit with something that enables many SIMD features that may not be available on the least common denominator (SSE2 for AMD64; Intel 80486 or possibly something newer for IA-32). We do not want the compiler to accidentally enable some exotic instructions for any ‘glue’ code that is supposed to run anywhere.

Vladislav Vaintroub added a comment - 2020-09-04 11:31

I do not see why anew feature and a workaround for the said feature will help code to be better. It only adds more code, we will need compiler options for clang-cl, and there are already compiler options for ARM etc. So, if anything, I would prefer not to add both attributes and GCC intrinsics for the broken attributes, to reduce complexity, and increase portability.

I can't imagine that compiler will accidentially add pclmul in cpuid test (crc32_pclmul_enabled), which is single function in question in this particular compilation unit

Vladislav Vaintroub added a comment - 2020-09-04 11:31 I do not see why anew feature and a workaround for the said feature will help code to be better. It only adds more code, we will need compiler options for clang-cl, and there are already compiler options for ARM etc. So, if anything, I would prefer not to add both attributes and GCC intrinsics for the broken attributes, to reduce complexity, and increase portability. I can't imagine that compiler will accidentially add pclmul in cpuid test (crc32_pclmul_enabled), which is single function in question in this particular compilation unit

People

Assignee:: Vladislav Vaintroub

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2020-05-29 09:23

Updated:: 2024-04-03 10:41

Resolved:: 2020-09-04 22:12

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server