Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28162

clang compiler emits atomic library calls on x86/32bit

Details

    Description

      Clang expects 8-byte alignment for some 64-bit atomic operations
      in some 32-bit targets. Native instruction lock cmpxchg8b (for x86)
      should only require 4-byte alignment.

      As a result laterst mariadb fails to build with clang on 32bit/x86 architecture because it
      can not find the missing symbol

      undefined reference to `__atomic_fetch_or_8'

      here is a shortened testcase that can reproduce the problem with clang

      #include <stdint.h>
      # ifdef __GNUC__
      typedef __attribute__((__aligned__(8))) uint64_t ATOMIC_U64;
      # else
      typedef uint64_t ATOMIC_U64;
      # endif
       
      int main() {return 0;}
       
      #ifdef BUGGY
      uint64_t foo(uint64_t *val, uint64_t op)
      #else
      uint64_t foo(ATOMIC_U64 *val, uint64_t op)
      #endif
      {
          return __atomic_or_fetch(val, op, __ATOMIC_ACQ_REL);
      }
       
      
      

      compiles ok with `gcc -m32` or `clang -m32` fine but interesting thing happens with

      `gcc -DBUGGY -m32` or `clang -DBUGGY -m32`

      and you see the difference. Clang says

      /tmp/a.c:17:12: warning: misaligned atomic operation may incur significant performance penalty; the expected alignment (8 bytes) exceeds the actual alignment (4 bytes) [-Watomic-alignment]
          return __atomic_or_fetch(val, op, __ATOMIC_ACQ_REL);
                 ^
      1 warning generated.
      /usr/bin/ld: /tmp/a-4a5d79.o: in function `foo':
      a.c:(.text+0x6b): undefined reference to `__atomic_fetch_or_8'
      clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
       
      
      

      but gcc sails along.

      so I guess we need to make sure that any data being accessed atomically is aligned to 64bit as well to
      help the compiler a bit here.

      Attachments

        Issue Links

          Activity

            khem KHEM RAJ added a comment -

            I have attached a patch which fixes it.

            khem KHEM RAJ added a comment - I have attached a patch which fixes it.

            Thank you. Starting with MariaDB Server 10.4, the my_atomic wrappers should not be used in C++ code, but the C++11 std::atomic should be used instead.

            I am curious about the claim that the Intel Pentium lock cmpxchg8b (infamous of the f00f bug) would only require 4-byte alignment. Is that really so? The external data bus of the Pentium is 64 bits wide. How would a misaligned lock cmpxchg8b work in a multiprocessor environment? I found a claim that the Pentium 75 would support multiprocessing with up to 2 CPUs.

            Also, the suggested patch is checking for the preprocessor symbol _GNUC. All clang versions that I have used (I think, between versions 4 and 14) predefine GNUC_ as 4. So, the patch would affect both gcc and clang.

            If I understood the patch correctly, it would explicitly declare 8-byte alignment for the 64-bit atomic types.

            As far as I can tell by executing the following:

            git grep 'my_atomic_.*long'
            

            the macros are unused in MariaDB Server 10.4 and later. In MariaDB Server 10.2 and 10.3 (which are still not EOL) some of those macros are invoked by InnoDB. I do not think that it is necessary to touch those old versions, unless someone can demonstrate a failure.

            I think that the easiest fix of this is to remove the unused macros for 64-bit atomics.

            marko Marko Mäkelä added a comment - Thank you. Starting with MariaDB Server 10.4, the my_atomic wrappers should not be used in C++ code, but the C++11 std::atomic should be used instead. I am curious about the claim that the Intel Pentium lock cmpxchg8b (infamous of the f00f bug ) would only require 4-byte alignment. Is that really so? The external data bus of the Pentium is 64 bits wide. How would a misaligned lock cmpxchg8b work in a multiprocessor environment? I found a claim that the Pentium 75 would support multiprocessing with up to 2 CPUs. Also, the suggested patch is checking for the preprocessor symbol _ GNUC . All clang versions that I have used (I think, between versions 4 and 14) predefine GNUC _ as 4. So, the patch would affect both gcc and clang . If I understood the patch correctly, it would explicitly declare 8-byte alignment for the 64-bit atomic types. As far as I can tell by executing the following: git grep 'my_atomic_.*long' the macros are unused in MariaDB Server 10.4 and later. In MariaDB Server 10.2 and 10.3 (which are still not EOL) some of those macros are invoked by InnoDB. I do not think that it is necessary to touch those old versions, unless someone can demonstrate a failure. I think that the easiest fix of this is to remove the unused macros for 64-bit atomics.

            The following definitions became unused in MDEV-17441:

            diff --git a/include/my_atomic.h b/include/my_atomic.h
            index 81da9e35cf9..25ae772ebc3 100644
            --- a/include/my_atomic.h
            +++ b/include/my_atomic.h
            @@ -115,22 +115,6 @@
             #include "atomic/gcc_builtins.h"
             #endif
             
            -#if SIZEOF_LONG == 4
            -#define my_atomic_addlong(A,B) my_atomic_add32((int32*) (A), (B))
            -#define my_atomic_loadlong(A) my_atomic_load32((int32*) (A))
            -#define my_atomic_loadlong_explicit(A,O) my_atomic_load32_explicit((int32*) (A), (O))
            -#define my_atomic_storelong(A,B) my_atomic_store32((int32*) (A), (B))
            -#define my_atomic_faslong(A,B) my_atomic_fas32((int32*) (A), (B))
            -#define my_atomic_caslong(A,B,C) my_atomic_cas32((int32*) (A), (int32*) (B), (C))
            -#else
            -#define my_atomic_addlong(A,B) my_atomic_add64((int64*) (A), (B))
            -#define my_atomic_loadlong(A) my_atomic_load64((int64*) (A))
            -#define my_atomic_loadlong_explicit(A,O) my_atomic_load64_explicit((int64*) (A), (O))
            -#define my_atomic_storelong(A,B) my_atomic_store64((int64*) (A), (B))
            -#define my_atomic_faslong(A,B) my_atomic_fas64((int64*) (A), (B))
            -#define my_atomic_caslong(A,B,C) my_atomic_cas64((int64*) (A), (int64*) (B), (C))
            -#endif
            -
             #ifndef MY_MEMORY_ORDER_SEQ_CST
             #define MY_MEMORY_ORDER_RELAXED
             #define MY_MEMORY_ORDER_CONSUME
            

            The 64-bit atomic wrappers that these macros refer to are only being invoked by static member functions of class PFS_atomic, defined in the header file storage/perfschema/pfs_atomic.h. That file should be removed and the PFS_atomic replaced with something based on C++11 std::atomic, perhaps Atomic_relaxed. That should allow all matches of

            git grep 'my_atomic.*64'
            

            to be removed.

            khem, can you please submit a GitHub pull request for replacing the PFS_atomic and removing all 64-bit my_atomic functions?

            The 32-bit atomic wrappers are being used much more, and also they could be replaced with std::atomic in C++ code (but not C). That should be done in a separate ticket.

            marko Marko Mäkelä added a comment - The following definitions became unused in MDEV-17441 : diff --git a/include/my_atomic.h b/include/my_atomic.h index 81da9e35cf9..25ae772ebc3 100644 --- a/include/my_atomic.h +++ b/include/my_atomic.h @@ -115,22 +115,6 @@ #include "atomic/gcc_builtins.h" #endif -#if SIZEOF_LONG == 4 -#define my_atomic_addlong(A,B) my_atomic_add32((int32*) (A), (B)) -#define my_atomic_loadlong(A) my_atomic_load32((int32*) (A)) -#define my_atomic_loadlong_explicit(A,O) my_atomic_load32_explicit((int32*) (A), (O)) -#define my_atomic_storelong(A,B) my_atomic_store32((int32*) (A), (B)) -#define my_atomic_faslong(A,B) my_atomic_fas32((int32*) (A), (B)) -#define my_atomic_caslong(A,B,C) my_atomic_cas32((int32*) (A), (int32*) (B), (C)) -#else -#define my_atomic_addlong(A,B) my_atomic_add64((int64*) (A), (B)) -#define my_atomic_loadlong(A) my_atomic_load64((int64*) (A)) -#define my_atomic_loadlong_explicit(A,O) my_atomic_load64_explicit((int64*) (A), (O)) -#define my_atomic_storelong(A,B) my_atomic_store64((int64*) (A), (B)) -#define my_atomic_faslong(A,B) my_atomic_fas64((int64*) (A), (B)) -#define my_atomic_caslong(A,B,C) my_atomic_cas64((int64*) (A), (int64*) (B), (C)) -#endif - #ifndef MY_MEMORY_ORDER_SEQ_CST #define MY_MEMORY_ORDER_RELAXED #define MY_MEMORY_ORDER_CONSUME The 64-bit atomic wrappers that these macros refer to are only being invoked by static member functions of class PFS_atomic , defined in the header file storage/perfschema/pfs_atomic.h . That file should be removed and the PFS_atomic replaced with something based on C++11 std::atomic , perhaps Atomic_relaxed . That should allow all matches of git grep 'my_atomic.*64' to be removed. khem , can you please submit a GitHub pull request for replacing the PFS_atomic and removing all 64-bit my_atomic functions? The 32-bit atomic wrappers are being used much more, and also they could be replaced with std::atomic in C++ code (but not C). That should be done in a separate ticket.

            I removed the unused definitions. This bug must remain open until the PFS_atomic has been replaced.

            marko Marko Mäkelä added a comment - I removed the unused definitions . This bug must remain open until the PFS_atomic has been replaced.

            Note: 64-bit atomic read-modify-write operations would map to loops around the venerable lock cmpxchg8b instruction that is known for the Pentium F00F bug and the reason why Windows XP would require some tweaks to run on the 80486. Therefore, I do not think that 64-bit read-modify-writes are possible on IA-32 without specifying at least -march=i586. Loads and stores seem to be possible by using some floating point instructions, such as fildll and fistpll, which I encountered the other day while diagnosing MDEV-33972.

            marko Marko Mäkelä added a comment - Note: 64-bit atomic read-modify-write operations would map to loops around the venerable lock cmpxchg8b instruction that is known for the Pentium F00F bug and the reason why Windows XP would require some tweaks to run on the 80486 . Therefore, I do not think that 64-bit read-modify-writes are possible on IA-32 without specifying at least -march=i586 . Loads and stores seem to be possible by using some floating point instructions, such as fildll and fistpll , which I encountered the other day while diagnosing MDEV-33972 .
            marko Marko Mäkelä added a comment - - edited

            The problem that is described in the Description is related to split locks, that is, using atomic operations on unaligned data, in the worst case spanning two cache lines. I might have expected the ABI to require uint64_t* parameters to point to aligned data, but x86 was traditionally very relaxed about this. Basically only some SIMD instructions really require strict alignment.

            marko Marko Mäkelä added a comment - - edited The problem that is described in the Description is related to split locks , that is, using atomic operations on unaligned data, in the worst case spanning two cache lines. I might have expected the ABI to require uint64_t* parameters to point to aligned data, but x86 was traditionally very relaxed about this. Basically only some SIMD instructions really require strict alignment.
            nikitamalyavin Nikita Malyavin added a comment - - edited

            According to https://www.felixcloutier.com/x86/lock#:~:text=If%20the%20LOCK%20prefix%20is%20used%20with%20an%20instruction%20not%20listed

            MOV can't be used with LOCK, so I guess atomic::store/load might be broken for unaligned uint64_t.

            More interesting insights are in godbolt.org/z/K8sf5hf6P
            Clang/gcc/icc use XCHG without LOCK
            MSVC 17 uses movq xmm, which seems the most interesting and promising

            nikitamalyavin Nikita Malyavin added a comment - - edited According to https://www.felixcloutier.com/x86/lock#:~:text=If%20the%20LOCK%20prefix%20is%20used%20with%20an%20instruction%20not%20listed MOV can't be used with LOCK, so I guess atomic::store/load might be broken for unaligned uint64_t . More interesting insights are in godbolt.org/z/K8sf5hf6P Clang/gcc/icc use XCHG without LOCK MSVC 17 uses movq xmm, which seems the most interesting and promising

            People

              nikitamalyavin Nikita Malyavin
              khem KHEM RAJ
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.