[MDEV-28162] clang compiler emits atomic library calls on x86/32bit Created: 2022-03-23  Updated: 2023-12-22

Status: Open
Project: MariaDB Server
Component/s: Performance Schema
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Minor
Reporter: KHEM RAJ Assignee: Nikita Malyavin
Resolution: Unresolved Votes: 0
Labels: alignment, beginner-friendly, c++

Attachments: File clang-64bit-atomics.patch    

 Description   

Clang expects 8-byte alignment for some 64-bit atomic operations
in some 32-bit targets. Native instruction lock cmpxchg8b (for x86)
should only require 4-byte alignment.

As a result laterst maridb fails to build with clang on 32bit/x86 architecture because it
can not find the missing symbol

undefined reference to `__atomic_fetch_or_8'

here is a shortened testcase that can reproduce the problem with clang

#include <stdint.h>
# ifdef __GNUC__
typedef __attribute__((__aligned__(8))) uint64_t ATOMIC_U64;
# else
typedef uint64_t ATOMIC_U64;
# endif
 
int main() {return 0;}
 
#ifdef BUGGY
uint64_t foo(uint64_t *val, uint64_t op)
#else
uint64_t foo(ATOMIC_U64 *val, uint64_t op)
#endif
{
    return __atomic_or_fetch(val, op, __ATOMIC_ACQ_REL);
}
 

compiles ok with `gcc -m32` or `clang -m32` fine but interesting thing happens with

`gcc -DBUGGY -m32` or `clang -DBUGGY -m32`

and you see the difference. Clang says

/tmp/a.c:17:12: warning: misaligned atomic operation may incur significant performance penalty; the expected alignment (8 bytes) exceeds the actual alignment (4 bytes) [-Watomic-alignment]
    return __atomic_or_fetch(val, op, __ATOMIC_ACQ_REL);
           ^
1 warning generated.
/usr/bin/ld: /tmp/a-4a5d79.o: in function `foo':
a.c:(.text+0x6b): undefined reference to `__atomic_fetch_or_8'
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
 

but gcc sails along.

so I guess we need to make sure that any data being accessed atomically is aligned to 64bit as well to
help the compiler a bit here.



 Comments   
Comment by KHEM RAJ [ 2022-03-23 ]

I have attached a patch which fixes it.

Comment by Marko Mäkelä [ 2022-03-24 ]

Thank you. Starting with MariaDB Server 10.4, the my_atomic wrappers should not be used in C++ code, but the C++11 std::atomic should be used instead.

I am curious about the claim that the Intel Pentium lock cmpxchg8b (infamous of the f00f bug) would only require 4-byte alignment. Is that really so? The external data bus of the Pentium is 64 bits wide. How would a misaligned lock cmpxchg8b work in a multiprocessor environment? I found a claim that the Pentium 75 would support multiprocessing with up to 2 CPUs.

Also, the suggested patch is checking for the preprocessor symbol _GNUC. All clang versions that I have used (I think, between versions 4 and 14) predefine GNUC_ as 4. So, the patch would affect both gcc and clang.

If I understood the patch correctly, it would explicitly declare 8-byte alignment for the 64-bit atomic types.

As far as I can tell by executing the following:

git grep 'my_atomic_.*long'

the macros are unused in MariaDB Server 10.4 and later. In MariaDB Server 10.2 and 10.3 (which are still not EOL) some of those macros are invoked by InnoDB. I do not think that it is necessary to touch those old versions, unless someone can demonstrate a failure.

I think that the easiest fix of this is to remove the unused macros for 64-bit atomics.

Comment by Marko Mäkelä [ 2022-03-24 ]

The following definitions became unused in MDEV-17441:

diff --git a/include/my_atomic.h b/include/my_atomic.h
index 81da9e35cf9..25ae772ebc3 100644
--- a/include/my_atomic.h
+++ b/include/my_atomic.h
@@ -115,22 +115,6 @@
 #include "atomic/gcc_builtins.h"
 #endif
 
-#if SIZEOF_LONG == 4
-#define my_atomic_addlong(A,B) my_atomic_add32((int32*) (A), (B))
-#define my_atomic_loadlong(A) my_atomic_load32((int32*) (A))
-#define my_atomic_loadlong_explicit(A,O) my_atomic_load32_explicit((int32*) (A), (O))
-#define my_atomic_storelong(A,B) my_atomic_store32((int32*) (A), (B))
-#define my_atomic_faslong(A,B) my_atomic_fas32((int32*) (A), (B))
-#define my_atomic_caslong(A,B,C) my_atomic_cas32((int32*) (A), (int32*) (B), (C))
-#else
-#define my_atomic_addlong(A,B) my_atomic_add64((int64*) (A), (B))
-#define my_atomic_loadlong(A) my_atomic_load64((int64*) (A))
-#define my_atomic_loadlong_explicit(A,O) my_atomic_load64_explicit((int64*) (A), (O))
-#define my_atomic_storelong(A,B) my_atomic_store64((int64*) (A), (B))
-#define my_atomic_faslong(A,B) my_atomic_fas64((int64*) (A), (B))
-#define my_atomic_caslong(A,B,C) my_atomic_cas64((int64*) (A), (int64*) (B), (C))
-#endif
-
 #ifndef MY_MEMORY_ORDER_SEQ_CST
 #define MY_MEMORY_ORDER_RELAXED
 #define MY_MEMORY_ORDER_CONSUME

The 64-bit atomic wrappers that these macros refer to are only being invoked by static member functions of class PFS_atomic, defined in the header file storage/perfschema/pfs_atomic.h. That file should be removed and the PFS_atomic replaced with something based on C++11 std::atomic, perhaps Atomic_relaxed. That should allow all matches of

git grep 'my_atomic.*64'

to be removed.

khem, can you please submit a GitHub pull request for replacing the PFS_atomic and removing all 64-bit my_atomic functions?

The 32-bit atomic wrappers are being used much more, and also they could be replaced with std::atomic in C++ code (but not C). That should be done in a separate ticket.

Comment by Marko Mäkelä [ 2022-03-24 ]

I removed the unused definitions. This bug must remain open until the PFS_atomic has been replaced.

Generated at Thu Feb 08 09:58:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.