Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.5, 10.6, 10.11, 11.4
-
i686, x86-64
Description
There are some compiler bugs around some single-bit test-and-set or test-and-reset operations, mostly affecting the most significant bit of a word.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
https://github.com/llvm/llvm-project/issues/37322
That is, when the compilers generate code for the IA-32 or AMD64 ISA, they would emit a loop around the LOCK CMPXCHG instruction, instead of emitting LOCK BTS, LOCK BTR, LOCK OR or LOCK AND. Here is a test program:
#include <atomic>
|
uint32_t fa(std::atomic<uint32_t>&a) { return a.fetch_and(~(1U<<31)) & 1U<<31;} |
uint32_t fo(std::atomic<uint32_t>&a) { return a.fetch_or(1U<<31) & 1U<<31;} |
Starting with clang++-15 or GCC 7.1 (the first GA release of that GCC series), no LOCK CMPXCHG will be generated for the above.
Fortunately, the oldest major GCC version that is included in any currently supported GNU/Linux distribution is 7. Also, FreeBSD 14 is using clang version 16. So, there should not be any need to worry about older versions of these compilers anymore.
However, even the most recent version of MSVC would generate LOCK CMPXCHG for these. We would be forced to use _interlockedbittestandset() or _interlockedbittestandreset() on that platform in order to avoid a performance regression.