The equivalent patch for 10.4 would be as follows. It should apply to 10.3 as well, but krunalbauskar is right: the performance impact could be a mixed bag, so we had better limit ourselves to newer versions. Besides, within the scope of MDEV-14374 on 10.3, this alternative was considered and rejected (on a different ARMv8 implementation):
diff --git a/include/my_cpu.h b/include/my_cpu.h
index b7d7008a8e3..0b51d3ef90f 100644
--- a/include/my_cpu.h
+++ b/include/my_cpu.h
@@ -53,6 +53,7 @@
 #ifdef _WIN32
 #elif defined HAVE_PAUSE_INSTRUCTION
 #elif defined(_ARCH_PWR8)
+#elif defined __GNUC__ && (defined __arm__ || defined __aarch64__)
 #else
 # include "my_atomic.h"
 #endif
@@ -80,6 +81,9 @@ static inline void MY_RELAX_CPU(void)
 #endif
 #elif defined(_ARCH_PWR8)
   __ppc_get_timebase();
+#elif defined __GNUC__ && (defined __arm__ || defined __aarch64__)
+  /* Mainly, prevent the compiler from optimizing away delay loops */
+  __asm__ __volatile__ ("":::"memory");
 #else
   int32 var, oldval = 0;
   my_atomic_cas32_strong_explicit(&var, &oldval, 1, MY_MEMORY_ORDER_RELAXED,
Marko,
On ARM, as I mentioned on the channel, benchmarking shows that switching to a simple barrier (instead of CAS) improves performance.
For other architectures, we should benchmark first and then decide.
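
For reference, the two variants being compared can be sketched in isolation as below. This is only an illustration, not the code from my_cpu.h: the names relax_barrier, relax_cas and delay_loop are hypothetical, and the CAS variant uses the GCC/Clang __atomic builtins as a stand-in for my_atomic_cas32_strong_explicit(). It assumes a GCC-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patched ARM branch: an empty asm statement
   with a "memory" clobber. It emits no instruction; it only tells the
   compiler that memory may have changed, so the statement (and a loop
   around it) cannot be optimized away. */
static inline void relax_barrier(void)
{
  __asm__ __volatile__ ("":::"memory");
}

/* Hypothetical stand-in for the generic fallback: a relaxed
   compare-and-swap on a local variable. Unlike the barrier, this
   executes real atomic instructions on every call. */
static inline void relax_cas(void)
{
  int32_t var = 0, oldval = 0;
  __atomic_compare_exchange_n(&var, &oldval, 1, 0,
                              __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* A delay loop of the kind that calls a relax function: spin n times
   and return the number of iterations performed. */
static unsigned delay_loop(unsigned n, void (*relax)(void))
{
  unsigned i;
  for (i = 0; i < n; i++)
    relax();
  return i;
}
```

Timing delay_loop(n, relax_barrier) against delay_loop(n, relax_cas) on the target hardware is one rough way to compare the per-iteration cost of each variant; the result will depend on the CPU implementation, which is why each architecture needs its own benchmark.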