[MDEV-24630] MY_RELAX_CPU assembly instruction upgrade/research for memory barrier on ARM Created: 2021-01-20 Updated: 2021-06-01 Resolved: 2021-03-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5.8 |
| Fix Version/s: | 10.5.10, 10.6.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | ZhaoBo | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ARM, ARMv8, performance | ||
| Environment: |
ARMv8 |
||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
From the latest Mysql 8.0.23(2021/1/18) release, we found there are the same optimization towards MY_RELAX_CPU. See below In current situation for Mariadb server, we introduced For the specific instruction, such as DMB/DSB/ISB(https://developer.arm.com/documentation/dui0489/c/CIHGHHIE), we need to deep analysis or test on ARM, which one is good for ARM. So this JIRA ticket will trace the test result and following fix/upgrade for this goal. But let's wait for whether this extra operation is visible in some benchmarks. |
| Comments |
| Comment by Daniel Black [ 2021-01-20 ] | |
|
"isb" is an odd choice and it throws a bunch of work to neighbouring CPUS to do work to delay the current thread. This would seem harmful in the way CAS was. | |
| Comment by Marko Mäkelä [ 2021-01-20 ] | |
|
bzhaoopenstack, thank you. I remember discussing the dummy barrier
with some people in the past, possibly svoj who was working on ARMv8 optimizations (MDEV-14442). If I remember correctly, the conclusion was that this statement has two effects:
Starting with MariaDB Server 10.4, where InnoDB uses C++11 std::atomic, the compiler-memory barrier should not be necessary. (But, there is also some code that uses the C wrapper functions my_atomic_ where the barrier may be necessary.) So, the main effect of the statement should be to prevent the ut_delay() loop from being optimized away. In I see that the -moutline-atomics flag was already added to MariaDB (10.4.14, 10.5.5). So, the outstanding issue is whether the isb instruction would be beneficial. I think that the impact should be demonstrated with benchmarking. Please note that the 10.4, 10.5, and 10.6 branches are expected to perform rather differently. In the latest 10.5, the buf_pool.page_hash uses a lighter-weight rw_lock implementation ( In 10.6, all mutexes were replaced with standard ones ( For demonstrating the impact of the proposed change, I would like to see benchmark results especially on the latest 10.6 development snapshot. There is still one caveat that I am currently working on. Until | |
| Comment by Krunal Bauskar [ 2021-03-30 ] | |
|
Marko, I tried to benchmark <isb> approach. 10.5 the said patch helps across the board. 10.6 results are mixed so I would not suggest it to be applied for 10.6 but only limit it to 10.5. | |
| Comment by Marko Mäkelä [ 2021-03-30 ] | |
|
krunalbauskar, thank you. I am happy to merge any pull request that is guaranteed to have no impact on other architectures than ARM, to any branch. If some change is specific to 10.5 only, then I would like to see the 10.6 version at the same time, to help with merging. If a change might potentially affect the AMD64 ISA, then the performance impact would have to be tested carefully. At MariaDB, we only have AMD64 based hardware for running benchmarks. | |
| Comment by Marko Mäkelä [ 2021-03-30 ] | |
|
krunalbauskar’s fix is now in 10.5 and null-merged to 10.6. | |
| Comment by Daniel Black [ 2021-05-28 ] | |
|
bzhaoopenstack, krunalbauskar, is there a more specific define for where isb exists? Debian wants to revert this because of an armhf failure - https://salsa.debian.org/mariadb-team/mariadb-10.5/-/blob/master/debian/patches/revert-isb-assembly-instruction.patch#L1 | |
| Comment by Krunal Bauskar [ 2021-05-28 ] | |
|
Daniel, Seems like debian has support for 3 different arm machine and probably one which is failing is a pretty old version which doesn't support the said instruction. So we can make it specific to _aarch64_ (as per the arm official reference manual ISB should be defined by all ARMv8 processors). Do let me know if we should submit the patch or you can direct make the changes. ============================================================= Debian/armhf works only on newer 32-bit ARM processors which implement at least the ARMv7 architecture with version 3 of the ARM vector floating point specification (VFPv3). It makes use of the extended features and performance enhancements available on these models. Debian/arm64 works on 64-bit ARM processors which implement at least the ARMv8 architecture. | |
| Comment by Vicențiu Ciorbaru [ 2021-05-28 ] | |
|
FYI, I've created |