[MDEV-37147] ARMv8 -moutline-atomics is suboptimal - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.11, 11.4, 11.8
Fix Version/s: 10.11
Component/s: Storage Engine - InnoDB
Labels:
- ARMv8
- performance
Environment:
ARMv8.1-A

Bug Category:
Related to performance

Description

I thought that I would check how the atomic memory access operations in the executables that we distribute are actually implemented. I built MariaDB Server 10.11 in a Debian 12 environment:

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo /source

gdb sql/mariadbd

disassemble mtr_t::finish_writer<false>

It turns out that -moutline-atomics, which was the way to enable the Large System Extensions (LSE), actually checks the availability of the CPU feature on every single function call. Moreover, each simple instruction, such as ldadd for std::atomic::fetch_add(), is replaced with a call to a library function.
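
To make the operation in question concrete, here is a minimal sketch of a relaxed fetch-and-add. It is not taken from the MariaDB source; the names counter and reserve are hypothetical. On AArch64 with GCC, I would expect this to compile to a call to __aarch64_ldadd8_relax with the default flags and to a bare ldadd with -march=armv8.1-a, but that should be confirmed by disassembling the result.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter standing in for a contended 64-bit atomic in InnoDB.
static std::atomic<uint64_t> counter{0};

// Relaxed fetch-and-add that returns the previous value.
// The returned value is used by the caller, so the compiler needs an
// instruction (or helper) that also loads the old contents.
uint64_t reserve(uint64_t size)
{
  return counter.fetch_add(size, std::memory_order_relaxed);
}
```

Compiling this with g++ -O2 -S, once with and once without -march=armv8.1-a, should show the two code shapes that are compared in this report.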

Here is an excerpt of the disassembly of mtr_t::finish_writer<false>:

   0x0000000000cab42c <+76>:	cmp	x2, #0x0

   0x0000000000cab430 <+80>:	csel	x27, x27, x1, eq	// eq = none

   0x0000000000cab434 <+84>:	ubfx	x22, x0, #4, #1

   0x0000000000cab438 <+88>:	mov	x1, x23

   0x0000000000cab43c <+92>:	mov	x0, x20

   0x0000000000cab440 <+96>:	bl	0xee86d0 <__aarch64_ldadd8_relax>

   0x0000000000cab444 <+100>:	and	x0, x0, #0x3ffffffff

   0x0000000000cab448 <+104>:	cmp	x19, x0

   0x0000000000cab44c <+108>:	b.ls	0xcab570 <_ZN5mtr_t13finish_writerILb0EEESt4pairImNS_16page_flush_aheadEEPS_m+400>  // b.plast

If I go ahead and compile the code with

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_{C,CXX}_FLAGS='-march=armv8.1-a' /source

then the function call will be replaced with the bare instruction, bringing it closer to what we are running on AMD64:

   0x0000000000ca807c <+76>:    cmp x2, #0x0

   0x0000000000ca8080 <+80>:    csel    x27, x27, x1, eq    // eq = none

   0x0000000000ca8084 <+84>:    ubfx    x23, x0, #4, #1

   0x0000000000ca8088 <+88>:    ldadd   x20, x0, [x21]

   0x0000000000ca808c <+92>:    and x0, x0, #0x3ffffffff

   0x0000000000ca8090 <+96>:    cmp x19, x0

   0x0000000000ca8094 <+100>:   b.ls    0xca81b8 <_ZN5mtr_t13finish_writerILb0EEESt4pairImNS_16page_flush_aheadEEPS_m+392>  // b.plast

This is 8 bytes shorter in the caller. The library function tests a flag for the availability of the CPU feature on each and every call (the ldrb at +8 and the cbz at +12). We have the LSE version at +16 and the load-exclusive/store-exclusive (ldxr/stxr) retry loop at +28.

Dump of assembler code for function __aarch64_ldadd8_relax:

   0x0000000000ee86d0 <+0>: bti c

   0x0000000000ee86d4 <+4>: adrp    x16, 0x2138000 <_ZN4ShowL17user_stats_fieldsE+1080>

   0x0000000000ee86d8 <+8>: ldrb    w16, [x16, #3369]

   0x0000000000ee86dc <+12>:    cbz w16, 0xee86e8 <__aarch64_ldadd8_relax+24>

   0x0000000000ee86e0 <+16>:    ldadd   x0, x0, [x1]

   0x0000000000ee86e4 <+20>:    ret

   0x0000000000ee86e8 <+24>:    mov x16, x0

   0x0000000000ee86ec <+28>:    ldxr    x0, [x1]

   0x0000000000ee86f0 <+32>:    add x17, x0, x16

   0x0000000000ee86f4 <+36>:    stxr    w15, x17, [x1]

   0x0000000000ee86f8 <+40>:    cbnz    w15, 0xee86ec <__aarch64_ldadd8_relax+28>

   0x0000000000ee86fc <+44>:    ret
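
The control flow of the helper above can be modelled in portable C++ roughly as follows. This is only a sketch for reading the disassembly, not the libgcc implementation: have_lse_atomics and ldadd8_relax_model are hypothetical names, the flag stands in for the byte that is loaded at +8, and compare_exchange_weak stands in for the ldxr/stxr pair.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the flag byte that the helper loads at +8.
static bool have_lse_atomics = false;

// Model of the helper: adds val to *ptr and returns the old value.
// The argument order follows the register usage above (x0 = val, x1 = ptr).
uint64_t ldadd8_relax_model(uint64_t val, std::atomic<uint64_t> *ptr)
{
  // +4..+12: the flag is tested on every call
  if (have_lse_atomics)
    // +16: single-instruction path (ldadd)
    return ptr->fetch_add(val, std::memory_order_relaxed);

  // +24..+40: retry until the store succeeds
  uint64_t old = ptr->load(std::memory_order_relaxed);
  while (!ptr->compare_exchange_weak(old, old + val,
                                     std::memory_order_relaxed,
                                     std::memory_order_relaxed))
  {
    // on failure, old has been reloaded with the current value
  }
  return old;
}
```

Either way, the caller pays for the branch-and-link, the flag load and the conditional branch in addition to the atomic operation itself.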

I think that it would be better to instantiate the mtr_t::finish_writer template as well as some other functions for multiple ISA targets. This particular function is already being invoked via the function pointer mtr_t::finisher.
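
The idea could be sketched as follows. This is a simplified illustration under several assumptions, not the proposed patch: it uses two plain functions instead of template instantiations, all names (lsn, finish_generic, finish_lse, cpu_has_lse, finisher_sketch) are hypothetical, the target("+lse") attribute is applied only for GCC on AArch64, and the CPU check assumes Linux with getauxval() and HWCAP_ATOMICS. On other platforms it falls back to the generic variant.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#if defined(__aarch64__) && defined(__linux__)
# include <sys/auxv.h>   // getauxval(), AT_HWCAP
# include <asm/hwcap.h>  // HWCAP_ATOMICS
#endif

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
# define TARGET_LSE __attribute__((target("+lse")))  // per-function ISA override
#else
# define TARGET_LSE  // no-op elsewhere (assumption: the sketch still compiles)
#endif

// Hypothetical stand-in for the contended counter.
static std::atomic<uint64_t> lsn{0};

// Variant built with the default flags (may use the outline helper).
static uint64_t finish_generic(uint64_t size)
{ return lsn.fetch_add(size, std::memory_order_relaxed); }

// Variant built for LSE; the intent is that fetch_add becomes a bare ldadd.
TARGET_LSE static uint64_t finish_lse(uint64_t size)
{ return lsn.fetch_add(size, std::memory_order_relaxed); }

// One-time CPU feature check instead of one check per atomic operation.
static bool cpu_has_lse()
{
#if defined(__aarch64__) && defined(__linux__)
  return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
  return false;
#endif
}

// Chosen once at startup, in the same spirit as mtr_t::finisher.
using finisher_t = uint64_t (*)(uint64_t);
static const finisher_t finisher_sketch =
  cpu_has_lse() ? finish_lse : finish_generic;
```

With this shape, the feature test runs once when the pointer is initialized, and every atomic operation inside the selected variant can be emitted inline. Whether the compiler really emits ldadd in finish_lse would need to be checked in the disassembly.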

In a non-scientific experiment (single-threaded test on persistent storage), I observed a 2% performance improvement. The environment that I used for analysis is not suitable for performance testing.

Issue Links

relates to

MDEV-21923 LSN allocation is a bottleneck

Closed

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Assigned for Implementation: Marko Mäkelä

Votes: 0

Watchers: 2

Dates

Created: 2025-07-03 08:28

Updated: 2025-07-03 09:15
