[MDEV-27088] Server crash on ARM (WMM architecture) due to missing barriers in lf-hash Created: 2021-11-19  Updated: 2022-04-28  Resolved: 2021-11-30

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.7
Fix Version/s: 10.2.42, 10.3.33, 10.4.23, 10.5.14, 10.6.6, 10.7.2

Type: Bug Priority: Critical
Reporter: Martin Beck Assignee: Daniel Black
Resolution: Fixed Votes: 0
Labels: ARMv8, crash, hash, lock-free, synchronization
Environment:

ARMv8 architecture, Ubuntu bionic, LLVM-8


Attachments: File abstracted_crash_mwe.c     File abstracted_crash_mwe_fixed.c     File build.sh     File parallel.sh     File purgatory_fix.diff    
Issue Links:
PartOf
includes MDEV-23510 Crash in my_strnncoll_binary while ru... Closed
Relates
relates to MDEV-28430 lf_alloc isn't safe on aarch64 (or pp... In Review
relates to MDEV-12897 unit.lf failed in buildbot Closed

 Description   

MariaDB server crashes on ARM (weak memory model architecture) while concurrently executing l_find to load node->key and add_to_purgatory to store node->key = NULL. l_find then uses key (which is NULL), to pass it to a comparison function.

The specific problem is the out-of-order execution that happens on a weak memory model architecture. Two essential reorderings are possible, which need to be prevented.

a) As l_find has no barriers in place between the optimistic read of the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124, the processor can reorder the load to happen after the while-loop.

In that case, a concurrent thread executing add_to_purgatory on the same node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253 before key is loaded in l_find.

b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and taken off the list with an upfollowing CAS lf_hash.cc#L252. Only if both CAS succeed, the key field is written to by add_to_purgatory. However, due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253 can be moved ahead of the two CAS operations, which makes the value of the local purgatory list stored by add_to_purgatory visible to all threads operating on the list. As the node is not marked as deleted yet, the same error occurs in l_find.

Stack trace

Thread 179 "lf-t" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffff9e7cf1d0 (LWP 1256877)]
0x0000fffff772c8d8 in memcmp () from /lib/aarch64-linux-gnu/libc.so.6
(gdb) bt
#0  0x0000fffff772c8d8 in memcmp () from /lib/aarch64-linux-gnu/libc.so.6
#1  0x0000000000405580 in my_strnncoll_binary (cs=<optimized out>, s=0x0, slen=4,
    t=0xffff9e7ce968 "\234\267\277xz", tlen=<optimized out>, t_is_prefix=<optimized out>)
    at /workspace/mariadb-server/strings/ctype-bin.c:87
#2  0x0000000000405038 in l_find (head=<optimized out>, cs=<optimized out>,
    hashnr=<optimized out>, key=<optimized out>, keylen=<optimized out>,
    cursor=<optimized out>, pins=<optimized out>, callback=<optimized out>)
    at /workspace/mariadb-server/mysys/lf_hash.c:133
#3  0x0000000000403bec in l_delete (head=0xffffe8003500, cs=0x434458 <my_charset_bin>,
    hashnr=807645945, key=<optimized out>, keylen=4, pins=<optimized out>)
    at /workspace/mariadb-server/mysys/lf_hash.c:231
#4  lf_hash_delete (pins=<optimized out>, key=<optimized out>, keylen=4,
    hash=<optimized out>) at /workspace/mariadb-server/mysys/lf_hash.c:455
#5  test_lf_hash (arg=<optimized out>)
    at /workspace/mariadb-server/unittest/mysys/lf-t.c:155
#6  0x0000fffff7b17088 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#7  0x0000fffff777fffc in ?? () from /lib/aarch64-linux-gnu/libc.so.6

In frame #1 my_strnncoll_binary gets s=0x0 as a source pointer from l_find and hands that to memcmp, which causes a SEGV upon NULL pointer input.

Reproduce

The crash can be observed by running the lf-t unittest, as well as the oltp_write sysbench benchmark upon the most recent repository version. The server is build using optimization level -O0 or -O1. No other change to the configuration is done. unittest/mysys/lf-t is run until the crash is triggered.

The probability of hitting the bug can be increased easily.

The build script (build.sh) used to generate the faulty binary is attached.
To increase the probability the following changes can be done.

Output

The current repository version produces a crash. (unfixed)

./lf-t
# N CPUs: 96
1..6
# Testing lf_hash (without my_thread_init) with 30 threads, 3000000 iterations... 
Bail out! Signal 11 thrown
# 6 tests planned,  0 failed,  0 was last executed

The version with the propsed fix finishes as expected.

./lf-t
# N CPUs: 96
1..6
# Testing lf_hash (without my_thread_init) with 30 threads, 3000000 iterations... 
# 0 mallocs, 30 pins in stack, 4096 hash size, 37857309 inserts, 721149 scans
ok 1 - tested lf_hash (without my_thread_init) in 82.0496 secs (0)
# 6 tests planned but only 1 executed

Solution

Find attached a diff that changes three accesses to be atomic.

  1. optimistic read of key in l_find lf_hash.cc#L117
  2. read of link for verification lf_hash.cc#L124
  3. write of key in add_to_purgatory lf_alloc-pin.c#L253

(1 & 2) The implicit SC fence of the atomic load of key and link disallows reordering the load of key after load of link. (3) The implicit RELEASE fence of the atomic write of key disallows reordering the write of key to happen before the CAS that removed the node from the list.

Attached is also a minimal working example for the bug, which was used to verify the fixed version with GenMC, a model checker for weak memory models. The model checker can be applied to the attached MWE to get an execution on a weak memory model architecture that leads to the bug. It was also applied to a much more complete extraction of the actual lock-free hash from MariaDB to verify the data structure. However, due to ease of presentation, only the MWE is included here. If there is further interest in the extracted lf-hash and its verification with GenMC, it can be supplied.

Relation to MDEV-23510

The problem described in MDEV-23510 is the result of the herein described bug.
However, the commit attached to it d30c1331a18d875e553f3fcf544997e4f33fb943, is not fixing the bug, but hiding it, partially.

The bug is not related to the cache line size, or splitting between two cache lines.
Still the commit affects the crash.
Moving link and key right next to each other in memory, plus removing the volatile declaration of link allows an optimization to take place during code generation for ARM that combines two load ldr instructions for consecutive memory positions into a single ldp instruction for loading a pair. This instruction loads link and key together in a single instruction. Reordering this instruction to happen after the loop verification (L124) is not possible, as the load of link needs to happen for the verification. As such the load of key cannot be reordered to be executed after verification.

However, depending on a specific compiler optimization (Aarch64 Load Store Optimization) for correctness of the algorithm is risky and the algorithm is still incorrect. This can easily be seen as optimization levels -O0 and -O1 do not produce assembly containing the necessary pair load ldp of link and key. As such, both versions do crash on ARM. This fragile state might break due to changes in the struct LF_SLIST, changes to the algorithm or compiler changes.



 Comments   
Comment by Daniel Black [ 2021-11-20 ]

Thansk mbeck for the detail and why the previous only partially worked.

dan@eq-arm1:~/build-mariadb-10.2$ /usr/bin/clang --version
Debian clang version 11.0.1-2
Target: aarch64-unknown-linux-gnu
Thread model: posix
 
dan@eq-arm1:~/build-mariadb-10.2$ cmake  ../mariadb-10.2/ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_FLAGS="-O1" -DCMAKE_CXX_FLAGS=-O1
 
dan@eq-arm1:~/build-mariadb-10.2$ cmake --build . --verbose
[1/7] cd /home/dan/build-mariadb-10.2 && /usr/bin/cmake -P /home/dan/mariadb-10.2/cmake/info_bin.cmake
[2/7] cd /home/dan/build-mariadb-10.2 && /usr/bin/cmake -P /home/dan/mariadb-10.2/cmake/info_src.cmake
[3/7] cd /home/dan/build-mariadb-10.2 && /usr/bin/cmake -DCOMPILER=/usr/bin/clang -DSOURCE_DIR=/home/dan/mariadb-10.2 -DBINARY_DIR=/home/dan/build-mariadb-10.2 "-DABI_HEADERS=/home/dan/mariadb-10.2/include/mysql/plugin_audit.h;/home/dan/mariadb-10.2/include/mysql/plugin_ftparser.h;/home/dan/mariadb-10.2/include/mysql.h;/home/dan/mariadb-10.2/include/mysql/psi/psi_abi_v1.h;/home/dan/mariadb-10.2/include/mysql/psi/psi_abi_v2.h;/home/dan/mariadb-10.2/include/mysql/client_plugin.h;/home/dan/mariadb-10.2/include/mysql/plugin_auth.h;/home/dan/mariadb-10.2/include/mysql/plugin_password_validation.h;/home/dan/mariadb-10.2/include/mysql/plugin_encryption.h" -P /home/dan/mariadb-10.2/cmake/do_abi_check.cmake
[4/7] /usr/bin/clang -DDBUG_TRACE -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -Iinclude -I/home/dan/mariadb-10.2/include -I/home/dan/mariadb-10.2/unittest/mytap -O1 -fstack-protector --param=ssp-buffer-size=4 -O2 -g -DNDEBUG -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wdeclaration-after-statement -Wextra -Wformat-security -Wno-init-self -Wno-null-conversion -Wno-unused-parameter -Wno-unused-private-field -Woverloaded-virtual -Wvla -Wwrite-strings   -DHAVE_YASSL -DYASSL_PREFIX -DHAVE_OPENSSL -DMULTI_THREADED -MD -MT unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o -MF unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o.d -o unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o -c /home/dan/mariadb-10.2/unittest/mysys/lf-t.c
[5/7] /usr/bin/clang -DDBUG_TRACE -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -Iinclude -I/home/dan/mariadb-10.2/include -I/home/dan/mariadb-10.2/unittest/mytap -O1 -fstack-protector --param=ssp-buffer-size=4 -O2 -g -DNDEBUG -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wdeclaration-after-statement -Wextra -Wformat-security -Wno-init-self -Wno-null-conversion -Wno-unused-parameter -Wno-unused-private-field -Woverloaded-virtual -Wvla -Wwrite-strings   -DHAVE_YASSL -DYASSL_PREFIX -DHAVE_OPENSSL -DMULTI_THREADED -MD -MT unittest/mysys/CMakeFiles/my_atomic-t.dir/my_atomic-t.c.o -MF unittest/mysys/CMakeFiles/my_atomic-t.dir/my_atomic-t.c.o.d -o unittest/mysys/CMakeFiles/my_atomic-t.dir/my_atomic-t.c.o -c /home/dan/mariadb-10.2/unittest/mysys/my_atomic-t.c
[6/7] : && /usr/bin/clang++ -O1 -fstack-protector --param=ssp-buffer-size=4 -fno-rtti -O2 -g -DNDEBUG -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wdeclaration-after-statement -Wextra -Wformat-security -Wno-init-self -Wno-null-conversion -Wno-unused-parameter -Wno-unused-private-field -Woverloaded-virtual -Wvla -Wwrite-strings -Wl,-z,relro,-z,now unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o -o unittest/mysys/lf-t  -lpthread  unittest/mytap/libmytap.a  mysys/libmysys.a  dbug/libdbug.a  mysys/libmysys.a  dbug/libdbug.a  strings/libstrings.a  -lz  -lm  -ldl  -lpthread && :
[7/7] : && /usr/bin/clang++ -O1 -fstack-protector --param=ssp-buffer-size=4 -fno-rtti -O2 -g -DNDEBUG -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wdeclaration-after-statement -Wextra -Wformat-security -Wno-init-self -Wno-null-conversion -Wno-unused-parameter -Wno-unused-private-field -Woverloaded-virtual -Wvla -Wwrite-strings -Wl,-z,relro,-z,now unittest/mysys/CMakeFiles/my_atomic-t.dir/my_atomic-t.c.o -o unittest/mysys/my_atomic-t  -lpthread  unittest/mytap/libmytap.a  mysys/libmysys.a  dbug/libdbug.a  mysys/libmysys.a  dbug/libdbug.a  strings/libstrings.a  -lz  -lm  -ldl  -lpthread && :
dan@eq-arm1:~/build-mariadb-10.2$ ./unittest/mysys/lf-t
# N CPUs: 160, atomic ops: gcc-atomics-smp
1..7
ok 1 - my_atomic_initialize() returned 0
# Testing lf_hash (without my_thread_init) with 160 threads, 300000 iterations... 
# 0 mallocs, 160 pins in stack, 4096 hash size, 16167030 inserts, 54494 scans
ok 2 - tested lf_hash (without my_thread_init) in 115.919 secs (0)
# 7 tests planned but only 2 executed
dan@eq-arm1:~/build-mariadb-10.2$ cmake --build .
[7/7] Linking CXX executable unittest/mysys/lf-t
dan@eq-arm1:~/build-mariadb-10.2$ ./unittest/mysys/lf-t
# N CPUs: 160, atomic ops: gcc-atomics-smp
1..7
ok 1 - my_atomic_initialize() returned 0
# Testing lf_hash (without my_thread_init) with 2 threads, 30000000 iterations... 
# 0 mallocs, 2 pins in stack, 2048 hash size, 14880152 inserts, 7568227 scans
ok 2 - tested lf_hash (without my_thread_init) in 126.009 secs (0)
# 7 tests planned but only 2 executed
dan@eq-arm1:~/build-mariadb-10.2$  /usr/bin/clang -DDBUG_TRACE -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -Iinclude -I/home/dan/mariadb-10.2/include -I/home/dan/mariadb-10.2/unittest/mytap -O1 -fstack-protector --param=ssp-buffer-size=4 -g -DNDEBUG -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wdeclaration-after-statement -Wextra -Wformat-security -Wno-init-self -Wno-null-conversion -Wno-unused-parameter -Wno-unused-private-field -Woverloaded-virtual -Wvla -Wwrite-strings   -DHAVE_YASSL -DYASSL_PREFIX -DHAVE_OPENSSL -DMULTI_THREADED -MD -MT unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o -MF unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o.d -o unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o -c /home/dan/mariadb-10.2/unittest/mysys/lf-t.c
dan@eq-arm1:~/build-mariadb-10.2$ /usr/bin/clang++ -O1 -fstack-protector --param=ssp-buffer-size=4 -fno-rtti  -g -DNDEBUG -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wdeclaration-after-statement -Wextra -Wformat-security -Wno-init-self -Wno-null-conversion -Wno-unused-parameter -Wno-unused-private-field -Woverloaded-virtual -Wvla -Wwrite-strings -Wl,-z,relro,-z,now unittest/mysys/CMakeFiles/lf-t.dir/lf-t.c.o -o unittest/mysys/lf-t  -lpthread  unittest/mytap/libmytap.a  mysys/libmysys.a  dbug/libdbug.a  mysys/libmysys.a  dbug/libdbug.a  strings/libstrings.a  -lz  -lm  -ldl  -lpthread
dan@eq-arm1:~/build-mariadb-10.2$ ./unittest/mysys/lf-t
# N CPUs: 160, atomic ops: gcc-atomics-smp
1..7
ok 1 - my_atomic_initialize() returned 0
# Testing lf_hash (without my_thread_init) with 2 threads, 30000000 iterations... 
# 0 mallocs, 2 pins in stack, 1024 hash size, 14999984 inserts, 7490617 scans
ok 2 - tested lf_hash (without my_thread_init) in 107.189 secs (0)
# 7 tests planned but only 2 executed
 
dan@eq-arm1:~/build-mariadb-10.2$ more /proc/cpuinfo 
processor	: 0
BogoMIPS	: 50.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0c
CPU revision	: 1

So I'm having trouble reproducing it. Your threads shows 30 despite being 2 in the descriptoin.

With changes, this does passes mtr and lf-t in loop

I do believe you however:

https://github.com/MariaDB/server/pull/1951 (10.2) - please check I did the casts right here.
https://github.com/MariaDB/server/pull/1952 (minor differences in 10.5 compared to 10.2)

Should the unit tests CYCLES be increased to your reproducer values?

Comment by Martin Beck [ 2021-11-20 ]

The number of CYCLES and THREADS did influence the probability of hitting the bug during execution. I just reran with 30 threads and a cycle count of 30000 and hit the bug within a few seconds.

With the previous patch (compiled with -O1; at commit e0ba68ba34f5623cfa3c61e2e1966971703297f5) I get about 6% (213/3200) chance to hit the bug:

$ for i in `seq 100`; do ./parallel.sh ./lf-t; done | grep 'Bail out' | wc -l
213

Without that patch (compiled with -O3 at commit 666565c7f00b0f39bbb459a428efd0093ed05fc8) the chance to hit the bug is about 38% (1247/3200):

$ for i in `seq 100`; do ./parallel.sh ./lf-t; done | grep 'Bail out' | wc -l
1247

On 10.2.41, without changing THREADS or CYCLES, just the vanilla source, 0.5% chance (16/3200):

$ cmake  ../mariadb-10.2/ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_FLAGS="-O1" -DCMAKE_CXX_FLAGS=-O1
$ cmake --build . --parallel=100 --verbose
$ for i in `seq 100`; do ../parallel.sh unittest/mysys/lf-t; done | grep 'Bail out' | wc -l
16

parallel.sh is attached, but basically just spawns 32 processes in parallel and waits.
parallel.sh

clang --version
clang version 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

The default CYCLE count is rather low to exhibit concurrency bugs, as there is not much time for the OS to generate many interleavings. The probability of this bug appearing is increased by increasing CYCLES. 30000 was just a tradeoff that worked in my case.

I will review the propositions (pull 1951 / 1952 later)

Comment by Daniel Black [ 2021-11-30 ]

Thank you very much mbeck and svoj

Comment by Daniel Black [ 2021-12-15 ]

note lf.hash test still failing - https://buildbot.mariadb.org/#/builders/295/builds/3971/steps/6/logs/stdio

if you see why please create/reopen a bug report.

Generated at Thu Feb 08 09:50:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.