MariaDB server crashes on ARM (weak memory model architecture) while concurrently executing l_find to load node->key and add_to_purgatory to store node->key = NULL. l_find then uses key (which is NULL), to pass it to a comparison function.
The specific problem is the out-of-order execution that happens on a weak memory model architecture. Two essential reorderings are possible, which need to be prevented.
a) As l_find has no barriers in place between the optimistic read of the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124, the processor can reorder the load to happen after the while-loop.
In that case, a concurrent thread executing add_to_purgatory on the same node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253 before key is loaded in l_find.
b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and taken off the list with an upfollowing CAS lf_hash.cc#L252. Only if both CAS succeed, the key field is written to by add_to_purgatory. However, due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253 can be moved ahead of the two CAS operations, which makes the value of the local purgatory list stored by add_to_purgatory visible to all threads operating on the list. As the node is not marked as deleted yet, the same error occurs in l_find.
In frame #1 my_strnncoll_binary gets s=0x0 as a source pointer from l_find and hands that to memcmp, which causes a SEGV upon NULL pointer input.
The crash can be observed by running the lf-t unittest, as well as the oltp_write sysbench benchmark upon the most recent repository version. The server is build using optimization level -O0 or -O1. No other change to the configuration is done. unittest/mysys/lf-t is run until the crash is triggered.
The probability of hitting the bug can be increased easily.
The build script (build.sh) used to generate the faulty binary is attached.
To increase the probability the following changes can be done.
- #define CYCLES 300000 in unittest/mysys/thr_template.c
- #define THREADS 2 in unittest/mysys/thr_template.c
- Disable the first 5 tests in unittest/mysys/lf-t.c, only keeping the last lf_hash (without my_thread_init) enabled
The current repository version produces a crash. (unfixed)
The version with the propsed fix finishes as expected.
Find attached a diff that changes three accesses to be atomic.
- optimistic read of key in l_find lf_hash.cc#L117
- read of link for verification lf_hash.cc#L124
- write of key in add_to_purgatory lf_alloc-pin.c#L253
(1 & 2) The implicit SC fence of the atomic load of key and link disallows reordering the load of key after load of link. (3) The implicit RELEASE fence of the atomic write of key disallows reordering the write of key to happen before the CAS that removed the node from the list.
Attached is also a minimal working example for the bug, which was used to verify the fixed version with GenMC, a model checker for weak memory models. The model checker can be applied to the attached MWE to get an execution on a weak memory model architecture that leads to the bug. It was also applied to a much more complete extraction of the actual lock-free hash from MariaDB to verify the data structure. However, due to ease of presentation, only the MWE is included here. If there is further interest in the extracted lf-hash and its verification with GenMC, it can be supplied.
The problem described in
MDEV-23510 is the result of the herein described bug.
However, the commit attached to it d30c1331a18d875e553f3fcf544997e4f33fb943, is not fixing the bug, but hiding it, partially.
The bug is not related to the cache line size, or splitting between two cache lines.
Still the commit affects the crash.
Moving link and key right next to each other in memory, plus removing the volatile declaration of link allows an optimization to take place during code generation for ARM that combines two load ldr instructions for consecutive memory positions into a single ldp instruction for loading a pair. This instruction loads link and key together in a single instruction. Reordering this instruction to happen after the loop verification (L124) is not possible, as the load of link needs to happen for the verification. As such the load of key cannot be reordered to be executed after verification.
However, depending on a specific compiler optimization (Aarch64 Load Store Optimization) for correctness of the algorithm is risky and the algorithm is still incorrect. This can easily be seen as optimization levels -O0 and -O1 do not produce assembly containing the necessary pair load ldp of link and key. As such, both versions do crash on ARM. This fragile state might break due to changes in the struct LF_SLIST, changes to the algorithm or compiler changes.