[MDEV-27088] Server crash on ARM (WMM architecture) due to missing barriers in lf-hash Created: 2021-11-19 Updated: 2022-04-28 Resolved: 2021-11-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.7 |
| Fix Version/s: | 10.2.42, 10.3.33, 10.4.23, 10.5.14, 10.6.6, 10.7.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Martin Beck | Assignee: | Daniel Black |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ARMv8, crash, hash, lock-free, synchronization | ||
| Environment: |
ARMv8 architecture, Ubuntu bionic, LLVM-8 |
||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
MariaDB server crashes on ARM (a weak memory model architecture) when l_find, loading node->key, runs concurrently with add_to_purgatory, which stores node->key = NULL. l_find then passes the NULL key to a comparison function. The root cause is out-of-order execution on a weak memory model architecture: two reorderings are possible, and both need to be prevented.

a) l_find has no barrier between the optimistic read of the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124, so the processor can reorder the load of key to happen after the while-loop. In that case, a concurrent thread executing add_to_purgatory on the same node can be scheduled to store NULL into the key field lf_alloc-pin.c#L253 before key is loaded in l_find.

b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and taken off the list by a subsequent CAS lf_hash.cc#L252. Only if both CAS operations succeed does add_to_purgatory write to the key field. However, due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253 can be moved ahead of the two CAS operations, which makes the NULL written for the local purgatory list visible to all threads operating on the list. As the node is not yet marked as deleted, the same error occurs in l_find.

Stack trace
In frame #1, my_strnncoll_binary gets s=0x0 as a source pointer from l_find and hands it to memcmp, which causes a SEGV on the NULL pointer input.

Reproduce

The crash can be observed by running the lf-t unit test, as well as the oltp_write sysbench benchmark, on the most recent repository version. The server is built with optimization level -O0 or -O1; no other change to the configuration is made. unittest/mysys/lf-t is run until the crash is triggered. The probability of hitting the bug can easily be increased. The build script (build.sh) used to generate the faulty binary is attached.
Output

The current repository version produces a crash (unfixed).
The version with the proposed fix finishes as expected.

Solution

Find attached a diff that changes three accesses to be atomic.
(1 & 2) The implicit SC fence of the atomic loads of key and link disallows reordering the load of key after the load of link. (3) The implicit RELEASE fence of the atomic write of key disallows reordering the write of key to before the CAS that removed the node from the list.

Attached is also a minimal working example (MWE) for the bug, which was used to verify the fixed version with GenMC, a model checker for weak memory models. The model checker can be applied to the attached MWE to obtain an execution on a weak memory model architecture that leads to the bug. It was also applied to a much more complete extraction of the actual lock-free hash from MariaDB to verify the data structure; for ease of presentation, only the MWE is included here. If there is further interest in the extracted lf-hash and its verification with GenMC, it can be supplied. Relation to
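The intent of the three atomic accesses can be sketched with standard C++ atomics. This is a hedged illustration, not the patch itself: the actual fix uses MariaDB's my_atomic wrappers on the lock-free list nodes, and the struct and function names below are invented for the example.

```cpp
#include <atomic>

// Illustrative node: in the server these fields live in the lock-free
// list element and are accessed through my_atomic; std::atomic stands in.
struct Node {
  std::atomic<const char*> key;
  std::atomic<Node*> link;   // low bit of link marks the node as deleted
};

// (1 & 2) Reader side, in the spirit of l_find: sequentially consistent
// loads keep the load of key from drifting past the validating load of
// link, so a key observed before validation cannot be the purgatory NULL.
const char* read_key_validated(Node* n) {
  const char* k = n->key.load(std::memory_order_seq_cst);   // (1)
  Node* l = n->link.load(std::memory_order_seq_cst);        // (2)
  (void)l;  // the real code re-validates the list via link before trusting k
  return k;
}

// (3) Deleter side, in the spirit of add_to_purgatory: the release store
// cannot be hoisted above the earlier CAS that unlinked the node, so no
// thread can observe key == NULL on a node still reachable in the list.
void retire(Node* n) {
  n->key.store(nullptr, std::memory_order_release);         // (3)
}
```

The validating load of link is what makes the earlier key load trustworthy; with both loads sequentially consistent the pair cannot be inverted, and with the release store the purgatory NULL only becomes visible after the node has been unlinked.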
|
| Comments |
| Comment by Daniel Black [ 2021-11-20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks mbeck for the detail, and for explaining why the previous fix only partially worked.
So I'm having trouble reproducing it. Your run shows 30 threads, despite the description saying 2. With the changes, this does pass mtr and lf-t in a loop. I do believe you however: https://github.com/MariaDB/server/pull/1951 (10.2) - please check I did the casts right here. Should the unit tests' CYCLES be increased to your reproducer values? | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Martin Beck [ 2021-11-20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The number of CYCLES and THREADS did influence the probability of hitting the bug during execution. I just reran with 30 threads and a cycle count of 30000 and hit the bug within a few seconds. With the previous patch (compiled with -O1; at commit e0ba68ba34f5623cfa3c61e2e1966971703297f5) I get about a 6% (213/3200) chance of hitting the bug:
Without that patch (compiled with -O3 at commit 666565c7f00b0f39bbb459a428efd0093ed05fc8) the chance to hit the bug is about 38% (1247/3200):
On 10.2.41, with unchanged THREADS and CYCLES (just the vanilla source), the chance is 0.5% (16/3200):
parallel.sh is attached, but basically just spawns 32 processes in parallel and waits.
The default CYCLES count is rather low for exhibiting concurrency bugs, as it leaves little time for the OS to generate many interleavings. The probability of this bug appearing is increased by raising CYCLES; 30000 was just a tradeoff that worked in my case. I will review the proposed changes (pulls 1951 and 1952) later. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
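To picture why raising THREADS and CYCLES makes the race easier to hit, here is a minimal stress-driver skeleton (illustrative C++ with std::thread; the real lf-t test drives the lock-free hash through MariaDB's own thread and lf APIs, and run_stress is an invented name): each extra iteration is another window in which the OS can preempt a thread between the key load and the link check.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal stress-driver skeleton (illustrative, not the real lf-t test):
// `threads` workers each run `cycles` operations on shared state. Raising
// either value gives the scheduler more preemption points inside the
// critical window, which is why THREADS=30 / CYCLES=30000 hits the race
// far more often than the unit test's defaults.
long run_stress(int threads, int cycles) {
  std::atomic<long> ops{0};
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&] {
      for (int i = 0; i < cycles; ++i)
        ops.fetch_add(1, std::memory_order_relaxed); // stand-in for a hash op
    });
  for (auto& th : pool) th.join();
  return ops.load();
}
```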
| Comment by Daniel Black [ 2021-11-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-12-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Note: the lf.hash test is still failing - https://buildbot.mariadb.org/#/builders/295/builds/3971/steps/6/logs/stdio. If you see why, please create/reopen a bug report. |