Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27088

Server crash on ARM (WMM architecture) due to missing barriers in lf-hash

    XMLWordPrintable

Details

    Description

      MariaDB server crashes on ARM (weak memory model architecture) while concurrently executing l_find to load node->key and add_to_purgatory to store node->key = NULL. l_find then uses key (which is NULL), to pass it to a comparison function.

      The specific problem is the out-of-order execution that happens on a weak memory model architecture. Two essential reorderings are possible, which need to be prevented.

      a) As l_find has no barriers in place between the optimistic read of the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124, the processor can reorder the load to happen after the while-loop.

      In that case, a concurrent thread executing add_to_purgatory on the same node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253 before key is loaded in l_find.

      b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and taken off the list with an upfollowing CAS lf_hash.cc#L252. Only if both CAS succeed, the key field is written to by add_to_purgatory. However, due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253 can be moved ahead of the two CAS operations, which makes the value of the local purgatory list stored by add_to_purgatory visible to all threads operating on the list. As the node is not marked as deleted yet, the same error occurs in l_find.

      Stack trace

      Thread 179 "lf-t" received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0xffff9e7cf1d0 (LWP 1256877)]
      0x0000fffff772c8d8 in memcmp () from /lib/aarch64-linux-gnu/libc.so.6
      (gdb) bt
      #0  0x0000fffff772c8d8 in memcmp () from /lib/aarch64-linux-gnu/libc.so.6
      #1  0x0000000000405580 in my_strnncoll_binary (cs=<optimized out>, s=0x0, slen=4,
          t=0xffff9e7ce968 "\234\267\277xz", tlen=<optimized out>, t_is_prefix=<optimized out>)
          at /workspace/mariadb-server/strings/ctype-bin.c:87
      #2  0x0000000000405038 in l_find (head=<optimized out>, cs=<optimized out>,
          hashnr=<optimized out>, key=<optimized out>, keylen=<optimized out>,
          cursor=<optimized out>, pins=<optimized out>, callback=<optimized out>)
          at /workspace/mariadb-server/mysys/lf_hash.c:133
      #3  0x0000000000403bec in l_delete (head=0xffffe8003500, cs=0x434458 <my_charset_bin>,
          hashnr=807645945, key=<optimized out>, keylen=4, pins=<optimized out>)
          at /workspace/mariadb-server/mysys/lf_hash.c:231
      #4  lf_hash_delete (pins=<optimized out>, key=<optimized out>, keylen=4,
          hash=<optimized out>) at /workspace/mariadb-server/mysys/lf_hash.c:455
      #5  test_lf_hash (arg=<optimized out>)
          at /workspace/mariadb-server/unittest/mysys/lf-t.c:155
      #6  0x0000fffff7b17088 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
      #7  0x0000fffff777fffc in ?? () from /lib/aarch64-linux-gnu/libc.so.6
      

      In frame #1 my_strnncoll_binary gets s=0x0 as a source pointer from l_find and hands that to memcmp, which causes a SEGV upon NULL pointer input.

      Reproduce

      The crash can be observed by running the lf-t unittest, as well as the oltp_write sysbench benchmark upon the most recent repository version. The server is build using optimization level -O0 or -O1. No other change to the configuration is done. unittest/mysys/lf-t is run until the crash is triggered.

      The probability of hitting the bug can be increased easily.

      The build script (build.sh) used to generate the faulty binary is attached.
      To increase the probability the following changes can be done.

      Output

      The current repository version produces a crash. (unfixed)

      ./lf-t
      # N CPUs: 96
      1..6
      # Testing lf_hash (without my_thread_init) with 30 threads, 3000000 iterations... 
      Bail out! Signal 11 thrown
      # 6 tests planned,  0 failed,  0 was last executed
      

      The version with the propsed fix finishes as expected.

      ./lf-t
      # N CPUs: 96
      1..6
      # Testing lf_hash (without my_thread_init) with 30 threads, 3000000 iterations... 
      # 0 mallocs, 30 pins in stack, 4096 hash size, 37857309 inserts, 721149 scans
      ok 1 - tested lf_hash (without my_thread_init) in 82.0496 secs (0)
      # 6 tests planned but only 1 executed
      

      Solution

      Find attached a diff that changes three accesses to be atomic.

      1. optimistic read of key in l_find lf_hash.cc#L117
      2. read of link for verification lf_hash.cc#L124
      3. write of key in add_to_purgatory lf_alloc-pin.c#L253

      (1 & 2) The implicit SC fence of the atomic load of key and link disallows reordering the load of key after load of link. (3) The implicit RELEASE fence of the atomic write of key disallows reordering the write of key to happen before the CAS that removed the node from the list.

      Attached is also a minimal working example for the bug, which was used to verify the fixed version with GenMC, a model checker for weak memory models. The model checker can be applied to the attached MWE to get an execution on a weak memory model architecture that leads to the bug. It was also applied to a much more complete extraction of the actual lock-free hash from MariaDB to verify the data structure. However, due to ease of presentation, only the MWE is included here. If there is further interest in the extracted lf-hash and its verification with GenMC, it can be supplied.

      Relation to MDEV-23510

      The problem described in MDEV-23510 is the result of the herein described bug.
      However, the commit attached to it d30c1331a18d875e553f3fcf544997e4f33fb943, is not fixing the bug, but hiding it, partially.

      The bug is not related to the cache line size, or splitting between two cache lines.
      Still the commit affects the crash.
      Moving link and key right next to each other in memory, plus removing the volatile declaration of link allows an optimization to take place during code generation for ARM that combines two load ldr instructions for consecutive memory positions into a single ldp instruction for loading a pair. This instruction loads link and key together in a single instruction. Reordering this instruction to happen after the loop verification (L124) is not possible, as the load of link needs to happen for the verification. As such the load of key cannot be reordered to be executed after verification.

      However, depending on a specific compiler optimization (Aarch64 Load Store Optimization) for correctness of the algorithm is risky and the algorithm is still incorrect. This can easily be seen as optimization levels -O0 and -O1 do not produce assembly containing the necessary pair load ldp of link and key. As such, both versions do crash on ARM. This fragile state might break due to changes in the struct LF_SLIST, changes to the algorithm or compiler changes.

      Attachments

        1. abstracted_crash_mwe_fixed.c
          2 kB
          Martin Beck
        2. abstracted_crash_mwe.c
          2 kB
          Martin Beck
        3. build.sh
          0.3 kB
          Martin Beck
        4. parallel.sh
          0.2 kB
          Martin Beck
        5. purgatory_fix.diff
          2 kB
          Martin Beck

        Issue Links

          Activity

            People

              danblack Daniel Black
              mbeck Martin Beck
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.