Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6615

Incorrect barrier logic in mutex_exit path causes data corruption on AArch64

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 5.5.36
    • 5.5.40
    • None
    • None
    • Ubuntu 14.04 - Trusty on AArch64, Juno development board.

    Description

      Hi,
      We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at storage/xtradb/fil/fil0fil.c:5288:

      fil_node_complete_io(
      /*=================*/
              fil_node_t*     node,   /*!< in: file node */
              fil_system_t*   system, /*!< in: tablespace memory cache */
              ulint           type)   /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                      the node as modified if
                                      type == OS_FILE_WRITE */
      {
              ut_ad(node);
              ut_ad(system);
              ut_ad(mutex_own(&(system->mutex)));
       
              ut_a(node->n_pending > 0); <-- failure point
       
              node->n_pending--;

      An attached debugger gave the following backtrace:

      (gdb) bt full     
      #0  0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
          at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
              _sys_result = 0
              pd = 0x7fa2fff1a0
              pid = <optimised out>
              selftid = 5661
      #1  0x0000007fb1d4818c in __GI_abort () at abort.c:89
              save_stage = 2
              act = {__sigaction_handler = {sa_handler = 0x7f00000000, 
                  sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976, 
                    404, 1, 404, 0, 366924161824, 548195526816, 366921716804, 
                    366927454208, 3, 0, 548434850424, 366927869208, 366936283408, 
                    548195527344, 548434850424}}, sa_flags = 5288, 
                sa_restorer = 0xa2fff1a0}
              sigs = {__val = {32, 0 <repeats 15 times>}}
      #2  0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>, 
          system=<optimised out>, type=<optimised out>)
          at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
      No locals.
      #3  0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
          at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
              ret = <optimised out>
              fil_node = 0x7fb14a0e78
              message = 0x7fa54d4350
              type = 10
              space_id = 0
      #4  0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
          at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
              segment = 3
      #5  0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
          at pthread_create.c:314
              pd = 0x7fa2fff1a0
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120, 
                      548981639624, 548449456128, 0, 548449452032, 548195529312, 
                      548195527344, 548443965168, 8388608, 548449472512, 
                      548195527056, 13770210553321828185, 0, 13770210553602140361, 
                      0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
                    0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
                    prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
              not_first_call = 0
              pagesize_m1 = <optimised out>
              sp = <optimised out>
              freesize = <optimised out>
              __PRETTY_FUNCTION__ = "start_thread"
      #6  0x0000007fb1dd9c40 in clone ()
          at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
      No locals.

      Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

      Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
      http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

      A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

      We have the following in program order:

      1. mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
      2. protected work
      3. mutex_exit -> load-acquire lock, set it to 0 regardless.

      However, the following sequence of events can be observed by another core:

      1. mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
      2. some of the protected work
      3. mutex_exit -> load-acquire lock, set it to 0 regardless.
      4. some more of the protected work (not protected).

      The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

      I've attached an emergency patch that re-introduces __ sync_lock_release to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the __ atomic_* functions? Ideally:

      To acquire the lock:

      __atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)

      To release the lock:

      __atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)

      (which also worked on my test system).

      I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

      Cheers,
      –
      Steve Capper

      Attachments

        Issue Links

          Activity

            This is a duplicate of MDEV-6450, which was fixed in 10.0. We'll consider this task as a back port to 5.5 request.

            svoj Sergey Vojtovich added a comment - This is a duplicate of MDEV-6450 , which was fixed in 10.0. We'll consider this task as a back port to 5.5 request.

            Thanks for this bug report. Fixed in 5.5.40:

            revno: 4274
            revision-id: svoj@mariadb.org-20140829121411-1vcbqxr98r9tbgtp
            parent: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v
            committer: Sergey Vojtovich <svoj@mariadb.org>
            branch nick: 5.5
            timestamp: Fri 2014-08-29 16:14:11 +0400
            message:
              Backport from 10.0:
             
              MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64
             
              This problem affects only debug builds on PPC64.
             
              There are at least two race conditions around
              rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
             
              - rw_lock_debug_waiters was loaded/stored without setting
                appropriate locks/memory barriers.
              - there is a gap between calls to os_event_reset() and
                os_event_wait() and in such case we're supposed to pass
                return value of the former to the latter.
             
              Fixed by replacing self-cooked spinlocks with system mutexes.
              These days system mutexes offer much better performance. OTOH
              performance is not that critical for debug builds.
            ------------------------------------------------------------
            revno: 4273
            revision-id: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v
            parent: sergii@pisem.net-20140825145819-i643q99zm0a6hjg2
            committer: Sergey Vojtovich <svoj@mariadb.org>
            branch nick: 5.5
            timestamp: Fri 2014-08-29 16:02:46 +0400
            message:
              Backport from 10.0:
             
              MDEV-6450 - MariaDB crash on Power8 when built with advance tool
                          chain
             
              InnoDB mutex_exit() function calls __sync_test_and_set() to release
              the lock. According to manual this function is supposed to create
              "acquire" memory barrier whereas in fact we need "release" memory
              barrier at mutex_exit().
             
              The problem isn't repeatable with gcc because it creates
              "acquire-release" memory barrier for __sync_test_and_set().
              ATC creates just "acquire" barrier.
             
              Fixed by creating proper barrier at mutex_exit() by using
              __sync_lock_release() instead of __sync_test_and_set().

            svoj Sergey Vojtovich added a comment - Thanks for this bug report. Fixed in 5.5.40: revno: 4274 revision-id: svoj@mariadb.org-20140829121411-1vcbqxr98r9tbgtp parent: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Fri 2014-08-29 16:14:11 +0400 message: Backport from 10.0:   MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64   This problem affects only debug builds on PPC64.   There are at least two race conditions around rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:   - rw_lock_debug_waiters was loaded/stored without setting appropriate locks/memory barriers. - there is a gap between calls to os_event_reset() and os_event_wait() and in such case we're supposed to pass return value of the former to the latter.   Fixed by replacing self-cooked spinlocks with system mutexes. These days system mutexes offer much better performance. OTOH performance is not that critical for debug builds. ------------------------------------------------------------ revno: 4273 revision-id: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v parent: sergii@pisem.net-20140825145819-i643q99zm0a6hjg2 committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Fri 2014-08-29 16:02:46 +0400 message: Backport from 10.0:   MDEV-6450 - MariaDB crash on Power8 when built with advance tool chain   InnoDB mutex_exit() function calls __sync_test_and_set() to release the lock. According to manual this function is supposed to create "acquire" memory barrier whereas in fact we need "release" memory barrier at mutex_exit().   The problem isn't repeatable with gcc because it creates "acquire-release" memory barrier for __sync_test_and_set(). ATC creates just "acquire" barrier.   Fixed by creating proper barrier at mutex_exit() by using __sync_lock_release() instead of __sync_test_and_set().
            dannf dann frazier added a comment -

            Note that this has reappeared for AArch64 in 5.5.41:
            https://mariadb.atlassian.net/browse/MDEV-7658

            dannf dann frazier added a comment - Note that this has reappeared for AArch64 in 5.5.41: https://mariadb.atlassian.net/browse/MDEV-7658

            People

              svoj Sergey Vojtovich
              SteveCapper Steve Capper
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.